JIDO Architectural Recovery Plan
Executive Assessment: SALVAGEABLE
Verdict: The architecture is NOT fundamentally broken - it’s fundamentally misaligned. The dialyzer violations indicate disconnected layers, not unsalvageable code.
Scope Reality Check:
- 13 implementation files (~2000 lines)
- 7 files with Foundation dependencies
- Core Jido framework appears solid
- Issues concentrated in integration layers
Recommendation: SYSTEMATIC RECONSTRUCTION over abandonment.
Root Cause Analysis
This isn’t “bad code” - it’s evolutionary architecture without migration:
- Phase 1: Pure Jido agents (working)
- Phase 2: Foundation integration layer added (partially working)
- Phase 3: Advanced coordination grafted on (broken)
- Phase 4: Tests patched to pass (lying)
The 200+ dialyzer violations represent accumulated technical debt from architectural evolution without refactoring.
Fixability Assessment
✅ FIXABLE ISSUES (80% of violations)
- Phantom Dependencies - Replace or implement missing modules
- Type Contract Lies - Align specs with reality
- Bridge Result Mismatch - Fix return values
- Macro Hygiene - Fix __MODULE__ usage
- Queue Type Confusion - Consistent opaque types
⚠️ REQUIRES REDESIGN (20% of violations)
- Coordination Result Propagation - Simplify expectations
- Multi-agent Workflow - Accept current limitations
- Circuit Breaker Integration - Build or remove
Systematic Fix Strategy
Phase 1: Foundation Reality Check (2-3 hours)
Goal: Establish what Foundation modules actually exist and work
# Audit existing Foundation modules
find lib/foundation -name "*.ex" -exec grep -l "defmodule Foundation" {} \;
# Create missing critical modules or remove references
Actions:
- Foundation.Telemetry: Either implement or replace with :telemetry
- Foundation.CircuitBreaker: Either implement or remove protection
- Foundation.Cache: Either implement or remove caching
- Registry functions: Fix missing functions or use alternatives
Success Criteria: No more “Function does not exist” errors
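As one example of the replacement path, a thin shim over the :telemetry library keeps existing call sites compiling while the real module is implemented or retired. This is a sketch only; emit/3 is an assumed call-site signature, not a confirmed Foundation API:

defmodule Foundation.Telemetry do
  @moduledoc "Minimal shim that delegates to the :telemetry library."

  # Hypothetical signature; align it with whatever the JidoSystem call sites use.
  @spec emit([atom()], map(), map()) :: :ok
  def emit(event, measurements \\ %{}, metadata \\ %{}) when is_list(event) do
    :telemetry.execute(event, measurements, metadata)
  end
end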
Phase 2: Bridge Contract Honesty (3-4 hours)
Goal: Make Bridge.delegate_task return what coordinators expect
Current (broken):
def delegate_task(delegator, delegate, task) do
  send(delegate, {:mabeam_task, task.id, task})
  :ok # <-- Missing result propagation!
end
Fixed (honest):
def delegate_task(delegator, delegate, task, opts \\ []) do
  timeout = Keyword.get(opts, :timeout, 5_000)

  case GenServer.call(delegate, {:execute_task, task}, timeout) do
    {:ok, result} -> {:ok, result}
    _other -> {:error, :delegation_failed}
  end
rescue
  _exception -> {:error, :delegation_failed}
catch
  # GenServer.call exits (it does not raise) on timeout or a dead delegate
  :exit, _reason -> {:error, :delegation_failed}
end
Success Criteria: CoordinatorAgent patterns can match Bridge results
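For illustration, a coordinator call site can then match on the delegated result (handle_result/1 and retry_or_escalate/1 are hypothetical helpers):

case Bridge.delegate_task(self(), delegate_pid, task, timeout: 10_000) do
  {:ok, result} -> handle_result(result)
  {:error, :delegation_failed} -> retry_or_escalate(task)
end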
Phase 3: Type System Truth (2-3 hours)
Goal: Align all @spec with actual function behavior
Strategy:
- Run dialyzer on each file individually
- Update specs to match “Success typing” dialyzer reports
- Remove “extra types” that functions can never return
- Add missing error cases functions actually return
Example Fix:
# Before (lying):
@spec process_task(pid(), map(), keyword()) :: {:ok, result} | {:error, reason}
# After (honest):
@spec process_task(pid(), map(), keyword()) :: {:error, :server_not_found}
Success Criteria: Zero “Type specification” warnings
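Conversely, when the success typing shows error tuples the spec never mentioned, widen the spec to name them. A hypothetical illustration (enqueue_task/2 is not an actual JidoSystem function):

# Before: only the happy path is declared
@spec enqueue_task(pid(), map()) :: :ok
# After: the error reasons the function actually returns are listed
@spec enqueue_task(pid(), map()) :: :ok | {:error, :queue_full | :invalid_task}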
Phase 4: Macro Hygiene Fix (1 hour)
Goal: Fix get_default_capabilities/0 in FoundationAgent
Current (broken):
defp get_default_capabilities() do
  case __MODULE__ do # Does not resolve to the concrete agent module here
    JidoSystem.Agents.TaskAgent -> [:task_processing] # Never matches
    ...
  end
end
Fixed (working):
defmacro __using__(opts) do
  quote do
    # ... existing code ...
    defp get_default_capabilities() do
      unquote(opts[:capabilities] || [:general_purpose])
    end
  end
end
Success Criteria: Each agent gets correct capabilities
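With that change, each agent declares its capabilities at the use site. A sketch, assuming FoundationAgent lives under JidoSystem.Agents and accepts the capabilities option shown above:

defmodule JidoSystem.Agents.TaskAgent do
  use JidoSystem.Agents.FoundationAgent, capabilities: [:task_processing]
end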
Phase 5: Queue Type Consistency (1 hour)
Goal: Ensure all queue operations use proper opaque types
Actions:
- Initialize all queues with :queue.new()
- Never treat queues as plain tuples
- Add type guards where needed
Success Criteria: No “opaque type mismatch” errors
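A minimal sketch of the intended discipline, assuming agent state keeps its queue under a :task_queue key:

task = %{id: "t1"}
state = %{task_queue: :queue.new()}

# Go through the :queue API only; never pattern match on the raw tuple form
state = update_in(state.task_queue, &:queue.in(task, &1))

case :queue.out(state.task_queue) do
  {{:value, next}, rest} -> {next, put_in(state.task_queue, rest)}
  {:empty, _queue} -> {:empty, state}
end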
Phase 6: Dead Code Removal (1-2 hours)
Goal: Remove unreachable code paths and unused functions
Strategy:
- Remove unused aliases
- Delete functions dialyzer marks as “never called”
- Remove impossible pattern matches
- Simplify error handling to what’s actually possible
Success Criteria: Zero “unused” and “unreachable” warnings
Testing Strategy During Fix
Phase-by-Phase Verification
- After each phase: Run dialyzer on affected files
- Every 2 phases: Run full test suite
- Expect breakage: Tests will fail as we fix lies
- Fix tests: Update to verify actual behavior
Test Reconstruction Priorities
- Keep integration tests - They verify real behavior
- Fix unit tests - They often test wrong assumptions
- Add type tests - Verify dialyzer fixes work (see the sketch after this list)
- Remove impossible tests - Tests for unreachable code
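A hypothetical contract test of this kind, assuming the Bridge module lives at JidoSystem.Bridge and delegate_task behaves as sketched in Phase 2:

defmodule JidoSystem.BridgeContractTest do
  use ExUnit.Case, async: true

  test "delegate_task returns an error tuple for a dead delegate" do
    dead = spawn(fn -> :ok end)
    ref = Process.monitor(dead)
    assert_receive {:DOWN, ^ref, :process, _pid, _reason}

    assert {:error, :delegation_failed} =
             JidoSystem.Bridge.delegate_task(self(), dead, %{id: "t-1"})
  end
end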
Expected Timeline
Total Effort: 10-15 hours over 3-4 days
- Day 1: Phases 1-2 (Foundation + Bridge fixes)
- Day 2: Phases 3-4 (Types + Macros)
- Day 3: Phases 5-6 (Cleanup + Testing)
- Day 4: Integration verification
Risk Assessment
LOW RISK:
- Foundation module fixes (well-scoped)
- Type specification updates (non-breaking)
- Dead code removal (by definition safe)
MEDIUM RISK:
- Bridge contract changes (may break coordinators)
- Macro fixes (affects all agents)
HIGH RISK:
- None identified - scope is contained
Alternative: “Stable Broken” Approach
If systematic fixes seem too ambitious:
Minimal Viability Path (2-3 hours)
- Silence dialyzer with strategic @dialyzer annotations
- Document known issues in module docs
- Add defensive guards in critical paths
- Keep current test behavior (even if wrong)
This creates a “stable broken” system useful for study but not production.
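For reference, a minimal sketch of the suppression approach (module and function names are illustrative, not the actual JidoSystem layout):

defmodule JidoSystem.Bridge do
  # Known-bad contract documented here; suppress warnings for this function only
  @dialyzer {:nowarn_function, delegate_task: 3}

  def delegate_task(_delegator, delegate, task) do
    send(delegate, {:mabeam_task, task.id, task})
    :ok
  end
end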
Recommendation
GO FOR SYSTEMATIC RECONSTRUCTION
Rationale:
- Scope is manageable (13 files, focused issues)
- Root causes are architectural, not implementation
- Fixes will create genuinely robust system
- Alternative is perpetual technical debt
The dialyzer violations are a GIFT - they show exactly what needs fixing without guesswork.
Success Definition
Milestone 1: mix dialyzer runs clean (zero violations)
Milestone 2: Tests pass and verify correct behavior
Milestone 3: System is architecturally honest and reliable
This is absolutely achievable given the concentrated scope of issues.