← Back to Foundation

JIDO BUILDOUT PAIN POINTS

Documentation for JIDO_BUILDOUT_PAIN_POINTS from the Foundation repository.

Jido Buildout Pain Points and Architectural Issues

Executive Summary

Following the Jido integration analysis and upcoming Foundation infrastructure fixes, this document identifies critical pain points, architectural misalignments, and design issues that need addressing for a robust production system. The current implementation, while functional in tests, has fundamental architectural gaps that will surface during real-world usage.

Critical Pain Points

1. Mock-Driven Development Anti-Pattern

Issue: The entire JidoSystem was built against test mocks rather than real implementations.

Impact:

  • False confidence from passing tests
  • Runtime failures in production
  • Architectural assumptions based on mock behavior
  • Hidden performance characteristics

Example:

# In tests: Simple in-memory cache
Foundation.Cache.put(key, value)  # Uses Process dictionary

# In production: Should be distributed, persistent cache
Foundation.Cache.put(key, value, ttl: 3600, persistent: true)

Recommendation: Implement “Contract Tests” that verify mocks match production behavior.

2. Leaky Abstraction in Agent State Management

Issue: JidoSystem agents directly manipulate internal queue structures.

Current Code:

# Violates Jido's abstraction
defp queue_size(state) do
  case Map.get(state, :task_queue) do
    queue when is_tuple(queue) -> :queue.len(queue)  # Assumes internal structure!
    _ -> 0
  end
end

Correct Approach:

# Use Jido's API
defp queue_size(agent) do
  Jido.Agent.get_queue_length(agent)
end

Impact:

  • Dialyzer violations
  • Brittle code dependent on internals
  • Upgrade path blocked

3. Misaligned Integration Patterns

Issue: Bridge pattern creates indirection without clear value boundaries.

Current Flow:

JidoAction → Bridge → CircuitBreaker → External Service
    ↓           ↓           ↓              ↓
  (Jido)    (Bridge)   (Foundation)    (Unknown)

Problems:

  • Multiple failure points
  • Unclear responsibility boundaries
  • Telemetry data scattered across layers
  • Recovery logic duplicated

Better Pattern:

defmodule JidoSystem.Actions.ResilientAction do
  use Jido.Action
  use Foundation.Infrastructure.Resilient  # Mix-in pattern
  
  # Declarative configuration
  resilient_config do
    circuit_breaker :external_api
    retry_policy exponential: [max: 3, base: 100]
    timeout 5_000
  end
end

4. Agent Lifecycle Management Chaos

Issue: No clear ownership of agent lifecycle between Jido, Foundation, and MABEAM.

Current Reality:

  • Jido starts agents
  • Bridge registers with Foundation
  • MABEAM may also register
  • Health monitoring is optional
  • No coordinated shutdown

Missing:

defmodule JidoSystem.AgentLifecycle do
  @behaviour Foundation.Lifecycle
  
  def on_start(agent, metadata) do
    with :ok <- register_with_foundation(agent, metadata),
         :ok <- setup_monitoring(agent),
         :ok <- announce_to_cluster(agent) do
      {:ok, agent}
    end
  end
  
  def on_terminate(agent, reason) do
    # Coordinated cleanup
  end
end

5. Signal/Event System Fragmentation

Issue: Multiple overlapping event systems without clear integration.

Current Systems:

  1. Jido Signals (CloudEvents)
  2. Foundation Telemetry (:telemetry)
  3. MABEAM coordination messages
  4. OTP system messages

Problems:

  • Which system for which use case?
  • Event transformation overhead
  • Lost events between systems
  • No unified event store

Needed: Unified event bus with adapters.

6. Configuration Management Nightmare

Issue: Configuration scattered across multiple systems with no validation.

Current State:

# config/config.exs
config :foundation, registry_impl: ...
config :jido_system, task_agent_pool_size: ...
config :jido, default_timeout: ...

# Runtime configuration?
# Environment-specific overrides?
# Feature flags?
# Dynamic reconfiguration?

Missing: Centralized, validated, runtime-reloadable configuration.

7. Testing Strategy Incoherence

Issue: Tests don’t reflect production topology or failure modes.

Test Assumptions:

  • Single node
  • Synchronous execution
  • No network failures
  • Infinite resources
  • No concurrent operations

Production Reality:

  • Multi-node clusters
  • Async everything
  • Network partitions
  • Resource constraints
  • Massive concurrency

8. Resource Management Theater

Issue: Resource management exists in name only.

Current “Implementation”:

def acquire_resource(type, metadata) do
  # Just forwards to Foundation
  Foundation.ResourceManager.acquire_resource(type, metadata)
end

What’s Missing:

  • Actual resource limits
  • Backpressure mechanisms
  • Priority queues
  • Resource pooling
  • Starvation prevention

Architectural Misalignments

1. Protocol Impedance Mismatch

Foundation uses protocols for flexibility, but Jido expects concrete implementations:

# Foundation: "Use any registry implementation"
defprotocol Foundation.Registry do
  def register(impl, key, pid, metadata)
end

# Jido: "I need specific guarantees"
def start_agent(config) do
  # Assumes specific registry behavior
  # No way to specify which implementation
end

2. Supervision Tree Conflicts

Both Jido and Foundation want to own the supervision tree:

Current:                          Should Be:
Foundation.Application            JidoSystem.Application
├── Registry                      ├── Foundation.Subsystem
├── Telemetry                     │   └── (Foundation components)
└── ResourceManager               └── Jido.Subsystem
                                      └── (Jido components)
JidoSystem.Application    
└── (Separate tree)

3. Capability Discovery Limitations

The current capability model is too simplistic:

# Current: Static capabilities
capabilities: [:planning, :execution]

# Needed: Rich capability descriptions
capabilities: %{
  planning: %{
    algorithms: [:a_star, :dijkstra],
    constraints: %{max_nodes: 10_000},
    performance: %{avg_time_ms: 250}
  }
}

Design Flaws

1. Missing Backpressure Throughout

No component implements proper backpressure:

  • Agents accept unlimited instructions
  • No queue size limits
  • No load shedding
  • No flow control

2. Error Handling Inconsistency

Mix of error handling strategies:

  • Some use {:error, reason}
  • Some raise exceptions
  • Some use :telemetry events
  • Some silently fail

3. No Distributed Systems Considerations

Code assumes local execution:

  • PIDs passed around freely
  • No node failure handling
  • No partition tolerance
  • No eventual consistency

4. Performance Blind Spots

No performance considerations:

  • Unbounded queue operations
  • No batching
  • Synchronous coordination
  • No caching strategies

Future State Architecture Recommendations

1. Unified Agent Platform

Create JidoSystem.Platform that unifies all concerns:

defmodule JidoSystem.Platform do
  use JidoSystem.Platform.Builder
  
  platform do
    # Declarative configuration
    agents TaskAgent, MonitorAgent, CoordinatorAgent
    
    infrastructure do
      registry :mabeam
      telemetry :foundation
      resources :limited, max_memory: "2GB"
    end
    
    policies do
      circuit_breaker default: [threshold: 5, timeout: 30_000]
      rate_limit api: [rate: 100, per: :minute]
      retry default: [max: 3, backoff: :exponential]
    end
    
    deployment do
      clustering :kubernetes
      discovery :dns
      health_check "/health"
    end
  end
end

2. Contract-Driven Development

Define contracts between layers:

defmodule JidoSystem.Contracts.Cache do
  use Foundation.Contract
  
  contract Foundation.Cache do
    operation :get do
      input key: term(), opts: keyword()
      output {:ok, value} | {:error, reason}
      guarantees [:read_your_writes, :eventual_consistency]
      sla response_time: {95, :percentile, 10, :ms}
    end
  end
end

3. Layered Architecture with Clear Boundaries

┌─────────────────────────────────────────┐
│          Application Layer              │
│   (Business Logic, Workflows, Rules)    │
├─────────────────────────────────────────┤
│           Agent Layer                   │
│   (Jido Agents, Actions, Sensors)      │
├─────────────────────────────────────────┤
│         Coordination Layer              │
│   (MABEAM, Consensus, Discovery)       │
├─────────────────────────────────────────┤
│        Infrastructure Layer             │
│   (Foundation Services, Resources)      │
├─────────────────────────────────────────┤
│           Platform Layer                │
│   (OTP, BEAM, Distribution)            │
└─────────────────────────────────────────┘

4. Event-Driven Architecture with Event Sourcing

defmodule JidoSystem.EventStore do
  def append(stream, events)
  def read(stream, from: position)
  def subscribe(stream, handler)
  
  # Projections for different views
  def project(events, projection)
end

5. Capability-Based Security

defmodule JidoSystem.Capabilities do
  def grant(agent, capability, constraints)
  def revoke(agent, capability)
  def check(agent, capability, params)
  
  # Dynamic capability negotiation
  def negotiate(requester, provider, needs)
end

Implementation Priorities Post-Infrastructure Fix

Phase 1: Foundation Stabilization (After current fixes)

  1. Implement comprehensive contract tests
  2. Add proper error handling throughout
  3. Create unified configuration system
  4. Fix supervision tree organization

Phase 2: Core Improvements

  1. Add backpressure to all queues
  2. Implement proper resource management
  3. Create unified event bus
  4. Add distributed system primitives

Phase 3: Advanced Features

  1. Capability-based discovery
  2. Dynamic reconfiguration
  3. Multi-tenancy support
  4. Advanced monitoring/observability

Phase 4: Production Hardening

  1. Chaos engineering tests
  2. Performance optimization
  3. Security audit
  4. Documentation completion

Risk Mitigation Strategies

1. Gradual Migration Path

  • Keep existing APIs working
  • Add new features alongside old
  • Deprecate gradually
  • Provide migration tools

2. Comprehensive Testing

  • Property-based tests for protocols
  • Chaos engineering for distributed scenarios
  • Load tests for performance
  • Integration tests for real deployments

3. Observability First

  • Instrument everything
  • Standardize metrics
  • Centralize logging
  • Distributed tracing

4. Documentation as Code

  • Generate docs from code
  • Validate examples in CI
  • Architecture decision records
  • Runbooks for operations

Conclusion

The current JidoSystem implementation has significant architectural issues stemming from:

  1. Building against mocks instead of real implementations
  2. Unclear boundaries between systems
  3. Missing distributed systems considerations
  4. Incomplete resource management
  5. Fragmented event/signal systems

After completing the immediate Foundation infrastructure fixes (Cache, CircuitBreaker, Registry.count), the system needs a comprehensive architectural revision to address these pain points. The proposed future state provides a cleaner, more maintainable architecture that properly separates concerns while providing the flexibility and robustness required for production agent systems.

The key is to move from the current “collection of parts” to a cohesive platform that provides clear abstractions, proper boundaries, and production-ready primitives for building autonomous agent systems.