← Back to Python adapter generalization

Current architecture

Documentation for current-architecture from the Dspex repository.

Current Architecture Analysis

Overview

The DSPex Python adapter provides a robust bridge between Elixir and Python for ML operations, currently focused on DSPy. This analysis examines the existing architecture to identify reusable components and areas requiring generalization.

Architecture Layers

1. Adapter Layer (Elixir)

The adapter layer provides the high-level interface for Elixir applications:

# Direct adapter for single Python process
defmodule DSPex.Adapters.PythonPort do
  @behaviour DSPex.Adapters.Adapter
  
  def create_program(signature, options \\ [])
  def execute_program(program_id, inputs, options \\ [])
  def configure_lm(config)
end

# Pool adapter for concurrent operations
defmodule DSPex.Adapters.PythonPoolV2 do
  @behaviour DSPex.Adapters.Adapter
  # Same interface, but uses NimblePool for concurrency
end

Key Components:

  • Implements DSPex.Adapters.Adapter behaviour
  • Provides DSPy-specific operations (create_program, execute_program, configure_lm)
  • Handles configuration and error translation

2. Bridge Layer (Communication)

The bridge layer manages Python subprocess communication:

defmodule DSPex.PythonBridge.Bridge do
  use GenServer
  
  # Core functions
  def call(bridge, command, args, timeout \\ 30_000)
  def cast(bridge, command, args)
  
  # Lifecycle management
  def start_link(options)
  def stop(bridge)
end

Key Features:

  • GenServer-based process management
  • Port-based IPC with Python subprocess
  • Health monitoring and statistics
  • Automatic restart on failure

3. Protocol Layer

The protocol defines message format and serialization:

defmodule DSPex.PythonBridge.Protocol do
  # Wire format: [4-byte length header] + [JSON payload]
  
  def encode_request(id, command, args)
  def decode_response(data)
  def frame_message(payload)
  def unframe_message(data)
end

Message Format:

// Request
{
  "id": 123,
  "command": "create_program",
  "args": {
    "signature": {...},
    "options": {...}
  },
  "timestamp": "2024-01-01T00:00:00Z"
}

// Response
{
  "id": 123,
  "success": true,
  "result": {...},
  "timestamp": "2024-01-01T00:00:00Z"
}

4. Session Management

For pooled operations, session management provides stateful interactions:

defmodule DSPex.PythonBridge.SessionPoolV2 do
  # Session-aware execution
  def execute_in_session(session_id, command, args, options)
  
  # Anonymous execution (no session)
  def execute_anonymous(command, args, options)
  
  # Worker management via NimblePool
end

Features:

  • Session affinity for stateful operations
  • ETS-based session tracking
  • Automatic session cleanup
  • Worker health monitoring

5. Python Bridge Implementation

The Python side handles command execution:

class DSPyBridge:
    def __init__(self, mode="standalone", worker_id=None):
        self.mode = mode
        self.worker_id = worker_id
        self.programs = {}
        self.stats = {...}
        
    def handle_request(self, request):
        command = request['command']
        args = request.get('args', {})
        
        handlers = {
            'ping': self.ping,
            'configure_lm': self.configure_lm,
            'create_program': self.create_program,
            'execute_program': self.execute_program,
            # ... more handlers
        }
        
        if command in handlers:
            return handlers[command](args)

DSPy-Specific Components

1. Signature Handling

DSPy signatures are dynamically created from Elixir definitions:

def _create_signature_class(self, signature_config):
    """Creates a DSPy signature class from configuration"""
    class_name = signature_config['name']
    fields = {}
    
    # Create InputField/OutputField instances
    for field_name, field_config in signature_config['inputs'].items():
        fields[field_name] = dspy.InputField(
            desc=field_config.get('description', '')
        )

2. Program Management

DSPy programs are created and stored:

def create_program(self, args):
    signature = self._create_signature_class(args['signature'])
    program = dspy.Predict(signature)
    program_id = str(uuid.uuid4())
    self.programs[program_id] = program
    return {"program_id": program_id}

3. Language Model Configuration

DSPy-specific LM setup:

def configure_lm(self, args):
    lm_type = args.get('type', 'gemini')
    if lm_type == 'gemini':
        lm = dspy.Google(
            model=args.get('model', 'gemini-1.5-flash'),
            api_key=args.get('api_key')
        )
        dspy.settings.configure(lm=lm)

Generic Components

1. Port Communication

The Port-based IPC is completely generic:

  • Uses Erlang ports for subprocess management
  • 4-byte length-prefixed message framing
  • JSON for serialization (could use MessagePack, etc.)
  • Bidirectional async communication

2. Error Handling

Comprehensive error handling that’s framework-agnostic:

defmodule DSPex.PythonBridge.PoolErrorHandler do
  # Error categories
  @error_categories %{
    initialization: %{severity: :critical, recovery: :immediate_retry},
    timeout: %{severity: :high, recovery: :backoff_retry},
    communication: %{severity: :medium, recovery: :circuit_break},
    # ... more categories
  }
end

3. Pool Management

NimblePool integration is generic:

  • Worker lifecycle management
  • Connection pooling
  • Overflow handling
  • Health monitoring

4. Monitoring and Telemetry

Framework-agnostic observability:

  • Request/response metrics
  • Error rates and categories
  • Performance statistics
  • Worker health status

Coupling Analysis

Tight Coupling Points

  1. Adapter Interface: Methods like create_program, execute_program are DSPy-specific
  2. Python Handlers: Direct DSPy API usage in command handlers
  3. Signature Types: Elixir signature DSL maps to DSPy signatures
  4. Configuration: LM configuration assumes DSPy’s model setup

Loose Coupling Points

  1. Communication Protocol: JSON-based, command-driven
  2. Process Management: Generic Port handling
  3. Pool Infrastructure: NimblePool integration
  4. Error Handling: Category-based, framework-agnostic
  5. Session Management: Generic session tracking

Reusability Assessment

Highly Reusable (90%+ generic)

  • Port communication
  • Protocol handling
  • Pool management
  • Error handling and recovery
  • Session management
  • Monitoring infrastructure

Needs Abstraction (DSPy-specific)

  • Adapter interface methods
  • Python command handlers
  • Signature creation logic
  • LM configuration

Estimated Reuse Potential

  • Infrastructure: 85% reusable as-is
  • Communication: 95% reusable
  • Business Logic: 20% reusable (DSPy-specific)
  • Overall: 70% of codebase is generic

Key Insights

  1. Well-Structured Separation: The architecture already separates concerns well
  2. Protocol Flexibility: Command-based protocol makes adding new operations easy
  3. Infrastructure Maturity: Production-ready pooling, error handling, and monitoring
  4. Clear Extension Points: Command handlers and adapter interface are the main customization points
  5. Minimal Refactoring Needed: Most components can be reused with minor modifications