Current Architecture Analysis
Overview
The DSPex Python adapter bridges Elixir and Python for ML operations and is currently focused on DSPy. This analysis examines the existing architecture to identify which components can be reused as-is and which need to be generalized.
Architecture Layers
1. Adapter Layer (Elixir)
The adapter layer provides the high-level interface for Elixir applications:
# Direct adapter for single Python process
# (function heads only; bodies omitted)
defmodule DSPex.Adapters.PythonPort do
  @behaviour DSPex.Adapters.Adapter

  def create_program(signature, options \\ [])
  def execute_program(program_id, inputs, options \\ [])
  def configure_lm(config)
end

# Pool adapter for concurrent operations
defmodule DSPex.Adapters.PythonPoolV2 do
  @behaviour DSPex.Adapters.Adapter
  # Same interface, but uses NimblePool for concurrency
end
Key Components:
- Implements the DSPex.Adapters.Adapter behaviour
- Provides DSPy-specific operations (create_program, execute_program, configure_lm)
- Handles configuration and error translation
2. Bridge Layer (Communication)
The bridge layer manages Python subprocess communication:
defmodule DSPex.PythonBridge.Bridge do
  use GenServer

  # Core functions
  def call(bridge, command, args, timeout \\ 30_000)
  def cast(bridge, command, args)

  # Lifecycle management
  def start_link(options)
  def stop(bridge)
end
Key Features:
- GenServer-based process management
- Port-based IPC with Python subprocess
- Health monitoring and statistics
- Automatic restart on failure
3. Protocol Layer
The protocol defines message format and serialization:
defmodule DSPex.PythonBridge.Protocol do
  # Wire format: [4-byte length header] + [JSON payload]
  def encode_request(id, command, args)
  def decode_response(data)
  def frame_message(payload)
  def unframe_message(data)
end
Message Format:
// Request
{
  "id": 123,
  "command": "create_program",
  "args": {
    "signature": {...},
    "options": {...}
  },
  "timestamp": "2024-01-01T00:00:00Z"
}

// Response
{
  "id": 123,
  "success": true,
  "result": {...},
  "timestamp": "2024-01-01T00:00:00Z"
}
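The wire format described above (a 4-byte length header followed by a JSON payload) can be sketched in Python as follows. This is a minimal illustration, not the DSPex implementation: the big-endian byte order is an assumption (it matches what Erlang ports use with the packet option, but the source does not state which byte order the bridge uses), and the `timestamp` field is left out for brevity.

```python
import json
import struct

# Assumption: the length header is an unsigned 32-bit big-endian integer.
HEADER = struct.Struct(">I")


def frame_message(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte header."""
    return HEADER.pack(len(payload)) + payload


def unframe_message(data: bytes) -> tuple[bytes, bytes]:
    """Split one framed message off the front of `data`.

    Returns (payload, remainder). Raises ValueError if `data` does not
    yet contain a complete message.
    """
    if len(data) < HEADER.size:
        raise ValueError("incomplete header")
    (length,) = HEADER.unpack_from(data)
    end = HEADER.size + length
    if len(data) < end:
        raise ValueError("incomplete payload")
    return data[HEADER.size:end], data[end:]


def encode_request(request_id: int, command: str, args: dict) -> bytes:
    """Serialize a request to JSON and frame it for the wire."""
    body = {"id": request_id, "command": command, "args": args}
    return frame_message(json.dumps(body).encode("utf-8"))
```

Returning the remainder from `unframe_message` lets a caller keep a receive buffer and peel off messages one at a time as more bytes arrive.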
4. Session Management
For pooled operations, session management provides stateful interactions:
defmodule DSPex.PythonBridge.SessionPoolV2 do
  # Session-aware execution
  def execute_in_session(session_id, command, args, options)

  # Anonymous execution (no session)
  def execute_anonymous(command, args, options)

  # Worker management via NimblePool
end
Features:
- Session affinity for stateful operations
- ETS-based session tracking
- Automatic session cleanup
- Worker health monitoring
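To make session affinity and automatic cleanup concrete, here is a small Python sketch of the idea. It is only an in-memory analogue: `SessionTable`, the round-robin assignment, and the TTL-based expiry are all hypothetical, and the source does not say how SessionPoolV2 actually binds sessions to workers or when it expires them.

```python
import time


class SessionTable:
    """Hypothetical in-memory analogue of an ETS-backed session table."""

    def __init__(self, workers, ttl_seconds=3600.0, clock=time.monotonic):
        self.workers = list(workers)
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.sessions = {}          # session_id -> (worker, last_used)
        self._next = 0              # round-robin cursor for new sessions

    def worker_for(self, session_id):
        """Return the worker bound to this session, binding one if needed."""
        entry = self.sessions.get(session_id)
        if entry is None:
            worker = self.workers[self._next % len(self.workers)]
            self._next += 1
        else:
            worker = entry[0]
        # Refresh the last-used time on every access.
        self.sessions[session_id] = (worker, self.clock())
        return worker

    def cleanup(self):
        """Drop sessions idle for longer than the TTL; return their ids."""
        now = self.clock()
        expired = [sid for sid, (_, last) in self.sessions.items()
                   if now - last > self.ttl]
        for sid in expired:
            del self.sessions[sid]
        return expired
```

Repeated calls to `worker_for` with the same session id return the same worker until the session is cleaned up, which is the property stateful operations rely on.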
5. Python Bridge Implementation
The Python side handles command execution:
class DSPyBridge:
    def __init__(self, mode="standalone", worker_id=None):
        self.mode = mode
        self.worker_id = worker_id
        self.programs = {}
        self.stats = {...}

    def handle_request(self, request):
        command = request['command']
        args = request.get('args', {})
        handlers = {
            'ping': self.ping,
            'configure_lm': self.configure_lm,
            'create_program': self.create_program,
            'execute_program': self.execute_program,
            # ... more handlers
        }
        if command in handlers:
            return handlers[command](args)
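The excerpt above does not show what happens for an unknown command or a failing handler. The sketch below illustrates one way a dispatcher could wrap results in the response envelope from the Protocol Layer section. `CommandBridge` is a hypothetical stand-in for `DSPyBridge`, and the `error` field name is an assumption, since the source only shows the success response.

```python
from datetime import datetime, timezone


def _timestamp() -> str:
    """Current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


class CommandBridge:
    """Hypothetical stand-in for DSPyBridge, reduced to the dispatch step."""

    def __init__(self):
        self.handlers = {"ping": self.ping}

    def ping(self, args):
        return {"status": "ok"}

    def handle_request(self, request):
        request_id = request.get("id")
        command = request.get("command")
        handler = self.handlers.get(command)
        if handler is None:
            return self._error(request_id, f"Unknown command: {command}")
        try:
            result = handler(request.get("args", {}))
        except Exception as exc:
            # Report handler failures to the caller instead of crashing.
            return self._error(request_id, str(exc))
        return {"id": request_id, "success": True,
                "result": result, "timestamp": _timestamp()}

    def _error(self, request_id, message):
        # Assumption: errors are reported under an "error" key.
        return {"id": request_id, "success": False,
                "error": message, "timestamp": _timestamp()}
```

Echoing the request `id` in every response, including errors, lets the Elixir side match replies to pending calls.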
DSPy-Specific Components
1. Signature Handling
DSPy signatures are dynamically created from Elixir definitions:
def _create_signature_class(self, signature_config):
    """Creates a DSPy signature class from configuration"""
    class_name = signature_config['name']
    fields = {}
    # Create InputField/OutputField instances
    for field_name, field_config in signature_config['inputs'].items():
        fields[field_name] = dspy.InputField(
            desc=field_config.get('description', '')
        )
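The excerpt stops before the class is built. As a rough illustration of the general technique, the sketch below assembles a class at runtime with Python's built-in `type()`. To stay runnable without DSPy installed it uses stand-in `InputField`, `OutputField`, and `Signature` classes, and the `outputs` key is an assumption. Whether the real `dspy.Signature` can be constructed this way is not verified here.

```python
from dataclasses import dataclass


@dataclass
class InputField:
    """Stand-in for dspy.InputField (illustration only)."""
    desc: str = ""


@dataclass
class OutputField:
    """Stand-in for dspy.OutputField (illustration only)."""
    desc: str = ""


class Signature:
    """Stand-in for dspy.Signature (illustration only)."""


def create_signature_class(signature_config):
    """Build a Signature subclass from a configuration dict."""
    fields = {}
    for name, cfg in signature_config.get("inputs", {}).items():
        fields[name] = InputField(desc=cfg.get("description", ""))
    # Assumption: outputs use the same shape as inputs under an "outputs" key.
    for name, cfg in signature_config.get("outputs", {}).items():
        fields[name] = OutputField(desc=cfg.get("description", ""))
    # type(name, bases, namespace) creates the class dynamically.
    return type(signature_config["name"], (Signature,), fields)
```

For example, a config with `name` set to `"QA"`, one input `question`, and one output `answer` yields a class named `QA` whose attributes are the corresponding field instances.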
2. Program Management
DSPy programs are created and stored:
# Requires `import uuid` and `import dspy` at module level.
def create_program(self, args):
    signature = self._create_signature_class(args['signature'])
    program = dspy.Predict(signature)
    program_id = str(uuid.uuid4())  # unique key into self.programs
    self.programs[program_id] = program
    return {"program_id": program_id}
3. Language Model Configuration
DSPy-specific LM setup:
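def configure_lm(self, args):
    lm_type = args.get('type', 'gemini')
    if lm_type == 'gemini':
        lm = dspy.Google(
            model=args.get('model', 'gemini-1.5-flash'),
            api_key=args.get('api_key')
        )
    else:
        # Only the Gemini branch is shown; fail clearly rather than hit an undefined `lm`.
        raise ValueError(f"Unsupported LM type: {lm_type}")
    dspy.settings.configure(lm=lm)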
Generic Components
1. Port Communication
The Port-based IPC is completely generic:
- Uses Erlang ports for subprocess management
- 4-byte length-prefixed message framing
- JSON for serialization (could use MessagePack, etc.)
- Bidirectional async communication
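On the Python side, length-prefixed framing means reading from a byte stream until a full message has arrived. The sketch below shows one way to do that. The function names are hypothetical and the big-endian header is an assumption; the source does not show the bridge's actual read loop.

```python
import io
import struct
from typing import BinaryIO, Iterator


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, or fewer only if the stream ends first."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed payload until the stream closes."""
    while True:
        header = read_exact(stream, 4)
        if not header:
            return  # clean end of stream between messages
        if len(header) < 4:
            raise EOFError("stream ended inside a length header")
        # Assumption: 4-byte unsigned big-endian length.
        (length,) = struct.unpack(">I", header)
        payload = read_exact(stream, length)
        if len(payload) < length:
            raise EOFError("stream ended inside a payload")
        yield payload
```

In a real worker the stream would typically be the binary standard input (for example `sys.stdin.buffer`), and `io.BytesIO` can stand in for it in tests.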
2. Error Handling
Comprehensive error handling that’s framework-agnostic:
defmodule DSPex.PythonBridge.PoolErrorHandler do
  # Error categories
  @error_categories %{
    initialization: %{severity: :critical, recovery: :immediate_retry},
    timeout: %{severity: :high, recovery: :backoff_retry},
    communication: %{severity: :medium, recovery: :circuit_break},
    # ... more categories
  }
end
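To show how a category table like this can drive recovery decisions, the sketch below mirrors the three listed categories in Python and maps each recovery strategy to a retry delay. The exponential backoff formula and the `base_ms` and `max_ms` values are assumptions for illustration; the source does not specify how PoolErrorHandler computes delays.

```python
# Mirrors the three categories shown above; the original lists more.
ERROR_CATEGORIES = {
    "initialization": {"severity": "critical", "recovery": "immediate_retry"},
    "timeout": {"severity": "high", "recovery": "backoff_retry"},
    "communication": {"severity": "medium", "recovery": "circuit_break"},
}


def retry_delay_ms(category, attempt, base_ms=100, max_ms=5000):
    """Return the delay before retry number `attempt` (1-based).

    Returns None when the category's strategy is to stop retrying.
    """
    recovery = ERROR_CATEGORIES[category]["recovery"]
    if recovery == "immediate_retry":
        return 0
    if recovery == "backoff_retry":
        # Assumption: delay doubles per attempt, capped at max_ms.
        return min(base_ms * 2 ** (attempt - 1), max_ms)
    return None  # circuit_break: no retry, let the breaker open
```

Keeping the policy in a data table rather than in branching code is what makes this part framework-agnostic: a new category only needs a new entry.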
3. Pool Management
NimblePool integration is generic:
- Worker lifecycle management
- Connection pooling
- Overflow handling
- Health monitoring
4. Monitoring and Telemetry
Framework-agnostic observability:
- Request/response metrics
- Error rates and categories
- Performance statistics
- Worker health status
Coupling Analysis
Tight Coupling Points
- Adapter Interface: Methods like create_program and execute_program are DSPy-specific
- Python Handlers: Direct DSPy API usage in command handlers
- Signature Types: Elixir signature DSL maps to DSPy signatures
- Configuration: LM configuration assumes DSPy’s model setup
Loose Coupling Points
- Communication Protocol: JSON-based, command-driven
- Process Management: Generic Port handling
- Pool Infrastructure: NimblePool integration
- Error Handling: Category-based, framework-agnostic
- Session Management: Generic session tracking
Reusability Assessment
Highly Reusable (90%+ generic)
- Port communication
- Protocol handling
- Pool management
- Error handling and recovery
- Session management
- Monitoring infrastructure
Needs Abstraction (DSPy-specific)
- Adapter interface methods
- Python command handlers
- Signature creation logic
- LM configuration
Estimated Reuse Potential
- Infrastructure: 85% reusable as-is
- Communication: 95% reusable
- Business Logic: 20% reusable (DSPy-specific)
- Overall: 70% of codebase is generic
Key Insights
- Well-Structured Separation: The architecture already separates concerns well
- Protocol Flexibility: Command-based protocol makes adding new operations easy
- Infrastructure Maturity: Production-ready pooling, error handling, and monitoring
- Clear Extension Points: Command handlers and adapter interface are the main customization points
- Minimal Refactoring Needed: Most components can be reused with minor modifications