Performance Considerations
Overview
This document analyzes the performance implications of generalizing the Python adapter and provides optimization strategies to maintain or improve current performance levels.
Current Performance Baseline
Based on the DSPex performance optimizations completed on 2025-07-15:
Key Metrics
- Pool Creation: ~2 seconds for multiple workers (parallel)
- Request Latency: < 10ms overhead per operation
- CircuitBreaker Tests: 0.1 seconds for 26 tests (1200x improvement)
- Worker Initialization: Parallel, using Task.async
- Memory Usage: Stable with pool size
Performance Characteristics
- Zero Artificial Delays: All Process.sleep calls removed
- Right-Sized Timeouts: 10 seconds for pool operations
- Parallel Processing: Worker creation happens concurrently
- Event-Driven: No polling or busy-waiting
Performance Impact Analysis
1. Abstraction Overhead
Python Side
# Current: Direct method calls
def handle_request(self, request):
    command, args = request["command"], request.get("args", {})
    if command == "create_program":
        return self.create_program(args)

# Generalized: Dynamic dispatch via a handler table
def handle_request(self, request):
    command, args = request["command"], request.get("args", {})
    if command in self._handlers:
        return self._handlers[command](args)
Impact: Negligible (~1-2μs per request)
Mitigation: Use a dict lookup instead of if/elif chains
Elixir Side
# Current: Direct module calls
DSPex.Adapters.PythonPoolV2.create_program(signature)
# Generalized: Dynamic dispatch
{:ok, adapter} = MLBridge.get_adapter(:dspy)
adapter.create_program(signature)
Impact: One-time adapter lookup cost (~10μs)
Mitigation: Cache adapter references (see the sketch below)
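A minimal caching sketch, assuming the adapter reference stays valid for the life of the calling process (the :ml_adapter key and MyApp.Predictor module are illustrative):

defmodule MyApp.Predictor do
  # Resolve the adapter once per process; subsequent calls skip the lookup.
  defp adapter do
    case Process.get(:ml_adapter) do
      nil ->
        {:ok, adapter} = DSPex.MLBridge.get_adapter(:dspy)
        Process.put(:ml_adapter, adapter)
        adapter

      cached ->
        cached
    end
  end

  def create_program(signature), do: adapter().create_program(signature)
end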
2. Memory Overhead
Per-Framework Memory
Current DSPy Bridge: ~50MB base + programs
Generalized:
- Base Bridge: ~30MB (shared infrastructure)
- DSPy Extension: ~20MB (framework-specific)
- Total: ~50MB (same as current)
Multi-Framework Scenarios
Single Framework: No change
Two Frameworks: +20-30MB per additional framework
N Frameworks: Base + (N × Framework overhead)
Mitigation: Lazy loading of framework-specific code
3. Startup Performance
Current Startup
1. Start Elixir adapter
2. Launch Python process
3. Import DSPy
4. Initialize bridge
Total: ~2-3 seconds
Generalized Startup
1. Start Elixir adapter (same)
2. Launch Python process (same)
3. Import base bridge (~100ms faster)
4. Lazy-load framework on first use
Total: ~1.9-2.9 seconds (slightly faster)
Optimization Strategies
1. Lazy Framework Loading
class BaseBridge:
    def __init__(self):
        self._framework_loaded = False
        self._framework = None
        self._handlers = {}  # populated by framework-specific subclasses

    def _ensure_framework(self):
        if not self._framework_loaded:
            self._load_framework()
            self._framework_loaded = True

    def handle_command(self, command, args):
        # Only load the framework when a framework-specific command arrives
        if command not in ('ping', 'get_stats'):
            self._ensure_framework()
        # Dispatch through the handler table shown earlier
        return self._handlers[command](args)
2. Command Batching
defmodule DSPex.MLBridge do
  @doc """
  Execute multiple commands in a single round trip
  """
  def batch_execute(framework, commands) do
    with {:ok, adapter} <- get_adapter(framework) do
      adapter.call_bridge("batch", %{commands: commands})
    end
  end
end
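A hypothetical call batching two DSPy commands into one round trip (create_program matches the handler used earlier in this document; execute_program and the payload shape are assumptions):

# Two commands sent over the port in a single message
{:ok, results} =
  DSPex.MLBridge.batch_execute(:dspy, [
    %{command: "create_program", args: %{signature: signature}},
    %{command: "execute_program", args: %{program_id: program_id, inputs: inputs}}
  ])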
3. Adapter Pooling
defmodule DSPex.MLBridge.AdapterPool do
  @moduledoc """
  Pools adapter instances for different frameworks
  """
  def get_or_create_adapter(framework) do
    case :ets.lookup(:adapter_pool, framework) do
      [{^framework, adapter}] ->
        {:ok, adapter}

      [] ->
        with {:ok, adapter} <- create_adapter(framework) do
          :ets.insert(:adapter_pool, {framework, adapter})
          {:ok, adapter}
        end
    end
  end
end
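The :adapter_pool table has to exist before the first lookup; one option is to create it during application start (a sketch, assuming the table lives for the life of the VM):

# Create the shared table once, e.g. from Application.start/2.
# :read_concurrency helps because lookups vastly outnumber inserts.
:ets.new(:adapter_pool, [:named_table, :set, :public, read_concurrency: true])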
4. Optimized Protocol
# Add binary protocol option for performance-critical paths
import json

class BaseBridge:
    def __init__(self, protocol='json'):
        self.protocol = protocol
        if protocol == 'msgpack':
            import msgpack
            self.encode = msgpack.packb
            self.decode = msgpack.unpackb
        else:
            self.encode = json.dumps
            self.decode = json.loads
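The Elixir side must agree on the wire format; a matching sketch of encoder selection, assuming the Msgpax and Jason libraries are available:

defmodule DSPex.PythonBridge.Codec do
  # Return the {encode, decode} pair that matches the Python bridge's protocol.
  def for_protocol(:msgpack), do: {&Msgpax.pack!/1, &Msgpax.unpack!/1}
  def for_protocol(:json), do: {&Jason.encode!/1, &Jason.decode!/1}
end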
5. Connection Reuse
defmodule DSPex.PythonBridge.ConnectionPool do
  @moduledoc """
  Reuse Python process connections across frameworks
  """
  def get_connection(framework) do
    # Reuse existing Python process if compatible
    case find_compatible_connection(framework) do
      {:ok, conn} -> {:ok, conn}
      :not_found -> create_new_connection(framework)
    end
  end
end
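What counts as a compatible connection is framework-specific; a minimal sketch that only reuses a Python process when it already has the framework loaded and is still alive (the :bridge_connections table and stored tuple shape are assumptions):

# Inside DSPex.PythonBridge.ConnectionPool
defp find_compatible_connection(framework) do
  case :ets.lookup(:bridge_connections, framework) do
    [{^framework, pid}] ->
      # Reuse the existing process only if it is still running
      if Process.alive?(pid), do: {:ok, pid}, else: :not_found

    [] ->
      :not_found
  end
end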
Framework-Specific Optimizations
DSPy Optimizations
class DSPyBridge(BaseBridge):
    def __init__(self):
        super().__init__()
        # Pre-compile common signatures
        self._signature_cache = {}

    def create_signature(self, config):
        cache_key = hash(str(config))
        if cache_key in self._signature_cache:
            return self._signature_cache[cache_key]
        # Create and cache (assumes a framework-specific _build_signature helper)
        signature = self._build_signature(config)
        self._signature_cache[cache_key] = signature
        return signature
LangChain Optimizations
class LangChainBridge(BaseBridge):
    def __init__(self):
        super().__init__()
        # Reuse LLM connections
        self._llm_pool = {}

    def get_or_create_llm(self, config):
        key = (config['type'], config['model'])
        if key not in self._llm_pool:
            self._llm_pool[key] = self._create_llm(config)
        return self._llm_pool[key]
Transformers Optimizations
from cachetools import LRUCache  # third-party LRU cache (pip install cachetools)

class TransformersBridge(BaseBridge):
    def __init__(self):
        super().__init__()
        # Model caching with LRU eviction
        self._model_cache = LRUCache(maxsize=3)

    def load_model(self, model_name):
        if model_name in self._model_cache:
            return self._model_cache[model_name]
        # Load, cache, and track memory (assumes a _load_pretrained helper)
        model = self._load_pretrained(model_name)
        self._model_cache[model_name] = model
        return model
Benchmarking Strategy
1. Micro-benchmarks
defmodule BridgeBenchmark do
  use Benchfella

  @signature %{name: "Test", inputs: %{}, outputs: %{}}

  bench "current adapter" do
    DSPex.Adapters.PythonPoolV2.create_program(@signature)
  end

  bench "generalized adapter" do
    {:ok, adapter} = DSPex.MLBridge.get_adapter(:dspy)
    adapter.create_program(@signature)
  end

  bench "generalized with cache" do
    adapter =
      Process.get(:cached_adapter) ||
        case DSPex.MLBridge.get_adapter(:dspy) do
          {:ok, a} ->
            Process.put(:cached_adapter, a)
            a
        end

    adapter.create_program(@signature)
  end
end
2. End-to-End Performance Tests
defmodule E2EPerformanceTest do
  use ExUnit.Case

  test "multi-framework performance" do
    # Measure framework switching overhead
    frameworks = [:dspy, :langchain, :custom]

    results =
      Enum.map(frameworks, fn framework ->
        {time, _} =
          :timer.tc(fn ->
            {:ok, _adapter} = MLBridge.get_adapter(framework)
            # Perform operations
          end)

        {framework, time}
      end)

    # Assert reasonable switching time
    Enum.each(results, fn {_, time} ->
      assert time < 100_000 # 100ms max (:timer.tc reports microseconds)
    end)
  end
end
3. Load Testing
defmodule LoadTest do
  def run_load_test(framework, concurrent_users, duration) do
    MLBridge.ensure_started(framework)

    tasks =
      for _ <- 1..concurrent_users do
        Task.async(fn ->
          run_user_simulation(framework, duration)
        end)
      end

    results = Task.await_many(tasks, duration + 5000)
    analyze_results(results)
  end
end
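run_user_simulation/2 is not shown above; a minimal sketch, intended to sit inside LoadTest, that issues ping commands until the duration elapses and collects per-call latencies (uses the get_adapter/call_bridge API from earlier sections; the ping command follows the base-bridge example above):

# Private helpers inside LoadTest
defp run_user_simulation(framework, duration) do
  deadline = System.monotonic_time(:millisecond) + duration
  {:ok, adapter} = MLBridge.get_adapter(framework)
  collect_timings(adapter, deadline, [])
end

defp collect_timings(adapter, deadline, timings) do
  if System.monotonic_time(:millisecond) >= deadline do
    # Per-request latencies in microseconds
    timings
  else
    {time, _result} = :timer.tc(fn -> adapter.call_bridge("ping", %{}) end)
    collect_timings(adapter, deadline, [time | timings])
  end
end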
Memory Management
1. Framework Lifecycle
class BaseBridge:
    def cleanup_framework(self):
        """Called when switching frameworks or on shutdown"""
        if hasattr(self, '_framework'):
            # Framework-specific cleanup
            self._cleanup_framework_resources()
            # Clear references
            self._framework = None
        # Force garbage collection
        import gc
        gc.collect()
2. Resource Pooling
defmodule DSPex.MLBridge.ResourceManager do
  @moduledoc """
  Manages resources across frameworks
  """

  def configure_limits(framework, limits) do
    # Set per-framework resource limits (max_memory in bytes)
    %{
      max_memory: limits[:max_memory] || 2_147_483_648, # 2 GB
      max_models: limits[:max_models] || 10,
      max_connections: limits[:max_connections] || 4
    }
  end

  def enforce_limits(framework) do
    # Monitor and enforce resource usage
    current = get_resource_usage(framework)
    limits = get_limits(framework)

    if current.memory > limits.max_memory do
      cleanup_oldest_resources(framework)
    end
  end
end
3. Shared Resource Optimization
# Share common resources across frameworks
class ResourcePool:
_instance = None
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
cls._instance.initialize()
return cls._instance
def initialize(self):
self.tokenizers = {} # Shared across frameworks
self.embeddings = {} # Cached embeddings
self.connections = {} # HTTP connection pooling
Production Deployment
1. Monitoring Metrics
defmodule DSPex.MLBridge.Metrics do
  def track_performance do
    :telemetry.attach_many(
      "ml-bridge-performance",
      [
        [:ml_bridge, :adapter, :get],
        [:ml_bridge, :command, :execute],
        [:ml_bridge, :framework, :switch]
      ],
      &handle_event/4,
      nil
    )
  end

  defp handle_event(event, measurements, metadata, _) do
    # Track framework-specific metrics (event is a list of atoms, so join it)
    StatsD.histogram(
      "ml_bridge.#{metadata.framework}.#{Enum.join(event, ".")}",
      measurements.duration
    )
  end
end
2. Performance Alerts
# prometheus_rules.yml
groups:
  - name: ml_bridge_performance
    rules:
      - alert: HighLatency
        expr: ml_bridge_request_duration_p99 > 100  # p99 latency in milliseconds
        for: 5m
        annotations:
          summary: "ML Bridge high latency"
      - alert: MemoryLeak
        expr: increase(ml_bridge_memory_usage[5m]) > 10485760  # 10MB growth over 5 minutes
        for: 10m
        annotations:
          summary: "Possible memory leak in ML Bridge"
3. Capacity Planning
defmodule DSPex.MLBridge.Capacity do
  @doc """
  Calculate required resources for multi-framework deployment
  """
  def calculate_requirements(frameworks, expected_load) do
    base_memory = 100 # MB for base system

    framework_memory =
      Enum.sum(Enum.map(frameworks, &framework_memory_requirement/1))

    pool_memory = expected_load.concurrent_requests * 10 # MB per worker

    %{
      total_memory: base_memory + framework_memory + pool_memory,
      recommended_workers: calculate_workers(expected_load),
      cpu_cores: calculate_cpu_requirement(frameworks, expected_load)
    }
  end
end
Performance Best Practices
1. Framework Selection
- Use single framework when possible
- Load frameworks based on actual usage patterns
- Consider framework-specific deployment for heavy users
2. Caching Strategy
- Cache adapter references in process dictionary
- Use ETS for cross-process adapter sharing
- Implement a TTL for cached resources (see the sketch after this list)
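A sketch combining the ETS and TTL points, assuming entries are stamped with an expiry in milliseconds and the table is created at application start:

defmodule DSPex.MLBridge.AdapterCache do
  # Assumes an :adapter_cache ETS table created during application start.
  @ttl :timer.minutes(10)

  def put(framework, adapter) do
    :ets.insert(:adapter_cache, {framework, adapter, now_ms() + @ttl})
  end

  def get(framework) do
    case :ets.lookup(:adapter_cache, framework) do
      [{^framework, adapter, expires_at}] ->
        if now_ms() < expires_at, do: {:ok, adapter}, else: :expired

      [] ->
        :miss
    end
  end

  defp now_ms, do: System.monotonic_time(:millisecond)
end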
3. Connection Management
- Reuse Python processes across compatible frameworks
- Implement connection pooling at framework level
- Monitor connection health proactively
4. Resource Cleanup
- Implement aggressive garbage collection
- Clear unused framework resources (see the periodic cleanup sketch after this list)
- Monitor memory usage per framework
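A periodic cleanup sketch; the "cleanup" command name is an assumption and would map to cleanup_framework on the Python side:

defmodule DSPex.MLBridge.Janitor do
  use GenServer

  @interval :timer.minutes(5)

  def start_link(framework), do: GenServer.start_link(__MODULE__, framework)

  def init(framework) do
    schedule()
    {:ok, framework}
  end

  def handle_info(:cleanup, framework) do
    # Ask the Python bridge to drop unused framework resources and run gc.collect()
    {:ok, adapter} = DSPex.MLBridge.get_adapter(framework)
    adapter.call_bridge("cleanup", %{})
    schedule()
    {:noreply, framework}
  end

  defp schedule, do: Process.send_after(self(), :cleanup, @interval)
end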
5. Deployment Options
# Option 1: Single node, multiple frameworks
config :dspex, :deployment_mode, :single_node
# Option 2: Framework-specific nodes
config :dspex, :deployment_mode, :multi_node
config :dspex, :node_mapping, %{
  dspy: :"dspy@node1",
  langchain: :"langchain@node2"
}
# Option 3: Hybrid with routing
config :dspex, :deployment_mode, :hybrid
config :dspex, :routing_strategy, :least_loaded
Conclusion
The generalized architecture can maintain or improve current performance through:
- Lazy loading, which reduces startup time
- Caching, which eliminates redundant lookups
- Resource sharing, which reduces memory overhead
- Parallel initialization, which keeps pool creation fast
With proper optimization, the multi-framework support adds minimal overhead (< 5%) while providing significant architectural benefits.