← Back to Docs

BRIDGE ISOLATION ANALYSIS

Documentation for BRIDGE_ISOLATION_ANALYSIS from the Dspex repository.

Bridge Isolation and State Management Analysis

The Real Problem: Global State and Lack of Isolation

The “Program already exists” error reveals fundamental architectural issues with the current Python bridge design that go beyond simple test failures.

Root Issues

Issue #1: Global State Management

Current Behavior: The Python bridge maintains a global program registry that persists across all operations:

# In dspy_bridge.py - GLOBAL STATE
self.programs = {}  # All programs stored here globally

Problems:

  • βœ… Program creation succeeds first time
  • ❌ Subsequent attempts with same ID fail
  • ❌ No way to reset/clear state
  • ❌ Tests interfere with each other
  • ❌ No isolation between different clients

Issue #2: Missing Concurrent Execution Support

Current Limitation: One Python bridge process serves all requests globally:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Test A      │───▢│                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€    β”‚  Single Python   β”‚
β”‚ Test B      │───▢│  Bridge Process  β”‚ 
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€    β”‚  (Global State)  β”‚
β”‚ Test C      │───▢│                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Problems:

  • ❌ Tests share the same execution space
  • ❌ Program ID conflicts inevitable
  • ❌ No session isolation
  • ❌ Cannot run parallel DSPy workloads safely

What Should Be Happening

Architecture Option 1: Session-Based Isolation

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Session A   │───▢│ Bridge Process   β”‚
β”‚ Programs:   β”‚    β”‚ Namespace: A     β”‚
β”‚ - prog_1    β”‚    β”‚ Programs: {A}    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  
β”‚ Session B   │───▢│ Same Process     β”‚
β”‚ Programs:   β”‚    β”‚ Namespace: B     β”‚
β”‚ - prog_1    β”‚    β”‚ Programs: {B}    β”‚ ← Same ID, different namespace
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Architecture Option 2: Process-Per-Session Isolation

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Session A   │───▢│ Python Process A β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚ Isolated State   β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Session B   │───▢│ Python Process B β”‚ ← Complete isolation
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚ Isolated State   β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Immediate Solutions

Solution 1: Add Reset/Clear State Command

Quick Fix: Add bridge commands to reset state:

def reset_programs(self, args):
    """Reset all program state - useful for testing"""
    self.programs.clear()
    return {"status": "reset", "cleared_programs": len(self.programs)}

def clear_program(self, args):
    """Remove specific program"""
    program_id = args.get("program_id")
    if program_id in self.programs:
        del self.programs[program_id]
        return {"status": "cleared", "program_id": program_id}
    return {"status": "not_found", "program_id": program_id}

Pros: βœ… Immediate fix for test isolation Cons: ❌ Still no concurrent session support

Solution 2: Session-Based Namespacing

Better Fix: Add session/namespace support:

class DSPyBridge:
    def __init__(self):
        self.sessions = {}  # session_id -> {programs: {}, config: {}}
        self.default_session = "default"
    
    def create_program(self, args):
        session_id = args.get("session_id", self.default_session)
        if session_id not in self.sessions:
            self.sessions[session_id] = {"programs": {}, "config": {}}
        
        programs = self.sessions[session_id]["programs"]
        # ... rest of create logic using session-specific programs

Pros:

  • βœ… Multiple isolated execution spaces
  • βœ… No ID conflicts between sessions
  • βœ… Test isolation
  • βœ… Concurrent workload support

Solution 3: Process Pool Architecture

Advanced Fix: Multiple Python processes managed by supervisor:

# In Elixir supervision tree
children = [
  {DSPex.PythonBridge.ProcessPool, []},
  {DSPex.PythonBridge.SessionManager, []}
]

# Session manager assigns processes to sessions
%{
  session_a: pid_1,
  session_b: pid_2,
  session_c: pid_1  # Can reuse if session_a is done
}

DSPy Concurrent Execution Considerations

Current DSPy Limitations

DSPy itself has thread-safety considerations:

  1. Global Configuration: dspy.configure(lm=...) affects global state
  2. Model State: Some LM backends maintain connection pools
  3. Cache State: DSPy may cache completions globally

Safe Concurrent Patterns

Pattern 1: Process-Level Isolation

# Each Python process gets its own DSPy configuration
process_A: dspy.configure(lm=openai_client_A)
process_B: dspy.configure(lm=openai_client_B)

Pattern 2: Session-Level Configuration

class SessionConfig:
    def __init__(self, session_id):
        self.lm = create_lm_for_session(session_id)
        self.cache = {}
        
    def __enter__(self):
        dspy.configure(lm=self.lm)
        
    def __exit__(self):
        dspy.configure(lm=None)

Implementation Priority

Phase 1: Immediate Test Fix (Low Risk)

# Add reset command to existing bridge
def handle_command(self, command, args):
    handlers = {
        "create_program": self.create_program,
        "execute_program": self.execute_program,
        "reset_state": self.reset_state,  # NEW
        # ... existing handlers
    }

Phase 2: Session Support (Medium Risk)

  • Add session_id parameter to all commands
  • Namespace programs by session
  • Update Elixir side to manage sessions

Phase 3: Process Pool (High Risk)

  • Multi-process architecture
  • Session-to-process assignment
  • Process lifecycle management

Risk Assessment

Current State Risks

  • ❌ Test Flakiness: Random failures due to state conflicts
  • ❌ Production Issues: Cannot run multiple workloads safely
  • ❌ Debugging Difficulty: State pollution between operations
  • ❌ Scalability Problems: Single process bottleneck

Implementation Risks

  • Phase 1: βœ… Low risk, immediate benefit
  • Phase 2: ⚠️ Requires protocol changes, but manageable
  • Phase 3: ❌ Major architecture change, high complexity

Recommendation

Start with Phase 1 (reset command) to fix immediate test issues, then evaluate Phase 2 (sessions) for proper concurrent execution support.

The core insight is correct: we need isolated execution spaces for DSPy programs, and the current global state design prevents this fundamentally.

Test Scenarios That Should Work

# Scenario 1: Parallel test execution
Task.async(fn -> test_layer_3_program("session_a") end)
Task.async(fn -> test_layer_3_program("session_b") end)

# Scenario 2: Test cleanup
test "program creation" do
  session = create_test_session()
  create_program(session, "test_prog")
  # Session automatically cleaned up
end

# Scenario 3: Production concurrent workloads  
user_a_session = start_session("user_a")
user_b_session = start_session("user_b")
# Both can have programs with same IDs safely

This analysis shows the “simple” test errors actually reveal deep architectural limitations that need addressing for robust production use.