DSPex Known Issues
Critical Issues
1. “Session not found: anonymous” Bug 🐛
Status: Active Bug
Severity: High
First Reported: 2025-07-15
Affects: All V2 pool operations using anonymous sessions
Problem Description
The DSPex V2 pool implementation has a session affinity routing bug that causes intermittent failures with the error message:
Session not found: anonymous
Root Cause Analysis
The bug occurs due to incorrect session management in the pool worker architecture:
- create_program is called with no explicit session_id
- It defaults to the "anonymous" session (hardcoded in python_pool_v2.ex:22)
- create_program gets routed to Worker A via session affinity
- Worker A creates the session in its local session_programs dictionary
- The session affinity system binds "anonymous" to Worker A
- execute_program is called immediately afterwards with the same "anonymous" session
- Due to pool load balancing, execute_program gets routed to Worker B (a different worker!)
- Worker B doesn't have the "anonymous" session in its local dictionary
- ERROR: “Session not found: anonymous”
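The failure sequence can be reproduced in miniature. The sketch below is a hypothetical model, not the DSPex code. `Worker` stands in for one Python bridge process, and its `session_programs` dict mirrors the per-process dictionary in `dspy_bridge.py`. Calling a specific worker object stands in for the pool's routing decision.

```python
class Worker:
    """Hypothetical stand-in for one Python bridge process with local session state."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        # Per-process state: nothing stored here is visible to other workers.
        self.session_programs = {}

    def create_program(self, session_id, program_id):
        # Mirrors the lazy session creation in dspy_bridge.py.
        if session_id not in self.session_programs:
            self.session_programs[session_id] = {}
        self.session_programs[session_id][program_id] = "program"
        return program_id

    def execute_program(self, session_id, program_id):
        # Mirrors the lookup that raises on an unknown session.
        if session_id not in self.session_programs:
            raise ValueError(f"Session not found: {session_id}")
        return self.session_programs[session_id][program_id]


worker_a = Worker("A")
worker_b = Worker("B")

# create_program lands on Worker A ...
worker_a.create_program("anonymous", "prog-1")

# ... but execute_program is routed to Worker B, which never saw the session.
try:
    worker_b.execute_program("anonymous", "prog-1")
except ValueError as err:
    print(err)  # Session not found: anonymous
```

In this model the same call sent to `worker_a` succeeds. The error comes entirely from which process handles the request, not from the request itself.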
Why This is a Real Bug
This is not “automatically handled” behavior as might be assumed:
- Performance Impact: First requests fail unnecessarily, adding retry latency
- User Experience: Creates confusing error messages in logs
- Resource Waste: Retry attempts consume CPU and network resources
- Reliability Issues: Intermittent failures make the system appear unreliable
- Debugging Confusion: Makes it hard to distinguish real errors from routing issues
Affected Code Locations
Default session assignment:
/lib/dspex/adapters/python_pool_v2.ex:22
@default_session "anonymous"
Session creation in Python:
/priv/python/dspy_bridge.py:343-344
if session_id not in self.session_programs:
    self.session_programs[session_id] = {}
Session lookup failure:
/priv/python/dspy_bridge.py:488-489
if session_id not in self.session_programs:
    raise ValueError(f"Session not found: {session_id}")
Session affinity binding:
/lib/dspex/python_bridge/session_pool_v2.ex:406
SessionAffinity.bind_session(session_id, worker.worker_id)
Current “Workaround” (Masking the Bug)
The retry mechanism currently masks this issue:
- First attempt fails because it hits the wrong worker
- Retry may succeed by chance if routed back to the original worker
- This creates intermittent failures and poor user experience
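A retry can hide the routing error, and the sketch below models how. The pool in it ignores session affinity and simply rotates through workers (round-robin). Several pieces are illustrative assumptions, not the DSPex pool: the round-robin policy, the `Worker`/`RoundRobinPool` classes, and the three-attempt limit (borrowed from the "Retry attempt 1/3" log line).

```python
import itertools


class Worker:
    """Hypothetical bridge process holding sessions in a local dict."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.session_programs = {}

    def handle(self, command, session_id):
        if command == "create_program":
            self.session_programs.setdefault(session_id, {})
            return "created"
        # Any other command requires the session to exist on this worker.
        if session_id not in self.session_programs:
            raise ValueError(f"Session not found: {session_id}")
        return "executed"


class RoundRobinPool:
    """Hypothetical pool that ignores session affinity and rotates workers."""

    def __init__(self, workers):
        self._next = itertools.cycle(workers)

    def run(self, command, session_id, max_attempts=3):
        log = []
        for attempt in range(1, max_attempts + 1):
            worker = next(self._next)  # every attempt goes to the next worker
            try:
                result = worker.handle(command, session_id)
                log.append(f"attempt {attempt}/{max_attempts} on {worker.worker_id}: ok")
                return result, log
            except ValueError as err:
                log.append(f"attempt {attempt}/{max_attempts} on {worker.worker_id}: {err}")
        raise RuntimeError("all attempts failed: " + "; ".join(log))


pool = RoundRobinPool([Worker("A"), Worker("B")])
pool.run("create_program", "anonymous")                 # lands on A
result, log = pool.run("execute_program", "anonymous")  # B fails, then A succeeds
for line in log:
    print(line)
# attempt 1/3 on B: Session not found: anonymous
# attempt 2/3 on A: ok
```

The retry "works" only because the rotation happens to come back to Worker A. With more workers, or a different balancing policy, the same request can exhaust its attempts.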
Reproduction Steps
- Create a program using the V2 pool without an explicit session ID (this falls back to the anonymous session)
- Execute the program immediately after creation
- Observe the “Session not found: anonymous” error on the first attempt
- A retry may then succeed, masking the underlying routing issue
Example Error Log
Command error: Session not found: anonymous
Traceback (most recent call last):
  File ".../dspy_bridge.py", line 1320, in main
    result = bridge.handle_command(command, args)
  File ".../dspy_bridge.py", line 168, in handle_command
    result = handlers[command](args)
  File ".../dspy_bridge.py", line 489, in execute_program
    raise ValueError(f"Session not found: {session_id}")
ValueError: Session not found: anonymous
[warning] Retry attempt 1/3 failed, retrying in 988ms
[info] Retry succeeded on attempt 2/3
Proposed Solutions
Fix Session Routing (Recommended)
- Anonymous operations should bypass session affinity entirely
- Or pre-create anonymous sessions on all workers
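Pre-creating the anonymous session could look like the sketch below. It assumes a pool start-up hook that can touch every worker. `start_pool`, `Worker` and `ANONYMOUS` are hypothetical names, not DSPex APIs.

```python
class Worker:
    """Hypothetical bridge process holding sessions in a local dict."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.session_programs = {}


ANONYMOUS = "anonymous"  # mirrors @default_session in python_pool_v2.ex


def start_pool(worker_ids):
    """Hypothetical pool start-up that pre-creates the anonymous session everywhere."""
    workers = [Worker(wid) for wid in worker_ids]
    for worker in workers:
        # Same guard as dspy_bridge.py, applied eagerly at start-up instead of lazily.
        worker.session_programs.setdefault(ANONYMOUS, {})
    return workers


workers = start_pool(["A", "B", "C"])
print(all(ANONYMOUS in w.session_programs for w in workers))  # True
```

In this model the "Session not found" lookup now passes on every worker. Each worker's anonymous dictionary still holds only the programs that worker created itself, though. Pre-creation therefore removes the error only for operations that do not depend on a program created on another worker.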
Eliminate Anonymous Sessions
- Generate unique session IDs for all operations
- Remove the concept of “anonymous” sessions
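Here is a minimal sketch of per-operation session IDs using Python's standard `uuid` module. `new_session_id` and the `session_` prefix are hypothetical, not part of DSPex. Each caller gets its own session key, and therefore its own affinity binding, instead of every caller sharing `"anonymous"`.

```python
import uuid


def new_session_id(prefix="session"):
    """Return a collision-resistant session ID (hypothetical helper, not a DSPex API)."""
    # uuid4 draws 122 random bits, so independent callers will not share an ID in practice.
    return f"{prefix}_{uuid.uuid4().hex}"


a = new_session_id()
b = new_session_id()
print(a != b)  # True
```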
Improve Session Management
- Share session state across workers
- Or enforce session-to-worker binding so that every request for a session reaches the worker holding its state
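One way to implement session-to-worker binding is a router that consults the binding before doing any load balancing. The sketch below assumes a single-threaded router. `AffinityRouter`, `checkout` and the round-robin fallback are hypothetical names and policies, not the `SessionAffinity` module in `session_pool_v2.ex`. In a concurrent pool, the lookup and the bind would have to happen atomically so that two requests for a new session cannot pick different workers.

```python
import itertools


class Worker:
    """Hypothetical bridge process holding sessions in a local dict."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.session_programs = {}


class AffinityRouter:
    """Hypothetical router that always honors an existing session binding."""

    def __init__(self, workers):
        self._workers = {w.worker_id: w for w in workers}
        self._rotation = itertools.cycle(workers)  # fallback for unbound sessions
        self._bindings = {}  # session_id -> worker_id

    def checkout(self, session_id):
        bound_id = self._bindings.get(session_id)
        if bound_id is not None:
            # Sticky path: reuse the worker that holds this session's state.
            return self._workers[bound_id]
        # First request for this session: pick a worker and record the binding
        # before returning, so the next request for the session sees it.
        worker = next(self._rotation)
        self._bindings[session_id] = worker.worker_id
        return worker


router = AffinityRouter([Worker("A"), Worker("B")])
first = router.checkout("session_1")
second = router.checkout("session_1")
other = router.checkout("session_2")
print(first.worker_id, second.worker_id, other.worker_id)  # A A B
```

Load balancing still spreads new sessions across workers. It simply never overrides an existing binding, which is the step that goes wrong in the failure sequence described earlier.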
Workaround for Users
For now, users can avoid this issue by:
- Creating explicit session IDs instead of relying on anonymous sessions
- Using the simple DSPy example instead of the V2 pool
- Expecting and handling the retry behavior
Impact Assessment
- Signature Examples: ✅ Works but with retry noise
- Concurrent Examples: ✅ Works but with retry noise
- Production Usage: ⚠️ May cause performance degradation
- User Experience: ❌ Confusing error messages
Fix Priority
HIGH - This should be fixed before any production deployment as it:
- Reduces system reliability
- Creates poor user experience
- Wastes computational resources
- Makes error diagnosis difficult
Other Known Issues
2. Logger Deprecation Warnings
Status: Minor
Severity: Low
Fix: Replace Logger.warn with Logger.warning
3. Python Bridge Startup Latency
Status: Performance Issue
Severity: Medium
Description: Python workers take 2-3 seconds to initialize
Reporting New Issues
Please report issues at: https://github.com/anthropics/dspex/issues
Include:
- Error messages and stack traces
- Reproduction steps
- Environment information
- Expected vs actual behavior