Design Document
Overview
This design outlines the comprehensive test infrastructure overhaul for DSPex using the Supertester framework (github.com/nshkrdotcom/supertester), following the established OTP testing standards documented in docs/code-standards/comprehensive-otp-testing-standards.md.
DSPex’s current test suite has unique challenges:
- Three-layer testing architecture (mock_adapter, bridge_mock, full_integration)
- Python bridge integration via Snakepit
- Multiple LLM adapters (InstructorLite, HTTP, Mock, Python)
- Complex routing logic between native and Python implementations
- No existing Supertester integration, which leaves the suite open to ad-hoc Process.sleep-based synchronization
The solution involves integrating DSPex with Supertester while preserving the three-layer architecture, implementing all tests using Supertester’s OTP helpers, and ensuring proper testing of DSPex’s unique Python bridge and routing capabilities.
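Since Supertester is an external dependency, the integration starts in mix.exs. A minimal sketch, assuming a test-only dependency pulled from the GitHub repository mentioned above (pinning to a tag or ref is advisable):

defp deps do
  [
    # Test-only dependency on the Supertester framework
    {:supertester, github: "nshkrdotcom/supertester", only: :test, runtime: false}
    # ... existing DSPex dependencies ...
  ]
end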
Architecture
High-Level Architecture
graph TB
    subgraph "Supertester Framework (External Dependency)"
        UTH[UnifiedTestFoundation]
        OTP[OTPHelpers]
        GS[GenServerHelpers]
        SUP[SupervisorHelpers]
        MSG[MessageHelpers]
        PERF[PerformanceHelpers]
        CHAOS[ChaosHelpers]
        DATA[DataGenerators]
        ASSERT[Assertions]
    end

    subgraph "DSPex Three-Layer Test Architecture"
        L1[Layer 1: Mock Adapter Tests]
        L2[Layer 2: Bridge Mock Tests]
        L3[Layer 3: Full Integration Tests]
    end

    subgraph "DSPex Components Under Test"
        ROUTER[Router]
        PIPELINE[Pipeline]
        NATIVE[Native Modules]
        PYTHON[Python Bridge]
        LLM[LLM Adapters]
        SNAKEPIT[Snakepit Integration]
    end

    UTH --> L1
    UTH --> L2
    UTH --> L3
    OTP --> PYTHON
    GS --> ROUTER
    SUP --> SNAKEPIT
    MSG --> L2
    PERF --> PIPELINE
    CHAOS --> LLM
    DATA --> L1
    ASSERT --> NATIVE
Three-Layer Testing Integration with Supertester
Layer 1: Mock Adapter Tests (~70ms)
- Uses Supertester.DataGenerators and Supertester.Assertions for fast unit testing
- Tests DSPex logic without Python or external dependencies
- Focuses on router decisions, pipeline orchestration, native modules
Layer 2: Bridge Mock Tests
- Uses Supertester.MessageHelpers for protocol testing
- Validates serialization/deserialization with Snakepit
- Tests bridge communication patterns without a full Python runtime
Layer 3: Full Integration Tests
- Uses full Supertester suite for end-to-end testing
- Tests actual Python DSPy module execution
- Validates complete workflows with real LLM adapters
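How a given run selects a layer is an implementation detail; one plausible wiring, sketched below, keeps layer selection in test/test_helper.exs by excluding the heavier tags (:integration, :benchmark, :chaos, as used by the test modules later in this document) unless TEST_MODE asks for them:

# test/test_helper.exs — sketch: gate heavier suites on TEST_MODE
excludes =
  case System.get_env("TEST_MODE", "mock_adapter") do
    "full_integration" -> []
    # Layers 1 and 2 differ by adapter configuration, not by tags,
    # so both skip the heavyweight suites
    _mock_or_bridge -> [:integration, :benchmark, :chaos]
  end

ExUnit.configure(exclude: excludes)
ExUnit.start()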
Test Organization Following Code Standards
test/
├── support/
│ ├── dspex_test_helpers.ex # DSPex-specific helpers using Supertester
│ ├── mock_python_bridge.ex # Layer 2 bridge mock implementation
│ ├── test_mode_config.ex # Three-layer test mode configuration
│ └── test_schemas.ex # Test data schemas
├── unit/
│ ├── router_test.exs # Using Supertester.GenServerHelpers
│ ├── pipeline_test.exs # Using Supertester orchestration
│ ├── native/
│ │ ├── signature_test.exs # Using Supertester.Assertions
│ │ ├── template_test.exs # Using Supertester patterns
│ │ └── validator_test.exs # Using Supertester validation
│ └── llm/
│ ├── adapter_test.exs # Using Supertester adapter patterns
│ ├── client_test.exs # Using Supertester.OTPHelpers
│ └── adapters/
│ ├── instructor_lite_test.exs
│ ├── http_test.exs
│ └── mock_test.exs
├── integration/
│ ├── python_bridge_test.exs # Using Supertester.SupervisorHelpers
│ ├── snakepit_integration_test.exs # Using Supertester pool patterns
│ ├── router_integration_test.exs # Using Supertester routing helpers
│ └── pipeline_integration_test.exs # End-to-end with Supertester
├── performance/
│ ├── router_benchmark_test.exs # Using Supertester.PerformanceHelpers
│ ├── pipeline_benchmark_test.exs # Using workload patterns
│ └── python_bridge_benchmark_test.exs # Throughput testing
└── chaos/
├── python_failure_test.exs # Using Supertester.ChaosHelpers
├── adapter_chaos_test.exs # LLM adapter failure scenarios
└── pipeline_chaos_test.exs # Complex failure cascades
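The per-layer mix tasks invoked later in CI (mix test.fast, mix test.protocol, mix test.integration) are not built into Mix; a sketch of how they could be defined as aliases in mix.exs follows. The names and the mapping onto TEST_MODE values are assumptions chosen to match the CI matrix below:

# mix.exs — sketch: per-layer aliases mapping CI task names onto TEST_MODE values
defp aliases do
  [
    "test.fast": &run_layer(&1, "mock_adapter"),
    "test.protocol": &run_layer(&1, "bridge_mock"),
    "test.integration": &run_layer(&1, "full_integration")
  ]
end

defp run_layer(args, mode) do
  # Set the mode before the test task loads config/test.exs
  System.put_env("TEST_MODE", mode)
  Mix.Task.run("test", args)
end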
Components and Interfaces
1. Test Support Integration with Supertester
DSPex.TestHelpers
Purpose: DSPex-specific test utilities built on top of Supertester.
defmodule DSPex.TestHelpers do
  @moduledoc """
  DSPex-specific test helpers built on the Supertester framework.
  Supports the three-layer testing architecture while following OTP standards.
  """

  import Supertester.OTPHelpers
  import Supertester.GenServerHelpers
  import Supertester.DataGenerators

  @doc """
  Sets up a DSPex router with the appropriate test mode.
  No Process.sleep, proper OTP synchronization.
  """
  def with_test_router(test_mode, opts \\ [], fun) do
    # Configure based on test layer
    router_config =
      case test_mode do
        :mock_adapter -> [adapters: [:mock], python_bridge: false]
        :bridge_mock -> [adapters: [:mock, :bridge_mock], python_bridge: :mock]
        :full_integration -> [adapters: :all, python_bridge: true]
      end

    router_name = unique_process_name("dspex_router")

    {:ok, router} =
      setup_isolated_genserver(
        DSPex.Router,
        "router_test",
        Keyword.merge(router_config, opts ++ [name: router_name])
      )

    # Wait for router initialization without sleep
    wait_for_genserver_sync(router)

    try do
      fun.(router)
    after
      if Process.alive?(router), do: GenServer.stop(router)
    end
  end

  @doc """
  Creates an isolated Python bridge for testing.
  Uses Supertester patterns for Snakepit integration.
  """
  def with_python_bridge(mode, fun) do
    case mode do
      :mock ->
        # Layer 1: no Python bridge
        fun.(:no_bridge)

      :bridge_mock ->
        # Layer 2: mock bridge with protocol testing
        bridge_name = unique_process_name("mock_bridge")

        {:ok, bridge} =
          setup_isolated_genserver(
            DSPex.MockPythonBridge,
            "bridge_test",
            name: bridge_name
          )

        try do
          fun.(bridge)
        after
          if Process.alive?(bridge), do: GenServer.stop(bridge)
        end

      :full ->
        # Layer 3: real Snakepit integration with session affinity
        session_id = unique_session_id("dspex_test")
        fun.({:snakepit, session_id})
    end
  end

  @doc """
  Waits for pipeline completion without timing dependencies.
  """
  def wait_for_pipeline_completion(pipeline, timeout \\ 5_000) do
    wait_until(
      fn ->
        case GenServer.call(pipeline, :get_status) do
          {:completed, _} -> true
          {:running, _} -> false
          # Stop waiting on error
          {:error, _} -> true
        end
      end,
      timeout
    )
  end
end
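wait_until/2 above is assumed to be available via Supertester's imports; if it is not, a small receive-based polling fallback (no Process.sleep) could be added to the same module:

# Fallback sketch for wait_until/2 — only needed if Supertester does not export one.
# Polls `condition_fun` every `interval` ms until it returns true or the deadline passes.
defp wait_until(condition_fun, timeout, interval \\ 10) do
  deadline = System.monotonic_time(:millisecond) + timeout

  poll = fn poll ->
    cond do
      condition_fun.() ->
        :ok

      System.monotonic_time(:millisecond) >= deadline ->
        {:error, :timeout}

      true ->
        # `receive ... after` pauses without Process.sleep
        receive do
        after
          interval -> poll.(poll)
        end
    end
  end

  poll.(poll)
end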
DSPex.MockPythonBridge
Purpose: Layer 2 bridge mock for protocol testing.
defmodule DSPex.MockPythonBridge do
  @moduledoc """
  Mock Python bridge for Layer 2 protocol testing.
  Implements a Snakepit-compatible interface with Supertester patterns.
  """

  use GenServer

  import Supertester.OTPHelpers
  import Supertester.MessageHelpers

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: opts[:name])
  end

  def init(opts) do
    # Set up message tracing for protocol testing
    trace_table = setup_isolated_trace_table("bridge_mock")

    state = %{
      responses: %{},
      message_history: [],
      trace_table: trace_table,
      delay_config: opts[:delays] || %{}
    }

    {:ok, state}
  end

  def handle_call({:execute, module, method, args}, from, state) do
    # Record for protocol verification
    message = {module, method, args, System.monotonic_time()}
    state = update_in(state.message_history, &[message | &1])

    # Simulate processing latency without Process.sleep
    if delay = state.delay_config[module] do
      # Use an OTP timer; the reply is sent from handle_info/2
      Process.send_after(self(), {:delayed_response, from, module, method, args}, delay)
      {:noreply, state}
    else
      {:reply, get_mock_response(state, module, method, args), state}
    end
  end

  def handle_call({:set_response, module, method, response}, _from, state) do
    {:reply, :ok, put_in(state.responses[{module, method}], response)}
  end

  def handle_call(:get_message_history, _from, state) do
    {:reply, Enum.reverse(state.message_history), state}
  end

  def handle_info({:delayed_response, from, module, method, args}, state) do
    GenServer.reply(from, get_mock_response(state, module, method, args))
    {:noreply, state}
  end

  # Protocol testing helpers

  def set_response(bridge, module, method, response) do
    GenServer.call(bridge, {:set_response, module, method, response})
  end

  def get_message_history(bridge) do
    GenServer.call(bridge, :get_message_history)
  end

  defp get_mock_response(state, module, method, _args) do
    Map.get(state.responses, {module, method}, {:ok, %{}})
  end
end
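For reference, a Layer 2 test can script the mock and assert on the recorded protocol traffic; a brief sketch (the {:execute, module, method, args} call shape matches the mock above):

test "records protocol messages in order" do
  with_python_bridge(:bridge_mock, fn bridge ->
    :ok = DSPex.MockPythonBridge.set_response(bridge, "dspy.Predict", "execute", {:ok, %{a: "42"}})

    assert {:ok, %{a: "42"}} =
             GenServer.call(bridge, {:execute, "dspy.Predict", "execute", %{q: "test"}})

    assert [{"dspy.Predict", "execute", %{q: "test"}, _ts}] =
             DSPex.MockPythonBridge.get_message_history(bridge)
  end)
end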
2. Unit Test Specifications Using Supertester
Router Tests
defmodule DSPex.RouterTest do
  use ExUnit.Case, async: true

  import Supertester.OTPHelpers
  import Supertester.GenServerHelpers
  import Supertester.Assertions
  import DSPex.TestHelpers

  describe "router initialization" do
    test "starts with configured adapters" do
      with_test_router(:mock_adapter, [adapters: [:mock, :http]], fn router ->
        assert_genserver_state(router, fn state ->
          MapSet.equal?(
            MapSet.new(state.available_adapters),
            MapSet.new([:mock, :http])
          )
        end)
      end)
    end

    test "tracks adapter capabilities" do
      with_test_router(:full_integration, [], fn router ->
        # Register capabilities without sleep
        :ok = cast_and_sync(router, {:register_capability, :python, :chain_of_thought})
        :ok = cast_and_sync(router, {:register_capability, :native, :signature})

        # Verify routing decisions
        assert {:native, _} = call_with_timeout(router, {:route, :signature, %{}})
        assert {:python, _} = call_with_timeout(router, {:route, :chain_of_thought, %{}})
      end)
    end
  end

  describe "smart routing with OTP patterns" do
    setup do
      {:ok, router} = setup_isolated_genserver(DSPex.Router, "routing_test")
      %{router: router}
    end

    test "routes based on performance metrics", %{router: router} do
      # Simulate performance data collection
      operations =
        for i <- 1..10 do
          {:cast, {:record_metric, :native, :template, i * 10}}
        end

      {:ok, _} = stress_test_server(router, operations, 1000)

      # Verify routing preference
      assert_genserver_state(router, fn state ->
        state.metrics[:native][:template][:avg_latency] < 100
      end)
    end

    test "handles concurrent routing requests", %{router: router} do
      # Use Supertester's concurrent testing
      requests =
        for i <- 1..20 do
          {:call, {:route, :signature, %{id: i}}}
        end

      {:ok, results} = concurrent_calls(router, requests, 20)

      # All should route successfully
      assert Enum.all?(results, &match?({:ok, {_, _}}, &1))
    end
  end
end
Pipeline Tests
defmodule DSPex.PipelineTest do
  use ExUnit.Case, async: true

  import Supertester.OTPHelpers
  import Supertester.GenServerHelpers
  import Supertester.Assertions
  import DSPex.TestHelpers

  describe "pipeline execution" do
    test "executes sequential steps" do
      steps = [
        {:native, DSPex.Native.Signature, spec: "input -> output"},
        {:native, DSPex.Native.Template, template: "Result: <%= @output %>"}
      ]

      pipeline_name = unique_process_name("pipeline")

      {:ok, pipeline} =
        setup_isolated_genserver(
          DSPex.Pipeline,
          "sequential_test",
          name: pipeline_name,
          steps: steps
        )

      # Execute without timing dependencies
      :ok = cast_and_sync(pipeline, {:execute, %{input: "test"}})

      # Wait for completion properly
      wait_for_pipeline_completion(pipeline)

      # Verify result
      assert {:ok, result} = call_with_timeout(pipeline, :get_result)
      assert result =~ "Result: "
    end

    test "handles parallel execution" do
      steps = [
        {:parallel,
         [
           {:native, DSPex.Native.Signature, spec: "q -> a"},
           {:mock, "dspy.ChainOfThought", signature: "q -> analysis"}
         ]}
      ]

      with_test_router(:bridge_mock, [], fn router ->
        pipeline_name = unique_process_name("parallel_pipeline")

        {:ok, pipeline} =
          setup_isolated_genserver(
            DSPex.Pipeline,
            "parallel_test",
            name: pipeline_name,
            steps: steps,
            router: router
          )

        # Test parallel execution
        task =
          Task.async(fn ->
            GenServer.call(pipeline, {:execute, %{q: "test query"}})
          end)

        # Verify both branches execute
        wait_for_genserver_sync(pipeline)

        assert_genserver_state(pipeline, fn state ->
          length(state.active_tasks) == 2
        end)

        # Get result
        {:ok, result} = Task.await(task)
        assert Map.has_key?(result, :native_result)
        assert Map.has_key?(result, :python_result)
      end)
    end
  end
end
3. Integration Tests Using Supertester
defmodule DSPex.PythonBridgeIntegrationTest do
  use ExUnit.Case, async: true

  import Supertester.SupervisorHelpers
  import Supertester.OTPHelpers
  import Supertester.Assertions
  import DSPex.TestHelpers

  @moduletag :integration

  describe "Snakepit integration" do
    test "executes Python DSPy modules" do
      with_python_bridge(:full, fn {:snakepit, session_id} ->
        # Execute a DSPy module via Snakepit
        result =
          Snakepit.execute_in_session(
            session_id,
            "dspy_execute",
            %{
              module: "dspy.ChainOfThought",
              signature: "question -> answer",
              inputs: %{question: "What is DSPy?"}
            }
          )

        assert {:ok, %{answer: answer}} = result
        assert is_binary(answer)
      end)
    end

    test "handles Python process failures gracefully" do
      with_python_bridge(:full, fn {:snakepit, session_id} ->
        # Inject failure
        {:ok, _} =
          Snakepit.execute_in_session(
            session_id,
            "inject_failure",
            %{type: :process_crash, delay: 100}
          )

        # Attempt operation
        result =
          Snakepit.execute_in_session(
            session_id,
            "dspy_execute",
            %{module: "dspy.Predict", signature: "q -> a", inputs: %{q: "test"}}
          )

        # Should handle gracefully
        assert {:error, _} = result

        # Verify session recovery
        wait_for_process_restart(:snakepit_worker, session_id)

        # Should work after recovery
        result2 =
          Snakepit.execute_in_session(
            session_id,
            "dspy_execute",
            %{module: "dspy.Predict", signature: "q -> a", inputs: %{q: "test"}}
          )

        assert {:ok, _} = result2
      end)
    end
  end
end
4. Performance Tests Using Supertester
defmodule DSPex.RouterBenchmarkTest do
  use ExUnit.Case, async: false

  import Supertester.PerformanceHelpers
  import DSPex.TestHelpers

  @moduletag :benchmark

  describe "routing performance" do
    test "measures routing overhead" do
      with_test_router(:mock_adapter, [], fn router ->
        # Benchmark routing decisions
        result =
          benchmark_operations(
            [
              {"simple route", fn ->
                 GenServer.call(router, {:route, :signature, %{}})
               end},
              {"complex route", fn ->
                 GenServer.call(router, {:route, :chain_of_thought, %{
                   constraints: [:native_preferred, :low_latency]
                 }})
               end}
            ],
            1000
          )

        # Routing should be fast
        assert_performance_within_bounds(result, %{
          # microseconds
          max_time: 100,
          # bytes
          max_memory: 1000
        })
      end)
    end

    test "routing under load" do
      with_test_router(:full_integration, [], fn router ->
        # Test different load patterns
        for pattern <- [:steady, :burst, :ramp_up] do
          result =
            workload_pattern_test(
              fn -> GenServer.call(router, {:route, :predict, %{input: "test"}}) end,
              pattern,
              duration: 5000
            )

          # Should maintain performance
          assert result.p95_latency < 10_000 # 10ms
          assert result.error_rate < 0.01    # <1% errors
        end
      end)
    end
  end
end
5. Chaos Tests Using Supertester
defmodule DSPex.AdapterChaosTest do
  use ExUnit.Case, async: true

  import Supertester.ChaosHelpers
  import Supertester.Assertions
  import DSPex.TestHelpers

  @moduletag :chaos

  describe "adapter failure resilience" do
    test "handles LLM adapter failures" do
      with_test_router(:full_integration, [], fn router ->
        # Router is configured with multiple adapters (:instructor_lite, :http,
        # :python) plus the always-available :mock fallback

        # Get initial state
        {:ok, initial_state} = GenServer.call(router, :get_state)

        # Inject adapter failures
        chaos_scenario = [
          {:adapter_timeout, :instructor_lite, 5000},
          {:adapter_error, :http, :rate_limit},
          {:adapter_crash, :python, 1000}
        ]

        {:ok, _chaos_result} =
          chaos_test_orchestrator(chaos_scenario, :parallel, target: router)

        # Router should adapt
        assert_genserver_responsive(router)

        # Should fall back to the one adapter the scenario leaves healthy
        {:ok, {adapter, _}} = GenServer.call(router, {:route, :predict, %{}})
        assert adapter == :mock

        # Verify recovery
        {:ok, recovered} = verify_system_recovery(initial_state, timeout: 10_000)
        assert recovered
      end)
    end

    test "pipeline resilience to cascading failures" do
      steps = [
        {:native, DSPex.Native.Signature, spec: "input -> parsed"},
        {:python, "dspy.ChainOfThought", signature: "parsed -> analysis"},
        {:native, DSPex.Native.Template, template: "<%= @analysis %>"}
      ]

      with_test_router(:full_integration, [], fn router ->
        pipeline_name = unique_process_name("chaos_pipeline")

        {:ok, pipeline} =
          setup_isolated_genserver(
            DSPex.Pipeline,
            "chaos_test",
            name: pipeline_name,
            steps: steps,
            router: router
          )

        # Inject multiple failures
        inject_process_failure([router, pipeline], :random_crash, delay: 100)
        simulate_network_corruption(2000, :packet_loss)

        # Attempt execution
        result = GenServer.call(pipeline, {:execute, %{input: "test"}}, 10_000)

        # Should degrade gracefully
        case result do
          {:ok, partial_result} ->
            # May have partial results
            assert Map.has_key?(partial_result, :parsed)

          {:error, reason} ->
            # Should provide a clear error
            assert is_binary(reason) or is_atom(reason)
        end

        # System should recover
        assert_genserver_responsive(router)
        assert_genserver_responsive(pipeline)
      end)
    end
  end
end
Test Execution Strategy
Three-Layer Testing with Supertester
# config/test.exs
import Config

config :dspex,
  test_mode: System.get_env("TEST_MODE", "mock_adapter")

# Supertester automatically handles test isolation
config :ex_unit,
  capture_log: true

# Layer-specific configuration (defaults to mock_adapter when TEST_MODE is unset)
case System.get_env("TEST_MODE", "mock_adapter") do
  "mock_adapter" ->
    config :dspex, adapters: [:mock]
    config :snakepit, enabled: false

  "bridge_mock" ->
    config :dspex, adapters: [:mock, :bridge_mock]
    config :snakepit, enabled: false

  "full_integration" ->
    config :dspex, adapters: :all
    config :snakepit, enabled: true
end
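The test/support/test_mode_config.ex module listed in the test tree can centralize mode lookups so individual tests never read the environment directly; a minimal sketch (module name taken from the tree above, API assumed):

defmodule DSPex.TestModeConfig do
  @moduledoc "Central lookup for the active test layer (sketch)."

  @valid_modes [:mock_adapter, :bridge_mock, :full_integration]

  @doc "Returns the active test mode as an atom, defaulting to :mock_adapter."
  def current_mode do
    mode = Application.get_env(:dspex, :test_mode, "mock_adapter")
    mode = if is_binary(mode), do: String.to_existing_atom(mode), else: mode
    if mode in @valid_modes, do: mode, else: :mock_adapter
  end

  @doc "True when the Python bridge (Snakepit) should be running."
  def python_bridge_enabled?, do: current_mode() == :full_integration
end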
CI/CD Integration
# .github/workflows/test.yml
test:
  strategy:
    matrix:
      test_layer: [fast, protocol, integration]
  steps:
    - name: Run DSPex tests with Supertester
      run: |
        mix deps.get
        mix test.${{ matrix.test_layer }}
    - name: Check test quality
      run: |
        mix test.pattern_check   # No Process.sleep
        mix test.isolation_check # Proper Supertester usage
Quality Gates
- All tests must use `async: true` with Supertester isolation
- Zero Process.sleep usage (enforced by pattern check)
- All three layers must pass independently
- Performance benchmarks must not regress >10%
- Chaos tests must demonstrate recovery
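The pattern check can be as simple as a test that scans the suite for forbidden calls; a minimal sketch, assuming the custom mix alias test.pattern_check from the CI step points at it:

# test/quality/pattern_check_test.exs — sketch of the Process.sleep ban
defmodule DSPex.PatternCheckTest do
  use ExUnit.Case, async: true

  @forbidden ~r/Process\.sleep|:timer\.sleep/

  test "no sleep-based synchronization in the test suite" do
    offenders =
      Path.wildcard("test/**/*.exs")
      # Exclude this checker, which mentions the forbidden pattern
      |> Kernel.--(["test/quality/pattern_check_test.exs"])
      |> Enum.filter(fn path -> File.read!(path) =~ @forbidden end)

    assert offenders == [], "sleep-based sync found in: #{inspect(offenders)}"
  end
end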
Success Metrics
Quantitative Metrics
- Test coverage >95% across all DSPex modules
- Layer 1 execution <100ms total
- Layer 2 execution <500ms total
- Layer 3 execution <5s total
- Zero Process.sleep verified by tooling
- All tests use Supertester helpers
Qualitative Metrics
- Three-layer architecture preserved and enhanced
- Clear layer boundaries with appropriate testing
- Consistent with code standards in comprehensive-otp-testing-standards.md
- Python bridge testing properly handles async operations
- Router testing validates smart decisions
- Tests double as educational examples of DSPex architecture