โ† Back to Dspex

README TESTING

Documentation for README_TESTING from the Dspex repository.

DSPex 3-Layer Testing Architecture

A comprehensive testing framework that provides fast development cycles while maintaining full system confidence through layered test execution.

๐Ÿ“ One-Paragraph Summary

Run mix test.fast for daily development (~70ms), mix test.protocol for wire protocol validation, and mix test.integration for full Python DSPy integration testing. The 3-layer architecture separates pure Elixir unit tests (Layer 1), which run in milliseconds using mock adapters, from protocol validation tests (Layer 2), which verify wire communication without Python dependencies, and from complete end-to-end integration tests (Layer 3), which require a full Python DSPy environment. This design enables rapid TDD cycles while maintaining comprehensive system validation; run mix test.all to execute all layers sequentially with progress reporting.

๐Ÿ—๏ธ Architecture Overview

Layer 1: Mock Adapter (Pure Elixir)

  • Purpose: Fast unit testing without external dependencies
  • Speed: ~70ms execution time
  • Concurrency: Up to 50 concurrent tests
  • Use Case: 90%+ of development testing, TDD cycles

Layer 2: Bridge Mock (Protocol Testing)

  • Purpose: Validate wire protocol and serialization
  • Speed: ~500ms execution time
  • Concurrency: Up to 10 concurrent tests
  • Use Case: Critical protocol paths, error handling

Layer 3: Full Integration (Real Python)

  • Purpose: End-to-end validation with real DSPy
  • Speed: 2-5s per test
  • Concurrency: Sequential execution
  • Use Case: Smoke tests, production confidence
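All three layers expose the same adapter surface (ping, create_program, execute_program, per the Mock Adapter API below), which is what lets tests swap layers without rewriting. A minimal sketch of that shared contract, with a hypothetical module name:

# Hypothetical sketch of the contract all three layers satisfy; the
# module name is illustrative, but the callbacks mirror the Mock
# Adapter API documented below.
defmodule MyApp.AdapterContract do
  @callback ping() :: {:ok, map()} | {:error, term()}
  @callback create_program(map()) :: {:ok, map()} | {:error, term()}
  @callback execute_program(String.t(), map()) :: {:ok, map()} | {:error, term()}
end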

🚀 Quick Start

# Layer 1: Fast unit tests with mock adapter (~70ms)
mix test.fast

# Layer 2: Protocol testing without full Python bridge
mix test.protocol  

# Layer 3: Full integration tests with Python bridge
mix test.integration

# Run all layers sequentially with status reporting
mix test.all
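If you are curious how such aliases can be wired up, one plausible definition in mix.exs sets TEST_MODE before delegating to mix test. This is illustrative only; check the repository's mix.exs for the actual implementation:

# Illustrative alias definitions for mix.exs (not necessarily DSPex's
# actual implementation).
defp aliases do
  [
    "test.fast": &run_with_mode(&1, "mock_adapter"),
    "test.protocol": &run_with_mode(&1, "bridge_mock"),
    "test.integration": &run_with_mode(&1, "full_integration")
  ]
end

defp run_with_mode(args, mode) do
  System.put_env("TEST_MODE", mode)
  Mix.Task.run("test", args)
end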

Using Environment Variables (Advanced)

# Layer 1: Fast unit tests with mock adapter (default)
mix test
TEST_MODE=mock_adapter mix test

# Layer 2: Test bridge protocol without Python
TEST_MODE=bridge_mock mix test

# Layer 3: Complete E2E tests with real Python DSPy
TEST_MODE=full_integration mix test

📋 Layer Execution Script

Use the provided script for comprehensive layer testing:

# Run specific layers
./scripts/test_layers.exs mock              # Layer 1 only
./scripts/test_layers.exs bridge            # Layer 2 only  
./scripts/test_layers.exs full              # Layer 3 only

# Run all layers in sequence
./scripts/test_layers.exs all

# With timing and performance analysis
./scripts/test_layers.exs all --timing --stats

# Verbose output for debugging
./scripts/test_layers.exs mock --verbose --trace

# Parallel execution (Layers 1-2)
./scripts/test_layers.exs all --parallel

Script Options

  • --verbose, -v: Show detailed command output
  • --trace, -t: Enable ExUnit test tracing
  • --parallel, -p: Run compatible layers in parallel
  • --timing: Show execution timing
  • --stats: Show performance analysis
  • --help, -h: Display help information

โš™๏ธ Configuration

Environment Variables

# Set test mode
export TEST_MODE=mock_adapter    # Layer 1 (default)
export TEST_MODE=bridge_mock     # Layer 2
export TEST_MODE=full_integration # Layer 3

Application Configuration

# config/test_dspy.exs
config :dspex, :test_mode, :mock_adapter  # Default layer

# Layer 1: Mock Adapter
config :dspex, :mock_adapter,
  response_delay_ms: 0,        # No delay for speed
  error_rate: 0.0,             # No random errors
  deterministic: true,         # Consistent responses
  mock_responses: %{}          # Custom responses

# Layer 2: Bridge Mock Server  
config :dspex, :bridge_mock_server,
  response_delay_ms: 10,       # Minimal protocol overhead
  error_probability: 0.0,      # No random errors
  max_programs: 100,           # Program limit
  enable_logging: true         # Debug logging
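At runtime the configured values can be read back with the standard Application API; the keys below mirror the config shown above:

# Reading the configured layer and adapter options at runtime.
mode = Application.get_env(:dspex, :test_mode, :mock_adapter)
mock_opts = Application.get_env(:dspex, :mock_adapter, [])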

🧪 Test Organization

Layer 1 Tests (Fast Unit Tests)

# test/dspex/adapters/mock_test.exs
defmodule MyBusinessLogicTest do
  use ExUnit.Case, async: true
  
  alias DSPex.Adapters.Mock
  
  test "business logic works correctly" do
    {:ok, _} = Mock.start_link()
    
    # Test your Ash business logic with deterministic responses
    signature = %{
      "inputs" => [%{"name" => "question", "type" => "string"}],
      "outputs" => [%{"name" => "answer", "type" => "string"}]
    }
    
    {:ok, _} = Mock.create_program(%{id: "test", signature: signature})
    {:ok, result} = Mock.execute_program("test", %{"question" => "test"})
    
    assert Map.has_key?(result, "answer")
  end
end

Layer 2 Tests (Protocol Validation)

# Run with: TEST_MODE=bridge_mock mix test
test "protocol handles errors correctly" do
  # Test serialization, timeouts, error scenarios
  # without needing Python dependencies
end
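For a fuller picture, here is a sketch of what such a test might look like, reusing the add_error_scenario helper shown under Advanced Usage below. The execute_via_bridge helper and the exact error shape are assumptions, not DSPex's documented API:

# Sketch only: BridgeMockServer.add_error_scenario/2 appears under
# "Advanced Usage" below; execute_via_bridge/2 is a hypothetical
# helper, and the asserted error shape is an assumption.
test "protocol surfaces timeouts as errors" do
  BridgeMockServer.add_error_scenario(:test_server, %{
    command: "execute_program",
    probability: 1.0,
    error_type: :timeout,
    message: "Request timeout"
  })

  assert {:error, _reason} = execute_via_bridge("prog-1", %{"question" => "hi"})
end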

Layer 3 Tests (Full Integration)

# Existing integration tests - preserved as-is
# Run with: TEST_MODE=full_integration mix test
test "complete system integration" do
  # Real Python DSPy testing
  # Full end-to-end workflows
end

📊 Performance Targets

Layer   | Target Time  | Typical Use         | Concurrency
--------|--------------|---------------------|---------------
Layer 1 | < 5 seconds  | Development, TDD    | High (50)
Layer 2 | < 15 seconds | Protocol validation | Medium (10)
Layer 3 | < 2 minutes  | E2E confidence      | Sequential (1)

🔄 Development Workflow

1. Fast Development (Layer 1)

# Rapid TDD cycle - run constantly during development
TEST_MODE=mock_adapter mix test --stale

2. Protocol Check (Layer 2)

# Before commits - validate protocol changes
TEST_MODE=bridge_mock mix test

3. Pre-Production (Layer 3)

# Before releases - full system validation
TEST_MODE=full_integration mix test

4. Complete Validation

# Run all layers with performance monitoring
./scripts/test_layers.exs all --timing --stats

🎯 Testing Strategies

Test Distribution

  • 90%+ tests: Layer 1 (business logic, edge cases)
  • 5-10% tests: Layer 2 (critical protocol paths)
  • 1-5% tests: Layer 3 (smoke tests, key workflows)

Layer-Specific Testing

Layer 1: Focus on Business Logic

  • Ash resource behaviors
  • Data transformations
  • Error handling
  • Edge cases
  • Concurrent operations

Layer 2: Focus on Protocol

  • Request/response serialization
  • Error propagation
  • Timeout handling
  • Message correlation
  • Wire format validation

Layer 3: Focus on Integration

  • Real DSPy model calls
  • Complete workflows
  • Production scenarios
  • Performance under load

🛠️ Advanced Usage

Custom Mock Scenarios

# Set up custom response scenarios
Mock.set_scenario(:custom_qa, %{
  "answer" => "Custom test response"
})

# Error injection for resilience testing
Mock.inject_error(%{
  create_program: %{
    probability: 0.1,
    type: :network_error,
    message: "Simulated network failure"
  }
})
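Setting probability to 1.0 makes injected errors deterministic, which keeps resilience tests repeatable. A sketch, reusing the signature shape from the Layer 1 example (the exact error-tuple shape is an assumption):

# Deterministic failure: probability 1.0 should trigger the injected
# error on every call. The {:error, _} shape is an assumption.
Mock.inject_error(%{
  create_program: %{probability: 1.0, type: :network_error, message: "boom"}
})

signature = %{
  "inputs" => [%{"name" => "question", "type" => "string"}],
  "outputs" => [%{"name" => "answer", "type" => "string"}]
}

assert {:error, _reason} = Mock.create_program(%{id: "p1", signature: signature})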

Bridge Mock Configuration

# Add error scenarios to bridge mock
BridgeMockServer.add_error_scenario(:test_server, %{
  command: "execute_program",
  probability: 1.0,
  error_type: :timeout,
  message: "Request timeout"
})

Performance Analysis

# Detailed performance breakdown
./scripts/test_layers.exs all --timing --stats --verbose

# Example output:
# Layer 1: ✅ 0.07s (target: 5s) - 99% under target
# Layer 2: ✅ 0.5s (target: 15s) - 97% under target
# Layer 3: ✅ 8.2s (target: 120s) - 93% under target

๐Ÿ› Debugging

Test Mode Detection Issues

# Check current test mode
iex -S mix
iex> DSPex.Testing.TestMode.current_test_mode()
:mock_adapter

# Verify environment
echo $TEST_MODE

Layer-Specific Debugging

# Layer 1 debugging
TEST_MODE=mock_adapter mix test --trace --max-cases=1

# Layer 2 debugging
TEST_MODE=bridge_mock mix test --trace

# Layer 3 debugging
TEST_MODE=full_integration mix test --trace --timeout=60000

Mock Adapter State Inspection

# In tests or IEx
{:ok, _} = DSPex.Adapters.Mock.start_link()

# Check statistics
DSPex.Adapters.Mock.get_stats()

# Inspect programs
DSPex.Adapters.Mock.get_programs()

# Reset state
DSPex.Adapters.Mock.reset()

🔧 CI/CD Integration

GitHub Actions Example

name: Test All Layers
on: [push, pull_request]

jobs:
  test-layer-1:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: erlef/setup-beam@v1
        with:
          otp-version: "26"       # example versions; match your project
          elixir-version: "1.16"
      - run: TEST_MODE=mock_adapter mix test --max-cases=50

  test-layer-2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: erlef/setup-beam@v1
        with:
          otp-version: "26"
          elixir-version: "1.16"
      - run: TEST_MODE=bridge_mock mix test --max-cases=10

  test-layer-3:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: erlef/setup-beam@v1
        with:
          otp-version: "26"
          elixir-version: "1.16"
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"  # example; match your project
      - run: pip install dspy-ai google-generativeai
      - run: TEST_MODE=full_integration mix test --max-cases=1

Pre-commit Hooks

#!/bin/sh
# .git/hooks/pre-commit

echo "Running Layer 1 tests..."
TEST_MODE=mock_adapter mix test || exit 1

echo "Running Layer 2 tests..."  
TEST_MODE=bridge_mock mix test || exit 1

echo "All tests passed!"

📚 API Reference

Test Mode Functions

# Get current test mode
DSPex.Testing.TestMode.current_test_mode()

# Set process-level override
DSPex.Testing.TestMode.set_test_mode(:bridge_mock)

# Get effective mode (with overrides)
DSPex.Testing.TestMode.effective_test_mode()

# Get layer configuration
DSPex.Testing.TestMode.get_test_config()

# Check capabilities
DSPex.Testing.TestMode.layer_supports_async?()
DSPex.Testing.TestMode.get_isolation_level()
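A sketch of using the process-level override for a single test module; setup is standard ExUnit, and since the override is per-process it disappears with the test process:

# Scope a mode override to one test module; each test process sets it
# in setup because the override is process-level.
setup do
  DSPex.Testing.TestMode.set_test_mode(:bridge_mock)
  :ok
end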

Mock Adapter API

# Start/stop
{:ok, pid} = DSPex.Adapters.Mock.start_link()

# Bridge-compatible API
{:ok, result} = DSPex.Adapters.Mock.ping()
{:ok, result} = DSPex.Adapters.Mock.create_program(config)
{:ok, result} = DSPex.Adapters.Mock.execute_program(id, inputs)

# Test utilities
DSPex.Adapters.Mock.reset()
DSPex.Adapters.Mock.get_stats()
DSPex.Adapters.Mock.set_scenario(name, config)
DSPex.Adapters.Mock.inject_error(config)

๐Ÿค Contributing

When adding new tests:

  1. Layer 1: Add fast unit tests for all business logic
  2. Layer 2: Add protocol tests for new bridge commands
  3. Layer 3: Add integration tests only for critical workflows

Follow the testing pyramid: many Layer 1 tests, some Layer 2 tests, few Layer 3 tests.

📈 Monitoring

Track test execution performance:

# Regular performance check
./scripts/test_layers.exs all --timing --stats

# Performance regression detection
# Layer 1 taking >5s = investigate
# Layer 2 taking >15s = investigate  
# Layer 3 taking >2m = investigate

🎉 Happy Testing! The 3-layer architecture provides fast feedback loops while maintaining comprehensive system confidence.