LIVE TEST DIVERGENCE

Documentation for LIVE_TEST_DIVERGENCE from the DSPEx repository.

Live API Divergence Strategy

Overview

DSPEx uses a three-mode test architecture that manages the gradual divergence from seamless mock fallback to dedicated live API testing. This document outlines the strategy for deciding when tests should require live APIs while maintaining development velocity.
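
The active mode is chosen once per run, either by the Mix task used (test.mock, test.fallback, test.live) or by the DSPEX_TEST_MODE environment variable described below. A minimal sketch of how DSPEx.TestModeConfig.get_test_mode/0, referenced throughout this document, might resolve that variable; the repository's actual implementation may differ:

defmodule DSPEx.TestModeConfig do
  @moduledoc "Sketch of test-mode resolution; the real module may differ."

  @modes %{"mock" => :mock, "fallback" => :fallback, "live" => :live}

  # Reads DSPEX_TEST_MODE, defaulting to pure mock mode.
  def get_test_mode do
    "DSPEX_TEST_MODE"
    |> System.get_env("mock")
    |> String.downcase()
    |> then(&Map.get(@modes, &1, :mock))
  end

  # Predicate used by the live-only helpers later in this document.
  def live_only_mode?, do: get_test_mode() == :live
end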

Test Mode Architecture

1. Pure Mock Mode (🟦 Default)

mix test              # Uses pure mock mode
mix test.mock         # Explicit pure mock mode
DSPEX_TEST_MODE=mock mix test

Characteristics:

  • Zero network requests
  • Deterministic responses
  • Fast execution (< 1s for full suite)
  • Perfect for development and CI/CD

Use Cases:

  • Unit tests for all components
  • Basic integration tests
  • Development workflow
  • CI/CD pipeline validation
  • Performance testing (consistent conditions)

2. Fallback Mode (🟡 Hybrid)

mix test.fallback     # Live API when available, mock fallback
DSPEX_TEST_MODE=fallback mix test

Characteristics:

  • Attempts live API when keys available
  • Seamless fallback to mock when not
  • Variable execution time
  • Validates both paths work

Use Cases:

  • Development with optional API keys
  • Integration validation with real APIs
  • Gradual migration testing
  • Local development with intermittent connectivity

3. Live API Mode (🟢 Strict)

mix test.live         # Requires API keys, fails without
DSPEX_TEST_MODE=live mix test

Characteristics:

  • Requires valid API keys
  • No fallback behavior
  • Real API validation
  • Slower, potentially flaky

Use Cases:

  • Pre-production validation
  • Real API behavior testing
  • Error condition validation
  • Performance benchmarking against live APIs

When Tests Should Require Live APIs

🟦 Keep in Mock/Fallback Mode (Most Tests)

Most tests should remain in mock or fallback mode:

# Unit tests - always mock
test "program validates input fields correctly" do
  # Mock client ensures consistent behavior
end

# Basic integration - fallback works great
test "end-to-end prediction flow" do
  # setup_adaptive_client/1 handles mode switching
  {client_type, client} = MockHelpers.setup_adaptive_client(:gemini)
  # Test works in any mode
end

# Performance tests - always mock for consistency
test "high-frequency requests are handled efficiently" do
  # Force mock client for consistent timing
  {:ok, client} = DSPEx.MockClientManager.start_link(:gemini, %{
    simulate_delays: false
  })
end

🔄 Gradual Migration Candidates

Tests that may eventually need live API validation:

# Mark tests that will eventually need live APIs
@tag :future_live_required
test "complex multi-turn conversation handling" do
  # Currently works with fallback
  # May need live APIs for realistic conversation flow validation
end

@tag :live_preferred  
test "multi-provider optimization" do
  # Works in fallback but benefits from real API behavior
end

🟢 Require Live APIs (Selective)

Only implement live-only requirements for tests that cannot be validated with mocks:

# 1. Real API-specific behavior that can't be mocked
test "handles gemini rate limiting gracefully" do
  case DSPEx.TestModeConfig.get_test_mode() do
    :live ->
      # Test real rate limiting behavior
      run_rate_limit_test()
    _ ->
      skip("Requires live API mode: mix test.live")
  end
end

# 2. Model-specific response validation
test "gpt-4 vs gpt-3.5 response quality differences" do
  # Requires real model comparison
  require_live_mode!("Model comparison requires real API responses")
end

# 3. Cost and billing validation
test "token usage tracking accuracy" do
  # Real token counting validation
  require_live_mode!("Token counting requires real API usage")
end

# 4. Network and infrastructure testing
test "handles network timeouts and recovers" do
  # Real network conditions
  require_live_mode!("Network testing requires real connections")
end

Implementation Strategy

Phase 1: Current State - Three-Mode Architecture ✅

The current implementation provides:

# Adaptive test helper that respects test modes
{client_type, client} = MockHelpers.setup_adaptive_client(:gemini)

# Mode-aware client setup
case DSPEx.TestModeConfig.get_test_mode() do
  :mock -> setup_isolated_mock_client(provider)
  :fallback -> setup_adaptive_client(provider)
  :live -> setup_live_client(provider)
end
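
setup_adaptive_client/1 does the heavy lifting in fallback mode. A minimal sketch of how such a helper might respect the active mode and fall back to a mock client; the {:real, pid} / {:mock, pid} return shapes and the provider-to-API-key mapping (GEMINI_API_KEY) are assumptions, and the actual MockHelpers implementation may differ:

defmodule MockHelpers do
  # Sketch: respects the active test mode, and in :fallback mode uses a
  # mock client when the provider's API key is missing.
  def setup_adaptive_client(provider) do
    case DSPEx.TestModeConfig.get_test_mode() do
      :mock ->
        start_mock(provider)

      :live ->
        start_live(provider)

      :fallback ->
        if api_key_available?(provider), do: start_live(provider), else: start_mock(provider)
    end
  end

  defp start_live(provider) do
    {:ok, client} = GenServer.start_link(DSPEx.ClientManager, {provider, %{}})
    {:real, client}
  end

  defp start_mock(provider) do
    {:ok, client} = DSPEx.MockClientManager.start_link(provider, %{simulate_delays: false})
    {:mock, client}
  end

  # Hypothetical mapping from provider to the env var holding its key.
  defp api_key_available?(:gemini), do: System.get_env("GEMINI_API_KEY") not in [nil, ""]
  defp api_key_available?(_other), do: false
end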

Phase 2: Conditional Live Requirements (In Progress)

Implement smart test skipping and mode validation:

defmodule MyIntegrationTest do
  use ExUnit.Case
  import DSPEx.LiveTestHelpers
  
  @tag :live_required
  test "real API rate limiting" do
    require_live_mode!("Rate limiting testing requires real API calls")
    
    # Test implementation
    run_live_rate_limit_test()
  end
  
  @tag :live_preferred
  test "multi-provider optimization" do
    case DSPEx.TestModeConfig.get_test_mode() do
      :live ->
        run_with_live_apis()
      _ ->
        IO.puts("Running with mock/fallback - results may vary from live behavior")
        run_with_adaptive_clients()
    end
  end
end
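
Where skipping from inside the test body is undesirable, the same tags can be excluded up front. A minimal sketch for test/test_helper.exs, assuming the :live_required tag used above:

# test/test_helper.exs (sketch)
case DSPEx.TestModeConfig.get_test_mode() do
  :live ->
    # Live mode runs everything, including live-only tests.
    ExUnit.start()

  _ ->
    # Mock and fallback runs exclude tests that demand real API calls.
    ExUnit.start(exclude: [:live_required])
end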

Phase 3: Dedicated Live Test Suites (Future)

Create separate test directories for different requirements:

test/
├── unit/              # Pure mock only
├── integration/       # Adaptive (mock/fallback/live)
├── live_integration/  # Live API only
│   ├── rate_limiting_test.exs
│   ├── cost_validation_test.exs
│   ├── model_comparison_test.exs
│   └── network_resilience_test.exs
└── performance/       # Mock only (consistent conditions)
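
One way to wire the live-only directory into the existing Mix tasks is an alias that forces the mode and narrows the test paths. A minimal sketch for mix.exs; the repository's actual test.live task may be implemented differently, and the alias would also need to be mapped to the :test environment (e.g. via preferred_cli_env):

# mix.exs (sketch)
defp aliases do
  [
    "test.live": &run_live_suite/1
  ]
end

defp run_live_suite(args) do
  # Force live mode for this VM, then run only the live-only suite.
  System.put_env("DSPEX_TEST_MODE", "live")
  Mix.Task.run("test", args ++ ["test/live_integration"])
end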

Technical Implementation Patterns

1. Test Mode Utilities

defmodule DSPEx.LiveTestHelpers do
  def require_live_mode!(message \\ "Test requires live API mode") do
    unless DSPEx.TestModeConfig.live_only_mode?() do
      skip(message <> " - Use: mix test.live")
    end
  end
  
  def skip_unless_live_mode(message \\ "Skipping in mock/fallback mode") do
    unless DSPEx.TestModeConfig.live_only_mode?() do
      skip(message)
    end
  end
  
  def adaptive_test_setup(provider) do
    # Use the established pattern
    MockHelpers.setup_adaptive_client(provider)
  end
end

2. Mode-Aware Test Configuration

# config/test.exs
config :dspex, :test_modes,
  default_mode: :mock,
  live_test_timeout: 30_000,
  mock_test_timeout: 5_000,
  enable_mode_logging: true

# Conditional test configuration
case DSPEx.TestModeConfig.get_test_mode() do
  :live ->
    ExUnit.configure(timeout: 30_000, max_failures: 5)
  :fallback ->
    ExUnit.configure(timeout: 15_000)
  :mock ->
    ExUnit.configure(timeout: 5_000, max_failures: :infinity)
end
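
Since the conditional ExUnit.configure/1 calls depend on application code, a natural home for them is test/test_helper.exs (alongside ExUnit.start/1) rather than config/test.exs, so the timeouts and failure limits are applied once before the suite runs.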

3. Performance Test Isolation

# Performance tests should always use controlled conditions
describe "performance characteristics" do
  test "request processing time is reasonable" do
    # Force mock for consistent performance testing
    {:ok, client} = DSPEx.MockClientManager.start_link(:gemini, %{
      simulate_delays: false,
      responses: :contextual
    })
    
    # Measure only local processing time
    messages = [%{role: "user", content: "test question"}]  # minimal payload; exact shape may differ
    start_time = System.monotonic_time(:millisecond)
    _result = DSPEx.MockClientManager.request(client, messages)
    end_time = System.monotonic_time(:millisecond)
    
    # Assert consistent performance
    assert (end_time - start_time) < 200
  end
end

4. Supervision and Fault Tolerance Testing

# Handle process lifecycle properly in supervision tests
describe "supervision and fault tolerance" do
  test "client crash doesn't affect other components" do
    # Start unlinked to prevent exit signal propagation
    {:ok, client} = GenServer.start(DSPEx.ClientManager, {:gemini, %{}})
    program = Predict.new(TestSignature, client)
    
    # Kill the client and handle the resulting exit gracefully
    ref = Process.monitor(client)
    Process.exit(client, :kill)
    
    receive do
      {:DOWN, ^ref, :process, ^client, :killed} -> :ok
    after
      1000 -> flunk("Client process did not die as expected")
    end
    
    # Use try/catch to handle GenServer call to dead process
    result = 
      try do
        Program.forward(program, %{question: "Post-crash test"})
      catch
        :exit, _reason -> {:error, :client_dead}
      end
    
    assert match?({:error, _}, result)
    assert Process.alive?(self())
  end
end
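
Starting the client with GenServer.start/2 rather than start_link/2 is deliberate: an unlinked process keeps the test out of the crash's link set, so the :kill exit cannot propagate to the test process, and the monitor gives a deterministic signal that the client is really gone before the post-crash call is attempted.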

Migration Checklist

✅ Completed (Phase 1)

  • Three-mode test architecture
  • Adaptive client setup with setup_adaptive_client/1
  • Mode-aware logging and indicators
  • Mix task integration (test.mock, test.fallback, test.live)
  • Environment contamination prevention
  • Performance test isolation (400x faster)
  • Supervision test reliability

🔄 In Progress (Phase 2)

  • Test mode validation helpers
  • Mode-specific test skipping
  • Performance test isolation from network conditions
  • Supervision and fault tolerance patterns
  • Live test identification and tagging
  • Comprehensive live test helper module
  • Documentation for when to require live APIs

📋 Future (Phase 3)

  • Dedicated live test suites
  • CI/CD integration for selective live testing
  • Cost monitoring for live test runs
  • Live test result caching and comparison
  • Production environment testing patterns

Performance Results

Before Optimization

  • Unit performance test: 1674ms (network dependent)
  • Integration performance test: 13090ms (extremely slow)
  • Supervision tests: Failed with process management issues

After Optimization

  • Unit performance test: 4.3ms (400x faster)
  • Integration performance test: 32ms (400x faster)
  • Supervision tests: Reliable with proper error handling
  • Full test suite: < 7 seconds in mock mode

Best Practices Summary

DO

  • Use MockHelpers.setup_adaptive_client/1 for most integration tests
  • Force mock clients for performance and timing tests
  • Implement mode-aware test skipping for live-only requirements
  • Use clear test categorization with tags
  • Validate environment cleanliness to prevent contamination
  • Handle process lifecycle properly in supervision tests

DON’T

  • Don’t require live APIs unless absolutely necessary
  • Don’t mix live and mock clients in the same test
  • Don’t depend on network conditions for performance tests
  • Don’t create live-only tests without fallback documentation
  • Don’t ignore test mode configuration in client setup
  • Don’t use linked GenServers in crash testing without proper handling

🎯 GOAL

Maintain development velocity while providing confidence in live API integration through strategic, selective live API testing.

Conclusion

The three-mode test architecture successfully balances development velocity with integration confidence. Performance tests now run 400x faster while maintaining reliability. The system gracefully handles the spectrum from pure mock development to strict live API validation, enabling teams to choose the appropriate level of integration testing for their context.