← Back to Docs

Test failure analysis pooling

Documentation for test_failure_analysis_pooling from the Dspex repository.

Integration Test Failure Analysis - Pooling Implementation

Summary

20 integration test failures identified after implementing NimblePool-based pooling system. Failures are concentrated in three main areas:

  1. PoolWorker tests (6 failures) - Port connection and session management issues
  2. SessionPool tests (8 failures) - Pool startup conflicts and naming issues
  3. Integration/Adapter tests (6 failures) - Python bridge availability issues

Detailed Failure Analysis

Category 1: PoolWorker Port Connection Failures (Tests 1-6)

Root Cause: Port.connect :badarg errors

Failed to connect port to process: :badarg

Affected Tests:

  1. handle_checkin/4 returns worker to ready state - session affinity lost on checkin
  2. handle_checkout/4 allows anonymous checkout - Port.connect badarg error
  3. handle_checkout/4 binds worker to session on first checkout - Port.connect badarg error
  4. handle_checkin/4 handles checkin with errors - health status becomes :degraded instead of :ready
  5. handle_checkout/4 maintains session affinity for same session - Port.connect badarg error
  6. init_worker/1 initializes worker with correct state structure - health_status is :healthy instead of :ready/:initializing

Analysis: The PoolWorker tests are using self() as a mock port, but trying to call Port.connect on it, which fails because self() returns a PID, not a Port. Additionally, there’s a mismatch in expected health status values between the test expectations and actual implementation.

Key Issues:

  1. Mock port implementation using self() is incompatible with Port.connect
  2. Health status enum values changed (:ready/:initializing vs :healthy)
  3. Session affinity logic may have changed in implementation

Category 2: SessionPool Naming/Startup Conflicts (Tests 7-14)

Root Cause: Pool name registration conflicts

{:pool_start_failed, {:already_started, #PID<0.837.0>}}

Affected Tests: 7. graceful shutdown shuts down pool gracefully 8. session management handles ending non-existent session 9. stale session cleanup cleans up stale sessions 10. health check functionality performs health check 11. session management prevents duplicate session tracking 12. pool status and metrics tracks session metrics 13. session management ends sessions successfully 14. session management tracks new sessions

Analysis: All SessionPool tests are failing with the same error - a pool with the same name is already started. This suggests that:

  1. The pool naming strategy in our implementation conflicts with test setup
  2. Tests are not properly isolated and are sharing a global pool name
  3. The NimblePool registration is happening at a different level than expected

Key Issue: The SessionPool implementation appears to be using a fixed/global name for the NimblePool, causing conflicts when multiple test cases try to start their own pools.

Category 3: Python Bridge Availability (Tests 15-20)

Root Cause: Python bridge not available

{:error, "Python bridge not available"}
{:error, "Python bridge not running - check supervision configuration"}

Affected Tests: 15. complete bridge system bridge system starts and reports healthy status - status is :not_running 16. layer_3 adapter behavior compliance lists programs correctly 17. layer_3 adapter behavior compliance handles complex signatures 18. Factory pattern compliance creates correct adapters for test layers 19. layer_3 adapter behavior compliance creates programs successfully 20. layer_3 adapter behavior compliance executes programs with valid inputs

Analysis: The Python bridge is not starting properly in the test environment. This could be due to:

  1. Changed supervision tree structure with new pooling implementation
  2. Missing configuration for pooled vs single-instance mode
  3. Adapter resolution now expecting pool adapter instead of direct port adapter

Root Cause Summary

1. Port Mock Implementation Issue

Tests are using PIDs as mock ports but the actual code expects real Port references for Port.connect operations.

2. Global Pool Name Conflict

SessionPool is using a fixed atom name (likely :DSPex.PythonBridge.SessionPool_pool) causing conflicts between test cases.

3. Supervision Tree Changes

The new pooling implementation has changed how the Python bridge is supervised and started, breaking existing integration tests that expect the old single-instance model.

4. Adapter Resolution Changes

The adapter factory may now be returning PythonPool adapter instead of PythonPort adapter for layer 3 tests, but the pooling infrastructure isn’t properly initialized in test environment.

Recommendations

  1. Fix Port Mocking: Create proper mock port implementation that doesn’t call Port.connect
  2. Dynamic Pool Names: Allow SessionPool to accept custom pool names for test isolation
  3. Test Environment Setup: Update test_helper.exs to properly initialize pooling infrastructure
  4. Adapter Configuration: Ensure layer 3 tests can work with both pooled and non-pooled modes
  5. Health Status Alignment: Standardize health status values across implementation and tests