

OTP Async Testing Approaches - Performance & Correctness Analysis

Date: 2025-07-01
Context: Foundation SupervisedSend & DeadLetterQueue infrastructure testing
Problem: Testing asynchronous GenServer operations without Process.sleep() anti-patterns

The Community Problem

Everyone recommends Task.async/await (approach #4 below) because OTP async patterns are genuinely hard. But applied to hot-path infrastructure, where microseconds matter, this creates a performance disaster.

Approach Comparison

1. Test Synchronization Messages ⭐ BEST FOR HOT PATH

# In GenServer
def handle_cast({:retry_messages, filter}, state) do
  # ... do work ...
  if test_mode?(), do: send(test_pid(), :retry_completed)
  {:noreply, state}
end

# In test
GenServer.cast(DeadLetterQueue, {:retry_messages, :all})
assert_receive :retry_completed, 1000
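
The test_mode?/0 and test_pid/0 helpers above are left undefined in the snippet. A minimal sketch, assuming the test suite registers itself through the application environment (the :test_mode and :test_pid keys are illustrative, not an existing Foundation API):

# Hypothetical helpers in the GenServer module
defp test_mode?, do: Application.get_env(:foundation, :test_mode, false)
defp test_pid, do: Application.get_env(:foundation, :test_pid)

# In the test's setup block
setup do
  Application.put_env(:foundation, :test_mode, true)
  Application.put_env(:foundation, :test_pid, self())

  on_exit(fn ->
    Application.delete_env(:foundation, :test_mode)
    Application.delete_env(:foundation, :test_pid)
  end)

  :ok
end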

Performance: ⭐⭐⭐⭐⭐ EXCELLENT - Zero overhead in production
Correctness: ⭐⭐⭐⭐⭐ PERFECT - Actual completion notification
Complexity: ⭐⭐⭐ MODERATE - Requires test-aware code
Maintenance: ⭐⭐⭐⭐ GOOD - Clean separation of concerns

Pros:

  • Zero production overhead
  • Deterministic completion detection
  • No timing races
  • Scales to any load

Cons:

  • Requires test-mode detection in production code
  • Slight code complexity increase
  • Need to remember to add test notifications

Best For: Hot-path infrastructure, message queues, routing systems


2. Synchronous Test-Only Operations ⭐ BEST FOR SIMPLICITY

# Add test-only synchronous version
def retry_messages_sync(filter) do
  GenServer.call(__MODULE__, {:retry_messages_sync, filter})
end

def handle_call({:retry_messages_sync, filter}, _from, state) do
  result = do_retry_messages(filter, state)
  {:reply, result, state}
end

# In test
result = DeadLetterQueue.retry_messages_sync(:all)
assert length(result.successful) > 0

Performance: ⭐⭐⭐⭐⭐ EXCELLENT - Zero production overhead
Correctness: ⭐⭐⭐⭐⭐ PERFECT - Synchronous completion guarantee
Complexity: ⭐⭐⭐⭐⭐ SIMPLE - Just add sync version
Maintenance: ⭐⭐⭐⭐⭐ EXCELLENT - Clean API separation

Pros:

  • Zero production overhead
  • Completely deterministic
  • Clean test/prod API separation
  • Simple to implement and understand

Cons:

  • Requires maintaining parallel sync APIs
  • Doesn’t test actual async behavior
  • More code to maintain

Best For: Infrastructure with clear sync/async boundaries
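
One way to keep the parallel APIs from drifting is to have the production async path delegate to the same private function the sync version calls. A minimal sketch, reusing the do_retry_messages/2 helper assumed above:

# Async production path delegates to the shared implementation
def handle_cast({:retry_messages, filter}, state) do
  _result = do_retry_messages(filter, state)
  {:noreply, state}
end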


3. State Polling with :sys.get_state/1 ⭐ MOST OTP-NATIVE

# In test
GenServer.cast(DeadLetterQueue, {:retry_messages, :all})

# Poll until the queue drains; bound the loop so a stuck server fails the test
# instead of hanging it
Stream.repeatedly(fn ->
  _state = :sys.get_state(DeadLetterQueue)  # handled after the cast, so state reflects it
  length(DeadLetterQueue.list_messages())
end)
|> Stream.take(100)
|> Enum.find(fn message_count -> message_count == 0 end)
|> then(fn count -> assert count == 0 end)

Performance: ⭐⭐⭐ ACCEPTABLE - Polling overhead only in tests
Correctness: ⭐⭐⭐⭐ VERY GOOD - Eventually consistent
Complexity: ⭐⭐ COMPLEX - Requires understanding polling patterns
Maintenance: ⭐⭐⭐ MODERATE - Can be flaky with timing

Pros:

  • No production code changes
  • Tests actual async behavior
  • Uses standard OTP tools
  • No test mode coupling

Cons:

  • Polling creates CPU overhead
  • Can be flaky on slow systems
  • More complex test code
  • Potential for infinite loops

Best For: Complex state machines, when you can’t modify production code
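
Many suites wrap this pattern in a small polling helper so the iteration bound and sleep interval live in one place. A minimal sketch (wait_until/3 is illustrative, not part of Foundation):

# Hypothetical test helper: poll a condition until it holds or a deadline passes
defp wait_until(fun, timeout_ms \\ 1_000, interval_ms \\ 10) do
  deadline = System.monotonic_time(:millisecond) + timeout_ms
  poll_until(fun, deadline, interval_ms)
end

defp poll_until(fun, deadline, interval_ms) do
  cond do
    fun.() ->
      :ok

    System.monotonic_time(:millisecond) >= deadline ->
      {:error, :timeout}

    true ->
      Process.sleep(interval_ms)
      poll_until(fun, deadline, interval_ms)
  end
end

# Usage in a test
GenServer.cast(DeadLetterQueue, {:retry_messages, :all})
assert :ok == wait_until(fn -> DeadLetterQueue.list_messages() == [] end)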


4. Task.async/await 💀 COMMUNITY FAVORITE (BUT WRONG FOR HOT PATH)

# In test - What everyone recommends
task = Task.async(fn ->
  GenServer.cast(DeadLetterQueue, {:retry_messages, :all})
  Process.sleep(50)  # Still need sleep!
  DeadLetterQueue.list_messages()
end)
messages = Task.await(task, 5000)

Performance: 💀 TERRIBLE - Spawns processes for every test operation
Correctness: ⭐⭐ POOR - Still relies on sleep timing
Complexity: ⭐⭐⭐⭐ SIMPLE - Easy to understand
Maintenance: ⭐⭐ POOR - Hides the real timing problem

Pros:

  • Easy to understand
  • Community-recommended pattern
  • Timeout handling built-in
  • No production code changes

Cons:

  • SPAWNS A TASK FOR EVERY TEST OPERATION 💀
  • Still requires Process.sleep internally
  • Massive overhead for infrastructure testing
  • False sense of solving the async problem
  • Resource waste (processes, schedulers)

Why Everyone Recommends It: OTP async is hard, and Task.async is easy to understand.

Why It’s Wrong For Infrastructure: Testing hot-path code with Task.async turns the test suite into a performance disaster. If your infrastructure handles 10,000 messages/second, your tests spawn 10,000 tasks.


Performance Impact Analysis

Hot Path Infrastructure Characteristics

  • Message Rate: 1,000-100,000 operations/second
  • Latency Requirements: Sub-millisecond
  • Test Frequency: Every change, CI/CD pipelines
  • Memory Constraints: Must not leak under load

Task.async Impact on Infrastructure Tests

# Bad: Community recommendation for infrastructure
Enum.each(1..10_000, fn _ ->
  task = Task.async(fn ->
    SupervisedSend.send_supervised(pid, message)
    Process.sleep(1)  # Still needed!
  end)
  Task.await(task)
end)
# Result: 10,000 spawned tasks, massive scheduler overhead

Proper Approach for Infrastructure

# Good: Test synchronization (same pattern as approach #1)
setup do
  Application.put_env(:foundation, :test_mode, true)
  Application.put_env(:foundation, :test_pid, self())
  :ok
end

# In SupervisedSend
defp maybe_notify_test_completion(result) do
  if Application.get_env(:foundation, :test_mode) do
    send(test_pid(), {:send_completed, result})
  end
end

# In test
Enum.each(1..10_000, fn _ ->
  SupervisedSend.send_supervised(pid, message)
  assert_receive {:send_completed, _result}  # or record completions in ETS and poll
end)
# Result: Zero additional processes, deterministic completion

Real-World Examples

❌ Wrong: Phoenix LiveView Testing (Common Mistake)

# What many do - spawns tasks for every DOM update
test "real-time updates" do
  Enum.each(1..1000, fn i ->
    task = Task.async(fn ->
      send(live_view, {:update, i})
      Process.sleep(10)
      render(live_view) =~ "Item #{i}"
    end)
    assert Task.await(task)
  end)
end
# Performance: Disaster

✅ Right: Phoenix LiveView Testing

# Proper approach - test what actually matters
test "real-time updates" do
  Enum.each(1..1000, fn i ->
    send(live_view, {:update, i})
    assert_receive {:live_view_updated, ^i}, 100
  end)
end
# Performance: Excellent, tests actual async behavior
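
The {:live_view_updated, i} confirmation is not something LiveView sends on its own; it assumes the LiveView notifies the test process after applying each update. A minimal sketch of that hook (update_items/2 and the :test_notify_pid key are illustrative):

# In the LiveView under test
def handle_info({:update, i}, socket) do
  socket = update_items(socket, i)  # hypothetical domain update

  case Application.get_env(:my_app, :test_notify_pid) do
    nil -> :ok
    pid -> send(pid, {:live_view_updated, i})
  end

  {:noreply, socket}
end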

Recommendations by Use Case

Hot-Path Infrastructure (SupervisedSend, Routers, Queues)

Use #1 (Test Sync Messages) - Zero production overhead, perfect correctness

Business Logic (User Operations, API Endpoints)

Use #2 (Sync Test APIs) - Clean separation, simple to understand

Complex State Machines (GenStateMachine, Multi-step Workflows)

Use #3 (State Polling) - When you can’t modify production code

Simple One-off Operations (File Processing, Email Sending)

Use #4 (Task.async) - Only when performance doesn’t matter

The Community Problem Explained

Why everyone says “use Task.async”:

  1. OTP async patterns ARE genuinely difficult
  2. Task.async is easy to understand
  3. It “works” for simple cases
  4. People don’t measure the performance impact

Why this hurts infrastructure:

  1. Infrastructure tests become slower than the actual code
  2. CI/CD pipelines waste resources on test overhead
  3. False confidence in async handling
  4. Hides real timing issues with Process.sleep

Conclusion

For Foundation SupervisedSend/DeadLetterQueue: Use approach #1 (test sync messages) because:

  • This is hot-path infrastructure
  • Performance is critical
  • We control the production code
  • Zero overhead is achievable

The Rule: If your code processes >1000 operations/second, Task.async in tests is an anti-pattern.


Author: Production OTP Infrastructure Analysis
Review: Critical path performance requirements
Next: Fix DeadLetterQueue test to use proper sync messages