← Back to Docs20250627

026 SLEEP FUNCTIONS REMOVAL REPORT

Documentation for 026_SLEEP_FUNCTIONS_REMOVAL_REPORT from the Foundation repository.

SLEEP FUNCTIONS REMOVAL REPORT

CRITICAL ISSUE IDENTIFIED

Found 16 Process.sleep() calls in production lib/ code that need immediate removal and proper concurrent implementation.

ROOT CAUSE ANALYSIS

The sleep functions were introduced as LAZY HACKS instead of proper:

  • Process synchronization using GenServer calls/casts
  • OTP supervision and process linking
  • Message passing with receive blocks
  • Timer-based scheduling with Process.send_after
  • Proper async/await patterns

SLEEP FUNCTIONS TO FIX (ONE AT A TIME)

1. /lib/foundation/service_registry.ex:484

Current Code:

Process.sleep(10)

Problem: Blocking wait in service lifecycle management Fix Strategy: Use GenServer synchronization with proper state management Test to Run: mix test test/foundation/service_registry_test.exs

2. /lib/foundation/utils.ex:582

Current Code:

Process.sleep(delay)

Problem: Blocking delay in retry mechanism Fix Strategy: Use Process.send_after with receive block for non-blocking delays Test to Run: mix test test/foundation/utils_test.exs

3. /lib/foundation/process_registry.ex:769

Current Code:

Process.sleep(50)

Problem: Race condition hack in process cleanup Fix Strategy: Use proper process monitoring with Process.monitor and receive Test to Run: mix test test/foundation/process_registry_test.exs

4. /lib/foundation/process_registry.ex:780

Current Code:

Process.sleep(200)

Problem: Arbitrary wait for process termination Fix Strategy: Use Process.exit with trap_exit and proper shutdown handling Test to Run: mix test test/foundation/process_registry_test.exs

5. /lib/foundation/mabeam/load_balancer.ex:292

Current Code:

:test_mode_agent -> spawn(fn -> Process.sleep(:infinity) end)

Problem: Infinite sleep for test agents Fix Strategy: Use GenServer with proper state management and test hooks Test to Run: mix test test/foundation/mabeam/load_balancer_test.exs

6. /lib/foundation/infrastructure/pool_workers/http_worker.ex:313

Current Code:

Process.sleep(delay_ms)

Problem: Blocking HTTP retry delays Fix Strategy: Use :timer.apply_after for non-blocking scheduled retries Test to Run: mix test test/foundation/infrastructure/pool_workers/http_worker_test.exs

7-11. /lib/foundation/infrastructure/pool_workers/http_worker.ex (Lines 327,340,359,379,384)

Current Code:

Process.sleep(Enum.random(10..100))

Problem: Random blocking delays in HTTP operations Fix Strategy: Use Process.send_after with random intervals and message handling Test to Run: mix test test/foundation/infrastructure/pool_workers/http_worker_test.exs

12. /lib/foundation/mabeam/coordination.ex:231

Current Code:

Process.sleep(50)

Problem: Race condition hack in coordination protocol Fix Strategy: Use proper GenServer synchronization and state machines Test to Run: mix test test/foundation/mabeam/coordination_test.exs

13. /lib/foundation/mabeam/coordination.ex:4096

Current Code:

Process.sleep(total_delay)

Problem: Blocking delay in coordination timing Fix Strategy: Use GenServer timer with handle_info/2 for scheduled operations Test to Run: mix test test/foundation/mabeam/coordination_test.exs

14. /lib/foundation/beam/processes.ex:194

Current Code:

Process.sleep(50)

Problem: Blocking wait in process lifecycle management Fix Strategy: Use Process.monitor with proper message handling Test to Run: mix test test/foundation/beam/processes_test.exs

15. /lib/foundation/beam/processes.ex:209

Current Code:

Process.sleep(10)

Problem: Race condition hack in process startup Fix Strategy: Use GenServer.start_link with proper initialization callbacks Test to Run: mix test test/foundation/beam/processes_test.exs

IMPLEMENTATION APPROACH (TEST-DRIVEN)

For Each Sleep Function:

  1. READ the specific file and understand the context
  2. IDENTIFY why the sleep was used (race condition, timing, retry, etc.)
  3. IMPLEMENT proper concurrent solution:
    • GenServer calls for synchronization
    • Process.monitor for process lifecycle
    • Process.send_after for scheduling
    • receive blocks for message handling
  4. TEST the specific module immediately after changes
  5. VERIFY no regressions in related functionality

Proper Concurrency Patterns to Use:

Instead of Process.sleep for retries:

# BAD
Process.sleep(delay)
retry_operation()

# GOOD  
Process.send_after(self(), :retry, delay)
receive do
  :retry -> retry_operation()
end

Instead of Process.sleep for synchronization:

# BAD
Process.sleep(50)  # hope process is ready

# GOOD
GenServer.call(process, :wait_ready, timeout)

Instead of Process.sleep for process cleanup:

# BAD
Process.exit(pid, :kill)
Process.sleep(100)  # hope it's dead

# GOOD
ref = Process.monitor(pid)
Process.exit(pid, :shutdown)
receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
after 5000 -> 
  Process.exit(pid, :kill)
end

EXPECTED OUTCOMES

  • Zero Process.sleep calls in production lib/ code
  • Proper OTP-compliant concurrent patterns
  • Faster test execution (no arbitrary waits)
  • More reliable process coordination
  • Better error handling and fault tolerance

NOTES

  • The /lib/foundation/utils.ex:399 sleep is in a doctest example and can remain
  • All test files will be addressed separately after lib/ fixes are complete
  • Each fix must be tested immediately to prevent regressions
  • No shortcuts, no placeholders - proper implementation required