Absolutely. Here is the third and final technical blueprint document for Layer 1. This document focuses on the dynamic interactions between processes, illustrating both the “happy path” and, critically, the fault-tolerant “unhappy path.”
These sequence diagrams are the most important for codifying the OTP-centric design of DSPEx. They make the abstract concepts of supervision and “let it crash” concrete and actionable for the development team.
DSPEx
Technical Blueprint - Document 3 of 3
Topic: Dynamic Interaction & Failure Handling
Objective: To detail the sequence of messages and function calls between concurrent processes during a program’s execution. This includes modeling a successful run and, more importantly, a run where a transient I/O error occurs, demonstrating the framework’s designed resilience.
Diagram 3.1: Successful `Predict.forward` Sequence Diagram
- Purpose: This diagram illustrates the “happy path” for a single, successful call to `DSPEx.Predict.forward/2`. It shows the precise, ordered flow of messages between the four key process actors.
- Type: Sequence Diagram.
sequenceDiagram
    participant User
    participant Prog as ProgramProcess<br/>(executing `Predict.forward`)
    participant LM as LM_Client<br/>(GenServer)
    participant Task as I/O Task

    User->>+Prog: `DSPEx.Program.forward(predict_module, inputs)`
    Note over Prog: A new process context for this request.
    Prog->>LM: `GenServer.cast(client_pid, {:request, self(), ...})`
    activate LM
    Note right of LM: Receives request asynchronously.<br/>Immediately free to handle other requests.
    LM->>Task: `Task.Supervisor.async_nolink(supervisor, ...)`
    deactivate LM
    activate Task
    Note over Task: Task runs concurrently.<br/>It is NOT linked to the GenServer.
    Task->>Task: Makes HTTP request via Tesla.
    Note over Task: Blocks only within this<br/>lightweight process.
    Task-->>Task: Receives successful HTTP response.
    Task->>Task: Parses JSON into `completion_map`.
    Task->>Prog: `send(prog_pid, {:ok, completion_map})`
    deactivate Task
    Prog->>Prog: `receive do ... end` block gets the message.
    Note over Prog: Unblocks upon receiving the response.
    Prog->>Prog: Formats `completion_map` into `%DSPEx.Prediction{}`.
    Prog-->>-User: Returns `%DSPEx.Prediction{}`
Key Architectural Details:
- Asynchronous Cast: The `ProgramProcess` uses `GenServer.cast` to talk to the `LM` client. This is a non-blocking, “fire-and-forget” message. This is crucial because it means the `LM` client `GenServer`, a shared resource, is never blocked waiting for a single request to complete.
- Decoupled Task Spawning: The `LM` client spawns the `I/O Task` but crucially does not link to it (`async_nolink`). This ensures that if the `I/O Task` crashes, it will not crash the shared `LM` client `GenServer`.
- Direct Response to Caller: The `I/O Task` is given the PID of the original `ProgramProcess` and sends its response directly there. The data bypasses the `LM` client on the return path, making the flow more efficient and further decoupling the client from the execution of any single request.
- Blocking via `receive`: The `ProgramProcess` effectively “waits” for the result by using a `receive` block. This is the standard, efficient way for an Elixir process to wait for a message without consuming CPU cycles.
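The four points above can be sketched in a few dozen lines of Elixir. This is a minimal illustration under stated assumptions, not the DSPEx implementation: the `Sketch.*` module names, the injected `request_fun` (standing in for the Tesla HTTP call), the `ref` used to tag the reply, and the plain map returned in place of `%DSPEx.Prediction{}` are all hypothetical. One detail worth noting: `Task.Supervisor.async_nolink/2` also delivers the task's return value and a `:DOWN` message to the process that started the task, so the client in this sketch drains both.

```elixir
defmodule Sketch.LMClient do
  use GenServer

  # opts: task_sup (name of a running Task.Supervisor) and request_fun
  # (hypothetical stand-in for the real HTTP call).
  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  # Fire-and-forget: returns :ok without waiting for the client.
  def request(client, caller, ref, inputs) do
    GenServer.cast(client, {:request, caller, ref, inputs})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:request, caller, ref, inputs}, state) do
    # The task is monitored but NOT linked, so a crash cannot take the client down.
    Task.Supervisor.async_nolink(state.task_sup, fn ->
      # Reply goes straight to the caller, bypassing the client.
      send(caller, {ref, {:ok, state.request_fun.(inputs)}})
    end)

    {:noreply, state}
  end

  # async_nolink also sends the task's return value to its owner; drop it
  # and flush the matching :DOWN message.
  @impl true
  def handle_info({task_ref, _result}, state) when is_reference(task_ref) do
    Process.demonitor(task_ref, [:flush])
    {:noreply, state}
  end

  # A crashed task shows up here as a plain message; the client just moves on.
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state), do: {:noreply, state}
end

defmodule Sketch.Program do
  # Blocks the calling process in `receive` until the task replies or the timeout fires.
  def forward(client, inputs, timeout \\ 5_000) do
    ref = make_ref()
    :ok = Sketch.LMClient.request(client, self(), ref, inputs)

    receive do
      {^ref, {:ok, completion_map}} ->
        # Placeholder for building %DSPEx.Prediction{} (its fields are not shown in this document).
        {:ok, %{answer: completion_map["text"]}}
    after
      timeout -> {:error, :timeout}
    end
  end
end

# Usage with a fake request function instead of a network call.
{:ok, _sup} = Task.Supervisor.start_link(name: Sketch.TaskSup)

{:ok, client} =
  Sketch.LMClient.start_link(
    task_sup: Sketch.TaskSup,
    request_fun: fn inputs -> %{"text" => "echo: " <> inputs.question} end
  )

IO.inspect(Sketch.Program.forward(client, %{question: "hi"}))
```

Tagging the reply with a fresh reference is an addition to the diagram's bare `{:ok, completion_map}` message; it lets the caller ignore stale replies from earlier, timed-out requests.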
Diagram 3.2: `Predict.forward` with I/O Failure & Supervised Retry
- Purpose: This diagram models the critical fault-tolerance path. It shows what happens when the network call fails and how the system is designed to handle it gracefully without crashing the entire application.
- Type: Sequence Diagram.
sequenceDiagram
    participant User
    participant Prog as ProgramProcess<br/>(executing `Predict.forward`)
    participant LM as LM_Client<br/>(GenServer)
    participant Task as I/O Task

    User->>+Prog: `DSPEx.Program.forward(predict_module, inputs)`
    Prog->>LM: `GenServer.cast(client_pid, {:request, self(), ...})`
    activate LM
    LM->>Prog: `Task.Supervisor.async_nolink(supervisor_for_prog, ...)`
    Note over Prog: The `ProgramProcess` itself<br/>can start a supervisor to<br/>watch its own tasks.
    deactivate LM
    activate Prog
    Prog->>Task: spawns & supervises
    deactivate Prog
    activate Task
    Task->>Task: Makes HTTP request via Tesla.
    Note over Task: Network fails! `:econnrefused`.
    Task-->>Task: **CRASHES**
    deactivate Task
    BEAM->>Prog: Sends `{:DOWN, ref, :process, pid, :econnrefused}` message.
    Note over Prog: The BEAM sends this message automatically<br/>because the task is monitored<br/>(`async_nolink` monitors without linking).
    activate Prog
    Prog->>Prog: `receive` clause or `handle_info({:DOWN, ...})` handles the failure.
    Prog->>Prog: Applies retry logic (e.g., waits 1 sec).
    loop Retry Attempt #1
        Prog->>LM: `GenServer.cast(client_pid, ...)`
        Note over Prog: Retries the entire request sequence.
        LM->>Task: `Task.Supervisor.async_nolink(...)`
        activate Task
        Task->>Task: Makes HTTP request.
        Task-->>Task: Success!
        Task->>Prog: `send(prog_pid, {:ok, completion_map})`
        deactivate Task
    end
    Prog->>Prog: `receive` gets the successful response.
    Prog->>Prog: Formats `%DSPEx.Prediction{}`.
    Prog-->>-User: Returns `%DSPEx.Prediction{}`
    deactivate Prog
Correction & Refinement: The initial design had the `LM` client spawn the task. A more robust OTP pattern is for the caller (`ProgramProcess`) to be the one that owns and supervises the task. The `LM` client’s role is simply to provide the function to be executed. This diagram reflects that corrected, more robust pattern.
Key Architectural Details:
- Caller as Supervisor: The `ProgramProcess` is responsible for supervising the tasks it needs to complete its work. It asks the `LM` client for the function to run, but runs it within a task that it itself watches. This aligns responsibility with interest: the `ProgramProcess` is the one that cares if the API call succeeds.
- “Let it Crash” in Action: The `I/O Task` is the crash boundary. When the network fails, the Elixir/BEAM philosophy is not to defensively wrap the call in `try/catch`. Instead, we let the external error crash the process designed to contain it.
- Failure as a Message: Because it monitors the task rather than linking to it, the supervising `ProgramProcess` receives the crash information as a standard Elixir message (`{:DOWN, ...}`). This turns an exceptional event into a normal data-handling event within the supervisor’s logic. A plain link would instead propagate the exit signal and take the `ProgramProcess` down unless it traps exits.
- Contained Failure: The crash is completely contained. The `User Process` is still waiting, the shared `LM` `GenServer` is unaffected and serving other requests, and only the `ProgramProcess` has to deal with the failure. This is the cornerstone of building resilient systems in OTP.
- Decoupled Retry Logic: The logic for how to retry (e.g., how many times, with what backoff) lives entirely within the `Program` module, not in the low-level `LM` client. This allows different programs to have different resilience strategies if needed.
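The caller-owned retry loop described above might look roughly like the following. It is a sketch under assumptions, not the DSPEx code: `Sketch.Retry`, its option names (`max_attempts`, `backoff_ms`, `timeout_ms`) and the fixed backoff are hypothetical, and `request_fun` stands in for the function the `LM` client would hand back. The task is started with `Task.Supervisor.async_nolink/2` so that a crash arrives as a `{:DOWN, ...}` message instead of killing the caller.

```elixir
defmodule Sketch.Retry do
  # Runs `request_fun` in a monitored, unlinked task and retries when the task crashes.
  def run(task_sup, request_fun, opts \\ []) do
    cfg = {
      Keyword.get(opts, :max_attempts, 3),
      Keyword.get(opts, :backoff_ms, 1_000),
      Keyword.get(opts, :timeout_ms, 5_000)
    }

    attempt(task_sup, request_fun, 1, cfg)
  end

  defp attempt(task_sup, request_fun, n, {max_attempts, backoff_ms, timeout_ms} = cfg) do
    task = Task.Supervisor.async_nolink(task_sup, request_fun)
    ref = task.ref

    receive do
      # Success: the task's return value arrives tagged with the monitor reference.
      {^ref, result} ->
        Process.demonitor(ref, [:flush])
        {:ok, result}

      # The task crashed and attempts remain: wait, then run the whole request again.
      {:DOWN, ^ref, :process, _pid, _reason} when n < max_attempts ->
        Process.sleep(backoff_ms)
        attempt(task_sup, request_fun, n + 1, cfg)

      # The task crashed on the last attempt: report the exit reason as data.
      {:DOWN, ^ref, :process, _pid, reason} ->
        {:error, reason}
    after
      timeout_ms ->
        # Stop the stuck task so it does not linger, then give up.
        Task.shutdown(task, :brutal_kill)
        {:error, :timeout}
    end
  end
end

# Usage: a request function that fails once with :econnrefused, then succeeds.
{:ok, _sup} = Task.Supervisor.start_link(name: Sketch.RetrySup)
{:ok, calls} = Agent.start_link(fn -> 0 end)

flaky = fn ->
  call_no = Agent.get_and_update(calls, fn n -> {n + 1, n + 1} end)
  if call_no == 1, do: exit(:econnrefused), else: %{"text" => "ok"}
end

IO.inspect(Sketch.Retry.run(Sketch.RetrySup, flaky, backoff_ms: 100))
```

Keeping the attempt count and backoff in the caller's options, rather than in the client, is what lets each program choose its own resilience strategy.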
With these three blueprint documents, the engineering team has a complete and detailed specification for building DSPEx v0.1. The diagrams cover the static architecture, data formats, and dynamic process interactions, ensuring a shared and accurate understanding of the system’s internal mechanics before implementation begins.