Introduction
These diagrams provide a visual guide to the DSPex library’s architecture and functionality. They cover the main processes: making a prediction, evaluating a program’s performance, and optimizing a program with few-shot demonstrations. The diagrams also detail the supporting services like configuration, telemetry, and the flexible testing framework.
Diagram 1: High-Level System Architecture
This diagram shows the main components of the DSPex library and their relationships. It illustrates how the core abstractions (`Program`, `Signature`) are used by key modules and how supporting services integrate with the underlying “Foundation” library and external APIs.
(Behavior)"] Signature["DSPEx.Signature
(Behavior)"] Example["DSPEx.Example
(Struct)"] end subgraph KM["Key Modules"] Predict[DSPEx.Predict] Evaluate[DSPEx.Evaluate] Teleprompter[DSPEx.Teleprompter] end subgraph CL["Client Layer"] Client[DSPEx.Client] ClientManager["DSPEx.ClientManager
(GenServer)"] end subgraph SA["Services & Application"] App[Dspex.Application] -- supervises --> Supervisor[Dspex.Supervisor] Supervisor -- starts --> ConfigManager[Services.ConfigManager] Supervisor -- starts --> TelemetrySetup[Services.TelemetrySetup] Supervisor -- starts --> Finch[Finch HTTP Pool] end Adapter[DSPEx.Adapter] end subgraph ES["External Systems"] Foundation[Foundation Library] LLM_APIs["LLM APIs
(Gemini, OpenAI)"] end A --> Predict A --> Evaluate A --> Teleprompter Predict -- implements --> Program Predict -- uses --> Signature Predict -- uses --> Adapter Predict -- uses --> Client Evaluate -- uses --> Program Evaluate -- uses --> Example Teleprompter -- uses --> Program Teleprompter -- uses --> Example Teleprompter -- produces --> OptimizedProgram[DSPEx.OptimizedProgram] Client -- uses --> ClientManager ClientManager -- uses --> Finch ClientManager -- calls --> LLM_APIs Adapter -- used by --> Predict ConfigManager -- integrates with --> Foundation TelemetrySetup -- integrates with --> Foundation Client -- integrates with --> Foundation %% Elixir-inspired styling classDef userLayer fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef coreAbstraction fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef keyModule fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef clientLayer fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef serviceLayer fill:#d4c5ec,stroke:#4e2a8e,stroke-width:1px,color:#24292e classDef externalSystem fill:#f5f5f5,stroke:#666,stroke-width:1px,color:#24292e classDef adapter fill:#fdfbf7,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e classDef subgraphTitleNested fill:#f2f0f7,stroke:#d4c5ec,stroke-width:1px,color:#24292e class A userLayer class Program,Signature,Example coreAbstraction class Predict,Evaluate,Teleprompter keyModule class Client,ClientManager clientLayer class App,Supervisor,ConfigManager,TelemetrySetup,Finch serviceLayer class Foundation,LLM_APIs externalSystem class Adapter,OptimizedProgram adapter class UI,CORE,ES subgraphTitleTop class CA,KM,CL,SA subgraphTitleNested %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- A user’s application interacts with the primary modules: `Predict` for inference, `Evaluate` for performance measurement, and `Teleprompter` for optimization.
- These modules are built upon core abstractions: a `Program` defines an executable unit, a `Signature` defines its I/O contract, and an `Example` holds data.
- The `Predict` module uses an `Adapter` to format data and a `Client` to communicate with external LLM APIs.
- The `ClientManager` provides a stateful, resilient GenServer-based alternative to the functional `Client`. It uses a `Finch` pool for HTTP requests.
- The entire system is started as an OTP `Application`, which supervises the key background services (`ConfigManager`, `TelemetrySetup`).
- These services integrate deeply with an external `Foundation` library to provide centralized configuration, telemetry, and other infrastructure services.
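To make these relationships concrete, here is a minimal usage sketch in Elixir. `DSPEx.Program.forward` appears in the diagrams above; the `use DSPEx.Signature` macro and the `%DSPEx.Predict{}` struct fields are assumptions about the constructor API, so exact names may differ.

```elixir
# Hypothetical signature module: declares the I/O contract "question -> answer".
defmodule QASignature do
  use DSPEx.Signature, "question -> answer"
end

# Build a Predict program (an implementation of the Program behavior).
# The struct fields shown here are assumed for illustration.
program = %DSPEx.Predict{signature: QASignature, client: :gemini}

# Run a single prediction. Predict formats the request via the Adapter,
# calls the Client, and parses the response into the signature's output fields.
{:ok, outputs} = DSPEx.Program.forward(program, %{question: "What is OTP?"})
IO.puts(outputs.answer)
```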
Diagram 2: Core Prediction Flow (`DSPEx.Predict`)
This sequence diagram details the steps involved in making a single prediction. It shows the flow of control from the `Predict` program through the `Adapter` and `Client` layers to the LLM API and back.
```mermaid
sequenceDiagram
    participant User
    participant Predict as DSPEx.Predict
    participant Adapter as DSPEx.Adapter
    participant Client as DSPEx.Client
    participant LLM_API as LLM API
    participant Telemetry

    User->>Predict: forward(program, inputs)
    Predict->>Telemetry: :telemetry.execute([:dspex, :predict, :start])
    Predict->>Adapter: format_messages(signature, demos, inputs)
    Adapter-->>Predict: {:ok, messages}
    Predict->>Client: request(messages)
    Client->>Telemetry: :telemetry.execute([:dspex, :client, :request, :start])
    Note over Client: Resilience patterns: circuit breaking,<br/>rate limiting, mocking, etc.
    Client->>LLM_API: POST /v1/generateContent
    LLM_API-->>Client: 200 OK (JSON Response)
    Client->>Telemetry: :telemetry.execute([:dspex, :client, :request, :stop])
    Client-->>Predict: {:ok, api_response}
    Predict->>Adapter: parse_response(signature, api_response)
    Adapter-->>Predict: {:ok, outputs}
    Predict->>Telemetry: :telemetry.execute([:dspex, :predict, :stop])
    Predict-->>User: {:ok, outputs}
```
Workflow Description:
- Initiation: A user calls `DSPEx.Predict.forward` with a program instance and input data.
- Telemetry Start: The `Predict` module emits a `:start` telemetry event.
- Format Request: `Predict` calls `DSPEx.Adapter.format_messages`. The adapter uses the program’s `Signature` and any few-shot `demos` to create a list of messages suitable for the LLM API.
- Client Request: The formatted messages are passed to `DSPEx.Client.request`.
- API Call: The `Client` (or `ClientManager`) handles the HTTP call to the external LLM provider (e.g., Gemini). It manages resilience patterns like circuit breaking and emits its own telemetry events.
- Receive Response: The `Client` receives the raw JSON response from the API.
- Parse Response: The `Predict` module passes the API response to `DSPEx.Adapter.parse_response`. The adapter extracts the relevant content and structures it according to the `Signature`’s output fields.
- Telemetry Stop: A `:stop` telemetry event is emitted with duration and success metrics.
- Return Result: The final, structured output map is returned to the user.
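These events follow the standard `:telemetry` span convention, so an application can observe predictions with an ordinary handler. A minimal sketch, assuming the conventional `duration` measurement in native time units and a hypothetical `:signature` metadata key:

```elixir
# Attach a handler for the [:dspex, :predict, :stop] event emitted above.
# The measurement and metadata keys used here are assumptions.
:telemetry.attach(
  "log-dspex-predictions",
  [:dspex, :predict, :stop],
  fn _event, measurements, metadata, _config ->
    ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    IO.puts("predict finished in #{ms}ms (signature: #{inspect(metadata[:signature])})")
  end,
  nil
)
```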
Diagram 3: Program Evaluation Flow (`DSPEx.Evaluate`)
This diagram illustrates how `DSPEx.Evaluate` concurrently assesses a program’s performance against a dataset. It highlights the use of `Task.async_stream` for parallelism.
(examples)"] B -- for each example --> C("Spawn Concurrent Task") C --> D["DSPEx.Program.forward
(program, example.inputs)"] D --> E["Prediction
Result"] C --> F["metric_fn
(example, prediction)"] F --> G["Score
(e.g., 1.0 or 0.0)"] E -- used by --> F end subgraph RA["Result Aggregation"] B -- collects results from all tasks --> H[List of Scores & Errors] H --> I["Process Results"] I --> J[Calculate Average Score & Stats] J --> K["%{score: 0.95, stats: {...}}"] end %% Elixir-inspired styling classDef setupPhase fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef concurrentEngine fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef taskNode fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef processNode fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef resultNode fill:#d4c5ec,stroke:#4e2a8e,stroke-width:1px,color:#24292e classDef aggregationPhase fill:#f5f5f5,stroke:#666,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e class A,Inputs setupPhase class B concurrentEngine class C taskNode class D,E,F,G processNode class H,I,J,K aggregationPhase class SETUP,CEE,RA subgraphTitleTop %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- Initiation: The user calls `DSPEx.Evaluate.run` with a `program` to test, a list of `examples` (each with inputs and expected outputs), and a `metric_fn`.
- Concurrency: `Evaluate` uses `Task.async_stream` to process the list of examples in parallel, up to a configurable concurrency limit.
- Single Example Evaluation (per task):
  - For each example, a task is spawned.
  - Inside the task, `DSPEx.Program.forward` is called with the example’s inputs to get a `prediction`.
  - The `metric_fn` is then called with the original `example` and the new `prediction` to generate a quality `score`.
- Aggregation: The main process collects the results (scores or errors) from all completed tasks.
- Final Report: The collected scores are aggregated to produce a final `evaluation_result` map, containing the average score and detailed statistics like success rate, duration, and error counts.
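The heart of this flow can be sketched directly with `Task.async_stream`. This is a simplified stand-in for `DSPEx.Evaluate.run`, assuming the example and metric shapes described above and omitting telemetry and the detailed error statistics:

```elixir
defmodule EvaluateSketch do
  # Run `program` against each example concurrently and average the scores.
  def run(program, examples, metric_fn, opts \\ []) do
    max_concurrency = Keyword.get(opts, :max_concurrency, System.schedulers_online() * 2)

    scores =
      examples
      |> Task.async_stream(
        fn example ->
          # Forward the example's inputs through the program, then score
          # the prediction against the example's expected outputs.
          with {:ok, prediction} <- DSPEx.Program.forward(program, example.inputs) do
            {:ok, metric_fn.(example, prediction)}
          end
        end,
        max_concurrency: max_concurrency,
        on_timeout: :kill_task,
        timeout: :timer.seconds(30)
      )
      |> Enum.flat_map(fn
        {:ok, {:ok, score}} -> [score]
        _error_or_timeout -> []
      end)

    %{score: Enum.sum(scores) / max(length(scores), 1), successes: length(scores)}
  end
end
```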
Diagram 4: Program Optimization Flow (`BootstrapFewShot`)
This diagram shows the “compilation” process of the `BootstrapFewShot` teleprompter. It details the algorithm used to generate high-quality few-shot examples (demonstrations) for a “student” program.
(e.g., GPT-4 based)"] S["Student Program
(e.g., Gemini-Flash based)"] DS["Training Dataset
(Examples)"] MF["Metric Function
(fn(ex, pred) -> score)"] end subgraph OPT["Optimization Process: BootstrapFewShot.compile"] Start("Start") --> P1["Generate Candidates
(Concurrent)"] P1 -- for each example in DS --> P1a["teacher.forward
(example.inputs)"] P1a --> P1b[Create Demonstration Candidate] P1b --> Demos["Demonstration
Candidates"] Demos --> P2["Evaluate Demonstrations
(Concurrent)"] P2 -- for each demo --> P2a["metric_fn
(demo, demo.outputs)"] P2a --> P2b["Score Demo"] P2b --> P2c["Filter by
quality_threshold"] P2c --> QualityDemos["High-Quality Demos"] QualityDemos --> P3["Select Best Demos"] P3 -- sort by score, take N --> SelectedDemos["Selected Demos"] SelectedDemos --> P4["Create Optimized Student"] S --> P4 P4 --> OptimizedStudent["Optimized Student Program
(with new demos)"] end %% Elixir-inspired styling classDef inputLayer fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef startNode fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef processStep fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef candidateNode fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef evaluationNode fill:#d4c5ec,stroke:#4e2a8e,stroke-width:1px,color:#24292e classDef outputNode fill:#fdfbf7,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e class T,S,DS,MF inputLayer class Start startNode class P1,P2,P3,P4 processStep class P1a,P1b,Demos candidateNode class P2a,P2b,P2c,QualityDemos evaluationNode class SelectedDemos,OptimizedStudent outputNode class INPUTS,OPT subgraphTitleTop %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- Inputs: The `compile` function takes a `student` program (to be optimized), a more powerful `teacher` program, a `trainset` of examples, and a `metric_fn`.
- Generate Candidates: The teleprompter iterates through the `trainset`. For each example, it uses the `teacher` program to generate a high-quality prediction. This input-output pair becomes a “demonstration candidate.” This step is run concurrently.
- Evaluate Demonstrations: Each demonstration candidate is evaluated using the `metric_fn`. Since the demonstration’s outputs were generated by the teacher, they are treated as both the prediction and the expected output, effectively scoring the internal consistency and quality of the teacher’s generation. Candidates that score below a `quality_threshold` are discarded.
- Select Best Demos: The surviving high-quality demos are sorted by their score, and the top `N` (e.g., `max_bootstrapped_demos`) are selected.
- Create Optimized Student: The selected demonstrations are attached to the `student` program, creating a new, optimized program. If the student program doesn’t have a native `:demos` field, it’s wrapped in `DSPEx.OptimizedProgram`.
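A condensed sketch of this algorithm follows, assuming a simple map shape for demonstrations. The `:quality_threshold` and `:max_bootstrapped_demos` option names come from the description above; the default values and helper names here are illustrative.

```elixir
defmodule BootstrapSketch do
  # Generate, score, filter, and select few-shot demonstrations.
  def compile(teacher, trainset, metric_fn, opts \\ []) do
    threshold = Keyword.get(opts, :quality_threshold, 0.7)
    max_demos = Keyword.get(opts, :max_bootstrapped_demos, 4)

    trainset
    # 1. Generate candidate demonstrations with the teacher, concurrently.
    |> Task.async_stream(fn example ->
      with {:ok, prediction} <- DSPEx.Program.forward(teacher, example.inputs) do
        demo = %{inputs: example.inputs, outputs: prediction}
        # 2. The teacher's output serves as both prediction and expected
        #    output, so the metric scores the teacher's self-consistency.
        {:ok, {demo, metric_fn.(demo, demo.outputs)}}
      end
    end)
    |> Enum.flat_map(fn
      {:ok, {:ok, scored_demo}} -> [scored_demo]
      _failed_candidate -> []
    end)
    # 3. Discard candidates below the quality threshold.
    |> Enum.filter(fn {_demo, score} -> score >= threshold end)
    # 4. Sort by score and keep the top N demos for the student.
    |> Enum.sort_by(fn {_demo, score} -> score end, :desc)
    |> Enum.take(max_demos)
    |> Enum.map(fn {demo, _score} -> demo end)
  end
end
```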
Diagram 5: Test Mode Architecture
This diagram explains the flexible testing framework, which allows developers to run tests in different modes (`mock`, `fallback`, `live`) to control interaction with external APIs.
environment variable"] end subgraph RB["Runtime Behavior"] G[DSPEx.TestModeConfig] -- reads env var --> H{Get Current Mode} I[DSPEx.Client / ClientManager] -- queries --> G I --> J{What is the test mode?} J -- :mock --> K["Force Mock:
Generate contextual
mock response.
No network call."] J -- :live --> L{"API Key Available?"} J -- :fallback --> M{"API Key Available?"} L -- Yes --> N[Attempt Live API Call] L -- No --> O[Fail Test] M -- Yes --> N M -- No --> P["Seamless Fallback:
Generate contextual
mock response.
No network call."] end %% Elixir-inspired styling classDef userCommand fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef testSetup fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef configComponent fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef decisionNode fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef mockMode fill:#d4c5ec,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef liveMode fill:#c8e6c9,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef fallbackMode fill:#fff3cd,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef failMode fill:#ffcdd2,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e class A,B,C,D userCommand class E,F testSetup class G,I configComponent class H,J,L,M decisionNode class K,P mockMode class N liveMode class O failMode class UC,TS,RB subgraphTitleTop %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- Command: The developer runs a test command like `mix test.mock` or `mix test.fallback`.
- Environment Setup: The corresponding Mix task (`Mix.Tasks.Test.Mock`, etc.) sets the `DSPEX_TEST_MODE` environment variable.
- Mode Detection: During the test run, `DSPEx.TestModeConfig` reads this environment variable to determine the active mode.
- Conditional Logic: The `DSPEx.Client` and `ClientManager` modules query `TestModeConfig` and adjust their behavior accordingly:
  - `:mock`: All API calls are intercepted. A contextual mock response is generated locally without any network requests. This is the default, ensuring tests are fast and don’t require credentials.
  - `:live`: The client attempts a real API call. If API keys are missing, the call (and thus the test) will fail. This is for strict integration testing.
  - `:fallback`: The client first checks for API keys. If present, it attempts a live API call. If not, it “seamlessly” falls back to the mock behavior, allowing tests to pass while still validating live integration when possible.
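Mode detection itself reduces to reading a single environment variable. A minimal sketch, assuming `DSPEX_TEST_MODE` holds `"mock"`, `"fallback"`, or `"live"`; the routing helper and its return atoms are hypothetical:

```elixir
defmodule TestModeSketch do
  @valid_modes ~w(mock fallback live)

  # Read the active mode from the environment, defaulting to :mock so
  # tests stay fast and credential-free.
  def current_mode do
    case System.get_env("DSPEX_TEST_MODE") do
      mode when mode in @valid_modes -> String.to_existing_atom(mode)
      _ -> :mock
    end
  end

  # Decide how a client call should be routed, given an optional API key.
  def route(api_key) do
    case {current_mode(), api_key} do
      {:mock, _} -> :mock_response       # never touch the network
      {:live, nil} -> :fail              # strict mode without credentials
      {:live, _key} -> :live_call
      {:fallback, nil} -> :mock_response # seamless fallback
      {:fallback, _key} -> :live_call
    end
  end
end
```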