Introduction
These diagrams provide a visual guide to the DSPex library’s architecture and functionality. They cover the main processes: making a prediction, evaluating a program’s performance, and optimizing a program with few-shot demonstrations. The diagrams also detail the supporting services like configuration, telemetry, and the flexible testing framework.
Diagram 1: High-Level System Architecture
This diagram shows the main components of the DSPex library and their relationships. It illustrates how the core abstractions (`Program`, `Signature`) are used by key modules and how supporting services integrate with the underlying “Foundation” library and external APIs.
(Behavior)"] Signature["DSPEx.Signature
(Behavior)"] Example["DSPEx.Example
(Struct)"] end subgraph KM["Key Modules"] Predict[DSPEx.Predict] Evaluate[DSPEx.Evaluate] Teleprompter[DSPEx.Teleprompter] end subgraph CL["Client Layer"] Client[DSPEx.Client] ClientManager["DSPEx.ClientManager
(GenServer)"] end subgraph SA["Services & Application"] App[Dspex.Application] -- supervises --> Supervisor[Dspex.Supervisor] Supervisor -- starts --> ConfigManager[Services.ConfigManager] Supervisor -- starts --> TelemetrySetup[Services.TelemetrySetup] Supervisor -- starts --> Finch[Finch HTTP Pool] end Adapter[DSPEx.Adapter] end subgraph ES["External Systems"] Foundation[Foundation Library] LLM_APIs["LLM APIs
(Gemini, OpenAI)"] end A --> Predict A --> Evaluate A --> Teleprompter Predict -- implements --> Program Predict -- uses --> Signature Predict -- uses --> Adapter Predict -- uses --> Client Evaluate -- uses --> Program Evaluate -- uses --> Example Teleprompter -- uses --> Program Teleprompter -- uses --> Example Teleprompter -- produces --> OptimizedProgram[DSPEx.OptimizedProgram] Client -- uses --> ClientManager ClientManager -- uses --> Finch ClientManager -- calls --> LLM_APIs Adapter -- used by --> Predict ConfigManager -- integrates with --> Foundation TelemetrySetup -- integrates with --> Foundation Client -- integrates with --> Foundation %% Elixir-inspired styling classDef userLayer fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef coreAbstraction fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef keyModule fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef clientLayer fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef serviceLayer fill:#d4c5ec,stroke:#4e2a8e,stroke-width:1px,color:#24292e classDef externalSystem fill:#f5f5f5,stroke:#666,stroke-width:1px,color:#24292e classDef adapter fill:#fdfbf7,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e classDef subgraphTitleNested fill:#f2f0f7,stroke:#d4c5ec,stroke-width:1px,color:#24292e class A userLayer class Program,Signature,Example coreAbstraction class Predict,Evaluate,Teleprompter keyModule class Client,ClientManager clientLayer class App,Supervisor,ConfigManager,TelemetrySetup,Finch serviceLayer class Foundation,LLM_APIs externalSystem class Adapter,OptimizedProgram adapter class UI,CORE,ES subgraphTitleTop class CA,KM,CL,SA subgraphTitleNested %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- A user’s application interacts with the primary modules: `Predict` for inference, `Evaluate` for performance measurement, and `Teleprompter` for optimization.
- These modules are built upon core abstractions: a `Program` defines an executable unit, a `Signature` defines its I/O contract, and an `Example` holds data.
- The `Predict` module uses an `Adapter` to format data and a `Client` to communicate with external LLM APIs.
- The `ClientManager` provides a stateful, resilient GenServer-based alternative to the functional `Client`. It uses a `Finch` pool for HTTP requests.
- The entire system is started as an OTP `Application`, which supervises the key background services (`ConfigManager`, `TelemetrySetup`).
- These services integrate deeply with an external `Foundation` library to provide centralized configuration, telemetry, and other infrastructure services.
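To make these relationships concrete, here is a minimal usage sketch in Elixir. `DSPEx.Program.forward` appears in the diagrams above; the `use DSPEx.Signature` macro and the `%DSPEx.Predict{}` struct fields are assumptions about the constructor API, so exact names may differ.

```elixir
# Hypothetical signature module: declares the I/O contract "question -> answer".
defmodule QASignature do
  use DSPEx.Signature, "question -> answer"
end

# Build a Predict program (an implementation of the Program behavior).
# The struct fields shown here are assumed for illustration.
program = %DSPEx.Predict{signature: QASignature, client: :gemini}

# Run a single prediction. Predict formats the request via the Adapter,
# calls the Client, and parses the response into the signature's output fields.
{:ok, outputs} = DSPEx.Program.forward(program, %{question: "What is OTP?"})
IO.puts(outputs.answer)
```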
Diagram 2: Core Prediction Flow (`DSPEx.Predict`)
This sequence diagram details the steps involved in making a single prediction. It shows the flow of control from the `Predict` program through the `Adapter` and `Client` layers to the LLM API and back.
```mermaid
sequenceDiagram
    participant User
    participant Predict as DSPEx.Predict
    participant Adapter as DSPEx.Adapter
    participant Client as DSPEx.Client
    participant LLM_API as LLM API
    participant Telemetry

    User->>Predict: forward(program, inputs)
    Predict->>Telemetry: :telemetry.execute([:dspex, :predict, :start])
    Predict->>Adapter: format_messages(signature, demos, inputs)
    Adapter-->>Predict: {:ok, messages}
    Predict->>Client: request(messages)
    Client->>Telemetry: :telemetry.execute([:dspex, :client, :request, :start])
    Note over Client: Resilience patterns: circuit breaking,<br/>rate limiting, mocking, etc.
    Client->>LLM_API: POST /v1/generateContent
    LLM_API-->>Client: 200 OK (JSON Response)
    Client->>Telemetry: :telemetry.execute([:dspex, :client, :request, :stop])
    Client-->>Predict: {:ok, api_response}
    Predict->>Adapter: parse_response(signature, api_response)
    Adapter-->>Predict: {:ok, outputs}
    Predict->>Telemetry: :telemetry.execute([:dspex, :predict, :stop])
    Predict-->>User: {:ok, outputs}
```
Workflow Description:
- Initiation: A user calls `DSPEx.Predict.forward` with a program instance and input data.
- Telemetry Start: The `Predict` module emits a `:start` telemetry event.
- Format Request: `Predict` calls `DSPEx.Adapter.format_messages`. The adapter uses the program’s `Signature` and any few-shot `demos` to create a list of messages suitable for the LLM API.
- Client Request: The formatted messages are passed to `DSPEx.Client.request`.
- API Call: The `Client` (or `ClientManager`) handles the HTTP call to the external LLM provider (e.g., Gemini). It manages resilience patterns like circuit breaking and emits its own telemetry events.
- Receive Response: The `Client` receives the raw JSON response from the API.
- Parse Response: The `Predict` module passes the API response to `DSPEx.Adapter.parse_response`. The adapter extracts the relevant content and structures it according to the `Signature`’s output fields.
- Telemetry Stop: A `:stop` telemetry event is emitted with duration and success metrics.
- Return Result: The final, structured output map is returned to the user.
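These events follow the standard `:telemetry` span convention, so an application can observe predictions with an ordinary handler. A minimal sketch, assuming the conventional `duration` measurement in native time units and a hypothetical `:signature` metadata key:

```elixir
# Attach a handler for the [:dspex, :predict, :stop] event emitted above.
# The measurement and metadata keys used here are assumptions.
:telemetry.attach(
  "log-dspex-predictions",
  [:dspex, :predict, :stop],
  fn _event, measurements, metadata, _config ->
    ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    IO.puts("predict finished in #{ms}ms (signature: #{inspect(metadata[:signature])})")
  end,
  nil
)
```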
Diagram 3: Program Evaluation Flow (`DSPEx.Evaluate`)
This diagram illustrates how `DSPEx.Evaluate` concurrently assesses a program’s performance against a dataset. It highlights the use of `Task.async_stream` for parallelism.
(examples)"] B -- for each example --> C("Spawn Concurrent Task") C --> D["DSPEx.Program.forward
(program, example.inputs)"] D --> E["Prediction
Result"] C --> F["metric_fn
(example, prediction)"] F --> G["Score
(e.g., 1.0 or 0.0)"] E -- used by --> F end subgraph RA["Result Aggregation"] B -- collects results from all tasks --> H[List of Scores & Errors] H --> I["Process Results"] I --> J[Calculate Average Score & Stats] J --> K["%{score: 0.95, stats: {...}}"] end %% Elixir-inspired styling classDef setupPhase fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef concurrentEngine fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef taskNode fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef processNode fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef resultNode fill:#d4c5ec,stroke:#4e2a8e,stroke-width:1px,color:#24292e classDef aggregationPhase fill:#f5f5f5,stroke:#666,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e class A,Inputs setupPhase class B concurrentEngine class C taskNode class D,E,F,G processNode class H,I,J,K aggregationPhase class SETUP,CEE,RA subgraphTitleTop %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- Initiation: The user calls `DSPEx.Evaluate.run` with a `program` to test, a list of `examples` (each with inputs and expected outputs), and a `metric_fn`.
- Concurrency: `Evaluate` uses `Task.async_stream` to process the list of examples in parallel, up to a configurable concurrency limit.
- Single Example Evaluation (per task):
  - For each example, a task is spawned.
  - Inside the task, `DSPEx.Program.forward` is called with the example’s inputs to get a `prediction`.
  - The `metric_fn` is then called with the original `example` and the new `prediction` to generate a quality `score`.
- Aggregation: The main process collects the results (scores or errors) from all completed tasks.
- Final Report: The collected scores are aggregated to produce a final `evaluation_result` map, containing the average score and detailed statistics like success rate, duration, and error counts.
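The heart of this flow can be sketched directly with `Task.async_stream`. This is a simplified stand-in for `DSPEx.Evaluate.run`, assuming the example and metric shapes described above and omitting telemetry and the detailed error statistics:

```elixir
defmodule EvaluateSketch do
  # Run `program` against each example concurrently and average the scores.
  def run(program, examples, metric_fn, opts \\ []) do
    max_concurrency = Keyword.get(opts, :max_concurrency, System.schedulers_online() * 2)

    scores =
      examples
      |> Task.async_stream(
        fn example ->
          # Forward the example's inputs through the program, then score
          # the prediction against the example's expected outputs.
          with {:ok, prediction} <- DSPEx.Program.forward(program, example.inputs) do
            {:ok, metric_fn.(example, prediction)}
          end
        end,
        max_concurrency: max_concurrency,
        on_timeout: :kill_task,
        timeout: :timer.seconds(30)
      )
      |> Enum.flat_map(fn
        {:ok, {:ok, score}} -> [score]
        _error_or_timeout -> []
      end)

    %{score: Enum.sum(scores) / max(length(scores), 1), successes: length(scores)}
  end
end
```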
Diagram 4: Program Optimization Flow (`BootstrapFewShot`)
This diagram shows the “compilation” process of the `BootstrapFewShot` teleprompter. It details the algorithm used to generate high-quality few-shot examples (demonstrations) for a “student” program.
(e.g., GPT-4 based)"] S["Student Program
(e.g., Gemini-Flash based)"] DS["Training Dataset
(Examples)"] MF["Metric Function
(fn(ex, pred) -> score)"] end subgraph OPT["Optimization Process: BootstrapFewShot.compile"] Start("Start") --> P1["Generate Candidates
(Concurrent)"] P1 -- for each example in DS --> P1a["teacher.forward
(example.inputs)"] P1a --> P1b[Create Demonstration Candidate] P1b --> Demos["Demonstration
Candidates"] Demos --> P2["Evaluate Demonstrations
(Concurrent)"] P2 -- for each demo --> P2a["metric_fn
(demo, demo.outputs)"] P2a --> P2b["Score Demo"] P2b --> P2c["Filter by
quality_threshold"] P2c --> QualityDemos["High-Quality Demos"] QualityDemos --> P3["Select Best Demos"] P3 -- sort by score, take N --> SelectedDemos["Selected Demos"] SelectedDemos --> P4["Create Optimized Student"] S --> P4 P4 --> OptimizedStudent["Optimized Student Program
(with new demos)"] end %% Elixir-inspired styling classDef inputLayer fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef startNode fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef processStep fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef candidateNode fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef evaluationNode fill:#d4c5ec,stroke:#4e2a8e,stroke-width:1px,color:#24292e classDef outputNode fill:#fdfbf7,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e class T,S,DS,MF inputLayer class Start startNode class P1,P2,P3,P4 processStep class P1a,P1b,Demos candidateNode class P2a,P2b,P2c,QualityDemos evaluationNode class SelectedDemos,OptimizedStudent outputNode class INPUTS,OPT subgraphTitleTop %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- Inputs: The `compile` function takes a `student` program (to be optimized), a more powerful `teacher` program, a `trainset` of examples, and a `metric_fn`.
- Generate Candidates: The teleprompter iterates through the `trainset`. For each example, it uses the `teacher` program to generate a high-quality prediction. This input-output pair becomes a “demonstration candidate.” This step is run concurrently.
- Evaluate Demonstrations: Each demonstration candidate is evaluated using the `metric_fn`. Since the demonstration’s outputs were generated by the teacher, they are treated as both the prediction and the expected output, effectively scoring the internal consistency and quality of the teacher’s generation. Candidates that score below a `quality_threshold` are discarded.
- Select Best Demos: The surviving high-quality demos are sorted by their score, and the top `N` (e.g., `max_bootstrapped_demos`) are selected.
- Create Optimized Student: The selected demonstrations are attached to the `student` program, creating a new, optimized program. If the student program doesn’t have a native `:demos` field, it’s wrapped in `DSPEx.OptimizedProgram`.
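A condensed sketch of this algorithm follows, assuming a simple map shape for demonstrations. The `:quality_threshold` and `:max_bootstrapped_demos` option names come from the description above; the default values and helper names here are illustrative.

```elixir
defmodule BootstrapSketch do
  # Generate, score, filter, and select few-shot demonstrations.
  def compile(teacher, trainset, metric_fn, opts \\ []) do
    threshold = Keyword.get(opts, :quality_threshold, 0.7)
    max_demos = Keyword.get(opts, :max_bootstrapped_demos, 4)

    trainset
    # 1. Generate candidate demonstrations with the teacher, concurrently.
    |> Task.async_stream(fn example ->
      with {:ok, prediction} <- DSPEx.Program.forward(teacher, example.inputs) do
        demo = %{inputs: example.inputs, outputs: prediction}
        # 2. The teacher's output serves as both prediction and expected
        #    output, so the metric scores the teacher's self-consistency.
        {:ok, {demo, metric_fn.(demo, demo.outputs)}}
      end
    end)
    |> Enum.flat_map(fn
      {:ok, {:ok, scored_demo}} -> [scored_demo]
      _failed_candidate -> []
    end)
    # 3. Discard candidates below the quality threshold.
    |> Enum.filter(fn {_demo, score} -> score >= threshold end)
    # 4. Sort by score and keep the top N demos for the student.
    |> Enum.sort_by(fn {_demo, score} -> score end, :desc)
    |> Enum.take(max_demos)
    |> Enum.map(fn {demo, _score} -> demo end)
  end
end
```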
Diagram 5: Test Mode Architecture
This diagram explains the flexible testing framework, which allows developers to run tests in different modes (`mock`, `fallback`, `live`) to control interaction with external APIs.
environment variable"] end subgraph RB["Runtime Behavior"] G[DSPEx.TestModeConfig] -- reads env var --> H{Get Current Mode} I[DSPEx.Client / ClientManager] -- queries --> G I --> J{What is the test mode?} J -- :mock --> K["Force Mock:
Generate contextual
mock response.
No network call."] J -- :live --> L{"API Key Available?"} J -- :fallback --> M{"API Key Available?"} L -- Yes --> N[Attempt Live API Call] L -- No --> O[Fail Test] M -- Yes --> N M -- No --> P["Seamless Fallback:
Generate contextual
mock response.
No network call."] end %% Elixir-inspired styling classDef userCommand fill:#4e2a8e,stroke:#24292e,stroke-width:2px,color:#fff classDef testSetup fill:#7c4dbd,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef configComponent fill:#9b72d0,stroke:#4e2a8e,stroke-width:2px,color:#fff classDef decisionNode fill:#b89ce0,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef mockMode fill:#d4c5ec,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef liveMode fill:#c8e6c9,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef fallbackMode fill:#fff3cd,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef failMode fill:#ffcdd2,stroke:#4e2a8e,stroke-width:2px,color:#24292e classDef subgraphTitleTop fill:#e6e0f0,stroke:#b89ce0,stroke-width:2px,color:#24292e class A,B,C,D userCommand class E,F testSetup class G,I configComponent class H,J,L,M decisionNode class K,P mockMode class N liveMode class O failMode class UC,TS,RB subgraphTitleTop %% Darker arrow styling for better visibility linkStyle default stroke:#24292e,stroke-width:2px
Workflow Description:
- Command: The developer runs a test command like `mix test.mock` or `mix test.fallback`.
- Environment Setup: The corresponding Mix task (`Mix.Tasks.Test.Mock`, etc.) sets the `DSPEX_TEST_MODE` environment variable.
- Mode Detection: During the test run, `DSPEx.TestModeConfig` reads this environment variable to determine the active mode.
- Conditional Logic: The `DSPEx.Client` and `ClientManager` modules query `TestModeConfig` and adjust their behavior accordingly:
  - `:mock`: All API calls are intercepted. A contextual mock response is generated locally without any network requests. This is the default, ensuring tests are fast and don’t require credentials.
  - `:live`: The client attempts a real API call. If API keys are missing, the call (and thus the test) will fail. This is for strict integration testing.
  - `:fallback`: The client first checks for API keys. If present, it attempts a live API call. If not, it “seamlessly” falls back to the mock behavior, allowing tests to pass while still validating live integration when possible.
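Mode detection itself reduces to reading a single environment variable. A minimal sketch, assuming `DSPEX_TEST_MODE` holds `"mock"`, `"fallback"`, or `"live"`; the routing helper and its return atoms are hypothetical:

```elixir
defmodule TestModeSketch do
  @valid_modes ~w(mock fallback live)

  # Read the active mode from the environment, defaulting to :mock so
  # tests stay fast and credential-free.
  def current_mode do
    case System.get_env("DSPEX_TEST_MODE") do
      mode when mode in @valid_modes -> String.to_existing_atom(mode)
      _ -> :mock
    end
  end

  # Decide how a client call should be routed, given an optional API key.
  def route(api_key) do
    case {current_mode(), api_key} do
      {:mock, _} -> :mock_response       # never touch the network
      {:live, nil} -> :fail              # strict mode without credentials
      {:live, _key} -> :live_call
      {:fallback, nil} -> :mock_response # seamless fallback
      {:fallback, _key} -> :live_call
    end
  end
end
```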