Of course. Based on your excellent and detailed staged implementation plan, here is a complete and exhaustive test structure for the DSPEx
project.
This structure is designed to be built out progressively, aligning perfectly with your 6-stage plan. It uses descriptive describe
blocks to categorize tests, making it clear which aspects of a module’s functionality are being tested. It also includes placeholders for property-based and integration tests.
Generated Test Structure for DSPEx
Here are the skeleton test files for each core module.
test/test_helper.exs
This is the standard helper file that will be executed before the test suite runs.
# test/test_helper.exs
ExUnit.start()
# Start Mox for mocking dependencies in tests, especially the HTTP client.
Mox.defmock(DSPEx.Test.ReqMock, for: DSPEx.Test.ReqBehaviour)
test/dspex/signature_test.exs
This suite validates the heart of the declarative system: parsing and generating signatures.
# test/dspex/signature_test.exs
defmodule DSPEx.SignatureTest do
use ExUnit.Case, async: true
# --- Module for testing the `use DSPEx.Signature` macro ---
defmodule MyQASignature do
@moduledoc "A simple QA signature."
use DSPEx.Signature, "question -> answer"
end
defmodule ComplexSignature do
@moduledoc "A more complex signature."
use DSPEx.Signature, "context, question -> rationale, answer"
end
# --- End of test modules ---
describe "Signature Parsing (Compile-Time)" do
test "parses a simple 'input -> output' format"
test "parses multiple inputs and outputs with commas"
test "handles varied whitespace gracefully"
test "raises a CompileError for signatures without '->'"
test "raises a CompileError for duplicate input field names"
test "raises a CompileError for duplicate output field names"
test "raises a CompileError if input and output fields overlap"
end
describe "Generated Signature Module (`use DSPEx.Signature`)" do
test "correctly implements the DSPEx.Signature behaviour"
test "instructions/0 returns the module's docstring"
test "input_fields/0 returns the correct list of atoms"
test "output_fields/0 returns the correct list of atoms"
test "defines a struct with all input and output fields as keys"
test "defines a public type `t` for the struct"
end
describe "Signature as a Data Structure" do
test "can be instantiated with data"
test "can be used in pattern matching"
end
end
test/dspex/example_test.exs
This suite validates the Example
data container, which is crucial for handling training and evaluation data.
# test/dspex/example_test.exs
defmodule DSPEx.ExampleTest do
use ExUnit.Case, async: true
alias DSPEx.Example
setup do
%{example: Example.new(%{question: "Q", answer: "A", context: "C"})}
end
describe "Example Creation & Initialization" do
test "new/1 creates an example with data and empty input keys"
test "new/1 handles an empty map"
end
describe "with_inputs/2" do
test "correctly designates a single input key", %{example: example} do
e = Example.with_inputs(example, [:question])
assert MapSet.equal?(e.input_keys, MapSet.new([:question]))
end
test "correctly designates multiple input keys", %{example: example} do
e = Example.with_inputs(example, [:question, :context])
assert MapSet.equal?(e.input_keys, MapSet.new([:question, :context]))
end
test "overwrites previously set input keys"
end
describe "inputs/1 and labels/1" do
test "inputs/1 returns a map of only the designated input fields"
test "labels/1 returns a map of all non-input fields"
test "handles cases with no designated inputs (all fields are labels)"
test "handles cases where all fields are inputs (no labels)"
end
end
test/dspex/prediction_test.exs
This suite validates the Prediction
data container, focusing on its structure and the Access
behaviour for developer ergonomics.
# test/dspex/prediction_test.exs
defmodule DSPEx.PredictionTest do
use ExUnit.Case, async: true
alias DSPEx.Prediction
setup do
%{
prediction: %Prediction{
inputs: %{question: "What is 2+2?"},
outputs: %{answer: "4", rationale: "Because math."},
raw_response: %{id: "dummy_id"}
}
}
end
describe "Prediction Struct" do
test "can be created with inputs, outputs, and raw_response"
end
describe "Access Behaviour Implementation" do
test "allows direct access to output fields using dot notation", %{prediction: p} do
assert p.answer == "4"
assert p.rationale == "Because math."
end
test "returns nil for a non-existent key using dot notation", %{prediction: p} do
assert p.non_existent_field == nil
end
test "works correctly with Access.fetch/2"
test "works correctly with Access.get/2"
test "works correctly with Access.pop/2, returning a modified struct"
test "works correctly with Access.get_and_update/3"
end
end
test/dspex/client_test.exs
This suite is critical for testing the resilience and core functionality of the GenServer
-based API client. It will rely heavily on Mox
for mocking.
# test/dspex/client_test.exs
defmodule DSPEx.ClientTest do
use ExUnit.Case, async: true
import Mox
# The GenServer state is tested indirectly via its public API.
# Mock the Req behaviour to isolate the client from actual HTTP calls.
setup :verify_on_exit!
describe "Client Initialization" do
test "start_link/1 successfully starts and registers a client process"
test "init/1 correctly sets up state with config, cache, and fuse"
test "raises an error if required options like :name or :config are missing"
end
describe "Successful Requests (Happy Path)" do
test "request/2 sends a correctly formatted POST request via Req"
test "request/2 returns {:ok, response_body} on a 200 status"
test "request/2 populates the cache on a successful first request"
end
describe "Caching Logic" do
test "request/2 returns a cached response without making a new HTTP call"
test "does not cache failed requests"
test "different request bodies produce different cache keys"
end
describe "Circuit Breaker (Fuse) Logic" do
test "continues to make requests when failures are below the threshold"
test "melts the fuse and returns {:error, :fuse_melted} after reaching failure threshold"
test "rejects calls immediately when fuse is melted"
test "attempts a request again after the fuse's reset timeout"
end
describe "Error Handling" do
test "handles non-200 HTTP status codes by raising an error for the fuse to catch"
test "handles network-level errors (e.g., {:error, :econnrefused})"
end
end
test/dspex/adapter_test.exs
This suite tests the “translation” layer between DSPEx
’s abstract world and the concrete world of LLM APIs.
# test/dspex/adapter_test.exs
defmodule DSPEx.AdapterTest do
use ExUnit.Case, async: true
alias DSPEx.Adapter.Chat
alias DSPEx.Example
# Define test signatures once
defmodule QASig do
@moduledoc "Question -> Answer"
use DSPEx.Signature, "question -> answer"
end
defmodule CoTSig do
@moduledoc "Question -> Rationale, Answer"
use DSPEx.Signature, "question -> rationale, answer"
end
describe "DSPEx.Adapter.Chat - format/3" do
test "formats a zero-shot request correctly"
test "formats a few-shot request with one demo"
test "formats a few-shot request with multiple demos in the correct order"
test "builds the correct system prompt from signature instructions"
test "correctly formats multiple input fields"
end
describe "DSPEx.Adapter.Chat - parse/2" do
test "parses a simple, well-formed response with one output field"
test "parses a complex response with multiple output fields"
test "handles extra text before, after, and between fields"
test "returns {:error, reason} if an output field is missing"
test "returns {:error, reason} for completely malformed responses"
test "trims whitespace from parsed content"
end
# When the JSON adapter is built, its tests would go here.
# describe "DSPEx.Adapter.JSON" do
# test "format/3 creates a prompt requesting JSON output"
# test "parse/2 successfully decodes a valid JSON response"
# test "parse/2 returns an error for invalid JSON"
# end
end
test/dspex/predict_test.exs
This suite tests the primary execution module, Predict
, integrating the Client
and Adapter
.
# test/dspex/predict_test.exs
defmodule DSPEx.PredictTest do
use ExUnit.Case, async: true
import Mox
alias DSPEx.Predict
defmodule TestSig, do: use(DSPEx.Signature, "question -> answer")
describe "Predict Initialization" do
test "new/2 creates a Predict struct with a signature and client"
test "can be initialized with an adapter and demos"
end
describe "forward/2 Execution" do
test "successfully executes the format -> request -> parse pipeline on a happy path"
test "returns a fully populated DSPEx.Prediction struct on success"
test "propagates errors from the adapter's format function"
test "propagates errors from the client request"
test "propagates errors from the adapter's parse function"
end
describe "configure/2" do
test "returns a new program with updated demos"
test "returns a new program with an updated adapter"
test "does not mutate the original program struct"
end
end
test/dspex/evaluate_test.exs
This suite tests the concurrent evaluation engine.
# test/dspex/evaluate_test.exs
defmodule DSPEx.EvaluateTest do
use ExUnit.Case
alias DSPEx.Evaluate
# A mock program for testing evaluation without real API calls.
defmodule MockProgram do
@behaviour DSPEx.Program
def forward(_program, %{should_fail: true}), do: {:error, :mock_failure}
def forward(_program, inputs), do: {:ok, %DSPEx.Prediction{inputs: inputs, outputs: %{answer: "mock_#{inputs.question}"}}}
def configure(p, c), do: Map.merge(p, c)
end
describe "Evaluation Execution" do
test "runs evaluation on a dev set and returns an aggregate score"
test "correctly applies the metric function to each successful prediction"
test "handles an empty dev set gracefully"
test "processes examples concurrently"
# Note: Testing true concurrency is tricky. A proxy is to test that the
# total execution time with sleeps is less than the sum of all sleeps.
test "execution is faster than sequential processing"
end
describe "Result and Error Handling" do
test "correctly partitions successful and failed evaluations"
test "handles program failures gracefully without crashing the stream"
test "calculates a score of 0.0 when all evaluations fail"
end
describe "Options and UI" do
test "respects the :num_threads option"
test "displays a progress bar when :display_progress is true"
test "does not display a progress bar when :display_progress is false"
end
end
test/dspex/teleprompter/bootstrap_fewshot_test.exs
This suite tests the first and most important optimizer.
# test/dspex/teleprompter/bootstrap_fewshot_test.exs
defmodule DSPEx.Teleprompter.BootstrapFewShotTest do
use ExUnit.Case
alias DSPEx.Teleprompter.BootstrapFewShot
describe "BootstrapFewShot Initialization" do
test "initializes with a metric function and other options"
end
describe "Compilation Logic" do
# These tests will require more complex mocks for student/teacher programs
# and the evaluation engine.
test "compile/3 runs the teacher on the trainset to generate candidate demos"
test "filters candidate demos using the provided metric function"
test "selects the top-k successful demos"
test "configures the student program with the selected demos"
test "returns the newly configured student program"
end
describe "Edge Cases" do
test "handles the case where no successful demos are generated"
test "respects the max_bootstrapped_demos limit"
test "shuffles the trainset if specified"
end
end
Of course. Here is the continuation of the exhaustive test structure for DSPEx
, covering the advanced modules from Stage 6 of your implementation plan.
This completes the test suite skeleton, providing a clear roadmap for test-driven development of the entire framework.
test/dspex/chain_of_thought_test.exs
This suite tests the ChainOfThought
program, which extends Predict
by prompting for an intermediate reasoning step.
# test/dspex/chain_of_thought_test.exs
defmodule DSPEx.ChainOfThoughtTest do
use ExUnit.Case, async: true
import Mox
alias DSPEx.ChainOfThought
defmodule QASig do
use DSPEx.Signature, "question -> answer"
end
describe "ChainOfThought Initialization" do
test "new/2 creates a CoT struct with a base signature and client"
test "automatically creates an extended signature with a 'rationale' field"
test "the extended signature places 'rationale' before other output fields"
test "can be initialized with a custom rationale field name (e.g., :reasoning)"
end
describe "forward/2 Execution" do
# These tests require mocking the adapter and client.
setup :verify_on_exit!
test "successfully executes the full pipeline for a CoT prompt" do
# Mock the adapter to check that it's called with the *extended* signature.
# Mock the client to return a response with both rationale and answer.
end
test "returns a Prediction struct containing both :rationale and :answer in its outputs"
test "propagates errors from the client or adapter"
end
describe "Integration with Adapters" do
test "the default Chat adapter formats a prompt that explicitly asks for a rationale"
test "the default Chat adapter can parse both a rationale and an answer"
end
end
test/dspex/multi_chain_comparison_test.exs
This suite tests the MultiChainComparison
module, which is a “meta-program” for refining answers.
# test/dspex/multi_chain_comparison_test.exs
defmodule DSPEx.MultiChainComparisonTest do
use ExUnit.Case, async: true
import Mox
alias DSPEx.MultiChainComparison
alias DSPEx.Prediction
defmodule CompareSig do
# The signature for the comparison task itself
use DSPEx.Signature, "question, candidate_answers -> best_answer"
end
setup do
%{
predictions: [
%Prediction{outputs: %{answer: "Paris"}},
%Prediction{outputs: %{answer: "The City of Light"}},
%Prediction{outputs: %{answer: "paris"}}
]
}
end
describe "MultiChainComparison Initialization" do
test "initializes with a signature for the comparison logic"
end
describe "forward/2 Execution Logic" do
setup :verify_on_exit!
test "formats a prompt containing the question and all candidate answers"
test "calls the LLM client to get the best answer"
test "parses the final 'best_answer' from the LLM's response"
test "returns a final Prediction struct"
end
describe "Edge Case Handling" do
test "handles an empty list of candidate predictions"
test "handles a list with only one candidate prediction"
end
end
test/dspex/parallel_test.exs
This suite tests the Parallel
executor, focusing on concurrency and result aggregation.
# test/dspex/parallel_test.exs
defmodule DSPEx.ParallelTest do
use ExUnit.Case
alias DSPEx.Parallel
defmodule MockSlowProgram do
@behaviour DSPEx.Program
def forward(_program, %{id: id, sleep: sleep_ms}) do
Process.sleep(sleep_ms)
{:ok, %{result: "Program #{id} done"}}
end
def forward(_program, %{id: id, should_fail: true}) do
Process.sleep(10)
{:error, "Program #{id} failed"}
end
def configure(p, c), do: Map.merge(p, c)
end
describe "Parallel Execution" do
test "executes a list of programs concurrently" do
programs = [
{%MockSlowProgram{}, %{id: 1, sleep: 100}},
{%MockSlowProgram{}, %{id: 2, sleep: 100}},
{%MockSlowProgram{}, %{id: 3, sleep: 100}}
]
# The total time should be slightly more than 100ms, not 300ms.
assert {:ok, _results} = :timer.tc(fn -> Parallel.run(programs) end) |> then(fn {time, res} ->
assert time < 200_000 # Time in microseconds
res
end)
end
test "returns results in the same order as the input programs"
end
describe "Result and Error Handling" do
test "aggregates both :ok and :error tuples from the program executions"
test "a single failing program does not stop the execution of others"
end
end
test/dspex/retriever_test.exs
This suite tests the Retriever
behaviour and its integration into a RAG
program.
# test/dspex/retriever_test.exs
defmodule DSPEx.RetrieverTest do
use ExUnit.Case, async: true
import Mox
# --- Mocks and Test Modules ---
defmodule MockRetriever do
@behaviour DSPEx.Retriever
def forward(_re, "query1"), do: {:ok, [%{text: "Passage A"}, %{text: "Passage B"}]}
def forward(_re, _any_query), do: {:ok, []}
end
defmodule RAG do
@behaviour DSPEx.Program
# A simplified RAG module for testing.
defstruct [:retriever, :predictor]
def forward(%{retriever: r, predictor: p} = _program, inputs) do
with {:ok, passages} <- DSPEx.Retriever.forward(r, inputs.question),
# Format context for the predictor
context = Enum.map_join(passages, "\n", & &1.text),
{:ok, prediction} <- DSPEx.Program.forward(p, Map.put(inputs, :context, context)) do
{:ok, prediction}
else
error -> error
end
end
def configure(p, c), do: Map.merge(p, c)
end
# --- End of Test Modules ---
describe "Retriever Behaviour" do
test "forward/2 takes a query and returns a list of passages"
end
describe "RAG Program Integration" do
setup :verify_on_exit!
test "a RAG program successfully orchestrates retrieval and prediction" do
# Mock the Predictor's forward call to assert it receives the context.
predictor_mock = mock_program()
expect(DSPEx.Program, :forward, fn ^predictor_mock, %{context: context, question: "query1"} ->
assert context == "Passage A\nPassage B"
{:ok, :final_prediction}
end)
rag_program = %RAG{retriever: %MockRetriever{}, predictor: predictor_mock}
assert {:ok, :final_prediction} = DSPEx.Program.forward(rag_program, %{question: "query1"})
end
end
end
test/integration/full_pipeline_test.exs
This suite performs a full end-to-end integration test of a common use-case, like an optimized RAG pipeline.
# test/integration/full_pipeline_test.exs
defmodule DSPEx.Integration.FullPipelineTest do
use ExUnit.Case
# Do not run tests concurrently as they involve multiple components
# and could rely on shared mocks or state.
# This test suite will be more complex, requiring mocks for the client,
# a functional in-memory retriever, and real program/teleprompter modules.
describe "Optimized RAG Pipeline" do
test "BootstrapFewShot can successfully compile a RAG program"
test "the compiled RAG program produces better results than the uncompiled one"
test "the full flow from question -> retrieve -> predict -> evaluate works"
end
describe "Optimized ChainOfThought Pipeline" do
test "BootstrapFewShot can successfully compile a ChainOfThought program"
test "the compiled CoT program generates better rationales and answers"
end
end
test/property/dspex_property_test.exs
This suite uses property-based testing to verify system invariants and find edge cases automatically.
# test/property/dspex_property_test.exs
defmodule DSPEx.PropertyTest do
use ExUnit.Case
use PropCheck
# --- Generators for property testing ---
# Example generator:
let :field_name, do: string(?a..?z, min: 1, max: 10) |> map(&String.to_atom/1)
let :field_list, do: list_of(:field_name) |> uniq()
# ---
describe "Signature Properties" do
property "parsing is robust against unusual but valid spacing"
property "input and output fields of a parsed signature are always disjoint"
end
describe "Example Properties" do
property "for any example, inputs() and labels() are always disjoint"
property "for any example, the keys of inputs() and labels() combined equal all data keys"
end
describe "Adapter Properties" do
# This property checks for a form of round-tripping.
property "parsing a formatted prompt recovers the original data structure"
end
end