Having defined the complete test structure, the final piece of the technical plan is to specify the developer experience: the public-facing documentation and API surface. This ensures the framework is not only powerful and well-tested but also intuitive and easy to adopt.

This document concludes the technical specification by outlining the primary `README.md`, the module-level documentation (`@moduledoc`), and the function-level documentation (`@doc`) that will guide users.
## Document 5: Documentation & Developer Experience
**Objective:** To define the public-facing documentation and API, ensuring that `DSPEx` is approachable, understandable, and a pleasure to use for Elixir developers. This includes a comprehensive `README.md` for project discovery and high-quality inline documentation for `ExDoc`.
#### **1. Project `README.md`**
This is the front door to the project. It must be clear, concise, and provide a compelling “Quick Start” example that demonstrates the core value proposition of the framework.
# DSPEx: A DSPy-inspired Framework for Elixir
**Programming, not just prompting, Large Language Models.**
DSPEx brings the core philosophy of the [DSPy framework](https://github.com/stanfordnlp/dspy) to the Elixir ecosystem. It provides a structured, systematic, and composable way to build and optimize powerful applications on top of Large Language Models (LLMs).
Instead of relying on fragile, hand-tuned prompts, DSPEx allows you to define programs that can learn from data. These programs use `Signatures` to declare their I/O contracts and can be automatically optimized using `Teleprompters` to generate high-quality few-shot examples, turning simple models into powerful, specialized agents.
Built on OTP, `DSPEx` is designed from the ground up to be:
* **Concurrent:** Evaluate and optimize programs against large datasets with maximum efficiency using lightweight Elixir processes.
* **Resilient:** Isolated, stateful clients with circuit breakers and automatic caching prevent cascading failures and reduce redundant API calls.
* **Composable:** Every module, from a simple `Predict` to a complex `ChainOfThought`, adheres to the same `Program` behaviour, allowing them to be swapped, nested, and optimized interchangeably.
---
### Quick Start: Building and Optimizing a Q&A Program
This example demonstrates the end-to-end workflow: define, evaluate, optimize, and re-evaluate.
```elixir
# 1. Define the I/O contract for our program.
defmodule QASignature do
@moduledoc "Answer questions with a short, factual answer."
use DSPEx.Signature, "question -> answer"
end
# 2. Create a small development set for evaluation and optimization.
devset =
  [
    %DSPEx.Example{data: %{question: "What is the capital of France?", answer: "Paris"}},
    %DSPEx.Example{data: %{question: "Who wrote '1984'?", answer: "George Orwell"}}
  ]
  |> Enum.map(&DSPEx.Example.with_inputs(&1, [:question]))
# 3. Define a simple metric to score predictions.
metric_fun = fn example, prediction ->
  if String.trim(example.labels.answer) == String.trim(prediction.outputs.answer),
    do: 1.0,
    else: 0.0
end
# 4. Start the LLM client (e.g., for OpenAI).
# This would typically be in your application's supervision tree.
{:ok, client_pid} = DSPEx.Client.start_link(
  name: :openai_client,
  config: %{
    api_key: System.get_env("OPENAI_API_KEY"),
    model: "gpt-4o-mini",
    base_url: "https://api.openai.com/v1/chat/completions"
  }
)
# 5. Create a simple, un-optimized (zero-shot) program.
uncompiled_program = %DSPEx.Predict{signature: QASignature, client: :openai_client}
# 6. Evaluate its baseline performance.
{:ok, baseline_score, _successes, _failures} =
  DSPEx.Evaluate.run(uncompiled_program, devset, metric_fun)
IO.puts("Baseline Score: #{baseline_score}")
# => Baseline Score: 0.5 (example output, may vary)
# 7. Use a "teleprompter" to optimize the program by creating few-shot examples.
# We use the same program as both student and teacher here for simplicity.
teleprompter = %DSPEx.Teleprompter.BootstrapFewShot{metric: metric_fun, max_demos: 1}
# The `compile` step runs the optimization process.
compiled_program =
  DSPEx.Teleprompter.BootstrapFewShot.compile(teleprompter, uncompiled_program, trainset: devset)
# 8. Evaluate the newly compiled program.
{:ok, compiled_score, _successes, _failures} =
  DSPEx.Evaluate.run(compiled_program, devset, metric_fun)
IO.puts("Compiled Score: #{compiled_score}")
# => Compiled Score: 1.0 (example output)
# The compiled program now has few-shot examples and performs better!
# IO.inspect(compiled_program.demos)
```

### Installation

Add `dspex` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:dspex, "~> 0.1.0"}
  ]
end
```
### Core Concepts
- **Signatures:** Declarative I/O contracts for your programs.
- **Programs:** Executable modules that call LLMs (`Predict`, `ChainOfThought`, etc.).
- **Adapters:** Translate between abstract signatures and concrete LLM API formats (e.g., OpenAI Chat); a sketch follows this list.
- **Teleprompters:** Optimizers that “compile” your programs by generating effective few-shot demonstrations.
- **Evaluation:** A concurrent engine to measure your program’s performance against a dataset.
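As referenced in the Adapters bullet, here is a minimal sketch of what a chat adapter's formatting step could look like. The `DSPEx.Adapter.Chat` module name and the `format/3` signature are illustrative assumptions, not the final API:

```elixir
# Illustrative sketch only; the real adapter contract may differ.
defmodule DSPEx.Adapter.Chat do
  @doc "Formats a signature and inputs into OpenAI-style chat messages."
  def format(signature, inputs, _demos) do
    user_content =
      signature.input_fields()
      |> Enum.map(fn field -> "#{field}: #{Map.fetch!(inputs, field)}" end)
      |> Enum.join("\n")

    {:ok,
     [
       %{role: "system", content: signature.instructions()},
       %{role: "user", content: user_content}
     ]}
  end
end
```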
#### **2. Inline API Documentation (`@moduledoc` and `@doc`)**
This documentation will be written directly in the source code, allowing `ExDoc` to generate a professional and searchable documentation website.
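`ExDoc` itself is wired up in `mix.exs`; a minimal sketch of the likely configuration follows (the `ex_doc` and Elixir version requirements shown are assumptions):

```elixir
# mix.exs -- standard ExDoc wiring; version requirements are illustrative.
def project do
  [
    app: :dspex,
    version: "0.1.0",
    elixir: "~> 1.15",
    deps: deps(),
    docs: [
      main: "readme",
      extras: ["README.md"]
    ]
  ]
end

defp deps do
  [
    {:ex_doc, "~> 0.31", only: :dev, runtime: false}
  ]
end
```

With this in place, `mix docs` renders the `@moduledoc` and `@doc` content below into a searchable HTML site under `doc/`.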
**`lib/dspex/signature.ex`**
```elixir
defmodule DSPEx.Signature do
  @moduledoc """
  Defines the `@behaviour` and `use` macro for creating DSPEx Signatures.

  A signature declaratively defines the inputs and outputs of a DSPEx program.
  Using this macro generates an efficient struct and the necessary functions
  to interact with the framework. All parsing happens at compile time,
  resulting in zero runtime overhead.

  ## Example

      defmodule QASignature do
        @moduledoc "Answer questions with a short, factual answer."
        use DSPEx.Signature, "question -> answer"
      end

      # This creates a module with:
      # - A struct: %QASignature{question: nil, answer: nil}
      # - A type: t :: %QASignature{}
      # - Implementations for all `DSPEx.Signature` callbacks.
  """

  @doc "The instructions for the LLM, derived from the module's `@moduledoc`."
  @callback instructions() :: String.t()

  @doc "A list of atoms representing the signature's input fields."
  @callback input_fields() :: list(atom())

  @doc "A list of atoms representing the signature's output fields."
  @callback output_fields() :: list(atom())

  # ... macro implementation ...
end
```
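As a concrete illustration of the compile-time parsing described in the `@moduledoc`, the `use` macro could be sketched as follows. This is a minimal sketch, assuming the macro receives the signature string as a literal; the generation of `instructions/0` from the caller's `@moduledoc` is deliberately omitted:

```elixir
# Illustrative sketch only; the real macro implementation may differ.
defmacro __using__(signature_string) when is_binary(signature_string) do
  # Split "question -> answer" into input/output field lists at compile time.
  [inputs, outputs] =
    signature_string
    |> String.split("->")
    |> Enum.map(fn side ->
      side
      |> String.split(",")
      |> Enum.map(&(&1 |> String.trim() |> String.to_atom()))
    end)

  quote do
    @behaviour DSPEx.Signature

    defstruct unquote(Enum.map(inputs ++ outputs, &{&1, nil}))

    @impl true
    def input_fields, do: unquote(inputs)

    @impl true
    def output_fields, do: unquote(outputs)

    # instructions/0 (derived from @moduledoc) would be wired up via a
    # @before_compile hook, omitted here for brevity.
  end
end
```

Because all string work happens during macro expansion, the generated functions return precomputed literals, which is what makes the "zero runtime overhead" claim hold.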
**`lib/dspex/program.ex`**

```elixir
defmodule DSPEx.Program do
  @moduledoc """
  Defines the behaviour for all executable DSPEx modules.

  This contract ensures that any program, from a simple `Predict` to a complex
  `ChainOfThought`, can be composed, evaluated, and optimized in a consistent way.
  """

  @doc """
  The primary execution function for the program.

  Takes the program's own struct and a map of inputs (matching the
  signature's input fields) and returns a `DSPEx.Prediction` struct on success.
  """
  @callback forward(program :: struct(), inputs :: map()) ::
              {:ok, DSPEx.Prediction.t()} | {:error, any()}

  @doc """
  Configures a program with new parameters, such as few-shot demos.

  This is the key mechanism for optimization. It must return a *new*, configured
  program struct, preserving immutability. The `config` map can contain
  any parameters the specific program type understands (e.g., `%{demos: [...]}`).
  """
  @callback configure(program :: struct(), config :: map()) :: struct()
end
```
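For illustration, a trivial program satisfying this behaviour might look like the following. `EchoProgram` is hypothetical, and the `%DSPEx.Prediction{outputs: ...}` shape is assumed from the Quick Start example:

```elixir
# Hypothetical program, for illustration only.
defmodule EchoProgram do
  @behaviour DSPEx.Program

  defstruct prefix: "", demos: []

  @impl true
  def forward(%__MODULE__{prefix: prefix}, %{question: question}) do
    {:ok, %DSPEx.Prediction{outputs: %{answer: prefix <> question}}}
  end

  @impl true
  def configure(%__MODULE__{} = program, config) do
    # Return a *new* struct rather than mutating: optimization stays functional.
    struct(program, Map.take(config, [:prefix, :demos]))
  end
end
```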
**`lib/dspex/predict.ex`**

```elixir
defmodule DSPEx.Predict do
  @moduledoc """
  A simple `DSPEx.Program` for single-turn predictions.

  This module orchestrates the fundamental "format -> call -> parse" pipeline
  for a single LLM interaction. It is the simplest and most common program.

  ## Example

      # Assume QASignature and :my_client are defined
      predictor = %DSPEx.Predict{signature: QASignature, client: :my_client}
      DSPEx.Program.forward(predictor, %{question: "What is Elixir?"})
  """
  @behaviour DSPEx.Program

  # ... struct and implementation ...
end
```
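The "format -> call -> parse" pipeline can be sketched as a `with` chain. The `DSPEx.Adapter.format/3`, `DSPEx.Client.request/2`, and `DSPEx.Adapter.parse/2` names here are assumptions standing in for the adapter and client steps:

```elixir
# Sketch of forward/2; the adapter/client function names are assumptions.
def forward(%DSPEx.Predict{signature: signature, client: client} = program, inputs) do
  with {:ok, messages} <- DSPEx.Adapter.format(signature, inputs, program.demos),
       {:ok, response} <- DSPEx.Client.request(client, messages),
       {:ok, outputs} <- DSPEx.Adapter.parse(signature, response) do
    {:ok, %DSPEx.Prediction{outputs: outputs}}
  end
end
```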
**`lib/dspex/evaluate.ex`**

```elixir
defmodule DSPEx.Evaluate do
  @moduledoc """
  A concurrent evaluation engine for DSPEx programs.

  Leverages `Task.async_stream` to run a program against a large development
  set with a high degree of I/O-bound concurrency, providing significant
  speedups over sequential or traditional thread-pool-based approaches.
  """

  @doc """
  Runs a given program against a development set and computes a metric.

  Returns a tuple: `{:ok, aggregate_score, successes, failures}`. Failures from
  individual program executions are collected and do not crash the overall
  evaluation run.

  ## Options

    * `:num_threads` - The maximum number of concurrent evaluation processes.
      Defaults to twice the number of online schedulers.
    * `:display_progress` - Whether to show a CLI progress bar. Defaults to `true`.
  """
  def run(program, devset, metric_fun, opts \\ []) do
    # ... implementation ...
  end
end
```
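A sketch of the concurrent core that `run/4` might be built on is shown below. The `example.inputs` accessor and the aggregation details are assumptions, and progress display is omitted:

```elixir
# Sketch of what the body of run/4 might contain; error handling simplified.
max_concurrency = Keyword.get(opts, :num_threads, System.schedulers_online() * 2)

results =
  devset
  |> Task.async_stream(
    fn example ->
      try do
        case DSPEx.Program.forward(program, example.inputs) do
          {:ok, prediction} -> {:ok, metric_fun.(example, prediction)}
          {:error, reason} -> {:error, reason}
        end
      rescue
        e -> {:error, e}
      end
    end,
    max_concurrency: max_concurrency,
    on_timeout: :kill_task
  )
  |> Enum.map(fn
    {:ok, result} -> result
    {:exit, reason} -> {:error, {:exit, reason}}
  end)

scores = for {:ok, score} <- results, do: score
failures = for {:error, reason} <- results, do: reason
average = if scores == [], do: 0.0, else: Enum.sum(scores) / length(scores)

{:ok, average, length(scores), failures}
```

Because each example runs in its own lightweight process, a crash or timeout in one evaluation surfaces as a failure tuple rather than taking down the whole run.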