# DSPy Compact Tutorial: Understanding DSPy for DSPEx Development

## Overview: What is DSPy?
DSPy is a compiler for Language Model programs, not just a prompting framework. It treats prompts as optimizable artifacts and automatically discovers the best prompting strategies through data-driven optimization.
*(Diagram: manual prompt engineering is hard to scale and has no metrics; automated optimization is reproducible and measurable.)*
## Core DSPy Architecture

**DSPEx Translation:** Your Elixir port follows this exact architecture with BEAM-native implementations:

| DSPEx | DSPy |
| --- | --- |
| `DSPEx.Signature` | `dspy.Signature` |
| `DSPEx.Predict` | `dspy.Predict` |
| `DSPEx.Evaluate` | `dspy.Evaluate` |
| `DSPEx.Teleprompter` | `dspy.teleprompt` |
## 1. Signatures: The Foundation

Signatures define the input/output contract for your program:

```python
# Python DSPy
class QASignature(dspy.Signature):
    """Answer questions with detailed reasoning"""
    question = dspy.InputField()
    context = dspy.InputField()
    answer = dspy.OutputField()
    reasoning = dspy.OutputField()
```

```elixir
# Your DSPEx equivalent
defmodule QASignature do
  @moduledoc "Answer questions with detailed reasoning"
  use DSPEx.Signature, "question, context -> answer, reasoning"
end
```
**Key Insight:** DSPy signatures are more than schemas; they are optimization targets. The teleprompter can modify instructions and field descriptions to improve performance.
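One way to picture this: if the instruction string is plain data on the signature, an optimizer can propose rewrites of it and keep whichever variant scores best. A minimal sketch (the `Signature` class and `propose_variants` helper here are hypothetical, not the real dspy API; real optimizers generate candidates with an LM):

```python
from dataclasses import dataclass, replace

# Hypothetical sketch: a signature is just data, so an optimizer can rewrite it.
@dataclass(frozen=True)
class Signature:
    instructions: str
    inputs: tuple = ("question", "context")
    outputs: tuple = ("answer", "reasoning")

def propose_variants(sig: Signature) -> list:
    # Candidate instruction rewrites (trivial templates for illustration).
    templates = [
        sig.instructions,
        sig.instructions + " Think step by step.",
        sig.instructions + " Cite the context in your reasoning.",
    ]
    return [replace(sig, instructions=t) for t in templates]

base = Signature("Answer questions with detailed reasoning")
variants = propose_variants(base)
```

An optimizer would then evaluate each variant against a metric and keep the winner; the fields themselves never change, only the optimization-targeted text.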
## 2. Programs: Executable Units

```python
# Python DSPy
predictor = dspy.Predict(QASignature)
result = predictor(question="What is OTP?", context="...")
```

```elixir
# Your DSPEx equivalent
program = DSPEx.Predict.new(QASignature, :gemini)
{:ok, result} = DSPEx.Program.forward(program, %{question: "What is OTP?", context: "..."})
```
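Conceptually, a predict step renders the signature's fields into a prompt, calls the LM, and parses the declared outputs back out. A rough sketch of the prompt-rendering half (hypothetical `render_prompt` helper, not the actual dspy or DSPEx internals):

```python
# Hypothetical sketch: turn a signature's fields into a prompt string.
def render_prompt(instructions, inputs, output_fields):
    lines = [instructions, ""]
    for name, value in inputs.items():
        lines.append(f"{name.capitalize()}: {value}")   # filled input fields
    for name in output_fields:
        lines.append(f"{name.capitalize()}:")           # empty output slots for the LM
    return "\n".join(lines)

prompt = render_prompt(
    "Answer questions with detailed reasoning",
    {"question": "What is OTP?", "context": "OTP is Erlang's runtime framework..."},
    ["answer", "reasoning"],
)
```

The key point for optimization: because the prompt is derived mechanically from the signature, changing the signature's instructions or demonstrations changes every prompt the program emits.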
## 3. The Magic: Few-Shot Learning
DSPy’s power comes from automatic few-shot optimization:
*(Diagram: a zero-shot program with poor performance feeds into the teleprompter, which analyzes training examples and uses a teacher model (e.g. GPT-4) to generate high-quality demonstrations; the resulting few-shot program runs on the student model (e.g. GPT-3.5) with much better performance.)*
**Before Optimization:**

```
Question: What is machine learning?
Answer: [Poor, generic response]
```

**After BootstrapFewShot:**

```
Question: What is supervised learning?
Answer: Supervised learning uses labeled training data to learn mappings from inputs to outputs...

Question: What is unsupervised learning?
Answer: Unsupervised learning finds patterns in data without labeled examples...

Question: What is machine learning?
Answer: [Much better, contextual response following the pattern]
```
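Mechanically, few-shot prompting is nothing more than prepending the bootstrapped demonstrations to the prompt before the real question. A minimal sketch (hypothetical `few_shot_prompt` helper, illustrating the format rather than the library's exact rendering):

```python
# Hypothetical sketch: demonstrations are prepended before the actual question.
def few_shot_prompt(demos, question):
    parts = []
    for d in demos:
        parts.append(f"Question: {d['question']}\nAnswer: {d['answer']}")
    parts.append(f"Question: {question}\nAnswer:")  # the LM completes this slot
    return "\n\n".join(parts)

demos = [
    {"question": "What is supervised learning?",
     "answer": "Supervised learning uses labeled training data..."},
    {"question": "What is unsupervised learning?",
     "answer": "Unsupervised learning finds patterns without labels..."},
]
prompt = few_shot_prompt(demos, "What is machine learning?")
```

This is why demonstration quality dominates: the model imitates whatever pattern the demos establish.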
## 4. Evaluation: The Feedback Loop

```python
# Python DSPy
def accuracy_metric(example, prediction):
    return example.answer.lower() == prediction.answer.lower()

evaluate = dspy.Evaluate(devset=test_examples, metric=accuracy_metric)
score = evaluate(my_program)
```

```elixir
# Your DSPEx equivalent
metric_fn = fn example, prediction ->
  expected = DSPEx.Example.get(example, :answer) |> String.downcase()
  actual = Map.get(prediction, :answer) |> String.downcase()
  if expected == actual, do: 1.0, else: 0.0
end

{:ok, result} = DSPEx.Evaluate.run(program, examples, metric_fn)
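Under the hood, evaluation is just "score every example, average the scores", which is what makes it a clean feedback signal for optimizers. A self-contained sketch (the `program` stub and dict-based examples are illustrative, not the real API):

```python
# Sketch of the evaluation loop: score each example, return the mean.
def evaluate(program, examples, metric_fn):
    scores = [metric_fn(ex, program(ex)) for ex in examples]
    return sum(scores) / len(scores) if scores else 0.0

def accuracy_metric(example, prediction):
    return 1.0 if example["answer"].lower() == prediction["answer"].lower() else 0.0

# A stub "program" that always gives the same answer:
program = lambda ex: {"answer": "42"}
examples = [
    {"question": "q1", "answer": "42"},
    {"question": "q2", "answer": "43"},
]
score = evaluate(program, examples, accuracy_metric)  # 0.5
```

Because the score is a single number, a teleprompter can compare candidate programs directly and keep whichever scores higher.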
## 5. Teleprompters: The Optimizers

The most important teleprompter is BootstrapFewShot:

```python
# Python DSPy workflow
teacher = dspy.Predict(QASignature, lm=dspy.OpenAI(model="gpt-4"))
student = dspy.Predict(QASignature, lm=dspy.OpenAI(model="gpt-3.5-turbo"))

teleprompter = dspy.BootstrapFewShot(metric=accuracy_metric)
optimized_student = teleprompter.compile(student, teacher=teacher, trainset=train_examples)
```

```elixir
# Your DSPEx equivalent
teacher = DSPEx.Predict.new(QASignature, :openai_gpt4)
student = DSPEx.Predict.new(QASignature, :gemini_flash)

{:ok, optimized_student} = DSPEx.Teleprompter.BootstrapFewShot.compile(
  student, teacher, train_examples, metric_fn
)
```
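The core loop of bootstrapping is simple: run the teacher over the training set, keep only the traces the metric accepts, and attach those as demonstrations to the student. A minimal sketch under those assumptions (programs modeled as plain callables/dicts; the real implementation also handles retries, sampling, and multi-step traces):

```python
# Sketch of BootstrapFewShot's core loop.
def bootstrap_few_shot(student, teacher, trainset, metric_fn, max_demos=4):
    demos = []
    for example in trainset:
        prediction = teacher(example)
        if metric_fn(example, prediction) >= 1.0:        # teacher got it right
            demos.append({**example, **prediction})      # keep as a demonstration
        if len(demos) >= max_demos:
            break
    return {**student, "demos": demos}                   # student with demos attached

# Stub usage: a "perfect" teacher that always answers correctly.
teacher = lambda ex: {"answer": ex["answer"]}
metric = lambda ex, pred: 1.0 if ex["answer"] == pred["answer"] else 0.0
trainset = [{"question": "q1", "answer": "a1"}, {"question": "q2", "answer": "a2"}]
optimized = bootstrap_few_shot({"model": "cheap"}, teacher, trainset, metric)
```

Note the asymmetry: the expensive teacher runs only during compilation; at inference time, only the cheap student (now carrying good demonstrations) is called.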
## 6. Complete DSPy Workflow

*(Diagram: 1. define a signature with input/output fields; 2. create a student program on a cheap model and a teacher program on an expensive model; 3. prepare training examples (input + expected output), a validation set, and a metric function; 4. the BootstrapFewShot teleprompter uses the teacher to create demonstrations, producing an optimized student; 5. evaluate the optimized program on the validation set to get performance metrics.)*
## 7. DSPEx Advantages: BEAM-Native Benefits

Your Elixir port provides significant architectural advantages:

| Python DSPy Limitation | DSPEx BEAM Advantage |
| --- | --- |
| GIL-limited concurrency | Process-based concurrency (10,000+ concurrent evals) |
| Manual error handling; process crashes | Fault tolerance via OTP supervision |
| External monitoring required | Built-in telemetry, native observability |
**Concrete Performance Example:**

```elixir
# DSPEx can handle massive concurrent evaluation
DSPEx.Evaluate.run(program, examples_10k, metric_fn,
  max_concurrency: 1000)  # 1000 concurrent processes!

# Python DSPy is limited by threads and the GIL:
# much slower and more memory intensive at this scale.
```
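For contrast, here is what bounded-concurrency evaluation looks like on the Python side, sketched with asyncio and a semaphore as the rough analog of `max_concurrency` (the `fake_program` stub and function names are illustrative; real evaluation would await an LM API client):

```python
import asyncio

# Sketch: bounded-concurrency evaluation with asyncio + a semaphore.
async def evaluate_concurrent(program, examples, metric_fn, max_concurrency=100):
    sem = asyncio.Semaphore(max_concurrency)

    async def score(ex):
        async with sem:                      # at most max_concurrency in flight
            prediction = await program(ex)   # usually an I/O-bound LM API call
            return metric_fn(ex, prediction)

    scores = await asyncio.gather(*(score(ex) for ex in examples))
    return sum(scores) / len(scores)

async def fake_program(ex):                  # stub standing in for an LM call
    await asyncio.sleep(0)
    return {"answer": ex["answer"]}

metric = lambda ex, pred: 1.0 if ex["answer"] == pred["answer"] else 0.0
examples = [{"answer": str(i)} for i in range(10)]
score = asyncio.run(evaluate_concurrent(fake_program, examples, metric))  # 1.0
```

This works for I/O-bound calls, but each in-flight task shares one event loop and one interpreter; BEAM processes are isolated and preemptively scheduled, which is where the fault-tolerance and scaling advantages come from.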
## 8. Key Takeaways for DSPEx Development

- **Signatures are Optimization Targets:** not just type definitions, but contracts that teleprompters can modify.
- **Demonstrations are Everything:** the core value is automatic few-shot learning through bootstrapping.
- **Evaluation Drives Optimization:** metrics provide the feedback signal for improvement.
- **Concurrency is Critical:** evaluation is I/O bound, and BEAM's process model is a perfect fit.
- **Fault Tolerance Matters:** long-running optimization jobs need resilience.
## 9. Implementation Priority Map
Based on your current DSPEx status, focus on:
Your DSPEx implementation already captures the core DSPy value proposition while leveraging BEAM's unique strengths for concurrency and fault tolerance. The foundation is solid for advanced features that Python DSPy cannot easily achieve.