← Back to Gap analysis

DSPEX GAP ANALYSIS CORE 01

Documentation for DSPEX_GAP_ANALYSIS_CORE_01 from the Ds ex repository.

DSPEx Gap Analysis: Core Teleprompter Components

Overview

This document analyzes the gaps in DSPEx’s teleprompter/optimizer implementations compared to the original DSPy Python library. The analysis focuses on core algorithmic components that are either missing or incomplete.


🎯 SIMBA Teleprompter: Critical Algorithmic Gaps

Status: 60% Complete - Major Algorithmic Components Missing

Based on analysis of dspy/teleprompt/simba.py, DSPEx’s SIMBA implementation has excellent infrastructure but critical algorithmic gaps.

1. Program Selection Algorithm (CRITICAL BLOCKING ISSUE)

Python DSPy Implementation:

def calc_average_score(prog_idx: int) -> float:
    scores = program_scores.get(prog_idx, [])
    if not scores:
        return 0.0
    return sum(scores) / len(scores)

def softmax_sample(rng_obj: random.Random, program_idxs: list[int], temperature: float) -> int:
    # Unnormalized weights
    scores = [calc_average_score(idx) for idx in program_idxs]  # ✅ REAL SCORES
    exps = [np.exp(s / temperature) for s in scores]
    sum_exps = sum(exps)
    if sum_exps <= 0:
        return rng_obj.choice(program_idxs)
    
    # Weighted random choice
    probs = [val / sum_exps for val in exps]
    return rng_obj.choices(program_idxs, weights=probs, k=1)[0]

DSPEx Current (BROKEN):

defp softmax_sample(program_indices, _all_programs, temperature) do
  if is_list(program_indices) and length(program_indices) > 0 do
    scores = Enum.map(program_indices, fn _idx -> 0.5 end)  # ❌ FIXED SCORES!
    # This completely breaks the optimization algorithm
  end
end

Required Fix:

defp calculate_average_score(program_scores, prog_idx) do
  scores = Map.get(program_scores, prog_idx, [])
  if Enum.empty?(scores) do
    0.0
  else
    Enum.sum(scores) / length(scores)
  end
end

defp softmax_sample(program_indices, program_scores, temperature) do
  scores = Enum.map(program_indices, fn idx -> 
    calculate_average_score(program_scores, idx)
  end)
  
  # Calculate exponentials
  exps = Enum.map(scores, fn score -> 
    :math.exp(score / temperature)
  end)
  
  sum_exps = Enum.sum(exps)
  
  if sum_exps <= 0 do
    Enum.random(program_indices)
  else
    # Weighted random selection
    probs = Enum.map(exps, fn exp -> exp / sum_exps end)
    weighted_random_choice(program_indices, probs)
  end
end

2. Program Pool Management (CRITICAL)

Python DSPy Implementation:

def top_k_plus_baseline(k: int) -> list[int]:
    # Sort all programs by descending average score
    scored_programs = sorted(programs, key=lambda p: calc_average_score(p.simba_idx), reverse=True)
    top_k = [p.simba_idx for p in scored_programs[:k]]
    # Ensure baseline=0 is in there:
    if 0 not in top_k and len(top_k) > 0:
        top_k[-1] = 0
    return list(dict.fromkeys(top_k))

DSPEx Status:MISSING ENTIRELY

Required Implementation:

defp select_top_programs_with_baseline(programs, program_scores, k) do
  # Calculate average scores for all programs
  program_avg_scores = programs
  |> Enum.with_index()
  |> Enum.map(fn {_program, idx} ->
    avg_score = calculate_average_score(program_scores, idx)
    {idx, avg_score}
  end)
  |> Enum.sort_by(fn {_idx, score} -> score end, :desc)
  
  # Take top k
  top_k_indices = program_avg_scores
  |> Enum.take(k)
  |> Enum.map(fn {idx, _score} -> idx end)
  
  # Ensure baseline (index 0) is included
  if 0 not in top_k_indices and length(top_k_indices) > 0 do
    # Replace last element with baseline
    List.replace_at(top_k_indices, -1, 0)
  else
    top_k_indices
  end
  |> Enum.uniq()
end

3. Main Optimization Loop Logic (INCOMPLETE)

Python DSPy Key Steps:

  1. Mini-batch Selection ✅ DSPEx Complete
  2. Model Preparation ✅ DSPEx Complete
  3. Program Selection ❌ DSPEx Broken (fixed scores)
  4. Trajectory Sampling ⚠️ DSPEx Over-complex but functional
  5. Bucket Analysis ✅ DSPEx Complete and Superior
  6. Strategy Application ⚠️ DSPEx Partial (only AppendDemo)
  7. Candidate Evaluation ✅ DSPEx Complete
  8. Program Registration ❌ DSPEx Missing logic
  9. Winning Program Selection ⚠️ DSPEx Simplified

Missing Integration: The DSPEx loop exists but doesn’t properly integrate these steps with real scoring and selection logic.

4. Strategy System Gaps

Python DSPy Strategies:

if self.max_demos > 0:
    self.strategies = [append_a_demo(demo_input_field_maxlen), append_a_rule]
else:
    self.strategies = [append_a_rule]

# Random strategy selection
strategy = rng.choice(self.strategies)

DSPEx Status:

  • ✅ AppendDemo strategy complete with Poisson sampling
  • ❌ AppendRule strategy missing entirely
  • ❌ Strategy selection logic incomplete

Required AppendRule Implementation:

defmodule DSPEx.Teleprompter.SIMBA.Strategy.AppendRule do
  @behaviour DSPEx.Teleprompter.SIMBA.Strategy
  
  @impl true
  def apply(bucket, program, _config, _context) do
    # Extract high-performing trajectories
    high_performance_trajectories = bucket.trajectories
    |> Enum.filter(&(&1.score > bucket.avg_score))
    |> Enum.take(3)
    
    if Enum.empty?(high_performance_trajectories) do
      {:ok, program}
    else
      # Generate rule from patterns in high-performing trajectories
      rule = generate_rule_from_trajectories(high_performance_trajectories)
      enhanced_program = add_rule_to_program(program, rule)
      {:ok, enhanced_program}
    end
  end
  
  defp generate_rule_from_trajectories(trajectories) do
    # Analyze common patterns in successful trajectories
    # This would use LLM to generate rules based on successful examples
    "Based on successful examples, ensure you follow this pattern: ..."
  end
  
  defp add_rule_to_program(program, rule) do
    # Add the generated rule to the program's instructions
    # Implementation depends on program structure
    program
  end
end

🔧 Other Teleprompters: Missing Implementations

1. MIPRO (Multi-prompt Instruction Proposal) - MISSING

Status: ❌ Not implemented in DSPEx

Python DSPy Features:

  • Multi-step instruction optimization
  • Automatic prompt proposal and refinement
  • Bayesian optimization for hyperparameters
  • Support for multiple signature optimization

Implementation Priority: HIGH (widely used optimizer)

2. COPRO (Curriculum-based Prompt Optimizer) - MISSING

Status: ❌ Not implemented in DSPEx

Python DSPy Features:

  • Curriculum learning approach
  • Progressive difficulty increase
  • Adaptive example selection
  • Meta-learning components

Implementation Priority: MEDIUM

3. Ensemble - MISSING

Status: ❌ Not implemented in DSPEx

Python DSPy Features:

  • Multiple model combination
  • Voting mechanisms
  • Confidence-weighted averaging
  • Diversity-based selection

Implementation Priority: MEDIUM

4. BootstrapFinetune - MISSING

Status: ❌ Not implemented in DSPEx

Python DSPy Features:

  • Model fine-tuning on generated data
  • Bootstrap data generation
  • Training pipeline integration
  • Model adapter support

Implementation Priority: LOW (requires ML training infrastructure)


🧩 Core Predict Modules: Missing Advanced Patterns

1. ChainOfThought - MISSING

Status: ❌ Not implemented in DSPEx

Python DSPy Implementation:

class ChainOfThought(Module):
    def __init__(self, signature, rationale_type=None, activated=True, **config):
        self.signature = signature
        self.activated = activated
        self.predict = Predict(signature, **config)
        
    def forward(self, **kwargs):
        # Add rationale to signature
        signature = self.signature.with_instructions("Let's think step by step.")
        return self.predict(signature=signature, **kwargs)

Required DSPEx Implementation:

defmodule DSPEx.Predict.ChainOfThought do
  @moduledoc """
  Chain of Thought reasoning module that adds step-by-step thinking.
  """
  
  use DSPEx.Program
  
  defstruct [:signature, :predict, :activated]
  
  def new(signature, opts \\ []) do
    activated = Keyword.get(opts, :activated, true)
    
    # Enhance signature with chain of thought instructions
    enhanced_signature = add_rationale_to_signature(signature)
    predict = DSPEx.Predict.new(enhanced_signature, opts)
    
    %__MODULE__{
      signature: signature,
      predict: predict,
      activated: activated
    }
  end
  
  @impl DSPEx.Program
  def forward(program, inputs, opts \\ []) do
    if program.activated do
      DSPEx.Program.forward(program.predict, inputs, opts)
    else
      # Fallback to basic predict without rationale
      basic_predict = DSPEx.Predict.new(program.signature, opts)
      DSPEx.Program.forward(basic_predict, inputs, opts)
    end
  end
  
  defp add_rationale_to_signature(signature) do
    # Add thinking/rationale field to signature
    # Implementation depends on signature system
    signature
  end
end

2. ReAct (Reason + Act) - MISSING

Status: ❌ Not implemented in DSPEx

Features Needed:

  • Thought-action-observation loops
  • Tool integration
  • Multi-step reasoning
  • Action space definition

3. ProgramOfThought - MISSING

Status: ❌ Not implemented in DSPEx

Features Needed:

  • Code generation capabilities
  • Code execution environment
  • Result integration
  • Error handling for code execution

4. MultiChainComparison - MISSING

Status: ❌ Not implemented in DSPEx

Features Needed:

  • Multiple reasoning chains
  • Comparison mechanisms
  • Best chain selection
  • Confidence scoring

📊 Assessment and Assertions System: Completely Missing

Status: ❌ Not implemented in DSPEx

Python DSPy Features:

import dspy

def basic_qa_assertion(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    if answer_EM < 0.5:
        dspy.Suggest(
            f"The predicted answer is {pred.answer}. However, the gold answer is {example.answer}.",
            target_module="generate_answer"
        )
    return answer_EM

# Usage in programs
with dspy.context(lm=lm):
    with dspy.settings.context(assert_=basic_qa_assertion):
        pred = program(question="What is the capital of France?")

Critical Missing Components:

  1. dspy.Assert() - Hard constraints with retry logic
  2. dspy.Suggest() - Soft hints for improvement
  3. Context management - Assertion integration
  4. Backtracking - Retry with constraints
  5. Constraint satisfaction - Runtime validation

Implementation Priority: HIGH (core to DSPy’s self-improving nature)


🔍 Retrieval Models: Major Gap

Status: ❌ Completely missing from DSPEx

Python DSPy Retrieval Ecosystem:

  • ColBERTv2 integration
  • Vector database connectors (Pinecone, Weaviate, ChromaDB, etc.)
  • Embedding models
  • Retrieval-augmented generation (RAG) support

Missing Components:

  1. DSPEx.Retrieve behavior and base module
  2. Vector database integrations
  3. Embedding model support
  4. RAG pipeline components

This is the largest functional gap - DSPy is heavily used for RAG applications.


🏗️ Infrastructure Gaps

1. Parallel Execution

Python DSPy:

run_parallel = dspy.Parallel(access_examples=False, num_threads=self.num_threads)
outputs = run_parallel(exec_pairs)

DSPEx Status: ✅ Has superior Task.async_stream implementation

2. Model/LM System

Python DSPy: Extensive LiteLLM integration with 100+ models DSPEx Status: ⚠️ Basic client system, limited providers

3. Caching and Persistence

Python DSPy: Built-in caching with various backends DSPEx Status: ⚠️ Basic caching, needs enhancement


🎯 Implementation Priority Matrix

ComponentPriorityEffortImpactStatus
SIMBA Program SelectionCRITICALLOWHIGHBlocking
SIMBA Program Pool MgmtCRITICALMEDIUMHIGHMissing
AppendRule StrategyHIGHMEDIUMMEDIUMMissing
ChainOfThoughtHIGHMEDIUMHIGHMissing
Assert/Suggest SystemHIGHHIGHHIGHMissing
Retrieval ModelsHIGHHIGHVERY HIGHMissing
MIPRO OptimizerMEDIUMHIGHMEDIUMMissing
ReAct ModuleMEDIUMHIGHMEDIUMMissing
Model Provider SupportMEDIUMMEDIUMMEDIUMPartial

📋 Next Steps

  1. Fix SIMBA critical algorithmic gaps (blocking issues)
  2. Implement missing core predict modules (ChainOfThought, ReAct)
  3. Build retrieval model system (enables RAG use cases)
  4. Add assertions and suggestions (core DSPy feature)
  5. Expand teleprompter options (MIPRO, Ensemble)

This analysis shows DSPEx has excellent infrastructure but needs significant algorithmic completion to match DSPy’s functionality.