DSPEX CORE GAPS

Documentation for DSPEX_CORE_GAPS from the Ds ex repository.

DSPEx Core Components Gap Analysis

Executive Summary

This document provides a comprehensive analysis of gaps between the implemented DSPEx core components and their DSPy Python counterparts. The analysis focuses exclusively on components that exist in DSPEx and identifies what needs to be enhanced, fixed, or completed to achieve feature parity with DSPy.

Overall Assessment: DSPEx has established a solid foundation with excellent Elixir/OTP patterns, but has significant gaps in core functionality, advanced reasoning modules, dynamic capabilities, and ecosystem completeness.


🏗️ Core Architecture Gaps

1. Signature System: Missing Dynamic Capabilities

DSPEx Implementation Status: ✅ Good foundation with enhanced parser
Critical Gaps:

1.1 Dynamic Signature Creation (CRITICAL GAP)

  • DSPy Feature: Uses Pydantic’s create_model() for runtime signature generation
  • DSPEx Gap: Only supports compile-time signature definition via macros
  • Impact: Cannot adapt signatures dynamically for optimization or API-driven workflows
  • Required Fix:
    # Need to implement runtime signature generation
    DSPEx.Signature.create_dynamic(
      "DynamicSignature",
      inputs: [:query, :context],
      outputs: [:answer, :confidence],
      instructions: "Answer with high confidence"
    )
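One possible route to runtime generation in Elixir is `Module.create/3`, which compiles a quoted module body while the system is running. The sketch below is an illustration only: `RuntimeSignature.create/2` and the generated accessor functions are hypothetical names, not existing DSPEx API.

```elixir
defmodule RuntimeSignature do
  @moduledoc "Hypothetical sketch: builds a signature-like module at runtime."

  # `name` is a module atom such as DynamicSignature; `opts` mirror the call above.
  def create(name, opts) do
    inputs = Keyword.fetch!(opts, :inputs)
    outputs = Keyword.fetch!(opts, :outputs)
    instructions = Keyword.get(opts, :instructions, "")

    # The field lists and instructions are injected into the quoted body as literals.
    body =
      quote do
        def input_fields, do: unquote(inputs)
        def output_fields, do: unquote(outputs)
        def fields, do: unquote(inputs ++ outputs)
        def instructions, do: unquote(instructions)
      end

    # Module.create/3 compiles the body and returns {:module, name, bytecode, result}.
    {:module, module, _bytecode, _result} =
      Module.create(name, body, Macro.Env.location(__ENV__))

    module
  end
end

sig =
  RuntimeSignature.create(DynamicSignature,
    inputs: [:query, :context],
    outputs: [:answer, :confidence],
    instructions: "Answer with high confidence"
  )

IO.inspect(sig.fields())
```

Module names are atoms, and atoms are never garbage collected, so a design along these lines should avoid deriving module names from unbounded or untrusted input.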
    

1.2 Signature Modification and Extension

  • DSPy Feature: signature.with_updated_fields(), signature.append(), signature.prepend()
  • DSPEx Status: ⚠️ Limited - only basic extend/2 function
  • Gap: Cannot dynamically modify existing signatures at runtime
  • Required Enhancement:
    # Missing signature manipulation API
    extended_sig = DSPEx.Signature.with_updated_fields(base_sig, %{
      rationale: %{type: :string, desc: "Step by step reasoning"}
    })
    

1.3 Signature Introspection and Metadata

  • DSPy Feature: Rich introspection with field types, descriptions, constraints
  • DSPEx Gap: Basic introspection, missing field metadata and type information
  • Impact: Limited ability for teleprompters to understand and modify signatures

2. Program System: Missing Module Composition

DSPEx Implementation Status: ✅ Good Program behavior and forward/3 API
Critical Gaps:

2.1 Module Discovery and Introspection

  • DSPy Feature: named_predictors() automatically discovers sub-modules
  • DSPEx Gap: No automatic discovery of nested programs
  • Impact: Cannot automatically optimize complex composed programs
  • Required Implementation:
    defmodule DSPEx.Program do
      def named_predictors(program) do
        # Discover all DSPEx.Program structs within a program
        # Essential for teleprompter optimization
      end
    end
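A minimal sketch of what such discovery could look like, assuming programs are plain structs whose fields may hold other program structs. `Predictor`, `Pipeline`, and `Discovery` are hypothetical stand-ins, not DSPEx modules, and the walk does not descend into lists or maps.

```elixir
defmodule Predictor do
  # Hypothetical leaf program; stands in for a predictor struct.
  defstruct [:signature]
end

defmodule Pipeline do
  # Hypothetical composed program: two predictors plus plain configuration.
  defstruct [:retrieve, :answer, :max_hops]
end

defmodule Discovery do
  # Returns [{dotted_path, predictor}] for every Predictor nested in `program`.
  def named_predictors(program), do: walk(program, [])

  defp walk(%Predictor{} = predictor, path) do
    [{path |> Enum.reverse() |> Enum.join("."), predictor}]
  end

  # Any other struct: recurse into its fields in a stable (sorted) order.
  defp walk(%_{} = program, path) do
    program
    |> Map.from_struct()
    |> Enum.sort_by(fn {key, _value} -> key end)
    |> Enum.flat_map(fn {key, value} -> walk(value, [key | path]) end)
  end

  # Non-struct values (numbers, strings, nil) contain no predictors.
  defp walk(_other, _path), do: []
end

# struct!/2 builds the structs at runtime, so the example also runs as one script.
program =
  struct!(Pipeline,
    retrieve: struct!(Predictor, signature: "question -> passages"),
    answer: struct!(Predictor, signature: "question, passages -> answer"),
    max_hops: 2
  )

IO.inspect(Discovery.named_predictors(program))
```

Returning a dotted path with each predictor would let an optimizer report which sub-program it changed and write an updated predictor back to the same location.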
    

2.2 Program State Management

  • DSPy Feature: dump_state() and load_state() for persistence
  • DSPEx Gap: No built-in state serialization
  • Impact: Cannot save/restore optimized programs
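As a rough illustration, state persistence can be built on Erlang's external term format. The sketch assumes a program's optimizable state (instructions and demos) is plain data; `ProgramState` and its functions are hypothetical names.

```elixir
defmodule ProgramState do
  # Serializes any plain-data term (maps, lists, structs) to a binary.
  def dump_state(program), do: :erlang.term_to_binary(program)

  # The :safe option refuses binaries that would create new atoms while decoding.
  def load_state(binary) when is_binary(binary) do
    :erlang.binary_to_term(binary, [:safe])
  end

  def save(program, path), do: File.write!(path, dump_state(program))
  def load(path), do: path |> File.read!() |> load_state()
end

program = %{
  instructions: "Answer briefly",
  demos: [%{question: "2+2?", answer: "4"}]
}

restored = program |> ProgramState.dump_state() |> ProgramState.load_state()
IO.inspect(restored == program)
```

State that contains anonymous functions or process identifiers does not round-trip portably this way, so a real implementation would likely serialize only the demos, instructions, and configuration.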

2.3 Parallel Execution Patterns

  • DSPy Feature: program.batch() for parallel execution
  • DSPEx Status: ⚠️ Has Task.async_stream but no standardized interface
  • Gap: Missing high-level parallel execution API
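A standardized interface could be a thin wrapper over `Task.async_stream/3`. In this sketch `Batch.batch/3` is a hypothetical name, and `run_fun` stands in for a call to a program's forward function.

```elixir
defmodule Batch do
  # `run_fun` takes one inputs map and returns {:ok, outputs} or {:error, reason}.
  def batch(run_fun, inputs_list, opts \\ []) do
    max_concurrency = Keyword.get(opts, :max_concurrency, System.schedulers_online())
    timeout = Keyword.get(opts, :timeout, 30_000)

    inputs_list
    |> Task.async_stream(run_fun,
      max_concurrency: max_concurrency,
      timeout: timeout,
      # Kill slow tasks and report them instead of exiting the caller.
      on_timeout: :kill_task,
      ordered: true
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, {:task_exit, reason}}
    end)
  end
end

results =
  Batch.batch(
    fn %{n: n} -> {:ok, %{square: n * n}} end,
    [%{n: 1}, %{n: 2}, %{n: 3}]
  )

IO.inspect(results)
```

Tasks started this way are linked to the caller, so a crash inside `run_fun` still propagates unless the function itself converts failures into `{:error, reason}` tuples.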

3. Predict Module: Missing Advanced Prediction Patterns

DSPEx Implementation Status: ✅ Good core prediction with SIMBA support
Critical Gaps:

3.1 Demo Management System

  • DSPy Feature: Automatic demo selection, demo validation, demo weighting
  • DSPEx Gap: Basic demo storage, no intelligent demo management
  • Impact: Suboptimal few-shot learning performance
  • Required Enhancement:
    defmodule DSPEx.Predict.DemoManager do
      def select_relevant_demos(program, inputs, k) do
        # Intelligent demo selection based on similarity
      end
    
      def validate_demo_quality(demo, signature) do
        # Demo quality assessment
      end
    end
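To make "selection based on similarity" concrete, here is a small sketch that ranks demos by Jaccard overlap between word sets. It assumes demos and inputs are maps with a `:question` string. `DemoSelection` is a hypothetical module, and word overlap is only a stand-in for whatever similarity measure (embeddings, for example) a real implementation would use.

```elixir
defmodule DemoSelection do
  # Returns the `k` demos whose questions share the most words with the input.
  def select_relevant_demos(demos, inputs, k) do
    query = tokens(inputs.question)

    demos
    |> Enum.sort_by(fn demo -> jaccard(query, tokens(demo.question)) end, :desc)
    |> Enum.take(k)
  end

  # Lowercased set of alphanumeric words.
  defp tokens(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> MapSet.new()
  end

  # Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets.
  defp jaccard(a, b) do
    union = MapSet.size(MapSet.union(a, b))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(a, b)) / union
  end
end

demos = [
  %{question: "What is the capital of France?", answer: "Paris"},
  %{question: "Who wrote Hamlet?", answer: "Shakespeare"},
  %{question: "What is the capital of Spain?", answer: "Madrid"}
]

selected =
  DemoSelection.select_relevant_demos(demos, %{question: "What is the capital of Italy?"}, 2)

IO.inspect(Enum.map(selected, & &1.answer))
```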
    

3.2 Trace Collection and Analysis

  • DSPy Feature: Automatic trace collection for optimization
  • DSPEx Gap: No trace collection system
  • Impact: Teleprompters cannot analyze program execution patterns

3.3 Advanced Output Parsing

  • DSPy Feature: Structured output parsing with multiple formats
  • DSPEx Gap: Basic string-based parsing only
  • Impact: Limited ability to handle complex outputs

🧠 Advanced Reasoning Modules: Major Missing Components

4. Chain of Thought (CoT) - COMPLETELY MISSING

Status: ❌ Not implemented
Priority: CRITICAL (most widely used DSPy pattern)

DSPy Implementation:

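from dspy import Module, OutputField, Predict

class ChainOfThought(Module):
    def __init__(self, signature):
        super().__init__()
        # Simplified: prepend() adds the new field; with_updated_fields() only edits existing ones.
        self.signature = signature.prepend(
            "rationale", OutputField(desc="Let's think step by step."), type_=str
        )
        self.predict = Predict(self.signature)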

Required DSPEx Implementation:

defmodule DSPEx.Predict.ChainOfThought do
  use DSPEx.Program
  
  defstruct [:original_signature, :enhanced_signature, :predict, :activated]
  
  def new(signature, opts \\ []) do
    # Extend signature with rationale field
    enhanced_signature = DSPEx.Signature.extend(signature, %{
      rationale: %{type: :string, desc: "Let's think step by step."}
    })
    
    %__MODULE__{
      original_signature: signature,
      enhanced_signature: enhanced_signature,
      predict: DSPEx.Predict.new(enhanced_signature, opts),
      activated: Keyword.get(opts, :activated, true)
    }
  end
  
  @impl DSPEx.Program
  def forward(program, inputs, opts \\ []) do
    # Implementation needed
  end
end

5. ReAct (Reason + Act) - COMPLETELY MISSING

Status: ❌ Not implemented
Priority: HIGH (critical for agent applications)

Required Components:

  • Tool integration framework
  • Action-observation loops
  • Multi-step reasoning chains
  • Error handling and recovery
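The four components can be pictured as one small loop. The sketch below is an assumption-laden illustration, not a proposed DSPEx design: `ReactLoop` is a hypothetical module, `policy` stands in for the LLM call that decides the next step, and tools are plain one-argument functions.

```elixir
defmodule ReactLoop do
  # `policy` receives the trajectory so far and returns
  # {:act, tool_name, arg} or {:finish, answer}.
  def run(policy, tools, question, max_steps \\ 5) do
    loop(policy, tools, [{:question, question}], max_steps)
  end

  # Step budget exhausted: stop instead of looping forever.
  defp loop(_policy, _tools, trajectory, 0) do
    {:error, :max_steps_reached, Enum.reverse(trajectory)}
  end

  defp loop(policy, tools, trajectory, steps_left) do
    case policy.(Enum.reverse(trajectory)) do
      {:finish, answer} ->
        {:ok, answer, Enum.reverse(trajectory)}

      {:act, tool_name, arg} ->
        observation =
          case Map.fetch(tools, tool_name) do
            {:ok, tool} -> safe_call(tool, arg)
            :error -> {:error, {:unknown_tool, tool_name}}
          end

        # The observation (success or failure) is fed back to the policy.
        loop(policy, tools, [{:observation, tool_name, observation} | trajectory], steps_left - 1)
    end
  end

  # Tool failures become observations rather than crashing the loop.
  defp safe_call(tool, arg) do
    {:ok, tool.(arg)}
  rescue
    error -> {:error, Exception.message(error)}
  end
end

tools = %{add: fn {a, b} -> a + b end}

# Scripted policy standing in for the model: call :add once, then finish.
policy = fn trajectory ->
  case List.last(trajectory) do
    {:question, _} -> {:act, :add, {2, 3}}
    {:observation, :add, {:ok, sum}} -> {:finish, "The sum is #{sum}"}
  end
end

IO.inspect(ReactLoop.run(policy, tools, "What is 2 + 3?"))
```

Recording failed tool calls as observations gives the policy a chance to recover on the next step, which is one way to read the "error handling and recovery" requirement.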

6. Multi-Chain Comparison - COMPLETELY MISSING

Status: ❌ Not implemented
Priority: MEDIUM

DSPy Feature: Generate multiple reasoning chains and select the best
DSPEx Need: Parallel reasoning with comparison logic


🔧 Teleprompter System: Algorithmic Gaps

7. SIMBA Teleprompter: Critical Algorithmic Issues

DSPEx Implementation Status: ⚠️ 60% Complete - Has infrastructure but broken core algorithms

7.1 Program Selection Algorithm (BLOCKING ISSUE)

Current DSPEx Bug:

# BROKEN: Uses fixed scores instead of real performance scores
defp softmax_sample(program_indices, _all_programs, temperature) do
  scores = Enum.map(program_indices, fn _idx -> 0.5 end)  # ❌ FIXED SCORES!
  # This completely breaks optimization
end

Required Fix:

defp softmax_sample(program_indices, program_scores, temperature) do
  scores = Enum.map(program_indices, fn idx -> 
    calculate_average_score(program_scores, idx)  # ✅ REAL SCORES
  end)
  # Proper softmax sampling implementation
end
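For reference, a self-contained sketch of temperature-scaled softmax sampling follows. `SoftmaxSampling` is a hypothetical module name, and the scores are assumed to be the per-program averages. With identical scores for every program, as in the fixed 0.5 case above, the resulting distribution is uniform, so selection ignores performance entirely.

```elixir
defmodule SoftmaxSampling do
  # Turns raw scores into probabilities; a lower temperature sharpens the distribution.
  def probabilities(scores, temperature) when temperature > 0 do
    max_score = Enum.max(scores)
    # Subtracting the maximum keeps :math.exp/1 from overflowing on large scores.
    weights = Enum.map(scores, fn s -> :math.exp((s - max_score) / temperature) end)
    total = Enum.sum(weights)
    Enum.map(weights, &(&1 / total))
  end

  # Picks one element of `indices` with probability softmax(scores / temperature).
  # `rand` is injectable so the draw can be made deterministic in tests.
  def sample(indices, scores, temperature, rand \\ :rand.uniform()) do
    indices
    |> Enum.zip(probabilities(scores, temperature))
    |> Enum.reduce_while(0.0, fn {idx, p}, cumulative ->
      if rand < cumulative + p, do: {:halt, {:picked, idx}}, else: {:cont, cumulative + p}
    end)
    |> case do
      {:picked, idx} -> idx
      # Floating-point rounding can leave the cumulative sum just below 1.0.
      _ -> List.last(indices)
    end
  end
end

IO.inspect(SoftmaxSampling.probabilities([0.2, 0.8], 1.0))
IO.inspect(SoftmaxSampling.sample([10, 20], [0.2, 0.8], 0.5))
```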

7.2 Program Pool Management (MISSING)

DSPy Feature: top_k_plus_baseline() for program selection
DSPEx Gap: No program pool management logic
Impact: Cannot maintain and select from optimized program variants
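A sketch of the selection idea, under two assumptions not stated in this document: program scores are kept as a map from program index to a list of scores, and index 0 is the unoptimized baseline. `ProgramPool` is a hypothetical module name.

```elixir
defmodule ProgramPool do
  # Keeps the k best programs by average score and guarantees the baseline
  # (assumed to be index 0) stays in the pool.
  def top_k_plus_baseline(program_scores, k) when k > 0 do
    top_k =
      program_scores
      # Sort by descending average; ties are broken by the lower index.
      |> Enum.sort_by(fn {idx, scores} -> {-average(scores), idx} end)
      |> Enum.take(k)
      |> Enum.map(fn {idx, _scores} -> idx end)

    # If the baseline did not make the cut, it replaces the weakest entry.
    if 0 in top_k, do: top_k, else: List.replace_at(top_k, -1, 0)
  end

  def average([]), do: 0.0
  def average(scores), do: Enum.sum(scores) / length(scores)
end

scores = %{0 => [0.2, 0.4], 1 => [0.9], 2 => [0.5, 0.7], 3 => [0.1]}

IO.inspect(ProgramPool.top_k_plus_baseline(scores, 2))
IO.inspect(ProgramPool.top_k_plus_baseline(scores, 3))
```

Keeping the baseline in the pool means every optimization round can still be compared against the starting program.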

7.3 Strategy System Incompleteness

Implemented: ✅ AppendDemo strategy
Missing: ❌ AppendRule strategy (critical for instruction optimization)
Impact: Cannot generate and apply instruction improvements

8. BootstrapFewShot: Missing Advanced Features

DSPEx Implementation Status: ✅ Basic implementation complete
Gaps:

8.1 Advanced Bootstrapping Strategies

  • Missing: Teacher-student diversity mechanisms
  • Missing: Bootstrap quality filtering
  • Missing: Curriculum learning integration

8.2 Validation and Error Handling

  • Gap: Limited validation of generated examples
  • Impact: May produce low-quality demonstrations

📊 Evaluation System: Significant Limitations

9. Metrics Framework: Basic Implementation

DSPEx Implementation Status: ⚠️ Basic evaluation only
Critical Gaps:

9.1 Advanced Metrics Missing

Available: Basic exact match
Missing:

  • Semantic similarity metrics
  • BLEU/ROUGE scores
  • Custom metric framework
  • Confidence scoring
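As a starting point for a custom metric framework, a metric can be any function from an expected and a predicted answer to a score between 0.0 and 1.0. The sketch below shows exact match next to a token-level F1 score. `AnswerMetrics` is a hypothetical module, and the string-in, float-out shape is an assumption rather than the existing DSPEx metric contract.

```elixir
defmodule AnswerMetrics do
  # 1.0 when the normalized strings are identical, 0.0 otherwise.
  def exact_match(expected, predicted) do
    if normalize(expected) == normalize(predicted), do: 1.0, else: 0.0
  end

  # Token-level F1: harmonic mean of precision and recall over shared tokens.
  def token_f1(expected, predicted) do
    gold = expected |> normalize() |> String.split()
    pred = predicted |> normalize() |> String.split()
    # `gold -- pred` removes one gold token per matching predicted token,
    # so the difference in length is the multiset overlap.
    overlap = length(gold) - length(gold -- pred)

    if overlap == 0 do
      0.0
    else
      precision = overlap / length(pred)
      recall = overlap / length(gold)
      2 * precision * recall / (precision + recall)
    end
  end

  # Lowercases, strips punctuation, and trims surrounding whitespace.
  defp normalize(text) do
    text
    |> String.downcase()
    |> String.replace(~r/[^\p{L}\p{N}\s]/u, "")
    |> String.trim()
  end
end

IO.inspect(AnswerMetrics.exact_match("Paris", " paris. "))
IO.inspect(AnswerMetrics.token_f1("the capital is Paris", "Paris"))
```

Token F1 gives partial credit where exact match would score zero, which tends to make it a more informative signal for optimizers on free-text answers.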

9.2 Evaluation Infrastructure Gaps

Missing:

  • Statistical analysis tools
  • Result visualization
  • Performance benchmarking
  • A/B testing framework

🔌 Adapter System: Limited Sophistication

10. Response Parsing: Overly Simplistic

DSPEx Implementation Status: ⚠️ Basic string parsing only
Critical Gaps:

10.1 Structured Output Parsing

DSPy Feature: Field markers, JSON parsing, type-aware extraction
DSPEx Gap: Simple string splitting only
Impact: Cannot handle complex multi-field outputs reliably

Required Enhancement:

defmodule DSPEx.Adapter.StructuredParser do
  def parse_with_field_markers(response, signature) do
    # Parse: ## field_name ## content
  end
  
  def parse_json_response(response, signature) do
    # Type-aware JSON parsing
  end
end
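A minimal sketch of field-marker parsing, assuming the `## field_name ## content` layout from the comment above and a list of expected field names taken from the signature. `FieldMarkerParser` is a hypothetical module. JSON parsing is left out because it would need a JSON library.

```elixir
defmodule FieldMarkerParser do
  # Extracts "## name ## content" blocks and checks that every expected field is present.
  def parse(response, fields) do
    # A block runs from its marker to the next marker at the start of a line, or to the end.
    pattern = ~r/^##\s*(\w+)\s*##[ \t]*(.*?)(?=^##\s*\w+\s*##|\z)/ms

    found =
      pattern
      |> Regex.scan(response, capture: :all_but_first)
      |> Map.new(fn [name, content] -> {name, String.trim(content)} end)

    missing = Enum.reject(fields, &Map.has_key?(found, Atom.to_string(&1)))

    case missing do
      [] -> {:ok, Map.new(fields, fn field -> {field, found[Atom.to_string(field)]} end)}
      _ -> {:error, {:missing_fields, missing}}
    end
  end
end

response = "## rationale ##\nParis is the capital of France.\n## answer ## Paris\n"

IO.inspect(FieldMarkerParser.parse(response, [:rationale, :answer]))
IO.inspect(FieldMarkerParser.parse(response, [:answer, :confidence]))
```

Returning `{:error, {:missing_fields, ...}}` rather than a partial map lets the caller decide whether to retry, fall back to another format, or fail.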

10.2 Format Fallback Mechanisms

DSPy Feature: ChatAdapter → JSONAdapter fallback
DSPEx Gap: No adapter fallback system
Impact: Brittle parsing with no recovery mechanisms
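The fallback idea can be sketched as an ordered list of parsers tried in turn. `AdapterFallback` is a hypothetical module, and the two parsers below are simple stand-ins for illustration, not ports of DSPy's ChatAdapter or JSONAdapter.

```elixir
defmodule AdapterFallback do
  # Tries each {name, parser} in order and returns the first success.
  # If all fail, returns every error so the caller can see what was attempted.
  def parse(response, parsers) do
    Enum.reduce_while(parsers, {:error, []}, fn {name, parser}, {:error, errors} ->
      case parser.(response) do
        {:ok, parsed} -> {:halt, {:ok, name, parsed}}
        {:error, reason} -> {:cont, {:error, errors ++ [{name, reason}]}}
      end
    end)
  end
end

# Stand-in for a marker-based format: expects a "## answer ##" line.
marker_parser = fn text ->
  case Regex.run(~r/^## answer ##\s*(.+)$/m, text, capture: :all_but_first) do
    [answer] -> {:ok, %{answer: String.trim(answer)}}
    nil -> {:error, :no_field_marker}
  end
end

# Last resort: treat the whole non-empty response as the answer.
plain_parser = fn text ->
  case String.trim(text) do
    "" -> {:error, :empty_response}
    answer -> {:ok, %{answer: answer}}
  end
end

parsers = [marker: marker_parser, plain: plain_parser]

IO.inspect(AdapterFallback.parse("## answer ## Paris", parsers))
IO.inspect(AdapterFallback.parse("Paris", parsers))
IO.inspect(AdapterFallback.parse("   ", parsers))
```

Reporting which parser succeeded makes it possible to track how often the primary format fails, which is useful telemetry when tuning prompts.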

11. Message Formatting: Missing Advanced Features

Gaps:

  • No system message support
  • Limited multi-turn conversation handling
  • No prompt engineering utilities
  • Missing instruction injection mechanisms

🔍 Client System: Missing Provider Ecosystem

12. Provider Support: Limited Ecosystem

DSPEx Implementation Status: ⚠️ Basic OpenAI and Gemini only
DSPy Advantage: 100+ models via LiteLLM integration

Critical Gaps:

  • Anthropic Claude integration
  • Local model support (Ollama, etc.)
  • Hugging Face model integration
  • Provider-specific optimizations

13. Advanced Client Features

Missing Components:

  • Sophisticated caching strategies
  • Request batching
  • Circuit breaker patterns (noted as bypassed)
  • Usage tracking and cost monitoring

🏛️ Elixact Integration: Bridge Limitations

14. Dynamic Schema Generation: Architectural Challenge

Status: ⚠️ Partially implemented but limited
Critical Gap: Cannot create Elixact schemas at runtime equivalent to DSPy’s create_model()

Current Limitation:

# DSPEx can only do this at compile time via macros
defmodule MySchema do
  use Elixact
  schema do
    field :question, :string
    field :answer, :string
  end
end

Required Capability:

# Need runtime schema generation for dynamic optimizations
schema_module = DSPEx.Signature.Elixact.create_runtime_schema(
  signature, 
  additional_fields: %{confidence: :float}
)

15. Type System Integration

Gaps:

  • Limited type inference from field names
  • No custom type validation
  • Missing constraint propagation
  • Incomplete JSON schema generation

🎯 Priority Matrix for Core Gaps

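| Component                   | Severity | Implementation Effort | DSPy Usage | Status      |
|-----------------------------|----------|-----------------------|------------|-------------|
| SIMBA Program Selection Fix | CRITICAL | LOW                   | HIGH       | ❌ Blocking |
| Chain of Thought            | CRITICAL | MEDIUM                | VERY HIGH  | ❌ Missing  |
| Dynamic Signature Creation  | HIGH     | HIGH                  | HIGH       | ❌ Missing  |
| Program Module Discovery    | HIGH     | MEDIUM                | HIGH       | ❌ Missing  |
| Structured Output Parsing   | HIGH     | MEDIUM                | HIGH       | ⚠️ Basic    |
| ReAct Implementation        | MEDIUM   | HIGH                  | MEDIUM     | ❌ Missing  |
| Advanced Demo Management    | MEDIUM   | MEDIUM                | HIGH       | ⚠️ Basic    |
| Evaluation Metrics          | MEDIUM   | MEDIUM                | MEDIUM     | ⚠️ Basic    |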

📋 Implementation Roadmap

Phase 1: Critical Core Fixes (2-3 weeks)

  1. Fix SIMBA program selection algorithm - Replace fixed scores with real performance tracking
  2. Implement Chain of Thought - Most essential reasoning pattern
  3. Enhance structured output parsing - Better field extraction

Phase 2: Dynamic Capabilities (4-6 weeks)

  1. Dynamic signature creation - Runtime signature generation via Code.eval_string
  2. Program module discovery - Automatic sub-program detection
  3. Advanced demo management - Intelligent demo selection and validation

Phase 3: Advanced Reasoning (6-8 weeks)

  1. ReAct implementation - Tool-enabled reasoning loops
  2. Multi-chain comparison - Multiple reasoning path evaluation
  3. Program composition patterns - High-level program building

Phase 4: Ecosystem Completion (4-6 weeks)

  1. Provider ecosystem expansion - Additional LLM providers
  2. Advanced evaluation metrics - Semantic similarity, BLEU/ROUGE
  3. Production features - Enhanced caching, monitoring, error handling

🎯 Success Criteria

Core Functionality Parity:

  • Chain of Thought reasoning works reliably
  • SIMBA optimization actually improves program performance
  • Dynamic signature creation enables runtime adaptability
  • Structured output parsing handles complex responses

Developer Experience Parity:

  • Programs can be composed as easily as in DSPy
  • Teleprompter optimization is transparent and effective
  • Error messages are clear and actionable
  • Performance matches or exceeds DSPy

Production Readiness:

  • All core tests pass consistently
  • Performance is predictable under load
  • Error handling is comprehensive
  • Telemetry provides actionable insights

This analysis shows that while DSPEx has an excellent foundation, significant work remains to achieve full DSPy parity. The focus should be on fixing critical algorithmic bugs, implementing essential missing patterns, and building dynamic capabilities that enable runtime adaptation and optimization.