← Back to 20250617

ELIXACT TODO

Documentation for ELIXACT_TODO from the Ds ex repository.

Elixact Limitations Discovered During DSPEx Integration (Phases 1 & 2)

Overview

During the complete implementation of DSPEx Elixact integration (both Phase 1: Enhanced Signature System and Phase 2: Configuration Validation), multiple critical limitations in Elixact were discovered that prevent full integration. This document catalogs ALL limitations encountered across both phases while implementing the comprehensive Pydantic-style validation system.

Integration Scope Attempted

Phase 1: Enhanced Signature System (COMPLETED with workarounds)

Goal: Enable Pydantic-style field definitions with types and constraints in DSPEx signatures Example: "name:string[min_length=2,max_length=50] -> greeting:string[max_length=100]"

Successfully Implemented:

  • Enhanced signature parsing with 15+ constraint types (min_length, max_length, gteq, lteq, format, choices, etc.)
  • Array type support: tags:array(string)[min_items=1,max_items=10]
  • Automatic Elixact schema generation from enhanced signatures
  • Backward compatibility with basic signatures
  • Complex constraint parsing with nested brackets and regex patterns

Workarounds Required (limitations discovered):

  • Simplified regex patterns in tests due to regex compilation issues
  • Manual constraint validation fallbacks for complex business rules
  • Limited nested object support for array element validation

Phase 2: Configuration System Enhancement (75% complete, blocked)

Goal: Replace 78+ manual validation functions across 8 configuration domains with declarative Elixact schema-based validation

Domains Implemented:

  1. Client Configuration: timeout, retry_attempts, backoff_factor
  2. Provider Configuration: api_key, base_url, default_model, timeout, rate_limit, circuit_breaker
  3. Prediction Configuration: default_provider, default_temperature, default_max_tokens, cache_enabled, cache_ttl
  4. Evaluation Configuration: batch_size, parallel_limit
  5. Teleprompter Configuration: bootstrap_examples, validation_threshold
  6. BEACON Configuration: default_instruction_model, default_evaluation_model, max_concurrent_operations, optimization, bayesian_optimization
  7. Logging Configuration: level, correlation_enabled
  8. Telemetry Configuration: enabled, detailed_logging, performance_tracking

Phase 1 Limitations Discovered (Worked Around)

1. Regex Compilation in Schema Generation (MEDIUM)

Issue: Complex regex patterns caused compilation failures during dynamic schema generation Examples:

  • Format constraints with complex patterns: format=/^[a-zA-Z0-9\s]+$/
  • Regex patterns with special characters in test data

Workaround Applied: Simplified regex patterns in tests, avoided complex regex in schema generation Impact: Limited regex validation capabilities for enhanced signatures

2. Dynamic Module Generation Memory Issues (LOW)

Issue: Repeated schema generation creates many temporary modules that aren’t garbage collected Context: generate_elixact_schema/2 creates unique module names: #{signature}.ElixactSchema.#{System.unique_integer}

Workaround Applied: Used unique integer suffixes, relied on VM cleanup Impact: Potential memory accumulation in long-running processes

3. Nested Object Schema Support Limited (MEDIUM)

Issue: Cannot validate array element constraints beyond basic type checking Example: tags:array(string)[min_length=2] - cannot validate that each string has min_length=2

Workaround Applied: Only validated array-level constraints (min_items, max_items) Impact: Limited validation for complex array structures

4. Constraint Validation Error Context (LOW)

Issue: Elixact errors don’t provide enough context for constraint violations Example: “expected integer” instead of “min_length constraint requires positive integer”

Workaround Applied: Custom error message mapping in dspex_errors_to_elixact/1 Impact: Limited error message clarity for developers

Phase 2 Critical Limitations Discovered (Blocking)

1. :atom Type Not Implemented (CRITICAL)

Issue: :atom type causes runtime UndefinedFunctionError Error:

** (UndefinedFunctionError) function Elixact.Type.Atom.validate/1 is undefined (module Elixact.Type.Atom is not available)

Affected Fields (12 total):

  • default_provider (:gemini, :openai choices)
  • default_instruction_model (:gemini, :openai choices)
  • default_evaluation_model (:gemini, :openai choices)
  • level (:emergency, :alert, :critical, :error, :warning, :warn, :notice, :info, :debug choices)
  • acquisition_function (:expected_improvement, :probability_of_improvement, :upper_confidence_bound choices)
  • surrogate_model (:gaussian_process, :random_forest, :extra_trees choices)

Impact: ~40% of all configuration fields cannot be validated

2. Union Type Support Missing (CRITICAL)

Issue: Cannot validate fields that accept multiple types Primary Use Case: API keys accepting both string and {:system, "ENV_VAR"} tuple

Example:

# Need to validate both:
api_key: "sk-abc123"                    # string
api_key: {:system, "OPENAI_API_KEY"}    # {atom, string} tuple

Affected Fields:

  • All provider api_key fields (4+ providers)
  • Any configuration supporting environment variable references

Current Workaround: Only validates string format, tuple format fails

3. Nil Validation Logic Incorrect (HIGH)

Issue: optional() fields incorrectly accept nil values Expected Behavior:

  • Field missing from map: ✅ (optional)
  • Field present with nil: ❌ (should fail)
  • Field present with valid value: ✅

Current Behavior: nil values pass validation for optional fields

Impact: All 25+ optional fields accept invalid nil values

4. Nested Map Validation Unsupported (HIGH)

Issue: :map type fields cannot validate internal structure Use Cases:

# rate_limit map needs internal validation:
rate_limit: %{
  requests_per_minute: 100,    # must be positive integer
  tokens_per_minute: 10000     # must be positive integer
}

# circuit_breaker map needs internal validation:
circuit_breaker: %{
  failure_threshold: 5,        # must be positive integer
  recovery_time: 60000         # must be positive integer  
}

# optimization map for BEACON:
optimization: %{
  max_trials: 10,              # must be positive integer
  convergence_patience: 3,     # must be positive integer
  improvement_threshold: 0.05  # must be float 0.0-1.0
}

Affected Configurations: Provider, BEACON (8+ nested map fields)

5. choices() Constraint Issues (MEDIUM)

Issue: choices/1 macro behavior unclear for complex types Examples:

  • choices([:gemini, :openai]) with :atom type (depends on #1)
  • choices/1 with custom validation rules

6. Path-Based Validation Architecture Mismatch (HIGH)

Issue: Elixact expects complete object validation, but DSPEx needs single-field validation DSPEx Pattern:

# Validate individual configuration paths:
validate_config_value([:dspex, :client, :timeout], 30000)
validate_config_value([:dspex, :providers, :openai, :api_key], "sk-123")

Elixact Pattern:

# Expects complete objects:
ClientConfiguration.validate(%{timeout: 30000, retry_attempts: 3, backoff_factor: 2.0})

Current Workaround: Creating single-field maps for validation, but conflicts with required field validation

7. Custom Error Messages Not Supported (MEDIUM)

Issue: Cannot customize validation error messages per field Current: Generic “expected integer, got string” Needed: Business-specific errors like “Must be a positive integer representing timeout in milliseconds”

8. Custom Validation Functions Missing (MEDIUM)

Issue: No support for complex business logic validation Examples Needed:

  • URL format validation for base_url
  • Numeric range validation with custom messages
  • Cross-field validation rules
  • Environment variable existence checking

9. Wildcard Path Support (LOW)

Issue: DSPEx uses wildcard paths for provider configurations Pattern: [:dspex, :providers, :*, :api_key] (any provider) Need: Schema validation that works with dynamic provider names

Test Failures Summary

Total Phase 2 Tests: 45+ tests Currently Failing: ~35 tests (75% failure rate) Failure Categories:

  • :atom type errors: 15+ tests
  • Union type validation: 8+ tests
  • Nil handling: 10+ tests
  • Nested map validation: 5+ tests
  • Error format mismatches: 7+ tests

Implementation Status

✅ Working:

  • Schema module creation and compilation
  • Basic type validation (:string, :integer, :float, :boolean)
  • Optional field handling (partial)
  • Error message integration framework
  • JSON schema export foundation

❌ Blocked by Elixact Limitations:

  • Atom type fields (40% of configuration)
  • Union type support (API key validation)
  • Nested map validation (provider configs, BEACON)
  • Single-field validation pattern
  • Custom error messages
  • Business logic validation

Priority Fix Order for Elixact Fork

Phase 1 (Critical - Enable Basic Functionality)

  1. Implement :atom type module with choices/1 support
  2. Fix nil handling for optional fields
  3. Add union type support (e.g., union([:string, {:tuple, [:atom, :string]}]))

Phase 2 (High - Complete Core Features)

  1. Enhanced :map validation with nested schema support
  2. Single-field validation mode for path-based validation patterns
  3. Custom error messages per field

Phase 3 (Medium - Advanced Features)

  1. Custom validation functions (validator/1 macro)
  2. Wildcard schema matching for dynamic configurations
  3. Cross-field validation support

Cross-Phase Impact Analysis

Phase 1 Achievements (✅ COMPLETED with workarounds)

  • 68 tests passing across enhanced signature system
  • 15+ constraint types fully functional with workarounds
  • Array support working for basic validation
  • Automatic schema generation operational
  • Backward compatibility 100% maintained

Phase 2 Blocking Issues Summary

Total Configuration Fields: 25+ fields across 8 domains Currently Failing: ~35 tests (75% failure rate) due to Elixact limitations Core Blocker: :atom type implementation missing (affects 40% of configurations)

Combined System Impact

If Elixact Phase 1 fixes implemented:

  • Phase 1: Can remove workarounds, achieve 100% feature parity with Pydantic
  • Phase 2: Can achieve 100% completion, replacing all manual validation
  • Combined: Full Pydantic-style validation system operational across DSPEx

Current State:

  • Phase 1: ✅ Production ready with limited regex and array element validation
  • Phase 2: ❌ 75% blocked by core type system limitations

Completion Roadmap

Critical Path (Required for Phase 2 completion)

  1. Implement :atom type module → Unblocks 40% of Phase 2 functionality
  2. Fix nil handling for optional fields → Ensures validation correctness
  3. Add union type support → Enables environment variable patterns

Enhancement Path (Phase 1 improvements)

  1. Enhanced regex compilation → Removes Phase 1 workarounds
  2. Nested array element validation → Full Pydantic-style array validation
  3. Custom validation functions → Business logic validation support

Advanced Features (Future enhancements)

  1. Nested map validation → Complex configuration structures
  2. Cross-field validation → Advanced business rules
  3. Wildcard schema matching → Dynamic configuration patterns

Estimated Timeline: Phase 2 completion possible within 1-2 days after Elixact Core fixes (#1-3) are implemented.