← Back to Infrastructure

Schema validator enhancement

Documentation for schema_validator_enhancement from the Pipeline ex repository.

Schema Validator Enhancement Prompt

Context Recontextualization

You are working on the pipeline_ex system, an Elixir-based pipeline generator that currently has basic JSON Schema validation but needs enhancement to support DSPy integration and advanced pipeline workflows.

Current System Architecture

  • Existing validator: lib/pipeline/validation/schema_validator.ex - Comprehensive JSON Schema validation with standard types and constraints
  • Configuration system: lib/pipeline/config.ex and lib/pipeline/enhanced_config.ex - YAML-based configuration with hard-coded validation
  • Result management: lib/pipeline/result_manager.ex - Basic JSON serialization with manual key conversion
  • Step execution: lib/pipeline/executor.ex - Hard-coded step type dispatch system

Current Limitations

  1. No YAML schema support - Only validates JSON structures
  2. No type preservation - Loses original data types during validation
  3. No schema composition - Cannot extend or merge schemas
  4. No conditional validation - No support for if/then/else logic
  5. Limited metadata support - No annotation or documentation fields

DSPy Integration Requirements

  • Signature schema support - Must validate DSPy input/output field definitions
  • Structured output validation - Strict validation of AI model outputs
  • Type preservation - Maintain data types across serialization boundaries
  • Schema composition - Combine base schemas with extensions
  • Conditional validation - Support dynamic validation rules

Task

Enhance the existing schema validation system to support DSPy integration requirements while maintaining backward compatibility.

Required Enhancements

  1. Enhanced Schema Validator (lib/pipeline/enhanced/schema_validator.ex)

    • Add DSPy signature validation support
    • Implement type preservation during validation
    • Add conditional validation (if/then/else)
    • Support schema composition (allOf, anyOf, oneOf)
    • Add metadata and annotation support
  2. Schema Composer (lib/pipeline/enhanced/schema_composer.ex)

    • Implement schema merging and extension
    • Handle schema conflicts and overrides
    • Support schema inheritance patterns
    • Add schema optimization for performance
  3. Type Preservation System (lib/pipeline/enhanced/type_preserver.ex)

    • Extract and maintain type metadata
    • Support type coercion where appropriate
    • Handle complex nested structures
    • Integrate with existing validation flow

Implementation Requirements

  • Maintain backward compatibility with existing schema_validator.ex
  • Follow existing code patterns from current validator implementation
  • Support all current JSON Schema features plus enhancements
  • Integrate with existing configuration system without breaking changes
  • Add comprehensive error handling with detailed path information
  • Include thorough documentation and examples

DSPy Schema Format Support

The enhanced validator must support DSPy signature schemas like:

signature:
  input_fields:
    - name: code
      type: string
      description: "Source code to analyze"
    - name: context
      type: object
      schema:
        type: object
        properties:
          language: {type: string}
          complexity: {type: string, enum: [low, medium, high]}
  output_fields:
    - name: analysis
      type: object
      schema:
        type: object
        properties:
          issues: {type: array, items: {type: string}}
          score: {type: number, minimum: 0, maximum: 100}

Code Style Requirements

  • Follow existing Elixir conventions from the codebase
  • Use pattern matching and with statements for error handling
  • Include comprehensive docstrings and type specs
  • Add detailed logging for debugging and monitoring
  • Structure code with clear separation of concerns

Testing Requirements

  • Create comprehensive test cases for new functionality
  • Maintain all existing test coverage
  • Add DSPy-specific test scenarios
  • Include edge case handling tests
  • Add performance benchmarks for large schemas

Implement this enhancement as a complete, production-ready solution that integrates seamlessly with the existing pipeline_ex architecture.