DSPex V2 Foundation - Requirements Specification

Document Information

  • Version: 1.0.0
  • Date: 2025-01-19
  • Status: Draft
  • Phase: Foundation (Initial Phase)

Executive Summary

DSPex V2 Foundation establishes the core infrastructure for a cognitive orchestration platform that bridges Elixir and Python’s DSPy framework. This initial phase focuses on creating a minimal but extensible foundation using Snakepit for Python process management, with a clean API that can route between native Elixir and Python implementations.

Functional Requirements

FR1: Core API Surface

FR1.1 Module Management

  • FR1.1.1: Initialize and manage DSPy modules (Predict, ChainOfThought, ReAct)
  • FR1.1.2: Support module configuration with typed parameters
  • FR1.1.3: Enable module composition and chaining
  • FR1.1.4: Provide module introspection capabilities

FR1.2 Signature System

  • FR1.2.1: Parse DSPy signature syntax natively in Elixir
  • FR1.2.2: Validate inputs/outputs against signatures
  • FR1.2.3: Support field types and constraints
  • FR1.2.4: Convert between Elixir and Python representations
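
As a rough illustration of FR1.2.1 and FR1.2.2, the sketch below parses a DSPy-style string signature such as "context, question -> answer: int" and checks a set of inputs against it. It is written in Python for brevity, not in Elixir, and it is not the DSPex parser: the names Field, parse_signature, and validate_inputs are hypothetical, the default type of "str" is an assumption, and only flat annotations without commas are handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str = "str"  # assumed default when a field carries no annotation


def parse_fields(side: str) -> list[Field]:
    """Parse one side of a signature, e.g. 'context, question: str'."""
    fields = []
    for raw in side.split(","):
        name, _, annotation = raw.strip().partition(":")
        name = name.strip()
        if not name.isidentifier():
            raise ValueError(f"invalid field name: {name!r}")
        fields.append(Field(name, annotation.strip() or "str"))
    return fields


def parse_signature(text: str) -> tuple[list[Field], list[Field]]:
    """Split 'inputs -> outputs' into two lists of fields."""
    if text.count("->") != 1:
        raise ValueError("signature must contain exactly one '->'")
    left, right = text.split("->")
    return parse_fields(left), parse_fields(right)


def validate_inputs(inputs: list[Field], values: dict) -> list[str]:
    """Return the names of declared inputs that are missing from values."""
    return [field.name for field in inputs if field.name not in values]
```

Calling validate_inputs with the parsed input fields and a dictionary of supplied values returns the missing field names, which a caller could turn into an actionable error before any Python process is involved.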

FR1.3 Execution Interface

  • FR1.3.1: Execute DSPy operations through a unified interface
  • FR1.3.2: Support both synchronous and asynchronous execution
  • FR1.3.3: Handle streaming responses from LLMs
  • FR1.3.4: Provide execution context and metadata

FR2: Python Bridge via Snakepit

FR2.1 Pool Management

  • FR2.1.1: Configure specialized Snakepit pools for different workloads
  • FR2.1.2: Support multiple Python environments (lightweight, optimizer, GPU)
  • FR2.1.3: Enable pool-specific resource constraints
  • FR2.1.4: Implement health checking and pool monitoring

FR2.2 Communication Protocol

  • FR2.2.1: Use JSON as default serialization format
  • FR2.2.2: Support binary protocols for large data (MessagePack, Arrow)
  • FR2.2.3: Implement request/response correlation
  • FR2.2.4: Handle streaming responses
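
One way to picture FR2.2.1 and FR2.2.3 together is a JSON envelope that carries a request id, so that responses arriving out of order can still be matched to their requests. The sketch below is an assumption about how this could look, not Snakepit's wire format: the envelope keys ("id", "command", "args", "result") and the Correlator class are hypothetical.

```python
import json
import uuid


class Correlator:
    """Tracks in-flight requests by id (hypothetical helper)."""

    def __init__(self):
        self._pending = {}  # request id -> command name

    def encode_request(self, command, args):
        """Serialize a request as JSON and remember its id."""
        request_id = uuid.uuid4().hex
        self._pending[request_id] = command
        return json.dumps({"id": request_id, "command": command, "args": args})

    def match_response(self, payload):
        """Decode a response and pair it with the request that caused it."""
        message = json.loads(payload)
        command = self._pending.pop(message["id"], None)
        if command is None:
            raise KeyError(f"no pending request with id {message['id']!r}")
        return command, message.get("result")
```

Because each id is removed from the pending table once matched, a duplicate or unknown response raises instead of being silently delivered to the wrong caller.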

FR2.3 Session Management

  • FR2.3.1: Create and manage stateful Python sessions
  • FR2.3.2: Support session affinity for complex operations
  • FR2.3.3: Enable session persistence across requests
  • FR2.3.4: Implement session cleanup and lifecycle management
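
Session affinity and cleanup (FR2.3.2 and FR2.3.4) can be sketched as a table that pins each session id to one worker and drops sessions that have been idle too long. This is a minimal sketch under stated assumptions: SessionTable, the round-robin assignment, and the idle timeout are all hypothetical and say nothing about how Snakepit actually assigns workers.

```python
import time


class SessionTable:
    """Pins sessions to workers and expires idle ones (hypothetical helper)."""

    def __init__(self, workers, ttl=300.0, clock=time.monotonic):
        self._workers = list(workers)
        self._ttl = ttl          # idle seconds before a session is dropped
        self._clock = clock      # injectable so tests can control time
        self._sessions = {}      # session id -> (worker, last used time)
        self._next = 0

    def worker_for(self, session_id):
        """Return the worker for a session, assigning one on first use."""
        entry = self._sessions.get(session_id)
        if entry is None:
            worker = self._workers[self._next % len(self._workers)]
            self._next += 1
        else:
            worker = entry[0]
        self._sessions[session_id] = (worker, self._clock())
        return worker

    def expire(self):
        """Remove sessions idle for longer than ttl; return their ids."""
        now = self._clock()
        stale = [sid for sid, (_, used) in self._sessions.items()
                 if now - used > self._ttl]
        for sid in stale:
            del self._sessions[sid]
        return stale
```

A real implementation would also need to release Python-side state when a session expires; the sketch only covers the bookkeeping.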

FR3: Native Implementations

FR3.1 Signature Processing

  • FR3.1.1: Implement native signature parsing
  • FR3.1.2: Support signature compilation and caching
  • FR3.1.3: Provide signature validation without Python
  • FR3.1.4: Enable signature introspection

FR3.2 Basic Operations

  • FR3.2.1: Implement native template rendering
  • FR3.2.2: Support native JSON Schema validation
  • FR3.2.3: Provide native metric calculations
  • FR3.2.4: Enable native response parsing

FR4: Routing and Orchestration

FR4.1 Smart Router

  • FR4.1.1: Route operations to native or Python based on availability
  • FR4.1.2: Support fallback mechanisms
  • FR4.1.3: Enable manual routing overrides
  • FR4.1.4: Track routing decisions for optimization
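
The routing behavior described in FR4.1 can be sketched as a small registry that prefers a native handler, falls back to the Python handler when the native one is missing or fails, honors a manual override, and records every decision. The Router class, its method names, and the "native"/"python" backend labels are assumptions made for illustration; the sketch is in Python only to keep all examples in one language.

```python
BACKENDS = ("native", "python")  # assumed preference order


class Router:
    """Routes operations to a backend with fallback (hypothetical helper)."""

    def __init__(self):
        self._handlers = {}   # (operation, backend) -> callable
        self.decisions = []   # (operation, backend, outcome) for later analysis

    def register(self, operation, backend, handler):
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        self._handlers[(operation, backend)] = handler

    def call(self, operation, *args, force=None):
        """Run an operation; force pins the backend and disables fallback."""
        candidates = (force,) if force else BACKENDS
        last_error = None
        for backend in candidates:
            handler = self._handlers.get((operation, backend))
            if handler is None:
                continue  # backend not available for this operation
            try:
                result = handler(*args)
            except Exception as error:
                last_error = error
                self.decisions.append((operation, backend, "failed"))
                continue  # fall back to the next backend
            self.decisions.append((operation, backend, "ok"))
            return result
        raise LookupError(f"no backend could run {operation!r}") from last_error
```

The decisions list is the raw material for FR4.1.4: counting failures per backend is enough to spot operations whose native path should be disabled or fixed.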

FR4.2 Pipeline Support

  • FR4.2.1: Define pipelines mixing native and Python steps
  • FR4.2.2: Support parallel execution across implementations
  • FR4.2.3: Enable pipeline composition and reuse
  • FR4.2.4: Provide pipeline execution monitoring
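
A pipeline in the sense of FR4.2 might be modeled as an ordered list of named stages, where a stage with several steps runs them in parallel on the same input and each stage's duration is recorded for monitoring. The sketch below assumes that model; run_pipeline and the stage layout are hypothetical, and plain callables stand in for native and Python steps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(stages, value):
    """Run stages in order; steps within one stage run in parallel.

    stages is a list of (name, [callable, ...]) pairs. A stage with one step
    passes that step's result on; a stage with several steps passes on the
    list of their results, in the order the steps were declared.
    """
    report = []  # (stage name, elapsed seconds)
    with ThreadPoolExecutor() as pool:
        for name, steps in stages:
            started = time.perf_counter()
            # pool.map preserves input order, so results line up with steps.
            results = list(pool.map(lambda step, v=value: step(v), steps))
            value = results[0] if len(results) == 1 else results
            report.append((name, time.perf_counter() - started))
    return value, report
```

Threads are used here only because calls into an external worker are I/O-bound from the caller's point of view; an Elixir implementation would more naturally use processes or tasks.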

Non-Functional Requirements

NFR1: Performance

NFR1.1 Latency

  • NFR1.1.1: Sub-100ms latency for simple operations
  • NFR1.1.2: Sub-10ms overhead for routing decisions
  • NFR1.1.3: Native operations must be at least 10x faster than their Python equivalents
  • NFR1.1.4: Support streaming with <50ms initial response time

NFR1.2 Throughput

  • NFR1.2.1: Handle 1000+ requests/second for cached operations
  • NFR1.2.2: Support 100+ concurrent Python operations
  • NFR1.2.3: Scale linearly with pool size
  • NFR1.2.4: Maintain performance under mixed workloads

NFR2: Reliability

NFR2.1 Fault Tolerance

  • NFR2.1.1: Graceful degradation when Python processes fail
  • NFR2.1.2: Automatic recovery from transient failures
  • NFR2.1.3: Circuit breaker pattern for failing operations
  • NFR2.1.4: No single point of failure
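
The circuit breaker named in NFR2.1.3 is a general pattern: after a number of consecutive failures the breaker opens and rejects calls immediately, then allows a trial call once a cool-down has passed. The sketch below shows that pattern in isolation. The thresholds, the CircuitBreaker and CircuitOpenError names, and the state labels are assumptions, not values taken from this specification.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds before a trial call is allowed
        self._clock = clock               # injectable so tests can control time
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after:
            return "half_open"
        return "open"

    def call(self, func, *args):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                # Open the circuit, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a failing Python pool in a breaker like this is one way to get the graceful degradation of NFR2.1.1: callers fail fast and can be routed to a fallback instead of waiting on timeouts.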

NFR2.2 Error Handling

  • NFR2.2.1: Comprehensive error classification
  • NFR2.2.2: Actionable error messages
  • NFR2.2.3: Error recovery strategies
  • NFR2.2.4: Error tracking and reporting

NFR3: Scalability

NFR3.1 Horizontal Scaling

  • NFR3.1.1: Support distributed deployment
  • NFR3.1.2: Pool size adjustable at runtime
  • NFR3.1.3: Work distribution across nodes
  • NFR3.1.4: Shared-nothing architecture

NFR3.2 Resource Management

  • NFR3.2.1: Memory limits per Python process
  • NFR3.2.2: CPU throttling capabilities
  • NFR3.2.3: GPU resource allocation
  • NFR3.2.4: Automatic garbage collection

NFR4: Developer Experience

NFR4.1 API Design

  • NFR4.1.1: Intuitive, Elixir-idiomatic API
  • NFR4.1.2: Comprehensive documentation
  • NFR4.1.3: Type specifications for all public functions
  • NFR4.1.4: Helpful error messages

NFR4.2 Debugging

  • NFR4.2.1: Detailed logging at multiple levels
  • NFR4.2.2: Request tracing across systems
  • NFR4.2.3: Performance profiling hooks
  • NFR4.2.4: Interactive debugging support

NFR5: Compatibility

NFR5.1 DSPy Compatibility

  • NFR5.1.1: Support DSPy 2.x API
  • NFR5.1.2: Handle DSPy-specific data structures
  • NFR5.1.3: Preserve DSPy semantics
  • NFR5.1.4: Track DSPy version compatibility

NFR5.2 Elixir Ecosystem

  • NFR5.2.1: OTP compliance
  • NFR5.2.2: Ready for Phoenix integration
  • NFR5.2.3: Nx tensor compatibility
  • NFR5.2.4: Telemetry integration

Constraints

Technical Constraints

  • TC1: Must use Snakepit for Python process management
  • TC2: Initial phase limited to core DSPy modules
  • TC3: Python 3.8+ required
  • TC4: Elixir 1.14+ and OTP 25+ required

Business Constraints

  • BC1: 4-week development timeline for foundation
  • BC2: Must maintain backward compatibility path
  • BC3: Open source with Apache 2.0 license
  • BC4: Must support cloud deployment

Success Criteria

Functional Success

  • All core DSPy modules accessible from Elixir
  • Native and Python implementations seamlessly mixed
  • Pipeline execution with performance monitoring
  • Comprehensive test coverage (>90%)

Performance Success

  • Meeting all latency targets
  • Achieving throughput goals
  • Successful load testing at scale
  • Performance regression tests passing

Quality Success

  • Zero critical bugs in production
  • <1% error rate under normal load
  • 99.9% uptime for core services
  • Positive developer feedback

Dependencies

External Dependencies

  • Snakepit (core dependency)
  • Python 3.8+ with DSPy
  • Jason for JSON encoding
  • Telemetry for metrics
  • Nx for tensor operations (optional)

Internal Dependencies

  • None (fresh start)

Risks and Mitigations

Risk 1: Python Process Management Complexity

  • Mitigation: Leverage Snakepit’s proven architecture

Risk 2: Serialization Overhead

  • Mitigation: Support multiple serialization protocols and select between them automatically
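
Automatic protocol selection could be as simple as keeping JSON for small, JSON-compatible payloads and switching to a binary format otherwise. The sketch below only picks a format name and does not encode anything in MessagePack. The 64 KiB threshold, the choose_format function, and the "json"/"msgpack" labels are hypothetical, and encoding the payload just to measure it is a simplification a real implementation would likely avoid.

```python
import json

JSON_LIMIT_BYTES = 64 * 1024  # hypothetical cut-over point, not from the spec


def choose_format(payload):
    """Return 'json' for small JSON-compatible payloads, else 'msgpack'."""
    try:
        encoded = json.dumps(payload)
    except (TypeError, ValueError):
        # Values such as raw bytes cannot be represented in JSON directly.
        return "msgpack"
    if len(encoded.encode("utf-8")) <= JSON_LIMIT_BYTES:
        return "json"
    return "msgpack"
```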

Risk 3: Version Compatibility

  • Mitigation: Version detection and adaptation layer

Risk 4: Performance Degradation

  • Mitigation: Comprehensive benchmarking and monitoring

Future Considerations

Phase 2 Extensions

  • Advanced DSPy patterns (tree-of-thoughts, self-correction)
  • Multi-agent coordination
  • Distributed optimization
  • Real-time adaptation

Long-term Vision

  • Cognitive orchestration platform
  • Universal variable system
  • Self-optimizing pipelines
  • Production ML infrastructure