Task: NATIVE.4 - Metrics Collection System

Context

You are implementing the metrics collection system for DSPex, which tracks performance, usage patterns, and optimization effectiveness. This system is crucial for the cognitive orchestration capabilities that allow DSPex to learn and adapt.

Required Reading

1. Cognitive Orchestration Architecture

File: /home/home/p/g/n/dspex/docs/specs/dspex_cognitive_orchestration/01_CORE_ARCHITECTURE.md
- Section: “Cognitive Telemetry Layer”
- Focus on how metrics feed into adaptation

2. Telemetry Patterns from libStaging

File: /home/home/p/g/n/dspex/docs/LIBSTAGING_PATTERNS_FOR_COGNITIVE_ORCHESTRATION.md
- Lines 176-189: Performance tracking patterns
- Lines 185-189: Telemetry integration examples

3. Existing Metrics Module

File: /home/home/p/g/n/dspex/lib/dspex/native/metrics.ex
- Review current implementation approach
- Note integration points with other modules

4. Requirements Reference

File: /home/home/p/g/n/dspex/docs/specs/immediate_implementation/REQUIREMENTS.md
- Section: “Non-Functional Requirements” - NFR.5 (Observability)
- Section: “Future-Ready Requirements” - FR.1 (Consciousness hooks)

5. Success Criteria Examples

File: /home/home/p/g/n/dspex/docs/specs/dspex_cognitive_orchestration/06_SUCCESS_CRITERIA.md
- Lines covering telemetry tests (search for “telemetry”)
- Examples of metrics analysis and adaptation

Implementation Requirements

Core Metrics Types

Performance Metrics
- Execution duration (per operation, per stage)
- Latency breakdown (queue time, execution time, response time)
- Throughput (requests/second)
- Resource usage (memory, CPU)

Quality Metrics
- Validation success/failure rates
- LLM token usage
- Output quality scores (when available)
- Retry counts and reasons

Pattern Metrics
- Operation frequency
- Parameter distributions
- Error patterns
- Usage patterns by session

Optimization Metrics
- Variable optimization history
- Strategy effectiveness
- Adaptation success rates
- Learning convergence
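The latency breakdown and throughput items reduce to simple arithmetic over timestamps. The sketch below assumes each request records four monotonic timestamps: enqueued, started, finished, and responded. The module name `DSPex.Native.Metrics.Latency` and both functions are hypothetical and are not part of the existing metrics module.

```elixir
defmodule DSPex.Native.Metrics.Latency do
  # Hypothetical helper: splits one request's lifetime into the queue,
  # execution and response phases listed under "Performance Metrics".

  @doc """
  Takes four timestamps (enqueued, started, finished, responded) in `unit`
  (default :native, i.e. System.monotonic_time/0) and returns each phase
  in milliseconds as floats.
  """
  def breakdown(enqueued, started, finished, responded, unit \\ :native) do
    to_ms = fn delta -> System.convert_time_unit(delta, unit, :microsecond) / 1000 end

    %{
      queue_ms: to_ms.(started - enqueued),
      execution_ms: to_ms.(finished - started),
      response_ms: to_ms.(responded - finished),
      total_ms: to_ms.(responded - enqueued)
    }
  end

  @doc "Requests per second for `count` completions observed over `window_ms`."
  def throughput(count, window_ms) when window_ms > 0, do: count * 1000 / window_ms
end
```

Monotonic timestamps are used instead of wall-clock time so that clock adjustments cannot produce negative or inflated phases.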

Telemetry Events Structure

# Event naming convention: [:dspex, component, action]
[:dspex, :router, :route_selected]
[:dspex, :native, :signature_parsed]
[:dspex, :llm, :adapter_selected]
[:dspex, :pipeline, :stage_completed]
[:dspex, :optimization, :variable_updated]
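Keeping the event names in one place makes the criterion "events defined for all major operations" checkable. It also gives the collector a single list to attach to. The sketch below uses a hypothetical `DSPex.Native.Metrics.Events` module. Its catalogue holds the five events above plus `[:dspex, :llm, :execution_complete]`, which the LLM execution example later in this spec emits.

```elixir
defmodule DSPex.Native.Metrics.Events do
  # Hypothetical event catalogue; extend it whenever a component adds an event.
  @events [
    [:dspex, :router, :route_selected],
    [:dspex, :native, :signature_parsed],
    [:dspex, :llm, :adapter_selected],
    [:dspex, :llm, :execution_complete],
    [:dspex, :pipeline, :stage_completed],
    [:dspex, :optimization, :variable_updated]
  ]

  @doc "All known event names (the list a collector would attach to)."
  def all, do: @events

  @doc "True when `name` follows the [:dspex, component, action] convention."
  def conventional?([:dspex, component, action])
      when is_atom(component) and is_atom(action),
      do: true

  def conventional?(_name), do: false

  @doc "True when `name` is conventional and registered in the catalogue."
  def known?(name), do: conventional?(name) and name in @events
end
```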

Implementation Structure

lib/dspex/native/
├── metrics.ex                    # Main metrics module
└── metrics/
    ├── collector.ex             # Event collection
    ├── aggregator.ex            # Metric aggregation
    ├── analyzer.ex              # Pattern analysis
    └── storage.ex               # ETS-based storage
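`:telemetry` runs handlers synchronously in the process that calls `:telemetry.execute/3`. For that reason, the collector's handler should do as little work as possible. The following is a minimal sketch of `collector.ex` under that assumption. The module, its functions, and the `%{table: table}` handler config are hypothetical. The attach call appears only in a comment, because it needs the `:telemetry` dependency.

```elixir
defmodule DSPex.Native.Metrics.Collector do
  # Hypothetical sketch of collector.ex. handle_event/4 has the
  # (event, measurements, metadata, config) shape of a :telemetry handler;
  # with the dependency it could be attached with something like
  #   :telemetry.attach_many("dspex-metrics", events, &__MODULE__.handle_event/4, %{table: t})

  @doc "Creates an unnamed, concurrently writable ETS table for raw events."
  def setup do
    :ets.new(:dspex_metrics_current, [:public, :duplicate_bag, write_concurrency: true])
  end

  @doc "Handler: stores one metric record (see Storage Schema) per event."
  def handle_event(event, measurements, metadata, %{table: table}) do
    record = %{
      event: event,
      timestamp: DateTime.utc_now(),
      measurements: measurements,
      metadata: metadata,
      window: :current
    }

    # A single ETS insert keeps the cost paid by the emitting process small.
    :ets.insert(table, {event, record})
    :ok
  end

  @doc "All records stored for `event` (rows are keyed by event name)."
  def records(table, event) do
    for {_event, record} <- :ets.lookup(table, event), do: record
  end
end
```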

Acceptance Criteria

- Telemetry events defined for all major operations
- Metrics collector captures all events with minimal overhead
- Aggregator provides time-windowed metrics (1min, 5min, 1hour)
- Pattern analyzer detects trends and anomalies
- Storage system with automatic cleanup of old data
- Export interface for external monitoring systems
- Performance overhead below 1% on instrumented operations
- Integration with cognitive orchestration for adaptation triggers
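Time-windowed aggregation can be expressed as filtering samples by age, then summarising each window. The sketch below assumes samples are `{timestamp_ms, value}` tuples taken from `System.monotonic_time(:millisecond)`. The module name and the summary fields (count, sum, mean, min, max) are illustrative choices, not part of the spec.

```elixir
defmodule DSPex.Native.Metrics.Aggregator do
  # Hypothetical sketch of aggregator.ex: rolls raw samples up into the
  # 1min / 5min / 1hour windows from the acceptance criteria.
  @windows %{one_minute: 60_000, five_minutes: 300_000, one_hour: 3_600_000}

  @doc "Summarises `samples` for every window ending at `now_ms`."
  def aggregate(samples, now_ms) do
    Map.new(@windows, fn {name, length_ms} ->
      values = for {ts, v} <- samples, ts > now_ms - length_ms, ts <= now_ms, do: v
      {name, summarize(values)}
    end)
  end

  defp summarize([]), do: %{count: 0, sum: 0, mean: nil, min: nil, max: nil}

  defp summarize(values) do
    count = length(values)
    sum = Enum.sum(values)
    %{count: count, sum: sum, mean: sum / count, min: Enum.min(values), max: Enum.max(values)}
  end
end
```

This version rescans the raw samples on every call, which is simple but costly at volume. Production code would more likely keep running per-window buckets, such as one accumulator per minute, and merge those buckets instead.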

Telemetry Event Examples

# Router selection event
:telemetry.execute(
  [:dspex, :router, :route_selected],
  %{duration: 0.5},
  %{
    operation: "predict",
    selected: :native,
    reason: :performance,
    alternatives: [:python]
  }
)

# LLM execution event
:telemetry.execute(
  [:dspex, :llm, :execution_complete],
  %{duration: 150, tokens: 245},
  %{
    adapter: "instructor_lite",
    model: "gpt-3.5-turbo",
    success: true
  }
)
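The examples above do not state the unit of `duration`. One way to produce the value consistently is to time the operation with the monotonic clock and convert once. The following is a sketch with a hypothetical `measure/1` helper that returns milliseconds as a float, which matches the `duration: 0.5` style of the router example. Emitting the result would then be a `:telemetry.execute/3` call. Alternatively, `:telemetry.span/3` emits its own `:start`/`:stop`/`:exception` events.

```elixir
defmodule DSPex.Native.Metrics.Timing do
  # Hypothetical helper that builds the `measurements` map for an event.
  # Usage with the :telemetry dependency would look like:
  #   {result, m} = measure(fn -> route(op) end)
  #   :telemetry.execute([:dspex, :router, :route_selected], m, metadata)

  @doc "Runs `fun`; returns {result, %{duration: ms}} with ms as a float."
  def measure(fun) when is_function(fun, 0) do
    start = System.monotonic_time()
    result = fun.()
    elapsed = System.monotonic_time() - start
    ms = System.convert_time_unit(elapsed, :native, :microsecond) / 1000
    {result, %{duration: ms}}
  end
end
```

Whichever unit is chosen, it should be used for every event so that aggregates across components stay comparable.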

Storage Schema

# ETS tables
:dspex_metrics_current    # Current window metrics
:dspex_metrics_historical # Historical aggregates
:dspex_metrics_patterns   # Detected patterns

# Metric record structure
%{
  event: [:dspex, :router, :route_selected],
  timestamp: ~U[2024-01-20 10:30:00Z],
  measurements: %{duration: 0.5},
  metadata: %{operation: "predict", selected: :native},
  window: :current
}
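The schema above can be backed by three named ETS tables, with automatic cleanup done through `:ets.select_delete/2`. The following is a minimal sketch of `storage.ex`. It assumes rows of the form `{{monotonic_ms, unique_integer}, record}` in an `:ordered_set`, so that two records with the same millisecond never overwrite each other. The module and its functions are hypothetical.

```elixir
defmodule DSPex.Native.Metrics.Storage do
  # Hypothetical sketch of storage.ex: the three tables from the schema plus
  # age-based cleanup.
  @tables [:dspex_metrics_current, :dspex_metrics_historical, :dspex_metrics_patterns]

  @doc "Creates the named tables; call once from the owning process."
  def init do
    for name <- @tables do
      :ets.new(name, [:named_table, :public, :ordered_set, read_concurrency: true])
    end
  end

  @doc "Stores `record` in `table`, stamped with `now_ms`."
  def put(table, record, now_ms \\ System.monotonic_time(:millisecond)) do
    key = {now_ms, System.unique_integer([:monotonic])}
    :ets.insert(table, {key, record})
  end

  @doc "Deletes rows older than `max_age_ms`; returns the number removed."
  def cleanup(table, max_age_ms, now_ms \\ System.monotonic_time(:millisecond)) do
    cutoff = now_ms - max_age_ms
    # Match spec: row is {{ts, _unique}, _record}; delete when ts < cutoff.
    spec = [{{{:"$1", :_}, :_}, [{:<, :"$1", cutoff}], [true]}]
    :ets.select_delete(table, spec)
  end
end
```

In a running system, `cleanup/2` would typically be called on a timer. A GenServer that owns the tables and reschedules itself with `Process.send_after/3` is one common way to do this.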

Analysis Functions

# Required analysis functions
- calculate_percentiles(metric, percentiles)
- detect_anomalies(metric, window)
- analyze_trends(metric, periods)
- find_patterns(events, window)
- suggest_optimizations(metrics)
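Three of these functions have standard statistical definitions. The following sketch uses nearest-rank percentiles, z-score anomaly detection, and a least-squares slope for trends. These are assumptions, since the spec does not name a method. For simplicity, the functions take plain lists of numbers instead of the `(metric, window)` arguments listed above.

```elixir
defmodule DSPex.Native.Metrics.Analyzer do
  # Hypothetical sketches of analysis functions over plain number lists.

  @doc "Nearest-rank percentiles for integer `percentiles` in 0..100."
  def calculate_percentiles(values, percentiles) when values != [] do
    sorted = Enum.sort(values)
    n = length(sorted)

    Map.new(percentiles, fn p ->
      # Nearest rank = ceil(p * n / 100) in integer arithmetic, clamped to 1..n.
      rank = div(p * n + 99, 100) |> max(1) |> min(n)
      {p, Enum.at(sorted, rank - 1)}
    end)
  end

  @doc "Values whose z-score (population std dev) exceeds `threshold`."
  def detect_anomalies(values, threshold \\ 3.0) when values != [] do
    n = length(values)
    mean = Enum.sum(values) / n
    variance = Enum.reduce(values, 0.0, fn v, acc -> acc + (v - mean) * (v - mean) end) / n
    std = :math.sqrt(variance)

    if std == 0.0 do
      []
    else
      Enum.filter(values, fn v -> abs(v - mean) / std > threshold end)
    end
  end

  @doc "Least-squares slope of `values` against their index (change per period)."
  def analyze_trends(values) when length(values) >= 2 do
    n = length(values)
    xs = Enum.to_list(0..(n - 1))
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(values) / n

    {num, den} =
      Enum.zip(xs, values)
      |> Enum.reduce({0.0, 0.0}, fn {x, y}, {num, den} ->
        {num + (x - mean_x) * (y - mean_y), den + (x - mean_x) * (x - mean_x)}
      end)

    num / den
  end
end
```

With the population standard deviation, a single outlier among n points can reach a z-score of at most √(n−1). A threshold of 3.0 therefore cannot flag anything in windows of fewer than 11 samples.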

Testing Requirements

Create tests in:

- test/dspex/native/metrics_test.exs
- test/dspex/native/metrics/ (one file per sub-module)

Test scenarios:

- High-volume event handling (1000+ events/second)
- Memory usage under load
- Correct aggregation across time windows
- Pattern detection accuracy
- Integration with the adaptation system
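For the high-volume and memory scenarios, a test needs a repeatable load generator. The following sketch uses a hypothetical `LoadCheck.run/2` helper. It writes synthetic events into a fresh ETS table from several concurrent processes and reports the number of rows inserted, the elapsed time, and the table's memory footprint. It measures raw storage capacity only and does not cover the cost of `:telemetry` dispatch.

```elixir
defmodule DSPex.Native.Metrics.LoadCheck do
  # Hypothetical load helper for the "high-volume" and "memory under load"
  # scenarios. `count` should be divisible by `writers`.
  def run(count, writers \\ 4) when is_integer(count) and count >= writers do
    table = :ets.new(:load_check, [:public, :duplicate_bag, write_concurrency: true])
    per_writer = div(count, writers)
    start = System.monotonic_time()

    1..writers
    |> Enum.map(fn w ->
      Task.async(fn ->
        for i <- 1..per_writer do
          :ets.insert(table, {[:dspex, :router, :route_selected], {w, i}, %{duration: 0.5}})
        end
      end)
    end)
    |> Task.await_many(:infinity)

    elapsed_us = System.convert_time_unit(System.monotonic_time() - start, :native, :microsecond)

    result = %{
      inserted: :ets.info(table, :size),
      elapsed_us: elapsed_us,
      # :ets.info(table, :memory) is in machine words, so convert to bytes.
      memory_bytes: :ets.info(table, :memory) * :erlang.system_info(:wordsize)
    }

    :ets.delete(table)
    result
  end
end
```

A test could compute `inserted * 1_000_000 / elapsed_us` and assert that it stays well above 1000 events/second. Timing-based assertions should use generous margins to avoid flaky CI runs.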

Dependencies

- Requires CORE.1 to be complete
- Integrates with all other components
- Will be used by the cognitive orchestration layer

Time Estimate

4 hours total:

- 1 hour: Core telemetry event setup
- 1 hour: Collector and aggregator
- 1 hour: Pattern analyzer
- 1 hour: Testing and optimization

Notes

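- Use the :telemetry library to follow standard Elixir instrumentation patterns
- Consider using :telemetry_metrics to declare aggregations (counter, sum, last_value, summary, distribution)
- Ensure minimal performance impact (stay within the <1% overhead target)
- Design for future export to Prometheus/StatsD
- Include hooks for consciousness integration (metadata enrichment)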
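StatsD export mainly requires rendering each measurement in StatsD's plain line format, `<bucket>:<value>|<type>`. The type suffix is `c` for counters, `ms` for timers, and `g` for gauges. The following sketch uses a hypothetical `DSPex.Native.Metrics.StatsD` module and derives the bucket name from the event name plus the measurement key. Sending the data is shown only as a comment.

```elixir
defmodule DSPex.Native.Metrics.StatsD do
  # Hypothetical export helper. Sending over UDP (StatsD's default port is
  # 8125) would look roughly like:
  #   {:ok, socket} = :gen_udp.open(0)
  #   :gen_udp.send(socket, ~c"localhost", 8125, line)

  @doc "Joins the event name and measurement key into a dotted bucket name."
  def bucket(event, measurement) do
    Enum.map_join(event ++ [measurement], ".", &Atom.to_string/1)
  end

  @doc "Formats a counter, timer or gauge value as one StatsD line."
  def format(event, measurement, value, type) do
    "#{bucket(event, measurement)}:#{value}|#{suffix(type)}"
  end

  defp suffix(:counter), do: "c"
  defp suffix(:timer), do: "ms"
  defp suffix(:gauge), do: "g"
end
```

Prometheus uses a pull model rather than a push model. That makes it a better fit for a reporter built on `:telemetry_metrics` that exposes an HTTP endpoint than for a line formatter like this one.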