
005 4of10

Documentation for 005_4of10 from the DSPEx repository.

This document details the fourth proposed feature enhancement: "Telemetry Support for Histograms and Summaries."


Technical Specification: Histogram and Summary Metrics in Foundation.Telemetry

Document Version: 1.0
Author: AI Assistant
Status: PROPOSED

1. Overview

This document specifies an enhancement to the Foundation.Services.TelemetryService to add native support for two powerful metric types: Histograms and Summaries. The current TelemetryService primarily supports counters and gauges, which are excellent for tracking event occurrences and point-in-time values. However, to understand the performance characteristics and distributions of critical operations, more advanced statistical metrics are required.

This enhancement is driven by the needs of the DSPEx project, where understanding the distribution of LLM latencies and evaluation scores is often more important than tracking the average alone. For example, a Service Level Objective (SLO) such as keeping the p99 latency of an LLM call under 5 seconds cannot be monitored with a simple gauge or counter.

This proposal introduces Telemetry.emit_histogram/3 and Telemetry.emit_summary/3 to the public API and details the necessary internal changes to the TelemetryService to aggregate and expose this statistical data.


2. Problem Statement & Use Case

A DSPEx.Evaluate run completes on a dev set of 1,000 examples. The final average score is 0.85. This single number hides a crucial story:

  • Did 85% of the examples score 1.0 and 15% score 0.0 (a bimodal, spiky distribution)?
  • Or did most examples score between 0.8 and 0.9 (a tight, normal distribution)?

Answering this requires understanding the distribution of the scores. Similarly, when monitoring an LLM client:

  • An average latency of 800ms might seem acceptable.
  • However, if the p99 latency is 7,000ms, it means 1% of users are experiencing extremely slow responses, indicating a significant performance problem.

The current TelemetryService cannot capture this distributional data. This enhancement will add that capability.


3. Proposed API Changes

The public API in Foundation.Telemetry will be extended with two new functions.

3.1. emit_histogram(event_name, value, metadata)

This function is used to record observations that will be aggregated into histogram buckets. Histograms are ideal for understanding the distribution of values, such as request latencies.

New Signature:

# in foundation/telemetry.ex

@spec emit_histogram(event_name :: [atom()], value :: number(), metadata :: map()) :: :ok

  • event_name: The hierarchical name of the metric.
  • value: The numerical observation to record (e.g., a response time in milliseconds).
  • metadata: Additional tags for filtering/grouping (e.g., %{provider: :openai, model: "gpt-4o"}).
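
A minimal sketch of the public function, assuming it forwards the observation through :telemetry.execute/3 with a %{histogram: value} measurement that the TelemetryService's handler routes to record_metric/4 (see Section 4.2); the actual dispatch mechanism is an implementation detail of the service:

# in foundation/telemetry.ex (sketch; dispatch via :telemetry.execute/3 is assumed)
def emit_histogram(event_name, value, metadata)
    when is_list(event_name) and is_number(value) and is_map(metadata) do
  # :telemetry.execute/3 returns :ok, satisfying the @spec above
  :telemetry.execute(event_name, %{histogram: value}, metadata)
end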

Example Usage (from DSPEx):

# Inside the LM Client after a successful API call.
# `start_ms` is assumed to have been captured via System.monotonic_time(:millisecond)
# immediately before the request was issued.
duration_ms = System.monotonic_time(:millisecond) - start_ms

Telemetry.emit_histogram(
  [:dspex, :lm, :request_latency_ms],
  duration_ms,
  %{provider: :openai, model: "gpt-4o-mini"}
)

3.2. emit_summary(event_name, value, metadata) (Alternative/Optional)

Summaries are similar to histograms but are calculated on the client side and are more focused on quantiles. While histograms are generally more flexible on the backend, a summary can be useful for pre-aggregated quantile data. We propose adding it for completeness.

New Signature:

# in foundation/telemetry.ex

@spec emit_summary(event_name :: [atom()], value :: number(), metadata :: map()) :: :ok

Note: The internal implementation will likely be very similar to emit_histogram, but the aggregation logic might differ, focusing on calculating and storing specific quantiles (e.g., p50, p90, p99). For simplicity, we will focus the rest of this document on the histogram implementation, as it is the more versatile of the two.
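
Example Usage (from DSPEx; the event name and metadata keys here are illustrative, not an established convention):

# Inside DSPEx.Evaluate, once per evaluated example
score = 0.85  # this example's evaluation score
Telemetry.emit_summary(
  [:dspex, :evaluate, :example_score],
  score,
  %{dataset: :dev}
)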


4. Internal Implementation Details

This enhancement requires significant changes to the TelemetryService GenServer and its internal state.

4.1. TelemetryService State Enhancement

The metrics map within the server_state will now need to differentiate between metric types.

New server_state Structure:

# in foundation/services/telemetry_service.ex

# Illustrative server_state shape (example values are shown in place of types;
# the formal @type would use abstract types rather than literals):
%{
  metrics: %{
    [:dspex, :lm, :requests_total] => %{type: :counter, value: 1234, ...},
    [:dspex, :system, :active_workers] => %{type: :gauge, value: 10, ...},
    [:dspex, :lm, :request_latency_ms] => %{
      type: :histogram,
      buckets: %{100 => 5, 250 => 20, 500 => 50, 1000 => 15, :infinity => 2},
      sum: 123450.0,
      count: 92,
      ...
    }
  },
  # ... other state fields
}
  • A :type key is added to each metric entry.
  • For :histogram, the value is a map containing:
    • buckets: A map where keys are the upper bound of a bucket (e.g., 100 for 0-100ms) and values are the count of observations in that bucket. An :infinity key catches all values above the highest defined bucket.
    • sum: The sum of all observed values.
    • count: The total number of observations.
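
As a worked illustration with hypothetical observations: recording latencies of 90, 180, and 700 (ms) against the default buckets yields

buckets: %{100 => 1, 250 => 1, 1000 => 1}  # 90 ≤ 100, 180 ≤ 250, 700 ≤ 1000
sum:     970.0
count:   3
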
4.2. New handle_cast Clause for Histograms

The TelemetryService needs to handle the new event type.

# in foundation/services/telemetry_service.ex

# Modify the record_metric function to handle different types
defp record_metric(event_name, %{histogram: value}, metadata, state) do
  # This is a histogram observation
  update_histogram(event_name, value, metadata, state)
end

# ... other record_metric clauses for :counter, :gauge ...

defp update_histogram(event_name, value, metadata, %{metrics: metrics} = state) do
  # Default histogram bucket upper bounds; in practice these are read from
  # configuration at startup (see Section 5)
  default_buckets = [100, 250, 500, 1000, 2500, 5000, 10000]

  # Find the correct bucket for the new value
  bucket_key = Enum.find(default_buckets, :infinity, fn upper_bound -> value <= upper_bound end)
  
  # Get the current histogram or create a new one
  current_histogram = Map.get(metrics, event_name, %{
    type: :histogram,
    buckets: %{},
    sum: 0.0,
    count: 0,
    timestamp: System.monotonic_time(),
    metadata: metadata
  })
  
  # Update the histogram's state
  new_histogram =
    current_histogram
    |> Map.update!(:buckets, &Map.update(&1, bucket_key, 1, fn count -> count + 1 end))
    |> Map.update!(:sum, &(&1 + value))
    |> Map.update!(:count, &(&1 + 1))
    |> Map.put(:timestamp, System.monotonic_time())
  
  %{state | metrics: Map.put(metrics, event_name, new_histogram)}
end
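
The section title refers to a handle_cast clause; a minimal sketch of how such a clause might route incoming events into record_metric/4 follows (the {:record_metric, ...} message shape is an assumption, not the service's established protocol):

# in foundation/services/telemetry_service.ex (sketch; message shape assumed)
@impl true
def handle_cast({:record_metric, event_name, measurements, metadata}, state) do
  {:noreply, record_metric(event_name, measurements, metadata, state)}
end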
4.3. Exposing Aggregated Statistics via get_metrics

The get_metrics function should not return the raw bucket data. Instead, it should calculate and return meaningful statistics.

# in foundation/services/telemetry_service.ex

def handle_call(:get_metrics, _from, %{metrics: metrics} = state) do
  # Transform raw metrics into consumable statistics
  stats =
    metrics
    |> Enum.map(fn {event_name, metric_data} ->
      case metric_data.type do
        :histogram ->
          {event_name, calculate_histogram_stats(metric_data)}
        # ... other types
        _ ->
          {event_name, metric_data}
      end
    end)
    |> transform_to_nested_structure() # existing helper for the nested output structure

  {:reply, {:ok, stats}, state}
end

defp calculate_histogram_stats(histogram_data) do
  # This is a complex function. A real implementation might use a library
  # like `:statistics` or a dedicated HDR Histogram implementation.
  # For this spec, we'll outline the logic.

  buckets = histogram_data.buckets
  total_count = histogram_data.count

  if total_count == 0 do
    # No observations yet; return empty stats to avoid division by zero
    %{type: :histogram, count: 0, sum: 0.0, avg: nil, p50: nil, p90: nil, p99: nil}
  else
    # 1. Calculate Average
    avg = histogram_data.sum / total_count

    # 2. Calculate Percentiles (p50, p90, p99) by walking the sorted buckets.
    # Erlang term ordering places the :infinity atom after all integer keys,
    # so Enum.sort/1 yields {upper_bound, count} pairs in ascending order.
    sorted_buckets = Enum.sort(buckets)

    p50 = find_percentile_value(sorted_buckets, total_count * 0.50)
    p90 = find_percentile_value(sorted_buckets, total_count * 0.90)
    p99 = find_percentile_value(sorted_buckets, total_count * 0.99)

    %{
      type: :histogram,
      count: total_count,
      sum: histogram_data.sum,
      avg: avg,
      p50: p50,
      p90: p90,
      p99: p99
      # raw_buckets: buckets # Optionally include for debugging
    }
  end
end

# Helper to find a percentile value from bucketed data. Walks the buckets in
# ascending order and returns the upper bound of the bucket in which the
# cumulative count first reaches the target rank, so the result is an
# upper-bound estimate accurate to bucket resolution.
defp find_percentile_value(sorted_buckets, rank) do
  Enum.reduce_while(sorted_buckets, 0, fn {upper_bound, count}, cumulative_count ->
    new_cumulative = cumulative_count + count
    if new_cumulative >= rank do
      # The rank falls within this bucket
      {:halt, upper_bound}
    else
      {:cont, new_cumulative}
    end
  end)
end
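
Applying this logic to the example histogram from Section 4.1 (buckets %{100 => 5, 250 => 20, 500 => 50, 1000 => 15, :infinity => 2}, count 92) gives:

# Cumulative counts:  100 => 5, 250 => 25, 500 => 75, 1000 => 90, :infinity => 92
# p50: rank = 92 * 0.50 = 46.0  -> first cumulative >= 46.0 is 75, so p50 = 500
# p90: rank = 92 * 0.90 = 82.8  -> first cumulative >= 82.8 is 90, so p90 = 1000
# p99: rank = 92 * 0.99 = 91.08 -> first cumulative >= 91.08 is 92, so p99 = :infinity
# avg: 123450.0 / 92 ≈ 1341.8

A p99 of :infinity indicates that the tail falls above the highest configured bucket, which is itself a useful operational signal that the bucket boundaries should be extended.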

5. Configuration

The default histogram buckets should be configurable via the foundation application environment.

Example config.exs:

config :foundation, :telemetry,
  default_histogram_buckets: [50, 100, 250, 500, 1000, 5000]

The TelemetryService will read this configuration during init/1 to use for all histogram metrics.
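
A minimal sketch of reading this configuration in init/1 (the histogram_buckets state field is an assumption for illustration):

# in foundation/services/telemetry_service.ex (sketch)
@impl true
def init(_opts) do
  buckets =
    :foundation
    |> Application.get_env(:telemetry, [])
    |> Keyword.get(:default_histogram_buckets, [100, 250, 500, 1000, 2500, 5000, 10000])

  {:ok, %{metrics: %{}, histogram_buckets: buckets}}
end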


6. Conclusion

Adding native support for histogram and summary metrics to the Foundation.Telemetry service is a critical enhancement for building production-grade AI systems. It provides the necessary observability to move beyond simple averages and understand the true performance distribution of key operations.

For DSPEx, this means:

  • SLO Monitoring: We can now set and monitor SLOs on p99 latency for LLM calls.
  • Performance Debugging: We can easily identify if a small number of outliers are skewing the average latency.
  • Optimizer Analysis: We can analyze the full distribution of scores produced by an optimizer, not just the single best score, leading to more robust and reliable optimization.

This feature significantly elevates the operational maturity of any system built on foundation, making it an essential addition to the library.