← Back to OTPEvals

02

Documentation for 02 from the Dspex repository.

Of course. Generating high-quality training data for OTP (Open Telecom Platform), which is central to Elixir and Erlang, is a fantastic use case for a DSPy-like system. The challenge, as you noted, is moving beyond simple code patterns to capture the essence of concurrency, which is about reasoning over time, messages, and interacting processes.

Here’s a breakdown of ideas for creating training data and evaluations for OTP concurrency, designed for a system like DSPy.

Core Idea: Map Concurrency Concepts to “Learnable” Scenarios

The goal is to teach a model to reason about OTP’s Actor Model—isolated processes communicating via asynchronous messages. We need to create training data that explicitly connects a high-level concurrency problem (the context) to a low-level code implementation and its outcome.


## 1. Labeled Code Scenarios: “Bug” vs. “Fix”

This is the most direct approach. You create pairs of OTP code snippets (in Elixir or Erlang) that demonstrate a specific concurrency issue and its correct implementation.

How to generate it:

  • Concurrency Anti-Patterns: Focus on common OTP mistakes.

    • Race Conditions: A GenServer that reads and then writes a value in separate steps instead of a single atomic handle_call.
    • Deadlocks: Two processes calling each other synchronously, waiting for a reply.
    • Losing Messages: A process that crashes and isn’t properly supervised, losing messages in its mailbox.
    • Blocking Calls: Using a GenServer.call inside a handle_call to another process, which can block the entire system.
  • DSPy Signature for Generation:

    # A DSPy-like signature to generate training data
    class ConcurrencyScenario(dspy.Signature):
        """Given a concurrency anti-pattern, generate a buggy Elixir code example and a corrected version using OTP principles."""
    
        anti_pattern = dspy.InputField(desc="e.g., 'Race condition in GenServer state update'")
        buggy_code = dspy.OutputField(desc="Elixir code with the specified bug.")
        fixed_code = dspy.OutputField(desc="The corrected Elixir code using proper OTP patterns.")
        explanation = dspy.OutputField(desc="A clear explanation of why the first code is buggy and how the fix resolves the concurrency issue.")
    

Example Training Data Point:

  • Context/Input (anti_pattern): “Race condition in a GenServer counter”
  • Output (buggy_code):
    # BUGGY: Not atomic
    def handle_call(:get_and_increment, _from, state) do
      current_value = state.value
      new_state = %{state | value: current_value + 1}
      {:reply, current_value, new_state}
    end
    
  • Output (fixed_code):
    # FIXED: Atomic operation
    def handle_call(:get_and_increment, _from, state) do
      {:reply, state.value, %{state | value: state.value + 1}}
    end
    
  • Output (explanation): “The buggy version can cause a race condition if two processes call it concurrently. Both might read the same current_value before either has a chance to update the state. The fix ensures the reply and state update are an atomic operation within the GenServer’s single-threaded loop.”

## 2. Natural Language Problem to OTP Solution

This approach links real-world problems to OTP’s architectural patterns (Supervisor, GenServer, Application). It teaches the model how to structure a concurrent system.

How to generate it:

  • Problem Descriptions: Write scenarios that imply a need for concurrency.

    • “Build a real-time notification system where multiple users can subscribe to topics.” (Suggests a Registry and GenServers per topic).
    • “Create a job queue that processes video uploads. If a video processor fails, it should be restarted automatically.” (Suggests a Supervisor with a restart strategy).
    • “Design a cache for database queries that expires entries after 5 minutes.” (Suggests a GenServer with Process.send_after).
  • DSPy Signature:

    class SystemDesign(dspy.Signature):
        """Given a real-world problem, design a high-level OTP architecture and provide a skeleton of the main GenServer or Supervisor."""
    
        problem_description = dspy.InputField(desc="A natural language description of a system to build.")
        otp_architecture = dspy.OutputField(desc="A high-level plan using OTP components like Supervisor, GenServer, Registry, etc.")
        code_skeleton = dspy.OutputField(desc="Minimal Elixir code for the core components.")
    

Example Training Data Point:

  • Context/Input (problem_description): “I need a system that can run up to 10 concurrent web scraping jobs. If any single job crashes, it shouldn’t bring down the others, and it should be restarted up to 3 times.”
  • Output (otp_architecture): “This requires a Supervisor with a one_for_one restart strategy. The supervisor will manage worker processes (GenServer or Task), each responsible for one scraping job. A max_restarts option of 3 should be used.”
  • Output (code_skeleton):
    defmodule Scraper.Supervisor do
      use Supervisor
    
      def start_link(opts) do
        Supervisor.start_link(__MODULE__, :ok, opts)
      end
    
      def init(:ok) do
        children = [
          %{
            id: Scraper.Worker,
            start: {Scraper.Worker, :start_link, []},
            restart: :transient,
          }
        ]
        Supervisor.init(children, strategy: :one_for_one, max_restarts: 3)
      end
    end
    

## 3. Execution Trace and State Prediction

This is the most advanced and powerful technique for teaching reasoning. Instead of just looking at static code, the model must understand how processes interact over time.

How to generate it:

  • Simulate Execution: Write a small script that spawns a few OTP processes, sends them a sequence of messages, and captures key events (message sent, message received, state change, crash).

  • Create a “Trace”: Format this sequence of events into a readable text format.

  • Ask for Predictions: Based on the trace, ask the model to predict the final state of a process, identify if a message will be lost, or explain why a deadlock occurred.

  • DSPy Signature:

    class ConcurrencyTrace(dspy.Signature):
        """Given the code for several OTP processes and a timeline of events, predict the final state of a target process or identify a concurrency bug."""
    
        process_code = dspy.InputField(desc="The Elixir code for all involved GenServers.")
        event_trace = dspy.InputField(desc="A timestamped log of messages sent between processes.")
        question = dspy.InputField(desc="e.g., 'What is the final state of Process A?' or 'Will Process B ever receive the :work_done message?'")
        prediction = dspy.OutputField(desc="The predicted final state or an answer to the question.")
        reasoning = dspy.OutputField(desc="Explain the step-by-step reasoning based on OTP's message-passing guarantees.")
    

Example Training Data Point:

  • Context/Input (process_code & event_trace):
    • Code for two GenServers, A and B.
    • Trace:
      1. T=0ms: System -> A: {:call, B, :some_work}
      2. T=1ms: A -> B: {:call, A, :other_work}
      3. ...
  • Input (question): “Describe what happens in this system.”
  • Output (prediction): “A deadlock occurs.”
  • Output (reasoning): “Process A receives a synchronous call and, inside its handle_call, makes another synchronous call to Process B. Process B, in turn, makes a synchronous call back to A. Since both handle_call functions are blocked waiting for a reply, neither process can process the incoming message from the other. They will both time out, resulting in a deadlock.”

How to Use This for Evals

Your evaluation set should be a holdout set of these generated examples.

  1. Code Generation Eval: Give the model a buggy_code snippet and an explanation of the bug. Evaluate if it can produce the fixed_code.
  2. System Design Eval: Provide a new problem_description and see if the generated otp_architecture and code_skeleton are coherent and correct. This can be graded by an expert or even by another LLM call with a “critic” prompt.
  3. Reasoning Eval: This is the key one. Provide the code and event_trace and ask a question. The metric is whether the prediction is correct. This directly tests the model’s ability to reason about concurrency.