
JULY 1 AFTERNOON GEM DSPY 02

Documentation for JULY_1_AFTERNOON_GEM_DSPY_02 from the Pipeline ex repository.

Ah, perfect! I understand completely now. This is a much more pragmatic and powerful approach for right now. You’re leveraging the best of both worlds:

  1. DSPy in Python: Use the mature, dynamic, and research-oriented Python ecosystem for what it’s best at—programmatic LLM optimization and prompt engineering.
  2. Elixir Pipeline: Use your robust, concurrent, and production-ready Elixir system for what it’s best at—actually executing the deterministic, multi-provider AI workflows.

This is a brilliant hybrid strategy. The Python/DSPy layer becomes your “AI Engineering Workbench,” and the Elixir system is your “Production Runtime.”

Let’s design this.


Part 1: The Quick and Dirty DSPy Script

Here is the Python script you’d whip together. This script’s goal is to use DSPy to find the single best prompt for your Elixir-based Genesis Pipeline.

Assumptions:

  • You have a Python adapter, elixir_pipeline.py, that can call your mix pipeline.generate.live command and return the generated YAML string (a minimal sketch follows this list; the main script below inlines the same subprocess call).
  • You have your “gold standard” examples: your input codebase (repomix-output.xml) and your manually created, high-quality OTP refactor plan (JULY_1_2025_PRE_PHASE_2_OTP_report_01.md).
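
The adapter can stay tiny. A rough sketch of what elixir_pipeline.py might look like, assuming the mix task prints "✅ Pipeline generated at <path>" on stdout as described in Part 2 below:

# file: elixir_pipeline.py
import os
import subprocess

# Assumption: path to your local pipeline_ex checkout.
PIPELINE_EX_DIR = "/path/to/your/pipeline_ex"

def run_genesis_pipeline(prompt: str) -> str:
    """Run `mix pipeline.generate.live` with the given prompt and return the generated YAML."""
    process = subprocess.run(
        ["mix", "pipeline.generate.live", prompt],
        capture_output=True, text=True, cwd=PIPELINE_EX_DIR,
    )
    if process.returncode != 0:
        raise RuntimeError(f"Elixir pipeline failed: {process.stderr}")

    # The mix task prints the output path on stdout; read that file back.
    output_path = process.stdout.strip().split(" at ")[-1]
    with open(os.path.join(PIPELINE_EX_DIR, output_path)) as f:
        return f.read()
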
# file: optimize_genesis_prompt.py
import dspy
import subprocess
import json

# --- 1. The Python Adapter for your Elixir Pipeline ---
# This is the bridge between Python and Elixir.
class ElixirGenesisPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # This is where the magic happens. We are parameterizing the PROMPT itself.
        # DSPy will learn the best instructions to put here.
        # The output field `elixir_prompt` (the name is arbitrary) is the detailed
        # natural-language instruction string we hand to the Elixir mix task.
        self.generate = dspy.Predict(
            'high_level_goal -> elixir_prompt'
        )

    def forward(self, high_level_goal):
        # 1. Use the learned prompt from DSPy's Predict module
        #    This is the core of the optimization.
        #    DSPy is controlling the natural language instructions.
        learned_prompt = self.generate(high_level_goal=high_level_goal).elixir_prompt

        # 2. Call your Elixir mix task via the command line.
        #    We pass the learned prompt as an argument.
        #    (Your mix task needs to be able to accept this prompt)
        print("--- Calling Elixir Pipeline with Learned Prompt ---")
        print(learned_prompt)
        print("----------------------------------------------------")
        
        process = subprocess.run(
            ['mix', 'pipeline.generate.live', learned_prompt],
            capture_output=True,
            text=True,
            cwd='/path/to/your/pipeline_ex' # IMPORTANT
        )

        if process.returncode != 0:
            print("Elixir process failed!")
            print(process.stderr)
            # In DSPy, we need to return a prediction object, even on failure.
            return dspy.Prediction(final_refactor_plan_yaml="ERROR: " + process.stderr)

        # 3. The Elixir task prints the path to the generated file.
        #    We parse that path from stdout.
        #    Example stdout: "✅ Pipeline generated at evolved_pipelines/refactor_plan_123.yaml"
        output_path = process.stdout.strip().split(' at ')[-1]
        
        # 4. Read the generated YAML file content.
        with open(f"/path/to/your/pipeline_ex/{output_path}", 'r') as f:
            generated_yaml = f.read()

        return dspy.Prediction(final_refactor_plan_yaml=generated_yaml)


# --- 2. The Evaluation Metric ---
# This is how we score the quality of the Elixir-generated plan.
def validate_otp_plan(example, pred, trace=None):
    gold_standard_plan = example.gold_standard_plan
    generated_plan_yaml = pred.final_refactor_plan_yaml
    
    if generated_plan_yaml.startswith("ERROR:"):
        return False # The Elixir process failed, so this is a 0 score.

    # Use another LLM (Gemini) as a cheap, powerful quality rater.
    # We are asking it to compare the generated plan to our perfect, handwritten one.
    # A class-based signature lets us put the JSON instruction in the prompt itself.
    class RatePlan(dspy.Signature):
        """Compare the generated plan to the gold-standard OTP refactor plan.
        Respond with a JSON object with keys 'score' (0-10) and 'reason' (string)."""
        generated_plan = dspy.InputField()
        gold_standard = dspy.InputField()
        score_and_reason = dspy.OutputField()

    rater = dspy.Predict(RatePlan)

    # Configure Gemini for this rating task only; a context manager avoids
    # clobbering the globally configured LM mid-optimization.
    gemini = dspy.Google("models/gemini-1.5-flash-latest")
    with dspy.settings.context(lm=gemini):
        response = rater(generated_plan=generated_plan_yaml, gold_standard=gold_standard_plan).score_and_reason
    
    # Try to parse the score. This part needs to be robust.
    try:
        # The rater's signature docstring asks for a JSON object with
        # 'score' (0-10) and 'reason' keys; strip markdown fences just in case.
        cleaned = response.strip().strip('`').removeprefix('json').strip()
        result = json.loads(cleaned)
        score = int(result.get('score', 0))
        print(f"Plan rated. Score: {score}. Reason: {result.get('reason', 'N/A')}")
        return score >= 8 # Our success metric: a score of 8 or higher is a "pass"
    except (json.JSONDecodeError, ValueError):
        print(f"Failed to parse rating: {response}")
        return False


# --- 3. The Training Data and Optimizer ---
def main():
    # Load your gold-standard example. The raw codebase is loaded here in case
    # you want to fold it into the goal; otherwise the Elixir pipeline is
    # assumed to read the repo itself via its own file tools.
    with open('repomix-output.xml', 'r') as f:
        codebase_content = f.read()
    
    with open('JULY_1_2025_PRE_PHASE_2_OTP_report_01.md', 'r') as f:
        gold_plan_content = f.read()
        
    train_example = dspy.Example(
        high_level_goal="Analyze this Elixir codebase for OTP anti-patterns and create a critical refactoring plan.",
        gold_standard_plan=gold_plan_content
    ).with_inputs('high_level_goal')

    # Configure the LM that powers the prompt-writing dspy.Predict step inside
    # ElixirGenesisPipeline. The heavy lifting (Claude, via your Elixir pipeline)
    # happens when forward() shells out to mix; this LM only drafts the
    # instructions we hand to it.
    gemini = dspy.Google("models/gemini-1.5-flash-latest")
    dspy.settings.configure(lm=gemini)
    
    # Set up the optimizer. We'll use a simple one that generates few-shot examples.
    # It will try different ways of phrasing the instructions to get the best result.
    optimizer = dspy.teleprompt.BootstrapFewShot(metric=validate_otp_plan, max_bootstrapped_demos=2)
    
    # Run the optimization!
    # This will run the ElixirGenesisPipeline multiple times, trying to find
    # the best prompt for `self.generate` that maximizes our `validate_otp_plan` metric.
    optimized_pipeline = optimizer.compile(ElixirGenesisPipeline(), trainset=[train_example])

    # --- 4. Use the Optimized Pipeline ---
    print("\n\n--- OPTIMIZATION COMPLETE ---")
    print("Found an optimized prompt for generating refactor plans.")

    # Now, let's run the final, optimized pipeline one last time
    final_prediction = optimized_pipeline(high_level_goal="Generate the final, best OTP refactor plan.")
    
    print("\n--- FINAL GENERATED YAML ---")
    print(final_prediction.final_refactor_plan_yaml)
    
    # You can inspect the optimized prompt and the most recent LM calls
    dspy.inspect_history(n=1)


if __name__ == "__main__":
    main()

What This DSPy Script Does:

  1. Defines ElixirGenesisPipeline: This is a dspy.Module that wraps your Elixir pipeline so DSPy can optimize it. Its forward pass takes a high-level goal, uses dspy.Predict to turn it into a learned prompt, and then shells out to your Elixir mix task, passing that prompt.
  2. Defines validate_otp_plan: This is your metric. It takes the YAML generated by the Elixir pipeline and compares it to your handwritten “gold standard” document. It uses a separate, fast LLM (Gemini Flash) to do the quality scoring. A score of 8/10 or higher means success.
  3. Sets up the BootstrapFewShot Optimizer: This optimizer will try to generate a few high-quality “demonstrations” (examples) to include in the prompt for ElixirGenesisPipeline. It’s essentially learning how to prompt your Elixir system.
  4. Compiles and Runs: The optimizer.compile() call is the main event. It runs the entire loop:
    • DSPy generates a candidate prompt.
    • It calls ElixirGenesisPipeline with that prompt.
    • The Elixir script runs and generates a YAML file.
    • The Python script reads the file.
    • The validate_otp_plan metric scores the result.
    • DSPy uses the score to generate a better prompt for the next iteration.
    • This repeats until it finds a prompt that consistently gets a high score. (A snippet for saving and reusing that result follows this list.)
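
If the compiled program looks good, you will usually want to persist it so you are not re-running the whole optimization loop every time. A rough sketch, assuming DSPy's module save/load API (details vary by version):

# Persist the optimized program (file name is arbitrary).
optimized_pipeline.save("optimized_genesis_prompt.json")

# Later, reload it without re-optimizing.
reloaded = ElixirGenesisPipeline()
reloaded.load("optimized_genesis_prompt.json")

# You can also pull out the bootstrapped few-shot examples directly, e.g. to
# bake them back into your Elixir genesis.yaml by hand. (Instruction-rewriting
# optimizers like MIPRO would also change reloaded.generate.signature.instructions.)
print(reloaded.generate.demos)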

Part 2: The Minimum Functionality pipeline_ex Needs

To make the Python script above work, your Elixir system needs a few, very achievable enhancements.

1. Command-Line Prompt Injection

Your mix pipeline.generate.live task needs to accept the prompt from the command line instead of having it hardcoded.

# lib/pipeline_ex/mix/tasks/pipeline.generate.live.ex

defmodule Mix.Tasks.Pipeline.Generate.Live do
  use Mix.Task

  @shortdoc "Generates a pipeline from a prompt passed on the command line"
  def run(args) do
    # The full prompt is now passed as a single string argument
    user_prompt = Enum.join(args, " ")

    # It runs a META-pipeline that takes this prompt as input
    genesis_pipeline_path = "priv/pipelines/genesis.yaml"

    case Pipeline.Executor.execute_with_vars(genesis_pipeline_path, %{user_prompt: user_prompt}) do
      {:ok, results} ->
        generated_yaml = results["generate_pipeline_yaml"]
        File.mkdir_p!("evolved_pipelines")
        file_path = "evolved_pipelines/plan_#{:rand.uniform(1000)}.yaml"
        File.write!(file_path, generated_yaml)
        # CRUCIAL: Print the path to stdout for the Python script to parse
        IO.puts("✅ Pipeline generated at #{file_path}")

      {:error, reason} ->
        # CRUCIAL: Print errors to stderr and exit with non-zero status
        IO.puts(:stderr, "❌ Failed to generate pipeline: #{inspect(reason)}")
        System.halt(1)
    end
  end
end

And your genesis.yaml needs to be updated to use this variable:

# priv/pipelines/genesis.yaml
...
- name: "generate_pipeline_yaml"
  type: "claude"
  prompt:
    - type: "variable"
      # This variable is populated by the `execute_with_vars` call
      name: "user_prompt"

2. Robust File I/O and Workspace Management (Already in your design)

Your edit, read, and write tools need to be solid. The AI will be creating and modifying files based on the plans it generates. You already have this concept, so just ensure it’s implemented robustly.

3. Structured Output from Steps (Crucial for Verification)

While the DSPy script handles the final output verification, making your intermediate steps produce structured JSON is key.

Your claude_extract step is perfect for this. You should use it everywhere.

Instead of a step that just returns a blob of text, have it return JSON.

BAD:

- name: "analyze_code"
  type: "claude"
  prompt: "Analyze this code and tell me what's wrong."
# Output: "The code has several issues. Firstly, you are using Process.spawn..."

GOOD (Using your existing claude_extract):

- name: "analyze_code"
  type: "claude_extract"
  extraction_config:
    format: "json" # Force JSON output
  prompt: |
    Analyze this code for OTP flaws. Respond with a JSON object containing a key 'flaws', 
    which is a list of objects, each with 'file', 'line', and 'description' keys.
# Output: A JSON string that can be reliably parsed by the next step.
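
Once a step emits JSON like this, downstream verification becomes mechanical. As a hypothetical illustration (the field names match the prompt above), the DSPy metric or a later pipeline step could validate the structure before doing anything else:

import json

def check_flaw_report(raw: str) -> bool:
    """Return True if the analyze_code output is well-formed JSON per the prompt above."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError:
        return False
    flaws = report.get("flaws")
    if not isinstance(flaws, list):
        return False
    # Every flaw must be an object carrying the three keys the prompt asked for.
    return all(isinstance(flaw, dict) and {"file", "line", "description"} <= flaw.keys() for flaw in flaws)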

By making these minimal changes, you create a powerful, hybrid system where Python/DSPy handles the high-level, creative, and complex task of prompt engineering, and your Elixir pipeline acts as the fast, concurrent, and reliable execution engine.