
4. Analyzing the Optimized Module

Analyzing the results of the DSPy compiler, comparing the optimized prompt to a naive one, and seeing the performance difference on a new example.

After the DSPy compiler has finished its work, we are left with a new, optimized compiled_synthesis_module. But what has actually changed? And does it perform any better? In this final section, we’ll inspect the results and run a comparison.

1. Inspecting the Generated Prompt

The core output of the BootstrapFewShot optimizer is a compiled module whose prompt has been rebuilt around few-shot demonstrations bootstrapped from our training set. We can inspect that prompt directly to see what the module has learned.

# Let's assume 'lm' is our configured language model and 
# 'compiled_synthesis_module' is the output from the previous step.

# lm.inspect_history(n=1) prints the most recent prompt sent to the LLM,
# so we first run the compiled module on a fresh test example and then
# inspect the prompt that call produced.
test_example = dspy.Example(
    thesis="Economic growth is primarily driven by consumer spending (demand-side economics).",
    antithesis="Economic growth is primarily driven by production and investment (supply-side economics).",
    shared_evidence="Shared evidence includes government spending data, consumer confidence indices, records of tax cuts on corporations, and historical GDP growth rates."
)

# Run the compiled module on our test example
compiled_synthesis_module(
    thesis=test_example.thesis, 
    antithesis=test_example.antithesis, 
    shared_evidence=test_example.shared_evidence
)

# Now inspect the last prompt sent to the language model
lm.inspect_history(n=1)
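
You can also inspect the bootstrapped demonstrations programmatically, without making another LLM call. Here is a minimal sketch, assuming compiled_synthesis_module follows DSPy's standard module structure, in which each internal predictor exposes its few-shot examples via a demos attribute:

# Walk the compiled module's internal predictors and print the
# few-shot demonstrations the optimizer attached to each one.
for name, predictor in compiled_synthesis_module.named_predictors():
    print(f"Predictor: {name}")
    for i, demo in enumerate(predictor.demos):
        print(f"  Demo {i}: {demo}")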

Naive Prompt vs. Optimized Prompt

A naive, hand-written prompt for our ChainOfThought module might look something like this:

Naive Prompt:

Given the thesis, antithesis, and shared evidence, think step-by-step to synthesize a novel hypothesis that resolves the core contradiction.

Thesis: {thesis}

Antithesis: {antithesis}

Shared Evidence: {shared_evidence}

Synthesized Hypothesis:
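
For context, this naive prompt is roughly what DSPy renders from a bare, uncompiled signature. A minimal sketch of what that signature and module might have looked like in the earlier sections (the class name SynthesizeHypothesis is illustrative; only the field names are taken from this tutorial):

import dspy

class SynthesizeHypothesis(dspy.Signature):
    """Synthesize a novel hypothesis that resolves the core contradiction
    between the thesis and the antithesis, grounded in shared evidence."""

    thesis = dspy.InputField()
    antithesis = dspy.InputField()
    shared_evidence = dspy.InputField()
    synthesized_hypothesis = dspy.OutputField()

# An uncompiled ChainOfThought over this signature produces a prompt
# very close to the naive one shown above.
uncompiled_synthesis_module = dspy.ChainOfThought(SynthesizeHypothesis)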

However, after running optimizer.compile(), the prompt inside our compiled_synthesis_module will be far more sophisticated. The optimizer assembled it automatically, keeping the demonstrations and structure that maximized our critic_pipeline_metric. It will look something like this (a simplified representation):

Optimized Prompt (Generated by DSPy):

Synthesizes a novel, higher-order hypothesis from two opposing narratives (a thesis and an antithesis) that are grounded in a shared set of evidence. The synthesis must reconcile the conflict and explain the same evidence.

Follow these steps:

  1. Analyze the core contradiction between the thesis and antithesis.
  2. Identify the key elements of the shared evidence that must be explained.
  3. Formulate a new, unifying theory that preserves the valid points of both narratives while resolving the main conflict.

Example 1:

Thesis: The continents are fixed in place…

Antithesis: The continents drift across the Earth’s surface…

Shared Evidence: …jigsaw-puzzle fit of continents…

Reasoning: The user wants a synthesis that reconciles fixed continents with drifting ones. The evidence points to plate tectonics. I will formulate a hypothesis that explains both the apparent stability and the underlying motion by introducing the concept of rigid plates.

Synthesized Hypothesis: A unifying theory of plate tectonics reconciles these views…

Example 2:

Thesis: Light is composed of particles…

Antithesis: Light is a wave…

Shared Evidence: …light travels in straight lines…but also exhibits diffraction…

Reasoning: The user needs to resolve the particle-wave conflict. The evidence supports both behaviors. I will propose a dual-nature model where light has properties of both, which is the concept of wave-particle duality.

Synthesized Hypothesis: A new model of wave-particle duality reconciles the conflict…

Current Task:

Thesis: {thesis}

Antithesis: {antithesis}

Shared Evidence: {shared_evidence}

Reasoning:

The optimized prompt is a much more powerful guide for the LLM. It includes explicit instructions, a chain-of-thought directive (Reasoning:), and, most importantly, few-shot examples that were automatically selected from our trainset by the optimizer because they helped produce high-quality outputs.
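
Since this compiled state (instructions plus bootstrapped demos) lives inside the module, you can persist it and skip re-compilation in future sessions. A minimal sketch using DSPy's save and load methods (the file name is arbitrary, and SynthesizeHypothesis refers to the illustrative signature sketched earlier):

# Persist the optimized module's state (demos, signature, config).
compiled_synthesis_module.save("compiled_synthesis_module.json")

# Later: rebuild the program structure, then load the optimized state.
reloaded_module = dspy.ChainOfThought(SynthesizeHypothesis)
reloaded_module.load("compiled_synthesis_module.json")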

2. Comparing Performance on a New Example

Now for the real test. Let’s run both our original, un-optimized module and our new, compiled module on the test_example we created earlier and see how they perform.

# Get the prediction from the un-optimized module
uncompiled_pred = uncompiled_synthesis_module(
    thesis=test_example.thesis, 
    antithesis=test_example.antithesis, 
    shared_evidence=test_example.shared_evidence
)

# Get the prediction from the compiled module
compiled_pred = compiled_synthesis_module(
    thesis=test_example.thesis, 
    antithesis=test_example.antithesis, 
    shared_evidence=test_example.shared_evidence
)

# Let's see the outputs
print("--- Uncompiled Output ---")
print(uncompiled_pred.synthesized_hypothesis)
print("\n--- Compiled Output ---")
print(compiled_pred.synthesized_hypothesis)

# And let's score them with our metric
uncompiled_score = critic_pipeline_metric(test_example, uncompiled_pred)
compiled_score = critic_pipeline_metric(test_example, compiled_pred)

print(f"\nUncompiled Module Score: {uncompiled_score}")
print(f"Compiled Module Score: {compiled_score}")

We would expect to see a significant difference.

  • The uncompiled output might be simplistic, perhaps just averaging the two ideas (e.g., “Both supply and demand are important for the economy.”).
  • The compiled output, guided by its superior prompt, is much more likely to produce a sophisticated synthesis (e.g., “A new model suggesting that economic growth is a dynamic interplay where demand-side stimulus is effective in the short-run to utilize capacity, while long-run growth depends on supply-side investment to expand that capacity.”).

The scores from our critic_pipeline_metric would reflect this difference in quality.
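
A single example is anecdotal, of course. For a trustworthy comparison, score both modules over a held-out development set with DSPy's built-in evaluator. A minimal sketch, assuming a devset list of dspy.Example objects (with inputs marked via .with_inputs(...)) is available from earlier sections:

from dspy.evaluate import Evaluate

# Score a module across the whole dev set with our critic-based metric.
evaluator = Evaluate(
    devset=devset,
    metric=critic_pipeline_metric,
    num_threads=4,
    display_progress=True,
)

uncompiled_avg = evaluator(uncompiled_synthesis_module)
compiled_avg = evaluator(compiled_synthesis_module)
print("Uncompiled avg:", uncompiled_avg)
print("Compiled avg:", compiled_avg)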

3. Conclusion: The Power of Self-Optimization

This tutorial has demonstrated the core principle of building self-optimizing systems with DSPy. By moving from manual prompt engineering to programmatic optimization, we gain several key advantages:

  • Robustness: The optimized prompt is far more reliable across a wider range of inputs because it has been explicitly taught what a good output looks like.
  • Adaptability: If we change our underlying LLM, we don’t need to rewrite our prompts by hand. We simply re-run the compile() step, and DSPy finds a new optimal prompt for the new model (see the sketch after this list).
  • Principled Design: Our system’s performance is driven by a clearly defined metric (our CriticPipeline), making the optimization process transparent and aligned with our project’s core values.
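
To make the adaptability point concrete, here is a minimal sketch of re-targeting a new model: a configuration change followed by re-compilation. The model name is illustrative, and optimizer and trainset are assumed to still be in scope from the previous section.

# Point DSPy at a different model, then re-run the same optimization.
new_lm = dspy.LM("openai/gpt-4o-mini")  # illustrative model name
dspy.settings.configure(lm=new_lm)

recompiled_module = optimizer.compile(
    uncompiled_synthesis_module,
    trainset=trainset,
)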

This self-optimization loop—where the system’s own critics are used to improve its own generative components—is a foundational concept for building the next generation of powerful, reliable, and adaptive AI reasoning systems like CNS 2.0.