After the DSPy compiler has finished its work, we are left with a new, optimized `compiled_synthesis_module`. But what has actually changed? And does it perform any better? In this final section, we’ll inspect the results and run a comparison.
1. Inspecting the Generated Prompt
The core output of the `BootstrapFewShot` optimizer is a new, highly optimized prompt. We can inspect the prompt of our compiled module to see what it has learned.
```python
import dspy

# Let's assume 'lm' is our configured language model and
# 'compiled_synthesis_module' is the output from the previous step.
# lm.inspect_history(n=1) will show the last prompt sent to the LLM.
# To see the full prompt, we call the module and then inspect.

# We'll create a new test example for this.
test_example = dspy.Example(
    thesis="Economic growth is primarily driven by consumer spending (demand-side economics).",
    antithesis="Economic growth is primarily driven by production and investment (supply-side economics).",
    shared_evidence=(
        "Shared evidence includes government spending data, consumer confidence indices, "
        "records of tax cuts on corporations, and historical GDP growth rates."
    ),
)

# Run the compiled module on our test example
compiled_synthesis_module(
    thesis=test_example.thesis,
    antithesis=test_example.antithesis,
    shared_evidence=test_example.shared_evidence,
)

# Now inspect the last prompt sent to the language model
lm.inspect_history(n=1)
```
Naive Prompt vs. Optimized Prompt
A naive, hand-written prompt for our `ChainOfThought` module might look something like this:
Naive Prompt:

```text
Given the thesis, antithesis, and shared evidence, think step-by-step to synthesize a novel hypothesis that resolves the core contradiction.

---

Thesis: {thesis}
Antithesis: {antithesis}
Shared Evidence: {shared_evidence}

Synthesized Hypothesis:
```
However, after running `optimizer.compile()`, the prompt inside our `compiled_synthesis_module` will be far more sophisticated. The optimizer generated it automatically because this structure was found to maximize our `critic_pipeline_metric`. It will look something like this (a simplified representation):
Optimized Prompt (Generated by DSPy):

```text
Synthesizes a novel, higher-order hypothesis from two opposing narratives (a thesis and an antithesis) that are grounded in a shared set of evidence. The synthesis must reconcile the conflict and explain the same evidence.

---

Follow these steps:
- Analyze the core contradiction between the thesis and antithesis.
- Identify the key elements of the shared evidence that must be explained.
- Formulate a new, unifying theory that preserves the valid points of both narratives while resolving the main conflict.

---

Example 1:
Thesis: The continents are fixed in place…
Antithesis: The continents drift across the Earth’s surface…
Shared Evidence: …jigsaw-puzzle fit of continents…
Reasoning: The user wants a synthesis that reconciles fixed continents with drifting ones. The evidence points to plate tectonics. I will formulate a hypothesis that explains both the apparent stability and the underlying motion by introducing the concept of rigid plates.
Synthesized Hypothesis: A unifying theory of plate tectonics reconciles these views…

---

Example 2:
Thesis: Light is composed of particles…
Antithesis: Light is a wave…
Shared Evidence: …light travels in straight lines…but also exhibits diffraction…
Reasoning: The user needs to resolve the particle-wave conflict. The evidence supports both behaviors. I will propose a dual-nature model where light has properties of both, which is the concept of wave-particle duality.
Synthesized Hypothesis: A new model of wave-particle duality reconciles the conflict…

---

Current Task:
Thesis: {thesis}
Antithesis: {antithesis}
Shared Evidence: {shared_evidence}
Reasoning:
```
The optimized prompt is a much more powerful guide for the LLM. It includes explicit instructions, a chain-of-thought directive (`Reasoning:`), and, most importantly, few-shot examples that the optimizer automatically selected from our `trainset` because they helped produce high-quality outputs.
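Beyond reading the raw prompt, you can also inspect the bootstrapped demonstrations attached to the compiled module directly. The sketch below assumes the standard DSPy module introspection API (`named_predictors()` and the `demos` attribute on each predictor) and reuses `compiled_synthesis_module` from above; treat it as illustrative rather than canonical.

```python
# A minimal sketch: walk the compiled module's predictors and print the
# few-shot demonstrations the optimizer attached to each one.
for name, predictor in compiled_synthesis_module.named_predictors():
    print(f"Predictor: {name} ({len(predictor.demos)} bootstrapped demos)")
    for i, demo in enumerate(predictor.demos, start=1):
        # Each demo is a dspy.Example the optimizer kept because the
        # resulting output scored well under critic_pipeline_metric.
        print(f"  Demo {i}: {demo}")
```

This is a quick way to see which training examples actually made it into the prompt, and it is often the first thing to check when the compiled module behaves unexpectedly.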
2. Comparing Performance on a New Example
Now for the real test. Let’s run both our original, un-optimized module and our new, compiled module on the `test_example` we created earlier and see how they perform.
```python
# Get the prediction from the un-optimized module
uncompiled_pred = uncompiled_synthesis_module(
    thesis=test_example.thesis,
    antithesis=test_example.antithesis,
    shared_evidence=test_example.shared_evidence,
)

# Get the prediction from the compiled module
compiled_pred = compiled_synthesis_module(
    thesis=test_example.thesis,
    antithesis=test_example.antithesis,
    shared_evidence=test_example.shared_evidence,
)

# Let's see the outputs
print("--- Uncompiled Output ---")
print(uncompiled_pred.synthesized_hypothesis)
print("\n--- Compiled Output ---")
print(compiled_pred.synthesized_hypothesis)

# And let's score them with our metric
uncompiled_score = critic_pipeline_metric(test_example, uncompiled_pred)
compiled_score = critic_pipeline_metric(test_example, compiled_pred)

print(f"\nUncompiled Module Score: {uncompiled_score}")
print(f"Compiled Module Score: {compiled_score}")
```
We would expect to see a significant difference.
- The uncompiled output might be simplistic, perhaps just averaging the two ideas (e.g., “Both supply and demand are important for the economy.”).
- The compiled output, guided by its superior prompt, is much more likely to produce a sophisticated synthesis (e.g., “A new model suggesting that economic growth is a dynamic interplay where demand-side stimulus is effective in the short-run to utilize capacity, while long-run growth depends on supply-side investment to expand that capacity.”).
The scores from our `critic_pipeline_metric` would reflect this difference in quality.
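A single test example is only an anecdote. If a held-out development set is available, DSPy’s `Evaluate` utility can score both modules over many examples with the same metric. In the sketch below, `devset` is a hypothetical list of `dspy.Example` objects (each created with `.with_inputs("thesis", "antithesis", "shared_evidence")`), and the thread count is arbitrary.

```python
from dspy.evaluate import Evaluate

# Score both modules over a held-out devset with our existing metric.
evaluator = Evaluate(
    devset=devset,                  # hypothetical held-out examples
    metric=critic_pipeline_metric,
    num_threads=4,
    display_progress=True,
)

uncompiled_avg = evaluator(uncompiled_synthesis_module)
compiled_avg = evaluator(compiled_synthesis_module)

print(f"Uncompiled average: {uncompiled_avg}")
print(f"Compiled average:   {compiled_avg}")
```

Aggregate scores like these are a much fairer basis for claiming an improvement than a single hand-picked example.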
3. Conclusion: The Power of Self-Optimization
This tutorial has demonstrated the core principle of building self-optimizing systems with DSPy. By moving from manual prompt engineering to programmatic optimization, we gain several key advantages:
- Robustness: The optimized prompt is far more reliable across a wider range of inputs because it has been explicitly taught what a good output looks like.
- Adaptability: If we change our underlying LLM, we don’t need to rewrite our prompts by hand. We simply re-run the `compile()` step, and DSPy will find a new optimal prompt for the new model (see the sketch after this list).
- Principled Design: Our system’s performance is driven by a clearly defined metric (our `CriticPipeline`), making the optimization process transparent and aligned with our project’s core values.
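To make the adaptability point above concrete, here is a rough sketch of what re-running the compile step looks like when swapping in a different language model. The model identifier is illustrative, and `uncompiled_synthesis_module`, `critic_pipeline_metric`, and `trainset` are assumed to be the objects defined in the earlier steps of this tutorial.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Point DSPy at a different underlying model (identifier is illustrative).
new_lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=new_lm)

# Re-run the exact same optimization against the new model.
optimizer = BootstrapFewShot(metric=critic_pipeline_metric)
recompiled_module = optimizer.compile(
    uncompiled_synthesis_module,   # the original, un-optimized program
    trainset=trainset,             # same training examples as before
)
```

Nothing about the program, the metric, or the data changes; only the model does, and the optimizer rediscovers a prompt that works well for it.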
This self-optimization loop, in which the system’s own critics are used to improve its own generative components, is a foundational concept for building the next generation of powerful, reliable, and adaptive AI reasoning systems like CNS 2.0.