After the DSPy compiler has finished its work, we are left with a new, optimized `compiled_synthesis_module`. But what has actually changed? And does it perform any better? In this final section, we’ll inspect the results and run a comparison.
1. Inspecting the Generated Prompt
The core output of the `BootstrapFewShot` optimizer is a new, highly optimized prompt. We can inspect the prompt of our compiled module to see what it has learned.
```python
import dspy

# Let's assume 'lm' is our configured language model and
# 'compiled_synthesis_module' is the output from the previous step.
# lm.inspect_history(n=1) will show the last prompt sent to the LLM.
# To see the full prompt, we call the module and then inspect.

# We'll create a new test example for this.
test_example = dspy.Example(
    thesis="Economic growth is primarily driven by consumer spending (demand-side economics).",
    antithesis="Economic growth is primarily driven by production and investment (supply-side economics).",
    shared_evidence=(
        "Shared evidence includes government spending data, consumer confidence indices, "
        "records of tax cuts on corporations, and historical GDP growth rates."
    ),
)

# Run the compiled module on our test example
compiled_synthesis_module(
    thesis=test_example.thesis,
    antithesis=test_example.antithesis,
    shared_evidence=test_example.shared_evidence,
)

# Now inspect the last prompt sent to the language model
lm.inspect_history(n=1)
```
Naive Prompt vs. Optimized Prompt
A naive, hand-written prompt for our `ChainOfThought` module might look something like this:
Naive Prompt:

```text
Given the thesis, antithesis, and shared evidence, think step-by-step to synthesize a novel hypothesis that resolves the core contradiction.

---

Thesis: {thesis}
Antithesis: {antithesis}
Shared Evidence: {shared_evidence}

Synthesized Hypothesis:
```
However, after running `optimizer.compile()`, the prompt inside our `compiled_synthesis_module` will be far more sophisticated. The optimizer generated it automatically because this structure was found to maximize our `critic_pipeline_metric`. It will look something like this (a simplified representation):
Optimized Prompt (Generated by DSPy):

```text
Synthesizes a novel, higher-order hypothesis from two opposing narratives (a thesis and an antithesis) that are grounded in a shared set of evidence. The synthesis must reconcile the conflict and explain the same evidence.

---

Follow these steps:
- Analyze the core contradiction between the thesis and antithesis.
- Identify the key elements of the shared evidence that must be explained.
- Formulate a new, unifying theory that preserves the valid points of both narratives while resolving the main conflict.

---

Example 1:
Thesis: The continents are fixed in place…
Antithesis: The continents drift across the Earth’s surface…
Shared Evidence: …jigsaw-puzzle fit of continents…
Reasoning: The user wants a synthesis that reconciles fixed continents with drifting ones. The evidence points to plate tectonics. I will formulate a hypothesis that explains both the apparent stability and the underlying motion by introducing the concept of rigid plates.
Synthesized Hypothesis: A unifying theory of plate tectonics reconciles these views…

---

Example 2:
Thesis: Light is composed of particles…
Antithesis: Light is a wave…
Shared Evidence: …light travels in straight lines…but also exhibits diffraction…
Reasoning: The user needs to resolve the particle-wave conflict. The evidence supports both behaviors. I will propose a dual-nature model where light has properties of both, which is the concept of wave-particle duality.
Synthesized Hypothesis: A new model of wave-particle duality reconciles the conflict…

---

Current Task:
Thesis: {thesis}
Antithesis: {antithesis}
Shared Evidence: {shared_evidence}
Reasoning:
```
The optimized prompt is a much more powerful guide for the LLM. It includes explicit instructions, a chain-of-thought directive (`Reasoning:`), and, most importantly, few-shot examples that the optimizer automatically selected from our `trainset` because they helped produce high-quality outputs.
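Beyond reading the raw prompt, you can also inspect the bootstrapped demonstrations attached to the compiled module directly. The sketch below assumes the standard DSPy module introspection API (`named_predictors()` and the `demos` attribute on each predictor) and reuses `compiled_synthesis_module` from above; treat it as illustrative rather than canonical.

```python
# A minimal sketch: walk the compiled module's predictors and print the
# few-shot demonstrations the optimizer attached to each one.
for name, predictor in compiled_synthesis_module.named_predictors():
    print(f"Predictor: {name} ({len(predictor.demos)} bootstrapped demos)")
    for i, demo in enumerate(predictor.demos, start=1):
        # Each demo is a dspy.Example the optimizer kept because the
        # resulting output scored well under critic_pipeline_metric.
        print(f"  Demo {i}: {demo}")
```

This is a quick way to see which training examples actually made it into the prompt, and it is often the first thing to check when the compiled module behaves unexpectedly.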
2. Comparing Performance on a New Example
Now for the real test. Let’s run both our original, un-optimized module and our new, compiled module on the `test_example` we created earlier and see how they perform.
```python
# Get the prediction from the un-optimized module
uncompiled_pred = uncompiled_synthesis_module(
    thesis=test_example.thesis,
    antithesis=test_example.antithesis,
    shared_evidence=test_example.shared_evidence,
)

# Get the prediction from the compiled module
compiled_pred = compiled_synthesis_module(
    thesis=test_example.thesis,
    antithesis=test_example.antithesis,
    shared_evidence=test_example.shared_evidence,
)

# Let's see the outputs
print("--- Uncompiled Output ---")
print(uncompiled_pred.synthesized_hypothesis)
print("\n--- Compiled Output ---")
print(compiled_pred.synthesized_hypothesis)

# And let's score them with our metric
uncompiled_score = critic_pipeline_metric(test_example, uncompiled_pred)
compiled_score = critic_pipeline_metric(test_example, compiled_pred)

print(f"\nUncompiled Module Score: {uncompiled_score}")
print(f"Compiled Module Score: {compiled_score}")
```
We would expect to see a significant difference.
- The uncompiled output might be simplistic, perhaps just averaging the two ideas (e.g., “Both supply and demand are important for the economy.”).
- The compiled output, guided by its superior prompt, is much more likely to produce a sophisticated synthesis (e.g., “A new model suggesting that economic growth is a dynamic interplay where demand-side stimulus is effective in the short-run to utilize capacity, while long-run growth depends on supply-side investment to expand that capacity.”).
The scores from our `critic_pipeline_metric` would reflect this difference in quality.
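A single test example is only an anecdote. If a held-out development set is available, DSPy’s `Evaluate` utility can score both modules over many examples with the same metric. In the sketch below, `devset` is a hypothetical list of `dspy.Example` objects (each created with `.with_inputs("thesis", "antithesis", "shared_evidence")`), and the thread count is arbitrary.

```python
from dspy.evaluate import Evaluate

# Score both modules over a held-out devset with our existing metric.
evaluator = Evaluate(
    devset=devset,                  # hypothetical held-out examples
    metric=critic_pipeline_metric,
    num_threads=4,
    display_progress=True,
)

uncompiled_avg = evaluator(uncompiled_synthesis_module)
compiled_avg = evaluator(compiled_synthesis_module)

print(f"Uncompiled average: {uncompiled_avg}")
print(f"Compiled average:   {compiled_avg}")
```

Aggregate scores like these are a much fairer basis for claiming an improvement than a single hand-picked example.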
3. Conclusion: The Power of Self-Optimization
This tutorial has demonstrated the core principle of building self-optimizing systems with DSPy. By moving from manual prompt engineering to programmatic optimization, we gain several key advantages:
- Robustness: The optimized prompt is far more reliable across a wider range of inputs because it has been explicitly taught what a good output looks like.
- Adaptability: If we change our underlying LLM, we don’t need to rewrite our prompts by hand. We simply re-run the `compile()` step, and DSPy will find a new optimal prompt for the new model (see the sketch after this list).
- Principled Design: Our system’s performance is driven by a clearly defined metric (our `CriticPipeline`), making the optimization process transparent and aligned with our project’s core values.
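To make the adaptability point above concrete, here is a rough sketch of what re-running the compile step looks like when swapping in a different language model. The model identifier is illustrative, and `uncompiled_synthesis_module`, `critic_pipeline_metric`, and `trainset` are assumed to be the objects defined in the earlier steps of this tutorial.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Point DSPy at a different underlying model (identifier is illustrative).
new_lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=new_lm)

# Re-run the exact same optimization against the new model.
optimizer = BootstrapFewShot(metric=critic_pipeline_metric)
recompiled_module = optimizer.compile(
    uncompiled_synthesis_module,   # the original, un-optimized program
    trainset=trainset,             # same training examples as before
)
```

Nothing about the program, the metric, or the data changes; only the model does, and the optimizer rediscovers a prompt that works well for it.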
This self-optimization loop, in which the system’s own critics are used to improve its own generative components, is a foundational concept for building the next generation of powerful, reliable, and adaptive AI reasoning systems like CNS 2.0.