This is a great request. Based on the provided codebase, here is a detailed Mermaid diagram that illustrates the SIMBA (Stochastic Introspective Mini-Batch Ascent) optimizer as a monolithic entity, showing its core components, control flow, and connections to other parts of the system down to the function and variable level.
## Diagram Explanation
This diagram visualizes the `compile` method of the SIMBA teleprompter, which is its main entry point for optimizing a DSPy program (the `student`). The process is iterative and can be broken down into several key stages:
**Initialization:** The `compile` method starts by initializing its state, including a pool of `programs` (starting with the initial `student`), a `program_scores` tracker, and a parallel executor. A minimal sketch of this state follows.
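Here, as orientation, is a hypothetical Python sketch of that state; the variable names mirror the diagram, while the seed and thread count are assumptions rather than the verbatim `dspy` source:

```python
import random

import numpy as np
import dspy

# Illustrative sketch of SIMBA.compile's initial state (names follow the
# diagram; the seed and thread count are assumptions).
student = dspy.ChainOfThought("question -> answer")  # the program being optimized

programs = [student.deepcopy()]   # program pool, seeded with the initial student
program_scores = {0: []}          # program index -> scores from past mini-batches
rng, rng_np = random.Random(0), np.random.default_rng(0)
run_parallel = dspy.Parallel(num_threads=8)  # shared parallel executor
```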
**Main Optimization Loop:** The process iterates for `max_steps`; each step is one mini-batch ascent.

- **Step 1: Trajectory Sampling:** For each example in the current mini-batch, SIMBA samples multiple "trajectories." It prepares slightly different Language Models (LMs) using `prepare_models_for_resampling` and selects a program from its pool using `softmax_sample`, based on past performance. Each program is then wrapped using `wrap_program` so that its execution is scored by the provided `metric` (both helpers are sketched after this list).
- **Step 2: Parallel Execution:** The generated program-example pairs are executed in parallel using `dspy.Parallel`. This step generates a rich set of traces and scores.
- **Step 3: Introspection (Bucketing):** The results are grouped into "buckets," one for each unique training example. These buckets, containing both successful and unsuccessful trajectories, are sorted to identify the most informative examples for learning.
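To make Step 1 concrete, here is a self-contained sketch of the two `simba_utils` helpers named above. The names come from the codebase; the bodies are illustrative reconstructions, not the exact implementations:

```python
import random

import numpy as np


def softmax_sample(rng: random.Random, scores: list[float], temperature: float) -> int:
    """Pick an index with probability proportional to exp(score / temperature)."""
    logits = np.asarray(scores, dtype=float) / max(temperature, 1e-9)
    probs = np.exp(logits - logits.max())  # shift by max for numerical stability
    probs /= probs.sum()
    return rng.choices(range(len(scores)), weights=probs.tolist(), k=1)[0]


def wrap_program(program, metric):
    """Return a closure that runs `program` on an example and scores the result."""
    def wrapped_program(example):
        prediction = program(**example.inputs())  # unpack the dspy.Example inputs
        score = metric(example, prediction)
        # Assumed output schema: one dict per trajectory, as in the diagram.
        return {"program": program, "example": example,
                "prediction": prediction, "score": score}
    return wrapped_program
```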
- **Step 4: Candidate Generation:** SIMBA generates new candidate programs. It first selects a high-performing "source" program from its pool (again via `softmax_sample`). Then it randomly chooses a strategy (`append_a_demo` or `append_a_rule`) and applies it to an informative bucket, modifying the source program to create a new candidate (see the sketch after this list).
  - The `append_a_demo` strategy extracts a successful trace from the bucket and adds it as a few-shot example.
  - The `append_a_rule` strategy uses the `OfferFeedback` signature to compare a good and a bad trajectory, generating a natural-language "rule" (advice) that gets appended to a predictor's instructions.
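As a sketch of the simpler strategy, here is one possible shape for `append_a_demo`; the bucket's output schema (the `score` and `example` keys) carries over from the `wrap_program` sketch above and is an assumption:

```python
import dspy


def append_a_demo(program: dspy.Module, bucket: list[dict]) -> dspy.Module:
    """Illustrative sketch: promote the bucket's best trajectory to a few-shot demo."""
    candidate = program.deepcopy()
    best = max(bucket, key=lambda out: out["score"])  # highest-scoring trajectory
    for predictor in candidate.predictors():
        # Assumed demo construction: reuse the successful run's training
        # example as a few-shot demonstration.
        predictor.demos = list(predictor.demos) + [best["example"]]
    return candidate
```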
- **Steps 5-8: Evaluation & Selection:** The newly generated candidate programs are evaluated on the same mini-batch. The best-performing one is added to a list of `winning_programs`, and all new candidates and their scores are registered back into the main program pool, improving the diversity and quality of options for the next iteration (bookkeeping sketched below).
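The bookkeeping behind Steps 6-8 amounts to averaging each candidate's mini-batch scores, keeping the best candidate, and folding everything back into the pool. A hypothetical sketch, with `register_new_program` named after the diagram node:

```python
def register_new_program(program, scores, programs, program_scores):
    """Illustrative sketch of Step 8: fold a candidate back into the pool."""
    programs.append(program)
    program_scores[len(programs) - 1] = list(scores)


def select_winner(candidates, candidate_scores, winning_programs):
    """Illustrative sketch of Steps 6-7: average scores, keep the best candidate."""
    avg_scores = [sum(s) / len(s) for s in candidate_scores]          # Step 6
    best_idx = max(range(len(candidates)), key=lambda i: avg_scores[i])  # Step 7
    winning_programs.append(candidates[best_idx])
    return candidates[best_idx]
```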
**Final Evaluation:** After all optimization steps are complete, SIMBA performs a final, comprehensive evaluation of all "winning" programs on the entire training set to select and return the single best-performing program (sketched below).
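A minimal sketch of that final pass, reusing the assumed `wrap_program` helper from above; the call convention of the parallel executor is also an assumption:

```python
def final_evaluation(winning_programs, trainset, metric, run_parallel):
    """Illustrative sketch: score each winning program on the full trainset."""
    best_program, best_score = None, float("-inf")
    for program in winning_programs:
        wrapped = wrap_program(program, metric)  # closure from the earlier sketch
        # Assumed executor convention: maps wrapped(example) over all pairs.
        outputs = run_parallel([(wrapped, example) for example in trainset])
        avg_score = sum(out["score"] for out in outputs) / len(outputs)
        if avg_score > best_score:
            best_program, best_score = program, avg_score
    return best_program
```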
This entire process enables SIMBA to iteratively refine a DSPy program by exploring different prompt structures (demos and instructions) and using introspection to learn from its successes and failures on a mini-batch basis.
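Putting it all together, end-to-end usage looks roughly like the following. The constructor knobs echo the ones named in this walkthrough (`max_steps`, the mini-batch size); treat the exact values, the toy metric, and the toy trainset as placeholders:

```python
import dspy
from dspy.teleprompt import SIMBA


def metric(example, prediction, trace=None):
    """Toy metric: exact match on the answer field."""
    return float(example.answer == prediction.answer)


dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

student = dspy.ChainOfThought("question -> answer")
trainset = [  # real runs need at least `bsize` examples per mini-batch
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="What color is the sky?", answer="blue").with_inputs("question"),
]

simba = SIMBA(metric=metric, bsize=2, max_steps=4)
optimized = simba.compile(student, trainset=trainset)
```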
## Mermaid Diagram Code
```mermaid
flowchart TD
    subgraph SIMBA_Optimizer["SIMBA Optimizer"]
        SIMBA_compile["SIMBA.compile(student, trainset)"]
        subgraph Main_Loop_Scope["Main Optimization Loop"]
            state["State:
- programs: list
- program_scores: dict
- rng, rng_np
- run_parallel: dspy.Parallel"]
            SIMBA_compile --> state

            %% Main Loop
            main_loop{"for step in max_steps"}
            state --> main_loop

            %% Loop Steps
            step1_batch["Step 1: Get Batch & Sample Trajectories"]
            step2_exec["Step 2: Execute Trajectories"]
            step3_bucket["Step 3: Create & Sort Buckets"]
            step4_candidates["Step 4: Build New Candidates"]
            step5_eval["Step 5: Evaluate New Candidates"]
            step6_scores["Step 6: Compute Avg Scores"]
            step7_select["Step 7: Select Winning Program"]
            step8_register["Step 8: Register New Programs"]

            main_loop --> step1_batch
            step1_batch --> step2_exec
            step2_exec --> step3_bucket
            step3_bucket --> step4_candidates
            step4_candidates --> step5_eval
            step5_eval --> step6_scores
            step6_scores --> step7_select
            step7_select --> step8_register
            step8_register --> main_loop
        end

        %% Final Output
        final_eval["Final Evaluation on Full Trainset"]
        BestProgram["Best dspy.Module"]
        main_loop --o final_eval
        final_eval --> BestProgram
    end

    %% Step 1 Details: Trajectory Sampling
    subgraph Step1_Details[" "]
        direction LR
        style Step1_Details fill:none,stroke:none
        batch_data["batch: list[dspy.Example]"]
        step1_batch --> batch_data
        softmax_sample1["softmax_sample(progs, temperature_for_sampling)"]
        prepare_models_util["simba_utils.prepare_models_for_resampling"]
        wrap_program_util["simba_utils.wrap_program(program, metric)"]
        step1_batch --> softmax_sample1
        step1_batch --> prepare_models_util
        step1_batch --> wrap_program_util
    end

    %% Step 2 Details: Execution
    dspy_parallel["dspy.Parallel (run_parallel)"]
    step2_exec --> |"uses"| dspy_parallel
    wrapped_program["wrapped_program(example)"]
    wrap_program_util --> wrapped_program
    dspy_parallel --> |"calls"| wrapped_program
    outputs["outputs: list[dict]"]
    wrapped_program --> outputs
    step2_exec -- "generates" --> outputs

    %% Step 3 Details: Bucketing
    buckets["Buckets: list[list[dict]]"]
    outputs --> step3_bucket
    step3_bucket -- "sorts outputs into" --> buckets

    %% Step 4 Details: Candidate Generation
    subgraph Step4_Details[" "]
        direction LR
        style Step4_Details fill:none,stroke:none
        softmax_sample2["softmax_sample(progs, temperature_for_candidates)"]
        strategy_choice{"Choose Strategy"}
        step4_candidates --> |"uses bucket to inform"| softmax_sample2
        softmax_sample2 --> |"selects src_program"| step4_candidates
        step4_candidates --> strategy_choice
    end

    new_candidates["New Candidate Programs"]
    strategy_choice -- "append_a_demo" --> new_candidates
    strategy_choice -- "append_a_rule" --> new_candidates

    %% Step 5, 6, 7, 8 Details
    candidate_outputs["candidate_outputs"]
    step5_eval --> |"uses"| dspy_parallel
    dspy_parallel -- "evaluates new candidates to get" --> candidate_outputs
    candidate_outputs --> step6_scores
    winning_programs["winning_programs list"]
    step7_select --> |"adds to"| winning_programs
    register_new_program["register_new_program()"]
    step8_register --> register_new_program
    program_scores_db(("program_scores: dict"))
    register_new_program --> |"updates"| program_scores_db

    %% Strategy Details
    subgraph Strategies
        direction TB
        append_demo_util["simba_utils.append_a_demo"]
        append_rule_util["simba_utils.append_a_rule"]
        strategy_choice -- "or" --> append_demo_util
        strategy_choice -- "or" --> append_rule_util
        append_demo_util -- "Extracts best trace from bucket; modifies" --> Demos["predictor.demos"]
        append_rule_util -- "Compares good/bad trace from bucket; calls" --> OfferFeedback_Predict["dspy.Predict(OfferFeedback)"]

        subgraph OfferFeedback_Signature
            direction TB
            OfferFeedback["OfferFeedback(Signature)"]
            OfferFeedback_Inputs["Inputs:
- program_code
- modules_defn
- trajectories (worse/better)
- rewards (worse/better)"]
            OfferFeedback_Outputs["Outputs:
- discussion
- module_advice: dict[str, str]"]
            OfferFeedback --> OfferFeedback_Inputs
            OfferFeedback --> OfferFeedback_Outputs
        end

        OfferFeedback_Predict -- "generates" --> Advice["advice: dict"]
        Advice -- "modifies" --> Instructions["predictor.instructions"]

        classDef default fill:#f9f,stroke:#333,stroke-width:2px,color:#000
        classDef data fill:#ccf,stroke:#333,stroke-width:2px,color:#000
    end
```
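Finally, for readers who prefer code to diagram nodes, here is the rough shape of `OfferFeedback` as a `dspy.Signature`. The inputs and outputs listed in the diagram are kept; the exact field names for the worse/better pairs are assumptions:

```python
import dspy


class OfferFeedback(dspy.Signature):
    """Compare a worse and a better trajectory and advise each module.

    Sketch reconstructed from the diagram; the worse/better field names
    are assumptions, not the verbatim signature.
    """

    program_code: str = dspy.InputField(desc="source code of the program")
    modules_defn: str = dspy.InputField(desc="definitions of the program's modules")
    worse_trajectory: str = dspy.InputField()
    better_trajectory: str = dspy.InputField()
    worse_reward: float = dspy.InputField()
    better_reward: float = dspy.InputField()
    discussion: str = dspy.OutputField(desc="why one trajectory did better")
    module_advice: dict[str, str] = dspy.OutputField(desc="advice keyed by module name")
```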