This is a great request. Based on the provided codebase, here is a detailed Mermaid diagram that illustrates the SIMBA (Stochastic Introspective Mini-Batch Ascent) optimizer as a monolithic entity, showing its core components, control flow, and connections to other parts of the system down to the function and variable level.
## Diagram Explanation
This diagram visualizes the `compile` method of the SIMBA teleprompter, which is its main entry point for optimizing a DSPy program (the `student`). The process is iterative and can be broken down into several key stages:
**Initialization:** The `compile` method starts by initializing its state, including a pool of `programs` (starting with the initial `student`), a `program_scores` tracker, and a parallel executor. A minimal sketch of this state follows.
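Here, as orientation, is a hypothetical Python sketch of that state; the variable names mirror the diagram, while the seed and thread count are assumptions rather than the verbatim `dspy` source:

```python
import random

import numpy as np
import dspy

# Illustrative sketch of SIMBA.compile's initial state (names follow the
# diagram; the seed and thread count are assumptions).
student = dspy.ChainOfThought("question -> answer")  # the program being optimized

programs = [student.deepcopy()]   # program pool, seeded with the initial student
program_scores = {0: []}          # program index -> scores from past mini-batches
rng, rng_np = random.Random(0), np.random.default_rng(0)
run_parallel = dspy.Parallel(num_threads=8)  # shared parallel executor
```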
**Main Optimization Loop:** The process iterates for `max_steps`; each step is one mini-batch ascent.

- **Step 1: Trajectory Sampling:** For each example in the current mini-batch, SIMBA samples multiple "trajectories." It prepares slightly different Language Models (LMs) using `prepare_models_for_resampling` and selects a program from its pool using `softmax_sample`, based on past performance. Each program is then wrapped using `wrap_program` so that its execution is scored by the provided `metric` (both helpers are sketched after this list).
- **Step 2: Parallel Execution:** The generated program-example pairs are executed in parallel using `dspy.Parallel`. This step generates a rich set of traces and scores.
- **Step 3: Introspection (Bucketing):** The results are grouped into "buckets," one for each unique training example. These buckets, containing both successful and unsuccessful trajectories, are sorted to identify the most informative examples for learning.
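To make Step 1 concrete, here is a self-contained sketch of the two `simba_utils` helpers named above. The names come from the codebase; the bodies are illustrative reconstructions, not the exact implementations:

```python
import random

import numpy as np


def softmax_sample(rng: random.Random, scores: list[float], temperature: float) -> int:
    """Pick an index with probability proportional to exp(score / temperature)."""
    logits = np.asarray(scores, dtype=float) / max(temperature, 1e-9)
    probs = np.exp(logits - logits.max())  # shift by max for numerical stability
    probs /= probs.sum()
    return rng.choices(range(len(scores)), weights=probs.tolist(), k=1)[0]


def wrap_program(program, metric):
    """Return a closure that runs `program` on an example and scores the result."""
    def wrapped_program(example):
        prediction = program(**example.inputs())  # unpack the dspy.Example inputs
        score = metric(example, prediction)
        # Assumed output schema: one dict per trajectory, as in the diagram.
        return {"program": program, "example": example,
                "prediction": prediction, "score": score}
    return wrapped_program
```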
- **Step 4: Candidate Generation:** SIMBA generates new candidate programs. It first selects a high-performing "source" program from its pool (again via `softmax_sample`). Then it randomly chooses a strategy (`append_a_demo` or `append_a_rule`) and applies it to an informative bucket, modifying the source program to create a new candidate (see the sketch after this list).
  - The `append_a_demo` strategy extracts a successful trace from the bucket and adds it as a few-shot example.
  - The `append_a_rule` strategy uses the `OfferFeedback` signature to compare a good and a bad trajectory, generating a natural-language "rule" (advice) that gets appended to a predictor's instructions.
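As a sketch of the simpler strategy, here is one possible shape for `append_a_demo`; the bucket's output schema (the `score` and `example` keys) carries over from the `wrap_program` sketch above and is an assumption:

```python
import dspy


def append_a_demo(program: dspy.Module, bucket: list[dict]) -> dspy.Module:
    """Illustrative sketch: promote the bucket's best trajectory to a few-shot demo."""
    candidate = program.deepcopy()
    best = max(bucket, key=lambda out: out["score"])  # highest-scoring trajectory
    for predictor in candidate.predictors():
        # Assumed demo construction: reuse the successful run's training
        # example as a few-shot demonstration.
        predictor.demos = list(predictor.demos) + [best["example"]]
    return candidate
```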
- **Steps 5-8: Evaluation & Selection:** The newly generated candidate programs are evaluated on the same mini-batch. The best-performing one is added to a list of `winning_programs`, and all new candidates and their scores are registered back into the main program pool, improving the diversity and quality of options for the next iteration (bookkeeping sketched below).
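The bookkeeping behind Steps 6-8 amounts to averaging each candidate's mini-batch scores, keeping the best candidate, and folding everything back into the pool. A hypothetical sketch, with `register_new_program` named after the diagram node:

```python
def register_new_program(program, scores, programs, program_scores):
    """Illustrative sketch of Step 8: fold a candidate back into the pool."""
    programs.append(program)
    program_scores[len(programs) - 1] = list(scores)


def select_winner(candidates, candidate_scores, winning_programs):
    """Illustrative sketch of Steps 6-7: average scores, keep the best candidate."""
    avg_scores = [sum(s) / len(s) for s in candidate_scores]          # Step 6
    best_idx = max(range(len(candidates)), key=lambda i: avg_scores[i])  # Step 7
    winning_programs.append(candidates[best_idx])
    return candidates[best_idx]
```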
**Final Evaluation:** After all optimization steps are complete, SIMBA performs a final, comprehensive evaluation of all "winning" programs on the entire training set to select and return the single best-performing program (sketched below).
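A minimal sketch of that final pass, reusing the assumed `wrap_program` helper from above; the call convention of the parallel executor is also an assumption:

```python
def final_evaluation(winning_programs, trainset, metric, run_parallel):
    """Illustrative sketch: score each winning program on the full trainset."""
    best_program, best_score = None, float("-inf")
    for program in winning_programs:
        wrapped = wrap_program(program, metric)  # closure from the earlier sketch
        # Assumed executor convention: maps wrapped(example) over all pairs.
        outputs = run_parallel([(wrapped, example) for example in trainset])
        avg_score = sum(out["score"] for out in outputs) / len(outputs)
        if avg_score > best_score:
            best_program, best_score = program, avg_score
    return best_program
```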
This entire process enables SIMBA to iteratively refine a DSPy program by exploring different prompt structures (demos and instructions) and using introspection to learn from its successes and failures on a mini-batch basis.
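Putting it all together, end-to-end usage looks roughly like the following. The constructor knobs echo the ones named in this walkthrough (`max_steps`, the mini-batch size); treat the exact values, the toy metric, and the toy trainset as placeholders:

```python
import dspy
from dspy.teleprompt import SIMBA


def metric(example, prediction, trace=None):
    """Toy metric: exact match on the answer field."""
    return float(example.answer == prediction.answer)


dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

student = dspy.ChainOfThought("question -> answer")
trainset = [  # real runs need at least `bsize` examples per mini-batch
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="What color is the sky?", answer="blue").with_inputs("question"),
]

simba = SIMBA(metric=metric, bsize=2, max_steps=4)
optimized = simba.compile(student, trainset=trainset)
```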
## Mermaid Diagram Code
```mermaid
flowchart TD
    subgraph SIMBA_Optimizer["SIMBA Optimizer"]
        SIMBA_compile["SIMBA.compile(student, trainset)"]
        subgraph Main_Loop_Scope["Main Optimization Loop"]
            state["State:
- programs: list
- program_scores: dict
- rng, rng_np
- run_parallel: dspy.Parallel"]
            SIMBA_compile --> state

            %% Main Loop
            main_loop{"for step in max_steps"}
            state --> main_loop

            %% Loop Steps
            step1_batch["Step 1: Get Batch & Sample Trajectories"]
            step2_exec["Step 2: Execute Trajectories"]
            step3_bucket["Step 3: Create & Sort Buckets"]
            step4_candidates["Step 4: Build New Candidates"]
            step5_eval["Step 5: Evaluate New Candidates"]
            step6_scores["Step 6: Compute Avg Scores"]
            step7_select["Step 7: Select Winning Program"]
            step8_register["Step 8: Register New Programs"]

            main_loop --> step1_batch
            step1_batch --> step2_exec
            step2_exec --> step3_bucket
            step3_bucket --> step4_candidates
            step4_candidates --> step5_eval
            step5_eval --> step6_scores
            step6_scores --> step7_select
            step7_select --> step8_register
            step8_register --> main_loop
        end

        %% Final Output
        final_eval["Final Evaluation on Full Trainset"]
        BestProgram["Best dspy.Module"]
        main_loop --o final_eval
        final_eval --> BestProgram
    end

    %% Step 1 Details: Trajectory Sampling
    subgraph Step1_Details[" "]
        direction LR
        style Step1_Details fill:none,stroke:none
        batch_data["batch: list[dspy.Example]"]
        step1_batch --> batch_data
        softmax_sample1["softmax_sample(progs, temperature_for_sampling)"]
        prepare_models_util["simba_utils.prepare_models_for_resampling"]
        wrap_program_util["simba_utils.wrap_program(program, metric)"]
        step1_batch --> softmax_sample1
        step1_batch --> prepare_models_util
        step1_batch --> wrap_program_util
    end

    %% Step 2 Details: Execution
    dspy_parallel["dspy.Parallel (run_parallel)"]
    step2_exec --> |"uses"| dspy_parallel
    wrapped_program["wrapped_program(example)"]
    wrap_program_util --> wrapped_program
    dspy_parallel --> |"calls"| wrapped_program
    outputs["outputs: list[dict]"]
    wrapped_program --> outputs
    step2_exec -- "generates" --> outputs

    %% Step 3 Details: Bucketing
    buckets["Buckets: list[list[dict]]"]
    outputs --> step3_bucket
    step3_bucket -- "sorts outputs into" --> buckets

    %% Step 4 Details: Candidate Generation
    subgraph Step4_Details[" "]
        direction LR
        style Step4_Details fill:none,stroke:none
        softmax_sample2["softmax_sample(progs, temperature_for_candidates)"]
        strategy_choice{"Choose Strategy"}
        step4_candidates --> |"uses bucket to inform"| softmax_sample2
        softmax_sample2 --> |"selects src_program"| step4_candidates
        step4_candidates --> strategy_choice
    end

    new_candidates["New Candidate Programs"]
    strategy_choice -- "append_a_demo" --> new_candidates
    strategy_choice -- "append_a_rule" --> new_candidates

    %% Step 5, 6, 7, 8 Details
    candidate_outputs["candidate_outputs"]
    step5_eval --> |"uses"| dspy_parallel
    dspy_parallel -- "evaluates new candidates to get" --> candidate_outputs
    candidate_outputs --> step6_scores
    winning_programs["winning_programs list"]
    step7_select --> |"adds to"| winning_programs
    register_new_program["register_new_program()"]
    step8_register --> register_new_program
    program_scores_db(("program_scores: dict"))
    register_new_program --> |"updates"| program_scores_db

    %% Strategy Details
    subgraph Strategies
        direction TB
        append_demo_util["simba_utils.append_a_demo"]
        append_rule_util["simba_utils.append_a_rule"]
        strategy_choice -- "or" --> append_demo_util
        strategy_choice -- "or" --> append_rule_util
        append_demo_util -- "Extracts best trace from bucket; modifies" --> Demos["predictor.demos"]
        append_rule_util -- "Compares good/bad trace from bucket; calls" --> OfferFeedback_Predict["dspy.Predict(OfferFeedback)"]

        subgraph OfferFeedback_Signature
            direction TB
            OfferFeedback["OfferFeedback(Signature)"]
            OfferFeedback_Inputs["Inputs:
- program_code
- modules_defn
- trajectories (worse/better)
- rewards (worse/better)"]
            OfferFeedback_Outputs["Outputs:
- discussion
- module_advice: dict[str, str]"]
            OfferFeedback --> OfferFeedback_Inputs
            OfferFeedback --> OfferFeedback_Outputs
        end

        OfferFeedback_Predict -- "generates" --> Advice["advice: dict"]
        Advice -- "modifies" --> Instructions["predictor.instructions"]

        classDef default fill:#f9f,stroke:#333,stroke-width:2px,color:#000
        classDef data fill:#ccf,stroke:#333,stroke-width:2px,color:#000
    end
```
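Finally, for readers who prefer code to diagram nodes, here is the rough shape of `OfferFeedback` as a `dspy.Signature`. The inputs and outputs listed in the diagram are kept; the exact field names for the worse/better pairs are assumptions:

```python
import dspy


class OfferFeedback(dspy.Signature):
    """Compare a worse and a better trajectory and advise each module.

    Sketch reconstructed from the diagram; the worse/better field names
    are assumptions, not the verbatim signature.
    """

    program_code: str = dspy.InputField(desc="source code of the program")
    modules_defn: str = dspy.InputField(desc="definitions of the program's modules")
    worse_trajectory: str = dspy.InputField()
    better_trajectory: str = dspy.InputField()
    worse_reward: float = dspy.InputField()
    better_reward: float = dspy.InputField()
    discussion: str = dspy.OutputField(desc="why one trajectory did better")
    module_advice: dict[str, str] = dspy.OutputField(desc="advice keyed by module name")
```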