Here is a high-level technical plan for adapting the SIMBA
teleprompter to the dspex
environment, leveraging its innovative variables feature.
1. Executive Summary
This plan outlines the reimplementation of dspy
’s SIMBA
(Stochastic Introspective Mini-Batch Ascent) optimizer within the Elixir-centric dspex
framework. The original SIMBA
is a Python-native, in-memory algorithm that evolves a population of dspy.Module
programs by introspecting their performance and generating new few-shot examples and natural language rules.
The adaptation will transform SIMBA
into a distributed, persistent, and language-agnostic optimization process orchestrated by Elixir. This is made possible by dspex
’s core innovation: a cross-language state management system built on a unified gRPC bridge, referred to as the “variables” feature. Instead of directly manipulating Python objects in memory, the new SIMBA
orchestrator, written in Elixir, will manage the optimizable parameters of a dspy.Module
(its instructions and few-shot demos) as variables within the dspex
SessionStore
. The Python dspy
module becomes a stateless execution engine that dynamically reconfigures itself based on these variables, which are updated by the Elixir optimization loop.
2. Core Architectural Shift: From In-Memory to State-Driven
The fundamental change is moving the optimization “brain” from Python to Elixir and representing the program’s state declaratively.
Original
dspy
SIMBA
:- Orchestration: A Python loop within a single process.
- State Management: A list of
dspy.Module
objects held in memory. - Manipulation: Directly modifies Python objects (e.g.,
predictor.demos.append(...)
,predictor.signature.instructions += ...
). - Lifecycle: The optimization state is ephemeral and lost when the process ends.
Adapted
dspex
SIMBA
:- Orchestration: An Elixir
GenServer
process (DSPex.Optimizers.SIMBA
) that manages the optimization loop. - State Management: The set of candidate programs is stored as versioned “Configuration Spaces” within the
Snakepit.Bridge.SessionStore
. A program’s instructions and demos are treated as distinct, remotely-managed variables. - Manipulation: The Elixir orchestrator proposes new values for the
instructions
anddemos
variables. It applies these changes atomically via theDSPex.CognitiveConfiguration.apply/2
function, which triggers updates through the gRPC bridge. - Lifecycle: The optimization state is persistent, fault-tolerant, and can be inspected or resumed at any time.
- Orchestration: An Elixir
This new architecture leverages the strengths of each environment: Elixir’s BEAM for robust, concurrent orchestration, and Python’s dspy
for high-quality LLM interaction and execution.
3. Component Breakdown
The implementation will be split between Elixir (orchestration) and Python (execution), communicating via the existing gRPC bridge.
3.1. Elixir Components (dspex
& snakepit
)
DSPex.Optimizers.SIMBA
Module:- The primary orchestrator, implemented as a
GenServer
. - It will contain the main
compile/3
function, which executes the optimization loop over multiple steps. - Manages the pool of candidate programs by interacting with the
DSPex.CognitiveConfiguration
layer.
- The primary orchestrator, implemented as a
DSPex.CognitiveConfiguration
for DSPy Modules:- A high-level Elixir struct representing the optimizable state of a
dspy.Module
. - It will define two primary variables for each predictor in the student program:
instructions
: A:string
type variable.demos
: A:list
ofDSPex.Example
structs.
- This configuration will be registered in the
SessionStore
for each candidate program.
- A high-level Elixir struct representing the optimizable state of a
Elixir-based Strategy Implementations:
- The
append_a_demo
andappend_a_rule
strategies fromsimba_utils.py
will be re-implemented in Elixir. append_a_demo
: Takes a successful trajectory (received as a map from Python) and constructs a newDSPex.Example
. It then proposes a new configuration by appending this example to thedemos
variable.append_a_rule
: This requires an LLM call. The Elixir orchestrator will use the bridge to execute adspy.Predict
module on the Python side using theOfferFeedback
signature. It will pass the good/bad trajectories as arguments and receive the generated “advice” string back. This string is then appended to theinstructions
variable to form a new candidate configuration.
- The
3.2. Python Components (snakepit_bridge
)
VariableAwareMixin
fordspy.Module
:- The student
dspy.Module
being optimized must be enhanced with this mixin, which is a core part of thedspex
bridge design. - This mixin connects the module to the
SessionContext
, allowing it to read variables managed by Elixir.
- The student
sync_variables()
Hook:- Before each
forward()
call, theVariableAware
module will callsync_variables()
. - This method fetches the latest values for
instructions
anddemos
from theSessionContext
cache (which is kept up-to-date by the gRPC stream). - It then dynamically updates the module’s internal state (e.g.,
self.predictor.signature.instructions = ...
,self.predictor.demos = ...
) before execution. The Python module is thus a “dumb” executor, remotely configured by Elixir on each call.
- Before each
4. The Adapted SIMBA Optimization Loop (in Elixir)
The main loop in DSPex.Optimizers.SIMBA.compile/3
will proceed as follows for each step:
- Get Batch: A mini-batch is sampled from the
trainset
. - Sample Trajectories (The “Read” Phase):
- The orchestrator selects a set of candidate programs (configurations) from the
SessionStore
. - It uses the
dspex
parallel execution utilities to run the Pythondspy.Module
for each example in the batch against each selected program configuration. This involves passing both the example inputs and the configuration (instructions/demos) to the Python side. - The Python module executes and returns a scored trajectory (
prediction
,trace
,score
). These results are collected in Elixir.
- The orchestrator selects a set of candidate programs (configurations) from the
- Introspection (The “Analyze” Phase):
- The Elixir orchestrator receives the “buckets” of scored trajectories.
- It implements the same logic as the original
SIMBA
to sort buckets by score gap (max-min, max-avg) to identify the most informative training instances.
- Propose New Candidates (The “Write” Phase):
- The orchestrator selects a high-performing program configuration as a “source” for evolution.
- It randomly chooses a strategy (
append_a_demo
orappend_a_rule
). - It executes the chosen strategy (as described in 3.1) to generate a new proposed configuration (a map of
{instructions: "...", demos: [...]}
). - This is repeated to generate
num_candidates
new program configurations.
- Evaluate & Evolve:
- The new candidate configurations are evaluated on the same mini-batch.
- The best-performing configurations are registered in the
SessionStore
and added to the pool of programs for the next iteration.
This flow is illustrated below:
5. Benefits of the dspex
Adaptation
- Persistence & Fault Tolerance: The optimization state is managed by
SessionStore
(backed by ETS), making the process resilient to crashes and resumable. - Scalability: The most expensive stepâtrajectory samplingâis highly parallel and can leverage the full power of the BEAM across multiple cores and nodes.
- Introspection & Control: The entire optimization process is visible and controllable from the Elixir side, enabling better monitoring, debugging, and integration with other Elixir-native tools.
- Language Agnostic Orchestration: While the student is a
dspy.Module
, the Elixir orchestrator could theoretically optimize a module written in any language that implements thedspex
bridge protocol. - Decoupling: The optimization logic is cleanly separated from the execution logic, adhering to
dspex
’s architectural principles.