Of course. Here is a detailed technical document for the fifth proposed feature enhancement: “A Dedicated Foundation.Bridge.Python.Optuna
Module.”
Technical Specification: Dedicated Optuna
Bridge Module
Document Version: 1.0 Author: AI Assistant Status: PROPOSED
1. Overview
This document specifies the design of a new, high-level module, Foundation.Bridge.Python.Optuna
, built on top of the existing Python bridge infrastructure. The current bridge provides generic functions like execute/2
and call_function/3
, which are powerful but require the Elixir caller to construct raw Python code and manage the optimization state machine manually.
The goal of this enhancement is to abstract away the specific complexities of interacting with the optuna
library. It will provide a clean, stateful, and idiomatic Elixir API for managing a Bayesian Optimization study, dramatically simplifying the implementation of the DSPEx.MIPROv2
teleprompter.
Instead of MIPROv2
writing Python code strings, it will call functions like Optuna.create_study/1
, Optuna.ask/1
, and Optuna.tell/3
, making its code cleaner, more robust, and completely decoupled from the Python implementation details.
2. Problem Statement & Use Case
The DSPEx.MIPROv2
optimizer implements a complex Bayesian Optimization loop. The core logic of this loop involves a tight, stateful interaction with the optuna
library:
- Initialization: Create an
optuna
“study” with a specific direction (maximize
) and a sampler (TPESampler
). This study object must be preserved for the entire duration of the optimization run. Ask
Step: In each iteration, ask theoptuna
study for the next set of hyperparameters to try.optuna
uses its internal models to suggest the most promising parameters.Evaluate
Step:DSPEx
runs its evaluation using the suggested parameters and gets a score.Tell
Step: Report the score for the completed trial back to theoptuna
study. The study uses this new information to update its internal models for the nextask
step.
Implementing this logic using the generic Python.execute/2
function would be cumbersome and error-prone. The MIPROv2
GenServer
would have to:
- Maintain a reference to the Python
study
object across multiple IPC calls, which is very difficult. - Construct raw Python code strings for each step (e.g.,
code = "study.ask()"
). - Manually serialize and deserialize complex data structures (like the
trial
object) between Elixir and Python.
This new module will encapsulate all of this complexity.
3. Proposed API and Architecture
We will create a new Elixir module, Foundation.Bridge.Python.Optuna
, which will be a GenServer
. Each GenServer
process will manage a single optuna
study object within a dedicated Python worker process.
3.1. Public API (Foundation.Bridge.Python.Optuna
)
start_link(opts)
: Starts a new Optuna
bridge GenServer
. Each GenServer
represents one optimization “study”.
@spec start_link(opts :: keyword()) :: GenServer.on_start()
- Options:
:name
(optional): A name to register the study process.:study_config
: A map defining theoptuna
study, e.g.,%{direction: "maximize", sampler: "tpe"}
.:search_space
: A map defining the hyperparameter search space, e.g.,%{learning_rate: [type: "float", low: 0.001, high: 0.1], dropout: [type: "categorical", choices: [0.1, 0.2, 0.3]]}
.
ask(pid, n_trials \\ 1)
: Asks the study for the next set(s) of parameters.
@spec ask(pid :: pid(), n_trials :: pos_integer()) :: {:ok, list(map())} | {:error, Error.t()}
- Returns
{:ok, [trial1, trial2, ...]}
where eachtrial
is a map like%{trial_id: 123, params: %{learning_rate: 0.05}}
.
tell(pid, trial_id, value)
: Tells the study the result of a trial.
@spec tell(pid :: pid(), trial_id :: integer(), value :: number()) :: :ok | {:error, Error.t()}
best_trial(pid)
: Gets the best trial found so far.
@spec best_trial(pid :: pid()) :: {:ok, map()} | {:error, Error.t()}
3.2. Optuna
GenServer
Implementation
This GenServer
will manage the state and interaction with a dedicated Python worker.
State (defstruct
):
defstruct [
:study_name, # A unique name for the study in the Python process
:python_worker_pid, # The PID of the dedicated Python worker
:search_space, # The hyperparameter search space definition
:pending_requests # Map of Elixir refs to `from` tuples for async calls
]
init/1
:
- Checks out a dedicated worker from the
Foundation.Bridge.Python
pool. Crucially, it does not check this worker back in. ThisGenServer
now “owns” the Python worker for the duration of the study. - Generates a unique
study_name
(e.g.,"study_#{UUID.uuid4()}"
). - Sends a Python command to the worker to create and store the
optuna
study object in a global dictionary within the Python process, keyed bystudy_name
.# Python code executed by the worker on init import optuna # Global dictionary to hold studies if 'studies' not in globals(): studies = {} study_name = "..." # from Elixir search_space = {...} # from Elixir # Create the study studies[study_name] = optuna.create_study(...) # Store search space for later use studies[study_name].set_user_attr("search_space", search_space)
handle_call({:ask, ...})
:
- Constructs the Python code to call
studies[study_name].ask()
. Theask
method inoptuna
takes the search space definition. - Sends the command to the Python worker via the
Port
. - Receives the response, which contains the suggested parameters and a
trial_id
. - Replies to the Elixir caller.
handle_cast({:tell, ...})
:
- Constructs the Python code to call
studies[study_name].tell(trial_id, value)
. - Sends the command to the Python worker. This can be a
cast
as no reply is needed.
terminate/2
:
- Sends a final command to the Python worker to delete the study object from the global
studies
dictionary (del studies[study_name]
). - Checks the Python worker back into the
ConnectionManager
pool, making it available for other tasks.
4. Example Usage in DSPEx.MIPROv2
The MIPROv2
implementation becomes incredibly clean.
# in dspex/teleprompt/mipro_v2.ex
defmodule DSPEx.Teleprompt.MIPROv2 do
alias Foundation.Bridge.Python.Optuna
def compile(program, trainset, valset) do
# 1. Define the search space based on the program's parameters.
search_space = build_search_space(program)
# 2. Start the Optuna study manager.
{:ok, optuna_pid} = Optuna.start_link(
study_config: %{direction: "maximize"},
search_space: search_space
)
# 3. Run the optimization loop.
for _i <- 1..@num_trials do
# 3a. Ask for the next parameters.
{:ok, [trial]} = Optuna.ask(optuna_pid)
# 3b. Evaluate the program with these parameters.
candidate_program = configure_program(program, trial.params)
score = DSPEx.Evaluate.run(candidate_program, valset, &MyMetric.calc/2)
# 3c. Tell Optuna the result.
:ok = Optuna.tell(optuna_pid, trial.trial_id, score)
end
# 4. Get the best program and stop the study.
{:ok, best_trial} = Optuna.best_trial(optuna_pid)
GenServer.stop(optuna_pid)
configure_program(program, best_trial.params)
end
end
5. Implementation Considerations
- Worker Isolation: Each
Optuna
GenServer
must hold its own dedicated Python worker process. Sharing a worker between studies would lead to state conflicts. This means theFoundation.Bridge.Python
must support checking out workers for long-term, exclusive use. - Data Serialization: The
search_space
and trial results must be reliably serialized to JSON to be passed between Elixir and Python. The high-level API inFoundation.Bridge.Python.API
can be used for this. - Error Handling: The
Optuna
GenServer
must be robust. If its dedicated Python worker crashes, theGenServer
should trap the exit, log a critical error, and potentially try to restart the worker and reconstruct theoptuna
study state ifoptuna
supports storage backends (which it does, e.g., with SQLite or PostgreSQL).
6. Conclusion
Creating a dedicated Foundation.Bridge.Python.Optuna
module is a high-leverage enhancement. It encapsulates a significant amount of complex, stateful IPC logic behind a simple, idiomatic Elixir API.
This abstraction provides immense value to DSPEx
:
- Simplifies Optimizer Code: The
MIPROv2
implementation becomes clean, readable, and focused on orchestration rather than low-level Python interop. - Increases Robustness: Centralizes the state management and error handling for
optuna
interactions in one well-tested module. - Promotes Reusability: This dedicated
Optuna
bridge could be used by other parts of thefoundation
ecosystem or other applications, not justDSPEx
.
This is a strategic investment in the foundation
layer that will pay significant dividends in the development of DSPEx
’s most advanced optimization features.