← Back to 002 foundation enhancements

006 5of10

Documentation for 006_5of10 from the Ds ex repository.

Of course. Here is a detailed technical document for the fifth proposed feature enhancement: “A Dedicated Foundation.Bridge.Python.Optuna Module.”


Technical Specification: Dedicated Optuna Bridge Module

Document Version: 1.0 Author: AI Assistant Status: PROPOSED

1. Overview

This document specifies the design of a new, high-level module, Foundation.Bridge.Python.Optuna, built on top of the existing Python bridge infrastructure. The current bridge provides generic functions like execute/2 and call_function/3, which are powerful but require the Elixir caller to construct raw Python code and manage the optimization state machine manually.

The goal of this enhancement is to abstract away the specific complexities of interacting with the optuna library. It will provide a clean, stateful, and idiomatic Elixir API for managing a Bayesian Optimization study, dramatically simplifying the implementation of the DSPEx.MIPROv2 teleprompter.

Instead of MIPROv2 writing Python code strings, it will call functions like Optuna.create_study/1, Optuna.ask/1, and Optuna.tell/3, making its code cleaner, more robust, and completely decoupled from the Python implementation details.


2. Problem Statement & Use Case

The DSPEx.MIPROv2 optimizer implements a complex Bayesian Optimization loop. The core logic of this loop involves a tight, stateful interaction with the optuna library:

  1. Initialization: Create an optuna “study” with a specific direction (maximize) and a sampler (TPESampler). This study object must be preserved for the entire duration of the optimization run.
  2. Ask Step: In each iteration, ask the optuna study for the next set of hyperparameters to try. optuna uses its internal models to suggest the most promising parameters.
  3. Evaluate Step: DSPEx runs its evaluation using the suggested parameters and gets a score.
  4. Tell Step: Report the score for the completed trial back to the optuna study. The study uses this new information to update its internal models for the next ask step.

Implementing this logic using the generic Python.execute/2 function would be cumbersome and error-prone. The MIPROv2 GenServer would have to:

  • Maintain a reference to the Python study object across multiple IPC calls, which is very difficult.
  • Construct raw Python code strings for each step (e.g., code = "study.ask()").
  • Manually serialize and deserialize complex data structures (like the trial object) between Elixir and Python.

This new module will encapsulate all of this complexity.


3. Proposed API and Architecture

We will create a new Elixir module, Foundation.Bridge.Python.Optuna, which will be a GenServer. Each GenServer process will manage a single optuna study object within a dedicated Python worker process.

3.1. Public API (Foundation.Bridge.Python.Optuna)

start_link(opts): Starts a new Optuna bridge GenServer. Each GenServer represents one optimization “study”.

@spec start_link(opts :: keyword()) :: GenServer.on_start()
  • Options:
    • :name (optional): A name to register the study process.
    • :study_config: A map defining the optuna study, e.g., %{direction: "maximize", sampler: "tpe"}.
    • :search_space: A map defining the hyperparameter search space, e.g., %{learning_rate: [type: "float", low: 0.001, high: 0.1], dropout: [type: "categorical", choices: [0.1, 0.2, 0.3]]}.

ask(pid, n_trials \\ 1): Asks the study for the next set(s) of parameters.

@spec ask(pid :: pid(), n_trials :: pos_integer()) :: {:ok, list(map())} | {:error, Error.t()}
  • Returns {:ok, [trial1, trial2, ...]} where each trial is a map like %{trial_id: 123, params: %{learning_rate: 0.05}}.

tell(pid, trial_id, value): Tells the study the result of a trial.

@spec tell(pid :: pid(), trial_id :: integer(), value :: number()) :: :ok | {:error, Error.t()}

best_trial(pid): Gets the best trial found so far.

@spec best_trial(pid :: pid()) :: {:ok, map()} | {:error, Error.t()}
3.2. Optuna GenServer Implementation

This GenServer will manage the state and interaction with a dedicated Python worker.

State (defstruct):

defstruct [
  :study_name,       # A unique name for the study in the Python process
  :python_worker_pid,  # The PID of the dedicated Python worker
  :search_space,     # The hyperparameter search space definition
  :pending_requests  # Map of Elixir refs to `from` tuples for async calls
]

init/1:

  1. Checks out a dedicated worker from the Foundation.Bridge.Python pool. Crucially, it does not check this worker back in. This GenServer now “owns” the Python worker for the duration of the study.
  2. Generates a unique study_name (e.g., "study_#{UUID.uuid4()}").
  3. Sends a Python command to the worker to create and store the optuna study object in a global dictionary within the Python process, keyed by study_name.
    # Python code executed by the worker on init
    import optuna
    
    # Global dictionary to hold studies
    if 'studies' not in globals():
        studies = {}
    
    study_name = "..." # from Elixir
    search_space = {...} # from Elixir
    
    # Create the study
    studies[study_name] = optuna.create_study(...)
    
    # Store search space for later use
    studies[study_name].set_user_attr("search_space", search_space)
    

handle_call({:ask, ...}):

  1. Constructs the Python code to call studies[study_name].ask(). The ask method in optuna takes the search space definition.
  2. Sends the command to the Python worker via the Port.
  3. Receives the response, which contains the suggested parameters and a trial_id.
  4. Replies to the Elixir caller.

handle_cast({:tell, ...}):

  1. Constructs the Python code to call studies[study_name].tell(trial_id, value).
  2. Sends the command to the Python worker. This can be a cast as no reply is needed.

terminate/2:

  1. Sends a final command to the Python worker to delete the study object from the global studies dictionary (del studies[study_name]).
  2. Checks the Python worker back into the ConnectionManager pool, making it available for other tasks.

4. Example Usage in DSPEx.MIPROv2

The MIPROv2 implementation becomes incredibly clean.

# in dspex/teleprompt/mipro_v2.ex
defmodule DSPEx.Teleprompt.MIPROv2 do
  alias Foundation.Bridge.Python.Optuna

  def compile(program, trainset, valset) do
    # 1. Define the search space based on the program's parameters.
    search_space = build_search_space(program)
    
    # 2. Start the Optuna study manager.
    {:ok, optuna_pid} = Optuna.start_link(
      study_config: %{direction: "maximize"},
      search_space: search_space
    )
    
    # 3. Run the optimization loop.
    for _i <- 1..@num_trials do
      # 3a. Ask for the next parameters.
      {:ok, [trial]} = Optuna.ask(optuna_pid)
      
      # 3b. Evaluate the program with these parameters.
      candidate_program = configure_program(program, trial.params)
      score = DSPEx.Evaluate.run(candidate_program, valset, &MyMetric.calc/2)
      
      # 3c. Tell Optuna the result.
      :ok = Optuna.tell(optuna_pid, trial.trial_id, score)
    end
    
    # 4. Get the best program and stop the study.
    {:ok, best_trial} = Optuna.best_trial(optuna_pid)
    GenServer.stop(optuna_pid)
    
    configure_program(program, best_trial.params)
  end
end

5. Implementation Considerations

  • Worker Isolation: Each Optuna GenServer must hold its own dedicated Python worker process. Sharing a worker between studies would lead to state conflicts. This means the Foundation.Bridge.Python must support checking out workers for long-term, exclusive use.
  • Data Serialization: The search_space and trial results must be reliably serialized to JSON to be passed between Elixir and Python. The high-level API in Foundation.Bridge.Python.API can be used for this.
  • Error Handling: The Optuna GenServer must be robust. If its dedicated Python worker crashes, the GenServer should trap the exit, log a critical error, and potentially try to restart the worker and reconstruct the optuna study state if optuna supports storage backends (which it does, e.g., with SQLite or PostgreSQL).

6. Conclusion

Creating a dedicated Foundation.Bridge.Python.Optuna module is a high-leverage enhancement. It encapsulates a significant amount of complex, stateful IPC logic behind a simple, idiomatic Elixir API.

This abstraction provides immense value to DSPEx:

  • Simplifies Optimizer Code: The MIPROv2 implementation becomes clean, readable, and focused on orchestration rather than low-level Python interop.
  • Increases Robustness: Centralizes the state management and error handling for optuna interactions in one well-tested module.
  • Promotes Reusability: This dedicated Optuna bridge could be used by other parts of the foundation ecosystem or other applications, not just DSPEx.

This is a strategic investment in the foundation layer that will pay significant dividends in the development of DSPEx’s most advanced optimization features.