Excellent question. This moves the conversation from “fixing a fragile integration” to “defining a strategic vision.” Given that you have control over both libraries, you are in a powerful position to build a highly cohesive and specialized platform.
This is not a bad idea. In fact, it is likely the best long-term strategy if your goal is to build a robust, observable, and easy-to-use system for developing AI applications in Elixir. Decoupling is a defensive maneuver when you can’t control the other party; co-designing is an offensive maneuver to build a superior, integrated product.
Here is a deep dive on why you should lean into this integration and a roadmap for refactoring Foundation
to be the premier runtime for DSPEx
.
The Case for a Unified AI Application Platform
Right now, you have a library (DSPEx
) that uses a framework (Foundation
). By refactoring, you would be evolving Foundation
from a generic framework into a specialized AI Application Platform, with DSPEx
as its core engine. This is a powerful paradigm shift with significant benefits:
- From Generic to Specialized: A generic framework for config and telemetry will always be a leaky abstraction for a domain as specific as LLM applications. By tailoring Foundation, you can provide features that are perfectly suited to the needs of
DSPEx
, such as native concepts of “LLM calls,” “token costs,” and “optimization traces.” - Unmatched Developer Experience: A developer using your unified platform would get an incredible “batteries-included” experience. They would define their
DSPEx
logic, and the Foundation platform would automatically provide:- Structured, correlated logging and tracing for every LLM call.
- Performance dashboards for optimization runs (e.g., BEACON trial scores over time).
- Cost tracking per request and per program.
- Resilience patterns (circuit breakers, rate limiting) that work out-of-the-box.
- Deep, Domain-Specific Observability: Instead of generic telemetry events, Foundation can have first-class concepts for the AI domain. This is the difference between logging and observing.
Generic Telemetry (Current) | Specialized Telemetry (Proposed) |
---|---|
Foundation.Telemetry.emit_histogram([:dspex, :predict, ...]) | Foundation.Telemetry.track_llm_call(duration, metadata) |
Foundation.Telemetry.emit_gauge([:dspex, :usage, ...]) | Foundation.Telemetry.track_token_usage(provider, model, tokens) |
Generic traces | Traces that understand the relationship between a Predict call, its underlying Client calls, and the resulting Example evaluation. |
- Solving the Lifecycle Problem: The fragile
wait_for_foundation()
hack disappears. The platform dictates the lifecycle. Foundation starts, it prepares its services, and then it boots its “engines” likeDSPEx
. This is a much cleaner and more robust startup sequence.
A Concrete Refactoring Roadmap for Foundation
Here is how you can evolve Foundation
to become this specialized platform, directly addressing the shortcomings we identified.
1. Refactor Foundation.Config
-> Foundation.ApplicationConfig
The problem is that Foundation is too restrictive. The solution is to give applications their own dedicated, validated configuration namespace.
- Define a Behaviour: Create a
Foundation.Application
behaviour.defmodule Foundation.Application do @callback config_schema() :: Ecto.Schema.t() @callback start(config :: map()) :: Supervisor.on_start() end
- Refactor Foundation: Foundation’s core logic will now load, validate, and provide the configuration declared by its registered applications. It could read a
foundation.exs
file that lists the applications to load, e.g.,[DSPEx.App]
. - Refactor DSPEx:
DSPEx
would implement this behaviour, declaring its configuration schema using Ecto. TheConfigManager
GenServer becomes much simpler or disappears entirely, replaced by a simple module that reads from the validated config provided by Foundation at startup.
2. Refactor Foundation.Telemetry
-> Foundation.AI.Observability
The problem is that the telemetry is generic and the integration is fragile. The solution is to make Foundation’s telemetry domain-aware.
- Create Domain-Specific Functions: Instead of
emit_histogram
, create specific functions in a newFoundation.AI.Observability
module. This moves the logic out of DSPEx and into Foundation.# In foundation/lib/foundation/ai/observability.ex defmodule Foundation.AI.Observability do def track_llm_request(metadata) do # Now Foundation knows what a "llm_request" is. # It can automatically extract provider, model, tokens, etc. # and emit the correct underlying telemetry. :telemetry.execute([:llm, :request, :stop], ... , metadata) end def track_optimization_trial(metadata) do # Foundation knows what a "trial" is and can automatically # create dashboards or logs for score, parameters, etc. :telemetry.execute([:teleprompter, :trial, :stop], ... , metadata) end end
- Refactor DSPEx:
DSPEx.TelemetrySetup
is simplified. It now just calls these higher-level Foundation functions, and the fragilehandle_dspex_event
logic is either moved into Foundation or becomes much simpler.
3. Refactor Foundation.Infrastructure
-> Foundation.AI.Resilience
The problem is the circuit breaker is half-implemented. The solution is to build a full-featured, transparent proxy for LLM clients.
- Create
Foundation.AI.ClientProxy
: This GenServer or module would be responsible for making HTTP requests. It would transparently handle:- Circuit Breaking: Using a robust library like
Fuse
. - Rate Limiting: Using a library like
Hammer
. - Caching: Using a library like
Cachex
orNebulex
.
- Circuit Breaking: Using a robust library like
- Refactor
DSPEx.Client
: TheDSPEx.Client
andClientManager
would be dramatically simplified. They would no longer make HTTP calls withReq
. Instead, they would callFoundation.AI.ClientProxy.request(request_params)
. All the resilience logic is now centralized and managed by the platform.
4. Build Foundation.Schema
(The “Exdantic” Core)
The problem is the lack of robust structured data handling. As discussed previously, the solution is to build this capability directly into Foundation using Ecto.
- Create
Foundation.Schema
: Implement theuse Foundation.Schema
macro that wrapsuse Ecto.Schema
and adds thejson_schema()
andfrom_llm_json()
functions. - Refactor
DSPEx
: TheDSPEx.Adapters.InstructorLiteGemini
is replaced by a newDSPEx.Adapters.StructuredOutput
adapter that usesFoundation.Schema
to generate the JSON schema and parse the response.DSPEx.Signature
could even be refactored to useFoundation.Schema
internally.
When This Is a Bad Idea
This integrated platform strategy is powerful, but it’s not without risks. You should reconsider this path if:
- You need
Foundation
to remain a general-purpose framework. If you have other, non-AI applications that depend on Foundation and would be hindered by its new AI-specific focus, then this is a bad idea. You would be sacrificing generality for specificity. - You plan for
DSPEx
to be a standalone, framework-agnostic library. If you want other developers to be able tomix install dspex
into any Phoenix, Nerves, or plain Elixir application without pulling in the entire Foundation platform, then this tight coupling is a mistake. A cohesive platform makes it harder to use the pieces in isolation. - Your team lacks the resources for a larger-scale refactoring. The path outlined above is more work than simply patching the existing fragile integrations. It’s a strategic investment. If you need a quick fix, decoupling and simplifying
DSPEx
’s dependencies might be faster.
The Litmus Test: Ask yourself this question: “Is DSPEx
the flagship application that Foundation
exists to support, or is DSPEx
just one of many independent applications that happens to use Foundation
?”
- If it’s the flagship, then refactoring Foundation into a specialized platform is the correct and most powerful strategic move.
- If it’s just one of many, then you should decouple them and keep Foundation generic.
Given the evidence in the code, it appears DSPEx
is already the primary driver of Foundation’s feature set. My recommendation is to embrace it. Commit to the platform vision. The result will be a far more robust, powerful, and elegant system than you could achieve by keeping the two libraries at arm’s length.