← Back to 20250617

101 gemini plan

Documentation for 101_gemini_plan from the Ds ex repository.

Excellent. Let’s focus on Step 1. You’ve correctly identified the core question: what exactly is the “ostensible instability” in Foundation that the DSPEx code is trying to work around?

It’s not that Foundation is crashing randomly. The instability manifests as a brittle and unpredictable contract between the two libraries. DSPEx expects Foundation to behave one way, but in certain situations (like testing or shutdown), it behaves another way, causing DSPEx to crash. The “workarounds” are DSPEx trying to defend itself.

Let’s break down the specific issues revealed by the DSPEx codebase. This will give you a concrete list of things to fix in Foundation to make Step 1 a success.


Deep Dive: The Specific Instabilities in Foundation

Here are the problems your DSPEx code is currently solving with “defensive programming.” Fixing these in Foundation will allow you to remove the complex try/rescue blocks from DSPEx.

1. Lifecycle and Race Conditions During Shutdown/Testing

  • Problem Manifestation (in DSPEx): The DSPEx.Services.TelemetrySetup.handle_dspex_event/4 function has a massive try/rescue block. It explicitly tries to catch ArgumentError, SystemLimitError, UndefinedFunctionError, FunctionClauseError, and various :exit signals. It logs messages like “ETS table may be gone,” “process may be dead,” and “Foundation is shutting down.”

  • What this tells us about Foundation:

    1. Unmanaged Lifecycles: Foundation’s services (like its telemetry event store, which likely uses an ETS table) are being torn down before DSPEx’s telemetry handler has been cleanly detached. When a final event fires during shutdown, DSPEx tries to call Foundation.Telemetry, which then tries to access its now-dead ETS table, causing a crash.
    2. No Guaranteed Shutdown Order: The system lacks a graceful, ordered shutdown mechanism. A robust platform should ensure that dependent processes (like DSPEx’s telemetry handler) are notified and given a chance to terminate before their dependencies (like Foundation’s services) are killed.
  • How to Fix in Foundation:

    • Implement a Coordinated Shutdown Protocol: Foundation should manage the application lifecycle. When a shutdown is initiated, it should first signal all registered “subscriber” applications (like DSPEx) to begin termination. It should wait for acknowledgements from these subscribers before it proceeds to terminate its own core services.
    • Make Foundation.available?() Robust: This function should be the single source of truth. If it returns false, no Foundation function should ever be called. The rescue block in DSPEx suggests this isn’t currently reliable. Foundation’s core functions should perhaps start with if !available?(), do: {:error, :foundation_not_available} to enforce this contract.

2. Unpredictable Function Contracts (Foundation.Config)

  • Problem Manifestation (in DSPEx): The DSPEx.Services.ConfigManager explicitly bypasses Foundation.Config.get/1 and uses its own fallback mechanism. The comments are revealing: “Foundation Config has contract violations - use fallback until fixed” and “Foundation.Config may throw errors despite Dialyzer thinking it only returns :ok”.

  • What this tells us about Foundation:

    1. Violates its Own Typespecs: The @spec for a function in Foundation.Config likely promises a return type of {:ok, term} | {:error, term}, but it’s sometimes crashing or throwing an exception instead. This is a major design flaw. Functions should return what their spec says they will return.
    2. Overly Restrictive Permissions: The log message “DSPEx config path is restricted” indicates that Foundation.Config has a permissions model that is too rigid. It’s preventing a legitimate sub-application (DSPEx) from storing its own configuration in its own namespace. This is an architectural design flaw in Foundation’s configuration philosophy.
  • How to Fix in Foundation:

    • Enforce Function Contracts: Go through every public function in Foundation.Config and ensure it never crashes. All failure paths must be handled internally and returned as a standard {:error, reason} tuple, matching the typespec. A function should never raise an exception for a predictable failure (like a key not found or a write being forbidden).
    • Implement Application Namespacing: Refactor Foundation.Config to allow registered applications (like DSPEx) to have their own dedicated, writable configuration “slice.” Foundation should not block an application from managing its own state under its own top-level key (e.g., config.dspex).

3. Missing or Unreliable Global Services (Foundation.Utils)

  • Problem Manifestation (in DSPEx): Multiple modules (DSPEx.ClientManager, DSPEx.Program, etc.) have a rescue block around calls to Foundation.Utils.generate_correlation_id(), providing a fallback implementation.

  • What this tells us about Foundation:

    • Availability Issues: The Foundation.Utils module, or the service it depends on, is not guaranteed to be available when DSPEx code is running. This could be another lifecycle issue or a problem with how Foundation is started and registered. A utility module should generally be stateless and always available after code loading. If it depends on a GenServer, that server isn’t being started reliably before its consumers need it.
  • How to Fix in Foundation:

    • Ensure Utils is Stateless: If Foundation.Utils contains simple, pure functions, it should never fail. If it depends on a process, that dependency needs to be managed properly by Foundation’s application supervisor to guarantee it’s started first.
    • Establish a “Ready” State: Foundation’s application supervisor should have a clear state transition to :ready only after all its core services and utilities are fully initialized and available. The Foundation.available?() function should only return true after this point.

Summary for Your Action Plan (Step 1)

Your goal for Step 1 is to shore up Foundation so that the DSPEx code no longer needs to be so defensive. Here is your concrete checklist for Foundation:

  1. Fix the Shutdown Order:

    • Implement a broadcast mechanism in Foundation where it tells registered apps “prepare to shut down” before it kills its own internal services.
    • In DSPEx, create a handler for this signal that detaches its telemetry listeners cleanly.
  2. Fix the Foundation.Config Contract:

    • Review every function in Foundation.Config. Add internal try/rescue blocks so they always return {:ok, ...} or {:error, ...} as promised in their @spec. No more unexpected crashes.
    • Change the permissions model to allow applications to write to their own namespace (e.g., [:dspex]).
  3. Fix Utility Availability:

    • Ensure Foundation.Utils and its dependencies are started reliably before any other application code runs. Foundation.available?() must be the single, trustworthy gatekeeper.

By making these changes in Foundation, you are solidifying its role as a reliable platform. Then, when you implement the DSPEx.Foundation.Client facade in DSPEx, its code will be clean and simple, relying on the newly established, trustworthy contracts provided by Foundation, instead of being filled with defensive workarounds.