← Back to Docs20250627

006 20250623 20 jido refactor

Documentation for 006_20250623_20_jido_refactor from the Foundation repository.

Excellent. The provided analysis and proposed plan are not only correct but are a textbook example of the architectural evolution required to mature a collection of libraries into a cohesive, professional ecosystem. The reasoning is sound, the diagnosis of the problems is accurate, and the proposed solution is the industry-standard best practice.

My role here is to validate this thinking, add strategic nuance, and provide a clear, actionable advisory based on the plan.

Executive Summary

I fully endorse the proposed plan to adopt a single, canonical error system based in a foundational library. The analysis of the “Why” is accurate and identifies critical architectural pain points that will only worsen over time. The proposed refactoring plan is pragmatic, minimally invasive, and surgically targets the problem.

Adopting this plan is a non-negotiable step to elevate the Jido ecosystem from a set of good, independent tools into a single, robust, and maintainable professional framework.

Here is my detailed advisory on the matter.


Part 1: Validation of the Problem Analysis

The document correctly identifies the three core issues stemming from multiple error types. I will re-frame them to underscore their architectural significance:

  1. Siloed Error Types Create Architectural Friction: The “Error Indigestion Problem” is a perfect description. At every boundary between jido_action, jido_signal, and any foundational library, developers are forced to write brittle, lossy translation logic. This boilerplate code is not only tedious but also dangerous, as it discards rich error context precisely when it is most needed for debugging, retries, or circuit breaking. The system is fighting itself.

  2. Inconsistent Contracts Lead to an Observability Nightmare: The “Inconsistent Tooling Problem” is a direct consequence of the first issue. A modern, observable system relies on a consistent data schema for its telemetry. Without a canonical error type, your logging, metrics (e.g., Prometheus, Grafana), and tracing (e.g., OpenTelemetry) become a complex web of case statements. This makes building reliable dashboards, alerts, and automated analyses nearly impossible.

  3. High Cognitive Overhead Erodes Developer Experience: The “Cognitive Overhead Problem” cannot be overstated. Forcing developers to constantly context-switch between Jido.Action.Error, Jido.Signal.Error, and Foundation.Types.Error increases the learning curve, slows down development, and is a frequent source of bugs. A single, predictable error contract ({:error, %Foundation.Types.Error{}}) is a cornerstone of a good developer experience (DX).

Part 2: Endorsement of the Canonical Error Solution

The proposed solution is the correct one. The principle is fundamental to layered design:

Dependencies must flow in one direction. Contracts defined in a lower-level library must be adopted by the higher-level libraries that depend on it.

  • jido_action depends on foundation.
  • jido_signal depends on foundation.
  • The application (ElixirML in the example) depends on all of them.

Therefore, the error type from foundation must be the lingua franca for the entire stack.

Recommendation on the Canonical Error Struct

The memo refers to a Foundation.Types.Error struct. Based on the provided codebases, this appears to be a future or target struct, as jido, jido_action, and jido_signal each have their own Error module.

The new canonical Foundation.Types.Error struct should be designed to be a superset of the useful information from all existing error types. It should be rich and descriptive. I recommend a structure like this:

# in foundation/lib/types/error.ex
defmodule Foundation.Types.Error do
  @moduledoc "The canonical error struct for the entire Jido ecosystem."
  use TypedStruct

  typedstruct do
    # A machine-readable error code (e.g., :validation_failed, :downstream_unavailable)
    field :code, atom(), enforce: true

    # A human-readable message for logging and debugging.
    field :message, String.t(), enforce: true

    # Rich, structured context about the error.
    field :context, map(), default: %{}

    # The severity of the error.
    field :severity, :info | :warning | :error | :critical, default: :error

    # A classification for grouping and filtering errors.
    field :category, :business_logic | :system | :dependency | :security, default: :system

    # (Optional) A recommendation for how to handle this error.
    field :retry_strategy, :none | :immediate | :backoff, default: :none
  end
end

# in foundation/lib/error.ex
defmodule Foundation.Error do
  @moduledoc "Helper functions for creating canonical errors."
  alias Foundation.Types.Error

  def new(code, message, opts \\ []) do
    %Error{
      code: code,
      message: message,
      context: Keyword.get(opts, :context, %{}),
      severity: Keyword.get(opts, :severity, :error),
      category: Keyword.get(opts, :category, :system),
      retry_strategy: Keyword.get(opts, :retry_strategy, :none)
    }
  end

  def validation_failed(message, context), do: new(:validation_failed, message, context: context, category: :business_logic)
  def not_found(message, context), do: new(:not_found, message, context: context, category: :business_logic)
  # ... other helpers
end

This rich error struct turns errors from a problem into a valuable source of operational data.

Part 3: Endorsement of the Refactoring Plan

The surgical plan outlined in the memo is excellent. It is minimal, targeted, and correct.

  1. Update Dependencies (mix.exs): This is the necessary first step to establish the correct architectural layering.
  2. Delete Local Error Modules: This is a critical and symbolic step. It enforces the “single source of truth” principle.
  3. Update Aliases and Function Calls: This is the bulk of the mechanical work. Using a project-wide find-and-replace, followed by a compiler-driven “fix-it” pass (mix compile will show all the places that need updating), is the most efficient way to tackle this.

I would add one tactical recommendation:

  • Create Helper Functions: As shown in the Foundation.Error example above, create a helper module with functions like validation_failed/2, not_found/2, etc. This makes creating the new, more structured error ergonomic and ensures consistency. Instead of developers remembering how to build the struct, they can just call ErrorHelper.validation_failed("Bad input", %{...}).

Part 4: Strategic Considerations & Risk Management

This is an architectural change that has broader implications. Here are key points to manage:

  1. This is a Breaking Change: Removing the local error types and changing function signatures (@spec) is a major breaking change. Any consumer of jido_action or jido_signal will need to update their code.

    • Action: The versions of jido_action and jido_signal must be bumped to a new major version (e.g., from 0.x to 1.0.0). This clearly communicates the breaking change to all users.
  2. The jido Library Itself: The analysis correctly focuses on jido_action and jido_signal. However, the jido library also has its own Jido.Error module. This library must also undergo the same refactoring to adopt the new canonical Foundation.Types.Error. The process will be identical.

  3. Documentation and Migration Guide: To support developers, a clear and concise migration guide is essential. It should explain:

    • Why the change was made (linking to the reasoning in the memo).
    • The old error types and their new equivalents.
    • A step-by-step guide on how to update their code (e.g., “replace alias Jido.Action.Error with alias Foundation.Types.Error”).

Conclusion and Final Recommendation

The proposal to unify the error handling system is not just a “nice-to-have” refactor; it is a foundational investment in the future of the Jido ecosystem. The current state creates technical debt with every cross-library interaction. The proposed solution pays down that debt and establishes a clean, robust, and observable architecture.

My advice is to proceed with the plan as outlined, with the following additions:

  1. Define and implement the canonical Foundation.Types.Error struct and its helper module first.
  2. Treat this as a major version change for all affected libraries (jido_action, jido_signal, and jido itself).
  3. Perform the refactoring on jido_action and jido_signal exactly as described.
  4. Perform the same refactoring on the core jido library.
  5. Create a developer migration guide to ensure a smooth transition for users of the ecosystem.

This is the right move. It will significantly improve the stability, maintainability, and overall professionalism of the entire Jido stack.