007 20250623 14 jido error cleanup

Documentation for 007_20250623_14_jido_error_cleanup from the Foundation repository.

You are asking the perfect follow-up question. It’s a pragmatic and important one: can we achieve our goals without touching the newly refactored jido* libraries?

The direct answer is: To build a truly robust, integrated, and professional stack, yes, we absolutely must modify the jido_action and jido_signal libraries to adopt a single, canonical error system.

You cannot keep them “as is” if you want them to integrate cleanly with foundation. The current separation is logical, but their contracts are not yet harmonized. This modification isn’t a failure of the refactoring; it’s the crucial next step of integration.

Let’s explore why this is non-negotiable and how to do it with minimal friction.

Part 1: The “Why” - The Problem with Multiple Error Types

Currently, our stack has at least three different error structs:

Foundation.Types.Error in foundation
Jido.Action.Error in jido_action
Jido.Signal.Error in jido_signal

While they look similar, their distinct types create a cascade of architectural problems.

1. The Error Indigestion Problem:

Imagine an action from jido_action that uses a foundation service:

defmodule MyAction do
  use Jido.Action
  alias Foundation.Infrastructure.CircuitBreaker

  def run(params, _context) do
    # This call can return {:error, %Foundation.Types.Error{}}
    CircuitBreaker.execute(:my_service, fn -> call_external_api() end)
    # The `run/2` function is expected to return {:error, %Jido.Action.Error{}}
  end
end

What does MyAction do when it receives a Foundation.Types.Error?

Option A (Crash): The function crashes on an unmatched case or with clause. This is brittle.
Option B (Translate and Wrap): The run/2 function must catch the Foundation.Types.Error and manually convert it into a Jido.Action.Error.

This translation is inefficient and loses critical information. Foundation.Types.Error has rich context like code, severity, category, and retry_strategy. Wrapping it in a simpler error struct throws away this valuable data right when you need it most.

2. The Inconsistent Tooling Problem:

How do you build a single, reliable error monitoring system? Your telemetry handlers, log parsers, and alerting systems would need to handle multiple, distinct error structs:

# A telemetry handler becomes a complex pattern-matching nightmare
def handle_event([:jido, :error], _measurements, metadata, _config) do
  case metadata.error do
    %Foundation.Types.Error{code: code, severity: sev} ->
      # Handle foundation error
    %Jido.Action.Error{type: type, details: dets} ->
      # Handle jido_action error
    %Jido.Signal.Error{type: type, details: dets} ->
      # Handle jido_signal error
    other ->
      # Handle generic Elixir errors
  end
end

This is unmaintainable. A unified error schema is the bedrock of good observability.

3. The Cognitive Overhead Problem:

A developer using the full stack must learn, handle, and differentiate between three separate error systems. This violates the Principle of Least Surprise and makes the entire system harder to use correctly.

Part 2: The Solution - A Single Canonical Error Type

The only robust solution is to establish a single source of truth for errors.

Principle: The error type defined in the lowest-level library (foundation) must be the canonical error type for the entire stack.

Higher-level libraries depend on lower-level libraries. Therefore, they must adopt the contracts of their dependencies. A function in jido_action can return a Foundation.Types.Error, but a function in foundation can never return a jido_action error without creating a circular dependency.

The Contract: Every function across foundation, jido_signal, and jido_action that can fail should have the signature ... -> {:ok, result} | {:error, Foundation.Types.Error.t()}.

This creates a unified “error monad” across the whole system.

Part 3: The Refactoring Plan (Minimal & Targeted)

You don’t need to rewrite the libraries. This is a focused, surgical change.

Step 1: Update mix.exs Dependencies

jido_action/mix.exs must add {:foundation, ...}.
jido_signal/mix.exs must add {:foundation, ...}.

Step 2: Refactor jido_action

Delete the local error module:
```
rm jido_action/lib/error.ex
```

Update aliases: In every file that uses Jido.Action.Error, replace the alias.

# Before:
alias Jido.Action.Error

# After:
alias Foundation.Types.Error
alias Foundation.Error, as: ErrorHelper

Update function calls: This is mostly a find-and-replace operation. The function names are similar.

# Before:
Error.validation_error("Invalid input", %{...})

# After (using the new alias):
ErrorHelper.new(:validation_failed, "Invalid input", context: %{...})
# Note: Foundation.Error uses a more structured `new/3`

This needs to be done in jido_action.ex and jido_exec.ex.

Step 3: Refactor jido_signal

Delete the local error module:
```
rm jido_signal/lib/error.ex
```
Update aliases and function calls: Repeat the same process as in jido_action for all relevant files (e.g., dispatch.ex, router.ex).

Part 4: The Payoff - A Clean and Unified System

After this refactoring, the entire stack becomes vastly more powerful and professional.

1. Clean, Consistent Error Handling:

A with block in your ElixirML application can now handle errors from any layer without complex pattern matching.

# In an ElixirML program
with {:ok, pid} <- Foundation.ProcessRegistry.lookup( ... ),                            # Returns Foundation.Types.Error on failure
     {:ok, _} <- Jido.Signal.Dispatch.dispatch(signal, {:pid, [target: pid]}),         # Returns Foundation.Types.Error on failure
     {:ok, result, _} <- Jido.Exec.run(MyAction, %{})                                  # Returns Foundation.Types.Error on failure
do
  # success
else
  # ANY error from ANY layer is a standard %Foundation.Types.Error{}
  {:error, %Foundation.Types.Error{} = error} ->
    # A single, consistent error handling path
    log_and_recover(error)
end

2. Unified and Powerful Observability:

Your telemetry handler becomes trivial and powerful.

def handle_event([_app, _domain, :error], _measurements, metadata, _config) do
  error = metadata.error
  
  # We can trust `error` is always a %Foundation.Types.Error{}
  Prometheus.inc(
    labels: [error.category, error.subcategory, error.error_type, error.severity],
    value: 1
  )
end

Conclusion:

You are correct to question the need for this change. The jido* libraries are well-designed in their isolation. However, to create an integrated system, they cannot remain fully independent. They must agree on a common contract for fundamentals like error handling.

The proposed modification is not a step backward. It is the final, crucial step of integration that solidifies the layered architecture and unlocks the true potential of composing these powerful libraries together. It elevates the stack from a collection of good tools to a single, professional, and robust framework.