Of course. This is an excellent analysis of the architectural flaw. The proposed solution to refactor the ConfigServer
into a resilient proxy is the correct approach to improve cohesion, reduce coupling, and create a more maintainable system.
Here is a detailed, step-by-step plan to execute this fix.
Fix Plan: Refactoring ConfigServer
to a Resilient Proxy
The goal is to eliminate the Foundation.GracefulDegradation
module by integrating its resilience logic directly into the configuration service layer. This will be achieved by splitting the existing ConfigServer
into a GenServer
process (Server
) and a stateless public-facing API (Proxy
), which handles fallback logic.
Phase 1: Preparation & Scaffolding
This phase prepares the file structure and new modules without changing existing logic.
Create New Directory: Create a new directory to house the split modules:
mkdir -p foundation/services/config_server
Move and Rename the
GenServer
: Move the existingConfigServer
GenServer implementation to the new directory and rename it toserver.ex
.mv foundation/services/config_server.ex foundation/services/config_server/server.ex
Update the
GenServer
Module: In the newly moved file,foundation/services/config_server/server.ex
, change the module definition fromFoundation.Services.ConfigServer
toFoundation.Services.ConfigServer.Server
.File:
foundation/services/config_server/server.ex
# Change this line defmodule Foundation.Services.ConfigServer.Server do # ... rest of the file remains the same for now ... use GenServer # ... end
Create the New Proxy Module File: Create a new, empty file that will become our resilient proxy.
touch foundation/services/config_server.ex
Phase 2: Implement the Resilient Proxy
This is the core of the refactoring. We will populate the new proxy module with the combined logic of the old ConfigServer
API and the GracefulDegradation
fallback mechanism.
Delete
foundation/graceful_degradation.ex
: This module’s logic will be moved, so it is no longer needed.rm foundation/graceful_degradation.ex
Implement the New
ConfigServer
Proxy: Populatefoundation/services/config_server.ex
with the following code. This code implements theConfigurable
behaviour and contains the fallback logic.File:
foundation/services/config_server.ex
defmodule Foundation.Services.ConfigServer do @moduledoc """ Resilient proxy for the configuration service. This module is the single public entrypoint for configuration. It handles the primary logic of calling the ConfigServer.Server GenServer and implements all fallback and caching logic if the primary service is unavailable. """ @behaviour Foundation.Contracts.Configurable alias Foundation.Services.ConfigServer.Server alias Foundation.Types.{Config, Error} require Logger @fallback_table :config_fallback_cache @cache_ttl 300 # 5 minutes # --- Lifecycle and Fallback System Management --- @impl Foundation.Contracts.Configurable def initialize(opts \\ []) do # This function starts the actual GenServer. The proxy itself is stateless. case Server.start_link(opts) do {:ok, _pid} -> initialize_fallback_system() :ok {:error, {:already_started, _pid}} -> initialize_fallback_system() :ok {:error, reason} -> {:error, Error.new(:initialization_failed, "Failed to start ConfigServer: #{inspect(reason)}")} end end def initialize_fallback_system do case :ets.whereis(@fallback_table) do :undefined -> :ets.new(@fallback_table, [:named_table, :public, :set, read_concurrency: true]) Logger.info("Config fallback cache initialized.") _ -> :ok end end def stop(), do: Server.stop() # --- Resilient API Implementation --- @impl Foundation.Contracts.Configurable def get() do GenServer.call(Server, :get_config) rescue _ -> {:error, create_unavailable_error("get/0", "Service unavailable")} end @impl Foundation.Contracts.Configurable def get(path) do case GenServer.call(Server, {:get_config_path, path}) do {:ok, value} -> cache_value(path, value) {:ok, value} {:error, _} = error -> Logger.warning("ConfigServer primary lookup failed for path #{inspect(path)}. Attempting fallback.") get_from_cache(path, error) end rescue _ -> Logger.warning("ConfigServer process is not available for path #{inspect(path)}. Attempting fallback.") get_from_cache(path, create_unavailable_error(path, "Process not available")) end @impl Foundation.Contracts.Configurable def update(path, value) do case GenServer.call(Server, {:update_config, path, value}) do :ok -> # Update was successful, clear any pending updates for this path :ets.delete(@fallback_table, {:pending_update, path}) :ok {:error, _} = error -> # Primary update failed, cache as a pending update cache_pending_update(path, value) error end rescue _ -> cache_pending_update(path, value) {:error, create_unavailable_error(path, "Update failed, service unavailable")} end @impl Foundation.Contracts.Configurable def reset() do # This operation is critical and should probably not have a fallback. # If the server is down, a reset cannot be guaranteed. GenServer.call(Server, :reset_config) rescue _ -> {:error, create_unavailable_error("reset/0", "Cannot reset, service unavailable")} end # --- Delegated and Passthrough Functions --- defdelegate validate(config), to: Server defdelegate updatable_paths(), to: Server defdelegate available?(), to: Server defdelegate status(), to: Server defdelegate subscribe(pid \\ self()), to: Server defdelegate unsubscribe(pid \\ self()), to: Server defdelegate reset_state(), to: Server # For testing # --- Private Fallback Logic --- defp get_from_cache(path, original_error) do cache_key = {:config_cache, path} case :ets.lookup(@fallback_table, cache_key) do [{^cache_key, value, timestamp}] when (System.system_time(:second) - timestamp) <= @cache_ttl -> Logger.debug("Using cached config value for path: #{inspect(path)}") {:ok, value} [{^cache_key, _, _}] -> :ets.delete(@fallback_table, cache_key) Logger.warning("Cache expired for path: #{inspect(path)}") {:error, create_unavailable_error(path, "Service unavailable and cache expired", original_error)} [] -> Logger.warning("No cached value available for path: #{inspect(path)}") {:error, create_unavailable_error(path, "Service and cache unavailable", original_error)} end end defp cache_value(path, value) do :ets.insert(@fallback_table, {{:config_cache, path}, value, System.system_time(:second)}) end defp cache_pending_update(path, value) do Logger.warning("Caching pending config update for path: #{inspect(path)}") :ets.insert(@fallback_table, {{:pending_update, path}, value, System.system_time(:second)}) end defp create_unavailable_error(context_info, message, original_error \\ nil) do Error.new( code: 5000, error_type: :service_unavailable, message: "Configuration service not available. #{message}.", severity: :high, context: %{path: context_info, original_error: original_error} ) end end
Phase 3: Update System Integration
Now, we update the rest of the system to use the new proxy and server structure.
Update
Foundation.Application
: Modifyfoundation/application.ex
to start the newFoundation.Services.ConfigServer.Server
and initialize the fallback cache.File:
foundation/application.ex
# In the @service_definitions map: # ... config_server: %{ # Change the module to point to the GenServer implementation module: Foundation.Services.ConfigServer.Server, args: [namespace: :production], dependencies: [:process_registry], startup_phase: :foundation_services, health_check_interval: 30_000, restart_strategy: :permanent }, # ... # In the start/2 function, after setup_application_config/0: def start(_type, _args) do # ... setup_application_config() # Initialize the fallback system for ConfigServer Foundation.Services.ConfigServer.initialize_fallback_system() # Initialize infrastructure components # ... end
Update
Foundation.Config
: This is the highest-level public API. It should continue to delegate toFoundation.Services.ConfigServer
, which is now our resilient proxy. This change is seamless because the module nameFoundation.Services.ConfigServer
remains the public entry point. No changes should be needed infoundation/config.ex
, but it’s crucial to verify this.File:
foundation/config.ex
defmodule Foundation.Config do # ... alias Foundation.Services.ConfigServer # This now correctly aliases our proxy # ... defdelegate get(), to: ConfigServer # This now calls the proxy # ... all other defdelegates are correct. end
Global Code Search: Perform a global search for
GracefulDegradation
in the codebase. The only call should have been inFoundation.Config
, which we’ve now conceptually replaced. Remove any lingering references. Theget_with_fallback
function inFoundation.Config
can now be safely removed asget/1
is now inherently resilient.File:
foundation/config.ex
# This function is now redundant and should be REMOVED. # The new `get/1` implementation handles the fallback. @spec get_with_default(config_path(), config_value()) :: config_value() @dialyzer {:nowarn_function, get_with_default: 2} def get_with_default(path, default) do case get(path) do {:ok, value} -> value {:error, _} -> default end end
Phase 4: Verification and Testing
Update Existing Tests:
- Any tests that were directly testing
Foundation.Services.ConfigServer
as aGenServer
should be updated to testFoundation.Services.ConfigServer.Server
. - Tests for
Foundation.Config
should continue to pass without modification.
- Any tests that were directly testing
Create New Tests for the Proxy: Create a new test file:
test/foundation/services/config_server_test.exs
.- Happy Path: Write a test where
ConfigServer.Server
is started, and verify thatConfigServer.get/1
successfully retrieves values from it. - Failure/Fallback Path: Write a test where
ConfigServer.Server
is not started.- Assert that
ConfigServer.get/1
returns an:service_unavailable
error initially. - Manually insert a value into the
:config_fallback_cache
ETS table. - Assert that a subsequent call to
ConfigServer.get/1
now returns{:ok, value}
from the cache.
- Assert that
- Cache Expiry Test:
- Manually insert a value into the cache with a timestamp older than the TTL.
- Assert that
ConfigServer.get/1
returns an error and that the entry has been deleted from ETS.
- Pending Update Test:
- Ensure the
Server
is down. - Call
ConfigServer.update/2
. Assert that it returns an error but that an entry for{:pending_update, path}
now exists in the ETS table.
- Ensure the
- Happy Path: Write a test where
By following this plan, you will successfully refactor the configuration system to be more robust, cohesive, and maintainable, fully addressing the architectural flaw.