Foundation 2.0: Risk Assessment & Mitigation Strategies
Purpose: To proactively identify and plan for the potential downsides and failure modes of our architectural choices.
Risk Category | Risk Description | Probability | Impact | Mitigation Strategy |
---|---|---|---|---|
Horde | Eventual Consistency Race Conditions: Developer calls start_singleton and immediately tries to lookup_singleton from another node before the registry has synced. | High | Medium | 1. Documentation: Clearly document the “Lifecycle of a Distributed Process” and the sync delay. 2. Facade Helper: Add an optional wait_for_sync: true argument to start_singleton that polls the registry until the process appears, with a timeout.3. Encourage Event-Driven Patterns: Guide users to react to :service_up events on the PubSub channel rather than polling the registry. |
libcluster | Split-Brain Clusters: A network partition causes two or more independent clusters to form, leading to duplicate singletons and inconsistent state. | Medium | High | 1. Quorum Configuration: For strategies that support it (like Gossip), Foundation’s “Apprentice Mode” will automatically configure a quorum based on expected node count. quorum_size = div(expected_nodes, 2) + 1 .2. HealthMonitor Detection: The HealthMonitor will detect when Node.list() is smaller than the configured quorum and raise a critical alert.3. Static Seeds: For production setups, recommend using a static list of seed nodes (e.g., Kubernetes headless service DNS) in addition to dynamic discovery to anchor the cluster. |
Leaky Abstraction | The Abstraction is too Leaky / Confusing: Developers don’t understand when to use the facade vs. the raw tool, leading to inconsistent or incorrect usage. | Medium | Medium | 1. Crystal-Clear Documentation: The IMPL_PATTS_BEST_PRACTICES.md is key. Every facade function docstring must explain what it does, what it wraps, and when to bypass it.2. Excellent Logging: As shown in the patterns, the facades will log exactly which underlying functions they are calling with which arguments at the DEBUG level. This makes tracing behavior trivial.3. Credo Checks: We can write custom Credo checks that suggest using a Foundation facade when raw tool usage is detected for a common pattern. |
Performance | Facade Overhead: The convenience wrappers add unacceptable latency or CPU overhead compared to direct tool usage. | Low | Low | 1. Stateless Facades: Our facades are designed to be stateless modules, not GenServer s, minimizing per-call overhead.2. Benchmarking: The Testing Strategy includes performance benchmarks that will explicitly compare facade calls vs. direct calls to ensure overhead is negligible (<5%). |
Configuration | Auto-Detection Failure: The “Mortal Mode” auto-detection picks the wrong strategy for a given environment (e.g., chooses Gossip in a K8s environment where the API is firewalled). | Medium | High | 1. Verbose Logging: The ClusterConfig module will log every step of its detection logic (Detected K8s env vars , mdns_lite not found , Falling back to Gossip ).2. Clear Override Path: Documentation will prominently feature the “Apprentice Mode” ( config :foundation, cluster: :kubernetes ) as the one-line fix for any auto-detection failure. |