113 CODE STANDARDS GUIDE

Documentation for 113_CODE_STANDARDS_GUIDE from the Foundation repository.

Foundation OS Code Standards Guide

Version 1.0 - Comprehensive Standards for TDD Implementation
Date: June 27, 2025

Executive Summary

This document establishes comprehensive code standards for implementing Foundation OS using Test-Driven Development (TDD), proper OTP design principles, and architectural best practices. These standards synthesize existing quality guidelines from CODE_QUALITY.md, supervision audits, concurrency patterns, and process hierarchy designs into a cohesive framework for reliable, maintainable code.

Philosophy: Production-grade code through TDD, OTP compliance, and architectural discipline
Approach: Staged implementation with quality gates at every level
Enforcement: Automated tooling with manual review checkpoints

Core Design Principles

1. Test-Driven Development (TDD) Mandatory

Red-Green-Refactor Cycle:

# 1. RED: Write failing test first
test "process registry registers agent with metadata" do
  metadata = %{type: :agent, capabilities: [:ml], name: "test_agent"}
  assert {:ok, entry} = Foundation.ProcessRegistry.register(self(), metadata)
  assert entry.metadata == metadata
end

# 2. GREEN: Minimal implementation to pass
def register(pid, metadata, namespace \\ :default) do
  # Minimal implementation
  {:ok, %{pid: pid, metadata: metadata, namespace: namespace}}
end

# 3. REFACTOR: Improve implementation while keeping tests green
def register(pid, metadata, namespace \\ :default) do
  with {:ok, validated_metadata} <- validate_metadata(metadata),
       {:ok, entry} <- do_register(pid, validated_metadata, namespace) do
    {:ok, entry}
  else
    {:error, reason} -> {:error, Error.new(:validation, :invalid_metadata, reason)}
  end
end

TDD Quality Gates:

All code has tests written first
Minimum 95% test coverage
Property-based tests for complex logic
Integration tests for system boundaries
No production code without corresponding tests

2. OTP Design Patterns (Mandatory)

Based on OTP supervision audits and process hierarchy analysis:

Supervision Tree Architecture

# CORRECT: Proper supervision hierarchy
Foundation.Application (Supervisor)
├── Foundation.ProcessRegistry (GenServer)
├── Foundation.ServiceRegistry (GenServer)  
├── Foundation.TaskSupervisor (DynamicSupervisor)
├── Foundation.HealthMonitor (GenServer)
└── Foundation.ServiceMonitor (GenServer)

Forbidden Patterns

# ❌ FORBIDDEN: Unsupervised processes
spawn(fn -> long_running_task() end)
Task.start(fn -> background_work() end)
spawn_link(fn -> coordination_loop() end)

# ❌ FORBIDDEN: Mixed supervisor behaviors  
defmodule BadSupervisor do
  use DynamicSupervisor
  
  def handle_call(...), do: ... # Wrong: No GenServer callbacks in DynamicSupervisor
end

# ❌ FORBIDDEN: Process.sleep for coordination
def wait_for_process do
  Process.sleep(100)  # Wrong: Violates SLEEP.md principles
  check_status()
end

Required Patterns

# ✅ REQUIRED: Supervised processes
Task.Supervisor.start_child(Foundation.TaskSupervisor, fn -> 
  background_work() 
end)

# ✅ REQUIRED: Proper GenServer structure
defmodule Foundation.ServiceMonitor do
  use GenServer
  
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end
  
  def child_spec(opts) do
    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [opts]},
      type: :worker,
      restart: :permanent,
      shutdown: 5000
    }
  end
  
  @impl true
  def init(opts), do: {:ok, initial_state(opts)}
end

# ✅ REQUIRED: Message-based coordination
def wait_for_ready(pid) do
  receive do
    {:ready, ^pid} -> :ok
  after
    5000 -> {:error, :timeout}
  end
end

3. Asynchronous Communication (Mandatory)

Following concurrency patterns guide:

# ✅ PREFERRED: Asynchronous cast
GenServer.cast(worker_pid, {:process_data, data})

# ⚠️ AVOID: Synchronous call unless response needed
GenServer.call(worker_pid, {:process_data, data})  # Only if result required

# ✅ REQUIRED: Event-driven architecture
Foundation.Telemetry.execute([:foundation, :data, :processed], %{count: 1}, %{
  worker_id: worker_id,
  processing_time: duration
})

# ✅ REQUIRED: Proper timeout handling
GenServer.call(worker_pid, request, 5000)  # Always specify timeout

Code Structure Standards

1. Module Organization

Based on CODE_QUALITY.md and architectural analysis:

defmodule Foundation.ProcessRegistry do
  @moduledoc """
  Universal process registration and discovery service.
  
  Provides foundation for agent management without agent-specific logic.
  Follows backend abstraction pattern for pluggable storage.
  
  ## Examples
  
      iex> metadata = %{type: :agent, capabilities: [:ml]}
      iex> {:ok, entry} = Foundation.ProcessRegistry.register(self(), metadata)
      iex> {:ok, ^entry} = Foundation.ProcessRegistry.lookup(self())
  """
  
  # 1. Module attributes and types
  @enforce_keys [:pid, :metadata, :namespace, :registered_at]
  defstruct @enforce_keys
  
  @type t :: %__MODULE__{
    pid: pid(),
    metadata: metadata(),
    namespace: atom(),
    registered_at: DateTime.t()
  }
  
  @type metadata :: %{
    type: atom(),
    capabilities: [atom()],
    name: String.t() | atom(),
    restart_policy: :permanent | :temporary | :transient,
    custom_data: map()
  }
  
  # 2. Client API (public interface)
  @spec register(pid(), metadata(), atom()) :: 
    {:ok, t()} | {:error, Foundation.Types.Error.t()}
  def register(pid, metadata, namespace \\ :default)
  
  @spec lookup(pid() | atom(), atom()) ::
    {:ok, t()} | {:error, Foundation.Types.Error.t()}
  def lookup(identifier, namespace \\ :default)
  
  # 3. GenServer callbacks (if applicable)
  @impl true
  def init(opts), do: {:ok, initial_state(opts)}
  
  @impl true  
  def handle_call({:register, pid, metadata, namespace}, _from, state) do
    # Implementation
  end
  
  # 4. Private helper functions
  defp validate_metadata(metadata), do: ...
  defp build_registry_entry(pid, metadata, namespace), do: ...
end

2. Type Specifications (Mandatory)

# ✅ REQUIRED: @type t for all structs
@type t :: %__MODULE__{
  field1: type1(),
  field2: type2()
}

# ✅ REQUIRED: @spec for all public functions
@spec function_name(arg1_type(), arg2_type()) :: return_type()

# ✅ REQUIRED: Custom types for domain concepts
@type agent_id :: String.t()
@type capability :: atom()
@type coordination_result :: %{
  success: boolean(),
  assignments: %{agent_id() => term()},
  metrics: map()
}

# ✅ REQUIRED: Union types for error handling
@type result(success_type) :: {:ok, success_type} | {:error, Foundation.Types.Error.t()}

3. Error Handling Standards

Based on error standardization requirements:

# ✅ REQUIRED: Use Foundation.Types.Error everywhere
alias Foundation.Types.Error

def register_agent(config) do
  with {:ok, validated_config} <- validate_config(config),
       {:ok, pid} <- start_agent_process(validated_config),
       {:ok, entry} <- register_in_system(pid, validated_config) do
    {:ok, entry}
  else
    {:error, %Error{} = error} -> {:error, error}
    {:error, reason} -> {:error, Error.new(:agent_management, :registration_failed, 
                                          inspect(reason))}
  end
end

# ✅ REQUIRED: Proper error context
Error.new(:validation, :invalid_config, "Configuration validation failed", %{
  field: :capabilities,
  value: invalid_value,
  constraint: "must be list of atoms"
})

# ✅ REQUIRED: Error chaining for debugging
Error.wrap(original_error, :coordination, :auction_failed, 
          "Agent auction coordination failed")

4. Documentation Standards

defmodule Foundation.Coordination.Auction do
  @moduledoc """
  Economic auction protocols for multi-agent coordination.
  
  Implements various auction mechanisms including:
  - First-price sealed-bid auctions
  - Second-price sealed-bid auctions  
  - English (ascending) auctions
  - Dutch (descending) auctions
  
  ## Architecture
  
  The auction system follows the Actor model with:
  - Auctioneer: Manages auction lifecycle and rules
  - Bidders: Submit bids according to auction protocol
  - Observers: Monitor auction progress and outcomes
  
  ## Examples
  
      iex> auction_spec = %{type: :first_price, reserve_price: 100}
      iex> participants = [agent1, agent2, agent3]
      iex> {:ok, result} = Auction.run_auction(auction_spec, participants)
      iex> assert result.winner != nil
      iex> assert result.winning_bid >= 100
      
  ## Configuration
  
  Auctions can be configured with:
  - `:timeout` - Maximum auction duration (default: 30_000ms)
  - `:reserve_price` - Minimum acceptable bid (optional)
  - `:increment` - Minimum bid increment for English auctions
  """
  
  @type auction_spec :: %{
    type: :first_price | :second_price | :english | :dutch,
    reserve_price: non_neg_integer() | nil,
    timeout: pos_integer(),
    increment: pos_integer() | nil
  }
  
  @type participant :: %{
    agent_id: String.t(),
    capabilities: [atom()],
    resource_constraints: map()
  }
  
  @type auction_result :: %{
    success: boolean(),
    winner: participant() | nil,
    winning_bid: non_neg_integer() | nil,
    total_bids: non_neg_integer(),
    duration_ms: non_neg_integer()
  }
  
  @doc """
  Runs an auction with the specified parameters.
  
  ## Parameters
  
  - `auction_spec`: Configuration for the auction mechanism
  - `participants`: List of agents eligible to participate
  - `opts`: Additional options (see module documentation)
  
  ## Returns
  
  - `{:ok, auction_result()}` on successful auction completion
  - `{:error, Foundation.Types.Error.t()}` on auction failure
  
  ## Examples
  
      # Simple first-price auction
      auction_spec = %{type: :first_price, reserve_price: 50}
      participants = [%{agent_id: "agent1", capabilities: [:ml]}]
      {:ok, result} = Auction.run_auction(auction_spec, participants)
      
      # English auction with custom timeout
      auction_spec = %{type: :english, increment: 10}
      {:ok, result} = Auction.run_auction(auction_spec, participants, timeout: 60_000)
  """
  @spec run_auction(auction_spec(), [participant()], keyword()) :: 
    {:ok, auction_result()} | {:error, Foundation.Types.Error.t()}
  def run_auction(auction_spec, participants, opts \\ [])
end

Testing Standards

1. Test Structure and Organization

defmodule Foundation.ProcessRegistryTest do
  use ExUnit.Case, async: true
  use ExUnitProperties
  
  alias Foundation.ProcessRegistry
  alias Foundation.Types.Error
  
  # Test data setup
  setup do
    # Clean slate for each test
    ProcessRegistry.clear_namespace(:test)
    %{namespace: :test}
  end
  
  describe "register/3" do
    test "successfully registers process with valid metadata", %{namespace: namespace} do
      metadata = %{
        type: :agent,
        capabilities: [:ml_optimization],
        name: "test_agent"
      }
      
      assert {:ok, entry} = ProcessRegistry.register(self(), metadata, namespace)
      assert entry.pid == self()
      assert entry.metadata == metadata
      assert entry.namespace == namespace
      assert %DateTime{} = entry.registered_at
    end
    
    test "returns error for invalid metadata", %{namespace: namespace} do
      invalid_metadata = %{type: :unknown_type}
      
      assert {:error, %Error{} = error} = ProcessRegistry.register(self(), invalid_metadata, namespace)
      assert error.category == :validation
      assert error.code == :invalid_metadata
    end
    
    property "handles concurrent registrations correctly" do
      check all process_count <- integer(1..20),
                names <- list_of(string(:alphanumeric), length: process_count) do
        
        # Create test processes
        processes = Enum.map(1..process_count, fn _ ->
          spawn(fn -> Process.sleep(:infinity) end)
        end)
        
        # Register concurrently
        tasks = Enum.zip(processes, names)
        |> Enum.map(fn {pid, name} ->
          Task.async(fn ->
            metadata = %{type: :agent, name: name, capabilities: []}
            ProcessRegistry.register(pid, metadata, :test)
          end)
        end)
        
        results = Task.await_many(tasks, 5000)
        
        # All registrations should succeed
        assert Enum.all?(results, &match?({:ok, _}, &1))
        
        # Cleanup
        Enum.each(processes, &Process.exit(&1, :kill))
      end
    end
  end
  
  describe "lookup/2" do
    setup %{namespace: namespace} do
      metadata = %{type: :agent, name: "lookup_test", capabilities: [:test]}
      {:ok, entry} = ProcessRegistry.register(self(), metadata, namespace)
      %{entry: entry}
    end
    
    test "finds process by name", %{entry: entry, namespace: namespace} do
      assert {:ok, found_entry} = ProcessRegistry.lookup("lookup_test", namespace)
      assert found_entry.pid == entry.pid
    end
    
    test "returns not_found for non-existent process", %{namespace: namespace} do
      assert {:error, %Error{} = error} = ProcessRegistry.lookup("nonexistent", namespace)
      assert error.category == :not_found
      assert error.code == :process_not_found
    end
  end
end

2. Integration Testing Standards

defmodule Foundation.IntegrationTest do
  use ExUnit.Case, async: false  # Integration tests not async
  
  alias Foundation.ProcessRegistry
  alias Foundation.ServiceRegistry
  
  describe "service discovery integration" do
    test "service registry integrates with process registry" do
      # Start a mock service
      {:ok, service_pid} = MockService.start_link(name: :test_service)
      
      # Register service
      service_config = %{
        name: :test_service,
        type: :worker,
        capabilities: [:data_processing]
      }
      
      assert {:ok, _} = ServiceRegistry.register_service(service_pid, service_config)
      
      # Verify integration
      assert {:ok, found_pid} = ServiceRegistry.find_service(:test_service)
      assert found_pid == service_pid
      
      # Verify process registry integration
      assert {:ok, entry} = ProcessRegistry.lookup(service_pid)
      assert entry.metadata.type == :worker
      
      # Cleanup
      GenServer.stop(service_pid)
    end
  end
end

3. Performance Testing Standards

defmodule Foundation.PerformanceTest do
  use ExUnit.Case, async: false
  
  @moduletag :performance
  @moduletag timeout: 300_000  # 5 minutes for performance tests
  
  describe "process registry performance" do
    test "handles high-volume concurrent registrations" do
      process_count = 1000
      namespace = :performance_test
      
      {time_microseconds, results} = :timer.tc(fn ->
        1..process_count
        |> Task.async_stream(fn i ->
          pid = spawn(fn -> Process.sleep(:infinity) end)
          metadata = %{
            type: :agent,
            name: "perf_agent_#{i}",
            capabilities: [:performance_test]
          }
          ProcessRegistry.register(pid, metadata, namespace)
        end, max_concurrency: 50, timeout: 30_000)
        |> Enum.to_list()
      end)
      
      # All registrations should succeed
      success_count = Enum.count(results, &match?({:ok, {:ok, _}}, &1))
      assert success_count == process_count
      
      # Performance assertion
      time_seconds = time_microseconds / 1_000_000
      registrations_per_second = process_count / time_seconds
      
      assert registrations_per_second > 100  # At least 100 registrations/sec
      
      IO.puts("Performance: #{registrations_per_second} registrations/second")
    end
  end
end

Quality Assurance Automation

1. Pre-commit Hooks

#!/bin/sh
# .git/hooks/pre-commit

echo "Running Foundation OS quality checks..."

# 1. Format code
echo "Formatting code..."
mix format --check-formatted || {
  echo "Code formatting required. Run: mix format"
  exit 1
}

# 2. Run tests with coverage
echo "Running tests with coverage..."
mix test --cover || {
  echo "Tests failed"
  exit 1
}

# 3. Check test coverage threshold
echo "Checking test coverage..."
mix test.coverage --threshold 95 || {
  echo "Test coverage below 95%"
  exit 1
}

# 4. Run Credo for code quality
echo "Running Credo..."
mix credo --strict || {
  echo "Credo quality checks failed"
  exit 1
}

# 5. Run Dialyzer for type checking
echo "Running Dialyzer..."
mix dialyzer || {
  echo "Type checking failed"
  exit 1
}

# 6. Check for forbidden patterns
echo "Checking for anti-patterns..."
./scripts/check_antipatterns.sh || {
  echo "Anti-pattern violations found"
  exit 1
}

echo "All quality checks passed!"

2. Anti-Pattern Detection Script

#!/bin/bash
# scripts/check_antipatterns.sh

echo "Checking for forbidden patterns..."

# Check for unsupervised spawn calls
if grep -r "spawn(" lib/ --include="*.ex" | grep -v "# Approved:"; then
  echo "ERROR: Found unsupervised spawn() calls in lib/"
  echo "Use Task.Supervisor.start_child/2 instead"
  exit 1
fi

# Check for Process.sleep calls
if grep -r "Process.sleep" lib/ --include="*.ex"; then
  echo "ERROR: Found Process.sleep calls in lib/"
  echo "Use message-based coordination instead"
  exit 1
fi

# Check for mixed supervisor behaviors
if grep -A 10 "use DynamicSupervisor" lib/**/*.ex | grep "handle_call\|handle_cast\|handle_info"; then
  echo "ERROR: Found GenServer callbacks in DynamicSupervisor"
  echo "Separate DynamicSupervisor from GenServer concerns"
  exit 1
fi

# Check for missing type specs on public functions
python3 scripts/check_typespecs.py lib/ || exit 1

echo "Anti-pattern checks passed!"

3. CI/CD Pipeline Configuration

# .github/workflows/quality.yml
name: Foundation OS Quality Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        elixir: ['1.15', '1.16']
        otp: ['25', '26']
    
    steps:
      - uses: actions/checkout@v3
      - uses: erlef/setup-beam@v1
        with:
          elixir-version: ${{ matrix.elixir }}
          otp-version: ${{ matrix.otp }}
      
      - name: Cache deps
        uses: actions/cache@v3
        with:
          path: deps
          key: deps-${{ runner.os }}-${{ matrix.otp }}-${{ matrix.elixir }}-${{ hashFiles('**/mix.lock') }}
      
      - name: Install dependencies
        run: mix deps.get
      
      - name: Check formatting
        run: mix format --check-formatted
      
      - name: Run tests with coverage
        run: mix test --cover
        env:
          MIX_ENV: test
      
      - name: Verify test coverage
        run: mix test.coverage --threshold 95
      
      - name: Run Credo
        run: mix credo --strict
      
      - name: Run Dialyzer
        run: mix dialyzer --halt-exit-status
      
      - name: Check anti-patterns
        run: ./scripts/check_antipatterns.sh
      
      - name: Run integration tests
        run: mix test --only integration
        if: matrix.elixir == '1.16' && matrix.otp == '26'
      
      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          file: ./cover/excoveralls.json
        if: matrix.elixir == '1.16' && matrix.otp == '26'

Development Workflow

1. Feature Development Process

# 1. Start with failing test
git checkout -b feature/agent-coordination
echo "Write failing test first" > TODO.md

# 2. TDD cycle
mix test                    # Should fail
# Write minimal implementation
mix test                    # Should pass
# Refactor while keeping tests green
mix test                    # Should still pass

# 3. Quality checks
mix format
mix credo --strict
mix dialyzer
./scripts/check_antipatterns.sh

# 4. Integration verification
mix test --only integration

# 5. Performance verification (if applicable)
mix test --only performance

2. Code Review Checklist

Architecture Review:

Follows OTP supervision patterns
No unsupervised processes
Proper separation of concerns
Error handling uses Foundation.Types.Error
No Process.sleep for coordination

Code Quality Review:

All public functions have @spec
All structs have @type t
Comprehensive @moduledoc and @doc
Follows naming conventions
No code duplication

Testing Review:

Tests written before implementation (TDD)
Property-based tests for complex logic
Integration tests for system boundaries
Performance tests for critical paths
Coverage >95% for new code

Security Review:

No hardcoded secrets or credentials
Input validation for all external data
Proper error message sanitization
Resource limits and timeouts

3. Staged Implementation Approach

Phase 1: Foundation Layer

# Week 1: Core registry and error system
mix test test/foundation/process_registry_test.exs
mix test test/foundation/types/error_test.exs

# Week 2: Service coordination
mix test test/foundation/service_registry_test.exs
mix test test/foundation/coordination_test.exs

# Week 3: Infrastructure services
mix test test/foundation/infrastructure_test.exs
mix test test/foundation/telemetry_test.exs

Phase 2: Integration Layer

# Week 4-5: Jido integration
mix test test/foundation_jido/agent_test.exs
mix test test/foundation_jido/signal_test.exs

# Week 6: Error standardization
mix test test/foundation_jido/error_bridge_test.exs

Phase 3: DSPEx Layer

# Week 7-8: ML integration
mix test test/dspex/program_test.exs
mix test test/dspex/jido_integration_test.exs

# Week 9: Multi-agent coordination
mix test test/dspex/coordination_test.exs

Performance and Scalability Standards

1. Performance Targets

Foundation Layer:

Process registry: <1ms lookup latency
Service discovery: <5ms resolution time
Error creation: <0.1ms overhead
Telemetry emission: <0.5ms overhead

Integration Layer:

Agent registration: <10ms end-to-end
Signal dispatch: <2ms local dispatch
Error conversion: <0.2ms overhead
Cross-layer calls: <5ms latency

DSPEx Layer:

Program execution: <100ms for simple programs
Variable validation: <1ms per variable
Multi-agent coordination: <30s for team coordination
Optimization iteration: <10s per iteration

2. Resource Limits

# Process memory limits
{:ok, pid} = Agent.start_link(fn -> %{} end, [
  max_heap_size: %{
    size: 1_000_000,    # 1MB heap limit
    kill: true,         # Kill on exceed
    error_logger: true  # Log violations
  }
])

# Process message queue limits  
Process.flag(:message_queue_data, :off_heap)
Process.flag(:min_heap_size, 1000)

# System resource monitoring
if :erlang.memory(:total) > 1_000_000_000 do  # 1GB limit
  Logger.error("System memory usage exceeded limit")
  System.halt(1)
end

3. Scalability Patterns

# Horizontal scaling pattern
def scale_agents(target_count) do
  current_count = MABEAM.AgentRegistry.count_active_agents()
  
  cond do
    current_count < target_count ->
      spawn_additional_agents(target_count - current_count)
    current_count > target_count ->
      terminate_excess_agents(current_count - target_count)
    true ->
      :no_scaling_needed
  end
end

# Load balancing pattern
def distribute_workload(tasks, agents) do
  agent_loads = Enum.map(agents, &get_agent_load/1)
  
  tasks
  |> Enum.zip(Stream.cycle(agents))
  |> Enum.each(fn {task, agent} ->
    if get_agent_load(agent) < max_load_threshold() do
      assign_task(agent, task)
    else
      queue_task(task)
    end
  end)
end

Conclusion

This comprehensive code standards guide provides the foundation for implementing Foundation OS with:

TDD Discipline: All code developed test-first with comprehensive coverage
OTP Compliance: Proper supervision, process management, and fault tolerance
Quality Automation: Continuous validation of code quality and architectural patterns
Performance Assurance: Clear targets and monitoring for system performance
Staged Implementation: Incremental development with quality gates

Key Success Factors:

Mandatory TDD with >95% coverage
Zero tolerance for OTP anti-patterns
Automated quality enforcement
Clear performance targets
Comprehensive documentation

Following these standards ensures Foundation OS will be production-ready, maintainable, and scalable from day one. The emphasis on quality automation and TDD provides confidence for the complex architectural transformation ahead.