03 TDD STRATEGY

Documentation for 03_TDD_STRATEGY from the Json remedy repository.

TDD Strategy for JsonRemedy

📋 Implementation Status Checklist

Phase 1: Foundation & Interfaces

LayerBehaviour Contract - Core interface for all layers
- process/1 function signature
- Input/output type specifications
- Error handling patterns
- Documentation and examples

Phase 2: Layer 1 - Content Cleaning

Implementation Complete (497 lines)
- Code fence removal (remove_code_fences/1)
- Comment stripping (strip_comments/1)
- Encoding normalization (normalize_encoding/1)
- Error handling and validation
Test Coverage Complete (329 lines)
- Unit tests for all functions
- Edge case handling
- Error condition testing
- Integration scenarios
Documentation Complete (docs/LAYER_1.md)

Phase 3: Layer 2 - Structural Repair

Implementation Complete (497 lines)
- State machine architecture
- Quote balancing (balance_quotes/1)
- Bracket matching (balance_brackets/1)
- Comprehensive error recovery
Test Coverage Complete (329 lines)
- State transition testing
- Complex nested structure tests
- Malformed input handling
- Performance edge cases
Documentation Complete (LAYER_2.md)

Phase 4: Layer 3 - Syntax Normalization

Implementation Complete (2050+ lines)
- Character-by-character parser
- Token-based processing
- Syntax error correction
- Context-aware repairs
Test Coverage Complete (597 lines)
- Parser state machine tests
- Complex syntax scenario testing
- Error recovery validation
- Integration with Layer 2
Documentation Complete (LAYER_3.md)

Phase 5: Layer 4 - Validation Layer ⏳

Implementation Pending
- JSON schema validation
- Type checking and coercion
- Data integrity verification
- Custom validation rules
Test Coverage Pending
- Schema validation tests
- Type coercion scenarios
- Invalid data handling
- Performance benchmarks
Documentation Pending

Phase 6: Layer 5 - Tolerant Parsing ⏳

Implementation Pending
- Flexible JSON parsing
- Best-effort data extraction
- Partial result generation
- Fallback mechanisms
Test Coverage Pending
- Tolerance scenario tests
- Partial parsing validation
- Fallback behavior testing
- Edge case handling
Documentation Pending

Phase 7: Integration Pipeline ⏳

Core Pipeline Implementation
- Layer orchestration system
- Data flow between layers
- Error propagation handling
- Performance optimization
Pipeline Testing
- End-to-end integration tests
- Layer interaction validation
- Error handling scenarios
- Performance benchmarking
Pipeline Documentation

Phase 8: Performance & Production Readiness ⏳

Performance Framework
- Comprehensive benchmarking suite
- Memory usage profiling
- Concurrency testing
- Scalability validation
Production Features
- Monitoring and metrics
- Configuration management
- Deployment documentation
- Production testing

Current Status Summary

✅ Completed Phases: 1-4 (Foundation through Layer 3)
⏳ In Progress: Phase 5 (Layer 4 - Validation)
📊 Overall Progress: 50% complete (4/8 phases)
🧪 Test Coverage: Comprehensive for completed layers
📚 Documentation: Complete for Layers 1-3

⚠️ Implementation Style Note

IMPORTANT: When implementing remaining phases, ensure you follow the actual parameter usage patterns, function signatures, and coding style found in the existing codebase at /lib/json_remedy/ and /test/unit/, rather than the examples in this document. The actual implementation may use different parameter names, structures, or patterns than shown in these planning documents.

Overview

This document outlines a comprehensive Test-Driven Development strategy for implementing JsonRemedy’s layered JSON repair architecture. We’ll build the system incrementally, layer by layer, with tests driving the design decisions.

TDD Philosophy for JsonRemedy

Core Principles

Red-Green-Refactor cycle for each layer
Interface-first design - define APIs before implementation
Incremental complexity - start simple, add features gradually
Behavior-driven specifications - tests describe expected behavior
Fail-fast feedback - catch issues early in development

Success Metrics

100% test coverage for core repair functions
All layer interfaces well-defined before implementation
Performance benchmarks as acceptance criteria
Error scenarios explicitly tested
Documentation generated from test examples

Development Phases

Phase 1: Foundation & Interfaces (Week 1)

Goal: Define all module interfaces and basic infrastructure

Day 1-2: Core Type Definitions

# Define shared types across all modules
@type json_value :: nil | boolean() | number() | String.t() | [json_value()] | %{String.t() => json_value()}
@type repair_action :: %{layer: atom(), action: String.t(), position: non_neg_integer() | nil}
@type repair_context :: %{repairs: [repair_action()], options: keyword()}
@type repair_result :: {:ok, json_value()} | {:ok, json_value(), [repair_action()]} | {:error, String.t()}

Day 3-4: Layer Interface Contracts

# Each layer must implement this behavior
defmodule JsonRemedy.LayerBehaviour do
  @callback process(input :: String.t(), context :: repair_context()) :: 
    {:ok, String.t(), repair_context()} | {:error, String.t()}
  
  @callback supports?(input :: String.t()) :: boolean()
  @callback priority() :: non_neg_integer()
end

Day 5: Integration Pipeline Interface

# Main pipeline orchestrator
defmodule JsonRemedy.Pipeline do
  @type layer_module :: module()
  @type pipeline_config :: %{
    layers: [layer_module()],
    early_exit: boolean(),
    max_iterations: pos_integer()
  }
end

Phase 2: Layer 1 - Content Cleaning (Week 2)

Goal: Remove non-JSON content and normalize encoding

Test Categories

Code fence removal
Comment stripping
Wrapper text extraction
Encoding normalization

TDD Cycle for Each Feature

Example: Code Fence Removal

# test/layers/content_cleaning_test.exs
describe "code fence removal" do
  test "removes simple json fences" do
    input = "```json\n{\"name\": \"Alice\"}\n```"
    expected_output = "{\"name\": \"Alice\"}"
    expected_repairs = [%{layer: :content_cleaning, action: "removed code fences", position: 0}]
    
    assert {:ok, output, context} = JsonRemedy.Layer1.ContentCleaning.process(input, %{repairs: [], options: []})
    assert output == expected_output
    assert context.repairs == expected_repairs
  end
  
  test "preserves code fence content inside strings" do
    input = "{\"example\": \"Use ```json for highlighting\"}"
    
    assert {:ok, output, context} = JsonRemedy.Layer1.ContentCleaning.process(input, %{repairs: [], options: []})
    assert output == input  # Should be unchanged
    assert context.repairs == []
  end
end

Red Phase: Write failing tests Green Phase: Implement minimal code to pass Refactor Phase: Clean up implementation

Phase 3: Layer 2 - Structural Repair (Week 3)

Goal: Fix missing/extra delimiters and basic structure

State Machine Design

# Define parsing states for context tracking
@type parser_state :: :start | :in_object | :in_array | :in_string | :in_number | :in_literal
@type delimiter_stack :: [{:object | :array, pos_integer()}]
@type structural_context :: %{
  state: parser_state(),
  stack: delimiter_stack(),
  position: non_neg_integer(),
  repairs: [repair_action()]
}

Test-Driven State Machine Development

describe "structural repair state machine" do
  test "tracks object nesting correctly" do
    input = "{\"users\": [{\"name\": \"Alice\""
    
    # Should detect missing closing delimiters
    {:ok, output, context} = JsonRemedy.Layer2.StructuralRepair.process(input, %{repairs: [], options: []})
    
    assert String.ends_with?(output, "}]}")
    assert length(context.repairs) == 2  # Missing } and ]
  end
  
  test "handles mismatched delimiters" do
    input = "{\"array\": [1, 2, 3}"  # Missing ]
    
    {:ok, output, context} = JsonRemedy.Layer2.StructuralRepair.process(input, %{repairs: [], options: []})
    
    assert output == "{\"array\": [1, 2, 3]}"
    assert hd(context.repairs).action =~ "added missing closing bracket"
  end
end

Phase 4: Layer 3 - Syntax Normalization (Week 4)

Goal: Fix quotes, booleans, trailing commas, etc.

Rule-Based System Design

# Define syntax repair rules as data
@type syntax_rule :: %{
  name: String.t(),
  pattern: Regex.t(),
  replacement: String.t(),
  condition: (String.t() -> boolean()) | nil
}

@syntax_rules [
  %{
    name: "quote_unquoted_keys",
    pattern: ~r/([{,]\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*:/,
    replacement: ~S(\1"\2":),
    condition: &context_aware_key_check/1
  },
  # More rules...
]

Context-Aware Testing

describe "syntax normalization with context awareness" do
  test "quotes unquoted keys but preserves string content" do
    input = "{message: \"The key: value format\", name: \"Alice\"}"
    
    {:ok, output, context} = JsonRemedy.Layer3.SyntaxNormalization.process(input, %{repairs: [], options: []})
    
    # Should quote 'message' and 'name' but not touch string content
    assert output == "{\"message\": \"The key: value format\", \"name\": \"Alice\"}"
    assert length(context.repairs) == 2
  end
  
  test "removes trailing commas only in appropriate contexts" do
    input = "{\"items\": [\"a,b\", \"c,d\",], \"count\": 2}"
    
    {:ok, output, context} = JsonRemedy.Layer3.SyntaxNormalization.process(input, %{repairs: [], options: []})
    
    # Should remove trailing comma in array but preserve commas in strings
    assert output == "{\"items\": [\"a,b\", \"c,d\"], \"count\": 2}"
  end
end

Phase 5: Layer 4 - Validation (Week 5)

Goal: Attempt standard JSON parsing

Fast Path Optimization

describe "validation layer" do
  test "uses Jason.decode for clean JSON" do
    input = "{\"name\": \"Alice\", \"age\": 30}"
    
    # Should succeed immediately without further processing
    {:ok, result, context} = JsonRemedy.Layer4.Validation.process(input, %{repairs: [], options: []})
    
    assert result == %{"name" => "Alice", "age" => 30}
    assert context.repairs == []
  end
  
  test "passes through malformed JSON for further processing" do
    input = "{name: \"Alice\"}"  # Still needs repair
    
    # Should fail and pass to next layer
    {:continue, ^input, context} = JsonRemedy.Layer4.Validation.process(input, %{repairs: [], options: []})
    assert context.repairs == []
  end
end

Phase 6: Layer 5 - Tolerant Parsing (Week 6)

Goal: Handle edge cases that preprocessing can’t fix

Custom Parser Design

# Recursive descent parser with error recovery
@type parse_position :: non_neg_integer()
@type parse_state :: %{
  input: String.t(),
  position: parse_position(),
  repairs: [repair_action()]
}

Error Recovery Testing

describe "tolerant parsing with error recovery" do
  test "recovers from severely malformed input" do
    input = "name Alice age 30 active true"  # No JSON structure
    
    {:ok, result, context} = JsonRemedy.Layer5.TolerantParsing.process(input, %{repairs: [], options: []})
    
    # Should attempt to extract key-value pairs
    assert result == %{"name" => "Alice", "age" => "30", "active" => "true"}
    assert length(context.repairs) > 0
  end
  
  test "handles truncated input gracefully" do
    input = "{\"users\": [{\"name\": \"Alice\""  # Severely truncated
    
    {:ok, result, context} = JsonRemedy.Layer5.TolerantParsing.process(input, %{repairs: [], options: []})
    
    assert result["users"] |> hd() |> Map.get("name") == "Alice"
  end
end

Test Infrastructure Requirements

1. Test Data Management

# test/support/test_data_manager.ex
defmodule JsonRemedy.TestDataManager do
  @moduledoc """
  Manages test fixtures and generates test data for comprehensive testing.
  """
  
  def load_fixture(category, name) do
    # Load predefined test cases
  end
  
  def generate_malformed_variants(valid_json) do
    # Generate systematic malformations for property testing
  end
  
  def create_large_test_file(size_mb, malformation_types) do
    # Generate large files for performance testing
  end
end

2. Performance Test Utilities

# test/support/performance_helpers.ex
defmodule JsonRemedy.PerformanceHelpers do
  def measure_repair_time(input, expected_max_time) do
    {time, result} = :timer.tc(fn -> JsonRemedy.repair(input) end)
    assert time < expected_max_time, "Repair took #{time}μs, expected < #{expected_max_time}μs"
    result
  end
  
  def measure_memory_usage(input, expected_max_memory) do
    :erlang.garbage_collect()
    {memory_before, _} = :erlang.process_info(self(), :memory)
    
    result = JsonRemedy.repair(input)
    
    :erlang.garbage_collect()
    {memory_after, _} = :erlang.process_info(self(), :memory)
    
    memory_used = memory_after - memory_before
    assert memory_used < expected_max_memory
    result
  end
end

3. Mock Layer System

# test/support/mock_layer.ex
defmodule JsonRemedy.MockLayer do
  @behaviour JsonRemedy.LayerBehaviour
  
  def process(input, context) do
    # Configurable mock for testing pipeline behavior
  end
  
  def supports?(_input), do: true
  def priority, do: 1
end

Acceptance Criteria Per Layer

Layer 1: Content Cleaning

@layer1_acceptance_criteria %{
  code_fence_removal: %{
    success_rate: 0.98,
    max_time_us: 100,
    test_cases: 25
  },
  comment_removal: %{
    success_rate: 0.95,
    max_time_us: 150,
    test_cases: 30
  },
  wrapper_extraction: %{
    success_rate: 0.90,
    max_time_us: 200,
    test_cases: 20
  }
}

Layer 2: Structural Repair

@layer2_acceptance_criteria %{
  missing_delimiters: %{
    success_rate: 0.85,
    max_time_us: 500,
    test_cases: 40
  },
  extra_delimiters: %{
    success_rate: 0.90,
    max_time_us: 300,
    test_cases: 25
  },
  mismatched_nesting: %{
    success_rate: 0.75,
    max_time_us: 800,
    test_cases: 35
  }
}

Layer 3: Syntax Normalization

@layer3_acceptance_criteria %{
  quote_normalization: %{
    success_rate: 0.95,
    max_time_us: 200,
    test_cases: 45
  },
  boolean_normalization: %{
    success_rate: 0.98,
    max_time_us: 100,
    test_cases: 20
  },
  comma_fixes: %{
    success_rate: 0.90,
    max_time_us: 300,
    test_cases: 35
  }
}

Property-Based Testing Strategy

1. Invariant Properties

# Properties that should always hold
defmodule JsonRemedy.PropertyTests do
  use PropCheck
  
  property "repair is idempotent for valid JSON" do
    forall valid_json <- valid_json_generator() do
      {:ok, result1} = JsonRemedy.repair(valid_json)
      {:ok, result2} = JsonRemedy.repair(Jason.encode!(result1))
      result1 == result2
    end
  end
  
  property "repair preserves semantic content when possible" do
    forall {original, malformed} <- malformed_json_pair_generator() do
      case {JsonRemedy.repair(Jason.encode!(original)), JsonRemedy.repair(malformed)} do
        {{:ok, clean_result}, {:ok, repair_result}} ->
          semantically_equivalent?(clean_result, repair_result)
        _ ->
          true  # Acceptable if either fails
      end
    end
  end
  
  property "repair always produces valid JSON or fails gracefully" do
    forall input <- any_string_generator() do
      case JsonRemedy.repair(input) do
        {:ok, result} ->
          match?({:ok, _}, Jason.encode(result))
        {:error, _reason} ->
          true
      end
    end
  end
end

2. Generative Testing Data

defmodule JsonRemedy.TestGenerators do
  use PropCheck
  
  def valid_json_generator do
    sized(size, json_value_generator(size))
  end
  
  defp json_value_generator(0) do
    oneof([
      nil,
      boolean(),
      integer(),
      float(),
      utf8()
    ])
  end
  
  defp json_value_generator(size) when size > 0 do
    oneof([
      json_value_generator(0),
      map(atom(), json_value_generator(size - 1)),
      list(json_value_generator(size - 1))
    ])
  end
  
  def malformed_json_generator do
    oneof([
      add_syntax_errors(valid_json_generator()),
      add_structural_errors(valid_json_generator()),
      add_content_wrapper(valid_json_generator()),
      truncate_json(valid_json_generator())
    ])
  end
  
  defp add_syntax_errors(json_gen) do
    # Add unquoted keys, wrong booleans, etc.
  end
  
  defp add_structural_errors(json_gen) do
    # Remove delimiters, add extras, etc.
  end
end

Continuous Integration Strategy

1. Test Pipeline Stages

# .github/workflows/ci.yml
stages:
  - unit_tests:
      - Layer 1 tests
      - Layer 2 tests  
      - Layer 3 tests
      - Layer 4 tests
      - Layer 5 tests
  
  - integration_tests:
      - End-to-end scenarios
      - Real-world data tests
      - Error handling validation
  
  - performance_tests:
      - Benchmark validation
      - Memory usage tests
      - Large file handling
  
  - property_tests:
      - Invariant validation
      - Generative testing
      - Edge case discovery
  
  - acceptance_tests:
      - Success rate validation
      - Documentation examples
      - API contract verification

2. Quality Gates

# Quality requirements for merging
@quality_gates %{
  test_coverage: 0.95,           # 95% line coverage
  success_rates: %{              # Per acceptance criteria
    layer1: 0.95,
    layer2: 0.85,
    layer3: 0.90,
    integration: 0.80
  },
  performance: %{
    valid_json_max_time: 10,     # microseconds
    simple_repair_max_time: 1000, # microseconds
    complex_repair_max_time: 5000 # microseconds
  },
  code_quality: %{
    credo_issues: 0,             # No credo violations
    dialyzer_warnings: 0,        # No type warnings
    documentation: 0.90          # 90% function documentation
  }
}

Development Workflow

Daily TDD Cycle

Morning: Review failing tests, plan implementation
Red Phase: Write failing tests for new features (30 min)
Green Phase: Implement minimal code to pass (60-90 min)
Refactor Phase: Clean up code and improve design (30 min)
Integration: Run full test suite and fix issues (30 min)
Evening: Review progress, plan next day

Weekly Milestones

Monday: Design interfaces and write contracts
Tuesday-Thursday: Implement layers with TDD
Friday: Integration testing and performance validation
Weekend: Documentation and example creation

Feedback Loops

Immediate: Unit test feedback (< 1 second)
Fast: Layer integration tests (< 10 seconds)
Medium: Full test suite (< 60 seconds)
Slow: Property tests and benchmarks (< 5 minutes)

This TDD strategy ensures that JsonRemedy is built incrementally with high confidence, comprehensive testing, and clear success criteria at every level.