# Detailed Test Specifications and Test Cases

## 📋 Test Implementation Status Checklist

### Test Infrastructure & Organization
- Test Directory Structure - Comprehensive test organization
  - Unit test structure (`test/unit/`)
  - Integration test structure (`test/integration/`)
  - Performance test structure (`test/performance/`)
  - Property-based test structure (`test/property/`)
  - Support utilities structure (`test/support/`)

### Layer-Specific Unit Tests
#### Layer 1: Content Cleaning Tests ✅
- Test Specifications Complete - Comprehensive test cases defined
  - Code fence removal test cases (8 scenarios)
  - Comment stripping test cases (7 scenarios)
  - Wrapper text extraction test cases (4 scenarios)
  - Encoding normalization test cases (3 scenarios)
- Implementation Status: COMPLETE (329 test lines)
- Coverage: Edge cases, error conditions, string preservation
#### Layer 2: Structural Repair Tests ✅
- Test Specifications Complete - State machine test cases defined
  - Missing delimiter test cases (complex nesting)
  - Extra delimiter removal test cases
  - Mismatched delimiter repair test cases
  - State machine behavior validation
- Implementation Status: COMPLETE (329 test lines)
- Coverage: State transitions, parser contexts, error recovery
#### Layer 3: Syntax Normalization Tests ✅
- Test Specifications Complete - Context-aware syntax tests defined
  - Quote normalization test cases
  - Key quoting test cases
  - Literal normalization test cases
  - Comma and colon repair test cases
  - Context preservation test cases
- Implementation Status: COMPLETE (597 test lines)
- Coverage: Rule application, context awareness, performance
#### Layer 4: Validation Tests ⏳
- Test Specifications Defined - Validation test cases
  - JSON schema validation test cases
  - Type checking and coercion test cases
  - Data integrity verification test cases
  - Custom validation rule test cases
- Implementation Status: PENDING
- Coverage: Schema compliance, type safety, error handling
#### Layer 5: Tolerant Parsing Tests ⏳
- Test Specifications Defined - Aggressive parsing test cases
  - Recursive descent parser test cases
  - Error recovery test cases
  - Partial parsing test cases
  - Fallback extraction test cases
- Implementation Status: PENDING
- Coverage: Parse states, error recovery, data extraction
### Integration Tests ⏳
- Pipeline Integration Tests - End-to-end scenarios
  - Layer orchestration test cases
  - Data flow validation test cases
  - Error propagation test cases
  - Configuration handling test cases
- Real-World Scenarios - Production-like test cases
  - Complex malformed JSON test cases
  - Large file processing test cases
  - Edge case combinations test cases
- Implementation Status: PENDING
### Performance Tests ⏳
- Benchmark Test Suite - Performance validation
  - Layer-specific benchmarks
  - Pipeline throughput benchmarks
  - Memory usage benchmarks
  - Scalability benchmarks
- Large File Tests - Volume handling
  - Streaming performance tests
  - Memory efficiency tests
  - Timeout handling tests
- Implementation Status: PENDING
### Property-Based Tests ⏳
- Repair Properties - Invariant testing
  - Idempotency properties
  - Data preservation properties
  - Error handling properties
- Invariant Properties - System-wide guarantees
  - Input/output relationship properties
  - Performance boundary properties
  - Safety guarantee properties
- Implementation Status: PENDING
### Test Support Infrastructure ⏳
- Test Utilities - Helper functions and fixtures
  - Data generators for all layers
  - Assertion helpers for repair validation
  - Performance measurement utilities
  - Error scenario generators
- Fixtures - Test data sets
  - Valid JSON fixtures
  - Malformed JSON fixtures
  - Edge case fixtures
  - Real-world sample fixtures
- Implementation Status: PENDING
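The data generators and malformed-JSON fixtures listed above could start from a sketch like the following. This is illustrative only: the module name, function names, and corruption labels are assumptions, not code from the project, and should be adapted to whatever conventions `test/support/` ends up using.

```elixir
# test/support/generators.ex — hypothetical sketch; all names are illustrative
defmodule JsonRemedy.TestSupport.Generators do
  @moduledoc "Produces malformed variants of valid terms for layer tests."

  @doc "Encodes a term to JSON, then applies a named corruption."
  def malform(term, corruption) do
    term |> Jason.encode!() |> corrupt(corruption)
  end

  # Drop the final closing delimiter (exercises Layer 2).
  defp corrupt(json, :drop_last_delimiter), do: String.slice(json, 0..-2//1)

  # Swap double quotes for single quotes (exercises Layer 3).
  defp corrupt(json, :single_quotes), do: String.replace(json, "\"", "'")

  # Introduce Python-style booleans (exercises Layer 3).
  defp corrupt(json, :python_booleans) do
    json |> String.replace("true", "True") |> String.replace("false", "False")
  end
end
```

For example, `malform(%{"active" => true}, :python_booleans)` yields a string containing `True`, which Layer 3 specs expect to normalize back to `true`.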
### Advanced Test Features ⏳
- Error Handling Tests - Comprehensive error scenarios
- Configuration Tests - Option validation and effects
- Monitoring Tests - Performance and health tracking
- Security Tests - Input validation and safety
- Integration Tests - Framework and CLI testing
## Current Test Status Summary
- ✅ Completed Tests: Layers 1-3 with comprehensive coverage
- ⏳ Pending Tests: Layers 4-5, integration, performance, properties
- 📊 Test Coverage: 37.5% complete (3/8 major test suites)
- 🧪 Test Quality: High coverage for completed layers (22+ scenarios each)
- 🔧 Ready for TDD: Layer 4-5 specs can drive immediate implementation
## Test-Driven Development Notes
- Existing Test Style: Follow parameter patterns from the actual implementation in `test/unit/`
- Function Signatures: Use actual function signatures from implemented layers
- Context Structures: Match repair_context and layer_result patterns from codebase
- Assertion Patterns: Build upon existing test assertion styles in working tests
## ⚠️ Implementation Style Note
CRITICAL: When implementing test cases defined in this document, you MUST follow the actual parameter usage patterns, function signatures, test structure, and assertion styles found in the existing test files at `/test/unit/layer1_content_cleaning_test.exs`, `/test/unit/layer2_structural_repair_test.exs`, and `/test/unit/layer3_syntax_normalization_test.exs`. The test examples in this document are for specification purposes only - the actual implementation may use different parameter names, context structures, or testing patterns than shown here.
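All of the specification examples below do share one return contract: each layer's `process/2` takes the input string plus a context map with `:repairs` and `:options`, and returns `{:ok, output, context}`, `{:continue, input, context}`, or `{:error, reason}`. A behaviour capturing that contract might look like the sketch below; the module name and typespecs are assumptions, and the real definitions in the codebase take precedence.

```elixir
# Hypothetical behaviour for the process/2 contract used throughout this
# document — verify against the actual module in the codebase.
defmodule JsonRemedy.LayerBehaviour do
  @type repair :: %{action: String.t()}
  @type context :: %{repairs: [repair()], options: keyword()}

  @doc """
  Processes input, returning repaired output, a pass-through signal telling
  the pipeline to try the next layer, or an error.
  """
  @callback process(input :: String.t(), context()) ::
              {:ok, String.t() | term(), context()}
              | {:continue, String.t(), context()}
              | {:error, term()}
end
```

Note that Layer 4 is the one layer whose `{:ok, ...}` result carries a parsed Elixir term rather than a repaired string, hence the `String.t() | term()` in the success tuple.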
## Test Organization Structure

### File Organization

```
test/
├── unit/                          # Layer-specific unit tests
│   ├── layer1_content_cleaning_test.exs
│   ├── layer2_structural_repair_test.exs
│   ├── layer3_syntax_normalization_test.exs
│   ├── layer4_validation_test.exs
│   └── layer5_tolerant_parsing_test.exs
├── integration/                   # End-to-end integration tests
│   ├── pipeline_integration_test.exs
│   ├── real_world_scenarios_test.exs
│   └── error_handling_test.exs
├── performance/                   # Performance and benchmarking
│   ├── benchmark_test.exs
│   ├── memory_usage_test.exs
│   └── large_file_test.exs
├── property/                      # Property-based testing
│   ├── repair_properties_test.exs
│   └── invariant_properties_test.exs
├── support/                       # Test utilities and fixtures
│   ├── test_helper.ex
│   ├── fixtures.ex
│   ├── generators.ex
│   └── assertions.ex
└── json_remedy_test.exs           # Main API tests
```
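One conventional way to make the `support/` modules available to every suite is to load them from `test/test_helper.exs`. The snippet below is a sketch under that assumption; the project may instead compile `test/support` via `elixirc_paths` in `mix.exs`, which is equally idiomatic.

```elixir
# test/test_helper.exs — minimal sketch; adapt to the project's actual setup
ExUnit.start()

# Load shared helpers so unit/integration/property suites can use them.
for file <- ~w(fixtures.ex generators.ex assertions.ex) do
  Code.require_file("support/#{file}", __DIR__)
end
```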
## Layer 1: Content Cleaning Tests

### Code Fence Removal Test Cases

````elixir
# test/unit/layer1_content_cleaning_test.exs
defmodule JsonRemedy.Layer1.ContentCleaningTest do
  use ExUnit.Case
  alias JsonRemedy.Layer1.ContentCleaning

  describe "code fence removal" do
    test "removes standard json fences" do
      input = """
      ```json
      {"name": "Alice", "age": 30}
      ```
      """

      expected = "{\"name\": \"Alice\", \"age\": 30}"

      assert {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
      assert String.trim(result) == expected
      assert [%{action: action}] = context.repairs
      assert action =~ "removed code fences"
    end

    test "handles various fence syntaxes" do
      test_cases = [
        # Standard
        "```json\n{\"a\": 1}\n```",
        # Language variants
        "```JSON\n{\"a\": 1}\n```",
        "```javascript\n{\"a\": 1}\n```",
        # Malformed fences
        "``json\n{\"a\": 1}```",
        "```json\n{\"a\": 1}``",
        # Multiple fences
        "```json\n{\"a\": 1}\n```\n```json\n{\"b\": 2}\n```"
      ]

      for input <- test_cases do
        {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
        # Should contain valid JSON after processing
        assert String.contains?(result, "{\"a\": 1}") or String.contains?(result, "{\"b\": 2}")
        assert length(context.repairs) > 0
      end
    end

    test "preserves fence content inside strings" do
      input = "{\"example\": \"Use ```json for highlighting\"}"

      {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
      assert result == input  # Should be unchanged
      assert context.repairs == []
    end

    test "handles nested fence-like content" do
      input = """
      ```json
      {
        "description": "Code block: ```python\\nprint('hello')\\n```",
        "value": 42
      }
      ```
      """

      {:ok, result, _context} = ContentCleaning.process(input, %{repairs: [], options: []})
      # Should remove outer fences but preserve inner content
      assert String.contains?(result, "Code block: ```python")
      assert String.contains?(result, "\"value\": 42")
      refute String.starts_with?(result, "```json")
    end
  end

  describe "comment removal" do
    test "removes line comments" do
      test_cases = [
        # Start of line
        {"// Comment\n{\"name\": \"Alice\"}", "{\"name\": \"Alice\"}"},
        # End of line
        {"{\"name\": \"Alice\"} // Comment", "{\"name\": \"Alice\"} "},
        # Middle of object
        {"{\"name\": \"Alice\", // Comment\n\"age\": 30}", "{\"name\": \"Alice\", \n\"age\": 30}"}
      ]

      for {input, _expected} <- test_cases do
        {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
        assert String.contains?(result, "Alice")
        assert not String.contains?(result, "Comment")
        assert length(context.repairs) > 0
      end
    end

    test "removes block comments" do
      test_cases = [
        "/* Comment */ {\"name\": \"Alice\"}",
        "{\"name\": \"Alice\" /* Comment */}",
        """
        {
          /* Multi
             line
             comment */
          "name": "Alice"
        }
        """
      ]

      for input <- test_cases do
        {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
        assert String.contains?(result, "Alice")
        assert not String.contains?(result, "Comment")
        assert length(context.repairs) > 0
      end
    end

    test "preserves comment-like content in strings" do
      input = "{\"message\": \"This // is not a comment\", \"note\": \"Neither /* is this */\"}"

      {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
      assert result == input
      assert context.repairs == []
    end

    test "handles nested block comments" do
      input = "{\"name\": \"Alice\" /* outer /* inner */ still outer */}"

      {:ok, result, _context} = ContentCleaning.process(input, %{repairs: [], options: []})
      assert String.contains?(result, "Alice")
      assert not String.contains?(result, "outer")
      assert not String.contains?(result, "inner")
    end
  end

  describe "wrapper text extraction" do
    test "extracts json from prose" do
      input = """
      Here's the data you requested:
      {"name": "Alice", "age": 30}
      Let me know if you need anything else!
      """

      {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
      assert String.trim(result) == "{\"name\": \"Alice\", \"age\": 30}"
      assert [%{action: action}] = context.repairs
      assert action =~ "extracted JSON from wrapper text"
    end

    test "handles multiple json objects in text" do
      input = """
      First user: {"name": "Alice"}
      Second user: {"name": "Bob"}
      """

      {:ok, result, _context} = ContentCleaning.process(input, %{repairs: [], options: []})
      # Should extract the first complete JSON object
      assert String.contains?(result, "Alice")
      # Implementation detail: may or may not include Bob
    end

    test "extracts from html/xml wrappers" do
      test_cases = [
        "<pre>{\"name\": \"Alice\"}</pre>",
        "<code>{\"name\": \"Alice\"}</code>",
        "<json>{\"name\": \"Alice\"}</json>"
      ]

      for input <- test_cases do
        {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
        assert String.contains?(result, "Alice")
        assert not String.contains?(result, "<")
        assert length(context.repairs) > 0
      end
    end
  end

  describe "encoding normalization" do
    test "handles utf-8 content correctly" do
      input = "{\"name\": \"José\", \"city\": \"São Paulo\"}"

      {:ok, result, context} = ContentCleaning.process(input, %{repairs: [], options: []})
      assert result == input  # Should be preserved correctly
      assert context.repairs == []
    end

    test "normalizes different encodings" do
      # These would be actual encoding issues in real scenarios
      test_cases = [
        "{\"emoji\": \"🚀💯✨\"}",
        # "Hello"
        "{\"unicode\": \"\\u0048\\u0065\\u006c\\u006c\\u006f\"}",
        "{\"accented\": \"café\"}"
      ]

      for input <- test_cases do
        {:ok, result, _context} = ContentCleaning.process(input, %{repairs: [], options: []})
        # Should be valid UTF-8 after processing
        assert String.valid?(result)
      end
    end
  end
end
````
## Layer 2: Structural Repair Tests

### State Machine and Delimiter Tests

```elixir
# test/unit/layer2_structural_repair_test.exs
defmodule JsonRemedy.Layer2.StructuralRepairTest do
  use ExUnit.Case
  alias JsonRemedy.Layer2.StructuralRepair

  describe "missing closing delimiters" do
    test "adds missing closing brace - simple object" do
      test_cases = [
        {"{\"name\": \"Alice\"", "{\"name\": \"Alice\"}"},
        {"{\"name\": \"Alice\", \"age\": 30", "{\"name\": \"Alice\", \"age\": 30}"},
        {"{\"nested\": {\"inner\": \"value\"", "{\"nested\": {\"inner\": \"value\"}}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "closing brace"))
      end
    end

    test "adds missing closing bracket - simple array" do
      test_cases = [
        {"[1, 2, 3", "[1, 2, 3]"},
        {"[{\"name\": \"Alice\"}, {\"name\": \"Bob\"}", "[{\"name\": \"Alice\"}, {\"name\": \"Bob\"}]"},
        {"[[1, 2], [3, 4]", "[[1, 2], [3, 4]]"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "closing bracket"))
      end
    end

    test "handles complex nested missing delimiters" do
      input = """
      {
        "users": [
          {
            "name": "Alice",
            "profile": {
              "city": "NYC",
              "preferences": {
                "theme": "dark"
      """

      {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
      # Should close all nested structures
      assert String.ends_with?(result, "}}}]}")
      # Should log multiple repairs
      assert length(context.repairs) >= 3
    end

    test "tracks nesting depth correctly" do
      input = "{\"level1\": {\"level2\": {\"level3\": \"value\""

      {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
      assert result == "{\"level1\": {\"level2\": {\"level3\": \"value\"}}}"
      # Three missing closing braces
      assert length(context.repairs) == 3
    end
  end

  describe "extra closing delimiters" do
    test "removes extra closing braces" do
      test_cases = [
        {"{\"name\": \"Alice\"}}", "{\"name\": \"Alice\"}"},
        {"{\"name\": \"Alice\"}}}", "{\"name\": \"Alice\"}"},
        {"{{\"name\": \"Alice\"}}", "{\"name\": \"Alice\"}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "removed extra"))
      end
    end

    test "removes extra closing brackets" do
      test_cases = [
        {"[1, 2, 3]]", "[1, 2, 3]"},
        {"[1, 2, 3]]]", "[1, 2, 3]"},
        {"[[1, 2, 3]]", "[1, 2, 3]"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "removed extra"))
      end
    end
  end

  describe "mismatched delimiters" do
    test "fixes object-array mismatches" do
      test_cases = [
        {"{\"name\": \"Alice\"]", "{\"name\": \"Alice\"}"},
        {"[\"item1\", \"item2\"}", "[\"item1\", \"item2\"]"},
        {"{\"data\": [1, 2, 3}", "{\"data\": [1, 2, 3]}"},
        # Should remain unchanged
        {"[{\"name\": \"Alice\"}]", "[{\"name\": \"Alice\"}]"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
        assert result == expected

        if input != expected do
          assert length(context.repairs) > 0
        end
      end
    end

    test "handles complex mismatched scenarios" do
      input = "{\"users\": [{\"name\": \"Alice\"}, {\"name\": \"Bob\"}}"

      {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
      assert result == "{\"users\": [{\"name\": \"Alice\"}, {\"name\": \"Bob\"}]}"
      assert Enum.any?(context.repairs, &String.contains?(&1.action, "added missing"))
    end
  end

  describe "state machine behavior" do
    test "tracks parser states correctly" do
      # This tests the internal state machine logic
      input = "{\"key\": \"value\", \"array\": [1, 2, {\"nested\": true}]}"

      # Should parse without any repairs needed
      {:ok, result, context} = StructuralRepair.process(input, %{repairs: [], options: []})
      assert result == input
      assert context.repairs == []
    end

    test "recovers from state machine errors" do
      # Input that would confuse a simple parser
      input = "{\"key\": \"val}ue\", \"other\": \"data\"}"

      {:ok, result, _context} = StructuralRepair.process(input, %{repairs: [], options: []})
      # Should handle the brace inside the string correctly
      assert String.contains?(result, "val}ue")
      assert String.contains?(result, "other")
    end
  end
end
```
## Layer 3: Syntax Normalization Tests

### Context-Aware Syntax Repairs

```elixir
# test/unit/layer3_syntax_normalization_test.exs
defmodule JsonRemedy.Layer3.SyntaxNormalizationTest do
  use ExUnit.Case
  alias JsonRemedy.Layer3.SyntaxNormalization

  describe "quote normalization" do
    test "converts single quotes to double quotes" do
      test_cases = [
        {"{'name': 'Alice'}", "{\"name\": \"Alice\"}"},
        {"{'users': [{'name': 'Alice'}, {'name': 'Bob'}]}",
         "{\"users\": [{\"name\": \"Alice\"}, {\"name\": \"Bob\"}]}"},
        {"{'mixed': \"quotes\"}", "{\"mixed\": \"quotes\"}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "normalized quotes"))
      end
    end

    test "handles smart quotes" do
      test_cases = [
        {"{“name”: “Alice”}", "{\"name\": \"Alice\"}"},
        {"{‘name’: ‘Alice’}", "{\"name\": \"Alice\"}"},
        {"{“name”: ‘Alice’}", "{\"name\": \"Alice\"}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert length(context.repairs) > 0
      end
    end

    test "preserves quotes inside string content" do
      input = "{\"message\": \"She said 'hello' to me\", \"code\": \"Use \\\"quotes\\\" properly\"}"

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      assert result == input  # Should be unchanged
      assert context.repairs == []
    end
  end

  describe "unquoted keys" do
    test "quotes simple unquoted keys" do
      test_cases = [
        {"{name: \"Alice\"}", "{\"name\": \"Alice\"}"},
        {"{name: \"Alice\", age: 30}", "{\"name\": \"Alice\", \"age\": 30}"},
        {"{user_name: \"Alice\", user_age: 30}", "{\"user_name\": \"Alice\", \"user_age\": 30}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "quoted unquoted key"))
      end
    end

    test "handles complex key names" do
      test_cases = [
        {"{user_name_1: \"Alice\"}", "{\"user_name_1\": \"Alice\"}"},
        {"{userName: \"Alice\"}", "{\"userName\": \"Alice\"}"},
        # May or may not be supported
        {"{user$name: \"Alice\"}", "{\"user$name\": \"Alice\"}"}
      ]

      for {input, _expected} <- test_cases do
        {:ok, result, _context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        # Should either quote the key or leave it unchanged
        assert String.contains?(result, "Alice")
      end
    end

    test "doesn't quote keys that are already quoted" do
      input = "{\"name\": \"Alice\", age: 30, \"active\": true}"

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      assert result == "{\"name\": \"Alice\", \"age\": 30, \"active\": true}"
      # Should only repair the 'age' key
      assert length(context.repairs) == 1
    end

    test "preserves key-like content in strings" do
      input = "{\"description\": \"Use format key: value\", \"example\": \"name: Alice\"}"

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      assert result == input  # Should be unchanged
      assert context.repairs == []
    end
  end

  describe "boolean and null normalization" do
    test "normalizes Python-style booleans" do
      test_cases = [
        {"{\"active\": True}", "{\"active\": true}"},
        {"{\"active\": False}", "{\"active\": false}"},
        {"{\"verified\": True, \"deleted\": False}", "{\"verified\": true, \"deleted\": false}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "normalized boolean"))
      end
    end

    test "normalizes case variants" do
      test_cases = [
        {"{\"active\": TRUE}", "{\"active\": true}"},
        {"{\"active\": FALSE}", "{\"active\": false}"},
        {"{\"active\": True}", "{\"active\": true}"},
        {"{\"active\": False}", "{\"active\": false}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert length(context.repairs) > 0
      end
    end

    test "normalizes null variants" do
      test_cases = [
        {"{\"value\": None}", "{\"value\": null}"},
        {"{\"value\": NULL}", "{\"value\": null}"},
        {"{\"value\": Null}", "{\"value\": null}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "normalized null"))
      end
    end

    test "preserves boolean-like content in strings" do
      input = "{\"message\": \"The value is True\", \"note\": \"Set to None\"}"

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      assert result == input  # Should be unchanged
      assert context.repairs == []
    end
  end

  describe "comma and colon fixes" do
    test "removes trailing commas in objects" do
      test_cases = [
        {"{\"name\": \"Alice\",}", "{\"name\": \"Alice\"}"},
        {"{\"name\": \"Alice\", \"age\": 30,}", "{\"name\": \"Alice\", \"age\": 30}"},
        {"{\"users\": [{\"name\": \"Alice\",}],}", "{\"users\": [{\"name\": \"Alice\"}]}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "removed trailing comma"))
      end
    end

    test "removes trailing commas in arrays" do
      test_cases = [
        {"[1, 2, 3,]", "[1, 2, 3]"},
        {"[\"a\", \"b\", \"c\",]", "[\"a\", \"b\", \"c\"]"},
        {"[[1, 2,], [3, 4,],]", "[[1, 2], [3, 4]]"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "removed trailing comma"))
      end
    end

    test "adds missing commas in objects" do
      test_cases = [
        {"{\"name\": \"Alice\" \"age\": 30}", "{\"name\": \"Alice\", \"age\": 30}"},
        {"{\"a\": 1 \"b\": 2 \"c\": 3}", "{\"a\": 1, \"b\": 2, \"c\": 3}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "added missing comma"))
      end
    end

    test "adds missing commas in arrays" do
      test_cases = [
        {"[1 2 3]", "[1, 2, 3]"},
        {"[\"a\" \"b\" \"c\"]", "[\"a\", \"b\", \"c\"]"},
        {"[{\"name\": \"Alice\"} {\"name\": \"Bob\"}]", "[{\"name\": \"Alice\"}, {\"name\": \"Bob\"}]"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "added missing comma"))
      end
    end

    test "adds missing colons" do
      test_cases = [
        {"{\"name\" \"Alice\"}", "{\"name\": \"Alice\"}"},
        {"{\"name\" \"Alice\", \"age\" 30}", "{\"name\": \"Alice\", \"age\": 30}"}
      ]

      for {input, expected} <- test_cases do
        {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
        assert result == expected
        assert Enum.any?(context.repairs, &String.contains?(&1.action, "added missing colon"))
      end
    end

    test "preserves commas and colons in strings" do
      input = "{\"message\": \"Name: Alice, Age: 30\", \"data\": \"a,b,c\"}"

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      assert result == input  # Should be unchanged
      assert context.repairs == []
    end
  end

  describe "context awareness" do
    test "applies rules only outside of strings" do
      # This is a complex test that verifies context-aware processing
      input = """
      {
        "message": "Set active: True, use None for missing",
        "config": {
          "active": True,
          "value": None,
          "note": "Don't change: True, False, None"
        }
      }
      """

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      # Should normalize True/None outside strings but preserve inside strings
      assert String.contains?(result, "\"active\": true")
      assert String.contains?(result, "\"value\": null")
      assert String.contains?(result, "Set active: True, use None for missing")
      assert String.contains?(result, "Don't change: True, False, None")
      # Should have exactly 2 repairs (True -> true, None -> null)
      assert length(context.repairs) == 2
    end

    test "handles escaped quotes in strings" do
      input = "{\"escaped\": \"She said \\\"True\\\" to me\", \"active\": True}"

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      # Should preserve escaped quotes but normalize the boolean
      assert String.contains?(result, "She said \\\"True\\\" to me")
      assert String.contains?(result, "\"active\": true")
      assert length(context.repairs) == 1
    end
  end

  describe "rule ordering and interactions" do
    test "applies rules in correct order" do
      # Input that requires multiple rule applications
      input = "{name: 'Alice', active: True, scores: [95, 87, 92,], metadata: None}"

      {:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      expected = "{\"name\": \"Alice\", \"active\": true, \"scores\": [95, 87, 92], \"metadata\": null}"
      assert result == expected
      # Should have repairs for: unquoted key, single quotes, boolean, trailing comma, null
      assert length(context.repairs) >= 4
    end

    test "handles rule interactions correctly" do
      # Rules should not interfere with each other
      input = "{users: [{'name': 'Alice', 'active': True,}, {'name': 'Bob', 'active': False,}],}"

      {:ok, result, _context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
      # Should apply all necessary fixes
      assert String.contains?(result, "\"users\":")
      assert String.contains?(result, "\"name\": \"Alice\"")
      assert String.contains?(result, "\"active\": true")
      assert String.contains?(result, "\"active\": false")
      assert not String.ends_with?(result, ",}")
      assert not String.ends_with?(result, ",]")
    end
  end
end
```
## Layer 4: Validation Tests

### Fast Path and Fallback Logic

```elixir
# test/unit/layer4_validation_test.exs
defmodule JsonRemedy.Layer4.ValidationTest do
  use ExUnit.Case
  alias JsonRemedy.Layer4.Validation

  describe "fast path validation" do
    test "parses valid JSON immediately" do
      valid_inputs = [
        "{\"name\": \"Alice\"}",
        "[1, 2, 3]",
        "true",
        "false",
        "null",
        "\"hello\"",
        "42",
        "3.14"
      ]

      for input <- valid_inputs do
        {:ok, result, context} = Validation.process(input, %{repairs: [], options: []})
        # Should parse successfully
        assert is_map(result) or is_list(result) or is_binary(result) or
                 is_number(result) or is_boolean(result) or is_nil(result)

        # Should not add any repairs
        assert context.repairs == []
      end
    end

    test "handles complex valid JSON" do
      input = """
      {
        "users": [
          {"name": "Alice", "age": 30, "active": true},
          {"name": "Bob", "age": 25, "active": false}
        ],
        "metadata": {
          "total": 2,
          "generated": "2024-01-15T10:00:00Z"
        }
      }
      """

      {:ok, result, context} = Validation.process(input, %{repairs: [], options: []})
      assert is_map(result)
      assert length(result["users"]) == 2
      assert result["metadata"]["total"] == 2
      assert context.repairs == []
    end
  end

  describe "fallback behavior" do
    test "passes malformed JSON to next layer" do
      malformed_inputs = [
        # Unquoted key
        "{name: \"Alice\"}",
        # Trailing comma
        "[1, 2, 3,]",
        # Python boolean
        "{\"active\": True}",
        # Single quotes
        "{'name': 'Alice'}"
      ]

      for input <- malformed_inputs do
        {:continue, passed_input, context} = Validation.process(input, %{repairs: [], options: []})
        assert passed_input == input  # Should pass through unchanged
        assert context.repairs == []  # Should not add repairs
      end
    end

    test "handles Jason decode errors gracefully" do
      # Input that would cause Jason to raise
      invalid_input = "{\"incomplete\": "

      result = Validation.process(invalid_input, %{repairs: [], options: []})
      # Should either continue or error gracefully
      assert match?({:continue, _, _}, result) or match?({:error, _}, result)
    end
  end

  describe "performance optimization" do
    test "validation is fast for valid JSON" do
      input = "{\"name\": \"Alice\", \"age\": 30}"

      {time, result} =
        :timer.tc(fn ->
          Validation.process(input, %{repairs: [], options: []})
        end)

      assert {:ok, _, _} = result
      # Should be very fast (less than 100 microseconds)
      assert time < 100
    end

    test "fallback is fast for obviously invalid JSON" do
      input = "{clearly not json"

      {time, result} =
        :timer.tc(fn ->
          Validation.process(input, %{repairs: [], options: []})
        end)

      assert {:continue, _, _} = result
      # Should fail fast (less than 50 microseconds)
      assert time < 50
    end
  end

  describe "edge cases" do
    test "handles empty input" do
      {:continue, "", context} = Validation.process("", %{repairs: [], options: []})
      assert context.repairs == []
    end

    test "handles whitespace-only input" do
      {:continue, " ", context} = Validation.process(" ", %{repairs: [], options: []})
      assert context.repairs == []
    end

    test "handles very large valid JSON" do
      # Generate a large but valid JSON document
      large_json =
        Jason.encode!(%{
          "users" =>
            for i <- 1..1000 do
              %{"id" => i, "name" => "User#{i}", "active" => rem(i, 2) == 0}
            end
        })

      {:ok, result, context} = Validation.process(large_json, %{repairs: [], options: []})
      assert is_map(result)
      assert length(result["users"]) == 1000
      assert context.repairs == []
    end
  end
end
```
## Layer 5: Tolerant Parsing Tests

### Error Recovery and Edge Cases
# test/unit/layer5_tolerant_parsing_test.exs
defmodule JsonRemedy.Layer5.TolerantParsingTest do
use ExUnit.Case
alias JsonRemedy.Layer5.TolerantParsing
describe "severely malformed input" do
test "attempts to parse key-value pairs from unstructured text" do
test_cases = [
{"name Alice age 30", %{"name" => "Alice", "age" => "30"}},
{"name: Alice, age: 30", %{"name" => "Alice", "age" => "30"}},
{"name=Alice age=30", %{"name" => "Alice", "age" => "30"}}
]
for {input, expected_pattern} <- test_cases do
{:ok, result, context} = TolerantParsing.process(input, %{repairs: [], options: []})
assert is_map(result)
assert result["name"] =~ "Alice"
assert Map.has_key?(result, "age")
assert length(context.repairs) > 0
end
end
test "handles truncated JSON gracefully" do
test_cases = [
"{\"users\": [{\"name\": \"Alice\"",
"[{\"name\": \"Alice\", \"age\": 30",
"{\"config\": {\"theme\": \"dark\", \"lang\""
]
for input <- test_cases do
{:ok, result, context} = TolerantParsing.process(input, %{repairs: [], options: []})
# Should extract what it can
assert is_map(result) or is_list(result)
assert length(context.repairs) > 0
end
end
test "extracts data from mixed content" do
input = "User data: name=Alice, age=30, active=true, scores=[95,87,92]"
{:ok, result, context} = TolerantParsing.process(input, %{repairs: [], options: []})
assert is_map(result)
assert result["name"] == "Alice"
assert Map.has_key?(result, "age")
assert length(context.repairs) > 0
end
end
describe "error recovery strategies" do
test "recovers from quote errors" do
test_cases = [
"{\"name: \"Alice\", \"age\": 30}", # Missing closing quote
"{name\": \"Alice\", \"age\": 30}", # Missing opening quote
"{\"name\": Alice\", \"age\": 30}" # Extra quote
]
for input <- test_cases do
{:ok, result, context} = TolerantParsing.process(input, %{repairs: [], options: []})
assert is_map(result)
assert Map.has_key?(result, "name") or Map.has_key?(result, "age")
assert length(context.repairs) > 0
end
end
test "recovers from delimiter errors" do
test_cases = [
"[1, 2, 3, {\"name\": \"Alice\"}", # Missing closing bracket
"{\"users\": [\"Alice\", \"Bob\"", # Missing multiple closers
"{{\"nested\": true}" # Extra opening brace
]
for input <- test_cases do
result = TolerantParsing.process(input, %{repairs: [], options: []})
# Should either succeed with recovery or fail gracefully
assert match?({:ok, _, _}, result) or match?({:error, _}, result)
end
end
test "handles infinite recursion protection" do
# Input that could cause infinite loops
deeply_nested = String.duplicate("{\"a\":", 100) <> "1" <> String.duplicate("}", 100)
result = TolerantParsing.process(deeply_nested, %{repairs: [], options: []})
# Should handle gracefully without hanging
assert match?({:ok, _, _}, result) or match?({:error, _}, result)
end
end
describe "partial parsing success" do
test "extracts valid parts from invalid input" do
input = "{\"valid\": true, invalid_part, \"also_valid\": 42}"
{:ok, result, context} = TolerantParsing.process(input, %{repairs: [], options: []})
# Should extract the valid parts
assert result["valid"] == true
assert result["also_valid"] == 42
assert length(context.repairs) > 0
end
test "handles arrays with mixed valid/invalid elements" do
input = "[1, 2, invalid_element, 4, \"valid_string\"]"
{:ok, result, context} = TolerantParsing.process(input, %{repairs: [], options: []})
assert is_list(result)
assert 1 in result
assert 2 in result
assert 4 in result
assert "valid_string" in result
assert length(context.repairs) > 0
end
end
describe "fallback strategies" do
test "falls back to string extraction when structure fails" do
input = "completely unstructured text with some data"
result = TolerantParsing.process(input, %{repairs: [], options: []})
# Should either extract something or fail gracefully
case result do
{:ok, extracted, context} ->
assert is_map(extracted) or is_binary(extracted)
assert length(context.repairs) > 0
{:error, _reason} ->
# Acceptable failure for completely unstructured input
assert true
end
end
test "respects maximum recovery attempts" do
# Input designed to trigger multiple recovery attempts
problematic_input = String.duplicate("{invalid", 50)
start_time = System.monotonic_time(:millisecond)
result = TolerantParsing.process(problematic_input, %{repairs: [], options: []})
end_time = System.monotonic_time(:millisecond)
# Should not take too long (< 1 second)
assert end_time - start_time < 1000
# Should either succeed or fail gracefully
assert match?({:ok, _, _}, result) or match?({:error, _}, result)
end
end
describe "memory safety" do
test "handles very large malformed input safely" do
# Generate large malformed input
large_input = """
{
"data": [
""" <>
(1..1000 |> Enum.map(fn i -> "malformed_item_#{i}" end) |> Enum.join(", ")) <>
"""
]
}
"""
result = TolerantParsing.process(large_input, %{repairs: [], options: []})
# Should handle without memory issues
assert match?({:ok, _, _}, result) or match?({:error, _}, result)
end
test "prevents stack overflow on deeply nested input" do
# Create deeply nested but malformed structure
nested_input = String.duplicate("[{", 1000) <> "\"value\": 1"
result = TolerantParsing.process(nested_input, %{repairs: [], options: []})
# Should not crash with stack overflow
assert match?({:ok, _, _}, result) or match?({:error, _}, result)
end
end
end
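The "succeed or fail gracefully" check recurs throughout the tolerant-parsing tests above. A small shared helper could centralize it; the sketch below is hypothetical (the `JsonRemedy.TestAssertions` module name and its `test/support/` location are assumptions, not part of the spec):

```elixir
# test/support/test_assertions.ex (sketch)
defmodule JsonRemedy.TestAssertions do
  import ExUnit.Assertions

  @doc """
  Asserts that a pipeline result is either a success tuple or an explicit
  error tuple -- i.e. the parser never crashed or returned an unexpected shape.
  """
  def assert_graceful(result) do
    assert match?({:ok, _, _}, result) or
             match?({:ok, _}, result) or
             match?({:error, _}, result)

    result
  end
end
```

Tests could then replace the repeated `assert match?({:ok, _, _}, result) or match?({:error, _}, result)` lines with a single `assert_graceful(result)` call.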
Integration Tests
End-to-End Pipeline Testing
# test/integration/pipeline_integration_test.exs
defmodule JsonRemedy.PipelineIntegrationTest do
use ExUnit.Case
alias JsonRemedy
describe "full pipeline integration" do
test "processes LLM output with multiple issues" do
llm_output = """
Here's the user data you requested:
```json
{
// User information
users: [
{
name: 'Alice Johnson',
email: "[email protected]",
age: 30,
active: True,
scores: [95, 87, 92,], // Test scores
profile: {
city: "New York",
interests: ["coding", "music", "travel",]
},
},
{
name: 'Bob Smith',
email: "[email protected]",
age: 25,
active: False
// Missing comma above
}
],
metadata: {
total: 2,
updated: "2024-01-15"
// Missing closing brace
```
That should give you what you need!
"""
{:ok, result, repairs} = JsonRemedy.repair(llm_output, logging: true)
# Verify structure is correct
assert is_map(result)
assert length(result["users"]) == 2
# Verify user data
alice = Enum.find(result["users"], &(&1["name"] =~ "Alice"))
bob = Enum.find(result["users"], &(&1["name"] =~ "Bob"))
assert alice["email"] == "[email protected]"
assert alice["active"] == true
assert alice["scores"] == [95, 87, 92]
assert alice["profile"]["city"] == "New York"
assert bob["active"] == false
assert bob["age"] == 25
# Verify metadata
assert result["metadata"]["total"] == 2
# Verify repairs were logged
assert is_list(repairs)
assert length(repairs) > 5 # Should have many repairs
repair_actions = repairs |> Enum.map(& &1.action) |> Enum.join(" ")
assert repair_actions =~ "code fence"
assert repair_actions =~ "comment"
assert repair_actions =~ "quote"
assert repair_actions =~ "boolean"
assert repair_actions =~ "comma"
end
test "handles legacy Python system output" do
python_output = """
{
'timestamp': '2024-01-15T10:00:00Z',
'users': [
{
'id': 1,
'name': 'Alice',
'active': True,
'metadata': None,
'preferences': {
'theme': 'dark',
'notifications': False
}
}
],
'success': True,
'errors': None
}
"""
{:ok, result} = JsonRemedy.repair(python_output)
assert result["success"] == true
assert result["errors"] == nil
assert result["users"] |> hd() |> Map.get("active") == true
assert result["users"] |> hd() |> Map.get("metadata") == nil
assert result["users"] |> hd() |> get_in(["preferences", "notifications"]) == false
end
test "processes streaming/incomplete data" do
incomplete_inputs = [
"{\"status\": \"processing\", \"data\": [1, 2, 3",
"{\"users\": [{\"name\": \"Alice\"}, {\"name\": \"Bob\"}]",
"{\"config\": {\"theme\": \"dark\", \"lang\": \"en\""
]
for input <- incomplete_inputs do
{:ok, result} = JsonRemedy.repair(input)
assert is_map(result)
# Should extract meaningful data even from incomplete input
assert map_size(result) > 0
end
end
end
describe "layer interaction" do
test "layers work together correctly" do
# Input that exercises multiple layers
complex_input = """
// Configuration file
```json
{
app_name: "MyApp",
version: "1.0.0",
features: {
authentication: True,
logging: {
level: "info",
destinations: ["console", "file",]
},
cache: {
enabled: True,
ttl: 3600
}
},
database: {
host: "localhost",
port: 5432,
ssl: False
}
// Missing closing brace
```
"""
{:ok, result, repairs} = JsonRemedy.repair(complex_input, logging: true)
# Verify structure
assert result["app_name"] == "MyApp"
assert result["features"]["authentication"] == true
assert result["features"]["cache"]["enabled"] == true
assert result["database"]["ssl"] == false
# Verify all layers contributed
layer_names = repairs |> Enum.map(& &1.layer) |> Enum.uniq()
assert :content_cleaning in layer_names
assert :syntax_normalization in layer_names
assert :structural_repair in layer_names
end
test "early exit optimization works" do
valid_json = "{\"name\": \"Alice\", \"age\": 30}"
{:ok, result, repairs} = JsonRemedy.repair(valid_json, logging: true, early_exit: true)
assert result == %{"name" => "Alice", "age" => 30}
# Should exit early with no repairs needed
assert repairs == []
end
test "handles layer failures gracefully" do
# Input that might cause specific layers to fail
problematic_input = "\x00\x01\x02{\"name\": \"Alice\"}" # Binary data + JSON
result = JsonRemedy.repair(problematic_input)
# Should either succeed with recovery or fail gracefully
assert match?({:ok, _}, result) or match?({:error, _}, result)
end
end
describe "performance integration" do
test "maintains performance across layers" do
# Test with various input sizes
inputs = [
"{\"small\": \"data\"}",
Jason.encode!(%{"medium" => Enum.to_list(1..100)}),
Jason.encode!(%{"large" => Enum.to_list(1..10000)})
]
for input <- inputs do
# Introduce some malformation
malformed = String.replace(input, "\"", "'", global: false)
{time, {:ok, _result}} = :timer.tc(fn ->
JsonRemedy.repair(malformed)
end)
# Should complete in reasonable time
expected_max_time = case byte_size(input) do
size when size < 100 -> 1_000 # 1ms for small
size when size < 10_000 -> 10_000 # 10ms for medium
_ -> 50_000 # 50ms for large
end
assert time < expected_max_time
end
end
test "memory usage is reasonable" do
large_malformed_json = """
{
users: [
""" <>
(1..1000 |> Enum.map(fn i ->
"{name: 'User#{i}', id: #{i}, active: True}"
end) |> Enum.join(",\n")) <>
"""
],
total: 1000
"""
:erlang.garbage_collect()
{:memory, memory_before} = :erlang.process_info(self(), :memory)
{:ok, _result} = JsonRemedy.repair(large_malformed_json)
:erlang.garbage_collect()
{:memory, memory_after} = :erlang.process_info(self(), :memory)
memory_used = memory_after - memory_before
# Should not use excessive memory (< 1MB for this test)
assert memory_used < 1_000_000
end
end
end
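The integration assertions access `repair.action` and `repair.layer` on each logged entry. The spec does not define that struct; one plausible shape (field names beyond `:layer` and `:action` are illustrative assumptions) is:

```elixir
# Hypothetical shape of a logged repair entry (not defined in this spec).
defmodule JsonRemedy.Repair do
  @enforce_keys [:layer, :action]
  defstruct [:layer, :action, :position, :original, :replacement]
end
```

An entry such as `%JsonRemedy.Repair{layer: :syntax_normalization, action: "normalized single quote to double quote", position: 12}` would satisfy both the action-string matching and the per-layer uniqueness checks above.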
Real-World Scenario Tests
Comprehensive Real-World Test Cases
# test/integration/real_world_scenarios_test.exs
defmodule JsonRemedy.RealWorldScenariosTest do
use ExUnit.Case
alias JsonRemedy
describe "LLM outputs" do
test "ChatGPT style response" do
chatgpt_response = """
Based on your request, I'll provide the user data in JSON format:
```json
{
"response": {
"status": "success",
"data": {
"users": [
{
"id": 1,
"name": "Alice Johnson",
"email": "[email protected]",
"profile": {
"location": "New York, NY",
"interests": ["technology", "design", "travel"]
},
"settings": {
"theme": "dark",
"notifications": true,
"privacy": "public"
}
},
{
"id": 2,
"name": "Bob Wilson",
"email": "[email protected]",
"profile": {
"location": "San Francisco, CA",
"interests": ["programming", "music"]
},
"settings": {
"theme": "light",
"notifications": false,
"privacy": "private"
}
}
],
"pagination": {
"total": 2,
"page": 1,
"limit": 10
}
}
}
}
```
This structure should work well for your application's user management system.
"""
{:ok, result} = JsonRemedy.repair(chatgpt_response)
assert result["response"]["status"] == "success"
assert length(result["response"]["data"]["users"]) == 2
assert result["response"]["data"]["pagination"]["total"] == 2
alice = Enum.find(result["response"]["data"]["users"], &(&1["name"] == "Alice Johnson"))
assert alice["profile"]["location"] == "New York, NY"
assert alice["settings"]["theme"] == "dark"
end
test "Claude style response with reasoning" do
claude_response = """
I'll help you create that user configuration. Let me structure this properly:
Looking at your requirements:
1. User preferences
2. System settings
3. Feature flags
Here's the resulting configuration:
{
"user": {
"preferences": {
"theme": "dark",
"language": "en-US",
"timezone": "America/New_York",
"notifications": {
"email": true,
"push": false,
"sms": false
}
},
"profile": {
"name": "Alice Johnson",
"avatar": "https://example.com/avatar.jpg"
}
},
"system": {
"version": "2.1.0",
"features": {
"beta_features": false,
"analytics": true,
"debug_mode": false
}
},
"metadata": {
"created_at": "2024-01-15T10:00:00Z",
"last_updated": "2024-01-15T10:00:00Z"
}
}
This configuration covers all the essential user and system settings you requested.
"""
{:ok, result} = JsonRemedy.repair(claude_response)
assert result["user"]["preferences"]["theme"] == "dark"
assert result["user"]["profile"]["name"] == "Alice Johnson"
assert result["system"]["version"] == "2.1.0"
assert result["metadata"]["created_at"] == "2024-01-15T10:00:00Z"
end
test "truncated LLM response" do
truncated_response = """
{
"users": [
{
"id": 1,
"name": "Alice",
"email": "[email protected]",
"active": true,
"profile": {
"city": "New York",
"age": 30,
"interests": ["coding", "design"
"""
{:ok, result} = JsonRemedy.repair(truncated_response)
assert result["users"] |> hd() |> Map.get("name") == "Alice"
assert result["users"] |> hd() |> Map.get("email") == "[email protected]"
assert result["users"] |> hd() |> get_in(["profile", "city"]) == "New York"
end
end
describe "legacy system outputs" do
test "Python pickle-style output" do
python_output = """
{
'timestamp': '2024-01-15 10:00:00',
'data': {
'users': [
{
'user_id': 1,
'name': 'Alice Johnson',
'email': '[email protected]',
'is_active': True,
'last_login': None,
'permissions': ['read', 'write'],
'metadata': {
'created_at': '2024-01-01',
'updated_at': '2024-01-15',
'login_count': 42
}
},
{
'user_id': 2,
'name': 'Bob Smith',
'email': '[email protected]',
'is_active': False,
'last_login': '2024-01-10 15:30:00',
'permissions': ['read'],
'metadata': {
'created_at': '2024-01-05',
'updated_at': '2024-01-10',
'login_count': 7
}
}
]
},
'status': 'success',
'error': None
}
"""
{:ok, result} = JsonRemedy.repair(python_output)
assert result["status"] == "success"
assert result["error"] == nil
assert length(result["data"]["users"]) == 2
alice = Enum.find(result["data"]["users"], &(&1["name"] == "Alice Johnson"))
assert alice["is_active"] == true
assert alice["last_login"] == nil
assert alice["permissions"] == ["read", "write"]
bob = Enum.find(result["data"]["users"], &(&1["name"] == "Bob Smith"))
assert bob["is_active"] == false
assert bob["metadata"]["login_count"] == 7
end
test "JavaScript object literal" do
js_object = """
{
name: "MyApp",
version: "1.2.3",
config: {
apiEndpoint: "https://api.example.com",
timeout: 5000,
retries: 3,
features: {
authentication: true,
logging: true,
caching: false
}
},
dependencies: [
"express",
"axios",
"lodash"
],
devDependencies: [
"jest",
"eslint"
]
}
"""
{:ok, result} = JsonRemedy.repair(js_object)
assert result["name"] == "MyApp"
assert result["version"] == "1.2.3"
assert result["config"]["timeout"] == 5000
assert result["config"]["features"]["authentication"] == true
assert "express" in result["dependencies"]
assert "jest" in result["devDependencies"]
end
test "legacy API response with extra data" do
api_response = """
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: no-cache
{
"api_version": "v2.1",
"timestamp": "2024-01-15T10:00:00Z",
"request_id": "req_123456789",
"data": {
"users": [
{
"id": "user_001",
"username": "alice_j",
"display_name": "Alice Johnson",
"email": "[email protected]",
"department": "Engineering",
"role": "Senior Developer",
"status": "active",
"created_at": "2023-06-15T09:00:00Z",
"last_active": "2024-01-15T09:45:00Z"
}
]
},
"meta": {
"total_count": 1,
"page": 1,
"per_page": 10,
"has_more": false
}
}
"""
{:ok, result} = JsonRemedy.repair(api_response)
assert result["api_version"] == "v2.1"
assert result["data"]["users"] |> hd() |> Map.get("username") == "alice_j"
assert result["meta"]["total_count"] == 1
end
end
describe "configuration files" do
test "malformed config file with comments" do
config_content = """
// Application Configuration
// Updated: 2024-01-15
{
"app": {
"name": "MyApplication",
"version": "2.1.0",
"env": "production"
},
// Database settings
"database": {
"host": "localhost",
"port": 5432,
"name": "myapp_prod",
"ssl": true,
"pool_size": 10,
"timeout": 5000
},
// Cache configuration
"cache": {
"type": "redis",
"host": "redis.example.com",
"port": 6379,
"ttl": 3600,
"max_connections": 20
},
// Feature flags
"features": {
"new_ui": true,
"analytics": true,
"beta_features": false,
"debug_mode": false
},
// Logging configuration
"logging": {
"level": "info",
"outputs": ["console", "file"],
"file_path": "/var/log/myapp.log",
"max_size": "100MB",
"rotate": true
}
}
"""
{:ok, result} = JsonRemedy.repair(config_content)
assert result["app"]["name"] == "MyApplication"
assert result["database"]["port"] == 5432
assert result["cache"]["ttl"] == 3600
assert result["features"]["new_ui"] == true
assert result["logging"]["level"] == "info"
assert "console" in result["logging"]["outputs"]
end
test "environment config with mixed syntax" do
env_config = """
{
// Environment: Production
NODE_ENV: "production",
PORT: 3000,
// Database
DB_HOST: "db.prod.example.com",
DB_PORT: 5432,
DB_NAME: "myapp",
DB_SSL: true,
// External Services
REDIS_URL: "redis://redis.prod.example.com:6379",
ELASTICSEARCH_URL: "https://elastic.prod.example.com",
// API Keys (placeholder values)
API_KEY_SERVICE_A: "sk_prod_xxxxxxxxxxxx",
API_KEY_SERVICE_B: "pk_prod_yyyyyyyyyyyy",
// Feature toggles
ENABLE_NEW_FEATURE: true,
ENABLE_BETA_FEATURES: false,
MAINTENANCE_MODE: false
}
"""
{:ok, result} = JsonRemedy.repair(env_config)
assert result["NODE_ENV"] == "production"
assert result["PORT"] == 3000
assert result["DB_HOST"] == "db.prod.example.com"
assert result["ENABLE_NEW_FEATURE"] == true
assert result["MAINTENANCE_MODE"] == false
end
end
describe "data export/import scenarios" do
test "CSV-to-JSON conversion artifact" do
# Simulates output from a CSV-to-JSON conversion tool with issues
csv_conversion = """
[
{
"id": 1,
"first_name": "Alice",
"last_name": "Johnson",
"email": "[email protected]",
"phone": "(555) 123-4567",
"department": "Engineering",
"salary": "$95,000",
"start_date": "2023-06-15",
"is_active": "true",
"manager_id": 5,
},
{
"id": 2,
"first_name": "Bob",
"last_name": "Smith",
"email": "[email protected]",
"phone": "(555) 987-6543",
"department": "Marketing",
"salary": "$75,000",
"start_date": "2023-08-01",
"is_active": "false",
"manager_id": 3,
}
]
"""
{:ok, result} = JsonRemedy.repair(csv_conversion)
assert length(result) == 2
assert hd(result)["first_name"] == "Alice"
assert hd(result)["salary"] == "$95,000"
assert Enum.at(result, 1)["department"] == "Marketing"
end
test "database export with mixed data types" do
db_export = """
{
"export_info": {
"table": "users",
"exported_at": "2024-01-15T10:00:00Z",
"row_count": 3,
"format": "json"
},
"data": [
{
"id": 1,
"username": "alice_j",
"email": "[email protected]",
"created_at": "2023-06-15T09:00:00Z",
"last_login": "2024-01-15T08:30:00Z",
"login_count": 127,
"is_verified": true,
"profile_data": '{"bio": "Software Engineer", "location": "NYC"}',
"tags": '["developer", "team-lead", "python"]'
},
{
"id": 2,
"username": "bob_s",
"email": "[email protected]",
"created_at": "2023-08-01T14:00:00Z",
"last_login": null,
"login_count": 0,
"is_verified": false,
"profile_data": '{"bio": "Marketing Specialist", "location": "SF"}',
"tags": '["marketing", "content", "social-media"]'
}
]
}
"""
{:ok, result} = JsonRemedy.repair(db_export)
assert result["export_info"]["row_count"] == 3
assert length(result["data"]) == 2
alice = hd(result["data"])
assert alice["username"] == "alice_j"
assert alice["is_verified"] == true
assert alice["login_count"] == 127
end
end
end
Performance and Property Tests
Performance Validation Tests
# test/performance/benchmark_test.exs
defmodule JsonRemedy.BenchmarkTest do
use ExUnit.Case
@moduletag :performance
describe "performance benchmarks" do
test "valid JSON fast path performance" do
valid_inputs = [
"{\"name\": \"Alice\"}",
Jason.encode!(%{"users" => Enum.to_list(1..100)}),
Jason.encode!(%{"large_data" => Enum.to_list(1..10000)})
]
for input <- valid_inputs do
# Warm up
for _ <- 1..10, do: JsonRemedy.repair(input)
# Measure
times = for _ <- 1..100 do
{time, {:ok, _}} = :timer.tc(fn -> JsonRemedy.repair(input) end)
time
end
avg_time = Enum.sum(times) / length(times)
# Performance thresholds based on input size
expected_max = case byte_size(input) do
size when size < 100 -> 50 # 50μs for small
size when size < 10_000 -> 500 # 500μs for medium
_ -> 5_000 # 5ms for large
end
assert avg_time < expected_max,
"Average time #{avg_time}μs exceeded threshold #{expected_max}μs for input size #{byte_size(input)}"
end
end
test "malformed JSON repair performance" do
malformed_inputs = [
"{name: 'Alice', active: True}",
# Medium complexity
"""
{
users: [
{name: 'Alice', active: True, scores: [1,2,3,]},
{name: 'Bob', active: False}
],
total: 2
""",
# High complexity
File.read!("test/support/large_invalid.json")
]
for input <- malformed_inputs do
# Warm up
for _ <- 1..5, do: JsonRemedy.repair(input)
# Measure
{time, result} = :timer.tc(fn -> JsonRemedy.repair(input) end)
# Should succeed
assert match?({:ok, _}, result)
# Performance thresholds
expected_max = case byte_size(input) do
size when size < 200 -> 2_000 # 2ms for small
size when size < 5_000 -> 10_000 # 10ms for medium
_ -> 50_000 # 50ms for large
end
assert time < expected_max,
"Repair time #{time}μs exceeded threshold #{expected_max}μs for input size #{byte_size(input)}"
end
end
test "memory usage benchmarks" do
test_cases = [
{"small", "{name: 'Alice'}", 10_000}, # 10KB max
{"medium", File.read!("test/support/invalid.json"), 50_000}, # 50KB max
{"large", File.read!("test/support/large_invalid.json"), 500_000} # 500KB max
]
for {size_label, input, memory_limit} <- test_cases do
:erlang.garbage_collect()
{:memory, memory_before} = :erlang.process_info(self(), :memory)
{:ok, _result} = JsonRemedy.repair(input)
:erlang.garbage_collect()
{:memory, memory_after} = :erlang.process_info(self(), :memory)
memory_used = memory_after - memory_before
assert memory_used < memory_limit,
"Memory usage #{memory_used} bytes exceeded limit #{memory_limit} bytes for #{size_label} input"
end
end
test "concurrent repair performance" do
input = "{users: [{name: 'Alice', active: True}, {name: 'Bob', active: False}]}"
# Test concurrent repairs
tasks = for _ <- 1..10 do
Task.async(fn ->
{time, result} = :timer.tc(fn -> JsonRemedy.repair(input) end)
{time, result}
end)
end
results = Task.await_many(tasks, 5000)
# All should succeed
assert Enum.all?(results, fn {_time, result} -> match?({:ok, _}, result) end)
# Performance shouldn't degrade significantly under concurrency
times = Enum.map(results, fn {time, _} -> time end)
avg_time = Enum.sum(times) / length(times)
max_time = Enum.max(times)
assert avg_time < 5_000 # 5ms average
assert max_time < 15_000 # 15ms max (allowing for some variance)
end
end
end
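The benchmarks above read fixtures such as `test/support/invalid.json` and `test/support/large_invalid.json`, which this spec does not define. A one-off script could generate a large malformed fixture in the same style as the memory test; this sketch assumes the filename and row count, which are not fixed anywhere in the spec:

```elixir
# test/support/generate_fixtures.exs -- run once via `mix run` (sketch)
rows =
  1..5_000
  |> Enum.map(fn i -> "{name: 'User#{i}', id: #{i}, active: True}" end)
  |> Enum.join(",\n")

# Unquoted keys, Python booleans, and a missing closing brace,
# mirroring the malformations the benchmarks exercise.
content = "{\n  users: [\n" <> rows <> "\n  ],\n  total: 5000\n"

File.write!("test/support/large_invalid.json", content)
```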
Property-Based Tests
# test/property/repair_properties_test.exs
defmodule JsonRemedy.RepairPropertiesTest do
use ExUnit.Case
use PropCheck
@moduletag :property
property "repair is idempotent for valid JSON" do
forall json_term <- json_generator() do
json_string = Jason.encode!(json_term)
# First repair
{:ok, result1} = JsonRemedy.repair(json_string)
# Second repair of the result
result1_string = Jason.encode!(result1)
{:ok, result2} = JsonRemedy.repair(result1_string)
result1 == result2
end
end
property "repair always produces valid JSON or fails gracefully" do
forall input <- malformed_json_generator() do
case JsonRemedy.repair(input) do
{:ok, result} ->
# Result should be encodable as valid JSON
case Jason.encode(result) do
{:ok, _} -> true
{:error, _} -> false
end
{:error, _} ->
# Graceful failure is acceptable
true
end
end
end
property "repair preserves semantic content when possible" do
forall {original_term, malformed_string} <- malformed_pair_generator() do
original_json = Jason.encode!(original_term)
case {JsonRemedy.repair(original_json), JsonRemedy.repair(malformed_string)} do
{{:ok, original_result}, {:ok, repair_result}} ->
semantically_equivalent?(original_result, repair_result)
_ ->
true # Acceptable if either fails
end
end
end
property "repair handles arbitrary strings without crashing" do
forall input <- utf8() do
case JsonRemedy.repair(input) do
{:ok, _} -> true
{:error, _} -> true
# Should never crash or return invalid response
end
end
end
# Generators
defp json_generator do
sized(size, json_value_generator(size))
end
defp json_value_generator(0) do
oneof([
nil,
bool(),
integer(),
float(),
utf8()
])
end
defp json_value_generator(size) when size > 0 do
oneof([
json_value_generator(0),
map(utf8(), json_value_generator(size - 1)),
list(json_value_generator(size - 1))
])
end
defp malformed_json_generator do
  # The error-introduction functions expect encoded JSON strings, so encode
  # the generated term first. `let` is PropCheck's transform combinator.
  oneof([
    # Syntax issues
    let(json <- json_generator(), do: introduce_syntax_errors(Jason.encode!(json))),
    # Structural issues
    let(json <- json_generator(), do: introduce_structural_errors(Jason.encode!(json))),
    # Content issues
    let(json <- json_generator(), do: introduce_content_issues(Jason.encode!(json))),
    # Random malformation
    let({json, corruption_level} <- {json_generator(), choose(1, 10)},
      do: introduce_random_errors(Jason.encode!(json), corruption_level)
    )
  ])
end
defp malformed_pair_generator do
  # PropCheck has no bind/2; let/2 is the equivalent combinator
  let original <- json_generator() do
    malformed = introduce_syntax_errors(Jason.encode!(original))
    {original, malformed}
  end
end
# Error introduction functions
defp introduce_syntax_errors(json_string) do
json_string
|> String.replace("\"", "'", global: false) # Single quotes
|> String.replace("true", "True", global: false) # Python booleans
|> String.replace("null", "None", global: false) # Python null
|> maybe_add_trailing_comma()
end
defp introduce_structural_errors(json_string) do
  # Drop the final character, which is usually a closing delimiter
  if String.length(json_string) > 5 do
    String.slice(json_string, 0..-2//1)
  else
    json_string
  end
end
defp introduce_content_issues(json_string) do
  # This is a plain function, not a generator context, so pick a wrapper
  # with Enum.random rather than the oneof generator
  Enum.random([
    "```json\n" <> json_string <> "\n```",
    "// Comment\n" <> json_string,
    "/* Comment */ " <> json_string
  ])
end
defp introduce_random_errors(json_string, corruption_level) do
# Introduce random character corruptions
chars = String.graphemes(json_string)
corrupted_count = min(corruption_level, length(chars) - 1)
positions = Enum.take_random(0..(length(chars) - 1), corrupted_count)
Enum.reduce(positions, chars, fn pos, acc ->
List.replace_at(acc, pos, random_char())
end)
|> Enum.join()
end
defp maybe_add_trailing_comma(json_string) do
# Add trailing comma before closing delimiter
cond do
String.ends_with?(json_string, "}") ->
String.replace_suffix(json_string, "}", ",}")
String.ends_with?(json_string, "]") ->
String.replace_suffix(json_string, "]", ",]")
true ->
json_string
end
end
defp random_char do
  # Plain function, so use Enum.random rather than the oneof generator
  Enum.random(["!", "@", "#", "$", "%", "^", "&", "*", "?", "~"])
end
defp semantically_equivalent?(a, b) do
# Simple semantic equivalence check
# In practice, this would be more sophisticated
normalize_for_comparison(a) == normalize_for_comparison(b)
end
defp normalize_for_comparison(term) do
# Normalize for comparison (handle floating point issues, etc.)
term
end
end
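The `@moduletag :performance` and `@moduletag :property` annotations above only keep slow suites out of the default run if the test helper excludes those tags. A minimal `test_helper.exs` (a sketch, assuming the tags are meant to be opt-in) would be:

```elixir
# test/test_helper.exs (sketch)
ExUnit.start(exclude: [:performance, :property])
```

With this configuration, `mix test` skips the benchmark and property suites, while `mix test --only performance` or `mix test --only property` runs them explicitly.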
This comprehensive test specification provides:
- Complete layer-by-layer testing with specific test cases for each repair concern
- Integration testing that validates the full pipeline works correctly
- Real-world scenario testing with actual LLM outputs and legacy system formats
- Performance benchmarking with specific thresholds and memory usage validation
- Property-based testing to discover edge cases and validate invariants
- Error handling validation for graceful failure modes
- Concurrent testing to ensure thread safety
The test suite is designed to drive TDD implementation, ensuring each layer can be built incrementally with high confidence in correctness and performance.