Layer 3: Syntax Normalization
Overview
Layer 3 is the third layer in the JSON Remedy pipeline and is responsible for syntax normalization. It fixes JSON syntax issues with character-by-character, context-aware parsing that normalizes non-standard syntax elements while preserving string content.
Module: JsonRemedy.Layer3.SyntaxNormalization
Priority: 3 (runs after Layer 2 Structural Repair)
Behavior: Implements JsonRemedy.LayerBehaviour
Purpose & Design Philosophy
Layer 3 addresses syntax inconsistencies in JSON data that commonly occur when:
- Data originates from JavaScript or Python environments with relaxed quoting rules
- Boolean and null values use language-specific variants (True/False/None)
- Keys are unquoted (JavaScript object literal style)
- Trailing commas are present from permissive parsers
- Missing commas or colons break the structure
The layer uses character-by-character parsing with context awareness to distinguish syntax elements from string content, so corrections are applied only outside quoted strings and original string content is left intact.
Core Functionality
1. Quote Normalization
Converts single quotes to double quotes and adds quotes to unquoted keys:
// Input (single quotes and unquoted keys)
{'name': 'Alice', age: 30, active: true}
// Output (Layer 3 normalized)
{"name": "Alice", "age": 30, "active": true}
Features:
- Converts single quotes (') to double quotes (") for keys and values
- Adds double quotes around unquoted object keys
- Preserves quote characters within string content
- Handles nested structures with mixed quoting styles
- Respects escape sequences within strings
Implementation: Uses normalize_quotes/1 and quote_unquoted_keys_direct/1 with character-by-character parsing to maintain string context awareness.
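A minimal usage sketch of normalize_quotes/1, based on the spec listed under API Specification; the commented results show the expected shape described above rather than verified output:
{fixed, repairs} =
  JsonRemedy.Layer3.SyntaxNormalization.normalize_quotes(~s({'name': 'Alice', "note": "it's fine"}))

# fixed   => ~s({"name": "Alice", "note": "it's fine"})  (apostrophe inside the string is preserved)
# repairs => "normalized quotes" repair actions describing the conversions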
2. Boolean and Null Normalization
Standardizes boolean and null literals to JSON-compliant values:
// Input (Python/JavaScript style literals)
{"active": True, "verified": False, "data": None, "status": NULL}
// Output (Layer 3 normalized)
{"active": true, "verified": false, "data": null, "status": null}
Supported Conversions:
- True → true
- False → false
- TRUE → true
- FALSE → false
- None → null
- NULL → null
- Null → null
Implementation: Uses normalize_literals_direct/1 with word boundary detection to avoid changing literals within string content.
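For reference, the conversions above expressed as plain Elixir data (illustrative only; the library applies these replacements during its character scan rather than through a lookup table):
literal_map = %{
  "True" => "true",
  "TRUE" => "true",
  "False" => "false",
  "FALSE" => "false",
  "None" => "null",
  "NULL" => "null",
  "Null" => "null"
}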
3. Comma and Colon Repair
Fixes missing and trailing punctuation in JSON structures:
// Input (missing and trailing punctuation)
{"name": "Alice" "age": 30, "items": [1, 2, 3,],}
// Output (Layer 3 repaired)
{"name": "Alice", "age": 30, "items": [1, 2, 3]}
Repair Types:
- Missing Commas: Detects adjacent values/objects without separating commas
- Trailing Commas: Removes commas before closing delimiters
- Missing Colons: Adds colons between object keys and values
- Multiple Commas: Reduces consecutive commas to single commas
Implementation: Uses state machine parsing in fix_commas/1 and post_process_commas/1 to understand structural context.
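A targeted call to fix_commas/1, per the spec under API Specification; the commented output is the expected result based on the example above:
{fixed, repairs} =
  JsonRemedy.Layer3.SyntaxNormalization.fix_commas(~s({"name": "Alice" "age": 30, "items": [1, 2, 3,],}))

# fixed   => ~s({"name": "Alice", "age": 30, "items": [1, 2, 3]})  (expected)
# repairs => actions such as "added missing comma" and "removed trailing comma"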
4. Comprehensive Syntax Processing
Layer 3 performs all normalizations in a coordinated single-pass approach:
// Input (multiple syntax issues)
{name: 'Alice', 'age': 30 active: True, items: [1 2 3,], data: None,}
// Output (Layer 3 fully normalized)
{"name": "Alice", "age": 30, "active": true, "items": [1, 2, 3], "data": null}
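The same example run through the layer's main entry point; the call shape mirrors the Debugging & Diagnostics section below, and the repair details are illustrative:
input = "{name: 'Alice', 'age': 30 active: True, items: [1 2 3,], data: None,}"

{:ok, normalized, context} =
  JsonRemedy.Layer3.SyntaxNormalization.process(input, %{repairs: [], options: []})

# normalized      => the fully normalized JSON shown above
# context.repairs => repair_action() entries for each fix that was applied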
State Machine Architecture
Parse State Structure
@type parse_state :: %{
  result: String.t(),
  position: non_neg_integer(),
  in_string: boolean(),
  escape_next: boolean(),
  string_quote: String.t() | nil,
  repairs: [repair_action()],
  context_stack: [:object | :array],
  expecting: :key | :value | :colon | :comma_or_end
}
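A hypothetical starting state in this shape (for illustration only; the library constructs its own initial state internally):
initial_state = %{
  result: "",
  position: 0,
  in_string: false,
  escape_next: false,
  string_quote: nil,
  repairs: [],
  context_stack: [],
  expecting: :value
}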
Context Tracking
The state machine maintains:
- String State: Whether currently inside quoted strings to avoid processing syntax within content
- Escape State: Handling backslash escape sequences within strings
- Context Stack: Tracking whether currently inside objects ({}) or arrays ([])
- Expectation State: What token type is expected next (key, value, colon, comma)
- Position Tracking: Character-level position for precise repair reporting
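The string and escape tracking described above can be sketched as a simplified per-character step (illustrative only; the real state machine also updates position, context_stack, and expecting):
defmodule StringStateSketch do
  # Updates only the string-tracking fields of a parse_state-shaped map.
  def step(state, char) do
    cond do
      # A character following a backslash inside a string is taken verbatim.
      state.escape_next ->
        %{state | escape_next: false}

      # A backslash inside a string starts an escape sequence.
      state.in_string and char == "\\" ->
        %{state | escape_next: true}

      # The matching closing quote ends the string.
      state.in_string and char == state.string_quote ->
        %{state | in_string: false, string_quote: nil}

      # An opening quote (single or double) starts a string.
      not state.in_string and char in ["\"", "'"] ->
        %{state | in_string: true, string_quote: char}

      # Any other character leaves string tracking unchanged.
      true ->
        state
    end
  end
end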
Processing Rules
@type syntax_rule :: %{
  name: String.t(),
  processor: (String.t() -> {String.t(), [repair_action()]}),
  condition: (String.t() -> boolean()) | nil
}
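As an illustration of this shape, a hypothetical custom rule (the processor here is a trivial whitespace trimmer, not one of the library's built-in rules):
trim_rule = %{
  name: "trim_whitespace",
  processor: fn input -> {String.trim(input), []} end,
  condition: fn input -> input != String.trim(input) end
}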
Default Rule Set:
- quote_unquoted_keys - Add quotes around bare object keys
- normalize_single_quotes - Convert single to double quotes
- normalize_booleans_and_nulls - Standardize literal values
- fix_trailing_commas - Remove trailing punctuation
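Assuming default_rules/0 returns rules in the syntax_rule() shape above, they could be applied by hand as sketched here; in practice process/2 does this internally:
alias JsonRemedy.Layer3.SyntaxNormalization

input = "{name: 'Alice', active: True,}"

{fixed, repairs} =
  Enum.reduce(SyntaxNormalization.default_rules(), {input, []}, fn rule, {acc, acc_repairs} ->
    # Skip a rule when its optional condition says it does not apply.
    if is_nil(rule.condition) or rule.condition.(acc) do
      {out, new_repairs} = rule.processor.(acc)
      {out, acc_repairs ++ new_repairs}
    else
      {acc, acc_repairs}
    end
  end)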
API Specification
Main Processing Function
@spec process(input :: String.t(), context :: repair_context()) :: layer_result()
Returns:
- {:ok, processed_input, updated_context} - Layer completed successfully
- {:continue, input, context} - Layer does not apply, pass to next layer
- {:error, reason} - Layer failed, stop pipeline
Specialized Normalization Functions
@spec normalize_quotes(input :: String.t()) :: {String.t(), [repair_action()]}
@spec normalize_booleans(input :: String.t()) :: {String.t(), [repair_action()]}
@spec fix_commas(input :: String.t()) :: {String.t(), [repair_action()]}
@spec default_rules() :: [syntax_rule()]
Each specialized function can be called independently for targeted repairs.
Layer Behavior Callbacks
@spec supports?(input :: String.t()) :: boolean()
@spec priority() :: 3
@spec name() :: String.t()
@spec validate_options(options :: keyword()) :: :ok | {:error, String.t()}
Configuration Options
Layer 3 accepts the following boolean configuration options:
- :strict_mode - Enable strict validation and error reporting
- :preserve_formatting - Maintain original whitespace and formatting
- :normalize_quotes - Enable quote normalization
- :normalize_booleans - Enable boolean/null literal normalization
- :fix_commas - Enable comma and colon repair
Example:
options = [
  strict_mode: false,
  preserve_formatting: true,
  normalize_quotes: true,
  normalize_booleans: true,
  fix_commas: true
]
Option Validation:
- All options must be boolean values
- Unknown options result in validation errors
- Invalid value types return descriptive error messages
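Illustrative validation calls; the :ok case follows from the validate_options/1 spec, and a concrete error message appears under Debugging & Diagnostics:
:ok = JsonRemedy.Layer3.SyntaxNormalization.validate_options(strict_mode: false, fix_commas: true)

{:error, _reason} = JsonRemedy.Layer3.SyntaxNormalization.validate_options(unknown_option: true)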
Processing Pipeline
Layer 3 processes input through the following stages:
Syntax Issue Detection
- Scan for single quotes and unquoted keys
- Detect non-standard boolean/null literals
- Identify punctuation issues
Rule-Based Processing
- Apply each syntax rule in sequence
- Maintain character-level position tracking
- Accumulate repair actions for each transformation
Context-Aware Parsing
- Track string boundaries to avoid content modification
- Handle escape sequences within strings
- Maintain structural context (object vs array)
Post-Processing
- Final comma and colon cleanup
- Whitespace normalization (if configured)
- Repair action consolidation
Repair Action Tracking
Layer 3 generates detailed repair actions for transparency:
%{
  layer: :syntax_normalization,
  action: "normalized quotes",
  position: 15,
  original: "'Alice'",
  replacement: "\"Alice\""
}
Action Types:
- "normalized quotes" - Single to double quote conversion
- "quoted unquoted key" - Added quotes around bare keys
- "normalized boolean True -> true" - Boolean literal standardization
- "normalized null None -> null" - Null literal standardization
- "added missing comma" - Inserted missing comma separators
- "removed trailing comma" - Removed trailing punctuation
- "added missing colon" - Inserted missing key-value separators
Error Handling
Layer 3 includes comprehensive error handling:
- Input Validation: Ensures input is a binary string
- Nil Handling: Gracefully handles nil inputs
- Type Checking: Validates context parameter types
- Option Validation: Validates all configuration options with detailed error messages
- Parse Protection: Catches and reports character parsing errors
- Graceful Degradation: Returns error tuples instead of raising exceptions
Performance Characteristics
Time Complexity
- Linear: O(n) where n is input string length
- Single-pass: Character-by-character processing for most operations
- Efficient String Building: Minimal memory copying during transformations
Space Complexity
- Result Building: O(n) for character accumulation
- State Tracking: O(1) for parse state maintenance
- Repair Recording: O(r) where r is number of repairs needed
Optimization Features
- Early Detection: supports?/1 provides fast pre-screening using heuristics
- Word Boundary Detection: Avoids false positives in literal replacement
- Context Preservation: Minimal parsing overhead for string boundary tracking
- Single-Pass Processing: Most transformations completed in one iteration
Integration with JSON Remedy Pipeline
Layer Interaction
- Input: Receives structurally-corrected JSON from Layer 2
- Output: Provides syntax-normalized JSON to Layer 4 (if present)
- Context Preservation: Maintains repair history across layers
Pipeline Position
Layer 1 (Content Cleaning) → Layer 2 (Structural Repair) → Layer 3 (Syntax Normalization) → ... → Layer N
Repair Context Flow
# Input context from Layer 2
%{repairs: [layer1_repairs, layer2_repairs], options: [...]}
# Output context to next layer
%{
  repairs: [layer1_repairs, layer2_repairs] ++ [layer3_repairs],
  options: [...],
  metadata: %{..., layer3_applied: true}
}
Common Use Cases
1. JavaScript Object Literals
// JavaScript-style object
{name: 'John', age: 30, active: true}
// Layer 3 normalization
{"name": "John", "age": 30, "active": true}
2. Python Data Serialization
// Python dict with Python literals
{'users': [{'active': True, 'data': None}]}
// Layer 3 normalization
{"users": [{"active": true, "data": null}]}
3. Relaxed Parser Output
// Permissive parser with trailing commas
{"items": ["item1", "item2", "item3",], "count": 3,}
// Layer 3 cleanup
{"items": ["item1", "item2", "item3"], "count": 3}
4. Mixed Syntax Sources
// Multiple syntax issues combined
{name: 'Alice', 'verified': True items: [1 2 3,] data: None,}
// Layer 3 comprehensive repair
{"name": "Alice", "verified": true, "items": [1, 2, 3], "data": null}
5. Configuration File Processing
// Config file with comments removed by Layer 1
{
  server: {
    host: 'localhost',
    port: 8080,
    ssl: False,
  },
}
// Layer 3 standardization
{
  "server": {
    "host": "localhost",
    "port": 8080,
    "ssl": false
  }
}
Testing & Validation
Layer 3 includes comprehensive test coverage:
- 597 lines of unit tests covering all normalization scenarios
- Quote Handling: Single quotes, unquoted keys, nested quoting
- Literal Normalization: All boolean and null variant combinations
- Punctuation Repair: Missing and trailing commas, missing colons
- String Preservation: Ensures syntax-like content in strings remains unchanged
- Edge Cases: Empty structures, escape sequences, mixed nesting
Test Categories
- Quote Normalization Tests: Single quotes, unquoted keys, smart quotes
- Boolean/Null Tests: All literal variants and case combinations
- Comma/Colon Tests: Missing and trailing punctuation scenarios
- Integration Tests: Multiple syntax issues in single inputs
- Layer Behavior Tests: supports?/1, priority/0, validation
- String Preservation Tests: Syntax-like content within strings
- Error Handling Tests: Invalid inputs and malformed data
Debugging & Diagnostics
Layer Support Detection
SyntaxNormalization.supports?("{'name': 'Alice'}")
# => true (single quotes detected)
SyntaxNormalization.supports?("{name: \"Alice\"}")
# => true (unquoted key detected)
Individual Function Testing
{result, repairs} = SyntaxNormalization.normalize_quotes("{'key': 'value'}")
# => {"{\"key\": \"value\"}", [%{action: "normalized quotes", ...}]}
{result, repairs} = SyntaxNormalization.normalize_booleans("{\"active\": True}")
# => {"{\"active\": true}", [%{action: "normalized boolean True -> true", ...}]}
Repair Action Inspection
{:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
IO.inspect(context.repairs)
# => [%{layer: :syntax_normalization, action: "normalized quotes", position: 1, ...}]
Option Validation
SyntaxNormalization.validate_options([normalize_quotes: "yes"])
# => {:error, "Option normalize_quotes must be a boolean"}
Best Practices
- Run After Structural Repair: Layer 3 expects structurally-sound JSON from Layer 2
- Configure Selectively: Disable specific normalizations if source format must be preserved
- Monitor Repair Actions: Review repair logs to understand data quality patterns
- Test String Preservation: Verify that string content remains unchanged
- Validate Configuration: Always validate options before processing
- Handle Edge Cases: Test with empty structures and complex nesting
Advanced Features
Word Boundary Detection
Layer 3 uses sophisticated word boundary detection to avoid false positives:
// Will NOT be changed (within string content)
{"message": "The result is True", "note": "Set None when empty"}
// WILL be changed (actual JSON literals)
{"result": True, "value": None}
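A hypothetical boundary check to illustrate the idea (not the library's internal implementation): a matched literal is replaced only when the characters on either side are not word characters.
defmodule WordBoundarySketch do
  # Letters, digits, and underscore count as word characters; anything else
  # (or the edge of the input) is a boundary.
  def word_char?(<<c::utf8>>) when c in ?a..?z or c in ?A..?Z or c in ?0..?9 or c == ?_, do: true
  def word_char?(_), do: false

  # A literal of length `len` starting at `pos` sits on word boundaries when
  # the characters immediately before and after it are not word characters.
  def on_boundary?(input, pos, len) do
    before_char = if pos > 0, do: String.at(input, pos - 1), else: nil
    after_char = String.at(input, pos + len)
    not word_char?(before_char) and not word_char?(after_char)
  end
end
With such a check, the True in {"result": True} sits on boundaries, while the same four characters inside an identifier such as Trueish do not.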
Context-Aware Processing
The layer maintains parsing context to handle complex nested structures:
// Complex nesting with mixed syntax issues
{
  users: [
    {name: 'Alice', active: True},
    {'name': 'Bob', 'active': False,}
  ],
  'metadata': {count: 2, verified: None,}
}
Escape Sequence Preservation
Layer 3 correctly handles escape sequences within strings:
// Input with escape sequences
{"path": "C:\\Users\\name", "quote": "She said \"hello\""}
// Output (unchanged - escapes preserved)
{"path": "C:\\Users\\name", "quote": "She said \"hello\""}
Limitations & Considerations
- Heuristic-Based Detection: supports?/1 uses pattern-matching heuristics
- Single-Pass Processing: Most repairs are single-pass for performance
- String Content Preservation: Never modifies content within quoted strings
- Context Dependency: Relies on structurally-correct input from Layer 2
- Performance Trade-offs: Character-by-character parsing trades some raw speed for precision
Layer 3 provides comprehensive syntax normalization that handles the majority of non-standard JSON syntax variations while maintaining high performance and preserving string content integrity.