Layer 3: Syntax Normalization
Overview
Layer 3 is the third layer in the JSON Remedy pipeline and is responsible for syntax normalization. It fixes JSON syntax issues with character-by-character, context-aware parsing that normalizes non-standard syntax elements while preserving string content.
Module: JsonRemedy.Layer3.SyntaxNormalization
Priority: 3 (runs after Layer 2 Structural Repair)
Behavior: Implements JsonRemedy.LayerBehaviour
Purpose & Design Philosophy
Layer 3 addresses syntax inconsistencies in JSON data that commonly occur when:
- Data originates from JavaScript or Python environments with relaxed quoting rules
- Boolean and null values use language-specific variants (True/False/None)
- Keys are unquoted (JavaScript object literal style)
- Trailing commas are present from permissive parsers
- Missing commas or colons break the structure
The layer uses character-by-character parsing with context awareness to distinguish syntax elements from string content, so corrections are applied only outside quoted strings and original string content is left intact.
Core Functionality
1. Quote Normalization
Converts single quotes to double quotes and adds quotes to unquoted keys:
// Input (single quotes and unquoted keys)
{'name': 'Alice', age: 30, active: true}
// Output (Layer 3 normalized)
{"name": "Alice", "age": 30, "active": true}
Features:
- Converts single quotes (') to double quotes (") for keys and values
- Adds double quotes around unquoted object keys
- Preserves quote characters within string content
- Handles nested structures with mixed quoting styles
- Respects escape sequences within strings
Implementation: Uses normalize_quotes/1 and quote_unquoted_keys_direct/1 with character-by-character parsing to maintain string context awareness.
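A minimal usage sketch of normalize_quotes/1, based on the spec listed under API Specification; the commented results show the expected shape described above rather than verified output:
{fixed, repairs} =
  JsonRemedy.Layer3.SyntaxNormalization.normalize_quotes(~s({'name': 'Alice', "note": "it's fine"}))

# fixed   => ~s({"name": "Alice", "note": "it's fine"})  (apostrophe inside the string is preserved)
# repairs => "normalized quotes" repair actions describing the conversions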
2. Boolean and Null Normalization
Standardizes boolean and null literals to JSON-compliant values:
// Input (Python/JavaScript style literals)
{"active": True, "verified": False, "data": None, "status": NULL}
// Output (Layer 3 normalized)
{"active": true, "verified": false, "data": null, "status": null}
Supported Conversions:
- True → true
- False → false
- TRUE → true
- FALSE → false
- None → null
- NULL → null
- Null → null
Implementation: Uses normalize_literals_direct/1 with word boundary detection to avoid changing literals within string content.
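For reference, the conversions above expressed as plain Elixir data (illustrative only; the library applies these replacements during its character scan rather than through a lookup table):
literal_map = %{
  "True" => "true",
  "TRUE" => "true",
  "False" => "false",
  "FALSE" => "false",
  "None" => "null",
  "NULL" => "null",
  "Null" => "null"
}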
3. Comma and Colon Repair
Fixes missing and trailing punctuation in JSON structures:
// Input (missing and trailing punctuation)
{"name": "Alice" "age": 30, "items": [1, 2, 3,],}
// Output (Layer 3 repaired)
{"name": "Alice", "age": 30, "items": [1, 2, 3]}
Repair Types:
- Missing Commas: Detects adjacent values/objects without separating commas
- Trailing Commas: Removes commas before closing delimiters
- Missing Colons: Adds colons between object keys and values
- Multiple Commas: Reduces consecutive commas to single commas
Implementation: Uses state machine parsing in fix_commas/1 and post_process_commas/1 to understand structural context.
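A targeted call to fix_commas/1, per the spec under API Specification; the commented output is the expected result based on the example above:
{fixed, repairs} =
  JsonRemedy.Layer3.SyntaxNormalization.fix_commas(~s({"name": "Alice" "age": 30, "items": [1, 2, 3,],}))

# fixed   => ~s({"name": "Alice", "age": 30, "items": [1, 2, 3]})  (expected)
# repairs => actions such as "added missing comma" and "removed trailing comma"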
4. Comprehensive Syntax Processing
Layer 3 performs all normalizations in a coordinated single-pass approach:
// Input (multiple syntax issues)
{name: 'Alice', 'age': 30 active: True, items: [1 2 3,], data: None,}
// Output (Layer 3 fully normalized)
{"name": "Alice", "age": 30, "active": true, "items": [1, 2, 3], "data": null}
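The same example run through the layer's main entry point; the call shape mirrors the Debugging & Diagnostics section below, and the repair details are illustrative:
input = "{name: 'Alice', 'age': 30 active: True, items: [1 2 3,], data: None,}"

{:ok, normalized, context} =
  JsonRemedy.Layer3.SyntaxNormalization.process(input, %{repairs: [], options: []})

# normalized      => the fully normalized JSON shown above
# context.repairs => repair_action() entries for each fix that was applied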
State Machine Architecture
Parse State Structure
@type parse_state :: %{
  result: String.t(),
  position: non_neg_integer(),
  in_string: boolean(),
  escape_next: boolean(),
  string_quote: String.t() | nil,
  repairs: [repair_action()],
  context_stack: [:object | :array],
  expecting: :key | :value | :colon | :comma_or_end
}
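A hypothetical starting state in this shape (for illustration only; the library constructs its own initial state internally):
initial_state = %{
  result: "",
  position: 0,
  in_string: false,
  escape_next: false,
  string_quote: nil,
  repairs: [],
  context_stack: [],
  expecting: :value
}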
Context Tracking
The state machine maintains:
- String State: Whether currently inside quoted strings to avoid processing syntax within content
- Escape State: Handling backslash escape sequences within strings
- Context Stack: Tracking whether currently inside objects ({}) or arrays ([])
- Expectation State: What token type is expected next (key, value, colon, comma)
- Position Tracking: Character-level position for precise repair reporting
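The string and escape tracking described above can be sketched as a simplified per-character step (illustrative only; the real state machine also updates position, context_stack, and expecting):
defmodule StringStateSketch do
  # Updates only the string-tracking fields of a parse_state-shaped map.
  def step(state, char) do
    cond do
      # A character following a backslash inside a string is taken verbatim.
      state.escape_next ->
        %{state | escape_next: false}

      # A backslash inside a string starts an escape sequence.
      state.in_string and char == "\\" ->
        %{state | escape_next: true}

      # The matching closing quote ends the string.
      state.in_string and char == state.string_quote ->
        %{state | in_string: false, string_quote: nil}

      # An opening quote (single or double) starts a string.
      not state.in_string and char in ["\"", "'"] ->
        %{state | in_string: true, string_quote: char}

      # Any other character leaves string tracking unchanged.
      true ->
        state
    end
  end
end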
Processing Rules
@type syntax_rule :: %{
  name: String.t(),
  processor: (String.t() -> {String.t(), [repair_action()]}),
  condition: (String.t() -> boolean()) | nil
}
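As an illustration of this shape, a hypothetical custom rule (the processor here is a trivial whitespace trimmer, not one of the library's built-in rules):
trim_rule = %{
  name: "trim_whitespace",
  processor: fn input -> {String.trim(input), []} end,
  condition: fn input -> input != String.trim(input) end
}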
Default Rule Set:
- quote_unquoted_keys - Add quotes around bare object keys
- normalize_single_quotes - Convert single to double quotes
- normalize_booleans_and_nulls - Standardize literal values
- fix_trailing_commas - Remove trailing punctuation
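Assuming default_rules/0 returns rules in the syntax_rule() shape above, they could be applied by hand as sketched here; in practice process/2 does this internally:
alias JsonRemedy.Layer3.SyntaxNormalization

input = "{name: 'Alice', active: True,}"

{fixed, repairs} =
  Enum.reduce(SyntaxNormalization.default_rules(), {input, []}, fn rule, {acc, acc_repairs} ->
    # Skip a rule when its optional condition says it does not apply.
    if is_nil(rule.condition) or rule.condition.(acc) do
      {out, new_repairs} = rule.processor.(acc)
      {out, acc_repairs ++ new_repairs}
    else
      {acc, acc_repairs}
    end
  end)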
API Specification
Main Processing Function
@spec process(input :: String.t(), context :: repair_context()) :: layer_result()
Returns:
- {:ok, processed_input, updated_context} - Layer completed successfully
- {:continue, input, context} - Layer does not apply, pass to next layer
- {:error, reason} - Layer failed, stop pipeline
Specialized Normalization Functions
@spec normalize_quotes(input :: String.t()) :: {String.t(), [repair_action()]}
@spec normalize_booleans(input :: String.t()) :: {String.t(), [repair_action()]}
@spec fix_commas(input :: String.t()) :: {String.t(), [repair_action()]}
@spec default_rules() :: [syntax_rule()]
Each specialized function can be called independently for targeted repairs.
Layer Behavior Callbacks
@spec supports?(input :: String.t()) :: boolean()
@spec priority() :: 3
@spec name() :: String.t()
@spec validate_options(options :: keyword()) :: :ok | {:error, String.t()}
Configuration Options
Layer 3 accepts the following boolean configuration options:
- :strict_mode - Enable strict validation and error reporting
- :preserve_formatting - Maintain original whitespace and formatting
- :normalize_quotes - Enable quote normalization
- :normalize_booleans - Enable boolean/null literal normalization
- :fix_commas - Enable comma and colon repair
Example:
options = [
  strict_mode: false,
  preserve_formatting: true,
  normalize_quotes: true,
  normalize_booleans: true,
  fix_commas: true
]
Option Validation:
- All options must be boolean values
- Unknown options result in validation errors
- Invalid value types return descriptive error messages
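Illustrative validation calls; the :ok case follows from the validate_options/1 spec, and a concrete error message appears under Debugging & Diagnostics:
:ok = JsonRemedy.Layer3.SyntaxNormalization.validate_options(strict_mode: false, fix_commas: true)

{:error, _reason} = JsonRemedy.Layer3.SyntaxNormalization.validate_options(unknown_option: true)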
Processing Pipeline
Layer 3 processes input through the following stages:
Syntax Issue Detection
- Scan for single quotes and unquoted keys
- Detect non-standard boolean/null literals
- Identify punctuation issues
Rule-Based Processing
- Apply each syntax rule in sequence
- Maintain character-level position tracking
- Accumulate repair actions for each transformation
Context-Aware Parsing
- Track string boundaries to avoid content modification
- Handle escape sequences within strings
- Maintain structural context (object vs array)
Post-Processing
- Final comma and colon cleanup
- Whitespace normalization (if configured)
- Repair action consolidation
Repair Action Tracking
Layer 3 generates detailed repair actions for transparency:
%{
  layer: :syntax_normalization,
  action: "normalized quotes",
  position: 15,
  original: "'Alice'",
  replacement: "\"Alice\""
}
Action Types:
- "normalized quotes" - Single to double quote conversion
- "quoted unquoted key" - Added quotes around bare keys
- "normalized boolean True -> true" - Boolean literal standardization
- "normalized null None -> null" - Null literal standardization
- "added missing comma" - Inserted missing comma separators
- "removed trailing comma" - Removed trailing punctuation
- "added missing colon" - Inserted missing key-value separators
Error Handling
Layer 3 includes comprehensive error handling:
- Input Validation: Ensures input is a binary string
- Nil Handling: Gracefully handles nil inputs
- Type Checking: Validates context parameter types
- Option Validation: Validates all configuration options with detailed error messages
- Parse Protection: Catches and reports character parsing errors
- Graceful Degradation: Returns error tuples instead of raising exceptions
Performance Characteristics
Time Complexity
- Linear: O(n) where n is input string length
- Single-pass: Character-by-character processing for most operations
- Efficient String Building: Minimal memory copying during transformations
Space Complexity
- Result Building: O(n) for character accumulation
- State Tracking: O(1) for parse state maintenance
- Repair Recording: O(r) where r is number of repairs needed
Optimization Features
- Early Detection: supports?/1 provides fast pre-screening using heuristics
- Word Boundary Detection: Avoids false positives in literal replacement
- Context Preservation: Minimal parsing overhead for string boundary tracking
- Single-Pass Processing: Most transformations completed in one iteration
Integration with JSON Remedy Pipeline
Layer Interaction
- Input: Receives structurally-corrected JSON from Layer 2
- Output: Provides syntax-normalized JSON to Layer 4 (if present)
- Context Preservation: Maintains repair history across layers
Pipeline Position
Layer 1 (Content Cleaning) → Layer 2 (Structural Repair) → Layer 3 (Syntax Normalization) → ... → Layer N
Repair Context Flow
# Input context from Layer 2
%{repairs: [layer1_repairs, layer2_repairs], options: [...]}
# Output context to next layer
%{
  repairs: [layer1_repairs, layer2_repairs] ++ [layer3_repairs],
  options: [...],
  metadata: %{..., layer3_applied: true}
}
Common Use Cases
1. JavaScript Object Literals
// JavaScript-style object
{name: 'John', age: 30, active: true}
// Layer 3 normalization
{"name": "John", "age": 30, "active": true}
2. Python Data Serialization
// Python dict with Python literals
{'users': [{'active': True, 'data': None}]}
// Layer 3 normalization
{"users": [{"active": true, "data": null}]}
3. Relaxed Parser Output
// Permissive parser with trailing commas
{"items": ["item1", "item2", "item3",], "count": 3,}
// Layer 3 cleanup
{"items": ["item1", "item2", "item3"], "count": 3}
4. Mixed Syntax Sources
// Multiple syntax issues combined
{name: 'Alice', 'verified': True items: [1 2 3,] data: None,}
// Layer 3 comprehensive repair
{"name": "Alice", "verified": true, "items": [1, 2, 3], "data": null}
5. Configuration File Processing
// Config file with comments removed by Layer 1
{
  server: {
    host: 'localhost',
    port: 8080,
    ssl: False,
  },
}
// Layer 3 standardization
{
  "server": {
    "host": "localhost",
    "port": 8080,
    "ssl": false
  }
}
Testing & Validation
Layer 3 includes comprehensive test coverage:
- 597 lines of unit tests covering all normalization scenarios
- Quote Handling: Single quotes, unquoted keys, nested quoting
- Literal Normalization: All boolean and null variant combinations
- Punctuation Repair: Missing and trailing commas, missing colons
- String Preservation: Ensures syntax-like content in strings remains unchanged
- Edge Cases: Empty structures, escape sequences, mixed nesting
Test Categories
- Quote Normalization Tests: Single quotes, unquoted keys, smart quotes
- Boolean/Null Tests: All literal variants and case combinations
- Comma/Colon Tests: Missing and trailing punctuation scenarios
- Integration Tests: Multiple syntax issues in single inputs
- Layer Behavior Tests: supports?/1, priority/0, validation
- String Preservation Tests: Syntax-like content within strings
- Error Handling Tests: Invalid inputs and malformed data
Debugging & Diagnostics
Layer Support Detection
SyntaxNormalization.supports?("{'name': 'Alice'}")
# => true (single quotes detected)
SyntaxNormalization.supports?("{name: \"Alice\"}")
# => true (unquoted key detected)
Individual Function Testing
{result, repairs} = SyntaxNormalization.normalize_quotes("{'key': 'value'}")
# => {"{\"key\": \"value\"}", [%{action: "normalized quotes", ...}]}
{result, repairs} = SyntaxNormalization.normalize_booleans("{\"active\": True}")
# => {"{\"active\": true}", [%{action: "normalized boolean True -> true", ...}]}
Repair Action Inspection
{:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
IO.inspect(context.repairs)
# => [%{layer: :syntax_normalization, action: "normalized quotes", position: 1, ...}]
Option Validation
SyntaxNormalization.validate_options([normalize_quotes: "yes"])
# => {:error, "Option normalize_quotes must be a boolean"}
Best Practices
- Run After Structural Repair: Layer 3 expects structurally-sound JSON from Layer 2
- Configure Selectively: Disable specific normalizations if source format must be preserved
- Monitor Repair Actions: Review repair logs to understand data quality patterns
- Test String Preservation: Verify that string content remains unchanged
- Validate Configuration: Always validate options before processing
- Handle Edge Cases: Test with empty structures and complex nesting
Advanced Features
Word Boundary Detection
Layer 3 uses sophisticated word boundary detection to avoid false positives:
// Will NOT be changed (within string content)
{"message": "The result is True", "note": "Set None when empty"}
// WILL be changed (actual JSON literals)
{"result": True, "value": None}
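A hypothetical boundary check to illustrate the idea (not the library's internal implementation): a matched literal is replaced only when the characters on either side are not word characters.
defmodule WordBoundarySketch do
  # Letters, digits, and underscore count as word characters; anything else
  # (or the edge of the input) is a boundary.
  def word_char?(<<c::utf8>>) when c in ?a..?z or c in ?A..?Z or c in ?0..?9 or c == ?_, do: true
  def word_char?(_), do: false

  # A literal of length `len` starting at `pos` sits on word boundaries when
  # the characters immediately before and after it are not word characters.
  def on_boundary?(input, pos, len) do
    before_char = if pos > 0, do: String.at(input, pos - 1), else: nil
    after_char = String.at(input, pos + len)
    not word_char?(before_char) and not word_char?(after_char)
  end
end
With such a check, the True in {"result": True} sits on boundaries, while the same four characters inside an identifier such as Trueish do not.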
Context-Aware Processing
The layer maintains parsing context to handle complex nested structures:
// Complex nesting with mixed syntax issues
{
  users: [
    {name: 'Alice', active: True},
    {'name': 'Bob', 'active': False,}
  ],
  'metadata': {count: 2, verified: None,}
}
Escape Sequence Preservation
Layer 3 correctly handles escape sequences within strings:
// Input with escape sequences
{"path": "C:\\Users\\name", "quote": "She said \"hello\""}
// Output (unchanged - escapes preserved)
{"path": "C:\\Users\\name", "quote": "She said \"hello\""}
Limitations & Considerations
- Heuristic-Based Detection: supports?/1 uses pattern-matching heuristics
- Single-Pass Processing: Most repairs are single-pass for performance
- String Content Preservation: Never modifies content within quoted strings
- Context Dependency: Relies on structurally-correct input from Layer 2
- Performance Trade-offs: Character-by-character parsing trades some raw speed for precision
Layer 3 provides comprehensive syntax normalization that handles the majority of non-standard JSON syntax variations while maintaining high performance and preserving string content integrity.