JSON Repair Core Functions Analysis
Core Parser Class and Entry Points
JSONParser
Class
- Primary entry point for all parsing logic
- Manages parsing state (index, context, logging)
- Contains all core repair algorithms
Main Parsing Functions
parse()
→ JSONReturnType
- Main orchestrator function
- Handles multiple JSON objects in sequence
- Manages array wrapping for multiple objects
- Core logic for determining when to parse vs. skip content
parse_json()
→ JSONReturnType
- Central dispatcher for JSON element types
- Determines what type of JSON element to parse based on current character
- Routes to appropriate specialized parser (object, array, string, number, boolean/null)
- Handles edge cases and fallback logic
Container Parsing Functions
parse_object()
→ dict[str, JSONReturnType]
- Object parsing with extensive repair logic
- Handles missing quotes in keys and values
- Manages missing colons after keys
- Fixes missing commas between members
- Handles duplicate keys and array merging
- Manages missing opening/closing braces
- Context-aware parsing for object keys vs values
parse_array()
→ list[JSONReturnType]
- Array parsing with repair capabilities
- Handles missing commas between elements
- Fixes missing opening/closing brackets
- Special logic for detecting objects within arrays (missing braces)
- Manages empty arrays and trailing elements
Primitive Parsing Functions
parse_string()
→ str | bool | None
- Most complex repair function - handles majority of edge cases
- Missing quote detection and repair:
- Detects missing opening quotes
- Detects missing closing quotes
- Handles different quote types (", ‘, “, “)
- Context-aware termination:
- Uses context to determine string boundaries
- Different termination rules for object keys vs values vs array elements
- Escape sequence handling:
- Processes Unicode escapes (\u, \x)
- Handles standard escapes (\n, \t, \r, \b)
- Special cases:
- Doubled quotes handling
- Unmatched delimiters
- Streaming stability mode
- HTML/Markdown content within strings
- Boolean/null detection: Delegates to
parse_boolean_or_null()
when appropriate
parse_number()
→ float | int | str | JSONReturnType
- Number parsing with format flexibility
- Handles various number formats and edge cases
- Falls back to string parsing when number format is invalid
- Context-aware parsing (different behavior in arrays)
parse_boolean_or_null()
→ bool | str | None
- Boolean and null literal parsing
- Case-insensitive matching for true/false/null
- Partial matching with rollback on failure
Utility and Helper Functions
parse_comment()
→ str
- Comment removal logic
- Handles line comments (# and //)
- Handles block comments (/* */)
- Returns empty string to effectively remove comments
Navigation and Character Management
get_char_at(count: int = 0)
→ str | Literal[False]
- Safe character access with bounds checking
- Returns False when at end of input
- Critical for all parsing operations
skip_whitespaces_at(idx: int = 0, move_main_index=True)
→ int
- Whitespace skipping utility
- Used throughout parsing to handle formatting variations
- Can optionally move main parsing index
skip_to_character(character: str, idx: int = 0)
→ int
- Character search utility
- Used for finding delimiters, terminators, etc.
- Essential for lookahead operations in repair logic
Context Management
JsonContext
Class
- Tracks parsing context (object key, object value, array)
- Critical for context-aware string parsing
- Determines how to handle ambiguous cases
ContextValues
Enum
- Defines the different parsing contexts:
OBJECT_KEY
OBJECT_VALUE
ARRAY
Key Repair Strategies
1. Quote Repair
- Automatic quote addition for unquoted strings
- Quote type normalization
- Missing quote detection using context and lookahead
2. Delimiter Repair
- Missing commas between array elements and object members
- Missing colons after object keys
- Missing opening/closing brackets and braces
3. Context-Aware Parsing
- Different parsing rules based on current context
- Smart termination of strings based on surrounding structure
- Fallback mechanisms when primary parsing fails
4. Streaming Stability
- Special mode for incomplete/streaming JSON
- Preserves partial content rather than aggressive cleanup
- Handles incomplete escape sequences
5. Type Coercion and Fallbacks
- Numbers that fall back to strings
- Strings that contain boolean/null values
- Object detection within arrays
Critical Edge Case Handling
- Empty strings and null values
- Malformed escape sequences
- Mixed quote types
- Missing structural elements (braces, brackets)
- Embedded objects without proper delimiters
- Comments and non-JSON content
- Unicode and special character handling
- Streaming/incomplete JSON
All these functions work together to provide comprehensive JSON repair capabilities, with the parse_string()
function being the most complex as it handles the majority of real-world JSON formatting issues.