JsonRemedy Enhancement Technical Specification
Implementation Guide for Missing Core Functions
Executive Summary
Based on analysis of the original JSON repair library and the current Elixir implementation, we have identified 19 critical functions that need to be added to achieve feature parity and enhance the robustness of the layered architecture. This document provides a comprehensive implementation plan that maintains the existing stable codebase while systematically adding the missing functionality.
Current Status: The existing Elixir code is stable with all tests passing. The proposed enhancements will be additive and backward-compatible.
1. Architecture Overview
1.1 Current State
- Layer 1: Content Cleaning (stable)
- Layer 2: Structural Repair (stable)
- Layer 3: Syntax Normalization (stable)
- LayerBehaviour: Contracts and utilities (stable)
1.2 Enhancement Strategy
- Additive Approach: No breaking changes to existing APIs
- Gradual Implementation: Prioritized rollout in 3 phases
- Backward Compatibility: All existing tests continue to pass
- Enhanced Testing: Each addition includes comprehensive test coverage
2. Implementation Phases
Phase 1: Core Context Management (High Priority)
Timeline: 2-3 days
Risk: Low - foundational enhancements
Phase 2: Enhanced Parsing Logic (Medium Priority)
Timeline: 3-4 days
Risk: Medium - more complex integrations
Phase 3: Advanced Features (Low Priority)
Timeline: 1-2 days
Risk: Low - optional enhancements
3. Phase 1: Core Context Management
3.1 Context System Enhancement
3.1.1 JsonContext Module
Location: json_remedy/context/json_context.ex (new file)
Purpose: Centralized context tracking for all layers
Components:
# Types to implement
@type context_value :: :object_key | :object_value | :array

@type context_state :: %{
        current: context_value(),
        stack: [context_value()],
        position: non_neg_integer(),
        in_string: boolean(),
        string_delimiter: String.t() | nil
      }
Integration Points:
- Layer 2: Enhanced state machine decision making
- Layer 3: Context-aware syntax normalization
- All layers: Improved string content detection
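The sketch below shows one possible shape for this module, using the context_state fields above as struct fields. Function names such as push_context/2 and pop_context/1 are illustrative assumptions rather than a fixed API.

```elixir
defmodule JsonRemedy.Context.JsonContext do
  @moduledoc "Tracks parsing context shared across repair layers (sketch)."

  @type context_value :: :object_key | :object_value | :array

  defstruct current: :object_value,
            stack: [],
            position: 0,
            in_string: false,
            string_delimiter: nil

  @type t :: %__MODULE__{}

  # Enter a nested context, remembering the one we came from.
  @spec push_context(t(), context_value()) :: t()
  def push_context(%__MODULE__{} = ctx, value) do
    %{ctx | current: value, stack: [ctx.current | ctx.stack]}
  end

  # Leave the current context, restoring the previous one (no-op at the top level).
  @spec pop_context(t()) :: t()
  def pop_context(%__MODULE__{stack: []} = ctx), do: ctx

  def pop_context(%__MODULE__{stack: [prev | rest]} = ctx) do
    %{ctx | current: prev, stack: rest}
  end
end
```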
3.1.2 ContextValues Module
Location: json_remedy/context/context_values.ex (new file)
Purpose: Enum definitions and context utilities
Functions:
- Context transition logic
- Context validation
- Context-aware parsing decisions
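A sketch of what the transition and validation utilities might look like; the function names are illustrative assumptions, not a settled API.

```elixir
defmodule JsonRemedy.Context.ContextValues do
  @moduledoc "Context value definitions and transition utilities (sketch)."

  @type context_value :: :object_key | :object_value | :array

  @values [:object_key, :object_value, :array]

  # Context validation: only the known context values are accepted.
  @spec valid?(term()) :: boolean()
  def valid?(value), do: value in @values

  # Transition logic: after a value completes inside an object, parsing
  # expects the next key; inside an array, the context is unchanged.
  @spec transition_after_value(context_value()) :: context_value()
  def transition_after_value(:object_value), do: :object_key
  def transition_after_value(other), do: other

  # Context-aware decision: a colon is only expected right after an object key.
  @spec expects_colon_next?(context_value()) :: boolean()
  def expects_colon_next?(:object_key), do: true
  def expects_colon_next?(_), do: false
end
```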
3.1.3 String Parsing State Management
Location: Enhance json_remedy/layer3/syntax_normalization.ex
New State Flags:
- missing_quotes: Track unquoted string detection
- string_delimiter_missing: Track unclosed strings
- unmatched_delimiter: Track quote parity
- doubled_quotes: Track doubled quote patterns
Integration:
- Add to existing parsing state in Layer 3
- Enhance quote normalization logic
- Improve string boundary detection
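A sketch of how these flags could extend Layer 3's internal parsing state; the key names mirror the flags above, but the surrounding state shape and function name are assumptions.

```elixir
# Inside JsonRemedy.Layer3.SyntaxNormalization (hypothetical shape of the extended state).
@type parse_flags :: %{
        missing_quotes: boolean(),
        string_delimiter_missing: boolean(),
        unmatched_delimiter: boolean(),
        doubled_quotes: boolean()
      }

# All flags start cleared; individual repair passes set them as anomalies are detected.
@spec initial_flags() :: parse_flags()
def initial_flags do
  %{
    missing_quotes: false,
    string_delimiter_missing: false,
    unmatched_delimiter: false,
    doubled_quotes: false
  }
end
```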
3.2 Testing Requirements for Phase 1
3.2.1 New Test Files
test/unit/context/json_context_test.exs
test/unit/context/context_values_test.exs
test/integration/context_integration_test.exs
3.2.2 Enhanced Existing Tests
- Layer 2: Add context-aware structural repair tests
- Layer 3: Add state flag validation tests
- Integration: Cross-layer context consistency tests
3.2.3 Test Coverage Requirements
- Unit Tests: 95%+ coverage for new context modules
- Integration Tests: Context state preservation across layers
- Edge Cases: Context corruption and recovery scenarios
4. Phase 2: Enhanced Parsing Logic
4.1 Utility Functions
4.1.1 Character Navigation Module
Location: json_remedy/utils/char_utils.ex (new file)
Functions:
- get_char_at/3: Safe character access with bounds checking
- skip_to_character/3: Character search with lookahead
- skip_whitespaces_at/3: Context-aware whitespace handling
UTF-8 Safety: All functions must use String.length() and String.at() for proper Unicode handling.
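A sketch of the three helpers, assuming the /3 arities listed above; each is built on String.at/2 and String.length/1 so positions count Unicode graphemes rather than bytes (note that grapheme indexing is O(n), acceptable for a sketch but worth benchmarking).

```elixir
defmodule JsonRemedy.Utils.CharUtils do
  @moduledoc "Grapheme-safe character navigation (sketch)."

  # Safe character access with bounds checking; returns `default` when out of range.
  @spec get_char_at(String.t(), non_neg_integer(), term()) :: String.t() | term()
  def get_char_at(input, pos, default \\ nil) do
    if pos < String.length(input), do: String.at(input, pos), else: default
  end

  # Scan forward from `pos` until `target` is found; returns the index or nil.
  @spec skip_to_character(String.t(), String.t(), non_neg_integer()) :: non_neg_integer() | nil
  def skip_to_character(input, target, pos \\ 0) do
    len = String.length(input)

    cond do
      pos >= len -> nil
      String.at(input, pos) == target -> pos
      true -> skip_to_character(input, target, pos + 1)
    end
  end

  # Advance past whitespace starting at `pos`; returns the first non-whitespace index.
  @spec skip_whitespaces_at(String.t(), non_neg_integer(), non_neg_integer()) :: non_neg_integer()
  def skip_whitespaces_at(input, pos, len) do
    if pos < len and String.at(input, pos) in [" ", "\t", "\n", "\r"] do
      skip_whitespaces_at(input, pos + 1, len)
    else
      pos
    end
  end
end
```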
4.1.2 String Delimiter Management
Location: Enhance json_remedy/layer3/syntax_normalization.ex
New Constants:
- STRING_DELIMITERS: Supported quote types ["\"", "'", """, """]
- Quote normalization mappings
- Delimiter precedence rules
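A sketch of how these constants could be declared inside the Layer 3 module; the attribute names and the normalization map are assumptions.

```elixir
# Supported quote characters (standard double quote, single quote, curly doubles).
@string_delimiters ["\"", "'", "\u201C", "\u201D"]

# How each alternative delimiter normalizes back to a standard double quote.
@quote_normalization %{"'" => "\"", "\u201C" => "\"", "\u201D" => "\""}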
4.1.3 Specialized Parsing Functions
Location: json_remedy/parsers/ (new directory)
Modules:
- boolean_null_parser.ex
  - parse_boolean_or_null/2: Enhanced boolean/null parsing (see the sketch after this list)
  - Case-insensitive matching
  - Partial matching with rollback
- number_parser.ex
  - parse_number/2: Number parsing with fallback
  - Format flexibility and validation
  - Context-aware behavior
- object_parser.ex
  - parse_object/2: Object-specific parsing logic
  - Enhanced Layer 2 structural repairs
  - Missing brace detection and repair
- array_parser.ex
  - parse_array/2: Array-specific parsing logic
  - Enhanced Layer 2 structural repairs
  - Missing bracket detection and repair
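As a sketch of the boolean/null parser's rollback behavior: it returns the parsed value and new position on success, or :nomatch so the caller keeps its original position. The return shape and the literal list are assumptions.

```elixir
defmodule JsonRemedy.Parsers.BooleanNullParser do
  @moduledoc "Case-insensitive boolean/null literal parsing with rollback (sketch)."

  # Literal spellings to try; "none" is an assumed Python-style variant.
  @literals [{"true", true}, {"false", false}, {"null", nil}, {"none", nil}]

  # Attempt to read a literal at `pos`. On success returns {:ok, value, new_pos};
  # on failure returns :nomatch so the caller can roll back to its original position.
  @spec parse_boolean_or_null(String.t(), non_neg_integer()) ::
          {:ok, boolean() | nil, non_neg_integer()} | :nomatch
  def parse_boolean_or_null(input, pos) do
    Enum.find_value(@literals, :nomatch, fn {word, value} ->
      len = String.length(word)
      candidate = String.slice(input, pos, len)

      if String.downcase(candidate) == word do
        {:ok, value, pos + len}
      end
    end)
  end
end
```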
4.2 Integration Strategy
4.2.1 Layer 2 Enhancements
- Integrate specialized parsers into state machine
- Add object/array specific repair logic
- Enhance context tracking during structural repairs
4.2.2 Layer 3 Enhancements
- Integrate utility functions for character navigation
- Add boolean/null parser to literal normalization
- Enhance number parsing with fallback logic
4.2.3 Cross-Layer Communication
- Context state passing between layers
- Repair action coordination
- Position tracking accuracy
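As a sketch of how context could be threaded between layers, assuming each layer exposes a process/2 callback that returns the transformed input plus an updated context (the exact LayerBehaviour return shape may differ):

```elixir
# Sketch: threading context and accumulated repairs through the layer pipeline.
def repair_pipeline(input) do
  layers = [
    JsonRemedy.Layer1.ContentCleaning,
    JsonRemedy.Layer2.StructuralRepair,
    JsonRemedy.Layer3.SyntaxNormalization
  ]

  initial_context = %{repairs: [], options: [], json_context: nil}

  Enum.reduce_while(layers, {:ok, input, initial_context}, fn layer, {:ok, acc, ctx} ->
    case layer.process(acc, ctx) do
      {:ok, next, next_ctx} -> {:cont, {:ok, next, next_ctx}}
      {:error, reason} -> {:halt, {:error, reason}}
    end
  end)
end
```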
4.3 Testing Requirements for Phase 2
4.3.1 New Test Files
test/unit/utils/char_utils_test.exs
test/unit/parsers/boolean_null_parser_test.exs
test/unit/parsers/number_parser_test.exs
test/unit/parsers/object_parser_test.exs
test/unit/parsers/array_parser_test.exs
test/integration/enhanced_parsing_test.exs
4.3.2 Critical Test Areas
- UTF-8 Safety: All parsers with multi-byte characters
- Performance: Large input stress testing
- Memory: Parser memory usage validation
- Edge Cases: Malformed and pathological inputs
5. Phase 3: Advanced Features
5.1 Streaming Support
5.1.1 Stream State Management
Location: json_remedy/stream/ (new directory)
Purpose: Handle incomplete/streaming JSON input
Components:
- stream_stable flag implementation (see the sketch below)
- Partial content preservation
- Incomplete structure handling
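A sketch of the stream state, where stream_stable controls whether a trailing incomplete structure is buffered for the next chunk or handed off for immediate repair; module and function names are assumptions.

```elixir
defmodule JsonRemedy.Stream.StreamState do
  @moduledoc "Stream-aware repair state (sketch)."

  defstruct stream_stable: false, buffer: "", incomplete?: false

  @type t :: %__MODULE__{stream_stable: boolean(), buffer: String.t(), incomplete?: boolean()}

  # With stream_stable: true, a trailing incomplete structure is preserved until
  # the next chunk arrives; otherwise it is handed off for eager repair (closing
  # open strings and containers immediately).
  @spec handle_incomplete(t(), String.t()) :: {:defer | :repair_now, t()}
  def handle_incomplete(%__MODULE__{stream_stable: true} = state, fragment) do
    {:defer, %{state | buffer: state.buffer <> fragment, incomplete?: true}}
  end

  def handle_incomplete(%__MODULE__{} = state, fragment) do
    {:repair_now, %{state | buffer: state.buffer <> fragment, incomplete?: false}}
  end
end
```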
5.1.2 Enhanced Comment Parsing
Location: Enhance json_remedy/layer1/content_cleaning.ex
Improvements:
- Nested comment handling
- Comment type detection
- Context-aware comment removal
5.2 Testing Requirements for Phase 3
5.2.1 New Test Files
test/unit/stream/stream_support_test.exs
test/integration/streaming_integration_test.exs
6. Testing Strategy
6.1 Test Categories
6.1.1 Unit Tests (Per Module)
- Function Coverage: Test every public function
- Edge Cases: Boundary conditions and error scenarios
- UTF-8 Safety: Unicode character handling
- Performance: Basic timing and memory checks
6.1.2 Integration Tests (Cross-Layer)
- Context Preservation: State consistency across layers
- Repair Coordination: Multiple layers working together
- End-to-End: Complete repair workflows
6.1.3 Critical Issue Tests (Existing + Enhanced)
- UTF-8 Position Tracking: Accurate with new utilities
- Memory Leak Detection: Enhanced parsers don’t leak
- Performance Stress: New functions under load
- Concurrent Safety: Thread safety validation
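As an illustration of the UTF-8 position-tracking tests, a unit test against the hypothetical CharUtils sketch from Section 4.1.1 might look like this (file location and assertions are illustrative):

```elixir
defmodule JsonRemedy.Utils.CharUtilsTest do
  use ExUnit.Case, async: true

  alias JsonRemedy.Utils.CharUtils

  test "positions are counted in graphemes, not bytes" do
    # "é" is two bytes in UTF-8 but must count as a single position.
    input = ~s({"café": "ok"})

    assert CharUtils.get_char_at(input, 5, nil) == "é"
    assert CharUtils.get_char_at(input, String.length(input), :eof) == :eof
  end
end
```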
6.1.4 Regression Tests
- Existing Functionality: All current tests must pass
- API Compatibility: No breaking changes
- Performance Regression: No significant slowdowns
6.2 Test Metrics and Targets
| Category | Coverage Target | Performance Target |
|---|---|---|
| Unit Tests | 95%+ | < 1ms per test |
| Integration Tests | 90%+ | < 10ms per test |
| Critical Issue Tests | 100% | < 100ms per test |
| Performance Tests | N/A | < 50ms for large inputs |
6.3 Test Automation
6.3.1 Continuous Integration
- All tests run on every commit
- Performance regression detection
- Memory usage monitoring
- UTF-8 specific test suite
6.3.2 Test Organization
test/
├── unit/
│ ├── context/
│ ├── utils/
│ ├── parsers/
│ └── stream/
├── integration/
├── critical/
└── performance/
7. Implementation Guidelines
7.1 Code Quality Standards
7.1.1 Documentation
- Module Docs: Comprehensive @moduledoc for all new modules
- Function Docs: @doc with examples for all public functions
- Type Specs: @spec for all functions
- Examples: Doctests where appropriate
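As a sketch of the documentation standard, a new public function (here the hypothetical get_char_at/3 from Section 4.1.1) would carry @doc with doctests and a @spec:

```elixir
@doc """
Returns the grapheme at `position`, or `default` when out of bounds.

## Examples

    iex> JsonRemedy.Utils.CharUtils.get_char_at("héllo", 1, nil)
    "é"

    iex> JsonRemedy.Utils.CharUtils.get_char_at("héllo", 99, :eof)
    :eof
"""
@spec get_char_at(String.t(), non_neg_integer(), term()) :: String.t() | term()
def get_char_at(input, position, default) do
  if position < String.length(input), do: String.at(input, position), else: default
end
```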
7.1.2 Error Handling
- Graceful Degradation: Never crash; always return a valid result
- Error Reporting: Clear error messages with context
- Logging: Appropriate repair action logging
- Recovery: Automatic recovery from parsing errors
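One way to honor the never-crash rule is a wrapper that rescues failures in an individual repair step and passes the input through unchanged; the helper below is a sketch, not an existing API.

```elixir
# Sketch: graceful-degradation wrapper around a repair step. If the step raises,
# the original input is passed through unchanged and the failure is logged.
require Logger

def safe_apply(step_name, input, fun) when is_function(fun, 1) do
  {:ok, fun.(input)}
rescue
  error ->
    Logger.warning("#{step_name} failed: #{Exception.message(error)}; passing input through")
    {:ok, input}
end
```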
7.1.3 Performance
- UTF-8 Aware: Always use String.length(), not byte_size()
- Memory Efficient: Avoid large intermediate data structures
- Time Complexity: Maintain O(n) parsing where possible
- Benchmarking: Performance tests for new functions
7.2 Integration Approach
7.2.1 Backward Compatibility
- API Preservation: No changes to existing public APIs
- Default Behavior: Existing behavior unchanged by default
- Optional Features: New features are opt-in where possible
- Migration Path: Clear upgrade path for advanced features
7.2.2 Feature Flags
- Gradual Rollout: Feature flags for new functionality
- Testing: Easy to test with/without new features
- Debugging: Ability to disable problematic features
- Performance: Measure impact of new features
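A minimal sketch of such a flag, read from per-call options with a fallback to application config; the :enhanced_context key is a hypothetical example.

```elixir
# Sketch: opt-in feature flag, defaulting to the existing behavior.
# Hypothetical config: config :json_remedy, enhanced_context: false
defp enhanced_context_enabled?(opts) do
  Keyword.get(opts, :enhanced_context, Application.get_env(:json_remedy, :enhanced_context, false))
end
```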
8. Risk Assessment and Mitigation
8.1 Technical Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| UTF-8 position bugs | Medium | High | Comprehensive UTF-8 test suite |
| Performance regression | Low | Medium | Performance benchmarking |
| Memory leaks | Low | High | Memory usage monitoring |
| Breaking changes | Low | High | Extensive regression testing |
| Integration complexity | Medium | Medium | Phased implementation |
8.2 Implementation Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Schedule overrun | Medium | Low | Prioritized implementation phases |
| Testing overhead | High | Low | Automated testing infrastructure |
| Feature creep | Medium | Medium | Strict scope adherence |
| Documentation debt | High | Medium | Documentation-first approach |
9. Success Criteria
9.1 Functional Requirements
- All 19 identified functions implemented
- 100% of existing tests continue to pass
- No breaking changes to public APIs
- Enhanced context-aware parsing capabilities
9.2 Quality Requirements
- 95%+ test coverage for new code
- No memory leaks detected
- No performance regression > 5%
- All UTF-8 edge cases handled correctly
9.3 Documentation Requirements
- Complete API documentation
- Migration guide for advanced features
- Performance characteristics documented
- Testing strategy documented
10. Conclusion
This enhancement plan provides a systematic approach to adding the missing core functions while maintaining the stability and quality of the existing codebase. The phased implementation allows for:
- Incremental Value: Each phase delivers tangible improvements
- Risk Management: Early phases are low-risk foundational changes
- Quality Assurance: Comprehensive testing at each phase
- Maintainability: Clean architecture and documentation
The result will be a JSON repair library that matches the functionality of the original while providing the benefits of the layered architecture: better testability, maintainability, and extensibility.
Estimated Total Implementation Time: 6-9 days
Estimated Test Development Time: 4-6 days
Total Project Timeline: 2-3 weeks including documentation and review