QUADRATIC PERFORMANCE ISSUE
Problem Statement
JsonRemedy exhibits quadratic time complexity for non-trivial input sizes, making it unusable for real-world applications. Performance degrades catastrophically as input size increases:
- 10 objects (0.4 KB): 1.7ms - acceptable
- 100 objects (4.2 KB): 150ms - slow but usable
- 1000 objects (44 KB): 15+ seconds - UNUSABLE
This growth matches quadratic O(n²) complexity, where doubling the input size roughly quadruples the processing time, instead of the linear O(n) behavior expected of a single-pass text processor.
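As a sanity check, extrapolating the 10-object measurement with a quadratic model reproduces the larger measurements closely:

# t(n) ≈ t(10) * (n / 10)², using the measured 1.7ms baseline for 10 objects
base_ms = 1.7
IO.puts("predicted for 100 objects:  #{base_ms * :math.pow(100 / 10, 2)} ms")   # ≈ 170 ms (measured: ~150 ms)
IO.puts("predicted for 1000 objects: #{base_ms * :math.pow(1000 / 10, 2)} ms")  # ≈ 17,000 ms (measured: 15+ s)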
Impact Assessment
Current State
- ❌ Unusable for production: Any JSON > 10KB becomes prohibitively slow
- ❌ No streaming capability: Cannot process large files incrementally
- ❌ Memory inefficient: Likely creating multiple string copies
- ❌ Blocks user experience: 15+ second delays for moderate files
Expected Performance
A properly optimized JSON processor should handle:
- 1MB JSON files: < 100ms
- 10MB JSON files: < 1 second
- Streaming processing: Constant memory usage regardless of file size
Root Cause Investigation Process
Phase 1: Performance Profiling
# 1. Create minimal reproduction case
mix run scripts/perf_repro.exs
# 2. Profile with :fprof
mix run scripts/profile_layers.exs
# 3. Memory analysis with :observer
mix run scripts/memory_analysis.exs
Phase 2: Layer-by-Layer Analysis
Suspected Culprits (in priority order):
Layer 3: Syntax Normalization - Most Complex
- Character-by-character parsing with state tracking
- Multiple regex passes over the same content
- String concatenation in loops (potential O(n²))
- Context stack operations
Layer 1: Content Cleaning - Regex Heavy
- Multiple regex passes
- String replacements creating new strings
- Potential backtracking in complex patterns
Layer 2: Structural Repair - State Machine
- Character-by-character iteration
- Stack operations for nesting tracking
- String building operations
Phase 3: Specific Investigations
A. String Building Analysis
# Check if we're using inefficient string concatenation
# BAD: result = result <> new_char # O(n²)
# GOOD: Use IO lists and IO.iodata_to_binary/1 # O(n)
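A quick way to test this hypothesis is a micro-benchmark comparing the two accumulation strategies (a standalone sketch, not part of the codebase):

defmodule StringBuildCheck do
  # Appends to a binary on every step; whether this degrades to O(n²) depends on
  # whether the BEAM's binary-append optimization applies, so measure rather than assume
  def concat(chars), do: Enum.reduce(chars, "", fn c, acc -> acc <> c end)

  # Accumulates an iodata tree and flattens it once at the end: O(n)
  def iodata(chars) do
    chars
    |> Enum.reduce([], fn c, acc -> [acc, c] end)
    |> IO.iodata_to_binary()
  end
end

chars = List.duplicate("a", 200_000)
{t1, _} = :timer.tc(fn -> StringBuildCheck.concat(chars) end)
{t2, _} = :timer.tc(fn -> StringBuildCheck.iodata(chars) end)
IO.puts("concat: #{t1}μs, iodata: #{t2}μs")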
B. Regex Performance
# Profile regex operations
:timer.tc(fn -> Regex.run(pattern, large_string) end)
# Check for catastrophic backtracking
# Test with pathological inputs
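For example, time a regex pass against growing inputs and watch for superlinear growth; the pattern below is only a stand-in for the ones Layer 1 and Layer 3 actually use:

pattern = ~r/'([^']*)'/   # stand-in single-quote pattern, not the real one

for n <- [10_000, 20_000, 40_000, 80_000] do
  input = String.duplicate("{'key': 'value'}, ", n)
  {time, _} = :timer.tc(fn -> Regex.replace(pattern, input, ~s|"\\1"|) end)
  IO.puts("#{n} repetitions: #{time}μs")
end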
C. Memory Allocation Patterns
# Track memory growth during processing
# Compare :erlang.memory(:total) before and after each layer
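A minimal helper for that measurement (the commented-out call uses the process/2 shape from the benchmark script in Phase 4; GC makes single readings noisy, so repeat the measurement):

measure_memory = fn label, fun ->
  before_bytes = :erlang.memory(:total)
  {time, result} = :timer.tc(fun)
  delta = :erlang.memory(:total) - before_bytes
  IO.puts("#{label}: #{time}μs, VM memory delta: #{delta} bytes")
  result
end

# Example (assumes `json` holds the input string):
# measure_memory.("Layer 3", fn ->
#   JsonRemedy.Layer3.SyntaxNormalization.process(json, %{repairs: [], options: []})
# end)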
D. Algorithm Complexity
# Measure processing time vs input size
# Should be linear O(n), not quadratic O(n²)
sizes = [100, 200, 400, 800, 1600]
times = Enum.map(sizes, &measure_processing_time/1)
# Plot and analyze growth pattern
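A concrete version of that measurement, with a hypothetical measure_processing_time/1 helper and a doubling-ratio check in place of plotting (ratios near 2 suggest O(n), ratios near 4 suggest O(n²)):

defmodule ComplexityCheck do
  def measure_processing_time(num_objects) do
    objects = for i <- 1..num_objects, do: ~s|{id: #{i}, name: 'Item #{i}'}|
    json = "[" <> Enum.join(objects, ", ") <> "]"
    {time, _} = :timer.tc(fn -> JsonRemedy.repair(json) end)
    time
  end

  def run do
    sizes = [100, 200, 400, 800, 1600]
    times = Enum.map(sizes, &measure_processing_time/1)

    times
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.each(fn [a, b] -> IO.puts("doubling ratio: #{Float.round(b / a, 2)}") end)
  end
end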
Phase 4: Benchmarking Framework
Create comprehensive benchmarks:
# scripts/comprehensive_benchmark.exs
defmodule PerformanceBenchmark do
  def run_scaling_test do
    sizes = [10, 50, 100, 500, 1000, 5000]

    for size <- sizes do
      input = generate_test_json(size)

      {time, _result} = :timer.tc(fn ->
        JsonRemedy.repair(input)
      end)

      IO.puts("Size: #{size} objects, Time: #{time}μs")
    end
  end

  def profile_layers(input) do
    # Each layer is timed in isolation against the same raw input
    context = %{repairs: [], options: []}

    layers = [
      JsonRemedy.Layer1.ContentCleaning,
      JsonRemedy.Layer2.StructuralRepair,
      JsonRemedy.Layer3.SyntaxNormalization,
      JsonRemedy.Layer4.Validation
    ]

    for layer <- layers do
      {time, _result} = :timer.tc(fn ->
        layer.process(input, context)
      end)

      IO.puts("#{inspect(layer)}: #{time}μs")
    end
  end

  # Builds malformed JSON with unquoted keys, single quotes, and Python-style booleans
  defp generate_test_json(num_objects) do
    objects = for i <- 1..num_objects, do: ~s|{id: #{i}, name: 'Item #{i}', active: True}|
    "[" <> Enum.join(objects, ", ") <> "]"
  end
end
Likely Fixes
Immediate Optimizations
Replace String Concatenation with IO Lists
# Instead of:
result = result <> char
# Use:
iolist = [iolist, char]
# Then, once at the end:
IO.iodata_to_binary(iolist)
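Inside a character-processing loop the same idea looks roughly like this (illustrative names, not the actual Layer 3 functions):

defmodule IodataLoopSketch do
  def normalize(input), do: do_normalize(input, [])

  # Transformation rules omitted; the point is that appending to the iodata
  # accumulator is O(1) and the binary is materialized once at the end
  defp do_normalize(<<char::utf8, rest::binary>>, acc),
    do: do_normalize(rest, [acc, <<char::utf8>>])

  defp do_normalize(<<>>, acc), do: IO.iodata_to_binary(acc)
end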
Eliminate Multiple Passes
# Instead of multiple regex operations:
input |> fix_quotes() |> fix_commas() |> fix_booleans()
# Use single-pass character processing:
single_pass_normalization(input)
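A sketch of what a combined pass could look like, folding the quote and boolean fixes into one traversal (string-context tracking, which a real implementation needs, is left out for brevity):

defmodule SinglePassSketch do
  def normalize(input), do: do_normalize(input, [])

  # NOTE: a real pass must track whether it is inside a string literal so that
  # literal "True"/"False" text inside values is left alone
  defp do_normalize(<<"True", rest::binary>>, acc), do: do_normalize(rest, [acc, "true"])
  defp do_normalize(<<"False", rest::binary>>, acc), do: do_normalize(rest, [acc, "false"])
  defp do_normalize(<<?', rest::binary>>, acc), do: do_normalize(rest, [acc, ?"])
  defp do_normalize(<<c::utf8, rest::binary>>, acc), do: do_normalize(rest, [acc, <<c::utf8>>])
  defp do_normalize(<<>>, acc), do: IO.iodata_to_binary(acc)
end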
Add Early Exit Conditions
# Skip expensive processing for already-valid JSON
case Jason.decode(input) do
  {:ok, _} -> input                      # Fast path
  {:error, _} -> run_repair_pipeline(input)
end
Architectural Changes
Streaming Architecture
- Process JSON in chunks
- Maintain parsing state between chunks
- Constant memory usage
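A minimal sketch of that shape; repair_chunk/2 and the state map are hypothetical, invented only to illustrate the chunked, constant-memory flow:

defmodule StreamingSketch do
  def repair_file(in_path, out_path) do
    initial_state = %{stack: [], in_string?: false}

    in_path
    |> File.stream!([], 64 * 1024)                       # fixed-size binary chunks
    |> Stream.transform(initial_state, fn chunk, state ->
      {repaired, new_state} = repair_chunk(chunk, state)
      {[repaired], new_state}
    end)
    |> Stream.into(File.stream!(out_path))
    |> Stream.run()
  end

  # Placeholder: a real version would resume the Layer 2/3 state machines from
  # `state` instead of treating each chunk independently
  defp repair_chunk(chunk, state), do: {chunk, state}
end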
Incremental Processing
- Parse and validate as we repair
- Stop processing when valid JSON is achieved
- Layer-specific early exits
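Sketched below with the layer modules from the Phase 4 snippet; the {:ok, output, context} return shape is an assumption, not the documented layer contract:

defmodule IncrementalSketch do
  @layers [
    JsonRemedy.Layer1.ContentCleaning,
    JsonRemedy.Layer2.StructuralRepair,
    JsonRemedy.Layer3.SyntaxNormalization
  ]

  def repair(input) do
    Enum.reduce_while(@layers, input, fn layer, current ->
      case Jason.decode(current) do
        {:ok, _} ->
          {:halt, current}                               # already valid: stop early

        {:error, _} ->
          # assumed return shape, for illustration only
          {:ok, repaired, _context} = layer.process(current, %{repairs: [], options: []})
          {:cont, repaired}
      end
    end)
  end
end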
Smarter Layer Selection
- Analyze input to determine which layers are needed
- Skip unnecessary processing
- Use heuristics to predict complexity
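One possible shape for such heuristics (the predicates and thresholds are illustrative assumptions, not tuned values):

defmodule LayerSelectionSketch do
  def needed_layers(input) do
    [
      {JsonRemedy.Layer1.ContentCleaning, String.contains?(input, ["```", "//", "/*"])},
      {JsonRemedy.Layer2.StructuralRepair, unbalanced_delimiters?(input)},
      {JsonRemedy.Layer3.SyntaxNormalization, String.contains?(input, ["'", "True", "False", "None"])}
    ]
    |> Enum.filter(fn {_layer, needed?} -> needed? end)
    |> Enum.map(fn {layer, _needed?} -> layer end)
  end

  # Rough single-pass count of opening vs closing delimiters
  defp unbalanced_delimiters?(input) do
    {open, close} =
      for <<c <- input>>, reduce: {0, 0} do
        {o, cl} ->
          case c do
            ?{ -> {o + 1, cl}
            ?[ -> {o + 1, cl}
            ?} -> {o, cl + 1}
            ?] -> {o, cl + 1}
            _ -> {o, cl}
          end
      end

    open != close
  end
end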
Investigation Scripts
scripts/perf_repro.exs
# Minimal script to reproduce the quadratic behavior
defmodule PerfRepro do
  def run do
    sizes = [10, 100, 1000]

    for size <- sizes do
      json = create_malformed_json(size)

      {time, _} = :timer.tc(fn ->
        JsonRemedy.repair(json)
      end)

      # time is in microseconds; convert to KB/s throughput
      rate = byte_size(json) * 1_000_000 / time / 1024

      IO.puts("Size: #{size} objects (#{Float.round(byte_size(json) / 1024, 1)} KB)")
      IO.puts("Time: #{Float.round(time / 1000, 1)}ms")
      IO.puts("Rate: #{Float.round(rate, 1)} KB/s")
      IO.puts("---")
    end
  end

  defp create_malformed_json(num_objects) do
    objects =
      for i <- 1..num_objects do
        ~s|{id: #{i}, name: 'Item #{i}', active: True, data: [1, 2, 3,]}|
      end

    "[" <> Enum.join(objects, ", ") <> "]"
  end
end

PerfRepro.run()
scripts/profile_layers.exs
# Profile individual layers to identify bottleneck
# Use :fprof for detailed function-level profiling
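A sketch of what that script could contain; the fixture path is hypothetical:

# Trace a single repair call, then write an analysis sorted by own time
input = File.read!("test/fixtures/large_malformed.json")   # hypothetical fixture

:fprof.apply(&JsonRemedy.repair/1, [input])
:fprof.profile()
:fprof.analyse(dest: ~c"fprof.analysis", sort: :own)

IO.puts("Wrote fprof.analysis; look for functions with large OWN time")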
Success Criteria
Fix is successful when:
- ✅ Linear complexity: O(n) processing time
- ✅ Reasonable throughput: > 1MB/second for malformed JSON
- ✅ Memory efficiency: < 2x input size memory usage
- ✅ Large file support: Handle 10MB+ files in < 10 seconds
- ✅ Streaming capability: Process arbitrarily large files
Timeline
- Week 1: Investigation and profiling
- Week 2: Implement string building optimizations
- Week 3: Single-pass processing architecture
- Week 4: Streaming support and final optimization
Status
- Performance profiling completed
- Root cause identified
- Optimization strategy defined
- Implementation started
- Benchmarks passing
- Ready for production use
This quadratic performance issue makes JsonRemedy unsuitable for production use and must be resolved before any real-world deployment.