
YAML/JSON Conversion Feasibility Study for Pipeline_ex

Executive Summary

Converting between YAML and JSON in Pipeline_ex is highly feasible with a hybrid build/buy approach:

  • Buy: Use existing libraries (yaml_elixir for parsing, ymlr for encoding, jason for JSON)
  • Build: Create a thin conversion layer that handles pipeline-specific concerns

This approach supports the goal of using YAML for human readability while internally working with JSON for LLM interactions.
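
To make the duality concrete, here is the same minimal configuration in both formats (the field names are illustrative, not the actual Pipeline_ex schema):

# YAML: edited by humans
workflow:
  name: summarize
  steps:
    - type: llm
      prompt: "Summarize {{input}}"

The equivalent JSON, as it would be handed to an LLM:

{"workflow":{"name":"summarize","steps":[{"type":"llm","prompt":"Summarize {{input}}"}]}}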

Current State Analysis

What We Have

  • YAML Parsing: ✅ yaml_elixir v2.11 (already in use)
  • JSON Handling: ✅ jason (already in use)
  • YAML Encoding: ❌ Not currently available

What We Need

  • YAML encoding capability
  • Bidirectional conversion utilities
  • Validation after conversion
  • Format-specific optimizations

Build vs Buy Analysis

Option 1: Pure “Buy” - Use Existing Libraries

Libraries to Add:

# mix.exs
defp deps do
  [
    {:ymlr, "~> 5.1"},  # YAML encoding
    # existing: {:yaml_elixir, "~> 2.11"}
    # existing: {:jason, "~> 1.4"}
  ]
end
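
With only these libraries, a full round trip is a few lines. This sketch calls the libraries' public APIs directly, with no pipeline-specific handling:

# YAML -> map -> JSON -> map -> YAML, with no normalization or validation
{:ok, data} = YamlElixir.read_from_string(yaml_string)
json = Jason.encode!(data)
{:ok, decoded} = Jason.decode(json)
yaml = Ymlr.document!(decoded)  # returns a "---\n"-prefixed YAML document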

Pros:

  • Minimal development effort
  • Well-tested libraries
  • Community support

Cons:

  • No pipeline-specific optimizations
  • Manual handling of conversion edge cases
  • No integrated validation

Option 2: Pure “Build” - Custom Implementation

Pros:

  • Complete control over conversion
  • Pipeline-specific optimizations
  • Integrated validation

Cons:

  • Significant development effort
  • Maintenance burden
  • Reinventing the wheel
  • YAML spec complexity

Option 3: Hybrid - Buy Libraries, Build a Thin Layer (Recommended)

This is the approach from the executive summary: let yaml_elixir, ymlr, and jason do the heavy lifting, and build only a thin layer for pipeline-specific concerns.

Implementation:

defmodule Pipeline.Format.Converter do
  @moduledoc """
  Handles YAML<->JSON conversion for pipeline configurations.
  Preserves pipeline-specific semantics during conversion.
  """
  
  alias Pipeline.Validation.Schemas.WorkflowSchema
  
  @doc """
  Convert YAML string to JSON with validation
  """
  def yaml_to_json(yaml_string, opts \\ []) do
    with {:ok, data} <- parse_yaml(yaml_string),
         {:ok, normalized} <- normalize_pipeline_data(data),
         {:ok, validated} <- validate_if_requested(normalized, opts),
         {:ok, json} <- encode_json(validated, opts) do
      {:ok, json}
    end
  end
  
  @doc """
  Convert JSON string to YAML with validation
  """
  def json_to_yaml(json_string, opts \\ []) do
    with {:ok, data} <- Jason.decode(json_string),
         {:ok, normalized} <- normalize_pipeline_data(data),
         {:ok, validated} <- validate_if_requested(normalized, opts),
         yaml <- encode_yaml(validated, opts) do
      {:ok, yaml}
    end
  end
  
  # Private helpers are elided: parse_yaml/1, encode_json/2, and
  # validate_if_requested/2 return {:ok, _}/{:error, _} tuples, while
  # encode_yaml/2 returns the YAML string directly.
  defp normalize_pipeline_data(data) do
    normalized =
      data
      |> ensure_string_keys()
      |> normalize_step_types()
      |> handle_null_values()
      |> preserve_numeric_types()

    {:ok, normalized}
  end
end
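
Intended usage of this layer (the validate option is part of this design sketch, not an existing API):

{:ok, json} = Pipeline.Format.Converter.yaml_to_json(File.read!("workflow.yaml"), validate: true)
{:ok, yaml} = Pipeline.Format.Converter.json_to_yaml(json)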

Implementation Design

1. Core Conversion Module

defmodule Pipeline.Format do
  defmodule Converter do
    # YAML <-> JSON conversion
  end
  
  defmodule Normalizer do
    # Pipeline-specific data normalization
  end
  
  defmodule Validator do
    # Format validation using Exdantic
  end
  
  defmodule Cache do
    # Cache converted formats for performance
  end
end

2. Pipeline-Specific Handling

Features to Preserve:

  • Step type consistency
  • Prompt template structures
  • Variable references ({{var}})
  • Function definitions
  • Conditional expressions

Features to Normalize:

  • Key format (string vs atom; see the sketch after this list)
  • Null/nil representation
  • Number types (int vs float)
  • Boolean values
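
As a sketch of one such normalizer, a hypothetical ensure_string_keys/1 could recursively stringify map keys so that atom-keyed and string-keyed inputs convert identically (this helper is not in the existing codebase):

defp ensure_string_keys(map) when is_map(map) do
  # to_string/1 handles both atom and string keys
  Map.new(map, fn {key, value} -> {to_string(key), ensure_string_keys(value)} end)
end

defp ensure_string_keys(list) when is_list(list), do: Enum.map(list, &ensure_string_keys/1)
defp ensure_string_keys(other), do: other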

3. LLM Integration Optimizations

defmodule Pipeline.Format.LLMOptimizer do
  @doc """
  Optimize JSON for LLM consumption
  """
  def optimize_for_llm(json_data) do
    json_data
    |> remove_null_values()
    |> flatten_single_item_arrays()
    |> simplify_boolean_fields()
    |> add_schema_hints()
  end
  
  @doc """
  Prepare YAML for human editing
  """
  def optimize_for_human(yaml_data) do
    yaml_data
    |> add_helpful_comments()
    |> use_readable_multiline_strings()
    |> group_related_fields()
    |> sort_keys_logically()
  end
end
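
The helpers above are named but not defined. As one example, a hypothetical remove_null_values/1 could prune nils recursively so the LLM never sees noise keys:

defp remove_null_values(map) when is_map(map) do
  map
  |> Enum.reject(fn {_key, value} -> is_nil(value) end)
  |> Map.new(fn {key, value} -> {key, remove_null_values(value)} end)
end

defp remove_null_values(list) when is_list(list), do: Enum.map(list, &remove_null_values/1)
defp remove_null_values(other), do: other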

4. Validation Integration

defmodule Pipeline.Format.ValidatedConverter do
  def yaml_to_validated_json(yaml_string) do
    # yaml_to_json returns an encoded JSON string, so decode it back to a map
    # before schema validation rather than validating the raw string
    with {:ok, json} <- Converter.yaml_to_json(yaml_string),
         {:ok, data} <- Jason.decode(json),
         {:ok, validated} <- WorkflowSchema.validate(data) do
      {:ok, Jason.encode!(validated)}
    end
  end
end

Migration Strategy

Phase 1: Foundation (Week 1)

  1. Add ymlr dependency
  2. Create basic converter module
  3. Add conversion tests
  4. Document conversion limitations

Phase 2: Integration (Week 2)

  1. Integrate with existing pipeline loading
  2. Add format detection (see the sketch after this list)
  3. Create CLI conversion tools
  4. Add performance benchmarks
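
Format detection can usually be done by file extension, with content sniffing as a fallback; this is a sketch under those assumptions, not existing code:

def detect_format(path, content) do
  case Path.extname(path) do
    ext when ext in [".yaml", ".yml"] -> :yaml
    ".json" -> :json
    _ ->
      # Fall back to sniffing: JSON documents start with "{" or "["
      if String.starts_with?(String.trim_leading(content), ["{", "["]),
        do: :json,
        else: :yaml
  end
end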

Phase 3: Optimization (Week 3)

  1. Add caching layer
  2. Implement LLM optimizations
  3. Add streaming for large files
  4. Create format migration tools

Phase 4: Tooling (Week 4)

  1. VS Code extension support
  2. Format validation commands
  3. Batch conversion utilities
  4. Documentation generation

Technical Considerations

1. Security

  • Never use atoms: true with untrusted input (illustrated after this list)
  • Validate data structure before conversion
  • Sanitize file paths and names
  • Limit file sizes for conversion
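
On the first point: yaml_elixir keeps keys as strings by default, which is the safe behavior, because atoms are never garbage-collected and a malicious document could exhaust the atom table:

# Safe: default parsing keeps keys as strings
{:ok, data} = YamlElixir.read_from_string(untrusted_yaml)

# Unsafe on untrusted input: may create unbounded atoms
# {:ok, data} = YamlElixir.read_from_string(untrusted_yaml, atoms: true)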

2. Performance

  • Cache frequently converted formats (see the sketch after this list)
  • Use streaming for large files (>10MB)
  • Batch conversions when possible
  • Consider background processing
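
A minimal cache can key on a hash of the source text; this ETS-based sketch is illustrative only:

defmodule Pipeline.Format.Cache do
  @table :pipeline_format_cache

  def start do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
  end

  # Returns the cached result for this source, computing and storing it on a miss
  def fetch(source, fun) when is_binary(source) and is_function(fun, 0) do
    key = :crypto.hash(:sha256, source)

    case :ets.lookup(@table, key) do
      [{^key, result}] ->
        result

      [] ->
        result = fun.()
        :ets.insert(@table, {key, result})
        result
    end
  end
end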

3. Compatibility

  • Document YAML features that don’t survive conversion (comments, anchors/aliases, multi-document streams)
  • Provide migration guides
  • Support both formats in APIs
  • Version converted formats

4. Error Handling

def handle_conversion_error(error, context) do
  case error do
    {:yaml_parse_error, reason} ->
      suggest_yaml_fixes(reason, context)

    {:json_encode_error, reason} ->
      identify_problematic_data(reason, context)

    {:validation_error, errors} ->
      format_validation_errors(errors, context)

    other ->
      # Catch-all prevents a CaseClauseError on unexpected error shapes
      {:error, {:unhandled_conversion_error, other}}
  end
end

Cost-Benefit Analysis

Benefits

  1. Developer Experience: YAML for humans, JSON for machines
  2. LLM Integration: Optimal format for AI interactions
  3. Tooling Support: Better IDE integration with JSON Schema
  4. Flexibility: Support multiple input formats
  5. Performance: Cached conversions, optimized formats

Costs

  1. Development Time: ~1 week for the core conversion layer; ~4 weeks for the full migration plan above
  2. Dependencies: One additional dependency (ymlr)
  3. Complexity: Additional conversion layer
  4. Testing: Need comprehensive conversion tests

Recommendation

Implement the hybrid approach with these priorities:

  1. Immediate: Add ymlr and basic conversion utilities
  2. Short-term: Integrate with pipeline validation using Exdantic
  3. Medium-term: Add LLM optimizations and caching
  4. Long-term: Build comprehensive tooling ecosystem

This approach provides the best balance of:

  • Quick implementation using proven libraries
  • Pipeline-specific optimizations where needed
  • Future flexibility for format evolution
  • Solid foundation for LLM integration

The investment is justified by improved developer experience and better LLM integration capabilities.