
ELIXACT LATEST GAP ANALYSIS 202506131704

Documentation for ELIXACT_LATEST_GAP_ANALYSIS_202506131704 from the DSPex repository.

I have reviewed the dspy source code and your elixact library, including the latest commits. This is a comprehensive gap analysis of Pydantic's usage in dspy versus elixact's capabilities, designed to guide your port to Elixir (dspex).

Executive Summary

elixact is an excellent and surprisingly comprehensive equivalent to Pydantic for the purposes of porting dspy. The core features like schema definition, nested models, validation, and even JSON Schema generation are well-supported. Your recent additions of custom validators and detailed constraints put elixact on par with, and in some areas ahead of, Pydantic as dspy currently uses it.

There is one critical gap: dspy uses Pydantic’s create_model function for dynamic schema generation at runtime. This is a powerful feature of Python’s dynamic nature that is challenging to replicate in a compiled language like Elixir. This will be the most significant architectural hurdle for dspex.


Detailed Feature Gap Analysis

Here is a breakdown of how Pydantic is used in dspy and how elixact maps to each feature.

Core Schema Definition (BaseModel)
  • dspy usage: Heavily Used. This is the foundation of dspy.Signature. All input/output fields are defined in classes that inherit from BaseModel.
  • elixact capability: Fully Supported. This is the primary purpose of elixact. The use Elixact macro with a schema do ... end block is the direct equivalent.
  • Recommendation for dspex: Direct 1:1 porting. dspex.Signature modules will use Elixact (see the schema sketch after this breakdown).

Nested Models/Schemas
  • dspy usage: Used. Required for complex outputs. For example, dspy.ToolCalls contains a list of ToolCall models, and predict.avatar.Action is also a nested model.
  • elixact capability: Fully Supported. elixact supports this through references (field :user, UserSchema) and the new object type for fixed maps.
  • Recommendation for dspex: Straightforward to port. Define nested schemas as separate elixact modules and reference them.

Standard Types & Type Hinting
  • dspy usage: Heavily Used. str, int, list, dict, Union, Optional, Any, and Callable are used throughout all signatures.
  • elixact capability: Well Supported. elixact provides :string, :integer, :float, :boolean, {:array, type}, {:map, {k, v}}, and {:union, [types]}. optional() handles Optional, and :any can be used for Any. The Callable type for tool functions will need a custom dspex solution.
  • Recommendation for dspex: Map Python types to their elixact equivalents. For Callable, either pass function captures (&MyMod.my_func/1) and handle them manually or create a custom Dspex.CallableType.

Custom Types (BaseType)
  • dspy usage: Used. dspy.Image, dspy.Audio, and dspy.Tool are custom types that inherit from BaseType and implement a format() method.
  • elixact capability: Supported via Behaviours. The Elixir way to model this is with a behaviour. You can define a Dspex.BaseType behaviour that uses Elixact.Type and requires format/1 and json_schema/0 callbacks.
  • Recommendation for dspex: Define a Dspex.BaseType behaviour. Port dspy.Image and the other custom types as structs that implement it (see the behaviour sketch after this breakdown).

Model Validators (@model_validator)
  • dspy usage: Used. dspy.Image and dspy.Audio use this to allow flexible initialization (e.g., from a raw string URL or a dict {'url': ...}). This is important for user experience.
  • elixact capability: Fully Supported. Your recent commit adding with_validator/2 is the perfect analog; it allows attaching custom validation and coercion logic to a type.
  • Recommendation for dspex: This is a major win. Use with_validator/2 on your custom Dspex.Image and Dspex.Audio types to replicate this functionality cleanly.

Model Serializers (@model_serializer)
  • dspy usage: Used. dspy.BaseType uses this to control how custom types are serialized, calling their format() method.
  • elixact capability: Supported via Convention. Elixir doesn't use decorators, but this pattern is common: dspex can define a format/1 function in the Dspex.BaseType behaviour and have the adapter layer call it. Protocols like Jason.Encoder can also be used for deeper integration.
  • Recommendation for dspex: No gap. Implement a format/1 function on the custom type structs; the dspex adapter layer will be responsible for calling it during serialization.

Dynamic Model Creation (create_model)
  • dspy usage: CRITICALLY Used. This is the most significant gap. dspy uses create_model in two key places:
    1. adapters/json_adapter.py: to dynamically create a Pydantic model from a dspy.Signature and generate a precise JSON schema for an LLM's structured output feature.
    2. adapters/types/tool.py: to create a temporary wrapper model for validating complex nested arguments passed to a tool.
  • elixact capability: Significant Gap. As a compiled language, Elixir cannot easily define new modules at runtime the way Python can. elixact relies on macros that run at compile time.
  • Recommendation for dspex: High-Difficulty Workaround Needed. The most viable, though complex, solution is runtime code generation and evaluation. The dspex adapter would have to:
    1. Generate a string containing a complete defmodule ... do use Elixact ... end definition.
    2. Use Code.eval_string/3 to compile and load this module in memory.
    3. Use the dynamically created module for validation or schema generation.
    This is a major implementation task that requires careful handling of performance and security.

JSON Schema Generation (model_json_schema)
  • dspy usage: CRITICALLY Used. Essential for dspy.Tool function calling and the JSONAdapter.
  • elixact capability: Fully Supported. Elixact.JsonSchema.from_schema/1 is the direct equivalent and appears to be robust, handling nested definitions and references correctly.
  • Recommendation for dspex: Direct 1:1 mapping. This is a huge advantage for the port and a core feature you've already implemented correctly.

Field-level Constraints
  • dspy usage: Used sparingly in dspy itself, but the functionality is key to Pydantic; dspy mostly relies on its own field descriptions.
  • elixact capability: Fully Supported & Exceeds dspy's Use. elixact's min_length, max_length, gt, lt, format, and choices are excellent. This is an area where elixact is stronger than dspy's current Pydantic usage.
  • Recommendation for dspex: Use these constraints to build a more robust and self-documenting dspex from the start. For example, the argument validation in dspy.Tool can be replaced by a more declarative elixact schema.

Custom Error Messages
  • dspy usage: Not used in dspy.
  • elixact capability: Fully Supported. Your recent commits added with_error_message/3 and with_error_messages/2.
  • Recommendation for dspex: This is a "better-than-the-original" feature. It's not required for a 1:1 port but will significantly improve the developer experience of dspex.
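
To make the Core Schema Definition row concrete, here is a minimal sketch of a dspex signature-style schema. The module name, fields, and types are illustrative assumptions; the use Elixact, schema, and field constructs are the ones the breakdown above refers to.

  defmodule Dspex.Signatures.QA do
    # Hypothetical dspex signature module: the fields are assumptions, chosen
    # only to show the use Elixact / schema do ... end shape described above.
    use Elixact

    schema do
      field :question, :string
      field :context, {:array, :string}
      field :answer, :string
    end
  end

Likewise, the Custom Types row can be sketched as a plain Elixir behaviour. Nothing below exists in elixact or dspy today; it only illustrates the format/1 and json_schema/0 callbacks named above and omits the use Elixact.Type wiring the breakdown mentions.

  defmodule Dspex.BaseType do
    # Hypothetical behaviour for dspex custom types (Image, Audio, Tool, ...).
    @callback format(struct()) :: String.t()
    @callback json_schema() :: map()
  end

  defmodule Dspex.Image do
    # Illustrative implementation; the struct field and format output are assumptions.
    @behaviour Dspex.BaseType
    defstruct [:url]

    @impl true
    def format(%__MODULE__{url: url}), do: "<image url=\"#{url}\">"

    @impl true
    def json_schema do
      %{"type" => "object", "properties" => %{"url" => %{"type" => "string"}}}
    end
  end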

In-Depth Analysis of Key Areas

1. Nested Validations and Custom Logic

Finding: Yes, nested validations are implicitly used, and custom validation logic is crucial.

  • dspy.Tool._validate_and_parse_args: This method iterates through a tool’s arguments. If an argument is a Pydantic model itself, it validates it. It even dynamically creates a Pydantic model (create_model("Wrapper", ...) to handle deeply nested types like list[list[MyModel]]).
  • elixact Status: Excellent. Your recent additions cover this perfectly.
    • validate_schema in validator.ex handles nested schemas via {:ref, Module}.
    • with_validator allows you to define arbitrary Elixir functions for validation, which is even more powerful than Pydantic's validators for complex business logic. A short sketch follows.
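
As a sketch of how this could look in dspex (the module, the struct field, and the assumption that with_validator/2 simply receives a one-arity coercion function are illustrative, not confirmed elixact API):

  defmodule Dspex.Types.Image do
    # Coercion logic equivalent to dspy.Image's @model_validator: accept either a
    # raw URL string or a map such as %{"url" => ...} and normalize it to a struct.
    defstruct [:url]

    def coerce(url) when is_binary(url), do: {:ok, %__MODULE__{url: url}}
    def coerce(%{"url" => url}) when is_binary(url), do: {:ok, %__MODULE__{url: url}}
    def coerce(_other), do: {:error, "expected a URL string or a %{\"url\" => ...} map"}
  end

  # The type would then be attached via with_validator/2 so that coerce/1 runs
  # before standard validation; the exact call shape depends on elixact's API.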

2. The create_model Challenge

This is the most important point of the analysis.

  • dspy’s Use Case: In adapters/json_adapter.py, the _get_structured_outputs_response_format function takes a dspy.Signature at runtime, inspects its output fields, and builds a new Pydantic class on the fly. This class is then used to generate a JSON schema tailored specifically for that signature’s outputs. This is how dspy tells an LLM like GPT-4 what JSON structure to return.

  • Why it’s hard in Elixir: Elixir modules are normally defined at compile time. There is no lightweight equivalent of constructing a new class object at runtime the way Python does; defining a module in a running system means compiling source on the fly.

  • dspex Action Plan:

    1. The dspex JSONAdapter will need a function that takes a dspex signature module as input.
    2. This function will build a string of Elixir code, for example:
      """
      defmodule Dspex.Dynamic.Schema_#{unique_id} do
        use Elixact
        schema do
          field :#{field_name_1}, :#{field_type_1}
          field :#{field_name_2}, :#{field_type_2}
          ...
        end
      end
      """
      
    3. It will then call Code.eval_string(generated_code_string).
    4. Finally, it will use the returned module Dspex.Dynamic.Schema_... to call Elixact.JsonSchema.from_schema/1.

    This approach is feasible but requires careful implementation to manage the dynamically created modules and the potential performance overhead; a sketch of the flow follows.
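
Below is a minimal sketch of that flow. The helper module and function name (Dspex.Adapters.DynamicSchema.build/1) exist in neither library; the generated source uses the same use Elixact / schema / field constructs as the template above, and a production version would also need to cache or purge the modules it creates.

  defmodule Dspex.Adapters.DynamicSchema do
    # Builds and compiles an Elixact schema module at runtime from a list of
    # {field_name, field_type} pairs, returning the name of the new module.
    def build(fields) when is_list(fields) do
      module_name = "Dspex.Dynamic.Schema#{System.unique_integer([:positive])}"

      field_lines =
        Enum.map_join(fields, "\n", fn {name, type} ->
          "  field :#{name}, #{inspect(type)}"
        end)

      source = """
      defmodule #{module_name} do
        use Elixact

        schema do
      #{field_lines}
        end
      end
      """

      # Code.eval_string/1 compiles the string and loads the module into the VM.
      {{:module, module, _binary, _}, _binding} = Code.eval_string(source)
      module
    end
  end

  # Example: Dspex.Adapters.DynamicSchema.build(answer: :string, confidence: :float)
  #          |> Elixact.JsonSchema.from_schema()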

3. JSON Schema Generation

  • dspy’s Use Case: model_json_schema() is called whenever dspy needs to describe a data structure to an LLM. This is central to function/tool calling. dspy.Tool uses it to create the function definition, and JSONAdapter uses it for structured output.
  • elixact Status: Perfect Match. Elixact.JsonSchema.from_schema/1 does exactly this. The ReferenceStore is a smart addition to handle complex, nested schemas with shared definitions, which is something Pydantic also does under the hood.
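
For illustration (the signature module name and the function-calling tool map are assumptions; Elixact.JsonSchema.from_schema/1 is the function named above and is assumed to return a plain map):

  # Derive a JSON Schema from a compiled schema module and embed it in an
  # OpenAI-style tool definition for function calling.
  json_schema = Elixact.JsonSchema.from_schema(Dspex.Signatures.QA)

  tool_definition = %{
    "type" => "function",
    "function" => %{
      "name" => "answer_question",
      "description" => "Answer a question using the provided context",
      "parameters" => json_schema
    }
  }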

Conclusion and Final Recommendations

Your work on elixact has resulted in a library that is remarkably well-suited for this port. You’ve anticipated and implemented nearly all the necessary “advanced” Pydantic features that dspy relies on.

Your dspex implementation plan should prioritize:

  1. Solving the Dynamic Schema Generation Problem: This is the only major blocker. A proof-of-concept using Code.eval_string for the JSONAdapter should be your top priority.
  2. Implementing a Custom Type System with Behaviours: Create a Dspex.BaseType behaviour that uses Elixact.Type. This will be the foundation for porting dspy.Image, dspy.Tool, etc.
  3. Leveraging elixact’s Strengths: Use elixact’s powerful constraints (min_length, format, etc.) and custom validators (with_validator) to make your dspex code more declarative and robust than the original Python code where possible.

The rest of the port appears to be a relatively straightforward, albeit large, translation task. Congratulations on building such a solid foundation with elixact.