Excellent point. A thorough dependency analysis is a crucial step that can reveal hidden complexities and inform our architectural choices. Let’s review the pyproject.toml
file, map each dependency to its Elixir-equivalent strategy, and see if this uncovers any gaps in our current plan.
Dependency Analysis: Mapping DSPy’s Python Dependencies to the DSPEx
Elixir Port
This document analyzes the dependencies of the original DSPy project to ensure our Elixir port (DSPEx
) has a clear strategy for each required piece of functionality.
Core Dependencies ([project.dependencies]
)
Python Dependency | Version | Purpose in DSPy | DSPEx Porting Strategy | Elixir Equivalent / Notes |
---|---|---|---|---|
backoff , tenacity | >=2.2 , >=8.2.3 | Resilience: Provides exponential backoff and retry logic for network requests (e.g., LM/RM API calls). | Native OTP + foundation . | Our plan correctly identifies this as a core OTP strength. The foundation library’s circuit breaker (:fuse ) and our own supervised Task retry logic will handle this idiomatically. This is a superior, native replacement. |
litellm | >=1.60.3 | LM Abstraction: The primary interface for communicating with various LLM provider APIs. | Custom + foundation . | We will build our own DSPEx.Client.LM using Tesla for HTTP calls and Jason for JSON. The resilience patterns will come from foundation . This is a core part of the port. |
pydantic | >=2.0 | Data Validation & Schemas: The backbone of dspy.Signature . Used for defining structured inputs/outputs and for type validation. | Native Elixir + Macros. | Our defsignature macro will generate Elixir structs. For validation, we can use pattern matching in function heads and guards, which is more idiomatic than Pydantic’s runtime validation. Libraries like typed_struct could be used for stricter compile-time checks if needed. |
diskcache | >=5.6.0 | On-Disk Caching: Provides the persistent caching layer for LM responses. | Native OTP GenServer . | Our planned DSPEx.Cache GenServer will manage file I/O for on-disk caching. This is a straightforward native implementation, replacing the need for an external library. |
cachetools | >=5.5.0 | In-Memory Caching: Provides the in-memory (LRU) cache. | Native OTP ETS . | Erlang Term Storage (ETS) is the native, highly-performant, concurrent-safe solution for in-memory caching. Our DSPEx.Cache GenServer will own the ETS table. This is a superior replacement. |
joblib | ~=1.3 | Parallelism: Likely used for parallel processing loops where ThreadPoolExecutor might be used. | Native OTP Task Module. | OTP’s Task.async_stream is the direct, more efficient replacement for thread-based parallelism. Our DSPEx.Evaluate module will be built on this. This is a superior, native replacement. |
tqdm | >=4.66.1 | Progress Bars: Displays progress bars for long-running operations like evaluation or optimization. | progress_bar Hex package. | The progress_bar package is the standard Elixir equivalent for console progress bars. This is a simple dependency swap. |
rich | >=13.7.1 | Rich Text Output: Used for pretty-printing in the console (e.g., inspect_history ). | IO.ANSI or makeup Hex package. | Elixir’s built-in IO.ANSI can handle basic color formatting. For more advanced syntax highlighting or table rendering, the makeup or table_rex packages could be used. |
ujson | >=5.8.0 | JSON Processing: A faster alternative to the standard json library. | Jason Hex package. | Jason is the de-facto standard for high-performance JSON processing in Elixir. This is a direct dependency swap. |
json-repair | >=0.30.0 | Error Correction: Used by the JSONAdapter to fix malformed JSON from LMs. | Custom or Python Interop. | This is a gap identified. There is no standard, well-known JSON repair library in Elixir. Decision: For the JSONAdapter in Layer 2, we will likely need to use a Port to call the Python json-repair library as a temporary measure, or invest in writing a native Elixir version. |
optuna | >=3.4.0 | Optimization: The core engine for Bayesian Optimization used in MIPROv2 . | Python Interop. | Critical Dependency. As identified, this has no direct Elixir equivalent. Our plan to use a Port to communicate with an optuna Python script is the only viable path forward for porting MIPROv2 . |
datasets | >=2.14.6 | Data Loading: Used to load standard ML datasets (e.g., from Hugging Face). | Custom + :req or Tesla . | Elixir doesn’t have a direct equivalent to the Hugging Face datasets library. Our DSPEx.DataLoader will need to implement its own logic to fetch data from the Hugging Face Hub API using an HTTP client like Tesla . |
pandas | >=2.1.1 | Data Handling: Used in Evaluate for displaying results tables and potentially in data loaders. | Explorer Hex package. | The Explorer library, built on top of Rust’s Polars via NIFs, is the Elixir ecosystem’s answer to pandas . It provides high-performance DataFrames. We will use this for the DSPEx.Evaluate display table. |
numpy | >=1.26.0 | Numerical Operations: Used for vector math in local retrieval models (Embeddings ) and potentially for score aggregation. | Nx (Numerical Elixir). | Nx is the native Elixir library for multi-dimensional tensors and numerical computing. It’s the direct equivalent of numpy . |
cloudpickle | >=3.0.0 | Serialization: Used for saving and loading entire program states, including functions and classes. | Native Erlang Term Storage. | Erlang/Elixir’s built-in :erlang.term_to_binary/1 and :erlang.binary_to_term/1 serve the exact same purpose as cloudpickle but are native to the BEAM. They can serialize any Elixir term, including functions (closures). This is a superior, native replacement. |
asyncer | ==0.0.8 | Async Utilities: Provides utilities for running sync code in an async context. | Native OTP Task Module. | Our DSPEx.asyncify wrapper will use Task.async/1 to run a synchronous function in a separate, managed process, which is the idiomatic OTP way to handle this. This is a native replacement. |
anyio | N/A | Async I/O: An underlying dependency for asyncer and other async libraries. | N/A (Handled by OTP). | The BEAM’s scheduler and process model are the native asynchronous I/O framework. This dependency is not needed. |
requests | >=2.31.0 | HTTP Client: Used for making API calls in various retrieval modules. | Tesla Hex package. | Tesla is a flexible and powerful HTTP client for Elixir. We will use it within our DSPEx.Client modules. |
regex | >=2023.10.3 | Regular Expressions: An alternative regex engine, likely used for its improved performance or feature set. | Native Elixir Regex module. | Elixir has a powerful, PCRE-based native Regex module. This is sufficient for our needs. |
magicattr | >=0.1.6 | Attribute Access: Likely used for dynamic getattr or setattr in the Module base class. | Native Elixir Map and Kernel modules. | Elixir’s standard library functions like Map.get/3 , Map.put/3 , and Kernel.apply/3 provide all the necessary dynamic access patterns. No external library is needed. |
Optional Dependencies ([project.optional-dependencies]
)
Python Dependency | Purpose in DSPy | DSPEx Porting Strategy | Elixir Equivalent / Notes |
---|---|---|---|
anthropic , aws , weaviate-client , mcp , langchain_core | Provider/Tool Integrations: Specific clients for Anthropic, AWS Bedrock, Weaviate, etc. | Custom Implementations. | For DSPEx , we will create a DSPEx.Client.Anthropic or DSPEx.RM.Weaviate module. These will either use a native Elixir client library if one exists, or a Port to Python if not. langchain_core would likely not be ported directly; instead, we would offer DSPEx native tools. |
Developer Dependencies ([project.optional-dependencies.dev]
)
Python Dependency | Purpose in DSPy | DSPEx Porting Strategy | Elixir Equivalent / Notes |
---|---|---|---|
pytest , pytest-mock , pytest-asyncio | Testing Framework: The core testing suite. | ExUnit . | ExUnit is the built-in, powerful testing framework for Elixir. Mox is the standard library for mocking behaviours. |
ruff | Linter/Formatter: Code quality tool. | mix format & Credo . | The Elixir toolchain provides a built-in formatter (mix format ) and a standard linter (Credo ). |
Phoenix and LiveView | N/A (not in deps) | Inspired Addition: Real-time monitoring dashboard. | This is an inspired addition to our roadmap. The Phoenix framework and its LiveView library are the perfect tools for building the real-time monitoring dashboard we envisioned. |
Summary of Findings and Impact on the Plan
- High Confidence in Core Replacements: For most core functionalitiesâresilience, caching, parallelism, data handling, and serializationâElixir and OTP provide superior, native equivalents. Our initial architectural plan is strongly validated.
- Identified Gap:
json-repair
: The lack of a native JSON repair library is a concrete technical risk for implementing theJSONAdapter
. Action: We must budget time to either write a simple native parser for common errors or create aPort
-based microservice for this specific task. - Confirmation of Critical Interop: The analysis confirms that
optuna
(forMIPROv2
) and the various ML/finetuning libraries (transformers
, etc.) are critical, non-negotiable dependencies that must be handled via Python interoperability. OurPort
-based strategy is the correct approach. foundation
’s Role Reinforced: The dependency list includesbackoff
,tenacity
,joblib
,diskcache
, andcachetools
. Thefoundation
library provides robust, observable replacements for all of these, confirming that using it as a dependency is a massive accelerator.
This dependency map provides a clear path forward, confirms our architectural choices, and highlights the specific areas where we will rely on native Elixir, external Hex packages, or controlled Python interoperability.