Here is a comprehensive inspection of the DSPy codebase, identifying its decomposable procedures and their inherent side effects, together with a thorough analysis of its concurrency implementation.
Summary of Analysis
The DSPy framework is designed around a modular architecture where different components (Predictors, Adapters, LMs, Retrievers, Optimizers) are composed to build complex AI systems. This compositionality inherently defines the primary decomposable procedures.
Key Findings:
Decomposable Procedures: The core procedures in DSPy revolve around the “program execution pipeline” (a toy sketch of the inner steps follows this list):

- Prompt Formatting (`Adapter.format`): Translates structured inputs and few-shot examples into a string or message list for an LM.
- LM Interaction (`LM.__call__`, `LM.forward`): Sends the formatted prompt to an LLM provider and receives a response. This is a primary source of I/O.
- Response Parsing (`Adapter.parse`): Translates the raw LM response back into a structured `dspy.Prediction` object.
- Optimization (`Teleprompter.compile`): An outer-loop procedure that repeatedly executes a program with different configurations (prompts, demos, weights) to find an optimal version based on a metric.
- Evaluation (`Evaluate.__call__`): A procedure for systematically running a program over a dataset to calculate a performance metric.
- State Management (`Module.save`/`load`): Procedures for serializing and deserializing program state, including learned parameters like prompts and few-shot examples.
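To make the pipeline concrete, here is a minimal, illustrative sketch of the three inner steps (format, LM call, parse). The class and method names mirror DSPy's, but the bodies are toy stand-ins, not the library's code:

```python
class ToyAdapter:
    """Illustrative stand-in for a DSPy Adapter."""

    def format(self, question: str) -> list[dict]:
        # Prompt Formatting: structured input -> message list for the LM.
        return [{"role": "user", "content": f"Q: {question}\nA:"}]

    def parse(self, raw: str) -> dict:
        # Response Parsing: raw LM text -> structured prediction.
        return {"answer": raw.strip()}


def toy_lm(messages: list[dict]) -> str:
    # LM Interaction: in DSPy this is a network call to a provider
    # (the primary source of I/O); here it is a canned response.
    return "42"


def run_pipeline(question: str) -> dict:
    adapter = ToyAdapter()
    messages = adapter.format(question)  # 1. format
    raw = toy_lm(messages)               # 2. LM call (network I/O in DSPy)
    return adapter.parse(raw)            # 3. parse


print(run_pipeline("What is 6 * 7?"))  # {'answer': '42'}
```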
Inherent Side Effects: Most procedures in DSPy have significant side effects, which is expected for a framework that interacts with external models and the local filesystem.

- Network I/O: The most common side effect is making API calls to external Language Model (LM) and Retrieval Model (RM) services.
- Filesystem I/O: Caching (`dspy.cache`) and model state persistence (`dspy.save`, `dspy.load`) are major sources of filesystem interaction. Finetuning processes also write data to disk.
- Global State Modification: The framework heavily relies on a global, thread-aware `dspy.settings` object for configuration. This object is read and modified by many components; a sketch of the thread-aware pattern follows this list. A global history of LM calls is also maintained.
- Concurrency: Spawning threads or async tasks for parallel execution, finetuning, and batching is a key side effect for performance-oriented components.
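The thread-aware settings pattern can be illustrated with a small sketch. This is not DSPy's `Settings` class (which also uses a `threading.Lock` and supports async contexts), just the core idea of global defaults plus thread-local, context-scoped overrides:

```python
import threading
from contextlib import contextmanager


class ThreadAwareSettings:
    """Toy sketch: global defaults plus per-thread overrides."""

    def __init__(self, **defaults):
        self._global = dict(defaults)
        self._local = threading.local()

    def __getattr__(self, name):
        # Overrides set in the current thread win over global defaults.
        overrides = getattr(self._local, "overrides", {})
        if name in overrides:
            return overrides[name]
        try:
            return self._global[name]
        except KeyError:
            raise AttributeError(name) from None

    @contextmanager
    def context(self, **overrides):
        # Overrides are visible only in the current thread, then restored.
        previous = getattr(self._local, "overrides", {})
        self._local.overrides = {**previous, **overrides}
        try:
            yield
        finally:
            self._local.overrides = previous


settings = ThreadAwareSettings(lm="global-lm")
with settings.context(lm="override-lm"):
    print(settings.lm)  # override-lm (this thread only)
    t = threading.Thread(target=lambda: print(settings.lm))  # global-lm
    t.start()
    t.join()
print(settings.lm)  # global-lm (override restored)
```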
Concurrency Implementation: DSPy employs a hybrid concurrency model to handle both I/O-bound and CPU-bound tasks efficiently.

- Asynchronous I/O (`async`/`await`): Core components like `LM`, `Adapter`, and `Module` have `acall` methods for non-blocking execution, primarily for LM and RM network calls. This is powered by `asyncio` and `anyio`.
- Thread-based Parallelism (`threading`): For parallelizing evaluations and optimization steps across a dataset, DSPy uses a custom `ParallelExecutor` which manages a `ThreadPoolExecutor`. This is ideal for running multiple I/O-bound program executions concurrently.
- Sync-Async Bridge (`asyncer`, `asyncify`): The `dspy.asyncify` utility provides a bridge to run synchronous code in an async context without blocking the event loop, using a background thread pool.
- Batching with Concurrency Primitives (`unbatchify`): The `Unbatchify` utility uses `threading.Thread`, `queue.Queue`, and `threading.Event` to implement an asynchronous batching system, collecting items from multiple producer threads into a single batch for processing (a toy version follows the list).
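As a concrete illustration of the last point, here is a toy version of the producer-consumer batching pattern (`queue.Queue` + `threading.Event` + a worker thread). It follows the spirit of `Unbatchify` but is not the library's implementation:

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class MiniBatcher:
    """Toy batcher: many producer threads, one batching worker thread."""

    def __init__(self, process_batch, max_batch_size=4, max_wait=0.05):
        self.process_batch = process_batch
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait
        self.in_q: queue.Queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def __call__(self, item):
        # Producer side: enqueue the item with an Event and a result slot,
        # then block until the worker signals completion.
        done, slot = threading.Event(), []
        self.in_q.put((item, done, slot))
        done.wait()
        return slot[0]

    def _worker(self):
        while True:
            entries = [self.in_q.get()]  # block until the first item arrives
            deadline = time.monotonic() + self.max_wait
            # Keep collecting until the batch fills or the timer expires.
            while len(entries) < self.max_batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    entries.append(self.in_q.get(timeout=timeout))
                except queue.Empty:
                    break
            results = self.process_batch([item for item, _, _ in entries])
            for (_, done, slot), result in zip(entries, results):
                slot.append(result)
                done.set()  # wake the producer waiting on this item


square_batch = MiniBatcher(lambda items: [x * x for x in items])
with ThreadPoolExecutor(4) as pool:  # several producer threads
    print(list(pool.map(square_batch, range(8))))  # squares of 0..7
```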
File-by-File Breakdown Table
The following table details the analysis for each file in the `dspy` codebase.
File Path | Decomposable Procedures | Inherent Side Effects | Concurrency Implementation |
---|---|---|---|
`dspy/clients/__init__.py` | `configure_cache` | Global State: Initializes and configures the global `DSPY_CACHE` object. Filesystem: Creates the `.dspy_cache` directory in the user’s home directory. | None. |
`dspy/clients/base_lm.py` | `BaseLM.__call__`, `BaseLM.acall` (LM invocation), `update_global_history`, `inspect_history` (History management). | Global State: `update_global_history` appends to the `GLOBAL_HISTORY` list. Callbacks: Interacts with globally or locally configured callback handlers. | `acall` and `aforward` methods are `async`, designed for non-blocking I/O. |
`dspy/clients/lm.py` | `LM.forward`, `LM.aforward` (API calls via `litellm`), `LM.finetune` (Model training), `LM.reinforce` (RL-based training), `litellm_completion` (Request handling with retries), Streaming logic. | Network I/O: Makes API calls to LLM providers. Filesystem/Memory I/O: Interacts with the global cache via `@request_cache`. Threading: `finetune` spawns a background `threading.Thread` to run the training job without blocking. Global State: Reads from `dspy.settings`. | `aforward` and `alitellm_completion` are `async`. `finetune` uses `threading.Thread`. Streaming functions use async generators and `anyio` streams. |
`dspy/clients/cache.py` | `Cache.get`, `Cache.put`, `Cache.cache_key` (Caching operations), `request_cache` (Decorator for caching function calls). | Filesystem I/O: Manages a persistent on-disk cache using `diskcache`. Memory Management: Manages an in-memory LRU cache using `cachetools`. State Modification: Modifies cache state (both in-memory and on-disk). | The `request_cache` decorator is async-aware and can wrap async functions. The underlying `diskcache` is process-safe. |
`dspy/adapters/base.py` | `Adapter.format` (Prompt building), `Adapter.parse` (Response parsing), `Adapter.__call__`/`acall` (Orchestration). | Network I/O: Invokes `dspy.settings.lm` to make API calls. Callbacks: Interacts with configured callback handlers. | `acall` is an async method that orchestrates async LM calls. |
`dspy/adapters/chat_adapter.py` | Implements `format` and `parse` for chat-based models. | Same as `base.py`. | Inherits async capabilities from the base `Adapter`. |
`dspy/adapters/json_adapter.py` | Implements `format` and `parse` for JSON-based model interaction. | Same as `base.py`. Handles JSON schema generation and validation. | Inherits async capabilities from the base `Adapter`. |
`dspy/adapters/two_step_adapter.py` | Implements a two-stage process where `parse` makes an additional LM call. | Network I/O: The `parse` method makes a second LM call for data extraction, increasing latency and cost. | Inherits async capabilities from the base `Adapter`. |
`dspy/primitives/module.py` | `BaseModule.dump_state`, `load_state`, `save`, `load` (State serialization/deserialization). | Filesystem I/O: `save` and `load` interact with the filesystem to write/read JSON or pickled state. | None. |
`dspy/primitives/program.py` | `Module.__call__`, `acall` (Forward pass orchestration), `Module.batch` (Parallel processing of examples). | Global State: `__call__` and `acall` modify the `dspy.settings.caller_modules` context variable. Concurrency: `batch` spawns multiple threads for parallel execution. | `acall` is `async`. `batch` uses `dspy.Parallel`, which in turn uses `ParallelExecutor` for multi-threading. |
`dspy/predict/predict.py` | `Predict.forward`, `aforward` (Core logic for a single prediction). | Network I/O: Invokes an LM via an adapter. Global State: Reads `dspy.settings`, updates `dspy.settings.trace`. | `aforward` is `async` and uses the adapter’s `acall` method. |
`dspy/predict/react.py` | `ReAct.forward`, `aforward` (Multi-step reasoning loop with tool use). | Multiple Network I/O: Makes multiple LM calls within a single forward pass. External Code Execution: Calls external tools. | The `aforward` method implements the reasoning loop asynchronously. |
`dspy/evaluate/evaluate.py` | `Evaluate.__call__` (Program evaluation over a dataset). | Multiple Network I/O: Executes the program for every example in the dev set. Console Output: Prints progress bars and result tables. | `__call__` uses `ParallelExecutor` to run evaluations on multiple threads. |
`dspy/teleprompt/*` | `Teleprompter.compile` (Main optimization loop for prompts/demos/weights). | Heavy Network I/O: Involves numerous LM calls for generating and evaluating candidates. High CPU/Memory Usage: Can be computationally intensive. `BootstrapFinetune` and `GRPO` initiate model training, a significant side effect. | Optimizers like `BootstrapFinetune` and `MIPROv2` use `ParallelExecutor` to evaluate candidates concurrently. `GRPO` manages its own training jobs. |
`dspy/streaming/*` | `streamify` (Program wrapper), `apply_sync_streaming` (Async-to-sync generator bridge), `StreamListener` (Stream processing). | Concurrency: Manages async generators and data streams. `apply_sync_streaming` spawns a background thread to consume the async stream. | Heavy use of `asyncio`, `anyio` for streams, and `threading` in `apply_sync_streaming` to bridge contexts. |
`dspy/utils/asyncify.py` | `asyncify` (Function wrapper). | Threading: Runs a synchronous function in a background thread pool managed by `asyncer`. | Core concurrency utility. Manages an `anyio.CapacityLimiter` to control the number of concurrent async tasks. |
`dspy/utils/parallelizer.py` | `ParallelExecutor.execute` (Task distribution). | Threading: Manages a `concurrent.futures.ThreadPoolExecutor`. Global State: Propagates `dspy.settings` to worker threads in an isolated manner. Signal Handling: Catches `KeyboardInterrupt`. | A primary multi-threading implementation in DSPy, designed for parallelizing I/O-bound tasks. |
`dspy/utils/unbatchify.py` | `Unbatchify.__call__` (Item submission), `_worker` (Batch processing loop). | Threading: Spawns a dedicated background `threading.Thread` for batching. | Implements a classic producer-consumer pattern using `queue.Queue` and `threading.Event` for concurrent batching. |
`dspy/utils/saving.py` | `load` (Model loading). | Filesystem I/O: Reads pickled model and metadata from disk. Dependency Check: Compares saved dependency versions with current environment versions and logs warnings. | None. |
`dspy/primitives/python_interpreter.py` | `PythonInterpreter.execute` (Sandboxed code execution). | Process Management: Spawns and communicates with a `deno` subprocess. Filesystem: Can be configured to allow read/write access to specific paths from within the sandbox. Environment: Can pass environment variables into the sandbox. | Manages a persistent `subprocess.Popen` instance. Communication happens via stdin/stdout pipes. |
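The sync-async bridge noted for `dspy/utils/asyncify.py` is worth a sketch. DSPy uses `asyncer`/`anyio` with a `CapacityLimiter`; the stdlib approximation below uses `asyncio.to_thread` plus a semaphore to achieve the same effect, running blocking code off the event loop with a cap on concurrent workers:

```python
import asyncio
import time


def toy_asyncify(fn, max_concurrency=8):
    """Stdlib approximation of the asyncify pattern, not DSPy's code."""
    sem = asyncio.Semaphore(max_concurrency)  # plays the CapacityLimiter role

    async def wrapper(*args, **kwargs):
        async with sem:
            # to_thread hands the blocking call to a worker thread so the
            # event loop stays responsive.
            return await asyncio.to_thread(fn, *args, **kwargs)

    return wrapper


def blocking_task(i: int) -> int:
    time.sleep(0.1)  # stands in for a blocking LM/network call
    return i


async def main():
    fast = toy_asyncify(blocking_task)
    start = time.perf_counter()
    results = await asyncio.gather(*(fast(i) for i in range(8)))
    # ~0.1s total rather than ~0.8s, since the calls overlap in threads.
    print(results, f"{time.perf_counter() - start:.2f}s")


asyncio.run(main())
```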
Continuing the comprehensive inspection of the DSPy codebase, here is the file-by-file breakdown for the remaining modules.
File-by-File Breakdown Table (Continued)
File Path | Decomposable Procedures | Inherent Side Effects | Concurrency Implementation |
---|---|---|---|
`dspy/adapters/types/audio.py` | `Audio.from_url`, `Audio.from_file`, `Audio.from_array` (Data loading and encoding). | Network I/O: `from_url` downloads audio data. Filesystem I/O: `from_file` reads local files. | None. |
`dspy/adapters/types/base_type.py` | `split_message_content_for_custom_types` (String processing), `BaseType.serialize_model` (Serialization). | None. | None. |
`dspy/adapters/types/history.py` | Data model for conversation history. | Not Applicable (Data Model). | Not Applicable (Data Model). |
`dspy/adapters/types/image.py` | `Image.from_url`, `Image.from_file`, `Image.from_PIL` (Data loading and encoding). | Network I/O: `from_url` downloads image data. Filesystem I/O: `from_file` reads local files. | None. |
`dspy/adapters/types/tool.py` | `Tool._parse_function` (Reflection), `Tool.__call__`/`acall` (Tool invocation). | External Code Execution: `__call__` executes arbitrary user-provided functions. | `acall` is `async` and can wrap both sync and async functions. |
`dspy/adapters/utils.py` | `format_field_value`, `serialize_for_json` (Data formatting and serialization). | None. | None. |
`dspy/clients/databricks.py` | `DatabricksProvider.finetune`, `deploy_finetuned_model` (Model training and deployment). | Network I/O: Makes API calls to Databricks for managing jobs and deployments. Filesystem I/O: `upload_data` saves training data to a temporary local file before uploading. | None. |
`dspy/clients/lm_local.py` | `LocalProvider.launch`, `kill` (Process management), `train_sft_locally` (Local model finetuning). | Process Management: `launch` spawns a `sglang` server in a `subprocess.Popen`; `kill` terminates it. Threading: `launch` uses a `threading.Thread` to tail process logs without blocking. | The `launch` procedure uses threading to manage log output from the server process. |
`dspy/clients/lm_local_arbor.py` | `ArborProvider.launch`, `kill`, `finetune` (Interaction with Arbor backend). | Network I/O: Makes API calls to the Arbor backend to manage remote training and inference jobs. | None. |
`dspy/clients/openai.py` | `OpenAIProvider.finetune`, `upload_data`, `_start_remote_training` (Finetuning job management). | Network I/O: Makes API calls to the OpenAI API for finetuning. | None. |
`dspy/clients/provider.py` | Abstract base classes for providers. | Not Applicable (Interface Definition). | Not Applicable (Interface Definition). |
`dspy/clients/utils_finetune.py` | `write_lines`, `get_finetune_directory` (File utilities for training data). | Filesystem I/O: Writes training data to files within the cache directory. | None. |
`dspy/datasets/alfworld/alfworld.py` | `EnvPool` (Environment management), `env_worker` (Process worker). | Process Management: `EnvPool` spawns and manages a pool of worker processes (`multiprocessing.Process`) to run game environments. | `EnvPool` uses multiprocessing queues and processes to run environments in parallel. This is a significant concurrency implementation. |
`dspy/datasets/colors.py` | Data loading and preparation. | Not Applicable (Data Preparation). | Not Applicable (Data Preparation). |
`dspy/datasets/dataloader.py` | `DataLoader` (Data loading from various sources). | Filesystem I/O: Reads data from CSV, JSON, or Parquet files. Network I/O: Can load data from Hugging Face Hub. | None. |
`dspy/datasets/dataset.py` | Base class for datasets, handles shuffling and sampling. | None. | None. |
`dspy/datasets/gsm8k.py` | Data loading and preparation for GSM8K. | Network I/O: Downloads the dataset from Hugging Face Hub. | None. |
`dspy/datasets/hotpotqa.py` | Data loading and preparation for HotPotQA. | Network I/O: Downloads the dataset from Hugging Face Hub. | None. |
`dspy/datasets/math.py` | Data loading and preparation for MATH. | Network I/O: Downloads the dataset from Hugging Face Hub. | None. |
`dspy/dsp/colbertv2.py` | `ColBERTv2` (Retrieval from remote ColBERTv2 server), `ColBERTv2RetrieverLocal` (Local index building/searching). | Network I/O: `colbertv2_post_request` makes API calls. Filesystem I/O: `ColBERTv2RetrieverLocal` builds and loads a Faiss index on disk. | None. |
`dspy/dsp/utils/settings.py` | `Settings.configure`, `Settings.context` (Global and thread-local configuration management). | Global State: Manages the global `dspy.settings` object. Thread-local State: `context` modifies thread-local configuration overrides. | The entire `Settings` class is designed to be thread-safe, using a `threading.Lock` and `threading.local` for managing configuration across multiple threads. |
`dspy/dsp/utils/*` | Utility functions for text processing (`dpr.py`, `metrics.py`) and general helpers (`utils.py`). | None. | None. |
`dspy/evaluate/auto_evaluation.py` | `SemanticF1.forward`, `CompleteAndGrounded.forward` (LLM-based evaluation). | Network I/O: These modules use an LM to perform the evaluation, making API calls. | None. |
`dspy/evaluate/metrics.py` | Metric calculation functions (`answer_exact_match`, etc.). | None. | None. |
`dspy/experimental/synthesizer/*` | `Synthesizer.generate` (Synthetic data generation loop). | Network I/O: Makes multiple LM calls to generate data and get feedback. Filesystem I/O: The `export` method writes generated data to a file. | None. |
`dspy/experimental/synthetic_data.py` | `SyntheticDataGenerator.generate` | Network I/O: Uses a `dspy.Predict` module to generate data, involving LM calls. | None. |
`dspy/experimental/module_graph.py` | `ModuleGraph.render_graph` | Filesystem I/O: Renders and saves a graph visualization to a file using `graphviz`. | None. |
`dspy/predict/avatar/*` | `Avatar.forward` (Reasoning loop with tool use). | Multiple Network I/O: Similar to `ReAct`, it makes multiple LM calls within a single forward pass. External Code Execution: Calls external tools. | None. |
`dspy/predict/best_of_n.py` | `BestOfN.forward` (Sampling multiple completions and selecting the best). | Multiple Network I/O: Makes N calls to the underlying module/LM to generate candidates. | None. |
`dspy/predict/multi_chain_comparison.py` | `MultiChainComparison.forward` (LLM-based comparison of multiple reasoning chains). | Network I/O: Uses an LM to compare and select the best completion. | None. |
`dspy/predict/parallel.py` | `Parallel.forward` | Threading: A wrapper around `ParallelExecutor` to simplify running a module over a list of inputs. | Inherits concurrency from `ParallelExecutor`. |
`dspy/predict/refine.py` | `Refine.forward` (Iterative refinement loop). | Multiple Network I/O: Can make multiple LM calls to refine an answer and generate feedback. | None. |
`dspy/retrieve/*` | Each file defines a `Retrieve` module for a specific vector database or search service (e.g., `WeaviateRM`, `PineconeRM`, `DatabricksRM`). | Network I/O: Each retriever makes API calls to its respective external service. | None. |
`dspy/retrievers/embeddings.py` | `Embeddings._build_faiss` (Local index building). | Filesystem I/O / Memory: Builds an in-memory or on-disk Faiss index. | None. |
`dspy/signatures/signature.py` | `ensure_signature`, `make_signature` (Dynamic signature class creation), `_detect_custom_types_from_caller` (Reflection). | Reflection: `_detect_custom_types_from_caller` inspects the call stack (`sys._getframe`) to resolve type names, which can be fragile. | None. |
`dspy/teleprompt/bootstrap_finetune.py` | `BootstrapFinetune.compile`, `finetune_lms` (Orchestration of fine-tuning jobs). | Process Management/Threading: Can run multiple fine-tuning jobs in parallel using `ThreadPoolExecutor`. Heavy Network I/O: Interacts with fine-tuning provider APIs. | `finetune_lms` uses a `ThreadPoolExecutor` to run multiple fine-tuning jobs concurrently. |
`dspy/teleprompt/grpo.py` | `GRPO.compile` (Reinforcement learning optimization loop). | Heavy Network I/O: Manages reinforcement learning jobs with a backend provider. | Manages remote training jobs, but the main loop is sequential. |
`dspy/teleprompt/simba.py` | `SIMBA.compile` (Stochastic optimization loop). | Heavy Network I/O: Executes many program evaluations in parallel. | Uses `dspy.Parallel` (and thus `ParallelExecutor`) to run evaluation steps concurrently. |
`dspy/teleprompt/mipro_optimizer_v2.py` | `MIPROv2.compile` (Hyperparameter optimization using Optuna). | Heavy Network I/O: Executes many program evaluations as part of the Optuna study. CPU Intensive: Optuna’s search can be CPU-bound. | `_optimize_prompt_parameters` uses `ParallelExecutor` if minibatching is enabled. |
`dspy/utils/callback.py` | `with_callbacks` (Decorator for instrumenting methods). | Global State: Uses a `ContextVar` (`ACTIVE_CALL_ID`) to track call context across async tasks and threads. | The decorator and its helpers are designed to be safe for both synchronous and asynchronous functions, and correctly handle context propagation. |
`dspy/utils/logging_utils.py` | `configure_dspy_loggers`, `enable_logging`/`disable_logging`. | Global State: Configures the root `dspy` logger and manages a custom logging stream (`DSPY_LOGGING_STREAM`). | The custom stream is designed to be thread-safe. |
`dspy/utils/usage_tracker.py` | `UsageTracker` class and `track_usage` context manager. | Global State: The context manager works by setting and resetting a `UsageTracker` instance on `dspy.settings`. | None directly, but designed to work with `ParallelExecutor`, which deep-copies the tracker for each thread. |
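The `ContextVar` technique noted for `dspy/utils/callback.py` can be shown in miniature. The `ACTIVE_CALL_ID` below names the same idea, not DSPy's actual object; the point is that a `ContextVar` isolates per-task state where a plain global would leak across concurrent tasks:

```python
import asyncio
import contextvars
import uuid

# Each asyncio task created by gather() gets its own copy of the context,
# so a .set() inside one task is invisible to the others.
ACTIVE_CALL_ID: contextvars.ContextVar = contextvars.ContextVar(
    "active_call_id", default=None
)


async def instrumented_call(name: str):
    token = ACTIVE_CALL_ID.set(uuid.uuid4().hex[:8])  # per-call ID
    try:
        await asyncio.sleep(0.01)  # stands in for real async work
        print(name, "->", ACTIVE_CALL_ID.get())  # sees only its own ID
    finally:
        ACTIVE_CALL_ID.reset(token)


async def main():
    await asyncio.gather(*(instrumented_call(f"task{i}") for i in range(3)))


asyncio.run(main())
```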
Here is the continuation of the comprehensive inspection, focusing on the `tests` directory of the DSPy codebase.
Test Suite Analysis Summary
The DSPy test suite is thorough and well-structured, mirroring the modularity of the framework itself.
Decomposable Procedures: The tests are built around common testing patterns:

- Fixture-based Setup: `conftest.py` and local fixtures define reusable setup procedures. Key examples include `litellm_test_server` (spawning a mock server process) and `clear_settings` (ensuring test isolation); a sketch of the isolation pattern follows this list.
- Mocking and Patching: Many unit tests use `pytest-mock` and `DummyLM` to isolate components from external dependencies (like live LMs) and control their behavior.
- State Assertion: A common procedure is to call a DSPy module, then assert the state of the resulting object (e.g., the content of `demos` in a `Predict` module) or the content of the prompt sent to the mocked LM.
- End-to-End Validation: Integration tests, particularly in the `reliability` suite, run full DSPy programs against multiple real or mocked LMs and validate the output against predefined assertions or metrics.
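A minimal sketch of the fixture-based isolation pattern, in the spirit of `clear_settings` (the real fixture lives in `tests/conftest.py` and may differ in detail; `GLOBAL_SETTINGS` here is a stand-in for `dspy.settings`):

```python
import pytest

# Stand-in for the global settings object that tests may mutate.
GLOBAL_SETTINGS = {"lm": None, "adapter": None}


@pytest.fixture(autouse=True)
def clear_settings():
    snapshot = dict(GLOBAL_SETTINGS)  # capture state before the test
    yield                             # run the test body
    GLOBAL_SETTINGS.clear()           # teardown: restore the snapshot
    GLOBAL_SETTINGS.update(snapshot)


def test_can_mutate_settings_safely():
    GLOBAL_SETTINGS["lm"] = "dummy-lm"
    assert GLOBAL_SETTINGS["lm"] == "dummy-lm"
    # The autouse fixture undoes this mutation after the test completes.
```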
Inherent Side Effects: The test suite actively manages side effects to ensure correctness and isolation.

- Process Management: The most significant side effect is the creation and teardown of a `litellm` server process via `subprocess.Popen` in the `litellm_test_server` fixture. This provides a controlled environment for testing network interactions.
- Filesystem I/O: Many tests (e.g., `test_cache.py`, `test_saving.py`, `reliability/`) create temporary directories and files (`tmp_path` fixture) to test caching, serialization, and test generation; a toy round-trip test follows this list.
- Global State Management: Nearly every test relies on the `clear_settings` fixture in `conftest.py`, which resets the global `dspy.settings` object after each test run. This is a critical procedure for maintaining test isolation.
- Mocking/Patching: The `monkeypatch` and `mocker` fixtures are used extensively to replace functions or objects at runtime, a fundamental side effect in unit testing.
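The `tmp_path` pattern can be illustrated with a toy round-trip test; `save`/`load` below are trivial stand-ins for `dspy.save`/`dspy.load`, not the library's functions:

```python
import json


def save(obj, path):
    path.write_text(json.dumps(obj))  # stand-in for real serialization


def load(path):
    return json.loads(path.read_text())


def test_save_and_load_roundtrip(tmp_path):
    # tmp_path is unique per test and cleaned up automatically, so
    # filesystem side effects never leak between tests.
    target = tmp_path / "program.json"
    save({"demos": [1, 2, 3]}, target)
    assert load(target) == {"demos": [1, 2, 3]}
```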
Concurrency Implementation: The test suite rigorously validates DSPy’s concurrency features.

- Async Testing (`pytest-asyncio`): Tests for asynchronous methods (`acall`) are decorated with `@pytest.mark.asyncio`, allowing `await` syntax to be used directly in test functions. This is prevalent in tests for `LM`, `Adapter`, `Module`, and `streaming`.
- Direct Concurrency Tests: Files like `utils/test_asyncify.py`, `utils/test_parallelizer.py`, and `utils/test_unbatchify.py` directly test the core concurrency primitives (`asyncify`, `ParallelExecutor`, `Unbatchify`). These tests often involve measuring execution time to confirm that tasks are running in parallel rather than sequentially; a minimal example of this timing technique follows the list.
- Thread-Safety Validation: Tests like `evaluate/test_multithread_evaluate_call` and `retrievers/test_embeddings.py` validate that higher-level components correctly utilize the underlying concurrent executors for performance gains.
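The timing technique used by the direct concurrency tests looks roughly like this (a generic sketch requiring `pytest-asyncio`; `slow_tool` is a stand-in for a real DSPy call, not code from the suite):

```python
import asyncio
import time

import pytest


async def slow_tool(x: int) -> int:
    await asyncio.sleep(0.1)  # stands in for a slow LM/tool call
    return x * 2


@pytest.mark.asyncio
async def test_calls_run_concurrently():
    start = time.perf_counter()
    results = await asyncio.gather(*(slow_tool(i) for i in range(5)))
    elapsed = time.perf_counter() - start
    assert results == [0, 2, 4, 6, 8]
    # Sequential execution would take ~0.5s; concurrent execution
    # takes ~0.1s, so the elapsed time proves the calls overlapped.
    assert elapsed < 0.3
```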
File-by-File Breakdown Table (Test Suite)
File Path | Decomposable Procedures | Inherent Side Effects | Concurrency Implementation |
---|---|---|---|
`tests/conftest.py` | `clear_settings` fixture (Reset global state), pytest_... hooks (Test collection modification). | Global State: `clear_settings` resets `dspy.settings` to a default state after each test. Test Execution: `pytest_collection_modifyitems` modifies the test run by adding skip marks based on command-line flags. | The `anyio_backend` fixture configures the backend for async tests. |
`tests/test_utils/server/__init__.py` | `litellm_test_server` fixture (Start/stop a mock server), `read_litellm_test_server_request_logs` (Read log files). | Process Management: Spawns and terminates a `subprocess.Popen` running the LiteLLM server. Network I/O: The server listens on a random free port. Filesystem I/O: Creates and reads from a temporary server log file. | The fixture itself uses `subprocess` to create a concurrent process. The test functions using it can then make async network calls. |
`tests/test_utils/server/litellm_server.py` | `DSPyTestModel` (Mock LLM implementation), `_get_mock_llm_response`, `_append_request_to_log_file`. | Filesystem I/O: `_append_request_to_log_file` writes incoming request data to a log file specified by an environment variable. | The mock model class implements both sync (`completion`) and async (`acompletion`, `astreaming`) methods to handle different types of requests. |
`tests/adapters/test_chat_adapter.py` | Test procedures for prompt formatting (`format`) and response parsing (`parse`) with various signatures, including Pydantic models, images, and tools. | Mocking: Uses `DummyLM` to simulate LM responses; `mocker.patch` is used to spy on internal calls. | Contains `@pytest.mark.asyncio` tests to validate async adapter calls. |
`tests/adapters/test_json_adapter.py` | Test procedures for JSON-based prompt formatting, including structured output capabilities and fallbacks. | Mocking: Uses `DummyLM` and spies on `litellm.completion` calls. | Contains `@pytest.mark.asyncio` tests. |
`tests/adapters/test_two_step_adapter.py` | Test procedures for the two-stage reasoning and extraction process. | Mocking: Mocks two separate LMs (`mock_main_lm`, `mock_extraction_lm`) to test the two stages independently. | Contains `@pytest.mark.asyncio` tests. |
`tests/adapters/test_tool.py` | Procedures to test `dspy.Tool` creation from functions and classes, invocation, and argument parsing. | None. | Contains a dedicated `@pytest.mark.asyncio` test (`test_async_concurrent_calls`) that uses `asyncio.gather` to verify that multiple `acall`s on a tool run concurrently. |
`tests/caching/test_litellm_cache.py` | Test procedures for cache hits/misses across sessions and LMs. | Filesystem I/O: Uses `temporary_blank_cache_dir` and `temporary_populated_cache_dir` fixtures to manage test cache directories. Network I/O: Uses the `litellm_test_server` to verify that API calls are skipped on cache hits. | None. |
`tests/clients/test_cache.py` | Test procedures for `dspy.cache` functionality, including key generation, put/get, and the `@request_cache` decorator. | Filesystem I/O: Uses the `tmp_path` fixture to create a temporary cache directory. | Contains `@pytest.mark.asyncio` tests to validate the caching decorator with async functions. |
`tests/clients/test_databricks.py` | Integration tests for Databricks fine-tuning and deployment. | Network I/O: Makes live API calls to the Databricks SDK. Marked to be skipped in CI. | None. |
`tests/clients/test_embedding.py` | Test procedures for the `Embedder` class with both hosted models and local callables. | Mocking: Mocks `litellm.embedding` to avoid real API calls. Filesystem I/O: Uses `tmp_path` to test caching. | Contains `@pytest.mark.asyncio` tests for `acall`. |
`tests/clients/test_inspect_global_history.py` | Test procedures for inspecting the `GLOBAL_HISTORY` object. | Global State: Tests directly read from and implicitly rely on the modification of the `GLOBAL_HISTORY` list. The `clear_history` fixture modifies this global state. | None. |
`tests/clients/test_lm.py` | Test procedures for `dspy.LM`, including querying, caching, retries, and state management. | Network I/O: Uses `litellm_test_server`. Filesystem I/O: Uses `tmp_path` for caching tests. Mocking: Mocks `time.sleep` to test exponential backoff without waiting. | Contains `@pytest.mark.asyncio` tests for `acall`. |
`tests/evaluate/test_evaluate.py` | `test_multithread_evaluate_call`: Directly tests parallel evaluation. `test_multi_thread_evaluate_call_cancelled`: Tests interrupt handling. | Mocking: Uses a `SlowLM` to simulate long-running tasks. Threading: `test_multi_thread_evaluate_call_cancelled` spawns a background thread to send a `KeyboardInterrupt`. | Directly tests the multi-threading capability of `dspy.Evaluate`. |
`tests/primitives/test_module.py` | Procedures for testing module state (`save`/`load`), deep copying, and history tracking. | Filesystem I/O: The `test_save_and_load_*` tests use `tmp_path` to create and read model files. Module Management: `test_save_with_extra_modules` modifies `sys.path` and `sys.modules` to test serialization. | `test_module_history_async` is an `@pytest.mark.asyncio` test. `test_module_history_with_concurrency` validates history tracking in a multi-threaded context. |
`tests/primitives/test_python_interpreter.py` | Test procedures for executing code in the sandboxed Deno environment, including tests for file and network access control. | Process Management: Starts and communicates with a `deno` subprocess. Filesystem I/O: `test_read_file_access_control` and `test_enable_write_flag` create temporary files to test sandbox permissions. | None. |
`tests/reliability/*` | A suite of integration and end-to-end tests. | Heavy Network I/O: These tests are designed to run against multiple live LLM backends defined in `reliability_conf.yaml`. Filesystem I/O: The `generate` module writes test programs and inputs to disk; `test_generated.py` reads them. Global State: `conftest.py` extensively configures `dspy.settings` for each parameterized test run. | None directly in the tests, but they exercise the async capabilities of the underlying DSPy modules being tested. |
`tests/retrieve/test_llama_index_rm.py` | `test_save_load_llama_index_rag`: Tests serialization of a RAG program using a LlamaIndex retriever. | Filesystem I/O: Uses `tmp_path` to save and load the RAG program state. | None. |
`tests/retrievers/test_embeddings.py` | `test_embeddings_multithreaded_search`: Validates concurrent search execution. | Mocking: Uses a `dummy_embedder`. | Directly tests concurrency by submitting tasks to a `ThreadPoolExecutor` and verifying correct, parallel execution. |
`tests/signatures/test_adapter_image.py` | Procedures to test handling of `dspy.Image` objects in signatures, including serialization and prompt formatting. | Network I/O: The `sample_pil_image` fixture downloads an image from a URL. Filesystem I/O: `test_pdf_from_file` uses `tmp_path` to test loading local files. | None. |
`tests/streaming/test_streaming.py` | All tests validate streaming functionality. | Network I/O: Uses `litellm_test_server` to get streaming responses. Mocking: Mocks `litellm.acompletion` with async generators to simulate streaming chunks from different model providers. | All tests are marked with `@pytest.mark.anyio` and heavily use `async for` loops to consume and validate async generators. |
`tests/teleprompt/test_bootstrap.py` | Procedures to test the `BootstrapFewShot` optimizer’s `compile` method. | Mocking: Uses `DummyLM` to control the “teacher” model’s output and validate that the “student” learns the correct demos. | None. |
`tests/utils/test_asyncify.py` | `test_asyncify`, `verify_asyncify`: Directly test the `asyncify` utility. | Time-based Assertion: The tests use `time.sleep` to simulate blocking tasks and then measure total execution time to confirm parallel execution. | A core concurrency test file, validating that `asyncify` and the `CapacityLimiter` work as expected using `asyncio.gather`. |
`tests/utils/test_parallelizer.py` | `test_worker_threads_independence`, `test_parallel_execution_speed`, `test_max_errors_handling`: Directly test the `ParallelExecutor`. | Time-based Assertion: `test_parallel_execution_speed` uses `time.sleep` and measures execution time to verify parallelism. | A core concurrency test file that validates thread isolation, performance, and error handling of the `ParallelExecutor`. |
`tests/utils/test_unbatchify.py` | `test_unbatchify_batch_size_trigger`, `test_unbatchify_timeout_trigger`: Directly test the `Unbatchify` utility. | Time-based Assertion: `test_unbatchify_timeout_trigger` uses `time.sleep` to test the timeout functionality. | A core concurrency test file that validates the background threading and queue-based batching mechanism. |
`tests/utils/test_saving.py` | `test_save_*`, `test_load_with_version_mismatch`: Procedures to test `dspy.save` and `dspy.load`. | Filesystem I/O: All tests use `tmp_path` to save and load model files (`.json`, `.pkl`). Mocking: `test_load_with_version_mismatch` mocks the `get_dependency_versions` function to test version warning logs. | None. |
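The server-fixture pattern from `tests/test_utils/server/__init__.py` can be sketched generically: spawn a helper process on a free port for the session, yield its URL, and tear it down afterwards. The command line below is a placeholder (`http.server`), not the suite's actual LiteLLM invocation:

```python
import socket
import subprocess
import sys
import time

import pytest


def _free_port() -> int:
    # Bind to port 0 to let the OS pick a free port, then release it.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


@pytest.fixture(scope="session")
def mock_server():
    port = _free_port()
    proc = subprocess.Popen(
        [sys.executable, "-m", "http.server", str(port)]  # placeholder server
    )
    time.sleep(0.5)  # crude readiness wait; a real fixture would poll
    try:
        yield f"http://127.0.0.1:{port}"
    finally:
        proc.terminate()   # teardown: stop the helper process
        proc.wait(timeout=10)
```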
Here is the final part of the comprehensive inspection, covering the remaining files in the `dspy` codebase.
File-by-File Breakdown Table (Conclusion)
File Path | Decomposable Procedures | Inherent Side Effects | Concurrency Implementation |
---|---|---|---|
`dspy/__init__.py` | Not Applicable (Module initialization). | Import Side Effects: Re-exports core classes and functions, setting up the `dspy` namespace. | None. |
`dspy/__metadata__.py` | Not Applicable (Metadata). | None. | None. |
`dspy/dsp/*` | `colbertv2.py`: `colbertv2_post_request` (API call), `ColBERTv2RetrieverLocal.build_index` (Local index creation). `utils/*`: Text processing and metric calculation functions. | Network I/O: `colbertv2.py` makes API calls to a ColBERTv2 server. Filesystem I/O: `ColBERTv2RetrieverLocal` builds a Faiss index on disk. | None. |
`dspy/predict/aggregation.py` | `majority` (Calculates the majority vote from a set of completions). | None. | None. |
`dspy/predict/chain_of_thought.py` | `ChainOfThought.forward`, `aforward` (Generates a rationale before the final answer). | Network I/O: Makes a single, extended LM call. | Implements `aforward` for async execution. |
`dspy/predict/code_act.py` | `CodeAct.forward` (Generates and executes Python code). | Process Management: Uses `PythonInterpreter`, which spawns a `deno` subprocess to execute code in a sandbox. External Code Execution: Runs arbitrary, LM-generated code. | None. |
`dspy/predict/knn.py` | `KNN.__call__` (Finds nearest neighbors in a trainset). | CPU/Memory Usage: Computes embeddings and performs vector search in memory. | None. |
`dspy/predict/parameter.py` | Not Applicable (Base classes for parameters). | None. | None. |
`dspy/predict/program_of_thought.py` | `ProgramOfThought.forward` (Generates and executes Python code to derive an answer). | Process Management: Uses `PythonInterpreter` to run code in a sandboxed `deno` process. | None. |
`dspy/predict/retry.py` | Not Applicable (File is fully commented out and deprecated). | Not Applicable. | Not Applicable. |
`dspy/primitives/assertions.py` | Not Applicable (File is fully commented out and deprecated). | Not Applicable. | Not Applicable. |
`dspy/primitives/example.py` | Not Applicable (Data structure). | None. | None. |
`dspy/primitives/prediction.py` | Not Applicable (Data structure). | None. | None. |
`dspy/propose/*` | `GroundedProposer.propose_instructions_for_program` (Generates and refines instructions), `create_dataset_summary` (Summarizes a dataset using an LM). | Network I/O: These modules are LLM-intensive, making multiple calls to generate, describe, and refine instructions. | None. |
`dspy/retrieve/retrieve.py` | `Retrieve.__call__` (A wrapper to invoke the globally configured RM). | Network I/O: Delegates to `dspy.settings.rm`, which performs a network call to an external retrieval service. | None. |
`dspy/signatures/field.py` | Not Applicable (Configuration classes for signature fields). | None. | None. |
`dspy/teleprompt/avatar_optimizer.py` | `AvatarOptimizer.compile` (Optimization loop for tool-use agents). | Heavy Network I/O: Involves multiple evaluations of an agent, each potentially making multiple tool and LM calls. | Uses `ThreadPoolExecutor` in its `thread_safe_evaluator` method to run evaluations in parallel. |
`dspy/teleprompt/bettertogether.py` | `BetterTogether.compile` (Orchestrates prompt and weight optimizers). | Heavy Network I/O: Sequentially runs other optimizers (`BootstrapFewShot`, `BootstrapFinetune`), which are themselves I/O-intensive. | None directly, but the optimizers it calls may use concurrency. |
`dspy/teleprompt/copro_optimizer.py` | `COPRO.compile` (Signature optimization loop). | Heavy Network I/O: Makes many calls to the `prompt_model` to generate candidate instructions and to the `task_model` to evaluate them. | None. |
`dspy/teleprompt/ensemble.py` | `Ensemble.compile` (Combines multiple programs into one). | None, as it wraps programs without immediate execution. The resulting `EnsembledProgram` makes multiple calls when executed. | None. |
`dspy/teleprompt/infer_rules.py` | `InferRules.compile` (Induces and tests natural language rules for prompts). | Heavy Network I/O: Uses an LM to induce rules and then evaluates programs with these new rules. | None. |
`dspy/teleprompt/knn_fewshot.py` | `KNNFewShot.compile` (Creates a program that dynamically selects demos using KNN). | CPU/Memory Usage: The compiled program performs an in-memory KNN search on every forward pass. | None. |
`dspy/teleprompt/mipro_optimizer_v2.py` | `MIPROv2.compile` (Orchestrates a multi-stage optimization process using Optuna). | Heavy Network I/O: The optimization loop involves many evaluations, which call the task model. User Interaction: May prompt the user for confirmation before starting a potentially costly run. | None directly, but the underlying `eval_candidate_program` can use parallel execution. |
`dspy/teleprompt/random_search.py` | `BootstrapFewShotWithRandomSearch.compile` (Randomly samples demos and evaluates). | Heavy Network I/O: Executes multiple program evaluations to find a good set of random demos. | None. |
`dspy/teleprompt/signature_opt.py` | Not Applicable (File is deprecated). | Not Applicable. | Not Applicable. |
`dspy/teleprompt/simba.py` | `SIMBA.compile` (Stochastic batch optimization loop). | Heavy Network I/O: Executes many program evaluations in parallel. | Uses `dspy.Parallel` to run trajectory sampling and candidate evaluations concurrently. |
`dspy/teleprompt/teleprompt_optuna.py` | `BootstrapFewShotWithOptuna.compile` (Uses Optuna to find the best few-shot demos). | Heavy Network I/O: The Optuna study triggers many evaluations of the student program. | None. |
`dspy/teleprompt/vanilla.py` | `LabeledFewShot.compile` (Selects random labeled examples as demos). | None. | None. |
`dspy/teleprompt/utils.py` | `eval_candidate_program` (Helper for running evaluations). | Network I/O: Calls the program being evaluated. | None directly, but it is called by optimizers that may run in parallel. |
`dspy/utils/dummies.py` | Not Applicable (Mock objects for testing). | None. | `DummyLM` implements an `acall` method to support async testing scenarios. |
`dspy/utils/langchain_tool.py` | `convert_langchain_tool` (Adapts a LangChain tool to a DSPy tool). | None. | The resulting `dspy.Tool` has an async `acall` method. |
`dspy/utils/mcp.py` | `convert_mcp_tool` (Adapts an MCP tool to a DSPy tool). | Network I/O: The resulting tool’s `acall` method makes a network call to the MCP session. | The resulting `dspy.Tool` has an async `acall` method. |
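Finally, the recurring pattern in the tool-related rows above, a single `acall` entry point that can wrap either a sync or an async callable, can be sketched as follows. This is illustrative rather than `dspy.Tool`'s real implementation; running sync functions in a worker thread is one way to avoid blocking the event loop:

```python
import asyncio
import inspect


class MiniTool:
    """Toy tool wrapper: one async entry point for sync or async callables."""

    def __init__(self, fn):
        self.fn = fn

    async def acall(self, *args, **kwargs):
        if inspect.iscoroutinefunction(self.fn):
            # Async callables are awaited directly.
            return await self.fn(*args, **kwargs)
        # Sync callables run in a worker thread so the loop stays free.
        return await asyncio.to_thread(self.fn, *args, **kwargs)


def sync_add(a, b):
    return a + b


async def async_mul(a, b):
    return a * b


async def main():
    print(await MiniTool(sync_add).acall(2, 3))   # 5
    print(await MiniTool(async_mul).acall(2, 3))  # 6


asyncio.run(main())
```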