DSPy
A Stanford research framework for programming with LLMs by composing declarative modules and optimizing prompts automatically via compilers, rather than hand-crafting them. DSPy replaces manual prompt engineering with automatic optimization: define your task as a module with a Signature, provide a metric, and a DSPy optimizer (BootstrapFewShot, MIPRO, etc.) generates optimized prompts and few-shot examples. Prompts are treated as learnable parameters.
Score Breakdown
🔒 Security
LLM API keys handled by underlying provider SDK. Optimization datasets may contain sensitive data — keep local. Generated prompts should be audited for prompt injection vulnerabilities.
Best When
You're building complex LLM pipelines where prompt quality matters and you can invest in automatic optimization using a validation dataset.
Avoid When
You need simple, transparent prompts or rapid prototyping — direct API calls or LangChain are faster to start.
Use Cases
- Automatically optimize prompts for complex pipelines without manual prompt engineering by defining a metric and running a DSPy optimizer
- Build multi-step LLM pipelines (ChainOfThought, ReAct, multi-hop reasoning) from composable DSPy modules
- Compare LLM providers by swapping dspy.settings.configure(lm=...) without rewriting pipeline code
- Optimize retrieval-augmented generation (RAG) pipelines end-to-end, including retriever and generator prompts
- Use typed signatures (InputField, OutputField) to enforce structured I/O contracts across pipeline stages
Not For
- Quick one-off LLM calls — use the LLM provider SDK directly; DSPy overhead is only worthwhile for complex pipelines
- Production deployment without an optimization step — DSPy requires a compile/optimization phase before deployment
- Teams wanting prompt transparency — DSPy abstracts prompts away; manually-crafted prompts are more auditable
Interface
Authentication
DSPy itself has no auth; LLM backends (OpenAI, Anthropic, local) require their own API keys, supplied to the LM client you pass to dspy.settings.configure().
Pricing
Free and open source. Optimization runs consume LLM API credits — optimization on complex tasks can be expensive ($1-$100+ depending on dataset size and model).
Agent Metadata
Known Gotchas
- ⚠ DSPy requires a validation set with ground-truth labels for optimization; without a dataset to measure against, the optimizer has no signal, and collecting this dataset is often the real work
- ⚠ DSPy 2.x changed the API significantly from 1.x; tutorials and examples from before 2024 may use incompatible patterns, so always check version compatibility
- ⚠ Optimized (compiled) programs must be saved and loaded separately; running the optimizer on every deployment is too slow and expensive, so use program.save() and program.load()
- ⚠ dspy.settings.configure() is global; in multi-threaded or async contexts, configure a per-call context with dspy.context() to avoid settings conflicts
- ⚠ DSPy's teleprompters (optimizers) make many LLM calls during optimization: BootstrapFewShot with 50 examples × 3 LLM calls each is 150+ API calls per optimization run
- ⚠ Signature field types are hints, not enforcement: OutputField doesn't guarantee type-safe output, so combine it with Pydantic or Outlines for guaranteed structured output
Alternatives
Full Evaluation Report
Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for DSPy.
Scores are editorial opinions as of 2026-03-06.