Daft
Distributed DataFrame library for large-scale data processing with native multi-modal support (images, embeddings, tensors, and URLs alongside tabular data). Written in Rust with a Python API. Runs locally or distributed on a Ray cluster (for example, on AWS). Unlike Spark DataFrames or Pandas, Daft treats images and tensors as first-class column types, which lets AI/ML data pipelines mix tabular and model data without custom serialization. Designed specifically for ML data preprocessing at scale.
Score Breakdown
⚙ Agent Friendliness
🔒 Security
No network API surface: Daft is a library that runs inside your own process, not a hosted service. Cloud storage credentials use the standard provider auth chains. The Rust core reduces the risk of memory-safety vulnerabilities. The Apache 2.0 source is available for audit. Ray cluster security is a separate concern.
⚡ Reliability
Best When
You're building ML data pipelines at scale that mix tabular data with images, embeddings, or tensors and need distributed execution beyond single-machine capacity.
Avoid When
Your data fits in memory and you don't need multi-modal types — Polars or Pandas are significantly faster and simpler for pure tabular data.
Use Cases
- Process millions of images, embeddings, or video frames in distributed agent data pipelines with Daft's native multi-modal column types
- Scale Pandas-like DataFrame transformations to datasets too large for single machines using Daft's Ray-based distributed execution
- Build ML training data pipelines that mix tabular metadata with image/embedding columns without custom serialization or format conversion code
- Query and filter large Parquet/Delta Lake/Iceberg datasets using Daft's lazy evaluation and predicate pushdown for efficient data loading
- Run distributed Python UDFs on GPU clusters via Ray for batch inference or feature extraction on multi-modal agent datasets
Not For
- Simple single-machine data analysis: Pandas or Polars are simpler and faster for datasets that fit in memory
- SQL-first analytics workloads: use DuckDB, ClickHouse, or Spark SQL for SQL-centric analytics without Python UDFs
- Transactional or real-time data: Daft is a batch processing framework, not a streaming or OLTP system
Interface
Authentication
No authentication — Daft is a Python library. Cloud storage access uses standard cloud credentials (AWS_ACCESS_KEY_ID, GCP Application Default Credentials). Ray cluster authentication is separate from Daft.
Pricing
Apache 2.0 open source. Compute costs depend on Ray cluster size or local machine. Free to use in production. Eventual Inc. (creators) provides commercial support.
Agent Metadata
Known Gotchas
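- ⚠ Daft uses lazy evaluation: transformations build a logical plan but don't execute until an action such as .collect() or .show() is called, so runtime errors (for example, a failing UDF or an unreadable file) only surface at execution time, not on the line that defined the transformation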
- ⚠ Multi-modal column types (image and tensor types, created through Daft's DataType API, e.g. DataType.image() and DataType.tensor(...)) are Daft-specific and require understanding Daft's type system; LLM-generated code may default to serializing these as bytes instead of using the native types
- ⚠ Ray cluster setup is required for distributed execution — local mode works for development but agents must configure Ray for production scale; Ray initialization is separate from Daft
- ⚠ Daft is early-stage (v0.x), so the API may have breaking changes between minor versions; pin an exact version in agent environments and re-test on every upgrade
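- ⚠ GPU UDFs require Ray actors with GPU resources configured, and GPU-enabled execution is not automatic; agents must declare UDFs with @daft.udf(return_dtype=...), request GPUs for them explicitly (for example, through the decorator's num_gpus argument), and make sure the Ray cluster actually exposes those GPU resources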
- ⚠ Parquet reading uses predicate pushdown for filter optimization — agents must write filters using Daft expressions (not Python lambdas) to benefit from pushdown; Python UDF filters don't get pushed down
- ⚠ Memory management differs from Pandas — Daft batches data internally; for large datasets, out-of-memory errors may surface mid-execution rather than upfront with a clear memory estimate
Alternatives
Scores are editorial opinions as of 2026-03-06.