chDB
Embedded OLAP database engine that runs ClickHouse entirely in-process as a Python library (bindings also exist for Go and Node.js). No server, no Docker, no network: just import chdb and run ClickHouse SQL on local files (Parquet, CSV, JSON) or in-memory data with ClickHouse's query engine. Ideal for agents that need powerful analytical SQL without managing database infrastructure. Think DuckDB's embedded architecture applied to ClickHouse's query engine.
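The "import chdb and run SQL on local files" workflow described above might look roughly like the sketch below. It assumes the `chdb` package is installed and uses its `chdb.query(sql, output_format)` entry point together with ClickHouse's `file()` table function. The helper names (`file_source`, `run_query`, `preview`) and the file name are hypothetical, chosen only for illustration.

```python
def file_source(path: str, fmt: str = "Parquet") -> str:
    """Build a ClickHouse file() table expression for a local file."""
    # Escape backslashes and single quotes so the path is a valid SQL string literal.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    return f"file('{escaped}', '{fmt}')"


def run_query(sql: str, output_format: str = "CSV") -> str:
    """Run a statement in-process and return the result as text."""
    import chdb  # imported lazily so this module loads even where chdb is absent

    return str(chdb.query(sql, output_format))


def preview(path: str, fmt: str = "Parquet", rows: int = 5) -> str:
    """Return the first few rows of a local file (hypothetical helper)."""
    return run_query(f"SELECT * FROM {file_source(path, fmt)} LIMIT {rows}")
```

With a file such as events.parquet in the working directory (an assumed name), `preview("events.parquet")` would return the first five rows as CSV text, with no server process involved.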
Score Breakdown
⚙ Agent Friendliness
🔒 Security
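No network attack surface: chDB is a pure in-process library, and local data never leaves the process. OS-level file permissions govern data access. There are no database credentials to leak, although remote sources such as private S3 buckets still need their own access keys. Apache 2.0 source available for audit. C++ core inherits ClickHouse's security track record.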
⚡ Reliability
Best When
You need ClickHouse's analytical performance for local file processing or single-node analytics without the complexity of running a ClickHouse server.
Avoid When
You need a shared analytics database for multiple users or services — use full ClickHouse server, DuckDB with read-only access, or a managed service.
Use Cases
- Run ClickHouse-speed analytical queries on Parquet/CSV files directly from Python agents without spinning up a ClickHouse server or cluster
- Process large local datasets (100GB+) with ClickHouse's columnar query engine in agent data pipelines without network overhead
- Perform complex aggregations, window functions, and JOIN operations on agent-collected data using familiar SQL without database management
- Build serverless analytics in Lambda/Cloud Run functions, where spinning up a ClickHouse instance is impractical, by embedding chDB as a library
- Query cloud storage files directly from in-process chDB (for example S3 objects via the s3 or url table functions) without first moving the data to a server
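As a rough illustration of the aggregation and cloud-storage use cases, the sketch below builds a GROUP BY query over an `s3()` table expression and runs it through `chdb.query`. It assumes a publicly readable bucket (hence `NOSIGN`; private buckets would pass access keys instead) and a chDB build that includes S3 support. The helper names, the bucket URL in the usage note, and the column name are all hypothetical.

```python
def s3_source(url: str, fmt: str = "Parquet") -> str:
    """Build an s3() table expression for a publicly readable object or glob."""
    # NOSIGN asks ClickHouse to skip request signing (an assumption: public data).
    return f"s3('{url}', NOSIGN, '{fmt}')"


def top_values_query(source: str, column: str, limit: int = 10) -> str:
    """Build a query that counts rows per value of `column`, most frequent first."""
    return (
        f"SELECT {column}, count() AS hits "
        f"FROM {source} "
        f"GROUP BY {column} "
        f"ORDER BY hits DESC "
        f"LIMIT {limit}"
    )


def run(sql: str, output_format: str = "CSVWithNames") -> str:
    """Execute the query in-process; requires the chdb package."""
    import chdb  # lazy import keeps the query builders usable without chdb

    return str(chdb.query(sql, output_format))
```

For example, `run(top_values_query(s3_source("https://example-bucket.s3.amazonaws.com/logs/*.parquet"), "status"))` would count rows per status value across every matching object, assuming such a bucket and column exist. The same `top_values_query` works unchanged against a local `file(...)` source.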
Not For
- Multi-user concurrent write workloads: chDB is a single-process embedded engine, not a shared database server
- Transactional (OLTP) workloads: chDB is OLAP-focused; use PostgreSQL or SQLite for transactional use cases
- Production-scale distributed queries requiring ClickHouse cluster features: use full ClickHouse server for distributed query execution across multiple nodes
Interface
Authentication
No authentication — chDB runs in-process with the calling application's privileges. Access control is at the OS file system level. No network server means no authentication surface.
Pricing
Apache 2.0 open source. Community project backed by ClickHouse Inc. No paid tiers for the embedded library. Completely free for commercial use.
Agent Metadata
Known Gotchas
- ⚠ chDB uses ClickHouse SQL dialect, not standard SQL — functions like arrayJoin, groupArray, and ARRAY JOIN syntax differ from PostgreSQL/DuckDB; LLM-generated SQL may need adaptation
- ⚠ Memory usage can be very high for large aggregations — chDB loads data into columnar format in memory; agents must account for memory limits in constrained environments
- ⚠ chDB is a Python library but the underlying engine is C++: each chdb package release embeds a specific ClickHouse engine build, so SQL behavior and available features can change between releases; always pin versions
- ⚠ Query results are returned as Arrow, bytes, or DataFrame depending on output format setting — agents must specify the correct output format for downstream processing
- ⚠ Parquet file reading is very fast but CSV reading can be slow for large files — agents should prefer Parquet or ORC formats for large datasets
- ⚠ No persistent storage by default — chDB state is ephemeral; to persist query results, explicitly write to Parquet or use clickhouse-local syntax with writable paths
- ⚠ chDB is relatively new (2023) — some ClickHouse features available in server mode may not yet be available in the embedded version; check feature parity for advanced use cases
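Two of the gotchas above, choosing an output format and the lack of persistence by default, can be handled with a thin wrapper. The sketch below is one possible approach, not the library's prescribed pattern: it maps hypothetical short labels onto format names that chDB accepts ("CSV", "JSONEachRow", "DataFrame", "ArrowTable", "Parquet") and persists a result by asking for Parquet output and writing the returned bytes to disk. The labels and helper names are assumptions made for this example.

```python
from pathlib import Path

# Keys are hypothetical labels for this sketch; values are chDB output format names.
FORMATS = {
    "text": "CSV",
    "json": "JSONEachRow",
    "pandas": "DataFrame",   # returns a pandas DataFrame (needs pandas and pyarrow)
    "arrow": "ArrowTable",   # returns a pyarrow Table (needs pyarrow)
    "parquet": "Parquet",
}


def resolve_format(target: str) -> str:
    """Translate a short label into an output format name, failing loudly on typos."""
    try:
        return FORMATS[target]
    except KeyError:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(FORMATS)}") from None


def query_as(sql: str, target: str = "text"):
    """Run a query and return the result in the requested representation."""
    import chdb  # lazy import: only needed when a query is actually executed

    return chdb.query(sql, resolve_format(target))


def save_parquet(sql: str, out_path: str) -> int:
    """Persist a query result as a Parquet file; returns the number of bytes written."""
    import chdb

    data = chdb.query(sql, "Parquet").bytes()
    return Path(out_path).write_bytes(data)
```

An agent that needs a DataFrame downstream would call `query_as(sql, "pandas")`, while `save_parquet(sql, "result.parquet")` keeps a result around after the process exits.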
Alternatives
Scores are editorial opinions as of 2026-03-06.