vaex

Out-of-core lazy dataframe library for Python that processes billion-row datasets without loading them into RAM, using memory-mapped files. vaex features: open() for memory-mapped HDF5/Arrow/CSV, lazy evaluation (computations deferred until needed), virtual columns (computed on the fly), df.mean/std/sum aggregations at up to a billion rows per second, df.plot1d/plot2d for fast statistical plots, filtering with boolean expressions, df.apply() for UDFs, df.export_hdf5() for efficient storage, string operations, ML feature engineering, and JIT compilation via Numba/Pythran. Enables desktop analysis of datasets too large for pandas.

Evaluated Mar 06, 2026 · v4.x
Category: Developer Tools · Tags: python, vaex, dataframe, big-data, out-of-core, lazy, visualization, hdf5
⚙ Agent Friendliness
64
/ 100
Can an agent use this?
🔒 Security
88
/ 100
Is it safe for agents?
⚡ Reliability
74
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
--
Documentation
78
Error Messages
75
Auth Simplicity
99
Rate Limits
99

🔒 Security

TLS Enforcement
90
Auth Strength
90
Scope Granularity
88
Dep. Hygiene
85
Secret Handling
88

Local dataframe library with no network calls. HDF5 files are binary and machine-readable — protect large dataset files with filesystem permissions. No remote data access unless explicitly configured.

⚡ Reliability

Uptime/SLA
75
Version Stability
75
Breaking Changes
70
Error Recovery
75

Best When

Processing large datasets (100M-10B rows) that don't fit in RAM — vaex's memory-mapped lazy evaluation enables desktop analysis of truly large data without distributed infrastructure.

Avoid When

Small data (use pandas), complex multi-table operations (use DuckDB), mutable in-place updates, or real-time streaming.

Use Cases

  • Agent large dataset analysis — import vaex; df = vaex.open('large_data.hdf5'); mean_score = df[df['status'] == 'success']['score'].mean() — lazy evaluation; agent analyzes billion-row event log without loading into RAM; operations execute on memory-mapped file; result computed only when accessed
  • Agent CSV to HDF5 conversion — df = vaex.from_csv('large.csv', convert=True, chunk_size=1_000_000) — streaming conversion; agent converts large CSV to vaex HDF5 format for fast future access; chunk_size streams chunks without memory exhaustion; resulting HDF5 processes 100x faster
  • Agent feature engineering — import numpy as np; df['log_value'] = np.log(df['value']); df['norm'] = (df['value'] - df['value'].mean()) / df['value'].std() — virtual columns; agent adds computed columns without copying data (numpy ufuncs apply directly to vaex expressions); virtual columns are evaluated lazily on access; no memory overhead for computed features
  • Agent statistical plots — df = vaex.open('events.hdf5'); df.plot1d(df['latency_ms'], limits=[0, 1000]) — fast plot; agent visualizes 1 billion events in seconds; vaex aggregates over all rows with fast binning rather than sampling; matplotlib integration
  • Agent filtering pipeline — df_filtered = df[(df['error_code'] == 0) & (df['latency'] < 100)]; count = df_filtered.count() — lazy filter chain; agent applies multiple filters before counting; filters compose without executing until count(); memory-efficient pipeline for complex queries

Not For

  • Small datasets — vaex has overhead for small data; for <10M rows use pandas
  • Complex joins — vaex has limited join support vs pandas; for multi-table joins use pandas or DuckDB
  • Mutable operations — vaex DataFrames are immutable (no in-place updates); for mutable tabular data use pandas

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
No
SDK
Yes
Webhooks
No

Authentication

Methods: none
OAuth: No · Scopes: No

No auth — local dataframe library.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

vaex is MIT licensed. Free for all use.

Agent Metadata

Pagination
cursor
Idempotent
Full
Retry Guidance
Not documented

Known Gotchas

  • vaex DataFrame is not a pandas DataFrame — the API is similar but not identical; df.groupby() behaves differently; joins go through df.join(), which supports fewer options than pandas merge(); agent code ported from pandas to vaex must test operations explicitly; don't assume pandas behavior
  • Lazy evaluation surprises — column operations return expressions, not values; df['col'] * 2 is an Expression that computes nothing until materialized; agent code must force evaluation with .values, .to_numpy(), or .tolist() to get actual arrays; aggregations like .mean() do execute when called but return numpy scalars, so wrap them in float() when a plain Python number is needed
  • CSV is slow — vaex.open('file.csv') is significantly slower than HDF5; for performance: vaex.from_csv('file.csv', convert=True) converts to HDF5 on first run; subsequent opens use HDF5; agent pipeline should convert CSV data once before analysis
  • Missing values different from pandas NaN — vaex uses masked arrays; vaex.ismissing(df['col']) to check; df.dropna() removes missing; but pandas NaN handling code doesn't directly apply to vaex missing values; agent code porting pandas NaN logic must adapt
  • vaex 4.x has breaking changes from 3.x — vaex 4.x moved to Apache Arrow backend; vaex 3.x HDF5 files may need migration; agent code upgrading must test data compatibility; check vaex version: import vaex; vaex.__version__
  • Memory mapping requires local filesystem — vaex.open() uses OS memory mapping; network filesystems (NFS, SMB) don't support memory mapping efficiently; agent code on cloud VMs should use local SSD not network storage; S3 requires full download or specialized cloud-native format


Scores are editorial opinions as of 2026-03-06.
