vaex
Out-of-core lazy dataframe library for Python — processes billion-row datasets without loading them into RAM by using memory-mapped files. vaex features: open() for memory-mapped HDF5/Arrow/CSV, lazy evaluation (computations deferred until needed), virtual columns (computed on the fly), df.mean/std/sum aggregations at up to a billion rows per second, df.plot1d/plot2d for fast statistical plots, filtering with boolean expressions, df.apply() for UDFs, df.export_hdf5() for efficient storage, string operations, ML feature engineering, and JIT compilation via Numba/Pythran. Enables desktop analysis of datasets too large for pandas.
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Local dataframe library with no network calls. HDF5 files are binary and machine-readable — protect large dataset files with filesystem permissions. No remote data access unless explicitly configured.
⚡ Reliability
Best When
Processing large datasets (100M-10B rows) that don't fit in RAM — vaex's memory-mapped lazy evaluation enables desktop analysis of truly large data without distributed infrastructure.
Avoid When
Small data (use pandas), complex multi-table operations (use DuckDB), mutable in-place updates, or real-time streaming.
Use Cases
- • Agent large dataset analysis — import vaex; df = vaex.open('large_data.hdf5'); mean_score = df[df['status'] == 'success']['score'].mean() — lazy evaluation; agent analyzes billion-row event log without loading into RAM; operations execute on memory-mapped file; result computed only when accessed
- • Agent CSV to HDF5 conversion — df = vaex.from_csv('large.csv', convert=True, chunk_size=1_000_000) — streaming conversion; agent converts large CSV to vaex HDF5 format for fast future access; chunk_size streams chunks without memory exhaustion; resulting HDF5 processes 100x faster
- • Agent feature engineering — df['log_value'] = np.log(df['value']); df['norm'] = (df['value'] - df['value'].mean()) / df['value'].std() — virtual columns; agent adds computed columns without copying data; NumPy ufuncs applied to vaex expressions stay lazy; virtual columns evaluated on access; no memory overhead for computed features
- • Agent statistical plots — df = vaex.open('events.hdf5'); df.plot1d(df['latency_ms'], limits=[0, 1000]) — fast plot; agent visualizes 1 billion events in seconds; vaex bins the full dataset rather than sampling it; matplotlib integration with fast aggregation-based binning
- • Agent filtering pipeline — df_filtered = df[(df['error_code'] == 0) & (df['latency'] < 100)]; count = df_filtered.count() — lazy filter chain; agent applies multiple filters before counting; filters compose without executing until count(); memory-efficient pipeline for complex queries
Not For
- • Small datasets — vaex has overhead for small data; for <10M rows use pandas
- • Complex joins — vaex has limited join support vs pandas; for multi-table joins use pandas or DuckDB
- • Mutable operations — vaex DataFrames are immutable (no in-place updates); for mutable tabular data use pandas
Interface
Authentication
No auth — local dataframe library.
Pricing
vaex is MIT licensed. Free for all use.
Agent Metadata
Known Gotchas
- ⚠ vaex DataFrame is not pandas DataFrame — vaex has similar API but not identical; df.groupby() works differently; df.merge() has limitations; agent code porting from pandas to vaex must test operations explicitly; don't assume pandas behavior
- ⚠ Lazy evaluation surprises — column arithmetic returns Expression objects, not arrays; df['col'] * 2 is an Expression until materialized; aggregations such as df['col'].mean() do compute, but return numpy scalars; agent code must force evaluation with .values, .to_numpy(), or float() to get plain values
- ⚠ CSV is slow — vaex.open('file.csv') is significantly slower than HDF5; for performance: vaex.from_csv('file.csv', convert=True) converts to HDF5 on first run; subsequent opens use HDF5; agent pipeline should convert CSV data once before analysis
- ⚠ Missing values different from pandas NaN — vaex uses masked arrays; vaex.ismissing(df['col']) to check; df.dropna() removes missing; but pandas NaN handling code doesn't directly apply to vaex missing values; agent code porting pandas NaN logic must adapt
- ⚠ vaex 4.x has breaking changes from 3.x — vaex 4.x moved to Apache Arrow backend; vaex 3.x HDF5 files may need migration; agent code upgrading must test data compatibility; check vaex version: import vaex; vaex.__version__
- ⚠ Memory mapping requires local filesystem — vaex.open() uses OS memory mapping; network filesystems (NFS, SMB) don't support memory mapping efficiently; agent code on cloud VMs should use local SSD not network storage; S3 requires full download or specialized cloud-native format
Alternatives
Scores are editorial opinions as of 2026-03-06.