h5py

Python interface to HDF5 — read/write hierarchical datasets in the HDF5 binary format. h5py features: File context manager (h5py.File), Group hierarchy (file['/group/subgroup']), Dataset creation (file.create_dataset), NumPy-compatible array access, chunked datasets, compression (gzip, lzf, szip), dataset attributes, virtual datasets, partial I/O (fancy indexing), memory-mapped access, parallel HDF5 with MPI, SWMR (single-writer multiple-reader), and h5py.string_dtype for variable-length strings. HDF5 is the standard format for scientific datasets and legacy Keras weights (.h5); note that PyTorch checkpoints (pickle-based) and TensorFlow SavedModel (protobuf) do not use HDF5. Primary Python API for reading/writing HDF5 files used across ML, genomics, and physics.
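A minimal round-trip sketch of the core API named above (file name, group, and dataset layout are illustrative): create a chunked, gzip-compressed dataset inside a group, attach an attribute, and read back only a slice.

```python
import numpy as np
import h5py

# Write: group hierarchy + chunked, compressed dataset + attribute
with h5py.File("example.h5", "w") as f:
    grp = f.create_group("experiment")
    dset = grp.create_dataset(
        "measurements",
        data=np.arange(1000, dtype="f8").reshape(100, 10),
        chunks=(10, 10),          # chunked layout enables partial I/O
        compression="gzip",       # transparent compression
    )
    dset.attrs["units"] = "volts" # metadata stored alongside the array

# Read: pull only 5 rows from disk, plus the attribute
with h5py.File("example.h5", "r") as f:
    sl = f["experiment/measurements"][0:5, :]
    units = f["experiment/measurements"].attrs["units"]
```

The slice read touches only the chunks overlapping rows 0–4; the rest of the dataset never leaves disk.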

Evaluated Mar 06, 2026 · v3.x
Homepage ↗ Repo ↗ Developer Tools python h5py hdf5 scientific-computing arrays datasets checkpoints
⚙ Agent Friendliness
65
/ 100
Can an agent use this?
🔒 Security
90
/ 100
Is it safe for agents?
⚡ Reliability
82
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
--
Documentation
80
Error Messages
78
Auth Simplicity
98
Rate Limits
98

🔒 Security

TLS Enforcement
92
Auth Strength
92
Scope Granularity
88
Dep. Hygiene
85
Secret Handling
90

Local file I/O — no network access. HDF5 is a binary format; do not load untrusted HDF5 files, as malformed files can trigger libhdf5 parser vulnerabilities. Keras .h5 weight files from untrusted sources should be verified before loading into agent models.

⚡ Reliability

Uptime/SLA
85
Version Stability
82
Breaking Changes
78
Error Recovery
82

Best When

Reading/writing large scientific datasets, ML model checkpoints, or genomics/physics data in the standard HDF5 format — h5py is the Python standard for HDF5 access with NumPy-compatible partial I/O.

Avoid When

You need cloud-native array storage (use Zarr), relational data (use SQL), or concurrent multi-writer access.

Use Cases

  • Agent model checkpoint storage — with h5py.File('agent_model.h5', 'w') as f: f.create_dataset('weights', data=model_weights, compression='gzip') — store agent model weights in compressed HDF5; checkpoints load faster than pickle; human-inspectable structure with h5ls
  • Agent large dataset I/O — with h5py.File('training_data.h5', 'r') as f: batch = f['features'][1000:2000, :] — read only needed slice of 100M row dataset; HDF5 supports partial I/O without loading full file; agent training loops read batches directly from HDF5
  • Agent genomics data — with h5py.File('genome.h5', 'r') as f: sequences = f['chr1/sequences'][start:end] — read genomic sequence data stored in HDF5 hierarchy; agent bioinformatics pipelines read standard H5 format from GATK, DeepVariant, and other genomics tools
  • Agent attribute metadata — with h5py.File('agent_data.h5', 'a') as f: f['results'].attrs['model_version'] = '1.2.3'; f['results'].attrs['timestamp'] = str(datetime.now()) — store metadata alongside agent data arrays; attributes inspectable without reading full dataset
  • Agent SWMR streaming — with h5py.File('live_data.h5', 'r', swmr=True) as f: while True: ds = f['sensor_data']; ds.id.refresh(); latest = ds[-1] — read HDF5 being written by another process; agent monitoring pipeline reads sensor data as writer appends; SWMR allows concurrent read/write
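The checkpoint use case above can be sketched end to end (the layer names and shapes are hypothetical stand-ins for real model parameters): write each weight array as a compressed dataset, then restore them into a dict.

```python
import numpy as np
import h5py

# Hypothetical layer names/shapes standing in for real model weights
weights = {
    "dense1/kernel": np.random.rand(64, 32).astype("f4"),
    "dense1/bias": np.zeros(32, dtype="f4"),
}

# Save: one compressed dataset per weight; intermediate groups
# ("dense1") are created automatically from the path
with h5py.File("agent_model.h5", "w") as f:
    for name, arr in weights.items():
        f.create_dataset(name, data=arr, compression="gzip")

# Load: [()] reads a whole dataset into a NumPy array
with h5py.File("agent_model.h5", "r") as f:
    restored = {name: f[name][()] for name in weights}
```

The resulting file is inspectable with `h5ls -r agent_model.h5` without any Python at all, which is part of the appeal over pickle.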

Not For

  • Relational queries — use SQLite or PostgreSQL; HDF5 is hierarchical array storage not a relational database
  • Concurrent multi-writer access — HDF5 allows single writer at a time (SWMR allows one writer, many readers); for multi-writer use Zarr or database
  • Small datasets — HDF5 file overhead (metadata, chunking) adds complexity for small datasets; use .npy or pickle for simple small arrays

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
No
SDK
Yes
Webhooks
No

Authentication

Methods: none
OAuth: No Scopes: No

No auth — local file I/O library.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

h5py is BSD licensed. Free for all use.

Agent Metadata

Pagination
none
Idempotent
Full
Retry Guidance
Not documented

Known Gotchas

  • Always use h5py.File as a context manager — f = h5py.File('data.h5', 'r'); data = f['key'][:]; f.close() works, but forgetting f.close() leaks a file handle; agent code in loops must use with h5py.File(...) as f: or handles accumulate until the process hits the OS limit
  • A Dataset is lazy, not an array — f['data'] returns an h5py.Dataset object, not a NumPy array; f['data'].shape works, but f['data'][0] + f['data'][1] triggers two separate I/O reads; agent code processing dataset elements should load the needed slice once (arr = f['data'][start:end]) and then work with arr
  • Modes 'a', 'r+', and 'w' differ critically — mode='a' opens for append, creating the file if missing; mode='r+' requires the file to exist; mode='w' truncates an existing file; agent code using mode='a' to append to an existing file silently creates an empty new file if the path is wrong (use 'r+' when the file must already exist)
  • Variable-length strings require a special dtype — h5py.string_dtype() is required for variable-length string datasets in h5py 3.x; h5py.special_dtype(vlen=str) was the equivalent h5py 2.x API; agent code copied from h5py 2.x examples using special_dtype gets a deprecation warning in h5py 3.x
  • Every read returns a copy, and fancy indexing is restricted — f['data'][[0, 5, 10]] requires indices in increasing order with no duplicates and triggers a separate read per selection block; slices like f['data'][0:10] also return fresh NumPy arrays, never views into the file; avoid large fancy-index reads in memory-constrained agent environments and prefer contiguous slices
  • Parallel HDF5 requires an MPI build — opening files with driver='mpio' requires HDF5 built with MPI support plus mpi4py; the standard pip install h5py wheel is serial only; distributed agent training code needs h5py built from source against an MPI-enabled HDF5 (e.g. HDF5_MPI="ON" pip install --no-binary=h5py h5py), or should use Zarr, which supports concurrent reads natively
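Several of these gotchas can be shown in one short sketch (file and dataset names are illustrative): the h5py 3.x string dtype, the lazy Dataset object, the load-the-slice-once pattern, and the fact that string data reads back as bytes by default.

```python
import numpy as np
import h5py

with h5py.File("gotchas.h5", "w") as f:
    f.create_dataset("data", data=np.arange(100))
    # h5py 3.x: variable-length strings need string_dtype()
    f.create_dataset("labels", data=["alpha", "beta"],
                     dtype=h5py.string_dtype())

with h5py.File("gotchas.h5", "r") as f:
    ds = f["data"]        # lazy h5py.Dataset, not a NumPy array
    chunk = ds[10:20]     # one I/O read; then work with the ndarray
    total = int(chunk.sum())
    # variable-length string data comes back as bytes by default
    labels = [s.decode() for s in f["labels"][:]]
```

Summing `chunk` in memory costs one disk read; summing `ds[i]` element by element would cost one read per element.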


Scores are editorial opinions as of 2026-03-06.
