lakeFS

Git-like versioning layer for data lakes built on object storage (S3, GCS, Azure Blob). lakeFS adds branches, commits, and merges to data stored in object storage — enabling data teams to create isolated development environments, test pipeline changes, and roll back bad data transformations. Works transparently with existing tools (Spark, Presto, dbt, Airflow) via an S3-compatible API. No data movement — versioning is metadata-only.

Evaluated Mar 06, 2026 (v1.x)
Category: Developer Tools
Tags: data-versioning, git-for-data, s3, object-storage, open-source, lakehouse, mlops, reproducibility
⚙ Agent Friendliness: 61 / 100 (Can an agent use this?)
🔒 Security: 87 / 100 (Is it safe for agents?)
⚡ Reliability: 82 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 85
Error Messages: 80
Auth Simplicity: 82
Rate Limits: 78

🔒 Security

TLS Enforcement: 95
Auth Strength: 85
Scope Granularity: 82
Dep. Hygiene: 88
Secret Handling: 85

Apache 2.0 open source. SOC2 for lakeFS Cloud. Fine-grained RBAC policies. S3-style credentials with granular permissions. Data stays in your object storage — lakeFS only stores metadata. Self-hosting ensures complete data sovereignty.

⚡ Reliability

Uptime/SLA: 85
Version Stability: 82
Breaking Changes: 80
Error Recovery: 82

Best When

Data engineering teams managing large datasets on object storage who need Git-like workflow (branches, commits, merges) for data without moving data.

Avoid When

You need ACID transactions on streaming data — use Delta Lake or Apache Iceberg instead. lakeFS is for Git-style versioning, not transactional updates.

Use Cases

  • Create isolated data branches for testing ML pipeline changes before merging to production data lake — zero-copy branching with metadata-only overhead
  • Roll back a data lake to a previous commit when a bad data pipeline writes incorrect data — no data loss risk
  • Enable agent pipelines to work on their own data branch without affecting production — safe parallel data manipulation
  • Track which data version was used for each ML model training run — commit hash as data version identifier
  • Run data quality checks on a branch before merging to production — prevent bad data from reaching downstream consumers
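The branching, commit, and rollback use cases above all rest on the same idea: a branch is a pointer to a commit, and a commit maps paths to objects that already exist in storage. The following is a toy in-memory sketch of that idea only. It is not lakeFS code or its internal data model; `ToyRepo`, its method names, and the `obj-...` keys are hypothetical, and the pointer reset shown here is just one simple way to model a rollback.

```python
import hashlib
import json


class ToyRepo:
    """Toy model of metadata-only versioning (illustrative, not lakeFS internals)."""

    def __init__(self):
        # commit id -> {"parent": commit id or None, "tree": {path: object_key}}
        self.commits = {}
        # branch name -> head commit id (None means no commits yet)
        self.branches = {"main": None}
        # branch name -> uncommitted {path: object_key}
        self.staged = {"main": {}}

    def _tree(self, branch):
        head = self.branches[branch]
        return dict(self.commits[head]["tree"]) if head else {}

    def create_branch(self, name, source):
        # Zero-copy: the new branch only records the source's head commit id.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def write(self, branch, path, object_key):
        # Writes are staged; they are not part of history until commit().
        self.staged[branch][path] = object_key

    def commit(self, branch, message):
        tree = self._tree(branch)
        tree.update(self.staged[branch])
        parent = self.branches[branch]
        payload = json.dumps(
            {"parent": parent, "tree": tree, "message": message}, sort_keys=True
        )
        commit_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.commits[commit_id] = {"parent": parent, "tree": tree}
        self.branches[branch] = commit_id
        self.staged[branch] = {}
        return commit_id

    def read(self, branch, path):
        # Staged writes shadow the committed tree on the same branch.
        if path in self.staged[branch]:
            return self.staged[branch][path]
        return self._tree(branch).get(path)

    def reset_to(self, branch, commit_id):
        # Rollback: move the branch pointer; no object is copied or deleted.
        self.branches[branch] = commit_id
        self.staged[branch] = {}


repo = ToyRepo()
repo.write("main", "events/day1.parquet", "obj-001")
good_commit = repo.commit("main", "initial load")

repo.create_branch("experiment", "main")
repo.write("experiment", "events/day1.parquet", "obj-002")
experiment_commit = repo.commit("experiment", "rewrite day1")
```

In this sketch the commit id doubles as the data version identifier: recording `good_commit` next to a training run is enough to look up exactly which objects the run read.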

Not For

  • Small-scale data that fits in a database: lakeFS targets object-storage-scale data, and a database's own transactions and backups are usually a better fit at that size
  • Real-time streaming data — lakeFS is for batch object storage versioning; use Delta Lake or Iceberg for streaming with ACID
  • Teams not using object storage: lakeFS requires an object store (S3, GCS, Azure Blob, or an S3-compatible service) as the backend

Interface

REST API: Yes
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: Yes

Authentication

Methods: api_key bearer_token
OAuth: No
Scopes: Yes

Access key + secret key pair (S3-style credentials) for API and S3-compatible access. Keys created in lakeFS settings. Policies define fine-grained access control. lakeFS Cloud adds SSO.
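As a minimal sketch of how a key pair like this is typically sent to a REST API, the snippet below builds an HTTP Basic `Authorization` header with the access key ID as the username and the secret key as the password. Whether a given lakeFS endpoint accepts Basic credentials in exactly this form is an assumption to confirm against the API reference; the function name and the example key values are hypothetical.

```python
import base64


def basic_auth_header(access_key_id: str, secret_access_key: str) -> dict:
    """Build an HTTP Basic Authorization header from a key pair."""
    raw = f"{access_key_id}:{secret_access_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only; load real keys from a
# secret store or environment variables, never from source code.
headers = basic_auth_header("AKIAEXAMPLEKEY", "example-secret")
```

The returned dictionary can be passed as the `headers` argument of most Python HTTP clients.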

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

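Apache 2.0 open source and free to self-host. lakeFS Cloud is the managed option. When self-hosting, you pay for the underlying object storage plus the infrastructure that runs the lakeFS server. The open-source edition covers the core versioning features; as noted under Authentication, SSO is a lakeFS Cloud addition.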

Agent Metadata

Pagination: cursor
Idempotent: Full
Retry Guidance: Documented
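For an agent, cursor pagination means looping until the server reports no further pages. The sketch below shows that loop against an injected `fetch_page` callable so it runs without a server. The response shape it assumes (`results`, plus a `pagination` object with `has_more` and `next_offset`) is an assumption to verify against the lakeFS API reference; `iter_all`, `fake_fetch`, and the sample pages are hypothetical.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_all(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[dict]:
    """Yield every item from a cursor-paginated listing.

    fetch_page(cursor) must return a dict shaped like
    {"results": [...], "pagination": {"has_more": bool, "next_offset": str}}.
    That shape is assumed here, not taken from the source page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["results"]
        pagination = page["pagination"]
        if not pagination.get("has_more"):
            return
        # The cursor for the next request comes from the current response.
        cursor = pagination["next_offset"]


# Fake two-page listing standing in for a real HTTP call.
_PAGES = {
    None: {
        "results": [{"id": "main"}],
        "pagination": {"has_more": True, "next_offset": "main"},
    },
    "main": {
        "results": [{"id": "staging"}],
        "pagination": {"has_more": False, "next_offset": ""},
    },
}


def fake_fetch(cursor: Optional[str]) -> Dict:
    return _PAGES[cursor]


branch_ids = [item["id"] for item in iter_all(fake_fetch)]
```

Keeping the HTTP call behind `fetch_page` also gives retries a single place to live, which keeps the loop itself unchanged.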

Known Gotchas

  • lakeFS addresses objects as repository/branch/path (lakefs://repo/branch/path). S3 clients only work transparently when pointed at the lakeFS endpoint, where the repository acts as the bucket and the branch is the first segment of the object key
  • Branch merges can have conflicts when the same file is modified on multiple branches — agents must handle merge conflict resolution
  • Commits in lakeFS are explicit — just writing to a branch doesn't commit; agents must explicitly call the commit API
  • lakeFS metadata is separate from object storage: the lakeFS server must be running for any versioned data access, and reading or writing the underlying bucket directly bypasses versioning
  • Garbage collection (GC) must be configured to clean up orphaned objects — branches that are deleted don't automatically free underlying S3 storage
  • lakeFS hooks (pre-commit, pre-merge) run server-side — custom validation logic requires deploying server-side hook scripts
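The addressing gotcha above is easy to get wrong in agent code, so here is a minimal sketch of splitting a `lakefs://repo/branch/path` URI into the bucket and key an S3 client would use when pointed at the lakeFS endpoint. It assumes the repository maps to the bucket and the branch (or other ref) is the first key segment; `GatewayLocation` and `to_gateway_location` are hypothetical names, not part of any lakeFS SDK.

```python
from typing import NamedTuple


class GatewayLocation(NamedTuple):
    bucket: str  # repository name
    key: str     # "<ref>/<path>"


def to_gateway_location(uri: str) -> GatewayLocation:
    """Split lakefs://<repo>/<ref>/<path> into an S3-style bucket and key."""
    prefix = "lakefs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a lakefs:// URI: {uri!r}")
    repo, _, rest = uri[len(prefix):].partition("/")
    ref, _, path = rest.partition("/")
    if not repo or not ref:
        raise ValueError(f"URI must include a repository and a ref: {uri!r}")
    key = f"{ref}/{path}" if path else ref
    return GatewayLocation(bucket=repo, key=key)


location = to_gateway_location("lakefs://repo/main/data/file.parquet")
```

Writing through such a location still only stages the change on the branch; per the list above, a separate commit call is needed before it becomes part of history.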

Scores are editorial opinions as of 2026-03-06.
