OpenLineage

OpenLineage is an open standard for collecting and propagating data lineage. Its spec defines how data pipelines emit lineage events: which job ran, when it ran, which datasets it read as inputs, and which it wrote as outputs. Compatible backends include Marquez (open-source server), DataHub, and Atlan. Integrations for Spark, Airflow, dbt, Flink, and more emit lineage events automatically, without changes to pipeline code.
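
To make the event shape concrete, here is a minimal sketch of a run event built with the Python standard library only. The producer URI, the job and dataset names, and the helper `build_run_event` are hypothetical, and the `schemaURL` value is illustrative; use the spec version your integration targets. The OpenLineage JSON Schema remains the authoritative definition.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Assemble a dict shaped like an OpenLineage run event (illustrative sketch)."""

    def dataset(pair):
        namespace, name = pair
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/my-pipeline",  # hypothetical producer URI
        # Illustrative schema version; match it to the spec release you target.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(d) for d in inputs],
        "outputs": [dataset(d) for d in outputs],
    }


event = build_run_event(
    "COMPLETE",
    "analytics",            # hypothetical job namespace
    "daily_orders_rollup",  # hypothetical job name
    inputs=[("postgres://db:5432", "public.orders")],
    outputs=[("s3://warehouse", "rollups/daily_orders")],
)
print(json.dumps(event, indent=2))
```

In practice the Spark, Airflow, and dbt integrations produce events like this for you; building one by hand is mainly useful for custom jobs or for testing a backend.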

Evaluated Mar 06, 2026 · v1.x
Category: Developer Tools
Tags: data-lineage, metadata, open-standard, marquez, spark, airflow, dbt, lineage, governance
⚙ Agent Friendliness: 62 / 100 (Can an agent use this?)
🔒 Security: 83 / 100 (Is it safe for agents?)
⚡ Reliability: 81 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 82
Error Messages: 72
Auth Simplicity: 88
Rate Limits: 90

🔒 Security

TLS Enforcement: 95
Auth Strength: 78
Scope Granularity: 70
Dep. Hygiene: 90
Secret Handling: 85

Apache 2.0 licensed and open source; as a Linux Foundation project, the spec and code are highly auditable. Authentication is delegated to the backend implementation. HTTPS is recommended for event transport. The spec itself stores no credentials; secret handling is backend-specific.

⚡ Reliability

Uptime/SLA: 85
Version Stability: 82
Breaking Changes: 80
Error Recovery: 78

Best When

You're using Airflow, Spark, or dbt and want automatic data lineage tracking across your data pipelines without writing custom lineage code.

Avoid When

You need commercial enterprise lineage management with support SLAs — use a commercial product like Collibra or Atlan that consumes OpenLineage events.

Use Cases

  • Automatically track data lineage across AI training pipelines using OpenLineage Airflow/Spark integrations — no custom code required
  • Feed lineage data to Marquez or DataHub for catalog enrichment, enabling agents to understand data provenance
  • Implement impact analysis — understand which downstream datasets and models are affected when an upstream data source changes
  • Audit ML model training data lineage for compliance — prove which datasets contributed to a model version via lineage events
  • Build data observability pipelines that trigger alerts when expected lineage events are missing (indicating pipeline failure)
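
The impact-analysis use case above can be sketched as a graph walk over collected events. This is a simplified illustration, not how any particular backend implements it: it assumes events are plain dicts with `inputs` and `outputs` lists, and the function name `downstream_datasets` and the sample dataset names are hypothetical. A backend such as Marquez or DataHub would normally answer this query for you.

```python
from collections import defaultdict, deque


def downstream_datasets(events, source):
    """Return every dataset reachable downstream of `source`.

    `events` is an iterable of OpenLineage-style dicts; `source` is a
    (namespace, name) tuple. An edge input -> output is added for each
    job run that read the input and wrote the output.
    """
    edges = defaultdict(set)
    for ev in events:
        outs = [(d["namespace"], d["name"]) for d in ev.get("outputs", [])]
        for d in ev.get("inputs", []):
            edges[(d["namespace"], d["name"])].update(outs)

    # Breadth-first search from the changed dataset.
    seen, queue = set(), deque([source])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, ()):
            if nxt != source and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def ds(name):
    # Hypothetical helper: all sample datasets share one namespace.
    return {"namespace": "warehouse", "name": name}


sample_events = [
    {"inputs": [ds("raw.orders")], "outputs": [ds("clean.orders")]},
    {"inputs": [ds("clean.orders")], "outputs": [ds("features.orders")]},
    {"inputs": [ds("features.orders")], "outputs": [ds("models.churn_v3")]},
    {"inputs": [ds("raw.users")], "outputs": [ds("clean.users")]},
]

affected = downstream_datasets(sample_events, ("warehouse", "raw.orders"))
print(sorted(name for _, name in affected))
```

Because lineage is tracked at the job and dataset level, the result tells you which datasets and models to re-check, not which rows changed.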

Not For

  • Real-time row-level data tracking — OpenLineage tracks job-level dataset lineage, not individual record provenance
  • Guaranteed column-level lineage: the spec defines column lineage as a facet, but not every integration emits it
  • Teams using only proprietary tools — lineage integration requires OpenLineage-compatible data tools

Interface

REST API: Yes
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: bearer_token, none
OAuth: No
Scopes: No

OpenLineage spec itself is auth-agnostic — backends implement authentication. Marquez (open-source backend) has optional API key auth. DataHub uses its own auth. HTTP transport supports Authorization header.
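
As a rough sketch of what bearer-token auth over the HTTP transport looks like, the snippet below prepares (but does not send) a POST request carrying an event. The base URL, the token, and the helper `build_lineage_request` are assumptions for illustration; the `/api/v1/lineage` path is the one Marquez exposes, and other backends may use a different endpoint and auth scheme.

```python
import json
import urllib.request


def build_lineage_request(event, base_url, api_key=None):
    """Prepare a POST request for a lineage event (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Bearer token in the Authorization header; omit for unauthenticated backends.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/lineage",  # Marquez-style endpoint
        data=json.dumps(event).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_lineage_request(
    {"eventType": "COMPLETE"},       # truncated event, for illustration only
    "http://localhost:5000/",        # assumed local backend address
    api_key="secret-token",          # placeholder; load real secrets from the environment
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=10)
```

Use an `https://` URL outside local development so the token is not sent in clear text.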

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

OpenLineage is a standard, not a product — free to use. The receiver/backend (Marquez, DataHub) has its own cost. Linux Foundation project with strong industry backing.

Agent Metadata

Pagination: none
Idempotent: Partial
Retry Guidance: Not documented

Known Gotchas

  • OpenLineage is a standard, not a complete product — you need a backend (Marquez, DataHub) to store and query lineage
  • Integrations auto-emit lineage for standard operations — custom dataset transformations may require manual event emission
  • Lineage event schema is strict — invalid events are dropped silently by some backends
  • Column-level lineage support varies by integration — not all Spark/Airflow operations emit column lineage
  • Backend choice significantly affects query capabilities — Marquez offers simpler queries; DataHub offers richer search
  • Async event transport means lineage data may lag behind pipeline execution — don't query lineage immediately after job completion
  • OpenLineage facets extend the base spec — verify that your backend supports the specific facets your integration emits
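
Since some backends drop invalid events silently, a local check before sending can surface problems early. The sketch below is a deliberately non-exhaustive check of a few fields that the run event schema requires; the function name `find_event_problems` and the sample values are hypothetical, and validating against the published JSON Schema is the more thorough option.

```python
import uuid

REQUIRED_TOP_LEVEL = ("eventTime", "producer", "schemaURL", "run", "job")


def find_event_problems(event):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing {key}" for key in REQUIRED_TOP_LEVEL if key not in event]

    # run.runId is expected to be a UUID string.
    run_id = (event.get("run") or {}).get("runId")
    try:
        uuid.UUID(str(run_id))
    except ValueError:
        problems.append("run.runId is not a valid UUID")

    # Jobs and datasets are both identified by namespace + name.
    job = event.get("job") or {}
    for key in ("namespace", "name"):
        if not job.get(key):
            problems.append(f"missing job.{key}")
    for side in ("inputs", "outputs"):
        for i, dataset in enumerate(event.get(side, [])):
            for key in ("namespace", "name"):
                if not dataset.get(key):
                    problems.append(f"missing {side}[{i}].{key}")
    return problems


bad_event = {
    "eventTime": "2026-03-06T00:00:00+00:00",
    "producer": "https://example.com/my-pipeline",  # hypothetical producer URI
    "run": {"runId": "not-a-uuid"},
    "job": {"namespace": "analytics"},
    "inputs": [{"name": "public.orders"}],
}
for problem in find_event_problems(bad_event):
    print(problem)
```

A check like this does not cover facets; whether a backend accepts a given facet still has to be verified against that backend.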

Scores are editorial opinions as of 2026-03-06.