Amundsen Data Discovery

Open-source data discovery and metadata engine originally built at Lyft. Amundsen indexes metadata from data sources (tables, dashboards, users) and makes it searchable via Elasticsearch. Separate search and metadata services expose REST APIs for search, table metadata, table- and column-level lineage, and usage statistics. Neo4j is the default graph store for metadata, including lineage. A web UI supports interactive data discovery. Apache 2.0 licensed.
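Below is a minimal sketch of how an agent might address the two backend REST services. It assumes several details that you should verify against your deployed version:

- the default quickstart ports (search on 5001, metadata on 5002);
- the v1 `/search` route with `query_term` and `page_index` parameters (newer search-service releases may expose a different route);
- a `/table/<key>/lineage` route that takes `direction` and `depth`.

`AmundsenEndpoints` is a hypothetical helper. It is not part of any Amundsen package.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode


@dataclass(frozen=True)
class AmundsenEndpoints:
    # Assumed defaults from the docker-compose quickstart; override per deployment.
    search_base: str = "http://localhost:5001"
    metadata_base: str = "http://localhost:5002"

    def table_url(self, table_key: str) -> str:
        # Table keys look like "hive://gold.test_schema/test_table1";
        # keep ':' and '/' literal so the key is passed through as a path.
        return f"{self.metadata_base}/table/{quote(table_key, safe=':/')}"

    def lineage_url(self, table_key: str, direction: str = "both", depth: int = 1) -> str:
        # Assumed lineage route and query parameters (verify for your version).
        query = urlencode({"direction": direction, "depth": depth})
        return f"{self.table_url(table_key)}/lineage?{query}"

    def search_url(self, query_term: str, page_index: int = 0) -> str:
        # Assumed v1 search route; urlencode turns spaces into '+'.
        query = urlencode({"query_term": query_term, "page_index": page_index})
        return f"{self.search_base}/search?{query}"
```

Table keys follow the `database://cluster.schema/table` pattern used in Amundsen's sample data. Keeping URL construction in one place makes it easy to repoint an agent at a deployment that sits behind a reverse proxy.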

Evaluated Mar 06, 2026 (current version)
Category: Developer Tools
Tags: data-catalog, metadata, search, lineage, open-source, lyft, python, neo4j, elasticsearch
⚙ Agent Friendliness: 56 / 100 (Can an agent use this?)
🔒 Security: 77 / 100 (Is it safe for agents?)
⚡ Reliability: 72 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 72
Error Messages: 68
Auth Simplicity: 80
Rate Limits: 85

🔒 Security

TLS Enforcement: 95
Auth Strength: 70
Scope Granularity: 65
Dep. Hygiene: 80
Secret Handling: 78

Apache 2.0 open source, so the code is auditable. The default deployment has no authentication, which is a security risk; securing it requires explicit configuration. Because Amundsen is self-hosted, its security posture depends on the deployer. When configured, authentication is delegated to an OIDC provider.

⚡ Reliability

Uptime/SLA: 75
Version Stability: 72
Breaking Changes: 70
Error Recovery: 70

Best When

You need open-source data discovery for your data lake or warehouse and can operate self-hosted infrastructure (Neo4j + Elasticsearch + Python).

Avoid When

You need a managed data catalog with an SLA, enterprise support, or integrated data governance. DataHub or Atlan are better maintained and more feature-rich.

Use Cases

  • Search Amundsen's metadata catalog via the REST API for datasets matching keyword queries (Elasticsearch full-text search; an agent can derive the keywords from a natural-language request). Useful for AI agent data discovery
  • Retrieve table metadata (schema, description, owners, usage frequency) before an agent accesses a dataset
  • Query column-level lineage to understand data provenance before using data for AI model training
  • Build data-aware agent workflows that look up dataset documentation and usage patterns from Amundsen before querying
  • Integrate Amundsen's popularity and usage signals to help agents prioritize high-quality, frequently-used datasets
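The next sketch supports the use cases above. It fetches a table's metadata and reduces it to the fields an agent usually needs before querying: schema, description, owners, and usage. It also includes a filter over a lineage response.

The field names are our reading of amundsen-common's schemas and are assumptions: `columns`, `col_type`, `owners`, `table_readers`, `read_count`, `upstream_entities`, and `level`. `fetch_json`, `summarize_table`, and `upstream_keys` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_json(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    # Plain GET returning decoded JSON (no auth; see the Authentication section).
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_table(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Assumed field names from the metadata service's table response.
    return {
        "name": f"{payload.get('schema', '')}.{payload.get('name', '')}",
        "description": payload.get("description") or "(no description)",
        "columns": [(c.get("name"), c.get("col_type")) for c in payload.get("columns", [])],
        "owners": [o.get("email") for o in payload.get("owners", [])],
        # Summed reader counts serve as a rough popularity signal.
        "total_reads": sum(r.get("read_count", 0) for r in payload.get("table_readers", [])),
    }


def upstream_keys(lineage: Dict[str, Any], max_level: int = 1) -> List[str]:
    # Keep only upstream tables within max_level hops (assumed "level" field).
    return [
        e["key"]
        for e in lineage.get("upstream_entities", [])
        if e.get("level", 1) <= max_level
    ]
```

For example, an agent could call `summarize_table(fetch_json("http://localhost:5002/table/hive://gold.test_schema/test_table1"))` and attach the result to its context before writing SQL. An empty description is surfaced explicitly, so the agent can treat the dataset with extra caution.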

Not For

  • Organizations that need vendor support, SLAs, or managed hosting: Amundsen is self-hosted only
  • Complex data governance workflows — Amundsen is primarily discovery-focused; for policy enforcement use Collibra or BigID
  • Real-time metadata updates — Amundsen relies on periodic ingestion pipelines, not real-time metadata streams

Interface

REST API: Yes
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none, oauth2
OAuth: Yes
Scopes: No

Authentication is optional and configured by the deployer; the default setup has none. Secure deployments must explicitly configure the supported OIDC/OAuth integration.
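Amundsen does not define a single client credential format. Whether a bearer token is accepted depends on how the deployer fronted the services with OIDC, for example through a reverse proxy or Flask OIDC middleware. The `Authorization: Bearer` scheme and the hypothetical `build_request` helper below are therefore assumptions.

The sketch attaches a token only when one is supplied. It refuses to send that token over plain HTTP to anything but a local host.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlparse

# Hosts where cleartext HTTP is tolerated (e.g. a local quickstart).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def build_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding an assumed bearer-token header if a token is given."""
    headers = {"Accept": "application/json"}
    if token:
        parsed = urlparse(url)
        # Never leak a credential in cleartext to a remote host.
        if parsed.scheme != "https" and parsed.hostname not in LOCAL_HOSTS:
            raise ValueError(f"refusing to send a token over {parsed.scheme}: {url}")
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```

Pass the result to `urllib.request.urlopen` as usual. On an unauthenticated default deployment, simply omit the token.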

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

Amundsen is free and open source. Operational costs include Neo4j (clustering requires the commercially licensed Enterprise Edition), Elasticsearch, and compute. Self-hosting requires operational expertise.

Agent Metadata

Pagination: page-based
Idempotent: Full
Retry Guidance: Not documented
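Retry guidance is not documented, but reads are idempotent, so an agent can safely retry GETs on its own policy. The sketch below pages through results and retries transient failures with exponential backoff. It treats network errors, timeouts, and HTTP 5xx as transient, and does not retry 4xx.

Two details are assumptions:

- the v1 search response shape `{"total_results": ..., "results": [...]}`;
- a zero-based `page_index`.

You supply the `fetch(page_index)` callable, and the function names are hypothetical.

```python
import time
import urllib.error
from typing import Any, Callable, Dict, Iterator

# fetch(page_index) -> decoded JSON body for that page (caller-supplied).
Fetch = Callable[[int], Dict[str, Any]]


def fetch_with_retry(fetch: Fetch, page: int, attempts: int = 3, base_delay: float = 0.5) -> Dict[str, Any]:
    """Retry network errors, timeouts, and HTTP 5xx with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch(page)
        except urllib.error.HTTPError as err:
            # 4xx means the request itself is wrong; retrying will not help.
            if err.code < 500 or attempt == attempts - 1:
                raise
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("retry loop exited without a result")


def iter_search_results(fetch: Fetch, max_pages: int = 50, **retry_kw: Any) -> Iterator[Dict[str, Any]]:
    """Walk page_index 0, 1, 2, ... until a page is empty or total_results is reached."""
    seen = 0
    for page in range(max_pages):
        body = fetch_with_retry(fetch, page, **retry_kw)
        results = body.get("results", [])
        if not results:
            return
        yield from results
        seen += len(results)
        if seen >= body.get("total_results", float("inf")):
            return
```

The `max_pages` cap bounds the loop even if a deployment reports an inconsistent `total_results`.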

Known Gotchas

  • Amundsen requires self-hosting Neo4j, Elasticsearch, and multiple Python microservices — significant operational overhead
  • API documentation is sparse — agents must discover endpoints from source code or community resources
  • Metadata freshness depends on ingestion pipeline frequency — data may be days old if pipelines run infrequently
  • No authentication by default — self-hosted deployments without auth are a security risk; always configure OIDC
  • Amundsen development has slowed compared to DataHub — evaluate long-term maintenance before committing
  • Search relevance depends on metadata quality — poor table descriptions yield poor search results for agent discovery
  • Neo4j is dual-licensed (Community Edition under GPLv3, Enterprise Edition under a commercial license); verify the licensing implications for your deployment
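Metadata freshness depends on ingestion, so an agent may want a cheap staleness check before trusting a table's metadata. This sketch assumes the table payload carries a `last_updated_timestamp` in epoch seconds. That value records the table's last update as captured at ingestion time. An old value can therefore mean either an idle table or a stalled pipeline; treat it as a proxy signal only.

The helper names and the two-day default are hypothetical.

```python
import time
from typing import Any, Dict, Optional

SECONDS_PER_DAY = 86_400


def metadata_age_seconds(payload: Dict[str, Any], now: Optional[float] = None) -> Optional[float]:
    # Assumed field: epoch seconds of the last recorded table update.
    ts = payload.get("last_updated_timestamp")
    if ts is None:
        return None
    return (time.time() if now is None else now) - float(ts)


def is_stale(payload: Dict[str, Any], max_age_days: float = 2.0, now: Optional[float] = None) -> bool:
    age = metadata_age_seconds(payload, now)
    # A missing timestamp counts as stale, so the agent double-checks the source.
    return age is None or age > max_age_days * SECONDS_PER_DAY
```

Tune `max_age_days` to your ingestion schedule. A daily pipeline with a two-day threshold tolerates one missed run before the agent flags the metadata.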



Scores are editorial opinions as of 2026-03-06.
