PyIceberg

Official Python implementation of Apache Iceberg, the open table format for huge analytic datasets. Provides a Python API for reading and writing Iceberg tables stored in S3, GCS, HDFS, or the local filesystem, with catalog support for REST, Hive, AWS Glue, and Nessie. Lets Python agents work with Iceberg data lakes (schema evolution, time travel, partition management) without Spark or Java.

Evaluated Mar 06, 2026 (v0.7+)
Category: Other
Tags: python, apache-iceberg, data-lake, table-format, parquet, s3, catalog, olap, lakehouse
⚙ Agent Friendliness: 59/100 (Can an agent use this?)
🔒 Security: 83/100 (Is it safe for agents?)
⚡ Reliability: 72/100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 78
Error Messages: 75
Auth Simplicity: 75
Rate Limits: 88

🔒 Security

TLS Enforcement: 90
Auth Strength: 82
Scope Granularity: 80
Dependency Hygiene: 82
Secret Handling: 80

Security depends on catalog and object store configuration. Cloud credentials follow standard practices (IAM roles, service accounts). Encryption at rest, where enabled, is handled by the object store rather than by PyIceberg.

⚡ Reliability

Uptime/SLA: 78
Version Stability: 68
Breaking Changes: 65
Error Recovery: 75

Best When

You're building Python data engineering tools that need to interact with Apache Iceberg tables — catalog management, schema evolution, time travel — without a Spark dependency.

Avoid When

You need high-throughput production ETL (use Spark or Flink with Java Iceberg), simple ad-hoc analytics (use DuckDB), or you haven't set up Iceberg infrastructure yet.

Use Cases

  • Read and write Apache Iceberg tables from Python without requiring a Spark cluster for data lake workloads
  • Execute schema evolution on Iceberg tables (add or rename columns, or promote types such as int to long) from Python agent workflows
  • Use Iceberg's time travel capabilities to query historical table snapshots for data validation and debugging
  • Register and discover Iceberg tables via REST, Glue, or Hive catalogs for data mesh and lakehouse architectures
  • Build Python ETL pipelines that write to Iceberg tables with ACID semantics for consistent lakehouse data

Not For

  • Production heavy-write workloads: PyIceberg runs in a single Python process and is slower than distributed Spark or Flink jobs for high-throughput writes; use Java Iceberg for production ETL
  • Teams without Iceberg infrastructure: significant setup is required (catalog, object store); Delta Lake or Hudi may have simpler entry points
  • Ad-hoc SQL analytics: use DuckDB or ClickHouse for interactive SQL; PyIceberg manages the table format and can scan tables into Arrow or pandas, but it is not a SQL query engine

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: api_key, oauth
OAuth: Yes
Scopes: Yes

Auth depends on catalog type: the REST catalog uses bearer tokens (or OAuth2 client credentials), AWS Glue uses IAM credentials via boto3, and Hive uses Kerberos. Object store access uses standard cloud credentials.
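Because each catalog type expects different properties, a sketch like the following keeps them in one place and reads secrets from the environment instead of hard-coding them. It assumes PyIceberg's documented property names (`type`, `uri`, `token`, `warehouse`); the environment variable names and default URIs are hypothetical placeholders. The dictionaries are meant to be passed to `pyiceberg.catalog.load_catalog(name, **props)`, which this sketch deliberately does not call so that it runs without network access.

```python
import os


def catalog_properties(kind: str) -> dict[str, str]:
    """Return illustrative load_catalog() properties for one catalog type."""
    if kind == "rest":
        props = {
            "type": "rest",
            # Hypothetical env var and default URI.
            "uri": os.environ.get("ICEBERG_REST_URI", "https://catalog.example.com"),
        }
        token = os.environ.get("ICEBERG_REST_TOKEN")  # hypothetical env var
        if token:
            props["token"] = token  # sent to the REST catalog as a bearer token
        return props
    if kind == "glue":
        # No secrets in config: credentials and region come from boto3's
        # default chain (IAM role, environment variables, or profile).
        return {"type": "glue"}
    if kind == "hive":
        return {
            "type": "hive",
            "uri": os.environ.get("HIVE_METASTORE_URI", "thrift://localhost:9083"),
        }
    if kind == "sql":
        # Local SQLite catalog, handy for tests and notebooks.
        return {
            "type": "sql",
            "uri": "sqlite:////tmp/warehouse/catalog.db",
            "warehouse": "file:///tmp/warehouse",
        }
    raise ValueError(f"unknown catalog kind: {kind!r}")
```

For example, `load_catalog("prod", **catalog_properties("rest"))` would build the REST catalog; the same properties can alternatively live in a `~/.pyiceberg.yaml` file.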

Pricing

Model: open_source
Free tier: Yes
Requires credit card: No

Apache 2.0 license. Apache Software Foundation project.

Agent Metadata

Pagination: cursor
Idempotent: Partial
Retry Guidance: Not documented
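Since retry guidance is not documented, one reasonable pattern is jittered exponential backoff around commits that fail with a conflict. This is a generic sketch, not a PyIceberg API: `CommitConflict` is a stand-in for `pyiceberg.exceptions.CommitFailedException`, and the attempt count and delays are arbitrary. Retry only on a definite conflict; retrying an append after an ambiguous failure such as a timeout can commit the same rows twice, which is one reason idempotency is only partial.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CommitConflict(Exception):
    """Stand-in for pyiceberg.exceptions.CommitFailedException."""


def with_commit_retries(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on commit conflicts with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except CommitConflict:
            if attempt == attempts:
                raise  # out of attempts: surface the conflict to the caller
            # Delay doubles each time (0.1s, 0.2s, 0.4s, ...) with +/-50% jitter.
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
    raise RuntimeError("unreachable")
```

In real use the wrapped operation should call `table.refresh()` before re-applying its change, so each attempt starts from the latest table metadata.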

Known Gotchas

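  • Almost everything goes through a catalog: writes and table management need a configured REST, SQL, Hive, Glue, or in-memory catalog before any table operation; the only catalog-free option is read-only access to a known metadata file via StaticTable
  • Writing to an Iceberg table first writes Parquet files to the object store and then commits new metadata through the catalog; the catalog commit is the atomic step, so a failed or conflicting commit leaves the table at its previous snapshot, but the Parquet files already written become orphans that need cleanup
  • Partition evolution (changing a table's partition spec) only affects data written afterwards; existing files keep their old layout and are not re-partitioned unless they are rewritten
  • PyIceberg writes run in one Python process, with Parquet encoding handled by PyArrow, so write throughput does not scale out the way Spark or Flink jobs do; for high-throughput writes use Spark with Java Iceberg, or batch rows into large PyArrow tables per append instead of committing many small ones
  • Catalog configuration differs by type (REST, Hive, Glue): each expects its own properties (URI scheme, credentials, region), so a config written for one catalog type does not carry over to another
  • Time travel queries require a snapshot ID or timestamp; snapshot history lives in the table metadata that the catalog points to, and snapshots removed by table maintenance (snapshot expiration) can no longer be queried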
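For time travel by timestamp, the timestamp has to be resolved to a snapshot ID first. A small sketch, assuming only that snapshot objects expose `snapshot_id` and `timestamp_ms` (as PyIceberg's `Snapshot` does); `SnapshotInfo` and `snapshot_as_of` are hypothetical helpers. It picks the newest snapshot committed at or before the timestamp, which can differ from Iceberg's snapshot-log semantics after a rollback.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class SnapshotInfo:
    """Hypothetical stand-in with the same two fields as PyIceberg's Snapshot."""
    snapshot_id: int
    timestamp_ms: int


def snapshot_as_of(snapshots: Iterable[Any], timestamp_ms: int) -> Optional[int]:
    """Return the ID of the newest snapshot committed at or before timestamp_ms.

    Accepts any objects with snapshot_id and timestamp_ms attributes.
    """
    best = None
    for snap in snapshots:
        if snap.timestamp_ms <= timestamp_ms and (
            best is None or snap.timestamp_ms > best.timestamp_ms
        ):
            best = snap
    # None: no retained snapshot is that old (table is newer, or snapshots expired).
    return None if best is None else best.snapshot_id
```

With a real table, `snapshot_as_of(table.snapshots(), ts_ms)` yields an ID to pass to `table.scan(snapshot_id=...)`; check for `None` first, since scanning with no snapshot ID reads the current state instead.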
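To make the commit behavior in these gotchas concrete, here is a toy model (not PyIceberg's code) of Iceberg's optimistic commit: data and metadata files are written first, and the catalog's compare-and-swap of the current-metadata pointer is the only atomic step. `ToyTable` and its fields are hypothetical, and real commits also write manifests and manifest lists.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer moved the pointer since our base was read."""


@dataclass
class ToyTable:
    # Name of the metadata file the catalog currently points to.
    pointer: str = "v0.metadata.json"
    # Metadata file name -> data files that snapshot references.
    metadata: dict[str, frozenset[str]] = field(
        default_factory=lambda: {"v0.metadata.json": frozenset()}
    )
    # Every data file ever written to the toy object store.
    data_files: set[str] = field(default_factory=set)

    def commit(self, base: str, new_metadata: str, new_file: str) -> None:
        # 1. Data file is written to storage (not yet visible to readers).
        self.data_files.add(new_file)
        # 2. New metadata file is written, building on the base the writer read.
        self.metadata[new_metadata] = self.metadata[base] | {new_file}
        # 3. Atomic compare-and-swap: the only step that changes table state.
        if self.pointer != base:
            raise CommitConflict(f"base {base!r} is stale; current is {self.pointer!r}")
        self.pointer = new_metadata

    def orphaned_files(self) -> set[str]:
        # Files in storage that the current metadata does not reference.
        return self.data_files - self.metadata[self.pointer]


table = ToyTable()
base = table.pointer  # two writers both read v0
table.commit(base, "v1-a.metadata.json", "a.parquet")  # writer A wins
try:
    table.commit(base, "v1-b.metadata.json", "b.parquet")  # writer B loses
except CommitConflict as exc:
    print("conflict:", exc)

print(table.pointer, table.orphaned_files())
```

The losing writer leaves the table consistent at writer A's snapshot, but `b.parquet` stays in storage unreferenced until a maintenance job (for example Spark's `remove_orphan_files` procedure) deletes it.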



Scores are editorial opinions as of 2026-03-06.
