PyIceberg
Official Python implementation of Apache Iceberg, the open table format for huge analytic datasets. Provides a Python API for reading and writing Iceberg tables stored in S3, GCS, HDFS, or a local filesystem, with catalog support (REST, Hive, AWS Glue, Nessie). Enables Python agents to interact with Iceberg data lakes — schema evolution, time travel, partition management — without Spark or Java.
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Security depends on catalog and object store configuration. Cloud credentials follow standard practices (IAM, service accounts). Data encrypted at rest by object store.
⚡ Reliability
Best When
You're building Python data engineering tools that need to interact with Apache Iceberg tables — catalog management, schema evolution, time travel — without a Spark dependency.
Avoid When
You need high-throughput production ETL (use Spark/Flink with Java Iceberg), simple ad-hoc analytics (use DuckDB), or haven't set up Iceberg infrastructure.
Use Cases
- Read and write Apache Iceberg tables from Python without requiring a Spark cluster for data lake workloads
- Execute schema evolution on Iceberg tables (add columns, rename, change types) from Python agent workflows
- Use Iceberg's time travel capabilities to query historical table snapshots for data validation and debugging
- Register and discover Iceberg tables via REST, Glue, or Hive catalogs for data mesh and lakehouse architectures
- Build Python ETL pipelines that write to Iceberg tables with ACID semantics for consistent lakehouse data
Not For
- Production heavy-write workloads — PyIceberg runs in-process Python and is slower than Spark/Flink for high-throughput writes; use Java Iceberg engines for production ETL
- Teams without Iceberg infrastructure — significant setup is required (catalog, object store); Delta Lake or Hudi may offer simpler entry points
- Ad-hoc SQL analytics — use DuckDB or ClickHouse for interactive SQL; PyIceberg is a table-format management library, not a query engine
Interface
Authentication
Auth depends on catalog type: REST catalog uses bearer tokens, AWS Glue uses IAM/boto3, Hive uses Kerberos. Object store auth via standard cloud credentials.
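A hedged sketch of per-catalog connection configuration. Every URI, token, and region below is a placeholder; the same properties can also live in a `.pyiceberg.yaml` file instead of being passed to `load_catalog` directly.

```python
from pyiceberg.catalog import load_catalog

# All endpoints and credentials below are placeholders.

# REST catalog: bearer-token auth against an HTTP endpoint.
rest_catalog = load_catalog(
    "rest_cat",
    **{"type": "rest", "uri": "https://catalog.example.com", "token": "…"},
)

# AWS Glue: credentials resolved by boto3's standard chain
# (environment variables, shared profile, or an attached IAM role).
glue_catalog = load_catalog("glue_cat", **{"type": "glue"})

# Hive Metastore: Thrift endpoint; Kerberos is negotiated at the Thrift/SASL layer.
hive_catalog = load_catalog(
    "hive_cat",
    **{"type": "hive", "uri": "thrift://metastore.example.com:9083"},
)
```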
Pricing
Apache 2.0 license. Apache Software Foundation project.
Agent Metadata
Known Gotchas
- ⚠ PyIceberg requires a catalog — there is no standalone mode; must configure REST, Hive, Glue, or in-memory catalog before any table operations
- ⚠ Writing to Iceberg tables creates Parquet files in the object store AND commits new metadata through the catalog — the catalog commit is atomic, but a failed commit can leave orphaned data files that require cleanup
- ⚠ Partition spec changes require new table rewrites or explicit partition evolution — adding new partition specs doesn't automatically re-partition existing data
- ⚠ PyIceberg's write performance is limited by Python's GIL — for high-throughput writes, use Spark with Java Iceberg or batch writes with PyArrow
- ⚠ Catalog connection strings vary by type (REST, Hive, Glue) — configuration syntax is not standardized across catalog types
- ⚠ Time travel queries require specifying a snapshot ID or timestamp — snapshot history lives in table metadata, and snapshots removed by expire-snapshots maintenance can no longer be queried
Alternatives
Full Evaluation Report
Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for PyIceberg.
Scores are editorial opinions as of 2026-03-06.