tabula-py

Python wrapper for the Tabula Java library that extracts tables from PDF documents and returns them as pandas DataFrames.

Evaluated Mar 06, 2026 · v2.9.x
Category: Developer Tools
Tags: python, pdf, tables, pandas, java, data-extraction, tabular-data
⚙ Agent Friendliness: 65 / 100 (Can an agent use this?)
🔒 Security: 87 / 100 (Is it safe for agents?)
⚡ Reliability: 78 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 82
Error Messages: 72
Auth Simplicity: 100
Rate Limits: 100

🔒 Security

TLS Enforcement: 90
Auth Strength: 90
Scope Granularity: 85
Dep. Hygiene: 78
Secret Handling: 88

tabula-py spawns a Java subprocess. Ensure the tabula-java jar comes from the official distribution, and avoid processing untrusted PDFs, because Java PDF parsers have a history of CVEs.
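
One way to act on the jar-sourcing advice is to pin a SHA-256 digest and compare it before use. The sketch below is a minimal illustration, not part of tabula-py: the helper names sha256_of and jar_matches are hypothetical, and the expected digest is assumed to be one you recorded yourself from a release you trust.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def jar_matches(path, expected_sha256):
    """True if the file's digest equals the pinned value (case-insensitive)."""
    # The pinned value is an assumption: record it yourself from a trusted release.
    return sha256_of(path) == expected_sha256.strip().lower()
```

A pipeline could call jar_matches() once at startup and refuse to process PDFs if it returns False.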

⚡ Reliability

Uptime/SLA: 80
Version Stability: 80
Breaking Changes: 78
Error Recovery: 72

Best When

Your PDFs contain native (non-scanned) tables and you need pandas-ready output with minimal code.

Avoid When

Java is unavailable in your runtime environment, or the PDF tables are inside scanned images.

Use Cases

  • Extracting financial tables from PDF reports and converting them to pandas DataFrames
  • Batch processing regulatory filings or research papers to pull structured tabular data
  • Automating data ingestion pipelines that receive data only as PDF tables
  • Extracting multi-column tables from government or academic PDF publications
  • Converting PDF price lists or schedules into machine-readable CSV or JSON

Not For

  • Scanned PDFs where tables are images rather than native PDF content (use docTR + layout analysis instead)
  • Environments where Java 8+ cannot be installed or is prohibited
  • Extracting free-form prose or non-tabular text (use PyMuPDF or pdfminer instead)

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

Local Python library — no authentication required; requires Java 8+ available on PATH
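
The Java 8+ requirement can be checked before any extraction is attempted. The following is a minimal sketch using only the standard library. The helper names parse_java_major and installed_java_major are hypothetical, and the parser assumes the common banner shapes (version "1.8.0_292" for Java 8, version "17.0.2" for newer releases); other vendors may print something different.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy numbering: "1.8.0_292" means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None if absent."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, so both streams are inspected.
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, timeout=30
    )
    return parse_java_major(result.stderr + result.stdout)
```

An agent could treat a return value of None, or anything below 8, as a reason to stop and report a missing dependency.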

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

MIT license. tabula-py is the Python wrapper; the underlying Tabula Java engine is also open source (MIT). Both are free with no usage limits.

Agent Metadata

Pagination: none
Idempotent: Full
Retry Guidance: Not documented

Known Gotchas

  • Java 8+ must be installed and on PATH. A missing Java runtime is only detected when read_pdf() runs (recent releases raise tabula.errors.JavaNotFoundError), so agents should check for Java before calling
  • read_pdf() returns a list of DataFrames, one per detected table — agents must iterate, not assume a single result
  • Table detection heuristics can merge or split tables incorrectly; lattice vs stream mode must be chosen manually
  • Password-encrypted PDFs require the password parameter — no automatic detection or helpful error
  • Every call starts a JVM (roughly 1-2s of startup overhead), and very large PDFs keep that subprocess running for a long time; agent timeouts must account for both
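
Several of these gotchas can be handled in one small wrapper. This is a sketch under stated assumptions, not the library's recommended usage: read_tables and its reader parameter are hypothetical names (reader exists only so the function can be exercised without Java), and dropping empty DataFrames is an editorial choice. The keyword arguments pages, lattice, stream, and password are passed through to tabula.read_pdf.

```python
def read_tables(pdf_path, mode="lattice", password=None, pages="all", reader=None):
    """Return the non-empty tables found in a PDF as a list of DataFrames."""
    # The extraction mode is chosen explicitly rather than left to guessing.
    if mode not in ("lattice", "stream"):
        raise ValueError("mode must be 'lattice' or 'stream'")
    if reader is None:
        # Imported lazily so a missing tabula-py install fails only when needed.
        import tabula
        reader = tabula.read_pdf
    tables = reader(
        pdf_path,
        pages=pages,
        lattice=(mode == "lattice"),
        stream=(mode == "stream"),
        password=password,
    )
    # The result is a list, one entry per detected table; never assume one table.
    return [df for df in tables if not df.empty]
```

A caller would then iterate over the result, for example `for df in read_tables("report.pdf", mode="stream"): print(df.shape)`, and retry with the other mode if the tables look merged or split.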


Scores are editorial opinions as of 2026-03-06.
