PyMuPDF (fitz)

Fast Python library for PDF, XPS, and EPUB processing, built on MuPDF. It extracts text, images, annotations, and metadata, with a claimed 10-20x speed advantage over alternatives.

Evaluated Mar 06, 2026, v1.24.x

Developer Tools pdf python mupdf text-extraction image-extraction agpl

⚙ Agent Friendliness

/ 100

Can an agent use this?

🔒 Security

/ 100

Is it safe for agents?

⚡ Reliability

/ 100

Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

100

Rate Limits

100

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

Process only trusted PDFs — malicious PDFs can exploit parser vulnerabilities; keep library updated

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

High-throughput PDF text/image extraction where processing speed matters and the AGPL license is acceptable.

Avoid When

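You need a permissive license for closed-source software: the AGPL requires releasing your application's source under AGPL-compatible terms when you distribute it or offer it over a network, unless you purchase a commercial license from Artifex.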

Use Cases

• Extract text with page coordinates from 1000-page PDF reports, sorting by position to approximate reading order
• Convert PDF pages to high-resolution PNG images for vision model processing
• Extract all embedded images from PDF documents for downstream image analysis
• Search for specific text patterns across large PDF collections using a page-by-page scan
• Annotate PDFs programmatically — add highlights, redactions, watermarks, and bookmarks
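
The extraction, search, and rendering use cases above can be sketched roughly as follows. This is a minimal illustration, not the library's recommended workflow: the function names (`iter_page_text`, `find_pattern`, `render_page_png`) are hypothetical helpers introduced here, and PyMuPDF is imported inside the functions that need it so the pure-Python search helper can be used on its own.

```python
import re
from typing import Iterable, Iterator


def iter_page_text(pdf_path: str) -> Iterator[tuple[int, str]]:
    """Yield (1-based page number, plain text) for each page of a PDF."""
    import fitz  # PyMuPDF; imported lazily so find_pattern works without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            yield page.number + 1, page.get_text("text")


def find_pattern(
    pages: Iterable[tuple[int, str]], pattern: str
) -> list[tuple[int, str]]:
    """Return (page number, matched text) for every regex match, page by page."""
    regex = re.compile(pattern)
    hits = []
    for page_no, text in pages:
        for match in regex.finditer(text):
            hits.append((page_no, match.group(0)))
    return hits


def render_page_png(
    pdf_path: str, page_index: int, out_path: str, dpi: int = 200
) -> None:
    """Rasterize one page (0-based index) to a PNG file at the given DPI."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi)
        pix.save(out_path)
```

A call such as `find_pattern(iter_page_text("report.pdf"), r"INV-\d{3}")` would scan a file one page at a time; `report.pdf` and the invoice pattern are placeholders. Keeping the search logic separate from the PDF I/O also makes it easy to test against plain strings.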

Not For

• Extracting tables with complex borders — use Camelot or Tabula for tabular data
• Closed-source applications without a commercial license from Artifex (the AGPL permits commercial use but requires source disclosure)
• Web browser environments — requires native MuPDF C library installation

Interface

REST API

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: none

OAuth: No Scopes: No

Local Python library, no network auth

Pricing

Model: open_source

Free tier: Yes

Requires CC: No

AGPL v3; closed-source use requires a separate commercial license from Artifex

Agent Metadata

Pagination

none

Idempotent

Full

Retry Guidance

Not documented

Known Gotchas

⚠ AGPL license: embedding in proprietary software without a commercial license violates the license terms; verify before deployment
⚠ Scanned PDFs return empty text — must combine with Tesseract/EasyOCR for image-based documents
⚠ fitz.open() does not always raise on damaged files (MuPDF may silently repair them); check doc.is_repaired, doc.is_pdf, and the page count before processing
⚠ Text extraction order follows content streams, not visual reading order; page.get_text("text", sort=True) sorts output by position, and complex multi-column layouts may still need block-level handling
⚠ pip install pymupdf downloads a pre-built wheel (roughly tens of MB); on platforms without a matching wheel, pip falls back to a source build that requires a C/C++ build toolchain
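
Several of the gotchas above can be guarded against with a thin wrapper. The following is a sketch under stated assumptions rather than a definitive pattern: `open_checked`, `needs_ocr`, and `extract_sorted_text` are hypothetical helper names, the 20-character threshold is an arbitrary heuristic, and the attributes used (`is_pdf`, `is_repaired`, `page_count`, and the `sort` argument of `get_text`) are assumed to be available in the installed PyMuPDF version.

```python
def needs_ocr(page_text: str, min_chars: int = 20) -> bool:
    """Heuristic: treat a page with almost no extractable text as scanned."""
    return len(page_text.strip()) < min_chars


def open_checked(pdf_path: str):
    """Open a file with PyMuPDF and reject anything that is not a usable PDF."""
    import fitz  # PyMuPDF; imported lazily so needs_ocr works without it

    doc = fitz.open(pdf_path)  # can still raise for files it cannot parse at all
    if not doc.is_pdf or doc.page_count == 0:
        doc.close()
        raise ValueError(f"{pdf_path!r} is not a usable PDF")
    if doc.is_repaired:
        # MuPDF repaired the file while opening it; content may be incomplete.
        print(f"warning: {pdf_path!r} was repaired on open")
    return doc


def extract_sorted_text(doc) -> list[tuple[int, str, bool]]:
    """Return (1-based page number, position-sorted text, OCR-needed flag)."""
    results = []
    for page in doc:
        text = page.get_text("text", sort=True)  # sort by position on the page
        results.append((page.number + 1, text, needs_ocr(text)))
    return results
```

Pages flagged by `needs_ocr` could then be rendered to images and passed to an OCR engine such as Tesseract or EasyOCR; that step is left out here.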

Alternatives

pdfplumber, pypdf, pdfminer


Scores are editorial opinions as of 2026-03-06.