pypdf

Pure-Python library for reading, splitting, merging, encrypting, decrypting, and extracting content from PDF files without any native binary dependencies.

Evaluated Mar 06, 2026 (v4.x)
Category: Developer Tools
Tags: python, pdf, merge, split, rotate, encrypt, decrypt, form-fields, pure-python
⚙ Agent Friendliness: 66 / 100 (Can an agent use this?)
🔒 Security: 30 / 100 (Is it safe for agents?)
⚡ Reliability: 57 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 85
Error Messages: 76
Auth Simplicity: 100
Rate Limits: 100

🔒 Security

TLS Enforcement: 0
Auth Strength: 0
Scope Granularity: 0
Dep. Hygiene: 83
Secret Handling: 88

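Pure Python with no native code reduces binary attack surface. pypdf can read and write both RC4- and AES-encrypted PDFs; AES support requires an optional cryptography backend (the cryptography package or PyCryptodome). RC4 is a legacy cipher, so confirm that an encrypted PDF uses AES-256 before relying on its encryption to protect content.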

⚡ Reliability

Uptime/SLA: 0
Version Stability: 80
Breaking Changes: 72
Error Recovery: 76

Best When

You need to manipulate PDF structure (merge, split, rotate, encrypt/decrypt, extract pages) using a zero-dependency pure-Python solution.

Avoid When

You need accurate text extraction with positional awareness, or table parsing; pypdf's text extraction is best-effort and often produces garbled output on complex layouts.

Use Cases

  • Merge multiple PDF files into a single document for report assembly workflows
  • Split a large PDF into individual pages or chapter ranges for per-page processing pipelines
  • Decrypt password-protected PDFs before passing them to extraction or analysis tools
  • Read and populate PDF form field values (AcroForm) for automated document completion workflows
  • Rotate, crop, or reorder pages in a PDF as part of a document normalization step before OCR or ingestion

Not For

  • High-quality text extraction with layout preservation — pdfplumber or PyMuPDF produce far better results for text extraction
  • Table extraction from PDFs — use pdfplumber or camelot for structured table data
  • Rendering PDF pages to images — pypdf cannot render; use pdf2image or PyMuPDF for rasterization

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

Library — no authentication required.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

BSD 3-Clause licensed. Successor to the deprecated PyPDF2 package.

Agent Metadata

Pagination: none
Idempotent: Full
Retry Guidance: Not documented

Known Gotchas

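  • pypdf is the renamed successor to PyPDF2, and the API changed significantly: the old PdfFileReader and PdfFileWriter classes are now PdfReader and PdfWriter. Code written against PyPDF2 imports will not run against pypdf unchanged; always use 'from pypdf import PdfReader, PdfWriter'.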
  • extract_text() produces unreliable output for PDFs with complex layouts, ligatures, or custom font encodings; do not rely on it for data extraction without manual validation.
  • Encrypted PDFs must be decrypted with reader.decrypt(password) before any page access; accessing pages on a still-encrypted PDF raises FileNotDecryptedError, and no partial content is returned.
  • PdfWriter does not copy form field data automatically when merging pages from a PdfReader; AcroForm dictionaries must be copied separately to preserve interactive forms.
  • Large PDFs are loaded entirely into memory; for files exceeding a few hundred MB, memory pressure can cause OOM errors in constrained agent environments — stream page-by-page when possible.

Scores are editorial opinions as of 2026-03-06.