pypdf

Pure-Python library for reading, splitting, merging, encrypting, decrypting, and extracting content from PDF files without any native binary dependencies.

Evaluated Mar 06, 2026, v4.x

Category: Developer Tools. Tags: python, pdf, merge, split, rotate, encrypt, decrypt, form-fields, pure-python

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

100

Rate Limits

100

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

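Pure Python with no native code reduces binary attack surface. pypdf supports both RC4 and AES encryption; before trusting the confidentiality of an encrypted PDF, check that it uses AES-256 rather than legacy RC4. Note that AES support relies on an optional cryptography backend (installable as pypdf[crypto]), so AES-encrypted files are not handled by the bare pure-Python install.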

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

You need to manipulate PDF structure (merge, split, rotate, encrypt/decrypt, extract pages) using a zero-dependency pure-Python solution.

Avoid When

Accurate text extraction with positional awareness or table parsing is required; pypdf's text extraction is best-effort and often produces garbled output on complex layouts.

Use Cases

• Merge multiple PDF files into a single document for report assembly workflows
• Split a large PDF into individual pages or chapter ranges for per-page processing pipelines
• Decrypt password-protected PDFs before passing them to extraction or analysis tools
• Read and populate PDF form field values (AcroForm) for automated document completion workflows
• Rotate, crop, or reorder pages in a PDF as part of a document normalization step before OCR or ingestion

Not For

• High-quality text extraction with layout preservation — pdfplumber or PyMuPDF produce far better results for text extraction
• Table extraction from PDFs — use pdfplumber or camelot for structured table data
• Rendering PDF pages to images — pypdf cannot render; use pdf2image or PyMuPDF for rasterization

Interface

REST API

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: none

OAuth: No Scopes: No

Library — no authentication required.

Pricing

Model: open_source

Free tier: Yes

Requires CC: No

BSD 3-Clause licensed. Successor to the deprecated PyPDF2 package.

Agent Metadata

Pagination

none

Idempotent

Full

Retry Guidance

Not documented

Known Gotchas

⚠ The project was renamed from PyPDF2 to pypdf and its API changed significantly along the way; code written against PyPDF2 imports and older method names will not run against pypdf. Always use 'from pypdf import PdfReader, PdfWriter'.
⚠ extract_text() produces unreliable output for PDFs with complex layouts, ligatures, or custom font encodings; do not rely on it for data extraction without manual validation.
⚠ Encrypted PDFs must be decrypted with reader.decrypt(password) before any page access; attempting to access pages on an encrypted PDF raises a FileNotDecryptedError with no partial content.
⚠ PdfWriter does not copy form field data automatically when merging pages from a PdfReader; AcroForm dictionaries must be copied separately to preserve interactive forms.
⚠ Large PDFs are loaded entirely into memory; for files exceeding a few hundred MB, memory pressure can cause OOM errors in constrained agent environments — stream page-by-page when possible.

Alternatives

pdfplumber-api docling-api camelot-api pymupdf reportlab

Scores are editorial opinions as of 2026-03-06.