ftfy

Fix text encoding issues in Python: ftfy automatically repairs mojibake (text garbled by decoding bytes with the wrong encoding), normalizes Unicode, and fixes other common text-encoding problems. Top-level functions: fix_text() for general text repair, fix_encoding() for encoding-only fixes, fix_and_explain() for diagnostic output, and explain_unicode() for character analysis. Individual fixers in ftfy.fixes: remove_control_chars() for control character removal, uncurl_quotes() for smart/curly quote normalization, fix_surrogates() for surrogate pair repair, and fix_latin_ligatures() for ligature expansion. Unicode normalization (NFC by default, NFKC optional) is applied as part of fix_text(). Handles UTF-8 misread as Windows-1252 or Latin-1 and other real-world encoding disasters.
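To make the term "mojibake" concrete, the following standard-library sketch produces it by decoding UTF-8 bytes as Windows-1252 and then reverses the mistake by hand. It does not call ftfy; it only illustrates the kind of round trip that ftfy's heuristics are meant to detect and undo automatically. The variable names are illustrative.

```python
# Illustration only: how mojibake arises and how it is reversed by hand.
original = "Café"

# Correct bytes, wrong decoder: UTF-8 bytes read as Windows-1252.
garbled = original.encode("utf-8").decode("windows-1252")
print(garbled)   # CafÃ©

# Reverse the mistake: re-encode with the wrong codec, decode with the right one.
repaired = garbled.encode("windows-1252").decode("utf-8")
print(repaired)  # Café
```

The manual reversal only works when you already know which two encodings were confused; the point of ftfy is to work that out from the text itself.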

Evaluated Mar 06, 2026 (v6.x)
Developer Tools python ftfy unicode encoding text-fix mojibake nlp cleaning
⚙ Agent Friendliness: 67 / 100 (Can an agent use this?)
🔒 Security: 92 / 100 (Is it safe for agents?)
⚡ Reliability: 85 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 85
Error Messages: 82
Auth Simplicity: 99
Rate Limits: 99

🔒 Security

TLS Enforcement: 92
Auth Strength: 92
Scope Granularity: 90
Dep. Hygiene: 92
Secret Handling: 92

Text processing library with no network calls. fix_text() is a pure string-to-string transformation, so processing untrusted text raises no security concerns of its own. The output may contain different characters than the input (this is intended), so in security contexts run validation on the fixed output, not on the raw input.
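As a sketch of why validation belongs after fixing rather than before: Unicode compatibility normalization can turn look-alike characters into plain ASCII. The example below uses the standard library's unicodedata.normalize with NFKC as a stand-in for a text-fixing step (an assumption for illustration, not ftfy's actual pipeline), and is_reserved is a hypothetical check.

```python
import unicodedata

RESERVED = {"admin", "root"}  # hypothetical reserved usernames


def is_reserved(name: str) -> bool:
    """Hypothetical security check: is this username reserved?"""
    return name.lower() in RESERVED


raw = "\uff41\uff44\uff4d\uff49\uff4e"  # "admin" written with fullwidth letters
cleaned = unicodedata.normalize("NFKC", raw)  # stand-in for the fixing step

print(is_reserved(raw))      # False: the check misses the raw input
print(is_reserved(cleaned))  # True: the cleaned text is the reserved name
```

A check that runs only on the raw input would let this name through, and the cleaning step would then turn it into the reserved value.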

⚡ Reliability

Uptime/SLA: 85
Version Stability: 85
Breaking Changes: 82
Error Recovery: 88

Best When

Cleaning real-world text data with encoding problems — ftfy automatically detects and repairs the most common Unicode encoding issues in scraped text, database dumps, and document processing.

Avoid When

Text is already known to be clean UTF-8 (ftfy is safe on clean text but adds overhead), you need language-level normalization (use spacy), or the input is non-text binary data.

Use Cases

  • Agent text cleaning pipeline: import ftfy; clean = ftfy.fix_text(scraped_text) fixes mojibake and normalizes Unicode, so an agent's web scraping pipeline can clean text before NLP processing. fix_text() handles 'CafÃ©' → 'Café', ‘smart’ quotes → 'smart' quotes, and control characters.
  • Agent encoding diagnosis: text, explanation = ftfy.fix_and_explain(garbled_text); print(explanation) shows what was fixed, so an agent debugging encoding issues sees exactly which transformations were applied. The explanation is a list of (action, parameter) steps, for example an 'encode' step with a Windows-1252 codec followed by a 'decode' step with 'utf-8'.
  • Agent database text repair: cursor.execute('SELECT id, content FROM posts'); for row in cursor.fetchall(): fixed = ftfy.fix_text(row['content']); cursor.execute('UPDATE posts SET content=%s WHERE id=%s', (fixed, row['id'])) performs a batch repair (the row['...'] access assumes a dict-style cursor). An agent can fix historical data with encoding issues this way; fix_text() is safe to call on already-correct text.
  • Agent quote normalization: from ftfy import fix_text; normalized = fix_text(user_input, unescape_html=True, uncurl_quotes=True) converts curly quotes to straight quotes in an agent's NLP pipeline (ftfy 6.x renamed the 5.x option fix_entities to unescape_html). Word/Office copy-paste introduces curly quotes that break some tokenizers.
  • Agent character analysis: from ftfy import explain_unicode; explain_unicode('Cafẽ') prints one line per character with its codepoint, the character itself, its Unicode category, and its name. Useful when an agent is debugging unexpected Unicode and needs to understand what went wrong with encoding.
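The database repair use case can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes SQLite via the standard-library sqlite3 module (hence ? placeholders), a posts table with id and content columns, and a hypothetical helper name repair_posts. The fixer is passed in as a parameter and defaults to ftfy.fix_text, which keeps the function easy to test with a stub.

```python
import sqlite3
from typing import Callable, Optional


def repair_posts(conn: sqlite3.Connection,
                 fix: Optional[Callable[[str], str]] = None) -> int:
    """Run posts.content through `fix`; return the number of rows changed."""
    if fix is None:
        import ftfy  # third-party; imported lazily so a stub can be injected
        fix = ftfy.fix_text

    # Read everything first so the table is not updated while iterating over it.
    rows = conn.execute("SELECT id, content FROM posts").fetchall()

    changed = []
    for row_id, content in rows:
        if content is None:
            continue  # nothing to repair in NULL columns
        fixed = fix(content)
        if fixed != content:
            changed.append((fixed, row_id))

    with conn:  # commits on success, rolls back if an update fails
        conn.executemany("UPDATE posts SET content = ? WHERE id = ?", changed)
    return len(changed)
```

Writing back only the rows that actually changed keeps the update small, and because the fix is idempotent a second run should report zero changed rows.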

Not For

  • Language detection — ftfy fixes encoding, not language; for language detection use langdetect
  • Translation — ftfy normalizes encoding, not language; for translation use deep-translator or googletrans
  • Heavy NLP preprocessing — ftfy handles encoding fixes; for full text normalization (stemming, stopwords) use spacy or nltk

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

No auth — local text processing library.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

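ftfy is open source and free for all use. Older releases are MIT licensed; newer 6.x releases appear to be distributed under Apache 2.0, so check the LICENSE file of the version you install.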

Agent Metadata

Pagination: none
Idempotent: Full
Retry Guidance: Not documented

Known Gotchas

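  • fix_text() requires str, not bytes: ftfy.fix_text(b'caf\xc3\xa9') raises an exception instead of decoding implicitly (UnicodeError rather than TypeError in 6.x, as far as we know). Agent code reading bytes from a file must decode first, e.g. raw.decode('utf-8'), and then apply fix_text(). Avoid errors='replace' where possible, since replaced bytes cannot be recovered afterwards; if the encoding is unknown, fix_text(raw.decode('latin-1')) keeps every byte and lets ftfy repair the resulting mojibake.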
  • fix_text() may change text unexpectedly: it applies heuristics, and in rare cases intended text is misidentified as mojibake. Agent code processing technical content with unusual Unicode may see unwanted changes; use fix_text(text, fix_encoding=False) to disable encoding fixes while keeping the other fixes and normalization.
  • Not an encoding detector: ftfy works on str that has already been decoded and uses heuristics to spot and undo a wrong decoding; it does not detect the encoding of raw bytes. Agent code uncertain about the source encoding should detect it first (for example with chardet), decode properly, and only then apply ftfy.
  • explain_unicode() is diagnostic only: ftfy.explain_unicode('...') prints to stdout and returns None. Agent code that needs an explanation for automated processing should use text, explanation = ftfy.fix_and_explain(text); the explanation is a list of (action, parameter) tuples.
  • Line endings are rewritten by default: fix_text() has a fix_line_breaks option (on by default in 6.x) that converts \r\n, \r, and the Unicode line and paragraph separators to \n, so a separate text.replace('\r\n', '\n') step is not needed. Agent code that must preserve the original line endings should pass fix_line_breaks=False.
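  • ftfy 6.x changed the options API from 5.x: options are now collected in a TextFixerConfig object (and are still accepted as keyword arguments), fix_entities was renamed to unescape_html, and fix_and_explain() was added. To our knowledge fix_text_segment() is still present in 6.x, so code that calls it does not need to be rewritten for that reason alone. Check the installed version with import ftfy; ftfy.__version__ and consult the changelog before upgrading.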
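The bytes gotcha above can be handled with a small decoding step before the text reaches ftfy. The sketch below uses only the standard library; decode_for_ftfy is a hypothetical helper name, and the UTF-8-then-Latin-1 order is an assumption that suits inputs which are mostly UTF-8.

```python
def decode_for_ftfy(raw: bytes) -> str:
    """Turn bytes into str without discarding any byte values."""
    try:
        # Strict decoding: succeeds only if the bytes are valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to exactly one code point, so this
        # never fails and loses nothing; the result may be mojibake.
        return raw.decode("latin-1")


print(decode_for_ftfy(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_for_ftfy(b"caf\xe9"))      # not valid UTF-8, decoded as Latin-1
```

The returned str can then be passed to ftfy.fix_text(). Unlike errors='replace', the Latin-1 fallback keeps the original byte values recoverable, which gives the repair heuristics something to work with.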

Scores are editorial opinions as of 2026-03-06.
