ftfy
Fix text encoding issues in Python — automatically repairs mojibake (garbled text produced by decoding bytes with the wrong encoding), normalizes Unicode, and fixes common text problems. ftfy features: fix_text() for general repair, fix_encoding() for encoding-only fixes, fix_and_explain() for diagnostic output, explain_unicode() for per-character analysis, plus lower-level helpers in ftfy.fixes — remove_control_chars() for control-character removal, uncurl_quotes() for smart/curly-quote normalization, fix_surrogates() for surrogate-pair repair, fix_latin_ligatures() for ligature expansion — and Unicode normalization (NFC/NFKC). Handles UTF-8 decoded as Windows-1252, Latin-1 mojibake, and other real-world encoding disasters.
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Text-processing library with no network calls. No security concerns when processing untrusted text — fix_text() is purely transformative. Output may contain different characters than the input (that is the point), so if the fixed text feeds security-sensitive checks, validate it after fixing, not before.
⚡ Reliability
Best When
Cleaning real-world text data with encoding problems — ftfy automatically detects and repairs the most common Unicode encoding issues in scraped text, database dumps, and document processing.
Avoid When
Text is already clean UTF-8 (ftfy is safe on clean text but adds overhead), language-level normalization (use spacy), or non-text binary data.
Use Cases
- • Agent text cleaning pipeline — import ftfy; clean = ftfy.fix_text(scraped_text) — fixes mojibake and normalizes Unicode; an agent web-scraping pipeline cleans text before NLP processing; fix_text() handles: 'CafÃ©' → 'Café', ‘smart’ quotes → 'smart' quotes, control characters
- • Agent encoding diagnosis — text, explanation = ftfy.fix_and_explain(garbled_text); print(explanation) — understand what was fixed; an agent debugging encoding issues sees exactly which transformations were applied; the explanation is a list of steps, e.g. [('encode', 'sloppy-windows-1252'), ('decode', 'utf-8')]
- • Agent database text repair — cursor.execute('SELECT content FROM posts'); for row in cursor: fixed = ftfy.fix_text(row['content']); cursor2.execute('UPDATE posts SET content=%s WHERE id=%s', (fixed, row['id'])) — batch repair; agent fixes historical data with encoding issues; fix_text() is safe to call on already-correct text
- • Agent quote normalization — from ftfy import fix_text; normalized = fix_text(user_input, unescape_html=True, uncurl_quotes=True) — normalize smart quotes (the first option was named fix_entities in ftfy 5.x); an agent NLP pipeline converts curly quotes to straight quotes; Word/Office copy-paste introduces curly quotes that break tokenizers
- • Agent character analysis — from ftfy import explain_unicode; explain_unicode('Cafẽ') — prints details of each character; agent debugging unexpected Unicode; shows codepoint, name, category, script for each character; useful for understanding what went wrong with encoding
Not For
- • Language detection — ftfy fixes encoding, not language; for language detection use langdetect
- • Translation — ftfy normalizes encoding, not language; for translation use deep-translator or googletrans
- • Heavy NLP preprocessing — ftfy handles encoding fixes; for full text normalization (stemming, stopwords) use spacy or nltk
Interface
Authentication
No auth — local text processing library.
Pricing
ftfy is MIT licensed. Free for all use.
Agent Metadata
Known Gotchas
- ⚠ fix_text() requires str, not bytes — ftfy.fix_text(b'caf\xc3\xa9') raises an error telling you to decode first; agent code reading bytes from a file must decode before fixing: text = content.decode('utf-8', errors='replace'); if the encoding is truly unknown, content.decode('latin-1') never fails, and ftfy can often repair the resulting mojibake
- ⚠ fix_text() may change text unexpectedly — fix_text() applies heuristics; rare cases where intended text is misidentified as mojibake; agent code processing technical content with unusual Unicode may see unwanted changes; use fix_text(text, fix_encoding=False) to disable encoding fixes while keeping normalization
- ⚠ Not an encoding detector for bytes — ftfy operates on already-decoded str values and uses heuristics to spot text that was decoded with the wrong codec; it cannot inspect raw bytes; agent code uncertain about the source encoding should run chardet or charset-normalizer on the bytes, decode properly, then apply ftfy
- ⚠ explain_unicode() is diagnostic only — ftfy.explain_unicode('...') prints to stdout; no return value; agent code wanting explanation for automated processing should use: text, explanation = ftfy.fix_and_explain(text) — explanation is list of operation tuples
- ⚠ Line-ending behavior differs by version — in ftfy 6.x, fix_text() normalizes \r\n, \r, and Unicode line/paragraph separators to \n by default (fix_line_breaks=True; pass fix_line_breaks=False to preserve originals); on ftfy 5.x an agent NLP pipeline must normalize separately: text = text.replace('\r\n', '\n').replace('\r', '\n')
- ⚠ ftfy 6.x changed the API — 6.x removed fix_text_segment() (use fix_text(), which behaves the same) and renamed the fix_entities option to unescape_html; agent code upgrading from ftfy 5.x must update call sites; check version: import ftfy; ftfy.__version__
Alternatives
Full Evaluation Report
Comprehensive deep-dive: security analysis, reliability audit, agent experience review, cost modeling, competitive positioning, and improvement roadmap for ftfy.
AI-powered analysis · PDF + markdown · Delivered within 30 minutes
Package Brief
Quick verdict, integration guide, cost projections, gotchas with workarounds, and alternatives comparison.
Delivered within 10 minutes
Score Monitoring
Get alerted when this package's AF, security, or reliability scores change significantly. Stay ahead of regressions.
Continuous monitoring
Scores are editorial opinions as of 2026-03-06.