Unidecode
Unicode-to-ASCII transliteration for Python: converts Unicode text to a close ASCII approximation. Unidecode provides unidecode() for general use (an alias for unidecode_expect_ascii()), unidecode_expect_ascii() optimized for input that is mostly ASCII, and unidecode_expect_nonascii() optimized for input that is mostly non-ASCII, all backed by per-character mapping tables. Coverage includes accent removal for Latin scripts, Cyrillic/Greek/Hebrew/Arabic transliteration, and character-by-character romanization of CJK (Chinese/Japanese/Korean) text; characters without a mapping are dropped by default. Converts 'Ångström' to 'Angstrom', 'Café' to 'Cafe', '日本語' to 'Ri Ben Yu ' (each CJK character yields a syllable followed by a space).
Score Breakdown
🔒 Security
Transliteration library with no network calls. With the default settings the output is pure ASCII, but it can still contain spaces, slashes, and other punctuation, so sanitize it further before using it in filenames or URLs. The transformation is lossy: distinct Unicode inputs can map to the same ASCII string, so check for collisions and consider homograph-style spoofing when using Unidecode output for security-sensitive identifiers.
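One way to act on the collision risk is to group candidate identifiers by their transliterated form before accepting them. The sketch below assumes the third-party Unidecode package is installed; find_ascii_collisions is a hypothetical helper name, not part of the library.

```python
from collections import defaultdict

from unidecode import unidecode


def find_ascii_collisions(names):
    """Group names whose lowercase ASCII transliterations are identical.

    Returns a dict mapping each colliding ASCII key to the distinct
    original names that produced it. Hypothetical helper for illustration.
    """
    groups = defaultdict(list)
    for name in names:
        key = unidecode(name).lower()
        if name not in groups[key]:
            groups[key].append(name)
    # Keep only keys that were reached from more than one distinct input.
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


collisions = find_ascii_collisions(["Café", "Cafe", "Zoë"])
```

A caller could reject or disambiguate any name that appears in the returned groups instead of silently treating the names as the same identifier.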
Best When
Generating ASCII slugs, filenames, and identifiers from Unicode text — Unidecode provides consistent, predictable ASCII output for any Unicode input without external dependencies.
Avoid When
Round-trip required (use original Unicode), semantic translation (use translation API), or CJK-specific romanization conventions (use language-specific libraries).
Use Cases
- Agent URL slug generation: import re; from unidecode import unidecode; def slugify(text): ascii_text = unidecode(text); slug = re.sub(r'[^a-z0-9]+', '-', ascii_text.lower()).strip('-'); return slug. slugify('Crème brûlée') returns 'creme-brulee', a URL-safe slug. The agent generates URL slugs from international titles with consistent ASCII output.
- Agent filename sanitization: import os, re; from unidecode import unidecode; safe_name = re.sub(r'[^A-Za-z0-9._-]+', '_', unidecode(user_filename)).strip('._'); safe_path = os.path.join(base_dir, safe_name). The agent creates files from user-provided names. unidecode() removes diacritics but leaves '/', '\' and other punctuation in place, so the extra substitution is what makes the name safe to join onto a directory.
- Agent text normalization for search: from unidecode import unidecode; normalized_query = unidecode(user_query).lower(); results = search_index.query(normalized_query). The agent's search accepts international input against ASCII-indexed content, so 'München' matches 'Munchen' in the index, provided the index was normalized the same way.
- Agent username generation: import re; from unidecode import unidecode; base_username = unidecode(display_name).lower(); username = re.sub(r'[^a-z0-9]', '', base_username)[:20]. The agent derives ASCII usernames from Unicode display names in a consistent identifier format; check the result for emptiness and for collisions with existing usernames.
- Agent CSV/spreadsheet export: from unidecode import unidecode; rows = [[unidecode(str(v)) for v in row] for row in data]; csv_writer.writerows(rows). The agent exports data to systems that expect ASCII; international text remains readable as an ASCII approximation.
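The slug and filename use cases above can be sketched as two small helpers. slugify mirrors the snippet in the list and adds a fallback for empty results; safe_filename and both fallback values are illustrative assumptions, not part of Unidecode.

```python
import re

from unidecode import unidecode


def slugify(text, fallback="untitled"):
    """Return a lowercase, hyphen-separated ASCII slug for a title."""
    ascii_text = unidecode(text)
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    # Unmapped characters are dropped, so the slug can end up empty.
    return slug or fallback


def safe_filename(name, fallback="file"):
    """Return an ASCII filename containing no path separators."""
    ascii_name = unidecode(name)
    # Keep letters, digits, dot, underscore, hyphen; replace everything else,
    # including "/" and "\", with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name)
    # Strip leading/trailing dots and underscores so "../" cannot survive.
    cleaned = cleaned.strip("._")
    return cleaned or fallback


print(slugify("Crème brûlée"))              # creme-brulee
print(safe_filename("../Résumé 2024.pdf"))  # Resume_2024.pdf
```

Both helpers return a fallback instead of an empty string, which avoids creating a URL or file with no name when the input consists only of characters that Unidecode cannot map.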
Not For
- Translation: Unidecode transliterates character by character and does not translate meaning; 'Café' → 'Cafe', not 'Coffee Shop'
- Round-trip fidelity: Unidecode is lossy; the original Unicode cannot be recovered from the ASCII output, so keep the original text for display
- Language-aware romanization: CJK output is context-free and may not match the reading a native speaker expects; for language-specific romanization use dedicated libraries (a pinyin library for Chinese, kakasi for Japanese)
Interface
Authentication
No auth — local text processing library.
Pricing
Free to use. The Python Unidecode package is distributed under GPL-2.0-or-later; check license compatibility before shipping it as part of commercial or closed-source software.
Agent Metadata
Known Gotchas
- ⚠ Transliteration is lossy: unidecode('Café') → 'Cafe'; the original accent information is lost; agent code must NOT use unidecode() output for display to users; use it only for slug/identifier generation and store the original Unicode separately if it is needed
- ⚠ Han characters are romanized with Mandarin readings: unidecode('日本') → 'Ri Ben ' (note the trailing space); this is the Mandarin pronunciation of the characters, not the Japanese reading 'Nihon'; agent code transliterating Japanese or Korean text should use language-specific libraries (kakasi for Japanese) and should strip trailing whitespace from CJK output
- ⚠ Unmapped characters are silently dropped: characters with no entry in the mapping tables (for example Private Use Area code points) become an empty string under the default errors='ignore'; agent code generating slugs may get shorter-than-expected or empty strings; always validate that the result is non-empty, or pass errors='strict' (raises UnidecodeError), errors='replace' (substitutes replace_str, default '?'), or errors='preserve' (keeps the original character)
- ⚠ GPL license check for commercial use: the Python Unidecode package is licensed under GPL-2.0-or-later, not under a permissive or dual license; distributing it inside closed-source software may conflict with the GPL's copyleft terms; verify the license text of the release you use and consider permissively licensed alternatives such as anyascii (ISC license)
- ⚠ Spaces and punctuation are preserved: unidecode('hello world') → 'hello world' (spaces unchanged); ASCII punctuation such as '/', '?' and '&' also passes through untouched; for slug generation replace these separately, for example re.sub(r'[^a-z0-9]+', '-', unidecode(text).lower()); unidecode() alone does not produce URL-safe output
- ⚠ unidecode_expect_ascii() vs unidecode_expect_nonascii(): both return the same string for the same input and differ only in speed; unidecode_expect_ascii() first tries a fast path for pure-ASCII input and falls back to the full table lookup when it meets non-ASCII characters; unidecode() is an alias for unidecode_expect_ascii(); none of these functions returns None, so no special handling is needed when the text contains Unicode
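The handling of unmapped characters and the equivalence of the two function variants can be checked directly. This is a minimal sketch, assuming a Unidecode release that supports the errors argument and exports UnidecodeError with an index attribute; transliterate_or_report is a hypothetical helper, and U+E000 is used as an example of a code point without a mapping.

```python
from unidecode import (
    UnidecodeError,
    unidecode,
    unidecode_expect_ascii,
    unidecode_expect_nonascii,
)

# Private Use Area code point, assumed here to have no entry in the tables.
UNMAPPED = "\ue000"


def transliterate_or_report(text):
    """Return (ascii_text, None), or (None, index) of the first unmapped character."""
    try:
        return unidecode(text, errors="strict"), None
    except UnidecodeError as exc:
        return None, exc.index


# Default behavior (errors="ignore") silently drops unmapped characters.
dropped = unidecode("ab" + UNMAPPED)

# errors="replace" substitutes replace_str, which defaults to "?".
replaced = unidecode("ab" + UNMAPPED, errors="replace")

# Both variants return the same string; they differ only in speed.
same = unidecode_expect_ascii("Café") == unidecode_expect_nonascii("Café")
```

Using errors='strict' at an input boundary makes data loss visible early, while the default stays convenient for best-effort slugs where a fallback value is acceptable.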
Scores are editorial opinions as of 2026-03-06.