charset-normalizer

Modern character-encoding detection library for Python — detects the encoding of byte sequences using statistical analysis. charset-normalizer features: from_bytes()/from_path()/from_fp() for encoding detection, a CharsetMatches result object with the best match and alternatives, an encoding property for the detected codec name, a confidence score per result, CharsetMatch.output() for re-encoding (the 2.x normalize() helper was removed in 3.0), a CLI tool (normalizer) for command-line detection, mypyc-compiled wheels for speed, multibyte encoding support (CJK) alongside single-byte scripts (Arabic, Hebrew), chaos ("mess") detection for garbled text, and a chardet-compatible detect() interface used by requests. Drop-in replacement for chardet with better accuracy and active maintenance.
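
A minimal quick-start sketch of the detection flow described above (the sample string is purely illustrative):

```python
# Detect the encoding of a byte sequence, then decode it.
from charset_normalizer import from_bytes

raw = "héllo wörld, こんにちは".encode("utf-8")
result = from_bytes(raw).best()  # CharsetMatch, or None if nothing plausible

if result is not None:
    print(result.encoding)  # name of the detected codec
    print(str(result))      # the bytes decoded with that codec
```

str(result) is the idiomatic way to get the decoded text without a second decode call.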

Evaluated Mar 06, 2026 (v3.x)
⚙ Agent Friendliness
67
/ 100
Can an agent use this?
🔒 Security
92
/ 100
Is it safe for agents?
⚡ Reliability
86
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
--
Documentation
85
Error Messages
80
Auth Simplicity
99
Rate Limits
99

🔒 Security

TLS Enforcement
92
Auth Strength
92
Scope Granularity
90
Dep. Hygiene
92
Secret Handling
92

Pure text-processing library with no network calls. Detection on untrusted bytes is safe (read-only analysis). Conversion helpers that write re-encoded output to disk should have their output paths validated to prevent path traversal. Decoded text from untrusted sources should still be sanitized before use in SQL, HTML, or shell contexts.
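
The path-traversal caution above can be enforced with a small stdlib check before any converted file is written; a sketch, assuming output lands under a fixed base directory (BASE_DIR and safe_output_path are hypothetical names, not part of the library):

```python
# Sketch: confine written output to a known base directory before
# writing a re-encoded file. Rejects '..' escapes and absolute paths.
from pathlib import Path

BASE_DIR = Path("/srv/converted").resolve()

def safe_output_path(name: str) -> Path:
    candidate = (BASE_DIR / name).resolve()
    # A safe candidate must sit strictly inside BASE_DIR.
    if BASE_DIR not in candidate.parents:
        raise ValueError(f"unsafe output path: {name!r}")
    return candidate
```

resolve() collapses any ".." segments before the containment check, so "../etc/passwd" and "/etc/passwd" are both rejected.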

⚡ Reliability

Uptime/SLA
85
Version Stability
88
Breaking Changes
88
Error Recovery
85

Best When

Detecting character encoding of text files and API responses with unknown encoding — charset-normalizer provides better accuracy than chardet with active maintenance and faster C extension.

Avoid When

Binary file type detection (use python-magic), guaranteed-accurate encoding detection (add BOM at creation time instead), or real-time streaming text.

Use Cases

  • Agent file encoding detection — from charset_normalizer import from_path; results = from_path('unknown.txt'); best = results.best(); print(best.encoding, best.chaos) — detect encoding; agent reads files with unknown encoding; from_path detects encoding from file bytes; best() returns highest-confidence match; then: with open(path, encoding=best.encoding) as f
  • Agent bytes to string — from charset_normalizer import from_bytes; raw_bytes = some_api_response.content; result = from_bytes(raw_bytes).best(); if result: text = str(result) — bytes decode; agent processes API responses or scraped content with unknown encoding; str(result) decodes using detected encoding; returns None if no encoding detected
  • Agent encoding normalization — from charset_normalizer import from_path; result = from_path('old_file.txt').best(); open('old_file.utf8.txt', 'wb').write(result.output()) — convert encoding; agent converts legacy files to UTF-8; CharsetMatch.output() returns the payload re-encoded (UTF-8 by default); note the 2.x normalize() file helper was deprecated in 2.1 and removed in 3.0
  • Agent bulk file processing — from charset_normalizer import from_bytes; for file_bytes in batch: result = from_bytes(file_bytes); encoding = result.best().encoding if result.best() else 'utf-8'; text = file_bytes.decode(encoding, errors='replace') — batch encoding detection; agent processes mixed-encoding document collections; fallback to UTF-8 with errors='replace' for safety
  • Agent requests integration — import requests; from charset_normalizer import from_bytes; response = requests.get(url); detected = from_bytes(response.content).best(); text = response.content.decode(detected.encoding if detected else 'utf-8') — web scraping; agent correctly decodes web pages even when the Content-Type header declares the wrong encoding
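
The bullets above share one detect-then-decode pattern; a consolidated sketch (decode_best_effort is a hypothetical helper name, not part of the library):

```python
# Sketch of the detect-then-decode pattern from the use cases above:
# try statistical detection, fall back to lossy UTF-8 on failure.
from charset_normalizer import from_bytes

def decode_best_effort(data: bytes, fallback: str = "utf-8") -> str:
    result = from_bytes(data).best()
    if result is not None:
        return str(result)  # decoded with the detected encoding
    # No confident match (binary or highly ambiguous input): fall back.
    return data.decode(fallback, errors="replace")

text = decode_best_effort("café, naïve, 東京".encode("utf-8"))
```

errors="replace" in the fallback guarantees the helper always returns a string, at the cost of U+FFFD replacement characters for undecodable bytes.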

Not For

  • Binary file detection — charset-normalizer detects text encodings not binary vs text; use python-magic for binary type detection
  • Guaranteed accuracy — encoding detection is probabilistic; short texts and mixed-encoding documents may be misdetected; validate results
  • Real-time streaming — from_bytes requires full byte sequence; for streaming text use codec detection via BOM or HTTP Content-Type header
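
For the streaming case above, BOM sniffing needs only the stdlib and the first four bytes; a minimal sketch (encoding_from_bom is a hypothetical helper):

```python
# Sketch: deterministic BOM-based detection for streamed text, using
# only stdlib codecs constants (no full-buffer statistical analysis).
import codecs

_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    # Check UTF-32 before UTF-16: their LE BOMs share a 2-byte prefix.
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def encoding_from_bom(prefix: bytes):
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return name
    return None  # no BOM: fall back to headers or buffered detection
```

Most UTF-8 files carry no BOM, so a None result is common and simply means the stream needs another detection strategy.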

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
No
SDK
Yes
Webhooks
No

Authentication

Methods: none
OAuth: No Scopes: No

No auth — local text processing library.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

charset-normalizer is MIT licensed. Free for all use.

Agent Metadata

Pagination
none
Idempotent
Full
Retry Guidance
Not documented

Known Gotchas

  • results.best() can return None — from_bytes(data).best() returns None when no plausible encoding is found (binary or highly ambiguous data); do NOT call result.encoding on None — that raises AttributeError; always guard: result = from_bytes(data).best(); if result is None: handle_failure()
  • Two interfaces: chardet-compatible and native — the chardet-compatible interface is from charset_normalizer import detect; detect(bytes_data) returns a dict with encoding and confidence keys; the native interface is from charset_normalizer import from_bytes; both work, but the native interface gives more information, including alternatives and the chaos score
  • Small byte sequences have low reliability — detection on <100 bytes is unreliable; UTF-8 with ASCII characters often detected as ASCII; agent code should pass as much of the file as available; for streaming: buffer first 4096 bytes then detect; confidence < 0.5 should be treated as uncertain
  • chaos score indicates garbled text — result.chaos is a float from 0.0 to 1.0 measuring the ratio of suspicious ("mess") characters; chaos > 0.1 suggests a misdetected encoding or garbled data; agent code should: if result.chaos > 0.1: try alternative encodings or flag for human review; chaos = 0.0 means clean, well-formed text in the detected encoding
  • normalize() was removed in 3.0 — the 2.x charset_normalizer.normalize(path) file-to-file helper was deprecated in 2.1 and removed in 3.0; for in-memory conversion: read file → from_bytes() → str(result); to re-encode bytes, use CharsetMatch.output(), which returns UTF-8 bytes by default
  • requests library uses charset-normalizer automatically since 2.26 — requests already detects encoding via charset-normalizer/chardet; response.text uses detected encoding; agent code using requests should use response.text directly; only use charset-normalizer manually when response.content needs custom processing or response.encoding is wrong
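
Combining the None guard and the chaos check from the gotchas above, a defensive wrapper might look like this sketch (detect_or_none is a hypothetical helper; the 0.1 chaos threshold follows the editorial guidance here, not a library default):

```python
# Sketch: treat both a missing match and a high chaos score as
# detection failure, returning an encoding name or None.
from charset_normalizer import from_bytes

def detect_or_none(data: bytes, max_chaos: float = 0.1):
    result = from_bytes(data).best()
    if result is None:            # binary or highly ambiguous input
        return None
    if result.chaos > max_chaos:  # likely garbled or misdetected
        return None
    return result.encoding

enc = detect_or_none("こんにちは世界、テスト".encode("utf-8"))
```

Note that result.encoding uses Python codec naming (e.g. utf_8 with an underscore), which open() and bytes.decode() both accept.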

Full Evaluation Report

Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for charset-normalizer.


Scores are editorial opinions as of 2026-03-06.
