chardet

Character encoding detection library for Python. It guesses the encoding of a byte string using statistical analysis. Main features: detect() returns a single best guess as a dict with the encoding name and a confidence score (0.0-1.0); UniversalDetector supports streaming/incremental detection through feed(), with close() finalizing the result; 50+ encodings are supported (UTF-8, ASCII, Shift-JIS, GB2312, EUC-KR, Big5, ISO-8859 variants such as Latin-1, Windows-1252, etc.). Typically used when reading files of unknown encoding from email, web scraping, or legacy systems.

Evaluated Mar 06, 2026 (v5.x)
Category: Developer Tools
Tags: python, chardet, encoding, charset, detect, unicode, ASCII
⚙ Agent Friendliness: 65 / 100 (Can an agent use this?)
🔒 Security: 92 / 100 (Is it safe for agents?)
⚡ Reliability: 83 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 82
Error Messages: 75
Auth Simplicity: 99
Rate Limits: 99

🔒 Security

TLS Enforcement: 92
Auth Strength: 92
Scope Granularity: 92
Dep. Hygiene: 90
Secret Handling: 92

Pure detection library with no network calls, so detection itself raises no security concerns. A misdetected encoding can cause text to be interpreted incorrectly, so validate decoded text when processing user data. Decoding as latin-1 is a safe last-resort fallback because it maps every byte to a character and never raises (errors='replace' has no effect with it), but it may produce garbled text.
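
The advice to validate decoded text can be approximated with a cheap heuristic. The sketch below is an illustration, not part of chardet: the function name looks_garbled and the 5% threshold are assumptions chosen for the example. It flags text containing many U+FFFD replacement characters or control characters, which commonly appear when bytes are decoded with the wrong codec (for instance Windows-1252 bytes decoded as latin-1).

```python
import unicodedata


def looks_garbled(text: str, max_bad_ratio: float = 0.05) -> bool:
    """Heuristic check (hypothetical helper): True if too many suspicious characters.

    Counts U+FFFD replacement characters and control characters other than
    tab, newline, and carriage return. The threshold is an arbitrary default.
    """
    if not text:
        return False
    bad = 0
    for ch in text:
        if ch == "\ufffd":
            bad += 1
        elif unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            bad += 1
    return bad / len(text) > max_bad_ratio
```

A caller might decode with the detected encoding, run this check, and retry with a different candidate encoding when it returns True.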

⚡ Reliability

Uptime/SLA: 82
Version Stability: 82
Breaking Changes: 85
Error Recovery: 82

Best When

Detecting encoding of files or web content from unknown sources — chardet is the standard choice for handling legacy encodings when charset information is unavailable.

Avoid When

Known encoding, ASCII-only text, high-speed bulk processing, or when charset-normalizer (more accurate) is already available via requests.

Use Cases

  • Agent detect file encoding: import chardet; with open('unknown.txt', 'rb') as f: raw = f.read(); result = chardet.detect(raw); encoding = result['encoding']; confidence = result['confidence']; text = raw.decode(encoding or 'utf-8'). The agent reads the file in binary mode, detects the encoding, checks the confidence, and only then decodes with the detected encoding (falling back to UTF-8 when detection returns None).
  • Agent streaming detection: from chardet import UniversalDetector; detector = UniversalDetector(); with open('large.txt', 'rb') as f: for line in f: detector.feed(line); if detector.done: break; detector.close(); result = detector.result. The agent detects the encoding without loading the entire file: call feed() until done is True or the input ends, then always call close() before reading result.
  • Agent web scraping encoding: import requests; import chardet; resp = requests.get(url); encoding = chardet.detect(resp.content)['encoding']; text = resp.content.decode(encoding or 'utf-8'). The agent handles pages whose Content-Type header declares the wrong charset by detecting the actual encoding from the response bytes.
  • Agent handle low confidence: result = chardet.detect(raw); if result['confidence'] < 0.7: try: text = raw.decode('utf-8') except UnicodeDecodeError: text = raw.decode('latin-1'). The agent uses a fallback strategy when detection is uncertain. latin-1 maps every byte to a character, so the last step never raises (errors='replace' would have no effect), but the result may be garbled.
  • Agent multiple encodings: from charset_normalizer import from_bytes; best = from_bytes(raw).best(); text = str(best) if best is not None else raw.decode('utf-8', errors='replace'). The agent uses charset-normalizer, an alternative to chardet that is often reported to be more accurate. best() returns None when no plausible encoding is found. charset-normalizer is a dependency of requests 2.26+.
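
The file, streaming, and low-confidence use cases can be combined into one runnable sketch. The helper names decode_unknown and detect_file_encoding, the 0.7 threshold, and the 4096-byte chunk size are assumptions made for this example, not part of chardet. chardet is imported lazily so the decoding logic can also be exercised with a stub detector.

```python
from typing import Callable


def _default_detect(raw: bytes) -> dict:
    # Third-party dependency (pip install chardet), imported only when needed.
    import chardet
    return chardet.detect(raw)


def decode_unknown(raw: bytes,
                   detect: Callable[[bytes], dict] = _default_detect,
                   min_confidence: float = 0.7) -> str:
    """Decode bytes of unknown encoding with a fallback chain (hypothetical helper)."""
    result = detect(raw)
    encoding = result.get("encoding")          # may be None
    confidence = result.get("confidence") or 0.0
    if encoding and confidence >= min_confidence:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            pass                               # wrong or unknown codec: fall through
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")           # never raises, may be garbled


def detect_file_encoding(path: str, chunk_size: int = 4096) -> dict:
    """Stream a file through UniversalDetector without loading it whole."""
    from chardet.universaldetector import UniversalDetector
    detector = UniversalDetector()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            detector.feed(chunk)
            if detector.done:                  # detector is already confident
                break
    detector.close()                           # finalize before reading result
    return detector.result
```

For example, decode_unknown(open('unknown.txt', 'rb').read()) returns text in every case, while detect_file_encoding('large.txt') returns the raw result dict so the caller can inspect the confidence itself.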

Not For

  • Files with known encoding: if the encoding is known (from HTTP headers, a database column, a file format spec), decode directly; chardet adds unnecessary overhead
  • Pure ASCII text: ASCII is always detected correctly, so chardet adds nothing when the input is guaranteed to be ASCII; note that English text can still contain non-ASCII characters such as curly quotes
  • High-speed bulk processing: chardet's statistical analysis is slow for bulk encoding detection; use charset-normalizer, or assume UTF-8 with error handling
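
The cases above suggest trying cheap checks before any statistical detection. The following is a minimal sketch under that assumption; cheap_decode is a hypothetical helper name, and returning None to signal "detection needed" is a design choice made for this example.

```python
import codecs
from typing import Optional


def cheap_decode(raw: bytes, declared: Optional[str] = None) -> Optional[str]:
    """Try to decode without a detector; return None if detection is needed."""
    if declared:
        try:
            codecs.lookup(declared)        # raises LookupError for unknown codecs
            return raw.decode(declared)
        except (LookupError, UnicodeDecodeError):
            pass                           # declared charset was wrong or unknown
    if raw.isascii():                      # bytes.isascii() needs Python 3.7+
        return raw.decode("ascii")
    try:
        return raw.decode("utf-8")         # strict: fails on invalid sequences
    except UnicodeDecodeError:
        return None                        # caller should run a detector here
```

In bulk pipelines this keeps the detector off the hot path: only inputs for which cheap_decode returns None need to be passed to chardet or charset-normalizer.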

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

No auth — encoding detection utility.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

chardet 5.x is licensed under the LGPL 2.1. It is free to use; the LGPL terms apply when redistributing it.

Agent Metadata

Pagination: none
Idempotent: Full
Retry Guidance: Not documented

Known Gotchas

  • result['encoding'] may be None: chardet.detect(b'') returns {'encoding': None, 'confidence': 0.0, 'language': None}; empty or too-short input fails detection. Agent code: encoding = result['encoding'] or 'utf-8'. Always provide a fallback, and check result['confidence'] before trusting the encoding.
  • Confidence below 0.7 is unreliable: as a rule of thumb, 0.0 means detection failed, 0.5-0.6 is a guess, and 0.7-0.99 is usually reliable. Agent code: if confidence < 0.7, try several decodings or default to UTF-8. High confidence (>0.9) is generally trustworthy; medium confidence should have a fallback.
  • UTF-8 with BOM detected as UTF-8-SIG: files starting with 0xEF 0xBB 0xBF (the UTF-8 BOM) are detected as UTF-8-SIG. Decoding them with plain utf-8 does not raise, but it leaves a U+FEFF character at the start of the text. Agent code: use the detected encoding directly, raw.decode(result['encoding']); the utf-8-sig codec strips the BOM automatically.
  • chardet vs charset-normalizer: requests has used charset-normalizer instead of chardet by default since 2.26. charset-normalizer is often reported to be more accurate: from charset_normalizer import from_bytes; best = from_bytes(raw).best(); text = str(best), after checking that best is not None. Agent code for new projects: consider charset-normalizer, whose chardet-compatible detect() function eases migration.
  • Input must be bytes not string: chardet.detect(text_string) where text_string is a str raises TypeError. Always pass bytes or bytearray: chardet.detect(b'content'). When reading a file, open it in binary mode 'rb'. Agent code: ensure the input is bytes before calling detect().
  • Short strings give poor results: chardet.detect(b'Hello') reports ascii with confidence 1.0, because pure-ASCII input is unambiguous, but short input containing only a few non-ASCII bytes tends to yield low confidence or a wrong guess. Agent code: aim for at least 100-1000 bytes for reliable detection; for very short strings, default to UTF-8. detect() is a thin wrapper that feeds the whole input to a UniversalDetector, so both need comparable amounts of data.
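
Several of these gotchas can be checked with the standard library alone. The sketch below uses two hypothetical helpers, ensure_bytes and encoding_or_default, which are assumptions for illustration and not chardet functions. The result dicts they accept mirror the shape described above.

```python
from typing import Union


def ensure_bytes(data: Union[bytes, bytearray, memoryview, str]) -> bytes:
    """Hypothetical guard: fail early with a clear message if given str."""
    if isinstance(data, str):
        raise TypeError("expected bytes; a str is already decoded text")
    return bytes(data)


def encoding_or_default(result: dict, default: str = "utf-8") -> str:
    """Return the detected encoding, or `default` when detection gave None."""
    return result.get("encoding") or default


# A UTF-8 BOM does not break plain utf-8 decoding; it survives as U+FEFF.
bom_bytes = b"\xef\xbb\xbfhello"
with_bom = bom_bytes.decode("utf-8")         # leading U+FEFF is kept
without_bom = bom_bytes.decode("utf-8-sig")  # BOM is stripped
```

Python codec lookups are case-insensitive, so an encoding name such as 'UTF-8-SIG' from a result dict can be passed to bytes.decode() unchanged.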

Scores are editorial opinions as of 2026-03-06.
