chardet

Character encoding detection library for Python that guesses the encoding of a byte string using statistical analysis. Features: detect() for one-shot detection, returning an encoding name and a confidence score (0.0-1.0); UniversalDetector for streaming/incremental detection, with feed() to supply chunks and close() to finalize the result; support for 50+ encodings (UTF-8, ASCII, ISO-8859 variants including Latin-1, Windows-1252, Shift-JIS, GB2312, EUC-KR, Big5, etc.). Used when reading data of unknown encoding from email, web scraping, or legacy systems.
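The incremental workflow described above (feed chunks, stop once the detector reports it is done, then finalize with close()) might look like the following minimal sketch. The helper name detect_stream and its detector parameter are hypothetical, introduced only so the loop can run against any object exposing feed(), done, close(), and result. By default it assumes chardet is installed and uses its UniversalDetector.

```python
def detect_stream(chunks, detector=None):
    """Feed byte chunks to a detector until it is confident, then finalize.

    `detector` is a hypothetical injection point; when omitted, chardet's
    UniversalDetector is used (assumes chardet is installed).
    """
    if detector is None:
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:  # detector is confident enough; stop reading early
            break
    detector.close()  # finalize; result is only complete after close()
    return detector.result


# Usage sketch (assumes a file named 'large.txt' exists):
# with open("large.txt", "rb") as f:
#     result = detect_stream(f)  # iterating a binary file yields bytes lines
#     print(result["encoding"], result["confidence"])
```

Iterating the file object line by line keeps memory use flat; the early break matters most for large files whose encoding becomes clear within the first few kilobytes.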

Evaluated Mar 06, 2026, v5.x

Category: Developer Tools. Tags: python, chardet, encoding, charset, detect, unicode, ASCII

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

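Pure detection library with no network calls, so detection itself raises no security concerns. A misdetected encoding can cause text to be interpreted incorrectly, so validate decoded text when processing user data. Decoding as latin-1 is a safe last-resort fallback because it maps every byte to a character and never raises (errors='replace' is therefore unnecessary), but the result may be garbled if the data was actually in another encoding.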
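One way to act on the advice to validate decoded text is a cheap sanity check for characters that rarely appear in legitimate text. The sketch below is a heuristic only: the function name looks_garbled and the 5% threshold are assumptions made for illustration, not part of chardet.

```python
import unicodedata


def looks_garbled(text, max_ratio=0.05):
    """Heuristic: True if too many characters are U+FFFD or control characters.

    The threshold is an arbitrary illustrative default, not a chardet setting.
    """
    if not text:
        return False
    suspicious = sum(
        1
        for ch in text
        # U+FFFD marks bytes that could not be decoded; category "Cc" covers
        # C0/C1 control characters, which a wrong latin-1 decode often yields.
        if ch == "\ufffd" or (unicodedata.category(ch) == "Cc" and ch not in "\t\n\r")
    )
    return suspicious / len(text) > max_ratio


# Windows-1252 curly quotes decoded as latin-1 become C1 control characters.
misdecoded = b"\x93hi\x94".decode("latin-1")
flagged = looks_garbled(misdecoded)
```

A check like this does not prove the encoding is right; it only catches some obviously wrong decodes before the text is passed further along.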

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

Detecting encoding of files or web content from unknown sources — chardet is the standard choice for handling legacy encodings when charset information is unavailable.

Avoid When

Known encoding, ASCII-only text, high-speed bulk processing, or when charset-normalizer (more accurate) is already available via requests.

Use Cases

• Agent detect file encoding: import chardet; with open('unknown.txt', 'rb') as f: raw = f.read(); result = chardet.detect(raw); encoding = result['encoding'] or 'utf-8'; confidence = result['confidence']; text = raw.decode(encoding). The agent reads the file in binary mode, detects the encoding, and decodes with it; result['encoding'] can be None, so supply a fallback, and check confidence before trusting the result
• Agent streaming detection: from chardet import UniversalDetector; detector = UniversalDetector(); with open('large.txt', 'rb') as f: for line in f: detector.feed(line); if detector.done: break; detector.close(); result = detector.result. The agent detects the encoding without loading the entire file: call feed() until done is True (or the input ends), then call close() before reading result
• Agent web scraping encoding: import requests; import chardet; resp = requests.get(url); encoding = chardet.detect(resp.content)['encoding']; text = resp.content.decode(encoding or 'utf-8'). Useful when a page declares the wrong charset in its Content-Type header: the agent detects the actual encoding from the raw bytes in resp.content instead of relying on the declared one
• Agent handle low confidence: result = chardet.detect(raw); if result['confidence'] < 0.7: try: text = raw.decode('utf-8') except UnicodeDecodeError: text = raw.decode('latin-1'). The agent falls back to a fixed strategy when detection is uncertain; latin-1 decodes any byte sequence without raising, so it works as a last resort, although the text may be wrong
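• Agent use charset-normalizer instead: from charset_normalizer import from_bytes; best = from_bytes(raw).best(); text = str(best) if best is not None else raw.decode('utf-8', errors='replace'). The agent uses charset-normalizer, an alternative to chardet that is often reported as more accurate; best() returns None when no encoding fits, so handle that case; charset-normalizer has been a dependency of requests since 2.26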
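The detection and low-confidence use cases above can be combined into one helper, sketched below. The name decode_with_fallback, the detect parameter, and the 0.7 default are illustrative assumptions; the detect parameter exists only so another detector with the same return shape can be swapped in, and by default the sketch assumes chardet is installed and calls chardet.detect.

```python
def decode_with_fallback(raw, detect=None, min_confidence=0.7):
    """Decode bytes using a detected encoding, falling back to UTF-8, then latin-1.

    Returns (text, encoding_used). `detect` must return a dict with
    'encoding' and 'confidence' keys, as chardet.detect does.
    """
    if detect is None:
        import chardet  # assumes chardet is installed
        detect = chardet.detect
    result = detect(raw)
    encoding = result.get("encoding")  # may be None
    confidence = result.get("confidence") or 0.0
    if encoding and confidence >= min_confidence:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            pass  # wrong guess or codec unknown to Python: fall through
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never raises.
        return raw.decode("latin-1"), "latin-1"


# Usage sketch (assumes 'unknown.txt' exists):
# with open("unknown.txt", "rb") as f:
#     text, used = decode_with_fallback(f.read())
```

Returning the encoding that was actually used lets the caller log or flag cases where the latin-1 last resort was reached.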

Not For

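• Files with known encoding: if the encoding is known and trusted (from HTTP headers, a database column, a file format specification), decode directly; chardet only adds overhead
• Pure ASCII text: ASCII bytes decode identically as ASCII, UTF-8, or Latin-1, so detection adds nothing; note that English text is not always pure ASCII (curly quotes, accented names), so check the bytes, for example with raw.isascii()
• High-speed bulk processing: chardet's statistical analysis is slow on large volumes of data; for bulk work, assume UTF-8 with error handling, or evaluate charset-normalizer if accuracy matters more than speed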
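For the cases above where detection is not worth its cost, a standard-library-only path might look like this sketch. The function name decode_without_detection and the labels it returns are hypothetical; bytes.isascii() requires Python 3.7 or later.

```python
def decode_without_detection(raw):
    """Decode bytes without any detection step.

    Returns (text, label): pure ASCII is decoded directly, otherwise strict
    UTF-8 is tried, and invalid bytes are replaced with U+FFFD as a last resort.
    """
    if raw.isascii():  # Python 3.7+: True when every byte is below 0x80
        return raw.decode("ascii"), "ascii"
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


samples = [b"plain text", "caf\u00e9".encode("utf-8"), b"caf\xe9"]
decoded = [decode_without_detection(s) for s in samples]
```

The "utf-8 (lossy)" label signals that some bytes were replaced, which is the point at which handing the input to a detector may be worthwhile.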

Interface

REST API

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: none

OAuth: No Scopes: No

No auth — encoding detection utility.

Pricing

Model: open_source

Free tier: Yes

Requires CC: No

chardet is LGPL 2.1 licensed. Free for all use.

Agent Metadata

Pagination

none

Idempotent

Full

Retry Guidance

Not documented

Known Gotchas

⚠ result['encoding'] may be None: chardet.detect(b'') returns {'encoding': None, 'confidence': 0.0, 'language': None}; empty input, or input in which no known pattern is recognized, fails detection; agent code: encoding = result['encoding'] or 'utf-8'; always provide a fallback, and also check result['confidence'] before trusting the encoding
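⚠ Confidence below 0.7 is unreliable: as a rule of thumb for chardet.detect() confidence, 0.0 means detection failed, 0.5-0.6 is a guess, and 0.7-0.99 is usually reliable; these thresholds are heuristics, not guarantees from the library; agent code: if confidence < 0.7, try several decodings or default to UTF-8; confidence above 0.9 can generally be trusted; medium confidence should always have a fallback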
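⚠ UTF-8 with BOM detected as UTF-8-SIG: files starting with 0xEF 0xBB 0xBF (the UTF-8 BOM) are detected as UTF-8-SIG; decoding such bytes with plain utf-8 does not raise, but it leaves an invisible U+FEFF character at the start of the text, which can break string comparisons and parsers; agent code: use the detected encoding directly: raw.decode(result['encoding']); the utf-8-sig codec strips the BOM automatically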
⚠ chardet vs charset-normalizer: since 2.26, requests depends on charset-normalizer rather than chardet by default; charset-normalizer is often reported as more accurate: from charset_normalizer import from_bytes; best = from_bytes(raw).best(); text = str(best) (check that best is not None first); agent code for new projects: consider charset-normalizer as an alternative, and compare results on representative data before switching
⚠ Input must be bytes, not str: chardet.detect(text_string), where text_string is a str, raises TypeError; always pass bytes or bytearray: chardet.detect(b'content'); when reading a file, open it in binary mode ('rb'); agent code: ensure the input is bytes before calling detect()
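⚠ Short input gives unreliable results: pure-ASCII input such as chardet.detect(b'Hello') is reported as ascii with confidence 1.0, which says nothing about the rest of a file that may contain non-ASCII bytes later; short non-ASCII input often yields low confidence or a wrong guess; agent code for encoding detection: provide at least 100-1000 bytes for reliable detection; for very short strings, default to UTF-8; detect() is a thin wrapper around UniversalDetector, so both give the same result for the same bytes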
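Several of the gotchas above (str input, a None encoding, and the UTF-8 BOM) can be guarded against in a few lines, as in the sketch below. The wrapper name safe_detect and its detect and default parameters are hypothetical; by default it assumes chardet is installed. The BOM part uses only the standard library.

```python
import codecs


def safe_detect(data, detect=None, default="utf-8"):
    """Return (encoding, confidence); the encoding is never None."""
    if not isinstance(data, (bytes, bytearray)):
        # Fail early with a clear message instead of deep inside the detector.
        raise TypeError(f"expected bytes, got {type(data).__name__}")
    if not data:
        return default, 0.0  # nothing to analyze
    if detect is None:
        import chardet  # assumes chardet is installed
        detect = chardet.detect
    result = detect(bytes(data))
    return result.get("encoding") or default, result.get("confidence") or 0.0


# BOM behaviour with the standard codecs:
raw = codecs.BOM_UTF8 + "h\u00e9llo".encode("utf-8")
with_bom = raw.decode("utf-8")         # no exception; text starts with U+FEFF
without_bom = raw.decode("utf-8-sig")  # BOM stripped
```

Comparing with_bom and without_bom shows why the detected name UTF-8-SIG should be passed to decode() unchanged rather than normalized to utf-8.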

Scores are editorial opinions as of 2026-03-06.