lxml

Fast XML and HTML processing library for Python: C-backed bindings to libxml2 and libxslt that provide fast XML/HTML parsing with XPath, XSLT, and validation (DTD/XML Schema/RelaxNG) through the lxml.etree API. lxml features: etree.parse()/etree.fromstring() for XML, html.parse()/html.fromstring() for HTML, XPath expressions (.xpath()), XSLT transformations, XML Schema and RelaxNG validation, an ElementTree-compatible API, incremental parsing (iterparse for large files), the objectify API for attribute-style XML access, ElementMaker for programmatic XML building, cleanup_namespaces(), and serialization (tostring() with pretty_print=True).

Evaluated Mar 06, 2026 · v5.x
Category: Developer Tools
Tags: python, lxml, xml, html, xpath, xslt, etree, parsing
⚙ Agent Friendliness: 69 / 100 (Can an agent use this?)
🔒 Security: 89 / 100 (Is it safe for agents?)
⚡ Reliability: 88 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 88
Error Messages: 85
Auth Simplicity: 99
Rate Limits: 99

🔒 Security

TLS Enforcement: 90
Auth Strength: 90
Scope Granularity: 88
Dep. Hygiene: 88
Secret Handling: 88

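XML parsing library, so the main risks lie in how untrusted input is parsed. XXE (XML External Entity) attacks: pass resolve_entities=False to the parser to disable entity expansion; use etree.XMLParser(resolve_entities=False, no_network=True) for untrusted XML. Billion laughs attack: keep huge_tree=False (the default) so that libxml2's built-in size limits stay in force. XSLT from an untrusted source: stylesheets can reach files, the network, and registered extension functions, which can lead to arbitrary code execution; never apply untrusted XSLT.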
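
A minimal sketch of the parser settings named above, applied to a document that declares an internal entity. The helper name parse_untrusted and the sample payload are illustrative assumptions, not part of lxml; only the XMLParser flags come from the text.

```python
from lxml import etree

# Parser settings for untrusted input, mirroring the flags named above.
SAFE_PARSER = etree.XMLParser(
    resolve_entities=False,  # do not expand entity references
    no_network=True,         # never fetch external resources over the network
    huge_tree=False,         # keep libxml2's built-in size and depth limits
)


def parse_untrusted(data: bytes):
    """Parse untrusted XML bytes with the restricted parser (hypothetical helper)."""
    return etree.fromstring(data, parser=SAFE_PARSER)


# Illustrative payload: declares an internal entity and references it.
PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY secret "expanded">]>
<root><value>&secret;</value></root>"""

root = parse_untrusted(PAYLOAD)
value = root.find("value")

# The entity reference is kept as a reference instead of being replaced
# by its declared content.
print(etree.tostring(value))
```

Reusing one parser object for every untrusted document keeps the restrictions in a single place, so a call site cannot silently fall back to the default parser.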

⚡ Reliability

Uptime/SLA: 88
Version Stability: 88
Breaking Changes: 88
Error Recovery: 90

Best When

Fast XML/HTML processing with XPath and XSLT: lxml is often substantially faster than stdlib ElementTree (the quoted 5-50x range depends heavily on the workload) and provides full XPath 1.0, XSLT, and schema validation that stdlib lacks.

Avoid When

Simple XML without XPath (use stdlib etree), JSON data, or environments where C extension compilation fails.

Use Cases

  • Agent XML parsing: from lxml import etree; tree = etree.parse('data.xml'); root = tree.getroot(); items = root.xpath('//item[@status="active"]/name/text()'). The agent parses XML and extracts data with XPath; lxml implements full XPath 1.0, whereas ElementTree supports only a limited subset, and the call returns a list of matched nodes or strings.
  • Agent HTML scraping: from lxml import html; tree = html.fromstring(html_content); links = tree.xpath('//a[@href]/@href'); titles = tree.cssselect('.product h2'). The agent parses HTML for fast XPath-based extraction; html.fromstring() tolerates malformed HTML, and cssselect() adds CSS selector support but needs the separate cssselect package installed.
  • Agent large XML streaming: context = etree.iterparse('large.xml', events=('end',), tag='Record'); for event, elem in context: process(elem); elem.clear(). iterparse is event-driven, so the agent can process multi-GB XML files without loading them fully into memory; elem.clear() releases each element's content once it has been handled.
  • Agent XML validation: schema = etree.XMLSchema(etree.parse('schema.xsd')); doc = etree.parse('data.xml'); if not schema.validate(doc): errors = schema.error_log; handle_errors(errors). The agent validates XML against an XSD schema; validate() returns a boolean and error_log holds the detailed validation errors.
  • Agent XML generation: root = etree.Element('root'); child = etree.SubElement(root, 'item', id='1'); child.text = 'content'; xml_bytes = etree.tostring(root, pretty_print=True, xml_declaration=True, encoding='UTF-8'). The agent builds XML documents programmatically and serializes them to bytes.
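
The generation, XPath, and streaming use cases above can be sketched as one round trip. This is an illustrative example under stated assumptions: the element names, attribute values, and the in-memory BytesIO source stand in for real files such as data.xml or large.xml.

```python
from io import BytesIO

from lxml import etree

# Build a small document programmatically (names and values are illustrative).
root = etree.Element("root")
rows = [("1", "active", "alpha"), ("2", "inactive", "beta"), ("3", "active", "gamma")]
for item_id, status, name in rows:
    item = etree.SubElement(root, "item", id=item_id, status=status)
    etree.SubElement(item, "name").text = name

xml_bytes = etree.tostring(
    root, pretty_print=True, xml_declaration=True, encoding="UTF-8"
)

# Parse the bytes back and select the names of active items with XPath.
parsed = etree.fromstring(xml_bytes)
active_names = parsed.xpath('//item[@status="active"]/name/text()')

# Stream the same data: handle each <item> when it closes, then free it.
streamed_ids = []
for _event, elem in etree.iterparse(BytesIO(xml_bytes), events=("end",), tag="item"):
    streamed_ids.append(elem.get("id"))
    elem.clear()
    # Drop already-processed siblings that the parent still references.
    while elem.getprevious() is not None:
        del elem.getparent()[0]

print(active_names, streamed_ids)
```

For a real multi-GB file, the BytesIO object would be replaced by a filename or an open binary file; the loop body stays the same.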

Not For

  • Simple XML with stdlib — for basic XML tasks, stdlib xml.etree.ElementTree is sufficient without the C dependency
  • JSON data — lxml is XML/HTML only; for JSON use stdlib json or orjson
  • Environments where the C extension cannot be installed: lxml requires a compiled C extension, and some minimal environments (Alpine musl) may have issues building or installing it. lxml-stubs only adds type hints; it does not remove the C dependency.

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

No auth — XML/HTML parsing library.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

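lxml is distributed under the BSD 3-Clause license; the GPL applies only to a test-runner script in the source tree, not to the library itself. Free for all use.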

Agent Metadata

Pagination: none
Idempotent: Full
Retry Guidance: Not documented

Known Gotchas

  • XPath returns a list of nodes, not a single value: root.xpath('//item/text()') returns a list such as ['val1', 'val2'], and root.xpath('//item[1]/text()') returns a list with one element. Agent code: index with [0] for a single value, or call xpath('string(//item[1]/text())') to get a string directly; xpath('count(//item)') returns a float.
  • lxml elements have tail text: in <a>text<b/>tail</a>, elem.text is 'text' (before the child) and elem[0].tail is 'tail' (after child b, but still inside parent a). Agent code parsing mixed content: use itertext() for XML or text_content() from lxml.html for HTML; otherwise concatenate elem.text with each child's text and tail.
  • Namespace handling in XPath: for <root xmlns:ns='http://example.com'><ns:item/></root>, xpath('//ns:item') needs a namespace map: root.xpath('//ns:item', namespaces={'ns': 'http://example.com'}). Agent code with namespaced XML: always define the map. Clark notation ({http://example.com}item) is not valid inside xpath(); it works with find(), findall(), and iter(), e.g. root.findall('.//{http://example.com}item').
  • iterparse memory management: iterparse yields (event, elem) pairs, but the root still references every element already parsed, so the tree keeps growing unless it is pruned. Agent code: after processing elem, call elem.clear(), then delete earlier siblings from the parent (while elem.getprevious() is not None: del elem.getparent()[0]). Without this, iterparse on a large file still ends up holding the whole document in memory.
  • from lxml import etree vs import lxml.etree: both work; from lxml import etree is conventional. lxml.html is a separate module (from lxml import html) whose fromstring() handles malformed HTML better than etree. Agent code: use html.fromstring() for HTML (returns HtmlElement) and etree.fromstring() for strict XML.
  • html.fromstring() vs html.document_fromstring(): html.fromstring('<p>text</p>') returns just the fragment's element, not a full document, when the input is a fragment; html.document_fromstring() always returns the full-document HtmlElement. Agent code parsing complete HTML pages: use document_fromstring(); for fragments, use fromstring() and work with the element directly.
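
The gotchas above can be checked with a short script. This is a sketch on made-up sample documents; the variable names are illustrative, and the namespace URI is the example one used in the list.

```python
from lxml import etree, html

doc = etree.fromstring("<root><item>val1</item><item>val2</item></root>")

texts = doc.xpath("//item/text()")             # always a list, even for one match
first = doc.xpath("string(//item[1]/text())")  # XPath string() yields a string
count = doc.xpath("count(//item)")             # XPath numbers come back as float

# Mixed content: text before a child is .text, text after it is the child's .tail.
mixed = etree.fromstring("<a>text<b/>tail</a>")
parts = (mixed.text, mixed[0].tail)
joined = "".join(mixed.itertext())

# Namespaces: xpath() needs a prefix map; findall() accepts Clark notation.
ns_doc = etree.fromstring('<root xmlns:ns="http://example.com"><ns:item/></root>')
via_xpath = ns_doc.xpath("//ns:item", namespaces={"ns": "http://example.com"})
via_clark = ns_doc.findall(".//{http://example.com}item")

# HTML: fromstring() returns the fragment's element, while
# document_fromstring() wraps the same input in a full document.
fragment = html.fromstring("<p>Hello <b>big</b> world</p>")
document = html.document_fromstring("<p>Hello <b>big</b> world</p>")
flat = fragment.text_content()  # flattens text, including children's tails

print(texts, first, count, parts, joined, fragment.tag, document.tag, flat)
```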

Scores are editorial opinions as of 2026-03-06.
