lxml

Fast XML and HTML processing library for Python: C-backed bindings to libxml2 and libxslt that provide fast XML/HTML parsing with XPath, XSLT, and validation (DTD, XML Schema, RelaxNG) behind the ElementTree-compatible lxml.etree API. lxml features: etree.parse()/etree.fromstring() for XML, html.parse()/html.fromstring() for HTML, XPath expressions (.xpath()), XSLT transformations, XML Schema and RelaxNG validation, incremental parsing (iterparse for large files), the objectify API for attribute-style XML access, ElementMaker for programmatic XML building, etree.cleanup_namespaces(), and serialization (tostring() with pretty_print=True).

Evaluated Mar 06, 2026 · v5.x

Developer Tools · Tags: python, lxml, xml, html, xpath, xslt, etree, parsing

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

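XML parsing library; the main risks come from untrusted input. XXE (XML External Entity) attacks: for untrusted XML, build the parser explicitly with etree.XMLParser(resolve_entities=False, no_network=True) rather than relying on version-dependent defaults. Billion laughs (entity expansion): resolve_entities=False avoids expanding entities, and huge_tree=False (the default) keeps libxml2's built-in size and depth limits in place; never set huge_tree=True for untrusted input. XSLT from untrusted sources: stylesheets can read and write files and reach the network (access can be restricted with etree.XSLTAccessControl), and any extension functions the application registers expose Python code to the stylesheet; never apply untrusted XSLT.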
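A minimal sketch of the parser settings described above, assuming the input arrives as bytes. The helper name parse_untrusted and the sample payload are illustrative, not part of lxml; load_dtd=False and huge_tree=False are spelled out only to make the intent explicit, since they are already the defaults.

```python
from lxml import etree


def parse_untrusted(xml_bytes: bytes) -> etree._Element:
    """Parse XML from an untrusted source with a locked-down parser.

    Hypothetical helper: the name and exact option set are assumptions.
    """
    parser = etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # block network access when loading related files
        load_dtd=False,          # do not load an external DTD (default)
        huge_tree=False,         # keep libxml2's built-in limits (default)
    )
    return etree.fromstring(xml_bytes, parser)


# XXE-style payload: the entity points at a local file.
payload = b"""<?xml version="1.0"?>
<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<data>&xxe;</data>"""

root = parse_untrusted(payload)
serialized = etree.tostring(root)  # the reference is kept as-is, not replaced by file contents
```

Building the parser once and reusing it for every untrusted document keeps the policy in one place.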

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

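Fast XML/HTML processing with XPath and XSLT: lxml is typically much faster than stdlib ElementTree (5-50x is the claimed range; actual speedups depend on the workload) and provides full XPath 1.0, XSLT, and schema validation, whereas the stdlib offers only a limited XPath subset.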

Avoid When

Simple XML without XPath (use stdlib xml.etree.ElementTree), JSON data, or environments where C extension compilation fails.

Use Cases

• Agent XML parsing: from lxml import etree; tree = etree.parse('data.xml'); root = tree.getroot(); items = root.xpath('//item[@status="active"]/name/text()'). The agent parses XML and extracts data with XPath; lxml supports full XPath 1.0, well beyond the ElementTree path subset, and returns a list of matched nodes or strings.
• Agent HTML scraping: from lxml import html; tree = html.fromstring(html_content); links = tree.xpath('//a[@href]/@href'); titles = tree.cssselect('.product h2'). html.fromstring() tolerates malformed HTML; cssselect() requires the separate cssselect package to be installed.
• Agent large XML streaming: context = etree.iterparse('large.xml', events=('end',), tag='Record'); for event, elem in context: process(elem); elem.clear(). iterparse is event-driven, so the agent can process multi-GB files without loading them fully into memory, provided processed elements are cleared.
• Agent XML validation: schema = etree.XMLSchema(etree.parse('schema.xsd')); doc = etree.parse('data.xml'); if not schema.validate(doc): errors = schema.error_log; handle_errors(errors). validate() returns a bool, and error_log holds the detailed validation errors.
• Agent XML generation: root = etree.Element('root'); child = etree.SubElement(root, 'item', id='1'); child.text = 'content'; xml_bytes = etree.tostring(root, pretty_print=True, xml_declaration=True, encoding='UTF-8'). The agent builds XML documents programmatically; tostring() returns bytes when an encoding such as 'UTF-8' is given.
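The use cases above can be tied together in one runnable sketch that works entirely in memory. The catalog/item/name tags, the sample rows, and the small XSD are invented for illustration; a real agent would read files such as data.xml and schema.xsd instead.

```python
from io import BytesIO

from lxml import etree

# Generation: build a small document (hypothetical tags and data).
catalog = etree.Element("catalog")
for item_id, status, name in [("1", "active", "Widget"),
                              ("2", "retired", "Gadget"),
                              ("3", "active", "Gizmo")]:
    item = etree.SubElement(catalog, "item", id=item_id, status=status)
    etree.SubElement(item, "name").text = name
xml_bytes = etree.tostring(catalog, pretty_print=True,
                           xml_declaration=True, encoding="UTF-8")

# Parsing + XPath: the result is always a list.
doc = etree.fromstring(xml_bytes)
active_names = doc.xpath('//item[@status="active"]/name/text()')

# Streaming: visit each <item> as its end tag is seen, then free it.
seen_ids = []
for _event, elem in etree.iterparse(BytesIO(xml_bytes), events=("end",), tag="item"):
    seen_ids.append(elem.get("id"))
    elem.clear()

# Validation: an illustrative XSD requiring an integer id on every item.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:integer" use="required"/>
            <xs:attribute name="status" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""
schema = etree.XMLSchema(etree.fromstring(XSD))
is_valid = schema.validate(doc)

bad_doc = etree.fromstring(b'<catalog><item id="x"><name>n</name></item></catalog>')
bad_is_valid = schema.validate(bad_doc)    # id="x" is not an integer
bad_errors = list(schema.error_log)        # details of the most recent validation
```

Note that error_log reflects the most recent validate() call, so it should be read before validating another document.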

Not For

• Simple XML with stdlib — for basic XML tasks, stdlib xml.etree.ElementTree is sufficient without the C dependency
• JSON data — lxml is XML/HTML only; for JSON use stdlib json or orjson
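• Environments where the C extension is unavailable: lxml is a compiled C extension; on platforms without a prebuilt wheel (some minimal or musl-based images such as Alpine) it must be built from source against libxml2 and libxslt, which can fail. lxml-stubs only provides type hints and does not remove this requirement.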
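Where lxml may or may not be installable, one common pattern is an import fallback to the stdlib. This is a sketch under the assumption that the code only uses the API subset both implementations share (here fromstring() and findall() with a simple path); the HAVE_LXML flag is an illustrative name.

```python
try:
    from lxml import etree  # C-backed, full XPath 1.0 via .xpath()
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, limited path subset
    HAVE_LXML = False

root = etree.fromstring("<root><item id='1'>a</item><item id='2'>b</item></root>")

# findall() with a simple descendant path behaves the same in both libraries.
texts = [el.text for el in root.findall(".//item")]

# Anything lxml-specific (xpath(), XSLT, schema validation) must be guarded.
first_id = root.xpath("string(//item[1]/@id)") if HAVE_LXML else root.find(".//item").get("id")
```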

Interface

REST API

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: none

OAuth: No Scopes: No

No auth — XML/HTML parsing library.

Pricing

Model: open_source

Free tier: Yes

Requires CC: No

lxml is distributed under the BSD 3-Clause license; the GPL applies only to a test-runner script bundled in the source tree. Free for all use.

Agent Metadata

Pagination

none

Idempotent

Full

Retry Guidance

Not documented

Known Gotchas

⚠ XPath returns a list of nodes, not a single value: root.xpath('//item/text()') returns a list such as ['val1', 'val2']. Note that //item[1] selects every item that is the first item child of its parent, so it can still match several nodes; use (//item)[1] for the first match in the document. For a single value, index the result (guarding against an empty list) or use xpath('string((//item)[1])'), which returns a string directly; xpath('count(//item)') returns a float.
⚠ lxml elements have tail text: in <a>text<b/>tail</a>, a.text is 'text' (before the child) and a[0].tail is 'tail' (after child b, but still inside parent a). For mixed-content XML, use ''.join(elem.itertext()), or concatenate elem.text with each child's text and tail; for HTML, use text_content() from lxml.html (get_text() is BeautifulSoup's API, not lxml's).
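⚠ Namespace handling in XPath: for <root xmlns:ns='http://example.com'><ns:item/></root>, xpath('//ns:item') needs a prefix map: root.xpath('//ns:item', namespaces={'ns': 'http://example.com'}). Always pass a namespace map for namespaced XML; a default namespace (no prefix) must be bound to a prefix of your choosing, because XPath 1.0 has no notion of a default namespace. Clark notation is not valid inside xpath(), but it works in find()/findall()/iter(): root.findall('.//{http://example.com}item').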
⚠ iterparse memory management: iterparse yields (event, elem) pairs, but the tree is still built behind the scenes, so processed elements stay reachable from the root. After processing elem, call elem.clear(), then drop already-processed siblings: while elem.getprevious() is not None: del elem.getparent()[0]. Without this cleanup, iterparse on a large file still ends up holding the whole document in memory.
⚠ from lxml import etree vs import lxml.etree: both work; from lxml import etree is conventional. lxml.html is a separate module (from lxml import html) whose fromstring() uses the lenient HTML parser and returns an HtmlElement. Use html.fromstring() for HTML, including malformed markup, and etree.fromstring() for strict XML.
⚠ html.fromstring() vs html.document_fromstring(): html.fromstring('<p>text</p>') returns the fragment's own element (here the <p>), not a full document, whereas html.document_fromstring() always returns the <html> root element. Use document_fromstring() for complete pages and fromstring() for fragments, working with the returned element directly.
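The gotchas in this list are easiest to see side by side. The following is a small sketch on made-up inputs; stream_ids is a hypothetical helper, and findall() is used for the Clark-notation lookup because that notation belongs to the ElementTree-style path API rather than to XPath.

```python
from io import BytesIO

from lxml import etree, html

# XPath return types
root = etree.fromstring("<r><g><item>a</item><item>b</item></g><g><item>c</item></g></r>")
all_text = root.xpath("//item/text()")              # list of every match
first_per_parent = root.xpath("//item[1]/text()")   # first item under EACH parent
first_overall = root.xpath("(//item)[1]/text()")    # first item in the document
as_string = root.xpath("string((//item)[1])")       # plain string, no indexing needed
item_count = root.xpath("count(//item)")            # XPath numbers come back as float

# text vs tail
mixed = etree.fromstring("<a>text<b/>tail</a>")
joined = "".join(mixed.itertext())

# Namespaces: prefix map for xpath(), Clark notation for findall()
ns_doc = etree.fromstring('<root xmlns:ns="http://example.com"><ns:item/></root>')
via_xpath = ns_doc.xpath("//ns:item", namespaces={"ns": "http://example.com"})
via_clark = ns_doc.findall(".//{http://example.com}item")


def stream_ids(source, tag):
    """Yield the id attribute of each <tag> element, freeing memory as we go."""
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag):
        yield elem.get("id")
        elem.clear()                              # drop the element's own content
        while elem.getprevious() is not None:     # drop already-processed siblings
            del elem.getparent()[0]


ids = list(stream_ids(BytesIO(b"<r><rec id='1'/><rec id='2'/></r>"), "rec"))

# HTML fragment vs full document
frag = html.fromstring("<p>Hello <b>world</b></p>")
page = html.document_fromstring("<p>Hello <b>world</b></p>")
frag_text = frag.text_content()
```

The difference between //item[1] and (//item)[1] only shows up when matching elements sit under more than one parent, which is why it tends to surface late.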

Alternatives

beautifulsoup-python-api parsel-python-api


Scores are editorial opinions as of 2026-03-06.