Nokogiri

HTML and XML parsing library for Ruby, and the de facto standard gem (not part of Ruby's standard library) for parsing, querying, and modifying HTML/XML documents. Nokogiri wraps libxml2 and libgumbo (HTML5 parser) for fast, standards-compliant parsing. Key APIs: Nokogiri::HTML5(html_string) for HTML5 parsing, Nokogiri::XML(xml_string) for XML, CSS selectors (doc.css('div.agent-card')), XPath (doc.xpath('//agent[@status="active"]')), text extraction (.text), attribute access (.attr('href')), and document modification. Used for web scraping agent tools, parsing HTML email content, processing RSS/Atom feeds, extracting agent knowledge from web pages, and XML API response handling.

Evaluated Mar 06, 2026 v1.16.x

Developer Tools ruby html xml parsing xpath css-selector web-scraping document

⚙ Agent Friendliness

/ 100

Can an agent use this?

🔒 Security

/ 100

Is it safe for agents?

⚡ Reliability

/ 100

Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

100

Rate Limits

100

🔒 Security

TLS Enforcement

100

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

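Nokogiri parses HTML (for example with Nokogiri::HTML::DocumentFragment.parse) but does not sanitize it on its own; for safe HTML rendering use Loofah, which is built on Nokogiri, or ActionView::Helpers::SanitizeHelper. Parsing attacker-controlled XML can trigger XXE (XML External Entity) attacks when entity substitution or network access is enabled. Nokogiri's default parse options leave both off, so do not turn on NOENT or DTDLOAD (config.noent, config.dtdload) or clear NONET when parsing untrusted agent-scraped XML.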

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

You need to parse HTML or XML in Ruby: web scraping agent tools, XML API processing, feed parsing, or HTML email extraction. Nokogiri is the standard, production-ready HTML/XML parser for these tasks.

Avoid When

Your content is JSON, you need JavaScript rendering (use headless browser), or you're doing simple text extraction (use regex).

Use Cases

• Web scraping tool for agent knowledge extraction — Nokogiri::HTML5(Faraday.get(url).body).css('article.content').map(&:text) extracts relevant content from web pages for an agent knowledge base
• Parse XML API responses in agent integrations — Nokogiri::XML(soap_response).xpath('//AgentData').map { |n| n.text } for agent services consuming legacy XML-based APIs
• Extract agent-relevant data from HTML email — parse HTML email body with Nokogiri and extract order details, confirmation numbers for agent inbox processing workflows
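• Process RSS/Atom feeds for agent content — Nokogiri::XML(feed_response).css('item').map { |i| { title: i.at_css('title')&.text, content: i.at_css('description')&.text } } handles RSS 2.0 items for agent news/content tools; Atom feeds use entry and content elements instead of item and description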
• Sanitize user-generated HTML before agent processing — Nokogiri parses the HTML into a tree from which scripts and other dangerous elements can be removed; a sanitizer built on Nokogiri, such as Loofah, is the safer way to do this for agent input processing

Not For

• Simple string extraction — if you just need to extract a regex pattern from HTML, use Ruby regex instead of parsing; Nokogiri is for structured document traversal
• JSON APIs — Nokogiri parses HTML/XML, not JSON; use Ruby's built-in JSON.parse for JSON agent API responses
• JavaScript-rendered pages — Nokogiri parses static HTML; use Ferrum or Capybara with headless Chrome for JavaScript-rendered agent target pages

Interface

REST API

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: none

OAuth: No Scopes: No

Document parsing library — no auth. Network requests for agent scraping handled by separate HTTP client (Net::HTTP, Faraday, HTTParty).

Pricing

Model: open_source

Free tier: Yes

Requires CC: No

Nokogiri is MIT licensed, maintained by Mike Dalessio. Free for all use.

Agent Metadata

Pagination

none

Idempotent

Full

Retry Guidance

Not documented

Known Gotchas

⚠ CSS vs XPath syntax — Nokogiri supports both; doc.css('div.agent') uses a CSS selector and matches any div whose class list contains agent; doc.xpath('//div[@class="agent"]') uses XPath and matches only when the class attribute is exactly "agent"; CSS is cleaner for HTML scraping, XPath is more powerful for complex conditions; agent scraping tools should pick one consistently
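⚠ HTML parsing is lenient, XML parsing recovers by default — Nokogiri::HTML5(malformed_html) repairs malformed HTML silently; Nokogiri::XML(malformed_xml) also recovers by default and records problems in doc.errors, which can leave a truncated or incorrect tree; pass a block such as { |config| config.strict } to raise Nokogiri::XML::SyntaxError instead; validate agent XML sources if XML structure is business-critical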
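⚠ Encoding issues with non-UTF8 content — Nokogiri may misparse agent web content served in a legacy encoding (for example Shift_JIS or Windows-1256) when that encoding is not declared or detected; transcode from the actual source encoding before parsing, for example Nokogiri::HTML5(body.force_encoding('Shift_JIS').encode('UTF-8', invalid: :replace, undef: :replace)); calling encode without first setting the source encoding only helps when the string is already tagged correctly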
⚠ NodeSet vs Node vs String return types — doc.css('.title') returns NodeSet; doc.css('.title').first returns Node; .text on Node returns string; calling .text on NodeSet concatenates all text without separators; agent scraping code must handle return type correctly
⚠ Nested element text includes descendant text — .text on parent element includes all nested child text; doc.css('article').text returns ALL article text including headers, links, captions; use .children.select { |n| n.text? }.map(&:text).join for direct text only
⚠ libxml2 native extension compilation — Nokogiri ships precompiled native gems for major platforms (Linux, macOS, Windows), avoiding a compile step; on uncommon platforms or custom systems, gem install nokogiri builds from source, which requires a C toolchain; the libxml2-dev and libxslt-dev system packages are needed only when building against system libraries (--use-system-libraries)

Alternatives

oga-ruby-api rexml-api ferrum-api

Scores are editorial opinions as of 2026-03-06.