Nokogiri

HTML and XML parsing library for Ruby, and the de facto standard for parsing, querying, and modifying HTML/XML documents. Nokogiri wraps libxml2 (XML and HTML4) and libgumbo (HTML5) for fast, standards-compliant parsing. Key APIs: Nokogiri::HTML5(html_string) for HTML5 parsing, Nokogiri::XML(xml_string) for XML, CSS selectors (doc.css('div.agent-card')), XPath (doc.xpath('//agent[@status="active"]')), text extraction (.text), attribute access (.attr('href')), and document modification. Typical uses: web scraping agent tools, parsing HTML email content, processing RSS/Atom feeds, extracting agent knowledge from web pages, and handling XML API responses.

Evaluated Mar 06, 2026, v1.16.x
Category: Developer Tools
Tags: ruby, html, xml, parsing, xpath, css-selector, web-scraping, document
⚙ Agent Friendliness: 70 / 100 (Can an agent use this?)
🔒 Security: 94 / 100 (Is it safe for agents?)
⚡ Reliability: 90 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 90
Error Messages: 85
Auth Simplicity: 100
Rate Limits: 100

🔒 Security

TLS Enforcement: 100
Auth Strength: 95
Scope Granularity: 90
Dep. Hygiene: 88
Secret Handling: 95

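Nokogiri does not sanitize HTML by itself: Nokogiri::HTML::DocumentFragment.parse only builds a tree, and any filtering is left to the caller. For safe HTML rendering, use Loofah (built on Nokogiri) or the Rails ActionView::Helpers::SanitizeHelper. Parsing attacker-controlled XML can expose XXE (XML External Entity) attacks when entity substitution or DTD loading is switched on. Nokogiri's default XML parse options keep network access off (NONET) and do not substitute entities, so do not enable NOENT or DTDLOAD for untrusted agent-scraped XML.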

⚡ Reliability

Uptime/SLA: 92
Version Stability: 90
Breaking Changes: 88
Error Recovery: 90

Best When

You need to parse HTML or XML in Ruby: web scraping agent tools, XML API processing, feed parsing, or HTML email extraction. Nokogiri is the standard, production-ready parser for these jobs.

Avoid When

Your content is JSON, you need JavaScript rendering (use headless browser), or you're doing simple text extraction (use regex).

Use Cases

  • Web scraping tool for agent knowledge extraction — Nokogiri::HTML5(Faraday.get(url).body).css('article.content').map(&:text) extracts relevant content from web pages for an agent knowledge base
  • Parse XML API responses in agent integrations — Nokogiri::XML(soap_response).xpath('//AgentData').map { |n| n.text } for agent services consuming legacy XML-based APIs; if the response uses XML namespaces, as SOAP usually does, pass a namespace mapping to xpath or the unprefixed query will match nothing
  • Extract agent-relevant data from HTML email — parse HTML email body with Nokogiri and extract order details, confirmation numbers for agent inbox processing workflows
  • Process RSS/Atom feeds for agent content — Nokogiri::XML(feed_response).css('item').map { |i| {title: i.css('title').text, content: i.css('description').text} } for agent news/content tools (RSS 2.0 items carry their body in description; Atom feeds use entry and content instead)
  • Sanitize user-generated HTML before agent processing — Nokogiri parses the HTML into a tree that a sanitizer such as Loofah can filter, removing scripts and dangerous elements from agent input; Nokogiri alone does not decide what is safe

Not For

  • Simple string extraction — if you just need to extract a regex pattern from HTML, use Ruby regex instead of parsing; Nokogiri is for structured document traversal
  • JSON APIs — Nokogiri parses HTML/XML, not JSON; use Ruby's built-in JSON.parse for JSON agent API responses
  • JavaScript-rendered pages — Nokogiri parses static HTML; use Ferrum or Capybara with headless Chrome for JavaScript-rendered agent target pages

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

Document parsing library — no auth. Network requests for agent scraping handled by separate HTTP client (Net::HTTP, Faraday, HTTParty).

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

Nokogiri is MIT licensed, maintained by Mike Dalessio. Free for all use.

Agent Metadata

Pagination: none
Idempotent: Full
Retry Guidance: Not documented

Known Gotchas

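  • CSS vs XPath syntax — Nokogiri supports both: doc.css('div.agent') is a CSS selector and doc.xpath('//div[@class="agent"]') is XPath. The two are not equivalent: the CSS form matches any div whose class list contains agent, while this XPath matches only a class attribute that is exactly "agent". CSS is cleaner for HTML scraping and XPath is more powerful for complex conditions; agent scraping tools should pick one consistently
  • HTML parsing is lenient, XML parsing recovers by default — Nokogiri::HTML5(malformed_html) repairs malformed HTML silently. Nokogiri::XML(malformed_xml) also runs in recover mode by default: it does not raise, it returns a best-effort (possibly truncated) tree and records the problems in doc.errors. To get a Nokogiri::XML::SyntaxError instead, parse with Nokogiri::XML(xml) { |config| config.strict }; do that, or check doc.errors, when XML structure is business-critical
  • Encoding issues with non-UTF8 content — Nokogiri may misparse agent web content whose bytes are not UTF-8. Calling body.encode('UTF-8', invalid: :replace, undef: :replace) is a no-op when the string is already tagged as UTF-8, so tag the real source encoding first, for example body.force_encoding('Shift_JIS').encode('UTF-8', invalid: :replace, undef: :replace), or call body.scrub to replace invalid bytes, before parsing Japanese, Arabic, or other non-Latin agent content
  • NodeSet vs Node vs String return types — doc.css('.title') returns a NodeSet; doc.css('.title').first returns a Node, or nil when nothing matches; .text on a Node returns a String; calling .text on a NodeSet concatenates all text without separators; agent scraping code must handle each return type correctly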
  • Nested element text includes descendant text — .text on parent element includes all nested child text; doc.css('article').text returns ALL article text including headers, links, captions; use .children.select { |n| n.text? }.map(&:text).join for direct text only
  • libxml2 native extension compilation — Nokogiri ships precompiled native gems for major platforms (Linux, macOS, Windows), so no compiler is needed there; on uncommon platforms or custom systems, gem install nokogiri may fall back to compiling from source, which needs a C toolchain and, when building against system libraries, the libxml2-dev and libxslt-dev packages

Alternatives


Scores are editorial opinions as of 2026-03-06.
