Diffbot API

Applies AI-powered extraction to any web page to return structured data (articles, products, people, companies, discussions) and provides a knowledge graph of 1 billion+ entities with relationships derived from the public web.

Evaluated Mar 06, 2026 (version: current)
Category: Other
Tags: web-scraping, knowledge-graph, nlp, entity-extraction, article, product, company, crawl
⚙ Agent Friendliness: 58 / 100 (Can an agent use this?)
🔒 Security: 69 / 100 (Is it safe for agents?)
⚡ Reliability: 76 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: --
Documentation: 82
Error Messages: 75
Auth Simplicity: 82
Rate Limits: 70

🔒 Security

TLS Enforcement: 100
Auth Strength: 65
Scope Granularity: 50
Dep. Hygiene: 72
Secret Handling: 60

The token is passed as a query parameter, which is a security concern because it can appear in access logs. Diffbot processes public web data, so extracted content may contain PII from the source pages. Keys cannot be scoped.
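
Because the token travels in the URL, one way to limit exposure is to mask it before a request URL is written to client-side logs. The following is a minimal sketch under that assumption; `redact_token` is a hypothetical helper, not part of any Diffbot SDK, and it only protects logs you control, not the provider's access logs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_token(url: str, param: str = "token") -> str:
    """Return the URL with the value of one query parameter masked.

    Hypothetical helper for illustration: the parameter name defaults to
    "token" because that is the name this page gives for the API key.
    """
    parts = urlsplit(url)
    # Rebuild the query string, replacing only the sensitive value.
    query = [
        (key, "REDACTED" if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling `redact_token(request_url)` just before a log statement keeps the rest of the URL readable for debugging while hiding the credential.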

⚡ Reliability

Uptime/SLA: 78
Version Stability: 78
Breaking Changes: 75
Error Recovery: 72

Best When

You need to extract structured data from arbitrary web pages at scale without building and maintaining custom scrapers per site.

Avoid When

You need to access authenticated content, need sub-second extraction latency, or require certified data provenance for compliance purposes.

Use Cases

  • Extract structured article content (title, author, date, body text, images) from any news URL without custom scraper maintenance
  • Query the Diffbot Knowledge Graph for a company entity to retrieve funding rounds, employees, competitors, and news in one API call
  • Crawl a competitor's product catalog pages and extract structured product data (name, price, specs, images) at scale
  • Use the NLP API to extract entities, sentiment, and topics from raw text for document classification or tagging pipelines
  • Monitor a set of web pages for content changes and extract updated structured data on a schedule for a competitive intelligence agent

Not For

  • Scraping pages behind authentication or paywalls — Diffbot processes publicly accessible URLs only
  • Real-time streaming of web data at sub-second latency; extraction typically takes seconds per URL
  • Legal or regulatory data requiring certified source provenance — Diffbot derives data from the public web without chain of custody

Interface

REST API: Yes
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: Yes

Authentication

Methods: api_key
OAuth: No
Scopes: No

The API token is passed as the query parameter `token=YOUR_TOKEN` on every request. The token is obtained from the Diffbot dashboard.
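
As a rough illustration of the query-parameter scheme described above, the sketch below assembles a request URL for an extraction endpoint. The base URL `https://api.diffbot.com/v3`, the `article` endpoint name, and the `url` parameter are assumptions here and should be checked against the official documentation; only `token` comes from this page.

```python
from urllib.parse import urlencode

# Assumed base URL for illustration; confirm against the Diffbot docs.
API_BASE = "https://api.diffbot.com/v3"


def build_request_url(endpoint: str, token: str, page_url: str) -> str:
    """Build a GET URL carrying the token and the target page as query params.

    `endpoint` (for example "article") and the `url` parameter name are
    assumptions; `token` is the parameter named in the text above.
    """
    # urlencode percent-encodes the target URL so it survives as one value.
    query = urlencode({"token": token, "url": page_url})
    return f"{API_BASE}/{endpoint}?{query}"
```

For example, `build_request_url("article", "YOUR_TOKEN", "https://example.com/news?id=1")` yields a URL whose `url` value is fully percent-encoded, so the target page's own query string is not confused with the API's parameters.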

Pricing

Model: freemium
Free tier: Yes
Requires CC: No

Free trial is generous for evaluation but time-limited. Knowledge Graph queries are typically a separate higher-tier add-on.

Agent Metadata

Pagination: cursor
Idempotent: Full
Retry Guidance: Documented
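
Cursor pagination combined with fully idempotent calls means a client can retry a failed page fetch safely and keep following cursors until none is returned. The sketch below shows that pattern in a provider-neutral way: `fetch_page` is a hypothetical callable returning `(items, next_cursor)`, and the backoff schedule is an assumption rather than Diffbot's documented retry guidance.

```python
import time
from typing import Any, Callable, Iterator, List, Optional, Tuple


def call_with_retry(call: Callable[[], Any], attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Run `call`, retrying with exponential backoff (1s, 2s, 4s, ...).

    Retrying is only safe because the calls are idempotent. Real code should
    catch only transient errors (timeouts, 429/5xx), not every Exception.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def iter_pages(
    fetch_page: Callable[[Optional[str]], Tuple[List[Any], Optional[str]]],
) -> Iterator[Any]:
    """Yield items from every page, following cursors until one is None."""
    cursor: Optional[str] = None
    while True:
        items, cursor = call_with_retry(lambda: fetch_page(cursor))
        yield from items
        if cursor is None:
            return
```

Passing `sleep` as a parameter keeps the helper easy to test and lets an agent swap in its own scheduler.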

Known Gotchas

  • JavaScript-heavy single-page applications may extract poorly or return empty fields — Diffbot uses a headless browser but JS rendering adds latency and is not 100% reliable
  • Extraction latency varies from 1-30+ seconds depending on page complexity and target server speed — agents must use async patterns or generous timeouts
  • Knowledge Graph entity searches return confidence scores; low-confidence entities may contain incorrect relationship data
  • The Crawl API is asynchronous — agents must poll for completion or use webhooks; synchronous crawl assumptions will break
  • An API token in the query string can be recorded in server access logs and leaked through HTTP Referer headers; treat it as a sensitive credential despite the query-param placement
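
The asynchronous-crawl and variable-latency gotchas above both come down to waiting with a deadline instead of assuming an immediate result. A minimal polling sketch follows; `get_status` is a hypothetical callable and the status strings `"done"` and `"failed"` are placeholders, since the actual job states are not given on this page.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str], timeout: float = 600.0,
                 interval: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll `get_status` until the job reaches a terminal state or times out.

    The terminal states "done" and "failed" are assumed names for this sketch.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("done", "failed"):
            return status
        # Stop before sleeping if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout}s")
        sleep(interval)
```

Where webhooks are available, they avoid the polling loop entirely; this sketch is the fallback for agents that cannot receive callbacks.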

Scores are editorial opinions as of 2026-03-06.
