Diffbot API

Applies AI-powered extraction to any web page to return structured data (articles, products, people, companies, discussions) and provides a knowledge graph of 1 billion+ entities with relationships derived from the public web.

Evaluated Mar 06, 2026 (version: current)

Category: Other

Tags: web-scraping, knowledge-graph, nlp, entity-extraction, article, product, company, crawl

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

100

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

The API token is passed as a query parameter, so it can appear in access logs. The service processes public web data, and extracted content may contain PII from the source pages. API keys cannot be scoped.

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

You need to extract structured data from arbitrary web pages at scale without building and maintaining custom scrapers per site.

Avoid When

You need to access authenticated content, need sub-second extraction latency, or require certified data provenance for compliance purposes.

Use Cases

• Extract structured article content (title, author, date, body text, images) from any news URL without custom scraper maintenance
• Query the Diffbot Knowledge Graph for a company entity to retrieve funding rounds, employees, competitors, and news in one API call
• Crawl a competitor's product catalog pages and extract structured product data (name, price, specs, images) at scale
• Use the NLP API to extract entities, sentiment, and topics from raw text for document classification or tagging pipelines
• Monitor a set of web pages for content changes and extract updated structured data on a schedule for a competitive intelligence agent
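
As a rough illustration of the first use case, the sketch below pulls the listed article fields out of an extraction response. The response shape (an `objects` list whose items carry `title`, `author`, `date`, `text`, and `images` with a `url` key) is an assumption modeled on the fields named above, and `summarize_article` is a hypothetical helper; check both against the live API documentation before relying on them.

```python
from typing import Any, Dict, List, Optional


def summarize_article(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the article fields named in the use case, or None if absent.

    Assumes (not confirmed by this page) that the response carries a list
    under "objects" and that each object uses the keys read below.
    """
    objects: List[Dict[str, Any]] = response.get("objects") or []
    if not objects:
        return None  # e.g. a JS-heavy page that extracted to nothing
    first = objects[0]
    return {
        "title": first.get("title"),
        "author": first.get("author"),
        "date": first.get("date"),
        "text": first.get("text", ""),
        # Keep only image URLs; the "url" key is an assumption too.
        "image_urls": [img["url"] for img in first.get("images", []) if "url" in img],
    }


# Hypothetical sample payload for illustration only.
sample = {
    "objects": [
        {
            "title": "Example headline",
            "author": "A. Writer",
            "date": "2026-03-01",
            "text": "Body text.",
            "images": [{"url": "https://example.com/a.png"}, {"caption": "no url"}],
        }
    ]
}
article = summarize_article(sample)
```

Returning `None` for an empty result keeps the "extracted nothing" case explicit, which matters for pages that render poorly.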

Not For

• Scraping pages behind authentication or paywalls — Diffbot processes publicly accessible URLs only
• Real-time streaming web data at sub-second latency; extraction typically takes seconds per URL
• Legal or regulatory data requiring certified source provenance — Diffbot derives data from the public web without chain of custody

Interface

REST API

Yes

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Yes

Authentication

Methods: api_key

OAuth: No

Scopes: No

API token passed as a query parameter `token=YOUR_TOKEN` on all requests. Token is obtained from the Diffbot dashboard.
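
A minimal sketch of the query-parameter scheme described above, with a companion helper that masks the token before a URL is written to logs. Only the `token` parameter comes from this page; the endpoint path, the `url` parameter name, and both function names are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed endpoint; this page only documents the `token` query parameter.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_request_url(token: str, page_url: str, endpoint: str = ARTICLE_ENDPOINT) -> str:
    """Attach the API token and the target page URL as query parameters."""
    query = urlencode({"token": token, "url": page_url})
    return f"{endpoint}?{query}"


def redact_token(request_url: str) -> str:
    """Replace the token value so the URL is safe to write to logs."""
    parts = urlsplit(request_url)
    pairs = [
        (key, "REDACTED" if key == "token" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = build_request_url("YOUR_TOKEN", "https://example.com/news/story?id=7")
safe_url = redact_token(url)
```

Using `urlencode` percent-encodes the target page URL, so its own query string cannot be confused with the API's parameters.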

Pricing

Model: freemium

Free tier: Yes

Requires CC: No

Free trial is generous for evaluation but time-limited. Knowledge Graph queries are typically a separate higher-tier add-on.

Agent Metadata

Pagination

cursor

Idempotent

Full

Retry Guidance

Documented

Known Gotchas

⚠ JavaScript-heavy single-page applications may extract poorly or return empty fields — Diffbot uses a headless browser but JS rendering adds latency and is not 100% reliable
⚠ Extraction latency varies from 1-30+ seconds depending on page complexity and target server speed — agents must use async patterns or generous timeouts
⚠ Knowledge Graph entity searches return confidence scores; low-confidence entities may contain incorrect relationship data
⚠ The Crawl API is asynchronous — agents must poll for completion or use webhooks; synchronous crawl assumptions will break
⚠ The API token in the query string can end up in server access logs and HTTP Referer headers; treat full request URLs as sensitive credentials and redact the token before logging

Alternatives

clearbit-api, peopledatalabs-api, scraperapi-api

Scores are editorial opinions as of 2026-03-06.