Scrapy

High-performance Python web scraping and crawling framework. Scrapy manages the complete spider lifecycle (request scheduling, downloading, parsing, and item pipeline processing) on asynchronous, non-blocking I/O provided by Twisted. It has built-in support for robots.txt compliance, rate limiting, cookie handling, HTTP caching, and item pipelines for storing scraped data. It is the de facto standard for large-scale web scraping in Python.

Evaluated Mar 06, 2026 (v2.11+)
Category: Developer Tools
Tags: python, scraping, crawling, spider, data-extraction, async, middleware
⚙ Agent Friendliness: 67/100 (Can an agent use this?)
🔒 Security: 85/100 (Is it safe for agents?)
⚡ Reliability: 86/100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: n/a
Documentation: 88
Error Messages: 82
Auth Simplicity: 100
Rate Limits: 90

🔒 Security

TLS Enforcement: 95
Auth Strength: 85
Scope Granularity: 80
Dependency Hygiene: 85
Secret Handling: 80

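Self-hosted framework, so the security model depends on how it is deployed. TLS for outgoing requests is handled by Twisted; note that Scrapy's default TLS context factory does not verify server certificates. Keep credentials out of the Scrapy settings file (settings.py) when it is committed to version control, and load them from environment variables or a secrets manager instead.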

⚡ Reliability

Uptime/SLA: 90
Version Stability: 88
Breaking Changes: 85
Error Recovery: 82

Best When

You need to crawl many pages systematically with built-in rate limiting, pipelines, and middleware for production-grade web scraping in Python.

Avoid When

The target site requires JavaScript rendering (use Playwright, or the scrapy-playwright plugin), or you only need a simple one-off scrape (use requests + BeautifulSoup).

Use Cases

  • Build agent data-collection pipelines that crawl entire websites and extract structured data with CSS/XPath selectors
  • Run scheduled scrapers (e.g. via cron or Scrapyd) that keep agent knowledge bases supplied with regularly updated content
  • Extract product data, prices, and inventory from e-commerce sites for agent competitive intelligence
  • Crawl documentation sites to build agent-searchable knowledge stores from HTML content
  • Run large-scale domain crawls with rate limiting, politeness rules, and pause/resume support (JOBDIR) to gather agent training data

Not For

  • JavaScript-heavy single-page applications — Scrapy doesn't execute JS; use Playwright or Selenium for SPAs
  • Simple one-off data extractions — requests + BeautifulSoup is simpler for small tasks
  • Real-time event-driven scraping — Scrapy is batch-oriented; use streaming solutions for real-time needs

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: none
OAuth: No
Scopes: No

Scrapy is a self-hosted Python framework with no authentication model of its own. Spiders handle authentication against target sites through cookies, custom headers, HTTP Basic auth (HttpAuthMiddleware), or form login with FormRequest.from_response.

Pricing

Model: Open source
Free tier: Yes
Requires CC: No

The Scrapy framework itself is free and open source (BSD license). Zyte (formerly Scrapinghub) sells managed hosting (Scrapy Cloud) and anti-bot proxy services commercially.

Agent Metadata

Pagination: none
Idempotent: Partial
Retry Guidance: Documented

Known Gotchas

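  • Scrapy runs on Twisted's event loop. To await asyncio-based libraries in callbacks, enable the asyncio reactor with TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor" (supported since Scrapy 2.0)
  • JavaScript-rendered content is invisible to plain Scrapy and needs a rendering plugin such as scrapy-playwright or scrapy-splash; many modern sites require JS execution
  • robots.txt is obeyed in projects created with scrapy startproject, whose generated settings.py sets ROBOTSTXT_OBEY = True (the framework default is False). Set it to False only for paths that robots.txt disallows but you are explicitly authorized to crawl
  • Every item passes through all enabled item pipelines in ITEM_PIPELINES priority order (lower numbers run first), so ordering matters. A DropItem or unhandled exception in one pipeline discards the item before later pipelines see it, and the log entry is easy to miss in long crawls
  • Memory use can grow on large crawls if DEPTH_LIMIT and URL deduplication are not tuned; watch the dupefilter/filtered stat (or enable DUPEFILTER_DEBUG) to detect crawl loops
  • Anti-bot systems (Cloudflare, Akamai) block naive Scrapy requests; getting through typically requires rotating proxies, browser fingerprint emulation, or a service such as Zyte Smart Proxy Manager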



Scores are editorial opinions as of 2026-03-06.
