webclaw

webclaw is a self-hosted, local-first web scraping and extraction tool written in Rust. It fetches URLs and extracts their main content into LLM-friendly outputs (e.g., markdown, text, JSON), and it supports crawling and sitemap mapping. It also includes an MCP server plus an optional cloud API for protected/JS-heavy sites and “LLM features” (summarize/extract/search/research).

Evaluated Mar 30, 2026
Category: AI/ML
Tags: ai-scraping, web-crawler, data-extraction, mcp, rust, self-hosted, llm-rag, markdown, json
⚙ Agent Friendliness: 59 / 100 (Can an agent use this?)
🔒 Security: 53 / 100 (Is it safe for agents?)
⚡ Reliability: 32 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: 85
Documentation: 70
Error Messages: 0
Auth Simplicity: 75
Rate Limits: 25

🔒 Security

TLS Enforcement: 85
Auth Strength: 55
Scope Granularity: 20
Dep. Hygiene: 40
Secret Handling: 60

The README describes traffic as using TLS fingerprinting and implies that most tools run locally without an account. Cloud features require WEBCLAW_API_KEY, and LLM provider keys (OpenAI/Anthropic) are configured via environment variables. The provided README does not detail secure storage, redaction in logs, TLS enforcement guarantees for all modes, or fine-grained access scopes. Scraping and bot-bypass capabilities should be used only with authorization.

⚡ Reliability

Uptime/SLA: 10
Version Stability: 45
Breaking Changes: 40
Error Recovery: 35

Best When

You want fast, self-hosted web content extraction that produces LLM-optimized outputs, and you use an MCP-capable agent client (Claude Desktop/Code, Cursor, etc.).

Avoid When

You need strict API contract guarantees (the README provides limited REST API details), or you must adhere to strict legal/compliance constraints around scraping and target-site access; validate these before deployment.

Use Cases

  • Provide real-time web access to AI agents via MCP (scrape/crawl/map/batch + LLM-oriented tools).
  • Token-efficient extraction for RAG/training data pipelines (structured markdown/JSON with links/images metadata).
  • Monitoring and change tracking via snapshots and diffs.
  • Automated extraction from docs/sites with includes/excludes CSS selectors.
  • Brand/identity extraction (colors/fonts/logos/OG imagery).
  • Batch extraction and parallel crawling for large documentation sets.
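
The snapshot-and-diff use case can be illustrated independently of the tool. The following is a minimal sketch, assuming two markdown extractions of the same page are already available as strings; `diff_snapshots` and `changed` are hypothetical helper names, and the sketch uses Python's standard `difflib` rather than webclaw's own diff feature, whose output format is not described in this listing.

```python
import difflib


def diff_snapshots(old: str, new: str) -> list[str]:
    """Return unified-diff lines between two extracted-markdown snapshots."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",  # input lines carry no newline, so emit none
        )
    )


def changed(old: str, new: str) -> bool:
    """True when the two snapshots differ in at least one line."""
    return bool(diff_snapshots(old, new))


if __name__ == "__main__":
    # Illustrative content only; not taken from a real page.
    before = "# Pricing\n\nPlan A: $10\n"
    after = "# Pricing\n\nPlan A: $12\n"
    for line in diff_snapshots(before, after):
        print(line)
```

Diffing the extracted markdown rather than raw HTML keeps the comparison focused on content changes instead of markup churn, which is the main reason to pair extraction with change tracking.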

Not For

  • High-assurance compliance use without reviewing scraping behavior and data handling requirements.
  • Use cases requiring a guaranteed browser-execution environment (it is explicitly “no headless browser”).
  • Circumventing website security controls without permission (it claims bot-bypass via TLS fingerprinting and optional API key).

Interface

REST API: No
GraphQL: No
gRPC: No
MCP Server: Yes
SDK: Yes
Webhooks: No

Authentication

Methods:
  • WEBCLAW_API_KEY (cloud API key)
  • Ollama connectivity via OLLAMA_HOST for local LLM features
  • OPENAI_API_KEY for LLM features
  • ANTHROPIC_API_KEY for LLM features
OAuth: No
Scopes: No

Auth is described primarily via environment variables for cloud and optional LLM providers. MCP tools generally “work locally” without an account/key; cloud-required tools include search/research per the README table, but the README does not describe fine-grained scopes.
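
Because tool availability depends on which environment variables are set, a deployment can check its configuration before starting an agent. The sketch below is an assumption-laden illustration: the variable names come from this listing, but `describe_setup` and the grouping into "cloud" and "LLM provider" settings are hypothetical, and an unset OLLAMA_HOST is reported as not configured here even though the tool may fall back to a default host.

```python
import os
from typing import Mapping

# Variable names are taken from the listing; the grouping is an assumption.
CLOUD_KEY = "WEBCLAW_API_KEY"
LLM_PROVIDER_VARS = ("OLLAMA_HOST", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def describe_setup(env: Mapping[str, str]) -> dict:
    """Summarize which optional settings are present in the given environment."""
    cloud = bool(env.get(CLOUD_KEY, "").strip())
    providers = [name for name in LLM_PROVIDER_VARS if env.get(name, "").strip()]
    return {
        # Per the listing, most tools are described as working without a key.
        "local_tools": True,
        # Cloud-required tools (search/research per the README table) need the key.
        "cloud_tools": cloud,
        # Which LLM provider settings are present; empty means none configured.
        "llm_providers": providers,
    }


if __name__ == "__main__":
    print(describe_setup(os.environ))
```

Only the presence of each variable is checked; the values are never printed, so the summary is safe to log.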

Pricing

Free tier: No
Requires CC: No

The README indicates that a cloud API key enables advanced features and fallback; it gives no explicit pricing tiers or free-tier limits.

Agent Metadata

Pagination: none
Idempotent: False
Retry Guidance: Not documented

Known Gotchas

  • Crawling/batching may be sensitive to target-site rate limits and robots rules; the README does not provide explicit retry/idempotency guidance.
  • Some MCP tools depend on local LLM runtime (Ollama) or external providers (OpenAI/Anthropic); tool availability may vary by deployment.
  • Cloud fallback/bot-detection behavior is described at a high level; exact triggering conditions and error modes are not documented in the provided README.
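
Since retry behavior is not documented, callers may want their own wrapper around scrape or crawl calls. The following is a generic sketch of exponential backoff with jitter, not webclaw's behavior: `retry_with_backoff` and its parameters are hypothetical, and `operation` stands in for whatever call the agent makes. Because the listing marks operations as not idempotent, a retried crawl or batch may repeat work.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on failure with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Broad catch for illustration; narrow this to transient errors
            # (timeouts, HTTP 429/503) in real use.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so parallel workers do not sync up.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without real delays.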

Scores are editorial opinions as of 2026-03-30.
