webclaw

webclaw is a self-hosted, local-first web scraping and extraction tool written in Rust. It fetches URLs and extracts their main content into LLM-friendly outputs (e.g., markdown, text, JSON), supports crawling and sitemap mapping, and includes an MCP server plus an optional cloud API for protected or JS-heavy sites and for “LLM features” (summarize/extract/search/research).

Evaluated Mar 30, 2026

Category: AI/ML

Tags: ai-scraping, web-crawler, data-extraction, mcp, rust, self-hosted, llm-rag, markdown, json

⚙ Agent Friendliness

Can an agent use this?

🔒 Security

Is it safe for agents?

⚡ Reliability

Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

The README describes traffic as using TLS fingerprinting and implies that most tools run locally without an account. Cloud features require WEBCLAW_API_KEY, and LLM provider keys (OpenAI/Anthropic) are configured via environment variables. The README does not detail secure storage, log redaction, TLS enforcement guarantees across all modes, or fine-grained access scopes. Use the scraping and bot-bypass capabilities only with authorization.
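Since log redaction is not documented, an integrator may want to scrub key values before writing logs. The following is a minimal sketch of that idea, not part of webclaw: the `redact_secrets` helper, the `[REDACTED]` placeholder, and the minimum-length threshold are all assumptions made for illustration. Only the environment variable names come from the listing.

```python
import os

# Variable names come from the listing; treating them as the full set of
# secrets is an assumption for this sketch.
SECRET_VARS = ("WEBCLAW_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def redact_secrets(message, env=None, placeholder="[REDACTED]"):
    """Replace the values of known secret variables wherever they appear in message."""
    env = os.environ if env is None else env
    for name in SECRET_VARS:
        value = env.get(name, "")
        # Skip empty or very short values so ordinary text is not mangled.
        if len(value) >= 8:
            message = message.replace(value, placeholder)
    return message
```

A wrapper like this would typically be applied to every message just before it reaches the log handler, so that keys echoed in error text never land on disk.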

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

You want fast, self-hosted web content extraction that produces LLM-optimized outputs, and you use an MCP-capable agent client (Claude Desktop/Code, Cursor, etc.).

Avoid When

You need strict API contract guarantees (the README provides limited REST API details), or you must meet strict legal/compliance constraints around scraping and target-site access. Validate both before deployment.

Use Cases

• Provide real-time web access to AI agents via MCP (scrape/crawl/map/batch + LLM-oriented tools).
• Token-efficient extraction for RAG/training data pipelines (structured markdown/JSON with link and image metadata).
• Monitoring and change tracking via snapshots and diffs.
• Automated extraction from docs/sites with include/exclude CSS selectors.
• Brand/identity extraction (colors/fonts/logos/OG imagery).
• Batch extraction and parallel crawling for large documentation sets.
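The snapshot-and-diff use case can be illustrated with a small sketch. This is not webclaw's implementation or output format: it assumes two extracted markdown snapshots are already available as strings, and the `diff_snapshots` helper is a hypothetical name built on Python's standard `difflib`.

```python
import difflib


def diff_snapshots(old_markdown, new_markdown, label="page"):
    """Return a unified diff (as one string) between two markdown snapshots."""
    diff_lines = difflib.unified_diff(
        old_markdown.splitlines(),
        new_markdown.splitlines(),
        fromfile=f"{label} (previous)",
        tofile=f"{label} (current)",
        lineterm="",
    )
    # An empty string means the two snapshots are identical.
    return "\n".join(diff_lines)


if __name__ == "__main__":
    before = "# Changelog\n\nLatest release: 1.0\n"
    after = "# Changelog\n\nLatest release: 1.1\n"
    print(diff_snapshots(before, after, label="changelog"))
```

Diffing the extracted markdown rather than raw HTML tends to keep the diff focused on content changes instead of markup churn.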

Not For

• High-assurance compliance use without reviewing scraping behavior and data handling requirements.
• Use cases requiring a guaranteed browser-execution environment (it is explicitly “no headless browser”).
• Circumventing website security controls without permission (it claims bot-bypass via TLS fingerprinting and optional API key).

Interface

REST API

GraphQL

gRPC

MCP Server

Yes

SDK

Yes

Webhooks

Authentication

Methods:

• WEBCLAW_API_KEY (cloud API key)
• Ollama connectivity via OLLAMA_HOST for local LLM features
• OPENAI_API_KEY for LLM features
• ANTHROPIC_API_KEY for LLM features

OAuth: No

Scopes: No

Auth is described primarily via environment variables for cloud and optional LLM providers. MCP tools generally “work locally” without an account/key; cloud-required tools include search/research per the README table, but the README does not describe fine-grained scopes.
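Because configuration is driven by environment variables, a deployment can check up front which feature groups are likely to be usable. The sketch below is an illustration under assumptions: the variable names are taken from this listing, but the `available_features` helper and its three-way grouping are hypothetical and do not reflect webclaw's actual gating logic.

```python
import os

# Any one of these is assumed to be enough to enable LLM features.
LLM_PROVIDER_VARS = ("OLLAMA_HOST", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def available_features(env):
    """Guess which feature groups are usable from a mapping of environment variables."""
    def is_set(name):
        return bool(env.get(name, "").strip())

    return {
        # Described as working locally without an account or key.
        "local_extraction": True,
        # Search/research are listed as cloud-required tools.
        "cloud": is_set("WEBCLAW_API_KEY"),
        "llm": any(is_set(name) for name in LLM_PROVIDER_VARS),
    }


if __name__ == "__main__":
    for feature, enabled in available_features(os.environ).items():
        print(f"{feature}: {'yes' if enabled else 'no'}")
```

Running a check like this at agent startup makes a missing key visible before a tool call fails partway through a task.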

Pricing

Free tier: No

Requires credit card: No

README indicates cloud API key enables advanced features and fallback; no explicit pricing tiers or free tier limits are provided.

Agent Metadata

Pagination: none

Idempotent: False

Retry Guidance: Not documented

Known Gotchas

⚠ Crawling/batching may be sensitive to target-site rate limits and robots rules; the README does not provide explicit retry or idempotency guidance.
⚠ Some MCP tools depend on local LLM runtime (Ollama) or external providers (OpenAI/Anthropic); tool availability may vary by deployment.
⚠ Cloud fallback/bot-detection behavior is described at a high level; exact triggering conditions and error modes are not documented in the provided README.
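With no documented retry guidance, callers have to bring their own policy. The following is a generic exponential-backoff sketch, not behavior documented for webclaw: `retry_with_backoff`, its defaults, and the jitter factor are assumptions, and `operation` stands in for whatever scrape or crawl call the integration makes.

```python
import random
import time


def retry_with_backoff(operation, attempts=4, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call operation() until it succeeds or attempts run out.

    Waits base_delay * 2**n (capped at max_delay) plus up to 10% jitter
    between tries, and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # A real caller would likely narrow this to transient errors only.
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))
```

Because the listing marks operations as not idempotent, a wrapper like this is only appropriate around calls whose repetition is harmless, such as re-fetching a single page.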

Alternatives

• Firecrawl
• Trafilatura
• Readability-based extractors
• Crawl4AI
• Scrapy
• Playwright/Selenium-based scrapers (for heavy JS sites)

Scores are editorial opinions as of 2026-03-30.