{"id":"sadiuysal-crawl4ai-mcp-server","name":"crawl4ai-mcp-server","homepage":null,"repo_url":"https://github.com/sadiuysal/crawl4ai-mcp-server","category":"ai-ml","subcategories":[],"tags":["mcp","web-scraping","web-crawling","agent-tools","playwright","crawl4ai"],"what_it_does":"crawl4ai-mcp-server is a self-hosted MCP server (stdio) that exposes Crawl4AI scraping/crawling capabilities to AI agents via four MCP tools: scrape, crawl, crawl_site, and crawl_sitemap. It supports Markdown extraction, BFS crawling with depth/page limits and optional adaptive stopping, and can optionally persist results to disk via output_dir/manifest files, while applying URL safety blocks for localhost/private/internal targets.","use_cases":["Agent-assisted web research with automated scraping and crawling","Building RAG pipelines from website content with persisted crawl artifacts","Extracting page content to Markdown for downstream summarization","Crawling documentation sites with include/exclude URL patterns","Sitemap-driven ingestion for large sets of pages"],"not_for":["Highly regulated scraping without review of target-site legality/robots/copyright","Public-facing internet services without hardening (this is described as an MCP stdio tool, not a hosted API)","Guaranteeing completeness or consistency of crawl results (website structure varies)","Tasks requiring strict idempotent repeatability across runs without managing output state"],"best_when":"Used locally/self-hosted by an agent workflow to gather web content from allowed public URLs and store crawl artifacts for later processing.","avoid_when":"Avoid targeting private/internal networks even if the agent tries; also avoid using it as a generic open proxy without strict egress controls and operational guardrails.","alternatives":["Firecrawl (hosted API)","scraping frameworks + Playwright (self-built, direct code integration)","Other MCP web browsing/scraping servers (if available for your stack)","Crawler/RAG ingestion tools such as Apify (vendor-managed execution)"],"af_score":64.2,"security_score":44.0,"reliability_score":21.2,"package_type":"mcp_server","discovery_source":["github"],"priority":"high","status":"evaluated","version_evaluated":null,"last_evaluated":"2026-03-30T13:53:24.660317+00:00","interface":{"has_rest_api":false,"has_graphql":false,"has_grpc":false,"has_mcp_server":true,"mcp_server_url":null,"has_sdk":false,"sdk_languages":[],"openapi_spec_url":null,"webhooks":false},"auth":{"methods":["OPENAI_API_KEY environment variable (only referenced for OpenAI Agents SDK example)"],"oauth":false,"scopes":false,"notes":"No explicit auth/authorization for the MCP server itself is described. The only credential mentioned is OPENAI_API_KEY used by the example agent integration."},"pricing":{"model":null,"free_tier_exists":false,"free_tier_limits":null,"paid_tiers":[],"requires_credit_card":false,"estimated_workload_costs":null,"notes":"Pricing is not described; as an MIT repo with a Docker image reference, costs likely depend on infrastructure and any upstream model provider used by the agent (e.g., OpenAI)."},"requirements":{"requires_signup":false,"requires_credit_card":false,"domain_verification":false,"data_residency":[],"compliance":[],"min_contract":null},"agent_readiness":{"af_score":64.2,"security_score":44.0,"reliability_score":21.2,"mcp_server_quality":78.0,"documentation_accuracy":70.0,"error_message_quality":0.0,"error_message_notes":null,"auth_complexity":95.0,"rate_limit_clarity":15.0,"tls_enforcement":95.0,"auth_strength":20.0,"scope_granularity":10.0,"dependency_hygiene":40.0,"secret_handling":60.0,"security_notes":"README claims safety guards that block localhost/private IPs and file:// schemes and internal domain patterns. However, no server-side auth/authorization is described (so anyone with access to the stdio process may use tools). The presence of OPENAI_API_KEY is only for examples; secret-handling specifics are not documented, and dependency hygiene/CVEs are unknown from provided content.","uptime_documented":0.0,"version_stability":45.0,"breaking_changes_history":0.0,"error_recovery":40.0,"idempotency_support":"false","idempotency_notes":"When output_dir is used, runs appear to write run-specific artifacts (run_id). Re-running may produce additional files and different outcomes depending on crawling state; no explicit idempotency guarantees are documented.","pagination_style":"none","retry_guidance_documented":false,"known_agent_gotchas":["MCP server is stdio-based; requires correct MCP client setup (command/args) and editor configuration.","Long crawls may be constrained by max_pages/max_depth/timeout_sec; agents should tune these to avoid partial results.","Use of output_dir persists artifacts; agents should manage filesystem/volume permissions and cleanup.","URL safety blocks may prevent scraping expected targets if they match localhost/private/internal patterns."]}}