{"id":"dasdigitalemomentum-searxncrawl","name":"searxNcrawl","homepage":null,"repo_url":"https://github.com/DasDigitaleMomentum/searxNcrawl","category":"infrastructure","subcategories":[],"tags":["mcp","crawler","web-search","markdown","playwright","python","stdlib-cli","search","bfs-crawl"],"what_it_does":"searxNcrawl provides a minimal MCP server (STDIO and HTTP) plus CLI tools to search the web via SearXNG and crawl web pages/sites, extracting readable Markdown using Crawl4AI with configurable deduplication and optional authenticated crawling via Playwright storage_state.","use_cases":["Generating clean Markdown sources from documentation-heavy websites","Batch crawling a list of URLs with concurrency control and deduplication","Breadth-first site crawling with max depth/page limits","Web search over a SearXNG instance for finding documentation pages","Integrating crawling/search as tools for MCP-capable agent harnesses"],"not_for":["A production-grade public web crawling service without safeguards","High-assurance authentication and authorization workflows (auth flow is described as WIP)","Stable, documented REST/SDK-based third-party integrations (not provided as a first-class interface)"],"best_when":"You need local/agent-harness crawling + Markdown extraction and can run an MCP server you control, optionally pointing to your own SearXNG instance.","avoid_when":"You need strict compliance guarantees, guaranteed safe crawling behavior, or you cannot run Playwright (chromium) and required dependencies locally.","alternatives":["Crawl4AI directly (without the searxNcrawl MCP/CLI wrapper and defaults)","SearXNG API access (for search) combined with a separate crawler","Other crawler-to-Markdown pipelines (e.g., headless/browser-based crawlers) tailored to documentation sites"],"af_score":58.0,"security_score":52.2,"reliability_score":31.2,"package_type":"mcp_server","discovery_source":["github"],"priority":"high","status":"evaluated","version_evaluated":null,"last_evaluated":"2026-03-30T15:20:37.803992+00:00","interface":{"has_rest_api":false,"has_graphql":false,"has_grpc":false,"has_mcp_server":true,"mcp_server_url":null,"has_sdk":false,"sdk_languages":[],"openapi_spec_url":null,"webhooks":false},"auth":{"methods":["SearXNG basic auth via SEARXNG_USERNAME/SEARXNG_PASSWORD (optional)","Playwright storage_state file for authenticated crawling (WIP)"],"oauth":false,"scopes":false,"notes":"Authentication is not described as being enforced/authoritatively scoped within searxNcrawl itself; instead it relies on (a) optional basic auth when calling a SearXNG instance and (b) user-supplied Playwright storage_state for logged-in browsing during crawling."},"pricing":{"model":null,"free_tier_exists":false,"free_tier_limits":null,"paid_tiers":[],"requires_credit_card":false,"estimated_workload_costs":null,"notes":"Repo metadata indicates MIT license; no hosted pricing is described."},"requirements":{"requires_signup":false,"requires_credit_card":false,"domain_verification":false,"data_residency":[],"compliance":[],"min_contract":null},"agent_readiness":{"af_score":58.0,"security_score":52.2,"reliability_score":31.2,"mcp_server_quality":55.0,"documentation_accuracy":78.0,"error_message_quality":0.0,"error_message_notes":null,"auth_complexity":60.0,"rate_limit_clarity":20.0,"tls_enforcement":85.0,"auth_strength":45.0,"scope_granularity":20.0,"dependency_hygiene":60.0,"secret_handling":55.0,"security_notes":"The README documents optional basic auth credentials for SearXNG via environment variables and authenticated crawling via Playwright storage_state. It does not describe server-side access controls for the MCP HTTP transport, nor does it document secrets handling/logging or operational mitigations (e.g., SSRF protections, allowlists, crawl throttling, robots.txt behavior). TLS is presumably supported via HTTP(S) URLs, but enforcement and configuration are not explicitly stated.","uptime_documented":0.0,"version_stability":45.0,"breaking_changes_history":40.0,"error_recovery":40.0,"idempotency_support":"false","idempotency_notes":"Crawling operations are inherently non-idempotent with respect to external site state and time; the docs do not explicitly describe idempotent request semantics.","pagination_style":"none","retry_guidance_documented":false,"known_agent_gotchas":["Authenticated crawling is WIP; UX/flow may change and may not be fully reliable.","Crawling can be expensive (browser automation via Playwright/chromium) and slow depending on target pages and concurrency.","Site crawling uses BFS with max depth/page limits; agents should set tight limits to avoid unexpectedly large crawls."]}}