{"id":"sisig-ai-doctor","name":"doctor","homepage":null,"repo_url":"https://github.com/sisig-ai/doctor","category":"ai-ml","subcategories":[],"tags":["ai-ml","search","crawling","rag","mcp","fastapi","duckdb","vector-search","web-indexing"],"what_it_does":"Doctor is an agent-oriented web discovery, crawling, and indexing system. It crawls web pages, chunks and embeds their text (using OpenAI via LiteLLM), and stores the results in DuckDB with vector search. It exposes a FastAPI HTTP API for fetching, searching, and map-based navigation, and makes these capabilities available to LLM agents through an MCP server endpoint (/mcp).","use_cases":["Crawl and index documentation or public websites for retrieval-augmented generation","Build hierarchical site maps (parent/child/sibling pages) to navigate crawled content","Give LLM agents an MCP-accessible search tool so they can generate up-to-date code or text grounded in newly crawled sources","Implement an internal web knowledge base with vector search over crawled pages"],"not_for":["Crawling sites that require authenticated access, unless additional supported mechanisms are added","Handling sensitive or regulated data without explicit security/compliance configuration and review","Large-scale, high-throughput internet crawling without robust rate limiting, queue management, and operational safeguards","Serving untrusted external clients as a general-purpose authenticated API service"],"best_when":"You control the deployment (Docker Compose), want local HTTP and MCP access to crawled and embedded web content, and can provide an OpenAI API key for embeddings.","avoid_when":"You need strong multi-tenant security or strict compliance guarantees, or you must crawl high-risk content with minimal operational risk.","alternatives":["RAG/search over existing sitemaps using dedicated search engines (e.g., Elasticsearch/OpenSearch with ingest pipelines)","Open-source crawling and indexing stacks (e.g., Scrapy plus an embedding/indexing pipeline)","Hosted web-to-vector solutions (vendor-managed crawling, embeddings, and search)","Other MCP-connected RAG/search services (where available)"],"af_score":50.8,"security_score":33.8,"reliability_score":35.0,"package_type":"mcp_server","discovery_source":["github"],"priority":"high","status":"evaluated","version_evaluated":null,"last_evaluated":"2026-03-30T13:27:52.394866+00:00","interface":{"has_rest_api":true,"has_graphql":false,"has_grpc":false,"has_mcp_server":true,"mcp_server_url":"http://localhost:9111/mcp","has_sdk":false,"sdk_languages":[],"openapi_spec_url":null,"webhooks":false},"auth":{"methods":["OpenAI API key via OPENAI_API_KEY (for embeddings)"],"oauth":false,"scopes":false,"notes":"The README describes only an OpenAI API key requirement and local service configuration. It does not describe HTTP authentication or authorization for the FastAPI endpoints or the MCP server, so access control for the fetch, search, and map endpoints (e.g., /fetch_url) appears to be neither documented nor guaranteed."},"pricing":{"model":null,"free_tier_exists":false,"free_tier_limits":null,"paid_tiers":[],"requires_credit_card":false,"estimated_workload_costs":null,"notes":"Runtime costs likely depend on OpenAI embedding calls (and any LLM usage via LiteLLM). The README provides no pricing or free-tier information."},"requirements":{"requires_signup":false,"requires_credit_card":false,"domain_verification":false,"data_residency":[],"compliance":[],"min_contract":null},"agent_readiness":{"af_score":50.8,"security_score":33.8,"reliability_score":35.0,"mcp_server_quality":55.0,"documentation_accuracy":70.0,"error_message_quality":0.0,"error_message_notes":null,"auth_complexity":80.0,"rate_limit_clarity":10.0,"tls_enforcement":40.0,"auth_strength":20.0,"scope_granularity":10.0,"dependency_hygiene":45.0,"secret_handling":60.0,"security_notes":"The provided README does not describe TLS, authentication, or other operational security details. An OpenAI API key is required, but how it is stored and handled (e.g., environment variables, whether it can appear in logs) is not fully specified. The stack uses FastAPI, Redis, RQ, and DuckDB. Without documented access control for /fetch_url and the MCP endpoint, the exposure risk is non-trivial. Dependency versions are only partially pinned (e.g., crawl4ai==0.6.0), and overall CVE/patch hygiene cannot be confirmed from the provided data.","uptime_documented":0.0,"version_stability":55.0,"breaking_changes_history":50.0,"error_recovery":35.0,"idempotency_support":"false","idempotency_notes":"No idempotency semantics are documented for endpoints such as POST /fetch_url or for job-related operations.","pagination_style":"none","retry_guidance_documented":false,"known_agent_gotchas":["Crawling can be expensive and time-consuming. Because no idempotency controls are documented, agents may trigger duplicate crawls through repeated /fetch_url calls.","The README documents no authentication for the HTTP or MCP endpoints; agents should assume network exposure risks if the service is deployed beyond localhost.","Embedding depends on external OpenAI access; failures or timeouts may occur if the OpenAI key is missing or invalid, or if the upstream service is unavailable."]}}