doctor

Doctor is an agent-oriented web discovery/crawling and indexing system. It crawls web pages, chunks and embeds text (via OpenAI through LiteLLM), stores results in DuckDB with vector search, exposes a FastAPI HTTP API for fetch/search/map navigation, and provides access to these capabilities via an MCP server endpoint (/mcp) for LLM agents.

Evaluated Mar 30, 2026 (111d ago)

Category: AI/ML
Tags: ai-ml, search, crawling, rag, mcp, fastapi, duckdb, vector-search, web-indexing

Score Breakdown

⚙ Agent Friendliness (criteria: MCP Quality, Documentation, Error Messages, Auth Simplicity, Rate Limits)

🔒 Security (criteria: TLS Enforcement, Auth Strength, Scope Granularity, Dep. Hygiene, Secret Handling)

TLS/auth and operational security details are not described in the provided README. An OpenAI API key is required, but how it is stored/used (env vars vs logs, etc.) is not fully specified. The stack uses FastAPI/Redis/RQ/DuckDB; without documented access control for /fetch_url and MCP, exposure risk is non-trivial. Dependency versions are partially pinned (e.g., crawl4ai==0.6.0), but overall CVE/patch hygiene cannot be confirmed from the provided data.

⚡ Reliability (criteria: Uptime/SLA, Version Stability, Breaking Changes, Error Recovery)

Best When

You control the deployment (Docker compose), want local HTTP + MCP access to crawled/embedded web content, and can provide an OpenAI API key for embeddings.

Avoid When

You need strong multi-tenant security, strict compliance guarantees, or you must crawl high-risk content with minimal operational risk.

Use Cases

• Crawl and index documentation or public websites for retrieval-augmented generation
• Build hierarchical site maps (parent/child/siblings) to navigate crawled content
• Provide an MCP-accessible search tool to LLM agents for up-to-date code/text generation from newly crawled sources
• Implement an internal web knowledge base with vector search over crawled pages
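The parent/child/sibling navigation mentioned in the use cases can be illustrated with a small sketch. It assumes the hierarchy is derived from URL paths (nearest crawled ancestor on the same host), which is only one plausible approach and not necessarily how Doctor builds its maps. The function names `build_parents`, `children_of`, and `siblings_of` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def build_parents(urls: List[str]) -> Dict[str, Optional[str]]:
    """Map each URL to its nearest crawled ancestor on the same host.

    Assumption: hierarchy follows URL path segments. A page whose
    ancestors were never crawled gets None as its parent.
    """
    by_key = {}
    for url in urls:
        parts = urlsplit(url)
        segments = tuple(s for s in parts.path.split("/") if s)
        by_key[(parts.netloc, segments)] = url

    parents: Dict[str, Optional[str]] = {}
    for (host, segments), url in by_key.items():
        parent = None
        # Walk up the path one segment at a time until a crawled page is found.
        for cut in range(len(segments) - 1, -1, -1):
            candidate = by_key.get((host, segments[:cut]))
            if candidate is not None:
                parent = candidate
                break
        parents[url] = parent
    return parents


def children_of(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the parent map into parent -> sorted list of children."""
    children = defaultdict(list)
    for url, parent in parents.items():
        if parent is not None:
            children[parent].append(url)
    return {parent: sorted(kids) for parent, kids in children.items()}


def siblings_of(url: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the other pages that share this page's parent."""
    parent = parents[url]
    return sorted(u for u, p in parents.items() if p == parent and u != url)
```

An agent could use such a map to move from a search hit to its neighboring pages without issuing another crawl.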

Not For

• Crawling sites requiring authenticated access without additional supported mechanisms
• Handling sensitive/regulated data without explicit security/compliance configuration and review
• Large-scale internet crawling at very high throughput without robust rate limiting, queue management, and operational safeguards
• Use as a general-purpose authenticated API service for untrusted external clients

Interface

REST API: Yes
GraphQL: not indicated
gRPC: not indicated
MCP Server: Yes
SDK: not indicated
Webhooks: not indicated

Authentication

Methods: OpenAI API key via OPENAI_API_KEY (for embeddings)

OAuth: No
Scopes: No

The README describes only an OpenAI API key requirement and local service configuration. It does not describe HTTP authentication or authorization for the FastAPI endpoints or the MCP server, so access control for /fetch_url and the search and map endpoints is undocumented and should not be assumed.
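Because no endpoint authentication is documented, a client may want to refuse to talk to a Doctor instance that is not bound to the local machine. The following is a minimal sketch of such a guard using only the Python standard library. The helper name `is_loopback_url` and the port `8000` in the usage example are placeholders, not values taken from the README.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(base_url: str) -> bool:
    """Return True only if base_url points at the local machine."""
    host = urlsplit(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any DNS name other than "localhost" is treated as potentially remote.
        return False


if __name__ == "__main__":
    # Placeholder URL; substitute the address of your own deployment.
    base_url = "http://127.0.0.1:8000/mcp"
    if not is_loopback_url(base_url):
        raise SystemExit("Refusing to use an unauthenticated non-local endpoint")
```

Note that `0.0.0.0` is rejected on purpose: it is the unspecified address and usually means the service is listening on every interface.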

Pricing

Free tier: No

Requires CC: No

Runtime costs likely depend on embedding calls to OpenAI (and any LLM usage via LiteLLM). No pricing or free tier information is provided in the README.
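Since the README gives no pricing, a rough estimate has to come from your own token counts and your provider's current rate. The sketch below shows only the arithmetic. The function name is hypothetical and the rate passed in the usage example is an arbitrary placeholder, not an actual OpenAI price.

```python
def estimate_embedding_cost_usd(
    pages: int,
    avg_tokens_per_page: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimate the embedding cost of one crawl.

    All three inputs are supplied by the caller; look up the real
    per-token rate for the embedding model you configure.
    """
    total_tokens = pages * avg_tokens_per_page
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    cost = estimate_embedding_cost_usd(
        pages=1_000, avg_tokens_per_page=2_000, usd_per_million_tokens=0.10
    )
    print(f"Estimated cost: ${cost:.2f}")
```

Re-crawling the same pages repeats the embedding spend, so the estimate scales with how often a site is refreshed.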

Agent Metadata

Pagination: none
Idempotent: False
Retry Guidance: Not documented
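In the absence of documented retry guidance, a client can fall back on a generic policy. The sketch below shows exponential backoff with full jitter around an arbitrary callable. `call_with_backoff` and its defaults are assumptions chosen for illustration, and because the API is listed as not idempotent, it is safer to wrap read-style calls (such as search) than crawl submissions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on connection errors and timeouts.

    The wait before each retry is drawn uniformly from
    [0, min(max_delay, base_delay * 2**(attempt - 1))] (full jitter).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the policy can be exercised in tests without real delays.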

Known Gotchas

⚠ Crawling can be expensive and time-consuming; agents may trigger repeated /fetch_url calls without idempotency controls.
⚠ No documented authentication for HTTP/MCP endpoints in the README; agents should assume network exposure risks if deployed beyond localhost.
⚠ Embedding depends on external OpenAI access; failures/timeouts may occur if the OpenAI key or upstream service is unavailable.
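The first gotcha can be mitigated on the client side. The sketch below wraps whatever function submits a crawl and suppresses repeat submissions of the same normalized URL within a time window. `CrawlDeduper`, `normalize_url`, and the idea that a submission returns a job identifier are all assumptions for illustration; the README excerpt does not specify the /fetch_url request or response shape.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class CrawlDeduper:
    """Skip re-submitting a URL that was submitted within ttl_seconds."""

    def __init__(
        self,
        submit: Callable[[str], str],  # hypothetical: sends the crawl request, returns a job id
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._submit = submit
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: Dict[str, Tuple[float, str]] = {}

    def fetch(self, url: str) -> str:
        key = normalize_url(url)
        now = self._clock()
        hit = self._seen.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # Reuse the earlier job instead of crawling again.
        job_id = self._submit(key)
        self._seen[key] = (now, job_id)
        return job_id
```

This cache lives in one process only; several agents sharing a deployment would need a shared store to get the same effect.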

Alternatives

• RAG/search over existing sitemaps using dedicated search engines (e.g., Elasticsearch/OpenSearch + ingest pipelines)
• Open-source crawling + indexing stacks (e.g., Scrapy + embedding/index pipeline)
• Hosted web-to-vector solutions (vendor-managed crawling + embeddings + search)
• Other MCP-connected RAG/search services (where available)

Scores are editorial opinions as of 2026-03-30.