{"id":"houtini-ai-houtini-lm","name":"houtini-lm","homepage":"https://houtini.com/how-to-cut-your-claude-code-bill-with-houtini-lm/","repo_url":"https://github.com/houtini-ai/houtini-lm","category":"ai-ml","subcategories":[],"tags":["ai-agents","mcp","llm-routing","claude","local-llm","openai-compatible","embeddings","developer-tools"],"what_it_does":"Provides an MCP server that connects Claude Code to a local (LM Studio/Ollama/vLLM/llama.cpp) or OpenAI-compatible LLM endpoint. It routes bounded “grunt work” tasks to the cheaper model (e.g., code review drafts, tests, commit messages, format conversion, mock data, embeddings), while leaving complex reasoning and orchestration to Claude. Includes model discovery/caching, per-model routing hints, performance stats, and tool functions such as chat/custom_prompt/code_task/embed/discover/list_models.","use_cases":["Delegating bounded coding tasks from Claude Code to a local LLM to reduce token usage","Local/cloud hybrid routing across OpenAI-compatible providers","Generating structured JSON outputs via json_schema with grammar-constrained sampling","Generating embeddings for RAG pipelines through OpenAI-compatible /v1/embeddings","Performance monitoring of delegated calls (latency/tokens/sessions)","Model discovery and capability-aware routing via cached metadata"],"not_for":["Running without access to an MCP-capable orchestrator (e.g., Claude Code integration workflow)","Tasks requiring strong multi-tool orchestration or deep agentic reasoning where Claude must remain the orchestrator","Scenarios needing a first-party hosted API/SaaS with centralized controls (this is primarily a local/bring-your-own-endpoint integration)"],"best_when":"You already run an LLM (local or OpenAI-compatible endpoint) and want Claude Code to offload bounded, file-heavy subtasks to reduce context/tokens and cost while keeping Claude’s planning/orchestration.","avoid_when":"You cannot reliably run or secure access to the target LLM endpoint(s), or you require strong compliance controls/auditability that aren’t described here; also avoid when the delegated tasks are not actually bounded (risk of pushing reasoning-heavy work to the wrong model).","alternatives":["Other MCP servers/integration layers for routing tasks to local LLMs","Direct use of an OpenAI-compatible API with your own orchestration layer","Using Claude Code configuration/tools without an MCP delegation server (manual prompting)","Community wrappers around Ollama/vLLM/OpenAI-compatible endpoints for agent delegation"],"af_score":70.5,"security_score":48.5,"reliability_score":38.8,"package_type":"mcp_server","discovery_source":["github"],"priority":"high","status":"evaluated","version_evaluated":null,"last_evaluated":"2026-03-30T15:37:38.871586+00:00","interface":{"has_rest_api":false,"has_graphql":false,"has_grpc":false,"has_mcp_server":true,"mcp_server_url":null,"has_sdk":false,"sdk_languages":["JavaScript","TypeScript","Node.js"],"openapi_spec_url":null,"webhooks":false},"auth":{"methods":["OpenAI-compatible auth (e.g., bearer API key) via LM_STUDIO_PASSWORD/endpoint key for cloud endpoints","No auth for local LM Studio/Ollama if not required by the local endpoint"],"oauth":false,"scopes":false,"notes":"Authentication is delegated to the configured LLM endpoint (local often no key; cloud OpenAI-compatible endpoints typically use an API key). The README doesn’t describe OAuth flows or fine-grained scopes at the MCP-server layer."},"pricing":{"model":null,"free_tier_exists":false,"free_tier_limits":null,"paid_tiers":[],"requires_credit_card":false,"estimated_workload_costs":null,"notes":"Project itself is described as free; actual costs depend on which upstream LLM endpoint(s) you route to (local vs cloud providers). README explicitly claims “Free. No rate limits.” but this is about the server usage/integration, not necessarily upstream provider costs."},"requirements":{"requires_signup":false,"requires_credit_card":false,"domain_verification":false,"data_residency":[],"compliance":[],"min_contract":null},"agent_readiness":{"af_score":70.5,"security_score":48.5,"reliability_score":38.8,"mcp_server_quality":88.0,"documentation_accuracy":75.0,"error_message_quality":0.0,"error_message_notes":null,"auth_complexity":70.0,"rate_limit_clarity":80.0,"tls_enforcement":60.0,"auth_strength":50.0,"scope_granularity":20.0,"dependency_hygiene":60.0,"secret_handling":55.0,"security_notes":"Security posture is partially inferable: it supports connecting to local network endpoints and OpenAI-compatible cloud endpoints; README mentions LM_STUDIO_PASSWORD for cloud-like setups but does not detail TLS enforcement, secret redaction/logging behavior, or threat model. It claims privacy/no rate limits, but upstream provider security and transport (HTTP vs HTTPS) depend on the configured URL. Dependency hygiene is unknown beyond declared deps (@modelcontextprotocol/sdk, sql.js).","uptime_documented":0.0,"version_stability":65.0,"breaking_changes_history":45.0,"error_recovery":45.0,"idempotency_support":"false","idempotency_notes":"Operations appear request/response and likely not idempotent in the strict sense (LLM generation). No explicit idempotency guarantees described.","pagination_style":"none","retry_guidance_documented":false,"known_agent_gotchas":["Delegation overhead can dominate for small tasks (MCP/tool call overhead).","Routing relies on model discovery/cached metadata; ensure the cache refresh cadence (described as 7 days) aligns with your model changes.","Local models may hallucinate with truncated input; README advises sending complete code or relevant functions only.","As of v2.8.0, enforces one call at a time using a request semaphore; agents should not assume concurrent tool execution."]}}