{"id":"houtini-ai-houtini-lm","name":"houtini-lm","af_score":70.5,"security_score":48.5,"reliability_score":38.8,"what_it_does":"Provides an MCP server that connects Claude Code to a local (LM Studio/Ollama/vLLM/llama.cpp) or OpenAI-compatible LLM endpoint. It routes bounded “grunt work” tasks to the cheaper model (e.g., code review drafts, tests, commit messages, format conversion, mock data, embeddings), while leaving complex reasoning and orchestration to Claude. Includes model discovery/caching, per-model routing hints, performance stats, and tool functions such as chat/custom_prompt/code_task/embed/discover/list_models.","best_when":"You already run an LLM (local or OpenAI-compatible endpoint) and want Claude Code to offload bounded, file-heavy subtasks to reduce context/tokens and cost while keeping Claude’s planning/orchestration.","avoid_when":"You cannot reliably run or secure access to the target LLM endpoint(s), or you require strong compliance controls/auditability that aren’t described here; also avoid when the delegated tasks are not actually bounded (risk of pushing reasoning-heavy work to the wrong model).","last_evaluated":"2026-03-30T15:37:38.871586+00:00","has_mcp":true,"has_api":false,"auth_methods":["OpenAI-compatible auth (e.g., bearer API key) via LM_STUDIO_PASSWORD/endpoint key for cloud endpoints","No auth for local LM Studio/Ollama if not required by the local endpoint"],"has_free_tier":false,"known_gotchas":["Delegation overhead can dominate for small tasks (MCP/tool call overhead).","Routing relies on model discovery/cached metadata; ensure the cache refresh cadence (described as 7 days) aligns with your model changes.","Local models may hallucinate with truncated input; README advises sending complete code or relevant functions only.","As of v2.8.0, enforces one call at a time using a request semaphore—agents should not assume concurrent tool execution."],"error_quality":0.0}