houtini-lm
Provides an MCP server that connects Claude Code to a local (LM Studio/Ollama/vLLM/llama.cpp) or OpenAI-compatible LLM endpoint. It routes bounded “grunt work” tasks to the cheaper model (e.g., code-review drafts, tests, commit messages, format conversion, mock data, embeddings) while leaving complex reasoning and orchestration to Claude. Includes model discovery/caching, per-model routing hints, performance stats, and tool functions such as chat, custom_prompt, code_task, embed, discover, and list_models.
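As a sketch of how such an MCP server is typically wired into Claude Code's MCP configuration. The package name, command, and the `LLM_BASE_URL` variable shown here are assumptions for illustration, not taken from the README:

```json
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "houtini-lm"],
      "env": {
        "LLM_BASE_URL": "http://localhost:1234/v1"
      }
    }
  }
}
```

The `env` block is where the endpoint URL (and any credentials) would be passed through to the server process.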
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Security posture is only partially inferable. The server supports local-network endpoints and OpenAI-compatible cloud endpoints; the README mentions LM_STUDIO_PASSWORD for cloud-like setups but does not detail TLS enforcement, secret redaction/logging behavior, or a threat model. It claims privacy and no rate limits, but upstream provider security and transport (HTTP vs. HTTPS) depend on the configured URL. Dependency hygiene is unknown beyond the declared dependencies (@modelcontextprotocol/sdk, sql.js).
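Because transport security depends entirely on the configured URL, a deployer can at least sanity-check the scheme before wiring up an endpoint. This is an illustrative check, not something the package is documented to do:

```python
from urllib.parse import urlparse

def endpoint_warnings(base_url: str) -> list[str]:
    """Return transport-related warnings for an LLM endpoint URL."""
    parsed = urlparse(base_url)
    warnings = []
    # Plain HTTP is fine on localhost, risky across a network.
    if parsed.scheme == "http" and parsed.hostname not in ("localhost", "127.0.0.1"):
        warnings.append("plaintext HTTP to a non-local host; prompts and any credentials travel unencrypted")
    if parsed.scheme not in ("http", "https"):
        warnings.append(f"unexpected scheme {parsed.scheme!r}")
    return warnings
```

For example, `endpoint_warnings("http://192.168.1.20:1234/v1")` flags the plaintext LAN hop, while a localhost URL passes cleanly.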
⚡ Reliability
Best When
You already run an LLM (local or OpenAI-compatible endpoint) and want Claude Code to offload bounded, file-heavy subtasks to cut context size, token usage, and cost while keeping planning and orchestration with Claude.
Avoid When
You cannot reliably run or secure access to the target LLM endpoint(s); you require strong compliance controls or auditability that aren’t described here; or the delegated tasks are not actually bounded (risking reasoning-heavy work landing on the weaker model).
Use Cases
- Delegating bounded coding tasks from Claude Code to a local LLM to reduce token usage
- Local/cloud hybrid routing across OpenAI-compatible providers
- Generating structured JSON outputs via json_schema with grammar-constrained sampling
- Generating embeddings for RAG pipelines through OpenAI-compatible /v1/embeddings
- Performance monitoring of delegated calls (latency/tokens/sessions)
- Model discovery and capability-aware routing via cached metadata
Not For
- Running without an MCP-capable orchestrator (e.g., the Claude Code integration workflow)
- Tasks requiring strong multi-tool orchestration or deep agentic reasoning, where Claude must remain the orchestrator
- Scenarios needing a first-party hosted API/SaaS with centralized controls (this is primarily a local, bring-your-own-endpoint integration)
Interface
Authentication
Authentication is delegated to the configured LLM endpoint: local endpoints often need no key, while cloud OpenAI-compatible endpoints typically require an API key. The README doesn’t describe OAuth flows or fine-grained scopes at the MCP-server layer.
Pricing
The project itself is free; actual costs depend on which upstream LLM endpoint(s) you route to (local vs. cloud providers). The README’s explicit claim of “Free. No rate limits.” refers to the server itself, not to upstream provider costs.
Known Gotchas
- ⚠ Delegation overhead can dominate for small tasks (MCP/tool call overhead).
- ⚠ Routing relies on model discovery/cached metadata; ensure the cache refresh cadence (described as 7 days) aligns with your model changes.
- ⚠ Local models may hallucinate with truncated input; README advises sending complete code or relevant functions only.
- ⚠ As of v2.8.0, the server enforces one call at a time via a request semaphore; agents should not assume concurrent tool execution.
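That one-call-at-a-time behavior can be modeled on the client side. A minimal illustration of serializing delegated calls behind a semaphore (this mirrors the described behavior, not the package's actual implementation):

```python
import asyncio

async def delegated_call(gate: asyncio.Semaphore, task_id: int, results: list[int]) -> None:
    async with gate:
        # Only one delegated task runs at a time; the rest queue here.
        await asyncio.sleep(0)
        results.append(task_id)

async def main() -> list[int]:
    gate = asyncio.Semaphore(1)  # mirrors the server's single-request gate
    results: list[int] = []
    await asyncio.gather(*(delegated_call(gate, i, results) for i in range(3)))
    return results
```

Even though three calls are issued concurrently via `gather`, each completes before the next enters the guarded section, so an agent issuing parallel tool calls simply queues behind the gate.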
Scores are editorial opinions as of 2026-03-30.