houtini-lm

Provides an MCP server that connects Claude Code to a local (LM Studio/Ollama/vLLM/llama.cpp) or OpenAI-compatible LLM endpoint. It routes bounded “grunt work” tasks to the cheaper model (e.g., code review drafts, tests, commit messages, format conversion, mock data, embeddings), while leaving complex reasoning and orchestration to Claude. Includes model discovery/caching, per-model routing hints, performance stats, and tool functions such as chat/custom_prompt/code_task/embed/discover/list_models.
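Because the server fronts an OpenAI-compatible endpoint, a delegated task ultimately reduces to a standard chat-completion request. A minimal sketch of how an agent-side helper might build one (the endpoint URL, model name, and key handling are illustrative assumptions, not houtini-lm's actual API):

```python
import json
import urllib.request

def build_chat_request(endpoint, model, prompt, api_key=None):
    """Build an OpenAI-compatible /chat/completions request.

    Local LM Studio/Ollama endpoints usually need no key; cloud
    OpenAI-compatible endpoints usually expect a bearer token.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )
```

Sending the request (e.g., with `urllib.request.urlopen`) is left out; only the payload shape matters here.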

Evaluated Mar 30, 2026
Homepage ↗ · Repo ↗
Tags: AI/ML, ai-agents, mcp, llm-routing, claude, local-llm, openai-compatible, embeddings, developer-tools
⚙ Agent Friendliness
70
/ 100
Can an agent use this?
🔒 Security
48
/ 100
Is it safe for agents?
⚡ Reliability
39
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
88
Documentation
75
Error Messages
0
Auth Simplicity
70
Rate Limits
80

🔒 Security

TLS Enforcement
60
Auth Strength
50
Scope Granularity
20
Dep. Hygiene
60
Secret Handling
55

Security posture is partially inferable: it supports connecting to local network endpoints and OpenAI-compatible cloud endpoints; README mentions LM_STUDIO_PASSWORD for cloud-like setups but does not detail TLS enforcement, secret redaction/logging behavior, or threat model. It claims privacy/no rate limits, but upstream provider security and transport (HTTP vs HTTPS) depend on the configured URL. Dependency hygiene is unknown beyond declared deps (@modelcontextprotocol/sdk, sql.js).
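Since TLS enforcement isn't documented, transport security is whatever the configured URL provides. A client-side guard can at least flag plaintext HTTP to non-local hosts; this is a sketch, and the set of hostnames treated as "local" is an assumption:

```python
from urllib.parse import urlparse

# Assumption: these hostnames count as the local machine.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def transport_is_safe(endpoint: str) -> bool:
    """True if the endpoint uses HTTPS or points at the local machine,
    i.e., traffic won't cross the network in plaintext."""
    parsed = urlparse(endpoint)
    return parsed.scheme == "https" or parsed.hostname in LOCAL_HOSTS
```

For example, `transport_is_safe("http://api.example.com/v1")` would fail the check, while a local LM Studio URL passes.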

⚡ Reliability

Uptime/SLA
0
Version Stability
65
Breaking Changes
45
Error Recovery
45

Best When

You already run an LLM (local or OpenAI-compatible endpoint) and want Claude Code to offload bounded, file-heavy subtasks to reduce context/tokens and cost while keeping Claude’s planning/orchestration.

Avoid When

You cannot reliably run or secure access to the target LLM endpoint(s), or you require strong compliance controls/auditability that aren’t described here. Also avoid it when the delegated tasks are not actually bounded, since that risks pushing reasoning-heavy work to the wrong model.

Use Cases

  • Delegating bounded coding tasks from Claude Code to a local LLM to reduce token usage
  • Local/cloud hybrid routing across OpenAI-compatible providers
  • Generating structured JSON outputs via json_schema with grammar-constrained sampling
  • Generating embeddings for RAG pipelines through OpenAI-compatible /v1/embeddings
  • Performance monitoring of delegated calls (latency/tokens/sessions)
  • Model discovery and capability-aware routing via cached metadata
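For the structured-JSON use case, OpenAI-compatible servers such as LM Studio accept a `response_format` of type `json_schema`. A hedged sketch of the request body (the field layout follows the common OpenAI structured-outputs convention; verify against your endpoint's docs):

```python
def build_structured_request(model, prompt, schema, schema_name="result"):
    """Chat-completion body asking the server to constrain sampling to `schema`.

    The response_format/json_schema shape follows the OpenAI structured-outputs
    convention; individual servers may differ, so treat this as a starting
    point rather than a guaranteed contract.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "schema": schema},
        },
    }
```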

Not For

  • Running without access to an MCP-capable orchestrator (e.g., Claude Code integration workflow)
  • Tasks requiring strong multi-tool orchestration or deep agentic reasoning where Claude must remain the orchestrator
  • Scenarios needing a first-party hosted API/SaaS with centralized controls (this is primarily a local/bring-your-own-endpoint integration)

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
Yes
SDK
No
Webhooks
No

Authentication

Methods: OpenAI-compatible auth (e.g., a bearer API key) supplied via LM_STUDIO_PASSWORD or the endpoint’s key for cloud endpoints; no auth for local LM Studio/Ollama unless the local endpoint requires it.
OAuth: No · Scopes: No

Authentication is delegated to the configured LLM endpoint (local often no key; cloud OpenAI-compatible endpoints typically use an API key). The README doesn’t describe OAuth flows or fine-grained scopes at the MCP-server layer.

Pricing

Free tier: No
Requires CC: No

The project itself is free; actual costs depend on which upstream LLM endpoint(s) you route to (local vs. cloud providers). The README’s explicit claim of “Free. No rate limits.” refers to the server/integration itself, not necessarily upstream provider costs.

Agent Metadata

Pagination
none
Idempotent
False
Retry Guidance
Not documented

Known Gotchas

  • Delegation overhead can dominate for small tasks (MCP/tool call overhead).
  • Routing relies on model discovery/cached metadata; ensure the cache refresh cadence (described as 7 days) aligns with your model changes.
  • Local models may hallucinate with truncated input; README advises sending complete code or relevant functions only.
  • As of v2.8.0, enforces one call at a time using a request semaphore—agents should not assume concurrent tool execution.


Scores are editorial opinions as of 2026-03-30.
