houtini-lm

Provides an MCP server that connects Claude Code to a local (LM Studio/Ollama/vLLM/llama.cpp) or OpenAI-compatible LLM endpoint. It routes bounded “grunt work” tasks to the cheaper model (e.g., code review drafts, tests, commit messages, format conversion, mock data, embeddings), while leaving complex reasoning and orchestration to Claude. Includes model discovery/caching, per-model routing hints, performance stats, and tool functions such as chat/custom_prompt/code_task/embed/discover/list_models.
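Because the server fronts an OpenAI-compatible endpoint, a delegated task ultimately reduces to a standard chat-completion request. A minimal sketch of how an agent-side helper might build one (the endpoint URL, model name, and key handling are illustrative assumptions, not houtini-lm's actual API):

```python
import json
import urllib.request

def build_chat_request(endpoint, model, prompt, api_key=None):
    """Build an OpenAI-compatible /chat/completions request.

    Local LM Studio/Ollama endpoints usually need no key; cloud
    OpenAI-compatible endpoints usually expect a bearer token.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )
```

Sending the request (e.g., with `urllib.request.urlopen`) is left out; only the payload shape matters here.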

Evaluated Mar 30, 2026
Homepage ↗ · Repo ↗
Tags: AI/ML, ai-agents, mcp, llm-routing, claude, local-llm, openai-compatible, embeddings, developer-tools
⚙ Agent Friendliness
70
/ 100
Can an agent use this?
🔒 Security
48
/ 100
Is it safe for agents?
⚡ Reliability
39
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
88
Documentation
75
Error Messages
0
Auth Simplicity
70
Rate Limits
80

🔒 Security

TLS Enforcement
60
Auth Strength
50
Scope Granularity
20
Dep. Hygiene
60
Secret Handling
55

Security posture is partially inferable: it supports connecting to local network endpoints and OpenAI-compatible cloud endpoints; README mentions LM_STUDIO_PASSWORD for cloud-like setups but does not detail TLS enforcement, secret redaction/logging behavior, or threat model. It claims privacy/no rate limits, but upstream provider security and transport (HTTP vs HTTPS) depend on the configured URL. Dependency hygiene is unknown beyond declared deps (@modelcontextprotocol/sdk, sql.js).
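Since TLS enforcement isn't documented, transport security is whatever the configured URL provides. A client-side guard can at least flag plaintext HTTP to non-local hosts; this is a sketch, and the set of hostnames treated as "local" is an assumption:

```python
from urllib.parse import urlparse

# Assumption: these hostnames count as the local machine.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def transport_is_safe(endpoint: str) -> bool:
    """True if the endpoint uses HTTPS or points at the local machine,
    i.e., traffic won't cross the network in plaintext."""
    parsed = urlparse(endpoint)
    return parsed.scheme == "https" or parsed.hostname in LOCAL_HOSTS
```

For example, `transport_is_safe("http://api.example.com/v1")` would fail the check, while a local LM Studio URL passes.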

⚡ Reliability

Uptime/SLA
0
Version Stability
65
Breaking Changes
45
Error Recovery
45

Best When

You already run an LLM (local or OpenAI-compatible endpoint) and want Claude Code to offload bounded, file-heavy subtasks to reduce context/tokens and cost while keeping Claude’s planning/orchestration.

Avoid When

You cannot reliably run or secure access to the target LLM endpoint(s), or you require strong compliance controls/auditability that aren’t described here. Also avoid it when the delegated tasks are not actually bounded, since that risks pushing reasoning-heavy work to the wrong model.

Use Cases

  • Delegating bounded coding tasks from Claude Code to a local LLM to reduce token usage
  • Local/cloud hybrid routing across OpenAI-compatible providers
  • Generating structured JSON outputs via json_schema with grammar-constrained sampling
  • Generating embeddings for RAG pipelines through OpenAI-compatible /v1/embeddings
  • Performance monitoring of delegated calls (latency/tokens/sessions)
  • Model discovery and capability-aware routing via cached metadata
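For the structured-JSON use case, OpenAI-compatible servers such as LM Studio accept a `response_format` of type `json_schema`. A hedged sketch of the request body (the field layout follows the common OpenAI structured-outputs convention; verify against your endpoint's docs):

```python
def build_structured_request(model, prompt, schema, schema_name="result"):
    """Chat-completion body asking the server to constrain sampling to `schema`.

    The response_format/json_schema shape follows the OpenAI structured-outputs
    convention; individual servers may differ, so treat this as a starting
    point rather than a guaranteed contract.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "schema": schema},
        },
    }
```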

Not For

  • Running without access to an MCP-capable orchestrator (e.g., Claude Code integration workflow)
  • Tasks requiring strong multi-tool orchestration or deep agentic reasoning where Claude must remain the orchestrator
  • Scenarios needing a first-party hosted API/SaaS with centralized controls (this is primarily a local/bring-your-own-endpoint integration)

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
Yes
SDK
No
Webhooks
No

Authentication

Methods: OpenAI-compatible auth (e.g., a bearer API key) supplied via LM_STUDIO_PASSWORD or the endpoint’s key for cloud endpoints; no auth for local LM Studio/Ollama unless the local endpoint requires it.
OAuth: No · Scopes: No

Authentication is delegated to the configured LLM endpoint (local often no key; cloud OpenAI-compatible endpoints typically use an API key). The README doesn’t describe OAuth flows or fine-grained scopes at the MCP-server layer.

Pricing

Free tier: No
Requires CC: No

The project itself is free; actual costs depend on which upstream LLM endpoint(s) you route to (local vs. cloud providers). The README’s explicit claim of “Free. No rate limits.” refers to the server/integration itself, not necessarily upstream provider costs.

Agent Metadata

Pagination
none
Idempotent
False
Retry Guidance
Not documented

Known Gotchas

  • Delegation overhead can dominate for small tasks (MCP/tool call overhead).
  • Routing relies on model discovery/cached metadata; ensure the cache refresh cadence (described as 7 days) aligns with your model changes.
  • Local models may hallucinate with truncated input; README advises sending complete code or relevant functions only.
  • As of v2.8.0, enforces one call at a time using a request semaphore—agents should not assume concurrent tool execution.


Scores are editorial opinions as of 2026-03-30.
