Lunary

Open-source LLM observability and analytics platform with full tracing, cost tracking, user analytics, and evaluation capabilities. Lunary captures every LLM call with inputs, outputs, tokens, costs, and latency — and provides a UI for analyzing agent behavior, debugging failures, and running evals. MIT licensed with self-host option. Built for production LLM apps: supports multi-step agent traces, user tracking, and A/B testing of prompts.
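
Capturing an LLM call means recording its inputs, output, and latency around the call itself. The sketch below is a minimal, stdlib-only illustration of that idea using a plain decorator. It is not Lunary's SDK: the `CallEvent` and `EventBuffer` names and the record shape are hypothetical. Lunary itself intercepts calls by patching client methods rather than by decorating your functions.

```python
from __future__ import annotations

import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class CallEvent:
    """One captured call (hypothetical record shape, not Lunary's schema)."""
    name: str
    inputs: dict
    output: Any = None
    error: Optional[str] = None
    latency_s: float = 0.0


@dataclass
class EventBuffer:
    """Collects events in memory; a real SDK would ship them to a backend."""
    events: list = field(default_factory=list)

    def capture(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            event = CallEvent(name=fn.__name__, inputs={"args": args, "kwargs": kwargs})
            start = time.perf_counter()
            try:
                event.output = fn(*args, **kwargs)
                return event.output
            except Exception as exc:
                # Failed calls are recorded too, then the error is re-raised.
                event.error = repr(exc)
                raise
            finally:
                event.latency_s = time.perf_counter() - start
                self.events.append(event)
        return wrapper


buffer = EventBuffer()


@buffer.capture
def fake_completion(prompt: str, model: str = "example-model") -> str:
    # Stand-in for a real LLM client call.
    return prompt.upper()


fake_completion("hello")
print(buffer.events[0].name, buffer.events[0].output)  # prints: fake_completion HELLO
```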

Evaluated Mar 06, 2026 (v1)
Category: AI & Machine Learning
Tags: llm, observability, tracing, open-source, agent, analytics, cost-tracking, evaluation
⚙ Agent Friendliness: 60 / 100 (Can an agent use this?)
🔒 Security: 81 / 100 (Is it safe for agents?)
⚡ Reliability: 75 / 100 (Does it work consistently?)

Score Breakdown

⚙ Agent Friendliness

MCP Quality: not scored
Documentation: 82
Error Messages: 78
Auth Simplicity: 88
Rate Limits: 75

🔒 Security

TLS Enforcement: 100
Auth Strength: 75
Scope Granularity: 68
Dependency Hygiene: 82
Secret Handling: 80

MIT open source with self-host option for data sovereignty. HTTPS enforced for managed cloud. API keys should be kept server-side — client-side exposure risks trace injection. Self-hosting recommended for sensitive LLM inputs (PII, proprietary content).

⚡ Reliability

Uptime/SLA: 75
Version Stability: 75
Breaking Changes: 72
Error Recovery: 78

Best When

Building production LLM agent applications where you need full trace visibility, cost accounting per user/feature, and continuous evaluation of LLM output quality.

Avoid When

Simple LLM prototyping where tracing overhead isn't justified — add Lunary when moving to production, not during development.

Use Cases

  • Trace multi-step agent execution with parent-child span relationships — see exactly which LLM calls happen in each agent run and how long each takes
  • Track LLM cost per user, per feature, and per agent run to identify expensive patterns and optimize prompts for cost
  • Run automated evaluations on production LLM traces — score outputs for quality, relevance, and safety using LLM-as-judge
  • Identify agent failure patterns by querying logged traces for error patterns, token limit hits, and low-quality outputs
  • A/B test prompt variations in production by routing a percentage of traffic to different prompt templates and comparing outcomes
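
Routing "a percentage of traffic" to a prompt variant works best when each user lands in the same variant every time. A minimal sketch of deterministic hash bucketing follows; the experiment name, variant names, templates, and the 20% split are hypothetical, and how the chosen variant is attached to a Lunary trace is not shown.

```python
from __future__ import annotations

import hashlib

# Hypothetical prompt templates; names and wording are illustrative only.
VARIANTS = {
    "control": "Summarize the following text:\n{text}",
    "concise": "Summarize the following text in one sentence:\n{text}",
}


def assign_variant(user_id: str, experiment: str, treatment_pct: int) -> str:
    """Deterministically place a user in 'concise' (treatment) or 'control'.

    Hashing experiment + user_id keeps each user on the same template across
    requests, so per-variant outcomes stay comparable.
    """
    if not 0 <= treatment_pct <= 100:
        raise ValueError("treatment_pct must be between 0 and 100")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # roughly uniform in 0..99
    return "concise" if bucket < treatment_pct else "control"


def build_prompt(user_id: str, text: str) -> tuple[str, str]:
    """Return (variant, prompt); the variant should be logged with the trace."""
    variant = assign_variant(user_id, experiment="summary-v2", treatment_pct=20)
    return variant, VARIANTS[variant].format(text=text)


variant, prompt = build_prompt("user-42", "Lunary logs every LLM call.")
```

Hashing instead of drawing a random number per request keeps the split reproducible and stops a single user from seeing both templates. Recording the variant as trace metadata is what makes the per-variant comparison possible, though the exact Lunary field for it is not specified here.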

Not For

  • Non-LLM application monitoring — Lunary is purpose-built for LLM observability; use Datadog or New Relic for general application monitoring
  • Teams needing enterprise compliance features without self-hosting — managed tier is early stage; self-host for full data control
  • ML model (non-LLM) monitoring — whylogs or Evidently are better for traditional ML drift detection

Interface

REST API: Yes
GraphQL: No
gRPC: No
MCP Server: No
SDK: Yes
Webhooks: No

Authentication

Methods: api_key
OAuth: No
Scopes: No

An API key is used to ingest trace data and is supplied through the LUNARY_PUBLIC_KEY environment variable. Use separate project keys per environment (dev/prod). Traces are viewed in the dashboard at lunary.ai.
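
Because the key comes from LUNARY_PUBLIC_KEY and each environment gets its own project key, a server-side startup check can catch a missing key before traces are silently lost. This sketch assumes only those two facts; `load_lunary_public_key` and `MissingKeyError` are hypothetical names, not part of Lunary's SDK.

```python
import os
from typing import Mapping


class MissingKeyError(RuntimeError):
    """Raised when the ingestion key is not configured (hypothetical)."""


def load_lunary_public_key(environ: Mapping[str, str] = os.environ) -> str:
    """Read the project key from LUNARY_PUBLIC_KEY on the server side.

    Each deployment (dev, prod) sets its own value, so the code never
    hard-codes which project it reports to, and the key never ships to clients.
    """
    key = environ.get("LUNARY_PUBLIC_KEY", "").strip()
    if not key:
        # Fail fast at startup instead of dropping traces later.
        raise MissingKeyError("LUNARY_PUBLIC_KEY is not set for this environment")
    return key
```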

Pricing

Model: freemium
Free tier: Yes
Requires credit card: No

MIT licensed, so self-hosting is completely free and unlimited. The managed cloud free tier suits small projects; production scale typically requires a paid plan or a self-hosted deployment.

Agent Metadata

Pagination: cursor
Idempotent: Partial
Retry guidance: Not documented
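
With cursor pagination and no documented retry guidance, a client has to follow cursors itself and pick its own backoff policy. A generic sketch under those two facts follows; `fetch_page`, `TransientError`, the `(items, next_cursor)` page shape, and the backoff parameters are hypothetical and do not describe Lunary's actual endpoints or error codes.

```python
from __future__ import annotations

import random
import time
from typing import Any, Callable, Iterator, Optional, Tuple

# A page is (items, next_cursor); next_cursor is None on the last page.
Page = Tuple[list, Optional[str]]


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (e.g. HTTP 429 or 5xx)."""


def iter_all(
    fetch_page: Callable[[Optional[str]], Page],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Follow cursors until exhausted, retrying each page with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_attempts):
            try:
                items, next_cursor = fetch_page(cursor)
                break
            except TransientError:
                if attempt == max_attempts - 1:
                    raise
                # Exponential backoff with full jitter: 0..base * 2^attempt seconds.
                sleep(random.uniform(0, base_delay_s * 2 ** attempt))
        yield from items
        if next_cursor is None:
            return
        cursor = next_cursor


# Usage with a fake three-page source whose second page fails once.
PAGES = {None: ([1, 2], "c1"), "c1": ([3], "c2"), "c2": ([4, 5], None)}
failures = {"c1": 1}


def fake_fetch(cursor: Optional[str]) -> Page:
    if failures.get(cursor, 0) > 0:
        failures[cursor] -= 1
        raise TransientError("simulated 429")
    return PAGES[cursor]


print(list(iter_all(fake_fetch, sleep=lambda s: None)))  # prints: [1, 2, 3, 4, 5]
```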

Known Gotchas

  • Lunary SDK uses monkey-patching to intercept LLM calls — this can conflict with other SDK wrappers (LangChain callbacks, OpenTelemetry) if both are active
  • Trace parent-child relationships must be established explicitly with run_manager context — without proper context, all calls appear as independent top-level traces
  • Event batching is async — trace data appears in dashboard with 5-30 second delay; agents checking for their own traces immediately after execution may not see them
  • Token count tracking requires model-specific tokenizer configuration — incorrect tokenizer causes wrong cost estimates
  • Self-hosted deployment requires Docker and PostgreSQL — not a simple single-binary install
  • The free tier's 30-day retention means historical analysis of agent behavior requires a paid plan or self-hosting with custom retention



Scores are editorial opinions as of 2026-03-06.
