Scoring Methodology
How Assay evaluates packages and calculates scores. We publish this openly because transparency builds trust.
Overview
Every package receives three independent scores on a 0–100 scale: Agent Friendliness (AF), Security, and Reliability. Each dimension measures a different aspect of package quality and is composed of weighted sub-components.
Scores are editorial opinions based on our evaluation methodology applied to publicly available information. They are not certifications or guarantees.
Data Sources
Evaluations draw from multiple public sources:
- GitHub repositories — README, docs, issues, releases, dependency manifests, commit history
- Official documentation — API references, getting-started guides, error catalogs, changelogs
- Package registries — npm, PyPI, crates.io metadata and version history
- MCP server implementations — tool schemas, error handling, transport support
- Status pages & SLAs — published uptime commitments and incident history
Evaluation Process
Each evaluation follows a structured pipeline:
- Discovery — Package identified via GitHub search, registry crawl, or community submission
- Data collection — Automated gathering of repository metadata, documentation, and dependency information
- Structured evaluation — LLM-assisted analysis against each scoring dimension using a standardized rubric
- Score calculation — Weighted combination of sub-component scores into dimension totals
- Audit record — Every evaluation records the model used, tokens consumed, and raw output for transparency
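The audit record produced in the final step can be sketched as a simple data structure. This is an illustration only; the field names and values here are our assumptions, not Assay's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationRecord:
    """Audit record for one package evaluation (illustrative fields)."""
    package: str
    model: str             # which LLM performed the structured evaluation
    tokens_consumed: int   # total tokens used by the evaluation
    raw_output: str        # unprocessed model output, kept for auditability
    dimension_scores: dict = field(default_factory=dict)


record = EvaluationRecord(
    package="example-sdk",            # hypothetical package name
    model="example-model-v1",         # hypothetical model identifier
    tokens_consumed=12_500,
    raw_output="{...}",
    dimension_scores={"AF": 82, "Security": 74, "Reliability": 68},
)
```

Persisting the raw model output alongside the model name and token count is what makes each score auditable after the fact.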
Agent Friendliness (AF Score)
Measures how effectively an AI agent can use this package autonomously.
| Sub-component | Description | Weight |
|---|---|---|
| MCP Server Quality | Existence, maturity, tool schemas, transport support | 25% |
| Documentation Accuracy | API docs completeness, examples, up-to-date content | 25% |
| Error Message Quality | Structured errors, codes, recovery guidance | 20% |
| Auth Complexity | Ease of programmatic authentication | 15% |
| Rate Limit Clarity | Documentation + response headers for limits | 15% |
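The "weighted combination" step reduces to a dot product of sub-component scores (each 0–100) and their weights. A minimal sketch using the AF weights above; the function name, key names, and example sub-scores are ours, not Assay's:

```python
# AF weights from the table above, expressed as fractions.
AF_WEIGHTS = {
    "mcp_server_quality": 0.25,
    "documentation_accuracy": 0.25,
    "error_message_quality": 0.20,
    "auth_complexity": 0.15,
    "rate_limit_clarity": 0.15,
}


def dimension_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of 0-100 sub-component scores; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return round(sum(sub_scores[k] * w for k, w in weights.items()), 1)


# Hypothetical package: strong docs, but no real MCP server.
af = dimension_score(
    {
        "mcp_server_quality": 20,
        "documentation_accuracy": 90,
        "error_message_quality": 75,
        "auth_complexity": 80,
        "rate_limit_clarity": 60,
    },
    AF_WEIGHTS,
)
# af == 63.5
```

Because every dimension's weights sum to 100%, the dimension total stays on the same 0–100 scale as its sub-components.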
Security Score
Measures whether it is safe for an agent to use this package.
| Sub-component | Description | Weight |
|---|---|---|
| TLS Enforcement | HTTPS required for all communication | 20% |
| Auth Strength | Mechanism quality (API keys, OAuth2, mTLS) | 25% |
| Scope Granularity | Fine-grained permission controls | 20% |
| Dependency Hygiene | Clean dependencies, no known CVEs | 15% |
| Secret Handling | Credentials via env vars/vault, never in logs | 20% |
Reliability Score
Measures whether the package works consistently over time.
| Sub-component | Description | Weight |
|---|---|---|
| Uptime / SLA | Published SLA, status page, uptime history | 25% |
| Version Stability | Stable releases, semver adherence | 25% |
| Breaking Changes History | Frequency of breaking changes, migration guides | 25% |
| Error Recovery | Retry guidance, idempotent operations | 25% |
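A useful sanity check on the three tables above is that each dimension's weights total exactly 100%, so all dimension scores share the same 0–100 scale. A sketch consolidating the published weights (dictionary layout is ours):

```python
# Published sub-component weights, in percent, per dimension.
WEIGHTS = {
    "AF": {
        "MCP Server Quality": 25, "Documentation Accuracy": 25,
        "Error Message Quality": 20, "Auth Complexity": 15,
        "Rate Limit Clarity": 15,
    },
    "Security": {
        "TLS Enforcement": 20, "Auth Strength": 25,
        "Scope Granularity": 20, "Dependency Hygiene": 15,
        "Secret Handling": 20,
    },
    "Reliability": {
        "Uptime / SLA": 25, "Version Stability": 25,
        "Breaking Changes History": 25, "Error Recovery": 25,
    },
}

# Every dimension must be a complete partition of 100%.
for dimension, parts in WEIGHTS.items():
    assert sum(parts.values()) == 100, dimension
```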
Score Interpretation
Limitations
- Point-in-time snapshots — Scores reflect the state of a package at evaluation time. Packages may have improved or degraded since.
- Public information only — We evaluate what is publicly visible. Internal documentation, private APIs, or unreleased features are not captured.
- LLM-assisted evaluation — While we use structured rubrics, LLM evaluation introduces inherent variability. We record all evaluation metadata for auditability.
- Security is not a penetration test — Security scores assess publicly visible security posture (TLS, auth mechanisms, dependency health). They do not replace security audits.
- Reliability from documentation — Uptime and stability scores are based on published SLAs and changelogs, not active monitoring.
Re-evaluation Frequency
Packages are flagged for re-evaluation when their last evaluation is over 90 days old. Re-evaluation priority is based on:
- Packages with active monitoring subscribers
- High-star / high-usage packages
- Packages where maintainers have reported changes
- Random sampling from the catalog
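The 90-day staleness rule and the priority ordering above can be sketched as a filter plus a sort key. Everything here (field names, sample catalog) is our illustration, and the random-sampling tier is omitted for brevity:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)


def is_stale(last_evaluated: datetime, now: datetime) -> bool:
    """A package is flagged when its last evaluation is over 90 days old."""
    return now - last_evaluated > STALE_AFTER


def priority(pkg: dict) -> tuple:
    """Sort key mirroring the list above: subscribers, maintainer reports, stars."""
    return (-pkg["has_subscribers"], -pkg["maintainer_reported"], -pkg["stars"])


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
catalog = [  # hypothetical packages
    {"name": "a", "last": datetime(2025, 1, 1, tzinfo=timezone.utc),
     "has_subscribers": 0, "maintainer_reported": 0, "stars": 12000},
    {"name": "b", "last": datetime(2025, 2, 1, tzinfo=timezone.utc),
     "has_subscribers": 1, "maintainer_reported": 0, "stars": 300},
    {"name": "c", "last": datetime(2025, 5, 15, tzinfo=timezone.utc),
     "has_subscribers": 0, "maintainer_reported": 0, "stars": 50},
]

queue = sorted((p for p in catalog if is_stale(p["last"], now)), key=priority)
# "c" was evaluated 17 days ago, so it is not queued; "b" outranks "a"
# despite fewer stars because it has monitoring subscribers.
```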
Maintainers can request immediate re-evaluation by opening a GitHub issue.
Score Disputes
If you maintain a package and believe its score is inaccurate, we want to hear from you. See Score Disputes & Corrections on the about page.