Scoring Methodology

How Assay evaluates packages and calculates scores. We publish this openly because transparency builds trust.

Overview

Every package receives three independent scores on a 0–100 scale: Agent Friendliness (AF), Security, and Reliability. Each dimension measures a distinct aspect of quality, and each is a weighted combination of sub-component scores.

Scores are editorial opinions based on our evaluation methodology applied to publicly available information. They are not certifications or guarantees.

Data Sources

Evaluations draw from multiple public sources:

  • GitHub repositories — README, docs, issues, releases, dependency manifests, commit history
  • Official documentation — API references, getting-started guides, error catalogs, changelogs
  • Package registries — npm, PyPI, crates.io metadata and version history
  • MCP server implementations — Tool schemas, error handling, transport support
  • Status pages & SLAs — Published uptime commitments and incident history

Evaluation Process

Each evaluation follows a structured pipeline:

  1. Discovery — Package identified via GitHub search, registry crawl, or community submission
  2. Data collection — Automated gathering of repository metadata, documentation, and dependency information
  3. Structured evaluation — LLM-assisted analysis against each scoring dimension using a standardized rubric
  4. Score calculation — Weighted combination of sub-component scores into dimension totals
  5. Audit record — Every evaluation records the model used, tokens consumed, and raw output for transparency
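
As a concrete sketch, the audit record from step 5 might look like the following structure. The field names here are illustrative assumptions, not Assay's published schema:

```python
import json

# Hypothetical audit record for one evaluation (step 5 above).
# Field names are illustrative; Assay's actual schema is not published here.
audit_record = {
    "package": "example-package",
    "evaluated_at": "2025-01-15T12:00:00Z",
    "model": "model-id",          # LLM used for the structured evaluation
    "tokens_consumed": 12345,
    "raw_output": "{...}",        # verbatim model output, kept for audit
    "scores": {"af": 73.0, "security": 81.0, "reliability": 65.0},
}

print(json.dumps(audit_record, indent=2))
```

Keeping the raw model output alongside the model identifier and token count is what makes a later audit of any individual score possible.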

Agent Friendliness (AF Score)

Measures how effectively an AI agent can use a package autonomously.

  Sub-component          | Description                                          | Weight
  MCP Server Quality     | Existence, maturity, tool schemas, transport support | 25%
  Documentation Accuracy | API docs completeness, examples, up-to-date content  | 25%
  Error Message Quality  | Structured errors, codes, recovery guidance          | 20%
  Auth Complexity        | Ease of programmatic authentication                  | 15%
  Rate Limit Clarity     | Documentation + response headers for limits          | 15%
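
The weighted combination described in step 4 of the pipeline can be sketched for the AF dimension. The weights come from the table above; the sub-scores and key names are made up for illustration:

```python
# AF weights from the table above; the dict keys are our own shorthand.
AF_WEIGHTS = {
    "mcp_server_quality": 0.25,
    "documentation_accuracy": 0.25,
    "error_message_quality": 0.20,
    "auth_complexity": 0.15,
    "rate_limit_clarity": 0.15,
}

def af_score(sub_scores: dict) -> float:
    """Weighted sum of 0-100 sub-component scores (weights sum to 1.0)."""
    assert abs(sum(AF_WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(sub_scores[k] * w for k, w in AF_WEIGHTS.items()), 1)

# Hypothetical sub-scores for one package:
example = {
    "mcp_server_quality": 80,
    "documentation_accuracy": 90,
    "error_message_quality": 70,
    "auth_complexity": 60,
    "rate_limit_clarity": 50,
}

print(af_score(example))  # 80*.25 + 90*.25 + 70*.20 + 60*.15 + 50*.15 = 73.0
```

The Security and Reliability totals follow the same pattern with their own weight tables.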

Security Score

Measures whether an agent can safely use a package.

  Sub-component      | Description                                   | Weight
  TLS Enforcement    | HTTPS required for all communication          | 20%
  Auth Strength      | Mechanism quality (API keys, OAuth2, mTLS)    | 25%
  Scope Granularity  | Fine-grained permission controls              | 20%
  Dependency Hygiene | Clean dependencies, no known CVEs             | 15%
  Secret Handling    | Credentials via env vars/vault, never in logs | 20%

Reliability Score

Measures whether a package works consistently over time.

  Sub-component            | Description                                      | Weight
  Uptime / SLA             | Published SLA, status page, uptime history       | 25%
  Version Stability        | Stable releases, semver adherence                | 25%
  Breaking Changes History | Frequency of breaking changes, migration guides  | 25%
  Error Recovery           | Retry guidance, idempotent operations            | 25%

Score Interpretation

  80–100  Excellent
  60–79   Good
  40–59   Fair
  <40     Needs Work
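
The bands above can be expressed as a simple mapping (a sketch using the published thresholds; the function name is ours):

```python
def interpret(score: float) -> str:
    """Map a 0-100 dimension score to its interpretation band."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Needs Work"

print(interpret(73.0))  # -> Good
```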

Limitations

  • Point-in-time snapshots — Scores reflect the state of a package at evaluation time. Packages may have improved or degraded since.
  • Public information only — We evaluate what is publicly visible. Internal documentation, private APIs, or unreleased features are not captured.
  • LLM-assisted evaluation — While we use structured rubrics, LLM evaluation introduces inherent variability. We record all evaluation metadata for auditability.
  • Security is not a penetration test — Security scores assess publicly visible security posture (TLS, auth mechanisms, dependency health). They do not replace security audits.
  • Reliability from documentation — Uptime and stability scores are based on published SLAs and changelogs, not active monitoring.

Re-evaluation Frequency

Packages are flagged for re-evaluation when their last evaluation is over 90 days old. Re-evaluation priority is based on:

  1. Packages with active monitoring subscribers
  2. High-star / high-usage packages
  3. Packages where maintainers have reported changes
  4. Random sampling from the catalog

Maintainers can request immediate re-evaluation by opening a GitHub issue.

Score Disputes

If you maintain a package and believe its score is inaccurate, we want to hear from you. See Score Disputes & Corrections on the about page.