Scoring Methodology

How Assay evaluates packages and calculates scores. We publish this openly because transparency builds trust.

Overview

Every package receives three independent scores on a 0–100 scale: Agent Friendliness (AF), Security, and Reliability. Each dimension measures a distinct aspect of quality, and each is a weighted combination of sub-component scores.

Scores are editorial opinions based on our evaluation methodology applied to publicly available information. They are not certifications or guarantees.

Data Sources

Evaluations draw from multiple public sources:

  • GitHub repositories — README, docs, issues, releases, dependency manifests, commit history
  • Official documentation — API references, getting-started guides, error catalogs, changelogs
  • Package registries — npm, PyPI, crates.io metadata and version history
  • MCP server implementations — Tool schemas, error handling, transport support
  • Status pages & SLAs — Published uptime commitments and incident history

Evaluation Process

Each evaluation follows a structured pipeline:

  1. Discovery — Package identified via GitHub search, registry crawl, or community submission
  2. Data collection — Automated gathering of repository metadata, documentation, and dependency information
  3. Structured evaluation — LLM-assisted analysis against each scoring dimension using a standardized rubric
  4. Score calculation — Weighted combination of sub-component scores into dimension totals
  5. Audit record — Every evaluation records the model used, tokens consumed, and raw output for transparency
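
As a concrete sketch, the audit record from step 5 might look like the following structure. The field names here are illustrative assumptions, not Assay's published schema:

```python
import json

# Hypothetical audit record for one evaluation (step 5 above).
# Field names are illustrative; Assay's actual schema is not published here.
audit_record = {
    "package": "example-package",
    "evaluated_at": "2025-01-15T12:00:00Z",
    "model": "model-id",          # LLM used for the structured evaluation
    "tokens_consumed": 12345,
    "raw_output": "{...}",        # verbatim model output, kept for audit
    "scores": {"af": 73.0, "security": 81.0, "reliability": 65.0},
}

print(json.dumps(audit_record, indent=2))
```

Keeping the raw model output alongside the model identifier and token count is what makes a later audit of any individual score possible.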

Agent Friendliness (AF Score)

Measures how effectively an AI agent can use a package autonomously.

  Sub-component          | Description                                          | Weight
  MCP Server Quality     | Existence, maturity, tool schemas, transport support | 25%
  Documentation Accuracy | API docs completeness, examples, up-to-date content  | 25%
  Error Message Quality  | Structured errors, codes, recovery guidance          | 20%
  Auth Complexity        | Ease of programmatic authentication                  | 15%
  Rate Limit Clarity     | Documentation + response headers for limits          | 15%
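
The weighted combination described in step 4 of the pipeline can be sketched for the AF dimension. The weights come from the table above; the sub-scores and key names are made up for illustration:

```python
# AF weights from the table above; the dict keys are our own shorthand.
AF_WEIGHTS = {
    "mcp_server_quality": 0.25,
    "documentation_accuracy": 0.25,
    "error_message_quality": 0.20,
    "auth_complexity": 0.15,
    "rate_limit_clarity": 0.15,
}

def af_score(sub_scores: dict) -> float:
    """Weighted sum of 0-100 sub-component scores (weights sum to 1.0)."""
    assert abs(sum(AF_WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(sub_scores[k] * w for k, w in AF_WEIGHTS.items()), 1)

# Hypothetical sub-scores for one package:
example = {
    "mcp_server_quality": 80,
    "documentation_accuracy": 90,
    "error_message_quality": 70,
    "auth_complexity": 60,
    "rate_limit_clarity": 50,
}

print(af_score(example))  # 80*.25 + 90*.25 + 70*.20 + 60*.15 + 50*.15 = 73.0
```

The Security and Reliability totals follow the same pattern with their own weight tables.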

Security Score

Measures whether an agent can safely use a package.

  Sub-component      | Description                                   | Weight
  TLS Enforcement    | HTTPS required for all communication          | 20%
  Auth Strength      | Mechanism quality (API keys, OAuth2, mTLS)    | 25%
  Scope Granularity  | Fine-grained permission controls              | 20%
  Dependency Hygiene | Clean dependencies, no known CVEs             | 15%
  Secret Handling    | Credentials via env vars/vault, never in logs | 20%

Reliability Score

Measures whether a package works consistently over time.

  Sub-component            | Description                                      | Weight
  Uptime / SLA             | Published SLA, status page, uptime history       | 25%
  Version Stability        | Stable releases, semver adherence                | 25%
  Breaking Changes History | Frequency of breaking changes, migration guides  | 25%
  Error Recovery           | Retry guidance, idempotent operations            | 25%

Score Interpretation

  80–100  Excellent
  60–79   Good
  40–59   Fair
  <40     Needs Work
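
The bands above can be expressed as a simple mapping (a sketch using the published thresholds; the function name is ours):

```python
def interpret(score: float) -> str:
    """Map a 0-100 dimension score to its interpretation band."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Needs Work"

print(interpret(73.0))  # -> Good
```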

Limitations

  • Point-in-time snapshots — Scores reflect the state of a package at evaluation time. Packages may have improved or degraded since.
  • Public information only — We evaluate what is publicly visible. Internal documentation, private APIs, or unreleased features are not captured.
  • LLM-assisted evaluation — While we use structured rubrics, LLM evaluation introduces inherent variability. We record all evaluation metadata for auditability.
  • Security is not a penetration test — Security scores assess publicly visible security posture (TLS, auth mechanisms, dependency health). They do not replace security audits.
  • Reliability from documentation — Uptime and stability scores are based on published SLAs and changelogs, not active monitoring.

Re-evaluation Frequency

Packages are flagged for re-evaluation when their last evaluation is over 90 days old. Re-evaluation priority is based on:

  1. Packages with active monitoring subscribers
  2. High-star / high-usage packages
  3. Packages where maintainers have reported changes
  4. Random sampling from the catalog

Maintainers can request immediate re-evaluation by opening a GitHub issue.

Score Disputes

If you maintain a package and believe its score is inaccurate, we want to hear from you. See Score Disputes & Corrections on the about page.