Contribute to Assay

Help build the quality layer for agentic software. Run evaluations on MCP servers and agent skills with your own tokens, submit results, and help the community find the best tools.

How Community Evaluation Works

1. Pick a Package

Choose from the evaluation queue below, or evaluate a package you use. The queue prioritizes packages with no existing evaluation or outdated scores.

2. Run the Evaluation

Use the Assay evaluation skill or CLI tool. It runs with your own LLM tokens, analyzes the package, and produces a structured JSON result.

3. Submit Results

Open a pull request to the Assay repo with your evaluation JSON. Assay reviews and merges quality submissions.
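The submission flow can be sketched locally. Note that the `evaluations/` directory layout, the branch name, and the JSON fields below are assumptions for illustration, not Assay's actual repo structure or schema:

```shell
# Sketch of preparing an evaluation submission (paths and fields are illustrative).
set -e
mkdir -p /tmp/assay-demo/evaluations && cd /tmp/assay-demo
git init -q
git config user.email you@example.com
git config user.name "Your Name"

# Hypothetical result file produced by the evaluation step.
cat > evaluations/celerystalk.json <<'EOF'
{"package": "celerystalk", "rubric_version": "2.0", "scores": {"security": 4}}
EOF

git checkout -q -b eval/celerystalk
git add evaluations/celerystalk.json
git commit -q -m "Add evaluation: celerystalk"
# From here, push to your fork and open a PR against the Assay repo.
```

The final push-and-PR step happens through your fork and the GitHub web UI (or the `gh` CLI, if you use it).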

Get Your API Key

Sign in with GitHub to get an API key for submitting evaluations. Your GitHub identity is used for contributor attribution and trust tier progression.

We request only the read:user scope — we store your username and avatar, nothing else.

GitHub OAuth is not yet configured. Contact the admin for an API key.

Trust & Quality

Reproducible Evaluations

All evaluations use Assay's standardized eval configs — deterministic prompts, pinned model versions, and structured output schemas. Results should be reproducible by anyone running the same config.
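A standardized config of this kind might look like the following sketch; the file layout, field names, and model identifier are all assumptions for illustration, not Assay's published schema:

```yaml
# hypothetical eval config; every field here is illustrative
package: celerystalk
rubric: "2.0"
model: example-model-2025-01      # pinned model version
temperature: 0                    # deterministic sampling
output_schema: evaluation-result.schema.json
```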

Cross-Validation

When multiple independent contributors evaluate the same package, agreement between submissions increases confidence. Cross-validated scores carry more weight.
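As a toy illustration of agreement between submissions (the scores are invented), the spread of contributors' scores can be checked with a one-liner:

```shell
# Three contributors scored the same package 4, 5, and 4 (invented numbers).
printf '4\n5\n4\n' | awk '{s+=$1; ss+=$1*$1; n++}
  END {m=s/n; printf "mean=%.2f variance=%.2f\n", m, ss/n - m*m}'
# prints: mean=4.33 variance=0.22
```

A low variance like this is the kind of agreement that lets cross-validated scores carry more weight.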

Spot-Check Verification

Assay re-runs approximately 10% of new contributor submissions using our own tokens as a quality gate. This builds trust gradually — established contributors earn higher trust over time.
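A deterministic way to pick roughly one in ten submissions for re-evaluation might look like this sketch; the modulo rule is invented for illustration and is not Assay's actual sampling mechanism:

```shell
# Flag every 10th submission ID for a spot-check re-run (illustrative rule only).
seq 1 30 | awk '$1 % 10 == 0 {print "re-run submission " $1}'
# prints: re-run submission 10 / 20 / 30, one per line
```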

Anti-Gaming

Package authors evaluating their own tools must disclose the relationship. Undisclosed self-evaluations that significantly diverge from independent evaluations are flagged for review.

Evaluation Queue

26151 packages need evaluation
celerystalk Skill Priority Security

An asynchronous enumeration & vulnerability scanner. Run all the tools on all the hosts.

Not yet evaluated Repo ↗
python-host Skill Priority Developer Tools

Python code that runs on a Raspberry Pi or other Linux-based board to control SwitchBot.

Not yet evaluated Repo ↗
gimp-stable-boy Skill Priority Agent Skills

GIMP plugin for AUTOMATIC1111's Stable Diffusion WebUI

Not yet evaluated Repo ↗
TikTokBot Skill Priority File Management

A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg

Not yet evaluated Repo ↗
rtdl-num-embeddings Skill Priority AI & Machine Learning

(NeurIPS 2022) On Embeddings for Numerical Features in Tabular Deep Learning

Not yet evaluated Repo ↗
rigging Skill Priority AI & Machine Learning

Lightweight LLM Interaction Framework

Not yet evaluated Repo ↗
jar3d_meta_expert Skill Priority AI & Machine Learning

Versatile agents for long running, research intensive tasks.

Not yet evaluated Repo ↗
python-whois Skill Priority Data Processing

A python module for retrieving and parsing WHOIS data

Not yet evaluated Repo ↗
arXausality Skill Priority Other

An every-so-often-updated collection of every causality + machine learning paper submitted to arXiv in the recent past.

Not yet evaluated Repo ↗
office-exploits Skill Priority Content Management

A collection of Microsoft Office exploits (https://www.sec-wiki.com)

Not yet evaluated Repo ↗
runprompt Skill Priority AI & Machine Learning

Run LLM prompts from your shell

Not yet evaluated Repo ↗
bash-lambda-layer Skill Priority Cloud Infrastructure

Run Bash scripts in AWS Lambda via Layers

Not yet evaluated Repo ↗
dir-assistant Skill Priority AI & Machine Learning

Chat with your current directory's files using a local or API LLM.

Not yet evaluated Repo ↗
markdown-crawler Skill Priority AI & Machine Learning

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG

Not yet evaluated Repo ↗
ranger Skill Priority Security

A tool for security professionals to access and interact with remote Microsoft Windows based systems.

Not yet evaluated Repo ↗
emigo Skill Priority AI & Machine Learning

Future of Agentic Development in Emacs

Not yet evaluated Repo ↗
evil-ssdp Skill Priority Other

Spoof SSDP replies and create fake UPnP devices to phish for credentials and NetNTLM challenge/response.

Not yet evaluated Repo ↗
langchain-code Skill Priority AI & Machine Learning

Gemini-cli or claude code? Why not both? LangCode combines all CLI capabilities and models in one place ☂️!

Not yet evaluated Repo ↗
zamia-speech Skill Priority AI & Machine Learning

Open tools and data for cloudless automatic speech recognition

Not yet evaluated Repo ↗
pdb-tools Skill Priority Databases

A dependency-free cross-platform swiss army knife for PDB files.

Not yet evaluated Repo ↗

Showing 20 of 26151 packages. View full queue via API →

Support the Mission

Not ready to run evaluations? You can still help. Every dollar funds compute for discovering packages, running evaluations, and keeping scores current across the ecosystem.

📋 Agent Evaluation Guide

A single document with the complete scoring rubric, JSON schema, and submission instructions. Any AI agent can fetch this URL, evaluate a package, and submit results.

View Evaluation Guide → Rubric v2.0 · Markdown format

Getting Started

Option 1: Use Any AI Agent (Recommended)

Have your AI agent fetch the evaluation guide, evaluate a package from the queue, and submit via the API. Works with Claude, GPT, Gemini, or any agent.

# Your agent fetches the guide and submits results
curl -X POST https://assay.tools/v1/evaluations \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: your-api-key" \
  -d @evaluation.json

Option 2: Assay CLI Tool

Run evaluations locally using Assay's built-in evaluator:

# Clone the repo
git clone https://github.com/Assay-Tools/assay.git
cd assay

# Run evaluation on a specific package
uv run python -m assay.evaluation.evaluator --package <package-id>

# Or batch evaluate discovered packages
uv run python -m assay.evaluation.evaluator --batch --limit 5

Option 3: Request an Evaluation

Know a package that should be in Assay? Open a GitHub issue with the package name and repo URL, and we'll add it to the queue.

5229 Packages Evaluated · 26151 Need Evaluation · 173 Need Re-evaluation

Community Powered