Contribute to Assay
Help build the quality layer for agentic software. Run evaluations on MCP servers and agent skills with your own tokens, submit results, and help the community find the best tools.
How Community Evaluation Works
Pick a Package
Choose from the evaluation queue below, or evaluate a package you use. The queue prioritizes packages with no existing evaluation or outdated scores.
Run the Evaluation
Use the Assay evaluation skill or CLI tool. It runs using your own LLM tokens, analyzes the package, and produces a structured JSON result.
Submit Results
Open a pull request to the Assay repo with your evaluation JSON. Assay reviews and merges quality submissions.
Get Your API Key
Sign in with GitHub to get an API key for submitting evaluations. Your GitHub identity is used for contributor attribution and trust tier progression.
We only request the read:user scope; we store your username and avatar, nothing else.
Trust & Quality
Reproducible Evaluations
All evaluations use Assay's standardized eval configs — deterministic prompts, pinned model versions, and structured output schemas. Results should be reproducible by anyone running the same config.
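As an illustration, a pinned eval config might look like the following. The field names and model identifier here are invented for this sketch and are not Assay's actual schema; the authoritative format lives in the evaluation guide:

```json
{
  "rubric": "v2.0",
  "model": "claude-sonnet-4-20250514",
  "temperature": 0,
  "seed": 42,
  "prompt_set": "mcp-server-default",
  "output_schema": "evaluation-result-v2"
}
```

Pinning the model version and fixing temperature and seed is what makes a re-run by another contributor comparable to the original.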
Cross-Validation
When multiple independent contributors evaluate the same package, agreement between submissions increases confidence. Cross-validated scores carry more weight.
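To make the idea concrete, here is a toy Python sketch of agreement-weighted confidence. The formula, thresholds, and baseline values are invented for illustration only and are not Assay's actual weighting:

```python
import statistics

def cross_validated_confidence(scores, tolerance=0.5):
    """Toy illustration: confidence grows when independent scores agree.

    scores: overall scores (e.g. on a 0-10 scale) from independent contributors.
    Returns a confidence in [0, 1]; a single submission gets a low baseline.
    """
    if len(scores) < 2:
        return 0.25  # one (or zero) submissions: low baseline confidence
    spread = statistics.stdev(scores)
    # Agreement decays to 0 once the spread reaches the tolerance
    agreement = max(0.0, 1.0 - spread / tolerance)
    # More independent raters carry more weight, capped at 1.0
    rater_weight = min(len(scores) / 4, 1.0)
    return round(0.25 + 0.75 * agreement * rater_weight, 2)

print(cross_validated_confidence([8.0]))            # single rater: baseline
print(cross_validated_confidence([8.0, 8.2, 7.9]))  # close agreement: higher
print(cross_validated_confidence([8.0, 3.0]))       # strong disagreement: baseline
```

The key property is the middle case: several submissions that land close together raise confidence well above what any single submission can reach.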
Spot-Check Verification
Assay re-runs approximately 10% of submissions from new contributors using our own tokens as a quality gate. Trust is earned gradually: established contributors gain higher trust tiers over time.
Anti-Gaming
Package authors evaluating their own tools must disclose the relationship. Undisclosed self-evaluations that significantly diverge from independent evaluations are flagged for review.
Evaluation Queue
26,151 packages need evaluation
An asynchronous enumeration & vulnerability scanner. Run all the tools on all the hosts.
Python code that runs on a Raspberry Pi or other Linux-based boards to control SwitchBot.
GIMP plugin for AUTOMATIC1111's Stable Diffusion WebUI
A TikTok bot that downloads trending TikTok videos and compiles them using FFmpeg
(NeurIPS 2022) On Embeddings for Numerical Features in Tabular Deep Learning
Lightweight LLM Interaction Framework
Versatile agents for long-running, research-intensive tasks.
A python module for retrieving and parsing WHOIS data
An every-so-often-updated collection of every causality + machine learning paper submitted to arXiv in the recent past.
office-exploits: a collection of Microsoft Office exploits. https://www.sec-wiki.com
Run LLM prompts from your shell
Run Bash scripts in AWS Lambda via Layers
Chat with your current directory's files using a local or API LLM.
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
A tool for security professionals to access and interact with remote Microsoft Windows based systems.
Future of Agentic Development in Emacs
Spoof SSDP replies and create fake UPnP devices to phish for credentials and NetNTLM challenge/response.
Gemini CLI or Claude Code? Why not both? LangCode combines all CLI capabilities and models in one place ☂️!
Open tools and data for cloudless automatic speech recognition
A dependency-free cross-platform swiss army knife for PDB files.
Showing 20 of 26,151 packages. View full queue via API →
Support the Mission
Not ready to run evaluations? You can still help. Every dollar funds compute for package discovery and evaluation, and for keeping scores current across the ecosystem.
Agent Evaluation Guide
A single document with the complete scoring rubric, JSON schema, and submission instructions. Any AI agent can fetch this URL, evaluate a package, and submit results.
View Evaluation Guide → Rubric v2.0 · Markdown format
Getting Started
Option 1: Use Any AI Agent (Recommended)
Have your AI agent fetch the evaluation guide, evaluate a package from the queue, and submit via the API. Works with Claude, GPT, Gemini, or any agent.
# Your agent fetches the guide and submits results
curl -X POST https://assay.tools/v1/evaluations \
-H "Content-Type: application/json" \
-H "X-Api-Key: your-api-key" \
-d @evaluation.json
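For orientation, the `evaluation.json` payload might look something like the following. These field names are guesses for illustration only; the authoritative JSON schema is in the evaluation guide:

```json
{
  "package_id": "example/web-crawler",
  "rubric_version": "2.0",
  "scores": {
    "documentation": 7,
    "security": 8,
    "reliability": 6
  },
  "overall": 7.0,
  "model": "claude-sonnet-4-20250514",
  "evaluated_at": "2025-01-15T12:00:00Z"
}
```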
Option 2: Assay CLI Tool
Run evaluations locally using Assay's built-in evaluator:
# Clone the repo
git clone https://github.com/Assay-Tools/assay.git
cd assay
# Run evaluation on a specific package
uv run python -m assay.evaluation.evaluator --package <package-id>
# Or batch evaluate discovered packages
uv run python -m assay.evaluation.evaluator --batch --limit 5
Option 3: Request an Evaluation
Know a package that should be in Assay? Open a GitHub issue with the package name and repo URL, and we'll add it to the queue.