Criterion.rs

Statistics-driven micro-benchmarking library for Rust. Criterion.rs runs benchmarks multiple times, applies statistical analysis to detect performance regressions and improvements, generates HTML reports with interactive charts, and integrates with cargo bench. Uses Welch's t-test to determine if performance changes are statistically significant — reducing false positives from benchmark noise. The standard benchmarking tool for Rust.

Evaluated Mar 06, 2026 · v0.5+
Homepage ↗ · Repo ↗ · Developer Tools · Tags: rust, benchmarking, performance, statistics, testing, open-source
⚙ Agent Friendliness
68
/ 100
Can an agent use this?
🔒 Security
89
/ 100
Is it safe for agents?
⚡ Reliability
86
/ 100
Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality
--
Documentation
88
Error Messages
82
Auth Simplicity
100
Rate Limits
98

🔒 Security

TLS Enforcement
90
Auth Strength
90
Scope Granularity
88
Dep. Hygiene
88
Secret Handling
90

Local-only benchmarking tool — no network calls. No security concerns for the library itself.

⚡ Reliability

Uptime/SLA
90
Version Stability
85
Breaking Changes
82
Error Recovery
88

Best When

You need statistically valid micro-benchmarks for Rust agent code — Criterion's noise reduction and regression detection distinguish real performance changes from benchmark noise.

Avoid When

You need load testing or end-to-end performance measurement — use k6, wrk, or Locust for service-level performance testing.

Use Cases

  • Benchmark agent algorithm performance in Rust — measure serialization, parsing, or computation hotspots with statistically valid results
  • Detect performance regressions in Rust agent code with Criterion's baseline comparison — compare current vs committed performance
  • Profile different implementation strategies (SIMD vs scalar, cache-friendly vs not) with Criterion's parametric benchmarks
  • Generate shareable HTML performance reports for Rust agent library releases to communicate performance characteristics
  • Benchmark against multiple input sizes with BenchmarkGroup to understand agent algorithm scaling behavior

Not For

  • End-to-end integration benchmarking — Criterion is for micro-benchmarks of specific functions, not full system load testing
  • Profiling to find hotspots — use perf, flamegraph, or cargo-flamegraph for profiling; Criterion measures pre-identified bottlenecks
  • Go, Python, or non-Rust code — Criterion is Rust-only

Interface

REST API
No
GraphQL
No
gRPC
No
MCP Server
No
SDK
Yes
Webhooks
No

Authentication

Methods: none
OAuth: No Scopes: No

Local benchmarking library — no external auth or network calls.

Pricing

Model: open_source
Free tier: Yes
Requires CC: No

Apache 2.0 / MIT dual-licensed open source Rust crate.

Agent Metadata

Pagination
none
Idempotent
Full
Retry Guidance
Not documented

Known Gotchas

  • Criterion requires the benchmarked function to use black_box() to prevent compiler optimization from eliminating the computation being measured — forgotten black_box() produces meaningless 0ns benchmarks
  • Benchmarks must be compiled with optimizations — `cargo bench` uses the optimized bench profile by default, but running benchmark code through `cargo test` in debug mode or overriding the bench profile produces unoptimized, misleading numbers
  • Criterion baseline comparison requires saving a baseline first (cargo bench -- --save-baseline main) and comparing against it (cargo bench -- --baseline main) — without a named baseline, Criterion only compares against the immediately previous run
  • Async benchmarks require enabling one of Criterion's async features (e.g. async_tokio) and driving the future with a runtime via b.to_async(...) — sync Criterion closures cannot await async agent code directly
  • CI benchmark variance is high due to shared infrastructure — Criterion results on CI should be used for trend detection, not absolute numbers; run on dedicated hardware for precise measurements
  • Criterion's HTML report charts prefer gnuplot when installed and fall back to the pure-Rust plotters backend otherwise; very old versions degrade to text-only output without gnuplot — alternatively, use cargo-criterion for reports with no gnuplot dependency
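The baseline workflow mentioned above can be run as follows (the baseline name `main` is illustrative; baselines are stored locally under target/criterion/, not in version control):

```shell
# Save the current results as a named baseline
cargo bench -- --save-baseline main

# ...make changes, then compare against the saved baseline
cargo bench -- --baseline main
```

Running the comparison prints whether each benchmark improved, regressed, or stayed within noise relative to the saved baseline.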

Full Evaluation Report

Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for Criterion.rs.

$99

Scores are editorial opinions as of 2026-03-06.
