Criterion.rs
Statistics-driven micro-benchmarking library for Rust. Criterion.rs runs each benchmark many times, applies statistical analysis to detect performance regressions and improvements, generates HTML reports with interactive charts, and integrates with `cargo bench`. A bootstrapped two-sample t-test determines whether a performance change is statistically significant, reducing false positives from benchmark noise. The de facto standard benchmarking tool for Rust.
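A minimal benchmark sketch, following the shape of Criterion's standard setup. The `fibonacci` function is a placeholder workload, and the crate version in the comment is illustrative; the file lives under `benches/` and is registered in `Cargo.toml` with `harness = false` so Criterion's own harness runs instead of libtest's.

```rust
// benches/fib.rs — registered in Cargo.toml roughly as:
//   [dev-dependencies]
//   criterion = "0.5"          // version is illustrative
//   [[bench]]
//   name = "fib"
//   harness = false
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Placeholder workload — substitute the agent code you want to measure.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn bench_fib(c: &mut Criterion) {
    // black_box keeps the optimizer from const-folding the call away.
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, bench_fib);
criterion_main!(benches);
```

Run with `cargo bench`; Criterion samples the closure repeatedly and reports a confidence interval rather than a single timing.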
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Local-only benchmarking tool — no network calls. No security concerns for the library itself.
⚡ Reliability
Best When
You need statistically valid micro-benchmarks for Rust agent code — Criterion's noise reduction and regression detection separate real performance changes from benchmark noise.
Avoid When
You need load testing or end-to-end performance measurement — use k6, wrk, or Locust for service-level performance testing.
Use Cases
- Benchmark agent algorithm performance in Rust — measure serialization, parsing, or computation hotspots with statistically valid results
- Detect performance regressions in Rust agent code with Criterion's baseline comparison — compare current performance against a saved baseline
- Profile different implementation strategies (SIMD vs scalar, cache-friendly vs not) with Criterion's parametric benchmarks
- Generate shareable HTML performance reports for Rust agent library releases to communicate performance characteristics
- Benchmark against multiple input sizes with BenchmarkGroup to understand agent algorithm scaling behavior
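The input-size scaling case above can be sketched with `BenchmarkGroup`. The `process` function and the chosen sizes are placeholder assumptions standing in for real agent code; `Throughput::Bytes` makes the report show bytes/second alongside raw timings.

```rust
use criterion::{
    black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput,
};

// Hypothetical workload standing in for agent serialization/parsing code.
fn process(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

fn bench_scaling(c: &mut Criterion) {
    let mut group = c.benchmark_group("process_scaling");
    // Illustrative sizes — pick the range your agent workload actually sees.
    for size in [1_024usize, 16_384, 262_144] {
        let data = vec![0xAB_u8; size];
        // Report throughput so scaling behavior is visible in the HTML report.
        group.throughput(Throughput::Bytes(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &data, |b, d| {
            b.iter(|| process(black_box(d)))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_scaling);
criterion_main!(benches);
```

Each size becomes a separate benchmark line in the report, which makes sub-linear or super-linear scaling easy to spot.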
Not For
- End-to-end integration benchmarking — Criterion targets micro-benchmarks of specific functions, not full-system load testing
- Profiling to find hotspots — use perf, flamegraph, or cargo-flamegraph to locate them; Criterion measures already-identified bottlenecks
- Go, Python, or other non-Rust code — Criterion.rs is Rust-only
Interface
Authentication
Local benchmarking library — no external auth or network calls.
Pricing
Apache 2.0 / MIT dual-licensed open source Rust crate.
Agent Metadata
Known Gotchas
- ⚠ Benchmarked code must pass its inputs and outputs through black_box() so the optimizer cannot eliminate the computation under measurement — a forgotten black_box() yields meaningless near-zero timings
- ⚠ cargo bench builds with the optimized bench profile by default, but overriding [profile.bench] (e.g. opt-level = 0) or running the benchmark binary from a debug build produces unoptimized, invalid measurements
- ⚠ Baseline comparison requires saving a baseline first (cargo bench -- --save-baseline main) — without a saved baseline, regression detection has nothing to compare against
- ⚠ Async benchmarks need Criterion's async support (an executor feature such as async_tokio plus Bencher::to_async) — sync bench closures cannot await async agent code directly
- ⚠ CI benchmark variance is high on shared infrastructure — treat CI results as trend indicators, not absolute numbers; run on dedicated hardware for precise measurements
- ⚠ Chart-quality HTML reports prefer gnuplot — recent Criterion versions fall back to the plotters backend when gnuplot is missing, and cargo-criterion offers reports without the gnuplot dependency
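The baseline workflow from the gotcha above can be sketched as a shell sequence. Branch names (`main`, `my-feature`) and the baseline name are placeholders; the `--save-baseline` and `--baseline` flags are Criterion's own CLI options, passed through `cargo bench` after `--`.

```shell
# Save a baseline from the revision you consider "known good"
git checkout main
cargo bench -- --save-baseline main

# Switch to your change and compare against the saved baseline
git checkout my-feature
cargo bench -- --baseline main
```

Criterion then reports each benchmark as improved, regressed, or within noise relative to the saved baseline.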
Full Evaluation Report
Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for Criterion.rs.
Scores are editorial opinions as of 2026-03-06.