Scalene
High-performance CPU, GPU, and memory profiler for Python with line-level granularity — profiles both Python and native code without significant overhead. Scalene features:
- • scalene script.py for CLI profiling, plus a @profile decorator for targeted function profiling
- • line-level CPU time, split into Python vs. native (C) execution
- • memory allocation/consumption per line
- • GPU time tracking (CUDA)
- • memory leak detection
- • copy volume tracking (identifies data-movement bottlenecks)
- • web-based HTML report and JSON output for programmatic analysis
- • AI-powered optimization suggestions (--openai)
- • minimal overhead (<10% CPU impact vs. 100%+ for cProfile)
- • async and multiprocessing support
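The @profile decorator restricts profiling to chosen functions. A minimal sketch (hot_path and its workload are hypothetical): Scalene injects profile as a builtin only when it runs the script, so a small shim lets the same file also run under plain python:

```python
# When run as `scalene agent.py`, Scalene injects a `profile` builtin
# and profiles only decorated functions; this shim makes the script
# also work as `python agent.py`.
try:
    profile
except NameError:
    def profile(func):
        return func  # no-op fallback when not running under Scalene

@profile
def hot_path(n):
    # Hypothetical workload: only this function is profiled under Scalene.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    print(hot_path(10_000))
```

The shim is a common pattern for line_profiler-style decorators; it keeps the decorator zero-cost outside profiling runs.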
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Profiling output may expose code structure and variable names — treat profile reports as sensitive. AI suggestion feature (--openai) sends code snippets to OpenAI — disable for sensitive code. No network calls during profiling itself.
⚡ Reliability
Best When
Finding Python performance bottlenecks at line level across CPU, memory, and GPU — Scalene's simultaneous line-level CPU+memory+GPU view in one report is unique; it shows whether to optimize Python code, move work into NumPy, or fix memory patterns.
Avoid When
You need exact call counts (use cProfile), are on Windows, or need zero-overhead production attach (use py-spy).
Use Cases
- • Agent CPU+memory co-profiling — scalene --html --outfile profile.html agent.py — generates an HTML report showing CPU time AND memory per line; the agent developer sees that line 42 uses 80% CPU and 2GB memory simultaneously; standard profilers show CPU or memory separately, while Scalene shows both in a unified view
- • Agent Python vs native breakdown — Scalene's report shows 'Python time' vs 'System time' per line; the agent developer discovers numpy.dot() spends 5% in Python overhead and 95% in C BLAS, correctly identifying it as already optimal; a line running a pure Python loop spends 90% in Python time, a prime optimization target
- • Agent memory leak detection — scalene agent_worker.py --memory-leak-threshold 10 — detects lines where memory grows monotonically across samples; agent long-running worker diagnosed: list appends inside loop never get garbage collected because list held in module-level cache dict
- • Agent GPU profiling — scalene ml_agent.py — tracks GPU time per line in CUDA code; the agent ML engineer identifies that CPU-side data preprocessing consumes 60% of total time while the GPU sits idle waiting; the GPU time column shows where model inference actually runs versus where time is wasted
- • Agent copy volume tracking — Scalene reports 'Copy' column showing bytes copied between C and Python; agent code doing df.values repeatedly in loop copies 100MB per iteration; Scalene pinpoints exact line causing data movement overhead invisible to CPU/memory-only profilers
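The Python-vs-native breakdown in the second use case can be reproduced with a stdlib-only sketch (function names and data are hypothetical): Scalene would attribute the interpreted loop almost entirely to Python time, while the C-implemented built-in sum() shows up as native/system time:

```python
def py_sum(xs):
    # Interpreted bytecode loop: Scalene would attribute this
    # line-by-line to "Python" time, a prime optimization target.
    total = 0
    for x in xs:
        total += x
    return total

def native_sum(xs):
    # Built-in sum() executes in C: Scalene would attribute this
    # mostly to native time, i.e. already close to optimal.
    return sum(xs)

data = list(range(1_000_000))
assert py_sum(data) == native_sum(data) == 499_999_500_000
```

Both functions compute the same result; only the per-line time attribution differs, which is exactly the signal the Python/System split surfaces.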
Not For
- • Tracing (exact call counts) — Scalene is sampling-based; for exact call counts use cProfile or py-trace
- • Windows — Scalene has limited Windows support; use cProfile/py-spy on Windows
- • Zero-overhead production monitoring — Scalene overhead is low (1-5%) but not zero; use py-spy for attach-and-detach production profiling
Interface
Authentication
No auth — local profiling tool. Optional AI suggestions use OpenAI API key.
Pricing
Scalene is Apache 2.0 licensed. Free for all use. AI suggestions feature uses OpenAI API separately (optional).
Agent Metadata
Known Gotchas
- ⚠ scalene must run a script, not import a module — scalene agent.py runs agent.py as a script; scalene cannot profile python -m module invocations; agent packages must ship a runnable entry-point script or use the @profile decorator to profile specific functions
- ⚠ HTML report opens a browser by default — scalene agent.py opens a browser with a live HTML report; CI/headless environments need scalene --json --outfile profile.json agent.py for machine-readable output, or scalene --html --outfile report.html agent.py to save the report without opening it
- ⚠ GPU profiling requires CUDA and cupy/torch — scalene shows GPU time only for GPU operations via CUDA; pure Python or NumPy CPU operations show 0 GPU time; agent ML profiling needs CUDA installed and actual GPU operations for the GPU column to populate
- ⚠ Memory profiling shows current not peak — Scalene memory column shows memory in use at sampling time, not peak allocation; agent code that allocates and frees rapidly may show low memory despite high churn; use --memory-leak-threshold to track trends over time
- ⚠ Multiprocessing requires --profile-all — scalene agent.py profiles only the main process by default, so multiprocessing.Pool() workers in agent code go unprofiled; use scalene --profile-all agent.py to include child processes; any pool/parallel-map code needs this explicit flag
- ⚠ Copy column indicates Python↔C data movement — high Copy values mean frequent conversion between Python objects and C arrays; agent code doing np.array(python_list) inside loop shows high copy cost; move numpy creation outside loop or use pre-allocated arrays to eliminate repeated copy
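The copy-volume gotcha can be illustrated with the stdlib array module standing in for NumPy (the function names and sizes are hypothetical): building a C-backed array from a Python list inside the loop copies every element on each iteration, which Scalene's Copy column would flag; hoisting the conversion out of the loop removes the repeated movement:

```python
import array

values = list(range(100_000))

def copies_per_iteration(n_iters):
    # Anti-pattern: Python list -> C buffer conversion inside the loop;
    # each array.array() call copies all elements again (high Copy column).
    out = 0.0
    for _ in range(n_iters):
        buf = array.array("d", values)  # repeated Python<->C copy
        out += buf[0]
    return out

def copy_once(n_iters):
    # Fix: convert once before the loop; data crosses the
    # Python/C boundary a single time.
    buf = array.array("d", values)
    out = 0.0
    for _ in range(n_iters):
        out += buf[0]
    return out

assert copies_per_iteration(3) == copy_once(3)
```

The same restructuring applies to np.array(python_list) or df.values inside a loop: allocate or convert once, then index into the pre-built buffer.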
Alternatives
Full Evaluation Report
Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for Scalene.
Scores are editorial opinions as of 2026-03-06.