librosa

Python library for audio and music analysis, built on NumPy, covering feature extraction and signal processing. Main entry points: librosa.load() (audio loading with resampling), librosa.feature.melspectrogram(), librosa.feature.mfcc(), librosa.feature.chroma_stft(), librosa.beat.beat_track() (tempo and beats), librosa.onset.onset_detect(), librosa.effects.pitch_shift() and librosa.effects.time_stretch(), librosa.stft() and librosa.istft(), librosa.display for visualization, and harmonic/percussive source separation, plus 50+ audio feature functions. Integrates with matplotlib for visualization and scikit-learn for ML. A standard choice for music information retrieval and audio ML feature extraction.

Evaluated Mar 06, 2026, v0.10.x

AI & Machine Learning python librosa audio music signal-processing mel-spectrogram beat-tracking mfcc

⚙ Agent Friendliness

/ 100

Can an agent use this?

🔒 Security

/ 100

Is it safe for agents?

⚡ Reliability

/ 100

Does it work consistently?

Score Breakdown

⚙ Agent Friendliness

MCP Quality

Documentation

Error Messages

Auth Simplicity

Rate Limits

🔒 Security

TLS Enforcement

Auth Strength

Scope Granularity

Dep. Hygiene

Secret Handling

Local audio analysis — no network access, no data exfiltration. Audio file loading via soundfile/audioread — validate audio file sources for agent pipelines handling user-uploaded content. No known security concerns beyond standard Python dependency hygiene.

⚡ Reliability

Uptime/SLA

Version Stability

Breaking Changes

Error Recovery

Best When

Extracting audio features for ML model training (MFCC, mel spectrograms, chroma, beat features) or analyzing music/audio with NumPy-based pipelines — librosa is the standard Python audio analysis library with the richest feature extraction API.

Avoid When

You need GPU acceleration or real-time processing, or you are training PyTorch models; torchaudio covers the GPU case and integrates better with DataLoader.

Use Cases

• Agent audio feature extraction — y, sr = librosa.load('audio.wav', sr=22050); mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000); mel_db = librosa.power_to_db(mel, ref=np.max) — mel spectrogram in dB for agent audio classifier; standard preprocessing for music genre classification
• Agent MFCC features — mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13); mfcc_delta = librosa.feature.delta(mfcc) — 13 MFCC coefficients + deltas for agent speech/audio classification; standard features for keyword spotting and audio event detection
• Agent beat analysis — tempo, beats = librosa.beat.beat_track(y=y, sr=sr); beat_times = librosa.frames_to_time(beats, sr=sr) — detect tempo (BPM) and beat positions; agent music analysis pipeline extracts rhythmic structure; music-synchronized agent actions
• Agent pitch shifting — shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2) — shift audio pitch by 2 semitones; agent audio augmentation for voice/instrument training data; librosa.effects.time_stretch(y, rate=1.2) changes speed without pitch change
• Agent harmonic separation — y_harmonic, y_percussive = librosa.effects.hpss(y) — separate harmonic (melodic) and percussive (rhythm) components; agent music analysis isolates melody from rhythm; harmonic component used for pitch/chord analysis, percussive for beat tracking

Not For

• Real-time audio processing — librosa is offline analysis; for real-time use sounddevice or torchaudio.io.StreamReader
• GPU-accelerated processing — librosa is CPU/NumPy only; for GPU audio transforms use torchaudio
• Professional audio production — librosa is analysis-focused; for production audio editing use soundfile, pydub, or DAW software

Interface

REST API

GraphQL

gRPC

MCP Server

SDK

Yes

Webhooks

Authentication

Methods: none

OAuth: No Scopes: No

No auth — local audio processing library.

Pricing

Model: open_source

Free tier: Yes

Requires CC: No

librosa is ISC licensed. Free for all use.

Agent Metadata

Pagination

none

Idempotent

Full

Retry Guidance

Not documented

Known Gotchas

⚠ librosa.load returns float32 samples normalized to [-1, 1]: unlike scipy.io.wavfile.read, or soundfile.read with dtype='int16', it does not preserve the file's original int16 values (soundfile.read itself defaults to normalized float64); agent code that assumes the int16 range (-32768 to 32767) will compute wrong amplitudes and thresholds on librosa-loaded audio; convert explicitly if integer PCM is actually required
⚠ Default sr=22050 resamples audio — librosa.load('audio.wav') resamples to 22050 Hz regardless of source; pass sr=None to preserve original sample rate: y, sr = librosa.load('audio.wav', sr=None); agent code comparing features from different-rate audio must use consistent sr
⚠ Mono conversion is default — librosa.load('stereo.wav') returns mono by default (averaged channels); librosa.load('audio.wav', mono=False) returns (2, samples) for stereo; agent code expecting stereo gets mono; be explicit about mono=True/False for agent audio pipelines
⚠ STFT frame/sample unit confusion — librosa.feature.melspectrogram returns (n_mels, n_frames); n_frames depends on hop_length; librosa.frames_to_time(frames, sr=sr, hop_length=512) converts frames to seconds; agent code mixing frame and sample indices in time calculations gets wrong timestamps
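⚠ MP3 loading depends on the audio backend: librosa.load tries soundfile first, and soundfile reads MP3 only when it is built against libsndfile 1.1.0 or newer; otherwise librosa falls back to audioread, which needs a decoder such as ffmpeg (this fallback is deprecated as of librosa 0.10); agent environments with an older libsndfile and no ffmpeg cannot load MP3 with librosa; upgrade soundfile, install ffmpeg, or convert to WAV before agent processing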
⚠ For visualization, call power_to_db with ref=np.max rather than the default ref=1.0: librosa.power_to_db(mel, ref=np.max) scales relative to the peak value, so the output spans 0 to -80 dB (the default top_db=80 clips the floor); ref=1.0 gives absolute dB values, which are often large negative numbers; agent mel spectrogram visualizations should use ref=np.max for an interpretable display

Alternatives

torchaudio-api whisper-api


Scores are editorial opinions as of 2026-03-06.