librosa
Python audio and music analysis library for NumPy-based feature extraction and signal processing. Key functions: librosa.load() (audio loading with resampling), librosa.feature.melspectrogram(), librosa.feature.mfcc(), librosa.feature.chroma_stft(), librosa.beat.beat_track() (tempo and beats), librosa.onset.onset_detect(), librosa.effects.pitch_shift() and librosa.effects.time_stretch(), librosa.stft() and librosa.istft(), librosa.display for visualization, and librosa.effects.hpss() for harmonic/percussive source separation, alongside 50+ audio feature functions. Inputs and outputs are NumPy arrays, so results plug directly into matplotlib for visualization and scikit-learn for ML. It is the standard audio analysis library for music information retrieval and audio ML feature extraction.
Score Breakdown
⚙ Agent Friendliness
🔒 Security
Local audio analysis — no network access, no data exfiltration. Audio file loading via soundfile/audioread — validate audio file sources for agent pipelines handling user-uploaded content. No known security concerns beyond standard Python dependency hygiene.
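One way to act on the advice about user-uploaded content is to check a file before it reaches librosa.load. The sketch below is illustrative only: it handles PCM WAV uploads with the standard-library wave module, and the function name check_wav_upload and both limits are hypothetical choices, not part of librosa.

```python
import io
import wave

# Hypothetical limits for illustration; tune them for the actual pipeline.
MAX_BYTES = 50 * 1024 * 1024
MAX_SECONDS = 600.0


def check_wav_upload(data, max_bytes=MAX_BYTES, max_seconds=MAX_SECONDS):
    """Reject oversized, unparseable, or overlong WAV uploads before decoding."""
    if len(data) > max_bytes:
        raise ValueError("upload exceeds size limit")
    try:
        # Only the header is parsed here; no sample data is decoded.
        with wave.open(io.BytesIO(data), "rb") as wav:
            channels = wav.getnchannels()
            sample_rate = wav.getframerate()
            n_frames = wav.getnframes()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable PCM WAV file: {exc}") from exc
    if sample_rate <= 0:
        raise ValueError("invalid sample rate in header")
    duration = n_frames / sample_rate
    if duration > max_seconds:
        raise ValueError("audio exceeds duration limit")
    return {"channels": channels, "sample_rate": sample_rate, "duration": duration}
```

A caller would run this on the raw upload bytes and pass the file on to librosa.load only if no ValueError is raised. Other formats (MP3, FLAC, OGG) would need a different header check.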
⚡ Reliability
Best When
Extracting audio features for ML model training (MFCC, mel spectrograms, chroma, beat features) or analyzing music/audio with NumPy-based pipelines — librosa is the standard Python audio analysis library with the richest feature extraction API.
Avoid When
You need GPU acceleration or real-time processing, or you are training PyTorch models (torchaudio provides GPU transforms and better DataLoader integration).
Use Cases
- Audio feature extraction: y, sr = librosa.load('audio.wav', sr=22050); mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000); mel_db = librosa.power_to_db(mel, ref=np.max). This yields a mel spectrogram in dB for an agent's audio classifier and is standard preprocessing for music genre classification.
- MFCC features: mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13); mfcc_delta = librosa.feature.delta(mfcc). This yields 13 MFCC coefficients plus their deltas for speech/audio classification, the standard features for keyword spotting and audio event detection.
- Beat analysis: tempo, beats = librosa.beat.beat_track(y=y, sr=sr); beat_times = librosa.frames_to_time(beats, sr=sr). This detects tempo (BPM) and beat positions, so an agent's music analysis pipeline can extract rhythmic structure or synchronize actions to the music.
- Pitch shifting: shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2) shifts the pitch up by 2 semitones, which is useful for augmenting voice/instrument training data. librosa.effects.time_stretch(y, rate=1.2) changes speed without changing pitch.
- Harmonic separation: y_harmonic, y_percussive = librosa.effects.hpss(y) separates the harmonic (melodic) and percussive (rhythmic) components. The harmonic component is used for pitch/chord analysis and the percussive component for beat tracking.
Not For
- Real-time audio processing: librosa is for offline analysis; for real-time use sounddevice or torchaudio.io.StreamReader
- GPU-accelerated processing: librosa is CPU/NumPy only; for GPU audio transforms use torchaudio
- Professional audio production: librosa is analysis-focused; for production audio editing use soundfile, pydub, or DAW software
Interface
Authentication
No auth — local audio processing library.
Pricing
librosa is ISC licensed. Free for all use.
Agent Metadata
Known Gotchas
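- ⚠ librosa.load returns float32 samples scaled to [-1, 1]: integer PCM is divided by its full-scale value, not peak-normalized. Loaders such as scipy.io.wavfile.read return the file's native int16 values instead (soundfile.read also returns floats unless dtype='int16' is requested). Agent code that assumes the int16 range will misread librosa audio as near-silent, and casting it straight to an integer type truncates almost every sample to 0. Do not expect values in the ±32768 range from librosa.load; scale and clip explicitly if int16 output is needed.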
- ⚠ Default sr=22050 resamples audio: librosa.load('audio.wav') resamples to 22050 Hz regardless of the source rate. Pass sr=None to keep the original rate: y, sr = librosa.load('audio.wav', sr=None). Agent code that compares features across files must use one consistent sr.
- ⚠ Mono conversion is the default: librosa.load('stereo.wav') averages the channels into a mono signal of shape (samples,). librosa.load('stereo.wav', mono=False) returns shape (2, samples). Agent code that expects stereo silently gets mono, so set mono=True or mono=False explicitly in agent audio pipelines.
- ⚠ Frame/sample unit confusion: librosa.feature.melspectrogram returns shape (n_mels, n_frames), where n_frames depends on hop_length (default 512), so a frame index is neither a sample index nor a time. Convert with librosa.frames_to_time(frames, sr=sr, hop_length=512), passing the same hop_length that was used for analysis. Agent code that mixes frame and sample indices in time calculations gets wrong timestamps.
- ⚠ Loading MP3 depends on the decoding backend: soundfile reads MP3 only when it is built against libsndfile 1.1.0 or newer. Otherwise librosa falls back to audioread, which needs ffmpeg (or another system decoder), and that fallback has been deprecated since librosa 0.10. Agent environments with neither cannot load MP3; install ffmpeg, upgrade soundfile/libsndfile, or convert to WAV before agent processing.
- ⚠ For visualization, call power_to_db with ref=np.max rather than the default ref=1.0: librosa.power_to_db(mel, ref=np.max) maps the loudest bin to 0 dB, and the default top_db=80 floors everything else at -80 dB, giving a readable 0 to -80 dB range. ref=1.0 keeps absolute dB values, which are often large negative numbers. Use ref=np.max for interpretable mel spectrogram displays in agent pipelines.
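The unit conversions behind several of these gotchas are simple arithmetic. The sketch below re-implements them in plain Python purely for illustration; the function names are hypothetical, and real code should call librosa.frames_to_time and librosa.power_to_db, which operate on NumPy arrays.

```python
import math


def int16_to_float(samples):
    """Scale int16 PCM values into [-1.0, 1.0), the range float loaders return."""
    return [s / 32768.0 for s in samples]


def frame_count(n_samples, hop_length=512):
    """Number of frames from a centered STFT: 1 + n_samples // hop_length."""
    return 1 + n_samples // hop_length


def frame_to_time(frame, sr, hop_length=512):
    """Frame index -> seconds; hop_length must match the one used for analysis."""
    return frame * hop_length / sr


def power_to_db_sketch(power, ref=1.0, amin=1e-10, top_db=80.0):
    """Simplified dB conversion: 10*log10(power/ref), floored top_db below the peak."""
    db = [10.0 * math.log10(max(amin, p)) - 10.0 * math.log10(max(amin, ref))
          for p in power]
    floor = max(db) - top_db
    return [max(d, floor) for d in db]


power = [4.0, 0.04, 4e-12]
relative = power_to_db_sketch(power, ref=max(power))  # peak at 0 dB, floor at -80 dB
absolute = power_to_db_sketch(power, ref=1.0)         # same shape, shifted by the peak level
```

The relative and absolute results differ only by a constant offset (the peak level in dB), which is why ref=np.max changes how a plot reads but not its shape.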
Alternatives
Full Evaluation Report
Detailed scoring breakdown, competitive positioning, security analysis, and improvement recommendations for librosa.
Scores are editorial opinions as of 2026-03-06.