Core IO and DSP¶
Audio processing¶
load (path[, sr, mono, offset, duration, …]) |
Load an audio file as a floating point time series. |
to_mono (y) |
Force an audio signal down to mono. |
resample (y, orig_sr, target_sr[, res_type, …]) |
Resample a time series from orig_sr to target_sr |
get_duration ([y, sr, S, n_fft, hop_length, …]) |
Compute the duration (in seconds) of an audio time series, feature matrix, or filename. |
autocorrelate (y[, max_size, axis]) |
Bounded auto-correlation |
zero_crossings (y[, threshold, …]) |
Find the zero-crossings of a signal y: indices i such that sign(y[i]) != sign(y[i-1]). |
clicks ([times, frames, sr, hop_length, …]) |
Returns a signal with a click sound placed at each specified time |
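As a rough illustration of the zero-crossing definition above, here is a minimal pure-Python sketch. The helper name `zero_crossing_indices` is hypothetical, and treating zero as positive is an assumption of this sketch. The library function accepts further options (such as `threshold`) that are not modelled here.

```python
def zero_crossing_indices(y):
    """Return indices i where y[i] and y[i-1] differ in sign.

    Assumption of this sketch: a sample equal to zero counts as positive.
    """
    negative = [value < 0 for value in y]
    return [i for i in range(1, len(y)) if negative[i] != negative[i - 1]]


signal = [0.5, 0.2, -0.1, -0.4, 0.3, 0.0, -0.2]
crossings = zero_crossing_indices(signal)
print(crossings)  # [2, 4, 6]
```

Dividing the number of crossings by the signal length gives a simple zero-crossing rate.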
Spectral representations¶
stft (y[, n_fft, hop_length, win_length, …]) |
Short-time Fourier transform (STFT) |
istft (stft_matrix[, hop_length, win_length, …]) |
Inverse short-time Fourier transform (ISTFT). |
ifgram (y[, sr, n_fft, hop_length, …]) |
Compute the instantaneous frequency (as a proportion of the sampling rate) obtained as the time-derivative of the phase of the complex spectrum as described by [Ra44d590316d7-1]. |
cqt (y[, sr, hop_length, fmin, n_bins, …]) |
Compute the constant-Q transform of an audio signal. |
icqt (C[, sr, hop_length, fmin, …]) |
Compute the inverse constant-Q transform. |
hybrid_cqt (y[, sr, hop_length, fmin, …]) |
Compute the hybrid constant-Q transform of an audio signal. |
pseudo_cqt (y[, sr, hop_length, fmin, …]) |
Compute the pseudo constant-Q transform of an audio signal. |
iirt (y[, sr, win_length, hop_length, …]) |
Time-frequency representation using IIR filters [Rd4077732470d-1]. |
fmt (y[, t_min, n_fmt, kind, beta, …]) |
The fast Mellin transform (FMT) [R6343f8d4cac9-1] of a uniformly sampled signal y. |
interp_harmonics (x, freqs, h_range[, kind, …]) |
Compute the energy at harmonics of a time-frequency representation. |
salience (S, freqs, h_range[, weights, …]) |
Harmonic salience function. |
phase_vocoder (D, rate[, hop_length]) |
Phase vocoder. |
magphase (D[, power]) |
Separate a complex-valued spectrogram D into its magnitude (S) and phase (P) components, so that D = S * P. |
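The magnitude/phase split described for `magphase` can be sketched with the standard library alone. This is an illustrative version that works on a flat list of complex values rather than a spectrogram matrix. It assumes the phase is returned as a unit-modulus complex number and that `power` is an exponent applied to the magnitude, so the identity D = S * P holds only for `power=1`.

```python
import cmath


def magphase(values, power=1):
    """Split complex values into magnitudes and unit-modulus phase factors."""
    magnitudes = [abs(d) ** power for d in values]
    # exp(1j * angle) has modulus 1; for d == 0 the angle is 0, so the factor is 1.
    phases = [cmath.exp(1j * cmath.phase(d)) for d in values]
    return magnitudes, phases


D = [3 + 4j, -2j, 0j]
S, P = magphase(D)

# With power=1, multiplying the two parts recovers the input (up to rounding).
reconstructed = [s * p for s, p in zip(S, P)]
```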
Magnitude scaling¶
amplitude_to_db (S[, ref, amin, top_db]) |
Convert an amplitude spectrogram to a dB-scaled spectrogram. |
db_to_amplitude (S_db[, ref]) |
Convert a dB-scaled spectrogram to an amplitude spectrogram. |
power_to_db (S[, ref, amin, top_db]) |
Convert a power spectrogram (amplitude squared) to decibel (dB) units |
db_to_power (S_db[, ref]) |
Convert a dB-scaled spectrogram to a power spectrogram. |
perceptual_weighting (S, frequencies, **kwargs) |
Perceptual weighting of a power spectrogram. |
A_weighting (frequencies[, min_db]) |
Compute the A-weighting of a set of frequencies. |
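The dB conversions above follow the usual decibel arithmetic: 10·log10 for power and 20·log10 for amplitude. The scalar sketch below illustrates that arithmetic only. The default values for `ref` and `amin` are assumptions of this sketch, and the `top_db` clipping step is omitted.

```python
import math


def power_to_db(power, ref=1.0, amin=1e-10):
    """10 * log10(power / ref), with both values floored at amin to avoid log(0)."""
    return 10.0 * math.log10(max(amin, power)) - 10.0 * math.log10(max(amin, ref))


def db_to_power(db, ref=1.0):
    """Inverse of power_to_db for values above the amin floor."""
    return ref * 10.0 ** (db / 10.0)


def amplitude_to_db(amplitude, ref=1.0, amin=1e-5):
    """Amplitude is the square root of power, so 20 * log10(amplitude / ref)."""
    return power_to_db(amplitude ** 2, ref=ref ** 2, amin=amin ** 2)


def db_to_amplitude(db, ref=1.0):
    """Inverse of amplitude_to_db for values above the amin floor."""
    return db_to_power(db, ref=ref ** 2) ** 0.5


loud = power_to_db(100.0)        # a power ratio of 100 is 20 dB
same = amplitude_to_db(10.0)     # an amplitude ratio of 10 is also 20 dB
```

The `amin` floor keeps silent bins finite: a power of zero maps to the floor value rather than negative infinity.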
Time and frequency conversion¶
frames_to_samples (frames[, hop_length, n_fft]) |
Converts frame indices to audio sample indices |
frames_to_time (frames[, sr, hop_length, n_fft]) |
Converts frame counts to time (seconds) |
samples_to_frames (samples[, hop_length, n_fft]) |
Converts sample indices into STFT frames. |
samples_to_time (samples[, sr]) |
Convert sample indices to time (in seconds). |
time_to_frames (times[, sr, hop_length, n_fft]) |
Converts time stamps into STFT frames. |
time_to_samples (times[, sr]) |
Convert timestamps (in seconds) to sample indices. |
hz_to_note (frequencies, **kwargs) |
Convert one or more frequencies (in Hz) to the nearest note names. |
hz_to_midi (frequencies) |
Get MIDI note number(s) for given frequencies |
midi_to_hz (notes) |
Get the frequency (Hz) of MIDI note(s) |
midi_to_note (midi[, octave, cents]) |
Convert one or more MIDI numbers to note strings. |
note_to_hz (note, **kwargs) |
Convert one or more note names to frequency (Hz) |
note_to_midi (note[, round_midi]) |
Convert one or more spelled notes to MIDI number(s). |
hz_to_mel (frequencies[, htk]) |
Convert Hz to Mels |
hz_to_octs (frequencies[, A440]) |
Convert frequencies (Hz) to (fractional) octave numbers. |
mel_to_hz (mels[, htk]) |
Convert mel bin numbers to frequencies |
octs_to_hz (octs[, A440]) |
Convert octave numbers to frequencies. |
fft_frequencies ([sr, n_fft]) |
Alternative implementation of np.fft.fftfreq |
cqt_frequencies (n_bins, fmin[, …]) |
Compute the center frequencies of Constant-Q bins. |
mel_frequencies ([n_mels, fmin, fmax, htk]) |
Compute the center frequencies of mel bands. |
tempo_frequencies (n_bins[, hop_length, sr]) |
Compute the frequencies (in beats-per-minute) corresponding to an onset auto-correlation or tempogram matrix. |
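Most of the conversions in this table reduce to short formulas. The scalar sketch below shows the frame/sample/time and Hz/MIDI relationships. The defaults (`sr=22050`, `hop_length=512`) and the choice to offset frames by `n_fft // 2` when `n_fft` is given are assumptions of this sketch. The MIDI mapping uses the standard convention that A4 = 440 Hz = note 69.

```python
import math


def frames_to_samples(frame, hop_length=512, n_fft=None):
    """Frame index to sample index; optionally offset by half the FFT window."""
    offset = n_fft // 2 if n_fft is not None else 0
    return frame * hop_length + offset


def samples_to_time(sample, sr=22050):
    """Sample index to time in seconds."""
    return sample / sr


def frames_to_time(frame, sr=22050, hop_length=512, n_fft=None):
    """Frame index to time in seconds, via the sample index."""
    return samples_to_time(frames_to_samples(frame, hop_length, n_fft), sr)


def hz_to_midi(frequency):
    """Frequency in Hz to (fractional) MIDI note number."""
    return 12.0 * math.log2(frequency / 440.0) + 69.0


def midi_to_hz(note):
    """MIDI note number to frequency in Hz."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


start_sample = frames_to_samples(10)      # 10 * 512 = 5120
start_time = frames_to_time(10)           # 5120 / 22050 seconds
middle_c = midi_to_hz(60)                 # roughly 261.63 Hz
```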
Pitch and tuning¶
estimate_tuning ([y, sr, S, n_fft, …]) |
Estimate the tuning of an audio time series or spectrogram input. |
pitch_tuning (frequencies[, resolution, …]) |
Given a collection of pitches, estimate its tuning offset (in fractions of a bin) relative to A440=440.0Hz. |
piptrack ([y, sr, S, n_fft, hop_length, …]) |
Pitch tracking on thresholded parabolically-interpolated STFT |
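One plausible way to estimate a tuning offset "in fractions of a bin" is to measure how far each pitch sits from the nearest bin and take the most common deviation. The sketch below does this with a simple histogram. It is not necessarily how the library computes it: the `bins_per_octave` parameter, the histogram approach, and the default `resolution` are assumptions made for illustration.

```python
import math


def pitch_tuning(frequencies, resolution=0.01, bins_per_octave=12):
    """Estimate the tuning offset in [-0.5, 0.5) bins relative to A440."""
    n_bins = math.ceil(1.0 / resolution)
    counts = [0] * n_bins
    used = 0
    for f in frequencies:
        if f <= 0:
            continue  # non-positive frequencies carry no pitch information
        # Fractional distance above the nearest lower bin, wrapped to [-0.5, 0.5)
        residual = (bins_per_octave * math.log2(f / 440.0)) % 1.0
        if residual >= 0.5:
            residual -= 1.0
        index = min(int((residual + 0.5) / resolution), n_bins - 1)
        counts[index] += 1
        used += 1
    if used == 0:
        return 0.0
    best = max(range(n_bins), key=counts.__getitem__)
    # Report the left edge of the most populated histogram bin
    return -0.5 + best * resolution


# Hypothetical input: notes that are all about a quarter of a semitone sharp.
example_freqs = [440.0 * 2.0 ** ((k + 0.255) / 12.0) for k in range(-12, 13)]
tuning = pitch_tuning(example_freqs)
```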
Dynamic Time Warping¶
dtw ([X, Y, C, metric, step_sizes_sigma, …]) |
Dynamic time warping (DTW). |
fill_off_diagonal (x, radius[, value]) |
Sets all cells of a matrix to a given value if they lie outside a constraint region. |
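For readers unfamiliar with DTW, the following is a textbook dynamic-programming sketch for two non-empty 1-D sequences. It assumes an absolute-difference cost and the three classic steps (diagonal, vertical, horizontal). It does not model the `metric`, `step_sizes_sigma`, or constraint-region options listed in the signature above.

```python
def dtw(x, y):
    """Return (total_cost, warping_path) for two non-empty sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] is the cheapest cost of aligning x[:i] with y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


total_cost, warping_path = dtw([1, 2, 3], [1, 2, 2, 3])
print(total_cost)    # 0.0
print(warping_path)  # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

Restricting the alignment to a band around the diagonal, which is what a helper such as `fill_off_diagonal` supports, reduces the number of cells that need to be considered.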