librosa.decompose.hpss¶

librosa.decompose.hpss(S, kernel_size=31, power=2.0, mask=False, margin=1.0)[source]¶

Median-filtering harmonic percussive source separation (HPSS).

If margin = 1.0, decomposes an input spectrogram S = H + P where H contains the harmonic components, and P contains the percussive components.

If margin > 1.0, decomposes an input spectrogram S = H + P + R where R contains residual components not included in H or P.

This implementation is based upon the algorithm described by [1] and [2].

[1]	Fitzgerald, Derry. “Harmonic/percussive separation using median filtering.” 13th International Conference on Digital Audio Effects (DAFX10), Graz, Austria, 2010.

[2]	(1, 2) Driedger, Müller, Disch. “Extending harmonic-percussive separation of audio.” 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan, 2014.

Parameters:

S : np.ndarray [shape=(d, n)]

input spectrogram. May be real (magnitude) or complex.

kernel_size : int or tuple (kernel_harmonic, kernel_percussive)

kernel size(s) for the median filters.

If scalar, the same size is used for both harmonic and percussive.
If tuple, the first value specifies the width of the harmonic filter, and the second value specifies the width of the percussive filter.

power : float > 0 [scalar]

Exponent for the Wiener filter when constructing soft mask matrices.

mask : bool

Return the masking matrices instead of components.

Masking matrices contain non-negative real values that can be used to measure the assignment of energy from S into harmonic or percussive components.

Components can be recovered by multiplying S * mask_H or S * mask_P.

margin : float or tuple (margin_harmonic, margin_percussive)

margin size(s) for the masks (as described in [2])

If scalar, the same size is used for both harmonic and percussive.
If tuple, the first value specifies the margin of the harmonic mask, and the second value specifies the margin of the percussive mask.

Returns:

harmonic : np.ndarray [shape=(d, n)]: harmonic component (or mask)
percussive : np.ndarray [shape=(d, n)]: percussive component (or mask)

See also

util.softmask

Notes

This function caches at level 30.

Examples

Separate into harmonic and percussive

>>> y, sr = librosa.load(librosa.util.example_audio_file(), duration=15)
>>> D = librosa.stft(y)
>>> H, P = librosa.decompose.hpss(D)

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(3, 1, 1)
>>> librosa.display.specshow(librosa.amplitude_to_db(D,
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Full power spectrogram')
>>> plt.subplot(3, 1, 2)
>>> librosa.display.specshow(librosa.amplitude_to_db(H,
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Harmonic power spectrogram')
>>> plt.subplot(3, 1, 3)
>>> librosa.display.specshow(librosa.amplitude_to_db(P,
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Percussive power spectrogram')
>>> plt.tight_layout()

Or with a narrower horizontal filter

>>> H, P = librosa.decompose.hpss(D, kernel_size=(13, 31))

Just get harmonic/percussive masks, not the spectra

>>> mask_H, mask_P = librosa.decompose.hpss(D, mask=True)
>>> mask_H
array([[  1.000e+00,   1.469e-01, ...,   2.648e-03,   2.164e-03],
       [  1.000e+00,   2.368e-01, ...,   9.413e-03,   7.703e-03],
       ...,
       [  8.869e-01,   5.673e-02, ...,   4.603e-02,   1.247e-05],
       [  7.068e-01,   2.194e-02, ...,   4.453e-02,   1.205e-05]], dtype=float32)
>>> mask_P
array([[  2.858e-05,   8.531e-01, ...,   9.974e-01,   9.978e-01],
       [  1.586e-05,   7.632e-01, ...,   9.906e-01,   9.923e-01],
       ...,
       [  1.131e-01,   9.433e-01, ...,   9.540e-01,   1.000e+00],
       [  2.932e-01,   9.781e-01, ...,   9.555e-01,   1.000e+00]], dtype=float32)

Separate into harmonic/percussive/residual components by using a margin > 1.0

>>> H, P = librosa.decompose.hpss(D, margin=3.0)
>>> R = D - (H+P)
>>> y_harm = librosa.core.istft(H)
>>> y_perc = librosa.core.istft(P)
>>> y_resi = librosa.core.istft(R)

Get a more isolated percussive component by widening its margin

>>> H, P = librosa.decompose.hpss(D, margin=(1.0,5.0))

(Source code)