torchaudio
The torchaudio package consists of I/O, popular datasets and common audio transformations.
-
torchaudio.get_sox_bool(i=0)
Get enum of sox_bool for sox encodinginfo options.
Parameters: i (int, optional) – Choose a type, or, when not specified, use __members__ to see a dict of all possible options. (Default: sox_false or 0)
Returns: A sox_bool type
Return type: sox_bool
-
torchaudio.get_sox_encoding_t(i=None)
Get enum of sox_encoding_t for sox encodings.
Parameters: i (int, optional) – Choose a type, or, when not specified, use __members__ to see a dict of all possible options. (Default: None)
Returns: A sox_encoding_t type for output encoding
Return type: sox_encoding_t
-
torchaudio.get_sox_option_t(i=2)
Get enum of sox_option_t for sox encodinginfo options.
Parameters: i (int, optional) – Choose a type, or, when not specified, use __members__ to see a dict of all possible options. (Default: sox_option_default or 2)
Returns: A sox_option_t type
Return type: sox_option_t
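The __members__ lookup mentioned for the three enum getters follows the usual Python enum convention. The following is a minimal sketch built on a stand-in IntEnum, not the real binding: the class SoxOption and the helper get_option are hypothetical, the member names sox_option_no and sox_option_yes are assumed from libsox's header (only sox_option_default = 2 is stated here), and returning the enum itself for None is an assumption about the behaviour.

```python
from enum import IntEnum


class SoxOption(IntEnum):
    """Hypothetical stand-in for sox_option_t; not the torchaudio binding."""
    sox_option_no = 0       # assumed member name (from libsox's sox.h)
    sox_option_yes = 1      # assumed member name (from libsox's sox.h)
    sox_option_default = 2  # the documented default


def get_option(i=2):
    """Assumed behaviour: None returns the enum itself, an int selects a member."""
    if i is None:
        return SoxOption
    return SoxOption(i)


default_option = get_option()                    # select a member by index
all_names = list(get_option(None).__members__)   # list every option name
print(default_option.name, all_names)
```

With the real package, the same pattern would presumably be applied to the object returned by get_sox_option_t, get_sox_bool or get_sox_encoding_t.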
-
torchaudio.info(filepath)
Gets metadata from an audio file without loading the signal.
Parameters: filepath (str) – Path to audio file
Returns: si (sox_signalinfo_t), the signal info as a Python object, and ei (sox_encodinginfo_t), the encoding info
Return type: Tuple[sox_signalinfo_t, sox_encodinginfo_t]
- Example
>>> si, ei = torchaudio.info('foo.wav')
>>> rate, channels, encoding = si.rate, si.channels, ei.encoding
-
torchaudio.initialize_sox()
Initialize sox for use with effects chains. This is not required for simple loading. Importantly, run initialize_sox only once, and do not shut down after each effects chain; shut down once, when you are finished with all effects chains.
-
torchaudio.load(filepath, out=None, normalization=True, channels_first=True, num_frames=0, offset=0, signalinfo=None, encodinginfo=None, filetype=None)
Loads an audio file from disk into a tensor.
Parameters:
- filepath (str or pathlib.Path) – Path to audio file
- out (torch.Tensor, optional) – An output tensor to use instead of creating one. (Default: None)
- normalization (bool, number, or callable, optional) – If boolean True, the output is divided by 1 << 31 (assumes signed 32-bit audio), which normalizes it to [-1, 1]. If a number, the output is divided by that number. If a callable, the output is passed as a parameter to the given function, and the output is then divided by the result. (Default: True)
- channels_first (bool) – Set channels first or length first in result. (Default: True)
- num_frames (int, optional) – Number of frames to load. 0 to load everything after the offset. (Default: 0)
- offset (int, optional) – Number of frames from the start of the file to begin data loading. (Default: 0)
- signalinfo (sox_signalinfo_t, optional) – A sox_signalinfo_t type, which could be helpful if the audio type cannot be automatically determined. (Default: None)
- encodinginfo (sox_encodinginfo_t, optional) – A sox_encodinginfo_t type, which could be set if the audio type cannot be automatically determined. (Default: None)
- filetype (str, optional) – A filetype or extension to be set if sox cannot determine it automatically. (Default: None)
Returns: An output tensor of size [C x L] or [L x C] where L is the number of audio frames and C is the number of channels. An integer which is the sample rate of the audio (as listed in the metadata of the file)
Return type: Tuple[torch.Tensor, int]
- Example
>>> data, sample_rate = torchaudio.load('foo.mp3')
>>> print(data.size())
torch.Size([2, 278756])
>>> print(sample_rate)
44100
>>> data_vol_normalized, _ = torchaudio.load('foo.mp3', normalization=lambda x: torch.abs(x).max())
>>> print(data_vol_normalized.abs().max())
1.
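The three normalization modes described for load can be illustrated without the library. This is a sketch of the arithmetic only, using plain Python lists in place of tensors; the helper name normalize is hypothetical, and returning the samples unchanged for False is an assumption, since that case is not described here.

```python
def normalize(samples, normalization=True):
    """Apply the documented normalization modes to a list of sample values."""
    if callable(normalization):
        # Callable: the divisor is whatever the function returns for the data.
        divisor = normalization(samples)
    elif isinstance(normalization, bool):
        if not normalization:
            return list(samples)  # assumption: False leaves the data as-is
        divisor = 1 << 31         # True: assumes signed 32-bit audio
    else:
        divisor = normalization   # Number: divide by that number
    return [s / divisor for s in samples]


raw = [1 << 30, -(1 << 31), 0]  # made-up signed 32-bit sample values

by_default = normalize(raw)                                       # True
by_number = normalize(raw, 1 << 30)                               # number
by_peak = normalize(raw, lambda xs: max(abs(x) for x in xs))      # callable
print(by_default, by_number, by_peak)
```

The callable case mirrors the peak-normalization lambda in the example for load: the function computes a divisor from the data, and the data is then divided by it.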
-
torchaudio.load_wav(filepath, **kwargs)
Loads a wave file. It assumes that the wav file uses 16 bits per sample, which needs normalization by shifting the input right by 16 bits.
Parameters: filepath (str or pathlib.Path) – Path to audio file
Returns: An output tensor of size [C x L] or [L x C] where L is the number of audio frames and C is the number of channels. An integer which is the sample rate of the audio (as listed in the metadata of the file)
Return type: Tuple[torch.Tensor, int]
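The 16-bit shift described for load_wav can be checked with integer arithmetic. This is a minimal sketch of the arithmetic, not the library's implementation; the helper name to_int16_range is hypothetical, and it assumes the decoded samples arrive as signed 32-bit integers.

```python
def to_int16_range(sample_32bit):
    """Shift a signed 32-bit sample right by 16 bits, giving a 16-bit range."""
    return sample_32bit >> 16


largest = to_int16_range((1 << 31) - 1)   # most positive 32-bit sample
smallest = to_int16_range(-(1 << 31))     # most negative 32-bit sample
print(largest, smallest)
```

The extremes of the 32-bit range land on the extremes of the signed 16-bit range, which is why the shift recovers values on the scale of a 16-bit wav file.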
-
torchaudio.save(filepath, src, sample_rate, precision=16, channels_first=True)
Convenience function for save_encinfo.
Parameters:
- filepath (str) – Path to audio file
- src (torch.Tensor) – An input 2D tensor of shape [C x L] or [L x C] where L is the number of audio frames and C is the number of channels
- sample_rate (int) – An integer which is the sample rate of the audio (as listed in the metadata of the file)
- precision (int) – Bit precision (Default: 16)
- channels_first (bool) – Set channels first or length first in result. (Default: True)
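The two layouts selected by channels_first, [C x L] and [L x C], are transposes of each other. The sketch below shows the relationship with nested Python lists standing in for a 2D tensor; the helper name to_length_first and the sample values are made up for illustration.

```python
def to_length_first(channels_first_data):
    """Convert [C x L] nested lists into [L x C] by transposing."""
    return [list(frame) for frame in zip(*channels_first_data)]


# Two channels (C = 2) of three frames each (L = 3).
stereo = [
    [0.1, 0.2, 0.3],     # left channel
    [-0.1, -0.2, -0.3],  # right channel
]

frames = to_length_first(stereo)  # one [left, right] pair per frame
print(len(stereo), len(stereo[0]), len(frames), len(frames[0]))
```

With a real tensor the equivalent operation would presumably be a plain 2D transpose.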
-
torchaudio.save_encinfo(filepath, src, channels_first=True, signalinfo=None, encodinginfo=None, filetype=None)
Saves a tensor of an audio signal to disk as a standard format like mp3, wav, etc.
Parameters:
- filepath (str) – Path to audio file
- src (torch.Tensor) – An input 2D tensor of shape [C x L] or [L x C] where L is the number of audio frames and C is the number of channels
- channels_first (bool) – Set channels first or length first in result. (Default: True)
- signalinfo (sox_signalinfo_t) – A sox_signalinfo_t type, which could be helpful if the audio type cannot be automatically determined. (Default: None)
- encodinginfo (sox_encodinginfo_t, optional) – A sox_encodinginfo_t type, which could be set if the audio type cannot be automatically determined. (Default: None)
- filetype (str, optional) – A filetype or extension to be set if sox cannot determine it automatically. (Default: None)
- Example
>>> data, sample_rate = torchaudio.load('foo.mp3')
>>> torchaudio.save('foo.wav', data, sample_rate)
-
torchaudio.shutdown_sox()
Shut down sox for effects chains. Not required for simple loading. Importantly, only call this once. Attempting to re-initialize sox will result in seg faults.
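One way to respect the once-only rule for initialize_sox and shutdown_sox is to route both calls through a small guard. The sketch below is hypothetical: the class SoxSession is not part of torchaudio, and the two lambdas are stand-ins for the real functions so that the example runs without the library.

```python
class SoxSession:
    """Hypothetical guard allowing one initialize/shutdown pair per process."""

    def __init__(self, initialize, shutdown):
        self._initialize = initialize
        self._shutdown = shutdown
        self._state = "new"  # new -> running -> closed

    def start(self):
        if self._state == "closed":
            # Re-initializing after shutdown is the case that must be avoided.
            raise RuntimeError("sox was already shut down; do not re-initialize")
        if self._state == "new":
            self._initialize()
            self._state = "running"

    def stop(self):
        if self._state == "running":
            self._shutdown()
            self._state = "closed"


calls = []  # records which stand-in functions were actually invoked
session = SoxSession(lambda: calls.append("init"),
                     lambda: calls.append("shutdown"))
session.start()
session.start()  # repeated start is a no-op
# ... build and run every effects chain here ...
session.stop()
session.stop()   # repeated stop is a no-op
print(calls)
```

In real use the two callables would presumably be torchaudio.initialize_sox and torchaudio.shutdown_sox, with stop called once at the very end of the program.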
-
torchaudio.sox_encodinginfo_t()
Create a sox_encodinginfo_t object. This object can be used to set the encoding type, bit precision, compression factor, reverse bytes, reverse nibbles, reverse bits and endianness. This can be used in an effects chain to encode the final output or to save a file with a specific encoding. For example, one could use the sox ulaw encoding to do 8-bit ulaw encoding. Note that in a tensor output the result will be a 32-bit number, but the number of unique values will be determined by the bit precision.
- Returns: sox_encodinginfo_t(object)
- encoding (sox_encoding_t), output encoding
- bits_per_sample (int), bit precision, same as precision in sox_signalinfo_t
- compression (float), compression for lossy formats, 0.0 for default compression
- reverse_bytes (sox_option_t), reverse bytes, use sox_option_default
- reverse_nibbles (sox_option_t), reverse nibbles, use sox_option_default
- reverse_bits (sox_option_t), reverse bits, use sox_option_default
- opposite_endian (sox_bool), change endianness, use sox_false
- Example
>>> ei = torchaudio.sox_encodinginfo_t()
>>> ei.encoding = torchaudio.get_sox_encoding_t(1)
>>> ei.bits_per_sample = 16
>>> ei.compression = 0
>>> ei.reverse_bytes = torchaudio.get_sox_option_t(2)
>>> ei.reverse_nibbles = torchaudio.get_sox_option_t(2)
>>> ei.reverse_bits = torchaudio.get_sox_option_t(2)
>>> ei.opposite_endian = torchaudio.get_sox_bool(0)
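The remark that bit precision limits the number of unique values can be illustrated with a small companding sketch. It uses the textbook continuous mu-law formula with mu = 255 and plain rounding to 8-bit codes; this is an assumption for illustration and not necessarily the exact encoder sox applies. The helper name mulaw_encode is hypothetical.

```python
import math

MU = 255  # conventional mu for 8-bit mu-law companding


def mulaw_encode(x, mu=MU):
    """Compress x in [-1, 1] and quantize it to an integer code in [0, mu]."""
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1) / 2 * mu + 0.5)


# 10001 distinct input values spanning [-1, 1].
samples = [i / 5000 - 1 for i in range(10001)]
codes = {mulaw_encode(s) for s in samples}

# However many distinct inputs there are, 8 bits allow at most 256 codes.
print(len(samples), len(codes) <= 256)
```

The container holding the result may be wider than 8 bits, but the set of values it can take is bounded by the encoding's precision.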
-
torchaudio.sox_signalinfo_t()
Create a sox_signalinfo_t object. This object can be used to set the sample rate, number of channels, length, bit precision and headroom multiplier, primarily for effects.
- Returns: sox_signalinfo_t(object)
- rate (float), sample rate as a float, practically will likely be an integer float
- channels (int), number of audio channels
- precision (int), bit precision
- length (int), length of audio in samples * channels, 0 for unspecified and -1 for unknown
- mult (float, optional), headroom multiplier for effects and None for no multiplier
- Example
>>> si = torchaudio.sox_signalinfo_t()
>>> si.channels = 1
>>> si.rate = 16000.
>>> si.precision = 16
>>> si.length = 0
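Because length counts samples * channels, the number of frames and the duration follow from the other fields. The sketch below uses a hypothetical dataclass, SignalInfo, as a stand-in for sox_signalinfo_t so that it runs without the library; the helper duration_seconds and the field values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalInfo:
    """Hypothetical stand-in mirroring the fields of sox_signalinfo_t."""
    rate: float                  # sample rate
    channels: int                # number of audio channels
    precision: int               # bit precision
    length: int                  # samples * channels; 0 unspecified, -1 unknown
    mult: Optional[float] = None # headroom multiplier, None for no multiplier


def duration_seconds(si):
    """Return the duration in seconds, or None when the length is not known."""
    if si.length <= 0:
        return None
    frames = si.length // si.channels  # frames per channel
    return frames / si.rate


si = SignalInfo(rate=44100.0, channels=2, precision=16, length=557512)
frames = si.length // si.channels
print(frames, duration_seconds(si))
```

The same arithmetic would presumably apply to the si object returned by torchaudio.info, using its rate, channels and length attributes.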