torchaudio.sox_effects
Create SoX effects chain for preprocessing audio.
SoxEffect
SoxEffectsChain
class torchaudio.sox_effects.SoxEffectsChain(normalization: Union[bool, float, Callable] = True, channels_first: bool = True, out_siginfo: Any = None, out_encinfo: Any = None, filetype: str = 'raw') → None
SoX effects chain class.
Parameters:
- normalization (bool, number, or callable, optional) – If True, the output is divided by 1 << 31 (assuming signed 32-bit audio), which normalizes it to [-1, 1]. If a number, the output is divided by that number. If a callable, the output is passed to the given function and then divided by the result. (Default: True)
- channels_first (bool, optional) – Whether the result is returned channels first or length first. (Default: True)
- out_siginfo (sox_signalinfo_t, optional) – A sox_signalinfo_t type, which can be helpful if the audio type cannot be determined automatically. (Default: None)
- out_encinfo (sox_encodinginfo_t, optional) – A sox_encodinginfo_t type, which can be set if the audio type cannot be determined automatically. (Default: None)
- filetype (str, optional) – A filetype or extension to set if sox cannot determine it automatically. (Default: 'raw')
Returns: An output Tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).
Return type: Tuple[Tensor, int]
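The three normalization modes described above can be sketched in plain Python. This is only an illustration of the documented rule, not the library's implementation: the function name normalize is hypothetical, samples are held in a plain list instead of a Tensor, and leaving the samples unchanged when normalization is False is an assumption, since the documentation does not describe that case.

```python
from typing import Callable, List, Union


def normalize(
    samples: List[float],
    normalization: Union[bool, float, Callable[[List[float]], float]] = True,
) -> List[float]:
    """Hypothetical helper mirroring the documented normalization rule."""
    # bool must be checked before number, because bool is a subclass of int.
    if isinstance(normalization, bool):
        if not normalization:
            # Assumption: False means "leave the samples unchanged".
            return list(samples)
        divisor = float(1 << 31)  # signed 32-bit full scale
    elif callable(normalization):
        # The output is passed to the function; its result is the divisor.
        divisor = float(normalization(samples))
    else:
        divisor = float(normalization)
    return [s / divisor for s in samples]


def peak(xs: List[float]) -> float:
    """Example callable: the largest absolute sample value."""
    return max(abs(v) for v in xs)
```

With this sketch, passing peak as the callable scales the signal so that its largest absolute value becomes 1.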
Example
>>> import os
>>> import torchaudio
>>> from torch.utils.data import Dataset
>>>
>>> class MyDataset(Dataset):
>>>     def __init__(self, audiodir_path):
>>>         self.data = [os.path.join(audiodir_path, fn) for fn in os.listdir(audiodir_path)]
>>>         self.E = torchaudio.sox_effects.SoxEffectsChain()
>>>         self.E.append_effect_to_chain("rate", [16000])  # resample to 16000 Hz
>>>         self.E.append_effect_to_chain("channels", ["1"])  # mono signal
>>>
>>>     def __getitem__(self, index):
>>>         fn = self.data[index]
>>>         self.E.set_input_file(fn)
>>>         x, sr = self.E.sox_build_flow_effects()
>>>         return x, sr
>>>
>>>     def __len__(self):
>>>         return len(self.data)
>>>
>>> torchaudio.initialize_sox()
>>> ds = MyDataset(path_to_audio_files)
>>> for sig, sr in ds:
>>>     pass  # do something here
>>> torchaudio.shutdown_sox()
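append_effect_to_chain(ename: str, eargs: Union[List[str], NoneType] = None) → None
Append an effect to a sox effects chain.
Parameters:
- ename (str) – The name of the effect.
- eargs (List[str] or None, optional) – A list of options for the effect. (Default: None)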
set_input_file(input_file: str) → None
Set the input file for the chain.
Parameters:
- input_file (str) – The path to the input file.
sox_build_flow_effects(out: Union[torch.Tensor, NoneType] = None) → Tuple[torch.Tensor, int]
Build the effects chain and flow effects from the input file to the output tensor.
Parameters:
- out (Tensor, optional) – Where the output will be written to. (Default: None)
Returns: An output Tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).
Return type: Tuple[Tensor, int]
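The difference between the [C x L] and [L x C] layouts can be illustrated with nested lists. This is a minimal sketch using plain Python lists in place of a Tensor; the helper name to_length_first is hypothetical and is not part of torchaudio.

```python
from typing import List


def to_length_first(channels_first: List[List[float]]) -> List[List[float]]:
    """Transpose a [C x L] nested list into [L x C]."""
    # zip(*rows) groups the i-th sample of every channel into one frame.
    return [list(frame) for frame in zip(*channels_first)]


# Two channels (C = 2) with three frames each (L = 3).
stereo = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
frames = to_length_first(stereo)
```

In the channels-first layout each row holds one channel; in the length-first layout each row holds one frame containing a sample from every channel.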