torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset, i.e., they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example:

import torch
import torchaudio

# args.nThreads is a placeholder for the desired number of worker processes.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=args.nThreads)
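
As a minimal sketch of consuming the loader (assuming the YESNO item layout described below and the default collation), each batch can be unpacked directly:

for waveform, sample_rate, labels in data_loader:
    # With batch_size=1, waveform is batched to shape (1, num_channels, num_frames);
    # sample_rate and labels are wrapped by the default collate function.
    print(waveform.shape, sample_rate, labels)
    break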

The following datasets are available:

Datasets

All datasets have a similar API. They share two common arguments: transform and target_transform, which transform the input and the target, respectively.
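
For example, a transform can be applied to each waveform as it is loaded. The sketch below passes torchaudio.transforms.MelSpectrogram as the transform argument; the exact arguments accepted may vary between torchaudio releases.

import torchaudio

# Sketch: compute a Mel spectrogram for every waveform as it is loaded.
# YesNo recordings are sampled at 8 kHz.
mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=8000)
yesno_data = torchaudio.datasets.YESNO('.', download=True, transform=mel_transform)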

VCTK

class torchaudio.datasets.VCTK(root, url='http://homepages.inf.ed.ac.uk/jyamagis/release/VCTK-Corpus.tar.gz', folder_in_archive='VCTK-Corpus', download=False, downsample=False, transform=None, target_transform=None)

Create a Dataset for VCTK. Each item is a tuple of the form: (waveform, sample_rate, utterance, speaker_id, utterance_id)

Folder p315 is ignored because the corresponding text files do not exist. For more information about the dataset, visit: https://datashare.is.ed.ac.uk/handle/10283/3443
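
A minimal sketch of indexing into VCTK directly (note that download=True fetches the full corpus, which is several gigabytes):

import torchaudio

vctk_data = torchaudio.datasets.VCTK('.', download=True)

# Each item unpacks into the tuple described above.
waveform, sample_rate, utterance, speaker_id, utterance_id = vctk_data[0]
print(sample_rate, speaker_id, utterance)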

YESNO

class torchaudio.datasets.YESNO(root, url='http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive='waves_yesno', download=False, transform=None, target_transform=None)

Create a Dataset for YesNo. Each item is a tuple of the form: (waveform, sample_rate, labels)
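
A minimal sketch of inspecting a single item; in the standard waves_yesno release, labels is a list of 0/1 values, one per word spoken in the recording:

import torchaudio

yesno_data = torchaudio.datasets.YESNO('.', download=True)

# Each item is (waveform, sample_rate, labels).
waveform, sample_rate, labels = yesno_data[0]
print(waveform.shape, sample_rate, labels)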
