chainer.datasets.LabeledImageDataset

class chainer.datasets.LabeledImageDataset(pairs, root='.', dtype=None, label_dtype=<class 'numpy.int32'>)[source]

Dataset of image and label pairs built from a list of paths and labels.

This dataset reads an external image file like ImageDataset. The difference from ImageDataset is that this dataset also returns a label integer. The paths and labels are given as either a list of pairs or a text file contains paths/labels pairs in distinct lines. In the latter case, each path and corresponding label are separated by white spaces. This format is same as one used in Caffe.

Note

This dataset requires the Pillow package being installed. In order to use this dataset, install Pillow (e.g. by using the command pip install Pillow). Be careful to prepare appropriate libraries for image formats you want to use (e.g. libpng for PNG images, and libjpeg for JPG images).

Warning

You are responsible for preprocessing the images before feeding them to a model. For example, if your dataset contains both RGB and grayscale images, make sure that you convert them to the same format. Otherwise you will get errors because the input dimensions are different for RGB and grayscale images.

Parameters
  • pairs (str or list of tuples) – If it is a string, it is a path to a text file that contains paths to images in distinct lines. If it is a list of pairs, the i-th element represents a pair of the path to the i-th image and the corresponding label. In both cases, each path is a relative one from the root path given by another argument.

  • root (str) – Root directory to retrieve images from.

  • dtype – Data type of resulting image arrays. chainer.config.dtype is used by default (see Configuring Chainer).

  • label_dtype – Data type of the labels.

Methods

__getitem__(index)[source]

Returns an example or a sequence of examples.

It implements the standard Python indexing and one-dimensional integer array indexing. It uses the get_example() method by default, but it may be overridden by the implementation to, for example, improve the slicing performance.

Parameters

index (int, slice, list or numpy.ndarray) – An index of an example or indexes of examples.

Returns

If index is int, returns an example created by get_example. If index is either slice or one-dimensional list or numpy.ndarray, returns a list of examples created by get_example.

Example

>>> import numpy
>>> from chainer import dataset
>>> class SimpleDataset(dataset.DatasetMixin):
...     def __init__(self, values):
...         self.values = values
...     def __len__(self):
...         return len(self.values)
...     def get_example(self, i):
...         return self.values[i]
...
>>> ds = SimpleDataset([0, 1, 2, 3, 4, 5])
>>> ds[1]   # Access by int
1
>>> ds[1:3]  # Access by slice
[1, 2]
>>> ds[[4, 0]]  # Access by one-dimensional integer list
[4, 0]
>>> index = numpy.arange(3)
>>> ds[index]  # Access by one-dimensional integer numpy.ndarray
[0, 1, 2]
__len__()[source]

Returns the number of data points.

get_example(i)[source]

Returns the i-th example.

Implementations should override it. It should raise IndexError if the index is invalid.

Parameters

i (int) – The index of the example.

Returns

The i-th example.