chainer.training.updaters.ParallelUpdater

class chainer.training.updaters.ParallelUpdater(iterator, optimizer, converter=<function concat_examples>, models=None, devices=None, loss_func=None, loss_scale=None, auto_new_epoch=True)[source]

Implementation of a parallel GPU Updater.

This is an implementation of Updater that uses multiple GPUs. It behaves similarly to StandardUpdater. The update routine is modified to support data-parallel computation on multiple GPUs on a single machine. It is based on synchronous parallel SGD: it parallelizes the gradient computation over a mini-batch and updates the parameters only on the main device.

Parameters
  • iterator – Dataset iterator for the training dataset. It can also be a dictionary that maps strings to iterators. If this is just an iterator, then the iterator is registered by the name 'main'.

  • optimizer – Optimizer to update parameters. It can also be a dictionary that maps strings to optimizers. If this is just an optimizer, then the optimizer is registered by the name 'main'.

  • converter – Converter function to build input arrays. Each batch extracted by the main iterator is split equally among the devices and then passed to this function along with the corresponding device option. concat_examples() is used by default.

  • models – Dictionary of models. The main model should be the same model attached to the 'main' optimizer.

  • devices – Dictionary of devices to which the training data is sent. The devices should be arranged in a dictionary with the same structure as models.

  • loss_func – Loss function. The model is used as a loss function by default.

  • loss_scale (float) – Loss scaling factor. Loss scaling is a useful technique to mitigate the vanishing-gradient issue that tends to occur when a low-precision data type such as float16 is used during training. If you set a loss scaling factor, gradients of loss values are multiplied by the factor before backprop starts. The factor is propagated to all gradients in the computational graph during backprop. The gradients of parameters are divided by the factor just before the parameters are updated.

  • auto_new_epoch (bool) – If True, new_epoch() of the main optimizer is automatically called when the is_new_epoch attribute of the main iterator is True.
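
A minimal usage sketch (MyNetwork, train_dataset, and the device IDs are placeholders; two GPUs are assumed). The batch drawn from the main iterator is split across the entries of devices, and the model attached to the 'main' optimizer is updated on the 'main' device:

    import chainer
    import chainer.links as L
    from chainer import training

    model = L.Classifier(MyNetwork())       # MyNetwork: placeholder chainer.Chain
    optimizer = chainer.optimizers.MomentumSGD()
    optimizer.setup(model)

    train_iter = chainer.iterators.SerialIterator(train_dataset, batch_size=128)

    # Data-parallel training on GPUs 0 and 1; 'main' holds the master parameters.
    updater = training.updaters.ParallelUpdater(
        train_iter, optimizer,
        devices={'main': 0, 'second': 1},   # add loss_scale=... for float16 training
    )
    trainer = training.Trainer(updater, (20, 'epoch'), out='result')
    trainer.run()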

Methods

connect_trainer(trainer)[source]

Connects the updater to the trainer that will call it.

The typical usage of this method is to register additional links with the reporter of the trainer. This method is called at the end of the initialization of Trainer. The default implementation does nothing.

Parameters

trainer (Trainer) – Trainer object to which the updater is registered.
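
For illustration only, a sketch of overriding this hook in a hypothetical custom updater that registers an extra link with the trainer's reporter, which is the kind of bookkeeping this method is intended for:

    from chainer import training

    class MyUpdater(training.updaters.StandardUpdater):
        def __init__(self, iterator, optimizer, aux_model, **kwargs):
            super(MyUpdater, self).__init__(iterator, optimizer, **kwargs)
            self.aux_model = aux_model

        def connect_trainer(self, trainer):
            # Values reported by aux_model will appear under the 'aux/' prefix.
            trainer.reporter.add_observer('aux', self.aux_model)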

finalize()[source]

Finalizes the updater object.

This method calls the finalize method of each iterator that this updater has. It is called at the end of the training loop.

get_all_optimizers()[source]

Gets a dictionary of all optimizers for this updater.

Returns

Dictionary that maps names to optimizers.

Return type

dict

get_iterator(name)[source]

Gets the dataset iterator of the given name.

Parameters

name (str) – Name of the dataset iterator.

Returns

Corresponding dataset iterator.

Return type

Iterator

get_optimizer(name)[source]

Gets the optimizer of the given name.

Parameters

name (str) – Name of the optimizer.

Returns

Corresponding optimizer.

Return type

Optimizer
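
A short sketch of reading registered components back from an already constructed trainer (the default 'main' names and a SerialIterator, which exposes batch_size, are assumed):

    updater = trainer.updater
    optimizer = updater.get_optimizer('main')      # optimizer passed at construction
    iterator = updater.get_iterator('main')        # training dataset iterator
    all_optimizers = updater.get_all_optimizers()  # e.g. {'main': optimizer}
    print(type(optimizer).__name__, iterator.batch_size, list(all_optimizers))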

serialize(serializer)[source]

Serializes the current state of the updater object.
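
This method is usually invoked indirectly, for example by the snapshot extension through the trainer; a hedged sketch of taking and resuming from a snapshot (file and directory names are illustrative):

    from chainer import serializers
    from chainer.training import extensions

    # Snapshotting the trainer also serializes the updater state.
    trainer.extend(extensions.snapshot(), trigger=(1, 'epoch'))

    # Later: restore the trainer (and hence the updater) and continue training.
    serializers.load_npz('result/snapshot_iter_1000', trainer)
    trainer.run()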

update()[source]

Updates the parameters of the target model.

This method implements an update formula for the training task, including data loading, forward/backward computations, and actual updates of parameters.

This method is called once at each iteration of the training loop.

update_core()[source]
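
update() typically delegates the actual work to update_core() and then advances the iteration counter; a hedged sketch of overriding update_core() in a custom StandardUpdater subclass (single device, with a Classifier-style model used as the loss function) illustrates the division of labour:

    from chainer import training

    class CustomUpdater(training.updaters.StandardUpdater):
        def update_core(self):
            batch = self.get_iterator('main').next()        # data loading
            in_arrays = self.converter(batch, self.device)  # build input arrays
            optimizer = self.get_optimizer('main')
            # Forward/backward computation and the parameter update in one call;
            # the optimizer target is used as the loss function.
            optimizer.update(optimizer.target, *in_arrays)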

Attributes

epoch
epoch_detail
is_new_epoch
previous_epoch_detail
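
These attributes mirror the progress of the main iterator and are what trainer extensions typically inspect; a hedged sketch of using them inside an extension function (names are illustrative):

    from chainer import training

    @training.make_extension(trigger=(1, 'iteration'))
    def log_progress(trainer):
        updater = trainer.updater
        if updater.is_new_epoch:
            print('epoch {} started (epoch_detail={:.3f})'.format(
                updater.epoch, updater.epoch_detail))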