API Reference¶
Communicators¶
- 
chainermn.create_communicator(communicator_name='hierarchical', mpi_comm=None, allreduce_grad_dtype=None, batched_copy=False)¶
- Create a ChainerMN communicator.

  Different communicators provide different approaches to communication, so they have different performance characteristics. The default communicator, hierarchical, is expected to perform well on a variety of environments, so one need not change communicators in most cases. However, choosing the proper communicator may give better performance. The following communicators are available.

  | Name            | CPU | GPU | NCCL             | Recommended Use Cases                                                 |
  |-----------------|-----|-----|------------------|-----------------------------------------------------------------------|
  | pure_nccl       |     | OK  | Required (>= v2) | pure_nccl is recommended when NCCL2 is available in the environment.  |
  | hierarchical    |     | OK  | Required         | Each node has a single NIC or HCA                                     |
  | two_dimensional |     | OK  | Required         | Each node has multiple NICs or HCAs                                   |
  | single_node     |     | OK  | Required         | Single node with multiple GPUs                                        |
  | flat            |     | OK  |                  | N/A                                                                   |
  | naive           | OK  | OK  |                  | Testing on CPU mode                                                   |

- Parameters
- communicator_name – The name of the communicator (naive, flat, hierarchical, two_dimensional, pure_nccl, or single_node)
- mpi_comm – MPI4py communicator 
- allreduce_grad_dtype – Data type of gradient used in All-Reduce. If None, the dtype of the model is used.
 
- Returns
- ChainerMN communicator that implements methods defined in chainermn.CommunicatorBase
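A minimal usage sketch, assuming a GPU cluster and an MPI launch such as mpiexec -n 4 python train.py (the script name is illustrative):

    import chainermn

    # Create the default (hierarchical) communicator.
    comm = chainermn.create_communicator('hierarchical')

    # Each process picks the GPU that matches its intra-node rank.
    device = comm.intra_rank
    print('rank', comm.rank, 'of', comm.size, 'uses GPU', device)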
 
- 
class chainermn.CommunicatorBase¶
- Interface definition of all communicators.

  All communicators that have a set of methods compatible with this class are supposed to work in ChainerMN's parallel computation implementation. The methods are named after MPI functions; for example, bcast() comes from MPI_Bcast().

  There are two types of methods: methods that handle Python objects have an _obj suffix, while methods without a suffix handle ndarrays and arrays filled with scalar values. So the set of methods is

    [send, recv, bcast, gather, allreduce] * ['_obj', '']

  (with the exceptions of alltoall, allreduce_grad, split and bcast_data so far). Methods are also supposed to be written in this order. All of these methods must be implemented in an implementation class, otherwise it cannot be instantiated at runtime.

  Note: As most implementations of the _obj-suffixed methods involve pickling and unpickling of Python objects, there is an implicit size limit.

  TODO(kuenishi): as of now no implementation class actually has an allreduce method.

- 
allreduce(data)¶
- Allreduce operation among processes.

  Performs an aggregation operation using all data from all processes and returns the result of the aggregation to all processes.

  TODO(kuenishi): add an op argument once we find a use case for operations other than 'SUM'.

- Parameters
- data (ndarray) – the data to aggregate among all nodes. 
- Returns
- Sum of all data from all processes. 
 
 - 
allreduce_grad(model)¶
- Works the same as allreduce_obj but for Chainer model gradients.

  Note: this only supports SUM, the same as allreduce_obj.
 - 
allreduce_obj(obj)¶
- Apply a reduce operation to all objects and spread the result.

  For example, for integers and summation, the equivalent local code is:

    >>> from functools import reduce
    >>> reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
    15

  The only operation currently supported is summation.

  TODO(kuenishi): support other operations such as 'MAX', 'MIN' and 'PROD' with an op argument once we need any of them.

- Parameters
- obj – An arbitrary object to apply the reduce operation to. It must support the corresponding operation method, e.g. __add__().
- Returns
- The result of the operation applied to all objects. 
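A minimal sketch of summing one value per process with allreduce_obj (the naive communicator and the summed value are illustrative choices):

    import chainermn

    comm = chainermn.create_communicator('naive')
    # Each process contributes its own rank; every process receives the total.
    total = comm.allreduce_obj(comm.rank)
    print('sum of ranks =', total)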
 
 - 
alltoall(xs)¶
- All-to-all implementation for ndarrays.

- Parameters
- xs (tuple of numpy/cupy array) – Arrays to send, one per process; the length of the tuple must equal the communicator size. 
- Returns
- Received arrays. The length of the tuple equals the communicator size. 
- Return type
- ys (tuple of numpy/cupy array) 
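A minimal sketch with CPU arrays, run under mpiexec (the array shape and contents are illustrative):

    import numpy as np
    import chainermn

    comm = chainermn.create_communicator('naive')
    # Prepare one array per destination process.
    xs = tuple(np.full((2,), comm.rank, dtype=np.float32) for _ in range(comm.size))
    ys = comm.alltoall(xs)  # ys[i] is the array sent by process i
    print(comm.rank, [float(y[0]) for y in ys])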
 
 - 
bcast(data, max_buf_len=None, root=0)¶
- Broadcasts an ndarray from the root process to all processes.

- Parameters
- Returns
- The data sent from root process 
- Return type
- ys (numpy/cupy array) 
 
 - 
bcast_data(model)¶
- Broadcast Chainer model parameter data 
 - 
bcast_obj(obj, max_buf_len=None, root=0)¶
- Broadcasts an arbitrary object from root to all non-root processes. 
 - 
gather(data, root=0)¶
- Gathers ndarrays from all processes to the root process.

- Parameters
- data (ndarray or scalar) – For the root process this is ignored. For non-root processes, the data to send to the root process. 
- root (int) – rank of the process that receives the data. 
 
- Returns
- For the root process, the ndarrays sent from the non-root processes. For non-root processes, the return value is unspecified. 
 
 - 
gather_obj(obj, root=0)¶
- Gathers arbitrary objects from all processes to the root process.

- Parameters
- obj – Arbitrary object to send to the root process. The root process will receive this argument included in the returned list. 
- root (int) – rank of the root node who receives all objects. 
 
- Returns
- A list of objects sent from all processes. 
 - TODO(kuenishi): make sure the ordering of objects in the returned list. 
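A minimal sketch of collecting a per-process value at the root (the dictionary payload is illustrative):

    import chainermn

    comm = chainermn.create_communicator('naive')
    stats = {'rank': comm.rank, 'samples': 100 + comm.rank}  # illustrative payload
    gathered = comm.gather_obj(stats, root=0)
    if comm.rank == 0:
        print(gathered)  # list with one entry per process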
 - 
inter_rank¶
- The rank of this node in the cluster. 
 - 
inter_size¶
- Number of nodes that participate in the cluster. 
 - 
intra_rank¶
- Intra rank (process id in the machine) of this process. 
 - 
rank¶
- Rank (process id in the cluster) of this process as an integer. 
 - 
recv(source, tag)¶
- Receives an ndarray from the source process.

  To receive the message, the sender must send the data. 
 - 
recv_obj(source, tag)¶
- Receives an arbitrary Python object from source process with a tag. - Parameters
- source (int) – Rank number of sender process, to selectively receive the object. 
- tag – tag to identify the message. 
 
- Returns
- An object sent from the source by send_obj.
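A minimal sketch of a point-to-point exchange between ranks 0 and 1 (the payload and tag are illustrative):

    import chainermn

    comm = chainermn.create_communicator('naive')
    if comm.rank == 0:
        comm.send_obj({'msg': 'hello'}, dest=1, tag=0)
    elif comm.rank == 1:
        obj = comm.recv_obj(source=0, tag=0)
        print(obj)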
 
 - 
send(data, dest, tag)¶
- Sends an ndarray to the destination process.

  The receiver must invoke recv() to wait for the message.
 - 
send_obj(obj, dest, tag)¶
- Sends an arbitrary Python object to destination with a tag. - Parameters
- obj – Arbitrary object to send to receiver. 
- dest (int) – Rank number of receiver process (destination). 
- tag – tag to identify the message. 
 
 
 - 
size¶
- Number of processes of the cluster. 
 - 
split(color, key)¶
- A function analogous to MPI_Comm_split.

  This method splits the inter MPI communicator and returns a wrapped ChainerMN communicator.

- Parameters
- color (int) – Index of the new group. Processes with the same color will be assigned to the same group. 
- key (int) – Controls rank assignment. Each process will be assigned a rank in the new group ordered by the value of key. If you do not care about the rank, you can simply specify the original rank. 
 
- Returns
- CommunicatorBase 
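A minimal sketch that splits the communicator into even-rank and odd-rank groups (the grouping rule is illustrative):

    import chainermn

    comm = chainermn.create_communicator('naive')
    color = comm.rank % 2               # even ranks in one group, odd ranks in the other
    sub_comm = comm.split(color, key=comm.rank)
    print(comm.rank, '->', sub_comm.rank, 'of', sub_comm.size)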
 
 
- 
Optimizers and Evaluators¶
- 
chainermn.create_multi_node_optimizer(actual_optimizer, communicator, double_buffering=False)¶
- Create a multi node optimizer from a Chainer optimizer. - Parameters
- actual_optimizer – Chainer optimizer (e.g., chainer.optimizers.Adam).
- communicator – ChainerMN communicator. 
- double_buffering – If True, all-reduce and other processing (such as forward and backward) are overlapped using double buffering. There are cases where accuracy is affected because the gradients of the previous iteration are used for the update. This flag is supported by PureNcclCommunicator only.
 
- Returns
- The multi node optimizer based on actual_optimizer.
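A minimal sketch of wrapping a Chainer optimizer (the model is an illustrative stand-in for your own chainer.Link):

    import chainer
    import chainer.links as L
    import chainermn

    comm = chainermn.create_communicator('naive')
    model = L.Linear(784, 10)  # illustrative model
    optimizer = chainermn.create_multi_node_optimizer(
        chainer.optimizers.Adam(), comm)
    optimizer.setup(model)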
 
- 
chainermn.create_multi_node_evaluator(actual_evaluator, communicator)¶
- Create a multi node evaluator from a normal evaluator.

  This method actually patches the evaluator to work in a multi node environment. It adds several hidden attributes starting with the _mn_ prefix.

- Parameters
- actual_evaluator – evaluator to be patched (e.g., chainer.training.extensions.Evaluator) 
- communicator – ChainerMN communicator 
 
- Returns
- The multi-node patched actual_evaluator.

- Note: After being patched, the original evaluator does not work correctly in a non-MPI environment. 
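A minimal sketch with an illustrative model and validation dataset; the resulting evaluator is registered with a trainer as usual:

    import numpy as np
    import chainer
    import chainer.links as L
    import chainermn

    comm = chainermn.create_communicator('naive')
    model = L.Classifier(L.Linear(4, 2))                 # illustrative model
    test_data = chainer.datasets.TupleDataset(
        np.random.rand(20, 4).astype(np.float32),
        np.random.randint(0, 2, size=20).astype(np.int32))
    test_iter = chainer.iterators.SerialIterator(
        test_data, batch_size=5, repeat=False, shuffle=False)
    evaluator = chainermn.create_multi_node_evaluator(
        chainer.training.extensions.Evaluator(test_iter, model), comm)
    # trainer.extend(evaluator)  # register with a trainer as usual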
Dataset Utilities¶
- 
chainermn.scatter_dataset(dataset, comm, root=0, shuffle=False, seed=None, max_buf_len=268435456)¶
- Scatter the given dataset to the workers in the communicator.

  The dataset of worker 0 (i.e., the worker whose comm.rank is 0) is scattered to all workers. The datasets given on other workers are ignored. The dataset is split into sub datasets of almost equal sizes and scattered to the workers. To create a sub dataset, chainer.datasets.SubDataset is used.

- Parameters
- dataset – A dataset (e.g., list, numpy.ndarray, chainer.datasets.TupleDataset, …).
- comm – ChainerMN communicator or MPI4py communicator. 
- shuffle (bool) – If True, the order of examples is shuffled before being scattered.
- root (int) – The root process of the scatter operation. 
- seed (int) – Seed of the generator used for the permutation of indexes. If an integer convertible to a 32-bit unsigned integer is specified, it is guaranteed that each sample in the given dataset always belongs to a specific subset. If None, the permutation is changed randomly.
- max_buf_len (int) – Max buffer size to be used at broadcasting binaries. Must not be larger than 2147483647. 
 
- Returns
- Scattered dataset. 
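A minimal sketch of loading a dataset on rank 0 and scattering it (MNIST is just an illustrative dataset here):

    import chainer
    import chainermn

    comm = chainermn.create_communicator('naive')
    if comm.rank == 0:
        train, _ = chainer.datasets.get_mnist()   # load only on the root
    else:
        train = None
    train = chainermn.scatter_dataset(train, comm, shuffle=True)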
 
- 
chainermn.datasets.create_empty_dataset(dataset)¶
- Creates an empty dataset for models with no inputs and outputs.

  This function generates an empty dataset, i.e., __getitem__() only returns None. The generated dataset is compatible with the original one. Such datasets are used for models which neither take any inputs nor return any outputs. We expect such models, e.g., those whose forward() starts with chainermn.functions.recv() and ends with chainermn.functions.send().

- Parameters
- dataset – Dataset to convert. 
- Returns
- A dataset that consists of only the patterns in the original one. 
- Return type
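A minimal sketch for a process whose model takes no direct inputs (the real data lives on another process; the sizes are illustrative):

    import numpy as np
    import chainer
    import chainermn

    comm = chainermn.create_communicator('naive')
    # The dummy data only fixes the number of iterations per epoch.
    dummy = np.zeros(1000, dtype=np.float32)               # illustrative
    empty = chainermn.datasets.create_empty_dataset(dummy)
    iterator = chainer.iterators.SerialIterator(empty, batch_size=100)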
 
Links¶
- 
class chainermn.MultiNodeChainList(comm)¶
- Combining multiple non-connected components of the computational graph.

  This class combines each chainer.Chain, which represents one of the non-connected components of the computational graph. In __call__(), the returned object of chainer.Chain (which represents a pointer) is passed to the next chainer.Chain, in order to keep the computational graph connected and make backprop work properly.

  Users add each chainer.Chain by the add_link() method. Each chain is invoked in forward computation according to the order they are added, and in backward computation according to the reversed order.

  Example (basic usage)

  This is a simple example of a model which sends its outputs to the rank=1 machine:

    import chainer
    import chainer.functions as F
    import chainer.links as L
    import chainermn


    class SimpleModelSub(chainer.Chain):

        def __init__(self, n_in, n_hidden, n_out):
            super(SimpleModelSub, self).__init__(
                l1=L.Linear(n_in, n_hidden),
                l2=L.Linear(n_hidden, n_out))

        def __call__(self, x):
            h1 = F.relu(self.l1(x))
            return self.l2(h1)


    class SimpleModel(chainermn.MultiNodeChainList):

        def __init__(self, comm, n_in, n_hidden, n_out):
            super(SimpleModel, self).__init__(comm)
            self.add_link(
                SimpleModelSub(n_in, n_hidden, n_out),
                rank_in=None, rank_out=1)

  Example (split MLP on 2 processes)

  This is another example, of two models interacting with each other:

    import chainer
    import chainer.functions as F
    import chainer.links as L
    import chainermn


    class MLP(chainer.Chain):

        def __init__(self, n_in, n_hidden, n_out):
            super(MLP, self).__init__(
                l1=L.Linear(n_in, n_hidden),
                l2=L.Linear(n_hidden, n_hidden),
                l3=L.Linear(n_hidden, n_out))

        def __call__(self, x):
            h1 = F.relu(self.l1(x))
            h2 = F.relu(self.l2(h1))
            return self.l3(h2)


    class Model0(chainermn.MultiNodeChainList):

        def __init__(self, comm):
            super(Model0, self).__init__(comm)
            self.add_link(MLP(10000, 5000, 2000), rank_in=None, rank_out=1)
            self.add_link(MLP(100, 50, 10), rank_in=1, rank_out=None)


    class Model1(chainermn.MultiNodeChainList):

        def __init__(self, comm):
            super(Model1, self).__init__(comm)
            self.add_link(MLP(2000, 500, 100), rank_in=0, rank_out=0)

  Model0 is expected to be on rank=0, and Model1 is expected to be on rank=1. The first MLP in Model0 sends its outputs to Model1, then the MLP in Model1 receives them and sends its outputs to the second MLP in Model0.

  Example (sending tuples)

  This is an example of sending a tuple:

    import chainer
    import chainer.functions as F
    import chainermn


    class NN0(chainer.Chain):

        def __call__(self, x):
            y0 = some_calculation_nn0_0(x)
            y1 = some_calculation_nn0_1(x)
            return y0, y1


    class NN1(chainer.Chain):

        def __call__(self, y):
            y0, y1 = y  # unpack tuple from NN0
            return some_calculation_nn1(y0, y1)


    class Model_on_Process_0(chainermn.MultiNodeChainList):

        def __init__(self, comm):
            super(Model_on_Process_0, self).__init__(comm=comm)
            self.add_link(NN0(), rank_in=None, rank_out=1)


    class Model_on_Process_1(chainermn.MultiNodeChainList):

        def __init__(self, comm):
            super(Model_on_Process_1, self).__init__(comm=comm)
            self.add_link(NN1(), rank_in=0, rank_out=None)

  In this example, Model_on_Process_0 sends a two-element tuple (y0, y1) (returned by NN0.__call__) to Model_on_Process_1, which unpacks it as shown in NN1.__call__.

- Parameters
- comm (chainermn.communicators._base.CommunicatorBase) – ChainerMN communicator. 
 - 
add_link(link, rank_in=None, rank_out=None)¶
- Register one connected link with its input and output ranks.

- Parameters
- link (chainer.Link) – The link object to be registered. 
- rank_in (int, list, or None) – Ranks from which it receives data. If None is specified, the model does not receive from any machines. 
- rank_out (int, list, or None) – Ranks to which it sends data. If None is specified, the model will not send to any machine. 
 
 
 
- 
class chainermn.links.MultiNodeBatchNormalization(size, comm, decay=0.9, eps=2e-05, dtype=<class 'numpy.float32'>, use_gamma=True, use_beta=True, initial_gamma=None, initial_beta=None, communication_backend='auto')¶
- Batch normalization layer that can use the whole batch stats.

  When using chainer.links.BatchNormalization, batch mean and std are computed independently for the local batch in each worker. When the local batch size is too small, training is unstable due to unreliable batch stats.

  In contrast, when using this MultiNodeBatchNormalization, workers communicate to conduct 'correct' batch normalization (e.g., obtaining the mean and std for the whole global batch).

  This link works only with Chainer >= 2.0.0.

- Parameters
- size (int or tuple of ints) – Size (or shape) of channel dimensions. 
- comm (ChainerMN communicator) – communicator to share the batch stats. 
- decay (float) – Decay rate of moving average. It is used on training. 
- eps (float) – Epsilon value for numerical stability. 
- dtype (numpy.dtype) – Type to use in computing. 
- use_gamma (bool) – If True, use the scaling parameter. Otherwise, a fixed value of 1 is used, which has no effect.
- use_beta (bool) – If True, use the shifting parameter. Otherwise, a fixed value of 0 is used, which has no effect.
- communication_backend (str) – mpi, nccl or auto. It is used to determine the communication backend. If auto, the best communication backend is used for each communicator.
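A minimal sketch of using the link inside a chain (the layer sizes are illustrative):

    import chainer
    import chainer.functions as F
    import chainer.links as L
    import chainermn


    class Block(chainer.Chain):

        def __init__(self, comm):
            super(Block, self).__init__()
            with self.init_scope():
                self.conv = L.Convolution2D(None, 16, ksize=3, pad=1)
                self.bn = chainermn.links.MultiNodeBatchNormalization(16, comm)

        def __call__(self, x):
            # Batch statistics are aggregated across all workers in comm.
            return F.relu(self.bn(self.conv(x)))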
 
 
Functions¶
- 
chainermn.functions.send(x, communicator, rank, tag=0)¶
- Send elements to the target process.

  This function returns a dummy variable that only holds the computational graph. If backward() is invoked on this dummy variable, it will try to receive gradients from the target process and send them back to the parent nodes.

- Parameters
- Returns
- A dummy variable with no actual data, only holding the computational graph. Please refer to chainermn.functions.pseudo_connect for details.
- Return type
 
- 
chainermn.functions.recv(communicator, rank, delegate_variable=None, tag=0, force_tuple=False)¶
- Receive elements from the target process.

  This function returns data received from the target process. If backward() is invoked, it will try to send gradients to the target process. The received array will be on the current CUDA device if the corresponding send() is invoked with arrays on GPU. Please be aware that the current CUDA device is the intended one (https://docs-cupy.chainer.org/en/stable/tutorial/basic.html#current-device).

  Note: If you define a non-connected computational graph on one process, you have to use delegate_variable to specify the output of the previous computational graph component. Otherwise backward() does not work well. Please refer to chainermn.functions.pseudo_connect for details.

- Parameters
- communicator (chainer.communicators.CommunicatorBase) – ChainerMN communicator. 
- rank (int) – Target process specifier. 
- delegate_variable (chainer.Variable) – Pointer to the other non-connected component. 
- tag (int) – Optional message ID (MPI feature). 
- force_tuple (bool) – If False (the default), a Variable will be returned when the number of outputs is one. Otherwise, this function returns a tuple even when the number of outputs is one.
 
- Returns
- Data received from the target process. If backward() is invoked on this variable, it will send gradients to the target process.
- Return type
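A minimal sketch of the send/recv pair in a two-process pipeline (the model part and array shape are illustrative):

    import numpy as np
    import chainer
    import chainer.functions as F
    import chainermn

    comm = chainermn.create_communicator('naive')
    if comm.rank == 0:
        x = chainer.Variable(np.ones((2, 3), dtype=np.float32))
        h = F.relu(x)
        # send() returns a delegate variable holding the graph;
        # backward() on it pulls gradients back from rank 1.
        phi = chainermn.functions.send(h, comm, rank=1)
    elif comm.rank == 1:
        y = chainermn.functions.recv(comm, rank=0)
        # continue the forward computation with y here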
 
- 
chainermn.functions.pseudo_connect(delegate_variable, *actual_variables)¶
- Connect an independent connected graph component.

  This function is implemented so that it returns the received arguments directly, except the first delegate_variable. In backward computation, it returns the received gradients directly, adding a zero grad corresponding to delegate_variable. The details of delegate_variable are described in the following notes.

  Note: In the model-parallel framework, models on each process might have many non-connected components. Here we call a given graph non-connected when multiple inter-process communications are needed for its computation. For example, consider the following example:

    class ConnectedGraph(chainermn.MultiNodeChainList):

        def __init__(self, comm):
            super(ConnectedGraph, self).__init__(comm)
            self.add_link(ConnectedGraphSub(), rank_in=3, rank_out=1)

  This model receives inputs from the rank=3 process and sends its outputs to the rank=1 process. The entire graph can be seen as one connected component ConnectedGraphSub. Please refer to the documentation of MultiNodeChainList for details.

  On the other hand, see the next example:

    class NonConnectedGraph(chainermn.MultiNodeChainList):

        def __init__(self, comm):
            super(NonConnectedGraph, self).__init__(comm)
            self.add_link(NonConnectedGraphSubA(), rank_in=3, rank_out=1)
            self.add_link(NonConnectedGraphSubB(), rank_in=1, rank_out=2)

  This model consists of two components: first, NonConnectedGraphSubA receives inputs from the rank=3 process and sends its outputs to the rank=1 process, and then NonConnectedGraphSubB receives inputs from the rank=1 process and sends its outputs to the rank=2 process. Here multiple inter-process communications are invoked between NonConnectedGraphSubA and NonConnectedGraphSubB, so the graph is regarded as non-connected.

  Such non-connected models can be problematic in backward computation. Chainer traces back the computational graph from the output variable; however, a naive implementation of chainermn.functions.recv does not take any inputs but rather receives inputs by MPI_Recv, so the backward path vanishes.

  To prevent this, dummy variables, which we call delegate_variable, are used. In principle, chainermn.functions.send does not return any outputs because it sends data to the other process by MPI_Send. However, chainermn.functions.send returns a dummy / empty variable in our implementation, which is called delegate_variable. This variable does not hold any data; it is just used for retaining the backward computation path. We can guarantee the backward computation simply by passing delegate_variable to the next chainermn.functions.recv (chainermn.functions.recv has an optional argument to receive delegate_variable).

  Note: In some cases the intermediate graph component returns model outputs. See the next example:

    class NonConnectedGraph2(chainermn.MultiNodeChainList):

        def __init__(self, comm):
            super(NonConnectedGraph2, self).__init__(comm)
            self.add_link(NonConnectedGraphSubA(), rank_in=1, rank_out=None)
            self.add_link(NonConnectedGraphSubB(), rank_in=None, rank_out=1)

  This model first receives inputs from the rank=1 process and makes model outputs (specified by rank_out=None) in NonConnectedGraphSubA. Then, using model inputs (specified by rank_in=None), NonConnectedGraphSubB sends its outputs to the rank=1 process. Since MultiNodeChainList.__call__ returns the outputs of the last component (in this case, the outputs of NonConnectedGraphSubB), a naive implementation cannot output the returned value of NonConnectedGraphSubA as the model outputs. In this case, pseudo_connect should be used.

  pseudo_connect takes two arguments. 
The first one, delegate_variable, is what we explained in the above note. In this case, the returned value of NonConnectedGraphSubB corresponds to delegate_variable. The second one, actual_variables, is "what we want delegate_variable to imitate". In NonConnectedGraph2, we obtain the returned value of NonConnectedGraphSubB as the model outputs, but what we actually want is the returned value of NonConnectedGraphSubA. At the same time, we want to trace back this resulting variable in backward computation. Using pseudo_connect, we can make a variable whose data is the same as the returned value of NonConnectedGraphSubA, and which traces back through NonConnectedGraphSubB first (see the sketch after this parameter list).

pseudo_connect should also be used in some pathological cases, for example, where multiple chainermn.functions.send calls occur sequentially.

- Parameters
- delegate_variable (chainer.Variable) – Pointer to the previous non-connected graph component. 
- actual_variables (tuple of chainer.Variable) – Actual values which delegate_variable imitates.
 
- Returns
- A variable with the given values combined with delegating variable. 
- Return type
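A minimal sketch of the direct call, with illustrative stand-ins for the variables described above:

    import numpy as np
    import chainer
    import chainermn.functions

    # delegate: dummy variable as returned by a previous send() (illustrative here)
    delegate = chainer.Variable(np.empty((0,), dtype=np.float32))
    actual = chainer.Variable(np.arange(6, dtype=np.float32).reshape(2, 3))

    # y holds the data of `actual`, but its backward pass also traces `delegate`.
    y = chainermn.functions.pseudo_connect(delegate, actual)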
 
- 
chainermn.functions.bcast(comm, x, root=0)¶
- Differentiable broadcast communication between workers.

  This function invokes broadcast communications among the processes specified by the communicator. Backward will be invoked as in ordinary chainer functions, where gradients are gathered to the root process and summed up.

  The received array will be on the current CUDA device if x on the invoking process is on GPU. Please be aware that the current CUDA device is the intended one (https://docs-cupy.chainer.org/en/stable/tutorial/basic.html#current-device).

- Parameters
- comm – ChainerMN communicator. 
- x (chainer.Variable) – Variable to be sent. 
 
- Returns
- Broadcasted variable. 
- Return type
- y (chainer.Variable) 
 
- 
chainermn.functions.gather(comm, x, root=0)¶
- Differentiable gather communication between workers.

  This function invokes gather communications among the processes specified by the communicator. Backward will be invoked as in ordinary chainer functions, where gradients are scattered from the root process to each slave.

  The received array will be on the current CUDA device if x on the root process is on GPU. Please be aware that the current CUDA device is the intended one (https://docs-cupy.chainer.org/en/stable/tutorial/basic.html#current-device).

- Parameters
- comm – ChainerMN communicator. 
- x (chainer.Variable) – Variable to be sent. 
 
- Returns
- Gathered variables. None for slaves.
- Return type
- ys (chainer.Variable) 
 
- 
chainermn.functions.scatter(comm, xs, root=0)¶
- Differentiable scatter communication between workers.

  This function invokes scatter communications among the processes specified by the communicator. Backward will be invoked as in ordinary chainer functions, where gradients are gathered to the root process.

  The received array will be on the current CUDA device if xs on the root process is on GPU. Please be aware that the current CUDA device is the intended one (https://docs-cupy.chainer.org/en/stable/tutorial/basic.html#current-device).

- Parameters
- comm – ChainerMN communicator. 
- xs (list of chainer.Variable) – Variables to be scattered, for the master process. None for slave processes.
 
- Returns
- Scattered variable. 
- Return type
- y (chainer.Variable) 
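A minimal sketch with CPU variables; the root prepares one variable per process and the others pass None (the array contents are illustrative):

    import numpy as np
    import chainer
    import chainermn
    import chainermn.functions

    comm = chainermn.create_communicator('naive')
    if comm.rank == 0:
        xs = [chainer.Variable(np.full((3,), i, dtype=np.float32))
              for i in range(comm.size)]
    else:
        xs = None
    y = chainermn.functions.scatter(comm, xs)   # each process gets its own variable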
 
- 
chainermn.functions.alltoall(comm, xs)¶
- Differentiable all-to-all communication between workers.

  This function invokes all-to-all communications among the processes specified by the communicator. Backward will be invoked as in ordinary chainer functions, just passing input gradients back. Unlike point-to-point communication such as chainermn.functions.send and chainermn.functions.recv, users need not care about delegate variables, since backward() will not be invoked until all gradients from the output direction arrive. Please refer to chainermn.functions.pseudo_connect for the details of delegate variables.

  The received array will be on the current CUDA device on the invoking process if xs is on GPU. Please be aware that the current CUDA device is the intended one (https://docs-cupy.chainer.org/en/stable/tutorial/basic.html#current-device).

- Parameters
- comm – ChainerMN communicator. 
- xs (list of chainer.Variables) – Variables to send. 
 
- Returns
- Received variables. 
- Return type
- ys (list of chainer.Variables) 
 
- 
chainermn.functions.allgather(comm, x)¶
- Differentiable all-gather communication between workers.

  This function invokes all-gather communications among the processes specified by the communicator. Backward will be invoked as in ordinary chainer functions, where gradients are reduced to each process.

  The received array will be on the current CUDA device on the invoking process if x is on GPU. Please be aware that the current CUDA device is the intended one (https://docs-cupy.chainer.org/en/stable/tutorial/basic.html#current-device).

- Parameters
- comm – ChainerMN communicator. 
- x (chainer.Variables) – Variables to send. 
 
- Returns
- Received variables. 
- Return type
- ys (list of chainer.Variables) 
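A minimal sketch in which each process contributes one variable and receives the full list (the array contents are illustrative):

    import numpy as np
    import chainer
    import chainermn
    import chainermn.functions

    comm = chainermn.create_communicator('naive')
    x = chainer.Variable(np.full((2,), comm.rank, dtype=np.float32))
    ys = chainermn.functions.allgather(comm, x)   # list of comm.size variables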
 
Iterators¶
- 
chainermn.iterators.create_multi_node_iterator(actual_iterator, communicator, rank_master=0)¶
- Create a multi node iterator from a Chainer iterator.

  This iterator shares the same batches among multiple processes, simply broadcasting batches from the master process to slave processes in each iteration. The master process obtains batches from actual_iterator, for which you can specify any Chainer iterator (e.g. chainer.iterators.SerialIterator).

  Here is an example situation. When we train a sequence-to-sequence model, where the encoder and the decoder are located on two different processes, we want to share the same batches on each process, so that inputs for the encoder and output teacher signals for the decoder stay consistent.

  In order to use the multi node iterator, first create the iterator from a Chainer iterator and a ChainerMN communicator:

    iterator = chainermn.iterators.create_multi_node_iterator(
        chainer.iterators.SerialIterator(
            dataset, batch_size, shuffle=True),
        communicator)

  Then you can use it as an ordinary Chainer iterator:

    updater = chainer.training.StandardUpdater(iterator, optimizer)
    trainer = chainer.training.Trainer(updater)
    trainer.run()

  Since this iterator shares batches through the network in each iteration, communication might be large. If you train your model-parallel network on an extremely large dataset, you can also consider using chainermn.iterators.create_synchronized_iterator.

  The current multi node iterator supports numpy.float32 or tuples of numpy.float32 as the data type of the batch elements.

  Note: create_multi_node_iterator and serialize of the created iterators must be called at the same time by the master and slaves, otherwise it falls into deadlock, because they synchronize the internal states of the iterators.

- Parameters
- actual_iterator – Chainer iterator ( - chainer.iterators.SerialIteratorand- chainer.iterators.MultiprocessIteratorare supported).
- communicator – ChainerMN communicator. 
- rank_master – process rank to be master. 
 
- Returns
- The master-slave iterator based on - actual_iterator.
 
Trainer extensions¶
- 
class chainermn.extensions.AllreducePersistent(model, comm)¶
- Chainer extension to average persistent variables over workers.

  When called, this extension invokes all-reduce communication among workers to compute averages of the persistent variables in the model, and the persistent variables are updated to those averages. Currently, integer persistent variables are ignored and only float persistent variables are handled.

  This extension mainly improves the running mean and variance of BatchNormalization by increasing the effective number of examples. It does not need to be called frequently; call it just before storing or evaluating the model.

- Parameters
- model (chainer.link.Link) – Target link object. 
- comm (ChainerMN communicator) – communicator to compute averages. 
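A minimal sketch of registering the extension with a trainer; the model, communicator and trainer are assumed to be set up as in the other examples, and the trigger is illustrative:

    allreduce_persistent = chainermn.extensions.AllreducePersistent(model, comm)
    # Average BatchNormalization statistics shortly before evaluation or snapshots.
    trainer.extend(allreduce_persistent, trigger=(1, 'epoch'))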
 
 
- 
chainermn.create_multi_node_checkpointer(name, comm, cp_interval=5, gc_interval=5, path=None)¶
- Create a multi-node checkpointer object.

  Generational snapshot extension to allow fault tolerance; it keeps several old snapshots to roll back to a synchronized snapshot at each MPI process. Snapshot files are identified as '<name>.<rank>.<iteration>'.

  - <name> … identifier of the run for which the snapshot is kept 
  - <rank> … which process owned the model 
  - <iteration> … the iteration number 
- This extension keeps several files for each execution and allows users to resume the whole job at the latest snapshot of each MPI process, at the iteration where all snapshots agree.

  As this object is a usual Chainer extension, users can just create this object and pass it to the trainer as an extension:

    checkpointer = create_multi_node_checkpointer(name=run_id, comm=comm)
    trainer.extend(checkpointer, trigger=(25, 'iteration'))

  To run recovery at startup, before the first iteration, run

    checkpointer.maybe_load(trainer, optimizer)

  before trainer.run(). If nothing is recovered (i.e. no snapshot is found), trainer.updater.iteration will remain 0. Otherwise it will have the value from the snapshot and the training will resume from that iteration. optimizer is optional, but passing it lets the multi node optimizer avoid the initial broadcast when all snapshot data among nodes are in sync.

  Note: Make sure that checkpointer.maybe_load is called after all extensions with states, such as ExponentialShift, are set to the trainer.

  After training finishes without errors, all those temporary checkpoints will be cleaned up at all nodes.

  Another example of using the checkpointer without a trainer would be:

    checkpointer = create_multi_node_checkpointer(name=run_id, comm=comm)
    checkpointer.maybe_load(obj_you_want_to_snap, optimizer)

    while True:  ## Training loop
        ...
        updater.update()
        ...
        checkpointer.save(obj_you_want_to_snap)  # Make a checkpoint