Class MultiWorkerAllReduce
Inherits From: AllReduceCrossDeviceOps
Defined in tensorflow/python/distribute/cross_device_ops.py.
All-reduce algorithms for distributed TensorFlow.
__init__
__init__(
worker_devices,
num_gpus_per_worker,
all_reduce_spec=('pscpu/pscpu', 2, -1),
num_packs=0,
agg_small_grads_max_bytes=0,
agg_small_grads_max_group=10
)
Initialize the all-reduce algorithm.
Args:
worker_devices: a list of device strings for workers participating in all-reduce.
num_gpus_per_worker: number of GPU devices per worker.
all_reduce_spec: a tuple, a named tuple, or a list of tuples specifying the all-reduce algorithm.
- The first element of a tuple is the name of the all-reduce algorithm. Valid algorithm names are "nccl", "nccl/xring", "nccl/rechd", "nccl/pscpu", "xring", "pscpu", "psgpu" and "pscpu/pscpu". Algorithms containing a "/" are hierarchical: two all-reduces are executed, the first aggregating tensors within a worker and the second aggregating across workers.
- The second element of a tuple is the number of shards used when doing all-reduce. If its value is M, each tensor is split after packing into M shards, M parallel all-reduces are performed, and the shards are finally concatenated back into a complete tensor.
- The third element is the maximum size in bytes of tensors to which the algorithm named in the first element applies. For example, with all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)], tensors no larger than 1024 bytes get a 2-shard "nccl" all-reduce and all other tensors get a 2-shard "pscpu/pscpu" all-reduce. The third elements should be in increasing order across tuples and end with -1, which indicates infinity.
num_packs: see AllReduceCrossDeviceOps.
agg_small_grads_max_bytes: see AllReduceCrossDeviceOps.
agg_small_grads_max_group: see AllReduceCrossDeviceOps.
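The size-based dispatch described by all_reduce_spec can be illustrated with a small sketch. This is pure Python, not the TensorFlow implementation; the helper name select_algorithm is hypothetical:

```python
# Sketch (not the TensorFlow implementation): how the size thresholds in
# all_reduce_spec pick an algorithm for a tensor of a given size.
def select_algorithm(all_reduce_spec, tensor_size_bytes):
    """Return the (algorithm, num_shards) whose size limit covers the tensor."""
    for algorithm, num_shards, size_limit in all_reduce_spec:
        # A limit of -1 means "no upper bound" and must come last.
        if size_limit == -1 or tensor_size_bytes <= size_limit:
            return algorithm, num_shards
    raise ValueError("all_reduce_spec should end with a -1 size limit")

spec = [("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)]
print(select_algorithm(spec, 512))   # ('nccl', 2)
print(select_algorithm(spec, 4096))  # ('pscpu/pscpu', 2)
```

Because each threshold is an upper bound and the list ends with -1, every tensor matches exactly one tuple.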
Methods
tf.contrib.distribute.MultiWorkerAllReduce.batch_reduce
batch_reduce(
reduce_op,
value_destination_pairs
)
Reduce PerReplica objects in a batch.
Reduce each first element in value_destination_pairs to the corresponding second element, which indicates the destinations.
Args:
reduce_op: indicates how per_replica_value will be reduced. Accepted values are tf.distribute.ReduceOp.SUM and tf.distribute.ReduceOp.MEAN.
value_destination_pairs: a list or a tuple of tuples of PerReplica objects (or tensors with a device set if there is one device) and destinations.
Returns:
a list of Mirrored objects.
Raises:
ValueError: if value_destination_pairs is not a list or a tuple of tuples of PerReplica objects and destinations.
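The per-pair semantics of batch_reduce can be sketched in plain Python. Everything below (toy_batch_reduce, reduce_one_pair, the dict standing in for a Mirrored value) is a toy stand-in, not the real API:

```python
def reduce_one_pair(reduce_op, per_replica_values, destination):
    # Combine the per-replica values with SUM or MEAN and place the
    # result at the destination (a dict stands in for a Mirrored value).
    total = sum(per_replica_values)
    if reduce_op == "MEAN":
        total /= len(per_replica_values)
    return {destination: total}

def toy_batch_reduce(reduce_op, value_destination_pairs):
    # Mirror the documented contract: each entry must be a
    # (per-replica value, destination) pair.
    if not all(isinstance(p, tuple) and len(p) == 2 for p in value_destination_pairs):
        raise ValueError("value_destination_pairs must contain (value, destination) pairs")
    return [reduce_one_pair(reduce_op, v, d) for v, d in value_destination_pairs]

pairs = [([1.0, 3.0], "/gpu:0"), ([2.0, 4.0], "/gpu:1")]
print(toy_batch_reduce("MEAN", pairs))  # [{'/gpu:0': 2.0}, {'/gpu:1': 3.0}]
```

Batching the pairs this way lets the real implementation pack and schedule all the reductions together rather than issuing one reduce per tensor.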
tf.contrib.distribute.MultiWorkerAllReduce.broadcast
broadcast(
tensor,
destinations
)
Broadcast the tensor to destinations.
Args:
tensor: the tensor to broadcast.
destinations: the broadcast destinations.
Returns:
a Mirrored object.
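Conceptually, broadcast places a copy of one tensor on every destination device and returns the mirrored result. A toy sketch, with a dict standing in for a Mirrored object:

```python
def toy_broadcast(tensor, destinations):
    # One copy of the same value per destination device.
    return {device: tensor for device in destinations}

print(toy_broadcast(7.0, ["/gpu:0", "/gpu:1"]))  # {'/gpu:0': 7.0, '/gpu:1': 7.0}
```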
tf.contrib.distribute.MultiWorkerAllReduce.reduce
reduce(
reduce_op,
per_replica_value,
destinations
)
Reduce per_replica_value to destinations.
It runs the reduction operation defined by reduce_op and puts the result on destinations.
Args:
reduce_op: indicates how per_replica_value will be reduced. Accepted values are tf.distribute.ReduceOp.SUM and tf.distribute.ReduceOp.MEAN.
per_replica_value: a PerReplica object or a tensor with a device set.
destinations: the reduction destinations.
Returns:
a Mirrored object.
Raises:
ValueError: if per_replica_value is not a PerReplica object.
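The contract of reduce (validate the input, combine the per-replica values with reduce_op, mirror the result at the destinations) can be sketched in plain Python. The PerReplica and Mirrored classes below are toy stand-ins, not the TensorFlow types:

```python
class PerReplica:
    """Toy stand-in: one value per replica."""
    def __init__(self, values):
        self.values = values

class Mirrored:
    """Toy stand-in: the same reduced value on every destination."""
    def __init__(self, value, destinations):
        self.value, self.destinations = value, destinations

def toy_reduce(reduce_op, per_replica_value, destinations):
    # Mirror the documented ValueError: reject non-PerReplica input.
    if not isinstance(per_replica_value, PerReplica):
        raise ValueError("per_replica_value must be a PerReplica object")
    total = sum(per_replica_value.values)
    if reduce_op == "MEAN":
        total /= len(per_replica_value.values)
    return Mirrored(total, destinations)

result = toy_reduce("SUM", PerReplica([1.0, 2.0, 3.0]), ["/gpu:0", "/gpu:1"])
print(result.value)  # 6.0
```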