tf.distribute.HierarchicalCopyAllReduce

Reduction using hierarchical copy all-reduce.

tf.distribute.HierarchicalCopyAllReduce(
    num_packs=1
)

It reduces to one GPU along edges of a hierarchy and broadcasts back to each GPU along the same path. Before the all-reduce, tensors are repacked or aggregated for more efficient cross-device transport.

This reduction was created for the Nvidia DGX-1 and assumes the GPUs are connected as they are on a DGX-1 machine. If your GPU interconnect topology is different, it is likely to be slower than tf.distribute.ReductionToOneDevice.
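
For example, a tf.distribute.MirroredStrategy can be configured to use this reduction through its cross_device_ops argument (a minimal sketch, assuming a multi-GPU machine such as a DGX-1):

import tensorflow as tf

# Reductions issued by this strategy (e.g. gradient aggregation in a
# mirrored training step) then use the hierarchical copy algorithm.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce(num_packs=1))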

Args:

num_packs: a non-negative integer specifying how many packs values are split into before the cross-device reduction.

Raises:

ValueError if num_packs is negative.

Methods

batch_reduce

batch_reduce(
    reduce_op, value_destination_pairs
)

Reduce PerReplica objects in a batch.

Reduces the first element of each pair in value_destination_pairs to the corresponding second element, which specifies the destinations.

This can be faster than multiple individual reduces because several tensors can be fused into one or more packs before the reduction.

Args:

reduce_op: the reduction to perform, e.g. tf.distribute.ReduceOp.SUM.
value_destination_pairs: a sequence of (per-replica value, destinations) pairs; each value is reduced to its paired destinations.

Returns:

a list of Mirrored objects.

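Batched reductions are typically issued through tf.distribute.StrategyExtended.batch_reduce_to, which delegates to the strategy's configured cross-device ops. A minimal sketch, assuming two visible GPUs (device names are illustrative):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(
    ["GPU:0", "GPU:1"],
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce(num_packs=1))

def value_fn(ctx):
  # One component per replica: 1.0 on replica 0, 2.0 on replica 1.
  return tf.constant(1.0 + ctx.replica_id_in_sync_group)

v0 = strategy.experimental_distribute_values_from_function(value_fn)
v1 = strategy.experimental_distribute_values_from_function(value_fn)

# Both reductions are issued in one batch, so their tensors can be packed
# before the cross-device transfer; each result is a Mirrored object on
# its destination.
reduced = strategy.extended.batch_reduce_to(
    tf.distribute.ReduceOp.SUM,
    [(v0, "GPU:0"), (v1, "GPU:0")])
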
broadcast

broadcast(
    tensor, destinations
)

Broadcast the tensor to destinations.

Args:

tensor: the tensor to broadcast.
destinations: the devices to broadcast the tensor to.

Returns:

a Mirrored object.
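
A minimal sketch of a direct call (the device string and GPU availability are illustrative assumptions):

import tensorflow as tf

hierarchical_copy = tf.distribute.HierarchicalCopyAllReduce()

# Copies the tensor to the destination device and returns a Mirrored
# object holding the per-device copies.
mirrored = hierarchical_copy.broadcast(
    tf.constant([1.0, 2.0]), destinations="GPU:0")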

reduce

reduce(
    reduce_op, per_replica_value, destinations
)

Reduce per_replica_value to destinations.

It runs the reduction operation defined by reduce_op and puts the result on destinations.

Args:

reduce_op: the reduction to perform, e.g. tf.distribute.ReduceOp.SUM.
per_replica_value: the per-replica value to reduce.
destinations: the devices to place the reduced result on.

Returns:

a Mirrored object.

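A minimal sketch of a direct call, assuming two visible GPUs (device names are illustrative); the per-replica value is built with a MirroredStrategy:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(["GPU:0", "GPU:1"])
hierarchical_copy = tf.distribute.HierarchicalCopyAllReduce(num_packs=1)

def value_fn(ctx):
  # 0.0 on replica 0 (GPU:0), 1.0 on replica 1 (GPU:1).
  return tf.constant(float(ctx.replica_id_in_sync_group))

per_replica_value = strategy.experimental_distribute_values_from_function(value_fn)

# Sum the per-replica components and mirror the result back onto the
# devices the components came from; the return value is a Mirrored object.
result = hierarchical_copy.reduce(
    tf.distribute.ReduceOp.SUM, per_replica_value,
    destinations=per_replica_value)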