A class listing aggregation methods used to combine gradients.
Computing partial derivatives can require aggregating gradient contributions. This class lists the various methods that can be used to combine gradients in the graph.
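As a minimal sketch of where such aggregation occurs, the example below uses a tensor in more than one place, so its gradient is the sum of several contributions. It assumes TensorFlow 2.x, where `tf.gradients` must be called in a graph context such as a `tf.function`. The function name `grad_of_sum` and the expression being differentiated are illustrative only.

```python
import tensorflow as tf


@tf.function  # tf.gradients is only valid in graph mode under TF 2.x
def grad_of_sum(x):
    # x feeds several operations, so dy/dx is built from more than one
    # gradient contribution that has to be aggregated.
    y = x * x + 3.0 * x
    # tf.gradients returns a list with one gradient per entry in xs.
    (dy_dx,) = tf.gradients(
        y, [x], aggregation_method=tf.AggregationMethod.ADD_N
    )
    return dy_dx


# Analytically, dy/dx = 2x + 3.
result = grad_of_sum(tf.constant(2.0))
```

Passing a different member of this class as `aggregation_method` changes how the contributions are combined, not the value of the resulting gradient.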
The following aggregation methods are part of the stable API for aggregating gradients:
ADD_N
: All of the gradient terms are summed as part of one
operation using the "AddN" op (see tf.add_n). This
method has the property that all gradients must be ready and
buffered separately in memory before any aggregation is performed.
DEFAULT
: The system-chosen default aggregation method.

The following aggregation methods are experimental and may not be supported in future releases:
EXPERIMENTAL_TREE
: Gradient terms are summed in pairs using
the "AddN" op. This method of summing gradients may reduce
performance, but it can improve memory utilization because the
gradients can be released earlier.
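The memory trade-off between summing everything at once and summing in pairs can be illustrated in plain Python. This is a conceptual sketch only, not TensorFlow's implementation: the function names `sum_all_at_once`, `sum_in_pairs`, and `gradient_terms` are hypothetical, and plain floats stand in for gradient tensors.

```python
from typing import Iterable, Iterator


def sum_all_at_once(terms: Iterable[float]) -> float:
    """Single n-ary sum: every term is buffered before any addition."""
    buffered = list(terms)  # all terms are held in memory at the same time
    return sum(buffered)


def sum_in_pairs(terms: Iterable[float]) -> float:
    """Two-operand sums: each term is added as soon as it is available."""
    it = iter(terms)
    try:
        running = next(it)
    except StopIteration:
        raise ValueError("need at least one gradient term") from None
    for term in it:
        # Only the running sum and the current term are needed here, so an
        # earlier term can be released once it has been added.
        running = running + term
    return running


def gradient_terms() -> Iterator[float]:
    """Hypothetical stand-in for gradient contributions arriving one by one."""
    yield from (0.5, 1.5, 2.0, 4.0)


total_buffered = sum_all_at_once(gradient_terms())
total_pairwise = sum_in_pairs(gradient_terms())
```

Both functions produce the same total. The pairwise version performs more individual additions but never needs all terms resident at once, which mirrors the performance and memory trade-off described above.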