Aliases:

- `tf.contrib.eager.custom_gradient`
- `tf.custom_gradient`

```python
tf.custom_gradient(f)
```

Defined in `tensorflow/python/ops/custom_gradient.py`.
Decorator to define a function with a custom gradient.
This decorator allows fine-grained control over the gradients of a sequence of operations. This may be useful for multiple reasons, including providing a more efficient or numerically stable gradient for a sequence of operations.
For example, consider the following function that commonly occurs in the computation of cross entropy and log likelihoods:
```python
def log1pexp(x):
  return tf.log(1 + tf.exp(x))
```
Due to numerical instability, the gradient of this function evaluated at x=100 is NaN. For example:
```python
x = tf.constant(100.)
y = log1pexp(x)
dy = tf.gradients(y, x)  # Will be NaN when evaluated.
```
The gradient expression can be analytically simplified to provide numerical stability:
```python
@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.log(1 + e), grad
```
With this definition, the gradient at x=100 will be correctly evaluated as 1.0.
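To verify this end to end, a minimal graph-mode sketch (assuming the TF 1.x `Session` API):

```python
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    # Simplified form of dy * e / (1 + e); avoids overflow in the
    # naive autodiff gradient when e = exp(x) is very large.
    return dy * (1 - 1 / (1 + e))
  return tf.log(1 + e), grad

x = tf.constant(100.)
y = log1pexp(x)
dy = tf.gradients(y, x)

with tf.Session() as sess:
  print(sess.run(dy))  # [1.0] rather than [nan]
```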
See also `tf.RegisterGradient`, which registers a gradient function for a primitive TensorFlow operation. `tf.custom_gradient`, on the other hand, allows for fine-grained control over the gradient computation of a sequence of operations.
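For contrast, a hedged sketch of the `tf.RegisterGradient` pattern in TF 1.x; the registered name `"CustomSquareGrad"` is an arbitrary label chosen for this example:

```python
import tensorflow as tf

# The gradient function receives the single primitive op and the
# incoming gradient, not a sequence of operations.
@tf.RegisterGradient("CustomSquareGrad")
def _custom_square_grad(op, grad):
  return grad * 2.0 * op.inputs[0]

g = tf.get_default_graph()
x = tf.constant(3.0)
# Swap in the registered gradient for the primitive Square op only.
with g.gradient_override_map({"Square": "CustomSquareGrad"}):
  y = tf.square(x)
dy = tf.gradients(y, x)  # Computed via _custom_square_grad.
```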
Note that if the decorated function uses `Variable`s, the enclosing variable scope must be using `ResourceVariable`s.
Args:

- `f`: function `f(*x)` that returns a tuple `(y, grad_fn)` where:
  - `x` is a sequence of `Tensor` inputs to the function.
  - `y` is a `Tensor` or sequence of `Tensor` outputs of applying TensorFlow operations in `f` to `x`.
  - `grad_fn` is a function with the signature `g(*grad_ys)` which returns a list of `Tensor`s, the derivatives of `Tensor`s in `y` with respect to the `Tensor`s in `x`. `grad_ys` is a `Tensor` or sequence of `Tensor`s the same size as `y` holding the initial value gradients for each `Tensor` in `y`. In a pure mathematical sense, a vector-argument vector-valued function `f`'s derivatives should be its Jacobian matrix `J`. Here we are expressing the Jacobian `J` as a function `grad_fn` which defines how `J` will transform a vector `grad_ys` when left-multiplied with it (`grad_ys * J`). This functional representation of a matrix is convenient to use for chain-rule calculation (e.g. in the back-propagation algorithm).

  If `f` uses `Variable`s (that are not part of the inputs), i.e. through `get_variable`, then `grad_fn` should have the signature `g(*grad_ys, variables=None)`, where `variables` is a list of the `Variable`s, and return a 2-tuple `(grad_xs, grad_vars)`, where `grad_xs` is the same as above, and `grad_vars` is a `list<Tensor>` with the derivatives of `Tensor`s in `y` with respect to the variables (that is, `grad_vars` has one `Tensor` per variable in `variables`), as illustrated in the sketch below.
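As a concrete illustration of the `variables` signature, here is a hedged sketch (TF 1.x; the function, variable, and scope names are made up for this example) of a decorated function that captures a variable via `get_variable` inside a `ResourceVariable` scope:

```python
import tensorflow as tf

@tf.custom_gradient
def linear(x):
  # w is captured, not passed in, so grad_fn must accept the
  # `variables` keyword and return (grad_xs, grad_vars).
  w = tf.get_variable("w", shape=[], initializer=tf.ones_initializer())
  y = w * x

  def grad(dy, variables=None):
    grad_x = dy * w        # dy/dx = w
    grad_vars = [dy * x]   # one Tensor per variable: dy/dw = x
    return grad_x, grad_vars

  return y, grad

# The enclosing scope must create ResourceVariables.
with tf.variable_scope("demo", use_resource=True):
  x = tf.constant(2.0)
  y = linear(x)
```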
Returns:

A function `h(x)` which returns the same value as `f(x)[0]` and whose gradient (as calculated by `tf.gradients`) is determined by `f(x)[1]`.