Writing an Op to work on an ndarray in C¶
So suppose you have looked through the library documentation and you don’t see a function that does what you want.
If you can implement something in terms of existing Ops, you should do that. Odds are your function that uses existing Theano expressions is short, has no bugs, and potentially profits from optimizations that have already been implemented.
However, if you cannot implement an Op in terms of existing Ops, you have to write a new one. Don’t worry, Theano was designed to make it easy to add new Ops, Types, and Optimizations.
This section walks through a non-trivial example: an Op that does something pretty
weird and unrealistic, and that is hard to express with existing Ops.
(Technically, we could use Scan to implement the Op we're about to describe,
but we ignore that possibility for the sake of example.)
The following code works, but important error-checking has been omitted for clarity. For example, when you write C code that assumes memory is contiguous, you should check the strides and alignment.
import theano
from theano import tensor

class Fibby(theano.Op):
    """
    An arbitrarily generalized Fibonacci sequence
    """
    __props__ = ()

    def make_node(self, x):
        x_ = tensor.as_tensor_variable(x)
        assert x_.ndim == 1
        return theano.Apply(self,
                            inputs=[x_],
                            outputs=[x_.type()])
        # using x_.type() is dangerous, it copies x's broadcasting behaviour

    def perform(self, node, inputs, output_storage):
        x, = inputs
        y = output_storage[0][0] = x.copy()
        for i in range(2, len(x)):
            y[i] = y[i - 1] * y[i - 2] + x[i]

    def c_code(self, node, name, inames, onames, sub):
        x, = inames
        y, = onames
        fail = sub['fail']
        return """
            Py_XDECREF(%(y)s);
            %(y)s = (PyArrayObject*)PyArray_FromArray(
                %(x)s, 0, NPY_ARRAY_ENSURECOPY);
            if (!%(y)s)
                %(fail)s;
            {  // New scope needed to make compilation work
                dtype_%(y)s * y = (dtype_%(y)s*)PyArray_DATA(%(y)s);
                dtype_%(x)s * x = (dtype_%(x)s*)PyArray_DATA(%(x)s);
                for (int i = 2; i < PyArray_DIMS(%(x)s)[0]; ++i)
                    y[i] = y[i-1]*y[i-2] + x[i];
            }
            """ % locals()

    def c_code_cache_version(self):
        return (1,)

fibby = Fibby()
At a high level, the code fragment declares a class (Fibby) and then creates one
instance of it (fibby). We often gloss over this distinction, but will be precise here:
fibby (the instance) is an Op, not Fibby (the class, which is a subclass of theano.Op).
You can call fibby(tensor.vector()) on a Variable to build an expression, and in the
expression there will be a .op attribute that refers to fibby.
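For example, a minimal sketch of building such an expression and inspecting it (assuming the Fibby code above has been run):

from theano import tensor

v = tensor.vector()            # a symbolic vector Variable
out = fibby(v)                 # apply the Op to build an expression
assert out.owner.op is fibby   # the Apply node's .op refers to the fibby instance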
The only piece of boilerplate in the Op is the __props__ attribute.
Setting __props__ = () tells Theano to generate the __eq__ and __hash__ methods for us,
so that any two Fibby instances compare equal (without __props__, the base Op class
considers two objects equal only if they are the same object).
When two Ops are equal, Theano will merge their outputs if they are applied to the same inputs.
Declaring __props__ keeps the logic of the equality comparison explicit.
It is an essential part of the Op's contract that if two Ops compare
equal, then they must compute the same result when presented with the same
inputs. Here, if we allocated another instance of Fibby by typing fibby2 = Fibby(),
then we would have two Ops that behave identically.
When should the equality comparison be more complicated?
If Fibby.__init__ had parameters, then we could
have configured fibby2 differently from fibby by passing different
arguments to the constructor. If we had done that, and if that different
configuration made fibby2 compute different results from fibby (for the
same inputs), then those parameters would have to be listed in __props__ so that
the two Fibby Ops would not compare equal. The reason why: Theano's merge
optimization looks for Ops that compare equal and merges them. If two Ops compare
equal but don't always produce equal results from equal inputs, then you might
see wrong calculations.
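As a hedged sketch of that situation (this FibbyWithOffset class and its offset parameter are hypothetical, not part of the example above), the parameter is simply listed in __props__:

import theano
from theano import tensor

class FibbyWithOffset(theano.Op):
    """Hypothetical variant whose results depend on a constructor argument."""
    # Listing 'offset' here makes the generated __eq__/__hash__ take it
    # into account, so instances built with different offsets are not merged.
    __props__ = ('offset',)

    def __init__(self, offset):
        self.offset = offset

    def make_node(self, x):
        x_ = tensor.as_tensor_variable(x)
        assert x_.ndim == 1
        return theano.Apply(self, [x_], [x_.type()])

    def perform(self, node, inputs, output_storage):
        x, = inputs
        y = output_storage[0][0] = x.copy()
        for i in range(2, len(x)):
            y[i] = y[i - 1] * y[i - 2] + x[i] + self.offset

assert FibbyWithOffset(1) == FibbyWithOffset(1)        # same config: considered the same Op
assert not (FibbyWithOffset(1) == FibbyWithOffset(2))  # different config: kept distinct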
The make_node method creates a node to be included in the expression graph.
It runs when we apply our Op (fibby) to a Variable (x), as in fibby(tensor.vector()).
When an Op has multiple inputs, their order in the inputs argument to Apply
is important: Theano will call make_node(*inputs) to copy the graph,
so it is important not to change the semantics of the expression by changing the argument order.
All the inputs and outputs arguments to Apply must be Variables.
A common and easy way to ensure inputs are variables is to run them through
as_tensor_variable. This function leaves TensorType variables alone, raises an
error for non-TensorType variables, and copies any numpy.ndarray into the
storage for a TensorType Constant.
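A minimal sketch of that behaviour (assuming numpy and theano are installed):

import numpy
from theano import tensor

v = tensor.vector()
assert tensor.as_tensor_variable(v) is v          # TensorType variables pass through unchanged
c = tensor.as_tensor_variable(numpy.arange(3.0))  # an ndarray is wrapped as a TensorType Constant
print(c)                                          # prints a TensorConstant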
The make_node method dictates the appropriate Type for all output variables.
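The comment in make_node above points out that reusing x_.type() copies x's broadcasting pattern. One possible alternative (a sketch only, assuming the same imports as the Fibby code above) is to build the output Type explicitly inside make_node:

def make_node(self, x):
    x_ = tensor.as_tensor_variable(x)
    assert x_.ndim == 1
    # Build a fresh 1-d TensorType with the same dtype as x_, with an
    # explicit (non-broadcastable) pattern instead of copying x_'s pattern.
    out_type = tensor.TensorType(dtype=x_.dtype, broadcastable=(False,))
    return theano.Apply(self, inputs=[x_], outputs=[out_type()])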
The perform method implements the Op's mathematical logic in Python.
The inputs (here x) are passed by value, but a single output is returned
indirectly as the first element of single-element lists. If fibby had a
second output, it would be stored in output_storage[1][0].
In some execution modes, the output storage might
contain the return value of a previous call. That old value can be reused to avoid
memory re-allocation, but it must not influence the semantics of the Op output.
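A hedged sketch of what reusing that storage could look like in perform (the Fibby example above does not do this; it simply overwrites the storage with a fresh copy):

def perform(self, node, inputs, output_storage):
    x, = inputs
    y = output_storage[0][0]
    # Reuse the previously allocated buffer only if it has the right
    # shape and dtype; otherwise allocate a fresh copy of x.
    if y is None or y.shape != x.shape or y.dtype != x.dtype:
        y = output_storage[0][0] = x.copy()
    else:
        y[...] = x
    for i in range(2, len(x)):
        y[i] = y[i - 1] * y[i - 2] + x[i]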
The c_code method accepts variable names as arguments (name, inames, onames)
and returns a C code fragment that computes the expression output.
In case of error, the %(fail)s statement cleans up and returns properly.
The variables %(x)s and %(y)s are set up by the TensorType to be PyArrayObject pointers.
TensorType also sets up dtype_%(x)s to be a typedef to the C type for x.
In the first two lines of the C function, we make y point to a new array with
the correct size for the output. This is essentially simulating the line y = x.copy().
Py_XDECREF(%(y)s);
%(y)s = (PyArrayObject*)PyArray_FromArray(
%(x)s, 0, NPY_ARRAY_ENSURECOPY);
The first line reduces the reference count of the data that y originally pointed to. The second line allocates the new data and makes y point to it.
In C code for a Theano Op, NumPy arrays are represented as PyArrayObject C
structs. This is part of the NumPy/SciPy C API documented at
http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html
There are some important restrictions to remember when implementing an Op.
Unless your Op correctly defines a view_map attribute, the perform and c_code must not
produce outputs whose memory is aliased to any input (technically, if changing the
output could change the input object in some sense, they are aliased).
Unless your Op correctly defines a destroy_map attribute, perform and c_code must
not modify any of the inputs.
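As a hedged illustration (a sketch, not part of the Fibby example): an Op whose first output is a view of its first input would declare view_map = {0: [0]}, and an Op that overwrites its first input in place would declare destroy_map = {0: [0]}, as in the hypothetical Op below.

import theano
from theano import tensor

class InplaceDouble(theano.Op):
    """Hypothetical Op that doubles its input in place."""
    __props__ = ()
    # Output 0 destroys (reuses and overwrites) input 0.
    destroy_map = {0: [0]}

    def make_node(self, x):
        x_ = tensor.as_tensor_variable(x)
        return theano.Apply(self, [x_], [x_.type()])

    def perform(self, node, inputs, output_storage):
        x, = inputs
        x *= 2                     # modifies the input buffer in place
        output_storage[0][0] = x   # the output aliases the destroyed input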
When developing an Op, you should run computations in DebugMode, by passing
mode='DebugMode' to theano.function. DebugMode is slow, but it can catch many
common violations of the Op contract.
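A minimal sketch of running the example in DebugMode (assuming the Fibby code above has been run); among other checks, DebugMode compares the results of the Python (perform) and C (c_code) implementations:

import numpy
import theano
import theano.tensor as T

x = T.dvector()
f = theano.function([x], fibby(x), mode='DebugMode')
f(numpy.random.rand(5))   # raises an error if the implementations disagree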
DebugMode is no silver bullet though.
For example, if you modify an Op's self.* attributes during any of
make_node, perform, or c_code, you are probably doing something
wrong, but DebugMode will not detect this.
Ops and Types should usually be considered immutable: you should
definitely not make a change that would have an impact on __eq__,
__hash__, or the mathematical value that would be computed by perform or c_code.
Writing an Optimization¶
fibby of a vector of zeros is another vector of zeros of the same size.
Theano does not attempt to infer this from the code provided via Fibby.perform or Fibby.c_code.
However, we can write an optimization that makes use of this observation.
This sort of local substitution of special cases is common,
and there is a stage of optimization (specialization) devoted to such optimizations.
The following optimization (fibby_of_zero) tests whether the input is
guaranteed to be all zero, and if so it returns the input itself as a replacement
for the old output.
import numpy
import theano
from theano.tensor.opt import get_scalar_constant_value, NotScalarConstantError

# Remove any fibby(zeros(...))
@theano.tensor.opt.register_specialize
@theano.gof.local_optimizer([fibby])
def fibby_of_zero(node):
    if node.op == fibby:
        x = node.inputs[0]
        try:
            if numpy.all(0 == get_scalar_constant_value(x)):
                return [x]
        except NotScalarConstantError:
            pass
The register_specialize decorator is what activates our optimization, and
tells Theano to use it in the specialization stage.
The local_optimizer decorator builds a class instance around our global
function. The [fibby] argument is a hint that our optimizer works on nodes
whose .op attribute equals fibby.
The function here (fibby_of_zero) expects an Apply instance as an argument
for parameter node. It uses get_scalar_constant_value, which determines
whether a Variable (x) is guaranteed to be a constant, and if so, which constant value.
Test the optimization¶
Here is some code to test that the optimization is applied only when needed.
import numpy
import theano.tensor as T
from theano import function
from theano import tensor
# Test it does not apply when not needed
x = T.dvector()
f = function([x], fibby(x))
# We call the function to make sure it runs.
# If you run in DebugMode, it will compare the C and Python outputs.
f(numpy.random.rand(5))
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, Fibby)
# Test that the optimization gets applied.
f_zero = function([], fibby(T.zeros([5])))
# If you run in DebugMode, it will compare the output before
# and after the optimization.
f_zero()
# Check that the optimization removes the Fibby Op.
# For security, the Theano memory interface ensures that the output
# of the function is always memory not aliased to the input.
# That is why there is a DeepCopyOp op.
topo = f_zero.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, theano.compile.ops.DeepCopyOp)