.. _cop:

====================================
Implementing the arithmetic Ops in C
====================================

Now that the ``double`` type is set up to allow C implementations of
the operations that work on it, all that remains is to define these
operations in C.


How does it work?
=================

Before a C :ref:`op` is executed, the variables related to each of its
inputs are declared and filled appropriately, either from an input
provided by the end user (using ``c_extract``) or from the result of
another operation. For each of the outputs, the associated variables
are declared and initialized.

The operation then has to compute what it needs to using the
input variables and place the results in the output variables.
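
The order of events described above can be pictured with a small,
purely illustrative sketch. The ``assemble`` helper and the layout of
the generated text are assumptions made for this illustration; this is
not how Theano's code generator is written, and real generated code
contains much more.

.. code-block:: python

   # Illustrative only: ``assemble`` is a hypothetical helper, not a Theano API.
   def assemble(input_names, output_names, op_code):
       lines = []
       for name in input_names:
           # Inputs are declared, then filled (user input or an earlier op).
           lines.append("double %s;" % name)
           lines.append("/* ... fill %s ... */" % name)
       for name in output_names:
           # Outputs are declared and initialized.
           lines.append("double %s = 0.0;" % name)
       # Only now does the Op's own code run: read inputs, write outputs.
       lines.append(op_code)
       return "\n".join(lines)

   generated = assemble(["x", "y"], ["z"], "z = x * y;")
   print(generated)

The point of the sketch is only that the Op's code comes last and can
rely on every input and output variable already existing.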


What needs to be defined
========================

There are fewer methods to define for an Op than for a Type:

.. class:: Op

    .. method:: c_code(node, name, input_names, output_names, sub)

       This must return C code that carries the computation we want to
       do.

       `sub` is a dictionary of extra parameters to the c_code
       method. It contains the following values:

       ``sub['fail']``

          A string of code that you should execute (after ensuring
          that a python exception is set) if your C code needs to
          raise an exception.

       ``sub['params']``

          (optional) The name of the variable which holds the context
          for the node.  This will only appear if the op has requested
          a context by having a :meth:`get_params()` method that returns
          something other than None.

    .. method:: c_code_cleanup(node, name, input_names, output_names, sub)

       This must return C code that cleans up whatever c_code
       allocated and that we must free.

       *Default:* The default behavior is to do nothing.

    .. method:: c_headers([c_compiler])

       Returns a list of headers to include in the file. 'Python.h' is
       included by default so you don't need to specify it.  All of
       the headers required by the Types involved (inputs and outputs)
       are included as well.

       The `c_compiler` [#2v]_ parameter is the C compiler that will
       be used to compile the code for the node. You may get multiple
       calls with different C compilers.

    .. method:: c_header_dirs([c_compiler])

       Returns a list of directories to search for headers (arguments
       to -I).

       The `c_compiler` [#2v]_ parameter is the C compiler that will
       be used to compile the code for the node. You may get multiple
       calls with different C compilers.

    .. method:: c_libraries([c_compiler])

       Returns a list of library names that your op needs to link to
       (arguments to -l).  All ops are automatically linked with
       'python' and the libraries their types require.

       The `c_compiler` [#2v]_ parameter is the C compiler that will
       be used to compile the code for the node. You may get multiple
       calls with different C compilers.

    .. method:: c_lib_dirs([c_compiler])

       Returns a list of directories to search for libraries (arguments
       to -L).

       The `c_compiler` [#2v]_ parameter is the C compiler that will
       be used to compile the code for the node. You may get multiple
       calls with different C compilers.

    .. method:: c_compile_args([c_compiler])

       Allows you to specify additional arbitrary arguments to the C
       compiler.  This is not usually required.

       The `c_compiler` [#2v]_ parameter is the C compiler that will
       be used to compile the code for the node. You may get multiple
       calls with different C compilers.

    .. method:: c_no_compile_args([c_compiler])

       Returns a list of C compiler arguments that are forbidden when
       compiling this Op.

       The `c_compiler` [#2v]_ parameter is the C compiler that will
       be used to compile the code for the node. You may get multiple
       calls with different C compilers.
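
       Taken together, the build-related methods above might look as
       follows for an op that needs the C math library. This is a
       sketch: ``MathOpBuildInfo`` is a hypothetical stand-in class (a
       real op would subclass ``Op``), and the header, library and flag
       values are illustrative assumptions.

       .. code-block:: python

          class MathOpBuildInfo(object):
              # Hypothetical stand-in; a real op would subclass Op.

              def c_headers(self, c_compiler):
                  # 'Python.h' is already included by default.
                  return ['<math.h>']

              def c_libraries(self, c_compiler):
                  # Library names, as they would be passed to -l.
                  return ['m']

              def c_compile_args(self, c_compiler):
                  # Extra compiler arguments; usually not required.
                  return ['-O2']

              def c_no_compile_args(self, c_compiler):
                  # Arguments that must not be used when compiling this op.
                  return ['-ffast-math']

          info = MathOpBuildInfo()
          print(info.c_headers(None), info.c_libraries(None))

       Only the variants taking `c_compiler` are defined here, in line
       with the footnote on the two versions of these methods.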

    .. method:: c_init_code()

       Allows you to specify code that will be executed once when the
       module is initialized, before anything else is executed.  This
       is for code that will be executed once per Op.

    .. method:: c_init_code_apply(node, name)

       Allows you to specify code that will be executed once when the
       module is initialized, before anything else is executed and is
       specialized for a particular apply of an :ref:`op`.

    .. method:: c_init_code_struct(node, name, sub)

       Allows you to specify code that will be inserted in the struct
       constructor of the Op.  This is for code which should be
       executed once per thunk (Apply node, more or less).

       `sub` is a dictionary of extra parameters to the
       c_init_code_struct method. It contains the following
       values:

       ``sub['fail']``

          A string of code that you should execute (after ensuring
          that a python exception is set) if your C code needs to
          raise an exception.

       ``sub['params']``

          (optional) The name of the variable which holds the context
          for the node.  This will only appear if the op has requested
          a context by having a :meth:`get_params()` method that returns
          something other than None.

    .. method:: c_support_code()

       Allows you to specify helper functions/structs that the
       :ref:`op` needs.  That code will be reused for each apply of
       this op. It will be inserted at global scope.

    .. method:: c_support_code_apply(node, name)

       Allows you to specify helper functions/structs specialized for
       a particular apply of an :ref:`op`. Use :meth:`c_support_code`
       if the code is the same for each apply of an op.  It will be
       inserted at global scope.

    .. method:: c_support_code_struct(node, name)

       Allows you to specify helper functions or variables that will
       be specific to one particular thunk.  These are inserted at
       struct scope.

       :note:
         You cannot specify CUDA kernels in the code returned by this
         since that isn't supported by CUDA.  You should place your
         kernels in :meth:`c_support_code()` or
         :meth:`c_support_code_apply()` and call them from this code.

    .. method:: c_cleanup_code_struct(node, name)

       Allows you to specify code that will be inserted in the struct
       destructor of the Op.  This is for cleaning up allocations and
       the like when the thunk is released (when you "free" a
       compiled function using this op).

    .. method:: infer_shape(node, (i0_shapes,i1_shapes,...))

       Allows optimizations to lift the Shape op over this op.  An
       example of why this is good is when we only need the shape of a
       variable: we will be able to obtain it without computing the
       variable itself.

       Must return a list where each element is a tuple representing
       the shape of one output.

       For example, for the matrix-matrix product ``infer_shape`` will
       have as inputs (node, ((x0,x1), (y0,y1))) and should return
       [(x0, y1)]. Both the inputs and the return value may be Theano
       variables.
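
       The matrix-matrix product case just described can be sketched
       as below. The function is written standalone for illustration
       (in a real op it would be a method taking ``self``), and plain
       integers stand in for the shape values.

       .. code-block:: python

          def infer_shape(node, input_shapes):
              # One shape tuple per input: (x0, x1) for x, (y0, y1) for y.
              (x0, x1), (y0, y1) = input_shapes
              # One shape tuple per output: the product has shape (x0, y1).
              return [(x0, y1)]

          shapes = infer_shape(None, ((3, 4), (4, 5)))
          print(shapes)

       The computation never touches the matrix contents, which is why
       the shape can be obtained without computing the product itself.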

    .. method:: c_code_cache_version()

       Must return a tuple of hashable objects like integers. This
       specifies the version of the code. It is used to cache the
       compiled code. You MUST change the returned tuple for each
       change in the code. If you don't want to cache the compiled
       code return an empty tuple or don't implement it.

    .. method:: c_code_cache_version_apply(node)

       Overrides :meth:`c_code_cache_version` if defined, but
       otherwise has the same contract.

    .. method:: python_constant_folding(node)

       Optional. If present, this method will be called before doing
       constant folding of a node, with that node as a parameter. If
       it returns True, we will not generate C code when doing
       constant folding of this node.  This is useful when compiling
       the C code would take longer than performing the computation
       in Python (e.g. Elemwise of scalars).

       In addition, this lowers the number of compiled modules and
       the amount of disk access. This is particularly useful when
       the file system load is high or when the Theano compilation
       directory is shared by many processes (for example on a
       network file server on a cluster).

    .. method:: get_params(node)

       (optional) If defined, should return the runtime params the op
       needs.  These parameters will be passed to the C code through the
       variable named in `sub['params']`.  The variable is also
       available for use in the code returned by
       :meth:`c_init_code_struct`.  If it returns `None` this is
       considered the same as if the method was not defined.

       If this method is defined and does not return `None`, then the
       Op *must* have a `params_type` property with the Type to use
       for the params variable.

    .. attribute:: _f16_ok

       (optional) If this attribute is absent or evaluates to `False`,
       C code will be disabled for the op if any of its inputs or
       outputs contains float16 data. This check guards against
       computing wrong results: there is no hardware float16 type, so
       special care must be taken to make sure operations are done
       correctly.

       If you don't intend to deal with float16 data you can leave
       this undefined.

       This attribute is internal and may go away at any point during
       development if a better solution is found.

The ``name`` argument is currently given an invalid value, so steer
away from it. As was the case with Type, ``sub['fail']`` provides
failure code that you *must* use if you want to raise an exception,
after setting the exception message.
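
For instance, a division op could refuse to divide by zero. The sketch
below is a standalone function in the style of the ``c_code`` examples
in this section. The string passed as ``sub['fail']`` in the usage line
is a made-up placeholder for illustration only, since Theano supplies
the real one.

.. code-block:: python

   def c_code(node, name, input_names, output_names, sub):
       x_name, y_name = input_names
       output_name, = output_names
       fail = sub['fail']
       return """
       if (%(y_name)s == 0.0) {
           /* Set the Python exception first, then run the failure code. */
           PyErr_SetString(PyExc_ZeroDivisionError, "division by zero");
           %(fail)s
       }
       %(output_name)s = %(x_name)s / %(y_name)s;
       """ % locals()

   # Placeholder failure code, for illustration only.
   code = c_code(None, None, ['x', 'y'], ['z'],
                 {'fail': '{ goto fail_label; }'})
   print(code)

Note the order inside the ``if`` block: the exception is set before the
failure code runs, as required above.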

The ``node`` argument is an :ref:`apply` node representing an
application of the current Op on a list of inputs, producing a list of
outputs. ``input_names`` and ``output_names`` arguments contain as
many strings as there are inputs and outputs to the application of the
Op and they correspond to the ``name`` that is passed to the type of
each Variable in these lists. For example, if ``node.inputs[0].type ==
double``, then ``input_names[0]`` is the ``name`` argument passed to
``double.c_declare`` etc. when the first input is processed by Theano.

In a nutshell, ``input_names`` and ``output_names`` parameterize the
names of the inputs your operation needs to use and the outputs it
needs to put results into. This will become clearer with the examples.

.. rubric:: Footnotes

.. [#2v] There are actually two versions of this method: one with a
         `c_compiler` parameter and one without. The calling code will
         try the version with `c_compiler` first and fall back to the
         version without it if that fails.  Defining both versions is
         pointless since the one without `c_compiler` will never get
         called.

         Note that these methods are not specific to a single apply
         node so they may get called more than once on the same object
         with different values for c_compiler.


Defining the methods
====================

We will be defining C code for the multiplication Op on doubles.

**c_code**

.. testsetup::

   from theano import Op
   mul = Op()

.. testcode::

   def c_code(node, name, input_names, output_names, sub):
       x_name, y_name = input_names[0], input_names[1]
       output_name = output_names[0]
       return """
       %(output_name)s = %(x_name)s * %(y_name)s;
       """ % locals()
   mul.c_code = c_code

And that's it. As we enter the scope of the C code we are defining in
the method above, many variables are defined for us. Namely, the C
variables whose names are held in ``x_name``, ``y_name`` and
``output_name`` are all of the primitive C ``double`` type and were
declared using the C code returned by ``double.c_declare``.

Implementing multiplication is as simple as multiplying the two input
doubles and setting the output double to what comes out of it. If you
had more than one output, you would just set the variable(s) for
each output to what they should be.
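
As an illustration of the multi-output case, here is a sketch for a
hypothetical op that returns both the sum and the product of two
doubles. This op is not part of the tutorial; it only shows one
assignment per entry of ``output_names``.

.. code-block:: python

   def c_code(node, name, input_names, output_names, sub):
       x_name, y_name = input_names
       # Two outputs, so two names to assign to.
       sum_name, prod_name = output_names
       return """
       %(sum_name)s = %(x_name)s + %(y_name)s;
       %(prod_name)s = %(x_name)s * %(y_name)s;
       """ % locals()

   code = c_code(None, None, ['x', 'y'], ['s', 'p'], {})
   print(code)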

.. warning::
   Do *NOT* use C's ``return`` statement to return the result(s) of
   the computation. Set the output variables directly as shown
   above. Theano will pick them up for you.


**c_code_cleanup**

There is nothing to clean up after multiplying two doubles. Typically,
you won't need to define this method unless you malloc() some
temporary storage (which you would free() here) or create temporary
Python objects (which you would Py_XDECREF() here).
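
To make the pairing concrete, here is a contrived sketch in which
``c_code`` allocates a scratch buffer with malloc() and
``c_code_cleanup`` releases it. The buffer name ``scratch`` is
hypothetical, and the sketch assumes that the cleanup code runs in the
same C scope as the code returned by ``c_code``, so that ``scratch`` is
still visible there. The ``sub['fail']`` value in the usage lines is a
placeholder.

.. code-block:: python

   def c_code(node, name, input_names, output_names, sub):
       x_name, y_name = input_names
       output_name, = output_names
       fail = sub['fail']
       return """
       double *scratch = (double *) malloc(2 * sizeof(double));
       if (scratch == NULL) {
           PyErr_NoMemory();  /* sets the Python exception */
           %(fail)s
       }
       scratch[0] = %(x_name)s;
       scratch[1] = %(y_name)s;
       %(output_name)s = scratch[0] * scratch[1];
       """ % locals()

   def c_code_cleanup(node, name, input_names, output_names, sub):
       # free(NULL) is a no-op, so this is safe after a failed malloc too.
       return """
       free(scratch);
       """

   # Placeholder failure code, for illustration only.
   sub = {'fail': '{ goto fail_label; }'}
   body = c_code(None, None, ['x', 'y'], ['z'], sub)
   cleanup = c_code_cleanup(None, None, ['x', 'y'], ['z'], sub)
   print(body + cleanup)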


Final version
=============

As before, I tried to organize the code in order to minimize
repetition. You can check that mul produces the same C code in this
version as it did in the code I gave above.

.. testcode::

   from theano import gof

   class BinaryDoubleOp(gof.Op):

       __props__ = ("name", "fn", "ccode")

       def __init__(self, name, fn, ccode):
           self.name = name
           self.fn = fn
           self.ccode = ccode

       def make_node(self, x, y):
           if isinstance(x, (int, float)):
               x = gof.Constant(double, x)
           if isinstance(y, (int, float)):
               y = gof.Constant(double, y)
           if x.type != double or y.type != double:
               raise TypeError('%s only works on doubles' % self.name)
           return gof.Apply(self, [x, y], [double()])

       def perform(self, node, inp, out):
           x, y = inp
           z, = out
           z[0] = self.fn(x, y)

       def __str__(self):
           return self.name

       def c_code(self, node, name, inp, out, sub):
           x, y = inp
           z, = out
           return self.ccode % locals()


   add = BinaryDoubleOp(name='add',
                        fn=lambda x, y: x + y,
                        ccode="%(z)s = %(x)s + %(y)s;")

   sub = BinaryDoubleOp(name='sub',
                        fn=lambda x, y: x - y,
                        ccode="%(z)s = %(x)s - %(y)s;")

   mul = BinaryDoubleOp(name='mul',
                        fn=lambda x, y: x * y,
                        ccode="%(z)s = %(x)s * %(y)s;")

   div = BinaryDoubleOp(name='div',
                        fn=lambda x, y: x / y,
                        ccode="%(z)s = %(x)s / %(y)s;")
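
The claim that both versions of ``mul`` yield the same C statement can
be checked without compiling anything, by expanding the two templates
by hand. In this sketch the function names and their shortened
signatures are hypothetical; only the template strings are taken from
the code above.

.. code-block:: python

   def mul_c_code_first(input_names, output_names):
       # Template from the first version of c_code.
       x_name, y_name = input_names[0], input_names[1]
       output_name = output_names[0]
       return """
       %(output_name)s = %(x_name)s * %(y_name)s;
       """ % locals()

   def mul_c_code_final(inp, out):
       # Template passed as ``ccode`` when building ``mul``.
       x, y = inp
       z, = out
       return "%(z)s = %(x)s * %(y)s;" % locals()

   first = mul_c_code_first(['a', 'b'], ['c'])
   final = mul_c_code_final(['a', 'b'], ['c'])
   # The two differ only in surrounding whitespace.
   print(first.strip() == final)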