Frequently Asked Questions

Does Theano support Python 3?

We support both Python 2 >= 2.6 and Python 3 >= 3.3.

TypeError: object of type ‘TensorVariable’ has no len()

If you receive the following error, it is because the Python special method __len__ cannot be implemented on Theano variables:

TypeError: object of type 'TensorVariable' has no len()

Python requires __len__ to return an integer, but this cannot be done because Theano's variables are symbolic. However, var.shape[0] can be used as a workaround.
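
For instance, a minimal sketch of this workaround:

import theano
import theano.tensor as T

x = T.vector('x')
# len(x) would raise the TypeError above; use the symbolic shape instead:
n = x.shape[0]
f = theano.function([x], n)
f([1., 2., 3.])  # array(3)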

This error message cannot be made more explicit because the relevant aspects of Python’s internals cannot be modified.

Faster gcc optimization

You can enable faster gcc optimization with the cxxflags option. This list of flags was suggested on the mailing list:

-O3 -ffast-math -ftree-loop-distribution -funroll-loops -ftracer
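
For example, assuming you configure Theano through a .theanorc file, the flags could be set like this:

[gcc]
cxxflags = -O3 -ffast-math -ftree-loop-distribution -funroll-loops -ftracer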

Use them at your own risk: some users have reported that the -ftree-loop-distribution optimization produced wrong results in the past.

In the past, we suggested adding the -march=native flag when the compiledir was not shared by multiple computers. We now recommend removing this flag, as Theano adds it automatically and safely, even if the compiledir is shared by multiple computers with different CPUs. In fact, Theano asks g++ which equivalent flags it uses, and re-uses them directly.

Faster Theano Function Compilation

Theano function compilation can be time-consuming. It can be sped up by setting the flag mode=FAST_COMPILE, which instructs Theano to skip most optimizations and disable the generation of any c/cuda code. This is useful for quickly testing a simple idea.
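
For instance, a minimal sketch of compiling a small function in this mode:

import theano
import theano.tensor as T

x = T.matrix('x')
# FAST_COMPILE skips most graph optimizations and generates no c/cuda code:
f = theano.function([x], x * 2 + 1, mode='FAST_COMPILE')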

If c/cuda code is necessary, as when using a GPU, the flag optimizer=fast_compile can be used instead. It instructs Theano to skip time-consuming optimizations but still generate c/cuda code. To get the most out of this flag, use a development version of Theano rather than the latest release (0.6).

Similarly, the flag optimizer_excluding=inplace will speed up compilation by excluding the in-place optimizations, which replace operations with versions that reuse memory where doing so does not affect the integrity of the operation. Applying these optimizations can be time-consuming. The cost of excluding them is greater memory usage, because space must be allocated for results that would otherwise reuse existing buffers. In short, the flag trades compilation speed for memory usage.

The Theano flag reoptimize_unpickled_function controls whether an unpickled Theano function reoptimizes its graph. Theano users can use the standard Python pickle tools to save a compiled Theano function. When pickling, the graphs both before and after optimization are saved, including shared variables. When the flag is True, the graph is reoptimized upon unpickling; otherwise, the optimized graph stored in the pickled file is used directly. Since Theano 0.7, the default is False.
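
For instance, a sketch of pickling and unpickling a compiled function with the standard pickle module (the file name is arbitrary):

import pickle
import theano
import theano.tensor as T

x = T.scalar('x')
f = theano.function([x], x ** 2)

# Both the unoptimized and optimized graphs (and shared variables) are stored.
with open('f.pkl', 'wb') as fh:
    pickle.dump(f, fh)

# With the default reoptimize_unpickled_function=False, the stored
# optimized graph is reused instead of being reoptimized.
with open('f.pkl', 'rb') as fh:
    f2 = pickle.load(fh)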

Faster Theano function

You can set the Theano flag allow_gc to False to get a speed-up by using more memory. By default, Theano frees intermediate results when they are no longer needed, which prevents that memory from being reused. Disabling garbage collection keeps the memory of all intermediate results, so it can be reused during the next call to the same Theano function, provided the shapes are the same. (They can change if the shapes of the inputs change.)

Note

With CNMeM, this is not very useful on the GPU anymore.
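
A sketch of setting the flag from Python (it can also be set through THEANO_FLAGS or .theanorc):

import theano

# Keep the buffers of intermediate results alive between calls,
# trading memory for speed.
theano.config.allow_gc = False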

Unsafe optimization

Some Theano optimizations assume that the user inputs are valid. This means that if a user provides invalid values (such as incompatible shapes or out-of-bounds indices) and these optimizations are applied, the user error will be lost. Most of the time the inputs are indeed valid, so applying the optimizations is good, but losing the error is bad. The newer optimizations that make this assumption add an assertion to the graph to preserve the user error message, and computing these assertions can take some time. If you are sure everything in your graph is valid and want the fastest possible Theano, you can remove those assertions with: optimizer_including=local_remove_all_assert
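
A sketch of enabling this on a per-function basis through the compilation mode, assuming your inputs are always valid:

import theano
import theano.tensor as T

x = T.vector('x')
# Only safe if the indexing below is always in bounds:
mode = theano.compile.get_default_mode().including('local_remove_all_assert')
f = theano.function([x], x[0] + 1, mode=mode)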

Faster Small Theano function

Note

For Theano 0.6 and up.

For Theano functions that don’t do much work, like a regular logistic regression, the overhead of checking the input can be significant. You can disable it by setting f.trust_input to True. Make sure the types of arguments you provide match those defined when the function was compiled.

For example, replace the following

import theano
from theano import function

x = theano.tensor.scalar('x')
f = function([x], x + 1.)
f(10.)

with

import numpy
import theano
from theano import function

x = theano.tensor.scalar('x')
f = function([x], x + 1.)
f.trust_input = True
f(numpy.asarray(10., dtype=theano.config.floatX))

Also, for small Theano functions, you can remove more Python overhead by making a Theano function that takes no inputs; you can use shared variables to achieve this. You can then call it as f.fn() or f.fn(n_calls=N) to speed it up. In the latter case, only the output of the last of the N calls is returned.
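
For instance, a sketch using a shared variable so that the compiled function takes no inputs:

import numpy
import theano

x = theano.shared(numpy.asarray(0., dtype=theano.config.floatX))
f = theano.function([], x * 2 + 1)

f.fn()           # bypasses the Python-level input handling
f.fn(n_calls=5)  # runs 5 times; only the last output is returned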

You can also use the C linker, which puts all nodes in the same C compilation unit. This removes some per-node overhead in the graph, but requires that every node in the graph have a C implementation:

import theano
from theano import function

x = theano.tensor.scalar('x')
f = function([x], (x + 1.) * 2, mode=theano.Mode(linker='c'))
f(10.)

Out of memory... but not really

Occasionally Theano may fail to allocate memory even when there appears to be more than enough available, reporting:

Error allocating X bytes of device memory (out of memory). Driver report Y bytes free and Z total.

where X is far less than Y and Z (i.e. X << Y < Z).

This scenario arises when an operation requires allocation of a large contiguous block of memory but no blocks of sufficient size are available.

GPUs do not have virtual memory, so every allocation must be assigned to a contiguous memory region. CPUs do not have this limitation because of their support for virtual memory. Multiple allocations on a GPU can result in memory fragmentation, which can make it more difficult to find contiguous regions of sufficient size during subsequent memory allocations.

A known example is related to writing data to shared variables. When updating a shared variable, Theano allocates new space if the size of the data does not match the size of the space already assigned to the variable. This can lead to memory fragmentation, meaning a contiguous block of memory of sufficient capacity may not be available even if the overall free memory is large enough.

“What are Theano’s Limitations?”

Theano offers a good amount of flexibility, but has some limitations too. You must answer for yourself the following question: How can my algorithm be cleverly written so as to make the most of what Theano can do?

Here is a list of some of the known limitations:

  • While- or for-loops within an expression graph are supported, but only via the theano.scan() op, which puts restrictions on how the loop body can interact with the rest of the graph; see the sketch after this list.
  • Neither goto nor recursion is supported or planned within expression graphs.
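
For instance, a minimal sketch of a loop expressed with scan(), computing a cumulative sum over a vector:

import numpy
import theano
import theano.tensor as T

x = T.dvector('x')
# Each step adds one element of x to a running total.
result, updates = theano.scan(
    fn=lambda elem, running_total: running_total + elem,
    sequences=x,
    outputs_info=numpy.float64(0.))
f = theano.function([x], result[-1], updates=updates)
f([1., 2., 3.])  # 6.0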

“float32 / int{32, 64} gives float64”

Using float32 and int{32, 64} together inside a function produces float64 output.

Since the GPU cannot compute this kind of output, it is preferable not to use those dtypes together.
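
A minimal sketch of the upcast:

import theano.tensor as T

f32 = T.fscalar()  # float32
i64 = T.lscalar()  # int64
print((f32 + i64).dtype)  # 'float64'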

To help you find where float64 variables are created, see the warn_float64 Theano flag.

Theano memory/speed trade-off

There are a few things you can easily do to change the trade-off between speed and memory usage. Unless otherwise noted, these affect both CPU and GPU memory usage.

Could speed up and lower memory usage:

  • cuDNN: the default cuDNN convolution uses less memory than the Theano version, though some flags allow it to use more. GPU only.

  • Multi-GPU support (available shortly).

Could raise memory usage but speed up computation:

  • config.lib.cnmem=1 # Does not raise memory usage much, but can be a problem if you are at the limit of available GPU memory. GPU only.
  • config.allow_gc=False
  • config.optimizer_excluding=low_memory # GPU only for now.

Could lower the memory usage, but raise computation time:

  • config.scan.allow_gc=True # Probably no significant slowdown if config.lib.cnmem is used.

  • config.scan.allow_output_prealloc=False

  • Use batch_normalization(). It uses less memory than building the corresponding Theano graph.

  • Disable one or more scan optimizations:
    • optimizer_excluding=scanOp_pushout_seqs_ops
    • optimizer_excluding=scan_pushout_dot1
    • optimizer_excluding=scanOp_pushout_output
  • Disable all optimizations tagged as raising memory usage: optimizer_excluding=more_mem (currently only the 3 scan optimizations above)

  • Use float16.

If you want to analyze memory usage during computation, the simplest approach is to let the memory error happen during Theano execution and use the Theano flag exception_verbosity=high.