Tips and FAQs¶
It takes too long to compile a computational graph. Can I skip it?¶
Chainer does not compile computational graphs, so you cannot skip it, or, I mean, you have already skipped it :).
What you have actually seen are on-the-fly compilations of CUDA kernels. CuPy compiles kernels on demand so that each kernel is optimized for the number of dimensions and the element types of its input arguments. Pre-compilation is not available, because we would have to compile an exponential number of kernels to support all CuPy functionalities. This restriction is unavoidable because Python cannot call CUDA/C++ template functions in a generic way. Note that every framework using CUDA requires compilation at some point; the difference between other statically-compiled frameworks (such as cutorch) and Chainer is whether a kernel is compiled at installation or at first use.
These compilations should run only at the first use of the kernels. The compiled binaries are cached in the $(HOME)/.cupy/kernel_cache directory by default. If you see compilations running every time you execute the same script, the caching is failing. Please check that the directory is kept intact between multiple executions of the script.
If your home directory is not suited to caching the kernels (e.g., if it resides on NFS), change the kernel cache directory by setting the CUPY_CACHE_DIR environment variable to an appropriate path.
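For example, to point the cache at a fast local directory (the path below is only an illustration; substitute your own):
$ export CUPY_CACHE_DIR=/var/tmp/cupy_kernel_cache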
See CuPy Overview for more details.
MNIST example does not converge in CPU mode on Mac OS X¶
Note
Mac OS X is not an officially supported OS.
Many users have reported that the MNIST example does not work correctly when vecLib is used as the NumPy backend on Mac OS X. vecLib is the default BLAS library installed on Mac OS X.
We recommend using other BLAS libraries such as OpenBLAS.
To use an alternative BLAS library, it is necessary to reinstall NumPy. Here are instructions to install NumPy with OpenBLAS using Conda.
$ conda install -c conda-forge numpy
Otherwise, to install NumPy without Conda, you may need to build NumPy from source as follows.
Use Homebrew to install OpenBLAS.
$ brew install openblas
Uninstall existing NumPy installation
$ pip uninstall numpy
You’ll need to create a file called .numpy-site.cfg in your home (~/) directory with the following content:
[openblas]
libraries = openblas
library_dirs = /usr/local/opt/openblas/lib
include_dirs = /usr/local/opt/openblas/include
Install NumPy from source:
$ pip install --no-binary :all: numpy
Confirm NumPy has been installed with OpenBLAS by running this command:
$ python -c "import numpy; print(numpy.show_config())"
You should see the following information:
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/opt/openblas/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/usr/local/opt/openblas/lib']
...
Once this is done, you should be able to import chainer without OpenBLAS errors.
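As a quick sanity check, the following command should run without BLAS-related errors:
$ python -c "import chainer; print(chainer.__version__)"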
For details of this problem, see issue #704.
How do I fix the InvalidType error?¶
Chainer raises an InvalidType exception when invalid inputs are given to Functions. If you get InvalidType, you generally need to check whether the dtype and/or shape of the inputs are valid for the function.
Here are some examples of InvalidType errors:
import chainer.functions as F
import numpy as np
x = np.arange(10) - 5
F.relu(x)
Traceback (most recent call last):
...
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: ReLU (Forward)
Expect: in_types[0].dtype.kind == f
Actual: i != f
In this case, the kind of in_types[0] (which means the first input to the function, x) is expected to be f (floating-point), whereas the input was i (signed integer). You need to cast the input appropriately before passing it to the function (e.g., x.astype(np.float32)).
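For example, casting the input to a floating-point dtype before calling the function resolves the error above (a minimal sketch):
import chainer.functions as F
import numpy as np
x = np.arange(10) - 5
y = F.relu(x.astype(np.float32))  # cast to float32 so the dtype check passes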
import chainer.functions as F
import numpy as np
x = np.ones((4, 4))
y = np.ones((3, 3))
F.concat([x, y])
Traceback (most recent call last):
...
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: Concat (Forward)
Expect: in_types[0].shape[0] == in_types[1].shape[0]
Actual: 4 != 3
In this case, the function expects that x.shape[0] is equal to y.shape[0], but they were actually 4 and 3, respectively.
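For example, the error above disappears once the first axes match (a minimal sketch; the array contents are arbitrary):
import chainer.functions as F
import numpy as np
x = np.ones((3, 4), dtype=np.float32)
y = np.ones((3, 3), dtype=np.float32)
z = F.concat([x, y])  # concatenates along axis 1 by default; z.shape is (3, 7)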
See Type Checks for the detailed behavior of the type checking system in Chainer.
How do I accelerate my model using Chainer Backend for Intel Architecture?¶
Follow these steps to utilize Chainer Backend for Intel Architecture in your model.
Install Chainer Backend for Intel Architecture¶
The following environments are recommended by Chainer Backend for Intel Architecture.
Ubuntu 14.04 / 16.04 LTS (64-bit) and CentOS 7 (64-bit)
Python 2.7.6+, 3.5.2+, and 3.6.0+
On recommended systems, you can install the Chainer Backend for Intel Architecture wheel (binary distribution) by:
$ pip install 'ideep4py<2.1'
Note
ideep4py v1.0.x is incompatible with v2.0.x, and is not supported in Chainer v5.0 or later.
Enable Chainer Backend for Intel Architecture Configuration¶
Currently Chainer Backend for Intel Architecture is disabled by default because it is an experimental feature.
You need to manually enable it by changing the chainer.config.use_ideep configuration to 'auto'.
See Configuring Chainer for details.
The easiest way to change the configuration is to set an environment variable as follows:
export CHAINER_USE_IDEEP="auto"
You can also use chainer.using_config() to change the configuration:
import numpy as np
import chainer

x = np.ones((3, 3), dtype='f')
with chainer.using_config('use_ideep', 'auto'):
    y = chainer.functions.relu(x)
print(type(y.data))  # => <class 'ideep4py.mdarray'>
Convert Your Model to Chainer Backend for Intel Architecture¶
You need to call model.to_intel64() (in the same way you call model.to_gpu() to transfer your link to the GPU) to convert the link to Chainer Backend for Intel Architecture.
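A minimal sketch (the Linear link below is just a stand-in for your own chainer.Chain):
import chainer.links as L

model = L.Linear(784, 10)  # substitute your own model here
model.to_intel64()  # converts the link's parameters for iDeep, analogous to model.to_gpu()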
Run Your Model¶
Now your model is accelerated by Chainer Backend for Intel Architecture!
Please note that not all functions and optimizers support Chainer Backend for Intel Architecture acceleration. Also note that, depending on the shape and data type of the input data, Chainer Backend for Intel Architecture may not be used.
My training process gets stuck when using MultiprocessIterator¶
When you use OpenCV somewhere in your code and MultiprocessIterator is used in the training code, the training loop may get stuck at some point. In such a situation, there are several workarounds to prevent the process from getting stuck:
1. Set the environment variable as follows: OMP_NUM_THREADS=1
2. Add cv2.setNumThreads(0) right after import cv2 in your training script (see the sketch after this list).
3. Use MultithreadIterator instead of MultiprocessIterator.
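A minimal sketch of the second workaround, assuming OpenCV (cv2) is installed:
import cv2
cv2.setNumThreads(0)  # disable OpenCV's internal threading right after importing cv2

# ... the rest of your training script (dataset, MultiprocessIterator, trainer, etc.) ...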
This problem was originally reported here: A training loop got stuck in a certain condition with multi-processing updater and opencv for Chainer, and the discussion on related problems is still ongoing here: OpenCV + Python multiprocessing breaks on OSX.