Comparison with Other Frameworks¶

A table for quick comparison¶

This table compares Chainer with other actively developed deep learning frameworks. Content is current as of July 2017.

		Chainer	PyTorch	TensorFlow	Theano-based	Caffe1/Caffe2	Torch7	MXNet	DyNet	PaddlePaddle	DL4J	CNTK	neon	Knet.jl	Darknet	Thinc
Basics	Language	Python	Python	Python	Python	Python/C++/ MATLAB	LuaJIT	Python/others	Python/C++	Python/C++	Java	BrainScript/ Python/C++	Python	Julia	C	Python
	Approach	define-by-run	define-by-run	symbolic autograd	symbolic autograd	static	static/ manual grads	symbolic autograd/ manual grads/ define-by-run 1	define-by-run	symbolic autograd	static/ manual grads/ symbolic autograd 2	static/ symbolic autograd	static/ symbolic autograd 3	define-by-run	static	callback-based define-by-run
	CPU backend package	NumPy	TH	Eigen	NumPy		TH	mshadow	Eigen		ND4J		NumPy	Julia		NumPy
	GPU backend package	CuPy	THC	Eigen	libgpuarray		THC	mshadow	Eigen		ND4J		neon	KnetArrays		CuPy
	Primary sponsor	Preferred Networks	Facebook	Google	MILA	Facebook	Facebook	Amazon/Apache	CMU	Baidu	Skymind	Microsoft	Intel Nervana	Koç University	Joe Redmon	Explosion AI
NNs	CNNs	full	full	full	full	full	full	full	partial	full	full	full	full	partial	full	none
	RNNs	full	full	full	full	partial	full	full	full	full	full	full	partial	partial	partial	partial
	Reverse-mode autograd	Y	Y	Y	Y		torch-autograd	Y	Y	Y		Y	ngraph	Y		with closures
	Forward-mode autograd			tensorflow-forward-ad	Y
	Higher-order grads	Y 4	Y	Y	Y									Y
	Variable-length loops	native	native	while_loop	scan	RNNs only	native	2017	native	RNNs only	none	dynamic axis	none	native	none	native
	Different architectures per batch	native	native	fold			torch-autograd	MinPy	native					native		native
Performance	cuDNN support	full	full	partial	partial	full	full	full	partial	full	partial	full	N/A 5		partial
	CPU/GPU generic backend	Y	Y				Y	Y	Y	Y	Y	Y	Y	Y		Y
	Multi-GPU data parallelism	Y	Y	Y	Y	Y	Y	Y		Y	Y	Y	Y	Y	Y
	Multi-GPU model parallelism	Y	Y	Y	Y	Y	Y	Y		Y		Y	Y
	Multiprocessing 6	full	partial						full
	Distributed training	ChainerMN	THD	Y		2017	torch-distlearn	Y		Y	Spark	Y	Y
Misc	Runtime debugging	debug mode, typechecking, pdb	pdb	tfdbg				Monitor	pdb		Java debuggers	cntk.debugging		Gallium.jl	gdb	pdb
	Trainer abstraction	native	tnt		Blocks, Lasagne, Keras	native	torchnet			native	native	native	native			native
	Reporter abstraction	native	tnt	native			torchnet	native			native	native
	Web interface	ChainerUI, tensorboardX	tensorboardX, visdom	TensorBoard							DL4J-UI		Nervana Cloud
	Graph compilation engine		2017	XLA		2017		NNVM					ngraph

1: Define-by-run is in development as of June 2017 and tracked in dmlc/mxnet#5705. It is also possible using the much slower MinPy extension.
2: Symbolic autograd is in development as of June 2017 and tracked in deeplearning4j/nd4j#1750.
3: Symbolic autograd is available only with ngraph backend (experimental).
4: Some functions do not support higher-order differentiation. See chainer/chainer#4449.
5: Nervana provides kernels that are meant to compete with cuDNN.
6: Multiprocessing provides a significant performance improvement only for frameworks that use Python at runtime.

Benchmarks¶

Benchmarks for convolutional networks can be found at convnet-benchmarks while some NLP benchmarks are at dynet-benchmark. Chainer wraps the latest available cuDNN kernels for CNNs and RNNs, so performance of most common networks that use these kernels is typically similar to that of other modern frameworks. As Chainer’s define-by-run approach means the user’s Python code is executed directly at runtime, particularly complex networks or those with very small tensor sizes may be slower than in static-graph frameworks.