Comparison with Other Frameworks

A table for quick comparison

This table compares Chainer with other actively developed deep learning frameworks. Content is current as of July 2017.

|  | Chainer | PyTorch | TensorFlow | Theano-based | Caffe1/Caffe2 | Torch7 | MXNet | DyNet | PaddlePaddle | DL4J | CNTK | neon | Knet.jl | Darknet | Thinc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basics | | | | | | | | | | | | | | | |
| Language | Python | Python | Python | Python | Python/C++/MATLAB | LuaJIT | Python/others | Python/C++ | Python/C++ | Java | BrainScript/Python/C++ | Python | Julia | C | Python |
| Approach | define-by-run | define-by-run | symbolic autograd | symbolic autograd | static | static/manual grads | symbolic autograd/manual grads/define-by-run [1] | define-by-run | symbolic autograd | static/manual grads/symbolic autograd [2] | static/symbolic autograd | static/symbolic autograd [3] | define-by-run | static | callback-based define-by-run |
| CPU backend package | NumPy | TH | Eigen | NumPy | | TH | mshadow | Eigen | | ND4J | | NumPy | Julia | | NumPy |
| GPU backend package | CuPy | THC | Eigen | libgpuarray | | THC | mshadow | Eigen | | ND4J | | neon | KnetArrays | | CuPy |
| Primary sponsor | Preferred Networks | Facebook | Google | MILA | Facebook | Facebook | Amazon/Apache | CMU | Baidu | Skymind | Microsoft | Intel Nervana | Koç University | Joe Redmon | Explosion AI |
| NNs | | | | | | | | | | | | | | | |
| CNNs | full | full | full | full | full | full | full | partial | full | full | full | full | partial | full | none |
| RNNs | full | full | full | full | partial | full | full | full | full | full | full | partial | partial | partial | partial |
| Reverse-mode autograd | Y | Y | Y | Y | | torch-autograd | Y | Y | Y | | Y | ngraph | Y | | with closures |
| Forward-mode autograd | | | tensorflow-forward-ad | Y | | | | | | | | | | | |
| Higher-order grads | Y [4] | Y | Y | Y | | | | | | | | | Y | | |
| Variable-length loops | native | native | while_loop | scan | RNNs only | native | 2017 | native | RNNs only | none | dynamic axis | none | native | none | native |
| Different architectures per batch | native | native | fold | | | torch-autograd | MinPy | native | | | | | native | | native |
| Performance | | | | | | | | | | | | | | | |
| cuDNN support | full | full | partial | partial | full | full | full | partial | full | partial | full | N/A [5] | partial | | |
| CPU/GPU generic backend | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | |
| Multi-GPU data parallelism | Y | Y | Y | Y | Y | Y | Y | | Y | Y | Y | Y | Y | Y | |
| Multi-GPU model parallelism | Y | Y | Y | Y | Y | Y | Y | | Y | | Y | Y | | | |
| Multiprocessing [6] | full | partial | | | | | | full | | | | | | | |
| Distributed training | ChainerMN | THD | Y | | 2017 | torch-distlearn | Y | | Y | Spark | Y | Y | | | |
| Misc | | | | | | | | | | | | | | | |
| Runtime debugging | debug mode, typechecking, pdb | pdb | tfdbg | Monitor | | | | pdb | | Java debuggers | cntk.debugging | | Gallium.jl | gdb | pdb |
| Trainer abstraction | native | tnt | | Blocks, Lasagne, Keras | native | torchnet | native | | native | native | native | native | | | |
| Reporter abstraction | native | tnt | native | | | torchnet | native | | | | native | native | | | |
| Web interface | ChainerUI, tensorboardX | tensorboardX, visdom | TensorBoard | | | | | | | DL4J-UI | | Nervana Cloud | | | |
| Graph compilation engine | 2017 | | XLA | | 2017 | | NNVM | | | | | ngraph | | | |

[1] Define-by-run support is in development as of June 2017 and is tracked in dmlc/mxnet#5705. It is also possible through the much slower MinPy extension.

[2] Symbolic autograd is in development as of June 2017 and is tracked in deeplearning4j/nd4j#1750.

[3] Symbolic autograd is available only with the experimental ngraph backend.

[4] Some functions do not support higher-order differentiation; see chainer/chainer#4449. A short double-backprop sketch follows these notes.

[5] Nervana provides kernels that are meant to compete with cuDNN.

[6] Multiprocessing provides a significant performance improvement only for frameworks that execute Python code at runtime.
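
As a companion to note [4], the following is a minimal sketch of what "higher-order grads" means in Chainer, assuming Chainer v3 or later (where chainer.grad and double backprop are available). The cubic function is an arbitrary toy example, not something taken from the table.

```python
import numpy as np
import chainer

# Double backpropagation: differentiate a gradient expression again.
x = chainer.Variable(np.array([3.0], dtype=np.float32))
y = x ** 3

# First-order gradient, keeping the graph so it can be differentiated again.
gx, = chainer.grad([y], [x], enable_double_backprop=True)   # dy/dx = 3x^2 -> 27
# Second-order gradient, obtained by differentiating gx itself.
ggx, = chainer.grad([gx], [x])                              # d2y/dx2 = 6x -> 18
print(gx.array, ggx.array)
```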

Benchmarks

Benchmarks for convolutional networks can be found at convnet-benchmarks, and some NLP benchmarks are at dynet-benchmark. Chainer wraps the latest available cuDNN kernels for CNNs and RNNs, so the performance of most common networks that use these kernels is typically similar to that of other modern frameworks. Because Chainer's define-by-run approach executes the user's Python code directly at runtime, networks that are particularly complex or that operate on very small tensors may run slower than in static-graph frameworks.
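
To make the define-by-run point concrete, here is a minimal sketch; the tiny network and the data-dependent loop are illustrative only and are not taken from the benchmarks above. The graph is recorded while ordinary Python code runs, which is what gives the flexibility noted in the table and also the per-operation Python overhead noted above.

```python
import numpy as np
import chainer
import chainer.functions as F

# Define-by-run: the computational graph is recorded while this ordinary
# Python code executes, so host-side control flow (the loop below) is free
# to depend on runtime values.
x = chainer.Variable(np.random.randn(4, 3).astype(np.float32))
h = x
for _ in range(int(np.random.randint(1, 4))):   # a data-dependent Python loop
    h = F.tanh(h)
loss = F.sum(h * h)
loss.backward()          # backprop through whatever graph was just recorded
print(x.grad.shape)      # gradients arrive on the original input
```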