Module Interface HowTo

The module API provides an intermediate-level and a high-level interface for computation with neural networks in MXNet. A “module” is an instance of a subclass of BaseModule. The most widely used module class is simply called Module, which wraps a Symbol and one or more Executors. Please refer to the API doc for BaseModule below for a full list of available functions. Each specific subclass of module might have some extra interface functions. We provide here some examples of common use cases. All the module APIs live in the namespace mxnet.module, or simply mxnet.mod.

Preparing a module for computation

To construct a module, refer to the constructors of the specific module class. For example, the Module class takes a Symbol as input,

import mxnet as mx

# construct a simple MLP
data = mx.symbol.Variable('data')
fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
fc2  = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
fc3  = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
out  = mx.symbol.SoftmaxOutput(fc3, name='softmax')

# construct the module
mod = mx.mod.Module(out)

You may also need to specify the data_names and label_names of your Symbol. Here we skip those parameters because our Symbol follows the conventional naming, so the default behavior (data named data, and label named softmax_label) is fine. Another important parameter is context, which defaults to the CPU. You can specify a GPU context, or even a list of GPU contexts if data parallelization is needed.
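For example (a hedged sketch; the GPU ids below are placeholders, and the names simply restate the defaults), the same module could be constructed explicitly as:

# equivalent explicit construction, running data-parallel on two GPUs
mod = mx.mod.Module(out,
                    data_names=['data'],
                    label_names=['softmax_label'],
                    context=[mx.gpu(0), mx.gpu(1)])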

Before you can compute with a module, you need to call bind() to allocate the device memory, and init_params() or set_params() to initialize the parameters.
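In the snippets below, train_dataiter stands for any DataIter. As a hedged sketch (the shapes and sizes are assumed placeholders), one could be built from in-memory numpy arrays:

import numpy as np

# a toy iterator over random data; NDArrayIter names its data 'data'
# and its label 'softmax_label', matching the Symbol above
train_dataiter = mx.io.NDArrayIter(
    data=np.random.uniform(size=(1000, 100)).astype('float32'),
    label=np.random.randint(0, 10, size=(1000,)).astype('float32'),
    batch_size=32, shuffle=True)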

mod.bind(data_shapes=train_dataiter.provide_data,
         label_shapes=train_dataiter.provide_label)
mod.init_params()

Now you can compute with the module via functions like forward(), backward(), etc. If you simply want to fit a module, you do not need to call bind() and init_params() explicitly, as the fit() function will call them automatically if needed.
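For example, here is a minimal hand-written training loop (a hedged sketch; it assumes the module is binded and initialized as above, and the epoch count is arbitrary):

mod.init_optimizer(optimizer='sgd', optimizer_params={'learning_rate': 0.01})

for epoch in range(2):
    train_dataiter.reset()
    for batch in train_dataiter:
        mod.forward(batch, is_train=True)  # forward pass on this mini-batch
        mod.backward()                     # compute gradients
        mod.update()                       # let the optimizer update the parameters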

Training, Predicting, and Evaluating

Modules provide high-level APIs for training, predicting and evaluating. To fit a module, simply call the fit() function with some DataIters:

mod = mx.mod.Module(out)
mod.fit(train_dataiter, eval_data=val_dataiter,
        optimizer_params={'learning_rate':0.01, 'momentum': 0.9},
        num_epoch=n_epoch)

The interface is very similar to the old FeedForward class. You can pass in batch-end callbacks as well as epoch-end callbacks. For example, the built-in Speedometer batch-end callback logs throughput and the current metric periodically (a hedged sketch; the batch size and frequency below are placeholders):
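batch_cb = mx.callback.Speedometer(batch_size=32, frequent=50)
mod.fit(train_dataiter, eval_data=val_dataiter,
        batch_end_callback=batch_cb, num_epoch=n_epoch)

To predict with a module, simply call predict() with a DataIter: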

mod.predict(val_dataiter)

It will collect and return all the prediction results. Please refer to the doc of predict() for more details about the format of the return values. When the prediction results might be too large to fit in memory, the iter_predict API is a convenient alternative:

for preds, i_batch, batch in mod.iter_predict(val_dataiter):
    pred_label = preds[0].asnumpy().argmax(axis=1)
    label = batch.label[0].asnumpy().astype('int32')
    # do something...

If you do not need the prediction outputs, but just want to evaluate on a test set, you can call the score() function with a DataIter and an EvalMetric:

mod.score(val_dataiter, metric)

It will run predictions on each batch in the provided DataIter and compute the evaluation score using the provided EvalMetric. The evaluation results are stored in metric so that you can query them later on.
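For example, with the built-in accuracy metric:

metric = mx.metric.Accuracy()
mod.score(val_dataiter, metric)
print(metric.get())  # a (name, value) pair, e.g. ('accuracy', 0.97); the value is illustrative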

Saving and Loading Module Parameters

You can save the module parameters in each training epoch by using a checkpoint callback.

model_prefix = 'mymodel'
checkpoint = mx.callback.do_checkpoint(model_prefix)

mod.fit(..., epoch_end_callback=checkpoint)
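Each epoch-end invocation writes the symbol to mymodel-symbol.json and the parameters for that epoch to mymodel-%04d.params (zero-padded epoch number).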

To load the saved module parameters, call the load_checkpoint function:

sym, arg_params, aux_params = \
    mx.model.load_checkpoint(model_prefix, n_epoch_load)

# assign parameters
mod.set_params(arg_params, aux_params)

Or, if you just want to resume training from a saved checkpoint, instead of calling set_params() you can directly call fit() with the loaded parameters, so that fit() knows to start from those parameters instead of initializing them randomly:

mod.fit(..., arg_params=arg_params, aux_params=aux_params,
        begin_epoch=n_epoch_load)

Note that we also pass in begin_epoch so that fit() knows we are resuming from a previously saved epoch.

Module Interface API

The BaseModule Interface

BaseModule defines an API for modules.

class mxnet.module.base_module.BaseModule(logger=<module 'logging'>)

The base class of a module. A module represents a computation component. The design purpose of a module is to abstract a computation “machine” that can run forward, backward, update parameters, etc. We aim to make the APIs easy to use, especially in cases where we need to use the imperative API to work with multiple modules (e.g. a stochastic depth network).

A module has several states:

  • Initial state. Memory is not allocated yet; the module is not ready for computation.
  • Binded. Shapes for inputs, outputs, and parameters are all known; memory has been allocated; the module is ready for computation.
  • Parameters initialized. For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.
  • Optimizer installed. An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).
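As a hedged sketch, these states correspond to the following calls (reusing the MLP symbol out from the earlier example; the shapes are assumed placeholders):

mod = mx.mod.Module(out)                           # initial state
mod.bind(data_shapes=[('data', (32, 100))],
         label_shapes=[('softmax_label', (32,))])  # binded
mod.init_params()                                  # parameters initialized
mod.init_optimizer(optimizer='sgd')                # optimizer installed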

In order for a module to interact with others, it should be able to report the following information in its raw stage (before being binded):

  • data_names: list of strings indicating the names of the required data.
  • output_names: list of strings indicating the names of the required outputs.

And also the following richer information after being binded:

  • state information
    • binded: bool, indicating whether the memory buffers needed for computation have been allocated.
    • for_training: whether the module is binded for training (if binded).
    • params_initialized: bool, indicating whether the parameters of this module have been initialized.
    • optimizer_initialized: bool, indicating whether an optimizer is defined and initialized.
    • inputs_need_grad: bool, indicating whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.
  • input/output information
    • data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelization, the data arrays might not be of the same shape as viewed from the external world.
    • label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not binded for training.
    • output_shapes: a list of (name, shape) for outputs of the module.
  • parameters (for modules with parameters)
    • get_params(): return a tuple (arg_params, aux_params). Each of those is a dictionary of name to NDArray mapping. Those NDArrays always live on the CPU. The actual parameters used for computation might live on other devices (GPUs); this function will retrieve (a copy of) the latest parameters. Therefore, modifying the returned dictionaries does not affect the parameters used on the devices; use set_params() to write values back.
    • set_params(arg_params, aux_params): assign parameters to the devices doing the computation.
    • init_params(...): a more flexible interface to assign or initialize the parameters.
  • setup
    • bind(): prepare environment for computation.
    • init_optimizer(): install optimizer for parameter updating.
  • computation
    • forward(data_batch): forward operation.

    • backward(out_grads=None): backward operation.

    • update(): update parameters according to installed optimizer.

    • get_outputs(): get outputs of the previous forward operation.

    • get_input_grads(): get the gradients with respect to the inputs computed in the previous backward operation.

    • update_metric(metric, labels): update the performance metric over the results of the previous forward computation.

  • other properties (mostly for backward compatibility)
    • symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the current symbol being used. For other modules, this value might not be well defined.

When these intermediate-level APIs are implemented properly, the following high-level APIs are automatically available for a module:

  • fit: train the module parameters on a data set
  • predict: run prediction on a data set and collect outputs
  • score: run prediction on a data set and evaluate performance
forward_backward(data_batch)

A convenient function that calls both forward and backward.

score(eval_data, eval_metric, num_batch=None, batch_end_callback=None, reset=True, epoch=0)

Run prediction on eval_data and evaluate the performance according to eval_metric.

Parameters:
  • eval_data (DataIter) –
  • eval_metric (EvalMetric) –
  • num_batch (int) – Number of batches to run. Default is None, indicating run until the DataIter finishes.
  • batch_end_callback (function) – Could also be a list of functions.
  • reset (bool) – Default True, indicating whether we should reset eval_data before starting evaluating.
  • epoch (int) – Default 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.
iter_predict(eval_data, num_batch=None, reset=True)

Iterate over predictions.

for pred, i_batch, batch in module.iter_predict(eval_data):
    # pred is a list of outputs from the module
    # i_batch is an integer
    # batch is the data batch from the data iterator
Parameters:
  • eval_data (DataIter) –
  • num_batch (int) – Default is None, indicating running all the batches in the data iterator.
  • reset (bool) – Default is True, indicating whether we should reset the data iter before starting prediction.
predict(eval_data, num_batch=None, merge_batches=True, reset=True, always_output_list=False)

Run prediction and collect the outputs.

Parameters:
  • eval_data (DataIter) –
  • num_batch (int) – Default is None, indicating running all the batches in the data iterator.
  • merge_batches (bool) – Default is True, see the doc for return values.
  • reset (bool) – Default is True, indicating whether we should reset the data iter before starting prediction.
  • always_output_list (bool) – Default is False, see the doc for return values.
Returns:

When merge_batches is True (the default), the return value is a list [out1, out2, out3], where each element is the concatenation of the outputs of all the mini-batches. If, further, always_output_list is False (the default), then in the case of a single output, out1 is returned instead of [out1]. When merge_batches is False, the return value is a nested list like [[out1_batch1, out2_batch1], [out1_batch2], ...]. This mode is useful because in some cases (e.g. bucketing) the module does not necessarily produce the same number of outputs. The objects in the results are NDArrays. If you need to work with numpy arrays, just call .asnumpy() on each NDArray.
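As a hedged illustration with a single-output network like the MLP above:

out = mod.predict(val_dataiter)                        # one NDArray, batches merged
outs = mod.predict(val_dataiter, merge_batches=False)  # a list of outputs per mini-batch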
fit(train_data, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), eval_batch_end_callback=None, initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_rebind=False, force_init=False, begin_epoch=0, num_epoch=None, validation_metric=None)

Train the module parameters.

Parameters:
  • train_data (DataIter) –
  • eval_data (DataIter) – If not None, will be used as validation set and evaluate the performance after each epoch.
  • eval_metric (str or EvalMetric) – Default ‘acc’. The performance measure displayed during training.
  • epoch_end_callback (function or list of function) – Each callback will be called with the current epoch, symbol, arg_params and aux_params.
  • batch_end_callback (function or list of function) – Each callback will be called with a BatchEndParam.
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The parameters for the optimizer constructor. The default value is not a dict, just to avoid pylint warning on dangerous default values.
  • eval_batch_end_callback (function or list of function) –
  • initializer (Initializer) – Will be called to initialize the module parameters if not already initialized.
  • arg_params (dict) – Default None, if not None, should be existing parameters from a trained model or loaded from a checkpoint (previously saved model). In this case, the value here will be used to initialize the module parameters, unless they are already initialized by the user via a call to init_params or fit. arg_params has higher priority than initializer.
  • aux_params (dict) – Default None. Similar to arg_params, except for auxiliary states.
  • allow_missing (bool) – Default False. Indicate whether we allow missing parameters when arg_params and aux_params are not None. If this is True, then the missing parameters will be initialized via the initializer.
  • force_rebind (bool) – Default False. Whether to force rebinding the executors if already binded.
  • force_init (bool) – Default False. Indicate whether we should force initialization even if the parameters are already initialized.
  • begin_epoch (int) – Default 0. Indicate the starting epoch. Usually, if we are resuming from a checkpoint saved at a previous training phase at epoch N, then we should specify this value as N+1.
  • num_epoch (int) – Number of epochs to run training.
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels – either because it is a module without a loss function, or because it is not binded for training – then this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()

Get parameters; these are potentially copies of the actual parameters used for computation on the device.

Returns:
(arg_params, aux_params), a pair of dictionaries of name to value mappings.
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If true, will force re-initialize even if already initialized.
set_params(arg_params, aux_params)

Assign parameter and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.
  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.
save_params(fname)

Save model parameters to file.

Parameters:fname (str) – Path to output param file.
load_params(fname)

Load model parameters from file.

Parameters:fname (str) – Path to input param file.
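A minimal save/load round trip (the file name is a placeholder):

mod.save_params('mymodel.params')
mod.load_params('mymodel.params')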
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArrays might live on different devices.
get_input_grads(merge_multi_context=True)

Get the gradients to the inputs, computed in the previous backward computation.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArrays might live on different devices.
update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None)

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be binded for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already binded. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
symbol

Get the symbol associated with this module.

Except for Module, for other types of modules (e.g. BucketingModule), this property might not be a constant throughout its life time. Some modules might not even be associated with any symbols.

The Built-in Modules

A Module implements the BaseModule API by wrapping a Symbol and one or more Executors for data parallelization.

class mxnet.module.module.Module(symbol, data_names=('data', ), label_names=('softmax_label', ), logger=<module 'logging'>, context=cpu(0), work_load_list=None)

Module is a basic module that wraps a Symbol. It is functionally the same as the FeedForward model, except that it uses the module API.

Parameters:
  • symbol (Symbol) –
  • data_names (list of str) – Default is ('data',) for a typical model used in image classification.
  • label_names (list of str) – Default is ('softmax_label',) for a typical model used in image classification.
  • logger (Logger) – Default is logging.
  • context (Context or list of Context) – Default is cpu().
  • work_load_list (list of number) – Default None, indicating uniform workload.
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Get data shapes.

Returns:
A list of (name, shape) pairs.

label_shapes

Get label shapes.

Returns:
A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not binded for training (in this case, label information is not available).
output_shapes

Get output shapes.

Returns:
A list of (name, shape) pairs.

get_params()

Get current parameters.

Returns:
(arg_params, aux_params), each a dictionary of name to parameter (NDArray) mapping.
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If true, will force re-initialize even if already initialized.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None)

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be binded for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already binded. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
borrow_optimizer(shared_module)

Borrow optimizer from a shared module. Used in bucketing, where exactly the same optimizer (esp. kvstore) is used.

Parameters:shared_module (Module) –
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
get_input_grads(merge_multi_context=True)

Get the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.

A BucketingModule implements the BaseModule API and allows multiple symbols to be used, depending on the bucket_key provided by each mini-batch of data.

class mxnet.module.bucketing_module.BucketingModule(sym_gen, default_bucket_key=None, logger=<module 'logging'>, context=cpu(0), work_load_list=None)

A bucketing module is a module that supports bucketing.

Parameters:
  • sym_gen (function) – A function that, when called with a bucket key, returns a triple (symbol, data_names, label_names).
  • default_bucket_key (str (or any python object)) – The key for the default bucket.
  • logger (Logger) –
  • context (Context or list of Context) – Default cpu()
  • work_load_list (list of number) – Default None, indicating uniform workload.
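As a hedged sketch, a minimal sym_gen (all names here are illustrative; in a real RNN setup the symbol would be unrolled according to the bucket key):

def sym_gen(seq_len):
    # return a fixed toy network just to show the required triple
    data = mx.symbol.Variable('data')
    fc = mx.symbol.FullyConnected(data, name='fc', num_hidden=10)
    out = mx.symbol.SoftmaxOutput(fc, name='softmax')
    return out, ('data',), ('softmax_label',)

mod = mx.mod.BucketingModule(sym_gen, default_bucket_key=32)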
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Get data shapes.

Returns:
A list of (name, shape) pairs.

label_shapes

Get label shapes.

Returns:
A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not binded for training (in this case, label information is not available).
output_shapes

Get output shapes.

Returns:
A list of (name, shape) pairs.

get_params()

Get current parameters.

Returns:
(arg_params, aux_params), each a dictionary of name to parameter (NDArray) mapping.
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize parameters.

Parameters:
  • initializer (Initializer) –
  • arg_params (dict) – Default None. Existing parameters. This has higher priority than initializer.
  • aux_params (dict) – Default None. Existing auxiliary states. This has higher priority than initializer.
  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.
  • force_init (bool) – Default False.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None)

Binding for a BucketingModule means setting up the buckets and binding the executor for the default bucket key. Executors corresponding to other keys are binded afterwards with switch_bucket.

Parameters:
  • data_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
  • label_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
  • for_training (bool) – Default is True.
  • inputs_need_grad (bool) – Default is False.
  • force_rebind (bool) – Default is False.
  • shared_module (BucketingModule) – Default is None. This value is currently not used.
switch_bucket(bucket_key, data_shapes, label_shapes=None)

Switch to a different bucket. This will change self.curr_module.

Parameters:
  • bucket_key (str (or any python object)) – The key of the target bucket.
  • data_shapes (list of (str, tuple)) – Typically data_batch.provide_data.
  • label_shapes (list of (str, tuple)) – Typically data_batch.provide_label.
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) –
  • is_train (bool) – Default is None, in which case is_train is taken to be self.for_training.
backward(out_grads=None)

Backward computation.

update()

Update parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

get_outputs(merge_multi_context=True)

Get outputs from a previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
get_input_grads(merge_multi_context=True)

Get the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
symbol

The symbol of the current bucket being used.

SequentialModule is a container module that chains a number of modules together.

class mxnet.module.sequential_module.SequentialModule(logger=<module 'logging'>)

A SequentialModule is a container module that chains multiple modules together. Note that building a computation graph with this kind of imperative container is less flexible and less efficient than using the symbolic graph, so it should only be used as a handy utility.

add(module, **kwargs)

Add a module to the chain.

Parameters:
  • module (BaseModule) – The new module to add.
  • kwargs (**keywords) –

    All the keyword arguments are saved as meta information for the added module. The currently known meta includes

    • take_labels: indicating whether the module expects to take labels when doing computation. Note that any module in the chain can take labels (not necessarily only the topmost one), and they all take the same labels passed from the original data batch of the SequentialModule.
Returns:

This function returns self, to allow us to easily chain a series of add calls.
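As a hedged sketch of such chaining (net1 and net2 are assumed symbols, with the loss in net2):

seq_mod = mx.mod.SequentialModule()
seq_mod.add(mx.mod.Module(net1, label_names=None)) \
       .add(mx.mod.Module(net2), take_labels=True)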
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Get data shapes.

Returns:
A list of (name, shape) pairs. The data shapes of the first module are the data shapes of the SequentialModule.
label_shapes

Get label shapes.

Returns:
A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not binded for training (in this case, label information is not available).
output_shapes

Get output shapes.

Returns:
A list of (name, shape) pairs. The output shapes of the last module are the output shapes of the SequentialModule.
get_params()

Get current parameters.

Returns:
(arg_params, aux_params), each a dictionary of name to parameter (NDArray) mapping. This is a merged dictionary of all the parameters of the modules in the chain.
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize parameters.

Parameters:
  • initializer (Initializer) –
  • arg_params (dict) – Default None. Existing parameters. This has higher priority than initializer.
  • aux_params (dict) – Default None. Existing auxiliary states. This has higher priority than initializer.
  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.
  • force_init (bool) – Default False.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None)

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be binded for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already binded. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. Currently shared module is not supported for SequentialModule.
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) –
  • is_train (bool) – Default is None, in which case is_train is taken to be self.for_training.
backward(out_grads=None)

Backward computation.

update()

Update parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

get_outputs(merge_multi_context=True)

Get outputs from a previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
get_input_grads(merge_multi_context=True)

Get the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Default is True. When data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.
Returns:
If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.

Writing Modules in Python

Provides some handy classes for users to easily implement a simple computation module in Python.

class mxnet.module.python_module.PythonModule(data_names, label_names, output_names, logger=<module 'logging'>)

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • data_names (list of str) – Names of the data expected by the module.
  • label_names (list of str) – Names of the labels expected by the module. Could be None if the module does not need labels.
  • output_names (list of str) – Names of the outputs.
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels – either because it is a module without a loss function, or because it is not binded for training – then this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()

Get parameters; these are potentially copies of the actual parameters used for computation on the device.

Returns:
({}, {}), a pair of empty dicts. Subclasses should override this method if the module contains parameters.
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states. By default this function does nothing. Subclasses should override this method if the module contains parameters.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If true, will force re-initialize even if already initialized.
update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch. Currently we do nothing here. Subclasses should override this method if the module contains parameters.

update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation. Subclasses should override this method if needed.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None)

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be binded for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already binded. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers. By default we do nothing. Subclasses should override this method if needed.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
class mxnet.module.python_module.PythonLossModule(name='pyloss', data_names=('data', ), label_names=('softmax_label', ), logger=<module 'logging'>, grad_func=None)

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • name (str) – Name of the module. The outputs will be named [name + ‘_output’].
  • data_names (list of str) – Default [‘data’]. Names of the data expected by this module. Should be a list of only one name.
  • label_names (list of str) – Default [‘softmax_label’]. Names of the labels expected by the module. Should be a list of only one name.
  • grad_func (function) – Optional. If not None, should be a function that takes scores and labels, both of type NDArray, and returns the gradients with respect to the scores according to this loss function. The return value could be a numpy array or an NDArray.
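As a hedged sketch, a grad_func for a squared-error style loss (assuming scores and labels share the same shape; the names are illustrative):

def l2_grad(scores, labels):
    # d/d(scores) of sum((scores - labels)^2)
    return 2 * (scores - labels)

loss_mod = mx.mod.PythonLossModule(grad_func=l2_grad)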
forward(data_batch, is_train=None)

Forward computation. Here we do nothing but to keep a reference to the scores and the labels so that we can do backward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation. As an output loss module, we treat the inputs to this module as scores, and simply return them.

Parameters:merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computing.
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
get_input_grads(merge_multi_context=True)

Get the gradients to the inputs, computed in the previous backward computation.

Parameters:merge_multi_context (bool) – Should always be True because we do not use multiple contexts for computation.