MXNet Python Symbolic API

We also highly encourage you to read Symbolic Configuration and Execution in Pictures alongside this document.

How to Compose Symbols

The symbolic API provides a way to configure computation graphs. You can configure them at the level of neural network layer operations as well as at the level of fine-grained operations.

The following code gives an example of a two-layer neural network configuration.

>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
>>> net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
>>> net = mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64)
>>> net = mx.symbol.SoftmaxOutput(data=net, name='out')
>>> type(net)
<class 'mxnet.symbol.Symbol'>

The basic arithmetic operators (plus, minus, div, multiplication) are overloaded for element-wise operations on symbols.

The following code gives an example of a computation graph that adds two inputs together.

>>> import mxnet as mx
>>> a = mx.symbol.Variable('a')
>>> b = mx.symbol.Variable('b')
>>> c = a + b

Symbol Attributes

Attributes can be attached to symbols, by providing an attribute dictionary when creating a symbol.

data = mx.sym.Variable('data', attr={'mood': 'angry'})
op   = mx.sym.Convolution(data=data, name='conv', kernel=(1, 1),
                          num_filter=1, attr={'mood': 'so so'})

Both the keys and the values of the attribute dictionary should be strings, in order to properly communicate with the C++ backend. The attributes can be retrieved via attr(key) or list_attr():

assert data.attr('mood') == 'angry'
assert op.list_attr() == {'mood': 'so so'}

For a composite symbol, you can also retrieve all the attributes associated with that symbol and its descendants via list_attr(recursive=True). Note that in the returned dictionary, all attribute names carry the prefix 'symbol_name' + '_' in order to avoid naming conflicts.

assert op.list_attr(recursive=True) == {'data_mood': 'angry', 'conv_mood': 'so so',
                                        'conv_weight_mood': 'so so', 'conv_bias_mood': 'so so'}

Here you may notice that the mood attribute we set for the Convolution operator is copied to conv_weight and conv_bias. Those are symbols automatically created by the Convolution operator, and the attributes are also automatically copied for them. This is intentional and is especially useful for annotation of context groups in model parallelism. However, if the weight or bias symbol is explicitly created by the user, then the attributes of the host operator will not be copied to it:

weight = mx.sym.Variable('crazy_weight', attr={'size': '5'})
data = mx.sym.Variable('data', attr={'mood': 'angry'})
op = mx.sym.Convolution(data=data, weight=weight, name='conv', kernel=(1, 1),
                        num_filter=1, attr={'mood': 'so so'})
op.list_attr(recursive=True)
# =>
# {'conv_mood': 'so so',
#  'conv_bias_mood': 'so so',
#  'crazy_weight_size': '5',
#  'data_mood': 'angry'}

As you can see, the mood attribute is copied to the automatically created symbol conv_bias, but not to the manually created weight symbol crazy_weight.
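The prefixing rule can be modeled in a few lines of plain Python. This is only a toy sketch of the naming convention, not MXNet's implementation: the function name list_attr_recursive and the dict-of-dicts input are assumptions made for illustration.

```python
def list_attr_recursive(symbol_attrs):
    """Flatten {symbol_name: {key: value}} into {symbol_name + '_' + key: value}.

    Toy model of the naming rule only; MXNet collects these from the graph.
    """
    flat = {}
    for symbol_name, attrs in symbol_attrs.items():
        for key, value in attrs.items():
            flat[symbol_name + '_' + key] = value
    return flat


# Per-symbol attributes, mirroring the explicit-weight example above.
attrs = {
    'data': {'mood': 'angry'},
    'crazy_weight': {'size': '5'},   # user-created: keeps only its own attributes
    'conv': {'mood': 'so so'},
    'conv_bias': {'mood': 'so so'},  # auto-created: inherits from conv
}
result = list_attr_recursive(attrs)
```

The resulting dictionary has the same four keys as the list_attr(recursive=True) output shown above.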

Another way of attaching attributes is to use AttrScope. An AttrScope will automatically add the specified attributes to all the symbols created within that scope. For example:

data = mx.symbol.Variable('data')
with mx.AttrScope(group='4', data='great'):
    fc1 = mx.symbol.Activation(data, act_type='relu')
    with mx.AttrScope(init_bias='0.0'):
        fc2 = mx.symbol.FullyConnected(fc1, num_hidden=10, name='fc2')
assert fc1.attr('data') == 'great'
assert fc2.attr('data') == 'great'
assert fc2.attr('init_bias') == '0.0'
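
How nested scopes combine can be sketched with a small context manager. ToyAttrScope below is a hypothetical stand-in written for this illustration and is not mx.AttrScope; it only models the idea that an inner scope inherits the attributes of the enclosing scope and adds its own.

```python
class ToyAttrScope(object):
    """Toy stand-in for an attribute scope (hypothetical, not mx.AttrScope)."""
    _stack = [{}]  # the innermost scope's merged attributes are at the end

    def __init__(self, **attrs):
        self._attrs = attrs

    def __enter__(self):
        merged = dict(ToyAttrScope._stack[-1])
        merged.update(self._attrs)  # inner scope adds to / overrides outer
        ToyAttrScope._stack.append(merged)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        ToyAttrScope._stack.pop()

    @staticmethod
    def current():
        """Attributes a symbol created right now would receive."""
        return dict(ToyAttrScope._stack[-1])


with ToyAttrScope(group='4', data='great'):
    outer = ToyAttrScope.current()
    with ToyAttrScope(init_bias='0.0'):
        inner = ToyAttrScope.current()
after = ToyAttrScope.current()
```

Here inner contains group, data and init_bias, which mirrors why fc2 above carries both data and init_bias.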

Naming convention: it is recommended to choose attribute names that are valid variable names. Names with double underscores (e.g. __shape__) are reserved for internal use. The underscore '_' is the character used to separate a symbol name and its attributes, as well as the separator between a symbol and a variable that is automatically created by that symbol. For example, the weight variable created automatically by a Convolution operator named conv1 will be called conv1_weight.

Components that use attributes: more and more components are using symbol attributes to collect useful annotations for the computational graph. Here is a (probably incomplete) list:

  • Variable uses attributes to store (optional) shape information for a variable.
  • Optimizers will read lr_mult and wd_mult attributes for each symbol in a computational graph. This is useful to control per-layer learning rate and decay.
  • The model parallelism LSTM example uses ctx_group attribute to divide the operators into different groups corresponding to different GPU devices.

Serialization

There are two ways to save and load symbols. You can use pickle to serialize the Symbol objects. Alternatively, you can use the mxnet.symbol.Symbol.save and mxnet.symbol.load functions. The advantage of using save and load is that the format is language agnostic and cloud friendly. The symbol is saved in JSON format. You can also get a JSON string directly using mxnet.symbol.Symbol.tojson.

The following code gives an example of saving a symbol to an S3 bucket, loading it back, and comparing the two symbols using their JSON strings.

>>> import mxnet as mx
>>> a = mx.symbol.Variable('a')
>>> b = mx.symbol.Variable('b')
>>> c = a + b
>>> c.save('s3://my-bucket/symbol-c.json')
>>> c2 = mx.symbol.load('s3://my-bucket/symbol-c.json')
>>> c.tojson() == c2.tojson()
True

Multiple Outputs

You can use the mxnet.symbol.Group function to group symbols together.

>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> fc1 = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
>>> net = mx.symbol.Activation(data=fc1, name='relu1', act_type="relu")
>>> net = mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64)
>>> out = mx.symbol.SoftmaxOutput(data=net, name='softmax')
>>> group = mx.symbol.Group([fc1, out])
>>> group.list_outputs()
['fc1_output', 'softmax_output']

Once you have the group, you can bind on the group instead. The resulting executor will have two outputs, one for fc1_output and one for softmax_output.

Symbol Creation API Reference

Symbolic configuration API of mxnet.

class mxnet.symbol.Symbol(handle)

Symbol is the symbolic graph of mxnet.

__call__(*args, **kwargs)

Invoke symbol as function on inputs.

Parameters:
  • args – provide positional arguments
  • kwargs – provide keyword arguments
Returns:the resulting symbol
Return type:Symbol

name

Get the name string from the symbol. This function only works for non-grouped symbols.

Returns:value – The name of this symbol, returns None for grouped symbol.
Return type:str
attr(key)

Get an attribute string from the symbol. This function only works for non-grouped symbols.

Parameters:key (str) – The key to get attribute from.
Returns:value – The attribute value of the key, returns None if the attribute does not exist.
Return type:str
list_attr(recursive=False)

Get all attributes from the symbol.

Parameters:recursive (bool) – Default False. When recursive is True, list recursively all the attributes in the descendants. The attribute names are prepended with the symbol names to avoid conflicts. If False, then only attributes that belong to this symbol are returned, and the attribute names will not be prepended with the symbol name.
get_internals()

Get a new grouped symbol whose output contains all the internal outputs of this symbol.

Returns:sgroup – The internals of the symbol.
Return type:Symbol
list_arguments()

List all the arguments in the symbol.

Returns:args – List of all the arguments.
Return type:list of string
list_outputs()

List all outputs in the symbol.

Returns:returns – List of all the outputs.
Return type:list of string
list_auxiliary_states()

List all auxiliary states in the symbol.

Returns:aux_states – List the names of the auxiliary states.
Return type:list of string

Notes

Auxiliary states are special states of symbols that do not correspond to an argument and do not have gradients, but are still useful for specific operations. A common example of auxiliary states is the moving_mean and moving_variance in BatchNorm. Most operators do not have auxiliary states.

infer_type(*args, **kwargs)

Infer the types of outputs and arguments given the known types of arguments.

The user can pass in the known types either positionally or as keyword arguments. A tuple of Nones is returned if not enough information is passed in. An error is raised if an inconsistency is found in the known types passed in.

Parameters:
  • *args

    Provide type of arguments in a positional way. Unknown type can be marked as None

  • **kwargs

    Provide keyword arguments of known types.

Returns:

  • arg_types (list of numpy.dtype or None) – List of types of arguments. The order is in the same order as list_arguments()
  • out_types (list of numpy.dtype or None) – List of types of outputs. The order is in the same order as list_outputs()
  • aux_types (list of numpy.dtype or None) – List of types of auxiliary states. The order is in the same order as list_auxiliary_states()
infer_shape(*args, **kwargs)

Infer the shapes of outputs and arguments given the known shapes of arguments.

The user can pass in the known shapes either positionally or as keyword arguments. A tuple of Nones is returned if not enough information is passed in. An error is raised if an inconsistency is found in the known shapes passed in.

Parameters:
  • *args

    Provide shape of arguments in a positional way. Unknown shape can be marked as None

  • **kwargs

    Provide keyword arguments of known shapes.

Returns:

  • arg_shapes (list of tuple or None) – List of shapes of arguments. The order is in the same order as list_arguments()
  • out_shapes (list of tuple or None) – List of shapes of outputs. The order is in the same order as list_outputs()
  • aux_shapes (list of tuple or None) – List of shapes of auxiliary states. The order is in the same order as list_auxiliary_states()
infer_shape_partial(*args, **kwargs)

Partially infer the shape. The same as infer_shape, except that the partial results can be returned.

debug_str()

Get a debug string.

Returns:debug_str – Debug string of the symbol.
Return type:string
save(fname)

Save symbol into file.

You can also use pickle to do the job if you only work in Python. The advantage of load/save is that the file is language agnostic. This means a file saved using save can be loaded by other language bindings of mxnet. You also get the benefit of being able to directly load/save from cloud storage (S3, HDFS).

Parameters:fname (str) –

The name of the file, examples:

  • s3://my-bucket/path/my-s3-symbol
  • hdfs://my-bucket/path/my-hdfs-symbol
  • /path-to/my-local-symbol

See also

symbol.load()
Used to load symbol from file.
tojson()

Save symbol into a JSON string.

See also

symbol.load_json()
Used to load symbol from JSON string.
simple_bind(ctx, grad_req='write', type_dict=None, group2ctx=None, **kwargs)

Bind the current symbol to get an executor, allocating all the NDArrays needed. Allows specifying data types.

This function asks the user to pass in the inputs they would like to bind to, and it automatically allocates the NDArrays for the arguments and auxiliary states that the user did not specify explicitly.

Parameters:
  • ctx (Context) – The device context on which the generated executor runs.
  • grad_req ({'write', 'add', 'null'}, or list of str or dict of str to str, optional) –

    Specifies how the gradient is written to args_grad.

    • ‘write’ means the gradient is written to the specified args_grad NDArray every time.
    • ‘add’ means the gradient is added to the specified NDArray every time.
    • ‘null’ means no action is taken; the gradient may not be calculated.
  • type_dict (dict of str->numpy.dtype) – Input type dictionary, name->dtype
  • group2ctx (dict of string to mx.Context) – The dict mapping the ctx_group attribute to the context assignment.
  • kwargs (dict of str->shape) – Input shape dictionary, name->shape
Returns:

executor – The generated Executor

Return type:

mxnet.Executor

bind(ctx, args, args_grad=None, grad_req='write', aux_states=None, group2ctx=None, shared_exec=None)

Bind current symbol to get an executor.

Parameters:
  • ctx (Context) – The device context on which the generated executor runs.
  • args (list of NDArray or dict of str to NDArray) –

    Input arguments to the symbol.

    • If type is list of NDArray, the position is in the same order of list_arguments.
    • If type is dict of str to NDArray, then it maps the name of arguments to the corresponding NDArray.
    • In either case, all the arguments must be provided.
  • args_grad (list of NDArray or dict of str to NDArray, optional) –

    When specified, args_grad provides NDArrays to hold the gradient values computed in backward.

    • If type is list of NDArray, the position is in the same order of list_arguments.
    • If type is dict of str to NDArray, then it maps the name of arguments to the corresponding NDArray.
    • When the type is dict of str to NDArray, users only need to provide the dict for needed argument gradient. Only the specified argument gradient will be calculated.
  • grad_req ({'write', 'add', 'null'}, or list of str or dict of str to str, optional) –

    Specifies how the gradient is written to args_grad.

    • ‘write’ means the gradient is written to the specified args_grad NDArray every time.
    • ‘add’ means the gradient is added to the specified NDArray every time.
    • ‘null’ means no action is taken; the gradient may not be calculated.
  • aux_states (list of NDArray, or dict of str to NDArray, optional) –

    Input auxiliary states to the symbol. These only need to be specified when list_auxiliary_states is not empty.

    • If type is list of NDArray, the position is in the same order of list_auxiliary_states.
    • If type is dict of str to NDArray, then it maps the name of auxiliary_states to the corresponding NDArray.
    • In either case, all the auxiliary_states need to be provided.
  • group2ctx (dict of string to mx.Context) – The dict mapping the ctx_group attribute to the context assignment.
  • shared_exec (mx.executor.Executor) – Executor to share memory with. This is intended for runtime reshaping, variable length sequences, etc. The returned executor shares state with shared_exec, and should not be used in parallel with it.
Returns:

executor – The generated Executor

Return type:

mxnet.Executor

Notes

Auxiliary states are special states of symbols that do not correspond to an argument and do not have gradients, but are still useful for specific operations. A common example of auxiliary states is the moving_mean and moving_variance in BatchNorm. Most operators do not have auxiliary states, and this parameter can be safely ignored.

Users can skip gradients they do not need by passing a dict as args_grad and specifying only the gradients they are interested in.
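
The three grad_req modes can be illustrated with plain Python lists standing in for NDArrays. This is a toy model of the semantics described in the parameter list, not executor code; the function name apply_grad_req is hypothetical.

```python
def apply_grad_req(grad_buffer, new_grad, grad_req):
    """Update grad_buffer in place according to grad_req (toy model)."""
    if grad_req == 'write':
        grad_buffer[:] = new_grad           # overwrite the stored gradient
    elif grad_req == 'add':
        for i, g in enumerate(new_grad):    # accumulate into the buffer
            grad_buffer[i] += g
    elif grad_req == 'null':
        pass                                # leave the buffer untouched
    else:
        raise ValueError("unknown grad_req: %r" % (grad_req,))
    return grad_buffer


written = apply_grad_req([1.0, 1.0], [0.5, 2.0], 'write')
added = apply_grad_req([1.0, 1.0], [0.5, 2.0], 'add')
ignored = apply_grad_req([1.0, 1.0], [0.5, 2.0], 'null')
```

With 'add', gradients from repeated backward passes accumulate in the same buffer, so the caller has to reset the buffer when a fresh gradient is wanted.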

grad(wrt)

Get the autodiff of the current symbol.

This function can only be used if the current symbol is a loss function.

Parameters:wrt (Array of String) – Names of the arguments of the symbol with respect to which the gradients are taken.
Returns:grad – A gradient Symbol whose outputs are the corresponding gradients.
Return type:Symbol
mxnet.symbol.Variable(name, attr=None, shape=None)

Create a symbolic variable with specified name.

Parameters:
  • name (str) – Name of the variable.
  • attr (dict of string -> string) – Additional attributes to set on the variable.
  • shape (tuple) – Optionally, one can specify the shape of a variable. This will be used during shape inference. If the user specifies a different shape for this variable as a keyword argument when calling shape inference, the shape given here is ignored.
Returns:

variable – The created variable symbol.

Return type:

Symbol

mxnet.symbol.Group(symbols)

Create a symbol that groups symbols together.

Parameters:symbols (list) – List of symbols to be grouped.
Returns:sym – The created group symbol.
Return type:Symbol
mxnet.symbol.load(fname)

Load symbol from a JSON file.

You can also use pickle to do the job if you only work in Python. The advantage of load/save is that the file is language agnostic. This means a file saved using save can be loaded by other language bindings of mxnet. You also get the benefit of being able to directly load/save from cloud storage (S3, HDFS).

Parameters:fname (str) –

The name of the file, examples:

  • s3://my-bucket/path/my-s3-symbol
  • hdfs://my-bucket/path/my-hdfs-symbol
  • /path-to/my-local-symbol
Returns:sym – The loaded symbol.
Return type:Symbol

See also

Symbol.save()
Used to save symbol into file.
mxnet.symbol.load_json(json_str)

Load symbol from json string.

Parameters:json_str (str) – A json string.
Returns:sym – The loaded symbol.
Return type:Symbol

See also

Symbol.tojson()
Used to save symbol into json string.
mxnet.symbol.pow(base, exp)

Raise base to the power exp.

Parameters:
  • base (Symbol or Number) –
  • exp (Symbol or Number) –
Returns:

result

Return type:

Symbol or Number

mxnet.symbol.sum(data, axis=None, keepdims=False, name=None)

Calculate the sum of the array along the given axis. The semantics strictly follow NumPy’s documentation.

Parameters:
  • data (Symbol) – the array to be reduced
  • axis (int or list(int), optional) – along which axis to do reduction
  • keepdims (bool) – whether the reduced axis should be kept in the final shape
Returns:

out – A Symbol representing the reduced array.

Return type:

Symbol
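
The effect of axis and keepdims on the result shape can be sketched in plain Python. The helper reduced_shape is hypothetical and models only the NumPy reduction rules that the description refers to; it assumes non-negative axis values, and how MXNet represents a reduction over all axes is not covered here.

```python
def reduced_shape(shape, axis=None, keepdims=False):
    """Shape produced by reducing `shape` over `axis` (NumPy-style rules)."""
    if axis is None:
        axes = set(range(len(shape)))       # reduce over every axis
    elif isinstance(axis, int):
        axes = {axis}
    else:
        axes = set(axis)
    if keepdims:
        # Reduced axes stay in the shape with length 1.
        return tuple(1 if i in axes else d for i, d in enumerate(shape))
    # Reduced axes are removed from the shape.
    return tuple(d for i, d in enumerate(shape) if i not in axes)
```

For an input of shape (2, 3, 4), reducing over axis=1 gives (2, 4), or (2, 1, 4) with keepdims=True.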

mxnet.symbol.maximum(left, right)

Take the maximum of left and right.

Parameters:
  • left (Symbol or Number) –
  • right (Symbol or Number) –
Returns:

result

Return type:

Symbol or Number

mxnet.symbol.minimum(left, right)

Take the minimum of left and right.

Parameters:
  • left (Symbol or Number) –
  • right (Symbol or Number) –
Returns:

result

Return type:

Symbol or Number

mxnet.symbol.Activation(*args, **kwargs)

Apply activation function to input. Softmax Activation is only available with CUDNN on GPU and will be computed at each location across channel if input is 4D.

Parameters:
  • data (Symbol) – Input data to activation function.
  • act_type ({'relu', 'sigmoid', 'softrelu', 'tanh'}, required) – Activation function to be applied.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.BatchNorm(*args, **kwargs)

Apply batch normalization to input.

Parameters:
  • data (Symbol) – Input data to batch normalization
  • eps (float, optional, default=0.001) – Epsilon to prevent div 0
  • momentum (float, optional, default=0.9) – Momentum for moving average
  • fix_gamma (boolean, optional, default=True) – Fix gamma while training
  • use_global_stats (boolean, optional, default=False) – Whether to use global moving statistics instead of local batch-norm. This will force batch-norm to act as a scale shift operator.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
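
The role of momentum can be sketched as an exponential moving average. The update rule below is the common formulation and is assumed here rather than quoted from the operator's source; update_moving_average is a hypothetical helper.

```python
def update_moving_average(moving, batch_value, momentum=0.9):
    """One exponential-moving-average step, as commonly used for batch norm."""
    return moving * momentum + batch_value * (1.0 - momentum)


# Start from 0 and observe a batch mean of 1.0 three times.
moving_mean = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    moving_mean = update_moving_average(moving_mean, batch_mean)
```

With momentum=0.9 the moving value approaches the batch statistic slowly: after three identical batches it has covered about 27% of the distance.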

mxnet.symbol.BlockGrad(*args, **kwargs)

Get output from a symbol and pass 0 gradient back

Parameters:
  • data (Symbol) – Input data.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.Cast(*args, **kwargs)

Cast array to a different data type.

Parameters:
  • data (Symbol) – Input data to cast function.
  • dtype ({'float16', 'float32', 'float64', 'int32', 'uint8'}, required) – Target data type.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.Concat(*args, **kwargs)

Perform a feature concat on the channel dim (default is 1) over all the inputs. This function supports a variable number of positional inputs.

Parameters:
  • data (Symbol[]) – List of tensors to concatenate
  • num_args (int, required) – Number of inputs to be concatenated.
  • dim (int, optional, default='1') – The dimension along which to concatenate.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

Examples

>>> import mxnet as mx
>>> data = mx.nd.array(range(6)).reshape((2,1,3))
>>> print("input shape = %s" % (data.shape, ))
input shape = (2L, 1L, 3L)
>>> print("data = %s" % (data.asnumpy(), ))
data = [[[ 0.  1.  2.]]
 [[ 3.  4.  5.]]]
>>> # concat two variables on different dimensions
>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> for dim in range(3):
...     cat = mx.sym.Concat(a, b, dim=dim)
...     exe = cat.bind(ctx=mx.cpu(), args={'a':data, 'b':data})
...     exe.forward()
...     out = exe.outputs[0]
...     print("concat at dim = %d" % dim)
...     print("shape = %s" % (out.shape, ))
...     print("results = %s" % (out.asnumpy(), ))
concat at dim = 0
shape = (4L, 1L, 3L)
results = [[[ 0.  1.  2.]]
 [[ 3.  4.  5.]]
 [[ 0.  1.  2.]]
 [[ 3.  4.  5.]]]
concat at dim = 1
shape = (2L, 2L, 3L)
results = [[[ 0.  1.  2.]
  [ 0.  1.  2.]]
 [[ 3.  4.  5.]
  [ 3.  4.  5.]]]
concat at dim = 2
shape = (2L, 1L, 6L)
results = [[[ 0.  1.  2.  0.  1.  2.]]
 [[ 3.  4.  5.  3.  4.  5.]]]
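
The shapes printed above follow a simple rule: the sizes along dim are summed and all other dims must match. A minimal pure-Python sketch of that rule, with concat_shape as a hypothetical helper that does not call MXNet:

```python
def concat_shape(shapes, dim=1):
    """Shape of the result of concatenating arrays of `shapes` along `dim`."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same number of dims")
        for i, (a, b) in enumerate(zip(first, shape)):
            if i != dim and a != b:
                raise ValueError("dims other than %d must match" % dim)
    out = list(first)
    out[dim] = sum(shape[dim] for shape in shapes)  # sizes add up along dim
    return tuple(out)


# Same setup as the example: two (2, 1, 3) inputs, dim = 0, 1, 2.
shapes = [concat_shape([(2, 1, 3), (2, 1, 3)], dim=d) for d in range(3)]
```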
mxnet.symbol.Convolution(*args, **kwargs)

Apply convolution to input then add a bias.

Parameters:
  • data (Symbol) – Input data to the ConvolutionOp.
  • weight (Symbol) – Weight matrix.
  • bias (Symbol) – Bias parameter.
  • kernel (Shape(tuple), required) – convolution kernel size: (y, x)
  • stride (Shape(tuple), optional, default=(1,1)) – convolution stride: (y, x)
  • dilate (Shape(tuple), optional, default=(1,1)) – convolution dilate: (y, x)
  • pad (Shape(tuple), optional, default=(0,0)) – pad for convolution: (y, x)
  • num_filter (int (non-negative), required) – convolution filter(channel) number
  • num_group (int (non-negative), optional, default=1) – Number of groups partition. This option is not supported by CuDNN; you can instead use SliceChannel to split the input into num_group parts, apply convolution to each part, and concat the results to achieve the same effect.
  • workspace (long (non-negative), optional, default=512) – Tmp workspace for convolution (MB).
  • no_bias (boolean, optional, default=False) – Whether to disable bias parameter.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
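
How kernel, stride, dilate and pad determine the output size can be sketched with the standard convolution arithmetic. The helpers below are hypothetical and assume a (batch, channel, y, x) input layout and floor division; they do not call MXNet.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0, dilate=1):
    """Output length along one spatial axis for a standard convolution."""
    effective_kernel = dilate * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


def conv_output_shape(data_shape, num_filter, kernel,
                      stride=(1, 1), pad=(0, 0), dilate=(1, 1)):
    """Map a (batch, channel, y, x) input shape to the output shape."""
    batch, _, in_y, in_x = data_shape
    out_y = conv_output_size(in_y, kernel[0], stride[0], pad[0], dilate[0])
    out_x = conv_output_size(in_x, kernel[1], stride[1], pad[1], dilate[1])
    return (batch, num_filter, out_y, out_x)
```

For example, a 7x7 kernel with stride 2 and pad 3 maps a 224x224 input to 112x112, and the output channel count equals num_filter.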

mxnet.symbol.Crop(*args, **kwargs)

Crop the 2nd and 3rd dim of the input data, either to the size given by h_w or to the width and height of the second input symbol. That is, with one input, h_w specifies the crop height and width; otherwise, the second input symbol’s size is used. This function supports a variable number of positional inputs.

Parameters:
  • data (Symbol or Symbol[]) – Tensor or List of Tensors, the second input will be used as crop_like shape reference
  • num_args (int, required) – Number of inputs for crop. If it equals one, h_w is used for the crop height and width; if it equals two, the height and width of the second input symbol, named crop_like here, are used.
  • offset (Shape(tuple), optional, default=(0,0)) – crop offset coordinate: (y, x)
  • h_w (Shape(tuple), optional, default=(0,0)) – crop height and width: (h, w)
  • center_crop (boolean, optional, default=False) – If set to true, a center crop is used; otherwise, the crop uses the shape of crop_like.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.CuDNNBatchNorm(*args, **kwargs)

Apply batch normalization to input.

Parameters:
  • data (Symbol) – Input data to batch normalization
  • eps (float, optional, default=0.001) – Epsilon to prevent div 0
  • momentum (float, optional, default=0.9) – Momentum for moving average
  • fix_gamma (boolean, optional, default=True) – Fix gamma while training
  • use_global_stats (boolean, optional, default=False) – Whether to use global moving statistics instead of local batch-norm. This will force batch-norm to act as a scale shift operator.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.Custom(*args, **kwargs)

Custom operator implemented in frontend.

Parameters:
  • op_type (string) – Type of custom operator. Must be registered first.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.Deconvolution(*args, **kwargs)

Apply deconvolution to input then add a bias.

Parameters:
  • data (Symbol) – Input data to the DeconvolutionOp.
  • weight (Symbol) – Weight matrix.
  • bias (Symbol) – Bias parameter.
  • kernel (Shape(tuple), required) – deconvolution kernel size: (y, x)
  • stride (Shape(tuple), optional, default=(1,1)) – deconvolution stride: (y, x)
  • pad (Shape(tuple), optional, default=(0,0)) – pad for deconvolution: (y, x)
  • num_filter (int (non-negative), required) – deconvolution filter(channel) number
  • num_group (int (non-negative), optional, default=1) – number of groups partition
  • workspace (long (non-negative), optional, default=512) – Tmp workspace for deconvolution (MB)
  • no_bias (boolean, optional, default=True) – Whether to disable bias parameter.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
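
For a rough idea of how the parameters interact, the usual transposed-convolution size formula can be written down as below. This is the textbook relation, assumed here rather than taken from the operator's source, and deconv_output_size is a hypothetical helper.

```python
def deconv_output_size(in_size, kernel, stride=1, pad=0):
    """Output length along one spatial axis for a transposed convolution."""
    return stride * (in_size - 1) + kernel - 2 * pad


# A 4x4 kernel with stride 2 and pad 1 doubles the spatial size.
upsampled = deconv_output_size(4, kernel=4, stride=2, pad=1)
```

Under this formula, the deconvolution undoes the size change of a convolution with the same kernel, stride and pad.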

mxnet.symbol.Dropout(*args, **kwargs)

Apply dropout to input

Parameters:
  • data (Symbol) – Input data to dropout.
  • p (float, optional, default=0.5) – Fraction of the input that gets dropped out at training time
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
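
What dropping a fraction p of the input means can be sketched on a plain list. The sketch assumes the common "inverted dropout" formulation, in which kept values are scaled by 1 / (1 - p) during training; whether the operator scales in exactly this way is an assumption, and the function below is hypothetical.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of floats (illustrative sketch)."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    # Dropped entries become 0; kept entries are scaled by 1 / keep so the
    # expected value of each output equals its input.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


out = dropout([1.0] * 8, p=0.5, rng=random.Random(0))
```

Outside of training the input passes through unchanged, which is why p only matters "at training time".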

mxnet.symbol.ElementWiseSum(*args, **kwargs)

Perform an elementwise sum over all the inputs. This function support variable length of positional input.

Parameters:
  • num_args (int, required) – Number of inputs to be summed.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.Embedding(*args, **kwargs)

Get embedding for one-hot input. An n-dimensional input tensor will be transformed into an (n+1)-dimensional tensor, where a new dimension is added for the embedding results.

Parameters:
  • data (Symbol) – Input data to the EmbeddingOp.
  • weight (Symbol) – Embedding weight matrix.
  • input_dim (int, required) – input dim of one-hot encoding
  • output_dim (int, required) – output dim of embedding
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
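
The added dimension can be seen in a small lookup sketch using nested lists. It assumes the input holds integer indices into the rows of an (input_dim, output_dim) weight matrix; embed is a hypothetical helper, not the operator itself.

```python
def embed(indices, weight):
    """Replace every integer index in a nested list by its row of `weight`."""
    if isinstance(indices, list):
        return [embed(item, weight) for item in indices]
    return list(weight[int(indices)])


# input_dim = 3 rows, output_dim = 2 columns
weight = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
data = [[0, 2], [1, 1]]          # shape (2, 2)
embedded = embed(data, weight)   # shape (2, 2, 2)
```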

mxnet.symbol.Flatten(*args, **kwargs)

Flatten input

Parameters:
  • data (Symbol) – Input data to flatten.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
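
As a sketch of the shape change, the helper below assumes that Flatten keeps the first (batch) dimension and collapses all remaining dimensions into one. The name flatten_shape is hypothetical.

```python
def flatten_shape(shape):
    """Map (d0, d1, ..., dn) to (d0, d1 * ... * dn)."""
    rest = 1
    for dim in shape[1:]:
        rest *= dim
    return (shape[0], rest)
```

For example, a (32, 3, 28, 28) input becomes (32, 2352), which is the 2D form a FullyConnected layer typically consumes.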

mxnet.symbol.FullyConnected(*args, **kwargs)

Apply matrix multiplication to input then add a bias.

Parameters:
  • data (Symbol) – Input data to the FullyConnectedOp.
  • weight (Symbol) – Weight matrix.
  • bias (Symbol) – Bias parameter.
  • num_hidden (int, required) – Number of hidden nodes of the output.
  • no_bias (boolean, optional, default=False) – Whether to disable bias parameter.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
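
The shapes involved can be sketched as follows, assuming a 2D input of shape (batch, in_features) and a weight stored as (num_hidden, in_features). The helper fully_connected_shapes is hypothetical and only illustrates the kind of result infer_shape reports for such a layer.

```python
def fully_connected_shapes(data_shape, num_hidden, no_bias=False):
    """Argument and output shapes for a layer computing data * weight^T + bias."""
    batch, in_features = data_shape
    shapes = {
        'data': (batch, in_features),
        'weight': (num_hidden, in_features),
        'output': (batch, num_hidden),
    }
    if not no_bias:
        shapes['bias'] = (num_hidden,)
    return shapes


fc1 = fully_connected_shapes((32, 100), num_hidden=128)
```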

mxnet.symbol.IdentityAttachKLSparseReg(*args, **kwargs)

Apply a sparse regularization to the output of a sigmoid activation function.

Parameters:
  • data (Symbol) – Input data.
  • sparseness_target (float, optional, default=0.1) – The sparseness target
  • penalty (float, optional, default=0.001) – The tradeoff parameter for the sparseness penalty
  • momentum (float, optional, default=0.9) – The momentum for running average
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.L2Normalization(*args, **kwargs)

Set the l2 norm of each instance to a constant.

Parameters:
  • data (Symbol) – Input data to the L2NormalizationOp.
  • eps (float, optional, default=1e-10) – Epsilon to prevent div 0
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
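
A minimal sketch of the normalization for a single instance, assuming the target norm is 1 and that eps is added under the square root. Both are assumptions made for illustration, and l2_normalize is a hypothetical helper.

```python
import math


def l2_normalize(instance, eps=1e-10):
    """Scale a list of floats so that its L2 norm is (almost exactly) 1."""
    norm = math.sqrt(sum(v * v for v in instance) + eps)
    return [v / norm for v in instance]


unit = l2_normalize([3.0, 4.0])
```

The eps term keeps the division well defined for an all-zero instance.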

mxnet.symbol.LRN(*args, **kwargs)

Apply local response normalization (LRN) to input.

Parameters:
  • data (Symbol) – Input data to the LRN operator.
  • alpha (float, optional, default=0.0001) – value of the alpha variance scaling parameter in the normalization formula
  • beta (float, optional, default=0.75) – value of the beta power parameter in the normalization formula
  • knorm (float, optional, default=2) – value of the k parameter in normalization formula
  • nsize (int (non-negative), required) – normalization window width in elements.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.LeakyReLU(*args, **kwargs)

Apply activation function to input.

Parameters:
  • data (Symbol) – Input data to activation function.
  • act_type ({'elu', 'leaky', 'prelu', 'rrelu'}, optional, default='leaky') – Activation function to be applied.
  • slope (float, optional, default=0.25) – Init slope for the activation. (For leaky and elu only)
  • lower_bound (float, optional, default=0.125) – Lower bound of random slope. (For rrelu only)
  • upper_bound (float, optional, default=0.334) – Upper bound of random slope. (For rrelu only)
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
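
The two fixed-slope variants can be written as scalar functions. These are the standard definitions of leaky ReLU and ELU with slope as the negative-side coefficient; they are a sketch rather than the operator's code, and prelu and rrelu (learned and random slopes) are not covered.

```python
import math


def leaky(x, slope=0.25):
    """Leaky ReLU: identity for x > 0, slope * x otherwise."""
    return x if x > 0 else slope * x


def elu(x, slope=0.25):
    """ELU: identity for x > 0, slope * (exp(x) - 1) otherwise."""
    return x if x > 0 else slope * (math.exp(x) - 1.0)
```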

mxnet.symbol.LinearRegressionOutput(*args, **kwargs)

Use linear regression for the final output. This is used on the final output of a net.

Parameters:
  • data (Symbol) – Input data to function.
  • label (Symbol) – Input label to function.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.LogisticRegressionOutput(*args, **kwargs)

Use logistic regression for the final output. This is used on the final output of a net. Logistic regression is suitable for binary classification or probability prediction tasks.

Parameters:
  • data (Symbol) – Input data to function.
  • label (Symbol) – Input label to function.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.MAERegressionOutput(*args, **kwargs)

Use mean absolute error regression for the final output. This is used on the final output of a net.

Parameters:
  • data (Symbol) – Input data to function.
  • label (Symbol) – Input label to function.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.MakeLoss(*args, **kwargs)

Get output from a symbol and pass 1 gradient back. This is used as a terminal loss if unary and binary operators are used to compose a loss with no declaration of backward dependency.

Parameters:
  • data (Symbol) – Input data.
  • grad_scale (float, optional, default=1) – gradient scale as a supplement to unary and binary operators
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.Pooling(*args, **kwargs)

Perform spatial pooling on inputs.

Parameters:
  • data (Symbol) – Input data to the pooling operator.
  • global_pool (boolean, optional, default=False) – Ignore kernel size, do global pooling based on current input feature map. This is useful for input with different shape
  • kernel (Shape(tuple), required) – pooling kernel size: (y, x)
  • pool_type ({'avg', 'max', 'sum'}, required) – Pooling type to be applied.
  • stride (Shape(tuple), optional, default=(1,1)) – stride for pooling: (y, x)
  • pad (Shape(tuple), optional, default=(0,0)) – pad for pooling: (y, x)
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
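
The effect of kernel, stride, and pool_type can be sketched on a small 2D input in plain Python. This is an illustration only: the function name is hypothetical, padding is omitted, and the output size formula floor((size - kernel) / stride) + 1 is an assumed convention that may differ from the operator's own rounding.

```python
def pool2d(image, kernel, pool_type, stride=(1, 1)):
    """Pool a 2D list-of-lists with a (y, x) kernel and stride, no padding."""
    kh, kw = kernel
    sh, sw = stride
    out_h = (len(image) - kh) // sh + 1
    out_w = (len(image[0]) - kw) // sw + 1
    reducers = {
        'max': max,
        'sum': sum,
        'avg': lambda window: sum(window) / len(window),
    }
    reduce_fn = reducers[pool_type]
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            # Collect every value covered by the kernel at this output position.
            window = [image[oy * sh + dy][ox * sw + dx]
                      for dy in range(kh) for dx in range(kw)]
            row.append(reduce_fn(window))
        out.append(row)
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
max_pooled = pool2d(image, kernel=(2, 2), pool_type='max', stride=(2, 2))
avg_pooled = pool2d(image, kernel=(2, 2), pool_type='avg', stride=(2, 2))
```

Using a kernel that covers the whole feature map, for example kernel=(4, 4) here, gives the single-value-per-map result that global_pool describes.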

mxnet.symbol.ROIPooling(*args, **kwargs)

Perform region-of-interest pooling on inputs. Bounding box coordinates are resized by spatial_scale and the input feature maps are cropped accordingly. The cropped feature maps are then max pooled to a fixed-size output given by pooled_size. After ROIPooling, the batch size of the output equals the number of region bounding boxes.

Parameters:
  • data (Symbol) – Input data to the pooling operator, a 4D feature map
  • rois (Symbol) – Bounding box coordinates, a 2D array of [[batch_index, x1, y1, x2, y2]]. (x1, y1) and (x2, y2) are the top-left and bottom-right corners of the designated region of interest. batch_index is the index of the corresponding image in the input data
  • pooled_size (Shape(tuple), required) – Fixed pooled output size: (h, w)
  • spatial_scale (float, required) – Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of the total stride in the convolutional layers
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
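
The interaction between spatial_scale and pooled_size can be sketched for one region on a single-channel feature map. The plain-Python code below is a minimal illustration: the function name is hypothetical, the ROI is assumed to lie inside the feature map, and the rounding and bin-boundary rules follow a common Fast R-CNN style convention that is assumed here rather than confirmed for this operator.

```python
import math

def roi_max_pool(feature, roi, pooled_size, spatial_scale):
    """Max-pool one region of a 2D feature map to a fixed (h, w) grid.

    roi is (x1, y1, x2, y2) in raw-image coordinates.
    """
    # Map raw-image coordinates onto the feature map.
    x1, y1, x2, y2 = [int(round(c * spatial_scale)) for c in roi]
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    ph, pw = pooled_size
    out = []
    for i in range(ph):
        # Rows of the feature map that fall into output row i.
        ys = y1 + int(math.floor(i * roi_h / ph))
        ye = y1 + int(math.ceil((i + 1) * roi_h / ph))
        row = []
        for j in range(pw):
            xs = x1 + int(math.floor(j * roi_w / pw))
            xe = x1 + int(math.ceil((j + 1) * roi_w / pw))
            row.append(max(feature[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out

feature = [[r * 4 + c for c in range(4)] for r in range(4)]   # values 0..15
# An ROI spanning raw-image pixels 0..6, with the feature map at half resolution.
pooled = roi_max_pool(feature, roi=(0, 0, 6, 6), pooled_size=(2, 2),
                      spatial_scale=0.5)
```

Whatever the size of the region, the output always has the shape given by pooled_size, which is what lets regions of different sizes feed the same fully connected layers.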

mxnet.symbol.Reshape(*args, **kwargs)

Reshape input to target shape

Parameters:
  • data (Symbol) – Input data to reshape.
  • target_shape (Shape(tuple), optional, default=(0,0)) – (Deprecated! Use shape instead.) Target new shape. Only one dim can be 0, in which case it will be inferred from the rest of the dims
  • keep_highest (boolean, optional, default=False) – (Deprecated! Use shape instead.) Whether to keep the highest dim unchanged. If set to true, then the first dim in target_shape is ignored and always fixed to that of the input
  • shape (, optional, default=()) – Target new shape. To keep a dim the same as in the input, set it to 0. If a dim is set to -1, it will be inferred from the rest of the dims. Only one dim can be -1
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
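
The rules for 0 and -1 in the shape parameter can be made concrete with a small shape-inference sketch. The function below is plain Python with a hypothetical name; it resolves the target shape the way the parameter description reads, and is not the operator's actual implementation.

```python
def infer_reshape(input_shape, shape):
    """Resolve 0 (copy input dim) and -1 (infer) entries of a target shape."""
    if list(shape).count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    # Replace every 0 with the input dimension at the same position.
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(shape)]
    total = 1
    for d in input_shape:
        total *= d
    if -1 in out:
        known = 1
        for d in out:
            if d != -1:
                known *= d
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        out[out.index(-1)] = total // known
    size = 1
    for d in out:
        size *= d
    if size != total:
        raise ValueError("target shape changes the number of elements")
    return tuple(out)

flattened = infer_reshape((2, 3, 4), (0, -1))   # keep dim 0, infer the rest
merged = infer_reshape((2, 3, 4), (-1, 4))      # infer the leading dim
```

Here (0, -1) is a compact way to flatten everything except the batch dimension without knowing the batch size in advance.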

mxnet.symbol.SliceChannel(*args, **kwargs)

Slice the input equally along the specified axis.

Parameters:
  • num_outputs (int, required) – Number of outputs to be sliced.
  • axis (int, optional, default=1) – Dimension along which to slice.
  • squeeze_axis (boolean, optional, default=False) – If true AND the sliced dimension becomes 1, squeeze that dimension.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
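
How num_outputs, axis, and squeeze_axis determine the output shapes can be sketched at the shape level. The helper below is hypothetical plain Python and only computes shapes; it assumes the axis size must divide evenly by num_outputs, as "slice equally" suggests.

```python
def slice_channel_shapes(shape, num_outputs, axis=1, squeeze_axis=False):
    """Return the list of output shapes produced by an equal split along axis."""
    if shape[axis] % num_outputs != 0:
        raise ValueError("axis size must be divisible by num_outputs")
    part = shape[axis] // num_outputs
    if squeeze_axis and part == 1:
        # The sliced dimension has size 1, so it is dropped.
        out_shape = shape[:axis] + shape[axis + 1:]
    else:
        out_shape = shape[:axis] + (part,) + shape[axis + 1:]
    return [out_shape] * num_outputs

kept = slice_channel_shapes((4, 6, 5), num_outputs=3)
squeezed = slice_channel_shapes((4, 3, 5), num_outputs=3, squeeze_axis=True)
```

Squeezing is typically useful when splitting a sequence into per-step inputs, where each slice would otherwise carry a redundant dimension of size 1.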

mxnet.symbol.Softmax(*args, **kwargs)

DEPRECATED: Perform a softmax transformation on input. Please use SoftmaxOutput

Parameters:
  • data (Symbol) – Input data to softmax.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • ignore_label (float, optional, default=-1) – The label value that will be ignored during backward (only works if use_ignore is set to true).
  • multi_output (boolean, optional, default=False) – If set to true, for a (n,k,x_1,..,x_n) dimensional input tensor, softmax will generate n*x_1*...*x_n outputs, each with k classes
  • use_ignore (boolean, optional, default=False) – If set to true, the ignore_label value will not contribute to the backward gradient
  • normalization ({'batch', 'null', 'valid'},optional, default='null') – If set to null, the op does nothing to the output gradient. If set to batch, the op normalizes the gradient by dividing by the batch size. If set to valid, the op normalizes the gradient by dividing by the number of samples that are not ignored
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.SoftmaxActivation(*args, **kwargs)

Apply softmax activation to input. This is intended for internal layers. For output (loss layer) please use SoftmaxOutput. If mode=instance, this operator will compute a softmax for each instance in the batch; this is the default mode. If mode=channel, this operator will compute a num_channel-class softmax at each position of each instance; this can be used for fully convolutional networks, image segmentation, etc.

Parameters:
  • data (Symbol) – Input data to activation function.
  • mode ({'channel', 'instance'},optional, default='instance') – Softmax Mode. If set to instance, this operator will compute a softmax for each instance in the batch; this is the default mode. If set to channel, this operator will compute a num_channel-class softmax at each position of each instance; this can be used for fully convolutional network, image segmentation, etc.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
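
The difference between the two modes can be sketched in plain Python on nested lists. The function names below are hypothetical; the instance mode treats each instance as one flat vector of scores, and the channel mode normalizes across channels separately at every position.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_instance(batch):
    """mode='instance': one softmax per instance; batch[n] is a flat list."""
    return [softmax(instance) for instance in batch]

def softmax_channel(batch):
    """mode='channel': batch[n][c][p]; softmax over c at each position p."""
    out = []
    for instance in batch:
        num_pos = len(instance[0])
        per_pos = [softmax([channel[p] for channel in instance])
                   for p in range(num_pos)]
        # Transpose back to the [channel][position] layout.
        out.append([[per_pos[p][c] for p in range(num_pos)]
                    for c in range(len(instance))])
    return out

inst = softmax_instance([[0.0, 0.0], [1.0, 1.0]])
# One instance with 2 channels and 2 positions.
chan = softmax_channel([[[0.0, 5.0], [0.0, 5.0]]])
```

In channel mode the values at each position sum to 1 across channels, which is the per-pixel class distribution needed for segmentation-style outputs.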

mxnet.symbol.SoftmaxOutput(*args, **kwargs)

Perform a softmax transformation on input, backprop with logloss.

Parameters:
  • data (Symbol) – Input data to softmax.
  • label (Symbol) – Label data.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • ignore_label (float, optional, default=-1) – The label value that will be ignored during backward (only works if use_ignore is set to true).
  • multi_output (boolean, optional, default=False) – If set to true, for a (n,k,x_1,..,x_n) dimensional input tensor, softmax will generate n*x_1*...*x_n outputs, each with k classes
  • use_ignore (boolean, optional, default=False) – If set to true, the ignore_label value will not contribute to the backward gradient
  • normalization ({'batch', 'null', 'valid'},optional, default='null') – If set to null, the op does nothing to the output gradient. If set to batch, the op normalizes the gradient by dividing by the batch size. If set to valid, the op normalizes the gradient by dividing by the number of samples that are not ignored
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
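
The way use_ignore, ignore_label, and normalization shape the backward gradient can be sketched as follows. This is plain Python with hypothetical function names; it uses the standard softmax-with-log-loss gradient (probabilities minus the one-hot label) and reads the normalization options as described in the parameter list, which is an assumption about the operator's behavior, not a copy of it.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_output_grad(scores, labels, grad_scale=1.0, use_ignore=False,
                        ignore_label=-1, normalization='null'):
    """Gradient of softmax + log loss w.r.t. scores (assumed semantics).

    scores is a list of per-sample score lists; labels holds class indices.
    """
    grads = []
    valid = 0
    for row, label in zip(scores, labels):
        if use_ignore and label == ignore_label:
            grads.append([0.0] * len(row))    # ignored sample: no gradient
            continue
        valid += 1
        probs = softmax(row)
        grads.append([p - (1.0 if k == label else 0.0)
                      for k, p in enumerate(probs)])
    if normalization == 'batch':
        denom = len(scores)
    elif normalization == 'valid':
        denom = max(valid, 1)
    else:                                      # 'null': leave the gradient as is
        denom = 1
    return [[grad_scale * g / denom for g in row] for row in grads]

scores = [[0.0, 0.0], [0.0, 0.0]]
grad_valid = softmax_output_grad(scores, labels=[0, -1], use_ignore=True,
                                 normalization='valid')
grad_batch = softmax_output_grad(scores, labels=[0, -1], use_ignore=True,
                                 normalization='batch')
```

With one of the two samples ignored, 'valid' divides by 1 while 'batch' divides by 2, so the two settings differ exactly when some labels are ignored.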

mxnet.symbol.SwapAxis(*args, **kwargs)

Apply swapaxis to input.

Parameters:
  • data (Symbol) – Input data to the SwapAxisOp.
  • dim1 (int (non-negative), optional, default=0) – the first axis to be swapped.
  • dim2 (int (non-negative), optional, default=0) – the second axis to be swapped.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.UpSampling(*args, **kwargs)

Perform nearest neighbor or bilinear upsampling on inputs. This function supports a variable number of positional inputs.

Parameters:
  • data (Symbol[]) – Array of tensors to upsample
  • scale (int (non-negative), required) – Up sampling scale
  • num_filter (int (non-negative), optional, default=0) – Input filter. Only used by nearest sample_type.
  • sample_type ({'bilinear', 'nearest'}, required) – upsampling method
  • multi_input_mode ({'concat', 'sum'},optional, default='concat') – How to handle multiple inputs. concat means concatenate upsampled images along the channel dimension. sum means add all images together, only available for nearest neighbor upsampling.
  • num_args (int, required) – Number of inputs to be upsampled. For nearest neighbor upsampling, this can be 1-N; the size of the output will be (scale*h_0, scale*w_0) and all other inputs will be upsampled to the same size. For bilinear upsampling this must be 2: 1 input and 1 weight.
  • workspace (long (non-negative), optional, default=512) – Temporary workspace for deconvolution (MB)
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
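
Nearest neighbor upsampling with an integer scale amounts to repeating every pixel along both spatial axes. The plain-Python sketch below shows this for a single 2D image; the function name is hypothetical and the bilinear mode is not covered.

```python
def upsample_nearest(image, scale):
    """Repeat every pixel of a 2D list-of-lists scale times along both axes."""
    out = []
    for row in image:
        # Widen the row first, then repeat the widened row scale times.
        wide = [pixel for pixel in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out

up = upsample_nearest([[1, 2], [3, 4]], scale=2)
```

An input of size (h, w) becomes (scale*h, scale*w), matching the output size stated for num_args.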

mxnet.symbol.abs(*args, **kwargs)

Take absolute value of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_axis(*args, **kwargs)

Broadcast data in the given axis to the given size. The original size of the broadcasting axis must be 1.

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
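
Broadcasting along an axis repeats the single entry on that axis until it reaches the requested size. The sketch below is plain Python for 2D nested lists; the function name and the axis and size arguments are assumptions drawn from the description above, since they do not appear in the parameter list.

```python
def broadcast_axis_2d(matrix, axis, size):
    """Broadcast a 2D list-of-lists along axis (0 or 1) to the given size."""
    rows, cols = len(matrix), len(matrix[0])
    if (rows if axis == 0 else cols) != 1:
        raise ValueError("the broadcast axis must have size 1")
    if axis == 0:
        return [list(matrix[0]) for _ in range(size)]
    return [[row[0]] * size for row in matrix]

tall = broadcast_axis_2d([[1, 2, 3]], axis=0, size=2)   # (1, 3) -> (2, 3)
wide = broadcast_axis_2d([[1], [2]], axis=1, size=3)    # (2, 1) -> (2, 3)
```

The size-1 requirement is what makes the operation unambiguous: there is exactly one value to copy along the broadcast axis.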

mxnet.symbol.broadcast_div(*args, **kwargs)

Divide lhs by rhs with broadcasting.

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_minus(*args, **kwargs)

Subtract rhs from lhs with broadcasting.

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_mul(*args, **kwargs)

Multiply lhs by rhs with broadcasting.

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_plus(*args, **kwargs)

Add lhs and rhs with broadcasting.

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
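
The broadcast arithmetic operators share one idea: an axis of size 1 in either input is stretched to match the other input before the elementwise operation is applied. The plain-Python sketch below shows this for 2D nested lists; the helper name is hypothetical, and it assumes the two shapes are already compatible rather than validating them.

```python
import operator

def broadcast_binary(lhs, rhs, op):
    """Apply op elementwise to two 2D lists, stretching size-1 axes."""
    out_rows = max(len(lhs), len(rhs))
    out_cols = max(len(lhs[0]), len(rhs[0]))

    def pick(matrix, r, c):
        # A size-1 axis always reads index 0, which is what broadcasting means.
        rr = 0 if len(matrix) == 1 else r
        cc = 0 if len(matrix[0]) == 1 else c
        return matrix[rr][cc]

    return [[op(pick(lhs, r, c), pick(rhs, r, c)) for c in range(out_cols)]
            for r in range(out_rows)]

lhs = [[1, 2, 3], [4, 5, 6]]      # shape (2, 3)
rhs = [[10], [20]]                # shape (2, 1)
added = broadcast_binary(lhs, rhs, operator.add)
scaled = broadcast_binary(lhs, rhs, operator.mul)
```

Passing operator.sub or operator.truediv in the same way gives the analogues of the minus and div variants.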

mxnet.symbol.ceil(*args, **kwargs)

Take ceil value of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.cos(*args, **kwargs)

Take cos of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.dot(*args, **kwargs)

Calculate dot product of two matrices or two vectors

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
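
For two matrices, the dot product is the usual matrix product: each output entry is the sum of products of a row of lhs with a column of rhs. A minimal plain-Python sketch, with a hypothetical function name, looks like this:

```python
def matmul_sketch(lhs, rhs):
    """Matrix product of an (m, k) and a (k, n) list-of-lists."""
    if len(lhs[0]) != len(rhs):
        raise ValueError("inner dimensions must match")
    # zip(*rhs) iterates over the columns of rhs.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*rhs)]
            for row in lhs]

product = matmul_sketch([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The inner dimensions must agree, so an (m, k) input can only be multiplied by a (k, n) input, and the result has shape (m, n).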

mxnet.symbol.exp(*args, **kwargs)

Take exp of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.floor(*args, **kwargs)

Take floor value of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.log(*args, **kwargs)

Take log of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.round(*args, **kwargs)

Take round value of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.rsqrt(*args, **kwargs)

Take rsqrt of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.sign(*args, **kwargs)

Take sign value of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.sin(*args, **kwargs)

Take sin of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.smooth_l1(*args, **kwargs)

Calculate Smooth L1 Loss(lhs, scalar)

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
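
Smooth L1 is quadratic near zero and linear for large inputs. The sketch below uses the widely used parameterized form in which the scalar acts as a sigma that controls where the two regimes meet; treating the operator's scalar argument this way is an assumption, and the function name is hypothetical.

```python
def smooth_l1(x, sigma=1.0):
    """Smooth L1 of a scalar: quadratic near zero, linear elsewhere."""
    sigma2 = sigma * sigma
    if abs(x) < 1.0 / sigma2:
        return 0.5 * sigma2 * x * x
    return abs(x) - 0.5 / sigma2

small = smooth_l1(0.5)    # quadratic branch: 0.5 * 0.5**2
large = smooth_l1(-2.0)   # linear branch: 2 - 0.5
```

The two branches agree at the switching point, so the loss and its first derivative are continuous, which is the property that makes it less sensitive to outliers than a squared loss.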

mxnet.symbol.softmax_cross_entropy(*args, **kwargs)

Calculate cross_entropy(lhs, one_hot(rhs))

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
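
The expression cross_entropy(lhs, one_hot(rhs)) reduces to the negative log of the softmax probability assigned to the true class. The plain-Python sketch below sums this over the samples; the function name is hypothetical, and whether the operator sums or averages over the batch is an assumption here.

```python
import math

def softmax_cross_entropy_sketch(scores, labels):
    """Sum over samples of -log(softmax(scores)[label])."""
    total = 0.0
    for row, label in zip(scores, labels):
        peak = max(row)
        # log of the softmax normalizer, computed stably.
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[label]
    return total

# Two samples with uniform scores over two classes: each contributes log(2).
loss = softmax_cross_entropy_sketch([[0.0, 0.0], [0.0, 0.0]], [0, 1])
```

Because the one-hot vector selects a single class, only the score of the labeled class and the normalizer enter each term.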

mxnet.symbol.sqrt(*args, **kwargs)

Take sqrt of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.square(*args, **kwargs)

Take square of the src

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

mxnet.symbol.sum_axis(*args, **kwargs)

Take the sum of the src along the given axis. axis=-1 means reduce over all dimensions. The keepdims option has the same meaning as in NumPy.

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol
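
The axis and keepdims behavior described above can be sketched for a 2D input in plain Python. The function name is hypothetical; note that axis=-1 here follows the description in this reference (reduce everything), which differs from NumPy, where -1 refers to the last axis. The exact shape returned by the operator for a full reduction is not stated here, so the scalar result below is an assumption.

```python
def sum_axis_2d(matrix, axis, keepdims=False):
    """Sum a 2D list-of-lists along axis 0 or 1; axis=-1 reduces everything."""
    if axis == -1:
        total = sum(sum(row) for row in matrix)
        return [[total]] if keepdims else total
    if axis == 0:
        sums = [sum(col) for col in zip(*matrix)]
        return [sums] if keepdims else sums
    sums = [sum(row) for row in matrix]
    return [[s] for s in sums] if keepdims else sums

m = [[1, 2, 3], [4, 5, 6]]
col_sums = sum_axis_2d(m, axis=0)                  # reduce over rows
row_sums = sum_axis_2d(m, axis=1, keepdims=True)   # keep a size-1 axis
```

With keepdims=True the reduced axis stays in the result with size 1, so the output can be broadcast back against the input.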

mxnet.symbol.transpose(*args, **kwargs)

Transpose the input matrix and return a new one

Parameters:
  • lhs (Symbol) – Left symbolic input to the function
  • rhs (Symbol) – Right symbolic input to the function
  • name (string, optional.) – Name of the resulting symbol.
Returns:

symbol – The result symbol.

Return type:

Symbol

Execution API Reference

Symbolic Executor component of MXNet.

class mxnet.executor.Executor(handle, symbol, ctx, grad_req, group2ctx)

Executor is the actual executing object of MXNet.

forward(is_train=False, **kwargs)

Calculate the outputs specified by the bound symbol.

Parameters:
  • is_train (bool, optional) – Whether this forward pass is for training (True) or for evaluation (False).
  • **kwargs – Additional specification of input arguments.

Examples

>>> # Run forward by passing the input data directly
>>> texec.forward(is_train=True, data=mydata)
>>> # Alternatively, copy the data into the executor beforehand, then run forward
>>> mydata.copyto(texec.arg_dict['data'])
>>> texec.forward(is_train=True)

backward(out_grads=None)

Do backward pass to get the gradient of arguments.

Parameters:
  • out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.

set_monitor_callback(callback)

Install callback.

Parameters:
  • callback (function) – Takes a string and an NDArrayHandle.

arg_dict

Get dictionary representation of argument arrays.

Returns:

arg_dict – The dictionary that maps names of arguments to NDArrays.

Return type:

dict of str to NDArray

Raises:

ValueError – If there are duplicated names in the arguments.

grad_dict

Get dictionary representation of gradient arrays.

Returns:

grad_dict – The dictionary that maps names of arguments to gradient arrays.

Return type:

dict of str to NDArray

aux_dict

Get dictionary representation of auxiliary states arrays.

Returns:

aux_dict – The dictionary that maps names of auxiliary states to NDArrays.

Return type:

dict of str to NDArray

Raises:

ValueError – If there are duplicated names in the auxiliary states.

copy_params_from(arg_params, aux_params=None, allow_extra_params=False)

Copy parameters from arg_params and aux_params into the executor’s internal arrays.

Parameters:
  • arg_params (dict of str to NDArray) – Parameters, dict of name to NDArray of arguments
  • aux_params (dict of str to NDArray, optional) – Parameters, dict of name to NDArray of auxiliary states.
  • allow_extra_params (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
Raises:

ValueError – If there are additional parameters in the dict but allow_extra_params=False
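
The role of allow_extra_params can be sketched with ordinary dictionaries standing in for dicts of name to NDArray. The function below is a hypothetical plain-Python illustration of the behavior described above, not the executor's implementation.

```python
def copy_params(exec_arrays, params, allow_extra_params=False):
    """Copy values from params into exec_arrays, keyed by name.

    Both arguments are plain dicts of name -> list, standing in for
    dicts of name -> NDArray.
    """
    for name, value in params.items():
        if name in exec_arrays:
            exec_arrays[name][:] = value      # in-place copy keeps the buffer
        elif not allow_extra_params:
            raise ValueError("unexpected parameter: %s" % name)

arg_arrays = {'fc1_weight': [0.0, 0.0]}
# 'unused' is not an argument of the executor, so it is skipped silently
# only because allow_extra_params=True.
copy_params(arg_arrays, {'fc1_weight': [1.0, 2.0], 'unused': [9.0]},
            allow_extra_params=True)
```

Allowing extra parameters is convenient when loading a checkpoint from a larger network into an executor bound to only part of it.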

reshape(partial_shaping=False, allow_up_sizing=False, **kwargs)

Return a new executor with the same symbol and shared memory, but different input/output shapes. This is useful for runtime reshaping, variable-length sequences, and similar cases. The returned executor shares state with the current one, and cannot be used in parallel with it.

Parameters:
  • partial_shaping (bool) – Whether to allow changing the shape of unspecified arguments.
  • allow_up_sizing (bool) – Whether to allow allocating new ndarrays that are larger than the original.
  • kwargs (dict of string to tuple of int) – new shape for arguments.
Returns:

exec – A new executor that shares memory with self.

Return type:

Executor

debug_str()

Get a debug string about internal execution plan.

Returns:

debug_str – Debug string of the executor.

Return type:

string