chainer.functions.lstm

chainer.functions.lstm(c_prev, x)
Long Short-Term Memory units as an activation function.
This function implements LSTM units with forget gates. Let c_prev be the previous cell state and x the input array.

First, the input array x is split into four arrays \(a, i, f, o\) of the same shape along the second axis. This means that the second axis of x must be four times the size of the second axis of c_prev. The split input arrays correspond to:
\(a\) : sources of cell input
\(i\) : sources of input gate
\(f\) : sources of forget gate
\(o\) : sources of output gate
Second, it computes the updated cell state c and the outgoing signal h as:

\[\begin{split}c &= \tanh(a) \sigma(i) + c_{\text{prev}} \sigma(f), \\ h &= \tanh(c) \sigma(o),\end{split}\]

where \(\sigma\) is the elementwise sigmoid function. These are returned as a tuple of two variables.
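To make the split and the update rule concrete, here is a minimal NumPy sketch of the forward computation. It is not Chainer's implementation: the helpers `sigmoid` and `lstm_forward` are hypothetical, and the sketch assumes that \(a, i, f, o\) are contiguous blocks of the second axis (taken with `np.split`). Chainer may order the gate columns inside x differently, so the sketch is not guaranteed to match `F.lstm` element for element; it only illustrates the formulas.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(c_prev, x):
    """Hypothetical NumPy sketch of c, h = lstm(c_prev, x).

    c_prev: (batch, n_units, ...) previous cell state.
    x:      (batch, 4 * n_units, ...) input array holding a, i, f, o.
    Assumption: a, i, f, o are contiguous blocks along axis 1.
    """
    if x.shape[1] != 4 * c_prev.shape[1]:
        raise ValueError("x.shape[1] must be 4 * c_prev.shape[1]")
    # Split the input array into the four sources along the second axis.
    a, i, f, o = np.split(x, 4, axis=1)
    # c = tanh(a) * sigma(i) + c_prev * sigma(f)
    c = np.tanh(a) * sigmoid(i) + c_prev * sigmoid(f)
    # h = tanh(c) * sigma(o)
    h = np.tanh(c) * sigmoid(o)
    return c, h


# Usage: batch of 2 and n_units = 3, so x needs 4 * 3 = 12 channels.
c_prev = np.ones((2, 3), dtype=np.float32)
x = np.zeros((2, 12), dtype=np.float32)
c, h = lstm_forward(c_prev, x)
print(c.shape, h.shape)  # (2, 3) (2, 3)
```

With x all zeros, every gate is \(\sigma(0) = 0.5\) and \(\tanh(a) = 0\), so c equals 0.5 * c_prev and h equals tanh(c) * 0.5, a quick sanity check of the formulas.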
This function supports variable-length inputs. The mini-batch size of the current input must be equal to or smaller than that of the previous one. When the mini-batch size of x is smaller than that of c_prev, this function updates only c[0:len(x)] and leaves the rest of c, c[len(x):], unchanged from c_prev. Therefore, sort input sequences in descending order of length before applying the function.

- Parameters
  - c_prev (Variable or N-dimensional array) – Variable that holds the previous cell state. The cell state should be a zero array or the output of the previous call of LSTM.
  - x (Variable or N-dimensional array) – Variable that holds the sources of cell input, input gate, forget gate, and output gate. Its second dimension must be four times the size of the second dimension of the cell state.
- Returns
  Two Variable objects c and h. c is the updated cell state. h is the outgoing signal.
- Return type
  tuple
See the original paper proposing LSTM with forget gates: Long Short-Term Memory in Recurrent Neural Networks.
Example
Assume that y is the current incoming signal, c is the previous cell state, and h is the previous outgoing signal from an lstm function. Each of y, c, and h has n_units channels. The most typical preparation of x is:

>>> import numpy as np
>>> import chainer
>>> import chainer.functions as F
>>> import chainer.links as L
>>> n_units = 100
>>> y = chainer.Variable(np.zeros((1, n_units), np.float32))  # incoming signal
>>> h = chainer.Variable(np.zeros((1, n_units), np.float32))  # previous outgoing signal
>>> c = chainer.Variable(np.zeros((1, n_units), np.float32))  # previous cell state
>>> model = chainer.Chain()
>>> with model.init_scope():
...     model.w = L.Linear(n_units, 4 * n_units)
...     model.v = L.Linear(n_units, 4 * n_units)
>>> x = model.w(y) + model.v(h)  # input array with 4 * n_units channels
>>> c, h = F.lstm(c, x)
This computes the input array x, that is, the input sources \(a, i, f, o\), from the current incoming signal y and the previous outgoing signal h. Different parameters are used for different kinds of input sources.

Note
We use the naming rule below.
- incoming signal
  The formal input of the LSTM formulation (e.g., in NLP, a word vector or the output of a lower RNN layer). The input of chainer.links.LSTM is the incoming signal.
- input array
  The array obtained by linearly transforming the incoming signal and the previous outgoing signal. It contains four sources: the sources of cell input, input gate, forget gate, and output gate. The input of chainer.functions.activation.lstm.LSTM is the input array.
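The variable-length rule (only c[0:len(x)] is updated and c[len(x):] is left unchanged) and the advice to sort sequences by decreasing length can be illustrated with a small NumPy loop over time steps. This is a hypothetical sketch, not Chainer code: `lstm_step` is an illustrative helper that mirrors the documented behavior, and it assumes the four gate blocks are contiguous slices of the second axis. In Chainer itself, `F.lstm(c, x)` would be called at the same point in the loop.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(c_prev, x):
    """Hypothetical NumPy mirror of the documented variable-length rule.

    x may have fewer rows than c_prev. Only c[0:len(x)] is updated;
    c[len(x):] is copied from c_prev unchanged, and h has len(x) rows.
    Assumption: a, i, f, o are contiguous blocks along axis 1.
    """
    batch = len(x)
    a, i, f, o = np.split(x, 4, axis=1)
    c = c_prev.copy()
    c[:batch] = np.tanh(a) * sigmoid(i) + c_prev[:batch] * sigmoid(f)
    h = np.tanh(c[:batch]) * sigmoid(o)
    return c, h


# Three sequences, already sorted by length in descending order.
lengths = [3, 2, 1]
n_units = 2
rng = np.random.RandomState(0)

# At step t, only sequences with length > t are still active; because of
# the sorting, they always occupy the first rows of the mini-batch.
inputs = [
    rng.randn(sum(length > t for length in lengths), 4 * n_units)
    for t in range(max(lengths))
]

c = np.zeros((len(lengths), n_units))
for t, x in enumerate(inputs):
    c, h = lstm_step(c, x)
    print(t, len(x), h.shape)
# 0 3 (3, 2)
# 1 2 (2, 2)
# 2 1 (1, 2)
```

Sorting matters because the update touches only the leading rows: if a short sequence sat in the first row, it would keep being updated after it ended while a longer sequence further down was frozen.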