chainer.functions.lstm

chainer.functions.lstm(c_prev, x)

Long Short-Term Memory units as an activation function.
This function implements LSTM units with forget gates. Let c_prev be the previous cell state and x the input array. First, the input array x is split into four arrays \(a, i, f, o\) of the same shapes along the second axis. This means that the size of x's second axis must be four times that of c_prev's second axis. The split input arrays correspond to:
\(a\) : sources of cell input
\(i\) : sources of input gate
\(f\) : sources of forget gate
\(o\) : sources of output gate
Second, it computes the updated cell state c and the outgoing signal h as:

\[\begin{split}c &= \tanh(a) \sigma(i) + c_{\text{prev}} \sigma(f), \\ h &= \tanh(c) \sigma(o),\end{split}\]

where \(\sigma\) is the elementwise sigmoid function. These are returned as a tuple of two variables.
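For concreteness, here is a minimal NumPy sketch of the computation described above. It illustrates only the documented split and formulas; it is not Chainer's actual implementation, whose internal gate layout may differ, and the shapes used here are arbitrary:

>>> import numpy as np
>>> def sigmoid(z):
...     return 1.0 / (1.0 + np.exp(-z))
>>> batch, n_units = 2, 3
>>> c_prev = np.zeros((batch, n_units), np.float32)
>>> x = np.random.randn(batch, 4 * n_units).astype(np.float32)
>>> a, i, f, o = np.split(x, 4, axis=1)  # four arrays of the same shape, split along the second axis
>>> c = np.tanh(a) * sigmoid(i) + c_prev * sigmoid(f)  # updated cell state
>>> h = np.tanh(c) * sigmoid(o)  # outgoing signal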
This function supports variable-length inputs. The mini-batch size of the current input must be equal to or smaller than that of the previous one. When the mini-batch size of x is smaller than that of c, this function only updates c[0:len(x)] and does not change the rest of c, that is, c[len(x):]. So, please sort input sequences in descending order of length before applying the function.
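A small sketch of this behaviour, assuming a cell state with a mini-batch of 3 and an input with a mini-batch of 2 (the sizes here are arbitrary):

>>> import numpy as np
>>> import chainer
>>> import chainer.functions as F
>>> n_units = 3
>>> c = chainer.Variable(np.zeros((3, n_units), np.float32))      # mini-batch of 3
>>> x = chainer.Variable(np.zeros((2, 4 * n_units), np.float32))  # mini-batch of 2
>>> c, h = F.lstm(c, x)  # only c[0:2] is updated; c[2:] is carried over unchanged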
Parameters
- c_prev (Variable or N-dimensional array) – Variable that holds the previous cell state. The cell state should be a zero array or the output of the previous call of LSTM.
- x (Variable or N-dimensional array) – Variable that holds the sources of the cell input, input gate, forget gate and output gate. Its second dimension must be four times the size of the cell state's second dimension.
Returns
Two Variable objects c and h. c is the updated cell state. h indicates the outgoing signal.

Return type
tuple
See the original paper proposing LSTM with forget gates: Long Short-Term Memory in Recurrent Neural Networks.
See also
chainer.links.LSTM
Example
Assume y is the current incoming signal, c is the previous cell state, and h is the previous outgoing signal of an lstm function. Each of y, c and h has n_units channels. The most typical preparation of x is:

>>> import numpy as np
>>> import chainer
>>> import chainer.functions as F
>>> import chainer.links as L
>>> n_units = 100
>>> y = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> h = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> c = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> model = chainer.Chain()
>>> with model.init_scope():
...     model.w = L.Linear(n_units, 4 * n_units)
...     model.v = L.Linear(n_units, 4 * n_units)
>>> x = model.w(y) + model.v(h)
>>> c, h = F.lstm(c, x)
This corresponds to calculating the input array x, or the input sources \(a, i, f, o\), from the current incoming signal y and the previous outgoing signal h. Different parameters are used for different kinds of input sources.
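Processing a whole sequence then just repeats this step, feeding the outgoing signal back in. A minimal sketch reusing model, c, h and n_units from the example above, with zero vectors standing in for a real input sequence:

>>> seq = [chainer.Variable(np.zeros((1, n_units), np.float32)) for _ in range(3)]
>>> for y in seq:
...     x = model.w(y) + model.v(h)  # build the input array at each step
...     c, h = F.lstm(c, x)          # update the cell state and outgoing signal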
Note

We use the naming rule below.
- incoming signal
The formal input of the formulation of LSTM (e.g., in NLP, a word vector or the output of a lower RNN layer). The input of chainer.links.LSTM is the incoming signal.
- input array
The array that is linearly transformed from the incoming signal and the previous outgoing signal. The input array contains four sources: the sources of the cell input, input gate, forget gate and output gate. The input of chainer.functions.activation.lstm.LSTM is the input array.
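To make the distinction concrete, here is a hedged sketch contrasting the two interfaces, reusing y, h, c, model and n_units from the example above. chainer.links.LSTM holds the cell state and the linear transformations internally, so it consumes the incoming signal directly, while chainer.functions.lstm consumes the prepared input array:

>>> lstm_link = L.LSTM(n_units, n_units)  # link: takes the incoming signal
>>> h_link = lstm_link(y)
>>> x = model.w(y) + model.v(h)  # function: build the input array by hand...
>>> c, h = F.lstm(c, x)          # ...and pass it together with the cell state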