chainer.functions.n_step_lstm

chainer.functions.n_step_lstm(n_layers, dropout_ratio, hx, cx, ws, bs, xs)
Stacked Uni-directional Long Short-Term Memory function.

This function calculates a stacked Uni-directional LSTM over sequences. It takes an initial hidden state \(h_0\), an initial cell state \(c_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden states \(h_t\) and cell states \(c_t\) for each time \(t\) from the input \(x_t\).

\[\begin{split}i_t &= \sigma(W_0 x_t + W_4 h_{t-1} + b_0 + b_4) \\ f_t &= \sigma(W_1 x_t + W_5 h_{t-1} + b_1 + b_5) \\ o_t &= \sigma(W_2 x_t + W_6 h_{t-1} + b_2 + b_6) \\ a_t &= \tanh(W_3 x_t + W_7 h_{t-1} + b_3 + b_7) \\ c_t &= f_t \cdot c_{t-1} + i_t \cdot a_t \\ h_t &= o_t \cdot \tanh(c_t)\end{split}\]

Because the function accepts whole sequences, it computes \(h_t\) for all \(t\) in a single call. Eight weight matrices and eight bias vectors are required for each layer, so with \(S\) layers you need to prepare \(8S\) weight matrices and \(8S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the input variables of all layers except the first may have a different shape from those of the first layer.
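As a rough illustration of the recurrence above, the following NumPy sketch computes one time step of a single layer for one sample. The helper names sigmoid and lstm_step are made up for this illustration and are not part of the Chainer API; n_step_lstm itself performs this computation for every time step and layer at once.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W: eight matrices W_0..W_7 (W[0:4] multiply the input, W[4:8] the previous hidden state);
    # b: eight (N,)-shaped bias vectors b_0..b_7, as in the equations above.
    i = sigmoid(W[0].dot(x_t) + W[4].dot(h_prev) + b[0] + b[4])   # input gate
    f = sigmoid(W[1].dot(x_t) + W[5].dot(h_prev) + b[1] + b[5])   # forget gate
    o = sigmoid(W[2].dot(x_t) + W[6].dot(h_prev) + b[2] + b[6])   # output gate
    a = np.tanh(W[3].dot(x_t) + W[7].dot(h_prev) + b[3] + b[7])   # cell input
    c = f * c_prev + i * a
    h = o * np.tanh(c)
    return h, c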
Parameters

- n_layers (int) – The number of layers.
- dropout_ratio (float) – Dropout ratio. 
- hx (Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
- cx (Variable) – Variable holding stacked cell states. It has the same shape as hx.
- ws (list of list of Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing eight matrices. ws[i][j] corresponds to \(W_j\) in the equation. Only ws[0][j] where 0 <= j < 4 are (N, I)-shaped, as they are multiplied with input variables (see the shapes in the example below), where I is the size of the input and N is the dimension of the hidden units. All other matrices are (N, N)-shaped.
- bs (list of list of Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing eight vectors. bs[i][j] corresponds to \(b_j\) in the equation. The shape of each vector is (N,) where N is the dimension of the hidden units.
- xs (list of Variable) – A list of Variable holding input values. Each element xs[t] holds the input values for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t. The sequences must be transposed; transpose_sequence() can be used to transpose a list of Variables each representing a sequence. When the sequences have different lengths, they must be sorted in descending order of length before transposing, so that xs satisfies xs[t].shape[0] >= xs[t + 1].shape[0], as sketched below.
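For illustration only, preparing xs from raw per-sequence arrays might look like the following sketch. The sequence lengths and contents here are made up; transpose_sequence() is the function referenced in the xs description above.

import numpy as np
import chainer.functions as F

# Made-up batch: three sequences of lengths 4, 2 and 3, each step has in_size = 3 features.
seqs = [np.ones((4, 3), dtype=np.float32),
        np.ones((2, 3), dtype=np.float32),
        np.ones((3, 3), dtype=np.float32)]
# Sort by length in descending order, then transpose so that xs[t] stacks the
# t-th step of every sequence that is at least t + 1 steps long.
seqs = sorted(seqs, key=len, reverse=True)
xs = F.transpose_sequence(seqs)
# Batch sizes are now non-increasing: [x.shape for x in xs] is [(3, 3), (3, 3), (2, 3), (1, 3)].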
 
Returns

This function returns a tuple containing three elements, hy, cy and ys.

- hy is the updated hidden states whose shape is the same as hx.
- cy is the updated cell states whose shape is the same as cx.
- ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N) where B_t is the mini-batch size for time t and N is the dimension of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].
 
Return type

tuple
Note

The dimension of the hidden units is limited to a single size N. If you need hidden layers with different dimensions, please use chainer.functions.lstm.

Example

>>> batchs = [3, 2, 1]  # support variable length sequences
>>> in_size, out_size, n_layers = 3, 2, 2
>>> dropout_ratio = 0.0
>>> xs = [np.ones((b, in_size)).astype(np.float32) for b in batchs]
>>> [x.shape for x in xs]
[(3, 3), (2, 3), (1, 3)]
>>> h_shape = (n_layers, batchs[0], out_size)
>>> hx = np.ones(h_shape).astype(np.float32)
>>> cx = np.ones(h_shape).astype(np.float32)
>>> w_in = lambda i, j: in_size if i == 0 and j < 4 else out_size
>>> ws = []
>>> bs = []
>>> for n in range(n_layers):
...     ws.append([np.ones((out_size, w_in(n, i))).astype(np.float32) for i in range(8)])
...     bs.append([np.ones((out_size,)).astype(np.float32) for _ in range(8)])
...
>>> ws[0][0].shape  # ws[0][:4] are (out_size, in_size)-shaped
(2, 3)
>>> ws[1][0].shape  # all other matrices are (out_size, out_size)-shaped
(2, 2)
>>> bs[0][0].shape
(2,)
>>> hy, cy, ys = F.n_step_lstm(
...     n_layers, dropout_ratio, hx, cx, ws, bs, xs)
>>> hy.shape
(2, 3, 2)
>>> cy.shape
(2, 3, 2)
>>> [y.shape for y in ys]
[(3, 2), (2, 2), (1, 2)]
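A possible follow-up to the example above (not part of the original example): since transpose_sequence() also converts per-step outputs back into per-sequence outputs, applying it to ys yields one (length, out_size) array of hidden states per input sequence. Here the resulting shapes coincide with the per-step shapes only because the sequence lengths 3, 2, 1 happen to equal the per-step batch sizes.

>>> ys_per_seq = F.transpose_sequence(ys)
>>> [y.shape for y in ys_per_seq]
[(3, 2), (2, 2), (1, 2)]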