chainer.functions.n_step_bilstm

chainer.functions.n_step_bilstm(n_layers, dropout_ratio, hx, cx, ws, bs, xs)

Stacked Bi-directional Long Short-Term Memory function.

This function computes a stacked bi-directional LSTM over sequences. It takes an initial hidden state \(h_0\), an initial cell state \(c_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden state \(h_t\) and the cell state \(c_t\) for each time \(t\) from the input \(x_t\).

\[\begin{split}i^{f}_t &=& \sigma(W^{f}_0 x_t + W^{f}_4 h_{t-1} + b^{f}_0 + b^{f}_4), \\ f^{f}_t &=& \sigma(W^{f}_1 x_t + W^{f}_5 h_{t-1} + b^{f}_1 + b^{f}_5), \\ o^{f}_t &=& \sigma(W^{f}_2 x_t + W^{f}_6 h_{t-1} + b^{f}_2 + b^{f}_6), \\ a^{f}_t &=& \tanh(W^{f}_3 x_t + W^{f}_7 h_{t-1} + b^{f}_3 + b^{f}_7), \\ c^{f}_t &=& f^{f}_t \cdot c^{f}_{t-1} + i^{f}_t \cdot a^{f}_t, \\ h^{f}_t &=& o^{f}_t \cdot \tanh(c^{f}_t), \\ i^{b}_t &=& \sigma(W^{b}_0 x_t + W^{b}_4 h_{t-1} + b^{b}_0 + b^{b}_4), \\ f^{b}_t &=& \sigma(W^{b}_1 x_t + W^{b}_5 h_{t-1} + b^{b}_1 + b^{b}_5), \\ o^{b}_t &=& \sigma(W^{b}_2 x_t + W^{b}_6 h_{t-1} + b^{b}_2 + b^{b}_6), \\ a^{b}_t &=& \tanh(W^{b}_3 x_t + W^{b}_7 h_{t-1} + b^{b}_3 + b^{b}_7), \\ c^{b}_t &=& f^{b}_t \cdot c^{b}_{t-1} + i^{b}_t \cdot a^{b}_t, \\ h^{b}_t &=& o^{b}_t \cdot \tanh(c^{b}_t), \\ h_t &=& [h^{f}_t; h^{b}_t]\end{split}\]

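where \(W^{f}\) denotes the weight matrices of the forward LSTM and \(W^{b}\) those of the backward LSTM. In each direction, \(h_{t-1}\) and \(c_{t-1}\) refer to that direction's own state from its previous processing step; the backward LSTM reads the sequence in reverse order.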
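The gate equations for a single direction can be sketched in plain NumPy as follows. This is an illustrative sketch, not Chainer's implementation: the helper names `sigmoid` and `lstm_step` are hypothetical, and the weights are assumed to be stored as (N, I) and (N, N) arrays, as in the Example section of this page.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, ws, bs):
    """One time step of one direction (hypothetical helper).

    x: (B, I); h_prev and c_prev: (B, N).
    ws: eight matrices; ws[0:4] are (N, I), ws[4:8] are (N, N).
    bs: eight vectors of shape (N,).
    """
    def gate(j):
        # W_j x_t + W_{j+4} h_{t-1} + b_j + b_{j+4}, applied to a batch of rows
        return x @ ws[j].T + h_prev @ ws[j + 4].T + bs[j] + bs[j + 4]

    i = sigmoid(gate(0))  # input gate
    f = sigmoid(gate(1))  # forget gate
    o = sigmoid(gate(2))  # output gate
    a = np.tanh(gate(3))  # candidate cell input
    c = f * c_prev + i * a
    h = o * np.tanh(c)
    return h, c


B, I, N = 3, 5, 2
rng = np.random.default_rng(0)
ws = [rng.standard_normal((N, I if j < 4 else N)) for j in range(8)]
bs = [np.zeros(N) for _ in range(8)]
h, c = lstm_step(rng.standard_normal((B, I)),
                 np.zeros((B, N)), np.zeros((B, N)), ws, bs)
print(h.shape, c.shape)
```

A bi-directional layer would run one such step function forward over the sequence, a second one with its own weights over the reversed sequence, and concatenate the two hidden states at each time step.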

As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Eight weight matrices and eight bias vectors are required for each layer of each direction. So, when \(S\) layers exist, you need to prepare \(16S\) weight matrices and \(16S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the inputs to all layers after the first may have a different shape from the input to the first layer, because h_t concatenates the hidden states of both directions.
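The parameter count and per-layer shapes described above can be checked with a small shape-only sketch. The helper `bilstm_param_shapes` is hypothetical and builds no arrays; it assumes the (out_size, in_size) weight layout used in the Example section of this page.

```python
def bilstm_param_shapes(n_layers, in_size, n_units):
    """Return (weight_shapes, bias_shapes), indexed as [2 * l + m][j]."""
    w_shapes, b_shapes = [], []
    for layer in range(n_layers):
        # Layer 0 reads the input x_t; deeper layers read [h^f_t; h^b_t].
        layer_in = in_size if layer == 0 else 2 * n_units
        for direction in (0, 1):  # 0: forward, 1: backward
            w_shapes.append(
                [(n_units, layer_in if j < 4 else n_units) for j in range(8)])
            b_shapes.append([(n_units,)] * 8)
    return w_shapes, b_shapes


w_shapes, b_shapes = bilstm_param_shapes(n_layers=2, in_size=3, n_units=2)
n_weights = sum(len(group) for group in w_shapes)
print(n_weights)  # 16 * n_layers
print(w_shapes[0][0], w_shapes[2][0], w_shapes[0][4])
```

With two layers this gives 32 weight shapes, and the three printed shapes are the first-layer input weights, the second-layer input weights, and a recurrent weight.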

Parameters
  • n_layers (int) – The number of layers.

  • dropout_ratio (float) – Dropout ratio.

  • hx (Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units. Because of bi-direction, the first dimension length is 2S.

  • cx (Variable) – Variable holding stacked cell states. It has the same shape as hx.

  • ws (list of list of Variable) – Weight matrices. ws[2 * l + m] represents the weights for the l-th layer of the m-th direction (m == 0 means the forward direction and m == 1 means the backward direction). Each ws[i] is a list containing eight matrices. ws[i][j] corresponds to \(W_j\) in the equation. ws[0][j] and ws[1][j] where 0 <= j < 4 are (N, I)-shaped because they are multiplied with input variables, where I is the size of the input. ws[i][j] where 2 <= i and 0 <= j < 4 are (N, 2N)-shaped because they are multiplied with the concatenated hidden states of both directions, \(h_t = [h^{f}_t; h^{b}_t]\). All other matrices are (N, N)-shaped.

  • bs (list of list of Variable) – Bias vectors. bs[2 * l + m] represents the biases for the l-th layer of the m-th direction (m == 0 means the forward direction and m == 1 means the backward direction). Each bs[i] is a list containing eight vectors. bs[i][j] corresponds to \(b_j\) in the equation. The shape of each vector is (N,).

  • xs (list of Variable) – A list of Variable holding input values. Each element xs[t] holds the input values for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t. The sequences must be transposed; transpose_sequence() can be used to transpose a list of Variables, each representing a sequence. When the sequences have different lengths, they must be sorted in descending order of length before transposing, so xs must satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
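The sorting and transposing requirement on xs can be illustrated with a pure NumPy stand-in. The helper `sort_and_transpose` is hypothetical and only mimics the layout described above; in Chainer code, transpose_sequence() would normally do the transposition.

```python
import numpy as np


def sort_and_transpose(seqs):
    """seqs: list of arrays shaped (length_i, I). Returns (xs, order)."""
    # Indices of the sequences, longest first.
    order = sorted(range(len(seqs)), key=lambda i: -len(seqs[i]))
    sorted_seqs = [seqs[i] for i in order]
    max_len = len(sorted_seqs[0])
    xs = []
    for t in range(max_len):
        # Only sequences that are still running at time t contribute a row.
        xs.append(np.stack([s[t] for s in sorted_seqs if len(s) > t]))
    return xs, order


# Three sequences of lengths 1, 3 and 2, each with input size 3.
seqs = [np.full((1, 3), 0.0), np.full((3, 3), 1.0), np.full((2, 3), 2.0)]
xs, order = sort_and_transpose(seqs)
print(order)
print([x.shape for x in xs])
```

Keeping `order` around makes it possible to map the results back to the original sequence order afterwards.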

Returns

This function returns a tuple containing three elements, hy, cy and ys.

  • hy holds the updated hidden states; its shape is the same as that of hx.

  • cy holds the updated cell states; its shape is the same as that of cx.

  • ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, 2N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t equals xs[t].shape[0].
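Because ys is organized per time step, getting one output array per sequence requires regrouping the rows. The following is a minimal NumPy sketch of that regrouping; the helper `untranspose` is hypothetical, and the arrays are stand-ins with the shapes described above rather than real outputs of the function.

```python
import numpy as np


def untranspose(ys):
    """ys: list of arrays shaped (B_t, D) with non-increasing B_t.

    Returns one (length_i, D) array per sequence, longest sequence first.
    """
    n_seqs = ys[0].shape[0]
    out = []
    for i in range(n_seqs):
        # Sequence i is present at time t exactly when i < B_t.
        out.append(np.stack([y[i] for y in ys if y.shape[0] > i]))
    return out


# Stand-in outputs for N = 2 hidden units, so each row has 2N = 4 entries.
ys = [np.zeros((3, 4)), np.ones((2, 4)), np.full((1, 4), 2.0)]
per_seq = untranspose(ys)
print([s.shape for s in per_seq])
```

The resulting list follows the sorted, longest-first order of the inputs, so any permutation applied before the call still has to be undone separately.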

Return type

tuple

Example

>>> import numpy as np
>>> import chainer.functions as F
>>> batchs = [3, 2, 1]  # supports variable-length sequences
>>> in_size, out_size, n_layers = 3, 2, 2
>>> dropout_ratio = 0.0
>>> xs = [np.ones((b, in_size)).astype(np.float32) for b in batchs]
>>> [x.shape for x in xs]
[(3, 3), (2, 3), (1, 3)]
>>> h_shape = (n_layers * 2, batchs[0], out_size)
>>> hx = np.ones(h_shape).astype(np.float32)
>>> cx = np.ones(h_shape).astype(np.float32)
>>> def w_in(i, j):
...     if i == 0 and j < 4:
...         return in_size
...     elif i > 0 and j < 4:
...         return out_size * 2
...     else:
...         return out_size
...
>>> ws = []
>>> bs = []
>>> for n in range(n_layers):
...     for direction in (0, 1):
...         ws.append([np.ones((out_size, w_in(n, i))).astype(np.float32) for i in range(8)])
...         bs.append([np.ones((out_size,)).astype(np.float32) for _ in range(8)])
...
>>> ws[0][0].shape  # ws[i][j] for i < 2 and j < 4 is (out_size, in_size)
(2, 3)
>>> ws[2][0].shape  # ws[i][j] for i >= 2 and j < 4 is (out_size, 2 * out_size)
(2, 4)
>>> ws[0][4].shape  # others are (out_size, out_size)
(2, 2)
>>> bs[0][0].shape
(2,)
>>> hy, cy, ys = F.n_step_bilstm(
...     n_layers, dropout_ratio, hx, cx, ws, bs, xs)
>>> hy.shape
(4, 3, 2)
>>> cy.shape
(4, 3, 2)
>>> [y.shape for y in ys]
[(3, 4), (2, 4), (1, 4)]