chainer.functions.n_step_bilstm

chainer.functions.n_step_bilstm(n_layers, dropout_ratio, hx, cx, ws, bs, xs)[source]

Stacked Bi-directional Long Short-Term Memory function.
This function calculates a stacked Bi-directional LSTM over input sequences. It takes an initial hidden state \(h_0\), an initial cell state \(c_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden state \(h_t\) and cell state \(c_t\) for each time \(t\) from the input \(x_t\).
\[\begin{split}i^{f}_t &=& \sigma(W^{f}_0 x_t + W^{f}_4 h_{t-1} + b^{f}_0 + b^{f}_4), \\ f^{f}_t &=& \sigma(W^{f}_1 x_t + W^{f}_5 h_{t-1} + b^{f}_1 + b^{f}_5), \\ o^{f}_t &=& \sigma(W^{f}_2 x_t + W^{f}_6 h_{t-1} + b^{f}_2 + b^{f}_6), \\ a^{f}_t &=& \tanh(W^{f}_3 x_t + W^{f}_7 h_{t-1} + b^{f}_3 + b^{f}_7), \\ c^{f}_t &=& f^{f}_t \cdot c^{f}_{t-1} + i^{f}_t \cdot a^{f}_t, \\ h^{f}_t &=& o^{f}_t \cdot \tanh(c^{f}_t), \\ i^{b}_t &=& \sigma(W^{b}_0 x_t + W^{b}_4 h_{t-1} + b^{b}_0 + b^{b}_4), \\ f^{b}_t &=& \sigma(W^{b}_1 x_t + W^{b}_5 h_{t-1} + b^{b}_1 + b^{b}_5), \\ o^{b}_t &=& \sigma(W^{b}_2 x_t + W^{b}_6 h_{t-1} + b^{b}_2 + b^{b}_6), \\ a^{b}_t &=& \tanh(W^{b}_3 x_t + W^{b}_7 h_{t-1} + b^{b}_3 + b^{b}_7), \\ c^{b}_t &=& f^{b}_t \cdot c^{b}_{t-1} + i^{b}_t \cdot a^{b}_t, \\ h^{b}_t &=& o^{b}_t \cdot \tanh(c^{b}_t), \\ h_t &=& [h^{f}_t; h^{b}_t]\end{split}\]

where \(W^{f}\) denotes the weight matrices for the forward LSTM and \(W^{b}\) denotes the weight matrices for the backward LSTM.
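To make the gate equations concrete, here is a minimal NumPy sketch of a single forward-direction time step for one layer. The toy sizes, constant weights, and the sigmoid helper are illustrative assumptions only, not part of this API. The backward direction repeats the same computation over the reversed sequence, and the layer output is the concatenation \(h_t = [h^{f}_t; h^{b}_t]\).

>>> import numpy as np
>>> def sigmoid(z):
...     return 1.0 / (1.0 + np.exp(-z))
...
>>> I, N = 3, 2  # toy input size and hidden size (assumptions for illustration)
>>> W = [np.full((N, I if j < 4 else N), 0.1, np.float32) for j in range(8)]
>>> b = [np.zeros(N, np.float32) for _ in range(8)]
>>> x_t = np.ones(I, np.float32)       # input at time t
>>> h_prev = np.zeros(N, np.float32)   # h_{t-1}
>>> c_prev = np.zeros(N, np.float32)   # c_{t-1}
>>> i_t = sigmoid(W[0] @ x_t + W[4] @ h_prev + b[0] + b[4])
>>> f_t = sigmoid(W[1] @ x_t + W[5] @ h_prev + b[1] + b[5])
>>> o_t = sigmoid(W[2] @ x_t + W[6] @ h_prev + b[2] + b[6])
>>> a_t = np.tanh(W[3] @ x_t + W[7] @ h_prev + b[3] + b[7])
>>> c_t = f_t * c_prev + i_t * a_t
>>> h_t = o_t * np.tanh(c_t)           # h^{f}_t for this time step
>>> h_t.shape
(2,)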
As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Eight weight matrices and eight bias vectors are required for each layer of each direction. So, when \(S\) layers exist, you need to prepare \(16S\) weight matrices and \(16S\) bias vectors.
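As a quick sanity check on this bookkeeping (assuming a hypothetical \(S = 2\)), ws and bs each contain \(2S\) inner lists, one per layer per direction, and every inner list holds eight arrays, which gives \(16S\) matrices and \(16S\) vectors in total:

>>> S = 2          # hypothetical number of layers
>>> 2 * S * 8      # inner lists (layer x direction) times arrays per list
32
>>> 16 * S
32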
If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the input variables of every layer except the first may have a different shape from those of the first layer.

- Parameters
n_layers (int) – The number of layers.
dropout_ratio (float) – Dropout ratio.
hx (Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units. Because of bi-direction, the first dimension's length is 2S.
cx (Variable) – Variable holding stacked cell states. It has the same shape as hx.
ws (list of list of Variable) – Weight matrices. ws[2 * l + m] represents the weights for the l-th layer of the m-th direction. (m == 0 means the forward direction and m == 1 means the backward direction.) Each ws[i] is a list containing eight matrices. ws[i][j] corresponds to \(W_j\) in the equation. ws[0][j] and ws[1][j] where 0 <= j < 4 are (N, I)-shaped because they are multiplied with input variables, where I is the size of the input. ws[i][j] where 2 <= i and 0 <= j < 4 are (N, 2N)-shaped because they are multiplied with the concatenated hidden states \(h_t = [h^{f}_t; h^{b}_t]\). All other matrices are (N, N)-shaped.
bs (list of list of Variable) – Bias vectors. bs[2 * l + m] represents the biases for the l-th layer of the m-th direction. (m == 0 means the forward direction and m == 1 means the backward direction.) Each bs[i] is a list containing eight vectors. bs[i][j] corresponds to \(b_j\) in the equation. The shape of each vector is (N,).
xs (list of Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t. The sequences must be transposed; transpose_sequence() can be used to transpose a list of Variables each representing a sequence. When the sequences have different lengths, they must be sorted in descending order of their lengths before transposing, so that xs satisfies xs[t].shape[0] >= xs[t + 1].shape[0]. (A preparation sketch follows this parameter list.)
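The following is a minimal sketch of how variable-length sequences might be prepared for xs. The raw per-sequence arrays are hypothetical toy data; they are sorted by length in descending order and then transposed with transpose_sequence() so that each element of xs holds one time step.

>>> import numpy as np
>>> import chainer.functions as F
>>> in_size = 3
>>> seqs = [np.ones((2, in_size), np.float32),
...         np.ones((4, in_size), np.float32),
...         np.ones((3, in_size), np.float32)]
>>> seqs = sorted(seqs, key=len, reverse=True)   # longest sequence first
>>> xs = F.transpose_sequence(seqs)              # one array per time step
>>> [x.shape for x in xs]
[(3, 3), (3, 3), (2, 3), (1, 3)]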
- Returns
This function returns a tuple containing three elements, hy, cy and ys.
hy is an updated hidden state whose shape is the same as hx.
cy is an updated cell state whose shape is the same as cx.
ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to an input xs[t]. Its shape is (B_t, 2N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same value as the mini-batch size of xs[t].
- Return type
tuple
Example
>>> import numpy as np
>>> import chainer.functions as F
>>> batchs = [3, 2, 1]  # support variable length sequences
>>> in_size, out_size, n_layers = 3, 2, 2
>>> dropout_ratio = 0.0
>>> xs = [np.ones((b, in_size)).astype(np.float32) for b in batchs]
>>> [x.shape for x in xs]
[(3, 3), (2, 3), (1, 3)]
>>> h_shape = (n_layers * 2, batchs[0], out_size)
>>> hx = np.ones(h_shape).astype(np.float32)
>>> cx = np.ones(h_shape).astype(np.float32)
>>> def w_in(i, j):
...     if i == 0 and j < 4:
...         return in_size
...     elif i > 0 and j < 4:
...         return out_size * 2
...     else:
...         return out_size
...
>>> ws = []
>>> bs = []
>>> for n in range(n_layers):
...     for direction in (0, 1):
...         ws.append([np.ones((out_size, w_in(n, i))).astype(np.float32) for i in range(8)])
...         bs.append([np.ones((out_size,)).astype(np.float32) for _ in range(8)])
...
>>> ws[0][0].shape  # ws[0:2][:4].shape are (out_size, in_size)
(2, 3)
>>> ws[2][0].shape  # ws[2:][:4].shape are (out_size, 2 * out_size)
(2, 4)
>>> ws[0][4].shape  # others are (out_size, out_size)
(2, 2)
>>> bs[0][0].shape
(2,)
>>> hy, cy, ys = F.n_step_bilstm(
...     n_layers, dropout_ratio, hx, cx, ws, bs, xs)
>>> hy.shape
(4, 3, 2)
>>> cy.shape
(4, 3, 2)
>>> [y.shape for y in ys]
[(3, 4), (2, 4), (1, 4)]