chainer.functions.n_step_bigru

chainer.functions.n_step_bigru(n_layers, dropout_ratio, hx, ws, bs, xs)

Stacked Bi-directional Gated Recurrent Unit function.
This function calculates a stacked Bi-directional GRU over sequences. It takes an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden states \(h_t\) for each time step \(t\) from the input \(x_t\).
\[\begin{split}
r^{f}_t &= \sigma(W^{f}_0 x_t + W^{f}_3 h_{t-1} + b^{f}_0 + b^{f}_3) \\
z^{f}_t &= \sigma(W^{f}_1 x_t + W^{f}_4 h_{t-1} + b^{f}_1 + b^{f}_4) \\
h^{f'}_t &= \tanh(W^{f}_2 x_t + b^{f}_2 + r^{f}_t \cdot (W^{f}_5 h_{t-1} + b^{f}_5)) \\
h^{f}_t &= (1 - z^{f}_t) \cdot h^{f'}_t + z^{f}_t \cdot h_{t-1} \\
r^{b}_t &= \sigma(W^{b}_0 x_t + W^{b}_3 h_{t-1} + b^{b}_0 + b^{b}_3) \\
z^{b}_t &= \sigma(W^{b}_1 x_t + W^{b}_4 h_{t-1} + b^{b}_1 + b^{b}_4) \\
h^{b'}_t &= \tanh(W^{b}_2 x_t + b^{b}_2 + r^{b}_t \cdot (W^{b}_5 h_{t-1} + b^{b}_5)) \\
h^{b}_t &= (1 - z^{b}_t) \cdot h^{b'}_t + z^{b}_t \cdot h_{t-1} \\
h_t &= [h^{f}_t; h^{b}_t]
\end{split}\]

where \(W^{f}\) denotes the weight matrices for the forward GRU and \(W^{b}\) the weight matrices for the backward GRU.
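As a concrete reading of the forward-direction equations, here is a minimal NumPy sketch of a single GRU step. It illustrates the math only, not Chainer's actual implementation; the weight shapes follow the parameter description below.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, b):
    """One forward-direction GRU step for a mini-batch.

    W: six matrices W_0..W_5; W_0..W_2 have shape (N, I), W_3..W_5 (N, N).
    b: six bias vectors b_0..b_5, each of shape (N,).
    x_t: input at time t, shape (B, I); h_prev: previous state, shape (B, N).
    """
    r = sigmoid(x_t @ W[0].T + h_prev @ W[3].T + b[0] + b[3])  # reset gate r^f_t
    z = sigmoid(x_t @ W[1].T + h_prev @ W[4].T + b[1] + b[4])  # update gate z^f_t
    h_cand = np.tanh(x_t @ W[2].T + b[2] + r * (h_prev @ W[5].T + b[5]))
    return (1 - z) * h_cand + z * h_prev                       # h^f_t
```

The backward direction applies the same step to the time-reversed sequence, and the two results are concatenated into \(h_t = [h^{f}_t; h^{b}_t]\).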
As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Six weight matrices and six bias vectors are required for each layer and each direction. So, when \(S\) layers exist, you need to prepare \(12S\) weight matrices and \(12S\) bias vectors in total, passed as lists ws and bs of length \(2S\).
If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the inputs of all layers except the first may have a different shape from the input of the first layer.

- Parameters
n_layers (int) – Number of layers.
dropout_ratio (float) – Dropout ratio.
hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N), where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.

ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents the weights for the i-th layer, with the forward and backward weight sets of each layer stored as consecutive entries. Each ws[i] is a list containing six matrices, and ws[i][j] corresponds to W_j in the equations. Only the matrices ws[i][j] with 0 <= j < 3 of the first layer have the shape (N, I), as they are multiplied with the input variables; the corresponding matrices of the upper layers have the shape (N, 2N), because each upper layer receives the concatenated forward and backward hidden states. All other matrices have the shape (N, N).

bs (list of list of chainer.Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing six vectors, and bs[i][j] corresponds to b_j in the equations. The shape of each vector is (N,), where N is the dimension of the hidden units.

xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input values for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When the sequences have different lengths, sort them in descending order by length and transpose the sorted batch; transpose_sequence() transposes a list of Variables holding sequences. The result must satisfy xs[t].shape[0] >= xs[t + 1].shape[0].

use_bi_direction (bool) – If True, this function uses bi-directional GRU.
- Returns
This function returns a tuple containing two elements, hy and ys.

hy is the updated hidden states, with the same shape as hx.

ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, 2N), where B_t is the mini-batch size for time t and N is the size of the hidden units; the factor 2 comes from concatenating the forward and backward states. Note that B_t is the same value as the mini-batch size of xs[t].
- Return type
tuple
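A minimal end-to-end sketch of calling the function follows. It is illustrative only: the sizes, the random weights, and the zero initial states are arbitrary assumptions, and the shapes follow the parameter description above.

```python
import numpy as np
import chainer.functions as F

n_layers = 1   # S
batch = 3      # B
in_size = 4    # I
hidden = 5     # N

# Time-major inputs for three sequences of lengths 4, 3, and 2:
# B_t is non-increasing, as required (xs[t].shape[0] >= xs[t + 1].shape[0]).
xs = [np.random.rand(b_t, in_size).astype(np.float32) for b_t in (3, 3, 2, 1)]

# Initial hidden states: one (B, N) slab per layer and direction.
hx = np.zeros((2 * n_layers, batch, hidden), dtype=np.float32)

# One list of six weights/biases per layer and direction (2S lists in total).
# W_0..W_2 multiply the input, W_3..W_5 multiply the previous hidden state.
ws, bs = [], []
for _ in range(2 * n_layers):
    ws.append(
        [np.random.rand(hidden, in_size).astype(np.float32) for _ in range(3)]
        + [np.random.rand(hidden, hidden).astype(np.float32) for _ in range(3)])
    bs.append([np.zeros(hidden, dtype=np.float32) for _ in range(6)])

hy, ys = F.n_step_bigru(n_layers, 0.0, hx, ws, bs, xs)
print(hy.shape)                # (2, 3, 5): same shape as hx
print([y.shape for y in ys])   # [(3, 10), (3, 10), (2, 10), (1, 10)]: (B_t, 2N)
```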