chainer.functions.n_step_birnn
chainer.functions.n_step_birnn(n_layers, dropout_ratio, hx, ws, bs, xs, activation='tanh')
Stacked Bi-directional RNN function for sequence inputs.
This function calculates a stacked Bi-directional RNN over sequences. It takes an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden state \(h_t\) for each time \(t\) from the input \(x_t\).
\[\begin{split}h^{f}_t &=& f(W^{f}_0 x_t + W^{f}_1 h^{f}_{t-1} + b^{f}_0 + b^{f}_1), \\ h^{b}_t &=& f(W^{b}_0 x_t + W^{b}_1 h^{b}_{t+1} + b^{b}_0 + b^{b}_1), \\ h_t &=& [h^{f}_t; h^{b}_t],\end{split}\]where \(f\) is an activation function. The forward RNN runs over the sequence from left to right, the backward RNN from right to left, and the output \(h_t\) concatenates the two directions.
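For illustration, the recurrence can be sketched in plain NumPy for a single layer over equal-length sequences. This is a hypothetical helper, not part of the API; xs is a list of (B, I) arrays, h0f and h0b are (B, N) initial states, and the weights follow the (N, I) / (N, N) layout described in the parameter list below.

>>> import numpy as np
>>> def birnn_layer(xs, h0f, h0b, Wf, Wb, bf, bb, f=np.tanh):
...     hf, h = [], h0f
...     for t in range(len(xs)):            # forward: uses h^f_{t-1}
...         h = f(xs[t].dot(Wf[0].T) + h.dot(Wf[1].T) + bf[0] + bf[1])
...         hf.append(h)
...     hb, h = [None] * len(xs), h0b
...     for t in reversed(range(len(xs))):  # backward: uses h^b_{t+1}
...         h = f(xs[t].dot(Wb[0].T) + h.dot(Wb[1].T) + bb[0] + bb[1])
...         hb[t] = h
...     return [np.concatenate(p, axis=1) for p in zip(hf, hb)]  # [h^f_t; h^b_t]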
The weight matrices \(W\) consist of two sets, \(W^{f}\) and \(W^{b}\): \(W^{f}\) holds the weights of the forward RNN and \(W^{b}\) those of the backward RNN.
\(W^{f}\) contains \(W^{f}_0\) for the input sequence and \(W^{f}_1\) for the hidden state. \(W^{b}\) contains \(W^{b}_0\) for the input sequence and \(W^{b}_1\) for the hidden state.
Likewise, the bias vectors \(b\) consist of two sets, \(b^{f}\) and \(b^{b}\). \(b^{f}\) contains \(b^{f}_0\) for the input sequence and \(b^{f}_1\) for the hidden state. \(b^{b}\) contains \(b^{b}_0\) for the input sequence and \(b^{b}_1\) for the hidden state.
As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Each direction of each layer requires two weight matrices and two bias vectors, so when \(S\) layers exist you need to prepare \(2S\) weight lists and \(2S\) bias lists: ws and bs each have length \(2S\), one entry per layer and direction, as sketched below.
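For example, the weight and bias lists for a stack can be prepared as follows. This is a sketch with arbitrary sizes; the shapes follow the parameter descriptions below.

>>> import numpy as np
>>> import chainer
>>> S, I, N = 2, 4, 5   # layers, input units, hidden units
>>> ws, bs = [], []
>>> for i in range(S):
...     for di in range(2):   # di = 0: forward, di = 1: backward
...         w_in = I if i == 0 else 2 * N   # deeper layers see [h^f; h^b]
...         ws.append([chainer.Variable(np.random.randn(N, w_in).astype(np.float32)),
...                    chainer.Variable(np.random.randn(N, N).astype(np.float32))])
...         bs.append([chainer.Variable(np.zeros(N, np.float32)),
...                    chainer.Variable(np.zeros(N, np.float32))])
>>> len(ws) == len(bs) == 2 * S
True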
If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state \(h_t\) of the (k-1)-th layer. Note that the inputs of every layer after the first are the \(2N\)-dimensional concatenated hidden states, so their shape differs from that of the first layer's inputs.
- Parameters
n_layers (int) – Number of layers.
dropout_ratio (float) – Dropout ratio.
hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N), where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units. The length of the first axis is 2S because each layer has a forward and a backward direction.
ws (list of list of chainer.Variable) – Weight matrices. ws[2 * i + di] holds the weights of the i-th layer, where di = 0 for the forward RNN and di = 1 for the backward RNN. Each ws[2 * i + di] is a list containing two matrices; ws[2 * i + di][j] corresponds to W^{f}_j if di = 0 and to W^{b}_j if di = 1 in the equation. Only ws[0][0] and ws[1][0] have shape (N, I), as they are multiplied with the input variables; the input weight matrices of the second and subsequent layers have shape (N, 2N), since those layers take the concatenated forward and backward states as input. All other matrices have shape (N, N).
bs (list of list of chainer.Variable) – Bias vectors. bs[2 * i + di] holds the biases of the i-th layer, with di = 0 for the forward RNN and di = 1 for the backward RNN. Each bs[2 * i + di] is a list containing two vectors; bs[2 * i + di][j] corresponds to b^{f}_j if di = 0 and to b^{b}_j if di = 1 in the equation. Each vector has shape (N,), where N is the dimension of the hidden units.
xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input values for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences: when the sequences have different lengths, sort them in descending order of length and transpose the sorted batch with transpose_sequence(), so that xs satisfies xs[t].shape[0] >= xs[t + 1].shape[0].
activation (str) – Activation function name. Select tanh or relu.
- Returns
This function returns a tuple containing two elements, hy and ys.
hy is the updated hidden states, with the same shape as hx. ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, 2N), where B_t is the mini-batch size for time t and N is the size of the hidden units; the first axis matches that of xs[t], and the factor of 2 comes from concatenating the forward and backward states.
- Return type
tuple
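Example. An illustrative usage sketch with random data; the sizes are arbitrary and the weight shapes follow the descriptions above (a single bi-directional layer, three sequences of different lengths):

>>> import numpy as np
>>> import chainer
>>> import chainer.functions as F
>>> n_layers, batch, in_size, n_units = 1, 3, 4, 5
>>> lengths = [4, 3, 2]   # already sorted in descending order
>>> seqs = [np.random.randn(l, in_size).astype(np.float32) for l in lengths]
>>> xs = F.transpose_sequence(seqs)   # time-major: xs[t] has shape (B_t, I)
>>> hx = np.zeros((2 * n_layers, batch, n_units), dtype=np.float32)
>>> ws = [[np.random.randn(n_units, in_size).astype(np.float32),
...        np.random.randn(n_units, n_units).astype(np.float32)]
...       for _ in range(2 * n_layers)]
>>> bs = [[np.zeros(n_units, dtype=np.float32) for _ in range(2)]
...       for _ in range(2 * n_layers)]
>>> hy, ys = F.n_step_birnn(n_layers, 0.0, hx, ws, bs, xs)
>>> hy.shape
(2, 3, 5)
>>> ys[0].shape   # B_0 = 3 active sequences, 2N = 10 units
(3, 10)
>>> ys[3].shape   # only the longest sequence reaches t = 3
(1, 10)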