chainer.functions.n_step_birnn¶

chainer.functions.n_step_birnn(n_layers, dropout_ratio, hx, ws, bs, xs, activation='tanh')[source]¶

Stacked Bi-directional RNN function for sequence inputs.
This function calculates a stacked bi-directional RNN over sequences. It takes an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden state \(h_t\) for each time \(t\) from the input \(x_t\).
\[\begin{split}h^{f}_t &=& f(W^{f}_0 x_t + W^{f}_1 h_{t-1} + b^{f}_0 + b^{f}_1), \\ h^{b}_t &=& f(W^{b}_0 x_t + W^{b}_1 h_{t+1} + b^{b}_0 + b^{b}_1), \\ h_t &=& [h^{f}_t; h^{b}_t],\end{split}\]
where \(f\) is an activation function. The forward RNN reads the sequence left to right (depending on \(h_{t-1}\)), the backward RNN reads it right to left (depending on \(h_{t+1}\)), and their outputs are concatenated at each time step.
The weight matrices \(W\) consist of two sets, \(W^{f}\) and \(W^{b}\): \(W^{f}\) holds the weight matrices for the forward RNN and \(W^{b}\) those for the backward RNN. \(W^{f}\) contains \(W^{f}_0\) for the input sequence and \(W^{f}_1\) for the hidden state; likewise, \(W^{b}\) contains \(W^{b}_0\) for the input sequence and \(W^{b}_1\) for the hidden state.
The bias vectors \(b\) also consist of two sets, \(b^{f}\) and \(b^{b}\). \(b^{f}\) contains \(b^{f}_0\) for the input sequence and \(b^{f}_1\) for the hidden state, and \(b^{b}\) contains \(b^{b}_0\) for the input sequence and \(b^{b}_1\) for the hidden state.
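For intuition, the following is a minimal NumPy sketch of a single time step of one bi-directional layer, following the equations above. All names and shapes here are illustrative, not part of the Chainer API:

import numpy as np

# One time step of one bi-directional layer (illustrative shapes only).
N, I = 3, 4                                  # hidden units, input units
f = np.tanh                                  # the activation function f
rng = np.random.default_rng(0)

x_t = rng.standard_normal(I)                 # input x_t
h_f_prev = np.zeros(N)                       # forward state h^f_{t-1}
h_b_next = np.zeros(N)                       # backward state h^b_{t+1}

W_f0, W_b0 = rng.standard_normal((2, N, I))  # input weights W^f_0, W^b_0
W_f1, W_b1 = rng.standard_normal((2, N, N))  # hidden weights W^f_1, W^b_1
b_f0, b_f1, b_b0, b_b1 = rng.standard_normal((4, N))

h_f = f(W_f0 @ x_t + W_f1 @ h_f_prev + b_f0 + b_f1)  # forward direction
h_b = f(W_b0 @ x_t + W_b1 @ h_b_next + b_b0 + b_b1)  # backward direction
h_t = np.concatenate([h_f, h_b])             # h_t = [h^f_t; h^b_t], size 2N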
As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Each direction of each layer requires two weight matrices and two bias vectors, so when \(S\) layers exist you need to prepare \(2S\) lists of weight matrices and \(2S\) lists of bias vectors, one per direction per layer.
If the number of layers n_layers is greater than \(1\), the input of the \(k\)-th layer is the hidden state \(h_t\) of the \((k-1)\)-th layer. Note that the input variables of all layers except the first may have a different shape from those of the first layer.
- Parameters
n_layers (int) – Number of layers.
dropout_ratio (float) – Dropout ratio.
hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units. Because of bi-direction, the first dimension length is 2S.
ws (list of list of chainer.Variable) – Weight matrices.
ws[2 * i + di] represents the weights for the i-th layer, where di = 0 for the forward RNN and di = 1 for the backward RNN. Each ws[2 * i + di] is a list containing two matrices. ws[2 * i + di][j] corresponds to W^{f}_j if di = 0 and to W^{b}_j if di = 1 in the equation. Only the input-side matrices of the first layer, ws[0][0] and ws[1][0], have (I, N) shape, as they are multiplied with input variables, where I is the size of the input units; all other matrices have (N, N) shape.
bs (list of list of chainer.Variable) – Bias vectors.
bs[2 * i + di] represents the biases for the i-th layer, where di = 0 for the forward RNN and di = 1 for the backward RNN. Each bs[2 * i + di] is a list containing two vectors. bs[2 * i + di][j] corresponds to b^{f}_j if di = 0 and to b^{b}_j if di = 1 in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.
xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences: when sequences have different lengths, sort them in descending order by length and transpose the sorted sequences, since transpose_sequence() transposes a list of Variable s holding sequences (see the sketch after this parameter list). So xs must satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
activation (str) – Activation function name. Please select tanh or relu.
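As noted for xs above, variable-length sequences must be sorted by length and transposed before the call. A small sketch with hypothetical data (three sequences of lengths 2, 3, and 1):

import numpy as np
import chainer
import chainer.functions as F

# Three sequences of different lengths, each of shape (length, I).
seqs = [np.ones((length, 3), dtype=np.float32) for length in (2, 3, 1)]

# Sort in descending order of length, then transpose to per-time batches.
seqs.sort(key=len, reverse=True)
xs = F.transpose_sequence([chainer.Variable(s) for s in seqs])

# xs[t].shape[0] is now non-increasing in t: [(3, 3), (2, 3), (1, 3)].
print([x.shape for x in xs])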
- Returns
This function returns a tuple containing two elements, hy and ys. hy is an updated hidden state whose shape is the same as that of hx. ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, 2N), where B_t is the mini-batch size for time t and N is the size of the hidden units; the factor of 2 comes from concatenating the forward and backward hidden states. Note that B_t is the same value as xs[t].shape[0].
- Return type
tuple
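Example: a minimal usage sketch. The array layouts follow the (out_size, in_size) weight convention used in the examples for Chainer's other n_step functions; all values are placeholders:

import numpy as np
import chainer
import chainer.functions as F

n_layers = 1            # S
in_size = 3             # I
out_size = 2            # N
batches = [3, 2, 1]     # B_t for each time step, non-increasing

# xs[t] has shape (B_t, I).
xs = [chainer.Variable(np.ones((b, in_size), dtype=np.float32))
      for b in batches]

# hx has shape (2S, B, N): one state per direction per layer.
hx = chainer.Variable(
    np.zeros((2 * n_layers, batches[0], out_size), dtype=np.float32))

# One weight list and one bias list per direction per layer.
ws, bs = [], []
for di in range(2 * n_layers):  # even index: forward, odd index: backward
    w0 = np.ones((out_size, in_size), dtype=np.float32)   # multiplies x_t
    w1 = np.ones((out_size, out_size), dtype=np.float32)  # multiplies h
    ws.append([chainer.Variable(w0), chainer.Variable(w1)])
    bs.append([chainer.Variable(np.zeros(out_size, dtype=np.float32))
               for _ in range(2)])

hy, ys = F.n_step_birnn(n_layers, 0.0, hx, ws, bs, xs, activation='tanh')
print(hy.shape)                 # (2, 3, 2) == (2S, B, N)
print([y.shape for y in ys])    # [(3, 4), (2, 4), (1, 4)] == (B_t, 2N)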