chainer.functions.n_step_birnn
chainer.functions.n_step_birnn(n_layers, dropout_ratio, hx, ws, bs, xs, activation='tanh')
Stacked Bi-directional RNN function for sequence inputs.
This function calculates a stacked bi-directional RNN over sequences. It takes an initial hidden state h_0, an input sequence x, weight matrices W, and bias vectors b, and calculates the hidden state h_t for each time t from the input x_t.
h^{f}_t = f(W^{f}_0 x_t + W^{f}_1 h_{t-1} + b^{f}_0 + b^{f}_1),
h^{b}_t = f(W^{b}_0 x_t + W^{b}_1 h_{t-1} + b^{b}_0 + b^{b}_1),
h_t = [h^{f}_t; h^{b}_t],
where f is an activation function.
The weight matrices W consist of two sets, W^{f} and W^{b}. W^{f} holds the weight matrices for the forward-directional RNN, and W^{b} holds the weight matrices for the backward-directional RNN.
W^{f} contains W^{f}_0 for the input sequence and W^{f}_1 for the hidden state. W^{b} contains W^{b}_0 for the input sequence and W^{b}_1 for the hidden state.
The bias vectors b consist of two sets, b^{f} and b^{b}. b^{f} contains b^{f}_0 for the input sequence and b^{f}_1 for the hidden state. b^{b} contains b^{b}_0 for the input sequence and b^{b}_1 for the hidden state.
As the function accepts a sequence, it calculates h_t for all t with one call. Two weight matrices and two bias vectors are required for each direction of each layer. So, when S layers exist, you need to prepare 4S weight matrices and 4S bias vectors (2S for each direction).
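The recurrence above can be sketched for a single layer in plain NumPy. This is an illustrative sketch, not Chainer's implementation: `birnn_layer` is a hypothetical helper, all sequences are assumed to have the same length, and the backward direction is assumed to read the sequence from the last time step to the first, so its "previous" state is the one computed for the following time step. Weights use the (I, N) and (N, N) layouts described on this page, with inputs stored as row vectors.

```python
import numpy as np


def birnn_layer(xs, h0_f, h0_b, W_f, W_b, b_f, b_b, f=np.tanh):
    """One bidirectional RNN layer (hypothetical helper, equal-length sequences).

    xs: list of T arrays of shape (B, I).
    W_f, W_b: pairs (W0, W1) with shapes (I, N) and (N, N).
    b_f, b_b: pairs (b0, b1), each of shape (N,).
    """
    # Forward direction: t = 0, 1, ..., T-1.
    h = h0_f
    hs_f = []
    for x in xs:
        h = f(x @ W_f[0] + h @ W_f[1] + b_f[0] + b_f[1])
        hs_f.append(h)

    # Backward direction: t = T-1, ..., 0 (assumption, see lead-in).
    h = h0_b
    hs_b = []
    for x in reversed(xs):
        h = f(x @ W_b[0] + h @ W_b[1] + b_b[0] + b_b[1])
        hs_b.append(h)
    hs_b.reverse()  # restore time order

    # h_t = [h^f_t; h^b_t]
    return [np.concatenate([hf, hb], axis=1) for hf, hb in zip(hs_f, hs_b)]


rng = np.random.default_rng(0)
T, B, I, N = 3, 2, 4, 5
xs = [rng.standard_normal((B, I)) for _ in range(T)]
W_f = (rng.standard_normal((I, N)), rng.standard_normal((N, N)))
W_b = (rng.standard_normal((I, N)), rng.standard_normal((N, N)))
b_f = (np.zeros(N), np.zeros(N))
b_b = (np.zeros(N), np.zeros(N))
h0 = np.zeros((B, N))

ys = birnn_layer(xs, h0, h0, W_f, W_b, b_f, b_b)
```

In this sketch each `ys[t]` is the concatenation of the forward and backward states, so its last dimension is 2N.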
If the number of layers n_layers is greater than 1, the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the input variables of all layers except the first may have a different shape from those of the first layer.
- Parameters
n_layers (int) – Number of layers.
dropout_ratio (float) – Dropout ratio.
hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N), where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of hidden units. Because of bi-direction, the first dimension length is 2S.
ws (list of list of chainer.Variable) – Weight matrices.
ws[i + di] represents the weights for the i-th layer. Note that di = 0 for the forward RNN and di = 1 for the backward RNN. Each ws[i + di] is a list containing two matrices. ws[i + di][j] corresponds to W^{f}_j if di = 0 and to W^{b}_j if di = 1 in the equation. Only ws[0][j] and ws[1][j] where 0 <= j < 1 have shape (I, N), as they are multiplied with the input variables. All other matrices have shape (N, N).
bs (list of list of chainer.Variable) – Bias vectors.
bs[i + di] represents the biases for the i-th layer. Note that di = 0 for the forward RNN and di = 1 for the backward RNN. Each bs[i + di] is a list containing two vectors. bs[i + di][j] corresponds to b^{f}_j if di = 0 and to b^{b}_j if di = 1 in the equation. The shape of each vector is (N,), where N is the dimension of hidden units.
xs (list of chainer.Variable) – A list of
Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences. transpose_sequence() transposes a list of Variable objects holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
activation (str) – Activation function name. Please select
tanh or relu.
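The sort-then-transpose preparation described for xs can be illustrated with plain Python lists standing in for Variable objects. This is only a sketch of the idea: `sort_and_transpose` is a hypothetical helper written for this example and is not Chainer's transpose_sequence().

```python
def sort_and_transpose(seqs):
    """Sort sequences by descending length, then regroup the items by time step.

    Hypothetical helper: plain lists stand in for Variable objects.
    """
    ordered = sorted(seqs, key=len, reverse=True)
    max_len = len(ordered[0]) if ordered else 0
    # Step t keeps only the sequences that still have an item at time t.
    return [[s[t] for s in ordered if len(s) > t] for t in range(max_len)]


seqs = [[1, 2], [3, 4, 5, 6], [7, 8, 9]]
steps = sort_and_transpose(seqs)
batch_sizes = [len(step) for step in steps]
```

Because the sequences are sorted longest first, the per-step batch sizes never increase, which is the condition xs[t].shape[0] >= xs[t + 1].shape[0].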
- Returns
This function returns a tuple containing two elements, hy and ys. hy is the updated hidden states, whose shape is the same as hx. ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of hidden units. Note that B_t is the same value as for xs[t].
- Return type