tf.contrib.seq2seq.BahdanauMonotonicAttention

Class `BahdanauMonotonicAttention`

Defined in tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py.

Monotonic attention mechanism with Bahdanau-style energy function.

This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it can't attend to any prior points at subsequent output timesteps. It achieves this by using the _monotonic_probability_fn instead of softmax to construct its attention distributions. Since the attention scores are passed through a sigmoid, a learnable scalar bias parameter is applied after the score function and before the sigmoid. Otherwise, it is equivalent to BahdanauAttention. This approach is proposed in

Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017. https://arxiv.org/abs/1704.00784
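
The monotonic constraint described above can be illustrated with a small NumPy sketch. This is not the library's implementation: it is a simplified, single-example version of the recursive computation from the paper, and the function name `monotonic_probabilities` and its parameters are hypothetical. It assumes the scalar bias is added to the scores first, then optional Gaussian noise, then the sigmoid.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def monotonic_probabilities(score, previous_alignments, score_bias=0.0,
                            sigmoid_noise=0.0, rng=None):
    """Hypothetical single-example sketch of monotonic attention.

    score: shape [max_time], raw energies for the current output step.
    previous_alignments: shape [max_time], attention from the previous step.
    """
    # Scalar bias is applied after the score function and before the sigmoid.
    score = np.asarray(score, dtype=float) + score_bias
    if sigmoid_noise > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # Pre-sigmoid noise with standard deviation `sigmoid_noise`.
        score = score + sigmoid_noise * rng.standard_normal(score.shape)
    p_choose = sigmoid(score)  # probability of selecting each memory entry

    previous_alignments = np.asarray(previous_alignments, dtype=float)
    alignments = np.zeros_like(p_choose)
    q = 0.0
    for j in range(len(p_choose)):
        # Probability that entry j-1 was inspected but not selected.
        skipped_prev = 1.0 if j == 0 else 1.0 - p_choose[j - 1]
        q = skipped_prev * q + previous_alignments[j]
        alignments[j] = p_choose[j] * q
    return alignments


# Previous step attended to index 1, so index 0 receives no attention now.
print(monotonic_probabilities(np.zeros(3), np.array([0.0, 1.0, 0.0])))
```

Unlike softmax, the resulting values need not sum to 1; any remaining mass corresponds to attending to nothing at this step. A negative `score_bias` lowers every selection probability, which lets attention mass reach later positions in a long memory.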

`__init__`

__init__(
    num_units,
    memory,
    memory_sequence_length=None,
    normalize=False,
    score_mask_value=None,
    sigmoid_noise=0.0,
    sigmoid_noise_seed=None,
    score_bias_init=0.0,
    mode='parallel',
    dtype=None,
    name='BahdanauMonotonicAttention'
)

Construct the Attention mechanism.

Args:

num_units: The depth of the query mechanism.
memory: The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
memory_sequence_length: (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
normalize: Python boolean. Whether to normalize the energy term.
score_mask_value: (optional) The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
sigmoid_noise: Standard deviation of pre-sigmoid noise. See the docstring for _monotonic_probability_fn for more information.
sigmoid_noise_seed: (optional) Random seed for pre-sigmoid noise.
score_bias_init: Initial value for score bias scalar. It's recommended to initialize this to a negative value when the length of the memory is large.
mode: How to compute the attention distribution. Must be one of 'recursive', 'parallel', or 'hard'. See the docstring for tf.contrib.seq2seq.monotonic_attention for more information.
dtype: The data type for the query and memory layers of the attention mechanism.
name: Name to use when creating ops.
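
As a rough usage sketch, the constructor arguments above might be combined as follows. This assumes TensorFlow 1.x (where tf.contrib is available); the helper name `build_monotonic_attention_cell` and the specific values chosen for `sigmoid_noise` and `score_bias_init` are illustrative assumptions, not recommendations from this page.

```python
def build_monotonic_attention_cell(encoder_outputs, source_lengths,
                                   num_units=128):
    """Hypothetical helper: wrap an LSTM cell with monotonic attention."""
    # Imported here because tf.contrib only exists in TensorFlow 1.x.
    import tensorflow as tf

    attention_mechanism = tf.contrib.seq2seq.BahdanauMonotonicAttention(
        num_units=num_units,
        memory=encoder_outputs,                 # [batch_size, max_time, ...]
        memory_sequence_length=source_lengths,  # masks padded positions
        normalize=True,
        sigmoid_noise=1.0,      # assumed value, for illustration only
        score_bias_init=-4.0,   # assumed value; negative for long memories
        mode='parallel',
    )
    decoder_cell = tf.nn.rnn_cell.LSTMCell(num_units)
    return tf.contrib.seq2seq.AttentionWrapper(
        decoder_cell, attention_mechanism, attention_layer_size=num_units)
```

Here `encoder_outputs` and `source_lengths` stand for the encoder's output tensor and the per-example sequence lengths.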

Properties

`alignments_size`

`batch_size`

`keys`

`memory_layer`

`query_layer`

`state_size`

`values`

Methods