tf.data.experimental.bucket_by_sequence_length

A transformation that buckets elements in a Dataset by length.

tf.data.experimental.bucket_by_sequence_length(
    element_length_func, bucket_boundaries, bucket_batch_sizes, padded_shapes=None,
    padding_values=None, pad_to_bucket_boundary=False, no_padding=False,
    drop_remainder=False
)

Elements of the Dataset are grouped by length, then padded and batched within each group.

This is useful for sequence tasks in which the elements have variable length. Grouping elements of similar length reduces the fraction of each batch that is padding, which increases training step efficiency.
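The grouping described above can be sketched with a small example. This is a minimal illustration, not taken from the original page: the toy `sequences` list, the bucket boundaries, and the batch sizes are all made up, and it assumes TensorFlow 2.4 or later for the `output_signature` argument of `tf.data.Dataset.from_generator`.

```python
import tensorflow as tf

# Hypothetical toy data: integer sequences of lengths 1 through 6.
sequences = [
    [1],
    [2, 2],
    [3, 3, 3],
    [4, 4, 4, 4],
    [5, 5, 5, 5, 5],
    [6, 6, 6, 6, 6, 6],
]

# Build a dataset of variable-length 1-D int32 tensors.
dataset = tf.data.Dataset.from_generator(
    lambda: sequences,
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.int32),
)

# Two boundaries give three buckets: length < 3, 3 <= length < 5, length >= 5.
# bucket_batch_sizes therefore needs len(bucket_boundaries) + 1 entries.
dataset = dataset.apply(
    tf.data.experimental.bucket_by_sequence_length(
        element_length_func=lambda elem: tf.shape(elem)[0],
        bucket_boundaries=[3, 5],
        bucket_batch_sizes=[2, 2, 2],
    )
)

# Each batch is padded with zeros only up to the longest sequence it contains.
batches = [batch.numpy().tolist() for batch in dataset]
for batch in batches:
    print(batch)
```

With these inputs, sequences of lengths 1 and 2 share one batch, lengths 3 and 4 another, and lengths 5 and 6 a third, so at most one padding value is needed per batch instead of padding everything to length 6.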

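Args:

element_length_func: function from an element of the Dataset to tf.int32; the returned length determines which bucket the element goes into.
bucket_boundaries: list of int, the upper length boundaries of the buckets.
bucket_batch_sizes: list of int, the batch size for each bucket; its length must be len(bucket_boundaries) + 1.
padded_shapes: nested structure of tf.TensorShape passed to tf.data.Dataset.padded_batch. If not provided, dimensions of unknown size are padded to the maximum length in each batch.
padding_values: values to pad with, passed to tf.data.Dataset.padded_batch. Defaults to padding with 0.
pad_to_bucket_boundary: bool; if False, pads dimensions of unknown size to the maximum length in the batch. If True, pads them to the bucket boundary minus 1 (the maximum length in the bucket), and the caller must ensure that no element is longer than max(bucket_boundaries).
no_padding: bool; if True, batches are formed without padding, so elements must be tf.sparse.SparseTensor or have the same shape.
drop_remainder: bool; if True, drops the final batch of a bucket when it has fewer elements than that bucket's batch size.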

Returns:

A Dataset transformation function, which can be passed to tf.data.Dataset.apply.

Raises:

ValueError: if len(bucket_batch_sizes) != len(bucket_boundaries) + 1.