Fused implementation of map and batch. (deprecated)
tf.data.experimental.map_and_batch(
map_func, batch_size, num_parallel_batches=None, drop_remainder=False,
num_parallel_calls=None
)
Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map(map_func, num_parallel_calls) followed by tf.data.Dataset.batch(batch_size, drop_remainder). Static tf.data optimizations will take care of using the fused implementation.
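The update instructions above can be sketched as follows. This is a minimal illustration, assuming TensorFlow 2.x with eager execution; the square function, the input range, and the batch size are hypothetical choices made only for this example.

```python
import tensorflow as tf

# Hypothetical per-element function used only for illustration.
def square(x):
    return x * x

dataset = tf.data.Dataset.range(10)

# Replacement for map_and_batch: map first, then batch.
dataset = dataset.map(square, num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(4, drop_remainder=False)

# Collect the batches as plain Python lists (requires eager execution).
batches = [batch.numpy().tolist() for batch in dataset]
print(batches)
```

With drop_remainder=False, the final batch holds the two leftover elements rather than being discarded.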
Maps map_func across batch_size consecutive elements of this dataset and then combines them into a batch. Functionally, it is equivalent to map followed by batch. However, by fusing the two transformations together, the implementation can be more efficient. Surfacing this transformation in the API is temporary. Once automatic input pipeline optimization is implemented, the fusing of map and batch will happen automatically and this API will be deprecated.
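The semantics described above (apply map_func to batch_size consecutive elements, then group the results) can be modeled in plain Python. This is only a reference sketch of the behavior, not how TensorFlow implements the fused op; the name map_then_batch_reference is hypothetical.

```python
from itertools import islice

def map_then_batch_reference(elements, map_func, batch_size, drop_remainder=False):
    """Pure-Python model of map followed by batch (illustrative only)."""
    iterator = iter(elements)
    while True:
        # Map the next batch_size consecutive elements.
        batch = [map_func(x) for x in islice(iterator, batch_size)]
        if not batch:
            return  # Input exhausted.
        if len(batch) < batch_size and drop_remainder:
            return  # Discard the final, smaller batch.
        yield batch

full = list(map_then_batch_reference(range(5), lambda x: x * 10, batch_size=2))
trimmed = list(
    map_then_batch_reference(range(5), lambda x: x * 10, batch_size=2, drop_remainder=True)
)
print(full)     # [[0, 10], [20, 30], [40]]
print(trimmed)  # [[0, 10], [20, 30]]
```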
Args:
map_func: A function mapping a nested structure of tensors to another nested structure of tensors.
batch_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
num_parallel_batches: (Optional.) A tf.int64 scalar tf.Tensor, representing the number of batches to create in parallel. On one hand, higher values can help mitigate the effect of stragglers. On the other hand, higher values can increase contention if CPU is scarce.
drop_remainder: (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in case its size is smaller than desired; the default behavior is not to drop the smaller batch.
num_parallel_calls: (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process in parallel. If not specified, batch_size * num_parallel_batches elements will be processed in parallel. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.

Returns:
A Dataset transformation function, which can be passed to tf.data.Dataset.apply.

Raises:
ValueError: If both num_parallel_batches and num_parallel_calls are specified.
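A "Dataset transformation function" in the sense used above is a callable that takes a Dataset and returns a Dataset, which is what tf.data.Dataset.apply expects. The sketch below builds such a function from the non-deprecated map and batch calls. It assumes TensorFlow 2.x with eager execution, and map_then_batch_transform is a hypothetical helper, not part of the TensorFlow API.

```python
import tensorflow as tf

def map_then_batch_transform(map_func, batch_size, drop_remainder=False):
    """Hypothetical helper: returns a Dataset -> Dataset function for apply()."""
    def _apply_fn(dataset):
        mapped = dataset.map(map_func)
        return mapped.batch(batch_size, drop_remainder=drop_remainder)
    return _apply_fn

dataset = tf.data.Dataset.range(7).apply(
    map_then_batch_transform(lambda x: x + 1, batch_size=3, drop_remainder=True)
)

# Collect the batches as plain Python lists (requires eager execution).
result = [batch.numpy().tolist() for batch in dataset]
print(result)
```

Because drop_remainder=True here, the single leftover element is not emitted as a short final batch.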