View source on GitHub |
Returns a Dataset
of feature dictionaries from Example
protos.
tf.data.experimental.make_batched_features_dataset(
file_pattern, batch_size, features, reader=None, label_key=None,
reader_args=None, num_epochs=None, shuffle=True, shuffle_buffer_size=10000,
shuffle_seed=None, prefetch_buffer_size=None, reader_num_threads=None,
parser_num_threads=None, sloppy_ordering=False, drop_final_batch=False
)
If label_key argument is provided, returns a Dataset
of tuple
comprising of feature dictionaries and label.
serialized_examples = [
features {
feature { key: "age" value { int64_list { value: [ 0 ] } } }
feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
feature { key: "kws" value { bytes_list { value: [ "code", "art" ] } } }
},
features {
feature { key: "age" value { int64_list { value: [] } } }
feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
feature { key: "kws" value { bytes_list { value: [ "sports" ] } } }
}
]
features: {
"age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
"gender": FixedLenFeature([], dtype=tf.string),
"kws": VarLenFeature(dtype=tf.string),
}
And the expected output is:
{
"age": [[0], [-1]],
"gender": [["f"], ["f"]],
"kws": SparseTensor(
indices=[[0, 0], [0, 1], [1, 0]],
values=["code", "art", "sports"]
dense_shape=[2, 2]),
}
file_pattern
: List of files or patterns of file paths containing
Example
records. See tf.io.gfile.glob
for pattern rules.batch_size
: An int representing the number of records to combine
in a single batch.features
: A dict
mapping feature keys to FixedLenFeature
or
VarLenFeature
values. See tf.io.parse_example
.reader
: A function or class that can be
called with a filenames
tensor and (optional) reader_args
and returns
a Dataset
of Example
tensors. Defaults to tf.data.TFRecordDataset
.label_key
: (Optional) A string corresponding to the key labels are stored in
tf.Examples
. If provided, it must be one of the features
key,
otherwise results in ValueError
.reader_args
: Additional arguments to pass to the reader class.num_epochs
: Integer specifying the number of times to read through the
dataset. If None, cycles through the dataset forever. Defaults to None
.shuffle
: A boolean, indicates whether the input should be shuffled. Defaults
to True
.shuffle_buffer_size
: Buffer size of the ShuffleDataset. A large capacity
ensures better shuffling but would increase memory usage and startup time.shuffle_seed
: Randomization seed to use for shuffling.prefetch_buffer_size
: Number of feature batches to prefetch in order to
improve performance. Recommended value is the number of batches consumed
per training step. Defaults to auto-tune.reader_num_threads
: Number of threads used to read Example
records. If >1,
the results will be interleaved. Defaults to 1
.parser_num_threads
: Number of threads to use for parsing Example
tensors
into a dictionary of Feature
tensors. Defaults to 2
.sloppy_ordering
: If True
, reading performance will be improved at
the cost of non-deterministic ordering. If False
, the order of elements
produced is deterministic prior to shuffling (elements are still
randomized if shuffle=True
. Note that if the seed is set, then order
of elements after shuffling is deterministic). Defaults to False
.drop_final_batch
: If True
, and the batch size does not evenly divide the
input dataset size, the final smaller batch will be dropped. Defaults to
False
.A dataset of dict
elements, (or a tuple of dict
elements and label).
Each dict
maps feature keys to Tensor
or SparseTensor
objects.
TypeError
: If reader
is a tf.compat.v1.ReaderBase
subclass.ValueError
: If label_key
is not one of the features
keys.