tf.contrib.data.make_batched_features_dataset(
file_pattern,
batch_size,
features,
reader=tf.data.TFRecordDataset,
label_key=None,
reader_args=None,
num_epochs=None,
shuffle=True,
shuffle_buffer_size=10000,
shuffle_seed=None,
prefetch_buffer_size=optimization.AUTOTUNE,
reader_num_threads=1,
parser_num_threads=2,
sloppy_ordering=False,
drop_final_batch=False
)
Defined in tensorflow/contrib/data/python/ops/readers.py
.
Returns a Dataset
of feature dictionaries from Example
protos. (deprecated)
If label_key argument is provided, returns a Dataset
of tuple
comprising of feature dictionaries and label.
Example:
serialized_examples = [
features {
feature { key: "age" value { int64_list { value: [ 0 ] } } }
feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
feature { key: "kws" value { bytes_list { value: [ "code", "art" ] } } }
},
features {
feature { key: "age" value { int64_list { value: [] } } }
feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
feature { key: "kws" value { bytes_list { value: [ "sports" ] } } }
}
]
We can use arguments:
features: {
"age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
"gender": FixedLenFeature([], dtype=tf.string),
"kws": VarLenFeature(dtype=tf.string),
}
And the expected output is:
{
"age": [[0], [-1]],
"gender": [["f"], ["f"]],
"kws": SparseTensor(
indices=[[0, 0], [0, 1], [1, 0]],
values=["code", "art", "sports"]
dense_shape=[2, 2]),
}
Args:
file_pattern
: List of files or patterns of file paths containingExample
records. Seetf.gfile.Glob
for pattern rules.batch_size
: An int representing the number of records to combine in a single batch.features
: Adict
mapping feature keys toFixedLenFeature
orVarLenFeature
values. Seetf.parse_example
.reader
: A function or class that can be called with afilenames
tensor and (optional)reader_args
and returns aDataset
ofExample
tensors. Defaults totf.data.TFRecordDataset
.label_key
: (Optional) A string corresponding to the key labels are stored intf.Examples
. If provided, it must be one of thefeatures
key, otherwise results inValueError
.reader_args
: Additional arguments to pass to the reader class.num_epochs
: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. Defaults toNone
.shuffle
: A boolean, indicates whether the input should be shuffled. Defaults toTrue
.shuffle_buffer_size
: Buffer size of the ShuffleDataset. A large capacity ensures better shuffling but would increase memory usage and startup time.shuffle_seed
: Randomization seed to use for shuffling.prefetch_buffer_size
: Number of feature batches to prefetch in order to improve performance. Recommended value is the number of batches consumed per training step. Defaults to auto-tune.reader_num_threads
: Number of threads used to readExample
records. If >1, the results will be interleaved.parser_num_threads
: Number of threads to use for parsingExample
tensors into a dictionary ofFeature
tensors.sloppy_ordering
: IfTrue
, reading performance will be improved at the cost of non-deterministic ordering. IfFalse
, the order of elements produced is deterministic prior to shuffling (elements are still randomized ifshuffle=True
. Note that if the seed is set, then order of elements after shuffling is deterministic). Defaults toFalse
.drop_final_batch
: IfTrue
, and the batch size does not evenly divide the input dataset size, the final smaller batch will be dropped. Defaults toFalse
.
Returns:
A dataset of dict
elements, (or a tuple of dict
elements and label).
Each dict
maps feature keys to Tensor
or SparseTensor
objects.
Raises:
ValueError
: Iflabel_key
is not one of thefeatures
keys.