tf.contrib.data.make_batched_features_dataset

tf.contrib.data.make_batched_features_dataset(
    file_pattern,
    batch_size,
    features,
    reader=tf.data.TFRecordDataset,
    label_key=None,
    reader_args=None,
    num_epochs=None,
    shuffle=True,
    shuffle_buffer_size=10000,
    shuffle_seed=None,
    prefetch_buffer_size=optimization.AUTOTUNE,
    reader_num_threads=1,
    parser_num_threads=2,
    sloppy_ordering=False,
    drop_final_batch=False
)

Defined in tensorflow/contrib/data/python/ops/readers.py.

Returns a Dataset of feature dictionaries from Example protos. (deprecated)

If label_key argument is provided, returns a Dataset of tuple comprising of feature dictionaries and label.

Example:

serialized_examples = [
  features {
    feature { key: "age" value { int64_list { value: [ 0 ] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
    feature { key: "kws" value { bytes_list { value: [ "code", "art" ] } } }
  },
  features {
    feature { key: "age" value { int64_list { value: [] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
    feature { key: "kws" value { bytes_list { value: [ "sports" ] } } }
  }
]

We can use arguments:

features: {
  "age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
  "gender": FixedLenFeature([], dtype=tf.string),
  "kws": VarLenFeature(dtype=tf.string),
}

And the expected output is:

{
  "age": [[0], [-1]],
  "gender": [["f"], ["f"]],
  "kws": SparseTensor(
    indices=[[0, 0], [0, 1], [1, 0]],
    values=["code", "art", "sports"]
    dense_shape=[2, 2]),
}

Args:

file_pattern: List of files or patterns of file paths containing Example records. See tf.gfile.Glob for pattern rules.
batch_size: An int representing the number of records to combine in a single batch.
features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values. See tf.parse_example.
reader: A function or class that can be called with a filenames tensor and (optional) reader_args and returns a Dataset of Example tensors. Defaults to tf.data.TFRecordDataset.
label_key: (Optional) A string corresponding to the key labels are stored in tf.Examples. If provided, it must be one of the features key, otherwise results in ValueError.
reader_args: Additional arguments to pass to the reader class.
num_epochs: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. Defaults to None.
shuffle: A boolean, indicates whether the input should be shuffled. Defaults to True.
shuffle_buffer_size: Buffer size of the ShuffleDataset. A large capacity ensures better shuffling but would increase memory usage and startup time.
shuffle_seed: Randomization seed to use for shuffling.
prefetch_buffer_size: Number of feature batches to prefetch in order to improve performance. Recommended value is the number of batches consumed per training step. Defaults to auto-tune.
reader_num_threads: Number of threads used to read Example records. If >1, the results will be interleaved.
parser_num_threads: Number of threads to use for parsing Example tensors into a dictionary of Feature tensors.
sloppy_ordering: If True, reading performance will be improved at the cost of non-deterministic ordering. If False, the order of elements produced is deterministic prior to shuffling (elements are still randomized if shuffle=True. Note that if the seed is set, then order of elements after shuffling is deterministic). Defaults to False.
drop_final_batch: If True, and the batch size does not evenly divide the input dataset size, the final smaller batch will be dropped. Defaults to False.

Returns:

A dataset of dict elements, (or a tuple of dict elements and label). Each dict maps feature keys to Tensor or SparseTensor objects.

Raises:

ValueError: If label_key is not one of the features keys.