tf.contrib.framework.RecordInput

Class RecordInput

Defined in tensorflow/python/ops/data_flow_ops.py.

RecordInput asynchronously reads and randomly yields TFRecords.

A RecordInput Op continuously and asynchronously reads batches of records into a buffer of fixed capacity. It can also asynchronously yield random records from this buffer.

It will not start yielding until at least buffer_size / 2 elements have been placed into the buffer so that sufficient randomization can take place.
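The warm-up rule above can be pictured with a small pure-Python sketch. This is only an illustration of the described behavior, not TensorFlow's implementation: the `ShuffleBuffer` class and its `put`, `take`, and `_started` names are hypothetical, and the real Op fills its buffer from background reader threads rather than explicit calls.

```python
import random


class ShuffleBuffer:
    """Toy stand-in for the record buffer (hypothetical, not TensorFlow code)."""

    def __init__(self, buffer_size, seed=0):
        self.buffer_size = buffer_size
        self._items = []
        self._rng = random.Random(seed)
        self._started = False  # becomes True once the buffer was half full

    def put(self, record):
        """Add a record; return False if the buffer is already at capacity."""
        if len(self._items) >= self.buffer_size:
            return False
        self._items.append(record)
        return True

    def take(self):
        """Remove and return a random record, or None if nothing can be yielded."""
        if not self._started:
            # Refuse to yield until at least buffer_size / 2 records are buffered.
            if len(self._items) < self.buffer_size / 2:
                return None
            self._started = True
        if not self._items:
            return None
        index = self._rng.randrange(len(self._items))
        # Swap the chosen record to the end so it can be popped in O(1).
        self._items[index], self._items[-1] = self._items[-1], self._items[index]
        return self._items.pop()


buf = ShuffleBuffer(buffer_size=8)
for record in range(3):
    buf.put(record)
print(buf.take())  # None: only 3 of the required 4 records are buffered
buf.put(3)
print(buf.take())  # one of 0, 1, 2, 3, chosen at random
```

Waiting for a half-full buffer means the first record yielded is drawn from several candidates rather than being simply the first record read.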

The order in which the files are read is shifted each epoch by an amount determined by shift_ratio, so that the data is presented in a different order every epoch.
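The per-epoch shift can be thought of as rotating the list of matching files. The sketch below is an assumption-laden illustration, not the Op's actual code: the helper `epoch_file_order` is hypothetical, and it assumes the shift is the `shift_ratio` constructor argument interpreted as a fraction of the file count, rounded down, and accumulated once per epoch.

```python
def epoch_file_order(files, shift_ratio, epoch):
    """Return the file list rotated for a 0-based epoch (hypothetical helper)."""
    if not files:
        return []
    # Assumed rule: number of files to move the start forward by each epoch.
    shift = int(len(files) * shift_ratio)
    start = (shift * epoch) % len(files)
    return files[start:] + files[:start]


files = ["a", "b", "c", "d"]
print(epoch_file_order(files, 0.25, 0))  # ['a', 'b', 'c', 'd']
print(epoch_file_order(files, 0.25, 1))  # ['b', 'c', 'd', 'a']
print(epoch_file_order(files, 0.25, 2))  # ['c', 'd', 'a', 'b']
```

With the default `shift_ratio=0` the computed shift is zero, so under this assumption every epoch would start from the same file.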

__init__

__init__(
    file_pattern,
    batch_size=1,
    buffer_size=1,
    parallelism=1,
    shift_ratio=0,
    seed=0,
    name=None,
    batches=None,
    compression_type=None
)

Constructs a RecordInput Op.

Args:

  • file_pattern: File path to the dataset, possibly containing wildcards. All matching files will be iterated over each epoch.
  • batch_size: How many records to return at a time.
  • buffer_size: The maximum number of records the buffer will contain.
  • parallelism: How many reader threads to use for reading from files.
  • shift_ratio: What percentage of the total number of files to move the start file forward by each epoch.
  • seed: Specify the random number seed used by the generator that randomizes records.
  • name: Optional name for the operation.
  • batches: None by default, creating a single batch op. Otherwise specifies how many batches to create, which are returned as a list when get_yield_op() is called. An example use case is to split processing between devices on one computer.
  • compression_type: The type of compression for the file. Currently ZLIB and GZIP are supported. Defaults to None (no compression).

Raises:

  • ValueError: If one of the arguments is invalid.

Methods

tf.contrib.framework.RecordInput.get_yield_op

get_yield_op()

Adds a node that yields a group of records every time it is executed. If the batches parameter of RecordInput is not None, it yields a list of record batches with the specified batch_size.
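A minimal usage sketch, assuming a TensorFlow 1.x installation where `tf.contrib` is available. The helper name `build_yield_op`, the argument values, and the file pattern in the note below are illustrative choices, not recommendations from this reference.

```python
def build_yield_op(file_pattern, batches=None):
    """Construct a RecordInput and return the result of get_yield_op()."""
    # Imported lazily so this helper can be defined where TensorFlow is absent.
    import tensorflow as tf

    record_input = tf.contrib.framework.RecordInput(
        file_pattern=file_pattern,
        batch_size=32,      # records returned per execution
        buffer_size=1024,   # maximum records held for shuffling
        parallelism=4,      # reader threads
        shift_ratio=0.1,    # illustrative value
        seed=42,
        batches=batches,    # None gives a single op; an int gives a list
        name="record_input",
    )
    return record_input.get_yield_op()
```

In a graph-mode program the returned op would typically be evaluated in a session, for example `sess.run(build_yield_op("/data/train-*.tfrecord"))` with a hypothetical path. When `batches` is set, the return value is a list, which can be handed out one element per device.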