tf.io.decode_proto

The op extracts fields from a serialized protocol buffers message into tensors.

tf.io.decode_proto(
    bytes, message_type, field_names, output_types, descriptor_source='local://',
    message_format='binary', sanitize=False, name=None
)

The fields named in field_names are decoded and converted to the corresponding output_types where possible.

A message_type name must be provided to give context for the field names. The message descriptor itself is looked up either in the linked-in descriptor pool or in a file named by the caller through the descriptor_source attribute.

Each output tensor is dense: it is padded to hold the largest number of repeated elements seen in the input minibatch. (The shape is also padded by one to prevent zero-sized dimensions.) The actual repeat count of each field for each example in the minibatch is reported in the sizes output. When missing values are not a concern, the output of decode_proto is often fed directly into tf.squeeze. When using tf.squeeze, always pass the squeeze dimension explicitly to avoid surprises.
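As a small illustration of the padding and squeeze advice, the sketch below uses hand-written stand-in tensors rather than real decode_proto output. The names sizes, dense, and single are hypothetical, and the values are made up. It shows one way to drop padding using the repeat counts, and why passing axis to tf.squeeze matters when the batch holds a single message.

```python
import tensorflow as tf

# Stand-ins for decode_proto outputs; the values are invented for illustration.
# One repeated field, two messages holding 3 and 1 elements respectively.
sizes = tf.constant([3, 1], dtype=tf.int32)
dense = tf.constant([[1, 2, 3], [4, 0, 0]], dtype=tf.int32)  # zeros are padding

# Use the repeat counts to discard padding and recover variable-length rows.
ragged = tf.RaggedTensor.from_tensor(dense, lengths=tf.cast(sizes, tf.int64))

# A singular field decoded from a batch of one message has shape [1, 1].
single = tf.constant([[7]], dtype=tf.int32)
explicit = tf.squeeze(single, axis=-1)  # removes only the value axis
implicit = tf.squeeze(single)           # also removes the batch axis

print(ragged.to_list())
print(explicit.shape, implicit.shape)
```

With an explicit axis the batch dimension survives even for a batch of one, so downstream code that expects a leading batch axis keeps working.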

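For the most part, the mapping between proto field types and TensorFlow dtypes is straightforward. However, there are a few special cases:

A proto field that contains a submessage or group can only be converted to DT_STRING (the serialized submessage). The resulting string can be used as input to another decode_proto op.

Unsigned integers: uint64 fields are represented as DT_INT64 with the same two's-complement bit pattern. uint32 values can be represented exactly by requesting DT_INT64 in output_types, or as two's complement by requesting DT_INT32.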
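One special case concerns unsigned 64-bit fields, which the op's documentation describes as being returned in a signed 64-bit dtype with the same two's-complement bit pattern. The plain-Python sketch below is not TensorFlow code, and the helper names are hypothetical. It shows what that reinterpretation does to a value and how a caller could undo it.

```python
import struct


def uint64_as_int64(value):
    """Reinterpret the bits of an unsigned 64-bit integer as a signed one."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def int64_as_uint64(value):
    """Inverse of uint64_as_int64: recover the unsigned value."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


small = uint64_as_int64(42)         # below 2**63, so the value is unchanged
large = uint64_as_int64(2**64 - 1)  # at or above 2**63, so it turns negative
restored = int64_as_uint64(large)   # round-trips back to the original

print(small, large, restored)
```

Values below 2**63 are unaffected, so the reinterpretation only needs to be undone when a field can actually hold larger values.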

Both binary and text proto serializations are supported, and can be chosen using the message_format attribute.

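The descriptor_source attribute selects the source of protocol descriptors to consult when looking up message_type. This may be:

An empty string or "local://", in which case protocol descriptors are created for C++ (not Python) proto definitions linked to the binary.

A file path, in which case protocol descriptors are created from the file, which is expected to contain a FileDescriptorSet serialized as a string.

A "bytes://<bytes>" string, in which case protocol descriptors are created from <bytes>, which is expected to be a FileDescriptorSet serialized as a string.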

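Args:

bytes: A Tensor of type string. Serialized protos with shape batch_shape.
message_type: A string. Name of the proto message type to decode.
field_names: A list of strings. Names of the proto fields to decode.
output_types: A list of tf.DTypes. The TensorFlow type to use for the respective field in field_names.
descriptor_source: An optional string. Defaults to 'local://'.
message_format: An optional string. Defaults to 'binary'. Either binary or text.
sanitize: An optional bool. Defaults to False. Whether to sanitize the result or not.
name: A name for the operation (optional).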

Returns:

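A tuple of Tensor objects (sizes, values).

sizes: A Tensor of type int32 holding, for each input message, the number of values found for each field in field_names.
values: A list of Tensor objects, one per entry in field_names, with the dtypes given in output_types.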
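A minimal end-to-end sketch follows, under stated assumptions. The schema (demo.Sample with fields id and scores) is hypothetical, the input messages are hand-encoded in the protobuf wire format so that no generated classes are needed, and the schema is passed inline using a descriptor_source of the form bytes:// followed by a serialized FileDescriptorSet. Treat it as an illustration of the call shape, not a reference usage.

```python
import tensorflow as tf
from google.protobuf import descriptor_pb2

# Hypothetical schema, equivalent to this proto2 definition:
#   package demo;
#   message Sample { optional int32 id = 1; repeated int32 scores = 2; }
F = descriptor_pb2.FieldDescriptorProto
file_proto = descriptor_pb2.FileDescriptorProto(name="demo.proto", package="demo")
message = file_proto.message_type.add(name="Sample")
message.field.add(name="id", number=1, type=F.TYPE_INT32, label=F.LABEL_OPTIONAL)
message.field.add(name="scores", number=2, type=F.TYPE_INT32, label=F.LABEL_REPEATED)

# Assumption: the schema is supplied inline as a serialized FileDescriptorSet.
descriptor_set = descriptor_pb2.FileDescriptorSet(file=[file_proto])
descriptor_source = b"bytes://" + descriptor_set.SerializeToString()

# Hand-encoded wire format: tag 0x08 is field 1 (varint), 0x10 is field 2 (varint).
serialized = tf.constant([
    b"\x08\x07\x10\x01\x10\x02\x10\x03",  # id=7, scores=[1, 2, 3]
    b"\x08\x09\x10\x04",                  # id=9, scores=[4]
])

sizes, values = tf.io.decode_proto(
    serialized,
    message_type="demo.Sample",
    field_names=["id", "scores"],
    output_types=[tf.int32, tf.int32],
    descriptor_source=descriptor_source,
)

ids = values[0][:, 0]  # first (only) value of the singular field per message
scores = values[1]     # dense and padded; consult sizes for the real counts

print(sizes.numpy())
print(ids.numpy())
print(scores.numpy())
```

The second column of sizes gives the number of valid entries in each row of scores; entries beyond that count are padding and should be ignored.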