Class BigtableTable
Aliases:
- Class tf.contrib.bigtable.BigtableTable
- Class tf.contrib.cloud.BigtableTable

Defined in tensorflow/contrib/bigtable/python/ops/bigtable_api.py.
Entry point for reading and writing data in Cloud Bigtable.
This BigtableTable class is the Python representation of a Cloud Bigtable table within TensorFlow. Methods on this class allow data to be read from and written to the Cloud Bigtable service in a flexible, high-performance manner.
__init__
__init__(
name,
snapshot,
resource
)
Initialize self. See help(type(self)) for accurate signature.
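In practice, a BigtableTable is typically obtained from a BigtableClient rather than constructed directly. A minimal sketch (the project, instance, and table names are hypothetical):

from tensorflow.contrib import bigtable

# Connect to a Cloud Bigtable instance and open a table.
bigtable_client = bigtable.BigtableClient("my-project", "my-instance")
table = bigtable_client.table("my_table")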
Methods
tf.contrib.bigtable.BigtableTable.keys_by_prefix_dataset
keys_by_prefix_dataset(prefix)
Retrieves the row keys matching a given prefix.
Args:
prefix: All row keys in the table that begin with prefix will be retrieved.

Returns:
A tf.data.Dataset containing tf.string Tensors corresponding to all of the row keys matching that prefix.
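A minimal sketch (assuming the bigtable_client from above; the prefix is hypothetical):

table = bigtable_client.table("my_table")
# All row keys starting with "train_", as a dataset of tf.string scalars.
train_keys = table.keys_by_prefix_dataset("train_")
batched_keys = train_keys.batch(128)  # standard tf.data transformations apply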
tf.contrib.bigtable.BigtableTable.keys_by_range_dataset
keys_by_range_dataset(
start,
end
)
Retrieves all row keys between start and end.
Args:
start: The start row key. The row keys for rows after start (inclusive) will be retrieved.
end: (Optional.) The end row key. Rows up to (but not including) end will be retrieved. If end is None, all subsequent row keys will be retrieved.

Returns:
A tf.data.Dataset containing tf.string Tensors corresponding to all of the row keys between start and end.
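A minimal sketch (same hypothetical table; end is exclusive, and passing end=None retrieves all keys from start onward):

table = bigtable_client.table("my_table")
# Row keys in ["row0000", "row0999"), as a dataset of tf.string scalars.
range_keys = table.keys_by_range_dataset("row0000", "row0999")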
tf.contrib.bigtable.BigtableTable.lookup_columns
lookup_columns(
*args,
**kwargs
)
Retrieves the values of columns for a dataset of keys.
Example usage:
table = bigtable_client.table("my_table")
key_dataset = table.keys_by_prefix_dataset("imagenet")
images = key_dataset.apply(table.lookup_columns(("cf1", "image"),
                                                ("cf2", "label"),
                                                ("cf2", "boundingbox")))
training_data = images.map(parse_and_crop, num_parallel_calls=64).batch(128)
Alternatively, you can use keyword arguments to specify the columns to capture. Example (same as above, rewritten):
table = bigtable_client.table("my_table")
key_dataset = table.keys_by_prefix_dataset("imagenet")
images = key_dataset.apply(table.lookup_columns(
    cf1="image", cf2=("label", "boundingbox")))
training_data = images.map(parse_and_crop, num_parallel_calls=64).batch(128)
Args:
*args: A list of tuples containing (column family, column name) pairs.
**kwargs: Column families (keys) and column qualifiers (values).

Returns:
A function that can be passed to tf.data.Dataset.apply to retrieve the values of columns for the rows.
tf.contrib.bigtable.BigtableTable.parallel_scan_prefix
parallel_scan_prefix(
prefix,
num_parallel_scans=None,
probability=None,
columns=None,
**kwargs
)
Retrieves rows (including values) from the Bigtable service at high speed.

Rows with row keys prefixed by prefix will be retrieved. This method is similar to scan_prefix, but by contrast performs multiple sub-scans in parallel in order to achieve higher performance.

Columns to retrieve for each row can be specified either via kwargs or via the columns parameter. To retrieve the values of the columns "c1" and "c2" from the column family "cfa", and the value of the column "c3" from column family "cfb", the following datasets (ds1 and ds2) are equivalent:
table = # ...
ds1 = table.parallel_scan_prefix("row_prefix", columns=[("cfa", "c1"),
                                                        ("cfa", "c2"),
                                                        ("cfb", "c3")])
ds2 = table.parallel_scan_prefix("row_prefix", cfa=["c1", "c2"], cfb="c3")
Args:
prefix: The prefix all row keys must match to be retrieved for prefix-based scans.
num_parallel_scans: (Optional.) The number of concurrent scans against the Cloud Bigtable instance.
probability: (Optional.) A float between 0 (exclusive) and 1 (inclusive). A non-1 value indicates to probabilistically sample rows with the provided probability.
columns: The columns to read. Note: most commonly, they are expressed as kwargs. Use the columns value if you are using column families that are reserved. The value of columns and kwargs are merged. Columns is a list of tuples of strings ("column_family", "column_qualifier").
**kwargs: The column families and columns to read. Keys are treated as column families, and values can be either lists of strings, or strings that are treated as the column qualifier (column name).
Returns:
A tf.data.Dataset returning the row keys and the cell contents.

Raises:
ValueError: If the configured probability is unexpected.
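Neither example above exercises the tuning parameters; a brief sketch using them (the values are purely illustrative):

ds = table.parallel_scan_prefix("row_prefix",
                                num_parallel_scans=8,  # 8 concurrent sub-scans
                                probability=0.1,       # sample roughly 10% of rows
                                cfa=["c1", "c2"])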
tf.contrib.bigtable.BigtableTable.parallel_scan_range
parallel_scan_range(
start,
end,
num_parallel_scans=None,
probability=None,
columns=None,
**kwargs
)
Retrieves rows (including values) from the Bigtable service.
Rows with row keys between start and end will be retrieved. This method is similar to scan_range, but by contrast performs multiple sub-scans in parallel in order to achieve higher performance.

Columns to retrieve for each row can be specified either via kwargs or via the columns parameter. To retrieve the values of the columns "c1" and "c2" from the column family "cfa", and the value of the column "c3" from column family "cfb", the following datasets (ds1 and ds2) are equivalent:
table = # ...
ds1 = table.parallel_scan_range("row_start",
                                "row_end",
                                columns=[("cfa", "c1"),
                                         ("cfa", "c2"),
                                         ("cfb", "c3")])
ds2 = table.parallel_scan_range("row_start", "row_end",
                                cfa=["c1", "c2"], cfb="c3")
Args:
start: The start of the range when scanning by range.
end: (Optional.) The end of the range when scanning by range.
num_parallel_scans: (Optional.) The number of concurrent scans against the Cloud Bigtable instance.
probability: (Optional.) A float between 0 (exclusive) and 1 (inclusive). A non-1 value indicates to probabilistically sample rows with the provided probability.
columns: The columns to read. Note: most commonly, they are expressed as kwargs. Use the columns value if you are using column families that are reserved. The value of columns and kwargs are merged. Columns is a list of tuples of strings ("column_family", "column_qualifier").
**kwargs: The column families and columns to read. Keys are treated as column families, and values can be either lists of strings, or strings that are treated as the column qualifier (column name).
Returns:
A tf.data.Dataset returning the row keys and the cell contents.

Raises:
ValueError: If the configured probability is unexpected.
tf.contrib.bigtable.BigtableTable.sample_keys
sample_keys()
Retrieves a sampling of row keys from the Bigtable table.
This dataset is most often used in conjunction with tf.data.experimental.parallel_interleave to construct a set of ranges for scanning in parallel.

Returns:
A tf.data.Dataset returning string row keys.
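A minimal sketch (same hypothetical table as above; each sampled key can then seed a range for scan_range or parallel_scan_range):

table = bigtable_client.table("my_table")
# A dataset of tf.string row keys sampled across the table's key space.
sampled_keys = table.sample_keys()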
tf.contrib.bigtable.BigtableTable.scan_prefix
scan_prefix(
prefix,
probability=None,
columns=None,
**kwargs
)
Retrieves rows (including values) from the Bigtable service.

Rows with row keys prefixed by prefix will be retrieved.

Columns to retrieve for each row can be specified either via kwargs or via the columns parameter. To retrieve the values of the columns "c1" and "c2" from the column family "cfa", and the value of the column "c3" from column family "cfb", the following datasets (ds1 and ds2) are equivalent:
table = # ...
ds1 = table.scan_prefix("row_prefix", columns=[("cfa", "c1"),
                                               ("cfa", "c2"),
                                               ("cfb", "c3")])
ds2 = table.scan_prefix("row_prefix", cfa=["c1", "c2"], cfb="c3")
Args:
prefix: The prefix all row keys must match to be retrieved for prefix-based scans.
probability: (Optional.) A float between 0 (exclusive) and 1 (inclusive). A non-1 value indicates to probabilistically sample rows with the provided probability.
columns: The columns to read. Note: most commonly, they are expressed as kwargs. Use the columns value if you are using column families that are reserved. The value of columns and kwargs are merged. Columns is a list of tuples of strings ("column_family", "column_qualifier").
**kwargs: The column families and columns to read. Keys are treated as column families, and values can be either lists of strings, or strings that are treated as the column qualifier (column name).
Returns:
A tf.data.Dataset returning the row keys and the cell contents.

Raises:
ValueError: If the configured probability is unexpected.
tf.contrib.bigtable.BigtableTable.scan_range
scan_range(
start,
end,
probability=None,
columns=None,
**kwargs
)
Retrieves rows (including values) from the Bigtable service.
Rows with row keys between start and end will be retrieved.

Columns to retrieve for each row can be specified either via kwargs or via the columns parameter. To retrieve the values of the columns "c1" and "c2" from the column family "cfa", and the value of the column "c3" from column family "cfb", the following datasets (ds1 and ds2) are equivalent:
table = # ...
ds1 = table.scan_range("row_start", "row_end", columns=[("cfa", "c1"),
                                                        ("cfa", "c2"),
                                                        ("cfb", "c3")])
ds2 = table.scan_range("row_start", "row_end", cfa=["c1", "c2"], cfb="c3")
Args:
start: The start of the range when scanning by range.
end: (Optional.) The end of the range when scanning by range.
probability: (Optional.) A float between 0 (exclusive) and 1 (inclusive). A non-1 value indicates to probabilistically sample rows with the provided probability.
columns: The columns to read. Note: most commonly, they are expressed as kwargs. Use the columns value if you are using column families that are reserved. The value of columns and kwargs are merged. Columns is a list of tuples of strings ("column_family", "column_qualifier").
**kwargs: The column families and columns to read. Keys are treated as column families, and values can be either lists of strings, or strings that are treated as the column qualifier (column name).
Returns:
A tf.data.Dataset returning the row keys and the cell contents.

Raises:
ValueError: If the configured probability is unexpected.
tf.contrib.bigtable.BigtableTable.write
write(
dataset,
column_families,
columns,
timestamp=None
)
Writes a dataset to the table.
Args:
dataset: A tf.data.Dataset to be written to this table. It must produce a list of number-of-columns+1 elements, all of which must be strings. The first value will be used as the row key, and subsequent values will be used as cell values for the corresponding columns from the corresponding column_families and columns entries.
column_families: A tf.Tensor of tf.strings corresponding to the column families to store the dataset's elements into.
columns: A tf.Tensor of tf.strings corresponding to the column names to store the dataset's elements into.
timestamp: (Optional.) An int64 timestamp to write all the values at. Leave as None to use server-provided timestamps.
Returns:
A tf.Operation that can be run to perform the write.

Raises:
ValueError: If there are unexpected or incompatible types, or if the number of columns and column_families does not match the output of dataset.
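A minimal end-to-end sketch (same hypothetical table; the column families "cf1" and "cf2" and the qualifiers are placeholders that must already exist in the table):

import tensorflow as tf

table = bigtable_client.table("my_table")
# Each element is (row_key, value for cf1:image, value for cf2:label),
# i.e. number-of-columns + 1 strings.
rows = tf.data.Dataset.from_tensor_slices(
    (["r1", "r2"], ["image-bytes-1", "image-bytes-2"], ["cat", "dog"]))
write_op = table.write(rows,
                       column_families=["cf1", "cf2"],
                       columns=["image", "label"])
with tf.Session() as sess:  # TF 1.x (contrib-era) execution style
  sess.run(write_op)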