String to Id table wrapper that assigns out-of-vocabulary keys to buckets.
tf.lookup.StaticVocabularyTable(
initializer, num_oov_buckets, lookup_key_dtype=None, name=None
)
For example, if an instance of StaticVocabularyTable is initialized with a
string-to-id initializer that maps:

    emerson -> 0
    lake -> 1
    palmer -> 2

the table will perform the following mapping:

    emerson -> 0
    lake -> 1
    palmer -> 2
    <other term> -> bucket_id

where bucket_id is between 3 and 3 + num_oov_buckets - 1, calculated by:

    hash(<term>) % num_oov_buckets + vocab_size

If input_tensor is ["emerson", "lake", "palmer", "king", "crimson"],
the lookup result might be [0, 1, 2, 4, 7]: the three known terms keep their
ids, and the two unknown terms receive bucket ids (an id of 7 implies that
num_oov_buckets is at least 5).
If initializer is None, only out-of-vocabulary buckets are used.
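    import tensorflow as tf

    num_oov_buckets = 3
    input_tensor = tf.constant(["emerson", "lake", "palmer", "king", "crimson"])
    # `filename` is the path to a vocabulary file with one term per line.
    initializer = tf.lookup.TextFileInitializer(
        filename, key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
        value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
    table = tf.lookup.StaticVocabularyTable(initializer, num_oov_buckets)
    # Under TF 2.x eager execution the table is initialized when it is constructed.
    out = table.lookup(input_tensor)
    print(out.numpy())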
The hash function used to generate out-of-vocabulary bucket IDs is Fingerprint64.
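The bucket formula given earlier can be sketched in plain Python, without TensorFlow. This is only an illustration of the arithmetic: the helper names stand_in_hash and lookup_ids are hypothetical, and because Python has no built-in Fingerprint64, the sketch substitutes the first 8 bytes of SHA-256. The bucket ids it produces will therefore generally differ from the ones TensorFlow assigns.

```python
import hashlib


def stand_in_hash(term: str) -> int:
    """64-bit stand-in hash (NOT Fingerprint64): first 8 bytes of SHA-256."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def lookup_ids(terms, vocab, num_oov_buckets):
    """Map known terms to their ids and every other term to an OOV bucket."""
    if num_oov_buckets <= 0:
        raise ValueError("num_oov_buckets must be greater than zero")
    vocab_size = len(vocab)
    ids = []
    for term in terms:
        if term in vocab:
            ids.append(vocab[term])
        else:
            # hash(<term>) % num_oov_buckets + vocab_size
            ids.append(stand_in_hash(term) % num_oov_buckets + vocab_size)
    return ids


vocab = {"emerson": 0, "lake": 1, "palmer": 2}
ids = lookup_ids(["emerson", "lake", "palmer", "king", "crimson"], vocab, 3)
# The first three ids are 0, 1, 2; the last two fall in the range 3..5.
print(ids)
```

With an empty vocabulary, vocab_size is 0 and every term maps into the range 0..num_oov_buckets - 1, which mirrors the case where initializer is None.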
Args:
    initializer: A TableInitializerBase object that contains the data used to
        initialize the table. If None, then we only use out-of-vocab buckets.
    num_oov_buckets: Number of buckets to use for out-of-vocabulary keys. Must
        be greater than zero.
    lookup_key_dtype: Data type of keys passed to lookup. Defaults to
        initializer.key_dtype if initializer is specified, otherwise
        tf.string. Must be string or integer, and must be castable to
        initializer.key_dtype.
    name: A name for the operation (optional).

Attributes:
    key_dtype: The table key dtype.
    name: The name of the table.
    resource_handle: Returns the resource handle associated with this Resource.
    value_dtype: The table value dtype.

Raises:
    ValueError: when num_oov_buckets is not positive.
    TypeError: when lookup_key_dtype or initializer.key_dtype are not
        integer or string. Also when initializer.value_dtype != int64.

lookup

lookup(
keys, name=None
)
Looks up keys in the table and outputs the corresponding values.
Out-of-vocabulary keys are assigned to buckets based on their hashes.
Args:
    keys: Keys to look up. May be either a SparseTensor or dense Tensor.
    name: Optional name for the op.

Returns:
    A SparseTensor if keys are sparse, otherwise a dense Tensor.

Raises:
    TypeError: when keys doesn't match the table key data type.

size

size(
name=None
)
Compute the number of elements in this table.
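
As a rough end-to-end illustration of lookup and size, the sketch below builds a table from an in-memory vocabulary. It assumes TF 2.x with eager execution and uses tf.lookup.KeyValueTensorInitializer instead of a vocabulary file, so that it runs without any external data. The out-of-vocabulary ids are checked only against their documented range, since the exact values depend on the hash.

```python
import tensorflow as tf

# In-memory vocabulary: emerson -> 0, lake -> 1, palmer -> 2.
init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
num_oov_buckets = 3
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets)

keys = tf.constant(["emerson", "lake", "palmer", "king", "crimson"])
ids = table.lookup(keys).numpy().tolist()

known = ids[:3]  # in-vocabulary terms keep the ids from the initializer
oov = ids[3:]    # unknown terms land in [3, 3 + num_oov_buckets - 1]
print(known, oov)
print(int(table.size()))
```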