tf.lookup.StaticVocabularyTable

String to Id table that assigns out-of-vocabulary keys to hash buckets.

tf.lookup.StaticVocabularyTable(
    initializer, num_oov_buckets, lookup_key_dtype=None, name=None
)

For example, if an instance of StaticVocabularyTable is initialized with a string-to-id initializer that maps:

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(['emerson', 'lake', 'palmer']),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(
   init,
   num_oov_buckets=5)

The table then performs the following mapping:

emerson -> 0
lake -> 1
palmer -> 2
<other term> -> bucket_id, where bucket_id will be between 3 and 3 + num_oov_buckets - 1 = 7, calculated by: hash(<term>) % num_oov_buckets + vocab_size

For example, looking up the following input_tensor returns:

input_tensor = tf.constant(["emerson", "lake", "palmer",
                            "king", "crimson"])
table[input_tensor].numpy()
array([0, 1, 2, 6, 7])
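
The mapping rule described above can be sketched in plain Python. This is only an illustration of the `hash(<term>) % num_oov_buckets + vocab_size` arithmetic: the helper `lookup_id` is hypothetical, and it uses MD5 as a stand-in hash rather than Fingerprint64, so its out-of-vocabulary bucket ids will generally differ from the ones TensorFlow returns.

```python
import hashlib


def lookup_id(term, vocab, num_oov_buckets):
    """Return the vocabulary id of `term`, or an OOV bucket id.

    OOV ids fall in [len(vocab), len(vocab) + num_oov_buckets - 1].
    """
    if term in vocab:
        return vocab[term]
    # Stand-in hash (NOT Fingerprint64). Unlike Python's built-in hash(),
    # it is stable across interpreter runs.
    digest = hashlib.md5(term.encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:8], "little")
    return hashed % num_oov_buckets + len(vocab)


vocab = {"emerson": 0, "lake": 1, "palmer": 2}
terms = ["emerson", "lake", "palmer", "king", "crimson"]
ids = [lookup_id(term, vocab, num_oov_buckets=5) for term in terms]
```

The first three ids are the vocabulary ids 0, 1 and 2; the last two land somewhere between 3 and 7.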

If initializer is None, only out-of-vocabulary buckets are used.
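
A minimal sketch of the hash-only case, assuming eager execution (the TF 2.x default). With no initializer there is no vocabulary, so every key should land in a bucket between 0 and `num_oov_buckets - 1`. The variable names are illustrative.

```python
import tensorflow as tf

# No initializer: every key is treated as out-of-vocabulary and hashed.
oov_only_table = tf.lookup.StaticVocabularyTable(None, num_oov_buckets=4)

oov_ids = oov_only_table.lookup(
    tf.constant(["emerson", "lake", "palmer"])).numpy()
table_size = oov_only_table.size().numpy()
```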

Example usage:

import tempfile

num_oov_buckets = 3
vocab = ["emerson", "lake", "palmer", "crimson"]

# Write one vocabulary term per line; the line number becomes the id.
f = tempfile.NamedTemporaryFile(delete=False)
f.write('\n'.join(vocab).encode('utf-8'))
f.close()

init = tf.lookup.TextFileInitializer(
    f.name,
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets)
# "palmer" and "crimson" are in the vocabulary; the other keys hash to
# buckets 4 through 6.
table.lookup(tf.constant(["palmer", "crimson", "king",
                          "tarkus", "black", "moon"])).numpy()
array([2, 3, 5, 6, 6, 4])

The hash function used to generate out-of-vocabulary bucket IDs is Fingerprint64.
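
One way to check the bucket arithmetic is to compare the table's output with `tf.strings.to_hash_bucket_fast`, which is also documented as using Fingerprint64. The sketch below assumes the two share the same hashing behavior; treat the equality as an observation about the current implementation, not a guaranteed contract.

```python
import tensorflow as tf

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=5)

oov_keys = tf.constant(["king", "crimson"])
vocab_size = 3

# hash(<term>) % num_oov_buckets + vocab_size, computed by hand.
expected_ids = tf.strings.to_hash_bucket_fast(oov_keys, 5) + vocab_size
actual_ids = table.lookup(oov_keys)
```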

Args
`initializer`	A `TableInitializerBase` object that contains the data used to initialize the table. If None, then we only use out-of-vocab buckets.
`num_oov_buckets`	Number of buckets to use for out-of-vocabulary keys. Must be greater than zero.
`lookup_key_dtype`	Data type of keys passed to `lookup`. Defaults to `initializer.key_dtype` if `initializer` is specified, otherwise `tf.string`. Must be string or integer, and must be castable to `initializer.key_dtype`.
`name`	A name for the operation (optional).

Raises
`ValueError`	when `num_oov_buckets` is not positive.
`TypeError`	when `lookup_key_dtype` or `initializer.key_dtype` is not integer or string, or when `initializer.value_dtype` is not `int64`.
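
The two error conditions can be triggered as follows. This is a sketch assuming eager execution; the variable names are illustrative, and the exact error messages are not shown because they may vary between releases.

```python
import tensorflow as tf

keys = tf.constant(["emerson", "lake", "palmer"])

# num_oov_buckets must be positive, so 0 is rejected.
good_init = tf.lookup.KeyValueTensorInitializer(
    keys=keys, values=tf.constant([0, 1, 2], dtype=tf.int64))
try:
    tf.lookup.StaticVocabularyTable(good_init, num_oov_buckets=0)
    oov_error = None
except ValueError as err:
    oov_error = err

# Values must be int64, so an initializer with int32 values is rejected.
bad_init = tf.lookup.KeyValueTensorInitializer(
    keys=keys, values=tf.constant([0, 1, 2], dtype=tf.int32))
try:
    tf.lookup.StaticVocabularyTable(bad_init, num_oov_buckets=1)
    dtype_error = None
except TypeError as err:
    dtype_error = err
```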

Attributes
`key_dtype`	The table key dtype.
`name`	The name of the table.
`resource_handle`	Returns the resource handle associated with this Resource.
`value_dtype`	The table value dtype.

Methods

`lookup`

lookup(
    keys, name=None
)

Looks up keys in the table, outputs the corresponding values.

It assigns out-of-vocabulary keys to buckets based on their hashes.

Args
`keys`	Keys to look up. May be a `SparseTensor`, a `RaggedTensor`, or a dense `Tensor`.
`name`	Optional name for the op.

Returns
A `SparseTensor` if keys are sparse, a `RaggedTensor` if keys are ragged, otherwise a dense `Tensor`.

Raises
`TypeError`	when `keys` doesn't match the table key data type.
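
The sketch below illustrates how the return type follows the input type, assuming eager execution and a TensorFlow release whose `lookup` accepts ragged keys. The table and key values are illustrative.

```python
import tensorflow as tf

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=5)

# Sparse keys: only the stored values are looked up; indices and shape
# are carried over to the result.
sparse_keys = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]],
    values=tf.constant(["lake", "king"]),
    dense_shape=[2, 3])
sparse_ids = table.lookup(sparse_keys)

# Ragged keys: the result keeps the same row partitioning.
ragged_keys = tf.ragged.constant([["emerson", "king"], ["palmer"]])
ragged_ids = table.lookup(ragged_keys)
```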

`size`

size(
    name=None
)

Computes the number of elements in this table.
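
As a small illustration, and assuming the current implementation in which the reported size counts the out-of-vocabulary buckets as well as the vocabulary entries, a table with three keys and five buckets reports a size of eight.

```python
import tensorflow as tf

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=5)

# 3 vocabulary entries + 5 out-of-vocabulary buckets.
total_ids = table.size().numpy()
```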

`__getitem__`

__getitem__(
    keys
)

Looks up keys in a table, outputs the corresponding values.
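
Indexing the table is shorthand for calling `lookup`. A brief sketch, assuming eager execution:

```python
import tensorflow as tf

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=5)

keys = tf.constant(["lake", "tarkus"])
ids_via_index = table[keys].numpy()
ids_via_lookup = table.lookup(keys).numpy()
```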

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/lookup/StaticVocabularyTable