tf.compat.v1.keras.layers.experimental.preprocessing.CategoryEncoding

CategoryEncoding layer.

Inherits From: CategoryEncoding, PreprocessingLayer, Layer, Module

tf.compat.v1.keras.layers.experimental.preprocessing.CategoryEncoding(
    max_tokens=None, output_mode=BINARY, sparse=False, **kwargs
)

This layer provides options for condensing input data into denser representations. It accepts either integer values or strings as inputs, allows users to map those inputs into a contiguous integer space, and outputs either those integer values (one sample = 1D tensor of integer token indices) or a dense representation (one sample = 1D tensor of float values representing data about the sample's tokens).

If desired, the user can call this layer's adapt() method on a dataset. When this layer is adapted, it will analyze the dataset, determine the frequency of individual integer or string values, and create a 'vocabulary' from them. This vocabulary can have unlimited size or be capped, depending on the configuration options for this layer; if there are more unique values in the input than the maximum vocabulary size, the most frequent terms will be used to create the vocabulary.

Attributes
`max_elements`	The maximum size of the vocabulary for this layer. If None, there is no cap on the size of the vocabulary.
`output_mode`	Optional specification for the output of the layer. Values can be "int", "binary", "count" or "tf-idf", configuring the layer as follows: "int": Outputs integer indices, one integer index per split string token. "binary": Outputs a single int array per batch, of either vocab_size or max_elements size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item. "count": As "binary", but the int array contains a count of the number of times the token at that index appeared in the batch item. "tf-idf": As "binary", but the TF-IDF algorithm is applied to find the value in each token slot.
`output_sequence_length`	Only valid in INT mode. If set, the output will have its time dimension padded or truncated to exactly `output_sequence_length` values, resulting in a tensor of shape [batch_size, output_sequence_length] regardless of the input shape.
`pad_to_max_elements`	Only valid in "binary", "count", and "tf-idf" modes. If True, the output will have its feature axis padded to `max_elements` even if the number of unique values in the vocabulary is less than max_elements, resulting in a tensor of shape [batch_size, max_elements] regardless of vocabulary size. Defaults to False.

Methods

`adapt`

View source

adapt(
    data, reset_state=True
)

Fits the state of the preprocessing layer to the dataset.

Overrides the default adapt method to apply relevant preprocessing to the inputs before passing to the combiner.

Arguments
`data`	The data to train on. It can be passed either as a tf.data Dataset, or as a numpy array.
`reset_state`	Optional argument specifying whether to clear the state of the layer at the start of the call to `adapt`. This must be True for this layer, which does not support repeated calls to `adapt`.

Raises
`RuntimeError`	if the layer cannot be adapted at this time.

`set_num_elements`

View source

set_num_elements(
    num_elements
)

`set_tfidf_data`

View source

set_tfidf_data(
    tfidf_data
)

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/compat/v1/keras/layers/experimental/preprocessing/CategoryEncoding