tf.io.RaggedFeature
Configuration for passing a RaggedTensor input feature.
tf.io.RaggedFeature( dtype, value_key=None, partitions=(), row_splits_dtype=tf.dtypes.int32, validate=False )
value_key
specifies the feature key for a variable-length list of values; and partitions
specifies zero or more feature keys for partitioning those values into higher dimensions. Each element of partitions
must be one of the following:
tf.io.RaggedFeature.RowSplits(key: string)
tf.io.RaggedFeature.RowLengths(key: string)
tf.io.RaggedFeature.RowStarts(key: string)
tf.io.RaggedFeature.RowLimits(key: string)
tf.io.RaggedFeature.ValueRowIds(key: string)
-
tf.io.RaggedFeature.UniformRowLength(length: int)
.
Where key
is a feature key whose values are used to partition the values. Partitions are listed from outermost to innermost.
-
If
len(partitions) == 0
(the default), then:- A feature from a single
tf.Example
is parsed into a 1Dtf.Tensor
. - A feature from a batch of
tf.Example
s is parsed into a 2Dtf.RaggedTensor
, where the outer dimension is the batch dimension, and the inner (ragged) dimension is the feature length in each example.
- A feature from a single
-
If
len(partitions) == 1
, then:A feature from a single
tf.Example
is parsed into a 2Dtf.RaggedTensor
, where the values taken from thevalue_key
are separated into rows using the partition key.A feature from a batch of
tf.Example
s is parsed into a 3Dtf.RaggedTensor
, where the outer dimension is the batch dimension, the two inner dimensions are formed by separating thevalue_key
values from each example into rows using that example's partition key.
-
If
len(partitions) > 1
, then:A feature from a single
tf.Example
is parsed into atf.RaggedTensor
whose rank islen(partitions)+1
, and whose ragged_rank islen(partitions)
.A feature from a batch of
tf.Example
s is parsed into atf.RaggedTensor
whose rank islen(partitions)+2
and whose ragged_rank islen(partitions)+1
, where the outer dimension is the batch dimension.
There is one exception: if the final (i.e., innermost) element(s) of partitions
are UniformRowLength
s, then the values are simply reshaped (as a higher-dimensional tf.Tensor
), rather than being wrapped in a tf.RaggedTensor
.
Examples
import google.protobuf.text_format as pbtext example_batch = [ pbtext.Merge(r''' features { feature {key: "v" value {int64_list {value: [3, 1, 4, 1, 5, 9]} } } feature {key: "s1" value {int64_list {value: [0, 2, 3, 3, 6]} } } feature {key: "s2" value {int64_list {value: [0, 2, 3, 4]} } } }''', tf.train.Example()).SerializeToString(), pbtext.Merge(r''' features { feature {key: "v" value {int64_list {value: [2, 7, 1, 8, 2, 8, 1]} } } feature {key: "s1" value {int64_list {value: [0, 3, 4, 5, 7]} } } feature {key: "s2" value {int64_list {value: [0, 1, 1, 4]} } } }''', tf.train.Example()).SerializeToString()]
features = { # Zero partitions: returns 1D tf.Tensor for each Example. 'f1': tf.io.RaggedFeature(value_key="v", dtype=tf.int64), # One partition: returns 2D tf.RaggedTensor for each Example. 'f2': tf.io.RaggedFeature(value_key="v", dtype=tf.int64, partitions=[ tf.io.RaggedFeature.RowSplits("s1")]), # Two partitions: returns 3D tf.RaggedTensor for each Example. 'f3': tf.io.RaggedFeature(value_key="v", dtype=tf.int64, partitions=[ tf.io.RaggedFeature.RowSplits("s2"), tf.io.RaggedFeature.RowSplits("s1")]) }
feature_dict = tf.io.parse_single_example(example_batch[0], features) for (name, val) in sorted(feature_dict.items()): print('%s: %s' % (name, val)) f1: tf.Tensor([3 1 4 1 5 9], shape=(6,), dtype=int64) f2: <tf.RaggedTensor [[3, 1], [4], [], [1, 5, 9]]> f3: <tf.RaggedTensor [[[3, 1], [4]], [[]], [[1, 5, 9]]]>
feature_dict = tf.io.parse_example(example_batch, features) for (name, val) in sorted(feature_dict.items()): print('%s: %s' % (name, val)) f1: <tf.RaggedTensor [[3, 1, 4, 1, 5, 9], [2, 7, 1, 8, 2, 8, 1]]> f2: <tf.RaggedTensor [[[3, 1], [4], [], [1, 5, 9]], [[2, 7, 1], [8], [2], [8, 1]]]> f3: <tf.RaggedTensor [[[[3, 1], [4]], [[]], [[1, 5, 9]]], [[[2, 7, 1]], [], [[8], [2], [8, 1]]]]>
Fields:
-
dtype
: Data type of theRaggedTensor
. Must be one of:tf.dtypes.int64
,tf.dtypes.float32
,tf.dtypes.string
. -
value_key
: (Optional.) Key for aFeature
in the inputExample
, whose parsedTensor
will be the resultingRaggedTensor.flat_values
. If not specified, then it defaults to the key for thisRaggedFeature
. -
partitions
: (Optional.) A list of objects specifying the row-partitioning tensors (from outermost to innermost). Each entry in this list must be one of:tf.io.RaggedFeature.RowSplits(key: string)
tf.io.RaggedFeature.RowLengths(key: string)
tf.io.RaggedFeature.RowStarts(key: string)
tf.io.RaggedFeature.RowLimits(key: string)
tf.io.RaggedFeature.ValueRowIds(key: string)
-
tf.io.RaggedFeature.UniformRowLength(length: int)
. Wherekey
is a key for aFeature
in the inputExample
, whose parsedTensor
will be the resulting row-partitioning tensor.
-
row_splits_dtype
: (Optional.) Data type for the row-partitioning tensor(s). One ofint32
orint64
. Defaults toint32
. -
validate
: (Optional.) Boolean indicating whether or not to validate that the input values form a valid RaggedTensor. Defaults toFalse
.
Attributes | |
---|---|
dtype | |
value_key | |
partitions | |
row_splits_dtype | |
validate |
Child Classes
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/io/RaggedFeature