tensorflow::ops::FixedUnigramCandidateSampler::Attrs

#include <candidate_sampling_ops.h>

Optional attribute setters for FixedUnigramCandidateSampler.
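Each setter listed below returns an `Attrs` by value and is marked `TF_MUST_USE_RESULT`, which suggests that a setter hands back a modified copy rather than changing the object in place. The following is a minimal sketch of that pattern, not the TensorFlow type: `SketchAttrs` and `ExampleAttrs` are hypothetical names, only three of the eight fields are reproduced, and `[[nodiscard]]` stands in for `TF_MUST_USE_RESULT`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Attrs struct (assumption: copy-and-return setters).
struct SketchAttrs {
  float distortion_ = 1.0f;
  std::int64_t num_shards_ = 1;
  std::int64_t shard_ = 0;

  // Each setter copies *this, changes one field, and returns the copy.
  [[nodiscard]] SketchAttrs Distortion(float x) const {
    SketchAttrs ret = *this;
    ret.distortion_ = x;
    return ret;
  }
  [[nodiscard]] SketchAttrs NumShards(std::int64_t x) const {
    SketchAttrs ret = *this;
    ret.num_shards_ = x;
    return ret;
  }
  [[nodiscard]] SketchAttrs Shard(std::int64_t x) const {
    SketchAttrs ret = *this;
    ret.shard_ = x;
    return ret;
  }
};

// Chained calls accumulate settings because every call returns a new value.
inline SketchAttrs ExampleAttrs() {
  return SketchAttrs().Distortion(0.75f).NumShards(4).Shard(2);
}
```

Under this pattern, calling a setter and discarding the result has no effect, which is presumably why the return value is flagged as must-use.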

Summary

Public attributes
`distortion_ = 1.0f`	`float`
`num_reserved_ids_ = 0`	`int64`
`num_shards_ = 1`	`int64`
`seed2_ = 0`	`int64`
`seed_ = 0`	`int64`
`shard_ = 0`	`int64`
`unigrams_ = {}`	`gtl::ArraySlice< float >`
`vocab_file_ = ""`	`StringPiece`

Public functions
`Distortion(float x)`	`TF_MUST_USE_RESULT Attrs` The distortion is used to skew the unigram probability distribution.
`NumReservedIds(int64 x)`	`TF_MUST_USE_RESULT Attrs` Optionally, users can reserve IDs in the range [0, num_reserved_ids).
`NumShards(int64 x)`	`TF_MUST_USE_RESULT Attrs` A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism.
`Seed(int64 x)`	`TF_MUST_USE_RESULT Attrs` If either seed or seed2 is set to a non-zero value, the random number generator is seeded by the given seed.
`Seed2(int64 x)`	`TF_MUST_USE_RESULT Attrs` A second seed to avoid seed collision.
`Shard(int64 x)`	`TF_MUST_USE_RESULT Attrs` A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism.
`Unigrams(const gtl::ArraySlice< float > & x)`	`TF_MUST_USE_RESULT Attrs` A list of unigram counts or probabilities, one per ID in sequential order.
`VocabFile(StringPiece x)`	`TF_MUST_USE_RESULT Attrs` Each valid line in this file (which should have a CSV-like format) corresponds to a valid word ID.

Public attributes

distortion_

float tensorflow::ops::FixedUnigramCandidateSampler::Attrs::distortion_ = 1.0f

num_reserved_ids_

int64 tensorflow::ops::FixedUnigramCandidateSampler::Attrs::num_reserved_ids_ = 0

num_shards_

int64 tensorflow::ops::FixedUnigramCandidateSampler::Attrs::num_shards_ = 1

seed2_

int64 tensorflow::ops::FixedUnigramCandidateSampler::Attrs::seed2_ = 0

seed_

int64 tensorflow::ops::FixedUnigramCandidateSampler::Attrs::seed_ = 0

shard_

int64 tensorflow::ops::FixedUnigramCandidateSampler::Attrs::shard_ = 0

unigrams_

gtl::ArraySlice< float > tensorflow::ops::FixedUnigramCandidateSampler::Attrs::unigrams_ = {}

vocab_file_

StringPiece tensorflow::ops::FixedUnigramCandidateSampler::Attrs::vocab_file_ = ""

Public functions

Distortion

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::Distortion(
  float x
)

The distortion is used to skew the unigram probability distribution.

Each weight is raised to the power of the distortion before being added to the internal unigram distribution. As a result, distortion = 1.0 gives regular unigram sampling (as defined by the vocab file), and distortion = 0.0 gives a uniform distribution.

Defaults to 1
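The effect of the distortion can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the sampler's internal code: `DistortedProbabilities` is a hypothetical helper, and the final normalization step is added here only so the results can be read as probabilities.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: raises each weight to `distortion`, then normalizes
// so the returned values sum to 1 (normalization is an illustrative addition).
std::vector<double> DistortedProbabilities(const std::vector<double>& weights,
                                           double distortion) {
  std::vector<double> out;
  out.reserve(weights.size());
  double total = 0.0;
  for (double w : weights) {
    const double distorted = std::pow(w, distortion);
    out.push_back(distorted);
    total += distorted;
  }
  if (total > 0.0) {
    for (double& p : out) p /= total;
  }
  return out;
}
```

With weights {1, 3}, a distortion of 1.0 keeps the original 1:3 ratio, while a distortion of 0.0 turns every weight into 1 and so yields equal probabilities.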

NumReservedIds

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::NumReservedIds(
  int64 x
)

Optionally, users can reserve IDs in the range [0, num_reserved_ids).

One use case is reserving ID 0 for a special unknown-word token. Reserved IDs have a sampling probability of 0.

Defaults to 0
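One way to picture the resulting ID layout: reserved IDs occupy the front of the weight table with weight 0, and vocabulary entries follow from `num_reserved_ids` onward. The sketch below assumes exactly that layout; `WithReservedIds` is a hypothetical helper, not part of the API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a weight table whose first `num_reserved_ids`
// entries are 0 (never sampled), followed by the vocabulary weights.
std::vector<double> WithReservedIds(std::int64_t num_reserved_ids,
                                    const std::vector<double>& vocab_weights) {
  std::vector<double> table(static_cast<std::size_t>(num_reserved_ids), 0.0);
  table.insert(table.end(), vocab_weights.begin(), vocab_weights.end());
  return table;
}
```

For example, with one reserved ID and vocabulary weights {5, 3}, ID 0 carries weight 0 and the first vocabulary entry becomes ID 1.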

NumShards

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::NumShards(
  int64 x
)

A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism.

This parameter (together with 'shard') indicates the number of partitions that are being used in the overall computation.

Defaults to 1

Seed

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::Seed(
  int64 x
)

If either seed or seed2 is set to a non-zero value, the random number generator is seeded by the given seed.

Otherwise, it is seeded by a random seed.

Defaults to 0
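The seeding rule can be sketched with the standard library. This only illustrates the "non-zero means deterministic, otherwise random" behavior described above; `MakeGenerator` is a hypothetical helper, and `std::mt19937_64` is a stand-in, not the generator TensorFlow uses.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: deterministic when either seed is non-zero,
// otherwise seeded from a non-deterministic source.
std::mt19937_64 MakeGenerator(std::int64_t seed, std::int64_t seed2) {
  if (seed != 0 || seed2 != 0) {
    const auto s1 = static_cast<std::uint64_t>(seed);
    const auto s2 = static_cast<std::uint64_t>(seed2);
    // Mix both seeds so that (seed, seed2) pairs map to distinct sequences.
    std::seed_seq seq{static_cast<std::uint32_t>(s1),
                      static_cast<std::uint32_t>(s1 >> 32),
                      static_cast<std::uint32_t>(s2),
                      static_cast<std::uint32_t>(s2 >> 32)};
    return std::mt19937_64(seq);
  }
  std::random_device rd;
  return std::mt19937_64(rd());
}
```

Two generators built from the same non-zero pair produce the same sequence, which is what makes a sampling run reproducible.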

Seed2

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::Seed2(
  int64 x
)

A second seed to avoid seed collision.

Defaults to 0

Shard

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::Shard(
  int64 x
)

A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism.

This parameter (together with 'num_shards') indicates the particular partition number of a sampler op, when partitioning is being used.

Defaults to 0
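The text above does not say how IDs are assigned to partitions. As an illustration only, the sketch below assumes a modulo scheme in which partition `shard` owns every ID with `id % num_shards == shard`; `IdsForShard` and its `total_ids` parameter are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, assuming modulo-based partitioning (an assumption,
// not stated in this reference): lists the IDs owned by one partition.
std::vector<std::int64_t> IdsForShard(std::int64_t total_ids,
                                      std::int64_t num_shards,
                                      std::int64_t shard) {
  std::vector<std::int64_t> ids;
  for (std::int64_t id = 0; id < total_ids; ++id) {
    if (id % num_shards == shard) ids.push_back(id);
  }
  return ids;
}
```

With the defaults (`num_shards` = 1, `shard` = 0) a single sampler covers every ID; with four partitions over ten IDs, partition 2 would cover IDs 2 and 6 under this assumed scheme.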

Unigrams

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::Unigrams(
  const gtl::ArraySlice< float > & x
)

A list of unigram counts or probabilities, one per ID in sequential order.

Exactly one of vocab_file and unigrams should be passed to this op.

Defaults to []

VocabFile

TF_MUST_USE_RESULT Attrs tensorflow::ops::FixedUnigramCandidateSampler::Attrs::VocabFile(
  StringPiece x
)

Each valid line in this file (which should have a CSV-like format) corresponds to a valid word ID.

IDs are in sequential order, starting from num_reserved_ids. The last entry in each line is expected to be a value corresponding to the count or relative probability. Exactly one of vocab_file and unigrams needs to be passed to this op.

Defaults to ""
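Reading the weight from one line of such a file can be sketched as follows. The comma delimiter is an assumption drawn from the "CSV-like" description, and `LastFieldAsWeight` is a hypothetical helper; the op's actual parsing rules (whitespace handling, quoting, error reporting) may differ.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical helper: returns the last comma-separated field of `line`
// parsed as a float, or std::nullopt if the line has no usable weight.
std::optional<float> LastFieldAsWeight(const std::string& line) {
  if (line.empty()) return std::nullopt;
  const std::size_t pos = line.rfind(',');
  const std::string field =
      (pos == std::string::npos) ? line : line.substr(pos + 1);
  try {
    std::size_t used = 0;
    const float weight = std::stof(field, &used);
    // Reject fields with trailing characters, such as "12abc" or "12 ".
    if (used != field.size()) return std::nullopt;
    return weight;
  } catch (const std::exception&) {
    // std::stof throws on empty, non-numeric, or out-of-range input.
    return std::nullopt;
  }
}
```

A line such as `the,1061396` would contribute a weight of 1061396 to the next sequential ID, while a line whose last field is not numeric yields no weight in this sketch.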

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 4.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/cc/struct/tensorflow/ops/fixed-unigram-candidate-sampler/attrs