tf.compat.v1.tpu.experimental.AdamParameters

Optimization parameters for Adam with TPU embeddings.

tf.compat.v1.tpu.experimental.AdamParameters(
    learning_rate: float,
    beta1: float = 0.9,
    beta2: float = 0.999,
    epsilon: float = 1e-08,
    lazy_adam: bool = True,
    sum_inside_sqrt: bool = True,
    use_gradient_accumulation: bool = True,
    clip_weight_min: Optional[float] = None,
    clip_weight_max: Optional[float] = None,
    weight_decay_factor: Optional[float] = None,
    multiply_weight_decay_factor_by_learning_rate: Optional[bool] = None,
    clip_gradient_min: Optional[float] = None,
    clip_gradient_max: Optional[float] = None
)

Pass this to tf.estimator.tpu.experimental.EmbeddingConfigSpec via the optimization_parameters argument to set the optimizer and its parameters. See the documentation for tf.estimator.tpu.experimental.EmbeddingConfigSpec for more details.

estimator = tf.estimator.tpu.TPUEstimator(
    ...
    embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec(
        ...
        optimization_parameters=tf.tpu.experimental.AdamParameters(0.1),
        ...))

Args
`learning_rate`	a floating point value. The learning rate.
`beta1`	A float value. The exponential decay rate for the 1st moment estimates.
`beta2`	A float value. The exponential decay rate for the 2nd moment estimates.
`epsilon`	A small constant for numerical stability.
`lazy_adam`	Use lazy Adam instead of Adam. Lazy Adam trains faster. Please see `optimization_parameters.proto` for details.
`sum_inside_sqrt`	This improves training speed. Please see `optimization_parameters.proto` for details.
`use_gradient_accumulation`	setting this to `False` makes embedding gradients calculation less accurate but faster. Please see `optimization_parameters.proto` for details. for details.
`clip_weight_min`	the minimum value to clip by; None means -infinity.
`clip_weight_max`	the maximum value to clip by; None means +infinity.
`weight_decay_factor`	amount of weight decay to apply; None means that the weights are not decayed.
`multiply_weight_decay_factor_by_learning_rate`	if true, `weight_decay_factor` is multiplied by the current learning rate.
`clip_gradient_min`	the minimum value to clip by; None means -infinity.
`clip_gradient_max`	the maximum value to clip by; None means +infinity.

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/compat/v1/tpu/experimental/AdamParameters