tf.keras.optimizers.Nadam

Optimizer that implements the NAdam algorithm.

Inherits From: Optimizer

Compat aliases for migration

`tf.compat.v1.keras.optimizers.Nadam`, `tf.compat.v2.keras.optimizers.Nadam`, `tf.compat.v2.optimizers.Nadam`

tf.keras.optimizers.Nadam(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, name='Nadam',
    **kwargs
)

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

Initialization:

$$m_0 := 0 \text{ (Initialize 1st moment vector)}$$

$$v_0 := 0 \text{ (Initialize 2nd moment vector)}$$

$$\mu_0 := 1$$

$$t := 0 \text{ (Initialize timestep)}$$

Computes:

$$t := t + 1$$

$$\mu_t := \beta_1 * (1 - 0.5 * 0.96^{0.004 * t})$$

$$g' := g / (1 - \prod_{i=1}^{t}{\mu_i})$$

$$m_t := \beta_1 * m_{t-1} + (1 - \beta_1) * g$$

$$m' := m_t / (1 - \prod_{i=1}^{t+1}{\mu_i})$$

$$v_t := \beta_2 * v_{t-1} + (1 - \beta_2) * g * g$$

$$v' := v_t / (1 - \beta_2^t)$$

$$\bar{m} := (1 - \mu_t) * g' + \mu_{t+1} * m'$$

$$\theta_t := \theta_{t-1} - lr * \bar{m} / (\sqrt{v'} + \epsilon)$$

The gradient is evaluated at `theta(t) + momentum * v(t)`, and the variables always store `theta + beta_1 * m / sqrt(v)` instead of `theta`.
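
The update equations above can be sketched in plain Python for a single scalar parameter. This is a minimal illustration that follows the formulas as written on this page, not the TensorFlow implementation; the helper names `momentum_schedule` and `nadam_minimize` and the quadratic test function are assumptions introduced for the example.

```python
import math


def momentum_schedule(t, beta_1=0.9):
    """mu_t as written above: beta_1 * (1 - 0.5 * 0.96^(0.004 * t))."""
    return beta_1 * (1.0 - 0.5 * 0.96 ** (0.004 * t))


def nadam_minimize(grad_fn, theta, steps, lr=0.001, beta_1=0.9,
                   beta_2=0.999, epsilon=1e-7):
    """Hypothetical helper: run `steps` Nadam updates on a scalar `theta`."""
    m = 0.0           # 1st moment, m_0 := 0
    v = 0.0           # 2nd moment, v_0 := 0
    mu_product = 1.0  # running product of mu_1 .. mu_t
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        mu_t = momentum_schedule(t, beta_1)
        mu_next = momentum_schedule(t + 1, beta_1)
        mu_product *= mu_t                        # prod_{i=1}^{t} mu_i
        mu_product_next = mu_product * mu_next    # prod_{i=1}^{t+1} mu_i
        g_prime = g / (1.0 - mu_product)
        m = beta_1 * m + (1.0 - beta_1) * g
        m_prime = m / (1.0 - mu_product_next)
        v = beta_2 * v + (1.0 - beta_2) * g * g
        v_prime = v / (1.0 - beta_2 ** t)
        m_bar = (1.0 - mu_t) * g_prime + mu_next * m_prime
        theta = theta - lr * m_bar / (math.sqrt(v_prime) + epsilon)
    return theta


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3), starting from x = 0.
result = nadam_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0, steps=500, lr=0.1)
```

With these settings `result` should end up close to the minimizer at 3. The larger-than-default `lr=0.1` is chosen only so the toy problem converges in a few hundred steps.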

References: See Dozat, T., 2015.

Args
`learning_rate`	A Tensor or a floating point value. The learning rate.
`beta_1`	A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
`beta_2`	A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
`epsilon`	A small constant for numerical stability.
`name`	Optional name for the operations created when applying gradients. Defaults to "Nadam".
`**kwargs`	Keyword arguments. Allowed keys are {`clipnorm`, `clipvalue`, `lr`, `decay`}. `clipnorm` clips gradients by norm; `clipvalue` clips gradients by value; `decay` is included for backward compatibility to allow time inverse decay of the learning rate; `lr` is included for backward compatibility, and `learning_rate` is recommended instead.
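
The effect of the `clipnorm`, `clipvalue`, and `decay` keyword arguments can be illustrated without TensorFlow. The sketch below is plain Python with hypothetical helper names. It assumes that clipping by norm rescales a gradient whose L2 norm exceeds the threshold, that clipping by value clamps each element to `[-clipvalue, clipvalue]`, and that time inverse decay means `learning_rate / (1 + decay * iterations)`; it is not the library code.

```python
import math


def clip_by_value(grad, clipvalue):
    """Clamp every gradient element to the range [-clipvalue, clipvalue]."""
    return [max(-clipvalue, min(clipvalue, g)) for g in grad]


def clip_by_norm(grad, clipnorm):
    """Rescale the gradient so that its L2 norm is at most clipnorm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clipnorm:
        return list(grad)
    return [g * clipnorm / norm for g in grad]


def inverse_time_decayed_lr(learning_rate, decay, iterations):
    """Assumed form of time inverse decay of the learning rate."""
    return learning_rate / (1.0 + decay * iterations)


grad = [3.0, -4.0]                   # L2 norm is 5
by_value = clip_by_value(grad, 1.0)  # each element clamped independently
by_norm = clip_by_norm(grad, 1.0)    # direction kept, norm reduced to 1
lr_after_100 = inverse_time_decayed_lr(0.001, decay=0.01, iterations=100)
```

Clipping by value can change the direction of the gradient, while clipping by norm preserves it.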

Attributes
`iterations`	Variable. The number of training steps this Optimizer has run.
`weights`	Returns variables of this Optimizer based on the order created.

Methods

`add_slot`

add_slot(
    var, slot_name, initializer='zeros'
)

Add a new slot variable for var.

`add_weight`

add_weight(
    name, shape, dtype=None, initializer='zeros', trainable=None,
    synchronization=tf.VariableSynchronization.AUTO,
    aggregation=tf.VariableAggregation.NONE
)

`apply_gradients`

apply_gradients(
    grads_and_vars, name=None
)

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

Args
`grads_and_vars`	List of (gradient, variable) pairs.
`name`	Optional name for the returned operation. Defaults to the name passed to the `Optimizer` constructor.

Returns
An `Operation` that applies the specified gradients. The `iterations` will be automatically increased by 1.

Raises
`TypeError`	If `grads_and_vars` is malformed.
`ValueError`	If none of the variables have gradients.
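
A minimal sketch of a single `apply_gradients` step follows. It assumes eager execution (the default in TensorFlow 2.x; under TensorFlow 1.x it would have to be enabled first), and the variable and loss are made up for illustration.

```python
import tensorflow as tf

# Assumption: eager execution is active, so values can be read with .numpy().
var = tf.Variable(5.0)
opt = tf.keras.optimizers.Nadam(learning_rate=0.1)

# Record the forward pass so the gradient of the loss can be computed.
with tf.GradientTape() as tape:
    loss = (var - 2.0) ** 2

grads = tape.gradient(loss, [var])

# grads_and_vars is a list of (gradient, variable) pairs.
opt.apply_gradients(list(zip(grads, [var])))

updated = float(var.numpy())
steps_run = int(opt.iterations.numpy())
```

After the call, `var` has moved toward the minimizer at 2 and `iterations` has been incremented once.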

`from_config`

@classmethod
from_config(
    config, custom_objects=None
)

Creates an optimizer from its config.

This method is the reverse of get_config, capable of instantiating the same optimizer from the config dictionary.

Arguments
`config`	A Python dictionary, typically the output of get_config.
`custom_objects`	A Python dictionary mapping names to additional Python objects used to create this optimizer, such as a function used for a hyperparameter.

Returns
An optimizer instance.

`get_config`

get_config()

Returns the config of the optimizer.

An optimizer config is a Python dictionary (serializable) containing the configuration of an optimizer. The same optimizer can be reinstantiated later (without any saved state) from this configuration.

Returns
Python dictionary.
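
As a sketch of the round trip between `get_config` and `from_config`, the following rebuilds an optimizer with the same hyperparameters. The hyperparameter values are arbitrary, and the exact set of keys in the dictionary may differ between TensorFlow versions.

```python
import tensorflow as tf

opt = tf.keras.optimizers.Nadam(learning_rate=0.002, beta_1=0.85)

# Serialize the hyperparameters to a plain Python dictionary.
config = opt.get_config()

# Rebuild an optimizer with the same hyperparameters but no saved state.
restored = tf.keras.optimizers.Nadam.from_config(config)
restored_config = restored.get_config()
```

Only the configuration is carried over; slot variables and the iteration count are not part of the config.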

`get_gradients`

get_gradients(
    loss, params
)

Returns gradients of loss with respect to params.

Arguments
`loss`	Loss tensor.
`params`	List of variables.

Returns
List of gradient tensors.

Raises
`ValueError`	In case any gradient cannot be computed (e.g. if gradient function not implemented).

`get_slot`

get_slot(
    var, slot_name
)

`get_slot_names`

get_slot_names()

A list of names for this optimizer's slots.

`get_updates`

get_updates(
    loss, params
)

`get_weights`

get_weights()

`minimize`

minimize(
    loss, var_list, grad_loss=None, name=None
)

Minimize loss by updating var_list.

This method simply computes the gradient using `tf.GradientTape` and calls `apply_gradients()`. To process the gradient before applying it, call `tf.GradientTape` and `apply_gradients()` explicitly instead of using this function.

Args
`loss`	A callable taking no arguments which returns the value to minimize.
`var_list`	list or tuple of `Variable` objects to update to minimize `loss`, or a callable returning the list or tuple of `Variable` objects. Use callable when the variable list would otherwise be incomplete before `minimize` since the variables are created at the first time `loss` is called.
`grad_loss`	Optional. A `Tensor` holding the gradient computed for `loss`.
`name`	Optional name for the returned operation.

Returns
An `Operation` that updates the variables in `var_list`. Because `minimize` calls `apply_gradients()`, `iterations` is also increased by 1.

Raises
`ValueError`	If some of the variables are not `Variable` objects.
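
When the gradient needs processing before it is applied, the two steps can be written out explicitly instead of calling `minimize`. The sketch below assumes eager execution and uses `tf.clip_by_norm` as an example of such processing; the variable, loss, and clipping threshold are illustrative only.

```python
import tensorflow as tf

# Assumption: eager execution is active.
var = tf.Variable([3.0, -4.0])
opt = tf.keras.optimizers.Nadam(learning_rate=0.01)

# Step 1: compute the gradient of the loss with tf.GradientTape.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(var ** 2)
(grad,) = tape.gradient(loss, [var])

# Step 2: process the gradient (here, clip its L2 norm to at most 1.0).
clipped = tf.clip_by_norm(grad, 1.0)

# Step 3: apply the processed gradient.
opt.apply_gradients([(clipped, var)])

clipped_norm = float(tf.norm(clipped).numpy())
steps_run = int(opt.iterations.numpy())
```

The raw gradient here is `[6.0, -8.0]` with norm 10, so the clipped gradient keeps the same direction with norm 1.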

`set_weights`

set_weights(
    weights
)

`variables`

variables()

Returns variables of this Optimizer based on the order created.

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras/optimizers/Nadam