tf.keras.mixed_precision.experimental.LossScaleOptimizer
A deprecated optimizer that applies loss scaling.
Inherits From: LossScaleOptimizer, Optimizer
tf.keras.mixed_precision.experimental.LossScaleOptimizer( optimizer, loss_scale )
This class is identical to the non-experimental keras.mixed_precision.LossScaleOptimizer except that its constructor takes different arguments. For this class (the experimental version), the constructor takes a loss_scale argument. For the non-experimental class, the constructor encodes the loss scaling information in multiple arguments. Note that unlike this class, the non-experimental class does not accept a tf.compat.v1.mixed_precision.LossScale, which is deprecated.
If you currently use this class, you should switch to the non-experimental tf.keras.mixed_precision.LossScaleOptimizer instead. Below are several examples of converting the use of the experimental class to the equivalent non-experimental class.
# In all of the examples below, `opt1` and `opt2` are identical
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD())
assert opt1.get_config() == opt2.get_config()
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale=123)
# dynamic=False indicates to use fixed loss scaling. initial_scale=123
# refers to the initial loss scale, which is the single fixed loss scale
# when dynamic=False.
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), dynamic=False, initial_scale=123)
assert opt1.get_config() == opt2.get_config()
loss_scale = tf.compat.v1.mixed_precision.experimental.DynamicLossScale(
    initial_loss_scale=2048, increment_period=500)
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale=loss_scale)
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), initial_scale=2048, dynamic_growth_steps=500)
assert opt1.get_config() == opt2.get_config()
Make sure to also switch from this class to the non-experimental class in isinstance checks, if you have any. If you do not do this, your model may run into hard-to-debug issues, as the experimental LossScaleOptimizer subclasses the non-experimental LossScaleOptimizer, but not vice versa. It is safe to switch isinstance checks to the non-experimental LossScaleOptimizer even before using the non-experimental LossScaleOptimizer.
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
# The experimental class subclasses the non-experimental class
isinstance(opt1, tf.keras.mixed_precision.LossScaleOptimizer)
# True
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD())
# The non-experimental class does NOT subclass the experimental class.
isinstance(opt2, tf.keras.mixed_precision.experimental.LossScaleOptimizer)
# False
Args | |
---|---|
optimizer | The Optimizer instance to wrap. |
loss_scale | The loss scale to scale the loss and gradients. This can either be an int/float to use a fixed loss scale, the string "dynamic" to use dynamic loss scaling, or an instance of a LossScale. The string "dynamic" is equivalent to passing DynamicLossScale(), and passing an int/float is equivalent to passing a FixedLossScale with the given loss scale. If a DynamicLossScale is passed, DynamicLossScale.multiplier must be 2 (the default). |
Raises | |
---|---|
ValueError | in case of any invalid argument. |
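As a hedged illustration of the loss_scale constraint described above (the exact error message is not documented on this page), wrapping an optimizer with a DynamicLossScale whose multiplier is not 2 is expected to be rejected with a ValueError:

# Sketch only: DynamicLossScale itself accepts other multipliers, but this
# wrapper requires the multiplier to be 2, so construction should fail.
bad_scale = tf.compat.v1.mixed_precision.experimental.DynamicLossScale(
    multiplier=4)
try:
  tf.keras.mixed_precision.experimental.LossScaleOptimizer(
      tf.keras.optimizers.SGD(), loss_scale=bad_scale)
except ValueError as e:
  print('Invalid argument:', e)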
Attributes | |
---|---|
dynamic | Bool indicating whether dynamic loss scaling is used. |
dynamic_counter | The number of steps since the loss scale was last increased or decreased. This is None if LossScaleOptimizer.dynamic is False. The counter is incremented every step. Once it reaches LossScaleOptimizer.dynamic_growth_steps, the loss scale will be doubled and the counter will be reset back to zero. If Inf or NaN gradients are encountered, the loss scale will be halved and the counter will also be reset back to zero. |
dynamic_growth_steps | The number of steps it takes to increase the loss scale. This is None if LossScaleOptimizer.dynamic is False. Every dynamic_growth_steps consecutive steps with finite gradients, the loss scale is increased. |
initial_scale | The initial loss scale. If LossScaleOptimizer.dynamic is False, this is the same number as LossScaleOptimizer.loss_scale, as the loss scale never changes. |
inner_optimizer | The optimizer that this LossScaleOptimizer is wrapping. |
loss_scale | The current loss scale as a float32 scalar tensor. |
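For illustration, a minimal sketch of reading these attributes from a freshly constructed optimizer; the values noted in the comments assume the defaults of dynamic loss scaling and are not guaranteed by this page:

opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
print(opt.dynamic)                # True, since dynamic loss scaling was requested
print(opt.initial_scale)          # the starting loss scale (assumed default: 2 ** 15)
print(opt.dynamic_growth_steps)   # steps of finite gradients before the scale grows
print(opt.dynamic_counter)        # steps since the scale last changed (0 at creation)
print(opt.loss_scale)             # current loss scale as a float32 scalar tensor
print(opt.inner_optimizer)        # the wrapped SGD instance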
Methods
get_scaled_loss
get_scaled_loss( loss )
Scales the loss by the loss scale.
This method is only needed if you compute gradients manually, e.g. with tf.GradientTape. In that case, call this method to scale the loss before passing the loss to tf.GradientTape. If you use LossScaleOptimizer.minimize or LossScaleOptimizer.get_gradients, loss scaling is automatically applied and this method is unneeded.
If this method is called, get_unscaled_gradients should also be called. See the tf.keras.mixed_precision.LossScaleOptimizer doc for an example, and the sketch below.
Args | |
---|---|
loss | The loss, which will be multiplied by the loss scale. Can either be a tensor or a callable returning a tensor. |
Returns | |
---|---|
loss multiplied by LossScaleOptimizer.loss_scale. |
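A minimal sketch of the scaling step, assuming a user-defined loss_fn and a list of trainable variables named vars (both are placeholder names, not part of this API):

opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
with tf.GradientTape() as tape:
  loss = loss_fn()                          # compute the loss as usual
  scaled_loss = opt.get_scaled_loss(loss)   # multiply by the current loss scale
scaled_grads = tape.gradient(scaled_loss, vars)  # these gradients are scaled too

The resulting gradients are still scaled; pass them to get_unscaled_gradients (below) before applying them.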
get_unscaled_gradients
get_unscaled_gradients( grads )
Unscales the gradients by the loss scale.
This method is only needed if you compute gradients manually, e.g. with tf.GradientTape. In that case, call this method to unscale the gradients after computing them with tf.GradientTape. If you use LossScaleOptimizer.minimize or LossScaleOptimizer.get_gradients, loss scaling is automatically applied and this method is unneeded.
If this method is called, get_scaled_loss should also be called. See the tf.keras.mixed_precision.LossScaleOptimizer doc for an example, and the sketch below.
Args | |
---|---|
grads | A list of tensors, each which will be divided by the loss scale. Can have None values, which are ignored. |
Returns | |
---|---|
A new list the same size as grads, where every non-None value in grads is divided by LossScaleOptimizer.loss_scale. |
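Putting both methods together, a sketch of one manual training step under the same assumptions as above (loss_fn and vars are placeholder names):

opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
with tf.GradientTape() as tape:
  loss = loss_fn()
  scaled_loss = opt.get_scaled_loss(loss)
scaled_grads = tape.gradient(scaled_loss, vars)
grads = opt.get_unscaled_gradients(scaled_grads)  # divide by the loss scale
opt.apply_gradients(zip(grads, vars))             # also updates a dynamic loss scale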
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/keras/mixed_precision/experimental/LossScaleOptimizer