tf.train.CheckpointManager
Manages multiple checkpoints by keeping some and deleting unneeded ones.
tf.train.CheckpointManager(
    checkpoint, directory, max_to_keep,
    keep_checkpoint_every_n_hours=None,
    checkpoint_name='ckpt',
    step_counter=None,
    checkpoint_interval=None,
    init_fn=None
)
Example usage:
import tensorflow as tf

checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
    # train
    manager.save()
CheckpointManager preserves its own state across instantiations (see the __init__ documentation for details). Only one should be active in a particular directory at a time.
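For example, the following is a minimal sketch (the model, optimizer, and directory are illustrative placeholders, not part of the API) of how a newly constructed manager picks up the state recorded by an earlier one in the same directory:

import tensorflow as tf

# Hypothetical stand-ins for a real model and optimizer.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam()
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)

# A fresh manager in the same directory reads the existing "checkpoint"
# state file, so latest_checkpoint reflects checkpoints written by
# earlier instantiations rather than starting from scratch.
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
print(manager.latest_checkpoint)  # e.g. "/tmp/model/ckpt-42", or None if the directory is empty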
Args | |
---|---|
checkpoint | The tf.train.Checkpoint instance to save and manage checkpoints for. |
directory | The path to a directory in which to write checkpoints. A special file named "checkpoint" is also written to this directory (in a human-readable text format) which contains the state of the CheckpointManager . |
max_to_keep | An integer, the number of checkpoints to keep. Unless preserved by keep_checkpoint_every_n_hours , checkpoints will be deleted from the active set, oldest first, until only max_to_keep checkpoints remain. If None , no checkpoints are deleted and everything stays in the active set. Note that max_to_keep=None will keep all checkpoint paths in memory and in the checkpoint state protocol buffer on disk. |
keep_checkpoint_every_n_hours | Upon removal from the active set, a checkpoint will be preserved if at least keep_checkpoint_every_n_hours hours have passed since the last preserved checkpoint. The default setting of None does not preserve any checkpoints in this way. |
checkpoint_name | Custom name for the checkpoint file. |
step_counter | A tf.Variable instance for checking the current step counter value, in case users want to save checkpoints every N steps. |
checkpoint_interval | An integer specifying the minimum step interval between two checkpoints. |
init_fn | Callable. A function to do customized initialization if no checkpoints are found in the directory. |
Raises | |
---|---|
ValueError | If max_to_keep is not a positive integer. |
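A hedged sketch of how step_counter, checkpoint_interval, keep_checkpoint_every_n_hours, and init_fn from the table above fit together; the model, optimizer, and numeric values are placeholders, not prescribed by the API:

import tensorflow as tf

# Illustrative setup; names are placeholders.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD()
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
step = tf.Variable(0, dtype=tf.int64)

manager = tf.train.CheckpointManager(
    checkpoint,
    directory="/tmp/model",
    max_to_keep=3,                    # keep the three most recent checkpoints
    keep_checkpoint_every_n_hours=6,  # additionally preserve one roughly every 6 hours
    step_counter=step,                # required when checkpoint_interval is used
    checkpoint_interval=1000,         # save at most once per 1000 steps
    init_fn=lambda: print("no checkpoint found; using fresh initialization"))

for _ in range(10000):
    step.assign_add(1)
    # ... training step ...
    manager.save(check_interval=True)  # only writes when the step interval has elapsed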
Attributes | |
---|---|
checkpoint | Returns the tf.train.Checkpoint object. |
checkpoint_interval | The minimum step interval between checkpoints, as passed to the constructor (None if not set). |
checkpoints | A list of managed checkpoints. Note that checkpoints saved due to keep_checkpoint_every_n_hours will not show up in this list (to avoid ever-growing filename lists). |
directory | The directory in which checkpoints are written, as passed to the constructor. |
latest_checkpoint | The prefix of the most recent checkpoint in directory. Equivalent to tf.train.latest_checkpoint(directory) where directory is the constructor argument to CheckpointManager. Suitable for passing to tf.train.Checkpoint.restore to resume training. |
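A short, hedged example of reading these properties, assuming a manager constructed as in the earlier sketches:

print(manager.directory)           # "/tmp/model"
print(manager.checkpoint_interval) # e.g. 1000, or None if not configured
print(manager.checkpoints)         # e.g. ["/tmp/model/ckpt-8", "/tmp/model/ckpt-9", "/tmp/model/ckpt-10"]
print(manager.latest_checkpoint)   # e.g. "/tmp/model/ckpt-10", or None if nothing has been saved

# latest_checkpoint can be passed straight to restore:
status = manager.checkpoint.restore(manager.latest_checkpoint)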
Methods
restore_or_initialize
restore_or_initialize()
Restore items in checkpoint from the latest checkpoint file.

This method will first try to restore from the most recent checkpoint in directory. If no checkpoints exist in directory and init_fn is specified, this method will call init_fn to do customized initialization. This can be used to support initialization from pretrained models.

Note that unlike tf.train.Checkpoint.restore(), this method doesn't return a load status object that users can run assertions on (e.g. assert_consumed()). To run such assertions, use the tf.train.Checkpoint.restore() method directly.
Returns | |
---|---|
The restored checkpoint path if the latest checkpoint is found and restored. Otherwise None. |
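A hedged sketch of a typical startup path using restore_or_initialize, where init_from_pretrained is a placeholder for project-specific initialization (for example, loading pretrained weights):

def init_from_pretrained():
    # Placeholder: load pretrained weights, reset counters, etc.
    pass

manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5,
    init_fn=init_from_pretrained)

restored_path = manager.restore_or_initialize()
if restored_path is None:
    print("No checkpoint found; init_fn was used for initialization.")
else:
    print("Restored from", restored_path)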
save
save(checkpoint_number=None, check_interval=True)
Creates a new checkpoint and manages it.
Args | |
---|---|
checkpoint_number | An optional integer, or an integer-dtype Variable or Tensor , used to number the checkpoint. If None (default), checkpoints are numbered using checkpoint.save_counter . Even if checkpoint_number is provided, save_counter is still incremented. A user-provided checkpoint_number is not incremented even if it is a Variable . |
check_interval | An optional boolean. The argument is only effective when checkpoint_interval is passed into the manager. If True , the manager will only save the checkpoint if the interval between checkpoints is larger than checkpoint_interval . Otherwise it will always save the checkpoint unless a checkpoint has already been saved for the current step. |
Returns | |
---|---|
The path to the new checkpoint. It is also recorded in the checkpoints and latest_checkpoint properties. None if no checkpoint is saved. |
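A hedged sketch of the numbering and interval behavior; the paths shown in comments are illustrative, not guaranteed:

# Default numbering: uses checkpoint.save_counter and increments it.
path = manager.save()
print(path)  # e.g. "/tmp/model/ckpt-1"

# Explicit numbering: the given number is used as-is; save_counter still increments.
path = manager.save(checkpoint_number=100)
print(path)  # e.g. "/tmp/model/ckpt-100"

# With step_counter and checkpoint_interval configured, check_interval=True may
# skip the write and return None if too few steps have passed since the last save.
path = manager.save(check_interval=True)

# check_interval=False forces a save unless one already exists for the current step.
path = manager.save(check_interval=False)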
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/train/CheckpointManager