Class CheckpointManager

Aliases:
- Class tf.contrib.checkpoint.CheckpointManager
- Class tf.train.CheckpointManager

Defined in tensorflow/python/training/checkpoint_management.py.
Deletes old checkpoints.
Example usage:

import tensorflow as tf
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.contrib.checkpoint.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
    # train
    manager.save()
CheckpointManager preserves its own state across instantiations (see the __init__ documentation for details). Only one should be active in a particular directory at a time.
__init__

__init__(
    checkpoint,
    directory,
    max_to_keep,
    keep_checkpoint_every_n_hours=None
)
Configure a CheckpointManager for use in directory.

If a CheckpointManager was previously used in directory, its state will be restored. This includes the list of managed checkpoints and the timestamp bookkeeping necessary to support keep_checkpoint_every_n_hours. The behavior of the new CheckpointManager will be the same as the previous CheckpointManager, including cleaning up existing checkpoints if appropriate.
Checkpoints are only considered for deletion just after a new checkpoint has been added. At that point, max_to_keep checkpoints will remain in an "active set". Once a checkpoint is preserved by keep_checkpoint_every_n_hours it will not be deleted by this CheckpointManager or any future CheckpointManager instantiated in directory (regardless of the new setting of keep_checkpoint_every_n_hours). The max_to_keep checkpoints in the active set may be deleted by this CheckpointManager or a future CheckpointManager instantiated in directory (subject to its max_to_keep and keep_checkpoint_every_n_hours settings).
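The retention policy described above can be sketched in plain Python. This is a simplified model for illustration, not TensorFlow's actual implementation; the retain function and its hour-valued timestamps are assumptions made for the sketch:

```python
# Simplified model of the CheckpointManager retention policy.
# Not TensorFlow's implementation; timestamps are plain floats in hours.

def retain(checkpoints, max_to_keep, keep_every_n_hours=None):
    """checkpoints: list of (path, timestamp_hours), oldest first.

    Returns (active_set, preserved) after applying the policy once.
    """
    active = list(checkpoints)
    preserved = []          # kept forever, outside the active set
    last_preserved_time = None
    # With max_to_keep=None nothing is ever evicted.
    while max_to_keep is not None and len(active) > max_to_keep:
        path, ts = active.pop(0)  # evict oldest first
        if keep_every_n_hours is not None and (
                last_preserved_time is None
                or ts - last_preserved_time >= keep_every_n_hours):
            preserved.append((path, ts))
            last_preserved_time = ts
        # otherwise the evicted checkpoint is deleted
    return active, preserved
```

For example, with hourly checkpoints, max_to_keep=2, and keep_every_n_hours=2, evicted checkpoints are preserved only when at least two hours have passed since the last preserved one; the rest are deleted.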
Args:
- checkpoint: The tf.train.Checkpoint instance to save and manage checkpoints for.
- directory: The path to a directory in which to write checkpoints. A special file named "checkpoint" is also written to this directory (in a human-readable text format) which contains the state of the CheckpointManager.
- max_to_keep: An integer, the number of checkpoints to keep. Unless preserved by keep_checkpoint_every_n_hours, checkpoints will be deleted from the active set, oldest first, until only max_to_keep checkpoints remain. If None, no checkpoints are deleted and everything stays in the active set. Note that max_to_keep=None will keep all checkpoint paths in memory and in the checkpoint state protocol buffer on disk.
- keep_checkpoint_every_n_hours: Upon removal from the active set, a checkpoint will be preserved if it has been at least keep_checkpoint_every_n_hours since the last preserved checkpoint. The default setting of None does not preserve any checkpoints in this way.
Raises:
- ValueError: If max_to_keep is not a positive integer.
Properties
checkpoints
A list of managed checkpoints.
Note that checkpoints saved due to keep_checkpoint_every_n_hours will not show up in this list (to avoid ever-growing filename lists).
Returns:
A list of filenames, sorted from oldest to newest.
latest_checkpoint
The prefix of the most recent checkpoint in directory.

Equivalent to tf.train.latest_checkpoint(directory) where directory is the constructor argument to CheckpointManager.

Suitable for passing to tf.train.Checkpoint.restore to resume training.

Returns:
The checkpoint prefix. If there are no checkpoints, returns None.
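As noted in the constructor documentation, the state of the CheckpointManager lives in the human-readable "checkpoint" file in directory, and latest_checkpoint is derived from it. The sketch below shows roughly what that lookup does; parse_latest is an illustrative helper (the real reader deserializes a CheckpointState protocol buffer and handles relative paths):

```python
import os
import re

def parse_latest(directory):
    """Illustrative reader for the text-format "checkpoint" state file.

    Returns the most recent checkpoint prefix, or None if no state exists.
    """
    state_file = os.path.join(directory, "checkpoint")
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            # The first line typically looks like:
            #   model_checkpoint_path: "ckpt-5"
            m = re.match(r'model_checkpoint_path:\s*"(.*)"', line)
            if m:
                return m.group(1)
    return None
```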
Methods
tf.train.CheckpointManager.save
save(checkpoint_number=None)
Creates a new checkpoint and manages it.
Args:
- checkpoint_number: An optional integer, or an integer-dtype Variable or Tensor, used to number the checkpoint. If None (default), checkpoints are numbered using checkpoint.save_counter. Even if checkpoint_number is provided, save_counter is still incremented. A user-provided checkpoint_number is not incremented even if it is a Variable.
Returns:
The path to the new checkpoint. It is also recorded in the checkpoints and latest_checkpoint properties.
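The numbering rule above can be modelled in a few lines of plain Python. This is a hedged sketch of the documented behavior; next_save_path is a made-up helper, not part of the API, and the "ckpt-N" naming is assumed for illustration:

```python
def next_save_path(directory, save_counter, checkpoint_number=None):
    """Model of save() numbering: save_counter always increments, while the
    path uses checkpoint_number when given, else the new counter value."""
    save_counter += 1  # incremented on every save, per the docs
    number = checkpoint_number if checkpoint_number is not None else save_counter
    return f"{directory}/ckpt-{number}", save_counter
```

Note that passing checkpoint_number=42 changes only the path; the counter still advances, so a later save() without an explicit number resumes from the incremented counter.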