Deletes old checkpoints.
tf.train.CheckpointManager(
    checkpoint, directory, max_to_keep, keep_checkpoint_every_n_hours=None,
    checkpoint_name='ckpt'
)
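Example usage:

import tensorflow as tf
# `optimizer` and `model` are assumed to be trackable objects defined elsewhere.
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
# Restore the latest checkpoint if one exists; otherwise nothing is restored.
status = checkpoint.restore(manager.latest_checkpoint)
while True:
    # train
    manager.save()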
CheckpointManager preserves its own state across instantiations (see the __init__ documentation for details). Only one should be active in a particular directory at a time.
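As an illustration of state being preserved across instantiations, the sketch below creates one manager, saves twice, and then builds a second manager on the same directory. The temporary directory and the `step` variable are assumptions made for this example only; they are not part of the original documentation.

```python
import tempfile

import tensorflow as tf

# Hypothetical scratch directory used only for this sketch.
directory = tempfile.mkdtemp()

# A single integer variable stands in for real model/optimizer state.
step = tf.Variable(0, dtype=tf.int64)
checkpoint = tf.train.Checkpoint(step=step)

first_manager = tf.train.CheckpointManager(checkpoint, directory, max_to_keep=3)
for _ in range(2):
    step.assign_add(1)
    first_manager.save()
saved_paths = first_manager.checkpoints

# Drop the first manager before creating another one, since only one
# manager should be active in a directory at a time.
del first_manager

# The second manager reads the "checkpoint" state file in the directory.
second_manager = tf.train.CheckpointManager(checkpoint, directory, max_to_keep=3)
recovered_paths = second_manager.checkpoints
print(recovered_paths == saved_paths)
```

The second manager is expected to pick up the list of managed checkpoints from the state file rather than starting with an empty list.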
Args:
checkpoint: The tf.train.Checkpoint instance to save and manage checkpoints for.
directory: The path to a directory in which to write checkpoints. A special file named "checkpoint" is also written to this directory (in a human-readable text format) which contains the state of the CheckpointManager.
max_to_keep: An integer, the number of checkpoints to keep. Unless preserved by keep_checkpoint_every_n_hours, checkpoints will be deleted from the active set, oldest first, until only max_to_keep checkpoints remain. If None, no checkpoints are deleted and everything stays in the active set. Note that max_to_keep=None will keep all checkpoint paths in memory and in the checkpoint state protocol buffer on disk.
keep_checkpoint_every_n_hours: Upon removal from the active set, a checkpoint will be preserved if it has been at least keep_checkpoint_every_n_hours since the last preserved checkpoint. The default setting of None does not preserve any checkpoints in this way.
checkpoint_name: Custom name for the checkpoint file.

Attributes:
checkpoints: A list of managed checkpoints. Note that checkpoints saved due to keep_checkpoint_every_n_hours will not show up in this list (to avoid ever-growing filename lists).
latest_checkpoint: The prefix of the most recent checkpoint in directory. Equivalent to tf.train.latest_checkpoint(directory) where directory is the constructor argument to CheckpointManager. Suitable for passing to tf.train.Checkpoint.restore to resume training.
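The interaction between max_to_keep and keep_checkpoint_every_n_hours can be easier to follow with a small simulation. The following is a pure-Python sketch of the documented retention rule, not TensorFlow's implementation; the function name `simulate_retention`, its parameters, and the assumption that the "last preserved" clock starts at hour 0 are all hypothetical.

```python
from collections import deque


def simulate_retention(save_times_hours, max_to_keep,
                       keep_every_n_hours=None, start_time_hours=0.0):
    """Simplified model of the documented retention policy.

    Returns (active, preserved, deleted) lists of save times in hours.
    """
    active = deque()          # the "active set", oldest first
    preserved, deleted = [], []
    last_preserved = start_time_hours  # assumption: clock starts here

    for t in save_times_hours:
        active.append(t)
        if max_to_keep is None:
            continue  # nothing ever leaves the active set
        while len(active) > max_to_keep:
            oldest = active.popleft()
            # A checkpoint leaving the active set is preserved only if
            # enough time has passed since the last preserved one.
            if (keep_every_n_hours is not None
                    and oldest - last_preserved >= keep_every_n_hours):
                preserved.append(oldest)
                last_preserved = oldest
            else:
                deleted.append(oldest)
    return list(active), preserved, deleted


# One save per hour for ten hours, keeping 3 active and preserving
# one checkpoint roughly every 4 hours.
active, preserved, deleted = simulate_retention(
    range(1, 11), max_to_keep=3, keep_every_n_hours=4)
print(active, preserved, deleted)
```

In this model the preserved checkpoints stay on disk but are no longer part of the active set, which mirrors why they do not appear in the checkpoints attribute.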
Raises:
ValueError: If max_to_keep is not a positive integer.

save

save(
    checkpoint_number=None
)
Creates a new checkpoint and manages it.
Args:
checkpoint_number: An optional integer, or an integer-dtype Variable or Tensor, used to number the checkpoint. If None (default), checkpoints are numbered using checkpoint.save_counter. Even if checkpoint_number is provided, save_counter is still incremented. A user-provided checkpoint_number is not incremented even if it is a Variable.

Returns:
The path to the new checkpoint. It is also recorded in the checkpoints and latest_checkpoint properties.
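A short sketch of numbering checkpoints with a user-provided step variable follows. The temporary directory, the `global_step` variable, and the value 100 are assumptions chosen for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical scratch directory used only for this sketch.
directory = tempfile.mkdtemp()

# An integer-dtype Variable used to number checkpoints.
global_step = tf.Variable(0, dtype=tf.int64)
checkpoint = tf.train.Checkpoint(global_step=global_step)
manager = tf.train.CheckpointManager(checkpoint, directory, max_to_keep=2)

# Pretend 100 training steps have run.
global_step.assign(100)

# The checkpoint is numbered from global_step instead of save_counter.
path = manager.save(checkpoint_number=global_step)
print(os.path.basename(path))

# save_counter is still incremented; global_step is left untouched.
print(int(checkpoint.save_counter.numpy()), int(global_step.numpy()))
```

Passing the training step this way keeps checkpoint file names aligned with training progress, while save_counter continues to count how many saves have occurred.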