Backups and disaster recovery
When you decide to start using Docker Universal Control Plane in a production setting, you should configure it for high availability.
The next step is creating a backup policy and disaster recovery plan.
Backup policy
Docker UCP nodes persist data using named volumes.
As part of your backup policy, you should regularly create backups of the controller nodes. Since the nodes used for running user containers don't persist data, you can decide not to create any backups for them.
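Since that data lives in named volumes, you can see what UCP persists on a node with the docker volume ls command. The exact volume names vary between UCP versions, but they are typically prefixed with ucp-:

# List the named volumes created by UCP on this node
$ docker volume ls --filter name=ucp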
To perform a backup of a UCP controller node, use the docker/ucp backup
command. This creates a tar archive with the contents of the volumes used by
UCP on that node, and streams it to stdout.
To create a consistent backup, the backup command temporarily stops the UCP containers running on the node where the backup is being performed. User containers and services are not affected by this.
To have minimal impact on your business, you should:
- Schedule the backup to take place outside business hours, as in the example schedule below.
- Configure UCP for high availability. This allows load-balancing user requests across multiple UCP controller nodes.
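For example, one way to run the backup outside business hours is a cron job on the controller node. The script path, schedule, passphrase, and backup directory below are placeholders, and the exact flags accepted by the backup command can vary between UCP versions (a non-interactive run typically needs the UCP instance ID passed with --id), so check docker/ucp backup --help for your version:

#!/bin/bash
# /usr/local/bin/ucp-backup.sh: back up this UCP controller to a dated file
docker run --rm -i --name ucp-backup \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp backup --id <ucp-instance-id> \
--passphrase "secret" > /backups/ucp-backup-$(date +%Y%m%d).tar

# Example crontab entry: run the script every Sunday at 02:00
0 2 * * 0 /usr/local/bin/ucp-backup.sh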
Backup command
The example below shows how to create a backup of a UCP controller node:
# Create a backup, encrypt it, and store it in /tmp/backup.tar
$ docker run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp backup --interactive \
--passphrase "secret" > /tmp/backup.tar
# Decrypt the backup and list its contents
$ gpg --decrypt /tmp/backup.tar | tar --list
Restore command
The example below shows how to restore a UCP controller node from an existing backup:
$ docker run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp restore --passphrase "secret" < backup.tar
When restoring, make sure you use the same version of the docker/ucp
image that you used to create the backup.
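One way to avoid a version mismatch is to pin the image tag explicitly when restoring. The 1.1.4 tag below is only an example; use whichever version created the backup:

# Restore using the same image version that created the backup
$ docker run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp:1.1.4 restore --passphrase "secret" < backup.tar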
Restore your cluster
The restore command can be used to create a new UCP cluster from a backup file. After the restore operation is complete, the following data will be copied from the backup file:
- Users, Teams and Permissions.
- Cluster Configuration, such as the default Controller Port or the KV store timeout.
- DDC Subscription License.
- Options on Scheduling, Content Trust, Authentication Methods and Reporting.
The restore operation may be performed against any Docker Engine, regardless of swarm membership, as long as the target Engine is not already managed by a UCP installation. If the Docker Engine is already part of a swarm, that swarm and all deployed containers and services will be managed by UCP after the restore operation completes.
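A quick way to check whether an Engine is already managed by UCP is to look for running UCP containers on it. This is only a rough check, since container names can vary between UCP versions:

# If this prints any containers, the Engine likely already has a UCP installation
$ docker ps --filter name=ucp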
As an example, if you have a cluster with three controller nodes, A, B, and C, and your most recent backup was of node A:
- Uninstall UCP from the swarm using the uninstall-ucp operation.
- Restore one of the swarm managers, such as node B, using the most recent backup from node A.
- Wait for all nodes of the swarm to become healthy UCP nodes.
You should now have your UCP cluster up and running.
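Put together, the first two steps look roughly like the following. The passphrase and backup path are placeholders, and the uninstall-ucp subcommand shown here applies to UCP versions that support it, so check the command reference for your version:

# On any node of the swarm: remove the existing UCP installation
$ docker run --rm -it --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp uninstall-ucp --interactive

# On node B: restore UCP from the backup taken on node A
$ docker run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp restore --passphrase "secret" < /tmp/backup.tar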
Additionally, in the event that half or more of the controller nodes are lost and cannot be recovered to a healthy state, the system can only be restored through the following disaster recovery procedure. This procedure is not guaranteed to succeed without loss of swarm services or UCP configuration data:
- On one of the remaining manager nodes, run docker swarm init --force-new-cluster. This instantiates a new single-manager swarm by recovering as much state as possible from the existing manager. This is a disruptive operation and any existing tasks will be either terminated or suspended.
- Obtain a backup of one of the remaining manager nodes if one is not already available.
- Perform a restore operation on the recovered swarm manager node.
- For all other nodes of the cluster, perform a docker swarm leave --force and then a docker swarm join operation with the cluster's new join token.
- Wait for all nodes of the swarm to become healthy UCP nodes.