# Troubleshoot UCP node states
There are several points in the lifecycle of UCP when a node is actively transitioning from one state to another, such as when a new node joins the swarm or during node promotion and demotion. In these cases, UCP reports the current step of the transition as a node message. You can view the state of each individual node by following the same steps required to monitor cluster status.
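From the command line, a quick way to see these states is with the standard `docker node` commands (a minimal sketch, run from a manager node; `node1` is a placeholder for a hostname or ID from your own listing):

```bash
# List all nodes in the swarm along with their current status
docker node ls

# Show the state and message reported for a single node
# (field names may vary slightly between Engine versions)
docker node inspect node1 --format '{{ .Status.State }}: {{ .Status.Message }}'
```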
## UCP node states
The following table lists the node states that UCP may report for a node, what each state means, and how long each step typically takes.
Message | Description | Typical step duration |
---|---|---|
Completing node registration | Waiting for the node to appear in the KV node inventory. This is expected to occur when a node first joins the UCP swarm. | 5-30 seconds |
heartbeat failure | The node has not contacted any swarm managers in the last 10 seconds. Check the `Swarm` state in `docker info` on the node. `inactive` means the node has been removed from the swarm with `docker swarm leave`. `pending` means `dockerd` on the node has been attempting to contact a manager since it started; confirm that your network security policy allows TCP port 2377 from the node to the managers (see the connectivity sketch after this table). `error` means an error prevented swarm from starting on the node; check the Docker daemon logs on the node. | Until resolved |
Node is being reconfigured | The `ucp-reconcile` container is currently converging the current state of the node to the desired state. This process may involve issuing certificates, pulling missing images, and starting containers, depending on the current node state (see the log-watching sketch after this table). | 1-60 seconds |
Reconfiguration pending | The target node is expected to be a manager, but the `ucp-reconcile` container has not been started yet. | 1-10 seconds |
The `ucp-agent` task is `<state>` | The `ucp-agent` task on the target node is not yet in a running state. This message is expected when the configuration has been updated or when a new node first joins the UCP cluster. This step may take longer than expected if the UCP images need to be pulled from Docker Hub on the affected node. | 1-10 seconds |
Unable to determine node state | The `ucp-reconcile` container on the target node has just started running, and its state cannot be determined yet. | 1-10 seconds |
Unhealthy UCP Controller: node is unreachable | Other manager nodes of the cluster have not received a heartbeat message from the affected node within a predetermined timeout. This usually indicates a temporary or permanent interruption in the network link to that manager node. Ensure the underlying networking infrastructure is operational, and contact support if the symptom persists. | Until resolved |
Unhealthy UCP Controller: unable to reach controller | The controller that UCP is currently communicating with is not reachable within a predetermined timeout. Refresh the node listing to see if the symptom persists. If the symptom appears intermittently, this could indicate latency spikes between manager nodes, which can lead to a temporary loss in the availability of UCP itself. Ensure the underlying networking infrastructure is operational, and contact support if the symptom persists. | Until resolved |
Unhealthy UCP Controller: Docker Swarm Cluster: Local node `<ip>` has status Pending | The Engine ID of an engine is not unique in the swarm. When a node first joins the cluster, it is added to the node inventory and discovered as `Pending` by Docker Swarm. The engine is "validated" if a `ucp-swarm-manager` container can connect to it via TLS and if its Engine ID is unique in the swarm. If you see this issue repeatedly, make sure that your engines don't have duplicate IDs. Use `docker info` to see the Engine ID, and refresh the ID by removing the `/etc/docker/key.json` file and restarting the daemon (see the sketch after this table). | Until resolved |
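If a node reports `heartbeat failure`, the checks from the table can be scripted roughly as follows (a sketch assuming a Linux node with `nc` and systemd available; `manager-host` is a placeholder for one of your manager addresses):

```bash
# On the affected node, check the reported Swarm state
# (expected values include: active, inactive, pending, error)
docker info --format '{{ .Swarm.LocalNodeState }}'

# Verify that TCP port 2377 is reachable from this node to a manager
nc -zv manager-host 2377

# If the state is "error", inspect the Docker daemon logs for the cause
journalctl -u docker.service --no-pager | tail -n 50
```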
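For nodes stuck in `Node is being reconfigured`, you can watch the reconciliation progress directly on the affected node (a sketch; the container is short-lived and may have already exited by the time you look):

```bash
# Check whether the reconcile container exists and is still running
docker ps -a --filter name=ucp-reconcile

# Follow its output to see which convergence step it is on
docker logs --follow ucp-reconcile
```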
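For the duplicate Engine ID case, the fix described in the last row looks roughly like this on each affected node (a sketch assuming a systemd-based host; `/etc/docker/key.json` is only used by older Engine versions):

```bash
# Compare this ID across your nodes; duplicates usually come from cloned VMs
docker info --format '{{ .ID }}'

# Regenerate the ID by removing key.json and restarting the daemon
sudo rm /etc/docker/key.json
sudo systemctl restart docker
```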