Apache Mesos - High-Availability Mode

If the Mesos master is unavailable, existing tasks can continue to execute, but new resources cannot be allocated and new tasks cannot be launched. To reduce the chance of this situation occurring, Mesos has a high-availability mode that uses multiple Mesos masters: one active master (called the leader or leading master) and several backups in case it fails. The masters elect the leader, with Apache ZooKeeper both coordinating the election and handling leader detection by masters, agents, and scheduler drivers. More information on how leader election works is available on the Apache ZooKeeper website.

This document describes how to configure Mesos to run in high-availability mode. For more information on developing highly available frameworks, see a companion document.

Note: This document assumes you know how to start, run, and work with ZooKeeper, whose client library is included in the standard Mesos build.

Usage

To put Mesos into high-availability mode:

  1. Ensure that the ZooKeeper cluster is up and running.

  2. Provide the znode path to all masters, agents, and framework schedulers as follows:
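As an illustration, the sketch below assembles the command-line arguments involved. It assumes the ZooKeeper URL form zk://host1:port1,host2:port2,.../path, the master flags --zk and --quorum, and the agent flag --master. The host names, ports, and the /mesos znode path are placeholders, the helper names are hypothetical, and other flags a real deployment needs (such as a work directory) are omitted.

```python
def zk_url(hosts, znode_path):
    """Build a zk:// URL from (host, port) pairs and a znode path."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    if not znode_path.startswith("/"):
        raise ValueError("znode path must start with '/'")
    servers = ",".join(f"{host}:{port}" for host, port in hosts)
    return f"zk://{servers}{znode_path}"


def master_args(url, num_masters):
    """Arguments for one master; the quorum is a majority of the masters."""
    quorum = num_masters // 2 + 1
    return ["mesos-master", f"--zk={url}", f"--quorum={quorum}"]


def agent_args(url):
    """Arguments for one agent; it finds the leader through the same URL."""
    return ["mesos-agent", f"--master={url}"]


if __name__ == "__main__":
    # Placeholder ensemble; substitute the real ZooKeeper hosts and path.
    url = zk_url([("zk1", 2181), ("zk2", 2181), ("zk3", 2181)], "/mesos")
    print(" ".join(master_args(url, num_masters=3)))
    print(" ".join(agent_args(url)))
```

Framework schedulers would be given the same zk:// URL as their master address, so every component watches the same znode path.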

From now on, the Mesos masters and agents all communicate with ZooKeeper to find out which master is the current leading master. This is in addition to the usual communication between the leading master and the agents.

Besides querying ZooKeeper, one can get the location of the leading master by sending an HTTP request to the /redirect endpoint on any master.

For HTTP endpoints that only work at the leading master, requests made to endpoints at a non-leading master will result in either a 307 Temporary Redirect (with the location of the leading master) or 503 Service Unavailable (if the master does not know who the current leader is).
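A client can tell these two outcomes apart by inspecting the status code without following the redirect. The following is a minimal sketch using Python's standard http.client module, which does not follow redirects on its own. The function names and the outcome labels are hypothetical, and the exact format of the Location header is not assumed.

```python
import http.client


def interpret_redirect(status, location):
    """Map a master's response to an (outcome, leader_location) pair."""
    if status == 307 and location:
        # The Location header points at the leading master.
        return ("leader", location)
    if status == 503:
        # This master does not currently know who the leader is.
        return ("unknown", None)
    return ("unexpected", None)


def find_leader(host, port, timeout=5.0):
    """Ask any master for the leader via the /redirect endpoint."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/redirect")
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return interpret_redirect(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

On an "unknown" outcome, a caller would typically wait briefly and retry, possibly against a different master, since an election may be in progress.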

Refer to the Scheduler API for how to deal with leadership changes.

Component Disconnection Handling

When a network partition disconnects a component (master, agent, or scheduler driver) from ZooKeeper, the component’s Master Detector induces a timeout event. This notifies the component that it has no leading master. Depending on the component, the following happens. (Note that while a component is disconnected from ZooKeeper, a master may still be in communication with agents or schedulers and vice versa.)

When a network partition disconnects an agent from the leader:

Implementation Details

Mesos implements two levels of ZooKeeper leader election abstractions, one in src/zookeeper and the other in src/master (look for contender|detector.hpp|cpp).

The notion of the group of leader candidates is implemented in Group. This abstraction handles reliable (through queues and retries of retryable errors under the covers) ZooKeeper group membership registration, cancellation, and monitoring. It watches for several ZooKeeper session events:

We also explicitly time out our sessions when disconnected from ZooKeeper for a specified amount of time; see the --zk_session_timeout configuration option. This is necessary because the ZooKeeper client libraries only notify of session expiration upon reconnection, so a component that stays partitioned would otherwise never learn that its session had expired. These timeouts are of particular interest for network partitions.
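The idea of a locally enforced session timeout can be sketched as follows. This is not the Mesos implementation: the class and method names are hypothetical, and the logic is simplified to a timer that starts on disconnection and is cleared on reconnection. It only shows why a local clock is needed when the client library stays silent during a partition.

```python
import time


class SessionTimeoutTracker:
    """Treats a session as expired after a continuous disconnection."""

    def __init__(self, session_timeout, clock=time.monotonic):
        self._timeout = session_timeout  # seconds
        self._clock = clock              # injectable for testing
        self._disconnected_at = None     # None while connected

    def on_disconnected(self):
        # Record only the first disconnection of an outage.
        if self._disconnected_at is None:
            self._disconnected_at = self._clock()

    def on_reconnected(self):
        # Simplification: reconnecting clears the local timer.
        self._disconnected_at = None

    def expired(self):
        """True once the outage has lasted at least the session timeout."""
        if self._disconnected_at is None:
            return False
        return self._clock() - self._disconnected_at >= self._timeout
```

A component polling expired() (or scheduling a timer for the same deadline) can then act as if it has no leading master, without waiting for ZooKeeper to report the expiration.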