Apache Mesos - Task Health Checking and Generalized Checks

Sometimes applications crash, misbehave, or become unresponsive. To detect and recover from such situations, some frameworks (e.g., Marathon, Apache Aurora) implement their own logic for checking the health of their tasks. This is typically done by having the framework scheduler send a “ping” request, e.g., via HTTP, to the host where the task is running and arranging for the task or executor to respond to the ping. Although this technique is extremely useful, there are several disadvantages in the way it is usually implemented; for example, each framework defines its own check API, and checks issued by the scheduler do not scale as well as checks performed on the agent.

To address the aforementioned problems, Mesos 1.2.0 introduced the Mesos-native health check design, defined a common API for command, HTTP(S), and TCP health checks, and provided reference implementations for all built-in executors.

Mesos 1.4.0 introduced a generalized check, which delegates interpretation of a check result to the framework. This might be useful, for instance, to track tasks’ internal state transitions reliably without Mesos taking action on them.

NOTE: Some functionality related to health checking was available prior to the 1.2.0 release; however, it was considered experimental.

NOTE: Mesos monitors each process-based task, including Docker containers, using an equivalent of a waitpid() system call. This technique allows detecting and reporting process crashes, but is insufficient for cases when the process is still running but is not responsive.

This document describes supported check and health check types, touches on relevant implementation details, and mentions limitations and caveats.

## Mesos-native Task Checking

In contrast to the state-of-the-art “scheduler health check” pattern mentioned above, Mesos-native checks run on the agent node: it is the executor that performs checks, not the scheduler. This improves scalability but means that detecting network faults or task availability from the outside world becomes a separate concern. For instance, if the task is running on a partitioned agent, it will still be (health) checked and, if the health checks fail, might be terminated. Because of the network partition, all of this happens without the framework scheduler being notified.

Mesos checks and health checks are described in CheckInfo and HealthCheck protobufs respectively. Currently, only tasks can be (health) checked, not arbitrary processes or executors, i.e., only the TaskInfo protobuf has the optional CheckInfo and HealthCheck fields. However, it is worth noting that all built-in executors map a task to a process.

Task status updates are leveraged to transfer the check and health check status to the Mesos master and further to the framework’s scheduler ensuring the “at-least-once” delivery guarantee. To minimize performance overhead, those task status updates are triggered if a certain condition is met, e.g., the value or presence of a specific field in the check status changes.

When a built-in executor sends a task status update because the check or health check status has changed, it sets TaskStatus.reason to REASON_TASK_CHECK_STATUS_UPDATED or REASON_TASK_HEALTH_CHECK_STATUS_UPDATED respectively. While sending such an update, the executor avoids shadowing other data that might have been injected previously, e.g., a check update includes the last known update from a health check.
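A scheduler can use the reason to tell the two kinds of updates apart. The sketch below is illustrative only: the `Reason` enum and `TaskStatus` struct are hypothetical stand-ins for the generated Mesos protobuf classes, reduced to what the example needs, and `describeUpdate` is not part of any Mesos API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the Mesos protobuf types; a real scheduler
// would use the generated TaskStatus class instead.
enum class Reason {
  REASON_TASK_CHECK_STATUS_UPDATED,
  REASON_TASK_HEALTH_CHECK_STATUS_UPDATED,
  OTHER
};

struct TaskStatus {
  Reason reason = Reason::OTHER;
};

// Classifies a status update by what triggered it.
std::string describeUpdate(const TaskStatus& status) {
  switch (status.reason) {
    case Reason::REASON_TASK_CHECK_STATUS_UPDATED:
      return "check status changed";
    case Reason::REASON_TASK_HEALTH_CHECK_STATUS_UPDATED:
      return "health check status changed";
    case Reason::OTHER:
      break;
  }
  return "other update";
}
```

Because an update of one kind also carries the last known data of the other kind, a handler would typically read both the check status and the health status regardless of the reason.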

It is the responsibility of the executor to interpret CheckInfo and HealthCheck and perform checks appropriately. All built-in executors support health checking their tasks and all except the docker executor support generalized checks (see implementation details and limitations).

NOTE: It is up to the executor how—and whether at all—to honor the CheckInfo and HealthCheck fields in TaskInfo. Implementations may vary significantly depending on what entity TaskInfo represents. On this page only the reference implementation for built-in executors is considered.

Custom executors can use the checker library, the reference implementation for health checking that all built-in executors rely on.

### On the Differences Between Checks and Health Checks

When humans read data from a sensor, they may interpret these data and act on them. For example, if they check air temperature, they usually interpret temperature readings and say whether it’s cold or warm outside; they may also act on the interpretation and decide to apply sunscreen or put on an extra jacket.

Similar reasoning can be applied to checking a task’s state in Mesos:

  1. Perform a check.
  2. Optionally interpret the result and, for example, declare the task either healthy or unhealthy.
  3. Optionally act on the interpretation by killing an unhealthy task.

Mesos health checks do all of the above, 1+2+3: they run the check, declare the task healthy or not, and kill it after consecutive_failures have occurred. Though efficient and scalable, this strategy is inflexible for the needs of frameworks which may want to run an arbitrary check without Mesos interpreting the result in any way, for example, to transmit the task’s internal state transitions and make global decisions.
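The three steps can be sketched as separate pieces of code. This is a minimal illustration, not the Mesos implementation: `Check`, `interpret`, `KillPolicy`, and `runHealthCheckOnce` are hypothetical names, and the check itself is modeled as any callable that returns an exit code.

```cpp
#include <cassert>
#include <functional>

// Step 1: perform a check. Here a check is any callable returning an exit
// code, standing in for a command launched by the executor.
using Check = std::function<int()>;

// Step 2: interpret the raw result. Exit code 0 is treated as healthy.
bool interpret(int exitCode) { return exitCode == 0; }

// Step 3: act on the interpretation. shouldKill() returns true once the
// configured number of unhealthy results has been seen in a row.
class KillPolicy {
public:
  explicit KillPolicy(unsigned consecutiveFailures)
    : limit_(consecutiveFailures) {
    assert(consecutiveFailures > 0);
  }

  bool shouldKill(bool healthy) {
    failures_ = healthy ? 0 : failures_ + 1;
    return failures_ >= limit_;
  }

private:
  unsigned limit_;
  unsigned failures_ = 0;
};

// A health check runs all three steps. A generalized check would stop after
// step 1 and hand the raw result to the framework.
bool runHealthCheckOnce(const Check& check, KillPolicy& policy) {
  return policy.shouldKill(interpret(check()));
}
```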

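Conceptually, a health check is a check with an interpretation and a kill policy. A check and a health check differ in how they are specified and implemented: a check is described by a CheckInfo message set in TaskInfo.check and is reported in TaskStatus.check_status, whereas a health check is described by a HealthCheck message set in TaskInfo.health_check and is reported in the healthy field of TaskStatus.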

NOTE: Docker executor currently supports health checks but not checks.

NOTE: Slight changes in protobuf message naming and structure are due to backward compatibility reasons; in the future the HealthCheck message will be based on CheckInfo.

## Anatomy of a Check

A CheckStatusInfo message is added to the task status update to convey the check status. Currently, check status info is only added for TASK_RUNNING status updates.

Built-in executors leverage task status updates to deliver check updates to the scheduler. To minimize performance overhead, a check-related task status update is triggered if and only if the value or presence of any field in CheckStatusInfo changes. As the CheckStatusInfo message matures, in the future we might deduplicate only on specific fields in CheckStatusInfo to make sure that as few updates as possible are sent. Note that custom executors may use a different strategy.
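The deduplication rule can be illustrated with a small sketch. `CheckStatus` below is a hypothetical, simplified stand-in for CheckStatusInfo in which each optional member models a field that may be absent; `UpdateDeduplicator` is likewise an illustrative name. The sketch assumes that the very first status is always sent.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified stand-in for CheckStatusInfo. An empty optional
// models an absent field, e.g. a check that has not finished yet.
struct CheckStatus {
  std::optional<int> commandExitCode;
  std::optional<int> httpStatusCode;
  std::optional<bool> tcpSucceeded;

  bool operator==(const CheckStatus& other) const {
    return commandExitCode == other.commandExitCode &&
           httpStatusCode == other.httpStatusCode &&
           tcpSucceeded == other.tcpSucceeded;
  }
};

// Remembers the last status that was sent and reports whether a new one
// differs in the value or presence of any field.
class UpdateDeduplicator {
public:
  bool shouldSend(const CheckStatus& current) {
    if (last_.has_value() && *last_ == current) {
      return false;
    }
    last_ = current;
    return true;
  }

private:
  std::optional<CheckStatus> last_;  // Empty until the first status is seen.
};
```

Comparing optionals covers both conditions from the text: two statuses differ if a field is present in one and absent in the other, or if it is present in both with different values.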

To support third party tooling that might not have access to the original TaskInfo specification, TaskStatus.check_status generated by built-in executors adheres to the following conventions:

NOTE: Frameworks that use custom executors are highly advised to follow the same principles built-in executors use for consistency.

### Command Checks

Command checks are described by the CommandInfo protobuf wrapped in the CheckInfo.Command message, although the CommandInfo.user and CommandInfo.uris fields are ignored. A command check specifies an arbitrary command that is used to check a particular condition of the task. The result of the check is the exit code of the command.

NOTE: Docker executor does not currently support checks. For all other tasks, including Docker containers launched in the mesos containerizer, the command will be executed from the task’s mount namespace.
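To make "the result of the check is the exit code" concrete, the following sketch runs a shell command and returns its exit code on a POSIX system. It is only an illustration: `runCommandCheck` is a hypothetical helper, it does not enter the task's mount namespace, and it is not how the built-in executors launch the command.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Runs `command` through the shell and returns its exit code, or -1 if the
// command could not be launched or did not exit normally (e.g. was killed
// by a signal).
int runCommandCheck(const std::string& command) {
  int status = std::system(command.c_str());
  if (status == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

For example, `runCommandCheck("ls /checkfile > /dev/null")` would return 0 only if `/checkfile` exists; a generalized check passes this number on to the framework without interpreting it.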

To specify a command check, set type to CheckInfo::COMMAND and populate CheckInfo.Command.CommandInfo, for example:

TaskInfo task = [...];

CheckInfo check;
check.set_type(CheckInfo::COMMAND);
check.mutable_command()->mutable_command()->set_value(
    "ls /checkfile > /dev/null");

task.mutable_check()->CopyFrom(check);

### HTTP Checks

HTTP checks are described by the CheckInfo.Http protobuf with port and path fields. A GET request is sent to http://<host>:port/path using the curl command. Note that <host> is currently not configurable and is set automatically to 127.0.0.1 (see limitations); hence the checked task must listen on the loopback interface in addition to any other routable interface it might be listening on. The port field must specify an actual port the task is listening on, not a mapped one. The result of the check is the HTTP status code of the response.

Built-in executors follow HTTP 3xx redirects; custom executors may employ a different strategy.

If necessary, executors enter the task’s network namespace prior to launching the curl command.

NOTE: HTTPS checks are currently not supported.
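As a small illustration of how the request target is formed from the two fields, the sketch below builds the URL for an HTTP check. `httpCheckUrl` and `CHECK_HOST` are hypothetical names, and the requirement that a non-empty path starts with a slash is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The host is fixed to the loopback address, as described above.
const char* const CHECK_HOST = "127.0.0.1";

// Builds the URL that an HTTP check would request. This sketch assumes that
// `path` is either empty or starts with '/'.
std::string httpCheckUrl(std::uint16_t port, const std::string& path) {
  assert(path.empty() || path[0] == '/');
  return std::string("http://") + CHECK_HOST + ":" + std::to_string(port) +
         path;
}
```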

To specify an HTTP check, set type to CheckInfo::HTTP and populate CheckInfo.Http, for example:

TaskInfo task = [...];

CheckInfo check;
check.set_type(CheckInfo::HTTP);
check.mutable_http()->set_port(8080);
check.mutable_http()->set_path("/health");

task.mutable_check()->CopyFrom(check);

### TCP Checks

TCP checks are described by the CheckInfo.Tcp protobuf, which has a single port field. The task is probed using Mesos’ mesos-tcp-connect command, which tries to establish a TCP connection to <host>:port. Note that <host> is currently not configurable and is set automatically to 127.0.0.1 (see limitations); hence the checked task must listen on the loopback interface in addition to any other routable interface it might be listening on. The port field must specify an actual port the task is listening on, not a mapped one. The result of the check is a boolean value indicating whether the TCP connection succeeded.

If necessary, executors enter the task’s network namespace prior to launching the mesos-tcp-connect command.
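The probe itself amounts to a plain TCP connect against the loopback address. The sketch below shows the idea with POSIX sockets; `tcpConnects` is a hypothetical function written for this page and is not the source of mesos-tcp-connect. It has no timeout handling and does not enter any namespace.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if a TCP connection to 127.0.0.1:port can be established.
bool tcpConnects(std::uint16_t port) {
  const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return false;
  }

  sockaddr_in address{};
  address.sin_family = AF_INET;
  address.sin_port = htons(port);
  address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  const bool connected =
    ::connect(fd, reinterpret_cast<const sockaddr*>(&address),
              sizeof(address)) == 0;

  // The connection is closed immediately; only its success matters.
  ::close(fd);
  return connected;
}
```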

To specify a TCP check, set type to CheckInfo::TCP and populate CheckInfo.Tcp, for example:

TaskInfo task = [...];

CheckInfo check;
check.set_type(CheckInfo::TCP);
check.mutable_tcp()->set_port(8080);

task.mutable_check()->CopyFrom(check);

### Common options

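The CheckInfo protobuf contains common options which regulate how a check must be performed by an executor: delay_seconds (the time to wait before the first check is performed), interval_seconds (the time between check attempts), and timeout_seconds (the time allowed for a check attempt to complete).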

NOTE: Since each time a check is performed a helper command is launched (see limitations), setting timeout_seconds to a small value, e.g., <5s, may lead to intermittent failures.

NOTE: Launching a check is not a free operation. To avoid unpredictable spikes in the agent’s load, e.g., when most of the tasks run their checks simultaneously, avoid setting interval_seconds to zero.
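A framework might encode the two notes above as a simple sanity check on its own configuration before launching a task. The sketch below is one hypothetical way to do so; `CheckTiming` and `lintCheckTiming` are names invented for this example and the 5 second threshold is taken from the note above, not from any validation that Mesos performs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the timing options discussed above.
struct CheckTiming {
  double delaySeconds;
  double intervalSeconds;
  double timeoutSeconds;
};

// Returns human-readable warnings for settings the notes advise against.
std::vector<std::string> lintCheckTiming(const CheckTiming& timing) {
  std::vector<std::string> warnings;
  if (timing.timeoutSeconds < 5) {
    warnings.push_back(
      "timeout_seconds below 5s may lead to intermittent failures");
  }
  if (timing.intervalSeconds <= 0) {
    warnings.push_back(
      "interval_seconds of zero may cause load spikes on the agent");
  }
  return warnings;
}
```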

As an example, the code below specifies a task which is a Docker container with a simple HTTP server listening on port 8080 and an HTTP check. The check is performed every 5 seconds, starting 15 seconds after the task launch, and each response is expected within 1 second.

TaskInfo task = createTask(...);

// Use Netcat to emulate an HTTP server.
const string command =
    "nc -lk -p 8080 -e echo -e \"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\"";
task.mutable_command()->set_value(command);

Image image;
image.set_type(Image::DOCKER);
image.mutable_docker()->set_name("alpine");

ContainerInfo* container = task.mutable_container();
container->set_type(ContainerInfo::MESOS);
container->mutable_mesos()->mutable_image()->CopyFrom(image);

// Set `delay_seconds` here because it takes
// some time to launch Netcat to serve requests.
CheckInfo check;
check.set_type(CheckInfo::HTTP);
check.mutable_http()->set_port(8080);
check.set_delay_seconds(15);
check.set_interval_seconds(5);
check.set_timeout_seconds(1);

task.mutable_check()->CopyFrom(check);

## Anatomy of a Health Check

Health status is conveyed by the boolean healthy field of the task status rather than by a dedicated task state, which may be insufficient in certain cases. Consequently, a task that has failed health checks is still reported as TASK_RUNNING, with healthy set to false. Currently, the healthy field is only set for TASK_RUNNING status updates.

When a task turns unhealthy, a task status update message with the healthy field set to false is sent to the Mesos master and then forwarded to a scheduler. The executor is expected to kill the task after a number of consecutive failures defined in the consecutive_failures field of the HealthCheck protobuf.

NOTE: While a scheduler currently cannot cancel a task kill due to failing health checks, it may issue a killTask command itself. This may be helpful to emulate a “global” policy for handling tasks with failing health checks (see limitations). Alternatively, the scheduler might use generalized checks instead.

Built-in executors forward all unhealthy status updates, as well as the first healthy update when a task turns healthy, i.e., when the task has started or after one or more unhealthy updates have occurred. Note that custom executors may use a different strategy.
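The forwarding rule described above can be expressed as a small filter. This is a sketch of the rule as stated in the text, not the executor's actual code; `HealthUpdateFilter` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>

// Decides whether a health result should be forwarded to the scheduler:
// every unhealthy result is forwarded, while a healthy result is forwarded
// only if it is the first result or follows an unhealthy one.
class HealthUpdateFilter {
public:
  bool shouldForward(bool healthy) {
    const bool forward = !healthy || !last_.has_value() || !*last_;
    last_ = healthy;
    return forward;
  }

private:
  std::optional<bool> last_;  // Empty until the first result arrives.
};
```

With this rule a steadily healthy task produces a single health update, while a failing task produces one update per failed health check.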

### Command Health Checks

Command health checks are described by the CommandInfo protobuf, although the CommandInfo.user and CommandInfo.uris fields are ignored. A command health check specifies an arbitrary command that is used to validate the health of the task. The executor launches the command and inspects its exit status: 0 is treated as success, any other status as failure.

NOTE: If a task is a Docker container launched by the docker executor, the health check command will be wrapped in docker run. For all other tasks, including Docker containers launched in the mesos containerizer, the command will be executed from the task’s mount namespace.

To specify a command health check, set type to HealthCheck::COMMAND and populate CommandInfo, for example:

TaskInfo task = [...];

HealthCheck healthCheck;
healthCheck.set_type(HealthCheck::COMMAND);
healthCheck.mutable_command()->set_value("ls /checkfile > /dev/null");

task.mutable_health_check()->CopyFrom(healthCheck);

### HTTP(S) Health Checks

HTTP(S) health checks are described by the HealthCheck.HTTPCheckInfo protobuf with scheme, port, path, and statuses fields. A GET request is sent to scheme://<host>:port/path using the curl command. Note that <host> is currently not configurable and is set automatically to 127.0.0.1 (see limitations); hence the health checked task must listen on the loopback interface in addition to any other routable interface it might be listening on. The scheme field supports "http" and "https" values only. The port field must specify an actual port the task is listening on, not a mapped one.

Built-in executors follow HTTP 3xx redirects and treat status codes between 200 and 399 as success; custom executors may employ a different strategy, e.g., leveraging the statuses field.

NOTE: Setting HealthCheck.HTTPCheckInfo.statuses has no effect on the built-in executors.

If necessary, executors enter the task’s network namespace prior to launching the curl command.

To specify an HTTP health check, set type to HealthCheck::HTTP and populate HTTPCheckInfo, for example:

TaskInfo task = [...];

HealthCheck healthCheck;
healthCheck.set_type(HealthCheck::HTTP);
healthCheck.mutable_http()->set_port(8080);
healthCheck.mutable_http()->set_scheme("http");
healthCheck.mutable_http()->set_path("/health");

task.mutable_health_check()->CopyFrom(healthCheck);

### TCP Health Checks

TCP health checks are described by the HealthCheck.TCPCheckInfo protobuf, which has a single port field. The task is probed using Mesos’ mesos-tcp-connect command, which tries to establish a TCP connection to <host>:port. Note that <host> is currently not configurable and is set automatically to 127.0.0.1 (see limitations); hence the health checked task must listen on the loopback interface in addition to any other routable interface it might be listening on. The port field must specify an actual port the task is listening on, not a mapped one.

The health check is considered successful if the connection can be established.

If necessary, executors enter the task’s network namespace prior to launching the mesos-tcp-connect command.

To specify a TCP health check, set type to HealthCheck::TCP and populate TCPCheckInfo, for example:

TaskInfo task = [...];

HealthCheck healthCheck;
healthCheck.set_type(HealthCheck::TCP);
healthCheck.mutable_tcp()->set_port(8080);

task.mutable_health_check()->CopyFrom(healthCheck);

### Common options

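The HealthCheck protobuf contains common options which regulate how a health check must be performed and interpreted by an executor: delay_seconds (the time to wait before the first health check is performed), interval_seconds (the time between health check attempts), timeout_seconds (the time allowed for a health check attempt to complete), consecutive_failures (the number of consecutive failures after which the task is killed), and grace_period_seconds (the time after the task launch during which failures are tolerated).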

NOTE: Since each time a health check is performed a helper command is launched (see limitations), setting timeout_seconds to a small value, e.g., <5s, may lead to intermittent failures.
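The interplay between the grace period and the consecutive failure limit can be sketched as follows. `HealthTracker` is a hypothetical class written for this page; in particular, it assumes that failures observed inside the grace period are simply not counted, which is a simplification and may differ from what a given executor does.

```cpp
#include <cassert>

// Hypothetical tracker for the grace period and consecutive failure options.
class HealthTracker {
public:
  HealthTracker(double gracePeriodSeconds, unsigned consecutiveFailures)
    : gracePeriod_(gracePeriodSeconds), limit_(consecutiveFailures) {
    assert(consecutiveFailures > 0);
  }

  // Records one health check result observed `secondsSinceLaunch` after the
  // task started. Returns true if the task should now be killed.
  bool record(bool healthy, double secondsSinceLaunch) {
    if (healthy) {
      failures_ = 0;
    } else if (secondsSinceLaunch >= gracePeriod_) {
      // Assumption: failures inside the grace period are not counted.
      ++failures_;
    }
    return failures_ >= limit_;
  }

private:
  double gracePeriod_;
  unsigned limit_;
  unsigned failures_ = 0;
};
```

With a 15 second grace period, a 5 second interval, and a limit of 3, failures at 0, 5, and 10 seconds are ignored under this assumption, and the kill decision is reached only after three further consecutive failures.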

As an example, the code below specifies a task which is a Docker container with a simple HTTP server listening on port 8080 and an HTTP health check. The health check is performed every 5 seconds starting from the task launch, tolerates failures during the first 15 seconds, and expects each response within 1 second.

TaskInfo task = createTask(...);

// Use Netcat to emulate an HTTP server.
const string command =
    "nc -lk -p 8080 -e echo -e \"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\"";
task.mutable_command()->set_value(command);

Image image;
image.set_type(Image::DOCKER);
image.mutable_docker()->set_name("alpine");

ContainerInfo* container = task.mutable_container();
container->set_type(ContainerInfo::MESOS);
container->mutable_mesos()->mutable_image()->CopyFrom(image);

// Set `grace_period_seconds` here because it takes
// some time to launch Netcat to serve requests.
HealthCheck healthCheck;
healthCheck.set_type(HealthCheck::HTTP);
healthCheck.mutable_http()->set_port(8080);
healthCheck.set_delay_seconds(0);
healthCheck.set_interval_seconds(5);
healthCheck.set_timeout_seconds(1);
healthCheck.set_grace_period_seconds(15);

task.mutable_health_check()->CopyFrom(healthCheck);

## Under the Hood

All built-in executors rely on the checker library, which lives in “src/checks”. An executor creates an instance of the Checker or HealthChecker class per task and passes the check or health check definition together with extra parameters. In return, the library notifies the executor of changes in the task’s check or health status. For health checks, the definition is converted to the check definition before performing the check, and the check result is interpreted according to the health check definition.
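The interpretation step for health checks can be summarized in code. The sketch below collects the success criteria stated on this page for the three health check types; the result structs and the `isHealthy` overloads are hypothetical and do not correspond to classes in the checker library, and the 200 to 399 range is assumed to be inclusive.

```cpp
#include <cassert>

// Hypothetical stand-ins for the three kinds of raw check results.
struct CommandResult { int exitCode; };
struct HttpResult { int statusCode; };
struct TcpResult { bool succeeded; };

// Command: exit status 0 is success, anything else is failure.
bool isHealthy(const CommandResult& result) {
  return result.exitCode == 0;
}

// HTTP(S): status codes from 200 to 399 (assumed inclusive) are success.
bool isHealthy(const HttpResult& result) {
  return result.statusCode >= 200 && result.statusCode <= 399;
}

// TCP: success means the connection could be established.
bool isHealthy(const TcpResult& result) {
  return result.succeeded;
}
```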

The library depends on curl for HTTP(S) checks and mesos-tcp-connect for TCP checks (the latter is a simple command bundled with Mesos).

One of the least trivial things the library takes care of is entering the appropriate namespaces of the task (mnt, net) on Linux agents. To perform a command check, the checker must be in the same mount namespace as the checked process. This is achieved either by calling docker run for the check command in the case of the docker containerizer, or by explicitly calling setns() for the mnt namespace in the case of the mesos containerizer (see containerization in Mesos). To perform an HTTP(S) or TCP check, the most reliable solution is to share the network namespace with the checked process: in the case of the docker containerizer, setns() is explicitly called for the net namespace, while the mesos containerizer guarantees that an executor and its tasks are in the same network namespace.

NOTE: Custom executors may or may not use this library. Please consult the respective framework’s documentation.

Regardless of executor, all checks and health checks consume resources from the task’s resource allocation. Hence it is a good idea to add some extra resources, e.g., 0.05 cpu and 32MB mem, to the task definition if a Mesos-native check and/or health check is specified.

### Windows Implementation

On Windows, the implementation differs between the mesos containerizer and the docker containerizer. The mesos containerizer does not provide network or mount namespace isolation, so curl, mesos-tcp-connect, or the command health check simply run as regular processes on the host. In contrast, the docker containerizer provides network and mount isolation. For the command health check, the command enters the container’s namespace through docker exec. For the network health checks, the docker executor launches a container with the mesos/windows-health-check image and enters the original container’s network namespace through the --network=container:<ID> parameter in docker run.

## Current Limitations and Caveats