Some TaskStatus messages will arrive with the reason field set to a value that allows frameworks to display better error messages and to implement special behaviour for some of the reasons.
For most reasons, the message field of the TaskStatus message will give a more detailed, human-readable error description.
Not all status updates will contain a reason.
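As a minimal sketch of how a framework might turn these fields into a display string, the following helper treats both reason and message as optional. The function name describe_status and the plain-string representation of states and reasons are assumptions made for illustration; they are not part of any Mesos API.

```python
from typing import Optional


def describe_status(state: str,
                    reason: Optional[str] = None,
                    message: Optional[str] = None) -> str:
    """Build a display string from a status update.

    Hypothetical helper: the reason and the message are both optional,
    because not all status updates carry them.
    """
    text = state
    if reason is not None:
        text += f" ({reason})"
    if message:
        text += f": {message}"
    return text


print(describe_status("TASK_FAILED",
                      "REASON_CONTAINER_LIMITATION_MEMORY",
                      "Memory limit exceeded"))
print(describe_status("TASK_RUNNING"))
```

A real scheduler would read these values from the TaskStatus message it receives; the sketch only shows that the code path without a reason has to work as well.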
Frameworks that implement their own executors are free to set the reason field on any status messages they produce.
Note that executors generally cannot rely on the scheduler seeing a status update with the reason set by the executor, since only the latest update for each different task state is stored and re-transmitted. See in particular the description of REASON_RECONCILIATION below.
Most reasons describe conditions that can only be detected in the master or agent code, and will accompany automatically generated status updates from either of these.
For consistency with the existing usages of the different task reasons, we recommend that executors restrict themselves to the following subset if they use a non-default reason in their status updates.
REASON_TASK_CHECK_STATUS_UPDATED | For executors that support running task checks, it is recommended to generate a status update with this reason every time the task check status changes, together with a human-readable description of the change in the message field. |
REASON_TASK_HEALTH_CHECK_STATUS_UPDATED | For executors that support running task health checks, it is recommended to generate a status update with this reason every time the health check status changes, together with a human-readable description of the change in the message field. Note: The built-in executors additionally send an update with this reason every time a health check is unhealthy. |
REASON_TASK_INVALID | For executors that implement their own task validation logic, this reason can be used when the validation check fails, together with a human-readable description of the failed check in the message field. |
REASON_TASK_UNAUTHORIZED | For executors that implement their own authorization logic, this reason can be used when authorization fails, together with a human-readable description in the message field. |
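A custom executor could guard against stray reasons with a small check like the one below. This is only a sketch: the names RECOMMENDED_EXECUTOR_REASONS and is_recommended_reason are hypothetical, and reasons are modelled as plain strings rather than protobuf enum values.

```python
from typing import Optional

# The subset recommended above for executor-generated status updates.
RECOMMENDED_EXECUTOR_REASONS = frozenset({
    "REASON_TASK_CHECK_STATUS_UPDATED",
    "REASON_TASK_HEALTH_CHECK_STATUS_UPDATED",
    "REASON_TASK_INVALID",
    "REASON_TASK_UNAUTHORIZED",
})


def is_recommended_reason(reason: Optional[str]) -> bool:
    """Return True if an executor may set this reason under the recommendation.

    None stands for "no reason set" (the default), which is always acceptable.
    """
    return reason is None or reason in RECOMMENDED_EXECUTOR_REASONS


print(is_recommended_reason("REASON_TASK_INVALID"))
print(is_recommended_reason("REASON_SLAVE_REMOVED"))
```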
The reason REASON_COMMAND_EXECUTOR_FAILED is deprecated and will be removed in the future. It should not be referenced by newly written code.
The reasons REASON_CONTAINER_LIMITATION, REASON_INVALID_FRAMEWORKID, REASON_SLAVE_UNKNOWN, REASON_TASK_UNKNOWN and REASON_EXECUTOR_UNREGISTERED are not used as of Mesos 1.4.
For these status updates, the reason indicates why the task state changed. A given reason typically appears together with the same state.
These updates are usually generated by Mesos when an error occurs that prevents the executor from sending its own status update messages.
Below, a partition-aware framework means a framework which has the Capability::PARTITION_AWARE capability bit set in its FrameworkInfo. Messages generated on the master will have the source field set to SOURCE_MASTER and messages generated on the agent will have it set to SOURCE_AGENT in the v1 API or SOURCE_SLAVE in the v0 API.
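Code that has to work with both API versions may want to normalise the source value before acting on it. The helper below is a hypothetical sketch that models the source as a plain string; the name origin_of and the returned labels are assumptions, not Mesos identifiers.

```python
def origin_of(source: str) -> str:
    """Map a source value to a version-independent label.

    SOURCE_AGENT (v1 API) and SOURCE_SLAVE (v0 API) both mean that the
    update was generated on the agent.
    """
    if source == "SOURCE_MASTER":
        return "master"
    if source in ("SOURCE_AGENT", "SOURCE_SLAVE"):
        return "agent"
    # Any other value, for example an update produced by an executor.
    return "other"


print(origin_of("SOURCE_SLAVE"))
print(origin_of("SOURCE_AGENT"))
```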
As of Mesos 1.4, the following reasons are being used.
TASK_FAILED
REASON_CONTAINER_LAUNCH_FAILED | The task could not be launched because its container failed to launch. |
REASON_CONTAINER_LIMITATION_MEMORY | The container in which the task was running exceeded its memory allocation. |
REASON_CONTAINER_LIMITATION_DISK | The container in which the task was running exceeded its disk quota. |
REASON_IO_SWITCHBOARD_EXITED | The I/O switchboard server terminated unexpectedly. |
REASON_EXECUTOR_REGISTRATION_TIMEOUT | The executor for this task didn’t register with the agent within the allowed time limit. |
REASON_EXECUTOR_REREGISTRATION_TIMEOUT | The executor for this task lost connection and didn’t reregister within the allowed time limit. |
REASON_EXECUTOR_TERMINATED | The task’s executor terminated abnormally, and no more specific reason could be determined. |
TASK_KILLED
REASON_FRAMEWORK_REMOVED | The framework to which this task belonged was removed. Note: The status update will be sent out before the task is actually killed. |
REASON_TASK_KILLED_DURING_LAUNCH | This task, or a task within this task group, was killed before delivery to the agent. |
REASON_TASK_KILLED_DURING_LAUNCH | This task, or a task within this task group, was killed before delivery to the executor. Note: Prior to version 1.5, the agent would in this situation sometimes send status updates with reason set to REASON_EXECUTOR_UNREGISTERED and sometimes without any reason set, depending on details of the timing of the executor launch and the kill command. |
TASK_ERROR
REASON_TASK_INVALID | Task or resource validation checks failed. |
REASON_TASK_GROUP_INVALID | Task group or resource validation checks failed. |
REASON_TASK_UNAUTHORIZED | Task authorization failed on the master. |
REASON_TASK_GROUP_UNAUTHORIZED | Task group authorization failed on the master. |
REASON_TASK_UNAUTHORIZED | Task authorization failed on the agent. |
REASON_TASK_GROUP_UNAUTHORIZED | Task group authorization failed on the agent. |
TASK_LOST
REASON_SLAVE_DISCONNECTED | The agent on which the task was running disconnected, and didn’t reconnect in time. |
REASON_SLAVE_DISCONNECTED | The task was part of an accepted offer, but the agent sending the offer disconnected in the meantime. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. |
REASON_MASTER_DISCONNECTED | The task was part of an accepted offer which couldn’t be sent to the master, because it was disconnected. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. Note: Despite the source being set to SOURCE_MASTER, the message is not sent from the master but locally from the scheduler driver. Note: This reason is only used in the v0 API. |
REASON_SLAVE_REMOVED | The agent on which the task was running was removed. |
REASON_SLAVE_REMOVED | The task was part of an accepted offer, but the agent sending the offer was disconnected in the meantime. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. |
REASON_SLAVE_REMOVED | The agent on which the task was running was marked unreachable. Note: For partition-aware frameworks, the state will be TASK_UNREACHABLE instead. |
REASON_RESOURCES_UNKNOWN | The task was part of an accepted offer which used checkpointed resources that are not known to the master. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. |
REASON_SLAVE_RESTARTED | The task was launched during an agent restart, and never got forwarded to the executor. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. |
REASON_CONTAINER_PREEMPTED | The container in which the task was running was pre-empted by a QoS correction. Note: For partition-aware frameworks, the state will be TASK_GONE instead. |
REASON_CONTAINER_UPDATE_FAILED | The container in which the task was running was discarded because a resource update failed. Note: For partition-aware frameworks, the state will be TASK_GONE instead. |
REASON_EXECUTOR_TERMINATED | The executor which was supposed to execute this task was already terminated, or the agent received an instruction to kill the task before the executor was started. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. |
REASON_GC_ERROR | A directory to be used by this task was scheduled for GC and it could not be unscheduled. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. |
REASON_INVALID_OFFERS | This task belonged to an accepted offer that didn’t pass validation checks. Note: For partition-aware frameworks, the state will be TASK_DROPPED instead. |
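The notes in the TASK_LOST entries above can be collected into a lookup table, which may help when writing a scheduler that has to handle both partition-aware and non-partition-aware modes. The sketch below is an illustration only: the names PARTITION_AWARE_STATE, AMBIGUOUS_REASONS and expected_state are hypothetical, and states and reasons are modelled as plain strings.

```python
from typing import Optional

# Replacement states for partition-aware frameworks, transcribed from the
# "Note:" entries of the TASK_LOST reasons.
PARTITION_AWARE_STATE = {
    "REASON_MASTER_DISCONNECTED": "TASK_DROPPED",
    "REASON_RESOURCES_UNKNOWN": "TASK_DROPPED",
    "REASON_SLAVE_RESTARTED": "TASK_DROPPED",
    "REASON_EXECUTOR_TERMINATED": "TASK_DROPPED",
    "REASON_GC_ERROR": "TASK_DROPPED",
    "REASON_INVALID_OFFERS": "TASK_DROPPED",
    "REASON_CONTAINER_PREEMPTED": "TASK_GONE",
    "REASON_CONTAINER_UPDATE_FAILED": "TASK_GONE",
}

# These reasons cover several situations with different notes, so the reason
# alone does not determine the partition-aware state.
AMBIGUOUS_REASONS = frozenset({
    "REASON_SLAVE_DISCONNECTED",
    "REASON_SLAVE_REMOVED",
})


def expected_state(reason: str, partition_aware: bool) -> Optional[str]:
    """Return the state expected alongside a TASK_LOST reason.

    Returns None when the reason is not one of the TASK_LOST reasons, or when
    the state cannot be derived from the reason alone.
    """
    if reason not in PARTITION_AWARE_STATE and reason not in AMBIGUOUS_REASONS:
        return None
    if not partition_aware:
        return "TASK_LOST"
    return PARTITION_AWARE_STATE.get(reason)


print(expected_state("REASON_GC_ERROR", partition_aware=True))
print(expected_state("REASON_GC_ERROR", partition_aware=False))
```

For the two ambiguous reasons, a scheduler would have to look at the state actually delivered in the status update rather than predict it.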
TASK_DROPPED
REASON_SLAVE_DISCONNECTED | See TASK_LOST |
REASON_SLAVE_REMOVED | See TASK_LOST |
REASON_RESOURCES_UNKNOWN | See TASK_LOST |
REASON_SLAVE_RESTARTED | See TASK_LOST |
REASON_GC_ERROR | See TASK_LOST |
REASON_INVALID_OFFERS | See TASK_LOST |
TASK_UNREACHABLE
REASON_SLAVE_REMOVED | See TASK_LOST |
TASK_GONE
REASON_CONTAINER_UPDATE_FAILED | See TASK_LOST |
REASON_CONTAINER_PREEMPTED | See TASK_LOST |
REASON_EXECUTOR_PREEMPTED | Renamed to REASON_CONTAINER_PREEMPTED in Mesos 0.26. |
These reasons do not cause a state change, and will be sent along with the last known state of the task. The reason field indicates why the status update was sent.
REASON_RECONCILIATION | A framework requested implicit or explicit reconciliation for this task. Note: Status updates with this reason are not the original ones, but rather a modified copy that is re-sent from the master. In particular, the original data and message fields are erased and the original reason field is overwritten by REASON_RECONCILIATION. |
REASON_TASK_CHECK_STATUS_UPDATED | A task check notified the agent that its state changed. Note: This reason is set by the executor, so for tasks that are running with a custom executor, whether or not status updates with this reason are sent depends on that executor’s implementation. Note: Currently, when using one of the built-in executors, this reason is only used within status updates with task state TASK_RUNNING. |
REASON_TASK_HEALTH_CHECK_STATUS_UPDATED | A task health check notified the agent that its state changed. Note: This reason is set by the executor, so for tasks that are running with a custom executor, whether or not status updates with this reason are sent depends on that executor’s implementation. Note: Currently, when using one of the built-in executors, this reason is only used within status updates with task state TASK_RUNNING. |
REASON_SLAVE_REREGISTERED | The agent on which the task was running has reregistered after being marked unreachable by the master. Note: Due to garbage collection of the unreachable and gone agents in the registry and master state, Mesos also sends such status updates for agents unknown to the master. Note: Status updates with this reason are modified copies re-sent by the master which reflect the states of the tasks reported by the agent upon its reregistration. See comments for REASON_RECONCILIATION. |
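The effect of reconciliation on a status update, as described for REASON_RECONCILIATION above, can be modelled with a small sketch. The StatusUpdate class and the reconciliation_copy function are hypothetical stand-ins written for illustration; they mirror the described behaviour and are not the master's actual implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class StatusUpdate:
    """Hypothetical, simplified stand-in for a TaskStatus message."""
    state: str
    reason: Optional[str] = None
    message: Optional[str] = None
    data: Optional[bytes] = None


def reconciliation_copy(latest: StatusUpdate) -> StatusUpdate:
    """Model the modified copy that is re-sent during reconciliation.

    The state is kept, the data and message fields are erased, and the
    reason is overwritten with REASON_RECONCILIATION.
    """
    return replace(latest,
                   reason="REASON_RECONCILIATION",
                   message=None,
                   data=None)


original = StatusUpdate(state="TASK_RUNNING",
                        reason="REASON_TASK_HEALTH_CHECK_STATUS_UPDATED",
                        message="Health check passed",
                        data=b"payload")
resent = reconciliation_copy(original)
print(resent)
```

This is why a scheduler that relies on an executor-set reason, message or data payload should process it when the original update arrives, rather than expect to recover it through reconciliation.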