In this document we refer to Mesos applications as “frameworks”.
See one of the example framework schedulers in MESOS_HOME/src/examples/
to get an idea of what a Mesos framework scheduler and executor in the language of your choice looks like. RENDLER provides example framework implementations in C++, Go, Haskell, Java, Python and Scala.
If you are writing a scheduler against Mesos 1.0 or newer, it is recommended to use the new HTTP API to talk to Mesos.
If your framework needs to talk to Mesos 0.28.0 or older, or you have not updated to the HTTP API, you can write the scheduler in C++, Java/Scala, or Python. Your framework scheduler should inherit from the Scheduler
class (see: C++, Java, Python). Your scheduler should create a SchedulerDriver (which will mediate communication between your scheduler and the Mesos master) and then call SchedulerDriver.run()
(see: C++, Java, Python).
How to build Mesos frameworks that are highly available in the face of failures is discussed in a separate document.
When implementing a scheduler, it’s important to adhere to the following guidelines in order to ensure that the scheduler can run in a scalable manner alongside other schedulers in the same Mesos cluster:
Suppress
: The scheduler must stay in a suppressed state whenever it has no additional tasks to launch or offer operations to perform. This ensures that Mesos can more efficiently offer resources to those frameworks that do have work to perform.Decline
resources using a large timeout: when declining an offer, use a large Filters.refuse_seconds
timeout (e.g. 1 hour). This ensures that Mesos will have time to try offering the resources to other scheduler before trying the same scheduler again. However, if the scheduler is unable to eventually enter a SUPPRESS
ed state, and it has new workloads to run after having declined, it should consider REVIVE
ing if it is not receiving sufficient resources for some time.REVIVE
frequently: REVIVE
ing clears all filters, and therefore if REVIVE
occurs frequently it is similar to always declining with a very short timeout (violation of guideline (3)).Operationally, the following can be done to ensure that schedulers get the resources they need when co-existing with other schedulers:
--role-sorter=random
and --framework_sorter=random
.See the Offer Starvation Design Document in MESOS-3202 for more information about the pitfalls and future plans for running multiple schedulers.
Mesos provides a simple executor that can execute shell commands and Docker containers on behalf of the framework scheduler; enough functionality for a wide variety of framework requirements.
Any scheduler can make use of the Mesos command executor by filling in the optional CommandInfo
member of the TaskInfo
protobuf message.
message TaskInfo {
...
optional CommandInfo command = 7;
...
}
The Mesos slave will fill in the rest of the ExecutorInfo
for you when tasks are specified this way.
Note that the agent will derive an ExecutorInfo
from the TaskInfo
and additionally copy fields (e.g., Labels
) from TaskInfo
into the new ExecutorInfo
. This ExecutorInfo
is only visible on the agent.
Since Mesos 1.1, a new built-in default executor (experimental) is available that can execute a group of tasks. Just like the command executor the tasks can be shell commands or Docker containers.
The current semantics of the default executor are as folows:
– Tasks are launched as nested containers underneath the executor container.
– Task containers and executor container share resources like cpu, memory, network and volumes.
– There is no resource isolation between different tasks within an executor. Tasks’ resources are added to the executor container.
– If any of the tasks exits with a non-zero exit code, all the tasks in the task group are killed and the executor shuts down.
– Multiple task groups are not supported.
Once the default executor is considered stable, the command executor will be deprecated in favor of it.
Any scheduler can make use of the Mesos default executor by setting ExecutorInfo.type
to DEFAULT
when launching a group of tasks using the LAUNCH_GROUP
offer operation. If DEFAULT
executor is explicitly specified when using LAUNCH
offer operation, command executor is used instead of the default executor. This might change in the future when the default executor gets support for handling LAUNCH
operation.
message ExecutorInfo {
...
optional Type type = 15;
...
}
If your framework has special requirements, you might want to provide your own Executor implementation. For example, you may not want a 1:1 relationship between tasks and processes.
If you are writing an executor against Mesos 1.0 or newer, it is recommended to use the new HTTP API to talk to Mesos.
If writing against Mesos 0.28.0 or older, your framework executor must inherit from the Executor class (see (see: C++, Java, Python). It must override the launchTask() method. You can use the $MESOS_HOME environment variable inside of your executor to determine where Mesos is running from. Your executor should create an ExecutorDriver (which will mediate communication between your executor and the Mesos agent) and then call ExecutorDriver.run()
(see: C++, Java, Python).
After creating your custom executor, you need to make it available to all slaves in the cluster.
One way to distribute your framework executor is to let the Mesos fetcher download it on-demand when your scheduler launches tasks on that slave. ExecutorInfo
is a Protocol Buffer Message class (defined in include/mesos/mesos.proto
), and it contains a field of type CommandInfo
. CommandInfo
allows schedulers to specify, among other things, a number of resources as URIs. These resources are fetched to a sandbox directory on the slave before attempting to execute the ExecutorInfo
command. Several URI schemes are supported, including HTTP, FTP, HDFS, and S3 (e.g. see src/examples/java/TestFramework.java for an example of this).
Alternatively, you can pass the frameworks_home
configuration option (defaults to: MESOS_HOME/frameworks
) to your mesos-slave
daemons when you launch them to specify where your framework executors are stored (e.g. on an NFS mount that is available to all slaves), then use a relative path in CommandInfo.uris
, and the slave will prepend the value of frameworks_home
to the relative path provided.
Once you are sure that your executors are available to the mesos-slaves, you should be able to run your scheduler, which will register with the Mesos master, and start receiving resource offers!
Labels
can be found in the FrameworkInfo
, TaskInfo
, DiscoveryInfo
and TaskStatus
messages; framework and module writers can use Labels to tag and pass unstructured information around Mesos. Labels are free-form key-value pairs supplied by the framework scheduler or label decorator hooks. Below is the protobuf definitions of labels:
optional Labels labels = 11;
/**
* Collection of labels.
*/
message Labels {
repeated Label labels = 1;
}
/**
* Key, value pair used to store free form user-data.
*/
message Label {
required string key = 1;
optional string value = 2;
}
Labels are not interpreted by Mesos itself, but will be made available over master and slave state endpoints. Further more, the executor and scheduler can introspect labels on the TaskInfo
and TaskStatus
programmatically. Below is an example of how two label pairs ("environment": "prod"
and "bananas": "apples"
) can be fetched from the master state endpoint.
$ curl http://master/state.json
...
{
"executor_id": "default",
"framework_id": "20150312-120017-16777343-5050-39028-0000",
"id": "3",
"labels": [
{
"key": "environment",
"value": "prod"
},
{
"key": "bananas",
"value": "apples"
}
],
"name": "Task 3",
"slave_id": "20150312-115625-16777343-5050-38751-S0",
"state": "TASK_FINISHED",
...
},
When your framework registers an executor or launches a task, it can provide additional information for service discovery. This information is stored by the Mesos master along with other imporant information such as the slave currently running the task. A service discovery system can programmatically retrieve this information in order to set up DNS entries, configure proxies, or update any consistent store used for service discovery in a Mesos cluster that runs multiple frameworks and multiple tasks.
The optional DiscoveryInfo
message for TaskInfo
and ExecutorInfo
is declared in MESOS_HOME/include/mesos/mesos.proto
message DiscoveryInfo {
enum Visibility {
FRAMEWORK = 0;
CLUSTER = 1;
EXTERNAL = 2;
}
required Visibility visibility = 1;
optional string name = 2;
optional string environment = 3;
optional string location = 4;
optional string version = 5;
optional Ports ports = 6;
optional Labels labels = 7;
}
Visibility
is the key parameter that instructs the service discovery system whether a service should be discoverable. We currently differentiate between three cases:
Many service discovery systems provide additional features that manage the visibility of services (e.g., ACLs in proxy based systems, security extensions to DNS, VLAN or subnet selection). It is not the intended use of the visibility field to manage such features. When a service discovery system retrieves the task or executor information from the master, it can decide how to handle tasks without DiscoveryInfo
. For instance, tasks may be made non discoverable to other frameworks (equivalent to visibility=FRAMEWORK
) or discoverable to all frameworks (equivalent to visibility=CLUSTER
).
The name
field is a string that provides the service discovery system with the name under which the task is discoverable. The typical use of the name field will be to provide a valid hostname. If name is not provided, it is up to the service discovery system to create a name for the task based on the name field in taskInfo
or other information.
The environment
, location
, and version
fields provide first class support for common attributes used to differentiate between similar services in large deployments. The environment
may receive values such as PROD/QA/DEV
, the location
field may receive values like EAST-US/WEST-US/EUROPE/AMEA
, and the version
field may receive values like v2.0/v0.9. The exact use of these fields is up to the service discovery system.
The ports
field allows the framework to identify the ports a task listens to and explicitly name the functionality they represent and the layer-4 protocol they use (TCP, UDP, or other). For example, a Cassandra task will define ports like "7000,Cluster,TCP"
, "7001,SSL,TCP"
, "9160,Thrift,TCP"
, "9042,Native,TCP"
, and "7199,JMX,TCP"
. It is up to the service discovery system to use these names and protocol in appropriate ways, potentially combining them with the name
field in DiscoveryInfo
.
The labels
field allows a framework to pass arbitrary labels to the service discovery system in the form of key/value pairs. Note that anything passed through this field is not guaranteed to be supported moving forward. Nevertheless, this field provides extensibility. Common uses of this field will allow us to identify use cases that require first class support.