Apache Mesos - Master Options

Master Options

Required Flags

Flag	Explanation
--quorum=VALUE	The size of the quorum of replicas when using the `replicated_log`-based registry. This value must be set to a majority of the masters, i.e., `quorum > (number of masters)/2`. NOTE: Not required if the master runs in standalone (non-HA) mode.
--work_dir=VALUE	Path of the master work directory. This is where persistent cluster information is stored. Locations such as `/tmp`, which are cleaned automatically, are not suitable for the work directory in production, because long-running masters could lose data when cleanup occurs. (Example: `/var/lib/mesos/master`)
--zk=VALUE	ZooKeeper URL (used for leader election among masters). May be one of: `zk://host1:port1,host2:port2,.../path`, `zk://username:password@host1:port1,host2:port2,.../path`, or `file:///path/to/file` (where the file contains one of the above). NOTE: Not required if the master runs in standalone (non-HA) mode.
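The majority rule for `--quorum` can be checked with a short Python sketch. The helper names `min_quorum` and `is_valid_quorum` are hypothetical and not part of Mesos; the upper bound (a quorum cannot exceed the number of masters) is an added assumption rather than something stated above.

```python
def min_quorum(num_masters: int) -> int:
    """Smallest quorum that satisfies quorum > num_masters / 2."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    # Integer division plus one gives the smallest strict majority.
    return num_masters // 2 + 1


def is_valid_quorum(quorum: int, num_masters: int) -> bool:
    # Majority rule from the --quorum description; the upper bound is an
    # assumption (a quorum larger than the replica count could never be met).
    return num_masters / 2 < quorum <= num_masters


for masters in (1, 3, 5):
    print(f"{masters} masters -> --quorum={min_quorum(masters)}")
```

For three masters the sketch suggests `--quorum=2`, and for five masters `--quorum=3`.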
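The `--zk` forms listed above can be illustrated with a small parser. This is a hypothetical sketch, not the parser Mesos uses: it assumes every server entry carries an explicit port, that the username contains no colon, and that the znode path starts at the first `/` after `zk://`.

```python
from typing import List, NamedTuple, Optional, Tuple


class ZkUrl(NamedTuple):
    credentials: Optional[Tuple[str, str]]  # (username, password) or None
    servers: List[Tuple[str, int]]          # (host, port) pairs
    path: str                               # znode path, e.g. "/mesos"


def parse_zk_url(url: str) -> ZkUrl:
    """Split a zk:// URL of the forms listed for --zk (hypothetical helper)."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError(f"expected a zk:// URL, got {url!r}")
    rest = url[len(prefix):]
    slash = rest.find("/")
    if slash == -1:
        raise ValueError("missing znode path")
    authority, path = rest[:slash], rest[slash:]

    credentials = None
    if "@" in authority:
        # Everything before the last '@' is "username:password".
        userinfo, authority = authority.rsplit("@", 1)
        user, _, password = userinfo.partition(":")
        credentials = (user, password)

    servers = []
    for entry in authority.split(","):
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"bad host:port entry {entry!r}")
        servers.append((host, int(port)))
    return ZkUrl(credentials, servers, path)


def resolve_zk_flag(value: str) -> ZkUrl:
    """Follow the file:///path/to/file form, then parse the zk:// URL inside."""
    if value.startswith("file://"):
        with open(value[len("file://"):]) as f:
            value = f.read().strip()
    return parse_zk_url(value)


print(parse_zk_url("zk://10.0.0.1:2181,10.0.0.2:2181/mesos"))
```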

Optional Flags

Flag	Explanation
--acls=VALUE	The value can be a JSON-formatted string of ACLs or a path to a file containing the JSON-formatted ACLs used for authorization. The path can be of the form `file:///path/to/file` or `/path/to/file`. Note that if the `--authorizers` flag is set to a value other than `local`, the ACL contents will be ignored. See the ACLs protobuf in acls.proto for the expected format. Example: `{ "register_frameworks": [ { "principals": { "type": "ANY" }, "roles": { "values": ["a"] } } ], "run_tasks": [ { "principals": { "values": ["a", "b"] }, "users": { "values": ["c"] } } ], "teardown_frameworks": [ { "principals": { "values": ["a", "b"] }, "framework_principals": { "values": ["c"] } } ], "set_quotas": [ { "principals": { "values": ["a"] }, "roles": { "values": ["a", "b"] } } ], "remove_quotas": [ { "principals": { "values": ["a"] }, "quota_principals": { "values": ["a"] } } ], "get_endpoints": [ { "principals": { "values": ["a"] }, "paths": { "values": ["/flags"] } } ] }`
--agent_ping_timeout=VALUE, --slave_ping_timeout=VALUE	The timeout within which an agent is expected to respond to a ping from the master. Agents that do not respond within `max_agent_ping_timeouts` ping retries will be marked unreachable. NOTE: The total ping timeout (`agent_ping_timeout` multiplied by `max_agent_ping_timeouts`) should be greater than the ZooKeeper session timeout to prevent useless re-registration attempts. (default: 15secs)
--agent_removal_rate_limit=VALUE, --slave_removal_rate_limit=VALUE	The maximum rate (e.g., `1/10mins`, `2/3hrs`, etc.) at which agents will be removed from the master when they fail health checks. By default, agents are removed as soon as they fail the health checks. The value is of the form `(Number of agents)/(Duration)`.
--agent_reregister_timeout=VALUE, --slave_reregister_timeout=VALUE	The timeout within which an agent is expected to reregister. Agents reregister when they become disconnected from the master or when a new master is elected as the leader. Agents that do not reregister within the timeout will be marked unreachable in the registry; if and when such an agent reregisters with the master, any non-partition-aware tasks running on it will be terminated. NOTE: This value must be at least 10mins. (default: 10mins)
--allocation_interval=VALUE	Amount of time to wait between performing (batch) allocations (e.g., 500ms, 1sec, etc.). (default: 1secs)
--allocator=VALUE	Allocator to use for resource allocation to frameworks. Use the default `HierarchicalDRF` allocator, or load an alternate allocator module using `--modules`. (default: HierarchicalDRF)
--min_allocatable_resources=VALUE	One or more sets of resources that define the minimum allocatable resources for the allocator. The allocator will only offer resources that contain at least one of the specified sets. Resources within a set are delimited by semicolons, and sets are delimited by the pipe character. (Example: `disk:1|cpus:1;mem:32` configures the allocator to only offer resources if they contain a disk resource of at least 1 megabyte, or if they contain both 1 cpu and 32 megabytes of memory.) (default: cpus:0.01|mem:32)
--[no-]authenticate_agents, --[no-]authenticate_slaves	If `true`, only authenticated agents are allowed to register. If `false`, unauthenticated agents are also allowed to register. (default: false)
--[no-]authenticate_frameworks, --[no-]authenticate	If `true`, only authenticated frameworks are allowed to register. If `false`, unauthenticated frameworks are also allowed to register. For HTTP-based frameworks, use the `--authenticate_http_frameworks` flag. (default: false)
--[no-]authenticate_http_frameworks	If `true`, only authenticated HTTP-based frameworks are allowed to register. If `false`, HTTP frameworks are not authenticated. (default: false)
--authenticators=VALUE	Authenticator implementation to use when authenticating frameworks and/or agents. Use the default `crammd5`, or load an alternate authenticator module using `--modules`. (default: crammd5)
--authentication_v0_timeout=VALUE	The timeout within which an authentication is expected to complete against a v0 framework or agent. This does not apply to the v0 or v1 HTTP APIs. (default: 15secs)
--authorizers=VALUE	Authorizer implementation to use when authorizing actions that require it. Use the default `local`, or load an alternate authorizer module using `--modules`. Note that if `--authorizers` is set to a value other than the default `local`, the ACLs passed through the `--acls` flag will be ignored. Multiple authorizers are currently not supported. (default: local)
--cluster=VALUE	Human-readable name for the cluster, displayed in the webui.
--credentials=VALUE	Path to a JSON-formatted file containing credentials. The path can be of the form `file:///path/to/file` or `/path/to/file`. Example: `{ "credentials": [ { "principal": "sherman", "secret": "kitesurf" } ] }`
--fair_sharing_excluded_resource_names=VALUE	A comma-separated list of resource names (e.g., ‘gpus’) that will be excluded from fair sharing constraints. This may be useful where the fair sharing implementation currently has limitations, such as the problem of “scarce” resources (see msg35631 and MESOS-5377).
--[no-]filter_gpu_resources	When set to true, the Mesos master filters all offers from agents with GPU resources, sending them only to frameworks that opt into the ‘GPU_RESOURCES’ framework capability. When set to false, the master does not filter offers from agents with GPU resources and sends them indiscriminately to all frameworks, whether or not they set the ‘GPU_RESOURCES’ capability. This flag is meant as a temporary workaround towards the eventual deprecation of the ‘GPU_RESOURCES’ capability. For more information, see msg37571 and MESOS-7576.
--framework_sorter=VALUE	Policy to use for allocating resources between a given user’s frameworks. Options are the same as for `--user_sorter`. (default: drf)
--http_framework_authenticators=VALUE	HTTP authenticator implementation to use when authenticating HTTP frameworks. Use the `basic` authenticator or load an alternate HTTP authenticator module using `--modules`. This must be used in conjunction with `--authenticate_http_frameworks`. Currently there is no support for multiple HTTP authenticators.
--[no-]log_auto_initialize	Whether to automatically initialize the replicated log used for the registry. If this is set to false, the log has to be manually initialized when used for the very first time. (default: true)
--master_contender=VALUE	The symbol name of the master contender to use. This symbol should exist in a module specified through the `--modules` flag. Cannot be used in conjunction with `--zk`. Must be used in conjunction with `--master_detector`.
--master_detector=VALUE	The symbol name of the master detector to use. This symbol should exist in a module specified through the `--modules` flag. Cannot be used in conjunction with `--zk`. Must be used in conjunction with `--master_contender`.
--max_agent_ping_timeouts=VALUE, --max_slave_ping_timeouts=VALUE	The number of times an agent can fail to respond to a ping from the master. Agents that do not respond within `max_agent_ping_timeouts` ping retries will be marked unreachable. (default: 5)
--max_completed_frameworks=VALUE	Maximum number of completed frameworks to store in memory. (default: 50)
--max_completed_tasks_per_framework=VALUE	Maximum number of completed tasks per framework to store in memory. (default: 1000)
--max_unreachable_tasks_per_framework=VALUE	Maximum number of unreachable tasks per framework to store in memory. (default: 1000)
--offer_timeout=VALUE	Duration of time before an offer is rescinded from a framework. This helps fairness when running frameworks that hold on to offers, or frameworks that accidentally drop offers. If not set, offers do not time out.
--rate_limits=VALUE	The value can be a JSON-formatted string of rate limits or a path to a file containing the JSON-formatted rate limits used for framework rate limiting. The path can be of the form `file:///path/to/file` or `/path/to/file`. See the RateLimits protobuf in mesos.proto for the expected format. Example: `{ "limits": [ { "principal": "foo", "qps": 55.5 }, { "principal": "bar" } ], "aggregate_default_qps": 33.3 }`
--recovery_agent_removal_limit=VALUE, --recovery_slave_removal_limit=VALUE	For failovers, the limit on the percentage of agents that can be removed from the registry and shut down after the reregistration timeout elapses. If the limit is exceeded, the master will fail over rather than remove the agents. This can be used to provide safety guarantees for production environments, which may expect that across master failovers at most a certain percentage of agents will fail permanently (e.g., due to rack-level failures). Setting this limit ensures that a human gets involved if an unexpected widespread failure of agents occurs in the cluster. Values: [0%-100%] (default: 100%)
--registry=VALUE	Persistence strategy for the registry; available options are `replicated_log` and `in_memory` (for testing). (default: replicated_log)
--registry_fetch_timeout=VALUE	Duration to wait when fetching data from the registry before the operation is considered a failure. (default: 1mins)
--registry_gc_interval=VALUE	How often to garbage collect the registry. The current leading master will periodically discard information from the registry. How long registry state is retained is controlled by other parameters (e.g., `registry_max_agent_age`, `registry_max_agent_count`); this parameter controls how often the master will examine the registry to see if data should be discarded. (default: 15mins)
--registry_max_agent_age=VALUE	Maximum length of time to store information in the registry about agents that are not currently connected to the cluster. This information allows frameworks to determine the status of unreachable and gone agents. Note that the registry always stores information on all connected agents. If there are more than `registry_max_agent_count` partitioned/gone agents, agent information may be discarded from the registry sooner than indicated by this parameter. (default: 2weeks)
--registry_max_agent_count=VALUE	Maximum number of partitioned/gone agents to store in the registry. This information allows frameworks to determine the status of disconnected agents. Note that the registry always stores information about all connected agents. See also the `registry_max_agent_age` flag. (default: 102400)
--registry_store_timeout=VALUE	Duration to wait when storing data in the registry before the operation is considered a failure. (default: 20secs)
--[no-]require_agent_domain	If true, only agents with a configured domain can register. (default: false)
--roles=VALUE	A comma-separated list of the allocation roles that frameworks in this cluster may belong to. This flag is deprecated; if it is not specified, any role name can be used.
--[no-]root_submissions	Can root submit frameworks? (default: true)
--user_sorter=VALUE	Policy to use for allocating resources between users. May be one of: dominant_resource_fairness (drf) (default: drf)
--webui_dir=VALUE	Directory path of the webui files/assets. (default: /usr/local/share/mesos/webui)
--weights=VALUE	A comma-separated list of role/weight pairs of the form `role=weight,role=weight`. Weights can be used to control the relative share of cluster resources that is offered to different roles. This flag is deprecated. Instead, operators should configure weights dynamically using the `/weights` HTTP endpoint.
--whitelist=VALUE	Path to a file which contains a list of agents (one per line) to advertise offers for. The file is watched and periodically re-read to refresh the agent whitelist. By default there is no whitelist: all machines are accepted. The path can be of the form `file:///path/to/file` or `/path/to/file`.
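The NOTE on `--agent_ping_timeout` compares a product of two flags against the ZooKeeper session timeout. The Python sketch below makes that arithmetic explicit. The duration parser and its unit table are assumptions covering only the suffixes used on this page, the helper names are hypothetical, and the ZooKeeper session timeout is passed in by the caller because this page does not name a flag for it. With the defaults (`15secs` and 5 retries) the total ping timeout is 75 seconds.

```python
import re

# Seconds per unit suffix. This table covers the suffixes used on this page
# and is an assumption, not a copy of the parser Mesos itself uses.
UNIT_SECONDS = {"ms": 0.001, "secs": 1, "mins": 60, "hrs": 3600,
                "days": 86400, "weeks": 604800}


def parse_duration(text: str) -> float:
    """Convert a value such as '15secs' or '10mins' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", text.strip())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def total_ping_timeout(agent_ping_timeout: str, max_agent_ping_timeouts: int) -> float:
    # agent_ping_timeout multiplied by max_agent_ping_timeouts, in seconds.
    return parse_duration(agent_ping_timeout) * max_agent_ping_timeouts


def ping_timeout_exceeds_session(agent_ping_timeout: str,
                                 max_agent_ping_timeouts: int,
                                 zk_session_timeout: str) -> bool:
    # The NOTE asks for the total ping timeout to exceed the ZooKeeper
    # session timeout, which the caller supplies here.
    return (total_ping_timeout(agent_ping_timeout, max_agent_ping_timeouts)
            > parse_duration(zk_session_timeout))


# Defaults from this page: 15secs and 5 retries.
print(total_ping_timeout("15secs", 5))
```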
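The `(Number of agents)/(Duration)` form of `--agent_removal_rate_limit` can be decoded as follows. This is a hypothetical sketch: the unit table is assumed, and reading the rate as evenly spaced removals is only an illustration, not a description of how the master's internal rate limiter schedules them.

```python
import re
from typing import Tuple

# Seconds per unit suffix; an assumed table covering the units used on this page.
UNIT_SECONDS = {"ms": 0.001, "secs": 1, "mins": 60, "hrs": 3600,
                "days": 86400, "weeks": 604800}


def parse_duration(text: str) -> float:
    """Convert a value such as '10mins' or '3hrs' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", text.strip())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def parse_removal_rate(value: str) -> Tuple[int, float]:
    """Split '(Number of agents)/(Duration)' into (count, seconds)."""
    count_text, sep, duration_text = value.partition("/")
    if not sep or not count_text.strip().isdigit() or int(count_text) == 0:
        raise ValueError(f"expected '<count>/<duration>', got {value!r}")
    return int(count_text), parse_duration(duration_text)


def seconds_between_removals(value: str) -> float:
    # Illustrative reading only: spread the allowed removals evenly.
    count, seconds = parse_removal_rate(value)
    return seconds / count


for limit in ("1/10mins", "2/3hrs"):
    print(limit, seconds_between_removals(limit))
```

Under that reading, `1/10mins` allows one removal every 600 seconds and `2/3hrs` one every 5400 seconds.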
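The semantics of `--min_allocatable_resources` (offer only if at least one set is fully satisfied) can be sketched in Python. The sketch assumes scalar resources only and uses hypothetical helper names; it is not the allocator's implementation.

```python
from typing import Dict, List


def parse_min_allocatable(value: str) -> List[Dict[str, float]]:
    """Turn 'disk:1|cpus:1;mem:32' into a list of {name: quantity} sets."""
    requirement_sets = []
    for group in value.split("|"):        # sets are separated by '|'
        requirement = {}
        for item in group.split(";"):     # resources within a set by ';'
            name, sep, quantity = item.partition(":")
            if not sep:
                raise ValueError(f"expected name:quantity, got {item!r}")
            requirement[name.strip()] = float(quantity)
        requirement_sets.append(requirement)
    return requirement_sets


def is_allocatable(available: Dict[str, float],
                   requirement_sets: List[Dict[str, float]]) -> bool:
    # Offerable if at least one set is fully covered by the available resources.
    return any(
        all(available.get(name, 0.0) >= qty for name, qty in req.items())
        for req in requirement_sets
    )


sets = parse_min_allocatable("disk:1|cpus:1;mem:32")
print(is_allocatable({"cpus": 2.0, "mem": 16.0}, sets))
```

Here 2 cpus with only 16 MB of memory satisfies neither set, so nothing would be offered under this sketch.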
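The failover decision described for `--recovery_agent_removal_limit` reduces to a percentage comparison. A minimal sketch, assuming a hypothetical helper `master_fails_over` that receives the total number of agents and the number that did not reregister in time; it is an illustration of the rule, not the master's code.

```python
def parse_percentage(value: str) -> float:
    """Convert a value like '10%' to a fraction in [0, 1], as documented."""
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    fraction = float(value[:-1]) / 100
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"percentage out of range [0%-100%]: {value!r}")
    return fraction


def master_fails_over(total_agents: int, agents_to_remove: int,
                      limit: str = "100%") -> bool:
    """True if removing these agents would exceed the configured limit."""
    if total_agents == 0:
        return False
    # "Exceeded" is read as strictly greater than the limit.
    return agents_to_remove / total_agents > parse_percentage(limit)


print(master_fails_over(100, 20, "10%"))
```

With `10%` and 100 agents, removing 10 agents stays within the limit, while removing 20 exceeds it and the master would fail over instead.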
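The interplay between `--registry_max_agent_age` and `--registry_max_agent_count` can be sketched as a retention filter applied at each garbage-collection pass (`--registry_gc_interval`). This is an illustration under stated assumptions: timestamps are plain seconds, the agent IDs and the helper `agents_to_keep` are hypothetical, and discarding the oldest entries first when the count limit is exceeded is assumed rather than documented here.

```python
from typing import Dict, List


def agents_to_keep(unreachable_since: Dict[str, float], now: float,
                   max_age_secs: float, max_count: int) -> List[str]:
    """Return the IDs of disconnected agents whose registry entries survive."""
    # Drop entries older than the registry_max_agent_age equivalent.
    recent = [(since, agent) for agent, since in unreachable_since.items()
              if now - since <= max_age_secs]
    # If more than max_count remain, keep the newest ones
    # (discarding the oldest first is an assumption of this sketch).
    recent.sort(reverse=True)
    return sorted(agent for _, agent in recent[:max_count])


# Hypothetical agents and the time (in seconds) they became disconnected.
entries = {"agent-a": 0.0, "agent-b": 50.0, "agent-c": 90.0, "agent-d": 95.0}
print(agents_to_keep(entries, now=100.0, max_age_secs=60.0, max_count=2))
```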
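Although `--weights` is deprecated, its `role=weight,role=weight` format is easy to validate when migrating to the `/weights` endpoint. A hypothetical sketch: requiring positive weights is an assumption, and the proportional split is a simplification of how weighted allocation actually behaves.

```python
from typing import Dict


def parse_weights(value: str) -> Dict[str, float]:
    """Parse 'role=weight,role=weight' into {role: weight}."""
    weights = {}
    for pair in value.split(","):
        role, sep, weight = pair.partition("=")
        role = role.strip()
        if not sep or not role:
            raise ValueError(f"expected role=weight, got {pair!r}")
        parsed = float(weight)
        if parsed <= 0:
            # Assumption of this sketch: weights must be positive.
            raise ValueError(f"weight must be positive: {pair!r}")
        weights[role] = parsed
    return weights


def proportional_shares(weights: Dict[str, float]) -> Dict[str, float]:
    # Simplification: each role's share is its weight over the total weight.
    total = sum(weights.values())
    return {role: w / total for role, w in weights.items()}


print(proportional_shares(parse_weights("analytics=2,batch=1")))
```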

Network Isolator Flags

Available when configured with --with-network-isolator.

Flag	Explanation
--max_executors_per_agent=VALUE, --max_executors_per_slave=VALUE	Maximum number of executors allowed per agent. The network monitoring/isolation technique imposes an implicit resource acquisition on each executor (# ephemeral ports); as a result, only a limited number of executors can run on each agent.
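The limit on executors follows from dividing an agent's ephemeral port range among executors. The numbers below (a port range of 32768 to 57343 and 1024 ports per executor) are hypothetical illustrations, not Mesos defaults, and the helper name `max_executors` is invented for this sketch.

```python
from typing import Tuple


def max_executors(ephemeral_ports: Tuple[int, int], ports_per_executor: int) -> int:
    """How many executors fit if each one needs its own block of ephemeral ports."""
    low, high = ephemeral_ports
    if low > high or ports_per_executor <= 0:
        raise ValueError("invalid port range or block size")
    total_ports = high - low + 1  # the range is inclusive at both ends
    return total_ports // ports_per_executor


# Hypothetical values for illustration only.
print(max_executors((32768, 57343), 1024))
```

With those assumed values, 24576 ports divided into blocks of 1024 allow 24 executors.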