This document describes the posix/rlimits isolator. The isolator adds support for setting POSIX resource limits (rlimits) for containers launched using the Mesos containerizer.
To enable the POSIX Resource Limits support, append posix/rlimits to the --isolation flag when starting the agent.
POSIX rlimits can be used to control the resources a process can consume. Resource limits are typically set at boot time and inherited when a child process is forked from a parent process; resource limits can also be modified via setrlimit(2). In many interactive shells, resource limits can be inspected or modified with the ulimit shell built-in.
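The inheritance behavior described above can be sketched with Python's standard resource module, which wraps getrlimit(2) and setrlimit(2). This is only an illustration, not part of Mesos; the helper name child_nofile_soft_limit and the value 64 used below are assumptions made for the example.

```python
import resource
import subprocess
import sys


def child_nofile_soft_limit(new_soft):
    """Start a child Python process with a lowered soft RLIMIT_NOFILE and
    return the soft limit that the child observes (POSIX only)."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def lower_limit():
        # Runs in the forked child before exec; the executed program
        # inherits whatever limit is set here.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    child_code = (
        "import resource; "
        "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    )
    out = subprocess.check_output(
        [sys.executable, "-c", child_code],
        preexec_fn=lower_limit,
    )
    return int(out)
```

Calling child_nofile_soft_limit(64) changes the limit only in the child; the parent's own limits are untouched, much as running ulimit -n 64 in a subshell would leave the outer shell unchanged.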
A POSIX resource limit consists of a soft and a hard limit. The soft limit specifies the effective resource limit for the current process and any processes it forks, while the hard limit gives the value up to which processes may increase their effective limit; increasing the hard limit is a privileged action. The soft limit must be less than or equal to the hard limit. System administrators can use a hard resource limit to define the maximum amount of resources that can be consumed by a user; users can employ soft resource limits to ensure that one of their tasks only consumes a limited amount of the global hard resource limit.
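The soft and hard limit rules can be exercised directly from an unprivileged process. The following is a minimal sketch using Python's resource module; the function name exercise_soft_limit and the choice of RLIMIT_CORE are assumptions made for illustration.

```python
import resource


def exercise_soft_limit(res=resource.RLIMIT_CORE):
    """Lower the soft limit, raise it back up to the hard limit, and probe
    that a soft limit above the hard limit is rejected."""
    original_soft, hard = resource.getrlimit(res)
    observed = {}
    try:
        # Lowering the soft limit never requires privileges.
        resource.setrlimit(res, (0, hard))
        observed["lowered"] = resource.getrlimit(res)[0]

        # A process may raise its soft limit again, up to the hard limit.
        resource.setrlimit(res, (hard, hard))
        observed["raised"] = resource.getrlimit(res)[0]

        # soft > hard violates the invariant. This can only be probed
        # when the hard limit is finite.
        if hard != resource.RLIM_INFINITY:
            try:
                resource.setrlimit(res, (hard + 1, hard))
                observed["rejected"] = False
            except ValueError:
                observed["rejected"] = True
    finally:
        # Restore the limits the process started with.
        resource.setrlimit(res, (original_soft, hard))
    return observed, hard
```

The hard limit is never changed here, so the sketch should work without CAP_SYS_RESOURCE.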
This isolator permits setting per-task resource limits. It interprets rlimits specified as part of a task’s ContainerInfo for the Mesos containerizer, e.g.,
{
  "container": {
    "type": "MESOS",
    "rlimit_info": {
      "rlimits": [
        {
          "type": "RLMT_CORE"
        },
        {
          "type": "RLMT_STACK",
          "soft": 8192,
          "hard": 32768
        }
      ]
    }
  }
}
To enable interpretation of rlimits, agents need to be started with posix/rlimits in their --isolation flag, e.g.,
mesos-agent --master=<master ip> --ip=<agent ip> \
  --work_dir=/var/lib/mesos \
  --isolation=posix/rlimits[,other isolation flags]
To set a hard limit for a task that is larger than the current value of the hard limit, the agent process needs to run as a privileged user (with the CAP_SYS_RESOURCE capability), typically root.
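Whether a requested hard limit would need this privilege can be estimated by comparing it with the hard limit of the current process. The sketch below uses Python's resource module; the function name is hypothetical, and it illustrates the setrlimit(2) rule rather than the check the Mesos agent itself performs.

```python
import resource


def raising_hard_limit_needs_privilege(res, requested_hard):
    """Return True if `requested_hard` exceeds this process's current hard
    limit for `res`, in which case setrlimit(2) would require privileges."""
    _, current_hard = resource.getrlimit(res)
    if current_hard == resource.RLIM_INFINITY:
        # Nothing can exceed an unlimited hard limit.
        return False
    if requested_hard == resource.RLIM_INFINITY:
        # Going from a finite hard limit to unlimited is an increase.
        return True
    return requested_hard > current_hard
```

For example, requesting the current hard limit or anything below it returns False, while requesting a larger value returns True.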
POSIX currently defines a base set of resources, see the documentation; Linux defines additional resource limits, see e.g., the documentation of setrlimit(2).
Resource | Comment |
---|---|
RLIMIT_CORE | POSIX: This is the maximum size of a core file, in bytes, that may be created by a process. |
RLIMIT_CPU | POSIX: This is the maximum amount of CPU time, in seconds, used by a process. |
RLIMIT_DATA | POSIX: This is the maximum size of a process’ data segment, in bytes. |
RLIMIT_FSIZE | POSIX: This is the maximum size of a file, in bytes, that may be created by a process. |
RLIMIT_NOFILE | POSIX: This is a number one greater than the maximum value that the system may assign to a newly-created descriptor. |
RLIMIT_STACK | POSIX: This is the maximum size of the initial thread’s stack, in bytes. |
RLIMIT_AS | POSIX: This is the maximum size of a process’ total available memory, in bytes. |
RLIMIT_LOCKS | Linux: (Early Linux 2.4 only) A limit on the combined number of flock(2) locks and fcntl(2) leases that this process may establish. |
RLIMIT_MEMLOCK | Linux: The maximum number of bytes of memory that may be locked into RAM. |
RLIMIT_MSGQUEUE | Linux: Specifies the limit on the number of bytes that can be allocated for POSIX message queues for the real user ID of the calling process. |
RLIMIT_NICE | Linux: (Since Linux 2.6.12) Specifies a ceiling to which the process’s nice value can be raised using setpriority(2) or nice(2). |
RLIMIT_NPROC | Linux: The maximum number of processes (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process. |
RLIMIT_RSS | Linux: Specifies the limit (in pages) of the process’s resident set (the number of virtual pages resident in RAM). |
RLIMIT_RTPRIO | Linux: (Since Linux 2.6.12) Specifies a ceiling on the real-time priority that may be set for this process using sched_setscheduler(2) and sched_setparam(2). |
RLIMIT_RTTIME | Linux: (Since Linux 2.6.25) Specifies a limit (in microseconds) on the amount of CPU time that a process scheduled under a real-time scheduling policy may consume without making a blocking system call. |
RLIMIT_SIGPENDING | Linux: (Since Linux 2.6.8) Specifies the limit on the number of signals that may be queued for the real user ID of the calling process. |
Mesos maps these resource types onto RLimit types, where by convention the prefix RLMT_ is used in place of RLIMIT_ above. Not all limit types are supported on all platforms.
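The naming convention is a plain prefix substitution, which the following sketch makes explicit. Both helper names are hypothetical. The availability check only reports which RLIMIT_* constants Python's resource module exposes on the local platform; it is a rough proxy and says nothing about which types a given Mesos agent accepts.

```python
import resource


def to_mesos_rlimit_type(posix_name):
    """Map a POSIX/Linux name such as 'RLIMIT_STACK' to the Mesos-style
    name 'RLMT_STACK' (naming convention only)."""
    prefix = "RLIMIT_"
    if not posix_name.startswith(prefix):
        raise ValueError(f"not an RLIMIT_* name: {posix_name!r}")
    return "RLMT_" + posix_name[len(prefix):]


def exposed_on_this_platform(posix_name):
    """Return True if Python's resource module defines this constant on
    the local platform."""
    return hasattr(resource, posix_name)
```

For instance, to_mesos_rlimit_type("RLIMIT_STACK") yields "RLMT_STACK", the type used in the ContainerInfo example earlier in this document.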
For each RLimit, either both the soft and the hard value must be set, or neither; the latter case is interpreted as the absence of an explicit limit.
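A framework could check these rules before submitting a task. The sketch below builds the ContainerInfo fragment shown earlier as a plain dictionary and rejects entries that set only one of the two values or whose soft limit exceeds the hard limit. The helper names validate_rlimit and container_info are hypothetical, and the checks mirror the rules stated in this document rather than the agent's actual validation code.

```python
import json


def validate_rlimit(entry):
    """Check one rlimit entry: 'soft' and 'hard' must both be present or
    both be absent, and soft must not exceed hard."""
    has_soft = "soft" in entry
    has_hard = "hard" in entry
    if has_soft != has_hard:
        raise ValueError(
            f"{entry.get('type')}: set both 'soft' and 'hard', or neither")
    if has_soft and entry["soft"] > entry["hard"]:
        raise ValueError(f"{entry.get('type')}: soft limit exceeds hard limit")
    return entry


def container_info(rlimits):
    """Build the 'container' fragment for the Mesos containerizer."""
    return {
        "container": {
            "type": "MESOS",
            "rlimit_info": {
                "rlimits": [validate_rlimit(r) for r in rlimits],
            },
        }
    }


if __name__ == "__main__":
    info = container_info([
        {"type": "RLMT_CORE"},  # no explicit limit
        {"type": "RLMT_STACK", "soft": 8192, "hard": 32768},
    ])
    print(json.dumps(info, indent=2))
```

An entry such as {"type": "RLMT_STACK", "soft": 8192} raises ValueError in this sketch, because only one of the two values is given.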