This document describes the network/cni
isolator, a network isolator for the MesosContainerizer that implements the Container Network Interface (CNI) specification. The network/cni
isolator allows containers launched using the MesosContainerizer
to be attached to several different types of IP networks. The network technologies on which containers can possibly be launched range from traditional layer 3/layer 2 networks such as VLAN, ipvlan, macvlan, to the new class of networks designed for container orchestration such as Calico, Weave and Flannel. The MesosContainerizer
has the network/cni
isolator enabled by default.
Having a separate network namespace for each container is attractive for orchestration engines such as Mesos, since it provides containers with network isolation and allows users to operate on containers as if they were operating on an end-host. Without network isolation users have to deal with managing network resources such as TCP/UDP ports on an end host, complicating the design of their application.
The challenge is in implementing the ability in the orchestration engine to communicate with the underlying network in order to configure IP connectivity to the container. This problem arises due to the diversity in terms of the choices of IPAM (IP address management system) and networking technologies available for enabling IP connectivity. To solve this problem we would need to adopt a driver based network orchestration model, where the MesosContainerizer
can offload the business intelligence of configuring IP connectivity to a container, to network specific drivers.
The Container Network Interface (CNI) is a specification proposed by CoreOS that provides such a driver based model. The specification defines a JSON schema that defines the inputs and outputs expected of a CNI plugin (network driver). The specification also provides a clear separation of concerns for the container run time and the CNI plugin. As per the specification the container run time is expected to configure the namespace for the container, a unique identifier for the container (container ID), and a JSON formatted input to the plugin that defines the configuration parameters for a given network. The responsibility of the plugin is to create a veth pair and attach one of the veth pairs to the network namespace of the container, and the other end to a network understood by the plugin. The CNI specification also allows for multiple networks to exist simultaneously, with each network represented by a canonical name, and associated with a unique CNI configuration. There are already CNI plugins for a variety of networks such as bridge, ipvlan, macvlan, Calico, Weave and Flannel.
Thus, introducing support for CNI in Mesos through the network/cni
isolator provides Mesos with tremendous flexibility to orchestrate containers on a wide variety of network technologies.
The network/cni
isolator is enabled by default. However, to use the isolator there are certain actions required by the operator and the frameworks. In this section we specify the steps required by the operator to configure CNI networks on Mesos and the steps required by frameworks to attach containers to a CNI network.
In order to configure the network/cni
isolator the operator specifies two flags at Agent startup as follows:
sudo mesos-slave --master=<master IP> --ip=<Agent IP>
--work_dir=/var/lib/mesos
--network_cni_config_dir=<location of CNI configs>
--network_cni_plugins_dir=<search path for CNI plugins>
Note that the network/cni
isolator learns all the available networks by looking at the CNI configuration in the --network_cni_config_dir
at startup. This implies that if a new CNI network needs to be added after Agent startup, the Agent needs to be restarted. The network/cni
isolator has been designed with recover
capabilities and hence restarting the Agent (and therefore the network/cni
isolator) will not affect container orchestration.
The network/cni
isolator learns about all the CNI networks by reading the CNI configuration specified in --network_cni_config_dir
. Hence, if the operator wants to add a CNI network, the corresponding configuration needs to be added to --network_cni_config_dir
.
While the network/cni
isolator learns the CNI networks by reading the CNI configuration files in --network_cni_config_dir
, it does not keep an in-memory copy of the CNI configurations. The network/cni
isolator only stores a mapping of the CNI network names to the corresponding CNI configuration files. Whenever the network/cni
isolator needs to attach a container to a CNI network it reads the corresponding configuration from the disk and invokes the appropriate plugin with the specified JSON configuration. Though the network/cni
isolator does not keep an in-memory copy of the JSON configuration, it checkpoints the CNI configuration used to launch a container. Checkpointing the CNI configuration protects the resources, associated with the container, by freeing them correctly when the container is destroyed, even if the CNI configuration is deleted.
The fact that the network/cni
isolator always reads the CNI configurations from the disk allows the operator to dynamically add, modify and delete CNI configurations without the need to restart the agent. Whenever the operator modifies an existing CNI configuration, the agent will pick up this new CNI configuration when the next container is launched on that specific CNI network. Similarly when the operator deletes a CNI network the network/cni
isolator will “unlearn” the CNI network (since it will have a reference to this CNI network when it started) in case a framework tries to launch a container on the deleted CNI network.
Frameworks can specify the CNI network to which they want their containers to be attached by setting the name
field in the NetworkInfo
protobuf. The name
field was introduced in the NetworkInfo
protobuf as part of MESOS-4758. Also, by specifying multiple instances of the NetworkInfo
protobuf with different name
in each of the protobuf, the MesosContainerizer
will attach the container to all the different CNI networks specified.
The default behavior for containers is to join the host network
, i.e., if the framework does not specify a name
in the NetworkInfo
protobuf, the network/cni
isolator will be a no-op for that container and will not associate a new network namespace with the container. This would effectively make the container use the host network namespace, attaching
it to the host network.
**NOTE**: While specifying multiple `NetworkInfo` protobuf allows a
container to be attached to different CNI networks, if one of the
`NetworkInfo` protobuf is without the `name` field the `network/cni`
isolator simply "skips" the protobuf, attaching the container to all
the specified CNI networks except the `host network`. To attach a
container to the host network as well as other CNI networks you
will need to attach the container to a CNI network (such as
bridge/macvlan) that, in turn, is attached to the host network.
When invoking CNI plugins (e.g., with command ADD), the isolator will pass on some Mesos meta-data to the plugins by specifying the args
field in the network configuration JSON according to the CNI spec. Currently, the isolator only passes on NetworkInfo
of the corresponding network to the plugin. This is simply the JSON representation of the NetworkInfo
protobuf. For instance:
{
"name" : "mynet",
"type" : "bridge",
"args" : {
"org.apache.mesos" : {
"network_info" : {
"name" : "mynet",
"labels" : {
"labels" : [
{ "key" : "app", "value" : "myapp" },
{ "key" : "env", "value" : "prod" }
]
},
"port_mappings" : [
{ "host_port" : 8080, "container_port" : 80 },
{ "host_port" : 8081, "container_port" : 443 }
]
}
}
}
}
It is important to note that labels
or port_mappings
within the NetworkInfo
is set by frameworks launching the container, and the isolator passses on this information to the CNI plugins. As per the spec, it is the prerogative of the CNI plugins to use this meta-data information as they see fit while attaching/detaching containers to a CNI network. E.g., CNI plugins could use labels
to enforce domain specific policies, or port_mappings
to implement NAT rules.
The network/cni
isolator allocates a network namespace to a container when it needs to attach the container to a CNI network. The network namespace is checkpointed on the host file system and can be useful to debug network connectivity to the network namespace. For a given container the network/cni
isolator checkpoints its network namespace at:
/var/run/mesos/isolators/network/cni/<container ID>/ns
The network namespace can be used with the ip
command from the iproute2 package by creating a symbolic link to the network namespace. Assuming the container ID is 5baff64c-d028-47ba-864e-a5ee679fc069
you can create the symlink as follows:
ln -s /var/run/mesos/isolators/network/cni/5baff64c-d028-47ba-8ff64c64e-a5ee679fc069/ns /var/run/netns/5baff64c
Now we can use the network namespace identifier 5baff64c
to run commands in the new network name space using the iproute2 package. E.g. you can view all the links in the container network namespace by running the command:
ip netns exec 5baff64c ip link
Similarly you can view the container’s route table by running:
ip netns exec 5baff64c ip route show
NOTE: Once MESOS-5278 is completed, executing commands within the container network namespace would be simplified and we will no longer have a dependency on the iproute2
package to debug Mesos container networking.
This section presents examples for launching containers on different CNI networks. For each of the examples the assumption is that the CNI configurations are present at /var/lib/mesos/cni/config
, and the plugins are present at /var/lib/mesos/cni/plugins
. The Agents therefore need to be started with the following command:
sudo mesos-slave --master=<master IP> --ip=<Agent IP>
--work_dir=/var/lib/mesos
--network_cni_config_dir=/var/lib/mesos/cni/config
--network_cni_plugins_dir=/var/lib/mesos/cni/plugins
--isolation=filesystem/linux,docker/runtime
--image_providers=docker
Apart from the CNI configuration parameters, we are also starting the Agent with the ability to launch docker images on MesosContainerizer
. We enable this ability in the MesosContainerizer
by enabling the filesystem/linux
and docker/runtime
isolator and setting the image provider to docker
.
To present an example of a framework launching containers on a specific CNI network, the mesos-execute
CLI framework has been modified to take a --networks
flag which will allow this example framework to launch containers on the specified network. You can find the mesos-execute
framework in your Mesos installation directory at <mesos installation>/bin/mesos-execute
.
The bridge plugin attaches containers to a Linux bridge. Linux bridges could be configured to attach to VLANs and VxLAN allowing containers to be plugged into existing layer 2 networks. We present an example below, where the CNI configuration instructs the MesosContainerizer
to invoke a bridge plugin to connect a container to a Linux bridge. The configuration also instructs the bridge plugin to assign an IP address to the container by invoking a host-local IPAM.
First, build the CNI plugin according to the instructions in the CNI repository then copy the bridge binary to the plugins directory on each agent.
Next, create the configuration file and copy this to the CNI configuration directory on each agent.
{
"name": "cni-test",
"type": "bridge",
"bridge": "mesos-cni0",
"isGateway": true,
"ipMasq": true,
"ipam": {
"type": "host-local",
"subnet": "192.168.0.0/16",
"routes": [
{ "dst":
"0.0.0.0/0" }
]
}
}
The CNI configuration tells the bridge plugin to attach the container to a bridge called mesos-cni0
. If the bridge does not exist the bridge plugin will create one.
It is important to note the routes
section in the ipam
dictionary. For Mesos, the executors
launched as containers need to register with the Agent in order for a task to be successfully launched. Hence, it is imperative that the Agent IP is reachable from the container IP and vice versa. In this specific instance we specified a default route for the container, allowing containers to reach any network that will be routeable by the gateway, which for this CNI configuration is the bridge itself.
Another interesting attribute in the CNI configuration is the ipMasq
option. Setting this to true will install an iptable
rule in the host network namespace that would SNAT all traffic originating from the container and egressing the Agent. This allows containers to talk to the outside world even when they are in an address space that is not routeable from outside the agent.
Below we give an example of launching a Ubuntu
container and attaching it to the mesos-cni0
bridge. You can launch the Ubuntu
container using the mesos-execute
framework as follows:
sudo mesos-execute --command=/bin/bash
--docker_image=ubuntu:latest --master=<master IP>:5050 --name=ubuntu
--networks=cni-test --no-shell
The above command would pull the Ubuntu
image from the docker hub and launch it using the MesosContainerizer
and attach it to the mesos-cni0
bridge.
You can verify the network settings of the Ubuntu
container by creating a symlink to the network namespace and running the ip
command as describe in the section “Accessing container network namespace”.
Assuming we created a reference for the network namespace in /var/run/netns/5baff64c
. The output of the IP address and route table in the container network namespace would be as follows:
$ sudo ip netns exec 5baff64c ip addr show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 8a:2c:f9:41:0a:54 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.2/16 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::882c:f9ff:fe41:a54/64 scope link
valid_lft forever preferred_lft forever
$ sudo ip netns exec 5baff64c ip route show
default via 192.168.0.1 dev eth0
192.168.0.0/16 dev eth0 proto kernel scope link src 192.168.0.2
For private, isolated, networks such as a bridge network where the IP address of a container is not routeable from outside the host it becomes imperative to provide containers with DNAT capabilities so that services running on the container can be exposed outside the host on which the container is running.
Unfortunately, there is no CNI plugin available in the containernetworking/cni repository that provides port-mapping functionality. Hence, we have developed a port-mapper CNI plugin that resides within the Mesos code base called the mesos-cni-port-mapper
. The mesos-cni-port-mapper
is designed to work with any other CNI plugin that requires DNAT capabilities. One of the most obvious being the bridge
CNI plugin.
We explain the operational semantics of the mesos-cni-port-mapper
plugin by taking an example CNI configuration that allows the mesos-cni-port-mapper
to provide DNAT functionality to the bridge
plugin.
{
"name" : "port-mapper-test",
"type" : "mesos-cni-port-mapper",
"excludeDevices" : ["mesos-cni0"],
"chain": "MESOS-TEST-PORT-MAPPER",
"delegate": {
"type": "bridge",
"bridge": "mesos-cni0",
"isGateway": true,
"ipMasq": true,
"ipam": {
"type": "host-local",
"subnet": "192.168.0.0/16",
"routes": [
{ "dst":
"0.0.0.0/0" }
]
}
}
}
For the CNI configuration above, apart from the parameters that the mesos-cni-port-mapper
plugin accepts, the important point to note in the CNI configuration of the plugin is the “delegate” field. The “delegate” field allows the mesos-cni-port-mapper
to wrap the CNI configuration of any other CNI plugin, and allows the plugin to provide DNAT capabilities to any CNI network. In this specific case the mesos-cni-port-mapper
is providing DNAT capabilities to containers running on the bridge network mesos-cni0
. The parameters that the mesos-cni-port-mapper
accepts are listed below:
The mesos-cni-port-mapper
relies heavily on iptables
to provide the DNAT capabilities to a CNI network. In order for the port-mapper plugin to function properly we have certain minimum requirements for iptables as listed below:
Finally, while the CNI configuration of the port-mapper plugin tells the plugin as to how and where to install the iptables
rules, and which CNI plugin to “delegate” the attachment/detachment of the container, the port-mapping information itself is learned by looking at the NetworkInfo
set in the args
field of the CNI configuration passed by Mesos to the port-mapper plugin. Please refer to the “Passing network labels and port-mapping information to CNI plugins” section for more details.
Calico provides 3rd-party CNI plugin that works out-of-the-box with Mesos CNI.
Calico takes a pure Layer-3 approach to networking, allocating a unique, routable IP address to each Meso task. Task routes are distributed by a BGP vRouter run on each Agent, which leverages the existing Linux kernel forwarding engine without needing tunnels, NAT, or overlays. Additionally, Calico supports rich and flexible network policy which it enforces using bookended ACLs on each compute node to provide tenant isolation, security groups, and external reachability constraints.
For information on setting up and using Calico-CNI, see Calico’s guide on integerating with Mesos.
Cilium provides a CNI plugin that works with Mesos.
Cilium brings HTTP-aware network security filtering to Linux container frameworks. Using a new Linux kernel technology called BPF, Cilium provides a simple and efficient way to define and enforce both network-layer and HTTP-layer security policies.
For more information on using Cilium with Mesos, check out the Getting Started Using Mesos Guide.
Weave provides a CNI implementation that works out-of-the-box with Mesos.
Weave provides hassle free configuration by assigning an ip-per-container and providing a fast DNS on each node. Weave is fast, by automatically choosing the fastest path between hosts. Multicast addressing and routing is fully supported. It has built in NAT traversal and encryption and continues to work even during a network partition. Finally, Multi-cloud deployments are easy to setup and maintain, even when there are multiple hops.
For more information on setting up and using Weave CNI, see Weave’s CNI documentation