Linux provides the following namespaces:
    Namespace   Constant          Isolates
    Cgroup      CLONE_NEWCGROUP   Cgroup root directory
    IPC         CLONE_NEWIPC      System V IPC, POSIX message queues
    Network     CLONE_NEWNET      Network devices, stacks, ports, etc.
    Mount       CLONE_NEWNS       Mount points
    PID         CLONE_NEWPID      Process IDs
    User        CLONE_NEWUSER     User and group IDs
    UTS         CLONE_NEWUTS      Hostname and NIS domain name
This page describes the various namespaces and the associated /proc files, and summarizes the APIs for working with namespaces.
Creation of new namespaces using clone(2) and unshare(2) in most cases requires the CAP_SYS_ADMIN capability. User namespaces are the exception: since Linux 3.8, no privilege is required to create a user namespace.
    $ ls -l /proc/$$/ns
    total 0
    lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup -> cgroup:[4026531835]
    lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc -> ipc:[4026531839]
    lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt -> mnt:[4026531840]
    lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net -> net:[4026531969]
    lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid -> pid:[4026531836]
    lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user -> user:[4026531837]
    lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts -> uts:[4026531838]
Bind mounting (see mount(2)) one of the files in this directory to somewhere else in the filesystem keeps the corresponding namespace of the process specified by pid alive even if all processes currently in the namespace terminate.
Opening one of the files in this directory (or a file that is bind mounted to one of these files) returns a file handle for the corresponding namespace of the process specified by pid. As long as this file descriptor remains open, the namespace will remain alive, even if all processes in the namespace terminate. The file descriptor can be passed to setns(2).
In Linux 3.7 and earlier, these files were visible as hard links. Since Linux 3.8, they appear as symbolic links. If two processes are in the same namespace, then the inode numbers of their /proc/[pid]/ns/xxx symbolic links will be the same; an application can check this using the stat.st_ino field returned by stat(2). The content of this symbolic link is a string containing the namespace type and inode number as in the following example:
    $ readlink /proc/$$/ns/uts
    uts:[4026531838]
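The two checks described above can be sketched in C as follows. The function names same_namespace and parse_ns_link are hypothetical helpers introduced for illustration. The first compares the stat(2) results of two /proc/[pid]/ns/xxx links; the second splits the link content into its type and inode number, assuming the content has exactly the form shown in the example.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if the two /proc/[pid]/ns/xxx paths refer to the same
   namespace, 0 if they do not, and -1 if either stat(2) call fails.
   stat(2) follows the symbolic link, so st_ino is the namespace's
   inode number; st_dev is compared as well for completeness. */
int
same_namespace(const char *path1, const char *path2)
{
    struct stat sb1, sb2;

    if (stat(path1, &sb1) == -1 || stat(path2, &sb2) == -1)
        return -1;

    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}

/* Split link content such as "uts:[4026531838]" into the namespace
   type and the inode number. 'type' must have room for 16 bytes.
   Return 0 on success, or -1 if the string does not have that form. */
int
parse_ns_link(const char *content, char type[16], unsigned long *ino)
{
    int end = -1;

    /* %n records how much input was consumed; it is stored only if
       the closing ']' was matched */
    if (sscanf(content, "%15[^:]:[%lu]%n", type, ino, &end) != 2)
        return -1;
    if (end == -1 || content[end] != '\0')
        return -1;

    return 0;
}
```

For instance, same_namespace("/proc/self/ns/uts", "/proc/1/ns/uts") would report whether the caller shares a UTS namespace with PID 1, provided the caller is permitted to dereference both links.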
The symbolic links in this subdirectory are as follows:
Permission to dereference or read (readlink(2)) these symbolic links is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem. Objects created in an IPC namespace are visible to all other processes that are members of that namespace, but are not visible to processes in other IPC namespaces.
The following /proc interfaces are distinct in each IPC namespace:
When an IPC namespace is destroyed (i.e., when the last process that is a member of the namespace terminates), all IPC objects in the namespace are automatically destroyed.
Use of IPC namespaces requires a kernel that is configured with the CONFIG_IPC_NS option.
When a network namespace is freed (i.e., when the last process in the namespace terminates), its physical network devices are moved back to the initial network namespace (not to the parent of the process).
Use of network namespaces requires a kernel that is configured with the CONFIG_NET_NS option.
Use of UTS namespaces requires a kernel that is configured with the CONFIG_UTS_NS option.
new_fd = ioctl(fd, request);
In each case, fd refers to a /proc/[pid]/ns/* file. Both operations return a new file descriptor on success.
The new file descriptor returned by these operations is opened with the O_RDONLY and O_CLOEXEC (close-on-exec; see fcntl(2)) flags.
By applying fstat(2) to the returned file descriptor, one obtains a stat structure whose st_dev (resident device) and st_ino (inode number) fields together identify the owning/parent namespace. This inode number can be matched with the inode number of another /proc/[pid]/ns/{pid,user} file to determine whether that is the owning/parent namespace.
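The matching step can be sketched as below. The helper name is_owning_userns is hypothetical, and the fallback definition of NS_GET_USERNS repeats the one used in the example program on this page, on the assumption that the system headers may not provide it. The sketch treats every failure, including EPERM from the ioctl(2) call, as a plain error.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef NS_GET_USERNS
#define NSIO    0xb7
#define NS_GET_USERNS   _IO(NSIO, 0x1)
#endif

/* Hypothetical helper: return 1 if the user namespace referred to by
   'userns_path' owns the namespace referred to by 'ns_path', 0 if it
   does not, and -1 on any error. */
int
is_owning_userns(const char *ns_path, const char *userns_path)
{
    struct stat owner_sb, cand_sb;
    int fd, owner_fd, res;

    fd = open(ns_path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Obtain a descriptor for the owning user namespace */
    owner_fd = ioctl(fd, NS_GET_USERNS);
    close(fd);
    if (owner_fd == -1)
        return -1;

    res = fstat(owner_fd, &owner_sb);
    close(owner_fd);
    if (res == -1 || stat(userns_path, &cand_sb) == -1)
        return -1;

    /* st_dev and st_ino together identify the namespace */
    return owner_sb.st_dev == cand_sb.st_dev &&
           owner_sb.st_ino == cand_sb.st_ino;
}
```

A call such as is_owning_userns("/proc/self/ns/uts", "/proc/self/ns/user") would then indicate whether the caller's own user namespace owns its UTS namespace.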
Either of these ioctl(2) operations can fail with the following errors:
Additionally, the NS_GET_PARENT operation can fail with the following error:
See the EXAMPLE section for an example of the use of these operations.
The example shown below uses the ioctl(2) operations described above to perform simple introspection of namespace relationships. The following shell sessions show various examples of the use of this program.
Trying to get the parent of the initial user namespace fails, for the reasons explained earlier:
    $ ./ns_introspect /proc/self/ns/user p
    The parent namespace is outside your namespace scope

Create a process running sleep(1) that resides in new user and UTS namespaces, and show that the new UTS namespace is associated with the new user namespace:

    $ unshare -Uu sleep 1000 &
    [1] 23235
    $ ./ns_introspect /proc/23235/ns/uts
    Device/Inode of owning user namespace is: [0,3] / 4026532448
    $ readlink /proc/23235/ns/user
    user:[4026532448]
Then show that the parent of the new user namespace in the preceding example is the initial user namespace:
    $ readlink /proc/self/ns/user
    user:[4026531837]
    $ ./ns_introspect /proc/23235/ns/user
    Device/Inode of owning user namespace is: [0,3] / 4026531837

Start a shell in a new user namespace, and show that from within this shell, the parent user namespace can't be discovered. Similarly, the user namespace that owns the UTS namespace (the initial user namespace) can't be discovered.

    $ PS1="sh2$ " unshare -U bash
    sh2$ ./ns_introspect /proc/self/ns/user p
    The parent namespace is outside your namespace scope
    sh2$ ./ns_introspect /proc/self/ns/uts u
    The owning user namespace is outside your namespace scope
/* ns_introspect.c

   Licensed under the GNU General Public License v2 or later.
*/
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <errno.h>
#include <sys/sysmacros.h>

#ifndef NS_GET_USERNS
#define NSIO    0xb7
#define NS_GET_USERNS   _IO(NSIO, 0x1)
#define NS_GET_PARENT   _IO(NSIO, 0x2)
#endif

int
main(int argc, char *argv[])
{
    int fd, userns_fd, parent_fd;
    struct stat sb;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
                argv[0]);
        fprintf(stderr, "\nDisplay the result of one or both "
                "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
                "for the specified /proc/[pid]/ns/[file]. If neither "
                "'p' nor 'u' is specified,\n"
                "NS_GET_USERNS is the default.\n");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the 'ns' file specified
       in argv[1] */

    fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the owning user namespace and
       then obtain and display the inode number of that namespace */

    if (argc < 3 || strchr(argv[2], 'u')) {
        userns_fd = ioctl(fd, NS_GET_USERNS);

        if (userns_fd == -1) {
            if (errno == EPERM)
                printf("The owning user namespace is outside "
                        "your namespace scope\n");
            else
                perror("ioctl-NS_GET_USERNS");
            exit(EXIT_FAILURE);
        }

        if (fstat(userns_fd, &sb) == -1) {
            perror("fstat-userns");
            exit(EXIT_FAILURE);
        }
        printf("Device/Inode of owning user namespace is: "
                "[%lx,%lx] / %ld\n",
                (unsigned long) major(sb.st_dev),
                (unsigned long) minor(sb.st_dev),
                (long) sb.st_ino);

        close(userns_fd);
    }

    /* Obtain a file descriptor for the parent namespace and
       then obtain and display the inode number of that namespace */

    if (argc > 2 && strchr(argv[2], 'p')) {
        parent_fd = ioctl(fd, NS_GET_PARENT);

        if (parent_fd == -1) {
            if (errno == EINVAL)
                printf("Can't get parent namespace of a "
                        "nonhierarchical namespace\n");
            else if (errno == EPERM)
                printf("The parent namespace is outside "
                        "your namespace scope\n");
            else
                perror("ioctl-NS_GET_PARENT");
            exit(EXIT_FAILURE);
        }

        if (fstat(parent_fd, &sb) == -1) {
            perror("fstat-parentns");
            exit(EXIT_FAILURE);
        }
        printf("Device/Inode of parent namespace is: [%lx,%lx] / %ld\n",
                (unsigned long) major(sb.st_dev),
                (unsigned long) minor(sb.st_dev),
                (long) sb.st_ino);

        close(parent_fd);
    }

    exit(EXIT_SUCCESS);
}