Namespace system calls (clone, setns, unshare)
In the context of [[Linux Containers]] and process isolation, the Linux kernel provides three primary system calls to interact with [[Namespaces]]: clone(), setns(), and unshare(). These APIs allow programs to create new isolated environments, join existing ones, or modify the isolation context of the current process^[400-devops-06-kubernetes-k8s-paas-docker.md].
clone()
The clone() system call creates a new process, similar to fork(), but with fine-grained control over the execution context. It is the standard method for creating a new process and a new namespace simultaneously^[400-devops-06-kubernetes-k8s-paas-docker.md].
When calling clone(), the caller can pass specific flags—such as CLONE_NEWPID, CLONE_NEWNET, or CLONE_NEWNS—to the flags parameter^[400-devops-06-kubernetes-k8s-paas-docker.md]. If CLONE_NEWPID is used, for example, the new child process will be created in a new PID namespace where it will view itself as process ID (PID) 1, even though it has a different PID on the host system^[400-devops-06-kubernetes-k8s-paas-docker.md].
Function signature:
int clone(int (*child_func)(void *), void *child_stack, int flags, void *arg);
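- child_func: The function the child process will execute.
- child_stack: Pointer to the stack for the child process. Because stacks grow downward on most architectures, this is normally the topmost address of the memory reserved for the child.
- flags: Bitmask of CLONE_* flags (including namespace flags), ORed with the signal to send to the parent when the child exits (typically SIGCHLD).
- arg: Argument passed to the child function^[400-devops-06-kubernetes-k8s-paas-docker.md].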
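The CLONE_NEWPID behavior described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper name pid_seen_in_new_pid_namespace and the 1 MiB stack size are assumptions made for the example, and the call normally needs root (CAP_SYS_ADMIN), so the helper reports failure instead of aborting.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024) /* assumed size, chosen for the example */

/* Runs in the child. Its return value becomes the child's exit status,
 * so returning getpid() reports the PID as seen inside the namespace. */
static int child_func(void *arg)
{
    (void)arg;
    return (int)getpid();
}

/* Hypothetical helper: returns the PID the child saw for itself,
 * or -1 if the namespace could not be created (e.g. EPERM). */
int pid_seen_in_new_pid_namespace(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Pass the top of the buffer because the stack grows downward.
     * SIGCHLD makes the child waitable like a fork()ed process. */
    pid_t child = clone(child_func, stack + STACK_SIZE,
                        CLONE_NEWPID | SIGCHLD, NULL);
    if (child == -1) {
        perror("clone");
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

When the call succeeds, the helper returns 1 even though the value clone() handed back to the parent (the child's PID on the host) is some other number.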
setns()
The setns() system call allows the calling process to join an existing namespace. This is useful for entering a container or an isolated environment that has already been set up^[400-devops-06-kubernetes-k8s-paas-docker.md].
To use setns(), the process must have a file descriptor (fd) that refers to the namespace. These file descriptors are typically found in the /proc/[pid]/ns/ directory^[400-devops-06-kubernetes-k8s-paas-docker.md].
Function signature:
int setns(int fd, int nstype);
- fd: The file descriptor of the namespace to join.
- nstype: Optional parameter to verify the namespace type (e.g., checking if it is a network namespace). If set to 0, no check is performed^[400-devops-06-kubernetes-k8s-paas-docker.md].
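As a rough sketch of the open-then-join sequence described above: the helper name join_namespace_of is hypothetical, and the example assumes /proc is mounted and that the caller has enough privilege (typically CAP_SYS_ADMIN) over the target namespace.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move the caller into the namespace of kind ns_name
 * ("net", "uts", "mnt", ...) that process pid belongs to.
 * nstype is 0 (no check) or the matching CLONE_NEW* constant.
 * Returns 0 on success, -1 on failure with errno set. */
int join_namespace_of(pid_t pid, const char *ns_name, int nstype)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%ld/ns/%s", (long)pid, ns_name);
    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* The namespace is identified by an open fd on its /proc entry. */
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    int rc = setns(fd, nstype);

    /* The fd is only needed for the call itself; keep errno from setns(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc;
}
```

A call such as join_namespace_of(target_pid, "net", CLONE_NEWNET), where target_pid is a placeholder for a process inside the container, would switch the caller to that process's network namespace. Passing CLONE_NEWNET instead of 0 makes setns() fail if the descriptor refers to some other namespace type.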
unshare()
The unshare() system call allows the calling process (or thread) to disassociate parts of its execution context that it currently shares with other processes, such as its parent. Unlike clone(), unshare() does not create a new process; instead, it isolates the context of the calling process itself^[400-devops-06-kubernetes-k8s-paas-docker.md]. CLONE_NEWPID is an exception: the caller stays in its current PID namespace, and only the children it creates afterwards are placed in the new one.
This gives the current process a view that is separated from the original namespace, so it can leave the environment it was sharing and carry out specific operations in isolation^[400-devops-06-kubernetes-k8s-paas-docker.md]. The Linux command-line utility unshare is a wrapper around this system call^[400-devops-06-kubernetes-k8s-paas-docker.md].
Function signature:
int unshare(int flags);
- flags: Bitmask specifying which namespaces to unshare (e.g., CLONE_NEWNS, CLONE_NEWNET)^[400-devops-06-kubernetes-k8s-paas-docker.md].
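One way to observe that unshare() changes the current process rather than creating a new one is to compare the inode of /proc/self/ns/uts before and after the call. The sketch below assumes a single-threaded caller, a mounted /proc, and sufficient privilege (typically CAP_SYS_ADMIN); the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Inode number of /proc/self/ns/<name>, which identifies the namespace
 * the process is currently in. Returns 0 if it cannot be read. */
static ino_t namespace_inode(const char *name)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof path, "/proc/self/ns/%s", name);
    if (stat(path, &st) == -1)
        return 0;
    return st.st_ino;
}

/* Hypothetical helper: detach the caller into a new UTS namespace.
 * Returns 1 if the namespace changed, 0 if it did not,
 * and -1 if unshare() or the /proc lookup failed. */
int unshare_uts_namespace(void)
{
    ino_t before = namespace_inode("uts");
    if (before == 0)
        return -1;

    /* No new process is created: the caller itself is moved. */
    if (unshare(CLONE_NEWUTS) == -1) {
        perror("unshare");
        return -1;
    }

    ino_t after = namespace_inode("uts");
    if (after == 0)
        return -1;
    return after != before;
}
```

After a successful call, a hostname set with sethostname() would be visible only to this process and its descendants, not to the rest of the host.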
Namespace Flags
Regardless of which system call is used, the kind of isolation is selected with CLONE_NEW* constants: clone() and unshare() take them in the flags argument, and setns() accepts one in nstype^[400-devops-06-kubernetes-k8s-paas-docker.md].
Common flags include:
- CLONE_NEWIPC: Isolate System V IPC and POSIX message queues^[400-devops-06-kubernetes-k8s-paas-docker.md].
- CLONE_NEWNS: Isolate mount points (filesystem)^[400-devops-06-kubernetes-k8s-paas-docker.md].
- CLONE_NEWNET: Isolate network resources (network interfaces, routing tables)^[400-devops-06-kubernetes-k8s-paas-docker.md].
- CLONE_NEWPID: Isolate process IDs (PID)^[400-devops-06-kubernetes-k8s-paas-docker.md].
- CLONE_NEWUSER: Isolate user IDs and group IDs^[400-devops-06-kubernetes-k8s-paas-docker.md].
- CLONE_NEWUTS: Isolate hostname and domain name^[400-devops-06-kubernetes-k8s-paas-docker.md].
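Because the flags are single bits, several namespaces can be requested at once by ORing them together. The sketch below decodes such a bitmask into the matching entry names under /proc/[pid]/ns/ (ipc, mnt, net, pid, user, uts). The table and the helper describe_namespace_flags are illustrative assumptions, not part of any library.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>

struct ns_flag {
    int flag;              /* CLONE_NEW* constant */
    const char *proc_name; /* entry name under /proc/[pid]/ns/ */
};

static const struct ns_flag NS_FLAGS[] = {
    { CLONE_NEWIPC,  "ipc"  },
    { CLONE_NEWNS,   "mnt"  },
    { CLONE_NEWNET,  "net"  },
    { CLONE_NEWPID,  "pid"  },
    { CLONE_NEWUSER, "user" },
    { CLONE_NEWUTS,  "uts"  },
};

/* Hypothetical helper: write the space-separated names of the namespace
 * bits set in flags into out. Returns how many were found, or -1 if
 * out is too small to hold the result. */
int describe_namespace_flags(int flags, char *out, size_t out_size)
{
    int count = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof NS_FLAGS / sizeof NS_FLAGS[0]; i++) {
        if (!(flags & NS_FLAGS[i].flag))
            continue;

        size_t used = strlen(out);
        int n = snprintf(out + used, out_size - used, "%s%s",
                         count ? " " : "", NS_FLAGS[i].proc_name);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0'; /* drop the partial entry */
            return -1;
        }
        count++;
    }
    return count;
}
```

For example, describe_namespace_flags(CLONE_NEWPID | CLONE_NEWUTS, buf, sizeof buf) returns 2 and leaves "pid uts" in buf; the same combined value could be passed as flags to clone() or unshare().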
Sources
^[400-devops-06-kubernetes-k8s-paas-docker.md]