Namespace system calls (clone, setns, unshare)

In the context of [[Linux Containers]] and process isolation, the Linux kernel provides three primary system calls to interact with [[Namespaces]]: clone(), setns(), and unshare(). These APIs allow programs to create new isolated environments, join existing ones, or modify the isolation context of the current process^[400-devops-06-kubernetes-k8s-paas-docker.md].

clone()

The clone() system call creates a new process, similar to fork(), but with fine-grained control over the execution context. It is the standard method for creating a new process and a new namespace simultaneously^[400-devops-06-kubernetes-k8s-paas-docker.md].

When calling clone(), the caller can pass specific flags—such as CLONE_NEWPID, CLONE_NEWNET, or CLONE_NEWNS—to the flags parameter^[400-devops-06-kubernetes-k8s-paas-docker.md]. If CLONE_NEWPID is used, for example, the new child process will be created in a new PID namespace where it will view itself as process ID (PID) 1, even though it has a different PID on the host system^[400-devops-06-kubernetes-k8s-paas-docker.md].

Function signature:

int clone(int (*child_func)(void *), void *child_stack, int flags, void *arg);

  • child_func: The function the child process will execute.
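  • child_stack: The stack for the child process. Because stacks grow downward on most architectures, this normally points to the topmost address of the memory reserved for the child.
  • flags: Bitmask of CLONE_* flags (including namespace flags). The low byte holds the signal sent to the parent when the child exits, typically SIGCHLD.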
  • arg: Argument passed to the child function^[400-devops-06-kubernetes-k8s-paas-docker.md].
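
As a rough illustration of the parameters above, the following sketch clones a child and lets the caller add namespace flags. The names child_main, spawn_child and STACK_SIZE are hypothetical, not taken from the source. It assumes Linux with glibc; a call with CLONE_NEWPID normally needs CAP_SYS_ADMIN (for example, running as root) and is expected to fail otherwise.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024) /* hypothetical size for the child stack */

/* Child entry point (child_func). Exits with status 1 if it sees itself
 * as PID 1, which is the case inside a fresh PID namespace, else 0. */
static int child_main(void *arg)
{
    const char *label = arg;
    /* dprintf writes directly to the fd, so no stdio buffer is lost
     * when the child terminates. */
    dprintf(STDOUT_FILENO, "[%s] child sees itself as PID %ld\n",
            label, (long)getpid());
    return getpid() == 1 ? 1 : 0;
}

/* Hypothetical helper: clone a child with the given namespace flags and
 * wait for it. Returns the child's exit status, or -1 on any failure. */
static int spawn_child(int ns_flags, const char *label)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Pass the top of the buffer, since the stack grows downward.
     * SIGCHLD makes the child waitable with waitpid(). */
    pid_t pid = clone(child_main, stack + STACK_SIZE,
                      ns_flags | SIGCHLD, (void *)label);
    if (pid == -1) {
        perror("clone");
        free(stack);
        return -1;
    }
    printf("parent sees the child as PID %ld\n", (long)pid);

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With enough privileges, spawn_child(CLONE_NEWPID, "pidns") should return 1 while the parent prints a different, host-side PID for the same child. spawn_child(0, "plain") adds no namespace flags and behaves much like fork().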

setns()

The setns() system call allows the calling process to join an existing namespace. This is useful for entering a container or an isolated environment that has already been set up^[400-devops-06-kubernetes-k8s-paas-docker.md].

To use setns(), the process must have a file descriptor (fd) that refers to the namespace. These file descriptors are typically found in the /proc/[pid]/ns/ directory^[400-devops-06-kubernetes-k8s-paas-docker.md].

Function signature:

int setns(int fd, int nstype);

  • fd: The file descriptor of the namespace to join.
  • nstype: Restricts which type of namespace fd may refer to, given as one of the CLONE_NEW* constants (e.g., CLONE_NEWNET for a network namespace). The call fails if the types do not match. If set to 0, no check is performed^[400-devops-06-kubernetes-k8s-paas-docker.md].
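
The open-then-join sequence described above might look like the sketch below. The helper name join_namespace is hypothetical. It assumes /proc is mounted and that the caller has the privileges setns() requires (typically CAP_SYS_ADMIN); without them the call is expected to fail.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: join one namespace of process `pid`.
 * ns_name is the entry under /proc/[pid]/ns/ (e.g. "net", "uts").
 * nstype is a CLONE_NEW* constant to verify the type, or 0 to skip
 * the check. Returns 0 on success, -1 on failure. */
static int join_namespace(pid_t pid, const char *ns_name, int nstype)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/ns/%s", (long)pid, ns_name);

    /* The file descriptor is only a handle to the namespace. */
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    int rc = setns(fd, nstype);
    if (rc == -1)
        perror("setns");

    /* The fd is no longer needed once the call has returned. */
    close(fd);
    return rc;
}
```

For example, join_namespace(target_pid, "net", CLONE_NEWNET) would move the caller into the network namespace of target_pid, where target_pid stands for the PID of a process already running inside the container. Passing a mismatched type, such as "uts" together with CLONE_NEWNET, makes the call fail.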

unshare()

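The unshare() system call allows the calling process (or thread) to disassociate parts of its execution context that it currently shares with other processes. Unlike clone(), unshare() does not create a new process; instead, it moves the caller itself into the newly created namespaces^[400-devops-06-kubernetes-k8s-paas-docker.md]. One exception is CLONE_NEWPID: the caller stays in its original PID namespace, and only the children it creates afterwards are placed in the new one.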

This enables the current process to operate in a "view" that is separated from the original namespace, effectively allowing the process to "jump out" of its current shared environment for specific operations^[400-devops-06-kubernetes-k8s-paas-docker.md]. The Linux command-line utility unshare is a wrapper around this system call^[400-devops-06-kubernetes-k8s-paas-docker.md].

Function signature:

int unshare(int flags);

  • flags: Bitmask specifying which namespaces to unshare (e.g., CLONE_NEWNS, CLONE_NEWNET)^[400-devops-06-kubernetes-k8s-paas-docker.md].
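
To illustrate that unshare() changes the current process rather than creating a new one, the sketch below compares the caller's UTS namespace identifier before and after the call. The helper names current_uts_id and move_self_to_new_uts are hypothetical. It assumes /proc is mounted; unshare(CLONE_NEWUTS) normally needs CAP_SYS_ADMIN and is expected to fail without it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read the caller's UTS namespace identifier, a string of the form
 * "uts:[<inode number>]". Returns 0 on success, -1 on failure. */
static int current_uts_id(char *buf, size_t size)
{
    ssize_t n = readlink("/proc/self/ns/uts", buf, size - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0'; /* readlink() does not NUL-terminate */
    return 0;
}

/* Hypothetical helper: move the calling process into a new UTS
 * namespace. Returns 0 if the namespace changed, -1 otherwise. */
static int move_self_to_new_uts(void)
{
    char before[64], after[64];
    pid_t pid = getpid();

    if (current_uts_id(before, sizeof before) == -1) {
        perror("readlink");
        return -1;
    }
    if (unshare(CLONE_NEWUTS) == -1) {
        perror("unshare");
        return -1;
    }
    if (current_uts_id(after, sizeof after) == -1) {
        perror("readlink");
        return -1;
    }

    /* Same PID, different namespace identifier. */
    printf("PID %ld: %s -> %s\n", (long)pid, before, after);
    return strcmp(before, after) != 0 ? 0 : -1;
}
```

After a successful call, a hostname set with sethostname() would be visible only inside the new namespace, while getpid() still returns the same value as before.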

Namespace Flags

Regardless of which system call is used, the namespace types to act on are selected with the same set of constants, passed via the flags argument of clone() and unshare() or the nstype argument of setns()^[400-devops-06-kubernetes-k8s-paas-docker.md].

Common flags include:

  • CLONE_NEWIPC: Isolate System V IPC and POSIX message queues^[400-devops-06-kubernetes-k8s-paas-docker.md].
  • CLONE_NEWNS: Isolate mount points (filesystem)^[400-devops-06-kubernetes-k8s-paas-docker.md].
  • CLONE_NEWNET: Isolate network resources (network interfaces, routing tables)^[400-devops-06-kubernetes-k8s-paas-docker.md].
  • CLONE_NEWPID: Isolate process IDs (PID)^[400-devops-06-kubernetes-k8s-paas-docker.md].
  • CLONE_NEWUSER: Isolate user IDs and group IDs^[400-devops-06-kubernetes-k8s-paas-docker.md].
  • CLONE_NEWUTS: Isolate hostname and domain name^[400-devops-06-kubernetes-k8s-paas-docker.md].
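
Since the flags form a bitmask, several can be combined with bitwise OR. The sketch below pairs each flag listed above with its entry name under /proc/[pid]/ns/ and decodes a combined mask. The names ns_flag, NS_FLAGS and describe_ns_flags are hypothetical helpers for illustration only.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* One namespace flag and the matching entry under /proc/[pid]/ns/. */
struct ns_flag {
    int flag;
    const char *proc_name;
};

static const struct ns_flag NS_FLAGS[] = {
    { CLONE_NEWIPC,  "ipc"  },
    { CLONE_NEWNS,   "mnt"  },
    { CLONE_NEWNET,  "net"  },
    { CLONE_NEWPID,  "pid"  },
    { CLONE_NEWUSER, "user" },
    { CLONE_NEWUTS,  "uts"  },
};

/* Hypothetical helper: write the space-separated names of all
 * namespace flags set in `flags` into `out`, in table order.
 * Returns the number of namespace flags found. */
static int describe_ns_flags(int flags, char *out, size_t out_size)
{
    int count = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof NS_FLAGS / sizeof NS_FLAGS[0]; i++) {
        if (!(flags & NS_FLAGS[i].flag))
            continue;
        size_t used = strlen(out);
        /* snprintf truncates safely if the buffer is too small. */
        snprintf(out + used, out_size - used, "%s%s",
                 count ? " " : "", NS_FLAGS[i].proc_name);
        count++;
    }
    return count;
}
```

A mask such as CLONE_NEWNET | CLONE_NEWPID could be passed unchanged to clone() or unshare(); the helper only decodes it for display.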

Sources

^[400-devops-06-kubernetes-k8s-paas-docker.md]