Namespace system calls¶
Namespace system calls are a set of Linux kernel interfaces used to isolate or virtualize system resources for processes. They are fundamental to container technology, allowing the creation of distinct execution environments where processes have restricted views of the operating system^[Docker基础.md].
Core Functionality¶
The primary function of Namespace system calls is to modify a process's view of the system. This is achieved by creating "walls" or boundaries between different process groups. While [[Cgroups]] are responsible for resource constraints (limiting CPU, memory, etc.), Namespaces are responsible for isolation—ensuring that a process perceives an isolated environment for specific global resources^[Docker基础.md].
At a technical level, containers are implemented as specialized processes. When a container is run (e.g., via docker run), the underlying mechanism uses these system calls so that the process inside the container sees and interacts with only its own isolated view of the host system's resources^[Docker基础.md].
System Call APIs¶
The Linux Namespace API consists of three primary system calls: clone(), setns(), and unshare(). When using these calls, specific flags (constants) are used to determine which type of Namespace to operate upon^[Docker基础.md].
clone()¶
The clone() system call creates a new process, similar to fork(), but with granular control over the execution context^[Docker基础.md].
int clone(int (*child_func)(void *), void *child_stack, int flags, void *arg);
By passing specific CLONE_NEW* flags to the flags parameter, a new Namespace can be created simultaneously with the new process. For example, passing CLONE_NEWPID causes the new process to perceive itself as having a PID of 1 within its isolated PID namespace, even though it has a different PID on the host system^[Docker基础.md].
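The PID 1 behavior described above can be sketched as follows. This is a minimal illustration, not how Docker itself is implemented: the helper names `report_own_pid` and `pid_seen_in_new_namespace` are hypothetical, and `CLONE_NEWUSER` is added as an assumption so the call can succeed without root on systems that allow unprivileged user namespaces. Where they are disabled, `clone()` fails and the helper returns -1.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Runs inside the new PID namespace; its return value becomes the exit status. */
static int report_own_pid(void *arg)
{
    (void)arg;
    return (int)getpid();   /* expected to be 1 in a fresh PID namespace */
}

/*
 * Hypothetical helper: clone a child into new user + PID namespaces and
 * return the PID that the child observed, or -1 if anything failed.
 */
int pid_seen_in_new_namespace(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    pid_t child = clone(report_own_pid, stack + CHILD_STACK_SIZE,
                        CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, NULL);
    if (child == -1) {
        free(stack);
        return -1;
    }

    /* `child` is the PID on the host side, which differs from what the child sees. */
    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    free(stack);
    if (waited != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `pid_seen_in_new_namespace()` from a `main()` function and printing the result next to the value of `getpid()` in the parent shows the two views of the process tree side by side.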
setns()¶
The setns() system call allows an existing process to join an existing Namespace^[Docker基础.md].
int setns(int fd, int nstype);
The file descriptor (fd) typically refers to an entry in the /proc/[pid]/ns/ directory. Holding such a descriptor open (or bind-mounting the entry) keeps the Namespace alive even after the processes that created it have exited, so other processes can join it later. The nstype parameter restricts which Namespace type the call accepts: passing a CLONE_NEW* flag makes the call fail if fd refers to a different type, while 0 accepts any type^[Docker基础.md].
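A minimal sketch of joining a Namespace through its /proc entry is shown below. The helper name `join_namespace` and its parameters are hypothetical. The call normally needs sufficient privileges (typically CAP_SYS_ADMIN) over the target Namespace, so in an unprivileged process it is expected to return -1.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: join the namespace `ns_name` ("uts", "net", ...) of
 * process `pid`. `nstype` is a CLONE_NEW* flag to check the type against,
 * or 0 to accept any namespace type. Returns 0 on success, -1 on failure.
 */
int join_namespace(pid_t pid, const char *ns_name, int nstype)
{
    char path[64];
    int len = snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long)pid, ns_name);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;

    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    int rc = setns(fd, nstype);  /* fails if fd's namespace type differs from nstype */
    close(fd);                   /* the descriptor is not needed after setns() returns */
    return rc;
}
```

For example, `join_namespace(target_pid, "net", CLONE_NEWNET)` would attempt to enter the network Namespace of `target_pid`, where `target_pid` stands for whatever process already lives in the Namespace of interest.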
unshare()¶
The unshare() system call allows the current process to disassociate parts of its execution context that it currently shares with other processes, such as its parent^[Docker基础.md].
int unshare(int flags);
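Unlike clone(), unshare() does not spawn a new process. Instead, it moves the calling process itself into the newly created Namespaces. This is useful for performing operations in an isolated environment without the overhead of creating a child process. One exception is CLONE_NEWPID: the caller keeps its existing PID, and only the children it creates afterwards are placed in the new PID namespace^[Docker基础.md].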
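The in-place isolation described above can be sketched with a hostname change. The helper name `isolate_hostname` is hypothetical, and `CLONE_NEWUSER` is an added assumption so that the call can work without root where unprivileged user namespaces are enabled. On other systems the helper simply returns -1.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: move the calling process into new user + UTS
 * namespaces and set a hostname that is visible only there.
 * Returns 0 on success, -1 if unshare() or sethostname() failed.
 */
int isolate_hostname(const char *name)
{
    /* No child is created: the caller itself leaves its old namespaces. */
    if (unshare(CLONE_NEWUSER | CLONE_NEWUTS) == -1)
        return -1;

    /* Only affects the new UTS namespace; the host's hostname is untouched. */
    return sethostname(name, strlen(name));
}
```

After a successful `isolate_hostname("sandbox")`, a `gethostname()` call in the same process returns "sandbox", while processes outside the new Namespace still see the original hostname.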
Namespace Types¶
Linux supports several types of Namespaces, identified by specific constants used with the system calls above^[Docker基础.md]:
- CLONE_NEWPID: Isolates the Process ID (PID) space. Processes inside this namespace see their own PID hierarchy, typically starting with PID 1^[Docker基础.md].
- CLONE_NEWNS: Isolates Mount points. This allows the process to have a distinct filesystem view (different root directory and mount structure)^[Docker基础.md].
- CLONE_NEWNET: Isolates Network resources (network stacks, interfaces, routing tables)^[Docker基础.md].
- CLONE_NEWUTS: Isolates hostname and domain name^[Docker基础.md].
- CLONE_NEWIPC: Isolates Inter-Process Communication (IPC) objects (such as System V IPC and POSIX message queues)^[Docker基础.md].
- CLONE_NEWUSER: Isolates User ID and Group ID spaces. This allows a process to have root privileges inside the namespace without being root on the host system^[Docker基础.md].
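Each of the constants above corresponds to an entry under /proc/[pid]/ns/, which is where a file descriptor for setns() usually comes from. The mapping can be sketched as a small lookup; the helper name `ns_proc_name` is hypothetical and covers only the six types listed here.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a CLONE_NEW* flag to the name of the matching
 * entry under /proc/[pid]/ns/. Returns NULL for an unrecognized flag.
 */
const char *ns_proc_name(int flag)
{
    switch (flag) {
    case CLONE_NEWPID:  return "pid";
    case CLONE_NEWNS:   return "mnt";
    case CLONE_NEWNET:  return "net";
    case CLONE_NEWUTS:  return "uts";
    case CLONE_NEWIPC:  return "ipc";
    case CLONE_NEWUSER: return "user";
    default:            return NULL;
    }
}
```

Note that the mount Namespace flag is CLONE_NEWNS for historical reasons, while its /proc entry is named "mnt".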
Related Concepts¶
- [[Cgroups]]
- [[Docker基础]]
- [[Rootfs]]
Sources¶
- Docker基础.md