# Container

## Overview

Containers provide a mechanism to isolate global resources, such as process identifiers (PIDs), host information, and user information. With this mechanism, processes in different containers have independent global resources, and changing system resources in one container does not affect processes in other containers.

The LiteOS-A kernel container isolation function involves seven containers: UTS container, PID container, Mount container, Network container, Time container, IPC container, and User container. The container information is stored in the **container** and **credentials** structs of the process control block (**ProcessCB**) struct.

The following table lists the LiteOS-A containers.

| No. | Name    | Macro Definition/Flag | Resource                                            | Data Struct        |
| :-- | :------ | :-------------------- | :-------------------------------------------------- | :----------------- |
| 1   | UTS     | CLONE_NEWUTS          | Host names, domain names, and version information.  | struct Container   |
| 2   | PID     | CLONE_NEWPID          | PIDs.                                               | struct Container   |
| 3   | Mount   | CLONE_NEWNS           | File system mount points.                           | struct Container   |
| 4   | Network | CLONE_NEWNET          | Network system resources.                           | struct Container   |
| 5   | Time    | CLONE_NEWTIME         | Clock resources.                                    | struct Container   |
| 6   | IPC     | CLONE_NEWIPC          | Inter-process communication (IPC) resources.        | struct Container   |
| 7   | User    | CLONE_NEWUSER         | Users and user groups.                              | struct Credentials |

The container-based resource isolation can be further classified into the following types:

 - Global isolation: The containers are parallel (without inheritance relationships), and the container resources are invisible to each other.

 - Non-global isolation: The containers have parent-child relationships. The resources of containers at the same level are invisible to each other, but an upper-level container can access the resources of its lower-level containers.
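
The storage layout described above can be pictured with a small C sketch. It is illustrative only: every `Sketch*` type, field, and flag below is a hypothetical stand-in, not the LiteOS-A definition, and only two container types are modeled. The sketch assumes that a process created without container flags shares the container group of its parent, and that a creation with flags allocates a new group in which only the flagged types are new. Reference counting and cleanup of the groups are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag bits for this sketch only; real code passes CLONE_NEW* flags. */
#define SKETCH_NEWUTS 0x1u
#define SKETCH_NEWIPC 0x2u

/* Stand-ins for two per-type containers. */
typedef struct { char hostName[32]; } SketchUts;
typedef struct { int queueCount; } SketchIpc;

/* Groups the pointers to the per-type containers (UTS and IPC only, for brevity). */
typedef struct {
    SketchUts *uts;
    SketchIpc *ipc;
} SketchContainer;

/* User container data is kept in a separate struct. */
typedef struct { unsigned int uid; unsigned int gid; } SketchCredentials;

/* The process control block references both structs. */
typedef struct {
    SketchContainer *container;
    SketchCredentials *credentials;
} SketchProcessCB;

/*
 * Sets up the container view of a child process.
 * flags == 0: the child shares the parent's container group.
 * flags != 0: the child gets its own group; only the flagged types are newly allocated,
 *             the remaining pointers still refer to the parent's containers.
 * Returns 0 on success and -1 if memory allocation fails.
 */
int SketchInheritContainers(const SketchProcessCB *parent, SketchProcessCB *child, unsigned int flags)
{
    assert(parent != NULL && child != NULL && parent->container != NULL);

    child->credentials = parent->credentials;
    if (flags == 0) {
        child->container = parent->container;
        return 0;
    }

    SketchContainer *group = malloc(sizeof(*group));
    if (group == NULL) {
        return -1;
    }
    *group = *parent->container; /* start from the parent's view */

    if (flags & SKETCH_NEWUTS) {
        group->uts = calloc(1, sizeof(*group->uts));
        if (group->uts == NULL) {
            free(group);
            return -1;
        }
    }
    if (flags & SKETCH_NEWIPC) {
        group->ipc = calloc(1, sizeof(*group->ipc));
        if (group->ipc == NULL) {
            if (flags & SKETCH_NEWUTS) {
                free(group->uts);
            }
            free(group);
            return -1;
        }
    }
    child->container = group;
    return 0;
}
```

With this layout, a child created with `SKETCH_NEWUTS` gets its own `uts` pointer, while its `ipc` pointer still refers to the IPC container of the parent.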

For the PID container, **unshare()** or **setns()** changes the container of the child process, not the container of the calling process itself.

Container functionalities are implemented by adding a **Container** struct and a **Credentials** struct to the **ProcessCB** of a process. You can also enable or disable specific containers by using compile-time switches.

 - The **ProcessCB** struct of each process contains a pointer to the allocated **Container** struct. This allows a process to have an independent **Container** struct or to share one with other processes. The **Container** struct contains pointers to the UTS, PID, Network, Mount, Time, and IPC containers.

 - The **ProcessCB** struct of each process has a **Credentials** struct for independent User container management. This design facilitates modularization and independent processing of the unique logic of the User container.

### Containers

#### **UTS Container**

The UTS container isolates host names, domain names, and version information. The information in different UTS containers is isolated from each other.

#### **Mount Container**

The Mount container isolates file system mount points. The mount and unmount operations in one container do not affect other containers.

The Mount container allows processes to use the file mounting system independently. Child processes perform mount operations in independent Mount containers and have their own file mount structs.

- To implement a Mount container, use **clone()** with the **CLONE_NEWNS** flag to create a process, and change the mount information from global information to the information specific to the Mount container.

- After a Mount container is created, change the implementation of obtaining the mount information to enable the mount information to be obtained from the current Mount container.
  After that, mounting, unmounting, and accessing mounted file systems in the process do not affect other processes.

#### **PID Container**

The PID container isolates PIDs. Processes in different containers can use the same virtual process ID.

The PID container provides the following features:

- The PIDs of different containers are independent of each other.
- Nested PID containers are supported. The processes in a child PID container are visible to the parent PID container. For the same process, the PID in the parent PID container is independent of the PID in the child PID container.
- A child PID container cannot view the processes in its parent container.
- All PIDs of the system can be viewed in the root container.

#### **Network Container**

The Network container isolates the system's network devices and network stacks, including the TCP/IP protocol stack.

 - Transport layer isolation: The Network container isolates port numbers. The available port numbers in a Network container range from 0 to 65535. A process is bound to a port number of its own container. Processes in different Network containers can be bound to the same TCP/UDP port number without affecting each other.
 - IP layer isolation: The Network container isolates IP resources. Each container has its own IP resources. Changing the IP address in one Network container does not affect other Network containers.
 - Network device isolation: The Network container isolates network interface cards (NICs). Each container has its own NICs. The NICs in different Network containers are isolated from each other and cannot communicate with each other. You can configure a veth pair to implement communication between different containers.

#### **User Container**

The User container isolates users and user groups.

The User container isolates management rights by user ID/group ID (UID/GID) and capability.

- UID/GID

  The User container isolates UIDs/GIDs. Each User container has its own UIDs/GIDs, numbered independently from 0. A process can therefore have **root** permission inside its container while that permission remains confined to the container. Changing the UID/GID in a User container does not affect the processes of other User containers.

- Capability

  With the User container, you can set different capabilities for processes.

  Each process calls **OsInitCapability()** to initialize its permissions. You can use **SysCapGet()** to obtain the capabilities of a process, and use **SysCapSet()** to modify them.

The following table describes the capabilities.

| Capability            | Description |
| --------------------- | ----------- |
| CAP_CHOWN             | Changes the owner of a file. |
| CAP_DAC_EXECUTE       | Overrides the Discretionary Access Control (DAC) restriction on file execution. |
| CAP_DAC_WRITE         | Overrides the DAC restriction on file write. |
| CAP_DAC_READ_SEARCH   | Overrides the DAC restriction on file read or search of a directory. |
| CAP_FOWNER            | Overrides the requirement that the file owner ID must match the process user ID. |
| CAP_KILL              | Sends a **kill** signal to a process that is not owned by the sender. |
| CAP_SETGID            | Changes the GID of a process. |
| CAP_SETUID            | Changes the UID of a process. |
| CAP_NET_BIND_SERVICE  | Binds a socket to a port whose number is less than 1024. |
| CAP_NET_BROADCAST     | Allows network broadcast and multicast access. |
| CAP_NET_ADMIN         | Allows network management tasks to be executed. |
| CAP_NET_RAW           | Allows the use of raw sockets. |
| CAP_FS_MOUNT          | Allows **chroot()**. |
| CAP_FS_FORMAT         | Allows formatting of a file system. |
| CAP_SCHED_SETPRIORITY | Sets the process scheduling priority. |
| CAP_SET_TIMEOFDAY     | Sets the system time. |
| CAP_CLOCK_SETTIME     | Sets the clock time. |
| CAP_CAPSET            | Sets any capability. |
| CAP_REBOOT            | Restarts the system. |
| CAP_SHELL_EXEC        | Executes shell. |

#### **Time Container**

The Time container isolates the time maintenance information of the system.

Each process has its own Time container to hold the **CLOCK_MONOTONIC** and **CLOCK_MONOTONIC_RAW** clocks so that operations on these clocks do not affect the clocks of other processes.

The clock offsets of the **time_for_children** container of the current process are recorded in the **/proc/PID/timens_offsets** file. You can modify this file to change the offsets of the Time container. Each offset is the difference from the corresponding clock value in the initial Time container.

Currently, the only way to create a Time container is to call **unshare()** with the **CLONE_NEWTIME** flag. The new Time container holds the child processes subsequently created by the calling process, not the calling process itself.

You need to set the clock offsets (**/proc/PID/timens_offsets**) for this container before the first process of the container is created.

#### **IPC Container**

The IPC container isolates IPC objects, including message queues and shared memory.

Each process has its own IPC container to hold its message queues and shared memory. As a result, operations on message queues and shared memory in different containers do not affect each other.

- Message queue isolation: Change the global **LosQueueCB** struct variable to a variable local to each IPC container to implement message queue isolation.

- Shared memory isolation: Change the global variables **shmInfo**, **sysvShmMux**, **shmSegs**, and **shmUsedPageCount** to variables local to each IPC container to implement shared memory isolation.

### Working Principles

#### Process of Creating a Container

During system initialization, a root container is created for the initial processes (processes 0, 1, and 2). The root container includes all seven container types.

You can use **clone()** with a container flag specified to create a container for a process. If no container flag is specified, the process reuses the containers of its parent process.

#### Process of Switching a Container

Use **unshare()** to move a process to a newly created container. The following figure uses the IPC container as an example.

<img src="figures/container-003.png" alt="ContainerBase" style="zoom:80%;" />

## How to Develop

The following describes how to create, switch, and destroy a container.

### Creating a Container

You can create a container when using **clone()** to create a process.

**clone**

The function prototype is as follows:

```
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
          /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );
```

 - When using **clone()** to create a process, you can specify a container flag to isolate resources (such as the UTS information) for the process.

 - If no container flag is specified, the process shares the containers of its parent process.

### Switching a Container

You can use either of the following interfaces to move a process to another container:

- **unshare**

  Use **unshare()** to move a process to a newly created container.
  The function prototype is as follows:

  ```
  int unshare(int flags);
  ```

  > **NOTE**
  >
  > For the PID or Time container, **unshare()** moves the child process (not the process itself) to the newly created container.

- **setns**

  Use **setns()** to move a process to another existing container. The function prototype is as follows:

  ```
  int setns(int fd, int nstype);
  ```

  > **NOTE**
  >
  > For the PID or Time container, **setns()** moves the child process (not the process itself) to another container.

### Destroying a Container

When a process is terminated, it exits all its containers and the reference count of each container decrements. When the reference count reaches 0, the container is destroyed.

You can use **kill()** to send a specified signal to a process to terminate it. The function prototype is as follows:

```
int kill(pid_t pid, int sig);
```

### Querying Container Information

You can run the **ls** command to view container information in the **/proc/[pid]/container/** directory.

```
ls -l /proc/[pid]/container
```

| Property   | User | User Group | File Name                                | Description |
| :--------- | :--- | :--------- | :--------------------------------------- | :---------- |
| lr--r--r-- | u:0  | g:0        | net -> 'net:[4026531847]'                | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | user -> 'user:[4026531841]'              | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | time_for_children -> 'time:[4026531846]' | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | time -> 'time:[4026531846]'              | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | ipc -> 'ipc:[4026531845]'                | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | mnt -> 'mnt:[4026531844]'                | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | uts -> 'uts:[4026531843]'                | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | pid_for_children -> 'pid:[4026531842]'   | The referenced object is the container with a unique ID. |
| lr--r--r-- | u:0  | g:0        | pid -> 'pid:[4026531842]'                | The referenced object is the container with a unique ID. |

### plimits

plimits sets resource limits for process groups. **/proc/plimits** is the root directory of plimits.

- The plimits file system is a pseudo file system that maps files to plimits variables. With this file system, you can modify kernel variables through file operations. For example, you can modify the **memory.limit** file to restrict memory allocation.
- In the plimits file system, files can be read and written, and directories can be added or deleted.
- A plimits directory maps to a plimits group. When a directory is created, the files in it (mapped to the control variables of the limiters) are created automatically.
- Files for a limiter are created as a group. For example, when a memory limiter is created, all the files it requires are created, not just a single file.

The macro **LOSCFG_PROCESS_LIMITS** specifies whether plimits is enabled. **y** enables plimits, and **n** (default) disables it.

If **LOSCFG_PROCESS_LIMITS** is set to **y**, the **/proc/plimits** directory contains the following files:

| Permission | User | User Group | File Name        | Description | Remarks |
| ---------- | ---- | ---------- | ---------------- | ----------- | ------- |
| -r--r--r-- | u:0  | g:0        | sched.stat       | Scheduling statistics information. | Output format: [PID runTime] |
| -r--r--r-- | u:0  | g:0        | sched.period     | Scheduling period configuration, in μs. | / |
| -r--r--r-- | u:0  | g:0        | sched.quota      | Scheduling quota configuration, in μs. | / |
| -r--r--r-- | u:0  | g:0        | devices.list     | List of the devices accessed by processes in plimits. | Output format: [type name access] |
| -r--r--r-- | u:0  | g:0        | devices.deny     | Devices that cannot be accessed by the processes in plimits. | Format: ["type name access" >> devices.deny] |
| -r--r--r-- | u:0  | g:0        | devices.allow    | Devices that can be accessed by the processes in plimits. | Format: ["type name access" >> devices.allow] |
| -r--r--r-- | u:0  | g:0        | ipc.stat         | Statistics about the IPC objects allocated. | Output format: [mq count: mq failed count:<br> shm size: shm failed count: ] |
| -r--r--r-- | u:0  | g:0        | ipc.shm_limit    | Upper limit of the shared memory, in bytes. | / |
| -r--r--r-- | u:0  | g:0        | ipc.mq_limit     | Maximum number of messages in a message queue. | 0 to the maximum 64-bit positive integer |
| -r--r--r-- | u:0  | g:0        | memory.stat      | Memory statistics, in bytes. | / |
| -r--r--r-- | u:0  | g:0        | memory.limit     | Total memory limit for a process group, in bytes. | / |
| -r--r--r-- | u:0  | g:0        | pids.max         | Maximum number of processes in a group. | / |
| -r--r--r-- | u:0  | g:0        | pids.priority    | Highest process priority in a group. | / |
| -r--r--r-- | u:0  | g:0        | plimits.procs    | PIDs of all processes in a group. | / |
| -r--r--r-- | u:0  | g:0        | plimits.limiters | Limiters in the plimits group. | / |

The parameters of the **devices** files are described as follows:

| type (Device Type) | name (Device Name) | access (Permission) |
| ------------------ | ------------------ | ------------------- |
| a - All devices, which can be character devices or block devices. | / | r - Allow the process to read the specified device. |
| b - Block device | / | w - Allow the process to write to the specified device. |
| c - Character device | / | m - Allow the process to create a file that does not exist. |

## Reference

### Specifications

#### Parameter Settings

**LOSCFG_KERNEL_CONTAINER_DEFAULT_LIMIT** specifies the maximum number of containers of each type supported by the kernel.

The initialization of the **/proc/sys/user** directory generates the **max_net_container**, **max_ipc_container**, **max_time_container**, **max_uts_container**, **max_user_container**, **max_pid_container**, and **max_mnt_container** files, and binds these pseudo files to the kernel parameters. You can modify the kernel parameters by configuring the pseudo files. A new container can be created only if the number of containers of that type is less than the maximum. Otherwise, NULL is returned.

#### **Unique Container ID**

All the containers are uniquely numbered based on a fixed value.

```
#define CONTAINER_IDEX_BASE (0xF0000000)
inum = CONTAINER_IDEX_BASE + (unsigned int)i;
```

#### **Rule Settings**

- The PID container and User container support nesting of up to three layers. Other containers do not support nesting.

- When **clone()**, **setns()**, and **unshare()** are used, flags complying with POSIX must be passed in. The flags are described as follows:

| Flag          | clone | setns | unshare |
| ------------- | ----- | ----- | ------- |
| CLONE_NEWNS   | Create a Mount container for a child process. | Move this process to the specified Mount container. | Create a Mount container for this process. |
| CLONE_NEWPID  | Create a PID container for a child process. | Move this process to the specified PID container. | Create a PID container for a new child process. |
| CLONE_NEWIPC  | Create an IPC container for a child process. | Move this process to the specified IPC container. | Create an IPC container for this process. |
| CLONE_NEWTIME | Create a Time container for the parent process of this process. | Not supported currently | Create a Time container for a new child process. |
| CLONE_NEWUSER | Create a User container for a child process. | Move this process to the specified User container. | Create a User container for this process. |
| CLONE_NEWUTS  | Create a UTS container for a child process. | Move this process to the specified UTS container. | Create a UTS container for this process. |
| CLONE_NEWNET  | Create a Network container for a child process. | Move this process to the specified Network container. | Create a Network container for this process. |

- The container features are controlled by compiler macros.

  ```
  // Macro of the container feature
  LOSCFG_CONTAINER
  // Macro of the container of each type
  LOSCFG_UTS_CONTAINER
  LOSCFG_MNT_CONTAINER
  LOSCFG_PID_CONTAINER
  LOSCFG_NET_CONTAINER
  LOSCFG_USER_CONTAINER
  LOSCFG_TIME_CONTAINER
  LOSCFG_IPC_CONTAINER
  ```

### Development Examples

The LiteOS-A smoke test cases contain examples of the corresponding interfaces. You need to compile and verify the test cases.
The recommended test cases are as follows:

[Creating a UTS Container](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_uts_container_001.cpp)

[Moving a Process to a New UTS Container Using unshare()](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_uts_container_004.cpp)

[Moving a Process to the UTS Container of the Child Process Using setns()](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_uts_container_005.cpp)

[Creating a Network Container](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_net_container_001.cpp)

[Creating a User Container](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_user_container_001.cpp)

[Creating a PID Container](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_pid_container_023.cpp)

[Creating a Mount Container](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_mnt_container_001.cpp)

[Creating an IPC Container](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_ipc_container_001.cpp)

[Creating a Time Container](https://gitee.com/openharmony/kernel_liteos_a/blob/master/testsuites/unittest/container/smoke/It_time_container_001.cpp)
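
For orientation, the following is a minimal sketch of creating a UTS container with **clone()** and the **CLONE_NEWUTS** flag. It is not taken from the test suite. The names `DemoChild`, `DemoUtsContainer`, `DEMO_STACK_SIZE`, and `DEMO_NAME_LEN` are hypothetical, and the sketch assumes that the C library exposes **clone()**, **sethostname()**, and **gethostname()** with Linux-compatible prototypes and that the caller is permitted to create containers. The child renames the host inside its new UTS container, and the parent then checks that its own host name is unchanged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_STACK_SIZE (64 * 1024) /* hypothetical child stack size */
#define DEMO_NAME_LEN   64          /* hypothetical host name buffer size */

/* Runs in the child: change the host name inside the new UTS container. */
static int DemoChild(void *arg)
{
    const char *name = (const char *)arg;
    assert(name != NULL);
    return (sethostname(name, strlen(name)) == 0) ? 0 : 1;
}

/*
 * Returns 0 if the child changed its host name in a new UTS container while
 * the parent's host name stayed the same. Returns -1 on any failure, for
 * example when the caller lacks the permission to create a container.
 */
int DemoUtsContainer(void)
{
    static char childName[] = "demo-uts-child";
    char before[DEMO_NAME_LEN] = {0};
    char after[DEMO_NAME_LEN] = {0};
    int status = 0;

    if (gethostname(before, sizeof(before) - 1) != 0) {
        return -1;
    }

    char *stack = malloc(DEMO_STACK_SIZE);
    if (stack == NULL) {
        return -1;
    }

    /* The stack grows downward, so pass the top of the buffer. */
    pid_t pid = clone(DemoChild, stack + DEMO_STACK_SIZE, CLONE_NEWUTS | SIGCHLD, childName);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }

    if (gethostname(after, sizeof(after) - 1) != 0) {
        return -1;
    }
    return (strcmp(before, after) == 0) ? 0 : -1;
}
```

A caller can invoke `DemoUtsContainer()` from `main()` and treat a return value of 0 as confirmation that the host name change stayed inside the child's UTS container. Dropping **CLONE_NEWUTS** from the flags would make the child share the UTS container of its parent, so the rename would become visible to the parent.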