Linux Capabilities in Docker and Podman
When running Docker (or Podman) containers, you may encounter "Operation not permitted" errors even when running as root or with sudo. This happens because root inside a container does not hold full root privileges; the restriction is implemented through Linux capabilities. This article first introduces Linux capabilities, then uses Docker as an example to show how to adjust a container's capabilities, and finally compares the default capabilities of Docker and Podman, as a reference for container developers and users.
Linux Capabilities
The classic Linux permission model divides users into ordinary users and privileged users (the root user and users with sudo permissions). Privileged users hold every permission on the system, which easily leads to security problems. For example, a web server needs privileges to listen on port 443 or 80, but it should not access other users' files or modify the kernel; under the classic model, an attacker who compromises the web server gains every permission on the system.
Linux capabilities split root's privileges into distinct units, so a process can be granted only the privileges it needs instead of full root permissions, which reduces risk. For example, CAP_NET_BIND_SERVICE allows a process to bind to ports below 1024 without full root permissions. The full list of capabilities is documented in man 7 capabilities.
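To make the idea of "partial privileges" concrete, here is a minimal Python sketch that decodes a capability bitmask into names. It assumes the bit numbering from the kernel's linux/capability.h and the CapEff field of /proc/<pid>/status; the function names decode_caps and effective_caps are made up for this illustration.

```python
from __future__ import annotations

# Capability names indexed by bit number, following <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
    "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
    "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE",
    "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE",
    "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND",
    "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask_hex: str) -> list[str]:
    """Return the names of the capabilities set in a hexadecimal mask.

    Bits beyond the end of CAP_NAMES (newer kernels) are ignored.
    """
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def effective_caps(status_path: str = "/proc/self/status") -> list[str]:
    """Decode the effective capability set (CapEff) of the current process."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


# Bit 10 alone corresponds to CAP_NET_BIND_SERVICE.
print(decode_caps("0000000000000400"))
```

Calling effective_caps() inside a container and again on the host (Linux only) should show how much smaller the container's set is, even when both processes run as root.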
Linux Capabilities in Docker
Unlike servers, containers do not need full root permissions because the purpose of containers is to run one or more specific applications, not the entire system. For example:
- Containers usually do not need to manage networks and logs because the network and logs of the container are managed by the Docker Engine.
- Containers usually do not need to set the time because the time of the container is provided by the host machine.
- Containers usually do not need to run the reboot command because the lifecycle of the container is managed by the Docker Engine.
Therefore, Docker restricts container capabilities by default with a whitelist: a container holds only a specific set of capabilities unless told otherwise. The default set is listed in the Docker documentation for docker run.
To restrict a container further and improve security, remove capabilities with the --cap-drop option. If the program in the container genuinely needs a capability that is not granted by default, add it with the --cap-add option. Container developers should state clearly in the README which additional capabilities are needed.
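As a rough sketch of how the two options combine, the Python helper below assembles a docker run command line that drops every capability and then adds back only what is needed. The helper name docker_run_command and the image name example/web-server are hypothetical; only the --cap-drop and --cap-add flags and the special value ALL come from Docker itself.

```python
import shlex


def docker_run_command(image, cap_drop=(), cap_add=(), extra_args=()):
    """Build an argument list for `docker run` with capability adjustments."""
    cmd = ["docker", "run", "--rm"]
    for cap in cap_drop:
        cmd += ["--cap-drop", cap]
    for cap in cap_add:
        cmd += ["--cap-add", cap]
    cmd.append(image)
    cmd += list(extra_args)
    return cmd


# Drop everything, then grant only the right to bind to ports below 1024.
# "example/web-server" is a placeholder image name, not a real image.
cmd = docker_run_command(
    "example/web-server",
    cap_drop=["ALL"],
    cap_add=["NET_BIND_SERVICE"],
)
print(shlex.join(cmd))
# prints: docker run --rm --cap-drop ALL --cap-add NET_BIND_SERVICE example/web-server
```

Starting from --cap-drop ALL and adding capabilities back one by one is a common way to find the minimal set a program needs, which is also the list worth documenting in the README.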
Differences in Capabilities between Podman and Docker
Podman achieves higher security than Docker by further restricting the capabilities of containers. Podman's defaults are set by the default_capabilities option in containers.conf.
Because Podman's default capabilities are stricter than Docker's, containers running under Podman may hit more "Operation not permitted" errors. For example, sudo inside a Podman container cannot use CAP_AUDIT_WRITE, which Podman does not grant by default. If a Podman user encounters "Operation not permitted" errors that others cannot reproduce, the cause is likely Podman's additional capability restrictions.
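One way to check whether such an error comes from a capability difference is to read the CapEff line of /proc/self/status inside the container under each runtime and compare the two masks. The sketch below does the comparison; the function name missing_bits is hypothetical, and the two masks are illustrative inputs rather than measured values, so substitute the ones read from your own containers.

```python
def missing_bits(reference_hex, restricted_hex):
    """Return the capability bit numbers set in the reference mask
    but absent from the restricted mask."""
    reference = int(reference_hex, 16)
    restricted = int(restricted_hex, 16)
    only_in_reference = reference & ~restricted
    return [
        bit
        for bit in range(only_in_reference.bit_length())
        if only_in_reference & (1 << bit)
    ]


# Illustrative masks (assumed, not measured): the second one has bit 29,
# which is CAP_AUDIT_WRITE in <linux/capability.h>, cleared.
reference_mask = "00000000a80425fb"
restricted_mask = "00000000880425fb"
print(missing_bits(reference_mask, restricted_mask))
# prints: [29]
```

Each bit number can be looked up in man 7 capabilities or linux/capability.h. Adding the missing capability with --cap-add, which Podman also accepts, is a quick way to confirm the diagnosis.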
References
Linux Capabilities in Docker and Podman
https://blog.caomingjun.com/linux-capabilities-in-docker-and-podman/en/