Kernel Capabilities in Docker: A Fine-Grained Access Control System

By Sheila A. Berta, Head of Research at Dreamlab Technologies

Kernel capabilities turn the binary “root/non-root” dichotomy into a fine-grained access control system. As seen in the user namespace remapping post, the default user within a container is root, and that user is also root on the host machine. However, Docker drops most of the kernel capabilities for the container’s process, which means that root within a container has fewer privileges than root on the host, although it is still quite powerful.

The default kernel capabilities available to the container root are listed in this part of Docker’s source code: https://github.com/moby/moby/blob/master/oci/caps/defaults.go

This can be checked by running a container and inspecting its capabilities, either by installing libcap in the container and using the capsh --print command, or by getting the PID of a container’s process and using the getpcaps command on the host machine.
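The kernel also exposes the same information as hexadecimal bitmasks in /proc/<pid>/status, where the CapEff line holds the effective set. The following is a minimal Python sketch of how such a mask can be decoded. It assumes the bit numbering of the kernel's linux/capability.h header up to CAP_CHECKPOINT_RESTORE; the function names are illustrative, and the sample mask is the value the 14 default capabilities would produce, assuming the default list has not changed.

```python
# Capability names indexed by bit number (assumed from linux/capability.h).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid",
    "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service",
    "cap_net_broadcast", "cap_net_admin", "cap_net_raw", "cap_ipc_lock",
    "cap_ipc_owner", "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot",
    "cap_sys_ptrace", "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot",
    "cap_sys_nice", "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config",
    "cap_mknod", "cap_lease", "cap_audit_write", "cap_audit_control",
    "cap_setfcap", "cap_mac_override", "cap_mac_admin", "cap_syslog",
    "cap_wake_alarm", "cap_block_suspend", "cap_audit_read", "cap_perfmon",
    "cap_bpf", "cap_checkpoint_restore",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits beyond the table are reported by number rather than guessed.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        bit += 1
    return names


def read_effective_mask(pid="self"):
    """Read the CapEff line from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


# Assumed sample: the mask that the 14 default capabilities would produce.
DEFAULT_MASK = 0x00000000A80425FB

if __name__ == "__main__":
    print(", ".join(decode_caps(DEFAULT_MASK)))
```

On a Linux host, decode_caps(read_effective_mask(pid)) could be applied to the PID of a container's process, in the same spirit as getpcaps.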

The figure below shows the default kernel capabilities allowed in the container’s process.

It is worth mentioning that the very dangerous --privileged flag, which can be passed when a container is started, enables all the kernel capabilities. It totally breaks isolation: if an attacker compromises a container running with the --privileged flag, they have also compromised the host.

The figure below shows that the container’s process running with the --privileged flag has the full set of kernel capabilities available.
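One way to spot such a container from its capability mask is to test the bits of a few capabilities that the default set does not include. The Python sketch below is illustrative only: the selection of "high-risk" capabilities is an assumption made for this example and not an official list, the bit numbers are taken from linux/capability.h, and the full mask assumes a kernel that defines capabilities 0 through 40.

```python
# Bit numbers from linux/capability.h. Which capabilities count as
# "high risk" is an assumption made for this example.
HIGH_RISK_BITS = {
    "cap_net_admin": 12,
    "cap_sys_module": 16,
    "cap_sys_ptrace": 19,
    "cap_sys_admin": 21,
}


def high_risk_caps(mask):
    """Return the sorted names of the high-risk capabilities set in mask."""
    return sorted(
        name for name, bit in HIGH_RISK_BITS.items() if mask & (1 << bit)
    )


# Assumed sample masks: the default set, and every bit set up to capability 40.
DEFAULT_MASK = 0x00000000A80425FB
FULL_MASK = (1 << 41) - 1

if __name__ == "__main__":
    print("default:", high_risk_caps(DEFAULT_MASK))
    print("full:", high_risk_caps(FULL_MASK))
```

With these sample masks, the default set yields an empty list while the full mask reports all four capabilities.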

Fine-grained access control system

In most cases, containers don’t need all the root privileges. Therefore, they can run with a reduced capability set, meaning that root within a container will have even fewer privileges than the default.

Docker supports the addition and/or removal of kernel capabilities, allowing the use of a non-default profile. This may make Docker more secure through capability removal, or less secure through the addition of capabilities. The best practice is to remove all capabilities except those explicitly required for the container to run.
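As a minimal sketch of that practice, the Python helper below assembles a docker run argument list that drops every capability and adds back only the ones named by the caller. The helper name, the image name and the chosen capability are hypothetical; --cap-drop and --cap-add are the Docker CLI flags for removing and adding capabilities.

```python
def least_privilege_args(image, required_caps=()):
    """Build a `docker run` argument list with a minimal capability set."""
    # Start from an empty capability set.
    args = ["docker", "run", "--cap-drop=all"]
    # Add back only what the workload needs, de-duplicated and in a stable order.
    for cap in sorted(set(required_caps)):
        args.append(f"--cap-add={cap}")
    args.append(image)
    return args


if __name__ == "__main__":
    # "example/web:latest" is a placeholder image name, not a real image.
    print(" ".join(least_privilege_args("example/web:latest", ["NET_BIND_SERVICE"])))
```

The resulting list could be passed to subprocess.run on a host where Docker is installed; building it as a list rather than a string avoids shell quoting issues.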

The --cap-drop=all parameter drops all of the default kernel capabilities. For example, if a web service is to be run, then the following command might be executed:

However, an error will be returned because of the removal of the “cap_net_bind_service” capability, which is necessary to bind privileged ports (< 1024), in this case port 80 for the web service.
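The rule behind this error can be sketched in a few lines of Python. This is a simplified model and not kernel code: it assumes the threshold of 1024 stated above (on Linux the threshold is configurable through the net.ipv4.ip_unprivileged_port_start sysctl, so a given setup may behave differently), and the function name and sample masks are hypothetical.

```python
CAP_NET_BIND_SERVICE_BIT = 10  # bit number from linux/capability.h


def can_bind(port, effective_mask, unprivileged_port_start=1024):
    """Simplified model of the check applied when binding a TCP/UDP port."""
    # Ports at or above the threshold need no capability at all.
    if port >= unprivileged_port_start:
        return True
    # Below the threshold, CAP_NET_BIND_SERVICE must be in the effective set.
    return bool(effective_mask & (1 << CAP_NET_BIND_SERVICE_BIT))


DEFAULT_MASK = 0x00000000A80425FB  # assumed mask of the default capability set
NO_CAPS = 0x0                      # mask after dropping all capabilities

if __name__ == "__main__":
    print("port 80, default caps:", can_bind(80, DEFAULT_MASK))
    print("port 80, no caps:", can_bind(80, NO_CAPS))
    print("port 8080, no caps:", can_bind(8080, NO_CAPS))
```

Under this model, a web service that listens on an unprivileged port such as 8080 would not need the capability at all.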