Container Hardening Process

Author: Wei

Reviewer: Shawn Chang

Revision: 0.1

Abstract:

This article presents a step-by-step guide to hardening containers on the GNU/Linux operating system. For demonstration, we create a containerized application using the Podman container platform and Pandoc. The hardening begins with creating a customized Seccomp policy profile by analyzing the system calls made by the container process, and then applying that profile. Next, MAC tools such as AppArmor and SELinux are evaluated and configured on the host OS to keep the container process from arbitrarily accessing the host file system. Finally, we offer suggestions for further hardening and discuss security in general.

Introduction

Because containers are quick to launch and easy to deploy and manage, container technology has been widely adopted in the industry. With the help of tools such as Kubernetes, container instances can be managed at a relatively large scale. However, this raises questions about the security of containers and containerized applications. When a new vulnerability is disclosed, the application and its dependencies inside the container may not be updated in time, or at least not as quickly as applications covered by the package manager on an ordinary system. Furthermore, unlike virtualization, where the guest OS runs independently inside a virtual machine, a container shares the kernel and the process management mechanism with the host system, which potentially exposes a larger attack surface to the host and to sibling container processes. Given this situation, further hardening of the container and the containerized application may be necessary to mitigate the exploitation of vulnerabilities and to limit the resulting damage.

This article explores container hardening methods that use the Linux Seccomp filter, MAC tools on the host system, and other means. First, we select the container platform, namely Podman, and set up the test environment, in which we build a Pandoc container image as an example. Then, a Seccomp filter is generated with oci-seccomp-bpf-hook and applied to the specific container instance. After that, we look at how MAC tools, including AppArmor and SELinux, are configured for container confinement.

The process presented here targets the hardening of container instances and containerized applications in industry environments. It is not a sandbox solution, and we do not intend to implement one. Although containerization can serve as one of the confinement mechanisms of a sandbox, a complete sandbox solution may involve additional restriction interfaces and takes a different security perspective. For reference, a separate article provides more information about sandbox solutions on GNU/Linux systems ¹.

To better demonstrate this process, files including the Dockerfile, Makefile, and Seccomp filters have been collected and uploaded to our git repository ².

Build the environment

Container Solution

First and foremost, a proper container platform must be selected. This choice is important because we need a hardening solution that makes exploitation more difficult.

For our environment, Podman is chosen over other solutions such as Docker and Linux Containers (LXC). It supports rootless containers natively ³, so a container instance can be run by a non-root user. Podman also manages containers in a daemonless way ³, which means that, in contrast with Docker, no Podman management process runs permanently in the background. Both features reduce the attack surface exposed to the host.

Podman supports various commonly used container image formats, including the OCI specification ⁴ ⁵ and Docker images ⁶. It also has a CLI that is compatible with Docker, meaning that the same subcommands and options of the Docker interface can be used to manage Podman containers and images.

Podman version 5.2.2 is used in this process.

OS

Here, we use openSUSE Tumbleweed, a rolling release distribution, as both our host operating system and the system inside the container. A rolling release lets applications receive the latest updates as soon as patches are available. In addition, it is usually easier to apply a vulnerability fix on a rolling distribution than on a fixed-term release, which often requires backporting the fix.

The Application

For demonstration purposes, a Pandoc container will be built. Pandoc ⁷ is a powerful open-source tool that converts documents between multiple formats such as markdown, other markup formats, Open Document Format (ODF), and PDF. In the following process, a markdown document is converted to HTML for simplicity’s sake, and the output is checked to make sure the application works as expected.

Build and Test the Image

The Dockerfile in the git repository under build/ shows a simple process for building the image. Only a minimal set of packages is included. Instead of root, a dedicated user and group, pan:pan, is created to run the application in the container.

Build the image.

$ podman build -t pandoc:latest build/

In this process, the version of Pandoc installed in the container is 3.3.

Once done, check the build result.

$ podman images
REPOSITORY                                 TAG         IMAGE ID      CREATED             SIZE
localhost/pandoc                           latest      0c162b4e848a  About a minute ago  667 MB
registry.opensuse.org/opensuse/tumbleweed  latest      0d8c60935b25  22 hours ago        99.2 MB

Before hardening the container, test the image by converting a simple markdown document at demo/demo.md to the HTML format. Here, we map the current directory from the host to the home directory of the user that runs the Pandoc application from the container.

$ cd demo/
$ podman run --rm -it \
       --volume "$(pwd):/home/pan" \
       --userns keep-id:uid=$(id -u),gid=$(id -g) \
       localhost/pandoc:latest \
           pandoc -f markdown -t html demo.md -o demo.html

The generated HTML file demo.html should contain the HTML content converted from the original markdown.

If everything has gone well so far, we now have a working image as the target of our hardening process.

Generate Customized Seccomp Filter

Seccomp is a kernel mechanism that restricts which system calls a process may invoke, according to a policy. Usually, only a small set of system calls is permitted. Both Docker and Podman support Seccomp. If no profile is specified explicitly, the policy from a default Seccomp profile, whose location is defined in containers.conf, is applied.

On Tumbleweed:

$ grep seccomp_profile /usr/share/containers/containers.conf
#seccomp_profile = "/usr/share/containers/seccomp.json"

As shown, the default profile is /usr/share/containers/seccomp.json. However, this profile allows 375 syscalls in total, which is too coarse-grained for a hardening solution. Because the profile is meant to be used by all containers, it must include as many system calls as possible to fit every situation.

A common practice to reduce the attack surface is to create a customized policy that fits our usage scenario. In other words, only the system calls that are necessary for running the above Pandoc command are included in the policy.

A significant problem with creating a Seccomp policy for an application by hand is that it may take considerable effort and time. The policy developers need to be familiar with the application, sometimes even at the code level, which is usually not practical. For efficiency, we recommend using tools to generate the Seccomp policy.

The tool used in our demonstration is oci-seccomp-bpf-hook ⁸, which provides an OCI hook that traces the system calls of the container using BPF and generates a Seccomp policy profile. The package is available on major GNU/Linux distributions such as Fedora, openSUSE Tumbleweed, Debian, and Ubuntu.
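
To see how coarse a profile is, the allow list can be counted directly from the JSON file. The following is a minimal sketch, assuming the profile follows the usual OCI-style layout with a top-level `syscalls` list whose entries carry `names` and `action`; the helper names `load_profile` and `allowed_syscalls` are ours and purely illustrative. Note that the default profile may also contain conditional rules (for example, rules tied to capabilities), which this sketch counts unconditionally, so the result may not match the figure quoted above exactly.

```python
import json


def load_profile(path):
    """Load a seccomp profile (JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def allowed_syscalls(profile):
    """Return the set of syscall names that the profile explicitly allows.

    Assumption: rules live under the "syscalls" key and each rule has a
    "names" list and an "action" string such as "SCMP_ACT_ALLOW".
    """
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed
```

On the host, `len(allowed_syscalls(load_profile("/usr/share/containers/seccomp.json")))` would then report the size of the default allow list.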

Install oci-seccomp-bpf-hook

On openSUSE Tumbleweed, the package has not yet been added to the official repository, so we need to add the “security” development repository before installing it. A subpackage for testing is also available, so we can verify that the tool works on the system before creating the policy profile.

$ sudo zypper addrepo https://download.opensuse.org/repositories/security/openSUSE_Tumbleweed/ security
$ sudo zypper install oci-seccomp-bpf-hook oci-seccomp-bpf-hook-tests

We plan to push the package to the Factory project so that it becomes available from the official repository in the future.

The tool might not be packaged on some distributions, but it can be built from source. In that case, some development libraries for the Go programming language need to be installed as dependencies.

By default, the hook’s binary is installed at /usr/libexec/oci/hooks.d/oci-seccomp-bpf-hook. Meanwhile, a JSON file located at /usr/share/containers/oci/hooks.d/oci-seccomp-bpf-hook.json defines the path to the binary, which will be passed to the command to trace the system calls later.
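
The way Podman decides to run this hook can be sketched in a few lines. The example below is only an illustration under stated assumptions: we assume the JSON file follows the OCI hooks 1.0.0 schema, in which `when.annotations` maps regular expressions for annotation keys to regular expressions for their values, and the `EXAMPLE_HOOK` dictionary is our guess at the typical shape of such a file, not a copy of the installed one. Only the annotation condition is modelled, and Python's `re` stands in for the POSIX extended regular expressions used by the real implementation.

```python
import re

# Assumed, illustrative hook definition; check the installed JSON file
# for the real content on your system.
EXAMPLE_HOOK = {
    "version": "1.0.0",
    "hook": {"path": "/usr/libexec/oci/hooks.d/oci-seccomp-bpf-hook"},
    "when": {"annotations": {r"^io\.containers\.trace-syscall$": ".*"}},
    "stages": ["prestart"],
}


def hook_applies(hook_config, annotations):
    """Return True if every annotation pattern in the hook's "when" clause
    is matched by at least one key/value pair of the container annotations.

    Simplification: other "when" conditions are ignored, and a hook
    without annotation patterns is treated as not applicable.
    """
    wanted = hook_config.get("when", {}).get("annotations", {})
    if not wanted:
        return False
    return all(
        any(re.search(key_re, key) and re.search(val_re, val)
            for key, val in annotations.items())
        for key_re, val_re in wanted.items()
    )
```

Under these assumptions, a container started with the annotation `io.containers.trace-syscall` set would trigger the hook, while a container without it would not.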

About Unprivileged BPF

Most distributions disable unprivileged BPF for security reasons. Since the BPF hook has to run with root privileges anyway, we do not need to enable unprivileged BPF. Once the Seccomp profile has been generated, it can be used by the rootless Podman process.

Test oci-seccomp-bpf-hook

Because the test suite is also available, we can check whether the BPF hook really works.
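
The current setting can be inspected through the `kernel.unprivileged_bpf_disabled` sysctl. The sketch below reads it from /proc and translates the value according to the kernel sysctl documentation; the function names are ours and hypothetical, and the file may be absent on kernels built without BPF syscall support, in which case `read_unprivileged_bpf` returns None.

```python
from pathlib import Path

SYSCTL_PATH = Path("/proc/sys/kernel/unprivileged_bpf_disabled")

# Meanings as described in the kernel's sysctl documentation.
MEANINGS = {
    "0": "unprivileged BPF is enabled",
    "1": "unprivileged BPF is disabled and cannot be re-enabled until reboot",
    "2": "unprivileged BPF is disabled but an administrator may re-enable it",
}


def describe_unprivileged_bpf(value):
    """Translate the raw sysctl value into a human-readable description."""
    value = value.strip()
    return MEANINGS.get(value, f"unrecognized value: {value!r}")


def read_unprivileged_bpf(path=SYSCTL_PATH):
    """Return the raw sysctl value, or None if it cannot be read."""
    try:
        return path.read_text()
    except OSError:
        return None
```

A value of 1 or 2 is what we expect on a hardened host; neither prevents the hook from working when it is run as root.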

$ sudo /usr/share/oci-seccomp-bpf-hook/test/system/test_runner.sh
++ time bats --tap .
1..8
ok 1 Podman available
ok 2 Version check # skip This test only makes sense in a source-tree environment
ok 3 Trace and check size of generated profile
ok 4 Trace and use generated profile
ok 5 Containers fails to run blocked syscall
ok 6 Extend existing seccomp profile
ok 7 Syscall blocked in input profile remains blocked in output profile
ok 8 Trace and look for syslogs

real    0m33.802s
user    0m1.020s
sys     0m0.575s

The article ⁹ by Red Hat gives detailed instructions about the usage of oci-seccomp-bpf-hook.

Generate the Seccomp Profile

To create the Seccomp profile, we reuse the previous document conversion command with some additional options. Podman can find oci-seccomp-bpf-hook.json without its location being specified. The option --annotation with the key io.containers.trace-syscall enables the trace, and its value names the output file for the generated profile. Since we run the command with root privileges, while the image localhost/pandoc:latest we built is kept in the storage of the unprivileged user, the image is invisible to Podman under sudo. For this reason, the option --root is included to point at the unprivileged user's storage.

$ sudo podman run --rm -it \
      --volume "$(pwd):/home/pan" \
      --userns keep-id:uid=$(id -u),gid=$(id -g) \
      --root ${HOME}/.local/share/containers/storage/ \
      --annotation io.containers.trace-syscall="of:demo_seccomp.json" \
      localhost/pandoc:latest \
          pandoc -f markdown -t html demo.md -o demo.html

The output Seccomp profile is in a JSON file. For consistency, change the file owner from root to the unprivileged user.

$ sudo chown -R $(id -un):$(id -gn) demo_seccomp.json

Since the command ran with root privileges against the unprivileged user's image storage, the ownership of some files has changed to root. We need to change it back in order to use the image as the unprivileged user.

$ sudo chown -R $(id -un):$(id -gn) \
      ${HOME}/.local/share/containers/storage/ \
      /tmp/storage-run-$(id -u)/containers/overlay-layers/mountpoints.json

Apply the Profile

Now that the customized Seccomp profile is ready, the next step is to run the Pandoc container with this profile and see whether it works in the normal operation scenario. We pass the profile to the command with the --security-opt option.

$ podman run --rm -it \
       --volume "$(pwd):/home/pan" \
       --userns keep-id:uid=$(id -u),gid=$(id -g) \
       --security-opt seccomp=demo_seccomp.json \
       localhost/pandoc:latest \
           pandoc -f markdown -t html demo.md -o demo.html

If everything is correct, the HTML file demo.html is generated, and it should be identical to the one generated previously without the hook.
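
One simple way to confirm that the two outputs are identical is to compare their digests. The following sketch assumes that a copy of the earlier output was kept under a hypothetical name such as demo.baseline.html before the command was re-run; the function names are ours.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have byte-identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_content("demo.baseline.html", "demo.html")` should return True if the customized profile did not change the behavior of the conversion.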

Check the Profile

Using tools like Python mjson ¹⁰, the customized Seccomp profile can be formatted for easy observation.

$ python3 -mjson.tool demo_seccomp.json demo_seccomp.json

By checking the file, we can see 73 system calls in the allow list, a small set compared to the 375 system calls from the default Seccomp profile for containers. About 80% of system calls from the default profile are blocked.
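
The comparison can be made explicit with a little arithmetic. The sketch below is illustrative only: `compare_allow_lists` and `reduction_percent` are hypothetical helper names, and the inputs are assumed to be the syscall names taken from the allow rules of each profile.

```python
def compare_allow_lists(default_names, custom_names):
    """Classify syscall names by comparing two allow lists."""
    default_set, custom_set = set(default_names), set(custom_names)
    return {
        "kept": sorted(custom_set & default_set),     # allowed by both
        "dropped": sorted(default_set - custom_set),  # blocked by the custom profile
        "extra": sorted(custom_set - default_set),    # only in the custom profile
    }


def reduction_percent(default_count, custom_count):
    """Percentage of the default allow list that the custom profile blocks."""
    return 100.0 * (default_count - custom_count) / default_count
```

With the numbers above, `reduction_percent(375, 73)` evaluates to roughly 80.5, which matches the figure of about 80%.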

Configure the MAC Tools for the Container

Mandatory Access Control (MAC) tools can be enabled on the host as an additional hardening layer to prevent container escape and mitigate the impact of vulnerabilities in both the container platform and the application. Commonly used MAC tools on major GNU/Linux distributions are AppArmor and SELinux.

AppArmor

Currently, there is an issue with using AppArmor to confine a Podman container run by an unprivileged user. When Podman starts with AppArmor enforcement and enables the AppArmor profile, it accesses /sys/kernel/security/apparmor/profiles, which requires root privileges to read ¹¹. Therefore, when the container is run by an unprivileged user, it fails with an error indicating that AppArmor is not enabled on the system.

For normal Podman functionality with AppArmor enforced, accessing the profiles pseudo file is unnecessary. A commit ¹² was submitted by the community to remove this file access so that the AppArmor profile could work for Podman run by an unprivileged user. However, it was subsequently reverted ¹³ due to a failure in the CI checks.

It is still possible to enable the AppArmor profile by running the Podman container as root, but doing so gives up the hardening advantage of the rootless container.

Another issue with AppArmor is that, so far, most distributions do not have an official working profile for Podman, which means users have to create one themselves.

For these reasons, SELinux is enabled in this process instead of AppArmor enforcement. We will continue to monitor progress on the fix in the official Podman repository and update this article as necessary.

SELinux

Since SELinux is not shipped on openSUSE Tumbleweed by default, we must install the necessary packages and enable SELinux through the configuration file and the kernel command line.

Instructions for configuring the kernel command line, labeling the file system, and enforcing SELinux can be found on the openSUSE wiki ¹⁴.

Note that, in addition to the targeted policies in selinux-policy-targeted, the package container-selinux is also required; it contains the policies for Podman and containers in general.

$ sudo zypper install selinux-policy-targeted container-selinux

The contexts are defined at:

$ cat /usr/share/containers/selinux/contexts
process = "system_u:system_r:container_t:s0"
file = "system_u:object_r:container_file_t:s0"
ro_file="system_u:object_r:container_ro_file_t:s0"
kvm_process = "system_u:system_r:container_kvm_t:s0"
init_process = "system_u:system_r:container_init_t:s0"
engine_process = "system_u:system_r:container_engine_t:s0"
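
The contexts file is a flat list of `key = "value"` pairs, so it is easy to read programmatically. The following is a minimal sketch that assumes exactly that layout; `parse_contexts` is a hypothetical helper, not part of any SELinux or Podman library.

```python
def parse_contexts(text):
    """Parse lines of the form key = "value" into a dictionary.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Whitespace around '=' is optional.
    """
    contexts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        contexts[key.strip()] = value.strip().strip('"')
    return contexts
```

Splitting a parsed value on ":" then gives the SELinux user, role, type, and level; for the `file` entry above, the type is container_file_t.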

The related policies can be found as follows (only part of the output is shown as an example):

$ sudo bzcat /usr/share/selinux/packages/container.pp.bz2 | grep -ai "pod\|/container"
/usr/s?bin/containerd.*         --      system_u:object_r:container_runtime_exec_t:s0
/usr/local/s?bin/containerd.*   --      system_u:object_r:container_runtime_exec_t:s0
/usr/bin/container[^/]*plugin   --      system_u:object_r:container_runtime_exec_t:s0
/usr/bin/podman         --      system_u:object_r:container_runtime_exec_t:s0
/usr/local/bin/podman   --      system_u:object_r:container_runtime_exec_t:s0
...
/var/log/pods(/.*)?             system_u:object_r:container_log_t:s0
...

Now, try to run the command to convert documents again and see if it works:

$ cd demo/
$ podman run --rm -it \
       --volume "$(pwd):/home/pan" \
       --userns keep-id:uid=$(id -u),gid=$(id -g) \
       --security-opt seccomp=demo_seccomp.json \
       localhost/pandoc:latest \
           pandoc -f markdown -t html demo.md -o demo.html
pandoc: demo.md: withBinaryFile: permission denied (Permission denied)

The command failed with a permission denied error, probably because the container under SELinux does not have permission to access the mapped directory that holds the markdown document.

Check the SELinux context for the directory:

$ cd ..
$ ls -Zd demo/
system_u:object_r:user_home_t:s0 demo/

$ ls -Z1 demo/
system_u:object_r:user_home_t:s0 demo.md
system_u:object_r:user_home_t:s0 demo_seccomp.json
system_u:object_r:user_home_t:s0 Makefile

The output shows that the mapped directory is labeled with the type user_home_t. However, according to the contexts defined at /usr/share/containers/selinux/contexts, the Podman container process is only allowed to access files labeled with the type container_file_t, which means we need to relabel the mapped directory and its files.
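
The same check can be scripted. The sketch below splits a context string into its fields and compares the type against container_file_t. It is a deliberately simplified illustration: the real access decision is made by the kernel from the loaded policy, not by a string comparison, and the helper names are ours. `file_context` relies on `os.getxattr`, which is available on Linux only and raises OSError when a file carries no SELinux label.

```python
import os
from typing import NamedTuple


class SELinuxContext(NamedTuple):
    user: str
    role: str
    setype: str
    level: str


def parse_context(label):
    """Split 'user:role:type:level' into fields.

    The level may itself contain colons (for example 's0:c1,c2'),
    so only the first three separators are used for splitting.
    """
    user, role, setype, level = label.strip("\x00\n ").split(":", 3)
    return SELinuxContext(user, role, setype, level)


def file_context(path):
    """Read the SELinux label of a file from its extended attribute."""
    raw = os.getxattr(path, "security.selinux")
    return parse_context(raw.decode("utf-8"))


def has_type(label, expected="container_file_t"):
    """Return True if the label carries the expected SELinux type."""
    return parse_context(label).setype == expected
```

For the labels listed above, `has_type("system_u:object_r:user_home_t:s0")` is False, which is consistent with the permission error we observed.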

To change the context temporarily:

$ sudo chcon -R -t container_file_t demo/
$ ls -Zd demo/
system_u:object_r:container_file_t:s0 demo/

However, we want to make the change permanent so that in the future the label can be restored to the correct one with restorecon.

The following command modifies the context permanently.

$ sudo semanage fcontext -a -t container_file_t "/home/user/podman-pandoc/demo(/.*)?"

Confirm that our customized context has been added.

$ sudo semanage fcontext -C -l
SELinux fcontext                         type           Context

/home/user/podman-pandoc/demo(/.*)?      all files      system_u:object_r:container_file_t:s0
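
The pattern in the rule deserves a short explanation: file context expressions are matched against the whole path, and the trailing `(/.*)?` makes the rule cover the directory itself as well as everything beneath it. The sketch below illustrates this with Python's `re.fullmatch`, which we use here as an approximation of the PCRE matching done by SELinux; for a simple pattern like this one the behavior should be the same. The function name is hypothetical.

```python
import re

PATTERN = "/home/user/podman-pandoc/demo(/.*)?"


def fcontext_matches(pattern, path):
    """Return True if the whole path matches the file context pattern."""
    return re.fullmatch(pattern, path) is not None
```

With this pattern, /home/user/podman-pandoc/demo and /home/user/podman-pandoc/demo/demo.md both match, while a sibling directory such as /home/user/podman-pandoc/demo-old does not.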

Finally, we relabel the directory with the customized context.

$ sudo restorecon -R -v /absolute/path/to/demo
Relabeled /home/user/podman-pandoc/demo from unconfined_u:object_r:user_home_t:s0 to unconfined_u:object_r:container_file_t:s0

Run the document conversion command again; it should now work as expected.

$ cd demo/
$ podman run --rm -it \
       --volume "$(pwd):/home/pan" \
       --userns keep-id:uid=$(id -u),gid=$(id -g) \
       --security-opt seccomp=demo_seccomp.json \
       localhost/pandoc:latest \
           pandoc -f markdown -t html demo.md -o demo.html

The HTML file should have been generated successfully.

$ cat demo.html

With these efforts, we have placed MAC confinement on top of the container as an additional layer of protection for the containerized application, acting as a barrier between the container environment and the host system.

For more information about SELinux configuration and operation, refer to the Red Hat SELinux User’s and Administrator’s Guide ¹⁵ and the SELinux Portal on the openSUSE Wiki ¹⁶.

Further Hardening Suggestions

To further harden the container runtime environment, we can enable remote attestation based on the Linux IMA component ¹⁷ on the host. Remote attestation provides system-wide integrity verification of executables and configuration files. In our scenario, it could protect the integrity of the binaries of the Podman platform as well as the customized Seccomp policy file we created.

As a prerequisite for enabling IMA-based remote attestation, a working TPM (Trusted Platform Module) device with PCRs (Platform Configuration Registers) should usually be available on the host machine to protect the measurement list from being tampered with during an attack ¹⁸. Besides TPM, on AArch64 machines with TrustZone integrated, remote attestation relying on a TEE (Trusted Execution Environment), for instance the OP-TEE project ¹⁹, is another possible option.

Considering the nondeterminism problem ²⁰, which relates to the balance between security and performance, we suggest creating a customized IMA policy that limits attestation to a set of critical files instead of the whole system. IMA policies support SELinux labels for marking the files to be measured. Since the MAC solution we selected is SELinux, this approach might be the most straightforward. SMACK’s labeling system is another candidate when SELinux is not available. However, in that case SMACK needs to be enabled on top of an existing MAC tool such as AppArmor, which places a further performance burden on the system.

In addition, a production-grade remote attestation solution, such as Keylime ²¹, is recommended. Relying on an existing mature solution keeps the errors and vulnerabilities introduced during development to a minimum.

Last but not least, all of the above container hardening measures must build on general OS hardening, for example maintaining compliance with the security baseline, keeping the system up to date, and handling incidents properly and promptly. Also, not only the host OS but also the system in the image should be well hardened and maintained, which usually means the customized container image needs to be rebuilt when a vulnerability or weakness appears. As Bruce Schneier put it, “security is a process, not a product” ²². It is essential to be aware of the risks in the specific production environment and to continuously maintain a strong, multi-layered defense strategy against potential attacks, especially as networks become more and more hostile.

References

1. “GNU/Linux Sandboxing - A Brief Review,” HardenedLinux, August 20, 2024. [Online]. Available: https://hardenedlinux.org/blog/2024-08-20-gnu/linux-sandboxing-a-brief-review
2. “podman-pandoc,” GitHub. [Online]. Available: https://github.com/hardenedlinux/podman-pandoc
3. “Podman,” Podman.io. [Online]. Available: https://podman.io/
4. “Open Container Initiative image-spec,” GitHub. Accessed: September 26, 2024. [Online]. Available: https://github.com/opencontainers/image-spec/blob/main/spec.md
5. “Open Container Initiative,” Open Container Initiative. Accessed: September 28, 2024. [Online]. Available: https://opencontainers.org/
6. “containers/podman,” GitHub Repository. [Online]. Available: https://github.com/containers/podman
7. “Pandoc,” Pandoc.org. [Online]. Available: https://pandoc.org/
8. “containers/oci-seccomp-bpf-hook,” GitHub Repository. [Online]. Available: https://github.com/containers/oci-seccomp-bpf-hook
9. V. Rothberg, “Improving Linux container security with seccomp,” Red Hat - Enable Sysadmin, June 15, 2020. Accessed: October 2, 2024. [Online]. Available: https://www.redhat.com/sysadmin/container-security-seccomp
10. “mjson,” PyPI. [Online]. Available: https://pypi.org/project/mjson/
11. kernelmethod, “Allow rootless containers to use AppArmor profiles,” GitHub, March 11, 2022. Accessed: October 8, 2024. [Online]. Available: https://github.com/containers/common/issues/958
12. kernelmethod, “Allow rootless containers to use AppArmor profiles,” GitHub, March 12, 2022. Accessed: October 8, 2024. [Online]. Available: https://github.com/containers/common/commit/55d217f7dd9ef4721cf32b32d0d8b8b029e877b3
13. V. Rothberg, “Revert ‘Allow rootless containers to use AppArmor profiles’,” GitHub, March 18, 2022. Accessed: October 8, 2024. [Online]. Available: https://github.com/containers/common/commit/d167b7f079029344ddbfc6218d16f5eb6932ccd7
14. “Portal:SELinux/Setup,” openSUSE Wiki. Accessed: October 12, 2024. [Online]. Available: https://en.opensuse.org/Portal:SELinux/Setup
15. M. Jahoda, B. Ančincová, I. Gkioka, and T. Čapek, “Red Hat Enterprise Linux 7 SELinux User’s and Administrator’s Guide,” Red Hat Documentation, September 23, 2024. Accessed: October 10, 2024. [Online]. Available: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/selinux_users_and_administrators_guide/index
16. “Portal:SELinux,” openSUSE Wiki. Accessed: October 12, 2024. [Online]. Available: https://en.opensuse.org/Portal:SELinux
17. D. Kasatkin and mzohar, “Integrity Measurement Architecture (IMA) Wiki,” SourceForge. Accessed: October 13, 2024. [Online]. Available: https://sourceforge.net/p/linux-ima/wiki/Home/
18. Corbet, “The Integrity Measurement Architecture,” LWN.net, May 24, 2005. Accessed: October 13, 2024. [Online]. Available: https://lwn.net/Articles/137306/
19. “About OP-TEE,” GitHub. [Online]. Available: https://github.com/OP-TEE/optee_docs/blob/master/general/about.rst
20. J. Son et al., “Quantitative analysis of measurement overhead for integrity verification,” in SAC ’17: Proceedings of the Symposium on Applied Computing, 2017, pp. 1528-1533, doi: 10.1145/3019612.3019738. [Online]. Available: https://dl.acm.org/doi/10.1145/3019612.3019738
21. “Keylime,” Keylime. [Online]. Available: https://keylime.dev/
22. B. Schneier, “The Process of Security,” Schneier on Security, April 2000. Accessed: October 13, 2024. [Online]. Available: https://www.schneier.com/essays/archives/2000/04/the_process_of_secur.html