Author: Wei
Reviewer: Shawn Chang
Revision: 0.1
Abstract:
This article presents a step-by-step guide to the container hardening process on the GNU/Linux operating system. For demonstration, we create a containerized application using the Podman container platform and Pandoc. The hardening begins with creating a customized Seccomp policy profile by analyzing system calls for the container process and applying the profile. Then, MAC tools such as AppArmor and SELinux are set up on the host OS to confine the container process from arbitrarily accessing the host file system. Finally, we give some suggestions about further hardening options and some discussions about security in general.
Introduction
Thanks to their fast startup and ease of deployment and management, containers have been broadly adopted in the industry. With the help of tools such as Kubernetes, the management of container instances can reach a relatively large scale. However, this raises questions about the security of containers and containerized applications. When a new vulnerability appears, the application and its dependencies in the container may not always be updated in time, at least not as fast as applications covered by the package management tools of an ordinary system. Furthermore, unlike virtualization, where the guest OS runs independently in a virtual machine, a container shares the kernel and the process management mechanism with the host system, which potentially exposes more attack surface of the host and of sibling container processes. Given this situation, further hardening of the container and containerized applications might be necessary to mitigate the impact of exploited vulnerabilities and to keep the damage low.
This article explores container hardening methods utilizing the Linux Seccomp filter and MAC tools in the host system and other means. Firstly, we select the container application, namely Podman, and set up the test environment, where we build a Pandoc container image as an example. Then, the Seccomp filter is generated using oci-seccomp-bpf-hook and applied according to the specific container instance. After that, we will see how MAC tools, including AppArmor and SELinux, are configured for container confinement.
The process presented here targets the hardening of container instances and containerized applications in an industry environment. It is not a sandbox solution, and we do not intend to implement one. Although containerization can serve as one of the confinement mechanisms of a sandbox, a complete sandbox solution might involve additional restriction interfaces and takes a different security perspective. For reference, an article with more information about sandbox solutions on GNU/Linux systems is available [1].
To better demonstrate this process, the supporting files, including the Dockerfile, Makefile, and Seccomp filters, have been collected and uploaded to our Git repository [2].
Build the environment
Container Solution
First and foremost, selecting a proper container platform is important, considering that we need a hardening solution that makes exploitation more difficult.
For our environment, Podman is chosen over other solutions such as Docker and Linux Containers (LXC). It supports rootless containers natively [3], so a container instance can be run safely by a non-root user. Podman also manages containers in a daemonless way [3], which means that, in contrast with Docker, no Podman management process is always running in the background. Both features reduce the attack surface exposed on the host.
Podman supports various commonly used container image formats, including the OCI specification [4] [5] and Docker images [6]. It also has a CLI that is compatible with Docker, meaning that the same subcommands and options of the Docker interface can be used to manage Podman containers and images.
In this process, Podman version 5.2.2 is installed.
OS
Here, we use openSUSE Tumbleweed, a rolling release distribution, as our host operating system and as the system inside the container. A rolling release means applications can get the latest updates as soon as patches are available. In addition, it is usually easier to apply a vulnerability fix on a rolling distribution than on a fixed-term release, which often requires some effort to backport the fix.
The Application
For demonstration purposes, a Pandoc container will be built. Pandoc [7] is a powerful open-source tool that converts documents between multiple formats, such as Markdown and other markup languages, Open Document Format (ODF), and PDF. In the following process, a markdown document is converted to HTML for simplicity’s sake, and the output is checked to make sure the application works as expected.
Build and Test the Image
The Dockerfile in the git repository under build/ shows a simple process to build the image. Only a minimal set of packages is included. Instead of using root, a dedicated user and group pair, pan:pan, is created to run the application in the container.
Build the image.
$ podman build -t pandoc:latest build/
In this process, the pandoc version installed in the container is 3.3.
Once done, check the build result.
$ podman images
REPOSITORY                                 TAG     IMAGE ID      CREATED             SIZE
localhost/pandoc                           latest  0c162b4e848a  About a minute ago  667 MB
registry.opensuse.org/opensuse/tumbleweed  latest  0d8c60935b25  22 hours ago        99.2 MB
Before hardening the container, test the image by converting a simple markdown document at demo/demo.md to the HTML format. Here, we map the current directory from the host to the home directory of the user that runs the Pandoc application in the container.
$ cd demo/
$ podman run --rm -it \
--volume "$(pwd):/home/pan" \
--userns keep-id:uid=$(id -u),gid=$(id -g) \
localhost/pandoc:latest \
pandoc -f markdown -t html demo.md -o demo.html
The generated HTML file demo.html should contain the correct HTML content converted from the original markdown.
So far, if everything goes well, we already have a workable image as the target of our hardening process.
Generate Customized Seccomp Filter
Seccomp is a kernel mechanism that restricts which system calls a process may invoke, according to a policy. Usually, only a small set of system calls is permitted. Both Docker and Podman come with Seccomp support. If no profile is specified explicitly, the policy from a default Seccomp profile, whose location is defined in containers.conf, is applied.
On the Tumbleweed:
$ grep seccomp_profile /usr/share/containers/containers.conf
#seccomp_profile = "/usr/share/containers/seccomp.json"
As shown, the default profile is /usr/share/containers/seccomp.json. However, the policy in this profile allows 375 syscalls in total, which is overly coarse-grained for a hardening solution. Because the profile is meant to be used by all containers, it must include as many system calls as possible to fit all situations.
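As a rough way to reproduce such a count, the following Python sketch collects the syscall names that a profile allows. It assumes the usual profile layout (a top-level "syscalls" list whose entries carry "names" and an "action") and uses a tiny inline sample rather than the real file. The helper name allowed_syscalls is hypothetical, and the exact number obtained from a real profile depends on how conditional rules are treated.

```python
import json


def allowed_syscalls(profile: dict) -> set:
    """Collect syscall names from every rule whose action is SCMP_ACT_ALLOW."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            names.update(rule.get("names", []))
    return names


# Tiny inline profile for illustration only; a real profile is far larger.
SAMPLE_PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "openat"], "action": "SCMP_ACT_ALLOW"},
    {"names": ["ptrace"], "action": "SCMP_ACT_ERRNO"},
    {"names": ["read", "close"], "action": "SCMP_ACT_ALLOW"}
  ]
}
""")

if __name__ == "__main__":
    # Duplicated names are counted once because a set is used.
    print(len(allowed_syscalls(SAMPLE_PROFILE)))
```

To inspect a real profile, load it with json.load() from /usr/share/containers/seccomp.json and pass the result to the same function.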
A common practice to reduce the attack interface is creating a customized policy that fits our use scenario. In other words, only the system calls that are necessary for running the above Pandoc command are included in the policy.
A significant problem with manually creating the Seccomp policy for an application is that it may take considerable effort and time. The policy developers need to be familiar with the application, sometimes even at the code level, which is usually not practical. For efficiency, we recommend using tools to generate the Seccomp policy.
Our demonstration uses oci-seccomp-bpf-hook [8], which provides an OCI hook that traces and analyzes the system calls of the container using BPF and generates a Seccomp policy profile. The package is available on major GNU/Linux distributions such as Fedora, openSUSE Tumbleweed, Debian, and Ubuntu.
Install oci-seccomp-bpf-hook
On openSUSE Tumbleweed, the package has not yet been added to the official repository, so we need to add the “security” development repository before installing it. A subpackage for testing is also available, so we can verify whether the tool works on the system before creating the policy profile.
$ sudo zypper addrepo https://download.opensuse.org/repositories/security/openSUSE_Tumbleweed/ security
$ sudo zypper install oci-seccomp-bpf-hook oci-seccomp-bpf-hook-tests
We plan to push the package to the openSUSE Factory project so that it becomes available from the official repository in the future.
The application might not be available on some distributions, but it could be built from the source code. In this case, some development libraries for the Go programming language need to be installed as dependencies.
By default, the hook’s binary is installed at /usr/libexec/oci/hooks.d/oci-seccomp-bpf-hook. Meanwhile, a JSON file located at /usr/share/containers/oci/hooks.d/oci-seccomp-bpf-hook.json defines the path to the binary, which will be passed to the command that traces the system calls later.
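To illustrate how such a hook definition ties an annotation to the binary, here is a minimal Python sketch. The sample JSON is an assumption modeled on the OCI hooks 1.0.0 schema, not a copy of the installed file, which may contain additional fields. The helpers hook_binary and hook_applies are hypothetical, and the matching logic is a simplification of what the container engine does.

```python
import json
import re

# Illustrative hook configuration (an assumption, not the installed file).
SAMPLE_HOOK = json.loads(r"""
{
  "version": "1.0.0",
  "hook": {"path": "/usr/libexec/oci/hooks.d/oci-seccomp-bpf-hook"},
  "when": {"annotations": {"^io\\.containers\\.trace-syscall$": ".*"}},
  "stages": ["prestart"]
}
""")


def hook_binary(config: dict) -> str:
    """Return the path of the executable that the hook definition points to."""
    return config["hook"]["path"]


def hook_applies(config: dict, annotations: dict) -> bool:
    """Simplified check: every key/value pattern pair must match an annotation."""
    patterns = config.get("when", {}).get("annotations", {})
    if not patterns:
        return False
    return all(
        any(re.search(key_pat, key) and re.search(val_pat, val)
            for key, val in annotations.items())
        for key_pat, val_pat in patterns.items()
    )


if __name__ == "__main__":
    annotations = {"io.containers.trace-syscall": "of:demo_seccomp.json"}
    print(hook_binary(SAMPLE_HOOK), hook_applies(SAMPLE_HOOK, annotations))
```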
About Unprivileged BPF
Most distributions have disabled unprivileged BPF for security reasons. Since the BPF hook has to run with root privileges anyway, we do not need to enable unprivileged BPF. Once the Seccomp profile has been generated, it can be used by the rootless Podman process.
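To check how a given host is configured, the kernel exposes the setting through the kernel.unprivileged_bpf_disabled sysctl. The Python sketch below reads it and prints a description. The helper names are hypothetical, and the file may be missing on kernels built without BPF syscall support, in which case None is returned.

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/kernel/unprivileged_bpf_disabled")

MEANINGS = {
    0: "unprivileged BPF is enabled",
    1: "unprivileged BPF is disabled and cannot be re-enabled until reboot",
    2: "unprivileged BPF is disabled, but an administrator may change the setting",
}


def describe(value: int) -> str:
    """Translate the sysctl value into a human-readable description."""
    return MEANINGS.get(value, f"unrecognized value {value}")


def read_setting(path: Path = SYSCTL):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_setting()
    print("setting unavailable" if value is None else describe(value))
```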
Test oci-seccomp-bpf-hook
Because the test suite is also available, we can check whether the BPF hook really works.
$ sudo /usr/share/oci-seccomp-bpf-hook/test/system/test_runner.sh
++ time bats --tap .
1..8
ok 1 Podman available
ok 2 Version check # skip This test only makes sense in a source-tree environment
ok 3 Trace and check size of generated profile
ok 4 Trace and use generated profile
ok 5 Containers fails to run blocked syscall
ok 6 Extend existing seccomp profile
ok 7 Syscall blocked in input profile remains blocked in output profile
ok 8 Trace and look for syslogs
real 0m33.802s
user 0m1.020s
sys 0m0.575s
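When the test run is part of an automated setup, the TAP output above can be summarized programmatically. The following Python sketch is one possible way to do so. The function summarize_tap is hypothetical, handles only the plan line and the "ok" / "not ok" lines seen here, and counts skipped tests as passed, as TAP does.

```python
import re

TAP_LINE = re.compile(r"^(ok|not ok)\s+(\d+)\s*(.*)$")
PLAN_LINE = re.compile(r"^1\.\.(\d+)$")

# Output copied from the test run shown above.
SAMPLE_OUTPUT = """\
++ time bats --tap .
1..8
ok 1 Podman available
ok 2 Version check # skip This test only makes sense in a source-tree environment
ok 3 Trace and check size of generated profile
ok 4 Trace and use generated profile
ok 5 Containers fails to run blocked syscall
ok 6 Extend existing seccomp profile
ok 7 Syscall blocked in input profile remains blocked in output profile
ok 8 Trace and look for syslogs
real 0m33.802s
"""


def summarize_tap(output: str) -> dict:
    """Count planned, passed, failed, and skipped tests in TAP output."""
    summary = {"planned": None, "passed": 0, "failed": 0, "skipped": 0}
    for raw in output.splitlines():
        line = raw.strip()
        plan = PLAN_LINE.match(line)
        if plan:
            summary["planned"] = int(plan.group(1))
            continue
        result = TAP_LINE.match(line)
        if not result:
            continue  # ignore shell tracing and timing lines
        status, _number, rest = result.groups()
        if status == "ok":
            summary["passed"] += 1
            if "# skip" in rest.lower():
                summary["skipped"] += 1
        else:
            summary["failed"] += 1
    return summary


if __name__ == "__main__":
    print(summarize_tap(SAMPLE_OUTPUT))
```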
The Red Hat article [9] gives detailed instructions on using oci-seccomp-bpf-hook.
Generate the Seccomp Profile
To create the Seccomp profile, the previous document conversion command is used, along with some additional options. The command can find oci-seccomp-bpf-hook.json without its location being specified. The option --annotation with the key io.containers.trace-syscall is included to write the generated profile to the given output file. Since we run the command with root privileges, and the image localhost/pandoc:latest we built is kept in the storage of the unprivileged user, the image is invisible to podman under sudo. For this reason, the option --root is included.
$ sudo podman run --rm -it \
--volume "$(pwd):/home/pan" \
--userns keep-id:uid=$(id -u),gid=$(id -g) \
--root ${HOME}/.local/share/containers/storage/ \
--annotation io.containers.trace-syscall="of:demo_seccomp.json" \
localhost/pandoc:latest \
pandoc -f markdown -t html demo.md -o demo.html
The output Seccomp profile is in a JSON file. For consistency, change the file owner from root to the unprivileged user.
$ sudo chown -R $(id -un):$(id -gn) demo_seccomp.json
Since the command ran with root privileges against the image storage of the unprivileged user, the ownership of some files has changed to root. We need to change it back in order to use the image as the unprivileged user.
$ sudo chown -R $(id -un):$(id -gn) \
${HOME}/.local/share/containers/storage/ \
/tmp/storage-run-$(id -u)/containers/overlay-layers/mountpoints.json
Apply the Profile
Now that the customized Seccomp profile is ready, the next step is to run the pandoc container with this profile and see whether it works in the normal operation scenario. We pass the profile to the command with the --security-opt option.
$ podman run --rm -it \
--volume "$(pwd):/home/pan" \
--userns keep-id:uid=$(id -u),gid=$(id -g) \
--security-opt seccomp=demo_seccomp.json \
localhost/pandoc:latest \
pandoc -f markdown -t html demo.md -o demo.html
If everything is correct, the HTML file demo.html is generated, and it should be identical to the one generated previously without the hook.
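One way to confirm that two outputs are identical is to compare their checksums. The Python sketch below assumes that a copy of the earlier output was kept under a separate name, for example demo.baseline.html. That file name and the helper functions are hypothetical and not part of the repository.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(first, second) -> bool:
    """True if both files have byte-identical content."""
    return sha256_of(first) == sha256_of(second)


if __name__ == "__main__":
    # Hypothetical usage: python3 compare.py demo.baseline.html demo.html
    if len(sys.argv) == 3:
        equal = same_content(sys.argv[1], sys.argv[2])
        print("identical" if equal else "different")
```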
Check the Profile
Using tools like Python’s json.tool module or mjson [10], the customized Seccomp profile can be formatted for easier inspection.
$ python3 -mjson.tool demo_seccomp.json demo_seccomp.json
By checking the file, we can see 73 system calls in the allow list, a small set compared to the 375 system calls from the default Seccomp profile for containers. About 80% of system calls from the default profile are blocked.
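The percentage follows directly from the two counts. As a small worked check in Python, using a hypothetical helper reduction_percent:

```python
def reduction_percent(custom_count: int, default_count: int) -> float:
    """Share of the default allow list that the custom profile no longer allows."""
    if default_count <= 0:
        raise ValueError("default_count must be positive")
    return 100.0 * (default_count - custom_count) / default_count


if __name__ == "__main__":
    # (375 - 73) / 375 = 302 / 375, roughly 80.5 percent
    print(f"{reduction_percent(73, 375):.1f}%")
```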
Configure the MAC Tools for the Container
Mandatory Access Control (MAC) tools can be enabled on the host as an additional hardening layer to prevent container escape and mitigate the impact of vulnerabilities in both the container platform and the application. Commonly used MAC tools on major GNU/Linux distributions are AppArmor and SELinux.
AppArmor
Currently, there is an issue with using AppArmor to confine a Podman container run by an unprivileged user. When Podman starts with AppArmor enforcement and enables the AppArmor profile, it accesses /sys/kernel/security/apparmor/profiles, which requires root privileges to read [11]. Therefore, when the container is run by an unprivileged user, it fails with an error indicating that AppArmor is not enabled on the system.
For normal Podman functionality with AppArmor enforced, accessing the profiles pseudo file is unnecessary. A commit [12] was submitted by the community to remove the file access so that the AppArmor profile could work for Podman run by an unprivileged user. However, it was subsequently reverted [13] due to a failure in the CI checks.
Still, it is possible to enable the AppArmor profile by running the Podman container as the root user, but this eliminates the hardening advantage of the rootless container.
Another issue with AppArmor is that, so far, most distributions do not ship an official working profile for Podman, which means users have to create one themselves.
For these reasons, instead of the AppArmor enforcement, SELinux is enabled in this process. We will continue monitoring the fixing progress from the Podman official repository and update our contents accordingly as necessary.
SELinux
Since SELinux is not delivered on openSUSE Tumbleweed by default, we must install the necessary packages and enable SELinux in the configuration file and on the kernel command line.
Instructions for configuring the kernel command line, labeling the file system, and enforcing SELinux can be found on the openSUSE wiki [14].
Note that, in addition to the targeted policies in selinux-policy-targeted, the package container-selinux is also required. It contains the policies for Podman and for containers in general.
$ sudo zypper install selinux-policy-targeted container-selinux
The contexts are defined at:
$ cat /usr/share/containers/selinux/contexts
process = "system_u:system_r:container_t:s0"
file = "system_u:object_r:container_file_t:s0"
ro_file="system_u:object_r:container_ro_file_t:s0"
kvm_process = "system_u:system_r:container_kvm_t:s0"
init_process = "system_u:system_r:container_init_t:s0"
engine_process = "system_u:system_r:container_engine_t:s0"
The related policies can be found as follows (only part of the output is shown as an example):
$ sudo bzcat /usr/share/selinux/packages/container.pp.bz2 | grep -ai "pod\|/container"
/usr/s?bin/containerd.* -- system_u:object_r:container_runtime_exec_t:s0
/usr/local/s?bin/containerd.* -- system_u:object_r:container_runtime_exec_t:s0
/usr/bin/container[^/]*plugin -- system_u:object_r:container_runtime_exec_t:s0
/usr/bin/podman -- system_u:object_r:container_runtime_exec_t:s0
/usr/local/bin/podman -- system_u:object_r:container_runtime_exec_t:s0
...
/var/log/pods(/.*)? system_u:object_r:container_log_t:s0
...
Now, try to run the command to convert documents again and see if it works:
$ cd demo/
$ podman run --rm -it \
--volume "$(pwd):/home/pan" \
--userns keep-id:uid=$(id -u),gid=$(id -g) \
--security-opt seccomp=demo_seccomp.json \
localhost/pandoc:latest \
pandoc -f markdown -t html demo.md -o demo.html
pandoc: demo.md: withBinaryFile: permission denied (Permission denied)
The command failed with permission denied. It is probably because the container under SELinux does not have access permission for the mapped directory where the markdown document is.
Check the SELinux context for the directory:
$ cd ..
$ ls -Zd demo/
system_u:object_r:user_home_t:s0 demo/
$ ls -Z1 demo/
system_u:object_r:user_home_t:s0 demo.md
system_u:object_r:user_home_t:s0 demo_seccomp.json
system_u:object_r:user_home_t:s0 Makefile
The output shows that the mapped directory is labeled with the type user_home_t. However, according to the contexts defined at /usr/share/containers/selinux/contexts, the Podman process is only allowed to access files labeled with the type container_file_t, so we need to relabel the mapped directory and its files.
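The reasoning above compares the type field of two SELinux contexts. The Python sketch below shows how such a label splits into its user, role, type, and level fields. The helper names are hypothetical, and the check only mirrors the simplified rule stated in the text, not the full container policy.

```python
from typing import NamedTuple


class Context(NamedTuple):
    user: str
    role: str
    type: str
    level: str


def parse_context(label: str) -> Context:
    """Split 'user:role:type:level'; the level itself may contain colons."""
    user, role, type_, level = label.split(":", 3)
    return Context(user, role, type_, level)


def labeled_for_container(label: str) -> bool:
    """Simplified check: is the file labeled with container_file_t?"""
    return parse_context(label).type == "container_file_t"


if __name__ == "__main__":
    for label in ("system_u:object_r:user_home_t:s0",
                  "system_u:object_r:container_file_t:s0"):
        print(label, labeled_for_container(label))
```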
To change the context temporarily:
$ sudo chcon -R -t container_file_t demo/
$ ls -Zd demo/
system_u:object_r:container_file_t:s0 demo/
However, we want to make the change permanent so that in the future the label can be restored to the correct one with restorecon.
The following command modifies the context permanently.
$ sudo semanage fcontext -a -t container_file_t "/home/user/podman-pandoc/demo(/.*)?"
Confirm that our customized context has been added.
$ sudo semanage fcontext -C -l
SELinux fcontext type Context
/home/user/podman-pandoc/demo(/.*)? all files system_u:object_r:container_file_t:s0
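The file context specification is a regular expression that must match the whole path. To see which paths the pattern above covers, the Python sketch below uses re.fullmatch as an approximation of that anchored matching. Python's regex engine is not the one SELinux uses, so treat this as an illustration only; the helper name covered is hypothetical.

```python
import re

# Pattern copied from the semanage command above.
FCONTEXT_PATTERN = r"/home/user/podman-pandoc/demo(/.*)?"


def covered(path: str, pattern: str = FCONTEXT_PATTERN) -> bool:
    """True if the whole path matches the file context pattern."""
    return re.fullmatch(pattern, path) is not None


if __name__ == "__main__":
    for path in ("/home/user/podman-pandoc/demo",
                 "/home/user/podman-pandoc/demo/demo.md",
                 "/home/user/podman-pandoc/demo2",
                 "/home/user/podman-pandoc"):
        print(path, covered(path))
```

The optional group (/.*)? is what lets a single rule cover both the directory itself and everything below it, without also matching sibling paths such as demo2.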
Finally, we relabel the directory with the customized context.
$ sudo restorecon -R -v /home/user/podman-pandoc/demo
Relabeled /home/user/podman-pandoc/demo from unconfined_u:object_r:user_home_t:s0 to unconfined_u:object_r:container_file_t:s0
Run the document conversion command again; it should now work as expected.
$ cd demo/
$ podman run --rm -it \
--volume "$(pwd):/home/pan" \
--userns keep-id:uid=$(id -u),gid=$(id -g) \
--security-opt seccomp=demo_seccomp.json \
localhost/pandoc:latest \
pandoc -f markdown -t html demo.md -o demo.html
The HTML file should have been generated successfully.
$ cat demo.html
With these efforts, we set the MAC confinement on top of the container as an additional layer of protection towards the containerized application, which works as a barrier between the container environment and the host system.
For more information about SELinux configuration and operation, refer to the Red Hat SELinux User’s and Administrator’s Guide [15] and the SELinux Portal on the openSUSE Wiki [16].
Further Hardening Suggestions
To further harden the container runtime environment, we can enable remote attestation based on the Linux IMA component [17] on the host. Remote attestation provides system-wide integrity verification of executables and configuration files. In our scenario, it could secure the integrity of the binary files of the Podman platform as well as the customized Seccomp policy file we created.
As a prerequisite for enabling IMA-based remote attestation, a working TPM (Trusted Platform Module) device with PCRs (Platform Configuration Registers) should usually be available on the host machine to protect the measurement list from tampering during an attack [18]. Besides TPM, on AArch64 machines with TrustZone integrated, remote attestation relying on a TEE (Trusted Execution Environment), for instance the OP-TEE project [19], is another possible option.
Considering the nondeterminism problem [20] related to the balance of security and performance, we suggest creating a customized IMA policy that limits the attestation to a set of critical files instead of the whole system. IMA policies support SELinux labels for marking the files to be measured. Since the MAC solution we selected is SELinux, this approach might be the most straightforward. SMACK’s labeling system is another candidate when SELinux is not available. However, in this case, SMACK needs to be enabled on top of existing MAC tools like AppArmor, which puts a further performance burden on the system. In addition, a production-level remote attestation solution, such as Keylime [21], is recommended. By relying on an existing mature solution, the errors and vulnerabilities introduced during development can be kept to a minimum.
Last but not least, all the above hardening solutions for the container must build on general OS hardening, for example, maintaining compliance with the security baseline, keeping the system up to date, and handling incidents properly and promptly. Also, not only the host OS but also the system in the image should be well hardened and maintained, which usually means the customized container image needs a rebuild when a vulnerability or weakness is found. As Bruce Schneier put it, “security is a process, not a product” [22]. It is essential to be aware of the risks in the specific production environment and to continuously maintain a strong, multi-level defense strategy against potential attacks, especially as networks become more and more hostile.
References
[1] “GNU/Linux Sandboxing - A Brief Review,” HardenedLinux, August 20, 2024. [Online]. Available: https://hardenedlinux.org/blog/2024-08-20-gnu/linux-sandboxing-a-brief-review
[2] “podman-pandoc,” GitHub. [Online]. Available: https://github.com/hardenedlinux/podman-pandoc
[3] “Podman,” Podman.io. [Online]. Available: https://podman.io/
[4] “Open Container Initiative image-spec,” GitHub. Accessed: September 26, 2024. [Online]. Available: https://github.com/opencontainers/image-spec/blob/main/spec.md
[5] “Open Container Initiative,” Open Container Initiative. Accessed: September 28, 2024. [Online]. Available: https://opencontainers.org/
[6] “containers/podman,” GitHub Repository. [Online]. Available: https://github.com/containers/podman
[7] “Pandoc,” Pandoc.org. [Online]. Available: https://pandoc.org/
[8] “containers/oci-seccomp-bpf-hook,” GitHub Repository. [Online]. Available: https://github.com/containers/oci-seccomp-bpf-hook
[9] V. Rothberg, “Improving Linux container security with seccomp,” Red Hat - Enable Sysadmin, June 15, 2020. Accessed: October 2, 2024. [Online]. Available: https://www.redhat.com/sysadmin/container-security-seccomp
[10] “mjson,” PyPI. [Online]. Available: https://pypi.org/project/mjson/
[11] kernelmethod, “Allow rootless containers to use AppArmor profiles,” GitHub, March 11, 2022. Accessed: October 8, 2024. [Online]. Available: https://github.com/containers/common/issues/958
[12] kernelmethod, “Allow rootless containers to use AppArmor profiles,” GitHub, March 12, 2022. Accessed: October 8, 2024. [Online]. Available: https://github.com/containers/common/commit/55d217f7dd9ef4721cf32b32d0d8b8b029e877b3
[13] V. Rothberg, “Revert ‘Allow rootless containers to use AppArmor profiles’,” GitHub, March 18, 2022. Accessed: October 8, 2024. [Online]. Available: https://github.com/containers/common/commit/d167b7f079029344ddbfc6218d16f5eb6932ccd7
[14] “Portal:SELinux/Setup,” openSUSE Wiki. Accessed: October 12, 2024. [Online]. Available: https://en.opensuse.org/Portal:SELinux/Setup
[15] M. Jahoda, B. Ančincová, I. Gkioka, and T. Čapek, “Red Hat Enterprise Linux 7 SELinux User’s and Administrator’s Guide,” Red Hat Documentation, September 23, 2024. Accessed: October 10, 2024. [Online]. Available: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/selinux_users_and_administrators_guide/index
[16] “Portal:SELinux,” openSUSE Wiki. Accessed: October 12, 2024. [Online]. Available: https://en.opensuse.org/Portal:SELinux
[17] D. Kasatkin and mzohar, “Integrity Measurement Architecture (IMA) Wiki,” SourceForge. Accessed: October 13, 2024. [Online]. Available: https://sourceforge.net/p/linux-ima/wiki/Home/
[18] Corbet, “The Integrity Measurement Architecture,” LWN.net, May 24, 2005. Accessed: October 13, 2024. [Online]. Available: https://lwn.net/Articles/137306/
[19] “About OP-TEE,” GitHub. [Online]. Available: https://github.com/OP-TEE/optee_docs/blob/master/general/about.rst
[20] J. Son et al., “Quantitative analysis of measurement overhead for integrity verification,” in SAC ‘17: Proceedings of the Symposium on Applied Computing, 2017, pp. 1528-1533, doi: 10.1145/3019612.3019738. [Online]. Available: https://dl.acm.org/doi/10.1145/3019612.3019738
[21] “Keylime,” Keylime. [Online]. Available: https://keylime.dev/
[22] B. Schneier, “The Process of Security,” Schneier on Security, April 2000. Accessed: October 13, 2024. [Online]. Available: https://www.schneier.com/essays/archives/2000/04/the_process_of_secur.html