Alpaquita Linux: Keeping containers secure
1. Overview
This document provides several security recommendations and details that can help you secure containers and execution environments.
Note: The examples in this document are mostly based on Podman, but all the recommendations apply to both Docker and Podman setups.
2. Setting up Host
The following guidelines help you set up and maintain your host securely.
- Regularly update the container host system, including the kernel and other software components. Make sure that kernel features related to namespaces, private networking, and control groups are up to date with all available fixes.
- Configure the container host system to use a minimal operating system setup and apply all security best practices. Ideally, such systems should be set up only to host containers and not be used for anything else.
Generally, we recommend reducing the number of services running on the same system to the required minimum. If some services are needed for the work process, consider moving all other services into containers controlled by Podman or transferring them to other host systems.
Set up audit to track activities
One of the most important security measures on Linux hosts is proper auditing of sensitive activities on the system. The audit support in the Linux kernel, together with the corresponding userspace tools and packages, helps track such activities. It allows system administrators to monitor security-sensitive events and record them in a log, such as audit.log. This log file can be stored on a local or remote system for better protection.
Because Podman uses the fork/exec model to run containers, the audit feature records the necessary data in the log files correctly. In Docker, the audit login UID (auid) is unset in the log, which means a system administrator can see that a process associated with a container has accessed security-sensitive information, but the identity of the user is not recorded. In Podman, auid is properly recorded in the audit log file.
For instance, in the following record the cat command (provided by the coreutils executable) was used to access sensitive information. Podman provides all the necessary information about the identity (auid=test), while in Docker auid would be in an unset state.
type=SYSCALL msg=audit(02/04/23 20:31:56.759:5) : arch=x86_64 syscall=open success=yes exit=3 a0=0x7ffe5a873ecb a1=O_RDONLY a2=0x0 a3=0x0 items=1 ppid=2531 pid=2685 auid=test uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts0 ses=unset comm=cat exe=/usr/bin/coreutils key=(null)
We recommend using only Podman to set up container host environments and using audit to track access to sensitive information.
Setting up audit in Alpaquita requires installing the audit package and starting the audit service as follows.
sudo apk add audit
sudo rc-service auditd start
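To check whether records carry an identity, the auid field can be pulled out of each record. The following is a minimal sketch rather than a replacement for the audit package's own ausearch tool; audit_field is a hypothetical helper name, and it assumes records use the space-separated key=value layout of the sample record shown earlier in this section.

```shell
#!/bin/sh
# audit_field FIELD
# Hypothetical helper (not part of the audit package): reads audit records
# on stdin and prints the value of FIELD from each record, one per line.
# Assumes space-separated key=value tokens, as in the sample record.
audit_field() {
    field=$1
    # Put each token on its own line, then keep only tokens that start
    # with "FIELD=" and strip that prefix. Anchoring at the start of the
    # token keeps "uid" from matching "auid" or "euid".
    tr ' ' '\n' | sed -n "s/^${field}=//p"
}
```

For example, `audit_field auid < /var/log/audit/audit.log` would list the recorded identities, assuming the log is stored at that path on your system.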
Use cgroups
To properly set up resource management, the system must be configured to use Control Group v2 (cgroup v2). Consider performing the following actions to use cgroup v2.
Set up UIDs
Set up /etc/subuid and /etc/subgid.
Podman launches a container inside a user namespace, which is mapped to the range of UIDs and GIDs defined for the user in /etc/subuid and /etc/subgid. Update those files for each user who is allowed to create containers. If you update either /etc/subuid or /etc/subgid, all running containers owned by the affected users must be stopped. The following command does this automatically: it stops all containers for the user and kills the pause process:
podman system migrate
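Before editing these files, it can help to see which host ID range a user already holds. The sketch below assumes the standard name:start:count line format of /etc/subuid and /etc/subgid; subid_range is a hypothetical helper name, and the user name in the usage note is only an illustration.

```shell
#!/bin/sh
# subid_range USER [FILE]
# Hypothetical helper: prints the host ID range assigned to USER in a
# subuid/subgid-style file as "first-last". FILE defaults to /etc/subuid.
# Prints nothing if USER has no entry.
subid_range() {
    user=$1
    file=${2:-/etc/subuid}
    # Each line is name:start:count, so the last ID is start + count - 1.
    awk -F: -v u="$user" '$1 == u { printf "%d-%d\n", $2, $2 + $3 - 1 }' "$file"
}
```

For example, `subid_range alice /etc/subgid` prints the subordinate GID range for a user named alice, if one is defined.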
Fix warnings about absent systemd
Alpaquita Linux does not use systemd, so you may observe the following warning when you run Podman:
WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers.
Run the following command to fix the setup.
sudo mount --make-rshared /
Install fuse-overlayfs
Note that the fuse-overlayfs package is not installed by default with the Podman package. Install it using the apk add command.
sudo apk add fuse-overlayfs
Enable cgroups
Run the following commands.
sudo rc-update add cgroups
sudo rc-service cgroups start
We highly recommend using cgroups v2, which can be enabled by editing /etc/rc.conf: assign the unified option to rc_cgroup_mode and enable the controllers. You might need to run the following command for the changes to take effect.
sudo rc-service cgroups restart
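After the restart, you may want to confirm that the unified hierarchy is actually in use. The sketch below relies on the fact that the root of a cgroup v2 hierarchy contains a cgroup.controllers file; cgroup_mode is a hypothetical helper name, and the default mount point /sys/fs/cgroup is an assumption about your setup.

```shell
#!/bin/sh
# cgroup_mode [ROOT]
# Hypothetical helper: prints "unified" when ROOT looks like the root of a
# cgroup v2 hierarchy, otherwise "legacy-or-hybrid".
# ROOT defaults to /sys/fs/cgroup (assumed mount point).
cgroup_mode() {
    root=${1:-/sys/fs/cgroup}
    # The cgroup.controllers file exists only at the root of a v2 hierarchy.
    if [ -f "$root/cgroup.controllers" ]; then
        echo unified
    else
        echo legacy-or-hybrid
    fi
}
```

Running `cat /sys/fs/cgroup/cgroup.controllers` afterwards lists the controllers that are enabled at the root.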
Use crun runtime
To limit resources with cgroups v2, use an OCI runtime that supports it. We recommend setting crun as Podman's default runtime.
Edit the /etc/containers/containers.conf file and set the default OCI runtime value (the runtime option) to crun.
Load additional Kernel modules
Load the following modules explicitly, because they are not loaded by default.
sudo modprobe tun
sudo modprobe fuse
AppArmor
We recommend using AppArmor in the container host setup based on Alpaquita Linux. Proper AppArmor configuration helps prevent malicious container escapes and protects host sensitive resources.
3. Execute without root privileges
A healthy security practice is to never run containers with root privileges, because those privileges also apply to sensitive resources on the host system itself. Since container images can be downloaded from the internet and the applications in them may have security flaws, it is dangerous to let them have root privileges.
In the majority of cases, there is no need to run containers with root privileges. There are specific cases, depending on the application, where running a container as root makes sense: for example, when containers need to access specific mounts or devices on the host, or need to listen on ports below 1024 on the host network.
Therefore, the most typical execution mode is rootless, and because Podman supports it out of the box, Podman is the main choice for setting up container environments.
By design, rootless Podman runs as root within the container unless it is reconfigured. This policy means that all processes in the container have the default list of namespaced capabilities that allow the processes to act like root inside the user namespace, including changing their UID and changing the user and group ownership of files that are mapped into the user namespace.
4. Set up namespaces
Podman takes advantage of user namespaces, so that root within the container is mapped to a non-root UID on the host. This setup allows Podman to safely install packages and run services from within the container without impacting the host.
Administrators can use a user namespace to set a user identifier (UID) and group identifier (GID) mapping for running a container. This means that a process can run as UID 0 inside the container and as UID 123456 outside the container. In other words, if a container process escapes the container, the Linux kernel treats that process as UID 123456.
The example --uidmap setting below instructs Podman to map a range of 5000 UIDs on the host, starting with UID 100000 (so the range is 100000-104999), to a range starting at UID 0 inside the container (so the range is 0-4999). If a process is running as UID 1 inside the container, it is UID 100001 on the host.
sudo podman run -d bellsoft/alpaquita-linux-base sleep 1000
sudo podman top --latest user huser
USER HUSER
root root
sudo podman run --uidmap 0:100000:5000 -d bellsoft/alpaquita-linux-base sleep 1000
sudo podman top --latest user huser
USER HUSER
root 100000
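The arithmetic behind a single --uidmap triple can be sketched as follows. This is an illustration only: map_uid is a hypothetical helper name, and it handles one container_start:host_start:count mapping, not a combination of several mappings.

```shell
#!/bin/sh
# map_uid CONTAINER_UID MAP
# Hypothetical helper: translates a container UID to the host UID under a
# single --uidmap triple "container_start:host_start:count".
# Prints the host UID, or returns 1 if the UID is outside the mapped range.
map_uid() {
    cuid=$1
    map=$2
    # Split the triple with parameter expansion.
    c_start=${map%%:*}
    rest=${map#*:}
    h_start=${rest%%:*}
    count=${rest#*:}
    if [ "$cuid" -lt "$c_start" ] || [ "$cuid" -ge $((c_start + count)) ]; then
        return 1
    fi
    # The offset inside the container range equals the offset on the host.
    echo $((h_start + cuid - c_start))
}
```

With the mapping from the example, `map_uid 1 0:100000:5000` prints 100001, which matches the description above.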
You can use the following namespace types:
- Mount (mnt): isolates mount points;
- Process ID (pid): isolates process IDs;
- Network (net): isolates the network stack;
- Interprocess Communication (ipc): isolates interprocess communication resources;
- UTS: isolates hostnames and domain names;
- User ID (user): isolates user and group IDs;
- Control groups (cgroups): isolates cgroups;
- Time: isolates time.
5. Container images and registries
It is important to verify that Podman images are received and deployed unchanged from a source registry with a trusted reputation and validated authentication. When pulling images from remote sources, ensure that the connection is protected and that HTTPS is used for the pull request. It is unsafe to use insecure image registries that are not protected by TLS. Note the following recommendations.
- Pull images by their fully qualified names instead of short names to avoid the possibility of pulling from a different registry.
- You can configure Podman to work only with trusted images from a remote registry if those images are signed and the signatures can be validated against a local public key.
- Images can be signed in a similar way to packages, and a Podman host can be configured to require that images from a remote registry are signed and validated before being used locally.
- Create reliably reproducible images from container files and required packages. Ensure that the base images and software used in new images are properly reviewed for security vulnerabilities:
  - Define a fixed version of the base image in the image Container File;
  - Define fixed versions of the packages pulled in the build steps of the image Container File;
  - Ensure that package pulls in the build steps use trusted and verified sources and repositories.
- Reduce the number of packages installed in images to a minimum.
  We do not recommend installing unnecessary packages in new image builds. A proper review of Container Files helps remove unnecessary installation steps, so that the images contain only what is needed for their main purpose.
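The first recommendation, pulling by fully qualified name, can be enforced in scripts with a small check. The sketch below uses the common convention that the first path component of an image reference is a registry host if it contains a dot or a colon, or is localhost; is_fully_qualified is a hypothetical helper name, and the docker.io prefix in the usage note is an assumption about where the image is hosted.

```shell
#!/bin/sh
# is_fully_qualified IMAGE_REF
# Hypothetical helper: succeeds if IMAGE_REF starts with a registry host,
# that is, its first path component contains "." or ":" or is "localhost".
is_fully_qualified() {
    ref=$1
    case $ref in
        */*) first=${ref%%/*} ;;   # take everything before the first slash
        *) return 1 ;;             # no slash: a bare short name such as "alpine"
    esac
    case $first in
        localhost|*.*|*:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_fully_qualified docker.io/bellsoft/alpaquita-linux-base` succeeds, while `is_fully_qualified bellsoft/alpaquita-linux-base` fails because the registry is left for Podman to resolve.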
6. Other resources and security considerations
Limit CPU and memory usage
Use the -m (--memory) option to limit memory and the -c (--cpu-shares) option to set the relative CPU share of a container.
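A thin wrapper can make sure these limits are never forgotten. The following is a sketch under stated assumptions: run_limited is a hypothetical helper name, the PODMAN variable is introduced here only so that the binary can be overridden, and the values in the usage note are illustrative rather than recommended.

```shell
#!/bin/sh
# Container engine binary; overridable, for example for a dry run.
PODMAN=${PODMAN:-podman}

# run_limited MEMORY CPU_SHARES IMAGE [COMMAND...]
# Hypothetical helper: starts a container with a memory limit and a
# relative CPU weight, using the long forms of the -m and -c options.
run_limited() {
    mem=$1
    shares=$2
    shift 2
    "$PODMAN" run --memory "$mem" --cpu-shares "$shares" "$@"
}
```

For example, `run_limited 256m 512 bellsoft/alpaquita-linux-base sleep 1000` starts a container limited to 256 MB of memory with half of the default CPU weight of 1024.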
Limit container file access
When creating and launching containers, limit file access using the -v <host dir>:<container dir>:ro option or the --read-only flag. It is safer to explicitly create volumes for applications running inside a container. Monitoring file changes in these volumes can help prevent security breaches. Volumes that are dedicated to container write access must be cleaned up regularly. Here is an example of how to use the :ro option to mount a host directory so that it is read-only for the container:
podman run -v /host_directory:/container_directory:ro bellsoft/alpaquita-linux-base
host_directory is the host directory (or file) to be mounted as a volume and made available at container_directory in the container. We strongly discourage you from mounting the following sensitive host system directories at container runtime: /, /boot, /dev, /etc, /lib, /usr, /sys, /proc, and so on.
Limit container restarts
A container that produces many errors can cause malicious or accidental denial of service, so limiting container restarts is a good practice. You can do so using the --restart=on-failure:N option when creating or launching a container, where N is the maximum number of restart attempts.
Limit networking access from containers
It is good practice to limit network access from a container unless it is needed for running certain applications. When publishing ports to the host, specify the IP address of the interface that a port is bound to, so that the attack surface is reduced to the network interface where the container should be listening. If no IP address is specified with the --publish option, Podman publishes to all interfaces (0.0.0.0) by default.
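A launch script can refuse --publish values that do not name a specific interface. The sketch below is deliberately narrow: publish_is_bound is a hypothetical helper name, and it only recognizes the IPv4 form ip:hostPort:containerPort, so IPv6 addresses and port ranges are out of scope.

```shell
#!/bin/sh
# publish_is_bound SPEC
# Hypothetical helper: succeeds if a --publish value names a specific IPv4
# address (ip:hostPort:containerPort) other than 0.0.0.0.
publish_is_bound() {
    case $1 in
        0.0.0.0:*) return 1 ;;       # explicitly bound to all interfaces
        *.*.*.*:*:*) return 0 ;;     # dotted address followed by two port fields
        *) return 1 ;;               # no address given, so all interfaces
    esac
}
```

For example, `publish_is_bound 127.0.0.1:8080:80` succeeds, while `publish_is_bound 8080:80` fails because the port would be published on every interface.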
Review the following recommendations:
- Do not run SSH inside containers;
- Do not map privileged ports (< 1024) inside containers;
- Do not use the --net=host mode option when starting a container. This option gives the container full access to local system services and is insecure.
Review Kernel capabilities in containers
The following kernel capabilities are usually granted to a container by default. Review the capabilities and disable unnecessary ones.
- CHOWN
- DAC_OVERRIDE
- FSETID
- FOWNER
- NET_RAW
- SETGID
- SETUID
- SETFCAP
- SETPCAP
- NET_BIND_SERVICE
- SYS_CHROOT
- KILL
The --privileged option disables the security measures used for isolating a container from the host, so it should not be used unless absolutely required. This option removes the restrictions on capabilities, devices, read-only mount points and volumes, and so forth.
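One way to act on the capability review is to drop everything and add back only what an application needs, using the --cap-drop and --cap-add options of podman run. The following is a sketch rather than a prescribed setup: run_min_caps is a hypothetical helper name, the PODMAN variable exists only so that the binary can be overridden, and the capability in the usage note is an example.

```shell
#!/bin/sh
# Container engine binary; overridable, for example for a dry run.
PODMAN=${PODMAN:-podman}

# run_min_caps CAPS IMAGE [COMMAND...]
# Hypothetical helper: drops all capabilities, then adds back only those in
# the comma-separated list CAPS (which may be empty).
run_min_caps() {
    caps=$1
    shift
    cap_args=
    for cap in $(printf '%s' "$caps" | tr ',' ' '); do
        cap_args="$cap_args --cap-add=$cap"
    done
    # cap_args is left unquoted on purpose: it holds only flag words
    # without spaces, and must split into separate arguments.
    "$PODMAN" run --cap-drop=all $cap_args "$@"
}
```

For example, `run_min_caps NET_BIND_SERVICE bellsoft/alpaquita-linux-base sleep 1000` runs the container with NET_BIND_SERVICE as its only capability.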
Monitor any malicious resource usage in container
Podman includes the necessary features to monitor container resource usage, such as memory consumption, CPU time, I/O, and network usage. Monitoring containers for performance, errors, and abnormal behavior, such as suspicious traffic or unexpected user activity, also helps detect security flaws.
Clean a container host system regularly
Images and containers that are no longer needed should be removed from the host system to avoid accumulating clutter and to protect against the accidental execution of an old, unused image or container that might have security flaws. Podman also has the auto-update command, which automates updating containers to new image versions according to their auto-update policy.
Limit system calls in containers
By default, Podman limits the system calls available to containers based on the calls defined in the /usr/share/containers/seccomp.json file. This list is suitable for general-purpose containers and is compatible with most of them, so there is usually no need to add more system calls.
However, you can narrow down the system calls to those needed for a particular container to run. Generate a seccomp profile using the following command:
sudo podman run --annotation io.containers.trace-syscall="of:<file_path>" <other_options> <container_name>
You can reuse it later with the --security-opt seccomp=<file_path> option.
Inspect 'runlabel' before execution
Podman has a useful feature for creating shortcuts for podman run with all the necessary metadata stored in an image label, so the actual container execution can be done as follows:
podman container runlabel <my_label> <image_name>
Before actual execution, we recommend checking such shortcuts by running runlabel with the --display option to avoid executing a malicious podman run command. The option does not execute the label command; it only displays what would be executed.
Secure remote API usage
Podman has a remote API implementation called varlink that works over a socket. The socket configuration is similar to that of a Unix socket, which restricts it to local use only. As a security consideration, run this socket as a non-root user, especially if a container has no requirement to run as a privileged user. A socket can be created by running the following Podman command:
podman varlink --timeout=0 unix:/run/user/$(id -u)/podman/io.podman
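Creating the socket with the Podman command ensures that proper permissions are set on it. Note that varlink was deprecated in Podman 2.0 and removed in Podman 3.0; on those versions, the podman system service command provides the equivalent socket for the REST API.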