Seccomp and AppArmor
“https://kubernetes.io/docs/tutorials/security/seccomp/ | https://kubernetes.io/docs/tutorials/security/apparmor/”
Seccomp and AppArmor are Linux kernel security mechanisms that restrict what a process can do at the syscall and file level. They are the last line of defense in the container sandbox: PSS / SecurityContext restrict what a container is allowed to request, while seccomp and AppArmor restrict what the kernel will do for the process inside. This is defense-in-depth: even if an attacker exploits the app, the kernel’s restrictions limit the blast radius.
Table of Contents
- The Kernel Sandbox Layers
- Seccomp — the Syscall Filter
- The Seccomp Profile
- Seccomp in k8s
- The RuntimeDefault Profile
- The Localhost Profile (Custom)
- Seccomp Profile Generation
- AppArmor — the File + Capability Filter
- The AppArmor Profile
- AppArmor in k8s
- Seccomp vs AppArmor — When to Use Which
- Common Patterns
- Operations and Debugging
- Gotchas and Common Mistakes
1. The Kernel Sandbox Layers
A container is a process. The kernel’s sandboxing primitives limit what the process can do:
| Layer | What it restricts | Where it lives |
|---|---|---|
| Linux capabilities | Privileged operations (mount, raw socket, etc.) | Per-process capability sets (file capabilities are stored in the security.capability xattr) |
| Seccomp | Syscalls (open, read, write, clone, …) | seccomp |
| AppArmor | File paths, capabilities, network, mount | LSM (Linux Security Module) |
| SELinux | File paths, network, capabilities (more granular than AppArmor) | LSM |
| Namespaces | What the process can see (PIDs, network, mount) | clone() flags |
| cgroups | Resource limits (CPU, memory, disk) | cgroup fs |
Seccomp restricts syscalls — the process can only call a specific set. AppArmor restricts file paths, capabilities, and network — the process can only access a specific set of resources.
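The two work at different points in the kernel. AppArmor is an LSM (Linux Security Module), a plug-in to the kernel’s security framework whose hooks run inside the kernel’s operations (file open, socket create, mount). Seccomp is not an LSM: it is a filter that runs at syscall entry, before the kernel does any work for the call.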
2. Seccomp — the Syscall Filter
Seccomp (Secure Computing Mode) restricts the syscalls a process can make. The kernel runs the filter on each syscall, and the filter decides the outcome: run the syscall, fail it with an error, log it, or kill the caller (with SIGSYS).
The seccomp filter is a BPF program. It uses classic BPF (cBPF), not the eBPF used by Cilium and Falco. It’s loaded into the kernel via prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) or seccomp(SECCOMP_SET_MODE_FILTER, ...).
The BPF program returns one of:
- SECCOMP_RET_ALLOW: the syscall runs.
- SECCOMP_RET_ERRNO: the syscall is skipped and returns an error (with a specific errno).
- SECCOMP_RET_TRAP: the syscall is skipped and the kernel sends SIGSYS to the process, which may catch it.
- SECCOMP_RET_LOG: the syscall is allowed, but the action is logged.
- SECCOMP_RET_KILL_PROCESS: the process (and all its threads) is killed.
By default a process has no seccomp filter (Unconfined): all syscalls are allowed. With seccomp, you narrow the set.
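To see which mode a given process is actually in, the kernel reports a Seccomp field in /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). A minimal sketch of reading it, where the function names are illustrative and not part of any library:

```python
# Map the numeric "Seccomp:" field of /proc/<pid>/status to a readable mode.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Return the seccomp mode named in the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" is a different field and does not match this prefix.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return "not reported"  # e.g. a kernel built without seccomp support


def seccomp_mode_of(pid: str = "self") -> str:
    """Read the live status file of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return seccomp_mode(handle.read())
```

On a node, calling seccomp_mode_of with a container's PID should report "filter" when a profile is applied and "disabled" when the container is unconfined.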
3. The Seccomp Profile
A seccomp profile is a JSON file that describes the allowed syscalls:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
  "syscalls": [
    {
      "names": ["read", "write", "open", "close", "stat", "fstat", "mmap", "mprotect", "munmap", "brk", "rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "ioctl", "nanosleep", "select", "mmap2", "madvise", "exit_group", "exit"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

The structure:
- defaultAction: what to do for syscalls not explicitly listed. SCMP_ACT_ERRNO returns an error; SCMP_ACT_KILL kills the calling thread (SCMP_ACT_KILL_PROCESS kills the whole process).
- architectures: which CPU architectures the profile applies to.
- syscalls: list of rules. Each rule has syscall names and an action.
A whitelist profile has defaultAction: SCMP_ACT_ERRNO and explicit SCMP_ACT_ALLOW rules for allowed syscalls. A blacklist profile has defaultAction: SCMP_ACT_ALLOW and explicit SCMP_ACT_ERRNO rules for denied syscalls.
Whitelist is the standard for production. Blacklist is rarely correct.
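The whitelist logic can be sketched as a lookup over the profile JSON. This is a simplified model for reasoning about a profile, not how the kernel evaluates it: it ignores argument conditions and architectures, and the helper names are made up for the example.

```python
import json

# A small whitelist profile in the same JSON layout as the example above.
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"}
  ]
}
"""


def action_for(profile: dict, syscall: str) -> str:
    """Return the action a profile assigns to a syscall name.

    Simplification: "args" conditions and architectures are ignored, and
    the first rule that names the syscall is used.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


def is_whitelist(profile: dict) -> bool:
    """A whitelist profile denies by default and allows explicitly."""
    return profile["defaultAction"] != "SCMP_ACT_ALLOW"


profile = json.loads(PROFILE_JSON)
```

With this profile, action_for(profile, "read") yields SCMP_ACT_ALLOW, while any unlisted syscall such as ptrace falls through to the default SCMP_ACT_ERRNO.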
4. Seccomp in k8s
In k8s, a seccomp profile is set via securityContext.seccompProfile:
apiVersion: v1
kind: Pod
metadata: { name: myapp }
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault # or Localhost or Unconfined
  containers:
  - name: app
    image: myapp:1.0
    securityContext:
      seccompProfile:
        type: RuntimeDefault

Three values for type:
- Unconfined: no seccomp. All syscalls allowed. This is the default unless the kubelet runs with --seccomp-default.
- RuntimeDefault: use the container runtime’s default seccomp profile, a broad set of syscalls the runtime considers safe for general workloads.
- Localhost: use a custom profile loaded from the node (in /var/lib/kubelet/seccomp/<name>.json).
PSS restricted requires seccompProfile.type: RuntimeDefault or Localhost (no Unconfined).
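The requirement can be checked against a Pod spec with a few lines of code. The sketch below is a simplified mirror of the rule, not the admission controller's implementation; it assumes the spec is already parsed into a dict and that a container-level seccompProfile overrides the Pod-level one.

```python
ALLOWED_TYPES = {"RuntimeDefault", "Localhost"}


def effective_seccomp_type(pod_spec: dict, container: dict):
    """Container-level seccompProfile wins over the Pod-level one."""
    for ctx in (container.get("securityContext"), pod_spec.get("securityContext")):
        profile = (ctx or {}).get("seccompProfile")
        if profile and profile.get("type"):
            return profile["type"]
    return None  # nothing set anywhere


def seccomp_violations(pod_spec: dict) -> list:
    """Names of containers whose effective type would fail the restricted check."""
    containers = []
    for key in ("initContainers", "containers", "ephemeralContainers"):
        containers.extend(pod_spec.get(key) or [])
    return [
        c["name"]
        for c in containers
        if effective_seccomp_type(pod_spec, c) not in ALLOWED_TYPES
    ]
```

An empty result means every container ends up with RuntimeDefault or Localhost; a container that sets Unconfined, or a Pod that sets nothing at all, is reported.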
5. The RuntimeDefault Profile
The container runtime (containerd, CRI-O) has a default seccomp profile that’s safe for most workloads. It’s a whitelist of syscalls that are commonly needed (read, write, mmap, etc.) and excludes dangerous ones (kernel module loading, kexec_load, reboot, etc.).
RuntimeDefault is the usual way to satisfy PSS restricted. It applies a curated, safe profile without you having to write one.
The profile is in the runtime’s source:
- containerd’s default — a JSON file in the containerd repo.
- CRI-O’s default — a JSON file in the CRI-O repo.
These profiles are very similar. Each is a broad whitelist that covers most of the 300+ Linux syscalls: the common syscalls are allowed and a few dozen dangerous ones are denied.
6. The Localhost Profile (Custom)
For workloads that need a custom seccomp profile (e.g. an app that uses a syscall not in the default):
- Write the profile: a JSON file like the one above.
- Place it on every node at /var/lib/kubelet/seccomp/<name>.json.
- Reference it from the Pod:

seccompProfile:
  type: Localhost
  localhostProfile: profiles/my-profile.json

The localhostProfile is a relative path under /var/lib/kubelet/seccomp/. The kubelet resolves the path and the container runtime loads the profile.
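The downside: the profile must be present on every node. With managed clusters (EKS, GKE), you often can’t add files to nodes directly. You’d use a DaemonSet that copies the profile onto each node through a hostPath mount, or manage the profile as a k8s object with the Security Profiles Operator (next section).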
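How a localhostProfile value maps to a file on the node can be sketched as a path join. The default root is the one named above; the traversal check is a precaution added for this sketch and is not claimed to match the kubelet's own validation.

```python
import posixpath

# Default location named in the text; a kubelet with a different root
# directory would use a different base.
SECCOMP_ROOT = "/var/lib/kubelet/seccomp"


def resolve_localhost_profile(localhost_profile: str, root: str = SECCOMP_ROOT) -> str:
    """Join a relative localhostProfile onto the seccomp root directory."""
    if posixpath.isabs(localhost_profile):
        raise ValueError("localhostProfile must be a relative path")
    resolved = posixpath.normpath(posixpath.join(root, localhost_profile))
    # Reject values such as "../../etc/passwd" that climb out of the root.
    if not resolved.startswith(root.rstrip("/") + "/"):
        raise ValueError("localhostProfile escapes the seccomp root")
    return resolved
```

A DaemonSet or provisioning script that distributes profiles can use the same mapping to decide where each file has to land on the node.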
7. Seccomp Profile Generation
Writing a seccomp profile by hand is tedious. Tools:
- bashica (spd-tx) — generates from a process’s actual syscalls. Run the app, capture the syscalls, generate a profile.
- kubectl-debug (Bhojwani) — runs in a pod, captures syscalls, generates a profile.
- Security Profiles Operator (SPO): generates and manages seccomp profiles as k8s objects.
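The record-then-generate idea behind these tools can be sketched in a few lines. The example assumes the syscalls were captured as an strace text log (the sample lines are made up) and emits a whitelist profile in the JSON layout shown earlier. It is an illustration of the approach, not what any of the listed tools does internally.

```python
import json
import re

# Matches the syscall name at the start of an strace line, with an optional
# "[pid N]" prefix (strace -f) or a bare leading PID (strace -f -o file).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(log_text: str) -> list:
    """Collect the distinct syscall names seen in an strace log."""
    names = set()
    for line in log_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


def profile_from_syscalls(names: list) -> dict:
    """Build a deny-by-default profile that allows only the recorded syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": names, "action": "SCMP_ACT_ALLOW"}],
    }


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = "\n".join([
    'execve("/usr/bin/myapp", ["myapp"], 0x7ffd... /* 12 vars */) = 0',
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    '[pid  4242] read(3, "data", 832) = 4',
    "+++ exited with 0 +++",
])

print(json.dumps(profile_from_syscalls(syscalls_from_strace(SAMPLE_LOG)), indent=2))
```

A recording only covers the code paths exercised during the run, so a generated profile should be tested against error paths and rarely used features before it is enforced.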
The SPO is the k8s-native way to manage seccomp profiles. It:
- Lets you create SeccompProfile CRDs.
- Auto-generates profiles by recording an app’s syscalls.
- Distributes the profile to nodes (via a DaemonSet or a CSI driver).
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata: { name: my-app }
spec:
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
  - names: [read, write, open, ...]
    action: SCMP_ACT_ALLOW

The SPO controller makes the profile available to all nodes. The Pod references it:

seccompProfile:
  type: Localhost
  localhostProfile: operator/<namespace>/my-app.json # the path reported in the SeccompProfile's status

8. AppArmor — the File + Capability Filter
AppArmor is a Linux Security Module that restricts:
- File access — which files the process can read / write / execute.
- Capabilities — which Linux capabilities the process has.
- Network — which network operations the process can do.
- Mount — which mount operations are allowed.
AppArmor is path-based — rules are tied to file paths. SELinux (the alternative) is label-based — rules are tied to inode labels. AppArmor is simpler; SELinux is more granular.
AppArmor is mostly used on Debian / Ubuntu. On RHEL / CentOS, the equivalent is SELinux.
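What "path-based" means in practice is that a rule is a glob matched against the file's path. The sketch below approximates the two most common wildcards: * stays within one path component and ** crosses directory boundaries. It deliberately leaves out character classes, alternation, and AppArmor's edge-case handling of empty matches, so treat it as a reasoning aid, not a reimplementation of the parser.

```python
import re


def apparmor_glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a simplified AppArmor path glob into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # crosses directory boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # exactly one non-slash character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def rule_matches(pattern: str, path: str) -> bool:
    """True when the whole path is covered by the glob."""
    return apparmor_glob_to_regex(pattern).fullmatch(path) is not None
```

Under this model /var/lib/myapp/** covers /var/lib/myapp/data/file.db, while /var/lib/myapp/* covers only files directly inside the directory.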
9. The AppArmor Profile
An AppArmor profile is a text file (typically in /etc/apparmor.d/):
#include <tunables/global>

profile myapp flags=(attach_disconnected) {
  #include <abstractions/base>

  # Allow reading the app's data
  /var/lib/myapp/** r,
  /etc/myapp/** r,

  # Allow writing to its temp dir
  /tmp/** rw,

  # Deny writing to /etc
  deny /etc/** w,

  # Allow network
  network inet tcp,
  network inet6 tcp,

  # Deny raw socket
  deny network raw,

  # Allow capabilities
  capability dac_read_search,
  deny capability sys_admin,
}

The structure:
- profile myapp flags=(attach_disconnected): the profile name and flags. attach_disconnected lets the profile mediate paths that are disconnected from the process’s mount namespace (common with containers) by attaching them to the root.
- #include <abstractions/base>: common rules (read /lib, etc.).
- Path rules: path permission, (r = read, w = write, x = execute, etc.).
- deny: explicit denials.
- network: network rules.
- capability: Linux capabilities.
A profile is loaded into the kernel with apparmor_parser. Once loaded, the profile is in /sys/kernel/security/apparmor/profiles.
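Checking whether a profile is loaded comes down to reading that file. A small sketch, assuming each line has the form "name (mode)", for example "myapp (enforce)"; the helper names are illustrative.

```python
PROFILES_FILE = "/sys/kernel/security/apparmor/profiles"


def parse_loaded_profiles(text: str) -> dict:
    """Map profile name to mode from lines like 'myapp (enforce)'."""
    profiles = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(")") or " (" not in line:
            continue  # skip blank or unexpected lines
        name, mode = line[:-1].rsplit(" (", 1)
        profiles[name] = mode
    return profiles


def is_loaded(name: str, text: str) -> bool:
    """True when the named profile appears in the kernel's list."""
    return name in parse_loaded_profiles(text)


def read_loaded_profiles(path: str = PROFILES_FILE) -> dict:
    """Read the live list on a node; usually requires root."""
    with open(path) as handle:
        return parse_loaded_profiles(handle.read())
```

Running such a check on every node before rolling out a Pod that references localhost/myapp catches nodes where the profile was never loaded.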
10. AppArmor in k8s
In k8s, an AppArmor profile is set via an annotation (Kubernetes v1.30 and later also accept a securityContext.appArmorProfile field, which supersedes the annotation):

apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  containers:
  - name: app
    image: myapp:1.0

The annotation key is container.apparmor.security.beta.kubernetes.io/<container-name>. The value is:
- runtime/default: use the runtime’s default profile.
- localhost/<profile-name>: use a profile loaded on the node (in /etc/apparmor.d/<name>).
- unconfined: no AppArmor.
The profile is loaded on the node (not in the Pod spec). The kubelet sets the profile via the container runtime.
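Because the key embeds the container name and the value has only three valid forms, a typo in either is easy to make. A small helper that builds the annotation and rejects malformed values, sketched here with a made-up function name:

```python
ANNOTATION_PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def apparmor_annotation(container_name: str, profile: str) -> dict:
    """Build the per-container annotation, validating the value's form."""
    valid = (
        profile in ("runtime/default", "unconfined")
        or (profile.startswith("localhost/") and len(profile) > len("localhost/"))
    )
    if not valid:
        raise ValueError(f"unsupported AppArmor profile reference: {profile!r}")
    return {ANNOTATION_PREFIX + container_name: profile}
```

The returned dict can be merged into metadata.annotations when a manifest is generated; the check only validates the form of the value, not whether the profile is loaded on the node.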
10.1 Loading profiles
AppArmor profiles are loaded on each node:
- Write the profile to /etc/apparmor.d/<name>.
- Parse it with apparmor_parser -r /etc/apparmor.d/<name>.
- Reference it from the Pod annotation.
For k8s-native management:
- AppArmor profiles as a DaemonSet — a DaemonSet that loads profiles on each node.
- Security Profiles Operator (SPO) — k8s-native AppArmor + seccomp management.
11. Seccomp vs AppArmor — When to Use Which
| | Seccomp | AppArmor |
|---|---|---|
| Restricts | Syscalls | Files, capabilities, network, mount |
| Granularity | Per-syscall | Per-path |
| Profile format | JSON | Text |
| Common in k8s | Very (PSS restricted requires it) | Less (annotation-based, OS-dependent) |
| OS support | All Linux | Debian / Ubuntu primarily |
| Equivalent on RHEL | (seccomp itself) | SELinux (different syntax) |
The decision:
- Use seccomp as a baseline. It’s supported everywhere and is the PSS restricted requirement.
- Use AppArmor for additional path-based restrictions on Debian / Ubuntu. SELinux on RHEL.
- Use both for defense-in-depth (an operation that the seccomp filter lets through can still be denied by AppArmor, and vice versa).
For most clusters, seccomp RuntimeDefault is enough. AppArmor is added when there’s a specific threat (e.g. “the app should never read /etc/shadow”).
12. Common Patterns
12.1 PSS restricted baseline
securityContext:
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: ["ALL"]
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true

This is the safe default for application containers. Seccomp narrows the syscalls; capabilities are dropped; no root; no privilege escalation; read-only root.
12.2 Custom seccomp for a specific app
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: my-app-profile.json

The profile is on every node (via SPO or manual). The app gets its custom seccomp filter.
12.3 AppArmor for a privileged workload
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/app: localhost/myapp

The profile is on the node. The workload’s container gets the AppArmor filter.
12.4 The “deny all, allow specific” pattern
A seccomp profile that denies everything except a small whitelist:
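{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "exit", "exit_group"], "action": "SCMP_ACT_ALLOW"}
  ]
}

This is about as strict as a seccomp profile gets. The process can read, write, and exit, and nothing else. It can’t even open files (no open / openat).
For most apps, this is too restrictive (you need open, mmap, etc.). A container usually cannot even start under it, because the runtime installs the filter before it executes the entrypoint, so execve and the syscalls used during startup have to be allowed too. Treat it as a starting point and add syscalls until the workload runs.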
13. Operations and Debugging
13.1 Common commands
# check the seccomp profile requested by a Pod
kubectl get pod <pod> -o jsonpath='{.spec.securityContext.seccompProfile}'
# check the seccomp mode applied (on the node)
# find the container's PID first, then:
grep Seccomp /proc/<pid>/status
# Seccomp: 0 = disabled, 1 = strict, 2 = filter
# 2 = filter (a profile is loaded)
# check the AppArmor profile (on the node)
cat /proc/<pid>/attr/current
# shows the current AppArmor profile
# list loaded AppArmor profiles
sudo aa-status

13.2 The “container is killed by seccomp” case
The container is CrashLoopBackOff. The logs show Bad system call (signal 31 / SIGSYS).
# 1. Check the seccomp profile
kubectl get pod <pod> -o jsonpath='{.spec.securityContext.seccompProfile}'
# 2. Check the container's syscalls
# (on the node, get the container's PID)
crictl inspect <container-id> | grep pid
# then strace the process
strace -p <pid> -e trace=all 2>&1 | tail
# shows the syscall that triggered the kill
# 3. If the profile is too strict, switch to Localhost with a custom profile
# (or remove the seccomp entirely as a test)

13.3 The “AppArmor denies access” case
The container can’t read a file. The logs show permission denied (from AppArmor, not the FS).
# 1. Check the AppArmor profile
kubectl get pod <pod> -o jsonpath='{.metadata.annotations.container\.apparmor\.security\.beta\.kubernetes\.io/<container>}'
# 2. Check the kernel audit
dmesg | grep -i apparmor
# shows "audit: type=1400 audit(...): apparmor=\"DENIED\" ..."
# 3. Update the profile
# add the path with the right permissions

14. Gotchas and Common Mistakes
14.1 The 25+ common mistakes
- Unconfined is the default. Without an explicit seccompProfile.type: RuntimeDefault (or a kubelet running with --seccomp-default), the container is unconfined. PSS restricted requires RuntimeDefault or Localhost.
- A seccomp profile that’s too strict kills the container. The container may need syscalls not in the default. Use Localhost for custom profiles.
- Seccomp is per-syscall, not per-app. The filter applies to the process, regardless of the app. The process can’t call open if open is not in the allow list.
- AppArmor is path-based. A rule like /var/lib/myapp/** r allows reading under that path. Rules are matched against the resolved path, so a symlink under /var/lib/myapp that points elsewhere is checked against its target, not against the link’s location.
- AppArmor profiles must be loaded on the node. A Pod can’t reference a profile that’s not on the node.
- The annotation is container.apparmor.security.beta.kubernetes.io/<container-name>. Not apparmor.security.beta.kubernetes.io. The container name matters.
- runtime/default is the runtime’s default AppArmor profile. It may be very permissive (or non-existent). Don’t assume it.
- Seccomp and AppArmor are not the same. They restrict different things. Use both for defense-in-depth.
- SELinux is the RHEL equivalent of AppArmor. On RHEL, the container runtime enables SELinux by default. The default SELinux type for container processes is container_t.
- The kubelet doesn’t validate the contents of a seccomp profile, but a missing file is not ignored. A Localhost reference to a non-existent file makes container creation fail (typically CreateContainerError); the container does not run unconfined.
- A missing AppArmor profile is not ignored either. A localhost/<name> reference to a profile that isn’t loaded on the node keeps the container from starting.
- A seccomp filter that returns SCMP_ACT_LOG is for debugging. The syscall is allowed, but the action is logged. Use this to generate profiles.
- The seccomp profile is loaded by the container runtime, not k8s. The runtime (containerd, CRI-O) reads the profile and configures the container’s seccomp filter.
- The RuntimeDefault profile is the runtime’s, not k8s’s. Containerd and CRI-O have slightly different defaults. The PSS restricted requirement is “RuntimeDefault or Localhost”, and either is fine.
- Seccomp is for syscalls only. It can’t restrict access by file path, because the filter sees the syscall number and raw argument values and cannot dereference the pointer to a path string. AppArmor or SELinux is needed for file restrictions.
- SCMP_ACT_KILL is irreversible. The offending thread is killed on the spot. Use SCMP_ACT_ERRNO for tests (the syscall fails with a specific error, and the process can handle it).
- A seccomp filter is inherited by child processes. It survives fork and execve. The child can’t call syscalls not in the parent’s filter.
- The seccomp BPF program runs in the kernel. It’s fast (on the order of 100ns per syscall, depending on filter length and hardware). The overhead is negligible for most workloads.
- AppArmor doesn’t replace SELinux on systems that have both. On Ubuntu, AppArmor is loaded; SELinux is not. On RHEL, SELinux is loaded; AppArmor is not. They don’t conflict, but only one is active.
- Localhost profiles live in the seccomp directory under the kubelet’s root directory. Default /var/lib/kubelet/seccomp/. The old --seccomp-profile-root flag was deprecated and is gone from current kubelets; if you change --root-dir, your profiles need to be under the new location.
- The kubelet’s --seccomp-default flag (seccompDefault in the kubelet config file) controls the default. With --seccomp-default=true, the kubelet sets RuntimeDefault for all containers that don’t have a profile. This is a hardening default.
- A seccomp profile can be set per-Pod or per-container. The container-level setting overrides the Pod-level one, so a multi-container Pod can have different profiles for each container.
- The seccomp filter is a BPF program, not a JSON file in the kernel. The runtime parses the JSON and compiles it to BPF. The BPF is loaded into the kernel.
- A seccomp profile can use different actions for different syscalls. You can have SCMP_ACT_LOG for one syscall and SCMP_ACT_ALLOW for another. List each syscall under one action; don’t rely on rule order to resolve conflicts.
- The seccomp profile name in localhostProfile is a relative path. The kubelet resolves it against its seccomp directory. So localhostProfile: profiles/my.json resolves to /var/lib/kubelet/seccomp/profiles/my.json.
- An AppArmor profile is loaded once, not per-container. The profile is in the kernel; containers reference it by name.
- The unconfined annotation value disables AppArmor. Don’t use it for production.
- Seccomp restricts ioctl only partially. A filter can allow or deny ioctl and can match scalar arguments such as the request number, but it cannot tell which device the file descriptor refers to. For device-level restrictions, use AppArmor or SELinux.
- A seccompProfile.type: Localhost profile must exist on every node the Pod can be scheduled to. On a node without the file the container fails to start. Check the Pod’s events and the kubelet’s log for the error.
- The seccomp and AppArmor layers are independent. A container can have seccompProfile: RuntimeDefault and container.apparmor.security.beta.kubernetes.io/app: runtime/default simultaneously. Both apply.
See also
- SecurityContext: where seccomp / AppArmor are set
- PSS: requires RuntimeDefault or Localhost seccomp for restricted
- Runtime Sandboxing: gVisor / Kata as stronger alternatives
- Runtime Detection: Falco / Tetragon detect syscall anomalies