Table of Contents
- Introduction
- Understanding why unprivileged user namespaces are a problem
- TODO: add full pwn2own 2017, 2020, 2021, 2022, 2023, 2024
- Two types of restrictions
- Discovering if your kernel supports restrictions on unprivileged user namespaces
- Audit message
- Checking the current state of restricted unprivileged user namespaces
- Controlling unprivileged user namespace restrictions via sysctl
- kernel.apparmor_restrict_unprivileged_userns
- kernel.apparmor_restrict_unprivileged_userns_force
- kernel.apparmor_restrict_unprivileged_userns_complain
- Allowing user namespaces creation in policy
- unconfined and user namespace mediation
- Special unconfined profiles and user namespace mediation
- The default profile
- how this interacts with policy ABIs
- Disabling unprivileged user namespaces
- Ubuntu 24.04
- Kernel Build kconfig options
- Support Matrix
Introduction
Unprivileged user namespaces are a feature of the kernel that can be used to replace many of the uses of setuid and setgid programs, and they also allow applications to create more secure sandboxes.
Understanding why unprivileged user namespaces are a problem
While unprivileged user namespaces have been beneficial by reducing the need for setuid and setgid processes, their underlying implementation presents several potential issues. The main issue is that unprivileged user namespaces are implemented in a way that fails open. This occurs in two ways. First, the kernel surface exposed to an unprivileged user namespace is not fixed or controlled by the unprivileged user namespace; as the kernel evolves, the set of interfaces grows and changes. Second, if a bug can be exploited, it provides access to the entire system.
To understand the issue at a deeper level, it is necessary to understand that the kernel is shared between the "host" and the unprivileged user namespace. The security of unprivileged user namespaces is primarily based on Linux capabilities, which are an extension of POSIX capabilities. The Linux kernel uses capability checks extensively on all its interfaces (files, syscalls, ...) to control who can access a given interface.
Unprivileged user namespaces make many of these interfaces available to the unprivileged user by allowing them to create a user namespace in which the user is "root".
They expose kernel interfaces that are normally restricted to processes with privileged capabilities (root) to unprivileged users. Exposing more kernel interfaces than necessary to a process introduces additional security risks, and unfortunately unprivileged user namespaces are now broadly used as a step in several privilege escalation exploit chains. Basically, even if unprivileged user namespaces are bug free, as long as any privileged kernel interface or combination of interfaces has a bug, an unprivileged user can try to exploit that bug.
This has led to many real-world CVEs. Examples (to pick a few):
- CVE-2024-1086: to exploit, needs to be able to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.
- CVE-2022-0185: to exploit, need to be able to mount a filesystem, granted by CAP_SYS_ADMIN in a user namespace.
- CVE-2022-1015: to exploit, need to be able to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.
- CVE-2022-2078: to exploit, need to be able to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.
- CVE-2022-24122: reference counting error when leaving a user namespace.
- CVE-2022-25636: to exploit, need to be able to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.
- CVE-2020-14386: to exploit, need to interact with AF_PACKET, granted by CAP_NET_RAW in a new user namespace.
- CVE-2020-16120: to exploit, needs to be able to mount fuse overlay and shiftfs.
- CVE-2022-32250: to exploit, needs to be able to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.
TODO: add full pwn2own 2017, 2020, 2021, 2022, 2023, 2024
In a report from Google, 44% of the exploits they saw required unprivileged user namespaces.
Because of this, several distro kernels carry a patch that adds a sysctl to disable unprivileged user namespaces as a mitigation. Unfortunately the sysctl is all or nothing: disabling unprivileged user namespaces might stop an exploit, but it can also break applications that use them. Generally an exploit targets a specific application, and as long as unprivileged user namespaces can be disabled for those applications there is no need to disable them for the entire system.
With the introduction of restricted unprivileged user namespaces, AppArmor can be used to selectively allow and disallow unprivileged user namespaces. AppArmor policy is used to control access to unprivileged user namespaces on a per-application basis.
Two types of restrictions
AppArmor is capable of two styles of restriction: denying unprivileged unconfined processes from creating new user namespaces, or allowing an unprivileged unconfined process to create a user namespace but restricting tasks within that namespace with a default profile. The default profile is defined in policy and has reduced permissions; if the default profile is not present, AppArmor will fall back to denying unconfined access to unprivileged user namespaces.
Discovering if your kernel supports restrictions on unprivileged user namespaces
There are two ways to determine if your kernel has support for restricting unprivileged user namespaces.
Examine /proc
If the file /proc/sys/kernel/apparmor_restrict_unprivileged_userns exists, your kernel supports restrictions on unprivileged user namespaces. The value can be read to determine if they are enabled (1) or disabled (0).
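The /proc check above can be scripted. The following is a minimal sketch, not something shipped with AppArmor: the function name userns_restriction_state is made up for illustration, and the optional argument exists only so the function can be pointed at a different file when experimenting.

```shell
#!/bin/sh
# Hypothetical helper: report the state of the AppArmor restriction on
# unprivileged user namespaces, based on the /proc file described above.
userns_restriction_state() {
    state_file="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
    # A missing file means the kernel does not offer the restriction.
    if [ ! -e "$state_file" ]; then
        echo "unsupported"
        return 0
    fi
    case "$(cat "$state_file" 2>/dev/null)" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

userns_restriction_state
```

The function prints one of unsupported, disabled, enabled or unknown, which makes it easy to use in other scripts.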
Examine AppArmor features
The presence of AppArmor's ability to control user namespaces can also be found by introspecting AppArmor's advertised feature set via the following command
$ sudo cat /sys/kernel/security/apparmor/features/namespaces/mask
userns_create
If the value userns_create is present, AppArmor can control the creation of namespaces in policy and the use of unprivileged user namespaces by unconfined.
To check if AppArmor supports changing the profile when a task creates a new user namespace
$ sudo cat /sys/kernel/security/apparmor/features/namespaces/userns_create
pciu&
This ability is used to allow unprivileged user namespaces while removing capabilities within the namespace.
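The feature check can also be wrapped in a small test. This is a sketch under assumptions: the function name apparmor_mediates_userns is hypothetical, the mask file is assumed to list feature names separated by whitespace, and reading it may require root (the commands above use sudo).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the AppArmor feature mask advertises
# userns_create. The optional argument overrides the file to inspect.
apparmor_mediates_userns() {
    mask_file="${1:-/sys/kernel/security/apparmor/features/namespaces/mask}"
    # -w matches userns_create only as a whole word, -q suppresses output.
    [ -r "$mask_file" ] && grep -qw 'userns_create' "$mask_file"
}

if apparmor_mediates_userns; then
    echo "AppArmor can mediate user namespace creation"
else
    echo "userns_create not advertised (or the mask file is not readable)"
fi
```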
Audit message
If AppArmor denies an unconfined unprivileged process from creating a user namespace, it will log a message to the audit subsystem, similar to the following example message.
apparmor="DENIED" operation="userns_create" class="namespace" info="User namespace creation restricted" error=-13 profile="unconfined" pid=21323 comm="steamwebhelper" requested="userns_create" denied="userns_create"
If transitions are supported
apparmor="DENIED" operation="userns_create" class="namespace" info="User namespace creation restricted - failed to find unprivileged profile" error=-13 profile="unconfined" pid=1638 comm="plasmashell" requested="userns_create" denied="userns_create" target="unpriv_userns"
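When triaging these denials it helps to pull individual fields, such as comm or profile, out of a message. The following is a rough sketch that assumes the key=value layout shown in the examples above; the function name audit_field is hypothetical and it is not a general audit log parser.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from an audit line read on
# stdin. Handles quoted values (comm="...") and bare values (pid=21323).
audit_field() {
    # Prepend a space so that the first key on the line is matched as well.
    sed 's/^/ /' | sed -n \
        -e "s/.* $1=\"\([^\"]*\)\".*/\1/p" \
        -e t \
        -e "s/.* $1=\([^ ]*\).*/\1/p"
}

line='apparmor="DENIED" operation="userns_create" class="namespace" info="User namespace creation restricted" error=-13 profile="unconfined" pid=21323 comm="steamwebhelper" requested="userns_create" denied="userns_create"'

printf '%s\n' "$line" | audit_field comm
printf '%s\n' "$line" | audit_field pid
```

Run against the first example message, the two calls print steamwebhelper and 21323, identifying the process that was denied.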
Checking the current state of restricted unprivileged user namespaces
The /proc file that can be introspected to determine if AppArmor restrictions on unprivileged user namespaces are available also provides the current status of the restriction.
$ sudo cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns
0
If the returned value is 0, restrictions on unprivileged user namespaces are disabled; if a value of 1 is reported, the restriction is enabled.
Controlling unprivileged user namespace restrictions via sysctl
Restrictions on unprivileged user namespaces can be controlled using the sysctl command. The changes made by the sysctl command do not persist between reboots. For the change to persist, the sysctl must be added to /etc/sysctl.conf or to a .conf file in /etc/sysctl.d/.
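As an illustration of the persistent form, the sketch below writes a drop-in file containing the setting. The function name and the file name 99-userns-example.conf are invented for this example, and the target directory is a parameter so the function can be tried on a scratch directory before touching /etc/sysctl.d/.

```shell
#!/bin/sh
# Hypothetical helper: write a sysctl drop-in that pins
# kernel.apparmor_restrict_unprivileged_userns to 0 or 1.
# The file name 99-userns-example.conf is an assumption for this example.
write_userns_sysctl_dropin() {
    value="$1"
    dropin_dir="${2:-/etc/sysctl.d}"
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    printf 'kernel.apparmor_restrict_unprivileged_userns=%s\n' "$value" \
        > "$dropin_dir/99-userns-example.conf"
}

# Example (needs root when writing to the default directory):
#   write_userns_sysctl_dropin 1
```

After the file is written as root, running sudo sysctl --system reloads the sysctl configuration files so the value applies without a reboot.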
AppArmor offers three sysctls for controlling userns behavior.
- kernel.apparmor_restrict_unprivileged_userns
- kernel.apparmor_restrict_unprivileged_userns_force (6.2+)
- kernel.apparmor_restrict_unprivileged_userns_complain (6.2+)
kernel.apparmor_restrict_unprivileged_userns
This sysctl enables or disables all AppArmor mediation and restrictions around unprivileged user namespaces. If it is set to off (0), the other sysctls are ignored, rules in policy are ignored, and so on.
To disable
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
To enable
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=1
kernel.apparmor_restrict_unprivileged_userns_force
This sysctl is used to control policy ABI compatibility, an AppArmor feature where older policy is supported at its declared ABI/feature set level. This prevents confined applications from breaking when a new kernel is used without the policy being updated. However, it also means confined applications can bypass the user namespace restriction if they are using policy that has not been updated to the new ABI.
This sysctl allows forcing the userns restrictions on regardless of the policy's declared ABI. When enabled, all confined applications will have user namespace mediation enforced; old policy missing the appropriate rule will deny user namespace creation.
To disable
$ echo 0 > /proc/sys/kernel/apparmor_restrict_unprivileged_userns_force
kernel.apparmor_restrict_unprivileged_userns_complain
This sysctl is used to control complain mode of user namespace mediation for unconfined processes. It allows the restriction to be globally enabled while only logging uses instead of denying them.
To enable
$ echo 1 > /proc/sys/kernel/apparmor_restrict_unprivileged_userns_complain
Allowing user namespaces creation in policy
When restrictions on unprivileged user namespaces are enabled, unconfined unprivileged processes are not allowed to create user namespaces. Specifically, unconfined processes that do not have CAP_SYS_ADMIN must be confined by a profile to be able to create user namespaces when restrictions on unprivileged user namespaces are enabled.
Confined processes, whether privileged or unprivileged, are by default also not allowed to create user namespaces. To enable them to create user namespaces, the following rule should be added to the application's profile.
allow userns create,
unconfined and user namespace mediation
The default unconfined profile uses the rule
allow userns sys_admin=true sysctl_apparmor_restrict_unprivileged_userns=true create,
The behavior can change if unconfined is replaced.
Special unconfined profiles and user namespace mediation
Profiles that are tagged as unconfined have their permissions determined entirely by the profile. That is, they are not controlled by the sysctl apparmor_restrict_unprivileged_userns, nor do they have the exception for privileged tasks.
eg. a profile without a user namespace rule will result in a DENIAL despite being tagged unconfined
abi <abi/4.0>,
profile (unconfined) { }
eg. a profile tagged as unconfined that has a user namespace rule can allow creation of user namespaces
abi <abi/4.0>,
profile (unconfined) {
allow userns create,
}
eg. a profile marked as unconfined without a user namespace rule and without an abi will allow user namespace creation
profile (unconfined) {
}
eg. a profile marked as unconfined with a user namespace rule and without an abi will either fail the compile due to an unsupported rule OR restrict user namespace creation.
profile (unconfined) {
deny userns, # will deny despite missing abi
}
The default profile
The default profile is the unprivileged_userns profile in policy. It must be loaded before an unprivileged user namespace is created. The profile has a default definition of
profile unprivileged_userns {
allow all,
deny capability,
allow pix /**,
}
how this interacts with policy ABIs
The user namespace control respects policy ABIs. This means confined applications with ABIs that did not support control of user namespaces will function unchanged on kernels that support restrictions on user namespaces. The net effect is that these profiles can be used to bypass user namespace controls. This behavior can be overridden by using ABI pinning.
AppArmor 2.x
To pin the ABI of AppArmor 2.x policy, add the following rule to the /etc/apparmor/parser.conf file.
policy-features=kernel
This will force AppArmor 2.x policy to use the current kernel's ABI. Note that this may cause failures beyond controlling user namespaces. The kernel keyword can be replaced by a path to any appropriate ABI file, forcing that particular ABI to be used.
AppArmor 3.x
AppArmor 3.x policy uses ABI rules in policy to indicate what ABI the policy was authored under. There are two ways to have this policy enforce user namespace controls.
Update policy ABI rules
Policy ABI rules can be updated to point to an ABI file that supports user namespace controls.
Eg.
abi <abi/3.0>,
can be changed to (assuming the file is available)
abi <abi/4.0>,
Pin an override ABI
ABI rules can be overridden using a special override pin similar to the pin used with AppArmor 2.x policy. To do this, the following rule is added to the /etc/apparmor/parser.conf file.
override-policy-abi=kernel
This will force AppArmor 2.x and AppArmor 3.x policy to use the current kernel's ABI. Note that this may cause failures beyond controlling user namespaces. The kernel keyword can be replaced by a path to any appropriate ABI file, forcing that particular ABI to be used.
Update/Replace the ABI file
This method is NOT recommended, as policy references to such a modified ABI file will not be universally consistent. The basic idea is that you can insert the correct ABI info into the ABI file or completely overwrite the ABI file with a new file. The particulars are omitted: if you don't know how to do this, you should not do it.
Disabling unprivileged user namespaces
Several distro kernels (but not all) have the ability to disable unprivileged user namespaces for the entire system via the unprivileged_userns_clone sysctl. If a kernel has this ability, the file /proc/sys/kernel/unprivileged_userns_clone will be present. The current state of whether unprivileged user namespaces are allowed can be found by doing
$ cat /proc/sys/kernel/unprivileged_userns_clone
Where a value of 0 means disabled and a value of 1 means enabled.
Unprivileged user namespaces can be disabled by using the command
sudo sysctl -w kernel.unprivileged_userns_clone=0
Ubuntu 24.04
Ubuntu 24.04 has the restriction available in its kernel but not enabled by default. If the AppArmor userspace is installed a sysctl file is used to enable the restriction during boot. This enables the kernel to be used unmodified with older releases without having the restriction enabled, where the installed apparmor policy may not support the restriction.
Permanently Disabling the restriction
To permanently disable the restriction on Ubuntu 24.04, create the file /etc/sysctl.d/60-apparmor-namespace.conf containing the following
kernel.apparmor_restrict_unprivileged_userns=0
Note: this will only take effect when the system is rebooted. To temporarily disable the restriction on a running system, use the sysctl method described above.
Kernel Build kconfig options
User namespaces can be configured via the CONFIG_USER_NS config symbol.
If user namespaces are enabled, the config symbol SECURITY_APPARMOR_RESTRICT_USERNS controls whether AppArmor enforces restrictions on unprivileged user namespaces by default. If N, AppArmor's unprivileged user namespace restrictions will be disabled by default, whereas Y will enable restrictions by default. The default value can be overridden by setting the sysctl at runtime.
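To see how a running kernel was built, its config file can be inspected. The sketch below makes several assumptions: the config is installed as /boot/config-$(uname -r) (common on Debian and Ubuntu, but not universal), the AppArmor symbol appears in that file with the usual CONFIG_ prefix, and the function name kconfig_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kconfig symbol is y, n or absent in a
# kernel config file. Disabled symbols are written as "# SYMBOL is not set".
kconfig_state() {
    symbol="$1"
    config_file="$2"
    if grep -q "^${symbol}=y\$" "$config_file"; then
        echo "y"
    elif grep -q "^# ${symbol} is not set\$" "$config_file"; then
        echo "n"
    else
        echo "absent"
    fi
}

# Assumed location of the running kernel's config; skipped if not readable.
config_file="/boot/config-$(uname -r)"
if [ -r "$config_file" ]; then
    for symbol in CONFIG_USER_NS CONFIG_SECURITY_APPARMOR_RESTRICT_USERNS; do
        printf '%s: %s\n' "$symbol" "$(kconfig_state "$symbol" "$config_file")"
    done
fi
```

A result of absent for the AppArmor symbol suggests the kernel does not carry the restriction at all, which should agree with the /proc check described earlier.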
Support Matrix
Policy ABI
Kconfig sysctl
Feature | Upstream | Ubuntu 22.04 Jammy | Ubuntu 22.10 Kinetic | Ubuntu 23.04 Lunar | Ubuntu 23.10 Mantic | Ubuntu 24.04 Noble | |
---|---|---|---|---|---|---|---|
unconfined flag | kernel --- apparmor | 3.0 3.12 | 5.15 3.0.4 | - | Kinetic 3.0.?? | Lunar 3.0.8 | Mantic 4.0.0-alpha2 |
default_allow flag | 4.0 | - | - | - | - | - | - |
default_allow fallback to unconfined | 4.0 | 3.12 | Jammy 3.0.4 | - | Kinetic | Lunar | Mantic |
default_allow delegation | ?? | no | no | no | no | no | |
change_profile restriction | - | 6.7 | no | no | no | no | mantic 6.5 |
io_uring restriction | |||||||
mount restriction | |||||||
link restriction | |||||||
userns mediation | 4.0 | 6.7 | no | kernel 5.19 userspace ?? | kernel 6.2 userspace ?? | kernel 6.5 userspace ?? | kernel ?6.7? userspace 4.0 |
unprivileged unconfined restriction | - | no | no | no | kernel 6.2 | kernel 6.5 | kernel ?6.7? |
specialize unconfined profile | - | no | no | no | no | kernel 6.5 userspace 4.0.0~alpha2 | kernel ?6.7? userspace 4.0 |
sysctl restrict_unprivileged_userns | - | no | yes - 5.19 | yes - 6.2 | yes - 6.5 | yes - ?6.7? | |
sysctl restrict_unprivileged_userns_force | - | no | no | yes - 6.2 | yes - 6.5 | yes - ?6.7? | |
sysctl restrict_unprivileged_userns_complain | - | no | no | yes - 6.2 | yes - 6.5 | yes - ?6.7? | |
/usr/lib/sysctl.d/10-apparmor.conf | no | - | no | no | 4.0.0~alpha2-0ubuntu5: disabled | 4.0.0~alpha2-0ubuntu7: enabled | |
replace unconfined | ?? | no | no | no | no | no | kernel ?6.7? |
A | B | C | D | E | F | G | H | |
---|---|---|---|---|---|---|---|---|
1 | Feature | Upstream | Ubuntu 22.04 Jammy | Ubuntu 22.10 Kinetic | Ubuntu 23.04 Lunar | Ubuntu 23.10 Mantic | Ubuntu 24.04 Noble | |
2 | unconfined flag | Kernel | 3.0 | 5.15 | 5.19 | 6.2 | 6.5 | ?6.7? |
3 | Userspace | ?? | ?? | 3.0.4 | 3.0.8 | 4.0~alpha2 | 4.0 | |
4 | default_allow flag | Kernel | - | - | - | - | - | - |
5 | Userspace | 4.0 | no | no | no | no | 4.0 | |
6 | default_allow fallback to unconfined flag | Kernel | 3.0 | 5.15 | 5.19 | 6.2 | 6.5 | ?6.7? |
7 | Userspace | 4.0 | no | no | no | no | 4.0 | |
8 | default_allow delegation | Kernel | no | no | no | no | no | ? |
9 | Userspace | 4.0 | no | no | no | no | ? | |
10 | userns mediation | Kernel | 6.7 | no | 5.19 | 6.2 | 6.5 | ?6.7? |
11 | Userspace | 4.0 | no | ? | ? | 4.0-alpha2 | 4.0 | |
12 | userns domain transition | Kernel | no | no | no | no | no | ?6.7? |
13 | Userspace | 4.0 | no | no | no | no | 4.0 | |
14 | userns nspace conditionals | Kernel | no | no | no | no | no | ?6.7? |
15 | Userspace | 4.0 | no | no | no | no | 4.0 | |
16 | restrict unprivileged unconfined userns creation | Kernel | ||||||
17 | Userspace | |||||||
18 | special unconfined profile | Kernel | no | no | no | 6.5 sauce | ?6.7? | |
19 | Userspace | yes | yes | yes | yes | yes | ||
20 | restrict change_profile | Kernel | 6.7 - default off | 6.5 - sysctl | ||||
21 | Userspace | - | - | - | - | - | - | |
22 | change_profile sysctl conditional | Kernel | ||||||
23 | Userspace | no | no | no | no | |||
24 | change_profile unprivileged conditional | Kernel | ||||||
25 | Userspace | no | no | no | no | |||
26 | restrict io_uring | Kernel | ||||||
27 | Userspace | - | - | - | - | - | - | |
28 | restrict mount | Kernel | ||||||
29 | Userspace | - | - | - | - | - | - | |
30 | restrict link | Kernel | ||||||
31 | Userspace | - | - | - | - | - | - | |
32 | sysctl restrict_unprivileged_userns | Kernel | ||||||
33 | Userspace | - | - | - | - | - | - | |
34 | sysctl restrict_unprivileged_userns_force | Kernel | ||||||
35 | Userspace | - | - | - | - | - | - | |
36 | sysctl restrict_unprivileged_userns_complain | Kernel | ||||||
37 | Userspace | - | - | - | - | - | - | |
38 | /usr/lib/sysctl.d/10-apparmor.conf | Kernel | - | - | - | - | - | - |
39 | Userspace | |||||||
40 | replace unconfined | Kernel | no | |||||
41 | Userspace | - | - | - | - | - | - | |
42 | renaming replacement prereq of replace unconfined | Kernel | no | no | no | no | no | - |
43 | Userspace | |||||||
44 | ? | Kernel | ||||||
45 | Userspace |