51 unprivileged_userns_restriction
John Johansen edited this page 2024-10-30 10:11:25 +00:00

Introduction

Unprivileged user namespaces are a feature of the kernel that can be used to replace many of the uses of setuid and setgid programs, and that also allow applications to create more secure sandboxes.

Understanding why unprivileged user namespaces are a problem

While unprivileged user namespaces have been beneficial by reducing the need for setuid and setgid processes, their underlying implementation presents several potential issues. The main issue is that unprivileged user namespaces are implemented in a way that fails open. This occurs in two ways. First, the kernel surface exposed to an unprivileged user namespace is not fixed or controlled by the unprivileged user namespace; as the kernel evolves, the set of interfaces grows and changes. Second, if a bug can be exploited, it provides access to the entire system.

To understand the issue at a deeper level, it is necessary to understand that the kernel is shared between the "host" and the unprivileged user namespace. The security of unprivileged user namespaces is primarily based on Linux capabilities, which are an extension of POSIX capabilities. The Linux kernel uses capability checks extensively on all its interfaces (files, syscalls, ...) to control who can access a given interface.

Unprivileged user namespaces make many of these interfaces available to the unprivileged user by allowing them to create a user namespace in which the user is "root".

In doing so, they expose kernel interfaces that are normally restricted to processes with privileged capabilities (root) to unprivileged users. Exposing more kernel interfaces than necessary to a process introduces additional security risks, and unfortunately unprivileged user namespaces are now broadly used as a step in several privilege escalation exploit chains. Even if unprivileged user namespaces themselves are bug free, as long as any privileged kernel interface or combination of interfaces has a bug, an unprivileged user can try to exploit that bug.

This has led to many real-world CVEs. Examples (to pick a few):

  • CVE-2024-1086: exploitation requires the ability to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.

  • CVE-2022-0185: exploitation requires the ability to mount a filesystem, granted by CAP_SYS_ADMIN in a user namespace.

  • CVE-2022-1015: exploitation requires the ability to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.

  • CVE-2022-2078: exploitation requires the ability to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.

  • CVE-2022-24122: reference counting error when leaving a user namespace.

  • CVE-2022-25636: exploitation requires the ability to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.

  • CVE-2020-14386: exploitation requires the ability to interact with AF_PACKET, granted by CAP_NET_RAW in a new user namespace.

  • CVE-2020-16120: exploitation requires the ability to mount fuse overlay and shiftfs.

  • CVE-2023-35001: see write-up

  • CVE-2022-32250: exploitation requires the ability to add netfilter rules, granted by CAP_NET_ADMIN in a new user and network namespace.

TODO: add full pwn2own 2017, 2020, 2021, 2022, 2023, 2024


In a report from Google, 44% of the exploits they saw required unprivileged user namespaces.

Because of this, several distro kernels carry a patch that adds a sysctl to disable unprivileged user namespaces as a mitigation. Unfortunately the sysctl is all or nothing: disabling unprivileged user namespaces might stop an exploit, but it can also break applications that use them. Generally an exploit targets a specific application, and as long as unprivileged user namespaces can be disabled for those applications, there is no need to disable them for the entire system.

With the introduction of restricted unprivileged user namespaces, AppArmor policy can be used to selectively allow and disallow unprivileged user namespaces on a per-application basis.

Two types of restrictions

AppArmor is capable of two styles of restriction: denying unprivileged unconfined processes from creating new user namespaces, and allowing an unprivileged unconfined process to create a user namespace while restricting tasks within that namespace with a default profile. The default profile is defined in policy and has reduced permissions. If the default profile is not present, AppArmor will fall back to denying unconfined access to unprivileged user namespaces.

Discovering if your kernel supports restrictions on unprivileged user namespaces

There are two ways to determine if your kernel has support for restricting unprivileged user namespaces.

Examine /proc

If the file

/proc/sys/kernel/apparmor_restrict_unprivileged_userns

exists, your kernel supports restrictions on unprivileged user namespaces. Its value can be read to determine whether the restrictions are enabled (1) or disabled (0).
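The check above can be scripted. The following is a minimal sketch rather than part of AppArmor's tooling: the function name userns_restriction_state is made up for illustration, and it assumes the file holds only 0 or 1. Reading the file may require root on some systems, in which case the sketch reports "unknown".

```shell
# Report whether the AppArmor unprivileged userns restriction is available
# and, if it is, whether it is currently enabled.
# The optional first argument overrides the sysctl path (used for testing).
userns_restriction_state() {
    path="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
    if [ ! -e "$path" ]; then
        echo "unsupported"
        return 0
    fi
    # The file may not be readable without root; report that as "unknown".
    value=$(cat "$path" 2>/dev/null) || { echo "unknown"; return 0; }
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}
```

Calling userns_restriction_state with no argument inspects the running kernel.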

Examine AppArmor features

The presence of AppArmor's ability to control user namespaces can also be found by introspecting AppArmor's advertised feature set via the following command

$ sudo cat /sys/kernel/security/apparmor/features/namespaces/mask
userns_create

If the value userns_create is present AppArmor can control the creation of namespaces in policy and the use of unprivileged user namespaces by unconfined.

To check whether AppArmor supports changing the profile when a task creates a new user namespace:

$ sudo cat /sys/kernel/security/apparmor/features/namespaces/userns_create
pciu&

This ability is used to be able to allow unprivileged user namespaces but remove capabilities within the namespace.
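Both feature checks can be combined into one helper. This is an illustrative sketch only: the function name apparmor_userns_features and its output strings are invented here, and the paths are the ones shown above. The files under /sys/kernel/security usually need root to read, so run it with appropriate privileges.

```shell
# Report which user namespace features the loaded AppArmor advertises.
# The optional first argument overrides the features directory (used for testing).
apparmor_userns_features() {
    dir="${1:-/sys/kernel/security/apparmor/features/namespaces}"
    # userns_create listed in the mask means userns creation can be mediated.
    if grep -qw userns_create "$dir/mask" 2>/dev/null; then
        echo "userns mediation: yes"
    else
        echo "userns mediation: no"
    fi
    # A userns_create entry means a profile change on creation is supported.
    if [ -e "$dir/userns_create" ]; then
        echo "profile change on userns_create: yes"
    else
        echo "profile change on userns_create: no"
    fi
}
```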

Audit message

If AppArmor denies an unconfined unprivileged process from creating a user namespace, it will log a message to the audit subsystem, similar to the following example:

apparmor="DENIED" operation="userns_create" class="namespace" info="User namespace creation restricted" error=-13 profile="unconfined" pid=21323 comm="steamwebhelper" requested="userns_create" denied="userns_create"

If transitions are supported but the default profile cannot be found, the message also names the target profile:

apparmor="DENIED" operation="userns_create" class="namespace" info="User namespace creation restricted - failed to find unprivileged profile" error=-13 profile="unconfined" pid=1638 comm="plasmashell" requested="userns_create" denied="userns_create" target="unpriv_userns"
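To see which programs are being denied, messages like the two above can be filtered from the log. The sketch below is a hypothetical helper (userns_denials is not an existing tool) and assumes the pid= and comm= fields appear in the order shown in the sample messages. Where the messages end up (kernel log, journal, or the auditd log) depends on the system.

```shell
# Read log lines on stdin and print "comm pid" for every AppArmor denial
# of user namespace creation. Lines that do not match are ignored.
userns_denials() {
    grep 'apparmor="DENIED"' |
        grep 'operation="userns_create"' |
        sed -n 's/.* pid=\([0-9][0-9]*\) comm="\([^"]*\)".*/\2 \1/p'
}
```

For example, on a system where the messages reach the kernel log, something like sudo dmesg | userns_denials | sort | uniq -c gives a per-program count.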

Checking the current state of restricted unprivileged user namespaces

The /proc file that can be introspected to determine if AppArmor restrictions on unprivileged user namespaces are available also provides the current status of the restriction.

$ sudo cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns
0

If the returned value is 0, restrictions on unprivileged user namespaces are disabled; if a value of 1 is reported, the restriction is enabled.

Controlling unprivileged user namespace restrictions via sysctl

Restrictions on unprivileged user namespaces can be controlled using the sysctl command. The changes made by the sysctl command do not persist between reboots. For the change to persist, the sysctl must be added to /etc/sysctl.conf or to a .conf file in /etc/sysctl.d/.

AppArmor offers three sysctls for controlling userns behavior.

  • kernel.apparmor_restrict_unprivileged_userns
  • kernel.apparmor_restrict_unprivileged_userns_force (6.2+)
  • kernel.apparmor_restrict_unprivileged_userns_complain (6.2+)

kernel.apparmor_restrict_unprivileged_userns

This sysctl enables or disables all AppArmor mediation and restrictions around unprivileged user namespaces. If it is set to off, the other sysctls and the userns rules in policy are ignored.

To disable

sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0

To enable

sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=1

kernel.apparmor_restrict_unprivileged_userns_force

This sysctl is used to control policy ABI compatibility, an AppArmor feature where older policy is supported at its declared ABI/feature set level. This is used to prevent confined applications from breaking when a new kernel is used without the policy being updated. However, it also means confined applications can bypass the user namespace restriction if they are using policy that has not been updated to the new ABI.

This sysctl allows forcing the userns restrictions on regardless of the policy's declared ABI. When enabled, all confined applications will have the user namespace mediation enforced; old policy missing the appropriate rule will deny access to user namespace creation.

To disable

sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns_force=0

kernel.apparmor_restrict_unprivileged_userns_complain

This sysctl is used to control complain mode of user namespace mediation for unconfined processes. It allows the restriction to be globally enabled while only logging uses instead of denying them.

To enable

sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns_complain=1
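The three sysctls described above can be inspected together. This is a small sketch under stated assumptions: show_userns_sysctls is a made-up helper name, and the "absent or unreadable" wording is its own. On kernels older than those noted above, the _force and _complain entries are expected to be missing.

```shell
# Print the value of each AppArmor userns sysctl, or a marker if the kernel
# does not provide it or the caller may not read it.
# The optional first argument overrides /proc/sys/kernel (used for testing).
show_userns_sysctls() {
    dir="${1:-/proc/sys/kernel}"
    for name in apparmor_restrict_unprivileged_userns \
                apparmor_restrict_unprivileged_userns_force \
                apparmor_restrict_unprivileged_userns_complain; do
        if [ -r "$dir/$name" ]; then
            echo "kernel.$name = $(cat "$dir/$name")"
        else
            echo "kernel.$name = absent or unreadable"
        fi
    done
}
```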

Allowing user namespaces creation in policy

When restrictions on unprivileged user namespaces are enabled, unconfined unprivileged processes are not allowed to create user namespaces. Specifically, processes that do not have CAP_SYS_ADMIN must be confined by a profile to be able to create user namespaces while the restrictions are enabled.

Confined processes, whether privileged or unprivileged, are by default also not allowed to create user namespaces. To enable them to create user namespaces, the following rule should be added to the application's profile.

allow userns create,

unconfined and user namespace mediation

The default unconfined profile uses the rule

allow userns sys_admin=true sysctl_apparmor_restrict_unprivileged_userns=true create,

The behavior can change if unconfined is replaced.

Special unconfined profiles and user namespace mediation

Profiles that are tagged as unconfined have their permissions determined entirely by the profile. That is, they are not controlled by the sysctl apparmor_restrict_unprivileged_userns, nor do they have the exception for privileged tasks.

E.g. a profile without a user namespace rule will result in a denial despite being tagged unconfined:

abi <abi/4.0>,
profile (unconfined) { }

E.g. a profile tagged unconfined that has a user namespace rule can allow creation of user namespaces:

abi <abi/4.0>,
profile (unconfined) {
  allow userns create,
}

E.g. a profile marked as unconfined without a user namespace rule and without an ABI will allow user namespace creation:

profile (unconfined) {
}

E.g. a profile marked as unconfined with a user namespace rule and without an ABI will either fail to compile due to an unsupported rule OR restrict user namespace creation:

profile (unconfined) {
   deny userns,    # will deny despite missing abi
}
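As a worked illustration of the second case above (tagged unconfined, with an ABI and a userns rule), the sketch below prints a complete profile for one application. The helper name userns_profile, the profile name, and the binary path are all hypothetical placeholders. It uses the explicit flags=(unconfined) spelling and attaches the profile to a binary path.

```shell
# Print a profile that is tagged unconfined but still permits user
# namespace creation. $1 is the profile name, $2 the binary it attaches to.
# Both values are supplied by the caller; nothing here is AppArmor-provided.
userns_profile() {
    name="$1"
    binary="$2"
    cat <<EOF
abi <abi/4.0>,

profile $name $binary flags=(unconfined) {
  allow userns create,
}
EOF
}
```

A possible use, with made-up names: userns_profile example-app /opt/example/bin/app | sudo tee /etc/apparmor.d/example-app, followed by sudo apparmor_parser -r /etc/apparmor.d/example-app to load it.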

The default profile

The default profile is the unprivileged_userns profile in policy. It must be loaded before an unprivileged user namespace is created. The profile has a default definition of

profile unprivileged_userns {
    allow all,
    deny capability,
    allow pix /**,
}
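Because the default profile must be loaded before it can be used, it can help to confirm that it is present. The following sketch assumes the loaded-profile list at /sys/kernel/security/apparmor/profiles uses one "name (mode)" entry per line; profile_loaded is a hypothetical helper name, and reading the list normally requires root.

```shell
# Succeed (exit status 0) if a profile with exactly the given name is loaded.
# $1 is the profile name; the optional $2 overrides the list path (for testing).
profile_loaded() {
    name="$1"
    list="${2:-/sys/kernel/security/apparmor/profiles}"
    # The first whitespace-separated field of each line is the profile name.
    awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$list" 2>/dev/null
}
```

For example: profile_loaded unprivileged_userns && echo "default profile is loaded".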

How this interacts with policy ABIs

The user namespace control respects policy ABIs. This means confined applications with ABIs that did not support control of user namespaces will function unchanged on kernels that support restrictions on user namespaces. The net effect is that these profiles can be used to bypass user namespace controls. This behavior can be overridden by using ABI pinning.

AppArmor 2.x

To pin the ABI of AppArmor 2.x policy add the following rule to the /etc/apparmor/parser.conf file.

policy-features=kernel

This will force AppArmor 2.x policy to use the current kernel's ABI. Note that this may cause failures beyond controlling user namespaces. The kernel keyword can be replaced by a path to any appropriate ABI file, which forces that particular ABI to be used.

AppArmor 3.x

AppArmor 3.x policy uses ABI rules in policy to indicate what ABI the policy was authored under. There are two ways to have this policy enforce user namespace controls.

Update policy ABI rules

Policy ABI rules can be updated to point to an ABI file that supports user namespace controls.

Eg.

abi <abi/3.0>,

can be changed to (assuming the file is available)

abi <abi/4.0>,

Pin an override ABI

ABI rules can be overridden using a special override pin similar to the pin used with AppArmor 2.x policy. To do this the following rule is added to the /etc/apparmor/parser.conf file.

override-policy-abi=kernel

This will force AppArmor 2.x and AppArmor 3.x policy to use the current kernel's ABI. Note that this may cause failures beyond controlling user namespaces. The kernel keyword can be replaced by a path to any appropriate ABI file, which forces that particular ABI to be used.

Update/Replace the ABI file

This method is NOT recommended, as policy references to such a modified ABI file will not be universally consistent. The basic idea is that you can insert the correct ABI info in the ABI file or completely overwrite the ABI file with a new file. The particulars are omitted: if you don't know how to do this, you should not do it.

Disabling unprivileged user namespaces

Several distro kernels (but not all) have the ability to disable unprivileged user namespaces for the entire system via the unprivileged_userns_clone sysctl. If a kernel has this ability the file /proc/sys/kernel/unprivileged_userns_clone will be present. The current state of whether unprivileged user namespaces are allowed can be found by doing

$ cat /proc/sys/kernel/unprivileged_userns_clone

Where a value of 0 means disabled and a value of 1 means enabled.

Unprivileged user namespaces can be disabled by using the command

sudo sysctl -w kernel.unprivileged_userns_clone=0
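Whether the setting has taken effect can be probed from an unprivileged shell. The sketch below relies on the unshare utility and its -U (new user namespace) option; probe_userns is a made-up name. It only reports whether creating a namespace succeeds. It does not show which capabilities are available inside the namespace, so under the AppArmor default-profile style of restriction it may still print "allowed".

```shell
# Try to create a user namespace as the current user and report the result.
# Run this as an unprivileged user; root is normally always allowed.
probe_userns() {
    if ! command -v unshare >/dev/null 2>&1; then
        echo "unknown (unshare not installed)"
    elif unshare -U true 2>/dev/null; then
        echo "allowed"
    else
        echo "denied"
    fi
}
```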

Ubuntu 24.04

Ubuntu 24.04 has the restriction available in its kernel but not enabled by default. If the AppArmor userspace is installed a sysctl file is used to enable the restriction during boot. This enables the kernel to be used unmodified with older releases without having the restriction enabled, where the installed apparmor policy may not support the restriction.

Permanently Disabling the restriction

To permanently disable the restriction on Ubuntu 24.04, create the file /etc/sysctl.d/60-apparmor-namespace.conf containing the following

kernel.apparmor_restrict_unprivileged_userns=0

Note: this will only take effect when the system is rebooted. To temporarily disable the restriction on a running system, use the sysctl method described above.

Kernel Build kconfig options

User namespaces can be configured via the CONFIG_USER_NS config symbol.

If user namespaces are enabled, the config symbol SECURITY_APPARMOR_RESTRICT_USERNS controls whether AppArmor enforces restrictions on unprivileged user namespaces by default. If set to N, AppArmor's unprivileged user namespace restrictions will be disabled by default, whereas Y will enable restrictions by default. The default value can be overridden by setting the sysctl at runtime.

Support Matrix

Policy ABI

Kconfig sysctl

Feature; Upstream; Ubuntu 22.04 Jammy; Ubuntu 22.10 Kinetic; Ubuntu 23.04 Lunar; Ubuntu 23.10 Mantic; Ubuntu 24.04 Noble

unconfined flag: kernel --- apparmor 3.0 3.12 5.15 3.0.4 - Kinetic 3.0.?? Lunar 3.0.8 Mantic 4.0.0-alpha2
default_allow flag: 4.0 - - - - - -
default_allow fallback to unconfined: 4.0 3.12 Jammy 3.0.4 - Kinetic Lunar Mantic
default_allow delegation: ?? no no no no no
change_profile restriction: - 6.7 no no no no mantic 6.5
io_uring restriction:
mount restriction:
link restriction:
userns mediation: 4.0 6.7 no kernel 5.19 userspace ?? kernel 6.2 userspace ?? kernel 6.5 userspace ?? kernel ?6.7? userspace 4.0
unprivileged unconfined restriction: - no no no kernel 6.2 kernel 6.5 kernel ?6.7?
specialize unconfined profile: - no no no no kernel 6.5 userspace 4.0.0~alpha2 kernel ?6.7? userspace 4.0
sysctl restrict_unprivileged_userns: - no yes - 5.19 yes - 6.2 yes - 6.5 yes - ?6.7?
sysctl restrict_unprivileged_userns_force: - no no yes - 6.2 yes - 6.5 yes - ?6.7?
sysctl restrict_unprivileged_userns_complain: - no no yes - 6.2 yes - 6.5 yes - ?6.7?
/usr/lib/sysctl.d/10-apparmor.conf: no - no no 4.0.0~alpha2-0ubuntu5: disabled 4.0.0~alpha2-0ubuntu7: enabled
replace unconfined: ?? no no no no no kernel ?6.7?
Columns: Feature; Component; Upstream; Ubuntu 22.04 Jammy; Ubuntu 22.10 Kinetic; Ubuntu 23.04 Lunar; Ubuntu 23.10 Mantic; Ubuntu 24.04 Noble

unconfined flag
  Kernel: 3.0; 5.15; 5.19; 6.2; 6.5; ?6.7?
  Userspace: ????; 3.0.4; 3.0.8; 4.0~alpha2; 4.0

default_allow flag
  Kernel: -; -; -; -; -; -
  Userspace: 4.0; no; no; no; no; 4.0

default_allow fallback to unconfined flag
  Kernel: 3.0; 5.15; 5.19; 6.2; 6.5; ?6.7?
  Userspace: 4.0; no; no; no; no; 4.0

default_allow delegation
  Kernel: no; no; no; no; no; ?
  Userspace: 4.0; no; no; no; no; ?

userns mediation
  Kernel: 6.7; no; 5.19; 6.2; 6.5; ?6.7?
  Userspace: 4.0; no; ??; 4.0-alph2; 4.0

userns domain transition
  Kernel: no; no; no; no; no; ?6.7?
  Userspace: 4.0; no; no; no; no; 4.0

userns nspace conditionals
  Kernel: no; no; no; no; no; ?6.7?
  Userspace: 4.0; no; no; no; no; 4.0

restrict unprivileged unconfined userns creation
  Kernel: (no entries)
  Userspace: (no entries)

special unconfined profile
  Kernel: no; no; no; 6.5 sauce; ?6.7?
  Userspace: yes; yes; yes; yes; yes

restrict change_profile
  Kernel: 6.7 - default off; 6.5 - sysctl
  Userspace: -; -; -; -; -; -

change_profile sysctl conditional
  Kernel: (no entries)
  Userspace: no; no; no; no

change_profile unprivileged conditional
  Kernel: (no entries)
  Userspace: no; no; no; no

restrict io_uring
  Kernel: (no entries)
  Userspace: -; -; -; -; -; -

restrict mount
  Kernel: (no entries)
  Userspace: -; -; -; -; -; -

restrict link
  Kernel: (no entries)
  Userspace: -; -; -; -; -; -

sysctl restrict_unprivileged_userns
  Kernel: (no entries)
  Userspace: -; -; -; -; -; -

sysctl restrict_unprivileged_userns_force
  Kernel: (no entries)
  Userspace: -; -; -; -; -; -

sysctl restrict_unprivileged_userns_complain
  Kernel: (no entries)
  Userspace: -; -; -; -; -; -

/usr/lib/sysctl.d/10-apparmor.conf
  Kernel: -; -; -; -; -; -
  Userspace: (no entries)

replace unconfined
  Kernel: no
  Userspace: -; -; -; -; -; -

renaming replacement, prereq of replace unconfined
  Kernel: no; no; no; no; no; -
  Userspace: (no entries)

?
  Kernel: (no entries)
  Userspace: (no entries)