<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/mount.h, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-14T22:37:58+00:00</updated>
<entry>
<title>Merge tag 'vfs-7.2-rc1.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2026-06-14T22:37:58+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-06-14T22:37:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=79169a1624253363fed3e9a447b77e50bb226206'/>
<id>urn:sha1:79169a1624253363fed3e9a447b77e50bb226206</id>
<content type='text'>
Pull procfs updates from Christian Brauner:

 - Revamp fs/filesystems.c

   The file was a mess with a hand-rolled linked list in desperate need
   of a cleanup. The filesystems list is now RCU-ified, /proc files can
   be marked permanent from outside fs/proc/, and the string emitted
   when reading /proc/filesystems is pre-generated and cached instead of
   pointer-chasing and printfing entry by entry on every read.

   The file is read frequently because libselinux reads it and is linked
   into numerous frequently used programs (even ones you would not
   suspect, like sed!). Scalability also improves since reference
   maintenance on open/close is bypassed.

    open+read+close cycle single-threaded (ops/s):
      before: 442732
      after:  1063462 (+140%)

    open+read+close cycle with 20 processes (ops/s):
      before: 606177
      after:  3300576 (+444%)

   A follow-up patch adds missing unlocks in some corner cases and
   tidies things up.

 - Relax the mount visibility check for subset=pid mounts

   When procfs is mounted with subset=pid, all static files become
   unavailable and only the dynamic pid information is accessible. In
   that case there is no point in imposing the full mount visibility
   restrictions on the mounter - everything that can be hidden in procfs
   is already inaccessible. These restrictions prevented procfs from
   being mounted inside rootless containers since almost all container
   implementations overmount parts of procfs to hide certain
   directories.

   As part of this /proc/self/net is only shown in subset=pid mounts for
   CAP_NET_ADMIN, reconfiguring subset=pid is rejected, the
   SB_I_USERNS_VISIBLE superblock flag is replaced with an
   FS_USERNS_MOUNT_RESTRICTED filesystem flag, fully visible mounts are
   recorded in a list, and the mount restrictions are finally
   documented.

 - Protect ptrace_may_access() with exec_update_lock in procfs

   Most uses of ptrace_may_access() in procfs should hold
   exec_update_lock to avoid TOCTOU issues with concurrent privileged
   execve() (like setuid binary execution).

   This fixes the easy cases - the owner and visibility checks and the
   FD link permission checks - with the gnarlier ones to follow later.

* tag 'vfs-7.2-rc1.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: fix ups and tidy ups to /proc/filesystems caching
  proc: protect ptrace_may_access() with exec_update_lock (FD links)
  proc: protect ptrace_may_access() with exec_update_lock (part 1)
  docs: proc: add documentation about mount restrictions
  proc: handle subset=pid separately in userns visibility checks
  proc: prevent reconfiguring subset=pid
  proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN
  fs: cache the string generated by reading /proc/filesystems
  sysfs: remove trivial sysfs_get_tree() wrapper
  fs: RCU-ify filesystems list
  fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED
  proc: allow to mark /proc files permanent outside of fs/proc/
  namespace: record fully visible mounts in list
</content>
</entry>
<entry>
<title>fhandle: fix UAF due to unlocked -&gt;mnt_ns read in may_decode_fh()</title>
<updated>2026-06-04T07:39:50+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2026-06-03T19:31:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=40ab6644b99685755f740b872c00ef40d9aa870e'/>
<id>urn:sha1:40ab6644b99685755f740b872c00ef40d9aa870e</id>
<content type='text'>
may_decode_fh() accesses mount::mnt_ns without holding any locks; that
means the mount can concurrently be unmounted, and the mnt_namespace can
concurrently be freed after an RCU grace period.

This race can happens as follows, assuming that the mount point was
created by open_tree(..., OPEN_TREE_CLONE):

thread 1            thread 2            RCU
                    __do_sys_open_by_handle_at
                      do_handle_open
                        handle_to_path
                          may_decode_fh
                            is_mounted
                              [mount::mnt_ns access]
                            [mount::mnt_ns access]
__do_sys_close
  fput_close_sync
    __fput
      dissolve_on_fput
        umount_tree
        class_namespace_excl_destructor
          namespace_unlock
            free_mnt_ns
              mnt_ns_tree_remove
                call_rcu(mnt_ns_release_rcu)
                                        mnt_ns_release_rcu
                                          mnt_ns_release
                                            kfree
                            [mnt_namespace::user_ns access] **UAF**

Fix it by taking rcu_read_lock() around the mount::mnt_ns access, like
in __prepend_path().
Additionally, document the semantics of mount::mnt_ns, and use WRITE_ONCE()
for writers that can race with lockless readers.

This bug is unreachable unless one of the following is set:

 - CONFIG_PREEMPTION
 - CONFIG_RCU_STRICT_GRACE_PERIOD

because it requires an RCU grace period to happen during a syscall without
an explicit preemption.

This doesn't seem to have interesting security impact; worst-case, it could
leak the result of an integer comparison to userspace (from the level
check in cap_capable()), cause an endless loop, or crash the kernel by
dereferencing an invalid address.

Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Link: https://patch.msgid.link/20260603-vfs-fhandle-uaf-fix-v2-1-d05db76a5084@google.com
Signed-off-by: Christian Brauner (Amutable) &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>namespace: record fully visible mounts in list</title>
<updated>2026-05-11T21:13:01+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-04-27T08:26:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=672697c37c9dfe05aabfd44ccb51c2877deb524e'/>
<id>urn:sha1:672697c37c9dfe05aabfd44ccb51c2877deb524e</id>
<content type='text'>
Instead of wading through all the mounts in the mount namespace rbtree
to find fully visible procfs and sysfs mounts, be honest about them
being special cruft and record them in a separate per-mount namespace
list.

Link: https://patch.msgid.link/684859a8e0ac929cb89c1fbe16ce15b30c70eb1f.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai &lt;aleksa@amutable.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: add immutable rootfs</title>
<updated>2026-01-12T15:52:09+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-01-12T15:47:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=576ee5dfd459abe8e29bee8b204cd259e60b4e18'/>
<id>urn:sha1:576ee5dfd459abe8e29bee8b204cd259e60b4e18</id>
<content type='text'>
Currently pivot_root() doesn't work on the real rootfs because it
cannot be unmounted. Userspace has to do a recursive removal of the
initramfs contents manually before continuing the boot.

Really all we want from the real rootfs is to serve as the parent mount
for anything that is actually useful such as the tmpfs or ramfs for
initramfs unpacking or the rootfs itself. There's no need for the real
rootfs to actually be anything meaningful or useful. Add a immutable
rootfs called "nullfs" that can be selected via the "nullfs_rootfs"
kernel command line option.

The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs
and fire up userspace which mounts the rootfs and can then just do:

  chdir(rootfs);
  pivot_root(".", ".");
  umount2(".", MNT_DETACH);

and be done with it. (Ofc, userspace can also choose to retain the
initramfs contents by using something like pivot_root(".", "/initramfs")
without unmounting it.)

Technically this also means that the rootfs mount in unprivileged
namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed
that the immutable rootfs remains permanently empty so there cannot be
anything revealed by unmounting the covering mount.

In the future this will also allow us to create completely empty mount
namespaces without risking to leak anything.

systemd already handles this all correctly as it tries to pivot_root()
first and falls back to MS_MOVE only when that fails.

This goes back to various discussion in previous years and a LPC 2024
presentation about this very topic.

Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-3-88dd1c34a204@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: use boolean to indicate anonymous mount namespace</title>
<updated>2025-11-11T09:01:31+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-11-10T15:08:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d9a44089ac7754e470dfdaf3658c3455b2d7f7dd'/>
<id>urn:sha1:d9a44089ac7754e470dfdaf3658c3455b2d7f7dd</id>
<content type='text'>
Stop playing games with the namespace id and use a boolean instead:

* This will remove the special-casing we need to do everywhere for mount
  namespaces.

* It will allow us to use asserts on the namespace id for initial
  namespaces everywhere.

* It will allow us to put anonymous mount namespaces on the namespaces
  trees in the future and thus make them available to statmount() and
  listmount().

Link: https://patch.msgid.link/20251110-work-namespace-nstree-fixes-v1-10-e8a9264e0fb9@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2025-10-03T17:19:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-10-03T17:19:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e64aeecbbb0962601bd2ac502a2f9c0d9be97502'/>
<id>urn:sha1:e64aeecbbb0962601bd2ac502a2f9c0d9be97502</id>
<content type='text'>
Pull vfs mount updates from Al Viro:
 "Several piles this cycle, this mount-related one being the largest and
  trickiest:

   - saner handling of guards in fs/namespace.c, getting rid of
     needlessly strong locking in some of the users

   - lock_mount() calling conventions change - have it set the
     environment for attaching to given location, storing the results in
     caller-supplied object, without altering the passed struct path.

     Make unlock_mount() called as __cleanup for those objects. It's not
     exactly guard(), but similar to it

   - MNT_WRITE_HOLD done right.

     mnt_hold_writers() does *not* mess with -&gt;mnt_flags anymore, so
     insertion of a new mount into -&gt;s_mounts of underlying superblock
     does not, in itself, expose -&gt;mnt_flags of that mount to concurrent
     modifications

   - getting rid of pathological cases when umount() spends quadratic
     time removing the victims from propagation graph - part of that had
     been dealt with last cycle, this should finish it

   - a bunch of stuff constified

   - assorted cleanups

* tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  constify {__,}mnt_is_readonly()
  WRITE_HOLD machinery: no need for to bump mount_lock seqcount
  struct mount: relocate MNT_WRITE_HOLD bit
  preparations to taking MNT_WRITE_HOLD out of -&gt;mnt_flags
  setup_mnt(): primitive for connecting a mount to filesystem
  simplify the callers of mnt_unhold_writers()
  copy_mnt_ns(): use guards
  copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
  open_detached_copy(): separate creation of namespace into helper
  open_detached_copy(): don't bother with mount_lock_hash()
  path_has_submounts(): use guard(mount_locked_reader)
  fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
  ecryptfs: get rid of pointless mount references in ecryptfs dentries
  umount_tree(): take all victims out of propagation graph at once
  do_mount(): use __free(path_put)
  do_move_mount_old(): use __free(path_put)
  constify can_move_mount_beneath() arguments
  path_umount(): constify struct path argument
  may_copy_tree(), __do_loopback(): constify struct path argument
  path_mount(): constify struct path argument
  ...
</content>
</entry>
<entry>
<title>mnt: port to ns_ref_*() helpers</title>
<updated>2025-09-19T14:22:36+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-09-18T10:11:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2e9e6972279fec7dd352ff05abe596e55988ec41'/>
<id>urn:sha1:2e9e6972279fec7dd352ff05abe596e55988ec41</id>
<content type='text'>
Stop accessing ns.count directly.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>mnt: support ns lookup</title>
<updated>2025-09-19T12:26:15+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-09-12T11:52:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7d7d164989586c0ad61934975c40ca795dc134c7'/>
<id>urn:sha1:7d7d164989586c0ad61934975c40ca795dc134c7</id>
<content type='text'>
Move the mount namespace to the generic ns lookup infrastructure.
This allows us to drop a bunch of members from struct mnt_namespace.

Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>struct mount: relocate MNT_WRITE_HOLD bit</title>
<updated>2025-09-17T19:58:29+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2025-08-27T17:37:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3371fa2f27134fc4ec7d40b2ae7b9e92c3b2527e'/>
<id>urn:sha1:3371fa2f27134fc4ec7d40b2ae7b9e92c3b2527e</id>
<content type='text'>
... from -&gt;mnt_flags to LSB of -&gt;mnt_pprev_for_sb.

This is safe - we always set and clear it within the same mount_lock
scope, so we won't interfere with list operations - traversals are
always forward, so they don't even look at -&gt;mnt_prev_for_sb and
both insertions and removals are in mount_lock scopes of their own,
so that bit will be clear in *all* mount instances during those.

Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>preparations to taking MNT_WRITE_HOLD out of -&gt;mnt_flags</title>
<updated>2025-09-17T19:58:29+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2025-08-27T16:33:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=09a1b33c080f6ac700fadc67c8471e67bf75fda4'/>
<id>urn:sha1:09a1b33c080f6ac700fadc67c8471e67bf75fda4</id>
<content type='text'>
We have an unpleasant wart in accessibility rules for struct mount.  There
are per-superblock lists of mounts, used by sb_prepare_remount_readonly()
to check if any of those is currently claimed for write access and to
block further attempts to get write access on those until we are done.

As soon as it is attached to a filesystem, mount becomes reachable
via that list.  Only sb_prepare_remount_readonly() traverses it and
it only accesses a few members of struct mount.  Unfortunately,
-&gt;mnt_flags is one of those and it is modified - MNT_WRITE_HOLD set
and then cleared.  It is done under mount_lock, so from the locking
rules POV everything's fine.

However, it has easily overlooked implications - once mount has been
attached to a filesystem, it has to be treated as globally visible.
In particular, initializing -&gt;mnt_flags *must* be done either prior
to that point or under mount_lock.  All other members are still
private at that point.

Life gets simpler if we move that bit (and that's *all* that can get
touched by access via this list) out of -&gt;mnt_flags.  It's not even
hard to do - currently the list is implemented as list_head one,
anchored in super_block-&gt;s_mounts and linked via mount-&gt;mnt_instance.

As the first step, switch it to hlist-like open-coded structure -
address of the first mount in the set is stored in -&gt;s_mounts
and -&gt;mnt_instance replaced with -&gt;mnt_next_for_sb and -&gt;mnt_pprev_for_sb -
the former either NULL or pointing to the next mount in set, the
latter - address of either -&gt;s_mounts or -&gt;mnt_next_for_sb in the
previous element of the set.

In the next commit we'll steal the LSB of -&gt;mnt_pprev_for_sb as
replacement for MNT_WRITE_HOLD.

Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
