<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/namespace.c, branch v7.1-rc5</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-04-14T07:30:15+00:00</updated>
<entry>
<title>mount: always duplicate mount</title>
<updated>2026-04-14T07:30:15+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-03-23T14:05:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ad4999496e73923adb524b24c2f448c9498476b5'/>
<id>urn:sha1:ad4999496e73923adb524b24c2f448c9498476b5</id>
<content type='text'>
In the OPEN_TREE_NAMESPACE path vfs_open_tree() resolves a path via
filename_lookup() without holding namespace_lock. Between the lookup
and create_new_namespace() acquiring namespace_lock via
LOCK_MOUNT_EXACT_COPY() another thread can unmount the mount, setting
mnt-&gt;mnt_ns to NULL.

When create_new_namespace() then checks !mnt-&gt;mnt_ns it incorrectly
takes the swap-and-mntget path that was designed for fsmount()'s
detached mounts. This reuses a mount whose mnt_mp_list is in an
inconsistent state from the concurrent unmount, causing a general
protection fault in __umount_mnt() -&gt; hlist_del_init(&amp;mnt-&gt;mnt_mp_list)
during namespace teardown.

Remove the !mnt-&gt;mnt_ns special case entirely. Instead, always
duplicate the mount:

 - For OPEN_TREE_NAMESPACE use __do_loopback() which will properly
   clone the mount or reject it via may_copy_tree() if it was
   unmounted in the race window.
 - For fsmount() use clone_mnt() directly (via the new MOUNT_COPY_NEW
   flag) since the mount is freshly created by vfs_create_mount() and
   not in any namespace, so __do_loopback()'s IS_MNT_UNBINDABLE,
   may_copy_tree(), and __has_locked_children() checks don't apply.

Reported-by: syzbot+e4470cc28308f2081ec8@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>move_mount: allow MOVE_MOUNT_BENEATH on the rootfs</title>
<updated>2026-03-12T12:34:59+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-02-24T00:40:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ccfac16e0be52b674ac04fb5ba88c643f76ae0e1'/>
<id>urn:sha1:ccfac16e0be52b674ac04fb5ba88c643f76ae0e1</id>
<content type='text'>
Allow MOVE_MOUNT_BENEATH to target the caller's rootfs. When the target
of a mount-beneath operation is the caller's root mount, verify that:

(1) The caller is located at the root of the mount, as enforced by
    path_mounted() in do_lock_mount().
(2) Propagation from the parent mount would not overmount the target,
    to avoid propagating beneath the rootfs of other mount namespaces.

The root-switching is decomposed into individually atomic, locally-scoped
steps: mount-beneath inserts the new root under the old one, chroot(".")
switches the caller's root, and umount2(".", MNT_DETACH) removes the old
root. Since each step only modifies the caller's own state, this avoids
cross-namespace vulnerabilities and inherent fork/unshare/setns races
that a chroot_fs_refs()-based approach would have.

Userspace can use the following workflow to switch roots:

    fd_tree = open_tree(-EBADF, "/newroot",
                        OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
    fchdir(fd_tree);
    move_mount(fd_tree, "", AT_FDCWD, "/",
               MOVE_MOUNT_BENEATH | MOVE_MOUNT_F_EMPTY_PATH);
    chroot(".");
    umount2(".", MNT_DETACH);

Link: https://patch.msgid.link/20260224-work-mount-beneath-rootfs-v1-2-8c58bf08488f@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>move_mount: transfer MNT_LOCKED</title>
<updated>2026-03-12T12:34:58+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-02-24T00:40:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c62a4766937edec2962d52e583276b459b739f2d'/>
<id>urn:sha1:c62a4766937edec2962d52e583276b459b739f2d</id>
<content type='text'>
When performing a mount-beneath operation the target mount can often be
locked:

  unshare(CLONE_NEWUSER | CLONE_NEWNS);
  mount --beneath -t tmpfs tmpfs /proc

will fail because the procfs mount on /proc became locked when the mount
namespace was created from the parent mount namespace. Same logic for:

  unshare(CLONE_NEWUSER | CLONE_NEWNS);
  mount --beneath -t tmpfs tmpfs /

MNT_LOCKED is raised to prevent an unprivileged mount namespace from
revealing whatever is under a given mount. To replace the rootfs we need
to handle that case though.

We can simply transfer the locked mount property from the top mount to
the mount beneath. The new mount we mounted beneath the top mount takes
over the job of the top mount in protecting the parent mount from being
revealed. This leaves us free to allow the top mount to be unmounted.
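
As a rough illustration of the idea, the transfer can be modelled with a
toy Python sketch. Nothing here is kernel code: the Mount class, its
locked attribute (standing in for MNT_LOCKED), and mount_beneath() are
made up for this example.

```python
class Mount:
    """Toy stand-in for a mount; not the kernel's struct mount."""

    def __init__(self, name, locked=False):
        self.name = name
        self.locked = locked   # models MNT_LOCKED
        self.parent = None     # the mount this one sits on top of


def mount_beneath(new, top):
    """Insert `new` between `top` and top's parent, moving the lock down."""
    new.parent = top.parent
    top.parent = new
    if top.locked:
        # The new mount now hides the parent, so it takes over the lock
        # and the old top mount may be unmounted.
        new.locked = True
        top.locked = False


rootfs = Mount("rootfs")
proc = Mount("proc", locked=True)   # locked when the namespace was created
proc.parent = rootfs

tmpfs = Mount("tmpfs")
mount_beneath(tmpfs, proc)
print(tmpfs.locked, proc.locked)
```

In this model the parent stays covered by a locked mount at every point,
which is the property the lock is there to protect.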

This also works during mount propagation and also works for the
non-MOVE_MOUNT_BENEATH case:

(1) move_mount(MOVE_MOUNT_BENEATH): @source_mnt-&gt;overmount always NULL
(2) move_mount():                   @source_mnt-&gt;overmount maybe !NULL

For (1) can_move_mount_beneath() rejects overmounted @source_mnt. (We
could allow this, but it's not really a use-case and it's fugly to move
an overmounted mount stack around. What are you even doing? So let's
keep that restriction.)

For (2) @source_mnt can be overmounted (someone overmounted us while we
locked the target mount). Both are fine. @source_mnt will be
mounted on whatever @q was mounted on and @q will be mounted on the top
of the @source_mnt mount stack. Even in such cases we can unlock @q and
lock @source_mnt if @q was locked.

This effectively makes mount propagation useful in cases where a mount
namespace has a locked mount somewhere and we propagate a new mount
beneath it, but the mount namespace could never get at the new mount
because the old top mount remains locked. Again, we just let the newly
propagated mount take over the protection and unlock the top mount.

Link: https://patch.msgid.link/20260224-work-mount-beneath-rootfs-v1-1-8c58bf08488f@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>namespace: allow creating empty mount namespaces</title>
<updated>2026-03-12T12:33:55+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-03-06T16:28:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d4e752a24f740b31ca827bfab07010e4e7f34b0'/>
<id>urn:sha1:9d4e752a24f740b31ca827bfab07010e4e7f34b0</id>
<content type='text'>
Add support for creating a mount namespace that contains only a copy of
the root mount from the caller's mount namespace, with none of the
child mounts.  This is useful for containers and sandboxes that want to
start with a minimal mount table and populate it from scratch rather
than inheriting and then tearing down the full mount tree.

Two new flags are introduced:

- CLONE_EMPTY_MNTNS for clone3(), using the 64-bit flag space.

- UNSHARE_EMPTY_MNTNS for unshare(), reusing the
  CLONE_PARENT_SETTID bit which has no meaning for unshare.

Both flags imply CLONE_NEWNS.  For the unshare path,
UNSHARE_EMPTY_MNTNS is converted to CLONE_EMPTY_MNTNS in
unshare_nsproxy_namespaces() before it reaches copy_mnt_ns(), so the
mount namespace code only needs to handle a single flag.

In copy_mnt_ns(), when CLONE_EMPTY_MNTNS is set, clone_mnt() is used
instead of copy_tree() to clone only the root mount.  The caller's root
and working directory are both reset to the root dentry of the new
mount.
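
The difference between the two copy strategies can be sketched with a
toy Python model. This is only an illustration under simplifying
assumptions: the Mount class and count() helper are invented here, and
the functions merely borrow the kernel names clone_mnt(), copy_tree()
and copy_mnt_ns() without reproducing their signatures or behaviour.

```python
class Mount:
    """Toy mount with child mounts; not the kernel's struct mount."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])


def clone_mnt(mnt):
    """Copy a single mount without any of its children."""
    return Mount(mnt.name)


def copy_tree(mnt):
    """Copy a mount together with everything mounted below it."""
    return Mount(mnt.name, [copy_tree(c) for c in mnt.children])


def copy_mnt_ns(root, empty):
    """Pick the copy strategy depending on the (modelled) empty flag."""
    return clone_mnt(root) if empty else copy_tree(root)


def count(mnt):
    """Number of mounts in the tree rooted at mnt."""
    return 1 + sum(count(c) for c in mnt.children)


caller_root = Mount("/", [Mount("/proc"),
                          Mount("/sys", [Mount("/sys/fs/cgroup")])])
print(count(copy_mnt_ns(caller_root, empty=False)))
print(count(copy_mnt_ns(caller_root, empty=True)))
```

With the modelled flag set, the new namespace starts with the root mount
only, and the caller's mount tree is left untouched in both cases.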

The cleanup variables are changed from vfsmount pointers with
__free(mntput) to struct path with __free(path_put) because the empty
mount namespace path needs to release both mount and dentry references
when replacing the caller's root and pwd.  In the normal (non-empty)
path only the mount component is set, and dput(NULL) is a no-op so
path_put remains correct there as well.

Link: https://patch.msgid.link/20260306-work-empty-mntns-consolidated-v1-1-6eb30529bbb0@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>mount: add FSMOUNT_NAMESPACE</title>
<updated>2026-03-12T12:33:54+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-01-22T10:48:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e8969bd192712419aae511dd5ba26855c2c78db'/>
<id>urn:sha1:5e8969bd192712419aae511dd5ba26855c2c78db</id>
<content type='text'>
Add FSMOUNT_NAMESPACE flag to fsmount() that creates a new mount
namespace with the newly created filesystem attached to a copy of the
real rootfs. This returns a namespace file descriptor instead of an
O_PATH mount fd, similar to how OPEN_TREE_NAMESPACE works for open_tree().

This allows creating a new filesystem and immediately placing it in a
new mount namespace in a single operation, which is useful for container
runtimes and other namespace-based isolation mechanisms.

The rootfs mount is created before copying the real rootfs for the new
namespace, meaning that the id of the mount at the root of the namespace
is bigger than that of the child mounted on top of it. We've
never explicitly given the guarantee for such ordering and I doubt
anyone relies on it. Accepting that lets us avoid copying the mount
again and also avoids having to massage may_copy_tree() to grant an
exception for fsmount-&gt;mnt-&gt;mnt_ns being NULL.

Link: https://patch.msgid.link/20260122-work-fsmount-namespace-v1-3-5ef0a886e646@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>mount: simplify __do_loopback()</title>
<updated>2026-03-12T12:33:53+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-01-22T10:48:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ad4a3599e58d5ac0caa3f576c48a4b62f38d400d'/>
<id>urn:sha1:ad4a3599e58d5ac0caa3f576c48a4b62f38d400d</id>
<content type='text'>
Remove the OPEN_TREE_NAMESPACE flag checking from __do_loopback() and
instead have callers pass CL_COPY_MNT_NS_FILE directly in copy_flags.

Link: https://patch.msgid.link/20260122-work-fsmount-namespace-v1-2-5ef0a886e646@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>mount: start iterating from start of rbtree</title>
<updated>2026-03-12T12:33:50+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-01-22T10:48:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c0b046392b5b6e2402bf75215ab0c3a242d7af5'/>
<id>urn:sha1:0c0b046392b5b6e2402bf75215ab0c3a242d7af5</id>
<content type='text'>
If the root of the namespace has an id that's greater than the child's,
we'd not find the child. Handle that case by starting the iteration from
the start of the rbtree.

Link: https://patch.msgid.link/20260122-work-fsmount-namespace-v1-1-5ef0a886e646@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2026-02-25T18:34:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-25T18:34:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0e335a7745b0a3e0421d6b4fff718c0deeb130ee'/>
<id>urn:sha1:0e335a7745b0a3e0421d6b4fff718c0deeb130ee</id>
<content type='text'>
Pull vfs fixes from Christian Brauner:

 - Fix an uninitialized variable in file_getattr().

   The flags_valid field wasn't initialized before calling
   vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse

 - Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is
   not enabled.

   sysctl_hung_task_timeout_secs is 0 in that case causing spurious
   "waiting for writeback completion for more than 1 seconds" warnings

 - Fix a null-ptr-deref in do_statmount() when the mount is internal

 - Add missing kernel-doc description for the @private parameter in
   iomap_readahead()

 - Fix mount namespace creation to hold namespace_sem across the mount
   copy in create_new_namespace().

   The previous drop-and-reacquire pattern was fragile and failed to
   clean up mount propagation links if the real rootfs was a shared or
   dependent mount

 - Fix /proc mount iteration where m-&gt;index wasn't updated when
   m-&gt;show() overflows, causing a restart to repeatedly show the same
   mount entry in a rapidly expanding mount table

 - Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the
   inode number is out of range

 - Fix unshare(2) when CLONE_NEWNS is set and current-&gt;fs isn't shared.

   copy_mnt_ns() received the live fs_struct so if a subsequent
   namespace creation failed the rollback would leave pwd and root
   pointing to detached mounts. Always allocate a new fs_struct when
   CLONE_NEWNS is requested

 - fserror bug fixes:

    - Remove the unused fsnotify_sb_error() helper now that all callers
      have been converted to fserror_report_metadata

    - Fix a lockdep splat in fserror_report() where igrab() takes
      inode::i_lock which can be held in IRQ context.

      Replace igrab() with a direct i_count bump since filesystems
      should not report inodes that are about to be freed or not yet
      exposed

 - Handle error pointer in procfs for try_lookup_noperm()

 - Fix an integer overflow in ep_loop_check_proc() where recursive calls
   returning INT_MAX would overflow when +1 is added, breaking the
   recursion depth check

 - Fix a misleading break in pidfs

* tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: avoid misleading break
  eventpoll: Fix integer overflow in ep_loop_check_proc()
  proc: Fix pointer error dereference
  fserror: fix lockdep complaint when igrabbing inode
  fsnotify: drop unused helper
  unshare: fix unshare_fs() handling
  minix: Correct errno in minix_new_inode
  namespace: fix proc mount iteration
  mount: hold namespace_sem across copy in create_new_namespace()
  iomap: Describe @private in iomap_readahead()
  statmount: Fix the null-ptr-deref in do_statmount()
  writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK
  fs: init flags_valid before calling vfs_fileattr_get
</content>
</entry>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>urn:sha1:bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
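
For readers who want to experiment with the substitution, it can be
approximated in Python. This is a sketch, not the script that was run:
the function name drop_gfp_kernel() and the sample lines are made up,
and Python's re syntax differs slightly from sed's basic regular
expressions.

```python
import re

# Rough Python equivalent of the sed expression
#   s/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/
# applied to one line at a time, as sed does.
_PATTERN = re.compile(r"(alloc_objs*\(.*), GFP_KERNEL\)")


def drop_gfp_kernel(line):
    """Drop a trailing ', GFP_KERNEL' argument from an alloc_obj-style call."""
    return _PATTERN.sub(r"\1)", line, count=1)


# Hypothetical single-line call: the argument is dropped.
print(drop_gfp_kernel("ns = kzalloc_obj(*ns, GFP_KERNEL);"))

# Hypothetical call split over two lines: neither line matches on its
# own, so both are left unchanged.
for part in ("ns = kzalloc_obj(*ns,", "                GFP_KERNEL);"):
    print(drop_gfp_kernel(part))
```

The second case shows why calls spread over multiple lines are not
picked up by a line-based substitution.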

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>urn:sha1:69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
</feed>
