<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/nsfs.c, branch v7.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-11T13:59:14+00:00</updated>
<entry>
<title>nsfs: fix wrong error code returned for pidns ioctls</title>
<updated>2026-05-11T13:59:14+00:00</updated>
<author>
<name>Zhihao Cheng</name>
<email>chengzhihao1@huawei.com</email>
</author>
<published>2026-05-07T11:23:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=725ecd80688bf3c57ca9205431f2c06174ff0756'/>
<id>urn:sha1:725ecd80688bf3c57ca9205431f2c06174ff0756</id>
<content type='text'>
When executing NS_GET_PID_FROM_PIDNS (or similar pidns ioctls), if the
target task cannot be found in the corresponding pid_ns, the error code
should be ESRCH instead of ENOTTY.

This bug was introduced when the extensible ioctl handling was added.
Without proper return, ret would be overwritten by the default case in
the extensible ioctl switch statement.

Fixes: a1d220d9dafa8 ("nsfs: iterate through mount namespaces")
Signed-off-by: Zhihao Cheng &lt;chengzhihao1@huawei.com&gt;
Link: https://patch.msgid.link/20260507112301.1042757-1-chengzhihao1@huawei.com
Reviewed-by: Yang Erkun &lt;yangerkun@huawei.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2026-04-13T19:19:01+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-13T19:19:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b7d74ea0fdaa8d641fe6f18507c5f0d21b652d53'/>
<id>urn:sha1:b7d74ea0fdaa8d641fe6f18507c5f0d21b652d53</id>
<content type='text'>
Pull vfs i_ino updates from Christian Brauner:
 "For historical reasons, the inode-&gt;i_ino field is an unsigned long,
  which means that it's 32 bits on 32 bit architectures. This has caused
  a number of filesystems to implement hacks to hash a 64-bit identifier
  into a 32-bit field, and deprives us of a universal identifier field
  for an inode.

  This changes the inode-&gt;i_ino field from an unsigned long to a u64.
  This shouldn't make any material difference on 64-bit hosts, but
  32-bit hosts will see struct inode grow by at least 4 bytes. This
  could have effects on slabcache sizes and field alignment.

  The bulk of the changes are to format strings and tracepoints, since
  the kernel itself doesn't care that much about the i_ino field. The
  first patch changes some vfs function arguments, so check that one out
  carefully.

  With this change, we may be able to shrink some inode structures. For
  instance, struct nfs_inode has a fileid field that holds the 64-bit
  inode number. With this set of changes, that field could be
  eliminated. I'd rather leave that sort of cleanups for later just to
  keep this simple"

* tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group()
  EVM: add comment describing why ino field is still unsigned long
  vfs: remove externs from fs.h on functions modified by i_ino widening
  treewide: fix missed i_ino format specifier conversions
  ext4: fix signed format specifier in ext4_load_inode trace event
  treewide: change inode-&gt;i_ino from unsigned long to u64
  nilfs2: widen trace event i_ino fields to u64
  f2fs: widen trace event i_ino fields to u64
  ext4: widen trace event i_ino fields to u64
  zonefs: widen trace event i_ino fields to u64
  hugetlbfs: widen trace event i_ino fields to u64
  ext2: widen trace event i_ino fields to u64
  cachefiles: widen trace event i_ino fields to u64
  vfs: widen trace event i_ino fields to u64
  net: change sock.sk_ino and sock_i_ino() to u64
  audit: widen ino fields to u64
  vfs: widen inode hash/lookup functions to u64
</content>
</entry>
<entry>
<title>treewide: change inode-&gt;i_ino from unsigned long to u64</title>
<updated>2026-03-06T13:31:28+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2026-03-04T15:32:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b2600f81cefcdfcda58d50df7be8fd48ada8ce2'/>
<id>urn:sha1:0b2600f81cefcdfcda58d50df7be8fd48ada8ce2</id>
<content type='text'>
On 32-bit architectures, unsigned long is only 32 bits wide, which
causes 64-bit inode numbers to be silently truncated. Several
filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that
exceed 32 bits, and this truncation can lead to inode number collisions
and other subtle bugs on 32-bit systems.

Change the type of inode-&gt;i_ino from unsigned long to u64 to ensure that
inode numbers are always represented as 64-bit values regardless of
architecture. Update all format specifiers treewide from %lu/%lx to
%llu/%llx to match the new type, along with corresponding local variable
types.

This is the bulk treewide conversion. Earlier patches in this series
handled trace events separately to allow trace field reordering for
better struct packing on 32-bit.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org
Acked-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>nsfs: tighten permission checks for handle opening</title>
<updated>2026-02-27T21:00:11+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-02-26T13:50:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d2324a9317f00013facb0ba00b00440e19d2af5e'/>
<id>urn:sha1:d2324a9317f00013facb0ba00b00440e19d2af5e</id>
<content type='text'>
Even privileged services should not necessarily be able to see other
privileged service's namespaces so they can't leak information to each
other. Use may_see_all_namespaces() helper that centralizes this policy
until the nstree adapts.

Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-2-d2c2853313bd@kernel.org
Fixes: 5222470b2fbb ("nsfs: support file handles")
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: stable@kernel.org # v6.18+
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>nsfs: tighten permission checks for ns iteration ioctls</title>
<updated>2026-02-27T21:00:08+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-02-26T13:50:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e6b899f08066e744f89df16ceb782e06868bd148'/>
<id>urn:sha1:e6b899f08066e744f89df16ceb782e06868bd148</id>
<content type='text'>
Even privileged services should not necessarily be able to see other
privileged service's namespaces so they can't leak information to each
other. Use may_see_all_namespaces() helper that centralizes this policy
until the nstree adapts.

Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-1-d2c2853313bd@kernel.org
Fixes: a1d220d9dafa ("nsfs: iterate through mount namespaces")
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: stable@kernel.org # v6.12+
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>mount: add OPEN_TREE_NAMESPACE</title>
<updated>2026-01-16T18:21:40+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-12-29T13:03:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b8a0ba68246a61d903ce62c35c303b1501df28b'/>
<id>urn:sha1:9b8a0ba68246a61d903ce62c35c303b1501df28b</id>
<content type='text'>
When creating containers the setup usually involves using CLONE_NEWNS
via clone3() or unshare(). This copies the caller's complete mount
namespace. The runtime will also assemble a new rootfs and then use
pivot_root() to switch the old mount tree with the new rootfs. Afterward
it will recursively umount the old mount tree thereby getting rid of all
mounts.

On a basic system here where the mount table isn't particularly large
this still copies about 30 mounts. Copying all of these mounts only to
get rid of them later is pretty wasteful.

This is exacerbated if intermediary mount namespaces are used that only
exist for a very short amount of time and are immediately destroyed
again causing a ton of mounts to be copied and destroyed needlessly.

With a large mount table and a system where thousands or ten-thousands
of containers are spawned in parallel this quickly becomes a bottleneck
increasing contention on the semaphore.

Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to
OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of
returning a file descriptor referring to that mount tree
OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor
to a new mount namespace. In that new mount namespace the copied mount
tree has been mounted on top of a copy of the real rootfs.

The caller can setns() into that mount namespace and perform any
additionally required setup such as move_mount() detached mounts in
there.

This allows OPEN_TREE_NAMESPACE to function as a combined
unshare(CLONE_NEWNS) and pivot_root().

A caller may for example choose to create an extremely minimal rootfs:

fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);

This will create a mount namespace where "wootwoot" has become the
rootfs mounted on top of the real rootfs. The caller can now setns()
into this new mount namespace and assemble additional mounts.

This also works with user namespaces:

unshare(CLONE_NEWUSER);
fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);

which creates a new mount namespace owned by the earlier created user
namespace with "wootwoot" as the rootfs mounted on top of the real
rootfs.

Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-1-bfb24c7b061f@kernel.org
Tested-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Suggested-by: Christian Brauner &lt;brauner@kernel.org&gt;
Suggested-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2025-12-02T01:32:07+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-12-02T01:32:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1b5dd29869b1e63f7e5c37d7552e2dcf22de3c26'/>
<id>urn:sha1:1b5dd29869b1e63f7e5c37d7552e2dcf22de3c26</id>
<content type='text'>
Pull fd prepare updates from Christian Brauner:
 "This adds the FD_ADD() and FD_PREPARE() primitive. They simplify the
  common pattern of get_unused_fd_flags() + create file + fd_install()
  that is used extensively throughout the kernel and currently requires
  cumbersome cleanup paths.

  FD_ADD() - For simple cases where a file is installed immediately:

      fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device));
      if (fd &lt; 0)
          vfio_device_put_registration(device);
      return fd;

  FD_PREPARE() - For cases requiring access to the fd or file, or
  additional work before publishing:

      FD_PREPARE(fdf, O_CLOEXEC, sync_file-&gt;file);
      if (fdf.err) {
          fput(sync_file-&gt;file);
          return fdf.err;
      }

      data.fence = fd_prepare_fd(fdf);
      if (copy_to_user((void __user *)arg, &amp;data, sizeof(data)))
          return -EFAULT;

      return fd_publish(fdf);

  The primitives are centered around struct fd_prepare. FD_PREPARE()
  encapsulates all allocation and cleanup logic and must be followed by
  a call to fd_publish() which associates the fd with the file and
  installs it into the caller's fdtable. If fd_publish() isn't called,
  both are deallocated automatically. FD_ADD() is a shorthand that does
  fd_publish() immediately and never exposes the struct to the caller.

  I've implemented this in a way that it's compatible with the cleanup
  infrastructure while also being usable separately. IOW, it's centered
  around struct fd_prepare which is aliased to class_fd_prepare_t and so
  we can make use of all the basica guard infrastructure"

* tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
  io_uring: convert io_create_mock_file() to FD_PREPARE()
  file: convert replace_fd() to FD_PREPARE()
  vfio: convert vfio_group_ioctl_get_device_fd() to FD_ADD()
  tty: convert ptm_open_peer() to FD_ADD()
  ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
  media: convert media_request_alloc() to FD_PREPARE()
  hv: convert mshv_ioctl_create_partition() to FD_ADD()
  gpio: convert linehandle_create() to FD_PREPARE()
  pseries: port papr_rtas_setup_file_interface() to FD_ADD()
  pseries: convert papr_platform_dump_create_handle() to FD_ADD()
  spufs: convert spufs_gang_open() to FD_PREPARE()
  papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
  spufs: convert spufs_context_open() to FD_PREPARE()
  net/socket: convert __sys_accept4_file() to FD_ADD()
  net/socket: convert sock_map_fd() to FD_ADD()
  net/kcm: convert kcm_ioctl() to FD_PREPARE()
  net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()
  secretmem: convert memfd_secret() to FD_ADD()
  memfd: convert memfd_create() to FD_ADD()
  bpf: convert bpf_token_create() to FD_PREPARE()
  ...
</content>
</entry>
<entry>
<title>nsfs: convert ns_ioctl() to FD_PREPARE()</title>
<updated>2025-11-28T11:42:32+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-11-23T16:33:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d8aefd49aed34d2a3acfdcb10a0af0d0fe94a7c'/>
<id>urn:sha1:3d8aefd49aed34d2a3acfdcb10a0af0d0fe94a7c</id>
<content type='text'>
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-10-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>nsfs: convert open_namespace() to FD_PREPARE()</title>
<updated>2025-11-28T11:42:32+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-11-23T16:33:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=00de6e244807aa8e9c00cb6ad2976c429058d4ac'/>
<id>urn:sha1:00de6e244807aa8e9c00cb6ad2976c429058d4ac</id>
<content type='text'>
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-9-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>ns: handle setns(pidfd, ...) cleanly</title>
<updated>2025-11-10T09:20:54+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-11-09T21:11:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f8d5a8970d2f49411824fb1fdd34bbb3eea22756'/>
<id>urn:sha1:f8d5a8970d2f49411824fb1fdd34bbb3eea22756</id>
<content type='text'>
The setns() system call supports:

(1) namespace file descriptors (nsfd)
(2) process file descriptors (pidfd)

When using nsfds the namespaces will remain active because they are
pinned by the vfs. However, when pidfds are used things are more
complicated.

When the target task exits and passes through exit_nsproxy_namespaces()
or is reaped and thus also passes through exit_cred_namespaces() after
the setns()'ing task has called prepare_nsset() but before the active
reference count of the set of namespaces it wants to setns() to might
have been dropped already:

  P1                                                              P2

  pid_p1 = clone(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS)
                                                                  pidfd = pidfd_open(pid_p1)
                                                                  setns(pidfd, CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS)
                                                                  prepare_nsset()

  exit(0)
  // ns-&gt;__ns_active_ref        == 1
  // parent_ns-&gt;__ns_active_ref == 1
  -&gt; exit_nsproxy_namespaces()
  -&gt; exit_cred_namespaces()

  // ns_active_ref_put() will also put
  // the reference on the owner of the
  // namespace. If the only reason the
  // owning namespace was alive was
  // because it was a parent of @ns
  // it's active reference count now goes
  // to zero... --------------------------------
  //                                           |
  // ns-&gt;__ns_active_ref        == 0           |
  // parent_ns-&gt;__ns_active_ref == 0           |
                                               |                  commit_nsset()
                                               -----------------&gt; // If setns()
                                                                  // now manages to install the namespaces
                                                                  // it will call ns_active_ref_get()
                                                                  // on them thus bumping the active reference
                                                                  // count from zero again but without also
                                                                  // taking the required reference on the owner.
                                                                  // Thus we get:
                                                                  //
                                                                  // ns-&gt;__ns_active_ref        == 1
                                                                  // parent_ns-&gt;__ns_active_ref == 0

  When later someone does ns_active_ref_put() on @ns it will underflow
  parent_ns-&gt;__ns_active_ref leading to a splat from our asserts
  thinking there are still active references when in fact the counter
  just underflowed.

So resurrect the ownership chain if necessary as well. If the caller
succeeded to grab passive references to the set of namespaces the
setns() should simply succeed even if the target task exists or gets
reaped in the meantime and thus has dropped all active references to its
namespaces.

The race is rare and can only be triggered when using pidfs to setns()
to namespaces. Also note that active reference on initial namespaces are
nops.

Since we now always handle parent references directly we can drop
ns_ref_active_get_owner() when adding a namespace to a namespace tree.
This is now all handled uniformly in the places where the new namespaces
actually become active.

Link: https://patch.msgid.link/20251109-namespace-6-19-fixes-v1-5-ae8a4ad5a3b3@kernel.org
Fixes: 3c9820d5c64a ("ns: add active reference count")
Reported-by: syzbot+1957b26299cf3ff7890c@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
</feed>
