<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/nsfs.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-25T10:08:42+00:00</updated>
<entry>
<title>nsfs: tighten permission checks for ns iteration ioctls</title>
<updated>2026-03-25T10:08:42+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-03-17T23:37:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3376b345df155ca36d8611857b41ff7d5183fc38'/>
<id>urn:sha1:3376b345df155ca36d8611857b41ff7d5183fc38</id>
<content type='text'>
[ Upstream commit e6b899f08066e744f89df16ceb782e06868bd148 ]

Even privileged services should not necessarily be able to see other
privileged service's namespaces so they can't leak information to each
other. Use may_see_all_namespaces() helper that centralizes this policy
until the nstree adapts.

Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-1-d2c2853313bd@kernel.org
Fixes: a1d220d9dafa ("nsfs: iterate through mount namespaces")
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: stable@kernel.org # v6.12+
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
[ Different file names ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>[tree-wide] finally take no_llseek out</title>
<updated>2024-09-27T15:18:43+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-09-27T01:56:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cb787f4ac0c2e439ea8d7e6387b925f74576bdf8'/>
<id>urn:sha1:cb787f4ac0c2e439ea8d7e6387b925f74576bdf8</id>
<content type='text'>
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\&lt;no_llseek\&gt;/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>nsfs: iterate through mount namespaces</title>
<updated>2024-08-09T10:46:59+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-07-19T11:41:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a1d220d9dafa8d76ba60a784a1016c3134e6a1e8'/>
<id>urn:sha1:a1d220d9dafa8d76ba60a784a1016c3134e6a1e8</id>
<content type='text'>
It is already possible to list mounts in other mount namespaces and to
retrieve namespace file descriptors without having to go through procfs
by deriving them from pidfds.

Augment these abilities by adding the ability to retrieve information
about a mount namespace via NS_MNT_GET_INFO. This will return the mount
namespace id and the number of mounts currently in the mount namespace.
The number of mounts can be used to size the buffer that needs to be
used for listmount() and is in general useful without having to actually
iterate through all the mounts. The structure is extensible.

And add the ability to iterate through all mount namespaces over which
the caller holds privilege returning the file descriptor for the next or
previous mount namespace.

To retrieve a mount namespace the caller must be privileged wrt to it's
owning user namespace. This means that PID 1 on the host can list all
mounts in all mount namespaces or that a container can list all mounts
of its nested containers.

Optionally pass a structure for NS_MNT_GET_INFO with
NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount namespace
in one go. Both ioctls can be implemented for other namespace types
easily.

Together with recent api additions this means one can iterate through
all mounts in all mount namespaces without ever touching procfs.

Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-5-834113cab0d2@kernel.org
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>nsfs: use cleanup guard</title>
<updated>2024-07-18T07:50:08+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-07-16T07:19:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=280e36f0d5b997173d014c07484c03a7f7750668'/>
<id>urn:sha1:280e36f0d5b997173d014c07484c03a7f7750668</id>
<content type='text'>
Ensure that rcu read lock is given up before returning.

Link: https://lore.kernel.org/r/20240716-elixier-fliesen-1ab342151a61@brauner
Fixes: ca567df74a28 ("nsfs: add pid translation ioctls")
Reported-by: syzbot+a3e82ae343b26b4d2335@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-07-15T19:34:01+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-07-15T19:34:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=98f3a9a4fd449641010c77abca16aebb0b8d4419'/>
<id>urn:sha1:98f3a9a4fd449641010c77abca16aebb0b8d4419</id>
<content type='text'>
Pull pidfs updates from Christian Brauner:
 "This contains work to make it possible to derive namespace file
  descriptors from pidfd file descriptors.

  Right now it is already possible to use a pidfd with setns() to
  atomically change multiple namespaces at the same time. In other
  words, it is possible to switch to the namespace context of a process
  using a pidfd. There is no need to first open namespace file
  descriptors via procfs.

  The work included here is an extension of these abilities by allowing
  to open namespace file descriptors using a pidfd. This means it is now
  possible to interact with namespaces without ever touching procfs.

  To this end a new set of ioctls() on pidfds is introduced covering all
  supported namespace types"

* tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: allow retrieval of namespace file descriptors
  nsfs: add open_namespace()
  nsproxy: add helper to go from arbitrary namespace to ns_common
  nsproxy: add a cleanup helper for nsproxy
  file: add take_fd() cleanup helper
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-07-15T19:27:39+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-07-15T19:27:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1b074abe885f43b2c207b5e748ffa60604dbc020'/>
<id>urn:sha1:1b074abe885f43b2c207b5e748ffa60604dbc020</id>
<content type='text'>
Pull namespace-fs updates from Christian Brauner:
 "This adds ioctls allowing to translate PIDs between PID namespaces.

  The motivating use-case comes from LXCFS which is a tiny fuse
  filesystem used to virtualize various aspects of procfs. LXCFS is run
  on the host. The files and directories it creates can be bind-mounted
  by e.g. a container at startup and mounted over the various procfs
  files the container wishes to have virtualized.

  When e.g. a read request for uptime is received, LXCFS will receive
  the pid of the reader. In order to virtualize the corresponding read,
  LXCFS needs to know the pid of the init process of the reader's pid
  namespace.

  In order to do this, LXCFS first needs to fork() two helper processes.
  The first helper process setns() to the readers pid namespace. The
  second helper process is needed to create a process that is a proper
  member of the pid namespace.

  The second helper process then creates a ucred message with ucred.pid
  set to 1 and sends it back to LXCFS. The kernel will translate the
  ucred.pid field to the corresponding pid number in LXCFS's pid
  namespace. This way LXCFS can learn the init pid number of the
  reader's pid namespace and can go on to virtualize.

  Since these two forks() are costly LXCFS maintains an init pid cache
  that caches a given pid for a fixed amount of time. The cache is
  pruned during new read requests. However, even with the cache the hit
  of the two forks() is singificant when a very large number of
  containers are running.

  So this adds a simple set of ioctls that let's a caller translate PIDs
  from and into a given PID namespace. This significantly improves
  performance with a very simple change.

  To protect against races pidfds can be used to check whether the
  process is still valid"

* tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nsfs: add pid translation ioctls
</content>
</entry>
<entry>
<title>nsfs: add open_namespace()</title>
<updated>2024-06-28T08:37:29+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-06-27T14:11:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=460695a294e63de4e4f30ad1679ffddeec932bb8'/>
<id>urn:sha1:460695a294e63de4e4f30ad1679ffddeec932bb8</id>
<content type='text'>
and call it from open_related_ns().

Link: https://lore.kernel.org/r/20240627-work-pidfs-v1-3-7e9ab6cc3bb1@kernel.org
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Alexander Mikhalitsyn &lt;aleksandr.mikhalitsyn@canonical.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: add an ioctl to get the mnt ns id from nsfs</title>
<updated>2024-06-28T07:53:31+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2024-06-24T15:49:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e8e43a1fcc5c07575f37e40f8a2cd78aee46f9a0'/>
<id>urn:sha1:e8e43a1fcc5c07575f37e40f8a2cd78aee46f9a0</id>
<content type='text'>
In order to utilize the listmount() and statmount() extensions that
allow us to call them on different namespaces we need a way to get the
mnt namespace id from user space.  Add an ioctl to nsfs that will allow
us to extract the mnt namespace id in order to make these new extensions
usable.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Link: https://lore.kernel.org/r/180449959d5a756af7306d6bda55f41b9d53e3cb.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>nsfs: add pid translation ioctls</title>
<updated>2024-06-25T21:00:41+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2020-06-07T20:47:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca567df74a28a9fb368c6b2d93e864113f73f5c2'/>
<id>urn:sha1:ca567df74a28a9fb368c6b2d93e864113f73f5c2</id>
<content type='text'>
Add ioctl()s to translate pids between pid namespaces.

LXCFS is a tiny fuse filesystem used to virtualize various aspects of
procfs. LXCFS is run on the host. The files and directories it creates
can be bind-mounted by e.g. a container at startup and mounted over the
various procfs files the container wishes to have virtualized. When e.g.
a read request for uptime is received, LXCFS will receive the pid of the
reader. In order to virtualize the corresponding read, LXCFS needs to
know the pid of the init process of the reader's pid namespace. In order
to do this, LXCFS first needs to fork() two helper processes. The first
helper process setns() to the readers pid namespace. The second helper
process is needed to create a process that is a proper member of the pid
namespace. The second helper process then creates a ucred message with
ucred.pid set to 1 and sends it back to LXCFS. The kernel will translate
the ucred.pid field to the corresponding pid number in LXCFS's pid
namespace. This way LXCFS can learn the init pid number of the reader's
pid namespace and can go on to virtualize. Since these two forks() are
costly LXCFS maintains an init pid cache that caches a given pid for a
fixed amount of time. The cache is pruned during new read requests.
However, even with the cache the hit of the two forks() is singificant
when a very large number of containers are running. With this simple
patch we add an ns ioctl that let's a caller retrieve the init pid nr of
a pid namespace through its pid namespace fd. This significantly
improves performance with a very simple change.

Support translation of pids and tgids. Other concepts can be added but
there are no obvious users for this right now.

To protect against races pidfds can be used to check whether the process
is still valid. If needed, this can also be extended to work on pidfds
directly.

Link: https://lore.kernel.org/r/20240619-work-ns_ioctl-v1-1-7c0097e6bb6b@kernel.org
Reviewed-by: Alexander Mikhalitsyn &lt;aleksandr.mikhalitsyn@canonical.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>pidfs: remove config option</title>
<updated>2024-03-13T19:53:53+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-03-12T09:39:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d9539db8638cfe053fcd1f441746f0e2c8c2d32'/>
<id>urn:sha1:9d9539db8638cfe053fcd1f441746f0e2c8c2d32</id>
<content type='text'>
As Linus suggested this enables pidfs unconditionally. A key property to
retain is the ability to compare pidfds by inode number (cf. [1]).
That's extremely helpful just as comparing namespace file descriptors by
inode number is. They are used in a variety of scenarios where they need
to be compared, e.g., when receiving a pidfd via SO_PEERPIDFD from a
socket to trivially authenticate a the sender and various other
use-cases.

For 64bit systems this is pretty trivial to do. For 32bit it's slightly
more annoying as we discussed but we simply add a dumb ida based
allocator that gets used on 32bit. This gives the same guarantees about
inode numbers on 64bit without any overflow risk. Practically, we'll
never run into overflow issues because we're constrained by the number
of processes that can exist on 32bit and by the number of open files
that can exist on a 32bit system. On 64bit none of this matters and
things are very simple.

If 32bit also needs the uniqueness guarantee they can simply parse the
contents of /proc/&lt;pid&gt;/fd/&lt;nr&gt;. The uniqueness guarantees have a
variety of use-cases. One of the most obvious ones is that they will
make pidfiles (or "pidfdfiles", I guess) reliable as the unique
identifier can be placed into there that won't be reycled. Also a
frequent request.

Note, I took the chance and simplified path_from_stashed() even further.
Instead of passing the inode number explicitly to path_from_stashed() we
let the filesystem handle that internally. So path_from_stashed() ends
up even simpler than it is now. This is also a good solution allowing
the cleanup code to be clean and consistent between 32bit and 64bit. The
cleanup path in prepare_anon_dentry() is also switched around so we put
the inode before the dentry allocation. This means we only have to call
the cleanup handler for the filesystem's inode data once and can rely
-&gt;evict_inode() otherwise.

Aside from having to have a bit of extra code for 32bit it actually ends
up a nice cleanup for path_from_stashed() imho.

Tested on both 32 and 64bit including error injection.

Link: https://github.com/systemd/systemd/pull/31713 [1]
Link: https://lore.kernel.org/r/20240312-dingo-sehnlich-b3ecc35c6de7@brauner
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
