<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/proc_namespace.c, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-05-24T14:42:16+00:00</updated>
<entry>
<title>tty, proc, kernfs, random: Use copy_splice_read()</title>
<updated>2023-05-24T14:42:16+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2023-05-22T13:49:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b0072734ffaa3f5fec64058d0d3333765d789bc0'/>
<id>urn:sha1:b0072734ffaa3f5fec64058d0d3333765d789bc0</id>
<content type='text'>
Use copy_splice_read() for tty, procfs, kernfs and random files rather
than going through generic_file_splice_read() as they just copy the file
into the output buffer and don't splice pages.  This avoids the need for
them to have a -&gt;read_folio() to satisfy filemap_splice_read().

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
cc: Christoph Hellwig &lt;hch@lst.de&gt;
cc: Jens Axboe &lt;axboe@kernel.dk&gt;
cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
cc: David Hildenbrand &lt;david@redhat.com&gt;
cc: Matthew Wilcox &lt;willy@infradead.org&gt;
cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>vfs: escape hash as well</title>
<updated>2022-06-28T17:58:05+00:00</updated>
<author>
<name>Siddhesh Poyarekar</name>
<email>siddhesh@gotplt.org</email>
</author>
<published>2022-06-28T00:30:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ed5fce76b5ea40c87b44cafbe4f3222da8ec981a'/>
<id>urn:sha1:ed5fce76b5ea40c87b44cafbe4f3222da8ec981a</id>
<content type='text'>
When a filesystem is mounted with a name that starts with a #:

 # mount '#name' /mnt/bad -t tmpfs

this will cause the entry to look like this (leading space added so
that git does not strip it out):

 #name /mnt/bad tmpfs rw,seclabel,relatime,inode64 0 0

This breaks getmntent and any code that aims to parse fstab as well as
/proc/mounts with the same logic since they need to strip leading spaces
or skip over comment lines, due to which they report incorrect output or
skip over the line respectively.

Solve this by translating the hash character into its octal encoding
equivalent so that applications can decode the name correctly.

Signed-off-by: Siddhesh Poyarekar &lt;siddhesh@gotplt.org&gt;
Signed-off-by: Ian Kent &lt;raven@themaw.net&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fs: add is_idmapped_mnt() helper</title>
<updated>2021-12-03T17:44:06+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-12-03T11:16:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bb49e9e730c2906a958eee273a7819f401543d6c'/>
<id>urn:sha1:bb49e9e730c2906a958eee273a7819f401543d6c</id>
<content type='text'>
Multiple places open-code the same check to determine whether a given
mount is idmapped. Introduce a simple helper function that can be used
instead. This allows us to get rid of the fragile open-coding. We will
later change the check that is used to determine whether a given mount
is idmapped. Introducing a helper allows us to do this in a single
place instead of doing it for multiple places.

Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1)
Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2)
Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org
Cc: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>fs: introduce MOUNT_ATTR_IDMAP</title>
<updated>2021-01-24T13:43:45+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-01-21T13:19:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9caccd41541a6f7d6279928d9f971f6642c361af'/>
<id>urn:sha1:9caccd41541a6f7d6279928d9f971f6642c361af</id>
<content type='text'>
Introduce a new mount bind mount property to allow idmapping mounts. The
MOUNT_ATTR_IDMAP flag can be set via the new mount_setattr() syscall
together with a file descriptor referring to a user namespace.

The user namespace referenced by the namespace file descriptor will be
attached to the bind mount. All interactions with the filesystem going
through that mount will be mapped according to the mapping specified in
the user namespace attached to it.

Using user namespaces to mark mounts means we can reuse all the existing
infrastructure in the kernel that already exists to handle idmappings
and can also use this for permission checking to allow unprivileged user
to create idmapped mounts in the future.

Idmapping a mount is decoupled from the caller's user and mount
namespace. This means idmapped mounts can be created in the initial
user namespace which is an important use-case for systemd-homed,
portable usb-sticks between systems, sharing data between the initial
user namespace and unprivileged containers, and other use-cases that
have been brought up. For example, assume a home directory where all
files are owned by uid and gid 1000 and the home directory is brought to
a new laptop where the user has id 12345. The system administrator can
simply create a mount of this home directory with a mapping of
1000:12345:1 and other mappings to indicate the ids should be kept.
(With this it is e.g. also possible to create idmapped mounts on the
host with an identity mapping 1:1:100000 where the root user is not
mapped. A user with root access that e.g. has been pivot rooted into
such a mount on the host will be not be able to execute, read, write, or
create files as root.)

Given that mapping a mount is decoupled from the caller's user namespace
a sufficiently privileged process such as a container manager can set up
an idmapped mount for the container and the container can simply pivot
root to it. There's no need for the container to do anything. The mount
will appear correctly mapped independent of the user namespace the
container uses. This means we don't need to mark a mount as idmappable.

In order to create an idmapped mount the caller must currently be
privileged in the user namespace of the superblock the mount belongs to.
Once a mount has been idmapped we don't allow it to change its mapping.
This keeps permission checking and life-cycle management simple. Users
wanting to change the idmapped can always create a new detached mount
with a different idmapping.

Link: https://lore.kernel.org/r/20210121131959.646623-36-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Mauricio Vásquez Bernal &lt;mauricio@kinvolk.io&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>proc mountinfo: make splice available again</title>
<updated>2020-12-27T20:00:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-12-27T18:56:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=14e3e989f6a5d9646b6cf60690499cc8bdc11f7d'/>
<id>urn:sha1:14e3e989f6a5d9646b6cf60690499cc8bdc11f7d</id>
<content type='text'>
Since commit 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops") we've required that file operation structures explicitly
enable splice support, rather than falling back to the default handlers.

Most /proc files use the indirect 'struct proc_ops' to describe their
file operations, and were fixed up to support splice earlier in commits
40be821d627c..b24c30c67863, but the mountinfo files interact with the
VFS directly using their own 'struct file_operations' and got missed as
a result.

This adds the necessary support for splice to work for /proc/*/mountinfo
and friends.

Reported-by: Joan Bruguera Micó &lt;joanbrugueram@gmail.com&gt;
Reported-by: Jussi Kivilinna &lt;jussi.kivilinna@iki.fi&gt;
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209971
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Add a "nosymfollow" mount option.</title>
<updated>2020-08-27T20:06:47+00:00</updated>
<author>
<name>Mattias Nissler</name>
<email>mnissler@chromium.org</email>
</author>
<published>2020-08-27T17:09:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dab741e0e02bd3c4f5e2e97be74b39df2523fc6e'/>
<id>urn:sha1:dab741e0e02bd3c4f5e2e97be74b39df2523fc6e</id>
<content type='text'>
For mounts that have the new "nosymfollow" option, don't follow symlinks
when resolving paths. The new option is similar in spirit to the
existing "nodev", "noexec", and "nosuid" options, as well as to the
LOOKUP_NO_SYMLINKS resolve flag in the openat2(2) syscall. Various BSD
variants have been supporting the "nosymfollow" mount option for a long
time with equivalent implementations.

Note that symlinks may still be created on file systems mounted with
the "nosymfollow" option present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly.

Setting the "nosymfollow" mount option helps prevent privileged
writers from modifying files unintentionally in case there is an
unexpected link along the accessed path. The "nosymfollow" option is
thus useful as a defensive measure for systems that need to deal with
untrusted file systems in privileged contexts.

More information on the history and motivation for this patch can be
found here:

https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/hardening-against-malicious-stateful-data#TOC-Restricting-symlink-traversal

Signed-off-by: Mattias Nissler &lt;mnissler@chromium.org&gt;
Signed-off-by: Ross Zwisler &lt;zwisler@google.com&gt;
Reviewed-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2020-06-04T20:54:34+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-04T20:54:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9ff7258575d5fee011649d20cc56de720a395191'/>
<id>urn:sha1:9ff7258575d5fee011649d20cc56de720a395191</id>
<content type='text'>
Pull proc updates from Eric Biederman:
 "This has four sets of changes:

   - modernize proc to support multiple private instances

   - ensure we see the exit of each process tid exactly

   - remove has_group_leader_pid

   - use pids not tasks in posix-cpu-timers lookup

  Alexey updated proc so each mount of proc uses a new superblock. This
  allows people to actually use mount options with proc with no fear of
  messing up another mount of proc. Given the kernel's internal mounts
  of proc for things like uml this was a real problem, and resulted in
  Android's hidepid mount options being ignored and introducing security
  issues.

  The rest of the changes are small cleanups and fixes that came out of
  my work to allow this change to proc. In essence it is swapping the
  pids in de_thread during exec which removes a special case the code
  had to handle. Then updating the code to stop handling that special
  case"

* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: proc_pid_ns takes super_block as an argument
  remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
  posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
  posix-cpu-timers: Extend rcu_read_lock removing task_struct references
  signal: Remove has_group_leader_pid
  exec: Remove BUG_ON(has_group_leader_pid)
  posix-cpu-timer:  Unify the now redundant code in lookup_task
  posix-cpu-timer: Tidy up group_leader logic in lookup_task
  proc: Ensure we see the exit of each process tid exactly once
  rculist: Add hlists_swap_heads_rcu
  proc: Use PIDTYPE_TGID in next_tgid
  Use proc_pid_ns() to get pid_namespace from the proc superblock
  proc: use named enums for better readability
  proc: use human-readable values for hidepid
  docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
  proc: add option to mount only a pids subset
  proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
  proc: allow to mount many instances of proc in one pid namespace
  proc: rename struct proc_fs_info to proc_fs_opts
</content>
</entry>
<entry>
<title>proc/mounts: add cursor</title>
<updated>2020-05-14T14:44:24+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2020-05-14T14:44:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f6c61f96f2d97cbb5f7fa85607bc398f843ff0f'/>
<id>urn:sha1:9f6c61f96f2d97cbb5f7fa85607bc398f843ff0f</id>
<content type='text'>
If mounts are deleted after a read(2) call on /proc/self/mounts (or its
kin), the subsequent read(2) could miss a mount that comes after the
deleted one in the list.  This is because the file position is interpreted
as the number mount entries from the start of the list.

E.g. first read gets entries #0 to #9; the seq file index will be 10.  Then
entry #5 is deleted, resulting in #10 becoming #9 and #11 becoming #10,
etc...  The next read will continue from entry #10, and #9 is missed.

Solve this by adding a cursor entry for each open instance.  Taking the
global namespace_sem for write seems excessive, since we are only dealing
with a per-namespace list.  Instead add a per-namespace spinlock and use
that together with namespace_sem taken for read to protect against
concurrent modification of the mount list.  This may reduce parallelism of
is_local_mountpoint(), but it's hardly a big contention point.  We could
also use RCU freeing of cursors to make traversal not need additional
locks, if that turns out to be neceesary.

Only move the cursor once for each read (cursor is not added on open) to
minimize cacheline invalidation.  When EOF is reached, the cursor is taken
off the list, in order to prevent an excessive number of cursors due to
inactive open file descriptors.

Reported-by: Karel Zak &lt;kzak@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>proc: rename struct proc_fs_info to proc_fs_opts</title>
<updated>2020-04-22T15:51:21+00:00</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1e88c420190be8f888015d9b357f16a85c81e865'/>
<id>urn:sha1:1e88c420190be8f888015d9b357f16a85c81e865</id>
<content type='text'>
Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>vfs: subtype handling moved to fuse</title>
<updated>2019-09-06T19:28:49+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2019-03-25T16:38:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c7eb6869632a5d33b41d0a00d683b8395392b7ee'/>
<id>urn:sha1:c7eb6869632a5d33b41d0a00d683b8395392b7ee</id>
<content type='text'>
The unused vfs code can be removed.  Don't pass empty subtype (same as if
-&gt;parse callback isn't called).

The bits that are left involve determining whether it's permitted to split the
filesystem type string passed in to mount(2).  Consequently, this means that we
cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
with a dot in it always indicates a subtype specification.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
</feed>
