kernel/linux.git/fs/proc_namespace.c, branch v6.18.21

->mnt_devname is never NULL

2025-05-23T12:20:44+00:00

Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()" back in 2018. Get rid of the dead checks... Signed-off-by: Al Viro Link: https://lore.kernel.org/20250421033509.GV2023217@ZenIV Signed-off-by: Christian Brauner

fs: rename show_mnt_opts -> show_vfsmnt_opts

2024-06-28T12:36:43+00:00

This name is more consistent with what the helper does, which is to just show the vfsmount options. Signed-off-by: Josef Bacik Link: https://lore.kernel.org/r/fb363c62ffbf78a18095d596a19b8412aa991251.1719257716.git.josef@toxicpanda.com Reviewed-by: Jeff Layton Signed-off-by: Christian Brauner

namespace: extract show_path() helper

2023-11-18T13:56:16+00:00

To be used by the statmount(2) syscall as well. Signed-off-by: Miklos Szeredi Link: https://lore.kernel.org/r/20231025140205.3586473-4-mszeredi@redhat.com Reviewed-by: Ian Kent Signed-off-by: Christian Brauner

mounts: keep list of mounts in an rbtree

2023-11-18T13:56:16+00:00

When adding a mount to a namespace insert it into an rbtree rooted in the mnt_namespace instead of a linear list. The mnt.mnt_list is still used to set up the mount tree and for propagation, but not after the mount has been added to a namespace. Hence mnt_list can live in union with rb_node. Use MNT_ONRB mount flag to validate that the mount is on the correct list. This allows removing the cursor used for reading /proc/$PID/mountinfo. The mnt_id_unique of the next mount can be used as an index into the seq file. Tested by inserting 100k bind mounts, unsharing the mount namespace, and unmounting. No performance regressions have been observed. For the last mount in the 100k list the statmount() call was more than 100x faster due to the mount ID lookup not having to do a linear search. This patch makes the overhead of mount ID lookup non-observable in this range. Signed-off-by: Miklos Szeredi Link: https://lore.kernel.org/r/20231025140205.3586473-3-mszeredi@redhat.com Reviewed-by: Ian Kent Signed-off-by: Christian Brauner

tty, proc, kernfs, random: Use copy_splice_read()

2023-05-24T14:42:16+00:00

Use copy_splice_read() for tty, procfs, kernfs and random files rather than going through generic_file_splice_read() as they just copy the file into the output buffer and don't splice pages. This avoids the need for them to have a ->read_folio() to satisfy filemap_splice_read(). Signed-off-by: David Howells Acked-by: Greg Kroah-Hartman cc: Christoph Hellwig cc: Jens Axboe cc: Al Viro cc: John Hubbard cc: David Hildenbrand cc: Matthew Wilcox cc: Miklos Szeredi cc: Arnd Bergmann cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com Signed-off-by: Jens Axboe

vfs: escape hash as well

2022-06-28T17:58:05+00:00

When a filesystem is mounted with a name that starts with a #: # mount '#name' /mnt/bad -t tmpfs this will cause the entry to look like this (leading space added so that git does not strip it out): #name /mnt/bad tmpfs rw,seclabel,relatime,inode64 0 0 This breaks getmntent and any code that aims to parse fstab as well as /proc/mounts with the same logic since they need to strip leading spaces or skip over comment lines, due to which they report incorrect output or skip over the line respectively. Solve this by translating the hash character into its octal encoding equivalent so that applications can decode the name correctly. Signed-off-by: Siddhesh Poyarekar Signed-off-by: Ian Kent Signed-off-by: Al Viro

fs: add is_idmapped_mnt() helper

2021-12-03T17:44:06+00:00

Multiple places open-code the same check to determine whether a given mount is idmapped. Introduce a simple helper function that can be used instead. This allows us to get rid of the fragile open-coding. We will later change the check that is used to determine whether a given mount is idmapped. Introducing a helper allows us to do this in a single place instead of doing it for multiple places. Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner

fs: introduce MOUNT_ATTR_IDMAP

2021-01-24T13:43:45+00:00

Introduce a new mount bind mount property to allow idmapping mounts. The MOUNT_ATTR_IDMAP flag can be set via the new mount_setattr() syscall together with a file descriptor referring to a user namespace. The user namespace referenced by the namespace file descriptor will be attached to the bind mount. All interactions with the filesystem going through that mount will be mapped according to the mapping specified in the user namespace attached to it. Using user namespaces to mark mounts means we can reuse all the existing infrastructure in the kernel that already exists to handle idmappings and can also use this for permission checking to allow unprivileged user to create idmapped mounts in the future. Idmapping a mount is decoupled from the caller's user and mount namespace. This means idmapped mounts can be created in the initial user namespace which is an important use-case for systemd-homed, portable usb-sticks between systems, sharing data between the initial user namespace and unprivileged containers, and other use-cases that have been brought up. For example, assume a home directory where all files are owned by uid and gid 1000 and the home directory is brought to a new laptop where the user has id 12345. The system administrator can simply create a mount of this home directory with a mapping of 1000:12345:1 and other mappings to indicate the ids should be kept. (With this it is e.g. also possible to create idmapped mounts on the host with an identity mapping 1:1:100000 where the root user is not mapped. A user with root access that e.g. has been pivot rooted into such a mount on the host will be not be able to execute, read, write, or create files as root.) Given that mapping a mount is decoupled from the caller's user namespace a sufficiently privileged process such as a container manager can set up an idmapped mount for the container and the container can simply pivot root to it. There's no need for the container to do anything. The mount will appear correctly mapped independent of the user namespace the container uses. This means we don't need to mark a mount as idmappable. In order to create an idmapped mount the caller must currently be privileged in the user namespace of the superblock the mount belongs to. Once a mount has been idmapped we don't allow it to change its mapping. This keeps permission checking and life-cycle management simple. Users wanting to change the idmapped can always create a new detached mount with a different idmapping. Link: https://lore.kernel.org/r/20210121131959.646623-36-christian.brauner@ubuntu.com Cc: Christoph Hellwig Cc: David Howells Cc: Mauricio Vásquez Bernal Cc: Al Viro Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig Signed-off-by: Christian Brauner

proc mountinfo: make splice available again

2020-12-27T20:00:36+00:00

Since commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") we've required that file operation structures explicitly enable splice support, rather than falling back to the default handlers. Most /proc files use the indirect 'struct proc_ops' to describe their file operations, and were fixed up to support splice earlier in commits 40be821d627c..b24c30c67863, but the mountinfo files interact with the VFS directly using their own 'struct file_operations' and got missed as a result. This adds the necessary support for splice to work for /proc/*/mountinfo and friends. Reported-by: Joan Bruguera Micó Reported-by: Jussi Kivilinna Link: https://bugzilla.kernel.org/show_bug.cgi?id=209971 Cc: Greg Kroah-Hartman Cc: Christoph Hellwig Signed-off-by: Linus Torvalds

Add a "nosymfollow" mount option.

2020-08-27T20:06:47+00:00

For mounts that have the new "nosymfollow" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options, as well as to the LOOKUP_NO_SYMLINKS resolve flag in the openat2(2) syscall. Various BSD variants have been supporting the "nosymfollow" mount option for a long time with equivalent implementations. Note that symlinks may still be created on file systems mounted with the "nosymfollow" option present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Setting the "nosymfollow" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymfollow" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts. More information on the history and motivation for this patch can be found here: https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/hardening-against-malicious-stateful-data#TOC-Restricting-symlink-traversal Signed-off-by: Mattias Nissler Signed-off-by: Ross Zwisler Reviewed-by: Aleksa Sarai Signed-off-by: Al Viro