path: root/fs/namespace.c
Age | Commit message | Author | Files | Lines
2020-03-14LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()Al Viro1-2/+2
New LOOKUP flag, telling path_lookupat() to act as path_mountpointat(). IOW, traverse mounts at the final point and skip revalidation of the location where it ends up. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-27follow_automount(): get rid of dead^Wstillborn codeAl Viro1-1/+8
1) no instances of ->d_automount() have ever made use of the "return ERR_PTR(-EISDIR) if you don't feel like mounting anything" - that's a rudiment of plans that got superseded before the thing went into the tree. Despite the comment in follow_automount(), autofs has never done that. 2) if there's no ->d_automount() in dentry_operations, filesystems should not set DCACHE_NEED_AUTOMOUNT in the first place. None have ever done so... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-27fix automount/automount race properlyAl Viro1-7/+33
Protection against automount/automount races (two threads hitting the same referral point at the same time) is based upon do_add_mount() prevention of identical overmounts - trying to overmount the root of mounted tree with the same tree fails with -EBUSY. It's unreliable (the other thread might've mounted something on top of the automount it has triggered) *and* causes no end of headache for follow_automount() and its caller, since finish_automount() behaves like do_new_mount() - if the mountpoint to be is overmounted, it mounts on top what's overmounting it. It's not only wrong (we want to go into what's overmounting the automount point and quietly discard what we planned to mount there), it introduces the possibility of original parent mount getting dropped. That's what 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount) deals with, but it can't do anything about the reliability of conflict detection - if something had been overmounted the other thread's automount (e.g. that other thread having stepped into automount in mount(2)), we don't get that -EBUSY and the result is referral point under automounted NFS under explicit overmount under another copy of automounted NFS What we need is finish_automount() *NOT* digging into overmounts - if it finds one, it should just quietly discard the thing it was asked to mount. And don't bother with actually crossing into the results of finish_automount() - the same loop that calls follow_automount() will do that just fine on the next iteration. IOW, instead of calling lock_mount() have finish_automount() do it manually, _without_ the "move into overmount and retry" part. And leave crossing into the results to the caller of follow_automount(), which simplifies it a lot. Moral: if you end up with a lot of glue working around the calling conventions of something, perhaps these calling conventions are simply wrong... 
Fixes: 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-10do_add_mount(): lift lock_mount/unlock_mount into callersAl Viro1-23/+24
preparation to finish_automount() fix (next commit) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-04saner copy_mount_options()Al Viro1-42/+7
don't bother with the byte-by-byte loops, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-05fs/namespace.c: make to_mnt_ns() staticEric Biggers1-1/+1
Make to_mnt_ns() static to address the following 'sparse' warning: fs/namespace.c:1731:22: warning: symbol 'to_mnt_ns' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20191209234830.156260-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-12init: use do_mount() instead of ksys_mount()Dominik Brodowski1-8/+2
In prepare_namespace(), do_mount() can be used instead of ksys_mount() as the first and third argument are const strings in the kernel, the second and fourth argument are passed through anyway, and the fifth argument is NULL. In do_mount_root(), ksys_mount() is called with the first and third argument being already kernelspace strings, which do not need to be copied over from userspace to kernelspace (again). The second and fourth arguments are passed through to do_mount() anyway. The fifth argument, while already residing in kernelspace, needs to be put into a page of its own. Then, do_mount() can be used instead of ksys_mount(). Once this is done, there are no in-kernel users to ksys_mount() left, which can therefore be removed. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2019-12-08Merge branch 'work.misc' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs cleanups from Al Viro: "No common topic, just three cleanups". * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make __d_alloc() static fs/namespace: add __user to open_tree and move_mount syscalls fs/fnctl: fix missing __user in fcntl_rw_hint()
2019-10-21fs/namespace: add __user to open_tree and move_mount syscallsBen Dooks1-3/+3
The open_tree and move_mount syscalls take names from the user, so add the __user to these to ensure the following warnings from sparse are fixed:
    fs/namespace.c:2392:35: warning: incorrect type in argument 2 (different address spaces)
    fs/namespace.c:2392:35: expected char const [noderef] <asn:1> *name
    fs/namespace.c:2392:35: got char const *filename
    fs/namespace.c:3541:38: warning: incorrect type in argument 2 (different address spaces)
    fs/namespace.c:3541:38: expected char const [noderef] <asn:1> *name
    fs/namespace.c:3541:38: got char const *from_pathname
    fs/namespace.c:3550:36: warning: incorrect type in argument 2 (different address spaces)
    fs/namespace.c:3550:36: expected char const [noderef] <asn:1> *name
    fs/namespace.c:3550:36: got char const *to_pathname
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-10-17fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()Eric Biggers1-8/+7
After do_add_mount() returns success, the caller doesn't hold a reference to the 'struct mount' anymore. So it's invalid to access it in mnt_warn_timestamp_expiry(). Fix it by calling mnt_warn_timestamp_expiry() before do_add_mount() rather than after, and adjusting the warning message accordingly. Reported-by: syzbot+da4f525235510683d855@syzkaller.appspotmail.com Fixes: f8b92ba67c5d ("mount: Add mount warning for impending timestamp expiry") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge more updates from Andrew Morton: - almost all of the rest of -mm - various other subsystems Subsystems affected by this patch series: memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork, cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise, cleanups, pagemap * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (77 commits) arch/sparc/include/asm/pgtable_64.h: fix build mm: treewide: clarify pgtable_page_{ctor,dtor}() naming ntfs: remove (un)?likely() from IS_ERR() conditions IB/hfi1: remove unlikely() from IS_ERR*() condition xfs: remove unlikely() from WARN_ON() condition wimax/i2400m: remove unlikely() from WARN*() condition fs: remove unlikely() from WARN_ON() condition xen/events: remove unlikely() from WARN() condition checkpatch: check for nested (un)?likely() calls hexagon: drop empty and unused free_initrd_mem mm: factor out common parts between MADV_COLD and MADV_PAGEOUT mm: introduce MADV_PAGEOUT mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM mm: introduce MADV_COLD mm: untag user pointers in mmap/munmap/mremap/brk vfio/type1: untag user pointers in vaddr_get_pfn tee/shm: untag user pointers in tee_shm_register media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get drm/radeon: untag user pointers in radeon_gem_userptr_ioctl drm/amdgpu: untag user pointers ...
2019-09-26fs/namespace: untag user pointers in copy_mount_optionsAndrey Konovalov1-1/+1
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. In copy_mount_options a user address is being subtracted from TASK_SIZE. If the address is lower than TASK_SIZE, the size is calculated to not allow the exact_copy_from_user() call to cross TASK_SIZE boundary. However if the address is tagged, then the size will be calculated incorrectly. Untag the address before subtracting. Link: http://lkml.kernel.org/r/1de225e4a54204bfd7f25dac2635e31aa4aa1d90.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
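The size calculation described above can be modelled in user space. This is a minimal sketch, not the kernel code: `SKETCH_TASK_SIZE`, `SKETCH_PAGE_SIZE` and every `sketch_*` function are hypothetical names, the limit values are made up, and the untagging is simplified to clearing the top byte.

```c
#include <stdint.h>

/* Hypothetical values for illustration; the real ones are per-architecture. */
#define SKETCH_TASK_SIZE (1ULL << 48)
#define SKETCH_PAGE_SIZE 4096ULL

/* Simplified stand-in for untagging: clear the top (tag) byte. */
static uint64_t sketch_untag(uint64_t addr)
{
    return addr & ~(0xffULL << 56);
}

/*
 * Number of bytes that may be copied starting at addr without crossing
 * SKETCH_TASK_SIZE, capped at one page. Returns 0 at or beyond the limit.
 */
static uint64_t sketch_copy_size(uint64_t addr)
{
    uint64_t untagged = sketch_untag(addr);
    uint64_t size;

    if (untagged >= SKETCH_TASK_SIZE)
        return 0;
    size = SKETCH_TASK_SIZE - untagged;
    return size > SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : size;
}

/* The same subtraction without untagging, to show the failure mode. */
static uint64_t sketch_copy_size_no_untag(uint64_t addr)
{
    uint64_t size = SKETCH_TASK_SIZE - addr; /* wraps around for tagged addr */

    return size > SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : size;
}
```

For an address 100 bytes below the limit that carries a non-zero tag, the untagged version yields 100 while the naive version yields a full page, which is the miscalculation the commit message describes.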
2019-09-25Merge tag 'fuse-update-5.4' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Continue separating the transport (user/kernel communication) and the filesystem layers of fuse. Getting rid of most layering violations will allow for easier cleanup and optimization later on. - Prepare for the addition of the virtio-fs filesystem. The actual filesystem will be introduced by a separate pull request. - Convert to new mount API. - Various fixes, optimizations and cleanups. * tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits) fuse: Make fuse_args_to_req static fuse: fix memleak in cuse_channel_open fuse: fix beyond-end-of-page access in fuse_parse_cache() fuse: unexport fuse_put_request fuse: kmemcg account fs data fuse: on 64-bit store time in d_fsdata directly fuse: fix missing unlock_page in fuse_writepage() fuse: reserve byteswapped init opcodes fuse: allow skipping control interface and forced unmount fuse: dissociate DESTROY from fuseblk fuse: delete dentry if timeout is zero fuse: separate fuse device allocation and installation in fuse_conn fuse: add fuse_iqueue_ops callbacks fuse: extract fuse_fill_super_common() fuse: export fuse_dequeue_forget() function fuse: export fuse_get_unique() fuse: export fuse_send_init_request() fuse: export fuse_len_args() fuse: export fuse_end_request() fuse: fix request limit ...
2019-09-19Merge tag 'y2038-vfs' of ↵Linus Torvalds1-1/+32
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull y2038 vfs updates from Arnd Bergmann: "Add inode timestamp clamping. This series from Deepa Dinamani adds a per-superblock minimum/maximum timestamp limit for a file system, and clamps timestamps as they are written, to avoid random behavior from integer overflow as well as having different time stamps on disk vs in memory. At mount time, a warning is now printed for any file system that can represent current timestamps but not future timestamps more than 30 years into the future, similar to the arbitrary 30 year limit that was added to settimeofday(). This was picked as a compromise to warn users to migrate to other file systems (e.g. ext4 instead of ext3) when they need the file system to survive beyond 2038 (or similar limits in other file systems), but not get in the way of normal usage" * tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: ext4: Reduce ext4 timestamp warnings isofs: Initialize filesystem timestamp ranges pstore: fs superblock limits fs: omfs: Initialize filesystem timestamp ranges fs: hpfs: Initialize filesystem timestamp ranges fs: ceph: Initialize filesystem timestamp ranges fs: sysv: Initialize filesystem timestamp ranges fs: affs: Initialize filesystem timestamp ranges fs: fat: Initialize filesystem timestamp ranges fs: cifs: Initialize filesystem timestamp ranges fs: nfs: Initialize filesystem timestamp ranges ext4: Initialize timestamps limits 9p: Fill min and max timestamps in sb fs: Fill in max and min timestamps in superblock utimes: Clamp the timestamps before update mount: Add mount warning for impending timestamp expiry timestamp_truncate: Replace users of timespec64_trunc vfs: Add timestamp_truncate() api vfs: Add file timestamp range support
2019-09-18Merge tag 'filelock-v5.4-1' of ↵Linus Torvalds1-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Just a couple of minor bugfixes, a revision to a tracepoint to account for some earlier changes to the internals, and a patch to add a pr_warn message when someone tries to mount a filesystem with '-o mand' on a kernel that has that support disabled" * tag 'filelock-v5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: fix a memory leak bug in __break_lease() locks: print a warning when mount fails due to lack of "mand" support locks: Fix procfs output for file leases locks: revise generic_add_lease tracepoint
2019-09-18Merge branch 'work.namei' of ↵Linus Torvalds1-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs namei updates from Al Viro: "Pathwalk-related stuff" [ Audit-related cleanups, misc simplifications, and easier to follow nd->root refcounts - Linus ] * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: devpts_pty_kill(): don't bother with d_delete() infiniband: don't bother with d_delete() hypfs: don't bother with d_delete() fs/namei.c: keep track of nd->root refcount status fs/namei.c: new helper - legitimize_root() kill the last users of user_{path,lpath,path_dir}() namei.h: get the comments on LOOKUP_... in sync with reality kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h audit_inode(): switch to passing AUDIT_INODE_... filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there filename_lookup(): audit_inode() argument is always 0
2019-09-06vfs: subtype handling moved to fuseDavid Howells1-2/+0
The unused vfs code can be removed. Don't pass empty subtype (same as if ->parse callback isn't called). The bits that are left involve determining whether it's permitted to split the filesystem type string passed in to mount(2). Consequently, this means that we cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string with a dot in it always indicates a subtype specification. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
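To illustrate what "splitting the filesystem type string" means, here is a rough user-space sketch. It is not a copy of the kernel's lookup code: the `sketch_*` helpers are hypothetical, and the `has_subtype` parameter merely stands in for a filesystem having FS_HAS_SUBTYPE set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the base type name: everything before the first '.', if any. */
static size_t sketch_type_len(const char *name)
{
    const char *dot;

    assert(name != NULL);
    dot = strchr(name, '.');
    return dot ? (size_t)(dot - name) : strlen(name);
}

/*
 * Subtype part of "type.subtype", or NULL when there is no dot or when the
 * filesystem does not permit subtypes (has_subtype == false).
 */
static const char *sketch_subtype(const char *name, bool has_subtype)
{
    const char *dot;

    assert(name != NULL);
    dot = strchr(name, '.');
    return (dot && has_subtype) ? dot + 1 : NULL;
}
```

With these helpers, "fuse.sshfs" has a base type of length 4 and the subtype "sshfs" when subtypes are permitted, and no subtype otherwise.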
2019-08-31kill the last users of user_{path,lpath,path_dir}()Al Viro1-3/+5
old wrappers with few callers remaining; put them out of their misery... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-08-30mount: Add mount warning for impending timestamp expiryDeepa Dinamani1-1/+32
The warning reuses the uptime max of 30 years used by settimeofday(). Note that the warning is only emitted for writable filesystem mounts through the mount syscall. Automounts do not have the same warning. Print out the warning in human readable format using the struct tm. After discussion with Arnd Bergmann, we chose to print only the year number. The raw s_time_max is also displayed, and the user can easily decode it e.g. "date -u -d @$((0x7fffffff))". We did not want to consolidate struct rtc_tm and struct tm just to print the date using a format specifier as part of this series. Given that the rtc_tm is not compiled on all architectures, this is not a trivial patch. This can be added in the future. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
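The check and the year decoding described above might look roughly like this in user space. This is a sketch under assumptions: the `sketch_*` names are hypothetical, and the 30-year constant is assumed to be 30 * 365 days expressed in seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumed value of the 30-year limit, in seconds. */
#define SKETCH_UPTIME_SEC_MAX (30LL * 365 * 24 * 3600)

/* Warn only for writable mounts whose timestamp range ends within 30 years. */
static bool sketch_should_warn(int64_t now, int64_t time_max, bool readonly)
{
    return !readonly && now + SKETCH_UPTIME_SEC_MAX > time_max;
}

/* Calendar year (UTC) in which the filesystem's timestamp range ends. */
static int sketch_expiry_year(int64_t time_max)
{
    time_t t = (time_t)time_max;
    struct tm *tm = gmtime(&t);

    assert(tm != NULL);
    return tm->tm_year + 1900;
}
```

For a filesystem whose maximum timestamp is 0x7fffffff, `sketch_expiry_year()` gives 2038, the same year that the `date -u -d @$((0x7fffffff))` command from the commit message prints.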
2019-08-16locks: print a warning when mount fails due to lack of "mand" supportJeff Layton1-3/+8
Since 9e8925b67a ("locks: Allow disabling mandatory locking at compile time"), attempts to mount filesystems with "-o mand" will fail. Unfortunately, there is no other indication of the reason for the failure. Change how the function is defined for better readability. When CONFIG_MANDATORY_FILE_LOCKING is disabled, printk a warning when someone attempts to mount with -o mand. Also, add a blurb to the mandatory-locking.txt file to explain about the "mand" option, and the behavior one should expect when it is disabled. Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-07-26fix the struct mount leak in umount_tree()Al Viro1-2/+2
We need to drop everything we remove from the tree, whether mnt_has_parent() is true or not. Usually the bug manifests as a slow memory leak (leaked struct mount for initramfs); it becomes much more visible in mount_subtree() users, such as btrfs. There we leak a struct mount for btrfs superblock being mounted, which prevents fs shutdown on subsequent umount. Fixes: 56cbb429d911 ("switch the remnants of releasing the mountpoint away from fs_pin") Reported-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-22filename_mountpoint(): make LOOKUP_NO_EVAL unconditional thereAl Viro1-2/+0
user_path_mountpoint_at() always gets it and the reasons to have it there (i.e. in umount(2)) apply to kern_path_mountpoint() callers as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-20Merge branch 'work.dcache2' of ↵Linus Torvalds1-82/+77
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache and mountpoint updates from Al Viro: "Saner handling of refcounts to mountpoints. Transfer the counting reference from struct mount ->mnt_mountpoint over to struct mountpoint ->m_dentry. That allows us to get rid of the convoluted games with ordering of mount shutdowns. The cost is in teaching shrink_dcache_{parent,for_umount} to cope with mixed-filesystem shrink lists, which we'll also need for the Slab Movable Objects patchset" * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch the remnants of releasing the mountpoint away from fs_pin get rid of detach_mnt() make struct mountpoint bear the dentry reference to mountpoint, not struct mount Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt() __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore nfs: dget_parent() never returns NULL ceph: don't open-code the check for dead lockref
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds1-8/+7
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-17switch the remnants of releasing the mountpoint away from fs_pinAl Viro1-18/+19
We used to need rather convoluted ordering trickery to guarantee that dput() of ex-mountpoints happens before the final mntput() of the same. Since we don't need that anymore, there's no point playing with fs_pin for that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-17get rid of detach_mnt()Al Viro1-34/+28
Lift getting the original mount (dentry is actually not needed at all) of the mountpoint into the callers - to do_move_mount() and pivot_root() level. That simplifies the cleanup in those and allows to get saner arguments for attach_mnt_recursive(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-17make struct mountpoint bear the dentry reference to mountpoint, not struct mountAl Viro1-27/+25
Using dput_to_list() to shift the contributing reference from ->mnt_mountpoint to ->mnt_mp->m_dentry. Dentries are dropped (with dput_to_list()) as soon as struct mountpoint is destroyed; in cases where we are under namespace_sem we use the global list, shrinking it in namespace_unlock(). In case of detaching stuck MNT_LOCKed children at final mntput_no_expire() we use a local list and shrink it ourselves. ->mnt_ex_mountpoint crap is gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05mnt_init(): call shmem_init() unconditionallyAl Viro1-0/+2
No point having two call sites (earlier in init_rootfs() from mnt_init() in case we are going to use shmem-style rootfs, later from do_basic_setup() unconditionally), along with the logics in shmem_init() itself to make the second call a no-op... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05constify ksys_mount() string argumentsAl Viro1-2/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05don't bother with registering rootfsAl Viro1-6/+1
init_mount_tree() can get to rootfs_fs_type directly and that simplifies a lot of things. We don't need to register it, we don't need to look it up *and* we don't need to bother with preventing subsequent userland mounts. That's the way we should've done that from the very beginning. There is a user-visible change, namely the disappearance of "rootfs" from /proc/filesystems. Note that it's been unmountable all along and it didn't show up in /proc/mounts; however, it *is* a user-visible change and theoretically some script might've been using its presence in /proc/filesystems to tell 2.4.11+ from earlier kernels. *IF* any complaints about behaviour change do show up, we could fake it in /proc/filesystems. I very much doubt we'll have to, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()Al Viro1-4/+6
make unhash_mnt() return the mountpoint to be dropped, let callers deal with it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05__detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymoreAl Viro1-1/+1
... not since 1e9c75fb9c47 ("mnt: fix __detach_mounts infinite loop") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-01vfs: move_mount: reject moving kernel internal mountsEric Biggers1-3/+4
sys_move_mount() crashes by dereferencing the pointer MNT_NS_INTERNAL, a.k.a. ERR_PTR(-EINVAL), if the old mount is specified by fd for a kernel object with an internal mount, such as a pipe or memfd. Fix it by checking for this case and returning -EINVAL. [AV: what we want is is_mounted(); use that instead of making the condition even more convoluted]
Reproducer:
    #include <unistd.h>
    #define __NR_move_mount 429
    #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
    int main()
    {
        int fds[2];
        pipe(fds);
        syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
    }
Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
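The idea behind the fix, treating both NULL and an error-encoded pointer as "not mounted", can be sketched in user space as follows. All `sketch_*` and `SKETCH_*` names are hypothetical stand-ins for the kernel's types and helpers, and the error-pointer encoding is assumed to reserve the top 4095 addresses.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

struct sketch_mnt_ns { int id; };
struct sketch_mount { struct sketch_mnt_ns *mnt_ns; };

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(intptr_t err)
{
    return (void *)err;
}

/* True for NULL and for any pointer in the error-encoding range. */
static bool sketch_is_err_or_null(const void *p)
{
    return p == NULL || (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Sentinel namespace value marking a kernel-internal mount. */
#define SKETCH_MNT_NS_INTERNAL \
    ((struct sketch_mnt_ns *)sketch_err_ptr(-EINVAL))

/* A mount counts as mounted only if its namespace is a real pointer. */
static bool sketch_is_mounted(const struct sketch_mount *m)
{
    return !sketch_is_err_or_null(m->mnt_ns);
}

/* Reject move sources that are not mounted instead of dereferencing them. */
static int sketch_check_move_source(const struct sketch_mount *m)
{
    return sketch_is_mounted(m) ? 0 : -EINVAL;
}

/* An ordinary namespace object, for comparison with the sentinel. */
static struct sketch_mnt_ns sketch_example_ns = { 1 };
```

A mount whose namespace is the sentinel (as for a pipe) or NULL is rejected with -EINVAL before anything tries to follow the pointer.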
2019-06-18fs/namespace: fix unprivileged mount propagationChristian Brauner1-0/+1
When propagating mounts across mount namespaces owned by different user namespaces it is not possible anymore to move or umount the mount in the less privileged mount namespace. Here is a reproducer:
    sudo mount -t tmpfs tmpfs /mnt
    sudo mount --make-rshared /mnt
    # create unprivileged user + mount namespace and preserve propagation
    unshare -U -m --map-root --propagation=unchanged
    # now change back to the original mount namespace in another terminal:
    sudo mkdir /mnt/aaa
    sudo mount -t tmpfs tmpfs /mnt/aaa
    # now in the unprivileged user + mount namespace
    mount --move /mnt/aaa /opt
Unfortunately, this is a pretty big deal for userspace since this is e.g. used to inject mounts into running unprivileged containers. So this regression really needs to go away rather quickly. The problem is that a recent change falsely locked the root of the newly added mounts by setting MNT_LOCKED. Fix this by only locking the mounts on copy_mnt_ns() and not when adding a new mount. Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies") Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Tested-by: Christian Brauner <christian@brauner.io> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-06-18vfs: fsmount: add missing mntget()Eric Biggers1-0/+1
sys_fsmount() needs to take a reference to the new mount when adding it to the anonymous mount namespace. Otherwise the filesystem can be unmounted while it's still in use, as found by syzkaller. Reported-by: Mark Rutland <mark.rutland@arm.com> Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209Thomas Gleixner1-1/+1
Based on 1 normalized pattern(s): released under gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 15 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528171438.895196075@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-26move mount_capable() further outAl Viro1-0/+2
Call graph of vfs_get_tree():
    vfs_fsconfig_locked()   # neither kernmount, nor submount
    do_new_mount()          # neither kernmount, nor submount
    fc_mount()
        afs_mntpt_do_automount()    # submount
        mount_one_hugetlbfs()       # kernmount
        pid_ns_prepare_proc()       # kernmount
        mq_create_mount()           # kernmount
        vfs_kern_mount()
            simple_pin_fs()         # kernmount
            vfs_submount()          # submount
            kern_mount()            # kernmount
            init_mount_tree()
            btrfs_mount()
            nfs_do_root_mount()
The first two need the check (unconditionally). init_mount_tree() is setting rootfs up; any capability checks make zero sense for that one. And btrfs_mount()/nfs_do_root_mount() have the checks already done in their callers. IOW, we can shift mount_capable() handling into the two callers - one in the normal case of mount(2), another - in fsconfig(2) handling of FSCONFIG_CMD_CREATE. I.e. the syscalls that set a new filesystem up. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-09do_move_mount(): fix an unsafe use of is_anon_ns()Al Viro1-1/+1
What triggers it is a race between mount --move and umount -l of the source; we should reject it (the source is parentless *and* not the root of anon namespace at that), but the check for namespace being an anon one is broken in that case - is_anon_ns() needs ns to be non-NULL. Better fixed here than in is_anon_ns(), since the rest of the callers is guaranteed to get a non-NULL argument... Reported-by: syzbot+494c7ddf66acac0ad747@syzkaller.appspotmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-03-21vfs: syscall: Add fsmount() to create a mount for a superblockDavid Howells1-3/+143
Provide a system call by which a filesystem opened with fsopen() and configured by a series of fsconfig() calls can have a detached mount object created for it. This mount object can then be attached to the VFS mount hierarchy using move_mount() by passing the returned file descriptor as the from directory fd. The system call looks like:
    int mfd = fsmount(int fsfd, unsigned int flags, unsigned int attr_flags);
where fsfd is the file descriptor returned by fsopen(). flags can be 0 or FSMOUNT_CLOEXEC. attr_flags is a bitwise-OR of the following flags:
    MOUNT_ATTR_RDONLY           Mount read-only
    MOUNT_ATTR_NOSUID           Ignore suid and sgid bits
    MOUNT_ATTR_NODEV            Disallow access to device special files
    MOUNT_ATTR_NOEXEC           Disallow program execution
    MOUNT_ATTR__ATIME           Setting on how atime should be updated
    MOUNT_ATTR_RELATIME         - Update atime relative to mtime/ctime
    MOUNT_ATTR_NOATIME          - Do not update access times
    MOUNT_ATTR_STRICTATIME      - Always perform atime updates
    MOUNT_ATTR_NODIRATIME       Do not update directory access times
In the event that fsmount() fails, it may be possible to get an error message by calling read() on fsfd. If no message is available, ENODATA will be reported. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-api@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
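Since MOUNT_ATTR__ATIME is a multi-bit field rather than a single flag, validating attr_flags takes two steps: reject unknown bits, then check that the atime field holds exactly one of the three modes. The sketch below illustrates that idea only; the `SKETCH_ATTR_*` constants are local definitions whose numeric values are assumed to mirror the uapi header, and `sketch_check_attr_flags()` is a hypothetical helper, not the kernel's function.

```c
#include <errno.h>

/* Local copies for illustration; values assumed to match linux/mount.h. */
#define SKETCH_ATTR_RDONLY      0x01u
#define SKETCH_ATTR_NOSUID      0x02u
#define SKETCH_ATTR_NODEV       0x04u
#define SKETCH_ATTR_NOEXEC      0x08u
#define SKETCH_ATTR__ATIME      0x70u /* mask of the atime mode field */
#define SKETCH_ATTR_RELATIME    0x00u
#define SKETCH_ATTR_NOATIME     0x10u
#define SKETCH_ATTR_STRICTATIME 0x20u
#define SKETCH_ATTR_NODIRATIME  0x80u

/* Returns 0 if attr_flags is a valid combination, -EINVAL otherwise. */
static int sketch_check_attr_flags(unsigned int attr_flags)
{
    const unsigned int known = SKETCH_ATTR_RDONLY | SKETCH_ATTR_NOSUID |
                               SKETCH_ATTR_NODEV | SKETCH_ATTR_NOEXEC |
                               SKETCH_ATTR__ATIME | SKETCH_ATTR_NODIRATIME;

    if (attr_flags & ~known)
        return -EINVAL;

    /* The atime field must hold exactly one of the three defined modes. */
    switch (attr_flags & SKETCH_ATTR__ATIME) {
    case SKETCH_ATTR_RELATIME:
    case SKETCH_ATTR_NOATIME:
    case SKETCH_ATTR_STRICTATIME:
        return 0;
    default:
        return -EINVAL;
    }
}
```

Under these assumptions, OR-ing NOATIME with STRICTATIME produces a field value that matches none of the modes and is rejected, whereas RDONLY | NOATIME passes.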
2019-03-21teach move_mount(2) to work with OPEN_TREE_CLONEDavid Howells1-7/+55
Allow a detached tree created by open_tree(..., OPEN_TREE_CLONE) to be attached by move_mount(2). If by the time of final fput() of OPEN_TREE_CLONE-opened file its tree is not detached anymore, it won't be dissolved. move_mount(2) is adjusted to handle detached source. That gives us equivalents of mount --bind and mount --rbind. Thanks also to Alan Jenkins <alan.christopher.jenkins@gmail.com> for providing a whole bunch of ways to break things using this interface. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-03-21vfs: syscall: Add move_mount(2) to move mounts aroundDavid Howells1-31/+95
Add a move_mount() system call that will move a mount from one place to another and, in the next commit, allow to attach an unattached mount tree. The new system call looks like the following: int move_mount(int from_dfd, const char *from_path, int to_dfd, const char *to_path, unsigned int flags); Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-api@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-03-21vfs: syscall: Add open_tree(2) to reference or clone a mountAl Viro1-22/+135
open_tree(dfd, pathname, flags)
Returns an O_PATH-opened file descriptor or an error. dfd and pathname specify the location to open, in usual fashion (see e.g. fstatat(2)). flags should be an OR of some of the following:
    * AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW - same meanings as usual
    * OPEN_TREE_CLOEXEC - make the resulting descriptor close-on-exec
    * OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE - instead of opening the location in question, create a detached mount tree matching the subtree rooted at location specified by dfd/pathname. With AT_RECURSIVE the entire subtree is cloned, without it - only the part within the mount containing the location in question. In other words, the same as mount --rbind or mount --bind would've taken.
The detached tree will be dissolved on the final close of obtained file. Creation of such detached trees requires the same capabilities as doing mount --bind. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-api@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
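The flag rules listed above imply that AT_RECURSIVE only makes sense together with OPEN_TREE_CLONE. A validation step along those lines could be sketched as below. The `SKETCH_*` constants are local definitions with values assumed to match common Linux headers (the empty-path flag is spelled AT_EMPTY_PATH there), `sketch_check_open_tree_flags()` is a hypothetical helper, and rejecting a lone AT_RECURSIVE is an assumption drawn from the description rather than something the text states.

```c
#include <errno.h>

/* Local copies for illustration; values assumed to match Linux headers. */
#define SKETCH_AT_SYMLINK_NOFOLLOW 0x100u
#define SKETCH_AT_NO_AUTOMOUNT     0x800u
#define SKETCH_AT_EMPTY_PATH       0x1000u
#define SKETCH_AT_RECURSIVE        0x8000u
#define SKETCH_OPEN_TREE_CLONE     0x1u
#define SKETCH_OPEN_TREE_CLOEXEC   02000000u /* O_CLOEXEC on most arches */

/* Returns 0 if flags is an acceptable combination, -EINVAL otherwise. */
static int sketch_check_open_tree_flags(unsigned int flags)
{
    const unsigned int known = SKETCH_AT_SYMLINK_NOFOLLOW |
                               SKETCH_AT_NO_AUTOMOUNT |
                               SKETCH_AT_EMPTY_PATH |
                               SKETCH_AT_RECURSIVE |
                               SKETCH_OPEN_TREE_CLONE |
                               SKETCH_OPEN_TREE_CLOEXEC;

    if (flags & ~known)
        return -EINVAL;

    /* AT_RECURSIVE modifies a clone; on its own it has nothing to act on. */
    if ((flags & (SKETCH_AT_RECURSIVE | SKETCH_OPEN_TREE_CLONE)) ==
        SKETCH_AT_RECURSIVE)
        return -EINVAL;

    return 0;
}
```

In this model, OPEN_TREE_CLONE alone corresponds to the mount --bind case and OPEN_TREE_CLONE | AT_RECURSIVE to the mount --rbind case.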
2019-03-13Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-149/+246
Pull vfs mount infrastructure updates from Al Viro:

"The rest of core infrastructure; no new syscalls in that pile, but the old parts are switched to new infrastructure. At that point conversions of individual filesystems can happen independently; some are done here (afs, cgroup, procfs, etc.), there's also a large series outside of that pile dealing with NFS (quite a bit of option-parsing stuff is getting used there - it's one of the most convoluted filesystems in terms of mount-related logic), but NFS bits are the next cycle fodder.

It got seriously simplified since the last cycle; documentation is probably the weakest bit at the moment - I considered dropping the commit introducing Documentation/filesystems/mount_api.txt (cutting the size increase by a quarter ;-), but decided that it would be better to fix it up after -rc1 instead.

That pile allows followup work to be done in independent branches, which should make life much easier for the next cycle. fs/super.c size increase is unpleasant; there's a followup series that allows it to be shrunk considerably, but I decided to leave that until the next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
  afs: Use fs_context to pass parameters over automount
  afs: Add fs_context support
  vfs: Add some logging to the core users of the fs_context log
  vfs: Implement logging through fs_context
  vfs: Provide documentation for new mount API
  vfs: Remove kern_mount_data()
  hugetlbfs: Convert to fs_context
  cpuset: Use fs_context
  kernfs, sysfs, cgroup, intel_rdt: Support fs_context
  cgroup: store a reference to cgroup_ns into cgroup_fs_context
  cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
  cgroup_do_mount(): massage calling conventions
  cgroup: stash cgroup_root reference into cgroup_fs_context
  cgroup2: switch to option-by-option parsing
  cgroup1: switch to option-by-option parsing
  cgroup: take options parsing into ->parse_monolithic()
  cgroup: fold cgroup1_mount() into cgroup1_get_tree()
  cgroup: start switching to fs_context
  ipc: Convert mqueue fs to fs_context
  proc: Add fs_context support to procfs
  ...
2019-03-07Merge tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/auditLinus Torvalds1-0/+2
Pull audit updates from Paul Moore:

"A lucky 13 audit patches for v5.1. Despite the rather large diffstat, most of the changes are from two bug fix patches that move code from one Kconfig option to another. Beyond that bit of churn, the remaining changes are largely cleanups and bug-fixes as we slowly march towards container auditing. It isn't all boring though, we do have a couple of new things: file capabilities v3 support, and expanded support for filtering on filesystems to solve problems with remote filesystems. All changes pass the audit-testsuite. Please merge for v5.1"

* tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: mark expected switch fall-through
  audit: hide auditsc_get_stamp and audit_serial prototypes
  audit: join tty records to their syscall
  audit: remove audit_context when CONFIG_AUDIT and not AUDITSYSCALL
  audit: remove unused actx param from audit_rule_match
  audit: ignore fcaps on umount
  audit: clean up AUDITSYSCALL prototypes and stubs
  audit: more filter PATH records keyed on filesystem magic
  audit: add support for fcaps v3
  audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
  audit: add syscall information to CONFIG_CHANGE records
  audit: hand taken context to audit_kill_trees for syscall logging
  audit: give a clue what CONFIG_CHANGE op was involved
2019-03-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+1
Pull vfs fixes from Al Viro:

"Assorted fixes that sat in -next for a while, all over the place"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: Fix locking in aio_poll()
  exec: Fix mem leak in kernel_read_file
  copy_mount_string: Limit string length to PATH_MAX
  cgroup: saner refcounting for cgroup_root
  fix cgroup_do_mount() handling of failure exits
2019-02-28vfs: Remove kern_mount_data()David Howells1-3/+3
kern_mount_data() isn't used any more, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28vfs: Implement a filesystem superblock creation/configuration contextDavid Howells1-16/+9
[AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living mounts]
[AV - reordering fs/namespace.c is badly overdue, but let's keep it separate from that series]
[AV - drop simple_pin_fs() change]
[AV - clean vfs_kern_mount() failure exits up]

Implement a filesystem context concept to be used during superblock creation for mount and superblock reconfiguration for remount.

The mounting procedure then becomes:
 (1) Allocate new fs_context context.
 (2) Configure the context.
 (3) Create superblock.
 (4) Query the superblock.
 (5) Create a mount for the superblock.
 (6) Destroy the context.

Rather than calling fs_type->mount(), an fs_context struct is created and fs_type->init_fs_context() is called to set it up. Pointers exist for the filesystem and LSM to hang their private data off.

A set of operations has to be set by ->init_fs_context() to provide freeing, duplication, option parsing, binary data parsing, validation, mounting and superblock filling.

Legacy filesystems are supported by the provision of a set of legacy fs_context operations that build up a list of mount options and then invoke fs_type->mount() from within the fs_context ->get_tree() operation. This allows all filesystems to be accessed using fs_context.

It should be noted that, whilst this patch adds a lot of lines of code, there is quite a bit of duplication with existing code that can be eliminated should all filesystems be converted over.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-25Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses"Linus Torvalds1-2/+0
This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b.

It was well-intentioned, but wrong. Overriding the exception tables for instructions for random reasons is just wrong, and that is what the new code did.

It caused problems for tracing, and it caused problems for strncpy_from_user(), because the new checks made perfectly valid use cases break, rather than catch things that did bad things.

Unchecked user space accesses are a problem, but that's not a reason to add invalid checks that then people have to work around with silly flags (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an odd way to say "this commit was wrong" and was sprinkled into random places to hide the wrongness).

The real fix to unchecked user space accesses is to get rid of the special "let's not check __get_user() and __put_user() at all" logic. Make __{get|put}_user() be just aliases to the regular {get|put}_user() functions, and make it impossible to access user space without having the proper checks in place.

The raison d'être of the special double-underscore versions used to be that the range check was expensive, and if you did multiple user accesses, you'd do the range check up front (like the signal frame handling code, for example). But SMAP (on x86) and PAN (on ARM) have made that optimization pointless, because the _real_ expense is the "set CPU flag to allow user space access".

So let's not break the valid cases to catch invalid cases that shouldn't even exist.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tobin C. Harding <tobin@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01copy_mount_string: Limit string length to PATH_MAXChandan Rajendra1-1/+1
On ppc64le, when a string with PAGE_SIZE - 1 (i.e. 64k-1) length is passed as a "filesystem type" argument to the mount(2) syscall, copy_mount_string() ends up allocating 64k (the PAGE_SIZE on ppc64le) worth of space for holding the string in kernel's address space.

Later, in set_precision() (invoked by get_fs_type() -> __request_module() -> vsnprintf()), we end up assigning strlen(fs-type-string) i.e. 65535 as the value to 'struct printf_spec'->precision member. This field has a width of 16 bits and it is a signed data type. Hence an invalid value ends up getting assigned. This causes the "WARN_ONCE(spec->precision != prec, "precision %d too large", prec)" statement inside set_precision() to be executed.

This commit fixes the bug by limiting the length of the string passed by copy_mount_string() to strndup_user() to PATH_MAX.

Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.ibm.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
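The arithmetic behind that warning can be reproduced in userspace with a small stand-in. struct spec_model and precision_fits() below are illustrative names, not kernel code; only the 16-bit signed precision field mentioned above is modelled. Storing an out-of-range value in a signed bit-field is implementation-defined in C; the sketch assumes the usual gcc/clang behaviour of wrapping, and takes PATH_MAX to be 4096 as on Linux.

```c
#include <assert.h>

/* Illustrative stand-in for the precision member of struct printf_spec. */
struct spec_model {
	signed int precision:16;	/* 16 bits wide, signed */
};

/*
 * Mirrors the comparison in the WARN_ONCE() quoted above: returns 1 if
 * prec survives being stored in the field, 0 if the stored value differs.
 */
int precision_fits(int prec)
{
	struct spec_model spec;

	spec.precision = prec;	/* wraps for values above 32767 (assumed) */
	return spec.precision == prec;
}

/* The two string lengths relevant to the commit message. */
void check_commit_examples(void)
{
	assert(!precision_fits(65535));		/* 64k - 1: would warn */
	assert(precision_fits(4096 - 1));	/* below PATH_MAX: fine */
}
```

Any length under PATH_MAX is far below the 32767 limit of the field, which is why capping the string length at PATH_MAX is enough to keep the stored precision valid.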
2019-01-31audit: ignore fcaps on umountRichard Guy Briggs1-0/+2
Don't fetch fcaps when umount2 is called to avoid a process hang while it waits for the missing resource to (possibly never) re-appear.

Note the comment above user_path_mountpoint_at():
 * A umount is a special case for path walking. We're not actually interested
 * in the inode in this situation, and ESTALE errors can be a problem. We
 * simply want track down the dentry and vfsmount attached at the mountpoint
 * and avoid revalidating the last component.

This can happen on ceph, cifs, 9p, lustre, fuse (gluster) or NFS.

Please see the github issue tracker https://github.com/linux-audit/audit-kernel/issues/100

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: merge fuzz in audit_log_fcaps()]
Signed-off-by: Paul Moore <paul@paul-moore.com>