summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-02-16io_uring: tctx->task_lock should be IRQ safeJens Axboe1-6/+7
We add task_work from any context, hence we need to ensure that we can tolerate it being from IRQ context as well. Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-16ceph: defer flushing the capsnap if the Fb is usedXiubo Li2-13/+30
If the Fb cap is used it means the current inode is flushing the dirty data to OSD, just defer flushing the capsnap. URL: https://tracker.ceph.com/issues/48640 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-16ceph: allow queueing cap/snap handling after putting cap referencesJeff Layton4-8/+48
Testing with the fscache overhaul has triggered some lockdep warnings about circular lock dependencies involving page_mkwrite and the mmap_lock. It'd be better to do the "real work" without the mmap lock being held. Change the skip_checking_caps parameter in __ceph_put_cap_refs to an enum, and use that to determine whether to queue check_caps, do it synchronously or not at all. Change ceph_page_mkwrite to do a ceph_put_cap_refs_async(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-16ceph: clean up inode work queueingJeff Layton2-52/+24
Add a generic function for taking an inode reference, setting the I_WORK bit and queueing i_work. Turn the ceph_queue_* functions into static inline wrappers that pass in the right bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-16ceph: fix flush_snap logic after putting capsJeff Layton1-4/+6
A primary reason for skipping ceph_check_caps after putting the references was to avoid the locking in ceph_check_caps during a reconnect. __ceph_put_cap_refs can still call ceph_flush_snaps in that case though, and that takes many of the same inconvenient locks. Fix the logic in __ceph_put_cap_refs to skip flushing snaps when the skip_checking_caps flag is set. Fixes: e64f44a88465 ("ceph: skip checking caps when session reconnecting and releasing reqs") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-16kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTOREChris Wilson1-2/+2
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category. Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store. Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp. References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # DRM depends on kcmp Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> # systemd uses kcmp Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk
2021-02-16zonefs: add tracepoints for file operationsJohannes Thumshirn3-0/+113
Add tracepoints for file I/O operations to aid in debugging of I/O errors with zonefs. The added tracepoints are in: - zonefs_zone_mgmt() for tracing zone management operations - zonefs_iomap_begin() for tracing regular file I/O - zonefs_file_dio_append() for tracing zone-append operations Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2021-02-15proc: don't allow async path resolution of /proc/thread-self componentsJens Axboe2-1/+8
If this is attempted by an io-wq kthread, then return -EOPNOTSUPP as we don't currently support that. Once we can get task_pid_ptr() doing the right thing, then this can go away again. Use PF_IO_WORKER for this to speciically target the io_uring workers. Modify the /proc/self/ check to use PF_IO_WORKER as well. Cc: stable@vger.kernel.org Fixes: 8d4c3e76e3be ("proc: don't allow async path resolution of /proc/self components") Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-15binfmt_misc: pass binfmt_misc flags to the interpreterLaurent Vivier3-3/+11
It can be useful to the interpreter to know which flags are in use. For instance, knowing if the preserve-argv[0] is in use would allow to skip the pathname argument. This patch uses an unused auxiliary vector, AT_FLAGS, to add a flag to inform interpreter if the preserve-argv[0] is enabled. Note by Helge Deller: The real-world user of this patch is qemu-user, which needs to know if it has to preserve the argv[0]. See Debian bug #970460. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: YunQiang Su <ysu@wavecomp.com> URL: http://bugs.debian.org/970460 Signed-off-by: Helge Deller <deller@gmx.de>
2021-02-15smb3: negotiate current dialect (SMB3.1.1) when version 3 or greater requestedSteve French2-7/+15
SMB3.1.1 is the newest, and preferred dialect, and is included in the requested dialect list by default (ie if no vers= is specified on mount) but it should also be requested if SMB3 or later is requested (vers=3 instead of a specific dialect: vers=2.1, vers=3.02 or vers=3.0). Currently specifying "vers=3" only requests smb3.0 and smb3.02 but this patch fixes it to also request smb3.1.1 dialect, as it is the newest and most secure dialect and is a "version 3 or later" dialect (the intent of "vers=3"). Signed-off-by: Steve French <stfrench@microsoft.com> Suggested-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-15nfsd: register pernet ops last, unregister firstJ. Bruce Fields1-7/+7
These pernet operations may depend on stuff set up or torn down in the module init/exit functions. And they may be called at any time in between. So it makes more sense for them to be the last to be registered in the init function, and the first to be unregistered in the exit function. In particular, without this, the drc slab is being destroyed before all the per-net drcs are shut down, resulting in an "Objects remaining in nfsd_drc on __kmem_cache_shutdown()" warning in exit_nfsd. Reported-by: Zhi Li <yieli@redhat.com> Fixes: 3ba75830ce17 "nfsd4: drc containerization" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-02-14ubifs: Fix error return code in alloc_wbufs()Wang ShaoBo1-1/+3
Fix to return PTR_ERR() error code from the error handling case instead fo 0 in function alloc_wbufs(), as done elsewhere in this function. Fixes: 6a98bc4614de ("ubifs: Add authentication nodes to journal") Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-13Merge tag 'for-5.11-rc7-tag' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A regression fix caused by a refactoring in 5.11. A corrupted superblock wouldn't be detected by checksum verification due to wrongly placed initialization of the checksum length, thus making memcmp always work" * tag 'for-5.11-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize fs_info::csum_size earlier in open_ctree
2021-02-13s390,alpha: switch to 64-bit ino_tHeiko Carstens1-1/+4
s390 and alpha are the only 64 bit architectures with a 32-bit ino_t. Since this is quite unusual this causes bugs from time to time. See e.g. commit ebce3eb2f7ef ("ceph: fix inode number handling on arches with 32-bit ino_t") for an example. This (obviously) also prevents s390 and alpha to use 64-bit ino_t for tmpfs. See commit b85a7a8bb573 ("tmpfs: disallow CONFIG_TMPFS_INODE64 on s390"). Therefore switch both s390 and alpha to 64-bit ino_t. This should only have an effect on the ustat system call. To prevent ABI breakage define struct ustat compatible to the old layout and change sys_ustat() accordingly. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13io_uring: kill cached requests from exiting task closing the ringJens Axboe1-1/+3
Be nice and prune these upfront, in case the ring is being shared and one of the tasks is going away. This is a bit more important now that we account the allocations. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-13io_uring: add helper to free all request cachesJens Axboe1-7/+19
We have three different ones, put it in a helper for easy calling. This is in preparation for doing it outside of ring freeing as well. With that in mind, also ensure that we do the proper locking for safe calling from a context where the ring it still live. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-13io_uring: allow task match to be passed to io_req_cache_free()Jens Axboe1-6/+7
No changes in this patch, just allows a caller to pass in a targeted task that we must match for freeing requests in the cache. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-13Merge tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernelLinus Torvalds4-4/+28
Pull cifs fixes from Steve French: "Four small smb3 fixes to the new mount API (including a particularly important one for DFS links). These were found in testing this week of additional DFS scenarios, and a user testing of an apache container problem" * tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernel: cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath. cifs: In the new mount api we get the full devname as source= cifs: do not disable noperm if multiuser mount option is not provided cifs: fix dfs-links
2021-02-13f2fs: give a warning only for readonly partitionJaegeuk Kim1-5/+3
Let's allow mounting readonly partition. We're able to recovery later once we have it as read-write back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-13io-wq: clear out worker ->fs and ->filesJens Axboe1-8/+6
By default, kernel threads have init_fs and init_files assigned. In the past, this has triggered security problems, as commands that don't ask for (and hence don't get assigned) fs/files from the originating task can then attempt path resolution etc with access to parts of the system they should not be able to. Rather than add checks in the fs code for misuse, just set these to NULL. If we do attempt to use them, then the resulting code will oops rather than provide access to something that it should not permit. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12jffs2: check the validity of dstlen in jffs2_zlib_compress()Yang Yang1-0/+3
KASAN reports a BUG when download file in jffs2 filesystem.It is because when dstlen == 1, cpage_out will write array out of bounds. Actually, data will not be compressed in jffs2_zlib_compress() if data's length less than 4. [ 393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281 [ 393.809166] Write of size 1 by task tftp/2918 [ 393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G B 4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1 [ 393.823173] Hardware name: LS1043A RDB Board (DT) [ 393.827870] Call trace: [ 393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0 [ 393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20 [ 393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0 [ 393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80 [ 393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8 [ 393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40 [ 393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58 [ 393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0 [ 393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0 [ 393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478 [ 393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450 [ 393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0 [ 393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280 [ 393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228 [ 393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288 [ 393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238 [ 393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238 [ 393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110 [ 393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4 [ 393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64 [ 393.944197] Allocated: [ 393.946552] PID = 2918 [ 393.948913] save_stack_trace_tsk+0x0/0x220 [ 393.953096] save_stack_trace+0x18/0x20 [ 393.956932] kasan_kmalloc+0xd8/0x188 [ 393.960594] __kmalloc+0x144/0x238 [ 393.963994] jffs2_selected_compress+0x48/0x2a0 [ 393.968524] jffs2_compress+0x58/0x478 [ 393.972273] jffs2_write_inode_range+0x13c/0x450 [ 393.976889] jffs2_write_end+0x2a8/0x4a0 [ 393.980810] generic_perform_write+0x1c0/0x280 [ 393.985251] __generic_file_write_iter+0x1c4/0x228 [ 393.990040] generic_file_write_iter+0x138/0x288 [ 393.994655] __vfs_write+0x1b4/0x238 [ 393.998228] vfs_write+0xd0/0x238 [ 394.001543] SyS_write+0xa0/0x110 [ 394.004856] __sys_trace_return+0x0/0x4 [ 394.008684] Freed: [ 394.010691] PID = 2918 [ 394.013051] save_stack_trace_tsk+0x0/0x220 [ 394.017233] save_stack_trace+0x18/0x20 [ 394.021069] kasan_slab_free+0x88/0x188 [ 394.024902] kfree+0x6c/0x1d8 [ 394.027868] jffs2_sum_write_sumnode+0x2c4/0x880 [ 394.032486] jffs2_do_reserve_space+0x198/0x598 [ 394.037016] jffs2_reserve_space+0x3f8/0x4d8 [ 394.041286] jffs2_write_inode_range+0xf0/0x450 [ 394.045816] jffs2_write_end+0x2a8/0x4a0 [ 394.049737] generic_perform_write+0x1c0/0x280 [ 394.054179] __generic_file_write_iter+0x1c4/0x228 [ 394.058968] generic_file_write_iter+0x138/0x288 [ 394.063583] __vfs_write+0x1b4/0x238 [ 394.067157] vfs_write+0xd0/0x238 [ 394.070470] SyS_write+0xa0/0x110 [ 394.073783] __sys_trace_return+0x0/0x4 [ 394.077612] Memory state around the buggy address: [ 394.082404] ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.089623] ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 394.104056] ^ [ 394.107283] ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.114502] ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.121718] ================================================================== Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: Fix off-by-one errorSascha Hauer2-2/+2
An inode is allowed to have ubifs_xattr_max_cnt() xattrs, so we must complain only when an inode has more xattrs, having exactly ubifs_xattr_max_cnt() xattrs is fine. With this the maximum number of xattrs can be created without hitting the "has too many xattrs" warning when removing it. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: replay: Fix high stack usage, againArnd Bergmann1-1/+3
An earlier commit moved out some functions to not be inlined by gcc, but after some other rework to remove one of those, clang started inlining the other one and ran into the same problem as gcc did before: fs/ubifs/replay.c:1174:5: error: stack frame size of 1152 bytes in function 'ubifs_replay_journal' [-Werror,-Wframe-larger-than=] Mark the function as noinline_for_stack to ensure it doesn't happen again. Fixes: f80df3851246 ("ubifs: use crypto_shash_tfm_digest()") Fixes: eb66eff6636d ("ubifs: replay: Fix high stack usage") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: Fix memleak in ubifs_init_authenticationDinghao Liu1-1/+1
When crypto_shash_digestsize() fails, c->hmac_tfm has not been freed before returning, which leads to memleak. Fixes: 49525e5eecca5 ("ubifs: Add helper functions for authentication support") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12jffs2: fix use after free in jffs2_sum_write_data()Tom Rix1-0/+3
clang static analysis reports this problem fs/jffs2/summary.c:794:31: warning: Use of memory after it is freed c->summary->sum_list_head = temp->u.next; ^~~~~~~~~~~~ In jffs2_sum_write_data(), in a loop summary data is handles a node at a time. When it has written out the node it is removed the summary list, and the node is deleted. In the corner case when a JFFS2_FEATURE_RWCOMPAT_COPY is seen, a call is made to jffs2_sum_disable_collecting(). jffs2_sum_disable_collecting() deletes the whole list which conflicts with the loop's deleting the list by parts. To preserve the old behavior of stopping the write midway, bail out of the loop after disabling summary collection. Fixes: 6171586a7ae5 ("[JFFS2] Correct handling of JFFS2_FEATURE_RWCOMPAT_COPY nodes.") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: hostfs: use a kmem cache for inodesJohannes Berg1-2/+8
This collects all of them together and makes it possible to e.g. exclude it from slub debugging. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12Merge tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+4
Pull io_uring fix from Jens Axboe: "Revert of a patch from this release that caused a regression" * tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-block: Revert "io_uring: don't take fs for recvmsg/sendmsg"
2021-02-12io_uring: optimise io_init_req() flags settingPavel Begunkov1-10/+7
Invalid req->flags are tolerated by free/put well, avoid this dancing needlessly presetting it to zero, and then not even resetting but modifying it, i.e. "|=". Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: clean io_req_find_next() fast checkPavel Begunkov1-1/+1
Indirectly io_req_find_next() is called for every request, optimise the check by testing flags as it was long before -- __io_req_find_next() tolerates false-positives well (i.e. link==NULL), and those should be really rare. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: don't check PF_EXITING from syscallPavel Begunkov1-3/+2
io_sq_thread_acquire_mm_files() can find a PF_EXITING task only when it's called from task_work context. Don't check it in all other cases, that are when we're in io_uring_enter(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12Merge tag 'kvmarm-5.12' of ↵Paolo Bonzini37-303/+474
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.12 - Make the nVHE EL2 object relocatable, resulting in much more maintainable code - Handle concurrent translation faults hitting the same page in a more elegant way - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Allow the disabling of symbol export from assembly code - Simplification of the early init hypercall handling
2021-02-12btrfs: initialize fs_info::csum_size earlier in open_ctreeSu Yue1-1/+2
User reported that btrfs-progs misc-tests/028-superblock-recover fails: [TEST/misc] 028-superblock-recover unexpected success: mounted fs with corrupted superblock test failed for case 028-superblock-recover The test case expects that a broken image with bad superblock will be rejected to be mounted. However, the test image just passed csum check of superblock and was successfully mounted. Commit 55fc29bed8dd ("btrfs: use cached value of fs_info::csum_size everywhere") replaces all calls to btrfs_super_csum_size by fs_info::csum_size. The calls include the place where fs_info->csum_size is not initialized. So btrfs_check_super_csum() passes because memcmp() with len 0 always returns 0. Fix it by caching csum size in btrfs_fs_info::csum_size once we know the csum type in superblock is valid in open_ctree(). Link: https://github.com/kdave/btrfs-progs/issues/250 Fixes: 55fc29bed8dd ("btrfs: use cached value of fs_info::csum_size everywhere") Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-12io_uring: don't split out consume out of SQE getPavel Begunkov1-12/+6
Remove io_consume_sqe() and inline it back into io_get_sqe(). It requires req dealloc on error, but in exchange we get cleaner io_submit_sqes() and better locality for cached_sq_head. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: save ctx put/get for task_work submitPavel Begunkov1-5/+12
Do a little trick in io_ring_ctx_free() briefly taking uring_lock, that will wait for everyone currently holding it, so we can skip pinning ctx with ctx->refs for __io_req_task_submit(), which is executed and loses its refs/reqs while holding the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: don't duplicate io_req_task_queue()Pavel Begunkov1-7/+1
Don't hand code io_req_task_queue() inside of io_async_buf_func(), just call it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: optimise SQPOLL mm/files grabbingPavel Begunkov1-14/+13
There are two reasons for this. First is to optimise io_sq_thread_acquire_mm_files() for non-SQPOLL case, which currently do too many checks and function calls in the hot path, e.g. in io_init_req(). The second is to not grab mm/files when there are not needed. As __io_queue_sqe() issues only one request now, we can reuse io_sq_thread_acquire_mm_files() instead of unconditional acquire mm/files. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: optimise out unlikely link queuePavel Begunkov1-32/+10
__io_queue_sqe() tries to issue as much requests of a link as it can, and uses io_put_req_find_next() to extract a next one, targeting inline completed requests. As now __io_queue_sqe() is always used together with struct io_comp_state, it leaves next propagation only a small window and only for async reqs, that doesn't justify its existence. Remove it, make __io_queue_sqe() to issue only a head request. It simplifies the code and will allow other optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: take compl state from submit statePavel Begunkov1-6/+3
Completion and submission states are now coupled together, it's weird to get one from argument and another from ctx, do it consistently for io_req_free_batch(). It's also faster as we already have @state cached in registers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12ext4: add .kunitconfig fragment to enable ext4-specific testsDaniel Latypov1-0/+3
As of [1], we no longer want EXT4_KUNIT_TESTS and others to `select` their deps. This means it can get harder to get all the right things selected as we gain more tests w/ more deps over time. This patch (and [2]) proposes we store kunitconfig fragments in-tree to represent sets of tests. (N.B. right now we only have one ext4 test). There's still a discussion to be had about how to have a hierarchy of these files (e.g. if one wanted to test all of fs/, not just fs/ext4). But this fragment would likely be a leaf node and isn't blocked on deciding if we want `import` statements and the like. Usage ===== Before [2] (on its way to being merged): $ cp fs/ext4/.kunitconfig .kunit/ $ ./tools/testing/kunit/kunit.py run After [2]: $ ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig ".kunitconfig" vs "kunitconfig" =============================== See also: commit 14ee5cfd4512 ("kunit: Rename 'kunitconfig' to '.kunitconfig'"). * The bit about .gitignore exluding it by default is now a con, however. * But there are a lot of directories with files that begin with "k" and so this could cause some annoyance w/ tab completion* * This is the name kunit.py expects right now, so some people are used to .kunitconfig over "kunitconfig" [1] https://lore.kernel.org/linux-ext4/20210122110234.2825685-1-geert@linux-m68k.org/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/commit/?h=kunit&id=243180f5924ed27ea417db39feb7f9691777688e * 372/5556 directories isn't too much, but still not a small number: $ find -type f -name 'k*' | xargs dirname | sort -u | wc -l 372 Signed-off-by: Daniel Latypov <dlatypov@google.com> Link: https://lore.kernel.org/r/20210210013206.136227-1-dlatypov@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-02-12ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting itGeert Uytterhoeven1-2/+1
EXT4_KUNIT_TESTS selects EXT4_FS, thus enabling an optional feature the user may not want to enable. Fix this by making the test depend on EXT4_FS instead. Fixes: 1cbeab1b242d16fd ("ext4: add kunit test for decoding extended timestamps") Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20210122110234.2825685-1-geert@linux-m68k.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-02-11io_uring: inline io_complete_rw_common()Pavel Begunkov1-17/+9
__io_complete_rw() casts request to kiocb for it to be immediately container_of()'ed by io_complete_rw_common(). And the last function's name doesn't do a great job of illuminating its purposes, so just inline it in its only user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-11io_uring: move res check out of io_rw_reissue()Pavel Begunkov1-9/+8
We pass return code into io_rw_reissue() only to be able to check if it's -EAGAIN. That's not the cleanest approach and may prevent inlining of the non-EAGAIN fast path, so do it at call sites. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-11io_uring: simplify iopoll reissuingPavel Begunkov1-21/+5
Don't stash -EAGAIN'ed iopoll requests into a list to reissue it later, do it eagerly. It removes overhead on keeping and checking that list, and allows in case of failure for these requests to be completed through normal iopoll completion path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-11io_uring: clean up io_req_free_batch_finish()Pavel Begunkov1-3/+1
io_req_free_batch_finish() is final and does not permit struct req_batch to be reused without re-init. To be more consistent don't clear ->task there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-11io_uring: move submit side state closer in the ringJens Axboe1-7/+10
We recently added the submit side req cache, but it was placed at the end of the struct. Move it near the other submission state for better memory placement, and reshuffle a few other members at the same time. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-11fs/jfs: fix potential integer overflow on shift of a intColin Ian King1-1/+1
The left shift of int 32 bit integer constant 1 is evaluated using 32 bit arithmetic and then assigned to a signed 64 bit integer. In the case where l2nb is 32 or more this can lead to an overflow. Avoid this by shifting the value 1LL instead. Addresses-Coverity: ("Uninitentional integer overflow") Fixes: b40c2e665cd5 ("fs/jfs: TRIM support for JFS Filesystem") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2021-02-11cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath.Shyam Prasad N1-0/+1
While debugging another issue today, Steve and I noticed that if a subdir for a file share is already mounted on the client, any new mount of any other subdir (or the file share root) of the same share results in sharing the cifs superblock, which e.g. can result in incorrect device name. While setting prefix path for the root of a cifs_sb, CIFS_MOUNT_USE_PREFIX_PATH flag should also be set. Without it, prepath is not even considered in some places, and output of "mount" and various /proc/<>/*mount* related options can be missing part of the device name. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-11cifs: In the new mount api we get the full devname as source=Ronnie Sahlberg3-2/+17
so we no longer need to handle or parse the UNC= and prefixpath= options that mount.cifs are generating. This also fixes a bug in the mount command option where the devname would be truncated into just //server/share because we were looking at the truncated UNC value and not the full path. I.e. in the mount command output the devive //server/share/path would show up as just //server/share Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-11xfs: consider shutdown in bmapbt cursor delete assertBrian Foster1-21/+12
The assert in xfs_btree_del_cursor() checks that the bmapbt block allocation field has been handled correctly before the cursor is freed. This field is used for accurate calculation of indirect block reservation requirements (for delayed allocations), for example. generic/019 reproduces a scenario where this assert fails because the filesystem has shutdown while in the middle of a bmbt record insertion. This occurs after a bmbt block has been allocated via the cursor but before the higher level bmap function (i.e. xfs_bmap_add_extent_hole_real()) completes and resets the field. Update the assert to accommodate the transient state if the filesystem has shutdown. While here, clean up the indentation and comments in the function. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-11io_uring: assign file_slot prior to calling io_sqe_file_register()Jens Axboe1-1/+2
We use the assigned slot in io_sqe_file_register(), and a previous patch moved the assignment to after we have called it. This isn't super pretty, and will get cleaned up in the future. For now, fix the regression by restoring the previous assignment/clear of the file_slot. Fixes: ea64ec02b31d ("io_uring: deduplicate file table slot calculation") Signed-off-by: Jens Axboe <axboe@kernel.dk>