summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-09-01Merge tag 'driver-core-5.15-rc1' of ↵Linus Torvalds9-96/+141
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core patches for 5.15-rc1. These do change a number of different things across different subsystems, and because of that, there were 2 stable tags created that might have already come into your tree from different pulls that did the following - changed the bus remove callback to return void - sysfs iomem_get_mapping rework Other than those two things, there's only a few small things in here: - kernfs performance improvements for huge numbers of sysfs users at once - tiny api cleanups - other minor changes All of these have been in linux-next for a while with no reported problems, other than the before-mentioned merge issue" * tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits) MAINTAINERS: Add dri-devel for component.[hc] driver core: platform: Remove platform_device_add_properties() ARM: tegra: paz00: Handle device properties with software node API bitmap: extend comment to bitmap_print_bitmask/list_to_buf drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI topology: use bin_attribute to break the size limitation of cpumap ABI lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list sysfs: Rename struct bin_attribute member to f_mapping sysfs: Invoke iomem_get_mapping() from the sysfs open callback debugfs: Return error during {full/open}_proxy_open() on rmmod zorro: Drop useless (and hardly used) .driver member in struct zorro_dev zorro: Simplify remove callback sh: superhyway: Simplify check in remove callback nubus: Simplify check in remove callback nubus: Make struct nubus_driver::remove return void kernfs: dont call d_splice_alias() under kernfs node lock kernfs: use i_lock to protect concurrent inode updates kernfs: switch kernfs to use an rwsem kernfs: use VFS negative dentry caching ...
2021-09-01f2fs: should put a page beyond EOF when preparing a writeJaegeuk Kim1-0/+2
The prepare_compress_overwrite() gets/locks a page to prepare a read, and calls f2fs_read_multi_pages() which checks EOF first. If there's any page beyond EOF, we unlock the page and set cc->rpages[i] = NULL, which we can't put the page anymore. This makes page leak, so let's fix by putting that page. Fixes: a949dc5f2c5c ("f2fs: compress: fix race condition of overwrite vs truncate") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-09-01f2fs: deallocate compressed pages when error happensJaegeuk Kim1-6/+6
In f2fs_write_multi_pages(), f2fs_compress_pages() allocates pages for compression work in cc->cpages[]. Then, f2fs_write_compressed_pages() initiates bio submission. But, if there's any error before submitting the IOs like early f2fs_cp_error(), previously it didn't free cpages by f2fs_compress_free_page(). Let's fix memory leak by putting that just before deallocating cc->cpages. Fixes: 4c8ff7095bef ("f2fs: support data compression") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-31io-wq: fix queue stalling raceJens Axboe1-8/+7
We need to set the stalled bit early, before we drop the lock for adding us to the stall hash queue. If not, then we can race with new work being queued between adding us to the stall hash and io_worker_handle_work() marking us stalled. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31Merge tag 'fs.close_range.v5.15' of ↵Linus Torvalds1-24/+40
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull close_range() cleanup from Christian Brauner: "This is a cleanup for close_range() which was sent as part of a bugfix we did some time ago in commit 9b5b872215fe ("file: fix close_range() for unshare+cloexec"). We used to share more code between some helpers for close_range() which made retrieving the maximum number of open fds before calling into the helpers sensible. But with the introduction of CLOSE_RANGE_CLOEXEC and the need to retrieve the number of maximum fds once more for CLOSE_RANGE_CLOEXEC that stopped making sense. So the code was in a dumb in-limbo state. Fix this by simplifying the code a bit. The original idea was to only fix the bug itself and make backporting easy. And since the cleanup wasn't very pressing I left it in linux-next for a very long time. I didn't pull the patches from the list again back then which is why they don't have lore-links. So I'm listing them below explicitly" Commit 03ba0fe4d09f ("file: simplify logic in __close_range()") Link: https://lore.kernel.org/linux-fsdevel/20210402123548.108372-3-brauner@kernel.org Commit f49fd6d3c070 ("file: let pick_file() tell caller it's done") Link: https://lore.kernel.org/linux-fsdevel/20210402123548.108372-4-brauner@kernel.org * tag 'fs.close_range.v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: file: simplify logic in __close_range() file: let pick_file() tell caller it's done
2021-08-31Merge tag 'fs.move_mount.move_mount_set_group.v5.15' of ↵Linus Torvalds1-1/+76
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull move_mount updates from Christian Brauner: "This contains an extension to the move_mount() syscall making it possible to add a single private mount into an existing propagation tree. The use-case comes from the criu folks which have been struggling with restoring complex mount trees for a long time. Variations of this work have been discussed at Plumbers before, e.g. https://www.linuxplumbersconf.org/event/7/contributions/640/ The extension to move_mount() enables criu to restore any set of mount namespaces, mount trees and sharing group trees without introducing yet more complexity into mount propagation itself. The changes required to criu to make use of this and restore complex propagation trees are available at https://github.com/Snorch/criu/commits/mount-v2-poc A cleaned-up version of this will go up for merging into the main criu repo after this lands" * tag 'fs.move_mount.move_mount_set_group.v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add move_mount(MOVE_MOUNT_SET_GROUP) selftest move_mount: allow to add a mount into an existing group
2021-08-31Merge tag 'iomap-5.15-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds14-920/+845
Pull iomap updates from Darrick Wong: "The most notable externally visible change for this cycle is the addition of support for reads to inline tail fragments of files, which was requested by the erofs developers; and a correction for a kernel memory corruption bug if the sysadmin tries to activate a swapfile with more pages than the swapfile header suggests. We also now report writeback completion errors to the file mapping correctly, instead of munging all errors into EIO. Internally, the bulk of the changes are Christoph's patchset to reduce the indirect function call count by a third to a half by converting iomap iteration from a loop pattern to a generator/consumer pattern. As an added bonus, fsdax no longer open-codes iomap apply loops. Summary: - Simplify the bio_end_page usage in the buffered IO code. - Support reading inline data at nonzero offsets for erofs. - Fix some typos and bad grammar. - Convert kmap_atomic usage in the inline data read path. - Add some extra inline data input checking. - Fix a memory corruption bug stemming from iomap_swapfile_activate trying to activate more pages than mm was expecting. - Pass errnos through the page writeback code so that writeback errors are reported correctly instead of being munged to EIO. - Replace iomap_apply with a open-coded iterator loops to reduce the number of indirect calls by a third to a half. - Refactor the fsdax code to use iomap iterators instead of the open-coded iomap_apply code that it had before. - Format file range iomap tracepoint data in hexadecimal and standardize the names used in the pretty-print string" * tag 'iomap-5.15-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (41 commits) iomap: standardize tracepoint formatting and storage mm/swap: consider max pages in iomap_swapfile_add_extent iomap: move loop control code to iter.c iomap: constify iomap_iter_srcmap fsdax: switch the fault handlers to use iomap_iter fsdax: factor out a dax_fault_actor() helper fsdax: factor out helpers to simplify the dax fault code iomap: rework unshare flag iomap: pass an iomap_iter to various buffered I/O helpers iomap: remove iomap_apply fsdax: switch dax_iomap_rw to use iomap_iter iomap: switch iomap_swapfile_activate to use iomap_iter iomap: switch iomap_seek_data to use iomap_iter iomap: switch iomap_seek_hole to use iomap_iter iomap: switch iomap_bmap to use iomap_iter iomap: switch iomap_fiemap to use iomap_iter iomap: switch __iomap_dio_rw to use iomap_iter iomap: switch iomap_page_mkwrite to use iomap_iter iomap: switch iomap_zero_range to use iomap_iter iomap: switch iomap_file_unshare to use iomap_iter ...
2021-08-31Merge tag 'vfs-5.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+8
Pull project quota update from Darrick Wong: "A single VFS patch that prevents userspace from setting project quota ids on files that the VFS considers invalid" * tag 'vfs-5.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: forbid invalid project ID
2021-08-31Merge tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds12-110/+173
Pull nfsd updates from Chuck Lever: "New features: - Support for server-side disconnect injection via debugfs - Protocol definitions for new RPC_AUTH_TLS authentication flavor Performance improvements: - Reduce page allocator traffic in the NFSD splice read actor - Reduce CPU utilization in svcrdma's Send completion handler Notable bug fixes: - Stabilize lockd operation when re-exporting NFS mounts - Fix the use of %.*s in NFSD tracepoints - Fix /proc/sys/fs/nfs/nsm_use_hostnames" * tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) nfsd: fix crash on LOCKT on reexported NFSv3 nfs: don't allow reexport reclaims lockd: don't attempt blocking locks on nfs reexports nfs: don't atempt blocking locks on nfs reexports Keep read and write fds with each nlm_file lockd: update nlm_lookup_file reexport comment nlm: minor refactoring nlm: minor nlm_lookup_file argument change lockd: lockd server-side shouldn't set fl_ops SUNRPC: Add documentation for the fail_sunrpc/ directory SUNRPC: Server-side disconnect injection SUNRPC: Move client-side disconnect injection SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free() nfsd4: Fix forced-expiry locking rpc: fix gss_svc_init cleanup on failure SUNRPC: Add RPC_AUTH_TLS protocol numbers lockd: change the proc_handler for nsm_use_hostnames sysctl: introduce new proc handler proc_dobool SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() ...
2021-08-31io_uring: don't submit half-prepared drain requestPavel Begunkov1-0/+5
[ 3784.910888] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 3784.910904] RIP: 0010:__io_file_supports_nowait+0x5/0xc0 [ 3784.910926] Call Trace: [ 3784.910928] ? io_read+0x17c/0x480 [ 3784.910945] io_issue_sqe+0xcb/0x1840 [ 3784.910953] __io_queue_sqe+0x44/0x300 [ 3784.910959] io_req_task_submit+0x27/0x70 [ 3784.910962] tctx_task_work+0xeb/0x1d0 [ 3784.910966] task_work_run+0x61/0xa0 [ 3784.910968] io_run_task_work_sig+0x53/0xa0 [ 3784.910975] __x64_sys_io_uring_enter+0x22/0x30 [ 3784.910977] do_syscall_64+0x3d/0x90 [ 3784.910981] entry_SYSCALL_64_after_hwframe+0x44/0xae io_drain_req() goes before checks for REQ_F_FAIL, which protect us from submitting under-prepared request (e.g. failed in io_init_req(). Fail such drained requests as well. Fixes: a8295b982c46d ("io_uring: fix failed linkchain code logic") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e411eb9924d47a131b1e200b26b675df0c2b7627.1630415423.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31io_uring: fix queueing half-created requestsPavel Begunkov1-1/+12
[ 27.259845] general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI [ 27.261043] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] [ 27.263730] RIP: 0010:sock_from_file+0x20/0x90 [ 27.272444] Call Trace: [ 27.272736] io_sendmsg+0x98/0x600 [ 27.279216] io_issue_sqe+0x498/0x68d0 [ 27.281142] __io_queue_sqe+0xab/0xb50 [ 27.285830] io_req_task_submit+0xbf/0x1b0 [ 27.286306] tctx_task_work+0x178/0xad0 [ 27.288211] task_work_run+0xe2/0x190 [ 27.288571] exit_to_user_mode_prepare+0x1a1/0x1b0 [ 27.289041] syscall_exit_to_user_mode+0x19/0x50 [ 27.289521] do_syscall_64+0x48/0x90 [ 27.289871] entry_SYSCALL_64_after_hwframe+0x44/0xae io_req_complete_failed() -> io_req_complete_post() -> io_req_task_queue() still would try to enqueue hard linked request, which can be half prepared (e.g. failed init), so we can't allow that to happen. Fixes: a8295b982c46d ("io_uring: fix failed linkchain code logic") Reported-by: syzbot+f9704d1878e290eddf73@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70b513848c1000f88bd75965504649c6bb1415c0.1630415423.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31io-wq: ensure that hash wait lock is IRQ disablingJens Axboe1-2/+2
A previous commit removed the IRQ safety of the worker and wqe locks, but that left one spot of the hash wait lock now being done without already having IRQs disabled. Ensure that we use the right locking variant for the hashed waitqueue lock. Fixes: a9a4aa9fbfc5 ("io-wq: wqe and worker locks no longer need to be IRQ safe") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31io_uring: retry in case of short read on block deviceMing Lei1-1/+7
In case of buffered reading from block device, when short read happens, we should retry to read more, otherwise the IO will be completed partially, for example, the following fio expects to read 2MB, but it can only read 1M or less bytes: fio --name=onessd --filename=/dev/nvme0n1 --filesize=2M \ --rw=randread --bs=2M --direct=0 --overwrite=0 --numjobs=1 \ --iodepth=1 --time_based=0 --runtime=2 --ioengine=io_uring \ --registerfiles --fixedbufs --gtod_reduce=1 --group_reporting Fix the issue by allowing short read retry for block device, which sets FMODE_BUF_RASYNC really. Fixes: 9a173346bd9e ("io_uring: fix short read retries for non-reg files") Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210821150751.1290434-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31io_uring: IORING_OP_WRITE needs hash_reg_file setJens Axboe1-0/+1
During some testing, it became evident that using IORING_OP_WRITE doesn't hash buffered writes like the other writes commands do. That's simply an oversight, and can cause performance regressions when doing buffered writes with this command. Correct that and add the flag, so that buffered writes are correctly hashed when using the non-iovec based write command. Cc: stable@vger.kernel.org Fixes: 3a6820f2bb8a ("io_uring: add non-vectored read/write commands") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31io-wq: fix race between adding work and activating a free workerJens Axboe1-27/+24
The attempt to find and activate a free worker for new work is currently combined with creating a new one if we don't find one, but that opens io-wq up to a race where the worker that is found and activated can put itself to sleep without knowing that it has been selected to perform this new work. Fix this by moving the activation into where we add the new work item, then we can retain it within the wqe->lock scope and elimiate the race with the worker itself checking inside the lock, but sleeping outside of it. Cc: stable@vger.kernel.org Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31Merge tag 'gfs2-v5.14-rc2-fixes' of ↵Linus Torvalds13-141/+139
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Various withdraw related fixes (freeze glock recursion, thread initialization / destruction order, journal recovery, glock cleanup, withdraw under journal lock). - Some error message improvements. - Various minor cleanups. * tag 'gfs2-v5.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Remove redundant check from gfs2_glock_dq gfs2: Delay withdraw from atomic context gfs2: Don't call dlm after protocol is unmounted gfs2: don't stop reads while withdraw in progress gfs2: Mark journal inodes as "don't cache" gfs2: nit: gfs2_drop_inode shouldn't return bool gfs2: Eliminate vestigial HIF_FIRST gfs2: Make recovery error more readable gfs2: Don't release and reacquire local statfs bh gfs2: init system threads before freeze lock gfs2: tiny cleanup in gfs2_log_reserve gfs2: trivial clean up of gfs2_ail_error gfs2: be more verbose replaying invalid rgrp blocks gfs2: Fix glock recursion in freeze_go_xmote_bh gfs2: Fix memory leak of object lsi on error return path
2021-08-31Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds5-44/+143
Pull fscrypt updates from Eric Biggers: "Some small fixes and cleanups for fs/crypto/: - Fix ->getattr() for ext4, f2fs, and ubifs to report the correct st_size for encrypted symlinks - Use base64url instead of a custom Base64 variant - Document struct fscrypt_operations" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: document struct fscrypt_operations fscrypt: align Base64 encoding with RFC 4648 base64url fscrypt: remove mention of symlink st_size quirk from documentation ubifs: report correct st_size for encrypted symlinks f2fs: report correct st_size for encrypted symlinks ext4: report correct st_size for encrypted symlinks fscrypt: add fscrypt_symlink_getattr() for computing st_size
2021-08-31Merge tag 'for-5.15-tag' of ↵Linus Torvalds49-1405/+2677
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The highlights of this round are integrations with fs-verity and idmapped mounts, the rest is usual mix of minor improvements, speedups and cleanups. There are some patches outside of btrfs, namely updating some VFS interfaces, all straightforward and acked. Features: - fs-verity support, using standard ioctls, backward compatible with read-only limitation on inodes with previously enabled fs-verity - idmapped mount support - make mount with rescue=ibadroots more tolerant to partially damaged trees - allow raid0 on a single device and raid10 on two devices, degenerate cases but might be useful as an intermediate step during conversion to other profiles - zoned mode block group auto reclaim can be disabled via sysfs knob Performance improvements: - continue readahead of node siblings even if target node is in memory, could speed up full send (on sample test +11%) - batching of delayed items can speed up creating many files - fsync/tree-log speedups - avoid unnecessary work (gains +2% throughput, -2% run time on sample load) - reduced lock contention on renames (on dbench +4% throughput, up to -30% latency) Fixes: - various zoned mode fixes - preemptive flushing threshold tuning, avoid excessive work on almost full filesystems Core: - continued subpage support, preparation for implementing remaining features like compression and defragmentation; with some limitations, write is now enabled on 64K page systems with 4K sectors, still considered experimental - no readahead on compressed reads - inline extents disabled - disabled raid56 profile conversion and mount - improved flushing logic, fixing early ENOSPC on some workloads - inode flags have been internally split to read-only and read-write incompat bit parts, used by fs-verity - new tree items for fs-verity - descriptor item - Merkle tree item - inode operations extended to be namespace-aware - cleanups and refactoring Generic code changes: - fs: new export filemap_fdatawrite_wbc - fs: removed sync_inode - block: bio_trim argument type fixups - vfs: add namespace-aware lookup" * tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits) btrfs: reset replace target device to allocation state on close btrfs: zoned: fix ordered extent boundary calculation btrfs: do not do preemptive flushing if the majority is global rsv btrfs: reduce the preemptive flushing threshold to 90% btrfs: tree-log: check btrfs_lookup_data_extent return value btrfs: avoid unnecessarily logging directories that had no changes btrfs: allow idmapped mount btrfs: handle ACLs on idmapped mounts btrfs: allow idmapped INO_LOOKUP_USER ioctl btrfs: allow idmapped SUBVOL_SETFLAGS ioctl btrfs: allow idmapped SET_RECEIVED_SUBVOL ioctls btrfs: relax restrictions for SNAP_DESTROY_V2 with subvolids btrfs: allow idmapped SNAP_DESTROY ioctls btrfs: allow idmapped SNAP_CREATE/SUBVOL_CREATE ioctls btrfs: check whether fsgid/fsuid are mapped during subvolume creation btrfs: allow idmapped permission inode op btrfs: allow idmapped setattr inode op btrfs: allow idmapped tmpfile inode op btrfs: allow idmapped symlink inode op btrfs: allow idmapped mkdir inode op ...
2021-08-31Merge tag '5.15-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds28-763/+477
Pull cifs client updates from Steve French: "Eleven cifs/smb3 client fixes: - mostly restructuring to allow disabling less secure algorithms (this will allow eventual removing rc4 and md4 from general use in the kernel) - four fixes, including two for stable - enable r/w support with fscache and cifs.ko I am working on a larger set of changes (the usual ... multichannel, auth and signing improvements), but wanted to get these in earlier to reduce chance of merge conflicts later in the merge window" * tag '5.15-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: cifs: Do not leak EDEADLK to dgetents64 for STATUS_USER_SESSION_DELETED cifs: add cifs_common directory to MAINTAINERS file cifs: cifs_md4 convert to SPDX identifier cifs: create a MD4 module and switch cifs.ko to use it cifs: fork arc4 and create a separate module for it for cifs and other users cifs: remove support for NTLM and weaker authentication algorithms cifs: enable fscache usage even for files opened as rw oid_registry: Add OIDs for missing Spnego auth mechanisms to Macs smb3: fix posix extensions mount option cifs: fix wrong release in sess_alloc_buffer() failed path CIFS: Fix a potencially linear read overflow
2021-08-31Merge tag '5.15-rc-first-ksmbd-merge' of git://git.samba.org/ksmbdLinus Torvalds62-0/+32066
Pull initial ksmbd implementation from Steve French: "Initial merge of kernel smb3 file server, ksmbd. The SMB family of protocols is the most widely deployed network filesystem protocol, the default on Windows and Macs (and even on many phones and tablets), with clients and servers on all major operating systems, but lacked a kernel server for Linux. For many cases the current userspace server choices were suboptimal either due to memory footprint, performance or difficulty integrating well with advanced Linux features. ksmbd is a new kernel module which implements the server-side of the SMB3 protocol. The target is to provide optimized performance, GPLv2 SMB server, and better lease handling (distributed caching). The bigger goal is to add new features more rapidly (e.g. RDMA aka "smbdirect", and recent encryption and signing improvements to the protocol) which are easier to develop on a smaller, more tightly optimized kernel server than for example in Samba. The Samba project is much broader in scope (tools, security services, LDAP, Active Directory Domain Controller, and a cross platform file server for a wider variety of purposes) but the user space file server portion of Samba has proved hard to optimize for some Linux workloads, including for smaller devices. This is not meant to replace Samba, but rather be an extension to allow better optimizing for Linux, and will continue to integrate well with Samba user space tools and libraries where appropriate. Working with the Samba team we have already made sure that the configuration files and xattrs are in a compatible format between the kernel and user space server. Various types of functional and regression tests are regularly run against it. One example is the automated 'buildbot' regression tests which use the Linux client to test against ksmbd, e.g. http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/8/builds/56 but other test suites, including Samba's smbtorture functional test suite are also used regularly" * tag '5.15-rc-first-ksmbd-merge' of git://git.samba.org/ksmbd: (219 commits) ksmbd: fix __write_overflow warning in ndr_read_string MAINTAINERS: ksmbd: add cifs_common directory to ksmbd entry MAINTAINERS: ksmbd: update my email address ksmbd: fix permission check issue on chown and chmod ksmbd: don't set FILE DELETE and FILE_DELETE_CHILD in access mask by default MAINTAINERS: add git adddress of ksmbd ksmbd: update SMB3 multi-channel support in ksmbd.rst ksmbd: smbd: fix kernel oops during server shutdown ksmbd: remove select FS_POSIX_ACL in Kconfig ksmbd: use proper errno instead of -1 in smb2_get_ksmbd_tcon() ksmbd: update the comment for smb2_get_ksmbd_tcon() ksmbd: change int data type to boolean ksmbd: Fix multi-protocol negotiation ksmbd: fix an oops in error handling in smb2_open() ksmbd: add ipv6_addr_v4mapped check to know if connection from client is ipv4 ksmbd: fix missing error code in smb2_lock ksmbd: use channel signingkey for binding SMB2 session setup ksmbd: don't set RSS capable in FSCTL_QUERY_NETWORK_INTERFACE_INFO ksmbd: Return STATUS_OBJECT_PATH_NOT_FOUND if smb2_creat() returns ENOENT ksmbd: fix -Wstringop-truncation warnings ...
2021-08-31fs/ntfs3: Restyle comments to better align with kernel-docKonstantin Komarov15-75/+84
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-31fs/ntfs3: Rework file operationsKonstantin Komarov10-529/+563
Rename now works "Add new name and remove old name". "Remove old name and add new name" may result in bad inode if we can't add new name and then can't restore (add) old name. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-31fs/ntfs3: Remove fat ioctl's from ntfs3 driver for nowKari Argillander1-8/+0
For some reason we have FAT ioctl calls. Even old ntfs driver did not use these. We should not use these because it his hard to get things out of kernel when they are upstream. That's why we remove these for now. More discussion is needed what ioctl should be implemented and what is important. Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-31fuse: flush extending writesMiklos Szeredi1-1/+1
Callers of fuse_writeback_range() assume that the file is ready for modification by the server in the supplied byte range after the call returns. If there's a write that extends the file beyond the end of the supplied range, then the file needs to be extended to at least the end of the range, but currently that's not done. There are at least two cases where this can cause problems: - copy_file_range() will return short count if the file is not extended up to end of the source range. - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file, hence the region may not be fully allocated. Fix by flushing writes from the start of the range up to the end of the file. This could be optimized if the writes are non-extending, etc, but it's probably not worth the trouble. Fixes: a2bc92362941 ("fuse: fix copy_file_range() in the writeback case") Fixes: 6b1bdb56b17c ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)") Cc: <stable@vger.kernel.org> # v5.2 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-31ext4: make the updating inode data procedure atomicZhang Yi1-16/+28
Now that ext4_do_update_inode() return error before filling the whole inode data if we fail to set inode blocks in ext4_inode_blocks_set(). This error should never happen in theory since sb->s_maxbytes should not have allowed this, we have already init sb->s_maxbytes according to this feature in ext4_fill_super(). So even through that could only happen due to the filesystem corruption, we'd better to return after we finish updating the inode because it may left an uninitialized buffer and we could read this buffer later in "errors=continue" mode. This patch make the updating inode data procedure atomic, call EXT4_ERROR_INODE() after we dropping i_raw_lock after something bad happened, make sure that the inode is integrated, and also drop a BUG_ON and do some small cleanups. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210826130412.3921207-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: remove an unnecessary if statement in __ext4_get_inode_loc()Zhang Yi1-84/+78
The "if (!buffer_uptodate(bh))" hunk covered almost the whole code after getting buffer in __ext4_get_inode_loc() which seems unnecessary, remove it and switch to check ext4_buffer_uptodate(), it simplify code and make it more readable. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210826130412.3921207-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: move inode eio simulation behind io completeionZhang Yi1-3/+1
No EIO simulation is required if the buffer is uptodate, so move the simulation behind read bio completeion just like inode/block bitmap simulation does. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210826130412.3921207-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: Improve scalability of ext4 orphan file handlingJan Kara2-27/+53
Even though the length of the critical section when adding / removing orphaned inodes was significantly reduced by using orphan file, the contention of lock protecting orphan file still appears high in profiles for truncate / unlink intensive workloads with high number of threads. This patch makes handling of orphan file completely lockless. Also to reduce conflicts between CPUs different CPUs start searching for empty slot in orphan file in different blocks. Performance comparison of locked orphan file handling, lockless orphan file handling, and completely disabled orphan inode handling from 80 CPU Xeon Server with 526 GB of RAM, filesystem located on SAS SSD disk, average of 5 runs: stress-orphan (microbenchmark truncating files byte-by-byte from N processes in parallel) Threads Time Time Time Orphan locked Orphan lockless No orphan 1 0.945600 0.939400 0.891200 2 1.331800 1.246600 1.174400 4 1.995000 1.780600 1.713200 8 6.424200 4.900000 4.106000 16 14.937600 8.516400 8.138000 32 33.038200 24.565600 24.002200 64 60.823600 39.844600 38.440200 128 122.941400 70.950400 69.315000 So we can see that with lockless orphan file handling, addition / deletion of orphaned inodes got almost completely out of picture even for a microbenchmark stressing it. For reaim creat_clo workload on ramdisk there are also noticeable gains (average of 5 runs): Clients Vanilla (ops/s) Patched (ops/s) creat_clo-1 14705.88 ( 0.00%) 14354.07 * -2.39%* creat_clo-3 27108.43 ( 0.00%) 28301.89 ( 4.40%) creat_clo-5 37406.48 ( 0.00%) 45180.73 * 20.78%* creat_clo-7 41338.58 ( 0.00%) 54687.50 * 32.29%* creat_clo-9 45226.13 ( 0.00%) 62937.07 * 39.16%* creat_clo-11 44000.00 ( 0.00%) 65088.76 * 47.93%* creat_clo-13 36516.85 ( 0.00%) 68661.97 * 88.03%* creat_clo-15 30864.20 ( 0.00%) 69551.78 * 125.35%* creat_clo-17 27478.45 ( 0.00%) 67729.08 * 146.48%* creat_clo-19 25000.00 ( 0.00%) 61621.62 * 146.49%* creat_clo-21 18772.35 ( 0.00%) 63829.79 * 240.02%* creat_clo-23 16698.94 ( 0.00%) 61938.96 * 270.92%* creat_clo-25 14973.05 ( 0.00%) 56947.61 * 280.33%* creat_clo-27 16436.69 ( 0.00%) 65008.03 * 295.51%* creat_clo-29 13949.01 ( 0.00%) 69047.62 * 395.00%* creat_clo-31 14283.52 ( 0.00%) 67982.45 * 375.95%* Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: Speedup ext4 orphan inode handlingJan Kara4-52/+394
Ext4 orphan inode handling is a bottleneck for workloads which heavily truncate / unlink small files since it contends on the global s_orphan_mutex lock (and generally it's difficult to improve scalability of the ondisk linked list of orphaned inodes). This patch implements new way of handling orphan inodes. Instead of linking orphaned inode into a linked list, we store it's inode number in a new special file which we call "orphan file". Only if there's no more space in the orphan file (too many inodes are currently orphaned) we fall back to using old style linked list. Currently we protect operations in the orphan file with a spinlock for simplicity but even in this setting we can substantially reduce the length of the critical section and thus speedup some workloads. In the next patch we improve this by making orphan handling lockless. Note that the change is backwards compatible when the filesystem is clean - the existence of the orphan file is a compat feature, we set another ro-compat feature indicating orphan file needs scanning for orphaned inodes when mounting filesystem read-write. This ro-compat feature gets cleared on unmount / remount read-only. Some performance data from 80 CPU Xeon Server with 512 GB of RAM, filesystem located on SSD, average of 5 runs: stress-orphan (microbenchmark truncating files byte-by-byte from N processes in parallel) Threads Time Time Vanilla Patched 1 1.057200 0.945600 2 1.680400 1.331800 4 2.547000 1.995000 8 7.049400 6.424200 16 14.827800 14.937600 32 40.948200 33.038200 64 87.787400 60.823600 128 206.504000 122.941400 So we can see significant wins all over the board. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: Move orphan inode handling into a separate fileJan Kara5-363/+375
Move functions for handling orphan inodes into a new file fs/ext4/orphan.c to have them in one place and somewhat reduce size of other files. No code changes. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31jbd2: add sparse annotations for add_transaction_credits()Theodore Ts'o1-1/+18
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: Support for checksumming from journal triggersJan Kara16-128/+259
JBD2 layer support triggers which are called when journaling layer moves buffer to a certain state. We can use the frozen trigger, which gets called when buffer data is frozen and about to be written out to the journal, to compute block checksums for some buffer types (similarly as does ocfs2). This avoids unnecessary repeated recomputation of the checksum (at the cost of larger window where memory corruption won't be caught by checksumming) and is even necessary when there are unsynchronized updaters of the checksummed data. So add superblock and journal trigger type arguments to ext4_journal_get_write_access() and ext4_journal_get_create_access() so that frozen triggers can be set accordingly. Also add inode argument to ext4_walk_page_buffers() and all the callbacks used with that function for the same purpose. This patch is mostly only a change of prototype of the above mentioned functions and a few small helpers. Real checksumming will come later. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: fix sparse warningsTheodore Ts'o2-6/+25
Add sparse annotations to suppress false positive context imbalance warnings, and use NULL instead of 0 in EXT_MAX_{EXTENT,INDEX}. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: fix race writing to an inline_data file while its xattrs are changingTheodore Ts'o1-0/+6
The location of the system.data extended attribute can change whenever xattr_sem is not taken. So we need to recalculate the i_inline_off field since it mgiht have changed between ext4_write_begin() and ext4_write_end(). This means that caching i_inline_off is probably not helpful, so in the long run we should probably get rid of it and shrink the in-memory ext4 inode slightly, but let's fix the race the simple way for now. Cc: stable@kernel.org Fixes: f19d5870cbf72 ("ext4: add normal write support for inline data") Reported-by: syzbot+13146364637c7363a7de@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: Make sure quota files are not grabbed accidentallyJan Kara1-2/+6
If ext4 filesystem is corrupted so that quota files are linked from directory hirerarchy, bad things can happen. E.g. quota files can get corrupted or deleted. Make sure we are not grabbing quota file inodes when we expect normal inodes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210812133122.26360-1-jack@suse.cz
2021-08-31ext4: fix e2fsprogs checksum failure for mounted filesystemJan Kara1-0/+8
Commit 81414b4dd48 ("ext4: remove redundant sb checksum recomputation") removed checksum recalculation after updating superblock free space / inode counters in ext4_fill_super() based on the fact that we will recalculate the checksum on superblock writeout. That is correct assumption but until the writeout happens (which can take a long time) the checksum is incorrect in the buffer cache and if programs such as tune2fs or resize2fs is called shortly after a file system is mounted can fail. So return back the checksum recalculation and add a comment explaining why. Fixes: 81414b4dd48f ("ext4: remove redundant sb checksum recomputation") Cc: stable@kernel.org Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210812124737.21981-1-jack@suse.cz
2021-08-31ext4: if zeroout fails fall back to splitting the extent nodeTheodore Ts'o1-2/+3
If the underlying storage device is using thin-provisioning, it's possible for a zeroout operation to return ENOSPC. Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks left") added logic to retry block allocation since we might get free block after we commit a transaction. But the ENOSPC from thin-provisioning will confuse ext4, and lead to an infinite loop. Since using zeroout instead of splitting the extent node is an optimization, if it fails, we might as well fall back to splitting the extent node. Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: reduce arguments of ext4_fc_add_dentry_tlvGuoqing Jiang1-18/+9
Let's pass fc_dentry directly since those arguments (tag, parent_ino and ino etc) can be deferenced from it. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210727080708.3708814-1-guoqing.jiang@linux.dev
2021-08-31ext4: flush background discard kwork when retry allocationWang Jianchao3-3/+13
The background discard kwork tries to mark blocks used and issue discard. This can make filesystem suffer from NOSPC error, xfstest generic/371 can fail due to it. Fix it by flushing discard kwork in ext4_should_retry_alloc. At the same time, give up discard at the moment. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Link: https://lore.kernel.org/r/20210830075246.12516-6-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31ext4: get discard out of jbd2 commit kthread contexWang Jianchao2-25/+78
Right now, discard is issued and waited to be completed in jbd2 commit kthread context after the logs are committed. When large amount of files are deleted and discard is flooding, jbd2 commit kthread can be blocked for long time. Then all of the metadata operations can be blocked to wait the log space. One case is the page fault path with read mm->mmap_sem held, which wants to update the file time but has to wait for the log space. When other threads in the task wants to do mmap, then write mmap_sem is blocked. Finally all of the following read mmap_sem requirements are blocked, even the ps command which need to read the /proc/pid/ -cmdline. Our monitor service which needs to read /proc/pid/cmdline used to be blocked for 5 mins. This patch frees the blocks back to buddy after commit and then do discard in a async kworker context in fstrim fashion, namely, - mark blocks to be discarded as used if they have not been allocated - do discard - mark them free After this, jbd2 commit kthread won't be blocked any more by discard and we won't get NOSPC even if the discard is slow or throttled. Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2 Suggested-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Link: https://lore.kernel.org/r/20210830075246.12516-5-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-31Merge tag 'for-5.15/io_uring-vfs-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds4-110/+343
Pull io_uring mkdirat/symlinkat/linkat support from Jens Axboe: "This adds io_uring support for mkdirat, symlinkat, and linkat" * tag 'for-5.15/io_uring-vfs-2021-08-30' of git://git.kernel.dk/linux-block: io_uring: add support for IORING_OP_LINKAT io_uring: add support for IORING_OP_SYMLINKAT io_uring: add support for IORING_OP_MKDIRAT namei: update do_*() helpers to return ints namei: make do_linkat() take struct filename namei: add getname_uflags() namei: make do_symlinkat() take struct filename namei: make do_mknodat() take struct filename namei: make do_mkdirat() take struct filename namei: change filename_parentat() calling conventions namei: ignore ERR/NULL names in putname()
2021-08-31Merge tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds2-3/+5
Pull support for struct bio recycling from Jens Axboe: "This adds bio recycling support for polled IO, allowing quick reuse of a bio for high IOPS scenarios via a percpu bio_set list. It's good for almost a 10% improvement in performance, bumping our per-core IO limit from ~3.2M IOPS to ~3.5M IOPS" * tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block: bio: improve kerneldoc documentation for bio_alloc_kiocb() block: provide bio_clear_hipri() helper block: use the percpu bio cache in __blkdev_direct_IO io_uring: enable use of bio alloc cache block: clear BIO_PERCPU_CACHE flag if polling isn't supported bio: add allocation cache abstraction fs: add kiocb alloc cache flag bio: optimize initialization of a bio
2021-08-31Merge tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds3-801/+1155
Pull io_uring updates from Jens Axboe: - cancellation cleanups (Hao, Pavel) - io-wq accounting cleanup (Hao) - io_uring submit locking fix (Hao) - io_uring link handling fixes (Hao) - fixed file improvements (wangyangbo, Pavel) - allow updates of linked timeouts like regular timeouts (Pavel) - IOPOLL fix (Pavel) - remove batched file get optimization (Pavel) - improve reference handling (Pavel) - IRQ task_work batching (Pavel) - allow pure fixed file, and add support for open/accept (Pavel) - GFP_ATOMIC RT kernel fix - multiple CQ ring waiter improvement - funnel IRQ completions through task_work - add support for limiting async workers explicitly - add different clocksource support for timeouts - io-wq wakeup race fix - lots of cleanups and improvement (Pavel et al) * tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block: (87 commits) io-wq: fix wakeup race when adding new work io-wq: wqe and worker locks no longer need to be IRQ safe io-wq: check max_worker limits if a worker transitions bound state io_uring: allow updating linked timeouts io_uring: keep ltimeouts in a list io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts io-wq: provide a way to limit max number of workers io_uring: add build check for buf_index overflows io_uring: clarify io_req_task_cancel() locking io_uring: add task-refs-get helper io_uring: fix failed linkchain code logic io_uring: remove redundant req_set_fail() io_uring: don't free request to slab io_uring: accept directly into fixed file table io_uring: hand code io_accept() fd installing io_uring: openat directly into fixed fd table net: add accept helper not installing fd io_uring: fix io_try_cancel_userdata race for iowq io_uring: IRQ rw completion batching io_uring: batch task work locking ...
2021-08-31Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds12-234/+49
Pull block updates from Jens Axboe: "Nothing major in here - lots of good cleanups and tech debt handling, which is also evident in the diffstats. In particular: - Add disk sequence numbers (Matteo) - Discard merge fix (Ming) - Relax disk zoned reporting restrictions (Niklas) - Bio error handling zoned leak fix (Pavel) - Start of proper add_disk() error handling (Luis, Christoph) - blk crypto fix (Eric) - Non-standard GPT location support (Dmitry) - IO priority improvements and cleanups (Damien)o - blk-throtl improvements (Chunguang) - diskstats_show() stack reduction (Abd-Alrhman) - Loop scheduler selection (Bart) - Switch block layer to use kmap_local_page() (Christoph) - Remove obsolete disk_name helper (Christoph) - block_device refcounting improvements (Christoph) - Ensure gendisk always has a request queue reference (Christoph) - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)" * tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits) sg: pass the device name to blk_trace_setup block, bfq: cleanup the repeated declaration blk-crypto: fix check for too-large dun_bytes blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN blk-zoned: allow zone management send operations without CAP_SYS_ADMIN block: mark blkdev_fsync static block: refine the disk_live check in del_gendisk mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA mmc: block: Support alternative_gpt_sector() operation partitions/efi: Support non-standard GPT location block: Add alternative_gpt_sector() operation bio: fix page leak bio_add_hw_page failure block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT block: remove a pointless call to MINOR() in device_add_disk null_blk: add error handling support for add_disk() virtio_blk: add error handling support for add_disk() block: add error handling for device_add_disk / add_disk block: return errors from disk_alloc_events block: return errors from blk_integrity_add block: call blk_register_queue earlier in device_add_disk ...
2021-08-31Merge tag 'timers-core-2021-08-30' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timekeeping, timers and related drivers: Core code: - Cure a couple of correctness issues in the posix CPU timer code to prevent that the tick dependency for NOHZ full is kept alive for no reason. - Avoid expensive double reprogramming of the clockevent device in hrtimer_start_range_ns(). - Avoid pointless SMP function calls when the clock was set to avoid disturbing CPUs which do not have any affected timers queued. - Make the clocksource watchdog test work correctly when CONFIG_HZ is less than 100. Drivers: - Prefer the ARM architected timer over the Exynos timer which is way more expensive to access. - Add device tree bindings for new Ingenic SoCs - The usual improvements and cleanups all over the place" * tag 'timers-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) clocksource: Make clocksource watchdog test safe for slow-HZ systems dt-bindings: timer: Add ABIs for new Ingenic SoCs clocksource/drivers/fttmr010: Pass around less pointers clocksource/drivers/mediatek: Optimize systimer irq clear flow on shutdown clocksource/drivers/ingenic: Use bitfield macro helpers clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel dt-bindings: timer: convert rockchip,rk-timer.txt to YAML clocksource/drivers/exynos_mct: Mark MCT device as CLOCK_EVT_FEAT_PERCPU clocksource/drivers/exynos_mct: Prioritise Arm arch timer on arm64 hrtimer: Unbreak hrtimer_force_reprogram() hrtimer: Use raw_cpu_ptr() in clock_was_set() hrtimer: Avoid more SMP function calls in clock_was_set() hrtimer: Avoid unnecessary SMP function calls in clock_was_set() hrtimer: Add bases argument to clock_was_set() time/timekeeping: Avoid invoking clock_was_set() twice timekeeping: Distangle resume and clock-was-set events timerfd: Provide timerfd_resume() hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case hrtimer: Ensure timerfd notification for HIGHRES=n hrtimer: Consolidate reprogramming code ...
2021-08-30Merge tag 'sched-core-2021-08-30' of ↵Linus Torvalds2-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - The biggest change in this cycle is scheduler support for asymmetric scheduling affinity, to support the execution of legacy 32-bit tasks on AArch32 systems that also have 64-bit-only CPUs. Architectures can fill in this functionality by defining their own task_cpu_possible_mask(p). When this is done, the scheduler will make sure the task will only be scheduled on CPUs that support it. (The actual arm64 specific changes are not part of this tree.) For other architectures there will be no change in functionality. - Add cgroup SCHED_IDLE support - Increase node-distance flexibility & delay determining it until a CPU is brought online. (This enables platforms where node distance isn't final until the CPU is only.) - Deadline scheduler enhancements & fixes - Misc fixes & cleanups. * tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) eventfd: Make signal recursion protection a task bit sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED case sched: Introduce dl_task_check_affinity() to check proposed affinity sched: Allow task CPU affinity to be restricted on asymmetric systems sched: Split the guts of sched_setaffinity() into a helper function sched: Introduce task_struct::user_cpus_ptr to track requested affinity sched: Reject CPU affinity changes based on task_cpu_possible_mask() cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 sched: Introduce task_cpu_possible_mask() to limit fallback rq selection sched: Cgroup SCHED_IDLE support sched/topology: Skip updating masks for non-online nodes sched: Replace deprecated CPU-hotplug functions. sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS sched: Fix UCLAMP_FLAG_IDLE setting sched/deadline: Fix missing clock update in migrate_task_rq_dl() sched/fair: Avoid a second scan of target in select_idle_cpu sched/fair: Use prev instead of new target as recent_used_cpu sched: Don't report SCHED_FLAG_SUGOV in sched_getattr() ...
2021-08-30Merge tag 'locks-v5.15' of ↵Linus Torvalds16-255/+28
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "This starts with a couple of fixes for potential deadlocks in the fowner/fasync handling. The next patch removes the old mandatory locking code from the kernel altogether. The last patch cleans up rw_verify_area a bit more after the mandatory locking removal" * tag 'locks-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: clean up after mandatory file locking support removal fs: remove mandatory file locking support fcntl: fix potential deadlock for &fasync_struct.fa_lock fcntl: fix potential deadlocks for &fown_struct.lock
2021-08-30Merge tag 'hole_punch_for_v5.15-rc1' of ↵Linus Torvalds31-275/+228
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fs hole punching vs cache filling race fixes from Jan Kara: "Fix races leading to possible data corruption or stale data exposure in multiple filesystems when hole punching races with operations such as readahead. This is the series I was sending for the last merge window but with your objection fixed - now filemap_fault() has been modified to take invalidate_lock only when we need to create new page in the page cache and / or bring it uptodate" * tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: filesystems/locking: fix Malformed table warning cifs: Fix race between hole punch and page fault ceph: Fix race between hole punch and page fault fuse: Convert to using invalidate_lock f2fs: Convert to using invalidate_lock zonefs: Convert to using invalidate_lock xfs: Convert double locking of MMAPLOCK to use VFS helpers xfs: Convert to use invalidate_lock xfs: Refactor xfs_isilocked() ext2: Convert to using invalidate_lock ext4: Convert to use mapping->invalidate_lock mm: Add functions to lock invalidate_lock for two mappings mm: Protect operations adding pages to page cache with invalidate_lock documentation: Sync file_operations members with reality mm: Fix comments mentioning i_mutex
2021-08-30NFS: Always provide aligned buffers to the RPC read layersTrond Myklebust1-2/+6
Instead of messing around with XDR padding in the RDMA layer, we should just give the RPC layer an aligned buffer. Try to avoid creating extra RPC calls by aligning to the smaller value of ALIGN(len, rsize) and PAGE_SIZE. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-30Merge tag 'fs_for_v5.15-rc1' of ↵Linus Torvalds13-114/+103
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF and isofs updates from Jan Kara: "Several smaller fixes and cleanups in UDF and isofs" * tag 'fs_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf_get_extendedattr() had no boundary checks. isofs: joliet: Fix iocharset=utf8 mount option udf: Fix iocharset=utf8 mount option udf: Get rid of 0-length arrays in struct fileIdentDesc udf: Get rid of 0-length arrays udf: Remove unused declaration udf: Check LVID earlier