summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-06-17ext4: remove redundant check buffer_uptodate()Joseph Qi1-1/+1
Now set_buffer_uptodate() will test first and then set, so we don't have to check buffer_uptodate() first, remove it to simplify code. Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://lore.kernel.org/r/1619418587-5580-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17ext4: fix overflow in ext4_iomap_alloc()Jan Kara1-1/+1
A code in iomap alloc may overflow block number when converting it to byte offset. Luckily this is mostly harmless as we will just use more expensive method of writing using unwritten extents even though we are writing beyond i_size. Cc: stable@kernel.org Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210412102333.2676-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17btrfs: zoned: fix negative space_info->bytes_readonlyNaohiro Aota1-4/+4
Consider we have a using block group on zoned btrfs. |<- ZU ->|<- used ->|<---free--->| `- Alloc offset ZU: Zone unusable Marking the block group read-only will migrate the zone unusable bytes to the read-only bytes. So, we will have this. |<- RO ->|<- used ->|<--- RO --->| RO: Read only When marking it back to read-write, btrfs_dec_block_group_ro() subtracts the above "RO" bytes from the space_info->bytes_readonly. And, it moves the zone unusable bytes back and again subtracts those bytes from the space_info->bytes_readonly, leading to negative bytes_readonly. This can be observed in the output as eg.: Data, single: total=512.00MiB, used=165.21MiB, zone_unusable=16.00EiB Data, single: total=536870912, used=173256704, zone_unusable=18446744073603186688 This commit fixes the issue by reordering the operations. Link: https://github.com/naota/linux/issues/37 Reported-by: David Sterba <dsterba@suse.com> Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones") CC: stable@vger.kernel.org # 5.12+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-17pstore/blk: Include zone in pstore_device_infoKees Cook1-64/+79
Information was redundant between struct pstore_zone_info and struct pstore_device_info. Use struct pstore_zone_info, with member name "zone". Additionally untangle the logic for the "best effort" block device instance. Signed-off-by: Kees Cook <keescook@chromium.org> Fixed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/lkml/20210617005424.182305-1-pulehui@huawei.com
2021-06-16pstore/blk: Fix kerndoc and redundancy on blkdev paramKees Cook1-23/+1
Remove redundant details of blkdev and fix up resulting kerndoc. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org>
2021-06-16pstore/blk: Use the normal block device I/O pathKees Cook1-181/+83
Stop poking into block layer internals and just open the block device file an use kernel_read and kernel_write on it. Note that this means the transformation from name_to_dev_t can't be used anymore when pstore_blk is loaded as a module: a full filesystem device path name must be used instead. Additionally removes ":internal:" kerndoc link, since no such documentation remains. Co-developed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org>
2021-06-16mm/hugetlb: expand restore_reserve_on_error functionalityMike Kravetz1-0/+1
The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/ Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16pstore/blk: Move verify_size() macro out of functionKees Cook1-11/+11
There's no good reason for the verify_size macro to live inside the function. Move it up with the check_size() macro and fix indenting. Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2021-06-16pstore/blk: Improve failure reportingKees Cook1-1/+15
There was no feedback on bad registration attempts. Add details on the failure cause. Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2021-06-16io-wq: remove header files not needed anymoreOlivier Langlois1-2/+0
mm related header files are not needed for io-wq module. remove them for a small clean-up. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: Add to traces the req pointer when availableOlivier Langlois1-5/+6
The req pointer uniquely identify a specific request. Having it in traces can provide valuable insights that is not possible to have if the calling process is reusing the same user_data value. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: optimise io_commit_cqring()Pavel Begunkov1-7/+11
In most cases io_commit_cqring() is just an smp_store_release(), and it's hot enough, especially for IRQ rw, to want it to save on a function call. Mark it inline and extract a non-inlined slow path doing drain and timeout flushing. The inlined part is pretty slim to not cause binary bloating. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7350f8b6b92caa50a48a80be39909f0d83eddd93.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: shove more drain bits out of hot pathPavel Begunkov1-20/+22
Place all drain_next logic into io_drain_req(), so it's never executed if there was no drained requests before. The only thing we need is to set ->drain_active if we see a request with IOSQE_IO_DRAIN, do that in io_init_req() where flags are definitely in registers. Also, all drain-related code is encapsulated in io_drain_req(), makes it cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/68bf4f7395ddaafbf1a26bd97b57d57d45a9f900.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: switch !DRAIN fast path when possiblePavel Begunkov1-6/+8
->drain_used is one way, which is not optimal if users use DRAIN but very rarely. However, we can just clear it in io_drain_req() when all drained before requests are gone. Also rename the flag to reflect the change and be more clear about it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7f37a240857546a94df6348507edddacab150460.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: fix min types mismatch in table allocPavel Begunkov1-1/+1
fs/io_uring.c: In function 'io_alloc_page_table': include/linux/minmax.h:20:28: warning: comparison of distinct pointer types lacks a cast Cast everything to size_t using min_t. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 9123c8ffce16 ("io_uring: add helpers for 2 level table alloc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/50f420a956bca070a43810d4a805293ed54f39d8.1623759527.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: Fix comment of io_get_sqeFam Zheng1-1/+1
The sqe_ptr argument has been gone since 709b302faddf (io_uring: simplify io_get_sqring, 2020-04-08), made the return value of the function. Update the comment accordingly. Signed-off-by: Fam Zheng <fam.zheng@bytedance.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210604164256.12242-1-fam.zheng@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: optimise non-drain pathPavel Begunkov1-27/+30
Replace drain checks with one-way flag set upon seeing the first IOSQE_IO_DRAIN request. There are several places where it cuts cycles well: 1) It's much faster than the fast check with two conditions in io_drain_req() including pretty complex list_empty_careful(). 2) We can mark io_queue_sqe() inline now, that's a huge win. 3) It replaces timeout and drain checks in io_commit_cqring() with a single flags test. Also great not touching ->defer_list there without a reason so limiting cache bouncing. It adds a small amount of overhead to drain path, but it's negligible. The main nuisance is that once it meets any DRAIN request in io_uring instance lifetime it will _always_ go through a slower path, so drain-less and offset-mode timeout less applications are preferable. The overhead in that case would be not big, but it's worth to bear in mind. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/98d2fff8c4da5144bb0d08499f591d4768128ea3.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: refactor io_req_defer()Pavel Begunkov1-20/+19
Rename io_req_defer() into io_drain_req() and refactor it uncoupling it from io_queue_sqe() error handling and preparing for coming optimisations. Also, prioritise non IOSQE_ASYNC path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4f17dd56e7fbe52d1866f8acd8efe3284d2bebcb.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: move uring_lock locationPavel Begunkov1-9/+7
->uring_lock is prevalently used for submission, even though it protects many other things like iopoll, registeration, selected bufs, and more. And it's placed together with ->cq_wait poked on completion and CQ waiting sides. Move them apart, ->uring_lock goes to the submission data, and cq_wait to completion related chunk. The last one requires some reshuffling so everything needed by io_cqring_ev_posted*() is in one cacheline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dea5e845caee4c98aa0922b46d713154d81f7bd8.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: wait heads renamingPavel Begunkov1-15/+15
We use several wait_queue_head's for different purposes, but namings are confusing. First rename ctx->cq_wait into ctx->poll_wait, because this one is used for polling an io_uring instance. Then rename ctx->wait into ctx->cq_wait, which is responsible for CQE waiting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/47b97a097780c86c67b20b6ccc4e077523dce682.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: clean up check_overflow flagPavel Begunkov1-11/+9
There are no users of ->sq_check_overflow, only ->cq_check_overflow is used. Combine it and move out of completion related part of struct io_ring_ctx. A not so obvious benefit of it is fitting all completion side fields into a single cacheline. It was taking 2 lines before with 56B padding, and io_cqring_ev_posted*() were still touching both of them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25927394964df31d113e3c729416af573afff5f5.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: small io_submit_sqe() optimisationPavel Begunkov1-1/+1
submit_state.link is used only to assemble a link and not used for actual submission, so clear it before io_queue_sqe() in io_submit_sqe(), awhile it's hot and in caches and queueing doesn't spoil it. May also potentially help compiler with spilling or to do other optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1579939426f3ad6b55af3005b1389bbbed7d780d.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: optimise completion timeout flushingPavel Begunkov1-4/+6
io_commit_cqring() might be very hot and we definitely don't want to touch ->timeout_list there, because 1) it's shared with the submission side so might lead to cache bouncing and 2) may need to load an extra cache line, especially for IRQ completions. We're interested in it at the completion side only when there are offset-mode timeouts, which are not so popular. Replace list_empty(->timeout_list) hot path check with a new one-way flag, which is set when we prepare the first offset-mode timeout. note: the flag sits in the same line as briefly used after ->rings Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4892ec68b71a69f92ffbea4a1499be3ec0d463b.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: don't cache number of dropped SQEsPavel Begunkov1-7/+5
Kill ->cached_sq_dropped and wire DRAIN sequence number correction via ->cq_extra, which is there exactly for that purpose. User visible dropped counter will be populated by incrementing it instead of keeping a copy, similarly as it was done not so long ago with cq_overflow. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/088aceb2707a534d531e2770267c4498e0507cc1.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: refactor io_get_sqe()Pavel Begunkov1-2/+2
The line of io_get_sqe() evaluating @head consists of too many operations including READ_ONCE(), it's not convenient for probing. Refactor it also improving readability. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/866ad6e4ef4851c7c61f6b0e08dbd0a8d1abce84.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: shuffle more fields into SQ ctx sectionPavel Begunkov1-18/+17
Since moving locked_free_* out of struct io_submit_state ctx->submit_state is accessed on submission side only, so move it into the submission section. Same goes for rsrc table pointers/nodes/etc., they must be taken and checked during submission because sync'ed by uring_lock, so move them there as well. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8a5899a50afc6ccca63249e716f580b246f3dec6.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: move ctx->flags from SQ cachelinePavel Begunkov1-4/+4
ctx->flags are heavily used by both, completion and submission sides, so move it out from the ctx fields related to submissions. Instead, place it together with ctx->refs, because it's already cacheline-aligned and so pads lots of space, and both almost never change. Also, in most occasions they are accessed together as refs are taken at submission time and put back during completion. Do same with ctx->rings, where the pointer itself is never modified apart from ring init/free. Note: in percpu mode, struct percpu_ref doesn't modify the struct itself but takes indirection with ref->percpu_count_ptr. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4c48c173e63d35591383ba2b87e8b8e8dfdbd23d.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: keep SQ pointers in a single cachelinePavel Begunkov1-2/+1
sq_array and sq_sqes are always used together, however they are in different cachelines, where the borderline is right before cq_overflow_list is rather rarely touched. Move the fields together so it loads only one cacheline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3ef2411a94874da06492506a8897eff679244f49.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io-wq: remove redundant initialization of variable retColin Ian King1-1/+1
The variable ret is being initialized with a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210615143424.60449-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16io_uring: Fix incorrect sizeof operator for copy_from_user callColin Ian King1-2/+4
Static analysis is warning that the sizeof being used is should be of *data->tags[i] and not data->tags[i]. Although these are the same size on 64 bit systems it is not a portable assumption to assume this is true for all cases. Fix this by using a temporary pointer tag_slot to make the code a clearer. Addresses-Coverity: ("Sizeof not portable") Fixes: d878c81610e1 ("io_uring: hide rsrc tag copy into generic helpers") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210615130011.57387-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15proc: only require mm_struct for writingLinus Torvalds1-1/+3
Commit 591a22c14d3f ("proc: Track /proc/$pid/attr/ opener mm_struct") we started using __mem_open() to track the mm_struct at open-time, so that we could then check it for writes. But that also ended up making the permission checks at open time much stricter - and not just for writes, but for reads too. And that in turn caused a regression for at least Fedora 29, where NIC interfaces fail to start when using NetworkManager. Since only the write side wanted the mm_struct test, ignore any failures by __mem_open() at open time, leaving reads unaffected. The write() time verification of the mm_struct pointer will then catch the failure case because a NULL pointer will not match a valid 'current->mm'. Link: https://lore.kernel.org/netdev/YMjTlp2FSJYvoyFa@unreal/ Fixes: 591a22c14d3f ("proc: Track /proc/$pid/attr/ opener mm_struct") Reported-and-tested-by: Leon Romanovsky <leon@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-15kernfs: move revalidate to be near lookupIan Kent1-43/+43
While the dentry operation kernfs_dop_revalidate() is grouped with dentry type functions it also has a strong affinity to the inode operation ->lookup(). It makes sense to locate this function near to kernfs_iop_lookup() because we will be adding VFS negative dentry caching to reduce path lookup overhead for non-existent paths. There's no functional change from this patch. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/162375275365.232295.8995526416263659926.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-15afs: Fix an IS_ERR() vs NULL checkDan Carpenter1-2/+2
The proc_symlink() function returns NULL on error, it doesn't return error pointers. Fixes: 5b86d4ff5dce ("afs: Implement network namespacing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YLjMRKX40pTrJvgf@mwanda/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-14mark pstore-blk as brokenChristoph Hellwig1-0/+1
pstore-blk just pokes directly into the pagecache for the block device without going through the file operations for that by faking up it's own file operations that do not match the block device ones. As this breaks the control of the block layer of it's page cache, and even now just works by accident only the best thing is to just disable this driver. Fixes: 17639f67c1d6 ("pstore/blk: Introduce backend for block devices") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210608161327.1537919-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: inline io_iter_do_read()Pavel Begunkov1-1/+1
There are only two calls in source code of io_iter_do_read(), the function is small and pretty hot though is failed to get inlined. Makr it as inline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25a26dae7660da73fbc2244b361b397ef43d3caf.1623634182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: unify SQPOLL and user task cancellationsPavel Begunkov1-59/+30
Merge io_uring_cancel_sqpoll() and __io_uring_cancel() as it's easier to have a conditional ctx traverse inside than keeping them in sync. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/adfe24d6dad4a3883a40eee54352b8b65ac851bb.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: cache task struct refsPavel Begunkov1-9/+28
tctx in submission part is always synchronised because is executed from the task's context, so we can batch allocate tctx/task references and store them across syscall boundaries. It avoids enough of operations, including an atomic for getting task ref and a percpu_counter_add() function call, which still fallback to spinlock for large batching cases (around >=32). Should be good for SQPOLL submitting in small portions and coming at some moment bpf submissions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/14b327b973410a3eec1f702ecf650e100513aca9.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: don't vmalloc rsrc tagsPavel Begunkov1-16/+36
We don't really need vmalloc for keeping tags, it's not a hot path and is there out of convenience, so replace it with two level tables to not litter kernel virtual memory mappings. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/241a3422747113a8909e7e1030eb585d4a349e0d.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: add helpers for 2 level table allocPavel Begunkov1-30/+43
Some parts like fixed file table use 2 level tables, factor out helpers for allocating/deallocating them as more users are to come. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1709212359cd82eb416d395f86fc78431ccfc0aa.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: remove rsrc put work irq save/restorePavel Begunkov1-3/+2
io_rsrc_put_work() is executed by workqueue in non-irq context, so no need for irqsave/restore variants of spinlocking. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2a7f77220735f4ad404ac885b4d73bdf42d2f836.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: hide rsrc tag copy into generic helpersPavel Begunkov1-28/+27
Make io_rsrc_data_alloc() taking care of rsrc tags loading on registration, so we don't need to repeat it for each new rsrc type. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5609680697bd09735de10561b75edb95283459da.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io-wq: simplify worker exitingPavel Begunkov1-4/+1
io_worker_handle_work() already takes care of the empty list case and releases spinlock, so get rid of ugly conditional unlocking and unconditionally call handle_work() Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7521e485677f381036676943e876a0afecc23017.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io-wq: don't repeat IO_WQ_BIT_EXIT check by workerPavel Begunkov1-2/+1
io_wqe_worker()'s main loop does check IO_WQ_BIT_EXIT flag, so no need for a second test_bit at the end as it will immediately jump to the first check afterwards. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d6af4a51c86523a527fb5417c9fbc775c4b26497.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: rename function *task_filePavel Begunkov1-9/+9
What at some moment was references to struct file used to control lifetimes of task/ctx is now just internal tctx structures/nodes, so rename outdated *task_file() routines into something more sensible. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e2fbce42932154c2631ce58ffbffaa232afe18d5.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: refactor io_iopoll_req_issuedPavel Begunkov1-23/+21
A simple refactoring of io_iopoll_req_issued(), move in_async inside so we don't pass it around and save on double checking it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1513bfde4f0c835be25ac69a82737ab0668d7665.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io-wq: remove unused io-wq refcountingPavel Begunkov1-5/+1
iowq->refs is initialised to one and killed on exit, so it's not used and we can kill it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/401007393528ea7c102360e69a29b64498e15db2.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io-wq: embed wqe ptr array into struct io_wqPavel Begunkov1-11/+4
io-wq keeps an array of pointers to struct io_wqe, allocate this array as a part of struct io-wq, it's easier to code and saves an extra indirection for nearly each io-wq call. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1482c6a001923bbed662dc38a8a580fb08b1ed8c.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: fix blocking inline submissionPavel Begunkov1-1/+1
There is a complaint against sys_io_uring_enter() blocking if it submits stdin reads. The problem is in __io_file_supports_async(), which sees that it's a cdev and allows it to be processed inline. Punt char devices using generic rules of io_file_supports_async(), including checking for presence of *_iter() versions of rw callbacks. Apparently, it will affect most of cdevs with some exceptions like null and zero devices. Cc: stable@vger.kernel.org Reported-by: Birk Hirdman <lonjil@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d60270856b8a4560a639ef5f76e55eb563633599.1623236455.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: enable shmem/memfd memory registrationPavel Begunkov1-0/+2
Relax buffer registration restictions, which filters out file backed memory, and allow shmem/memfd as they have normal anonymous pages underneath. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14io_uring: don't bounce submit_state cachelinesPavel Begunkov1-11/+9
struct io_submit_state contains struct io_comp_state and so locked_free_*, that renders cachelines around ->locked_free* being invalidated on most non-inline completions, that may terrorise caches if submissions and completions are done by different tasks. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/290cb5412b76892e8631978ee8ab9db0c6290dd5.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>