summaryrefslogtreecommitdiff
path: root/io_uring/rsrc.c
AgeCommit message (Collapse)AuthorFilesLines
2023-04-19io_uring/rsrc: devirtualise rsrc put callbacksPavel Begunkov1-6/+19
We only have two rsrc types, buffers and files, replace virtual callbacks for putting resources down with a switch..case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/02ca727bf8e5f7f820c2f404e95ae88c8f472930.1681822823.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-19io_uring/rsrc: pass node to io_rsrc_put_work()Pavel Begunkov1-6/+6
Instead of passing rsrc_data and a resource to io_rsrc_put_work() just forward node, that's all the function needs. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/791e8edd28d78797240b74d34e99facbaad62f3b.1681822823.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-19io_uring/rsrc: inline io_rsrc_put_work()Pavel Begunkov1-13/+6
io_rsrc_put_work() is simple enough to be open coded into its only caller. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1b36dd46766ced39a9b160767babfa2fce07b8f8.1681822823.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-19io_uring/rsrc: add empty flag in rsrc_nodePavel Begunkov1-3/+3
Unless a node was flushed by io_rsrc_ref_quiesce(), it'll carry a resource. Replace ->inline_items with an empty flag, which is initialised to false and only raised in io_rsrc_ref_quiesce(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/75d384c9d2252e12af73b9cf8a44e1699106aeb1.1681822823.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-19io_uring/rsrc: merge nodes and io_rsrc_putPavel Begunkov1-68/+23
struct io_rsrc_node carries a number of resources represented by struct io_rsrc_put. That was handy before for sync overhead ammortisation, but all complexity is gone and nodes are simple and lightweight. Let's allocate a separate node for each resource. Nodes and io_rsrc_put and not much different in size, and former are cached, so node allocation should work better. That also removes some overhead for nested iteration in io_rsrc_node_ref_zero() / __io_rsrc_put_work(). Another reason for the patch is that it greatly reduces complexity by moving io_rsrc_node_switch[_start]() inside io_queue_rsrc_removal(), so users don't have to care about it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c7d3a45b30cc14cd93700a710dd112edc703db98.1681822823.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-19io_uring/rsrc: infer node from ctx on io_queue_rsrc_removalPavel Begunkov1-4/+5
For io_queue_rsrc_removal() we should always use the current active rsrc node, don't pass it directly but let the function grab it from the context. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d15939b4afea730978b4925685c2577538b823bb.1681822823.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: refactor io_queue_rsrc_removalPavel Begunkov1-4/+1
We can queue up a rsrc into a list in io_queue_rsrc_removal() while allocating io_rsrc_put and so simplify the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/36bd708ee25c0e2e7992dc19b17db166eea9ac40.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: clean up __io_sqe_buffers_update()Pavel Begunkov1-2/+1
Inline offset variable, so we don't use it without subjecting it to array_index_nospec() first. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/77936d9ed23755588810c5eafcea7e1c3b90e3cd.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: inline switch_start fast pathPavel Begunkov1-7/+5
Inline the part of io_rsrc_node_switch_start() that checks whether the cache is empty or not, as most of the times it will have some number of entries in there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9619c1717a0e01f22c5fce2f1ba2735f804da0f2.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: remove rsrc_data refsPavel Begunkov1-24/+8
Instead of waiting for rsrc_data->refs to be downed to zero, check whether there are rsrc nodes queued for completion, that's easier then maintaining references. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8e33fd143d83e11af3e386aea28eb6d6c6a1be10.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: fix DEFER_TASKRUN rsrc quiescePavel Begunkov1-0/+9
For io_rsrc_ref_quiesce() to progress it should execute all task_work items, including deferred ones. However, currently nobody would wake us, and so let's set ctx->cq_wait_nr, so io_req_local_work_add() would wake us up. Fixes: c0e0d6ba25f18 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f1a90d1bc5ebf096475b018fed52e54f3b89d4af.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: use wq for quiescingPavel Begunkov1-6/+12
Replace completions with waitqueues for rsrc data quiesce, the main wakeup condition is when data refs hit zero. Note that data refs are only changes under ->uring_lock, so we prepare before mutex_unlock() reacquire it after taking the lock back. This change will be needed in the next patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1d0dbc74b3b4fd67c8f01819e680c5e0da252956.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: refactor io_rsrc_ref_quiescePavel Begunkov1-13/+5
Refactor io_rsrc_ref_quiesce() by moving the first mutex_unlock(), so we don't have to have a second mutex_unlock() further in the loop. It prepares us to the next patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/65bc876271fb16bf550a53a4c76c91aacd94e52e.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: remove io_rsrc_node::donePavel Begunkov1-4/+1
Kill io_rsrc_node::node and check refs instead, it's set when the nodes refcount hits zero, and it won't change afterwards. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bbde361f4010f7e8bf196f1ecca27a763b79926f.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-15io_uring/rsrc: use nospec'ed indexesPavel Begunkov1-1/+1
We use array_index_nospec() for registered buffer indexes, but don't use it while poking into rsrc tags, fix that. Fixes: 634d00df5e1cf ("io_uring: add full-fledged dynamic buffers support") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f02fafc5a9c0dd69be2b0618c38831c078232ff0.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-12io_uring/rsrc: extract SCM file put helperPavel Begunkov1-9/+11
SCM file accounting is a slow path and is only used for UNIX files. Extract a helper out of io_rsrc_file_put() that does the SCM unaccounting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/58cc7bffc2ee96bec8c2b89274a51febcbfa5556.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-12io_uring/rsrc: refactor io_rsrc_node_switchPavel Begunkov1-25/+11
We use io_rsrc_node_switch() coupled with io_rsrc_node_switch_start() for a bunch of cases including initialising ctx->rsrc_node, i.e. by passing NULL instead of rsrc_data. Leave it to only deal with actual node changing. For that, first remove it from io_uring_create() and add a function allocating the first node. Then also remove all calls to io_rsrc_node_switch() from files/buffers register as we already have a node installed and it does essentially nothing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d146fe306ff98b1a5a60c997c252534f03d423d7.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-12io_uring/rsrc: zero node's rsrc data on allocPavel Begunkov1-0/+1
struct io_rsrc_node::rsrc_data field is initialised on rsrc removal and shouldn't be used before that, still let's play safe and zero the field on alloc. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/09bd03cedc8da8a7974c5e6e4bf0489fd16593ab.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-12io_uring/rsrc: consolidate node cachingPavel Begunkov1-9/+11
We store one pre-allocated rsrc node in ->rsrc_backup_node, merge it with ->rsrc_node_cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6d5410e51ccd29be7a716be045b51d6b371baef6.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-12io_uring/rsrc: add lockdep checksPavel Begunkov1-0/+2
Add a lockdep chek to make sure that file and buffer updates hold ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/961bbe6e433ec9bc0375127f23468b37b729df99.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: optimise io_rsrc_data refcountingPavel Begunkov1-12/+18
Every struct io_rsrc_node takes a struct io_rsrc_data reference, which means all rsrc updates do 2 extra atomics. Replace atomics refcounting with a int as it's all done under ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e73c3d6820cf679532696d790b5b8fae23537213.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: add lockdep sanity checksPavel Begunkov1-1/+1
We should hold ->uring_lock while putting nodes with io_put_rsrc_node(), add a lockdep check for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b50d5f156ac41450029796738c1dfd22a521df7a.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: cache struct io_rsrc_nodePavel Begunkov1-8/+15
Add allocation cache for struct io_rsrc_node, it's always allocated and put under ->uring_lock, so it doesn't need any extra synchronisation around caches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/252a9d9ef9654e6467af30fdc02f57c0118fb76e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: don't offload node freePavel Begunkov1-55/+4
struct delayed_work rsrc_put_work was previously used to offload node freeing because io_rsrc_node_ref_zero() was previously called by RCU in the IRQ context. Now, as percpu refcounting is gone, we can do it eagerly at the spot without pushing it to a worker. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/13fb1aac1e8d068ad8fd4a0c6d0d157ab61b90c0.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: optimise io_rsrc_put allocationPavel Begunkov1-17/+34
Every io_rsrc_node keeps a list of items to put, and all entries are kmalloc()'ed. However, it's quite often to queue up only one entry per node, so let's add an inline entry there to avoid extra allocations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c482c1c652c45c85ac52e67c974bc758a50fed5f.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: rename rsrc_listPavel Begunkov1-3/+3
We have too many "rsrc" around which makes the name of struct io_rsrc_node::rsrc_list confusing. The field is responsible for keeping a list of files or buffers, so call it item_list and add comments around. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3e34d4dfc1fdbb6b520f904ee6187c2ccf680efe.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: kill rsrc_ref_lockPavel Begunkov1-5/+0
We use ->rsrc_ref_lock spinlock to protect ->rsrc_ref_list in io_rsrc_node_ref_zero(). Now we removed pcpu refcounting, which means io_rsrc_node_ref_zero() is not executed from the irq context as an RCU callback anymore, and we also put it under ->uring_lock. io_rsrc_node_switch(), which queues up nodes into the list, is also protected by ->uring_lock, so we can safely get rid of ->rsrc_ref_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b60af883c263551190b526a55ff2c9d5ae07141.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: protect node refs with uring_lockPavel Begunkov1-26/+4
Currently, for nodes we have an atomic counter and some cached (non-atomic) refs protected by uring_lock. Let's put all ref manipulations under uring_lock and get rid of the atomic part. It's free as in all cases we care about we already hold the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25b142feed7d831008257d90c8b17c0115d4fc15.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: keep cached refs per nodePavel Begunkov1-6/+9
We cache refs of the current node (i.e. ctx->rsrc_node) in ctx->rsrc_cached_refs. We'll be moving away from atomics, so move the cached refs in struct io_rsrc_node for now. It's a prep patch and shouldn't change anything in practise. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9edc3669c1d71b06c2dca78b2b2b8bb9292738b9.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-04io_uring/rsrc: use non-pcpu refcounts for nodesPavel Begunkov1-10/+5
One problem with the current rsrc infra is that often updates will generates lots of rsrc nodes, each carry pcpu refs. That takes quite a lot of memory, especially if there is a stall, and takes lots of CPU cycles. Only pcpu allocations takes >50 of CPU with a naive benchmark updating files in a loop. Replace pcpu refs with normal refcounting. There is already a hot path avoiding atomics / refs, but following patches will further improve it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e9ed8a9457b331a26555ff9443afc64cdaab7247.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+1
Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c 3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting") 924531326e2d ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-22io_uring/rsrc: fix null-ptr-deref in io_file_bitmap_get()Savino Dicanosa1-0/+1
When fixed files are unregistered, file_alloc_end and alloc_hint are not cleared. This can later cause a NULL pointer dereference in io_file_bitmap_get() if auto index selection is enabled via IORING_FILE_INDEX_ALLOC: [ 6.519129] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] [ 6.541468] RIP: 0010:_find_next_zero_bit+0x1a/0x70 [...] [ 6.560906] Call Trace: [ 6.561322] <TASK> [ 6.561672] io_file_bitmap_get+0x38/0x60 [ 6.562281] io_fixed_fd_install+0x63/0xb0 [ 6.562851] ? __pfx_io_socket+0x10/0x10 [ 6.563396] io_socket+0x93/0xf0 [ 6.563855] ? __pfx_io_socket+0x10/0x10 [ 6.564411] io_issue_sqe+0x5b/0x3d0 [ 6.564914] io_submit_sqes+0x1de/0x650 [ 6.565452] __do_sys_io_uring_enter+0x4fc/0xb20 [ 6.566083] ? __do_sys_io_uring_register+0x11e/0xd80 [ 6.566779] do_syscall_64+0x3c/0x90 [ 6.567247] entry_SYSCALL_64_after_hwframe+0x72/0xdc [...] To fix the issue, set file alloc range and alloc_hint to zero after file tables are freed. Cc: stable@vger.kernel.org Fixes: 4278a0deb1f6 ("io_uring: defer alloc_hint update to io_file_bitmap_set()") Signed-off-by: Savino Dicanosa <sd7.dev@pm.me> [axboe: add explicit bitmap == NULL check as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+8
net/wireless/nl80211.c b27f07c50a73 ("wifi: nl80211: fix puncturing bitmap policy") cbbaf2bb829b ("wifi: nl80211: add a command to enable/disable HW timestamping") https://lore.kernel.org/all/20230314105421.3608efae@canb.auug.org.au tools/testing/selftests/net/Makefile 62199e3f1658 ("selftests: net: Add VXLAN MDB test") 13715acf8ab5 ("selftest: Add test for bind() conflicts.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16io_uring/rsrc: fix folio accountingPavel Begunkov1-1/+7
| BUG: Bad page state in process kworker/u8:0 pfn:5c001 | page:00000000bfda61c8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20001 pfn:0x5c001 | head:0000000011409842 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:1 | anon flags: 0x3fffc00000b0004(uptodate|head|mappedtodisk|swapbacked|node=0|zone=0|lastcpupid=0xffff) | raw: 03fffc0000000000 fffffc0000700001 ffffffff00700903 0000000100000000 | raw: 0000000000000200 0000000000000000 00000000ffffffff 0000000000000000 | head: 03fffc00000b0004 dead000000000100 dead000000000122 ffff00000a809dc1 | head: 0000000000020000 0000000000000000 00000000ffffffff 0000000000000000 | page dumped because: nonzero pincount | CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc2-00001-gc6811bf0cd87 #1 | Hardware name: linux,dummy-virt (DT) | Workqueue: events_unbound io_ring_exit_work | Call trace: | dump_backtrace+0x13c/0x208 | show_stack+0x34/0x58 | dump_stack_lvl+0x150/0x1a8 | dump_stack+0x20/0x30 | bad_page+0xec/0x238 | free_tail_pages_check+0x280/0x350 | free_pcp_prepare+0x60c/0x830 | free_unref_page+0x50/0x498 | free_compound_page+0xcc/0x100 | free_transhuge_page+0x1f0/0x2b8 | destroy_large_folio+0x80/0xc8 | __folio_put+0xc4/0xf8 | gup_put_folio+0xd0/0x250 | unpin_user_page+0xcc/0x128 | io_buffer_unmap+0xec/0x2c0 | __io_sqe_buffers_unregister+0xa4/0x1e0 | io_ring_exit_work+0x68c/0x1188 | process_one_work+0x91c/0x1a58 | worker_thread+0x48c/0xe30 | kthread+0x278/0x2f0 | ret_from_fork+0x10/0x20 Mark reports an issue with the recent patches coalescing compound pages while registering them in io_uring. The reason is that we try to drop excessive references with folio_put_refs(), but pages were acquired with pin_user_pages(), which has extra accounting and so should be put down with matching unpin_user_pages() or at least gup_put_folio(). As a fix unpin_user_pages() all but first page instead, and let's figure out a better API after. Fixes: 57bebf807e2abcf8 ("io_uring/rsrc: optimise registered huge pages") Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/10efd5507d6d1f05ea0f3c601830e08767e189bd.1678980230.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-16io_uring: rsrc: Optimize return value variable 'ret'Li zeming1-1/+1
The initialization assignment of the variable ret is changed to 0, only in 'goto fail;' Use the ret variable as the function return value. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20230317182538.3027-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-13/+45
Documentation/bpf/bpf_devel_QA.rst b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info") d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target") https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08net: reclaim skb->scm_io_uring bitEric Dumazet1-2/+1
Commit 0091bfc81741 ("io_uring/af_unix: defer registered files gc to io_uring release") added one bit to struct sk_buff. This structure is critical for networking, and we try very hard to not add bloat on it, unless absolutely required. For instance, we can use a specific destructor as a wrapper around unix_destruct_scm(), to identify skbs that unix_gc() has to special case. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-24io_uring/rsrc: always initialize 'folio' to NULLJens Axboe1-1/+1
Smatch complains that: smatch warnings: io_uring/rsrc.c:1262 io_sqe_buffer_register() error: uninitialized symbol 'folio'. 'folio' may be used uninitialized, which can happen if we end up with a single page mapped. Ensure that we clear folio to NULL at the top so it's always set. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202302241432.YML1CD5C-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-22io_uring/rsrc: optimise registered huge pagesPavel Begunkov1-6/+32
When registering huge pages, internally io_uring will split them into many PAGE_SIZE bvec entries. That's bad for performance as drivers need to eventually dma-map the data and will do it individually for each bvec entry. Coalesce huge pages into one large bvec. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-22io_uring/rsrc: optimise single entry advancePavel Begunkov1-1/+4
Iterating within the first bvec entry should be essentially free, but we use iov_iter_advance() for that, which shows up in benchmark profiles taking up to 0.5% of CPU. Replace it with a hand coded version. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-22io_uring/rsrc: disallow multi-source reg buffersPavel Begunkov1-5/+8
If two or more mappings go back to back to each other they can be passed into io_uring to be registered as a single registered buffer. That would even work if mappings came from different sources, e.g. it's possible to mix in this way anon pages and pages from shmem or hugetlb. That is not a problem but it'd rather be less prone if we forbid such mixing. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-22io_uring/rsrc: fix a comment in io_import_fixed()Pavel Begunkov1-1/+1
io_import_fixed() supports offsets, but "may not" means the opposite. Replace it with "might not" so the comments rather speaks about possible cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/5b5f79958456caa6dc532f6205f75f224b232c81.1676902343.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03io_uring: use bvec_set_page to initialize a bvecChristoph Hellwig1-3/+1
Use the bvec_set_page helper to initialize a bvec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230203150634.3199647-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-07io_uring: use tw for putting rsrcPavel Begunkov1-2/+17
Use task_work for completing rsrc removals, it'll be needed later for spinlock optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbba5d53a11ee6fc2194dacea262c1d733c8b529.1670384893.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30io_uring: don't reinstall quiesce node for each twPavel Begunkov1-20/+18
There is no need to reinit data and install a new rsrc node every time we get a task_work, it's detrimental, just execute it and conitnue waiting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3895d3344164cd9b3a0bbb24a6e357e20a13434b.1669821213.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30io_uring: improve rsrc quiesce refs checksPavel Begunkov1-9/+6
Do a little bit of refactoring of io_rsrc_ref_quiesce(), flatten the data refs checks and so get rid of a conditional weird unlock-else-break construct. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d21283e9f88a77612c746ed526d86fe3bfb58a70.1669821213.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: remove overflow param from io_post_aux_cqeDylan Yudaken1-2/+2
The only call sites which would not allow overflow are also call sites which would use the io_aux_cqe as they care about ordering. So remove this parameter from io_post_aux_cqe. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-9-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-21io_uring: do not always force run task_work in io_uring_registerDylan Yudaken1-1/+6
Running task work when not needed can unnecessarily delay operations. Specifically IORING_SETUP_DEFER_TASKRUN tries to avoid running task work until the user requests it. Therefore do not run it in io_uring_register any more. The one catch is that io_rsrc_ref_quiesce expects it to have run in order to process all outstanding references, and so reorder it's loop to do this. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221107123349.4106213-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-17io_uring: remove FFS_SCMPavel Begunkov1-5/+2
THe lifetime of SCM'ed files is bound to ring_sock, which is destroyed strictly after we're done with registered file tables. This means there is no need for the FFS_SCM hack, which was not available on 32-bit builds anyway. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/984226a1045adf42dc35d8bd7fb5a8bbfa472ce1.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-13io_uring/af_unix: defer registered files gc to io_uring releasePavel Begunkov1-0/+1
Instead of putting io_uring's registered files in unix_gc() we want it to be done by io_uring itself. The trick here is to consider io_uring registered files for cycle detection but not actually putting them down. Because io_uring can't register other ring instances, this will remove all refs to the ring file triggering the ->release path and clean up with io_ring_ctx_free(). Cc: stable@vger.kernel.org Fixes: 6b06314c47e1 ("io_uring: add file set registration") Reported-and-tested-by: David Bouman <dbouman03@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> [axboe: add kerneldoc comment to skb, fold in skb leak fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>