summaryrefslogtreecommitdiff
path: root/io_uring
AgeCommit message (Collapse)AuthorFilesLines
3 daysio_uring/net: fix slab-out-of-bounds read in io_bundle_nbufs()Junxi Qian1-0/+4
commit b948f9d5d3057b01188e36664e7c7604d1c8ecb5 upstream. sqe->len is __u32 but gets stored into sr->len which is int. When userspace passes sqe->len values exceeding INT_MAX (e.g. 0xFFFFFFFF), sr->len overflows to a negative value. This negative value propagates through the bundle recv/send path: 1. io_recv(): sel.val = sr->len (ssize_t gets -1) 2. io_recv_buf_select(): arg.max_len = sel->val (size_t gets 0xFFFFFFFFFFFFFFFF) 3. io_ring_buffers_peek(): buf->len is not clamped because max_len is astronomically large 4. iov[].iov_len = 0xFFFFFFFF flows into io_bundle_nbufs() 5. io_bundle_nbufs(): min_t(int, 0xFFFFFFFF, ret) yields -1, causing ret to increase instead of decrease, creating an infinite loop that reads past the allocated iov[] array This results in a slab-out-of-bounds read in io_bundle_nbufs() from the kmalloc-64 slab, as nbufs increments past the allocated iovec entries. BUG: KASAN: slab-out-of-bounds in io_bundle_nbufs+0x128/0x160 Read of size 8 at addr ffff888100ae05c8 by task exp/145 Call Trace: io_bundle_nbufs+0x128/0x160 io_recv_finish+0x117/0xe20 io_recv+0x2db/0x1160 Fix this by rejecting negative sr->len values early in both io_sendmsg_prep() and io_recvmsg_prep(). Since sqe->len is __u32, any value > INT_MAX indicates overflow and is not a valid length. Fixes: a05d1f625c7a ("io_uring/net: support bundles for send") Cc: stable@vger.kernel.org Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Link: https://patch.msgid.link/20260329153909.279046-1-qjx1298677004@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: propagate BUF_MORE through early buffer commit pathJens Axboe1-1/+4
Commit 418eab7a6f3c002d8e64d6e95ec27118017019af upstream. When io_should_commit() returns true (eg for non-pollable files), buffer commit happens at buffer selection time and sel->buf_list is set to NULL. When __io_put_kbufs() generates CQE flags at completion time, it calls __io_put_kbuf_ring() which finds a NULL buffer_list and hence cannot determine whether the buffer was consumed or not. This means that IORING_CQE_F_BUF_MORE is never set for non-pollable input with incrementally consumed buffers. Likewise for io_buffers_select(), which always commits upfront and discards the return value of io_kbuf_commit(). Add REQ_F_BUF_MORE to store the result of io_kbuf_commit() during early commit. Then __io_put_kbuf_ring() can check this flag and set IORING_F_BUF_MORE accordingy. Reported-by: Martin Michaelis <code@mgjm.de> Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Link: https://github.com/axboe/liburing/issues/1553 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: fix missing BUF_MORE for incremental buffers at EOFJens Axboe1-0/+4
Commit 3ecd3e03144b38a21a3b70254f1b9d2e16629b09 upstream. For a zero length transfer, io_kbuf_inc_commit() is called with !len. Since we never enter the while loop to consume the buffers, io_kbuf_inc_commit() ends up returning true, consuming the buffer. But if no data was consumed, by definition it cannot have consumed the buffer. Return false for that case. Reported-by: Martin Michaelis <code@mgjm.de> Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Link: https://github.com/axboe/liburing/issues/1553 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: use WRITE_ONCE() for userspace-shared buffer ring fieldsJoanne Koong1-4/+4
Commit a4c694bfc2455e82b7caf6045ca893d123e0ed11 upstream. buf->addr and buf->len reside in memory shared with userspace. They should be written with WRITE_ONCE() to guarantee atomic stores and prevent tearing or other unsafe compiler optimizations. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Cc: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: use READ_ONCE() for userspace-mapped memoryCaleb Sander Mateos1-5/+5
Commit 78385c7299f7514697d196b3233a91bd5e485591 upstream. The struct io_uring_buf elements in a buffer ring are in a memory region accessible from userspace. A malicious/buggy userspace program could therefore write to them at any time, so they should be accessed with READ_ONCE() in the kernel. Commit 98b6fa62c84f ("io_uring/kbuf: always use READ_ONCE() to read ring provided buffer lengths") already switched the reads of the len field to READ_ONCE(). Do the same for bid and addr. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers") Cc: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: always use READ_ONCE() to read ring provided buffer lengthsJens Axboe1-7/+13
Commit 98b6fa62c84f2e129161e976a5b9b3cb4ccd117b upstream. Since the buffers are mapped from userspace, it is prudent to use READ_ONCE() to read the value into a local variable, and use that for any other actions taken. Having a stable read of the buffer length avoids worrying about it changing after checking, or being read multiple times. Similarly, the buffer may well change in between it being picked and being committed. Ensure the looping for incremental ring buffer commit stops if it hits a zero sized buffer, as no further progress can be made at that point. Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Link: https://lore.kernel.org/io-uring/tencent_000C02641F6250C856D0C26228DE29A3D30A@qq.com/ Reported-by: Qingyue Zhang <chunzhennn@qq.com> Reported-by: Suoxing Zhang <aftern00n@qq.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: enable bundles for incrementally consumed buffersJens Axboe1-30/+26
Commit cf9536e550dd243a1681fdbf804221527da20a80 upstream. The original support for incrementally consumed buffers didn't allow it to be used with bundles, with the assumption being that incremental buffers are generally larger, and hence there's less of a nedd to support it. But that assumption may not be correct - it's perfectly viable to use smaller buffers with incremental consumption, and there may be valid reasons for an application or framework to do so. As there's really no need to explicitly disable bundles with incrementally consumed buffers, allow it. This actually makes the peek side cheaper and simpler, with the completion side basically the same, just needing to iterate for the consumed length. Reported-by: Norman Maurer <norman_maurer@apple.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/rw: check for NULL io_br_sel when putting a bufferJens Axboe1-2/+5
Commit 18d6b1743eafeb3fb1e0ea5a2b7fd0a773d525a8 upstream. Both the read and write side use kiocb_done() to finish a request, and kiocb_done() will call io_put_kbuf() in case a provided buffer was used for the request. Provided buffers are not supported for writes, hence NULL is being passed in. This normally works fine, as io_put_kbuf() won't actually use the value unless REQ_F_BUFFER_RING or REQ_F_BUFFER_SELECTED is set in the request flags. But depending on compiler (or whether or not CONFIG_CC_OPTIMIZE_FOR_SIZE is set), that may be done even though the value is never used. This will then cause a NULL pointer dereference. Make it a bit more obvious and check for a NULL io_br_sel, and don't even bother calling io_put_kbuf() for that case. Fixes: 5fda51255439 ("io_uring/kbuf: switch to storing struct io_buffer_list locally") Reported-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/net: correct type for min_not_zero() castJens Axboe1-1/+1
Commit 37500634d0a8f931e15879760fb70f9b6f5d5370 upstream. The kernel test robot reports that after a recent change, the signedness of a min_not_zero() compare is now incorrect. Fix that up and cast to the right type. Fixes: 429884ff35f7 ("io_uring/kbuf: use struct io_br_sel for multiple buffers picking") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509020426.WJtrdwOU-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring: remove async/poll related provided buffer recyclesJens Axboe2-6/+0
Commit e973837b54024f070b2b48c7ee9725548548257a upstream. These aren't necessary anymore, get rid of them. Link: https://lore.kernel.org/r/20250821020750.598432-13-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: switch to storing struct io_buffer_list locallyJens Axboe6-68/+60
Commit 5fda51255439addd1c9059098e30847a375a1008 upstream. Currently the buffer list is stored in struct io_kiocb. The buffer list can be of two types: 1) Classic/legacy buffer list. These don't need to get referenced after a buffer pick, and hence storing them in struct io_kiocb is perfectly fine. 2) Ring provided buffer lists. These DO need to be referenced after the initial buffer pick, as they need to get consumed later on. This can be either just incrementing the head of the ring, or it can be consuming parts of a buffer if incremental buffer consumptions has been configured. For case 2, io_uring needs to be careful not to access the buffer list after the initial pick-and-execute context. The core does recycling of these, but it's easy to make a mistake, because it's stored in the io_kiocb which does persist across multiple execution contexts. Either because it's a multishot request, or simply because it needed some kind of async trigger (eg poll) for retry purposes. Add a struct io_buffer_list to struct io_br_sel, which is always on stack for the various users of it. This prevents the buffer list from leaking outside of that execution context, and additionally it enables kbuf to not even pass back the struct io_buffer_list if the given context isn't appropriately locked already. This doesn't fix any bugs, it's simply a defensive measure to prevent any issues with reuse of a buffer list. Link: https://lore.kernel.org/r/20250821020750.598432-12-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/net: use struct io_br_sel->val as the send finish valueJens Axboe1-10/+12
Commit 461382a51fb83a9c4b7c50e1f10d3ca94edff25e upstream. Currently a pointer is passed in to the 'ret' in the send mshot handler, but since we already have a value field in io_br_sel, just use that. This is also in preparation for needing to pass in struct io_br_sel to io_send_finish() anyway. Link: https://lore.kernel.org/r/20250821020750.598432-11-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/net: use struct io_br_sel->val as the recv finish valueJens Axboe1-14/+17
Commit 58d815091890e83aa2f83a9cce1fdfe3af02c7b4 upstream. Currently a pointer is passed in to the 'ret' in the receive handlers, but since we already have a value field in io_br_sel, just use that. This is also in preparation for needing to pass in struct io_br_sel to io_recv_finish() anyway. Link: https://lore.kernel.org/r/20250821020750.598432-10-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: use struct io_br_sel for multiple buffers pickingJens Axboe3-18/+23
Commit 429884ff35f75a8ac3e8f822f483e220e3ea6394 upstream. The networking side uses bundles, which is picking multiple buffers at the same time. Pass in struct io_br_sel to those helpers. Link: https://lore.kernel.org/r/20250821020750.598432-9-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: introduce struct io_br_selJens Axboe4-37/+57
Commit ab6559bdbb08f6bee606435cd014fc5ba0f7b750 upstream. Rather than return addresses directly from buffer selection, add a struct around it. No functional changes in this patch, it's in preparation for storing more buffer related information locally, rather than in struct io_kiocb. Link: https://lore.kernel.org/r/20250821020750.598432-7-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: pass in struct io_buffer_list to commit/recycle helpersJens Axboe6-45/+46
Commit 1b5add75d7c894c62506c9b55f1d9eaadae50ef1 upstream. Rather than have this implied being in the io_kiocb, pass it in directly so it's immediately obvious where these users of ->buf_list are coming from. Link: https://lore.kernel.org/r/20250821020750.598432-6-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/net: clarify io_recv_buf_select() return valueJens Axboe1-1/+1
Commit b22743f29b7d3dc68c68f9bd39a1b2600ec6434e upstream. It returns 0 on success, less than zero on error. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/net: don't use io_net_kbuf_recyle() for non-provided casesJens Axboe1-3/+3
Commit 15ba5e51e689ceb1c2e921c5180a70c88cfdc8e9 upstream. A previous commit used io_net_kbuf_recyle() for any network helper that did IO and needed partial retry. However, that's only needed if the opcode does buffer selection, which isnt support for sendzc, sendmsg_zc, or sendmsg. Just remove them - they don't do any harm, but it is a bit confusing when reading the code. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: drop 'issue_flags' from io_put_kbuf(s)() argumentsJens Axboe4-17/+14
Commit 5e73b402cbbea51bcab90fc5ee6c6d06af76ae1b upstream. Picking multiple buffers always requires the ring lock to be held across the operation, so there's no need to pass in the issue_flags to io_put_kbufs(). On the single buffer side, if the initial picking of a ring buffer was unlocked, then it will have been committed already. For legacy buffers, no locking is required, as they will simply be freed. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: uninline __io_put_kbufsPavel Begunkov2-63/+70
Commit 5d3e51240d89678b87b5dc6987ea572048a0f0eb upstream. __io_put_kbufs() and other helper functions are too large to be inlined, compilers would normally refuse to do so. Uninline it and move together with io_kbuf_commit into kbuf.c. io_kbuf_commitSigned-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3dade7f55ad590e811aff83b1ec55c9c04e17b2b.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: introduce io_kbuf_drop_legacy()Pavel Begunkov3-25/+14
Commit 54e00d9a612ab93f37f612a5ccd7c0c4f8a31cea upstream. io_kbuf_drop() is only used for legacy provided buffers, and so __io_put_kbuf_list() is never called for REQ_F_BUFFER_RING. Remove the dead branch out of __io_put_kbuf_list(), rename it into io_kbuf_drop_legacy() and use it directly instead of io_kbuf_drop(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c8cc73e2272f09a86ecbdad9ebdd8304f8e583c0.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: open code __io_put_kbuf()Pavel Begunkov2-8/+1
Commit e150e70fce425e1cdfc227974893cad9fb90a0d3 upstream. __io_put_kbuf() is a trivial wrapper, open code it into __io_put_kbufs(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dc17380272b48d56c95992c6f9eaacd5546e1d3.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: remove legacy kbuf cachingPavel Begunkov3-56/+9
Commit 13ee854e7c04236a47a5beaacdcf51eb0bc7a8fa upstream. Remove all struct io_buffer caches. It makes it a fair bit simpler. Apart from from killing a bunch of lines and juggling between lists, __io_put_kbuf_list() doesn't need ->completion_lock locking now. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/18287217466ee2576ea0b1e72daccf7b22c7e856.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: simplify __io_put_kbufPavel Begunkov3-31/+8
Commit dc39fb1093ea33019f192c93b77b863282e10162 upstream. As a preparation step remove an optimisation from __io_put_kbuf() trying to use the locked cache. With that __io_put_kbuf_list() is only used with ->io_buffers_comp, and we remove the explicit list argument. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1b7f1394ec4afc7f96b35a61f5992e27c49fd067.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: remove legacy kbuf kmem cachePavel Begunkov3-8/+3
Commit 9afe6847cff78e7f3aa8f4c920265cf298033251 upstream. Remove the kmem cache used by legacy provided buffers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8195c207d8524d94e972c0c82de99282289f7f5c.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysio_uring/kbuf: remove legacy kbuf bulk allocationPavel Begunkov1-25/+5
Commit 7919292a961421bfdb22f83c16657684c96076b3 upstream. Legacy provided buffers are slow and discouraged in favour of the ring variant. Remove the bulk allocation to keep it simpler as we don't care about performance. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a064d70370e590efed8076e9501ae4cfc20fe0ca.1738724373.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25io_uring/kbuf: propagate BUF_MORE through early buffer commit pathJens Axboe2-3/+7
Commit 418eab7a6f3c002d8e64d6e95ec27118017019af upstream. When io_should_commit() returns true (eg for non-pollable files), buffer commit happens at buffer selection time and sel->buf_list is set to NULL. When __io_put_kbufs() generates CQE flags at completion time, it calls __io_put_kbuf_ring() which finds a NULL buffer_list and hence cannot determine whether the buffer was consumed or not. This means that IORING_CQE_F_BUF_MORE is never set for non-pollable input with incrementally consumed buffers. Likewise for io_buffers_select(), which always commits upfront and discards the return value of io_kbuf_commit(). Add REQ_F_BUF_MORE to store the result of io_kbuf_commit() during early commit. Then __io_put_kbuf_ring() can check this flag and set IORING_F_BUF_MORE accordingy. Reported-by: Martin Michaelis <code@mgjm.de> Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Link: https://github.com/axboe/liburing/issues/1553 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25io_uring/kbuf: check if target buffer list is still legacy on recycleJens Axboe1-2/+10
Commit c2c185be5c85d37215397c8e8781abf0a69bec1f upstream. There's a gap between when the buffer was grabbed and when it potentially gets recycled, where if the list is empty, someone could've upgraded it to a ring provided type. This can happen if the request is forced via io-wq. The legacy recycling is missing checking if the buffer_list still exists, and if it's of the correct type. Add those checks. Cc: stable@vger.kernel.org Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25io_uring/uring_cmd: fix too strict requirement on ioctlAsbjørn Sloth Tønnesen1-3/+6
[ Upstream commit 600b665b903733bd60334e86031b157cc823ee55 ] Attempting SOCKET_URING_OP_SETSOCKOPT on an AF_NETLINK socket resulted in an -EOPNOTSUPP, as AF_NETLINK doesn't have an ioctl in its struct proto, but only in struct proto_ops. Prior to the blamed commit, io_uring_cmd_sock() only had two cmd_op operations, both requiring ioctl, thus the check was warranted. Since then, 4 new cmd_op operations have been added, none of which depend on ioctl. This patch moves the ioctl check, so it only applies to the original operations. AFAICT, the ioctl requirement was unintentional, and it wasn't visible in the blamed patch within 3 lines of context. Cc: stable@vger.kernel.org Fixes: a5d2f99aff6b ("io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT") Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> [Asbjørn: function moved in commit 91db6edc573b; updated subject prefix] Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-04io_uring/filetable: clamp alloc_hint to the configured alloc rangeJens Axboe1-0/+4
[ Upstream commit a6bded921ed35f21b3f6bd8e629bf488499ca442 ] Explicit fixed file install/remove operations on slots outside the configured alloc range can corrupt alloc_hint via io_file_bitmap_set() and io_file_bitmap_clear(), which unconditionally update alloc_hint to the bit position. This causes subsequent auto-allocations to fall outside the configured range. For example, if the alloc range is [10, 20) and a file is removed at slot 2, alloc_hint gets set to 2. The next auto-alloc then starts searching from slot 2, potentially returning a slot below the range. Fix this by clamping alloc_hint to [file_alloc_start, file_alloc_end) at the top of io_file_bitmap_get() before starting the search. Cc: stable@vger.kernel.org Fixes: 6e73dffbb93c ("io_uring: let to set a range for file slot allocation") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04io_uring/net: don't continue send bundle if poll was required for retryJens Axboe1-1/+5
[ Upstream commit 806ae939c41e5da1d94a1e2b31f5702e96b6c3e3 ] If a send bundle has picked a bunch of buffers, then it needs to send all of those to be complete. This may require poll arming, if the send buffer ends up being full. Once a send bundle has been poll armed, no further bundles should be attempted. This allows a current bundle to complete even though it needs to go through polling to do so, but it will not allow another bundle to be started once that has happened. Ideally we would abort a bundle if it was only partially sent, but as some parts of it already went out on the wire, this obviously isn't feasible. Not continuing more bundle attempts post encountering a full socket buffer is the second best thing. Cc: stable@vger.kernel.org Fixes: a05d1f625c7a ("io_uring/net: support bundles for send") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04io_uring/cancel: de-unionize file and user_data in struct io_cancel_dataJens Axboe1-4/+2
[ Upstream commit 22dbb0987bd1e0ec3b1e4ad20756a98f99aa4a08 ] By having them share the same space in struct io_cancel_data, it ends up disallowing IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_USERDATA from working. Eg you cannot match on both a file and user_data for cancelation purposes. This obviously isn't a common use case as nobody has reported this, but it does result in -ENOENT potentially being returned when trying to match on both, rather than actually doing what the API says it would. Fixes: 4bf94615b888 ("io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' key") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04io_uring/sync: validate passed in offsetJens Axboe1-0/+2
[ Upstream commit 649dd18f559891bdafc5532d737c7dfb56060a6d ] Check if the passed in offset is negative once cast to sync->off. This ensures that -EINVAL is returned for that case, like it would be for sync_file_range(2). Fixes: c992fe2925d7 ("io_uring: add fsync support") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLEDCaleb Sander Mateos3-4/+17
[ Upstream commit 7a8737e1132ff07ca225aa7a4008f87319b5b1ca ] io_uring_enter(), __io_msg_ring_data(), and io_msg_send_fd() read ctx->flags and ctx->submitter_task without holding the ctx's uring_lock. This means they may race with the assignment to ctx->submitter_task and the clearing of IORING_SETUP_R_DISABLED from ctx->flags in io_register_enable_rings(). Ensure the correct ordering of the ctx->flags and ctx->submitter_task memory accesses by storing to ctx->flags using release ordering and loading it using acquire ordering. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 4add705e4eeb ("io_uring: remove io_register_submitter") Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-12io_uring/rw: recycle buffers manually for non-mshot readsJens Axboe1-0/+2
Commit d8e1dec2f860ee40623609aa6c4f22e1ee45605d upstream. The mshot side of reads already does this, but the regular read path does not. This leads to needing recycling checks sprinkled in various spots in the "go async" path, like arming poll. In preparation for getting rid of those, ensure that read recycles appropriately. Link: https://lore.kernel.org/r/20250821020750.598432-8-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-30io_uring/io-wq: check IO_WQ_BIT_EXIT inside work run loopJens Axboe1-1/+1
commit 10dc959398175736e495f71c771f8641e1ca1907 upstream. Currently this is checked before running the pending work. Normally this is quite fine, as work items either end up blocking (which will create a new worker for other items), or they complete fairly quickly. But syzbot reports an issue where io-wq takes seemingly forever to exit, and with a bit of debugging, this turns out to be because it queues a bunch of big (2GB - 4096b) reads with a /dev/msr* file. Since this file type doesn't support ->read_iter(), loop_rw_iter() ends up handling them. Each read returns 16MB of data read, which takes 20 (!!) seconds. With a bunch of these pending, processing the whole chain can take a long time. Easily longer than the syzbot uninterruptible sleep timeout of 140 seconds. This then triggers a complaint off the io-wq exit path: INFO: task syz.4.135:6326 blocked for more than 143 seconds. Not tainted syzkaller #0 Blocked by coredump. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz.4.135 state:D stack:26824 pid:6326 tgid:6324 ppid:5957 task_flags:0x400548 flags:0x00080000 Call Trace: <TASK> context_switch kernel/sched/core.c:5256 [inline] __schedule+0x1139/0x6150 kernel/sched/core.c:6863 __schedule_loop kernel/sched/core.c:6945 [inline] schedule+0xe7/0x3a0 kernel/sched/core.c:6960 schedule_timeout+0x257/0x290 kernel/time/sleep_timeout.c:75 do_wait_for_common kernel/sched/completion.c:100 [inline] __wait_for_common+0x2fc/0x4e0 kernel/sched/completion.c:121 io_wq_exit_workers io_uring/io-wq.c:1328 [inline] io_wq_put_and_exit+0x271/0x8a0 io_uring/io-wq.c:1356 io_uring_clean_tctx+0x10d/0x190 io_uring/tctx.c:203 io_uring_cancel_generic+0x69c/0x9a0 io_uring/cancel.c:651 io_uring_files_cancel include/linux/io_uring.h:19 [inline] do_exit+0x2ce/0x2bd0 kernel/exit.c:911 do_group_exit+0xd3/0x2a0 kernel/exit.c:1112 get_signal+0x2671/0x26d0 kernel/signal.c:3034 arch_do_signal_or_restart+0x8f/0x7e0 arch/x86/kernel/signal.c:337 __exit_to_user_mode_loop kernel/entry/common.c:41 [inline] exit_to_user_mode_loop+0x8c/0x540 kernel/entry/common.c:75 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline] syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline] do_syscall_64+0x4ee/0xf80 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa02738f749 RSP: 002b:00007fa0281ae0e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 00007fa0275e6098 RCX: 00007fa02738f749 RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007fa0275e6098 RBP: 00007fa0275e6090 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fa0275e6128 R14: 00007fff14e4fcb0 R15: 00007fff14e4fd98 There's really nothing wrong here, outside of processing these reads will take a LONG time. However, we can speed up the exit by checking the IO_WQ_BIT_EXIT inside the io_worker_handle_work() loop, as syzbot will exit the ring after queueing up all of these reads. Then once the first item is processed, io-wq will simply cancel the rest. That should avoid syzbot running into this complaint again. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/68a2decc.050a0220.e29e5.0099.GAE@google.com/ Reported-by: syzbot+4eb282331cab6d5b6588@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23io_uring: move local task_work in exit cancel loopMing Lei1-4/+4
commit da579f05ef0faada3559e7faddf761c75cdf85e1 upstream. With IORING_SETUP_DEFER_TASKRUN, task work is queued to ctx->work_llist (local work) rather than the fallback list. During io_ring_exit_work(), io_move_task_work_from_local() was called once before the cancel loop, moving work from work_llist to fallback_llist. However, task work can be added to work_llist during the cancel loop itself. There are two cases: 1) io_kill_timeouts() is called from io_uring_try_cancel_requests() to cancel pending timeouts, and it adds task work via io_req_queue_tw_complete() for each cancelled timeout: 2) URING_CMD requests like ublk can be completed via io_uring_cmd_complete_in_task() from ublk_queue_rq() during canceling, given ublk request queue is only quiesced when canceling the 1st uring_cmd. Since io_allowed_defer_tw_run() returns false in io_ring_exit_work() (kworker != submitter_task), io_run_local_work() is never invoked, and the work_llist entries are never processed. This causes io_uring_try_cancel_requests() to loop indefinitely, resulting in 100% CPU usage in kworker threads. Fix this by moving io_move_task_work_from_local() inside the cancel loop, ensuring any work on work_llist is moved to fallback before each cancel attempt. Cc: stable@vger.kernel.org Fixes: c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08io_uring: fix min_wait wakeups for SQPOLLJens Axboe1-0/+3
Commit e15cb2200b934e507273510ba6bc747d5cde24a3 upstream. Using min_wait, two timeouts are given: 1) The min_wait timeout, within which up to 'wait_nr' events are waited for. 2) The overall long timeout, which is entered if no events are generated in the min_wait window. If the min_wait has expired, any event being posted must wake the task. For SQPOLL, that isn't the case, as it won't trigger the io_has_work() condition, as it will have already processed the task_work that happened when an event was posted. This causes any event to trigger post the min_wait to not always cause the waiting application to wakeup, and instead it will wait until the overall timeout has expired. This can be shown in a test case that has a 1 second min_wait, with a 5 second overall wait, even if an event triggers after 1.5 seconds: axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring info: MIN_TIMEOUT supported: true, features: 0x3ffff info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4 info: 1 cqes in 5000.2ms where the expected result should be: axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring info: MIN_TIMEOUT supported: true, features: 0x3ffff info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4 info: 1 cqes in 1500.3ms When the min_wait timeout triggers, reset the number of completions needed to wake the task. This should ensure that any future events will wake the task, regardless of how many events it originally wanted to wait for. Reported-by: Tip ten Brink <tip@tenbrinkmeijs.com> Cc: stable@vger.kernel.org Fixes: 1100c4a2656d ("io_uring: add support for batch wait timeout") Link: https://github.com/axboe/liburing/issues/1477 Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit e15cb2200b934e507273510ba6bc747d5cde24a3) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08io_uring/poll: correctly handle io_poll_add() return value on updateJens Axboe1-2/+7
Commit 84230ad2d2afbf0c44c32967e525c0ad92e26b4e upstream. When the core of io_uring was updated to handle completions consistently and with fixed return codes, the POLL_REMOVE opcode with updates got slightly broken. If a POLL_ADD is pending and then POLL_REMOVE is used to update the events of that request, if that update causes the POLL_ADD to now trigger, then that completion is lost and a CQE is never posted. Additionally, ensure that if an update does cause an existing POLL_ADD to complete, that the completion value isn't always overwritten with -ECANCELED. For that case, whatever io_poll_add() set the value to should just be retained. Cc: stable@vger.kernel.org Fixes: 97b388d70b53 ("io_uring: handle completions in the core") Reported-by: syzbot+641eec6b7af1f62f2b99@syzkaller.appspotmail.com Tested-by: syzbot+641eec6b7af1f62f2b99@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08io_uring: fix filename leak in __io_openat_prep()Prithvi Tambewagh1-1/+1
commit b14fad555302a2104948feaff70503b64c80ac01 upstream. __io_openat_prep() allocates a struct filename using getname(). However, for the condition of the file being installed in the fixed file table as well as having O_CLOEXEC flag set, the function returns early. At that point, the request doesn't have REQ_F_NEED_CLEANUP flag set. Due to this, the memory for the newly allocated struct filename is not cleaned up, causing a memory leak. Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the successful getname() call, so that when the request is torn down, the filename will be cleaned up, along with other resources needing cleanup. Reported-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f Tested-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com> Fixes: b9445598d8c6 ("io_uring: openat directly into fixed fd table") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-24io_uring/napi: fix io_napi_entry RCU accessesOlivier Langlois1-7/+12
[Upstream commit 45b3941d09d13b3503309be1f023b83deaf69b4d ] correct 3 RCU structures modifications that were not using the RCU functions to make their update. Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: io-uring@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: lvc-project@linuxtesting.org Signed-off-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/9f53b5169afa8c7bf3665a0b19dc2f7061173530.1728828877.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk> [Stepan Artuhov: cherry-picked a commit] Signed-off-by: Stepan Artuhov <s.artuhov@tssltd.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-13io_uring/zctx: check chained notif contextsPavel Begunkov1-0/+5
[ Upstream commit ab3ea6eac5f45669b091309f592c4ea324003053 ] Send zc only links ubuf_info for requests coming from the same context. There are some ambiguous syz reports, so let's check the assumption on notification completion. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fd527d8638203fe0f1c5ff06ff2e1d8fd68f831b.1755179962.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-29io_uring/sqpoll: be smarter on when to update the stime usageJens Axboe1-11/+32
Commit a94e0657269c5b8e1a90b17aa2c048b3d276e16d upstream. The current approach is a bit naive, and hence calls the time querying way too often. Only start the "doing work" timer when there's actual work to do, and then use that information to terminate (and account) the work time once done. This greatly reduces the frequency of these calls, when they cannot have changed anyway. Running a basic random reader that is setup to use SQPOLL, a profile before this change shows these as the top cycle consumers: + 32.60% iou-sqp-1074 [kernel.kallsyms] [k] thread_group_cputime_adjusted + 19.97% iou-sqp-1074 [kernel.kallsyms] [k] thread_group_cputime + 12.20% io_uring io_uring [.] submitter_uring_fn + 4.13% iou-sqp-1074 [kernel.kallsyms] [k] getrusage + 2.45% iou-sqp-1074 [kernel.kallsyms] [k] io_submit_sqes + 2.18% iou-sqp-1074 [kernel.kallsyms] [k] __pi_memset_generic + 2.09% iou-sqp-1074 [kernel.kallsyms] [k] cputime_adjust and after this change, top of profile looks as follows: + 36.23% io_uring io_uring [.] submitter_uring_fn + 23.26% iou-sqp-819 [kernel.kallsyms] [k] io_sq_thread + 10.14% iou-sqp-819 [kernel.kallsyms] [k] io_sq_tw + 6.52% iou-sqp-819 [kernel.kallsyms] [k] tctx_task_work_run + 4.82% iou-sqp-819 [kernel.kallsyms] [k] nvme_submit_cmds.part.0 + 2.91% iou-sqp-819 [kernel.kallsyms] [k] io_submit_sqes [...] 0.02% iou-sqp-819 [kernel.kallsyms] [k] cputime_adjust where it's spending the cycles on things that actually matter. Reported-by: Fengnan Chang <changfengnan@bytedance.com> Cc: stable@vger.kernel.org Fixes: 3fcb9d17206e ("io_uring/sqpoll: statistics of the true utilization of sq threads") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-29io_uring/sqpoll: switch away from getrusage() for CPU accountingJens Axboe3-18/+23
Commit 8ac9b0d33e5c0a995338ee5f25fe1b6ff7d97f65 upstream. getrusage() does a lot more than what the SQPOLL accounting needs, the latter only cares about (and uses) the stime. Rather than do a full RUSAGE_SELF summation, just query the used stime instead. Cc: stable@vger.kernel.org Fixes: 3fcb9d17206e ("io_uring/sqpoll: statistics of the true utilization of sq threads") Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-29io_uring: correct __must_hold annotation in io_install_fixed_fileAlok Tiwari1-1/+1
[ Upstream commit c5efc6a0b3940381d67887302ddb87a5cf623685 ] The __must_hold annotation references &req->ctx->uring_lock, but req is not in scope in io_install_fixed_file. This change updates the annotation to reference the correct ctx->uring_lock. improving code clarity. Fixes: f110ed8498af ("io_uring: split out fixed file installation and removal") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23Revert "io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()"Jens Axboe1-1/+1
Commit 927069c4ac2cd1a37efa468596fb5b8f86db9df0 upstream. This reverts commit 90bfb28d5fa8127a113a140c9791ea0b40ab156a. Kevin reports that this commit causes an issue for him with LVM snapshots, most likely because of turning off NOWAIT support while a snapshot is being created. This makes -EOPNOTSUPP bubble back through the completion handler, where io_uring read/write handling should just retry it. Reinstate the previous check removed by the referenced commit. Cc: stable@vger.kernel.org Fixes: 90bfb28d5fa8 ("io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()") Reported-by: Salvatore Bonaccorso <carnil@debian.org> Reported-by: Kevin Lumik <kevin@xf.ee> Link: https://lore.kernel.org/io-uring/cceb723c-051b-4de2-9a4c-4aa82e1619ee@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15io_uring/waitid: always prune wait queue entry in io_waitid_wait()Jens Axboe1-1/+2
commit 2f8229d53d984c6a05b71ac9e9583d4354e3b91f upstream. For a successful return, always remove our entry from the wait queue entry list. Previously this was skipped if a cancelation was in progress, but this can race with another invocation of the wait queue entry callback. Cc: stable@vger.kernel.org Fixes: f31ecf671ddc ("io_uring: add IORING_OP_WAITID support") Reported-by: syzbot+b9e83021d9c642a33d8c@syzkaller.appspotmail.com Tested-by: syzbot+b9e83021d9c642a33d8c@syzkaller.appspotmail.com Link: https://lore.kernel.org/io-uring/68e5195e.050a0220.256323.001f.GAE@google.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25io_uring: fix incorrect io_kiocb reference in io_link_skbYang Xiuwei1-1/+1
[ Upstream commit 2c139a47eff8de24e3350dadb4c9d5e3426db826 ] In io_link_skb function, there is a bug where prev_notif is incorrectly assigned using 'nd' instead of 'prev_nd'. This causes the context validation check to compare the current notification with itself instead of comparing it with the previous notification. Fix by using the correct prev_nd parameter when obtaining prev_notif. Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: 6fe4220912d19 ("io_uring/notif: implement notification stacking") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-25io_uring/kbuf: drop WARN_ON_ONCE() from incremental length checkJens Axboe1-1/+1
Partially based on commit 98b6fa62c84f2e129161e976a5b9b3cb4ccd117b upstream. This can be triggered by userspace, so just drop it. The condition is appropriately handled. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-25io_uring/msg_ring: kill alloc_cache for io_kiocb allocationsJens Axboe2-27/+2
Commit df8922afc37aa2111ca79a216653a629146763ad upstream. A recent commit: fc582cd26e88 ("io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU") fixed an issue with not deferring freeing of io_kiocb structs that msg_ring allocates to after the current RCU grace period. But this only covers requests that don't end up in the allocation cache. If a request goes into the alloc cache, it can get reused before it is sane to do so. A recent syzbot report would seem to indicate that there's something there, however it may very well just be because of the KASAN poisoning that the alloc_cache handles manually. Rather than attempt to make the alloc_cache sane for that use case, just drop the usage of the alloc_cache for msg_ring request payload data. Fixes: 50cf5f3842af ("io_uring/msg_ring: add an alloc cache for io_kiocb entries") Link: https://lore.kernel.org/io-uring/68cc2687.050a0220.139b6.0005.GAE@google.com/ Reported-by: syzbot+baa2e0f4e02df602583e@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>