kernel/linux.git/io_uring/io_uring.c, branch v7.1-rc5

io_uring: propagate array_index_nospec opcode into req->opcode

2026-05-18T14:59:12+00:00

Commit 1e988c3fe126 ("io_uring: prevent opcode speculation") added array_index_nospec() to io_init_req(), but applied it only to a local opcode variable. req->opcode is initialized from sqe->opcode before the bounds check and remains the raw value. Keep req->opcode as the canonical opcode in io_init_req(): reject out-of-range values architecturally, then write the array_index_nospec() result back to req->opcode before any table lookup. This keeps downstream users of req->opcode from observing the raw user byte on a mispredicted path. No functional change: array_index_nospec() is a no-op for opcodes in [0, IORING_OP_LAST), and out-of-range opcodes are still rejected at the bounds check above the assignment. Fixes: 1e988c3fe126 ("io_uring: prevent opcode speculation") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito Link: https://patch.msgid.link/20260517213010.696135-1-michael.bommarito@gmail.com Signed-off-by: Jens Axboe

io_uring: validate user-controlled cq.head in io_cqe_cache_refill()

2026-05-14T03:44:57+00:00

A fuzzing run reproduced an unkillable io_uring task stuck at ~100% CPU: [root@fedora io_uring_stress]# ps -ef | grep io_uring root 1240 1 99 13:36 ? 00:01:35 [io_uring_stress] The task loops inside io_cqring_wait() and never returns to userspace, and SIGKILL has no effect. This is caused by the CQ ring exposing rings->cq.head to userspace as writable, while the authoritative tail lives in kernel-private ctx->cached_cq_tail. io_cqe_cache_refill() computes free space as an unsigned subtraction: free = ctx->cq_entries - min(tail - head, ctx->cq_entries); If userspace keeps head within [0, tail], the subtraction is well defined and min() just acts as a defensive clamp. But if userspace advances head past tail, (tail - head) wraps to a huge value, free becomes 0, and io_cqe_cache_refill() fails. The CQE is pushed onto the overflow list and IO_CHECK_CQ_OVERFLOW_BIT is set. The wait loop in io_cqring_wait() relies on an invariant: refill() only fails when the CQ is *physically* full, in which case rings->cq.tail has been advanced to iowq->cq_tail and io_should_wake() returns true. The tampered head breaks this: refill() fails while the ring is not full, no OCQE is copied in, rings->cq.tail never catches up, io_should_wake() stays false, and io_cqring_wait_schedule() keeps returning early because IO_CHECK_CQ_OVERFLOW_BIT is still set. The result is a tight retry loop that never returns to userspace. Introduce io_cqring_queued() as the single point that converts the (tail, head) pair into a trustworthy queued count. Since the real head/tail distance is bounded by cq_entries (far below 2^31), a signed comparison reliably detects userspace moving head past tail; in that case treat the queue as empty so callers see the full cache as free and forward progress is preserved. Suggested-by: Jens Axboe Signed-off-by: Zizhi Wo Link: https://patch.msgid.link/20260514021847.4062782-1-wozizhi@huaweicloud.com [axboe: fixup commit message, kill 'queued' var, and keep it all in io_uring.c] Signed-off-by: Jens Axboe

io_uring: hold uring_lock when walking link chain in io_wq_free_work()

2026-05-11T17:14:29+00:00

io_wq_free_work() calls io_req_find_next() from io-wq worker context, which reads and clears req->link without holding any lock. This can potentially race with other paths that mutate the same chain under ctx->uring_lock. Take ctx->uring_lock around the io_req_find_next() call. Only requests with IO_REQ_LINK_FLAGS reach this path, which is not the hot path. Signed-off-by: Jens Axboe

io_uring: fix spurious fput in registered ring path

2026-04-21T18:18:44+00:00

Fix an issue with io_uring_ctx_get_file() not gating fput() on whether or not the file descriptor is a registered/direct one or not. Fixes: c5e9f6a96bf7 ("io_uring: unify getting ctx from passed in file descriptor") Reviewed-by: Gabriel Krisman Bertazi Signed-off-by: Jens Axboe

Merge tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

2026-04-13T23:22:30+00:00

Pull io_uring updates from Jens Axboe: - Add a callback driven main loop for io_uring, and BPF struct_ops on top to allow implementing custom event loop logic - Decouple IOPOLL from being a ring-wide all-or-nothing setting, allowing IOPOLL use cases to also issue certain white listed non-polled opcodes - Timeout improvements. Migrate internal timeout storage from timespec64 to ktime_t for simpler arithmetic and avoid copying of timespec data - Zero-copy receive (zcrx) updates: - Add a device-less mode (ZCRX_REG_NODEV) for testing and experimentation where data flows through the copy fallback path - Fix two-step unregistration regression, DMA length calculations, xarray mark usage, and a potential 32-bit overflow in id shifting - Refactoring toward multi-area support: dedicated refill queue struct, consolidated DMA syncing, netmem array refilling format, and guard-based locking - Zero-copy transmit (zctx) cleanup: - Unify io_send_zc() and io_sendmsg_zc() into a single function - Add vectorized registered buffer send for IORING_OP_SEND_ZC - Add separate notification user_data via sqe->addr3 so notification and completion CQEs can be distinguished without extra reference counting - Switch struct io_ring_ctx internal bitfields to explicit flag bits with atomic-safe accessors, and annotate the known harmless races on those flags - Various optimizations caching ctx and other request fields in local variables to avoid repeated loads, and cleanups for tctx setup, ring fd registration, and read path early returns * tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (58 commits) io_uring: unify getting ctx from passed in file descriptor io_uring/register: don't get a reference to the registered ring fd io_uring/tctx: clean up __io_uring_add_tctx_node() error handling io_uring/tctx: have io_uring_alloc_task_context() return tctx io_uring/timeout: use 'ctx' consistently io_uring/rw: clean up __io_read() obsolete comment and early returns io_uring/zcrx: use correct mmap off constants io_uring/zcrx: use dma_len for chunk size calculation io_uring/zcrx: don't clear not allocated niovs io_uring/zcrx: don't use mark0 for allocating xarray io_uring: cast id to u64 before shifting in io_allocate_rbuf_ring() io_uring/zcrx: reject REG_NODEV with large rx_buf_size io_uring/cancel: validate opcode for IORING_ASYNC_CANCEL_OP io_uring/rsrc: use io_cache_free() to free node io_uring/zcrx: rename zcrx [un]register functions io_uring/zcrx: check ctrl op payload struct sizes io_uring/zcrx: cache fallback availability in zcrx ctx io_uring/zcrx: warn on a repeated area append io_uring/zcrx: consolidate dma syncing io_uring/zcrx: netmem array as refiling format ...

io_uring: unify getting ctx from passed in file descriptor

2026-04-08T19:21:35+00:00

io_uring_enter() and io_uring_register() end up having duplicated code for getting a ctx from a passed in file descriptor, for either a registered ring descriptor or a normal file descriptor. Move the io_uring_register_get_file() into io_uring.c and name it a bit more generically, and use it from both callsites rather than have that logic and handling duplicated. Signed-off-by: Jens Axboe

Merge tag 'io_uring-7.0-20260403' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

2026-04-03T18:58:04+00:00

Pull io_uring fixes from Jens Axboe: - A previous fix in this release covered the case of the rings being RCU protected during resize, but it missed a few spots. This covers the rest - Fix the cBPF filters when COW'ed, introduced in this merge window - Fix for an attempt to import a zero sized buffer - Fix for a missing clamp in importing bundle buffers * tag 'io_uring-7.0-20260403' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/bpf_filters: retain COW'ed settings on parse failures io_uring: protect remaining lockless ctx->rings accesses with RCU io_uring/rsrc: reject zero-length fixed buffer import io_uring/net: fix slab-out-of-bounds read in io_bundle_nbufs()

io_uring/zcrx: rename zcrx [un]register functions

2026-04-01T16:21:13+00:00

Drop "ifqs" from function names, as it refers to an interface queue and there might be none once a device-less mode is introduced. Signed-off-by: Pavel Begunkov Link: https://patch.msgid.link/657874acd117ec30fa6f45d9d844471c753b5a0f.1774261953.git.asml.silence@gmail.com Signed-off-by: Jens Axboe

io_uring/zcrx: return back two step unregistration

2026-04-01T16:21:12+00:00

There are reports where io_uring instance removal takes too long and an ifq reallocation by another zcrx instance fails. Split zcrx destruction into two steps similarly how it was before, first close the queue early but maintain zcrx alive, and then when all inflight requests are completed, drop the main zcrx reference. For extra protection, mark terminated zcrx instances in xarray and warn if we double put them. Cc: stable@vger.kernel.org # 6.19+ Link: https://github.com/axboe/liburing/issues/1550 Reported-by: Youngmin Choi Signed-off-by: Pavel Begunkov Link: https://patch.msgid.link/0ce21f0565ab4358668922a28a8a36922dfebf76.1774261953.git.asml.silence@gmail.com [axboe: NULL ifq before break inside scoped guard] Signed-off-by: Jens Axboe

io_uring: protect remaining lockless ctx->rings accesses with RCU

2026-04-01T14:34:11+00:00

Commit 96189080265e addressed one case of ctx->rings being potentially accessed while a resize is happening on the ring, but there are still a few others that need handling. Add a helper for retrieving the rings associated with an io_uring context, and add some sanity checking to that to catch bad uses. ->rings_rcu is always valid, as long as it's used within RCU read lock. Any use of ->rings_rcu or ->rings inside either ->uring_lock or ->completion_lock is sane as well. Do the minimum fix for the current kernel, but set it up such that this basic infra can be extended for later kernels to make this harder to mess up in the future. Thanks to Junxi Qian for finding and debugging this issue. Cc: stable@vger.kernel.org Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Reviewed-by: Junxi Qian Tested-by: Junxi Qian Link: https://lore.kernel.org/io-uring/20260330172348.89416-1-qjx1298677004@gmail.com/ Signed-off-by: Jens Axboe