| author | Zizhi Wo <wozizhi@huawei.com> | 2026-05-14 05:18:47 +0300 |
|---|---|---|
| committer | Jens Axboe <axboe@kernel.dk> | 2026-05-14 06:44:57 +0300 |
| commit | f44d38a31f1802b7222adaea9ee69f9d280f698a (patch) | |
| tree | 9f836b5a33519a0385aeeff1d51d44470d786e10 /drivers/gpu/tests/git@radix-linux.su:pub | |
| parent | d6a2d7b04b5a093021a7a0e2e69e9d5237dfa8cc (diff) | |
| download | linux-f44d38a31f1802b7222adaea9ee69f9d280f698a.tar.xz | |
io_uring: validate user-controlled cq.head in io_cqe_cache_refill()
A fuzzing run reproduced an unkillable io_uring task stuck at ~100% CPU:
    [root@fedora io_uring_stress]# ps -ef | grep io_uring
    root 1240 1 99 13:36 ? 00:01:35 [io_uring_stress] <defunct>
The task loops inside io_cqring_wait() and never returns to userspace,
and SIGKILL has no effect.
This is caused by the CQ ring exposing rings->cq.head to userspace as
writable, while the authoritative tail lives in kernel-private
ctx->cached_cq_tail. io_cqe_cache_refill() computes free space as an
unsigned subtraction:
    free = ctx->cq_entries - min(tail - head, ctx->cq_entries);
If userspace keeps head within [0, tail], the subtraction is well
defined and min() just acts as a defensive clamp. But if userspace
advances head past tail, (tail - head) wraps to a huge value, free
becomes 0, and io_cqe_cache_refill() fails. The CQE is pushed onto the
overflow list and IO_CHECK_CQ_OVERFLOW_BIT is set.
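The wrap described above can be reproduced in isolation. The following is a minimal userspace sketch, not the kernel code: cq_free_unsigned() and min_u32() are hypothetical names, and the ring fields are assumed to be 32-bit unsigned values passed in as plain arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper standing in for the kernel's min(). */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Illustrative model of the quoted free-space computation:
 *   free = cq_entries - min(tail - head, cq_entries)
 * tail and head are free-running 32-bit counters, so (tail - head)
 * is evaluated modulo 2^32.
 */
uint32_t cq_free_unsigned(uint32_t tail, uint32_t head, uint32_t cq_entries)
{
	assert(cq_entries > 0);
	/* If head > tail, this wraps to a value near 2^32. */
	uint32_t queued = tail - head;

	/* The clamp turns the wrapped value into "ring full". */
	return cq_entries - min_u32(queued, cq_entries);
}
```

With tail = 10 and cq_entries = 128, a head of 4 yields 122 free slots, while a head of 11 (one past tail) yields 0 even though the ring holds nothing.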
The wait loop in io_cqring_wait() relies on an invariant: refill() only
fails when the CQ is *physically* full, in which case rings->cq.tail has
been advanced to iowq->cq_tail and io_should_wake() returns true. The
tampered head breaks this: refill() fails while the ring is not full, no
OCQE is copied in, rings->cq.tail never catches up, io_should_wake()
stays false, and io_cqring_wait_schedule() keeps returning early because
IO_CHECK_CQ_OVERFLOW_BIT is still set. The result is a tight retry loop
that never returns to userspace.
Introduce io_cqring_queued() as the single point that converts the
(tail, head) pair into a trustworthy queued count. Since the real
head/tail distance is bounded by cq_entries (far below 2^31), a signed
comparison reliably detects userspace moving head past tail; in that
case treat the queue as empty so callers see the full cache as free and
forward progress is preserved.
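The signed-comparison idea can be sketched in userspace as follows. This is an illustration under stated assumptions, not the body of io_cqring_queued(): the names cq_queued_checked(), cq_free_checked(), and min_u32() are hypothetical, and the counters are assumed to be 32-bit unsigned values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper standing in for the kernel's min(). */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Illustrative queued-count helper. A legitimate tail/head distance is
 * at most cq_entries, far below 2^31, so a negative signed difference
 * can only mean head was moved past tail. In that case report an empty
 * queue. The unsigned-to-signed conversion assumes two's complement,
 * which holds on all mainstream compilers.
 */
uint32_t cq_queued_checked(uint32_t tail, uint32_t head)
{
	int32_t diff = (int32_t)(tail - head);

	return diff < 0 ? 0 : (uint32_t)diff;
}

/* Free-space computation built on the checked count. */
uint32_t cq_free_checked(uint32_t tail, uint32_t head, uint32_t cq_entries)
{
	assert(cq_entries > 0);
	return cq_entries - min_u32(cq_queued_checked(tail, head), cq_entries);
}
```

In this sketch a head one past tail now reports the whole ring as free, and a legitimate counter wrap (tail = 2, head = 0xFFFFFFFE) still yields the correct queued count of 4.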
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://patch.msgid.link/20260514021847.4062782-1-wozizhi@huaweicloud.com
[axboe: fixup commit message, kill 'queued' var, and keep it all in
io_uring.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
