kernel/linux.git/io_uring/rw.c, branch v6.6.132

io_uring/rw: fix wrong NOWAIT check in io_rw_init_file()

2025-06-19T13:28:45+00:00

Commit ae6a888a4357131c01d85f4c91fb32552dd0bf70 upstream. A previous commit improved how !FMODE_NOWAIT is dealt with, but inadvertently negated a check whilst doing so. This caused -EAGAIN to be returned from reading files with O_NONBLOCK set. Fix up the check for REQ_F_SUPPORT_NOWAIT. Reported-by: Julian Orth Link: https://github.com/axboe/liburing/issues/1270 Fixes: f7c913438533 ("io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT") Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT

2025-06-19T13:28:44+00:00

Commit f7c9134385331c5ef36252895130aa01a92de907 upstream. The checking for whether or not io_uring can do a non-blocking read or write attempt is gated on FMODE_NOWAIT. However, if the file is pollable, it's feasible to just check if it's currently in a state in which it can sanely receive or send _some_ data. This avoids unnecessary io-wq punts, and repeated worthless retries before doing that punt, by assuming that some data can get delivered or received if poll tells us that is true. It also allows multishot reads to properly work with these types of files, enabling a bit of a cleanup of the logic that: c9d952b9103b ("io_uring/rw: fix cflags posting for single issue multishot read") had to put in place. Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

io_uring: add io_file_can_poll() helper

2025-06-19T13:28:44+00:00

Commit 95041b93e90a06bb613ec4bef9cd4d61570f68e4 upstream. This adds a flag to avoid dipping dereferencing file and then f_op to figure out if the file has a poll handler defined or not. We generally call this at least twice for networked workloads, and if using ring provided buffers, we do it on every buffer selection. Particularly the latter is troublesome, as it's otherwise a very fast operation. Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

io_uring/rw: commit provided buffer state on async

2025-02-17T08:40:37+00:00

When we get -EIOCBQUEUED, we need to ensure that the buffer is consumed from the provided buffer ring, which can be done with io_kbuf_recycle() + REQ_F_PARTIAL_IO. Reported-by: Muhammad Ramdhan Reported-by: Bing-Jhong Billy Jheng Reported-by: Jacob Soo Fixes: c7fb19428d67d ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov Signed-off-by: Greg Kroah-Hartman

io_uring/rw: avoid punting to io-wq directly

2024-12-27T12:58:57+00:00

Commit 6e6b8c62120a22acd8cb759304e4cd2e3215d488 upstream. kiocb_done() should care to specifically redirecting requests to io-wq. Remove the hopping to tw to then queue an io-wq, return -EAGAIN and let the core code io_uring handle offloading. Signed-off-by: Pavel Begunkov Tested-by: Ming Lei Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe (cherry picked from commit 6e6b8c62120a22acd8cb759304e4cd2e3215d488) Signed-off-by: Greg Kroah-Hartman

io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN

2024-12-27T12:58:57+00:00

Commit c0a9d496e0fece67db777bd48550376cf2960c47 upstream. Some file systems, ocfs2 in this case, will return -EOPNOTSUPP for an IOCB_NOWAIT read/write attempt. While this can be argued to be correct, the usual return value for something that requires blocking issue is -EAGAIN. A refactoring io_uring commit dropped calling kiocb_done() for negative return values, which is otherwise where we already do that transformation. To ensure we catch it in both spots, check it in __io_read() itself as well. Reported-by: Robert Sander Link: https://fosstodon.org/@gurubert@mastodon.gurubert.de/113112431889638440 Cc: stable@vger.kernel.org Fixes: a08d195b586a ("io_uring/rw: split io_read() into a helper") Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

io_uring/rw: split io_read() into a helper

2024-12-27T12:58:57+00:00

Commit a08d195b586a217d76b42062f88f375a3eedda4d upstream. Add __io_read() which does the grunt of the work, leaving the completion side to the new io_read(). No functional changes in this patch. Reviewed-by: Gabriel Krisman Bertazi Signed-off-by: Jens Axboe (cherry picked from commit a08d195b586a217d76b42062f88f375a3eedda4d) Signed-off-by: Greg Kroah-Hartman

io_uring/rw: fix missing NOWAIT check for O_DIRECT start write

2024-11-08T15:28:26+00:00

[ Upstream commit 1d60d74e852647255bd8e76f5a22dc42531e4389 ] When io_uring starts a write, it'll call kiocb_start_write() to bump the super block rwsem, preventing any freezes from happening while that write is in-flight. The freeze side will grab that rwsem for writing, excluding any new writers from happening and waiting for existing writes to finish. But io_uring unconditionally uses kiocb_start_write(), which will block if someone is currently attempting to freeze the mount point. This causes a deadlock where freeze is waiting for previous writes to complete, but the previous writes cannot complete, as the task that is supposed to complete them is blocked waiting on starting a new write. This results in the following stuck trace showing that dependency with the write blocked starting a new write: task:fio state:D stack:0 pid:886 tgid:886 ppid:876 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_rwsem_wait+0x1e8/0x3f8 __percpu_down_read+0xe8/0x500 io_write+0xbb8/0xff8 io_issue_sqe+0x10c/0x1020 io_submit_sqes+0x614/0x2110 __arm64_sys_io_uring_enter+0x524/0x1038 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 INFO: task fsfreeze:7364 blocked for more than 15 seconds. Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963 with the attempting freezer stuck trying to grab the rwsem: task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_down_write+0x2b0/0x680 freeze_super+0x248/0x8a8 do_vfs_ioctl+0x149c/0x1b18 __arm64_sys_ioctl+0xd0/0x1a0 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a blocking grab of the super block rwsem if it isn't set. For normal issue where IOCB_NOWAIT would always be set, this returns -EAGAIN which will have io_uring core issue a blocking attempt of the write. That will in turn also get completions run, ensuring forward progress. Since freezing requires CAP_SYS_ADMIN in the first place, this isn't something that can be triggered by a regular user. Cc: stable@vger.kernel.org # 5.10+ Reported-by: Peter Mann Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

io_uring/rw: ensure io->bytes_done is always initialized

2024-01-25T23:35:45+00:00

commit 0a535eddbe0dc1de4386046ab849f08aeb2f8faf upstream. If IOSQE_ASYNC is set and we fail importing an iovec for a readv or writev request, then we leave ->bytes_done uninitialized and hence the eventual failure CQE posted can potentially have a random res value rather than the expected -EINVAL. Setup ->bytes_done before potentially failing, so we have a consistent value if we fail the request early. Cc: stable@vger.kernel.org Reported-by: xingwei lee Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2023-10-28T02:44:58+00:00

Pull misc filesystem fixes from Al Viro: "Assorted fixes all over the place: literally nothing in common, could have been three separate pull requests. All are simple regression fixes, but not for anything from this cycle" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ceph_wait_on_conflict_unlink(): grab reference before dropping ->d_lock io_uring: kiocb_done() should *not* trust ->ki_pos if ->{read,write}_iter() failed sparc32: fix a braino in fault handling in csum_and_copy_..._user()