<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/io_uring/opdef.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-07-17T16:37:21+00:00</updated>
<entry>
<title>io_uring: make fallocate be hashed work</title>
<updated>2025-07-17T16:37:21+00:00</updated>
<author>
<name>Fengnan Chang</name>
<email>changfengnan@bytedance.com</email>
</author>
<published>2025-06-23T11:02:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=383b2399d5862fbee79e1f9028be1da5559756b9'/>
<id>urn:sha1:383b2399d5862fbee79e1f9028be1da5559756b9</id>
<content type='text'>
[ Upstream commit 88a80066af1617fab444776135d840467414beb6 ]

Like ftruncate and write, fallocate operations on the same file cannot
be executed in parallel, so it is better to make fallocate a hashed
work item.
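
A minimal sketch of the opdef change this describes, assuming the
v6.12-era struct io_issue_def layout in io_uring/opdef.c (illustrative,
not the exact diff):

    [IORING_OP_FALLOCATE] = {
            .needs_file             = 1,
            .hash_reg_file          = 1,    /* serialize on the same file */
            .prep                   = io_fallocate_prep,
            .issue                  = io_fallocate,
    },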

Signed-off-by: Fengnan Chang &lt;changfengnan@bytedance.com&gt;
Link: https://lore.kernel.org/r/20250623110218.61490-1-changfengnan@bytedance.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>io_uring: Fix probe of disabled operations</title>
<updated>2024-06-19T14:58:00+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-06-19T02:06:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e05b222382ec67dce7358d50b6006e91d028d8b'/>
<id>urn:sha1:3e05b222382ec67dce7358d50b6006e91d028d8b</id>
<content type='text'>
io_probe checks io_issue_def-&gt;not_supported, but we never really set
that field, as we mark unsupported operations through a specific -&gt;prep
handler.  This means we end up returning IO_URING_OP_SUPPORTED, even for
disabled operations.  Fix it by just checking the prep handler itself.
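
A minimal sketch of the check described, assuming disabled opcodes are
wired to the io_eopnotsupp_prep stub in io_uring/opdef.c (a hedged
illustration of the approach rather than the verbatim patch):

    /* an opcode is supported iff its prep handler isn't the stub */
    bool io_uring_op_supported(u8 opcode)
    {
            return io_issue_defs[opcode].prep != io_eopnotsupp_prep;
    }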

Fixes: 66f4af93da57 ("io_uring: add support for probing opcodes")
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240619020620.5301-2-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: Introduce IORING_OP_LISTEN</title>
<updated>2024-06-19T13:57:21+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-06-14T16:30:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ff140cc8628abfb1755691d16cfa8788d8820ef7'/>
<id>urn:sha1:ff140cc8628abfb1755691d16cfa8788d8820ef7</id>
<content type='text'>
IORING_OP_LISTEN provides the semantics of listen(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully within io_uring, using registered,
descriptorless files.
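
For illustration, a hedged userspace sketch using the liburing helper
that accompanied this op (io_uring_prep_listen(); availability depends
on the liburing version, and sockfd is a placeholder):

    struct io_uring_sqe *sqe = io_uring_get_sqe(&amp;ring);
    /* queue listen(2) on the prepared, bound socket */
    io_uring_prep_listen(sqe, sockfd, SOMAXCONN);
    io_uring_submit(&amp;ring);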

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240614163047.31581-4-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: Introduce IORING_OP_BIND</title>
<updated>2024-06-19T13:57:21+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-06-14T16:30:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7481fd93fa0a851740e26026485f56a1305454ce'/>
<id>urn:sha1:7481fd93fa0a851740e26026485f56a1305454ce</id>
<content type='text'>
IORING_OP_BIND provides the semantics of bind(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully within io_uring, using registered,
descriptorless files.
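
For illustration, a hedged userspace sketch using the matching
liburing helper (io_uring_prep_bind(); availability depends on the
liburing version, and sockfd is a placeholder):

    struct sockaddr_in addr = {
            .sin_family = AF_INET,
            .sin_port = htons(8080),
            .sin_addr.s_addr = htonl(INADDR_ANY),
    };
    struct io_uring_sqe *sqe = io_uring_get_sqe(&amp;ring);
    io_uring_prep_bind(sqe, sockfd, (struct sockaddr *)&amp;addr, sizeof(addr));
    io_uring_submit(&amp;ring);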

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240614163047.31581-3-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/rw: Free iovec before cleaning async data</title>
<updated>2024-05-30T14:33:01+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2024-05-30T14:23:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e112311615a24e1618a591c73506571dc304eb8d'/>
<id>urn:sha1:e112311615a24e1618a591c73506571dc304eb8d</id>
<content type='text'>
kmemleak shows that there is a memory leak in io_uring read operation,
where a buffer is allocated at iovec import, but never de-allocated.

The memory is allocated at io_async_rw-&gt;free_iovec, but io_async_rw
is then kfreed, taking the allocated memory with it. I saw this happen
when the read operation failed with -11 (EAGAIN).

This is the kmemleak splat:

    unreferenced object 0xffff8881da591c00 (size 256):
...
      backtrace (crc 7a15bdee):
	[&lt;00000000256f2de4&gt;] __kmalloc+0x2d6/0x410
	[&lt;000000007a9f5fc7&gt;] iovec_from_user.part.0+0xc6/0x160
	[&lt;00000000cecdf83a&gt;] __import_iovec+0x50/0x220
	[&lt;00000000d1d586a2&gt;] __io_import_iovec+0x13d/0x220
	[&lt;0000000054ee9bd2&gt;] io_prep_rw+0x186/0x340
	[&lt;00000000a9c0372d&gt;] io_prep_rwv+0x31/0x120
	[&lt;000000001d1170b9&gt;] io_prep_readv+0xe/0x30
	[&lt;0000000070b8eb67&gt;] io_submit_sqes+0x1bd/0x780
	[&lt;00000000812496d4&gt;] __do_sys_io_uring_enter+0x3ed/0x5b0
	[&lt;0000000081499602&gt;] do_syscall_64+0x5d/0x170
	[&lt;00000000de1c5a4d&gt;] entry_SYSCALL_64_after_hwframe+0x76/0x7e

This occurs because the async data cleanup functions are not set for
read/write operations. As a result, the potentially allocated iovec in
the rw async data is not freed before the async data is released,
leading to a memory leak.
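
As a sketch of the kind of per-op cleanup hook this calls for, wired
through the op's cold-def .cleanup callback (the hook name here is
illustrative, not necessarily what the patch uses):

    /* free a stashed iovec before the async data itself is released */
    void io_rw_cleanup(struct io_kiocb *req)
    {
            struct io_async_rw *rw = req-&gt;async_data;

            kfree(rw-&gt;free_iovec);  /* kfree(NULL) is a no-op */
            rw-&gt;free_iovec = NULL;
    }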

With the following patch applied, kmemleak no longer shows the leaked
memory, and all liburing tests pass.

Fixes: a9165b83c193 ("io_uring/rw: always setup io_async_rw for read/write requests")
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Link: https://lore.kernel.org/r/20240530142340.1248216-1-leitao@debian.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/net: add provided buffer support for IORING_OP_SEND</title>
<updated>2024-04-22T17:25:56+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-02-19T17:46:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ac5f71a3d9d7eb540f6bf7e794eb4a3e4c3f11dd'/>
<id>urn:sha1:ac5f71a3d9d7eb540f6bf7e794eb4a3e4c3f11dd</id>
<content type='text'>
It's pretty trivial to wire up provided buffer support for the send
side, just like how it's done on the receive side. This enables setting
up a buffer ring that an application can use to push pending sends to,
and then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one in flight
at the same time. Consider the following scenario where everything is
fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would look
as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB, queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
   posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
   posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.

It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.
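
For illustration, a hedged userspace sketch of the ordered-send flow
above, using liburing's buffer-ring helpers (sockfd, dataA/dataB and
lengths are placeholders; helper names per liburing 2.x):

    int bgid = 0, err;
    struct io_uring_buf_ring *br =
            io_uring_setup_buf_ring(&amp;ring, 8, bgid, 0, &amp;err);

    /* stage outgoing data in order; sends consume buffers head-first */
    io_uring_buf_ring_add(br, dataA, lenA, 0, io_uring_buf_ring_mask(8), 0);
    io_uring_buf_ring_add(br, dataB, lenB, 1, io_uring_buf_ring_mask(8), 1);
    io_uring_buf_ring_advance(br, 2);

    struct io_uring_sqe *sqe = io_uring_get_sqe(&amp;ring);
    io_uring_prep_send(sqe, sockfd, NULL, 0, 0); /* buffer chosen at issue */
    sqe-&gt;flags |= IOSQE_BUFFER_SELECT;
    sqe-&gt;buf_group = bgid;
    io_uring_submit(&amp;ring);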

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: drop -&gt;prep_async()</title>
<updated>2024-04-15T14:10:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-03-19T02:48:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e10677a8f6980dbae2e866b8320d90bae07e87ee'/>
<id>urn:sha1:e10677a8f6980dbae2e866b8320d90bae07e87ee</id>
<content type='text'>
It's now unused; drop the code related to it. This includes the
io_issue_defs-&gt;manual_alloc field.

While in there, and since -&gt;async_size is now being used a bit more
frequently and in the issue path, move it to io_issue_defs[].

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/uring_cmd: switch to always allocating async data</title>
<updated>2024-04-15T14:10:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-03-19T02:41:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d10f19dff56eac5ae44dc270336b18071a8bd51c'/>
<id>urn:sha1:d10f19dff56eac5ae44dc270336b18071a8bd51c</id>
<content type='text'>
Basic conversion ensuring async_data is allocated off the prep path. Adds
a basic alloc cache as well, as passthrough IO rates can be quite high.

Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/net: move connect to always using async data</title>
<updated>2024-04-15T14:10:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-03-19T02:37:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2ea5a7069133c01fe3dbda95d77af7f193a1a52'/>
<id>urn:sha1:e2ea5a7069133c01fe3dbda95d77af7f193a1a52</id>
<content type='text'>
While moving connect to always use async data, get rid of
io_async_connect and just use the generic io_async_msghdr. Both of them
have a struct sockaddr_storage in there, and while io_async_msghdr is
bigger, if the same type can be used then the netmsg_cache can get
reused for connect as well.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/rw: always setup io_async_rw for read/write requests</title>
<updated>2024-04-15T14:10:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-03-18T22:13:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a9165b83c1937eeed1f0c731468216d6371d647f'/>
<id>urn:sha1:a9165b83c1937eeed1f0c731468216d6371d647f</id>
<content type='text'>
read/write requests try to put everything on the stack, and then alloc
and copy if a retry is needed. This necessitates a bunch of nasty code
that deals with intermediate state.

Get rid of this, and have the prep side setup everything that is needed
upfront, which greatly simplifies the opcode handlers.

This includes adding an alloc cache for io_async_rw, to make it cheap
to handle.
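
A hedged sketch of the prep-side pattern described, assuming an
io_alloc_cache-backed cache of io_async_rw on the ring context (helper
and field names here are illustrative, not the exact code):

    /* prep path: always attach async state, recycling via the cache */
    struct io_async_rw *rw = io_alloc_cache_get(&amp;ctx-&gt;rw_cache);

    if (!rw)
            rw = kmalloc(sizeof(*rw), GFP_KERNEL);
    if (unlikely(!rw))
            return -ENOMEM;
    req-&gt;async_data = rw;
    req-&gt;flags |= REQ_F_ASYNC_DATA;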

In terms of cost, this should be basically free and transparent. For
the worst case of {READ,WRITE}_FIXED, which didn't need it before,
performance is unaffected in the normal peak workload used to test it;
it still runs at 122M IOPS.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
