<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/io_uring/kbuf.c, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-16T07:23:59+00:00</updated>
<entry>
<title>Merge tag 'for-7.2/io_uring-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-06-16T07:23:59+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-06-16T07:23:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b40ba14edcdf70240af8114092a76f75f070774'/>
<id>urn:sha1:9b40ba14edcdf70240af8114092a76f75f070774</id>
<content type='text'>
Pull io_uring updates from Jens Axboe:

 - Rework the task_work infrastructure.

   Both the local (DEFER_TASKRUN) and the normal (tctx) task_work lists
   were llist based, which is LIFO ordered, and hence each run had to do
   an O(n) list reversal pass first to restore queue order.
   Additionally, to cap the amount of task_work run, each method needed
   a retry list as well.

   Add a lockless MPCS FIFO queue (based on Dmitry Vyukov's intrusive
   MPSC algorithm) and switch both task_work lists to it. It performs
   better than llists and we can then also ditch the retry lists as well
   as entries are popped one-at-the-time.

   On top of those changes, run the tctx fallback task_work directly and
   remove the now-unused per-ctx fallback machinery entirely.

 - zcrx user notifications.

   Add a mechanism for zcrx to communicate conditions back to userspace
   via a dedicated CQE, with the initial users being notification on
   running out of buffers and on a frag copy fallback, plus
   shared-memory notification statistics.

   Alongside that, a series of zcrx reliability and cleanup fixes: more
   reliable scrubbing, poisoning pointers on unregistration, dropping an
   extra ifq close, adding a ctx back-pointer, reordering fd allocation
   in the export path, and killing a dead 'sock' member.

 - Allow using io_uring registered buffers for plain SEND and RECV, not
   just for the zero-copy send path.

   This enables targets like ublk's NBD backend to push/pull IO data
   directly to/from a registered buffer over a plain send/recv on a TCP
   socket.

 - Registered buffer improvements: account huge pages correctly, bump
   the io_mapped_ubuf length field to size_t, and raise the previous 1GB
   registered buffer size limit.

 - Restrict the ctx access exposed to io_uring BPF struct_ops programs
   by handing them an opaque type rather than the full io_ring_ctx, and
   add a separate MAINTAINERS entry for the bpf-ops code.

 - Allow opcode filtering on IORING_OP_CONNECT.

 - Validate ring-provided buffer addresses with access_ok(), and align
   the legacy buffer add limit with MAX_BIDS_PER_BGID.

 - Various other cleanups and minor fixes, including avoiding msghdr
   async data on connect/bind, dropping async_size for OP_LISTEN, making
   the POLL_FIRST receive side checks consistent, re-checking
   IO_WQ_BIT_EXIT for each linked work item, and using
   trace_call__##name() at guarded tracepoint call sites.

* tag 'for-7.2/io_uring-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (31 commits)
  io_uring/bpf-ops: add a separate maintainer entry
  io_uring/net: make POLL_FIRST receive side checks consistent
  io_uring: remove the per-ctx fallback task_work machinery
  io_uring: run the tctx task_work fallback directly
  io_uring: switch normal task_work to a mpscq
  io_uring: switch local task_work to a mpscq
  io_uring/mpscq: add lockless multi-producer, single-consumer FIFO queue
  io_uring: grab RCU read lock marking task run
  io_uring/zcrx: kill dead 'sock' member in struct io_zcrx_args
  io_uring/kbuf: validate ring provided buffer addresses with access_ok()
  io_uring/net: support registered buffer for plain send and recv
  io_uring/nop: Drop a wrong comment in struct io_nop
  io_uring/net: Remove async_size for OP_LISTEN
  io_uring/net: Avoid msghdr on op_connect/op_bind async data
  io_uring/bpf-ops: restrict ctx access to BPF
  io_uring/io-wq: re-check IO_WQ_BIT_EXIT for each linked work item
  io_uring/kbuf: align legacy buffer add limit with MAX_BIDS_PER_BGID
  io_uring/zcrx: add shared-memory notification statistics
  io_uring/zcrx: notify user on frag copy fallback
  io_uring/zcrx: notify user when out of buffers
  ...
</content>
</entry>
<entry>
<title>io_uring/kbuf: validate ring provided buffer addresses with access_ok()</title>
<updated>2026-06-08T17:03:16+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-04-09T17:22:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=46800585ae04863c623b1563b03d12e9381089f1'/>
<id>urn:sha1:46800585ae04863c623b1563b03d12e9381089f1</id>
<content type='text'>
Commit:

809b997a5ce9 ("x86-64/arm64/powerpc: clean up and rename __copy_from_user_flushcache")

sanitized that any provided copy helper should separately validate
destination and source addresses, but we should also ensure that
anything that is retrieved from a buffer is validated upfront. For ring
provided buffers, always include an access_ok() when grabbing a new
buffer.

Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: don't truncate end buffer for bundles</title>
<updated>2026-06-07T22:11:47+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-06-07T22:05:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=70f4886bcbb929e88038c8807f1daf7fc587ae7c'/>
<id>urn:sha1:70f4886bcbb929e88038c8807f1daf7fc587ae7c</id>
<content type='text'>
If buffers have been peeked for a bundle receive, the kernel will
truncate the end buffer, if the available length is shorter than the
buffer itself. This is unnecessary, as applications iterating bundle
receives must always use the minimum size of the buffer length and the
remaining number of bytes in the bundle. The examples in liburing do
that as well, eg examples/proxy.c.

If the kernel does truncate this buffer AND the current transfer fails,
then the buffer will be left with a smaller size than what is otherwise
available.

Just remove the buffer truncation, as it's not necessary in the first
place.

Link: https://lore.kernel.org/io-uring/CAAEr8jbY60noGj1fw_k91UJRBkyiRVoS6=nLhZ7Svwidjn4CAA@mail.gmail.com/
Reported-by: Federico Brasili &lt;federico.brasili@gmail.com&gt;
Cc: stable@vger.kernel.org
Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: align legacy buffer add limit with MAX_BIDS_PER_BGID</title>
<updated>2026-05-28T14:02:57+00:00</updated>
<author>
<name>liyouhong</name>
<email>liyouhong@kylinos.cn</email>
</author>
<published>2026-05-28T02:49:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca55f98d6ff1ecb31a07b668a8d105b4e0829c6a'/>
<id>urn:sha1:ca55f98d6ff1ecb31a07b668a8d105b4e0829c6a</id>
<content type='text'>
io_provide_buffers_prep() accepts nbufs up to MAX_BIDS_PER_BGID, but
io_add_buffers() stops when bl-&gt;nbufs reaches USHRT_MAX. This makes the
effective add limit one lower than the validated limit.

Use MAX_BIDS_PER_BGID in the add-side boundary check so validation and
execution use the same limit, and update the comment to refer to the
actual limit constant.

Signed-off-by: liyouhong &lt;liyouhong@kylinos.cn&gt;
Link: https://patch.msgid.link/20260528024936.3672659-1-dayou5941@163.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: parenthesize io_ring_head_to_buf() expansion</title>
<updated>2026-05-14T13:21:32+00:00</updated>
<author>
<name>Yi Xie</name>
<email>xieyi@kylinos.cn</email>
</author>
<published>2026-05-14T08:34:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c84701cfc90a90a6a9dfbdb138706a6d79f5b186'/>
<id>urn:sha1:c84701cfc90a90a6a9dfbdb138706a6d79f5b186</id>
<content type='text'>
Wrap the io_ring_head_to_buf() macro value in an extra pair of parentheses
so it is safe when composed into larger expressions, and to satisfy
scripts/checkpatch.pl.

Signed-off-by: Yi Xie &lt;xieyi@kylinos.cn&gt;
Link: https://patch.msgid.link/20260514083443.203387-1-xieyi@kylinos.cn
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: support min length left for incremental buffers</title>
<updated>2026-04-28T22:08:56+00:00</updated>
<author>
<name>Martin Michaelis</name>
<email>code@mgjm.de</email>
</author>
<published>2026-04-23T21:54:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7deba791ad495ce1d7921683f4f7d1190fa210d1'/>
<id>urn:sha1:7deba791ad495ce1d7921683f4f7d1190fa210d1</id>
<content type='text'>
Incrementally consumed buffer rings are generally fully consumed, but
it's quite possible that the application has a minimum size it needs to
meet to avoid truncation. Currently that minimum limit is 1 byte, but
this should be a setting that is the hands of the application. For
recvmsg multishot, a prime use case for incrementally consumed buffers,
the application may get spurious -EFAULT returned at the end of an
incrementally consumed buffer, as less space is available than the
headers need.

Grab a u32 field in struct io_uring_buf_reg, which the application can
use to inform the kernel of the minimum size that should be available
in an incrementally consumed buffer. If less than that is available,
the current buffer is fully processed and the next one will be picked.

Cc: stable@vger.kernel.org
Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
Link: https://github.com/axboe/liburing/issues/1433
Signed-off-by: Martin Michaelis &lt;code@mgjm.de&gt;
[axboe: write commit message, change io_buffer_list member name]
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member</title>
<updated>2026-04-28T22:08:44+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-04-24T13:30:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=55ea968389172306341a6600a2acb759862d366a'/>
<id>urn:sha1:55ea968389172306341a6600a2acb759862d366a</id>
<content type='text'>
This is only ever assigned, never used. The only used part is the
calculated mask, which is used for indexing. Kill 'nr_entries'.

Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-04-13T23:22:30+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-13T23:22:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=23acda7c221a76ff711d65f4ca90029d43b249a0'/>
<id>urn:sha1:23acda7c221a76ff711d65f4ca90029d43b249a0</id>
<content type='text'>
Pull io_uring updates from Jens Axboe:

 - Add a callback driven main loop for io_uring, and BPF struct_ops
   on top to allow implementing custom event loop logic

 - Decouple IOPOLL from being a ring-wide all-or-nothing setting,
   allowing IOPOLL use cases to also issue certain white listed
   non-polled opcodes

 - Timeout improvements. Migrate internal timeout storage from
   timespec64 to ktime_t for simpler arithmetic and avoid copying of
   timespec data

 - Zero-copy receive (zcrx) updates:

      - Add a device-less mode (ZCRX_REG_NODEV) for testing and
        experimentation where data flows through the copy fallback path

      - Fix two-step unregistration regression, DMA length calculations,
        xarray mark usage, and a potential 32-bit overflow in id
        shifting

      - Refactoring toward multi-area support: dedicated refill queue
        struct, consolidated DMA syncing, netmem array refilling format,
        and guard-based locking

 - Zero-copy transmit (zctx) cleanup:

      - Unify io_send_zc() and io_sendmsg_zc() into a single function

      - Add vectorized registered buffer send for IORING_OP_SEND_ZC

      - Add separate notification user_data via sqe-&gt;addr3 so
        notification and completion CQEs can be distinguished without
        extra reference counting

 - Switch struct io_ring_ctx internal bitfields to explicit flag bits
   with atomic-safe accessors, and annotate the known harmless races on
   those flags

 - Various optimizations caching ctx and other request fields in local
   variables to avoid repeated loads, and cleanups for tctx setup, ring
   fd registration, and read path early returns

* tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (58 commits)
  io_uring: unify getting ctx from passed in file descriptor
  io_uring/register: don't get a reference to the registered ring fd
  io_uring/tctx: clean up __io_uring_add_tctx_node() error handling
  io_uring/tctx: have io_uring_alloc_task_context() return tctx
  io_uring/timeout: use 'ctx' consistently
  io_uring/rw: clean up __io_read() obsolete comment and early returns
  io_uring/zcrx: use correct mmap off constants
  io_uring/zcrx: use dma_len for chunk size calculation
  io_uring/zcrx: don't clear not allocated niovs
  io_uring/zcrx: don't use mark0 for allocating xarray
  io_uring: cast id to u64 before shifting in io_allocate_rbuf_ring()
  io_uring/zcrx: reject REG_NODEV with large rx_buf_size
  io_uring/cancel: validate opcode for IORING_ASYNC_CANCEL_OP
  io_uring/rsrc: use io_cache_free() to free node
  io_uring/zcrx: rename zcrx [un]register functions
  io_uring/zcrx: check ctrl op payload struct sizes
  io_uring/zcrx: cache fallback availability in zcrx ctx
  io_uring/zcrx: warn on a repeated area append
  io_uring/zcrx: consolidate dma syncing
  io_uring/zcrx: netmem array as refiling format
  ...
</content>
</entry>
<entry>
<title>Merge tag 'io_uring-7.0-20260320' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-03-20T16:58:56+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-03-20T16:58:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c612261bedd6bbab7109f798715e449c9d20ff2f'/>
<id>urn:sha1:c612261bedd6bbab7109f798715e449c9d20ff2f</id>
<content type='text'>
Pull io_uring fixes from Jens Axboe:

 - A bit of a work-around for AF_UNIX recv multishot, as the in-kernel
   implementation doesn't properly signal EOF. We'll likely rework this
   one going forward, but the fix is sufficient for now

 - Two fixes for incrementally consumed buffers, for non-pollable files
   and for 0 byte reads

* tag 'io_uring-7.0-20260320' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/kbuf: propagate BUF_MORE through early buffer commit path
  io_uring/kbuf: fix missing BUF_MORE for incremental buffers at EOF
  io_uring/poll: fix multishot recv missing EOF on wakeup race
</content>
</entry>
<entry>
<title>io_uring/kbuf: propagate BUF_MORE through early buffer commit path</title>
<updated>2026-03-19T21:09:48+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-03-19T20:29:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=418eab7a6f3c002d8e64d6e95ec27118017019af'/>
<id>urn:sha1:418eab7a6f3c002d8e64d6e95ec27118017019af</id>
<content type='text'>
When io_should_commit() returns true (eg for non-pollable files), buffer
commit happens at buffer selection time and sel-&gt;buf_list is set to
NULL. When __io_put_kbufs() generates CQE flags at completion time, it
calls __io_put_kbuf_ring() which finds a NULL buffer_list and hence
cannot determine whether the buffer was consumed or not. This means that
IORING_CQE_F_BUF_MORE is never set for non-pollable input with
incrementally consumed buffers.

Likewise for io_buffers_select(), which always commits upfront and
discards the return value of io_kbuf_commit().

Add REQ_F_BUF_MORE to store the result of io_kbuf_commit() during early
commit. Then __io_put_kbuf_ring() can check this flag and set
IORING_F_BUF_MORE accordingy.

Reported-by: Martin Michaelis &lt;code@mgjm.de&gt;
Cc: stable@vger.kernel.org
Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
Link: https://github.com/axboe/liburing/issues/1553
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
