<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/uapi/linux/io_uring.h, branch linux-6.9.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.9.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.9.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-02-09T18:54:32+00:00</updated>
<entry>
<title>io_uring: add register/unregister napi function</title>
<updated>2024-02-09T18:54:32+00:00</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@devkernel.io</email>
</author>
<published>2023-06-08T16:38:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ef1186c1a875bfa8a8cbfc2a9670b14b082187a9'/>
<id>urn:sha1:ef1186c1a875bfa8a8cbfc2a9670b14b082187a9</id>
<content type='text'>
This adds an API to register and unregister napi for io_uring. If
the arg value is specified when unregistering, the current napi setting
for the busy poll timeout is copied into the user structure. If this is
not required, NULL can be passed as the arg value.

Signed-off-by: Stefan Roesch &lt;shr@devkernel.io&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20230608163839.2891748-7-shr@devkernel.io
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add support for ftruncate</title>
<updated>2024-02-09T16:04:39+00:00</updated>
<author>
<name>Tony Solomonik</name>
<email>tony.solomonik@gmail.com</email>
</author>
<published>2024-02-02T12:17:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b4bb1900c12e6a0fe11ff51e1aa6eea19a4ad635'/>
<id>urn:sha1:b4bb1900c12e6a0fe11ff51e1aa6eea19a4ad635</id>
<content type='text'>
Adds support for doing truncate through io_uring, eliminating
the need for applications to roll their own thread pool or offload
mechanism to be able to do non-blocking truncates.

Signed-off-by: Tony Solomonik &lt;tony.solomonik@gmail.com&gt;
Link: https://lore.kernel.org/r/20240202121724.17461-3-tony.solomonik@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: add method for returning provided buffer ring head</title>
<updated>2023-12-21T16:47:06+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-12-21T16:02:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d293b1a89694fc4918d9a4330a71ba2458f9d581'/>
<id>urn:sha1:d293b1a89694fc4918d9a4330a71ba2458f9d581</id>
<content type='text'>
The tail of the provided ring buffer is shared between the kernel and
the application, but the head is private to the kernel as the
application doesn't need to see it. However, this also prevents the
application from knowing how many buffers the kernel has consumed.
Usually this is fine, as the information is inherently racy in that
the kernel could be consuming buffers continually, but for cleanup
purposes it may be relevant to know how many buffers are still left
in the ring.

Add IORING_REGISTER_PBUF_STATUS which will return status for a given
provided buffer ring. Right now it just returns the head, but space
is reserved for more information later on, if needed.
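
As a rough illustration of the cleanup use case, the sketch below computes
how many buffers are still left in a ring from the head reported by
IORING_REGISTER_PBUF_STATUS and the tail the application maintains. It
assumes both are 16-bit counters that wrap around. The function name is
hypothetical, and the code only models the arithmetic; it does not talk to
the kernel.

```python
RING_INDEX_MODULUS = 2 ** 16  # assumption: head and tail are 16-bit wrapping counters


def buffers_left(head, tail):
    """Return how many provided buffers the kernel has not consumed yet.

    head: value reported by the kernel (consumer position)
    tail: value the application last published (producer position)
    """
    # Modular subtraction stays correct after the counters wrap past 65535.
    return (tail - head) % RING_INDEX_MODULUS
```

As the text above notes, the result is inherently racy while the kernel is
still consuming buffers, so it is mainly meaningful once the ring is idle.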

Link: https://github.com/axboe/liburing/discussions/1020
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL</title>
<updated>2023-12-12T14:42:57+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-12-08T03:06:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc18b89ab113e9c6c7a529316ddf7029fb55132d'/>
<id>urn:sha1:dc18b89ab113e9c6c7a529316ddf7029fb55132d</id>
<content type='text'>
io_uring can currently open/close regular files or fixed/direct
descriptors. Or you can instantiate a fixed descriptor from a regular
one, and then close the regular descriptor. But you currently can't turn
a purely fixed/direct descriptor into a regular file descriptor.

IORING_OP_FIXED_FD_INSTALL adds support for installing a direct
descriptor into the normal file table, just like receiving a file
descriptor or opening a new file would do. This is all nicely abstracted
into receive_fd(), and hence adding support for this is truly trivial.

Since direct descriptors are only usable within io_uring itself, it can
be useful to turn them into real file descriptors if they ever need to
be accessed via normal syscalls. This can either be a transitory thing,
or just a permanent transition for a given direct descriptor.

By default, new fds are installed with O_CLOEXEC set. The application
can disable O_CLOEXEC by setting IORING_FIXED_FD_NO_CLOEXEC in the
sqe-&gt;install_fd_flags member.

Suggested-by: Christian Brauner &lt;brauner@kernel.org&gt;
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux</title>
<updated>2023-11-01T21:25:08+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-01T21:25:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4de520f1fcefd4ebb7dddcf28bde1b330c2f6b5d'/>
<id>urn:sha1:4de520f1fcefd4ebb7dddcf28bde1b330c2f6b5d</id>
<content type='text'>
Pull io_uring futex support from Jens Axboe:
 "This adds support for using futexes through io_uring - first futex
  wake and wait, and then the vectored variant of waiting, futex waitv.

  For all of wait/wake/waitv, we support the bitset variant, as the
  'normal' variants can be easily implemented on top of that.

  PI and requeue are not supported through io_uring, just the above
  mentioned parts. This may change in the future, but in the spirit of
  keeping this small (and based on what people have been asking for),
  this is what we currently have.

  Wake support is pretty straightforward; most of the thought has gone
  into the wait side to avoid needing to offload wait operations to a
  blocking context. Instead, we rely on the usual callbacks to retry and
  post a completion event, when appropriate.

  As far as I can recall, the first request for futex support with
  io_uring came from Andres Freund, working on postgres. His aio rework
  of postgres was one of the early adopters of io_uring, and futex
  support was a natural extension for that. This is relevant from both a
  usability point of view, as well as for efficiency and performance. In
  Andres's words, for the former:

     Futex wait support in io_uring makes it a lot easier to avoid
     deadlocks in concurrent programs that have their own buffer pool:
     Obviously pages in the application buffer pool have to be locked
     during IO. If the initiator of IO A needs to wait for a held lock
     B, the holder of lock B might wait for the IO A to complete. The
     ability to wait for a lock and IO completions at the same time
     provides an efficient way to avoid such deadlocks

  and in terms of efficiency, even without unlocking the full potential
  yet, Andres says:

     Futex wake support in io_uring is useful because it allows for more
     efficient directed wakeups. For some "locks" postgres has queues
     implemented in userspace, with wakeup logic that cannot easily be
     implemented with FUTEX_WAKE_BITSET on a single "futex word"
     (imagine waiting for journal flushes to have completed up to a
     certain point).

     Thus a "lock release" sometimes needs to wake up many processes in a
     row. A quick-and-dirty conversion to doing these wakeups via
     io_uring led to a 3% throughput increase, with 12% fewer context
     switches, albeit in a fairly extreme workload"

* tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux:
  io_uring: add support for vectored futex waits
  futex: make the vectored futex operations available
  futex: make futex_parse_waitv() available as a helper
  futex: add wake_data to struct futex_q
  io_uring: add support for futex wake and wait
  futex: abstract out a __futex_wake_mark() helper
  futex: factor out the futex wake handling
  futex: move FUTEX2_VALID_MASK to futex.h
</content>
</entry>
<entry>
<title>io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT</title>
<updated>2023-10-19T22:42:03+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2023-10-16T13:47:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4232c6e349f3a591fd0f432e6b858d32095adce6'/>
<id>urn:sha1:4232c6e349f3a591fd0f432e6b858d32095adce6</id>
<content type='text'>
Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
similar to setsockopt. This implementation leverages the function
do_sock_setsockopt(), which is shared with the setsockopt() system call
path.

Note that userspace needs to keep the memory the pointer refers to alive
until the operation is completed, i.e. the memory must not be
deallocated before the CQE is returned to userspace.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20231016134750.1381153-11-leitao@debian.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT</title>
<updated>2023-10-19T22:42:03+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2023-10-16T13:47:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a5d2f99aff6b6f9cd6a1ab6907d8be8066114791'/>
<id>urn:sha1:a5d2f99aff6b6f9cd6a1ab6907d8be8066114791</id>
<content type='text'>
Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
where a sockptr_t is either userspace or kernel space, and handled as
such.

Unlike getsockopt(2), the optlen field is not a userspace pointer. In
getsockopt(2), userspace provides an optlen pointer, which is
overwritten by the kernel. In this implementation, userspace passes a
u32, and the new value is returned in cqe-&gt;res.
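
A minimal sketch of how an application might interpret that completion
value. It assumes the usual io_uring convention that a negative result is
a negated errno; the function name is hypothetical and nothing here calls
into the kernel.

```python
import os


def decode_getsockopt_result(res):
    """Split a completion result into (optlen, error_message).

    Assumption: a negative result is a negated errno, as is conventional
    for io_uring completions; otherwise it is the returned option length.
    """
    if res >= 0:
        # Non-negative: the kernel reports the new optlen here.
        return res, None
    return None, os.strerror(-res)
```

For example, decode_getsockopt_result(4) yields (4, None), meaning four
bytes of option data were written to the buffer.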

Note that userspace needs to keep the pointed-to memory alive until
the CQE is posted.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20231016134750.1381153-10-leitao@debian.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add support for vectored futex waits</title>
<updated>2023-09-29T08:37:08+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-13T01:04:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8f350194d5cfd7016d4cd44e433df0faa4d4a703'/>
<id>urn:sha1:8f350194d5cfd7016d4cd44e433df0faa4d4a703</id>
<content type='text'>
This adds support for IORING_OP_FUTEX_WAITV, which allows registering a
notification for a number of futexes at once. If one of the futexes is
woken, then the request will complete with the index of the futex that got
woken as the result. This is identical to what the normal vectored futex
waitv operation does.

Use like IORING_OP_FUTEX_WAIT, except sqe-&gt;addr must now contain a
pointer to a struct futex_waitv array, and sqe-&gt;off must now contain the
number of elements in that array. As flags are passed in the futex_vector
array, and likewise for the value and futex address(es), sqe-&gt;addr2
and sqe-&gt;addr3 are also reserved for IORING_OP_FUTEX_WAITV.

For cancelations, FUTEX_WAITV does not rely on the futex_unqueue()
return value as we're dealing with multiple futexes. Instead, a separate
per io_uring request atomic is used to claim ownership of the request.

Waiting on N futexes could be done with IORING_OP_FUTEX_WAIT as well,
but that punts a lot of the work to the application:

1) Application would need to submit N IORING_OP_FUTEX_WAIT requests,
   rather than just a single IORING_OP_FUTEX_WAITV.

2) When one futex is woken, application would need to cancel the
   remaining N-1 requests that didn't trigger.

While this is of course doable, having a single vectored futex wait
makes for much simpler application code.

Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add support for futex wake and wait</title>
<updated>2023-09-29T08:36:57+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-08T17:57:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=194bb58c6090e39bd7d9b9c888a079213628e1f6'/>
<id>urn:sha1:194bb58c6090e39bd7d9b9c888a079213628e1f6</id>
<content type='text'>
Add support for FUTEX_WAKE/WAIT primitives.

IORING_OP_FUTEX_WAKE is a mix of FUTEX_WAKE and FUTEX_WAKE_BITSET, as
it does support passing in a bitset.

Similarly, IORING_OP_FUTEX_WAIT is a mix of FUTEX_WAIT and
FUTEX_WAIT_BITSET.

Both of them use the futex2 interface.

FUTEX_WAKE is straightforward, as it can always be done directly from
the io_uring submission without needing async handling. For FUTEX_WAIT,
things are a bit more complicated. If the futex isn't ready, then we
rely on a callback via futex_queue-&gt;wake() when someone wakes up the
futex. From that callback, we queue up task_work with the original task,
which will post a CQE and wake it, if necessary.

Cancelations are supported, both from the application point-of-view,
but also to be able to cancel pending waits if the ring exits before
all events have occurred. The return value of futex_unqueue() is used
to gate who wins the potential race between cancelation and futex
wakeups. Whoever gets a 'ret == 1' return from that claims ownership
of the io_uring futex request.

This is just the barebones wait/wake support. PI or REQUEUE support is
not added at this point; it is unclear whether we will look into that later.

Likewise, explicit timeouts are not supported either. Users that need
timeouts are expected to use the usual io_uring mechanism for that,
linked timeouts.

The SQE format is as follows:

`addr`		Address of futex
`fd`		futex2(2) FUTEX2_* flags
`futex_flags`	io_uring specific command flags. None valid now.
`addr2`		Value of futex
`addr3`		Mask to wake/wait
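
The field layout above can be sketched as a plain mapping. This is only an
illustration of which value goes where, not a working submission: the
function name is hypothetical, and the constant values are believed to
match linux/futex.h but should be checked against the installed headers.

```python
FUTEX2_SIZE_U32 = 0x02               # assumed value: 32-bit futex word
FUTEX2_PRIVATE = 128                 # assumed value: process-private futex
FUTEX_BITSET_MATCH_ANY = 0xFFFFFFFF  # mask that matches every waker


def futex_wait_fields(futex_addr, expected_value, mask=FUTEX_BITSET_MATCH_ANY):
    """Return the SQE field values for a futex wait as a plain dict."""
    return {
        "addr": futex_addr,                      # address of the futex word
        "fd": FUTEX2_SIZE_U32 | FUTEX2_PRIVATE,  # futex2 flags travel in fd
        "futex_flags": 0,                        # no io_uring specific flags valid
        "addr2": expected_value,                 # value the futex should hold
        "addr3": mask,                           # bitset to wait on
    }
```

Passing the match-any mask gives the behaviour of a plain FUTEX_WAIT, which
is how the non-bitset variant can be layered on top of the bitset one.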

Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: retain top 8bits of uring_cmd flags for kernel internal use</title>
<updated>2023-09-28T13:31:41+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2023-09-28T12:43:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=528ce6781726e022bc5dc84034360e6e8f1b89bd'/>
<id>urn:sha1:528ce6781726e022bc5dc84034360e6e8f1b89bd</id>
<content type='text'>
Retain top 8bits of uring_cmd flags for kernel internal use, so that we
can move IORING_URING_CMD_POLLED out of uapi header.

Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
