<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/io_uring/futex.c, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-03T09:38:51+00:00</updated>
<entry>
<title>futex: Add support for unlocking robust futexes</title>
<updated>2026-06-03T09:38:51+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-06-02T09:09:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ca9595d9fb6cce6633a5b03d98c2aecb5499838'/>
<id>urn:sha1:3ca9595d9fb6cce6633a5b03d98c2aecb5499838</id>
<content type='text'>
Unlocking robust non-PI futexes happens in user space with the following
sequence:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);

  	lval = 0;
  3)	lval = atomic_xchg(lock, lval);
  4)	if (lval &amp; WAITERS)
  5)		sys_futex(WAKE,....);
  6)	robust_list_clear_op_pending();

That opens a window between #3 and #6 where the mutex could be acquired by
some other task which observes that it is the last user and:

  A) unmaps the mutex memory
  B) maps a different file, which ends up covering the same address

When the original task exits before reaching #6 then the kernel robust list
handling observes the pending op entry and tries to fix up user space.

In case that the newly mapped data contains the TID of the exiting thread
at the address of the mutex/futex the kernel will set the owner died bit in
that memory and therefore corrupting unrelated data.

PI futexes have a similar problem both for the non-contented user space
unlock and the in kernel unlock:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);

  	lval = gettid();
  3)	if (!atomic_try_cmpxchg(lock, lval, 0))
  4)		sys_futex(UNLOCK_PI,....);
  5)	robust_list_clear_op_pending();

Address the first part of the problem where the futexes have waiters and
need to enter the kernel anyway. Add a new FUTEX_ROBUST_UNLOCK flag, which
is valid for the sys_futex() FUTEX_UNLOCK_PI, FUTEX_WAKE, FUTEX_WAKE_BITSET
operations.

This deliberately omits FUTEX_WAKE_OP from this treatment as it's unclear
whether this is needed and there is no usage of it in glibc either to
investigate.

For the futex2 syscall family this needs to be implemented with a new
syscall.

The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
robust_list_head::list_pending_op into the kernel. This argument is only
evaluated when the FUTEX_ROBUST_UNLOCK bit is set and is therefore backward
compatible.

This is an explicit argument to avoid the lookup of the robust list pointer
and retrieving the pending op pointer from there. User space has the
pointer already available so it can just put it into the @uaddr2
argument. Aside of that this allows the usage of multiple robust lists in
the future without any changes to the internal functions as they just operate
on the provided pointer.

This requires a second flag FUTEX_ROBUST_LIST32 which indicates that the
robust list pointer points to an u32 and not to an u64. This is required
for two reasons:

    1) sys_futex() has no compat variant

    2) The gaming emulators use both both 64-bit and compat 32-bit robust
       lists in the same 64-bit application

As a consequence 32-bit applications have to set this flag unconditionally
so they can run on a 64-bit kernel in compat mode unmodified. 32-bit
kernels return an error code when the flag is not set. 64-bit kernels will
happily clear the full 64 bits if user space fails to set it.

In case of FUTEX_UNLOCK_PI this clears the robust list pending op when the
unlock succeeded. In case of errors, the user space value is still locked
by the caller and therefore the above cannot happen.

In case of FUTEX_WAKE* this does the unlock of the futex in the kernel and
clears the robust list pending op when the unlock was successful. If not,
the user space value is still locked and user space has to deal with the
returned error. That means that the unlocking of non-PI robust futexes has
to use the same try_cmpxchg() unlock scheme as PI futexes.

If the clearing of the pending list op fails (fault) then the kernel clears
the registered robust list pointer if it matches to prevent that exit()
will try to handle invalid data. That's a valid paranoid decision because
the robust list head sits usually in the TLS and if the TLS is not longer
accessible then the chance for fixing up the resulting mess is very close
to zero.

The problem of non-contended unlocks still exists and will be addressed
separately.

Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Link: https://patch.msgid.link/20260602090535.670514505@kernel.org
</content>
</entry>
<entry>
<title>io_uring/futex: ensure partial wakes are appropriately dequeued</title>
<updated>2026-04-21T18:19:06+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-04-20T20:24:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7faaa6812aba550c24bffdfd9399568223c8a477'/>
<id>urn:sha1:7faaa6812aba550c24bffdfd9399568223c8a477</id>
<content type='text'>
If a FUTEX_WAITV vectored operation is only partially woken, we
should call __futex_wake_mark() on the queue to account for that.
If not, then a later wakeup will wake the same entry, rather than
the next one in line.

Fixes: 8f350194d5cfd ("io_uring: add support for vectored futex waits")
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>urn:sha1:69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>io_uring/futex: use GFP_KERNEL_ACCOUNT for futex data allocation</title>
<updated>2026-01-25T17:07:09+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-01-25T17:07:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6e0d71c288fdcf5866f5d0c6cde850a091cc3c55'/>
<id>urn:sha1:6e0d71c288fdcf5866f5d0c6cde850a091cc3c55</id>
<content type='text'>
Be a bit nicer and ensure it plays nice with memcg accounting.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/futex: move futexv owned status to struct io_futexv_data</title>
<updated>2025-11-05T19:55:07+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-06-04T15:02:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=92469795363454aee7b79bd26650a44c3669b9c7'/>
<id>urn:sha1:92469795363454aee7b79bd26650a44c3669b9c7</id>
<content type='text'>
Free up a bit of space in the shared futex opcode private data, by
moving the futexv specific futexv_owned out of there and into the struct
specific to vectored futexes.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/futex: move futexv async data handling to struct io_futexv_data</title>
<updated>2025-11-05T19:54:31+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-06-04T14:58:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=88559f8b2a25a0293d7679907ffe3a58151662ef'/>
<id>urn:sha1:88559f8b2a25a0293d7679907ffe3a58151662ef</id>
<content type='text'>
Rather than alloc an array of struct futex_vector for the futexv wait
handling, wrap it in a struct io_futexv_data struct, similar to what
the non-vectored futex wait handling does.

No functional changes in this patch.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add wrapper type for io_req_tw_func_t arg</title>
<updated>2025-11-03T15:31:26+00:00</updated>
<author>
<name>Caleb Sander Mateos</name>
<email>csander@purestorage.com</email>
</author>
<published>2025-10-31T20:34:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c33e779aba6804778c1440192a8033a145ba588d'/>
<id>urn:sha1:c33e779aba6804778c1440192a8033a145ba588d</id>
<content type='text'>
In preparation for uring_cmd implementations to implement functions
with the io_req_tw_func_t signature, introduce a wrapper struct
io_tw_req to hide the struct io_kiocb * argument. The intention is for
only the io_uring core to access the inner struct io_kiocb *. uring_cmd
implementations should instead call a helper from io_uring/cmd.h to
convert struct io_tw_req to struct io_uring_cmd *.

Signed-off-by: Caleb Sander Mateos &lt;csander@purestorage.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: add async data clear/free helpers</title>
<updated>2025-08-27T17:24:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-08-22T14:19:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4c0b26e23c79ecf934a92b2d9a516bffbb61c3e4'/>
<id>urn:sha1:4c0b26e23c79ecf934a92b2d9a516bffbb61c3e4</id>
<content type='text'>
Futex recently had an issue where it mishandled how -&gt;async_data and
REQ_F_ASYNC_DATA is handled. To avoid future issues like that, add a set
of helpers that either clear or clear-and-free the async data assigned
to a struct io_kiocb.

Convert existing manual handling of that to use the helpers. No intended
functional changes in this patch.

Reviewed-by: Caleb Sander Mateos &lt;csander@purestorage.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/futex: ensure io_futex_wait() cleans up properly on failure</title>
<updated>2025-08-21T19:53:33+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-08-21T19:23:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=508c1314b342b78591f51c4b5dadee31a88335df'/>
<id>urn:sha1:508c1314b342b78591f51c4b5dadee31a88335df</id>
<content type='text'>
The io_futex_data is allocated upfront and assigned to the io_kiocb
async_data field, but the request isn't marked with REQ_F_ASYNC_DATA
at that point. Those two should always go together, as the flag tells
io_uring whether the field is valid or not.

Additionally, on failure cleanup, the futex handler frees the data but
does not clear -&gt;async_data. Clear the data and the flag in the error
path as well.

Thanks to Trend Micro Zero Day Initiative and particularly ReDress for
reporting this.

Cc: stable@vger.kernel.org
Fixes: 194bb58c6090 ("io_uring: add support for futex wake and wait")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/futex: mark wait requests as inflight</title>
<updated>2025-06-04T16:50:14+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-06-04T16:25:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=079afb081c4288e94d5e4223d3eb6306d853c68b'/>
<id>urn:sha1:079afb081c4288e94d5e4223d3eb6306d853c68b</id>
<content type='text'>
Inflight marking is used so that do_exit() -&gt; io_uring_files_cancel()
will find requests with files that reference an io_uring instance,
so they can get appropriately canceled before the files go away.
However, it's also called before the mm goes away.

Mark futex/futexv wait requests as being inflight, so that
io_uring_files_cancel() will prune them. This ensures that the mm stays
alive, which is important as an exiting mm will also free the futex
private hash buckets. An io_uring futex request with FUTEX2_PRIVATE
set relies on those being alive until the request has completed. A
recent commit added these futex private hashes, which get killed when
the mm goes away.

Fixes: 80367ad01d93 ("futex: Add basic infrastructure for local task local hash")
Link: https://lore.kernel.org/io-uring/38053.1749045482@localhost/
Reported-by: Robert Morris &lt;rtm@csail.mit.edu&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
