<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/unix, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-04T22:29:04+00:00</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2026-06-04T22:29:04+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2026-06-04T22:26:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8d72997dab65b1e9e3220302e26eaecd9b99c02f'/>
<id>urn:sha1:8d72997dab65b1e9e3220302e26eaecd9b99c02f</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-7.1-rc7).

Silent conflicts:

net/wireless/nl80211.c
  cb9959ab5f99 ("wifi: cfg80211: enforce HE/EHT cap/oper consistency")
  a384ae969902 ("wifi: cfg80211: move AP HT/VHT/... operation to beacon info")
https://lore.kernel.org/aiGJDaHV4UlCexIQ@sirena.org.uk

Conflicts:

drivers/net/wireless/intel/iwlwifi/mld/ap.c
  a342c99cb70d ("wifi: iwlwifi: mld: honor BSS_CHANGED_BEACON_ENABLED")
  9bf1b409afc7 ("wifi: iwlwifi: mld: send tx power constraints before link activation")
https://lore.kernel.org/ah2bfedhV45ZxMO8@sirena.org.uk

drivers/net/wireless/intel/iwlwifi/pcie/drv.c
  093305d801fa ("wifi: iwlwifi: pcie: simplify the resume flow if fast resume is not used")
  e2323929a68a ("wifi: iwlwifi: pcie: add debug print for resume flow if powered off")
https://lore.kernel.org/ah2bfedhV45ZxMO8@sirena.org.uk

Adjacent changes:

drivers/net/ethernet/airoha/airoha_eth.c
  b38cae85d1c4 ("net: airoha: Fix use-after-free in metadata dst teardown")
  ec6c391bcca7 ("net: airoha: Introduce airoha_gdm_dev struct")

drivers/net/ethernet/microchip/lan743x_main.c
  8173d22b211f ("net: lan743x: permit VLAN-tagged packets up to configured MTU")
  e3c6508a46f5 ("net: lan743x: avoid netdev-based logging before netdev registration")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>af_unix: Fix inq_len update problem in partial read</title>
<updated>2026-06-04T01:52:25+00:00</updated>
<author>
<name>Jianyu Li</name>
<email>jianyu.li@mediatek.com</email>
</author>
<published>2026-06-01T11:36:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c1f07a7f2d47aeb9878301e7bb36bc1c2bc2be8e'/>
<id>urn:sha1:c1f07a7f2d47aeb9878301e7bb36bc1c2bc2be8e</id>
<content type='text'>
Currently inq_len is updated only when the whole skb is consumed.
If only part of the data is read, a following SIOCINQ query would
return a value greater than what is actually left.

Update inq_len promptly in unix_stream_read_generic(), and adjust
unix_stream_read_skb() accordingly to avoid updating it twice.
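
<![CDATA[
The accounting difference can be sketched with a small userspace model.
This is an illustration only, not kernel code: struct model_sock,
model_enqueue() and model_read() are hypothetical names, and the
per_chunk flag selects between the old behaviour (subtract only once a
whole skb is consumed) and the behaviour described above (subtract as
bytes are copied).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 4

/* Hypothetical model types; these are not kernel structures. */
struct model_skb {
  size_t len;       /* total payload bytes */
  size_t consumed;  /* bytes already copied to the reader */
};

struct model_sock {
  struct model_skb queue[MODEL_QUEUE_MAX];
  size_t head;      /* index of the first unread skb */
  size_t count;     /* number of skbs queued so far */
  size_t inq_len;   /* cached value a SIOCINQ-like query would return */
};

/* Queue one skb of len bytes and account for it in the cached length. */
void model_enqueue(struct model_sock *sk, size_t len)
{
  assert(sk->count < MODEL_QUEUE_MAX);
  sk->queue[sk->count].len = len;
  sk->queue[sk->count].consumed = 0;
  sk->count++;
  sk->inq_len += len;
}

/*
 * Copy up to want bytes out of the queue and return the number copied.
 * per_chunk == true:  inq_len shrinks with every chunk that is copied.
 * per_chunk == false: inq_len shrinks only when an skb is fully consumed.
 */
size_t model_read(struct model_sock *sk, size_t want, bool per_chunk)
{
  size_t copied = 0;

  while (copied < want && sk->head < sk->count) {
    struct model_skb *skb = &sk->queue[sk->head];
    size_t left = skb->len - skb->consumed;
    size_t chunk = want - copied < left ? want - copied : left;

    skb->consumed += chunk;
    copied += chunk;
    if (per_chunk)
      sk->inq_len -= chunk;

    if (skb->consumed == skb->len) {
      if (!per_chunk)
        sk->inq_len -= skb->len;
      sk->head++;
    }
  }
  return copied;
}
```

With one 10-byte skb queued and 4 bytes read, the model leaves inq_len
at 10 when per_chunk is false and at 6 when it is true. Both variants
agree again once the skb is fully consumed.
]]>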

Fixes: f4e1fb04c123 ("af_unix: Use cached value for SOCK_STREAM in unix_inq_len().")
Signed-off-by: Jianyu Li &lt;jianyu.li@mediatek.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260601113640.231897-2-jianyu.li@mediatek.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>af_unix: Remove sock-&gt;state assignment.</title>
<updated>2026-06-02T18:44:10+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2026-05-29T19:18:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ed9ea881746b4b80a5e3279159c07855275990e0'/>
<id>urn:sha1:ed9ea881746b4b80a5e3279159c07855275990e0</id>
<content type='text'>
Both struct socket and struct sock have a variable to
manage its state, sock-&gt;state and sk-&gt;sk_state.

When both are used, the former typically manages syscall
state and the latter manages the actual connection state.

AF_UNIX only uses sk-&gt;sk_state.

Let's remove unnecessary assignments for sock-&gt;state.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260529191829.3864438-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>af_unix: Fix UAF read of tail-&gt;len in unix_stream_data_wait()</title>
<updated>2026-05-20T01:53:56+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2026-05-18T16:51:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=be309f8eae8b474a4a617eaae01324da996fc719'/>
<id>urn:sha1:be309f8eae8b474a4a617eaae01324da996fc719</id>
<content type='text'>
unix_stream_data_wait() does skb_peek_tail(&amp;sk-&gt;sk_receive_queue) without
holding any lock that prevents SKBs on that queue from being dequeued and
freed.
This has been the case since commit 79f632c71bea ("unix/stream: fix
peeking with an offset larger than data in queue").
The first consequence of this is that the pointer comparison
`tail != last` can be false even if `last` semantically refers to an
already-freed SKB while `tail` is a new SKB allocated at the same address.
This can cause unix_stream_data_wait() to wrongly keep blocking after new
data has arrived, but only in a weird scenario where a peeking recv() and
a normal recv() on the same socket are racing, which is probably not a
real problem.

But since commit 2b514574f7e8 ("net: af_unix: implement splice for stream
af_unix sockets"), `tail` is actually dereferenced, which can cause UAF in
the following race scenario (where test_setup() runs single-threaded,
and afterwards, test_thread1() and test_thread2() run concurrently in
two threads):
```
static int socks[2];
void test_setup(void) {
  socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
  send(socks[1], "A", 1, 0);
  int peekoff = 1;
  setsockopt(socks[0], SOL_SOCKET, SO_PEEK_OFF, &amp;peekoff, sizeof(peekoff));
}
void test_thread1(void) {
  char dummy;
  recv(socks[0], &amp;dummy, 1, MSG_PEEK);
}
void test_thread2(void) {
  char dummy;
  recv(socks[0], &amp;dummy, 1, 0);
  shutdown(socks[1], SHUT_WR);
}
```

when racing like this:
```
thread1                       thread2
unix_stream_read_generic
  mutex_lock(&amp;u-&gt;iolock)
  skb_peek(&amp;sk-&gt;sk_receive_queue)
  skb_peek_next(skb, &amp;sk-&gt;sk_receive_queue)
  mutex_unlock(&amp;u-&gt;iolock)
                              unix_stream_read_generic
                                unix_state_lock(sk)
                                skb_peek(&amp;sk-&gt;sk_receive_queue)
                                unix_state_unlock(sk)
  unix_stream_data_wait
    unix_state_lock(sk)
    tail = skb_peek_tail(&amp;sk-&gt;sk_receive_queue)
                                spin_lock(&amp;sk-&gt;sk_receive_queue.lock)
                                __skb_unlink(skb, &amp;sk-&gt;sk_receive_queue)
                                spin_unlock(&amp;sk-&gt;sk_receive_queue.lock)
                                consume_skb(skb) [frees the SKB]
    `tail != last`: false
    `tail`: true
    `tail-&gt;len != last_len` ***UAF***
```

Fix the UAF by removing the read of tail-&gt;len; checking tail-&gt;len would
only make sense if SKBs in the receive queue of a UNIX socket could grow,
which can no longer happen.

Kuniyuki explained:

&gt; When commit 869e7c62486e ("net: af_unix: implement stream sendpage
&gt; support") added sendpage() support, data could be appended to the last
&gt; skb in the receiver's queue.
&gt;
&gt; That's why we needed to check if the length of the last skb was changed
&gt; while waiting for new data in unix_stream_data_wait().
&gt;
&gt; However, commit a0dbf5f818f9 ("af_unix: Support MSG_SPLICE_PAGES") and
&gt; commit 57d44a354a43 ("unix: Convert unix_stream_sendpage() to use
&gt; MSG_SPLICE_PAGES") refactored sendmsg(), and now data is always added
&gt; to a new skb.

That means this fix is not suitable for kernels before 6.5.

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Cc: stable@vger.kernel.org # 6.5.x
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260518-b4-unix-recv-wait-hotfix-v2-1-83e29ce8ad31@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>af_unix: Reject SIOCATMARK on non-stream sockets</title>
<updated>2026-05-07T15:36:02+00:00</updated>
<author>
<name>Jiexun Wang</name>
<email>wangjiexun2025@gmail.com</email>
</author>
<published>2026-05-06T14:08:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d119775f2bad827edc28071c061fdd4a91f889a5'/>
<id>urn:sha1:d119775f2bad827edc28071c061fdd4a91f889a5</id>
<content type='text'>
SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.

In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.

Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Suggested-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Signed-off-by: Jiexun Wang &lt;wangjiexun2025@gmail.com&gt;
Signed-off-by: Ren Wei &lt;n05ec@lzu.edu.cn&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>af_unix: Set gc_in_progress to true in unix_gc().</title>
<updated>2026-05-05T01:34:45+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2026-05-01T07:39:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d82ba05263c69fa2437fe93e4e561cc40f4c03af'/>
<id>urn:sha1:d82ba05263c69fa2437fe93e4e561cc40f4c03af</id>
<content type='text'>
Igor Ushakov reported that unix_gc() could run with gc_in_progress
being false if the work is scheduled while running:

  Thread 1         Thread 2                     Thread 3
  --------         --------                     --------
                   unix_schedule_gc()           unix_schedule_gc()
                   `- if (!gc_in_progress)      `- if (!gc_in_progress)
                      |- gc_in_progress = true     |
                      `- queue_work()              |
  unix_gc() &lt;----------------/                     |
  |                                                |- gc_in_progress = true
  ...                                              `- queue_work()
  |                                                       |
  `- gc_in_progress = false                               |
                                                          |
  unix_gc() &lt;---------------------------------------------'
  |
  ... /* gc_in_progress == false */
  |
  `- gc_in_progress = false

unix_peek_fpl() relies on gc_in_progress not to confuse GC
by MSG_PEEK.

Let's set gc_in_progress to true in unix_gc().
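
<![CDATA[
The interleaving in the diagram can be replayed deterministically in a
userspace model. This is a sketch only: struct gc_model and the
model_*() functions are hypothetical names, the workqueue is reduced to
a counter, and set_on_entry stands in for setting the flag at the start
of the GC run.

```c
#include <stdbool.h>

/* Hypothetical model state; not the kernel's data structures. */
struct gc_model {
  bool gc_in_progress;
  int queued;  /* pending work items */
};

/* Second half of the scheduling path: the flag was already seen as false. */
void model_schedule_commit(struct gc_model *m)
{
  m->gc_in_progress = true;
  m->queued++;
}

/*
 * Start one GC run. With set_on_entry the run sets the flag itself.
 * Returns the flag value the run observes while it works.
 */
bool model_gc_begin(struct gc_model *m, bool set_on_entry)
{
  m->queued--;
  if (set_on_entry)
    m->gc_in_progress = true;
  return m->gc_in_progress;
}

/* Finish one GC run: the flag is cleared unconditionally. */
void model_gc_end(struct gc_model *m)
{
  m->gc_in_progress = false;
}

/* Replay the diagram and report what the second GC run observed. */
bool model_replay_second_run(bool set_on_entry)
{
  struct gc_model m = { .gc_in_progress = false, .queued = 0 };

  /* Threads 2 and 3 have both passed the !gc_in_progress check here. */
  model_schedule_commit(&m);         /* thread 2 sets the flag, queues work */
  model_gc_begin(&m, set_on_entry);  /* first run starts */
  model_schedule_commit(&m);         /* thread 3 sets the flag, queues again */
  model_gc_end(&m);                  /* first run clears the flag */

  return model_gc_begin(&m, set_on_entry);  /* second run */
}
```

In this model, model_replay_second_run(false) reports that the second
run sees gc_in_progress == false, while model_replay_second_run(true)
reports true.
]]>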

Fixes: 8b90a9f819dc ("af_unix: Run GC on only one CPU.")
Reported-by: Igor Ushakov &lt;sysroot314@gmail.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260501073945.1884564-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2026-04-23T23:50:42+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-23T23:50:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e728258debd553c95d2e70f9cd97c9fde27c7130'/>
<id>urn:sha1:e728258debd553c95d2e70f9cd97c9fde27c7130</id>
<content type='text'>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from Netfilter.

  Steady stream of fixes. Last two weeks feel comparable to the two
  weeks before the merge window. Lots of AI-aided bug discovery. A newer
  big source is Sashiko/Gemini (Roman Gushchin's system), which points
  out issues in existing code during patch review (maybe 25% of fixes
  here likely originating from Sashiko). Nice thing is these are often
  fixed by the respective maintainers, not drive-bys.

  Current release - new code bugs:

   - kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP

  Previous releases - regressions:

   - add async ndo_set_rx_mode and switch drivers which we promised to
     be called under the per-netdev mutex to it

   - dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops

   - hv_sock: report EOF instead of -EIO for FIN

   - vsock/virtio: fix MSG_PEEK calculation on bytes to copy

  Previous releases - always broken:

   - ipv6: fix possible UAF in icmpv6_rcv()

   - icmp: validate reply type before using icmp_pointers

   - af_unix: drop all SCM attributes for SOCKMAP

   - netfilter: fix a number of bugs in the osf (OS fingerprinting)

   - eth: intel: fix timestamp interrupt configuration for E825C

  Misc:

   - bunch of data-race annotations"

* tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits)
  rxrpc: Fix error handling in rxgk_extract_token()
  rxrpc: Fix re-decryption of RESPONSE packets
  rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets
  rxrpc: Fix missing validation of ticket length in non-XDR key preparsing
  rxgk: Fix potential integer overflow in length check
  rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
  rxrpc: Fix potential UAF after skb_unshare() failure
  rxrpc: Fix rxkad crypto unalignment handling
  rxrpc: Fix memory leaks in rxkad_verify_response()
  net: rds: fix MR cleanup on copy error
  m68k: mvme147: Make me the maintainer
  net: txgbe: fix firmware version check
  selftests/bpf: check epoll readiness during reuseport migration
  tcp: call sk_data_ready() after listener migration
  vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll()
  ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
  tipc: fix double-free in tipc_buf_append()
  llc: Return -EINPROGRESS from llc_ui_connect()
  ipv4: icmp: validate reply type before using icmp_pointers
  selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges
  ...
</content>
</entry>
<entry>
<title>af_unix: Drop all SCM attributes for SOCKMAP.</title>
<updated>2026-04-18T19:12:28+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2026-04-15T18:48:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=965dc93481d1b80d341bdd16c27b16fe197175ee'/>
<id>urn:sha1:965dc93481d1b80d341bdd16c27b16fe197175ee</id>
<content type='text'>
SOCKMAP can hide inflight fd from AF_UNIX GC.

When a socket in SOCKMAP receives skb with inflight fd,
sk_psock_verdict_data_ready() looks up the mapped socket and
enqueues the skb to its psock-&gt;ingress_skb.

Since neither the old nor the new GC can inspect the psock
queue, the hidden skb leaks the inflight sockets.  Note that
this cannot be detected via kmemleak because inflight sockets
are linked to a global list.

In addition, SOCKMAP redirect breaks the Tarjan-based GC's
assumption that unix_edge.successor is always alive, which
is no longer true once skb is redirected, resulting in
use-after-free below. [0]

Moreover, SOCKMAP does not call scm_stat_del() properly,
so unix_show_fdinfo() could report an incorrect fd count.

sk_msg_recvmsg() does not support any SCM attributes in the
first place.

Let's drop all SCM attributes before passing skb to the
SOCKMAP layer.

[0]:
BUG: KASAN: slab-use-after-free in unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251)
Read of size 8 at addr ffff888125362670 by task kworker/56:1/496

CPU: 56 UID: 0 PID: 496 Comm: kworker/56:1 Not tainted 7.0.0-rc7-00263-gb9d8b856689d #3 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Workqueue: events sk_psock_backlog
Call Trace:
 &lt;TASK&gt;
 dump_stack_lvl (lib/dump_stack.c:122)
 print_report (mm/kasan/report.c:379)
 kasan_report (mm/kasan/report.c:597)
 unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251)
 unix_destroy_fpl (net/unix/garbage.c:317)
 unix_destruct_scm (./include/net/scm.h:80 ./include/net/scm.h:86 net/unix/af_unix.c:1976)
 sk_psock_backlog (./include/linux/skbuff.h:?)
 process_scheduled_works (kernel/workqueue.c:?)
 worker_thread (kernel/workqueue.c:?)
 kthread (kernel/kthread.c:438)
 ret_from_fork (arch/x86/kernel/process.c:164)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
 &lt;/TASK&gt;

Allocated by task 955:
 kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78)
 __kasan_slab_alloc (mm/kasan/common.c:369)
 kmem_cache_alloc_noprof (mm/slub.c:4539)
 sk_prot_alloc (net/core/sock.c:2240)
 sk_alloc (net/core/sock.c:2301)
 unix_create1 (net/unix/af_unix.c:1099)
 unix_create (net/unix/af_unix.c:1169)
 __sock_create (net/socket.c:1606)
 __sys_socketpair (net/socket.c:1811)
 __x64_sys_socketpair (net/socket.c:1863 net/socket.c:1860 net/socket.c:1860)
 do_syscall_64 (arch/x86/entry/syscall_64.c:?)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Freed by task 496:
 kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78)
 kasan_save_free_info (mm/kasan/generic.c:587)
 __kasan_slab_free (mm/kasan/common.c:287)
 kmem_cache_free (mm/slub.c:6165)
 __sk_destruct (net/core/sock.c:2282 net/core/sock.c:2384)
 sk_psock_destroy (./include/net/sock.h:?)
 process_scheduled_works (kernel/workqueue.c:?)
 worker_thread (kernel/workqueue.c:?)
 kthread (kernel/kthread.c:438)
 ret_from_fork (arch/x86/kernel/process.c:164)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:258)

Fixes: c63829182c37 ("af_unix: Implement -&gt;psock_update_sk_prot()")
Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types")
Reported-by: Xingyu Jin &lt;xingyuj@google.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260415184830.3988432-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, sockmap: Take state lock for af_unix iter</title>
<updated>2026-04-16T00:23:14+00:00</updated>
<author>
<name>Michal Luczaj</name>
<email>mhal@rbox.co</email>
</author>
<published>2026-04-14T14:13:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=64c2f93fc3254d3bf5de4445fb732ee5c451edb6'/>
<id>urn:sha1:64c2f93fc3254d3bf5de4445fb732ee5c451edb6</id>
<content type='text'>
When a BPF iterator program updates a sockmap, there is a race condition in
unix_stream_bpf_update_proto() where the `peer` pointer can become stale[1]
during a state transition TCP_ESTABLISHED -&gt; TCP_CLOSE.

        CPU0 bpf                          CPU1 close
        --------                          ----------
// unix_stream_bpf_update_proto()
sk_pair = unix_peer(sk)
if (unlikely(!sk_pair))
   return -EINVAL;
                                     // unix_release_sock()
                                     skpair = unix_peer(sk);
                                     unix_peer(sk) = NULL;
                                     sock_put(skpair)
sock_hold(sk_pair) // UaF

More practically, this fix guarantees that the iterator program is
consistently provided with a unix socket that remains stable during
iterator execution.

[1]:
BUG: KASAN: slab-use-after-free in unix_stream_bpf_update_proto+0x155/0x490
Write of size 4 at addr ffff8881178c9a00 by task test_progs/2231
Call Trace:
 dump_stack_lvl+0x5d/0x80
 print_report+0x170/0x4f3
 kasan_report+0xe4/0x1c0
 kasan_check_range+0x125/0x200
 unix_stream_bpf_update_proto+0x155/0x490
 sock_map_link+0x71c/0xec0
 sock_map_update_common+0xbc/0x600
 sock_map_update_elem+0x19a/0x1f0
 bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217
 bpf_iter_run_prog+0x21e/0xae0
 bpf_iter_unix_seq_show+0x1e0/0x2a0
 bpf_seq_read+0x42c/0x10d0
 vfs_read+0x171/0xb20
 ksys_read+0xff/0x200
 do_syscall_64+0xf7/0x5e0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Allocated by task 2236:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 __kasan_slab_alloc+0x63/0x80
 kmem_cache_alloc_noprof+0x1d5/0x680
 sk_prot_alloc+0x59/0x210
 sk_alloc+0x34/0x470
 unix_create1+0x86/0x8a0
 unix_stream_connect+0x318/0x15b0
 __sys_connect+0xfd/0x130
 __x64_sys_connect+0x72/0xd0
 do_syscall_64+0xf7/0x5e0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Freed by task 2236:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x70
 __kasan_slab_free+0x47/0x70
 kmem_cache_free+0x11c/0x590
 __sk_destruct+0x432/0x6e0
 unix_release_sock+0x9b3/0xf60
 unix_release+0x8a/0xf0
 __sock_release+0xb0/0x270
 sock_close+0x18/0x20
 __fput+0x36e/0xac0
 fput_close_sync+0xe5/0x1a0
 __x64_sys_close+0x7d/0xd0
 do_syscall_64+0xf7/0x5e0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
Suggested-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Signed-off-by: Michal Luczaj &lt;mhal@rbox.co&gt;
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-5-2af6fe97918e@rbox.co
</content>
</entry>
<entry>
<title>bpf, sockmap: Fix af_unix null-ptr-deref in proto update</title>
<updated>2026-04-16T00:22:58+00:00</updated>
<author>
<name>Michal Luczaj</name>
<email>mhal@rbox.co</email>
</author>
<published>2026-04-14T14:13:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dca38b7734d2ea00af4818ff3ae836fab33d5d5a'/>
<id>urn:sha1:dca38b7734d2ea00af4818ff3ae836fab33d5d5a</id>
<content type='text'>
unix_stream_connect() sets sk_state (`WRITE_ONCE(sk-&gt;sk_state,
TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
the socket is properly set up, which would include having a defined peer.
IOW, there's a window when unix_stream_bpf_update_proto() can be called on
a socket which still has unix_peer(sk) == NULL.

         CPU0 bpf                            CPU1 connect
         --------                            ------------

                                WRITE_ONCE(sk-&gt;sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
...
sk_pair = unix_peer(sk)
sock_hold(sk_pair)
                                sock_hold(newsk)
                                smp_mb__after_atomic()
                                unix_peer(sk) = newsk

BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
Call Trace:
  sock_map_link+0x564/0x8b0
  sock_map_update_common+0x6e/0x340
  sock_map_update_elem_sys+0x17d/0x240
  __sys_bpf+0x26db/0x3250
  __x64_sys_bpf+0x21/0x30
  do_syscall_64+0x6b/0x3a0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Initial idea was to move peer assignment _before_ the sk_state update[1],
but that involved an additional memory barrier, and changing the hot path
was rejected.
Then a NULL check during proto update in unix_stream_bpf_update_proto() was
considered[2], but the follow-up discussion[3] focused on the root cause,
i.e. sockmap update taking a wrong lock. Or, more specifically, missing
unix_state_lock()[4].
In the end it was concluded that teaching sockmap about the af_unix locking
would be unnecessarily complex[5].
Complexity aside, since BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT
are allowed to update sockmaps, sock_map_update_elem() taking the unix
lock, as it is currently implemented in unix_state_lock():
spin_lock(&amp;unix_sk(s)-&gt;lock), would be problematic. unix_state_lock() taken
in a process context, followed by a softirq-context TC BPF program
attempting to take the same spinlock -- deadlock[6].
This way we circled back to the peer check idea[2].
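
<![CDATA[
The window and the peer check can be sketched in a userspace model.
This is an illustration, not the kernel implementation: struct model_sk
and model_update_proto() are hypothetical names, refcnt stands in for
the reference taken by sock_hold(), and the error value used for the
state check is an assumption of the model.

```c
#include <errno.h>
#include <stddef.h>

enum model_state { MODEL_CLOSE, MODEL_ESTABLISHED };

/* Hypothetical stand-in for a socket; not a kernel structure. */
struct model_sk {
  enum model_state state;
  struct model_sk *peer;
  int refcnt;
};

/*
 * Model of the proto update. The state check alone is not enough,
 * because the state can already be ESTABLISHED while the peer is unset.
 */
int model_update_proto(struct model_sk *sk)
{
  struct model_sk *pair;

  if (sk->state != MODEL_ESTABLISHED)
    return -EOPNOTSUPP;  /* error value chosen for the model */

  pair = sk->peer;
  if (!pair)
    return -EINVAL;      /* peer check: connect() has not assigned it yet */

  pair->refcnt++;        /* stands in for sock_hold(sk_pair) */
  return 0;
}
```

A model socket with state MODEL_ESTABLISHED and peer == NULL gets
-EINVAL instead of a NULL dereference. Once the peer is assigned, the
same call returns 0 and takes a reference on the peer.
]]>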

[1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
[2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
[3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/
[4]: https://lore.kernel.org/netdev/CAAVpQUA+8GL_j63CaKb8hbxoL21izD58yr1NvhOhU=j+35+3og@mail.gmail.com/
[5]: https://lore.kernel.org/bpf/CAAVpQUAHijOMext28Gi10dSLuMzGYh+jK61Ujn+fZ-wvcODR2A@mail.gmail.com/
[6]: https://lore.kernel.org/bpf/dd043c69-4d03-46fe-8325-8f97101435cf@linux.dev/

Summary of scenarios where af_unix/stream connect() may race a sockmap
update:

1. connect() vs. bpf(BPF_MAP_UPDATE_ELEM), i.e. sock_map_update_elem_sys()

   The implemented NULL check is sufficient. Once assigned, the socket peer
   won't be released until the socket fd is released. And that's not an
   issue because sock_map_update_elem_sys() bumps the fd refcnt.

2. connect() vs BPF program doing update

   Update restricted per verifier.c:may_update_sockmap() to

      BPF_PROG_TYPE_TRACING/BPF_TRACE_ITER
      BPF_PROG_TYPE_SOCK_OPS (bpf_sock_map_update() only)
      BPF_PROG_TYPE_SOCKET_FILTER
      BPF_PROG_TYPE_SCHED_CLS
      BPF_PROG_TYPE_SCHED_ACT
      BPF_PROG_TYPE_XDP
      BPF_PROG_TYPE_SK_REUSEPORT
      BPF_PROG_TYPE_FLOW_DISSECTOR
      BPF_PROG_TYPE_SK_LOOKUP

   Plus one more race to consider:

            CPU0 bpf                            CPU1 connect
            --------                            ------------

                                   WRITE_ONCE(sk-&gt;sk_state, TCP_ESTABLISHED)
   sock_map_sk_state_allowed(sk)
                                   sock_hold(newsk)
                                   smp_mb__after_atomic()
                                   unix_peer(sk) = newsk
   sk_pair = unix_peer(sk)
   if (unlikely(!sk_pair))
      return -EINVAL;

                                                 CPU1 close
                                                 ----------

                                   skpair = unix_peer(sk);
                                   unix_peer(sk) = NULL;
                                   sock_put(skpair)
   // use after free?
   sock_hold(sk_pair)

   2.1 BPF program invoking helper function bpf_sock_map_update() -&gt;
       BPF_CALL_4(bpf_sock_map_update(), ...)

       Helper limited to BPF_PROG_TYPE_SOCK_OPS. Nevertheless, a unix sock
       might be accessible via bpf_map_lookup_elem(). Which implies sk
       already having psock, which in turn implies sk already having
       sk_pair. Since sk_psock_destroy() is queued as RCU work, sk_pair
       won't go away while BPF executes the update.

   2.2 BPF program invoking helper function bpf_map_update_elem() -&gt;
       sock_map_update_elem()

       2.2.1 Unix sock accessible to BPF prog only via sockmap lookup in
             BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_SCHED_CLS,
             BPF_PROG_TYPE_SCHED_ACT, BPF_PROG_TYPE_XDP,
             BPF_PROG_TYPE_SK_REUSEPORT, BPF_PROG_TYPE_FLOW_DISSECTOR,
             BPF_PROG_TYPE_SK_LOOKUP.

             Pretty much the same as case 2.1.

       2.2.2 Unix sock accessible to BPF program directly:
             BPF_PROG_TYPE_TRACING, narrowed down to BPF_TRACE_ITER.

             Sockmap iterator (sock_map_seq_ops) is safe: unix sock
             residing in a sockmap means that the sock already went through
             the proto update step.

             Unix sock iterator (bpf_iter_unix_seq_ops), on the other hand,
             gives access to socks that may still be unconnected. Which
             means iterator prog can race sockmap/proto update against
             connect().

             BUG: KASAN: null-ptr-deref in unix_stream_bpf_update_proto+0x253/0x4d0
             Write of size 4 at addr 0000000000000080 by task test_progs/3140
             Call Trace:
              dump_stack_lvl+0x5d/0x80
              kasan_report+0xe4/0x1c0
              kasan_check_range+0x125/0x200
              unix_stream_bpf_update_proto+0x253/0x4d0
              sock_map_link+0x71c/0xec0
              sock_map_update_common+0xbc/0x600
              sock_map_update_elem+0x19a/0x1f0
              bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217
              bpf_iter_run_prog+0x21e/0xae0
              bpf_iter_unix_seq_show+0x1e0/0x2a0
              bpf_seq_read+0x42c/0x10d0
              vfs_read+0x171/0xb20
              ksys_read+0xff/0x200
              do_syscall_64+0xf7/0x5e0
              entry_SYSCALL_64_after_hwframe+0x76/0x7e

             While the introduced NULL check prevents null-ptr-deref in the
             BPF program path as well, it is insufficient to guard against
             a poorly timed close() leading to a use-after-free. This will
             be addressed in a subsequent patch.

Fixes: c63829182c37 ("af_unix: Implement -&gt;psock_update_sk_prot()")
Closes: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
Reported-by: Michal Luczaj &lt;mhal@rbox.co&gt;
Reported-by: 钱一铭 &lt;yimingqian591@gmail.com&gt;
Suggested-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Suggested-by: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Signed-off-by: Michal Luczaj &lt;mhal@rbox.co&gt;
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-4-2af6fe97918e@rbox.co
</content>
</entry>
</feed>
