summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
3 daysnetfilter: require Ethernet MAC header before using eth_hdr()Zhengchuan Liang6-14/+24
[ Upstream commit 62443dc21114c0bbc476fa62973db89743f2f137 ] `ip6t_eui64`, `xt_mac`, the `bitmap:ip,mac`, `hash:ip,mac`, and `hash:mac` ipset types, and `nf_log_syslog` access `eth_hdr(skb)` after either assuming that the skb is associated with an Ethernet device or checking only that the `ETH_HLEN` bytes at `skb_mac_header(skb)` lie between `skb->head` and `skb->data`. Make these paths first verify that the skb is associated with an Ethernet device, that the MAC header was set, and that it spans at least a full Ethernet header before accessing `eth_hdr(skb)`. Suggested-by: Florian Westphal <fw@strlen.de> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysvsock/virtio: fix skb overhead overflow on 32-bit buildsStefano Garzarella1-1/+1
commit 4157501b9a8ff1bbe32ff5a7d8aece7ab18eff40 upstream. On 32-bit architectures, both skb_queue_len() and SKB_TRUESIZE(0) evaluate to 32-bit values. The multiplication can overflow before being assigned to the u64 skb_overhead variable, making the skb overhead check ineffective. Cast skb_queue_len() to u64 so the multiplication is always performed in 64-bit arithmetic. This issue was reported by Sashiko while reviewing another patch. Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue") Closes: https://sashiko.dev/#/patchset/20260518090656.134588-1-sgarzare%40redhat.com Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260521124732.125771-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysvsock/virtio: fix skb overhead accounting to preserve full buf_allocStefano Garzarella1-1/+8
commit c6087c5aaad6d1b8be1a1a641e0a422218ade911 upstream. After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from buf_alloc when checking whether a new packet fits. This reduces the effective receive buffer below what the user configured via SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be silently dropped and applications that rely on the full buffer size to deadlock. Also, the reduced space is not communicated to the remote peer, so its credit calculation accounts more credit than the receiver will actually accept, causing data loss (there is no retransmission). With this approach we currently have failures in tools/testing/vsock/vsock_test.c. Test 18 sometimes fails, while test 22 always fails in this way: 18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch 22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed: Resource temporarily unavailable Fix by allowing at most `buf_alloc * 2` as the total budget for payload plus skb overhead in virtio_transport_inc_rx_pkt(), similar to how SO_RCVBUF is doubled to reserve space for sk_buff metadata. This preserves the full buf_alloc for payload under normal operation, while still bounding the skb queue growth. With this patch, all tests in tools/testing/vsock/vsock_test.c are now passing again. Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260518090656.134588-3-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysvsock/virtio: fix potential unbounded skb queueEric Dumazet1-1/+3
commit 059b7dbd20a6f0c539a45ddff1573cb8946685b5 upstream. virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc. virtio_transport_recv_enqueue() skips coalescing for packets with VIRTIO_VSOCK_SEQ_EOM. If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM, a very large number of packets can be queued because vvs->rx_bytes stays at 0. Fix this by estimating the skb metadata size: (Number of skbs in the queue) * SKB_TRUESIZE(0) Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: "Eugenio Pérez" <eperezma@redhat.com> Cc: virtualization@lists.linux.dev Link: https://patch.msgid.link/20260430122653.554058-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 dayswifi: mac80211: tests: mark HT check strictJohannes Berg1-0/+1
commit 0cfff13c94cb5fa818bb374945ff280e08dc1bb9 upstream. The HT check now only applies in strict mode since APs were found to be broken. Mark it as such. Fixes: 711a9c018ad2 ("wifi: mac80211: skip ieee80211_verify_sta_ht_mcs_support check in non-strict mode") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 dayswifi: mac80211: skip ieee80211_verify_sta_ht_mcs_support check in non-strict ↵Rio Liu1-0/+9
mode commit 711a9c018ad252b2807f85d44e1267b595644f9b upstream. Some Xfinity XB8 firmware advertises >1 spatial stream MCS indexes in their basic HT-MCS set. On cards with lower spatial streams, the check would fail, and we'd be stuck with no HT when in fact work fine with its own supported rate. This change makes it so the check is only performed in strict mode. Fixes: 574faa0e936d ("wifi: mac80211: add HT and VHT basic set verification") Signed-off-by: Rio Liu <rio@r26.me> Link: https://patch.msgid.link/99Mv9QEceyPrQhSP52MtAVmz0_kWJmzqotJjD9YW6LGLqk-AZloAueUyHCURilFkuqOh6Ecv8i2KKdSE1ujP3AnbU5QEouVisT1w_V3xdfc=@r26.me Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnetfilter: nft_fib: fix stale stack leak via the OIFNAME registerDavide Ornaghi3-2/+8
[ Upstream commit ab185e0c4fb82dfba6fb86f8271e06f931d9c64c ] For NFT_FIB_RESULT_OIFNAME the destination register is declared with len = IFNAMSIZ (four 32-bit registers), but on the lookup-fail, RTN_LOCAL and oif-mismatch paths nft_fib{4,6}_eval() only writes one register via "*dest = 0". The remaining three registers are left as whatever was on the stack in nft_do_chain()'s struct nft_regs, and a downstream expression that loads the register span can leak that uninitialised kernel stack to userspace. The NFTA_FIB_F_PRESENT existence check has the same shape: it is only meaningful for NFT_FIB_RESULT_OIF, yet it was accepted for any result type while the eval stores a single byte via nft_reg_store8(), leaving the rest of the declared span stale. Fix both: - replace the bare "*dest = 0" in the eval with nft_fib_store_result(), which strscpy_pad()s the whole IFNAMSIZ for OIFNAME (and is already used on the other early-return path), and - restrict NFTA_FIB_F_PRESENT to NFT_FIB_RESULT_OIF and declare its destination as a single u8, so the marked span matches the one byte the eval writes. Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Suggested-by: Florian Westphal <fw@strlen.de> Cc: stable@vger.kernel.org Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [ kept the tree's older `ip6_route_lookup()`/`rt6_info` IPv6 context and changed only `*dest = 0;` to `nft_fib_store_result(dest, priv, NULL);` ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 dayssctp: stream: fully roll back denied add-stream stateWyatt Feng1-1/+5
commit a5f8a90ac9f77c678a9781c0a464b635e0d63e49 upstream. When ADD_OUT_STREAMS is denied, SCTP only shrinks the queued chunks and then lowers outcnt. That leaves removed stream metadata behind, so a later re-add can reuse a stale ext and hit a null-pointer dereference in the scheduler get path. Fix the rollback by tearing down the removed stream state the same way other stream resizes do. Unschedule the current scheduler state, drop the removed stream ext state with sctp_stream_outq_migrate(), and then reschedule the remaining streams. This keeps scheduler-private RR/FC/PRIO lists consistent while fully rolling back denied outgoing stream additions. Fixes: 637784ade221 ("sctp: introduce priority based stream scheduler") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/d78954ecd94954653ee299400e98d74a03a6f7d3.1780603399.git.bronzed_45_vested@icloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 dayssctp: diag: reject stale associations in dump_one pathZhao Zhang1-8/+9
commit 5eba3e48d78edd7551b992cb7ba687019b3a78da upstream. The SCTP exact sock_diag lookup can hold a transport reference, block on lock_sock(sk), and then resume after sctp_association_free() has marked the association dead and freed its bind address list. When that happens, inet_assoc_attr_size() and inet_diag_msg_sctpasoc_fill() can still dereference association state that is no longer valid for reporting. In particular, inet_diag_msg_sctpasoc_fill() may read an empty bind-address list as a real sctp_sockaddr_entry and trigger an out-of-bounds read from unrelated association memory. Reject the association after taking the socket lock if it has been reaped or detached from the endpoint, and report the lookup as stale. This keeps the exact dump-one path from formatting torn association state. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Zhao Zhang <zzhan461@ucr.edu> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/fac6043fa20a2ff68e12958c431836f692c51268.1780113823.git.zzhan461@ucr.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysrxrpc: Fix the ACK parser to extract the SACK table for parsingDavid Howells1-9/+17
commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 upstream. Fix modification of the received skbuff in rxrpc_input_soft_acks() and a potential incorrect access of the buffer in a fragmented UDP packet (the packet would probably have to be deliberately pre-generated as fragmented) when AF_RXRPC tries to extract the contents of the SACK table by copying out the contents of the SACK table into a buffer before attempting to parse AF_RXRPC assumes that it can just call skb_condense() and then validly access the SACK table from skb->data and that it will be a flat buffer - but skb_condense() can silently fail to do anything under some circumstances. Note that whilst rxrpc_input_soft_acks() should be able to parse extended ACKs, the rest of AF_RXRPC doesn't currently support that. Further, there's then no need to call skb_condense() in rxrpc_input_ack(), so don't. Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs") Reported-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260513180907.2061972-1-michael.bommarito@gmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Eric Dumazet <edumazet@google.com> cc: "David S. Miller" <davem@davemloft.net> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org cc: stable@kernel.org Link: https://patch.msgid.link/105362.1780573560@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnet: rds: clear i_sends on setup unwindYuqi Xu1-0/+1
commit 20cf0fb715c41111469577e85e35d15f099473e0 upstream. The RDS IB connection teardown path is written so it can run during partial startup and on repeated shutdown attempts. It uses NULL pointers to distinguish resources that are still owned from resources that have already been released. When rds_ib_setup_qp() fails after allocating i_sends but before allocating i_recvs, the sends_out path frees i_sends without clearing the pointer. A later shutdown pass can still treat that stale pointer as a live send ring allocation. Clear i_sends after vfree() in the error unwind path so the existing shutdown logic continues to use the correct ownership state. Fixes: 3b12f73a5c29 ("rds: ib: add error handle") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yuqi Xu <xuyq21@lenovo.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/5a0f7624bb9845a7b67d26166a150b59e7f394ce.1779632468.git.xuyq21@lenovo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnet: phonet: free phonet_device after RCU grace periodSantosh Kalluri1-1/+1
commit 71de0177b28da751f407581a4515cf4d762f6296 upstream. phonet_device_destroy() removes a phonet_device from the per-net device list with list_del_rcu(), but frees it immediately. RCU readers walking the same list can still hold a pointer to the object after it has been removed, leading to a slab-use-after-free. Use kfree_rcu(), matching the lifetime rule already used by phonet_address_del() for the same object type. Fixes: eeb74a9d45f7 ("Phonet: convert devices list to RCU") Cc: stable@vger.kernel.org Signed-off-by: Santosh Kalluri <santosh.kalluri129@gmail.com> Acked-by: Rémi Denis-Courmont <remi@remlab.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysinet: frags: fix use-after-free caused by the fqdir_pre_exit() flushHyunwoo Kim2-3/+3
commit 32594b09854970d7ba83eb2dc8c69a2edd158c8e upstream. On netns teardown, fqdir_pre_exit() walks the fqdir rhashtable and flushes every fragment queue that is not yet complete using inet_frag_queue_flush(). That helper frees all the skbs queued on the fragment queue but does not set INET_FRAG_COMPLETE, and leaves q->fragments_tail and q->last_run_head pointing at the freed skbs. The queue itself stays in the rhashtable. fqdir_pre_exit() first lowers high_thresh to 0 to stop new queue lookups, but it cannot stop a fragment that already obtained the queue through inet_frag_find() earlier and stalled just before taking the queue lock. Once that fragment resumes after the flush and takes the queue lock, it passes the INET_FRAG_COMPLETE check and then dereferences the freed fragments_tail. inet_frag_queue_insert() reads FRAG_CB() and ->len of that pointer and, on the append path, writes ->next_frag, causing a slab use-after-free. IPv6, nf_conntrack_reasm6 and 6lowpan reassembly share the same flush path and are affected as well. Reset rb_fragments, fragments_tail and last_run_head in inet_frag_queue_flush() so a flushed queue no longer points at the freed skbs. A fragment that resumes after the flush and takes the queue lock then finds an empty queue and starts a new run instead of dereferencing the freed fragments_tail. ip_frag_reinit() already performed this reset after its own flush, so drop the now duplicate code there. Cc: stable@vger.kernel.org Fixes: 006a5035b495 ("inet: frags: flush pending skbs in fqdir_pre_exit()") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Link: https://patch.msgid.link/ah6ukYq5G98LshdA@v4bel Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysvsock/vmci: fix sk_ack_backlog leak on failed handshakeRaf Dickson1-1/+3
commit c05fa14db43ebef3bd862ca9d073981c0358b3f0 upstream. When vmci_transport_recv_connecting_server() returns an error, vmci_transport_recv_listen() calls vsock_remove_pending() but never calls sk_acceptq_removed(). This leaves sk_ack_backlog incremented permanently. Repeated handshake failures (malformed packets, queue pair alloc failure, event subscribe failure) cause sk_ack_backlog to climb toward sk_max_ack_backlog. Once it reaches the limit the listener permanently refuses all new connections with -ECONNREFUSED, a silent denial of service requiring a process restart to recover. The two existing sk_acceptq_removed() calls in af_vsock.c do not cover this path: line 764 checks vsock_is_pending() which returns false after vsock_remove_pending(), and line 1889 is only reached on successful accept(). Fix by balancing sk_acceptq_added() with sk_acceptq_removed() on the error path. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Cc: stable@vger.kernel.org Signed-off-by: Raf Dickson <rafdog35@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260526104356.469928-1-rafdog35@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 dayswifi: nl80211: reject oversized EMA RNR listsYuqi Xu1-0/+3
commit 4cd92957e8f8cc4ebfe8a5d4203c14c592fde6b1 upstream. nl80211_parse_rnr_elems() stores the parsed element count in a u8-backed cfg80211_rnr_elems::cnt field and uses that count to size the flexible array allocation. Reject nested NL80211_ATTR_EMA_RNR_ELEMS input once the count reaches 255, before incrementing it again. This keeps the parser aligned with the data structure it fills and matches the existing bound check used by nl80211_parse_mbssid_elems(). Fixes: dbbb27e183b1 ("cfg80211: support RNR for EMA AP") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:gpt-5.4 Signed-off-by: Yuqi Xu <xuyuqiabc@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/20260529152542.1412734-1-n05ec@lzu.edu.cn Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: add-addr: always drop other suboptionsMatthieu Baerts (NGI0)3-38/+14
commit bd34fa0257261b76964df1c98f44b3cb4ee14620 upstream. When an ADD_ADDR needs to be sent, it could be prepared if there is enough remaining space and even if the packet is not a pure ACK. But it would be dropped soon after. Indeed, in mptcp_pm_add_addr_signal(), there is enough space to fit a DSS of 20 octets and an ADD_ADDR echo containing an IPv4 address on 8 octets for example. In this case, the packet would be prepared, the MPTCP_ADD_ADDR_ECHO bit would be removed from pm->addr_signal, but the option would be silently dropped in mptcp_established_options_add_addr() not to override DSS info in the union from 'struct mptcp_out_options', and also because mptcp_write_options() will enforce mutually exclusion with DSS. Instead, don't even try to send an ADD_ADDR if it is not a pure ACK. Retry for each new packet until a pure-ACK is emitted. That's fine to do that, because each time an ADD_ADDR (echo) is scheduled, a pure ACK is queued. This also simplifies the code, and the skb checks can be done earlier, before the lock. Note: also, since commit 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets"), opts->ahmac would not have been set to 0 when other suboptions were not dropped, and when sending an ADD_ADDR echo. That would have resulted in sending an ADD_ADDR using garbage info, where there was not enough space, instead of an echo one without the ADD_ADDR HMAC. Fixes: 1bff1e43a30e ("mptcp: optimize out option generation") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-11-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: sockopt: set sockopt on all subflowsMatthieu Baerts (NGI0)1-3/+4
commit 7690137e70ab0fb1f8b5a30e6f087f8ee908b680 upstream. The mptcp_setsockopt_all_sf(), currently used only with TCP_MAXSEG, stopped when one subflow returned an error. Even if it is not wrong, this is different from the other helpers trying to set the option on all subflows, and then returning an error if at least one of them had an issue. Follow this behaviour, for a question of uniformity. Fixes: 51c5fd09e1b4 ("mptcp: add TCP_MAXSEG sockopt support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-8-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: sockopt: check timestamping ret valueMatthieu Baerts (NGI0)1-2/+6
commit 57132affbc89c02e1bf73fdf5724311bdc9a29da upstream. sock_set_timestamping() can fail for different reasons. The returned value should then be checked. If sock_set_timestamping() fails for at least one subflow, the first error is now reported to the userspace, similar to what is done with other socket options. Fixes: 9061f24bf82e ("mptcp: sockopt: propagate timestamp request to subflows") Cc: stable@vger.kernel.org Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Closes: https://lore.kernel.org/willemdebruijn.kernel.178a41a53d041@gmail.com Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-7-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: check desc->count in read_sockGang Yan1-0/+2
commit c378b1a6f8dd3e02eb08661f4d5d50f236eead03 upstream. __tcp_read_sock() checks desc->count after each skb is consumed and breaks the loop when it reaches 0. The MPTCP variant lacks this check. This is a functional bug, other subsystems also rely on this check: TLS strparser sets desc->count to 0 once a full TLS record is assembled and depends on this break to stop reading. Add the same desc->count check to __mptcp_read_sock(), mirroring __tcp_read_sock(). Fixes: 250d9766a984 ("mptcp: implement .read_sock") Cc: stable@vger.kernel.org Co-developed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-9-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: pm: fix extra_subflows underflow on userspace PM subflow creationTao Cui1-6/+8
commit 14e9fea30b68fc75b2b3d97396a7e6adb544bd2a upstream. The userspace PM increments extra_subflows after __mptcp_subflow_connect() succeeds, but __mptcp_subflow_connect() calls mptcp_pm_close_subflow() on failure to roll back the pre-increment done by the kernel PM's fill_*() helpers. Because the userspace PM hasn't incremented yet at that point, this decrement is spurious and causes extra_subflows to underflow. Fix it by aligning the userspace PM with the kernel PM: increment extra_subflows before calling __mptcp_subflow_connect(), so the existing error path in subflow.c correctly rolls it back on failure. Also simplify the error handling by taking pm.lock only when needed for cleanup. Fixes: 77e4b94a3de6 ("mptcp: update userspace pm infos") Cc: stable@vger.kernel.org Signed-off-by: Tao Cui <cuitao@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-5-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: allow subflow rcv wnd to shrinkPaolo Abeni1-0/+7
commit da23be77e1292cd611e736c3aa17da633d7ddce7 upstream. In MPTCP connection, the `window` field in the TCP header refers to the MPTCP-level rcv_nxt and it's right edge should not move backward. Such constraint is enforced at DSS option generation time. At the same time, the TCP stack ensures independently that the TCP-level rcv wnd right's edge does not move backward. That in turn causes artificial inflating of the MPTCP rcv window when the incoming data is acked at the TCP level and is OoO in the MPTCP sequence space (or lands in the backlog). As a consequence, the incoming traffic can exceed the receiver rcvbuf size even when the sender is not misbehaving. Prevent such scenario forcibly allowing the TCP subflow to shrink the TCP-level rcv wnd regardless of the current netns setting. Fixes: f3589be0c420 ("mptcp: never shrink offered window") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-4-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: close TOCTOU race while computing rcv_wndPaolo Abeni1-18/+18
commit 8ab24fdebc369c0dfb90f82c1650b1e66662bb45 upstream. The MPTCP output path access locklessly the MPTCP-level ack_seq in multiple times, using possibly different values for the data_ack in the DSS option and to compute the announced rcv wnd for the same packet. Refactor the cote to avoid inconsistencies which may confuse the peer. Also ensure that the MPTCP level rcv wnd is updated only when the egress packet actually contains a DSS ack. Fixes: fa3fe2b15031 ("mptcp: track window announced to peer") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-3-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: fix retransmission loop when csum is enabledPaolo Abeni1-0/+4
commit d1918b36edcaed0ec4ef6888b2358c6b1ddcff47 upstream. Sashiko noted that retransmission with csum enabled can actually transmit new data, but currently the relevant code does not update accordingly snd_nxt. The may cause incoming ack drop and an endless retransmission loop. Address the issue incrementing snd_nxt as needed. Fixes: 4e14867d5e91 ("mptcp: tune re-injections for csum enabled mode") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-2-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysmptcp: fix missing wakeups in edge scenariosPaolo Abeni1-0/+4
commit 9d8d28738f24b75616d6ca7a27cb4aed88520343 upstream. The mptcp_recvmsg() can fill MPTCP socket receive queue via mptcp_move_skbs(), but currently does not try to wakeup any listener, because the same process is going to check the receive queue soon. When multiple threads are reading from the same fd, the above can cause stall. Add the missing wakeup. Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-1-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysudp: clear skb->dev before running a sockmap verdictSechang Lim1-0/+8
commit 3c94f241f776562c489876ff506f366224565c21 upstream. On the UDP receive path skb->dev is repurposed as dev_scratch (the truesize/state cache set by udp_set_dev_scratch()), through the union { struct net_device *dev; unsigned long dev_scratch; } in sk_buff. When a UDP socket is in a sockmap, sk_data_ready is sk_psock_verdict_data_ready(), which calls udp_read_skb() -> recv_actor() (sk_psock_verdict_recv) to run the attached SK_SKB verdict program in softirq. If that program calls a socket-lookup helper (bpf_sk_lookup_tcp/udp, bpf_skc_lookup_tcp), bpf_skc_lookup() does: if (skb->dev) caller_net = dev_net(skb->dev); skb->dev still holds the dev_scratch value (a non-NULL integer), so dev_net() dereferences it as a struct net_device * and the kernel takes a general protection fault on a non-canonical address in softirq: Oops: general protection fault, probably for non-canonical address 0x1010000800004a0 CPU: 1 UID: 0 PID: 1406 Comm: syz.2.19 Not tainted 7.1.0-rc6 #1 PREEMPT(full) RIP: 0010:bpf_skc_lookup net/core/filter.c:7033 [inline] RIP: 0010:bpf_sk_lookup+0x45/0x160 net/core/filter.c:7047 Call Trace: <IRQ> bpf_prog_4675cb904b7071f8+0x12e/0x14e bpf_prog_run_pin_on_cpu+0xc6/0x1f0 sk_psock_verdict_recv+0x1ba/0x350 udp_read_skb+0x31a/0x370 sk_psock_verdict_data_ready+0x2e3/0x600 __udp_enqueue_schedule_skb+0x4c8/0x650 udpv6_queue_rcv_one_skb+0x3ec/0x740 udp6_unicast_rcv_skb+0x11d/0x140 ip6_protocol_deliver_rcu+0x61e/0x950 ip6_input_finish+0xa9/0x150 NF_HOOK+0x286/0x2f0 ip6_input+0x117/0x220 NF_HOOK+0x286/0x2f0 __netif_receive_skb+0x85/0x200 process_backlog+0x374/0x9a0 __napi_poll+0x4f/0x1c0 net_rx_action+0x3b0/0x770 handle_softirqs+0x15a/0x460 do_softirq+0x57/0x80 </IRQ> The rmem charge that dev_scratch accounted for is released by skb_recv_udp() on dequeue, just above, so the scratch is dead by the time recv_actor() runs. Clear skb->dev so bpf_skc_lookup() falls back to sock_net(skb->sk), which skb_set_owner_sk_safe() set just above. Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()") Cc: stable@vger.kernel.org Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260603162737.697215-1-rhkrqnwk98@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysxfrm: iptfs: fix ABBA deadlock in iptfs_destroy_state()Tristan Madani1-3/+2
commit c8a8a75b733467b00c08b91a38dbaf207a08ed6e upstream. iptfs_destroy_state() calls hrtimer_cancel() while holding a spinlock that the timer callback also acquires, leading to an ABBA deadlock on SMP systems. For the output timer (iptfs_timer): - iptfs_destroy_state() holds x->lock, calls hrtimer_cancel() - iptfs_delay_timer() callback takes x->lock For the drop timer (drop_timer): - iptfs_destroy_state() holds drop_lock, calls hrtimer_cancel() - iptfs_drop_timer() callback takes drop_lock Both timers use HRTIMER_MODE_REL_SOFT, so their callbacks run in softirq context. When hrtimer_cancel() is called for a soft timer that is currently executing on another CPU, hrtimer_cancel_wait_running() spins on softirq_expiry_lock -- the same lock held by the softirq running the callback. If the callback is blocked waiting for the spinlock held by the caller of hrtimer_cancel(), a circular dependency forms: CPU 0: holds lock_A -> waits for softirq_expiry_lock CPU 1: holds softirq_expiry_lock -> waits for lock_A Fix by calling hrtimer_cancel() before acquiring the respective locks. hrtimer_cancel() is safe to call without holding any lock and will wait for any in-progress callback to complete. For the output timer, the lock is still acquired afterwards to drain the packet queue. For the drop timer, the lock/unlock pair is removed entirely since it only existed to serialize with the timer callback, which hrtimer_cancel() already guarantees. Found by source code audit. Fixes: 4b3faf610cc6 ("xfrm: iptfs: add new iptfs xfrm mode impl") Cc: Christian Hopps <chopps@labn.net> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: stable@vger.kernel.org Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysxfrm: iptfs: preserve shared-frag marker in iptfs_consume_frags()Takao Sato1-0/+2
commit e9096a5a170e7ecd6467bc2e08668ec39897cda7 upstream. iptfs_consume_frags() transfers paged fragments from one socket buffer to another but fails to propagate the SKBFL_SHARED_FRAG flag. This is the same class of bug that was fixed in skb_try_coalesce() for CVE-2026-46300: when fragments backed by read-only page-cache pages are merged, the marker indicating their shared nature must be preserved so that ESP can decide correctly whether in-place encryption is safe. Apply the same two-line fix used in skb_try_coalesce() to iptfs_consume_frags(). Fixes: b96ba312e21c ("xfrm: iptfs: share page fragments of inner packets") Cc: stable@vger.kernel.org # 6.14+ Signed-off-by: Takao Sato <takaosato1997@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysxfrm: espintcp: do not reuse an in-progress partial sendWyatt Feng1-0/+4
commit c381039ade2e161ab08c0eda73c4f8b9a7115928 upstream. espintcp keeps a single in-flight transmit in ctx->partial. Before building a new sk_msg, espintcp_sendmsg() first tries to flush that state through espintcp_push_msgs(). For blocking callers, espintcp_push_msgs() may return success even when the previous partial send is still pending. espintcp_sendmsg() would then reinitialize emsg->skmsg and reuse ctx->partial while the old transfer still owns that state. Do not rebuild the send message when ctx->partial is still in progress. If espintcp_push_msgs() returns with emsg->len still set, fail the new send instead of overwriting the live partial state. This is a memory-safety fix: reusing the live partial-send state can leave a stale offset attached to a new sk_msg and lead to an out-of- bounds read in the send path. tcp_sendmsg_locked() already handles waiting for send buffer memory, so the fix here is just to preserve espintcp's one-message-at-a-time transmit state. Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysBluetooth: L2CAP: reject BR/EDR signaling packets over MTUsigMichael Bommarito1-0/+46
commit dd214733544427587a95f66dbf3adff072568990 upstream. net/bluetooth/l2cap_core.c:l2cap_sig_channel() accepts BR/EDR signaling packets up to the channel MTU and dispatches each command without enforcing the signaling MTU (MTUsig). A Bluetooth BR/EDR peer within radio range can send a fixed-channel CID 0x0001 packet that is larger than MTUsig and contains many L2CAP_ECHO_REQ commands before pairing. In a real-radio stock-kernel run, one 681-byte signaling packet containing 168 zero-length ECHO_REQ commands made the target transmit 168 ECHO_RSP frames over about 220 ms. Impact: a Bluetooth BR/EDR peer within radio range, before pairing, can force 168 ECHO_RSP frames from one 681-byte fixed-channel signaling packet containing packed ECHO_REQ commands. Define Linux's BR/EDR signaling MTU as the spec minimum of 48 bytes and reject any larger signaling packet with one L2CAP_COMMAND_REJECT_RSP carrying L2CAP_REJ_MTU_EXCEEDED before any command is dispatched. The Bluetooth Core spec wording for MTUExceeded says the reject identifier shall match the first request command in the packet, and that packets containing only responses shall be silently discarded. Linux intentionally deviates from that prescription: silently discarding desynchronizes the peer because the remote stack never learns its responses were dropped, and locating the first request command requires walking command headers past MTUsig, i.e. processing bytes from a packet we have already decided is too large to process. We therefore always emit one reject and use the identifier from the first command header, a single fixed-offset byte read. The unrestricted BR/EDR signaling parser and ECHO_REQ response path both trace to the initial git import; no later introducing commit is available for a Fixes tag. Cc: stable@vger.kernel.org Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Link: https://lore.kernel.org/r/20260518002800.1361430-1-michael.bommarito@gmail.com Link: https://lore.kernel.org/r/20260520135034.1060859-1-michael.bommarito@gmail.com Link: https://lore.kernel.org/r/20260521000555.3712030-1-michael.bommarito@gmail.com Assisted-by: Claude:claude-opus-4-7 Assisted-by: Codex:gpt-5-5-xhigh Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysBluetooth: hci_sync: reject oversized Broadcast Announcement prependYuqi Xu1-0/+5
commit 5c65b96b549ea2dcfde497436bf9e048deb87758 upstream. Existing advertising instances can already hold the maximum extended advertising payload. When hci_adv_bcast_annoucement() prepends the Broadcast Announcement service data to that payload, the combined data may no longer fit in the temporary buffer used to rebuild the advertising data. Reject that case before copying the existing payload and report the failure through the device log. This keeps the existing advertising data intact and avoids overrunning the temporary buffer. Fixes: 5725bc608252 ("Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Yuqi Xu <xuyq21@lenovo.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnetfilter: nft_meta_bridge: fix stale stack leak via IIFHWADDR registerDavide Ornaghi1-0/+2
commit c7d573551f9286100a055ef696cde6af54549677 upstream. NFT_META_BRI_IIFHWADDR declares its destination register with len = ETH_ALEN (6 bytes), which the register-init tracking rounds up to two 32-bit registers (8 bytes). nft_meta_bridge_get_eval() then does memcpy(dest, br_dev->dev_addr, ETH_ALEN), writing only 6 bytes and leaving the upper 2 bytes of the second register as uninitialised nft_do_chain() stack. A downstream load of that register span leaks those stale bytes to userspace. Zero the second register before the memcpy so the full declared span is written. Fixes: cbd2257dc96e ("netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support") Cc: stable@vger.kernel.org Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnetfilter: nft_tunnel: fix use-after-free on object destroyTristan Madani1-1/+1
commit c32b26aaa2f9216520a38b3f4bfeec846eb3eb8a upstream. nft_tunnel_obj_destroy() calls metadata_dst_free() which directly kfree()s the metadata_dst, ignoring the dst_entry refcount. Packets that took a reference via dst_hold() in nft_tunnel_obj_eval() and are still queued (e.g. in a netem qdisc) are left with a dangling pointer. When these packets are eventually dequeued, dst_release() operates on freed memory. Replace metadata_dst_free() with dst_release() so the metadata_dst is freed only after all references are dropped. The dst subsystem already handles metadata_dst cleanup in dst_destroy() when DST_METADATA is set. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Cc: stable@vger.kernel.org Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysipv6: Fix a potential NPD in cleanup_prefix_route()Ido Schimmel1-2/+4
[ Upstream commit b70c687b7cf267fb08586667a3946c8851cad672 ] addrconf_get_prefix_route() can return the fib6_null_entry sentinel entry which has a NULL fib6_table pointer. Therefore, before setting the route's expiration time, check that we are not working with this entry, as otherwise a NPD will be triggered [1]. Note that the other callers of addrconf_get_prefix_route() are not susceptible to this bug: 1. addrconf_prefix_rcv(): Requests a route with the 'RTF_ADDRCONF | RTF_PREFIX_RT' flags which are not set on fib6_null_entry. 2. modify_prefix_route(): Fixed by commit a747e02430df ("ipv6: avoid possible NULL deref in modify_prefix_route()"). 3. __ipv6_ifa_notify(): Calls ip6_del_rt() which specifically checks for fib6_null_entry and returns an error. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] [...] Call Trace: <TASK> __kasan_check_byte (mm/kasan/common.c:573) lock_acquire.part.0 (kernel/locking/lockdep.c:5842 (discriminator 1)) _raw_spin_lock_bh (kernel/locking/spinlock.c:182 (discriminator 1)) cleanup_prefix_route (net/ipv6/addrconf.c:1280) ipv6_del_addr (net/ipv6/addrconf.c:1342) inet6_addr_del.isra.0 (net/ipv6/addrconf.c:3119) inet6_rtm_deladdr (net/ipv6/addrconf.c:4812) rtnetlink_rcv_msg (net/core/rtnetlink.c:6997) netlink_rcv_skb (net/netlink/af_netlink.c:2555) netlink_unicast (net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1899) __sock_sendmsg (net/socket.c:802 (discriminator 4)) ____sys_sendmsg (net/socket.c:2698) ___sys_sendmsg (net/socket.c:2752) __sys_sendmsg (net/socket.c:2784) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.") Reported-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Reviewed-by: David Ahern <dahern@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260609145448.768318-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnetfilter: nft_exthdr: fix register tracking for F_PRESENT flagFlorian Westphal1-0/+3
[ Upstream commit 772cecf198da732faebb5dcfc46d66a505be8495 ] nft_exthdr_init() passes user-controlled priv->len to nft_parse_register_store(), which marks that many bytes in the register bitmap as initialized. However, when NFT_EXTHDR_F_PRESENT is set, the eval paths write only 1 byte (nft_reg_store8) or 4 bytes (*dest = 0 on TCP/DCCP error path). When len > 4, registers beyond the first are never written, retaining uninitialized stack data from nft_regs. Bail out if userspace requests too much data when F_PRESENT is set. Reported-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Fixes: c078ca3b0c5b ("netfilter: nft_exthdr: Add support for existence check") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnetfilter: nf_log: validate MAC header was set before dumping itXiang Mei1-2/+2
[ Upstream commit a84b6fedbc97078788be78dbdd7517d143ad1a77 ] The fallback path of dump_mac_header() guards the MAC header access only with "skb->mac_header != skb->network_header", without checking skb_mac_header_was_set(). When the MAC header is unset, mac_header is 0xffff, so the test passes and skb_mac_header(skb) returns skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads dev->hard_header_len bytes out of bounds into the kernel log. This is reachable via the netdev logger: nf_log_unknown_packet() calls dump_mac_header() unconditionally, and an skb sent through AF_PACKET with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still unset (__dev_queue_xmit(), which would reset it, is bypassed). Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already uses, and replace the open-coded MAC header length test with skb_mac_header_len(). Only skbs with an unset MAC header are affected; valid ones are dumped as before. BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831) Read of size 1 at addr ffff88800ea49d3f by task exploit/148 Call Trace: kasan_report (mm/kasan/report.c:595) dump_mac_header (net/netfilter/nf_log_syslog.c:831) nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963) nf_log_packet (net/netfilter/nf_log.c:260) nft_log_eval (net/netfilter/nft_log.c:60) nft_do_chain (net/netfilter/nf_tables_core.c:285) nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307) nf_hook_slow (net/netfilter/core.c:619) nf_hook_direct_egress (net/packet/af_packet.c:257) packet_xmit (net/packet/af_packet.c:280) packet_sendmsg (net/packet/af_packet.c:3114) __sys_sendto (net/socket.c:2265) Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header") Reported-by: Weiming Shi <bestswngs@gmail.com> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnetfilter: x_tables: avoid leaking percpu counter pointersKyle Zeng3-27/+18
[ Upstream commit f7f2fbb0e893a0238dc464f8d8c0f5609bec584f ] The native and compat get-entries paths copy the fixed rule entry header from the kernelized rule blob to userspace before overwriting the entry's counter fields with a sanitized counter snapshot. On SMP kernels, entry->counters.pcnt contains the percpu allocation address used by x_tables rule counters. A caller can provide a userspace buffer that faults during the initial fixed-header copy after pcnt has been copied but before the later sanitized counter copy runs. The syscall then returns -EFAULT while leaving the raw percpu pointer in userspace. Copy only the fixed entry prefix before counters from the kernelized rule blob, then copy the sanitized counter snapshot into the counter field. Apply this ordering to the IPv4, IPv6, and ARP native and compat get-entries implementations so a fault cannot expose the internal percpu counter pointer. Fixes: 71ae0dff02d7 ("netfilter: xtables: use percpu rule counters") Signed-off-by: Kyle Zeng <kylebot@openai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnetfilter: nf_conntrack: destroy stale expectfn expectations on unregisterWeiming Shi4-0/+24
[ Upstream commit c3009418f9fa1dcb3eb86f4d8c92583537b5faa3 ] NAT helpers such as nf_nat_h323 store a raw pointer to module text in exp->expectfn (e.g. ip_nat_q931_expect). nf_ct_helper_expectfn_unregister() only unlinks the callback descriptor and never walks the expectation table, so an expectation pending at module removal survives with a dangling exp->expectfn into freed module text. When the expected connection arrives, init_conntrack() invokes exp->expectfn(), now a stale pointer into the unloaded module. Reproduced on a KASAN build by loading the H.323 helpers, creating a Q.931 expectation, unloading nf_nat_h323, then connecting to the expected port: Oops: int3: 0000 [#1] SMP KASAN NOPTI RIP: 0010:0xffffffffa06102d1 init_conntrack.isra.0 (net/netfilter/nf_conntrack_core.c:1862) nf_conntrack_in (net/netfilter/nf_conntrack_core.c:2049) ipv4_conntrack_local (net/netfilter/nf_conntrack_proto.c:223) nf_hook_slow (net/netfilter/core.c:619) __ip_local_out (net/ipv4/ip_output.c:120) __tcp_transmit_skb (net/ipv4/tcp_output.c:1715) tcp_connect (net/ipv4/tcp_output.c:4374) tcp_v4_connect (net/ipv4/tcp_ipv4.c:345) __sys_connect (net/socket.c:2167) Modules linked in: nf_conntrack_h323 [last unloaded: nf_nat_h323] Reaching the dangling state requires CAP_SYS_MODULE in the initial user namespace to remove a NAT helper that still has live expectations, so this is a robustness fix; leaving an expectation pointing at freed text is wrong regardless. Add nf_ct_helper_expectfn_destroy(), which walks the expectation table and drops every expectation whose ->expectfn matches the descriptor being torn down. Call it from each NAT helper's exit path after the existing RCU grace period, so no expectation outlives the code it points at and no extra synchronize_rcu() is introduced. With the fix, the same reproducer runs to completion without the Oops. Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") Reported-by: Xiang Mei <xmei5@asu.edu> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnetfilter: revalidate bridge portsFlorian Westphal4-18/+89
[ Upstream commit ccb9fd4b87538ccf19ccff78ee26700526d94867 ] ebt_redirect_tg() dereferences br_port_get_rcu() return without a NULL check, causing a kernel panic when the bridge port has been removed between the original hook invocation and an NFQUEUE reinject. A mere NULL check isn't sufficient, however. As sashiko review points out userspace can not only remove the port from the bridge, it could also place the device in a different virtual device, e.g. macvlan. If this happens, we must drop the packet, there is no way for us to reinject it into the bridge path. Switch to _upper API, we don't need the bridge port structure. Also, this fix keeps another bug intact: Both nfnetlink_log and nfnetlink_queue use CONFIG_BRIDGE_NETFILTER too aggressive, which prevents certain logging features when queueing in bridge family: NETFILTER_FAMILY_BRIDGE can be enabled while the old CONFIG_BRIDGE_NETFILTER cruft is off. Fixes tag is a common ancestor, this was always broken. Fixes: f350a0a87374 ("bridge: use rx_handler_data pointer to store net_bridge_port pointer") Reported-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysrds: mark snapshot pages dirty in rds_info_getsockopt()Breno Leitao1-1/+1
[ Upstream commit 512db8267b73a220a64180d95ab5eebe7c4964a8 ] rds_info_getsockopt() pins the destination user pages with FOLL_WRITE and the RDS_INFO_* producers memcpy the snapshot into them through kmap_atomic(). Because that copy goes through the kernel direct map, the dirty bit on the user PTE is never set, so unpin_user_pages() releases the pages without marking them dirty. A file-backed destination page can then be reclaimed without writeback, silently discarding the copied data. Use unpin_user_pages_dirty_lock() with make_dirty=true so the modified pages are marked dirty before they are unpinned. Fixes: a8c879a7ee98 ("RDS: Info and stats") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260608-rds_fix-v1-1-006c88543408@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysip6_vti: fix incorrect tunnel matching in vti6_tnl_lookup()Eric Dumazet1-0/+2
[ Upstream commit a5c0359f5cbc51a2e2b114d6041e0f3c73f903e9 ] In vti6_tnl_lookup(), when an exact match for a tunnel fails, the code falls back to searching for wildcard tunnels: - Tunnels matching the packet's local address, with any remote address wildcard remote). - Tunnels matching the packet's remote address, with any local address (wildcard local). However, vti6 stores all these different types of tunnels in the same hash table (ip6n->tnls_r_l) prone to hash collisions. The bug is that the fallback search loops in vti6_tnl_lookup() were missing checks to ensure that the candidate tunnel actually has a wildcard address. Fixes: fbe68ee87522 ("vti6: Add a lookup method for tunnels with wildcard endpoints.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20260608164613.933023-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnet/rds: fix NULL deref in rds_ib_send_cqe_handler() on masked atomic completionWeiming Shi1-0/+2
[ Upstream commit 34080db3e70ddf94c38512ad2331e3c3afca6cc1 ] rds_ib_xmit_atomic() always programs a masked atomic opcode (IB_WR_MASKED_ATOMIC_CMP_AND_SWP or IB_WR_MASKED_ATOMIC_FETCH_AND_ADD) for every RDS atomic cmsg. But the completion-side switch in rds_ib_send_unmap_op() only handles the non-masked opcodes, so a masked atomic completion falls through to default and returns rm == NULL while send->s_op is left set. rds_ib_send_cqe_handler() then dereferences the NULL rm via rm->m_final_op, oopsing in softirq context. An unprivileged AF_RDS sendmsg() of an atomic cmsg over an active RDS/IB connection triggers it; on hardware that natively accepts masked atomics (mlx4, mlx5) no extra setup is needed. RDS/IB: rds_ib_send_unmap_op: unexpected opcode 0xd in WR! Oops: general protection fault [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000190-0x0000000000000197] RIP: rds_ib_send_cqe_handler+0x25c/0xb10 (net/rds/ib_send.c:282) Call Trace: <IRQ> rds_ib_send_cqe_handler (net/rds/ib_send.c:282) poll_scq (net/rds/ib_cm.c:274) rds_ib_tasklet_fn_send (net/rds/ib_cm.c:294) tasklet_action_common (kernel/softirq.c:943) handle_softirqs (kernel/softirq.c:573) run_ksoftirqd (kernel/softirq.c:479) </IRQ> Kernel panic - not syncing: Fatal exception in interrupt Handle the masked atomic opcodes in the same case as the non-masked ones: they map to the same struct rds_message.atomic union member, so the existing container_of()/rds_ib_send_unmap_atomic() body is correct for them. Fixes: 20c72bd5f5f9 ("RDS: Implement masked atomic operations") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260606192447.1179255-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnet: guard timestamp cmsgs to real error queue skbsKyle Zeng2-8/+9
[ Upstream commit 1ee90b77b727df903033db873c75caac5c27ec98 ] skb_is_err_queue() treats PACKET_OUTGOING as the sole marker for an skb from sk_error_queue. That assumption is not true for AF_PACKET sockets: outgoing packet taps are also delivered to packet sockets with skb->pkt_type == PACKET_OUTGOING, but their skb->cb is owned by AF_PACKET instead of struct sock_exterr_skb. If such an skb is received with timestamping enabled, the generic timestamp cmsg path can read AF_PACKET control-buffer state as sock_exterr_skb::opt_stats. With SO_RXQ_OVFL enabled, the packet drop counter overlaps opt_stats. An odd drop count makes the path emit SCM_TIMESTAMPING_OPT_STATS with skb->len and skb->data. For non-linear skbs this copies past the linear head and can trigger hardened usercopy or disclose adjacent heap contents. Keep skb_is_err_queue() local to net/socket.c, but make it verify that the PACKET_OUTGOING marker is paired with the sock_rmem_free destructor installed by sock_queue_err_skb(). AF_PACKET receive skbs use normal receive ownership and no longer pass as error-queue skbs, while legitimate sk_error_queue entries keep the PACKET_OUTGOING marker and sock_rmem_free ownership. Fixes: 8605330aac5a ("tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs") Signed-off-by: Kyle Zeng <kylebot@openai.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260607021819.49698-1-kylebot@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 dayssctp: validate embedded INIT chunk and address list lengths in cookieXin Long2-3/+17
[ Upstream commit 6f4c80a2a7e6d06753b89a578b710a2499a5e62b ] sctp_unpack_cookie() only checked that the embedded INIT chunk length did not exceed the remaining cookie payload, but did not ensure that the INIT chunk is large enough to contain a complete INIT header. A malformed COOKIE_ECHO can therefore carry a truncated INIT chunk whose length field is smaller than sizeof(struct sctp_init_chunk). Later, sctp_process_init() accesses INIT parameters unconditionally, which may lead to out-of-bounds reads. In addition, raw_addr_list_len is not fully validated against the remaining cookie payload. When cookie authentication is disabled, an attacker can supply an oversized raw_addr_list_len and cause sctp_raw_to_bind_addrs() to read beyond the end of the cookie. The address parser also lacks sufficient bounds checks for parameter headers and lengths, allowing malformed address parameters to trigger out-of-bounds reads. Fix this by: - requiring the embedded INIT chunk length to be at least sizeof(struct sctp_init_chunk); - validating that the INIT chunk and raw address list together fit within the cookie payload; - verifying sufficient data exists for each address parameter header and payload before parsing it. Note that sctp_verify_init() must be called after sctp_unpack_cookie() and before sctp_process_init() when cookie authentication is disabled. This will be addressed in a separate patch. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/75af23a89adf881a0895d511775e4770da367cbf.1780873427.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysip6_vti: set netns_immutable on the fallback device.Eric Dumazet1-0/+1
[ Upstream commit d289d5307762d1838aaece22c6b6fcad9e8865f9 ] john1988 and Noam Rathaus reported that vti6_init_net() does not set the netns_immutable flag on the per-netns fallback tunnel device (ip6_vti0). Other similar tunnel drivers (like ip6_tunnel, sit, ip6_gre, and ip_tunnel) correctly set this flag during their fallback device initialization to prevent them from being moved to another network namespace. Fixes: 61220ab34948 ("vti6: Enable namespace changing") Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20260608155918.787644-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 dayssctp: fix uninit-value in __sctp_rcv_asconf_lookup()Michael Bommarito1-0/+8
[ Upstream commit f8373d7090b745728de66308deeecc67e8d319ce ] __sctp_rcv_asconf_lookup() in net/sctp/input.c only checks that the ASCONF chunk can hold the ADDIP header and a parameter header, then calls af->from_addr_param(), which reads the full address (16 bytes for IPv6) trusting the parameter's declared length. An unauthenticated peer can send a truncated trailing ASCONF chunk that declares an IPv6 address parameter but stops after the 4-byte parameter header; reached from the no-association lookup path, from_addr_param() then reads uninitialized bytes past the parameter. Impact: an unauthenticated SCTP peer makes the receive path read up to 16 bytes of uninitialized memory past a truncated ASCONF address parameter. The sibling __sctp_rcv_init_lookup() bounds parameters with sctp_walk_params(); this path open-codes the fetch and omits the bound. Verify the whole address parameter lies within the chunk before from_addr_param() reads it, the same class of fix as commit 51e5ad549c43 ("net: sctp: fix KMSAN uninit-value in sctp_inq_pop"). Fixes: df2185771439 ("[SCTP]: Update association lookup to look at ASCONF chunks as well") Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260608122234.459098-1-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysesp: fix page frag reference leak on skb_to_sgvec failureAlessandro Schino2-12/+22
[ Upstream commit 2982e599fff6faa21c8df147d96fc7af6c1a2f24 ] In esp_output_tail(), when esp->inplace is false, the old skb page frags are replaced with a new page from the xfrm page_frag cache. The source scatterlist (sg) is built from the old frags before the replacement, and esp_ssg_unref() is responsible for releasing the old page references after the crypto operation completes. However, if the second skb_to_sgvec() call (which builds the destination scatterlist from the new page) fails, the code jumps to error_free which only calls kfree(tmp). The old page frag references captured in the source scatterlist are never released: 1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page) 2. nr_frags is set to 1 and frag[0] is replaced with the new page 3. Second skb_to_sgvec() fails -> goto error_free 4. kfree(tmp) frees the sg[] memory but old frags are not unref'd 5. kfree_skb() only releases frag[0] (the new page), not the old ones Fix this by adding a bool parameter to esp_ssg_unref() that, when true, unconditionally unrefs the source scatterlist frags without checking req->src and req->dst, since those fields are not yet initialized by aead_request_set_crypt() at the point of the error. Existing callers pass false to preserve the original behavior. The same issue exists in both esp4 and esp6 as the code is identical. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Alessandro Schino <7991aleschino@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Stable-dep-of: 26aad08a9289 ("esp: fix page frag reference leak on skb_to_sgvec failure") Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnet: openvswitch: fix possible kfree_skb of ERR_PTRAdrian Moreno1-0/+1
[ Upstream commit ee30dd2909d8b98619f4341c70ec8dc8e155ab02 ] After the patch in the "Fixes" tag, the allocation of the "reply" skb can happen either before or after locking the ovs_mutex. However, error cleanups still follow the classical reversed order, assuming "reply" is allocated before locking: it is freed after unlocking. If "reply" allocation happens after locking the mutex and it fails, "reply" is left with an ERR_PTR, and execution jumps to the correspondent cleanup stage which will try to free an invalid pointer. Fix this by setting the pointer to NULL after having saved its error value. Fixes: 893f139b9a6c ("openvswitch: Minimize ovs_flow_cmd_new|set critical sections.") Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://patch.msgid.link/20260604121946.942164-1-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysipv6: sit: reload inner IPv6 header after GSO offloadsKyle Zeng1-0/+1
[ Upstream commit f0e42f0c4337b1f220de1ddd63f47197c7dee4de ] ipip6_tunnel_xmit() caches the inner IPv6 header pointer at function entry and continues using it after iptunnel_handle_offloads(). For GSO skbs, iptunnel_handle_offloads() calls skb_header_unclone(). When the skb header is cloned, skb_header_unclone() can call pskb_expand_head(), which may move the skb head. The pskb_expand_head() contract requires pointers into the skb header to be reloaded after the call. If the later skb_realloc_headroom() branch is not taken, SIT uses the stale iph6 pointer to read the inner hop limit and DS field. That can read from a freed skb head after the old head's remaining clone is released. Reload iph6 after the offload helper succeeds and before subsequent reads from the inner IPv6 header. Keep the existing reload after skb_realloc_headroom(), since that branch can also replace the skb. Fixes: 14909664e4e1 ("sit: Setup and TX path for sit/UDP foo-over-udp encapsulation") Signed-off-by: Kyle Zeng <kylebot@openai.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+6eb9ca986d80f6f88cf9@syzkaller.appspotmail.com Link: https://patch.msgid.link/20260605073448.6524-1-kylebot@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnet: qrtr: fix refcount saturation and potential UAF in qrtr_port_removeMingyu Wang1-2/+2
[ Upstream commit a2171131ecda1ed61a594a1eb715e75fdad0fef5 ] In qrtr_port_remove(), the socket reference count is decremented via __sock_put() before the port is removed from the qrtr_ports XArray and before the RCU grace period elapses. This breaks the fundamental RCU update paradigm. It exposes a race window where a concurrent RCU reader (such as qrtr_reset_ports() or qrtr_port_lookup()) can obtain a pointer to the socket from the XArray, and attempt to call sock_hold() on a socket whose reference count has already dropped to zero. This exact race condition was hit during syzkaller fuzzing, leading to the following refcount saturation warning and a potential Use-After-Free: refcount_t: saturated; leaking memory. WARNING: CPU: 3 PID: 1273 at lib/refcount.c:22 refcount_warn_saturate+0xae/0x1d0 Modules linked in: qrtr(+) bochs drm_shmem_helper ... Call Trace: <TASK> qrtr_reset_ports net/qrtr/af_qrtr.c:768 [inline] [qrtr] __qrtr_bind.isra.0+0x48b/0x570 net/qrtr/af_qrtr.c:805 [qrtr] qrtr_bind+0x17d/0x210 net/qrtr/af_qrtr.c:901 [qrtr] kernel_bind+0xe4/0x120 net/socket.c:3592 qrtr_ns_init+0x1a6/0x380 net/qrtr/ns.c:715 [qrtr] qrtr_proto_init+0x3b/0xff0 net/qrtr/af_qrtr.c:169 [qrtr] do_one_initcall+0xf5/0x5e0 init/main.c:1283 ... </TASK> Fix this by deferring the reference count decrement until after the xa_erase() and the synchronize_rcu() complete. (Note: The v1 of this patch incorrectly replaced __sock_put() with sock_put(). As Simon Horman pointed out, the callers of qrtr_port_remove() still hold a reference to the socket, so freeing the socket memory here would lead to a subsequent UAF in the caller. Thus, the __sock_put() is kept, but only repositioned to close the RCU race.) Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260604064801.1180388-1-w15303746062@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnetdev: fix double-free in netdev_nl_bind_rx_doit()Jakub Kicinski1-3/+1
[ Upstream commit c849de7d8757a7af801fc4a4058f71d481d367f2 ] Sashiko flags that genlmsg_reply() always consumes the skb. The error path calls nlmsg_free(rsp) so we can't jump directly to it. Let's not unbind, just propagate the error to the user. This is the typical way of handling genlmsg_reply() failures. They shouldn't happen unless user does something silly like calling the kernel with an already-full rcvbuf. Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice") Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>