path: root/include/net
Age | Commit message | Author | Files | Lines
6 days | Merge tag 'net-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds | 9 | -14/+44
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and IPsec.

  Current release - regressions:
   - do not acquire dev->tx_global_lock in netdev_watchdog_up()
   - ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
   - fix deadlock in nested UP notifier events

  Current release - new code bugs:
   - eth:
      - cn20k: fix subbank free list indexing for search order
      - airoha: fix BQL underflow in shared QDMA TX ring

  Previous releases - regressions:
   - netfilter:
      - flowtable: fix offloaded ct timeout never being extended
      - nf_conncount: prevent connlimit drops for early confirmed ct

  Previous releases - always broken:
   - require CAP_NET_ADMIN in the originating netns when modifying cross-netns devices
   - report NAPI thread PID in the caller's pid namespace
   - mac802154: fix dirty frag in in-place crypto for IOT radios
   - sctp: hold socket lock when dumping endpoints in sctp_diag, avoid an overflow
   - eth: gve: fix header buffer corruption with header-split and HW-GRO
   - af_key: initialize alg_key_len for IPComp states, prevent OOB read"

* tag 'net-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (213 commits)
  selftests: bonding: add a test for VLAN propagation over a bonded real device
  vlan: defer real device state propagation to netdev_work
  net: add the driver-facing netdev_work scheduling API
  net: turn the rx_mode work into a generic netdev_work facility
  net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
  rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to rotate
  rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
  rxrpc: Fix socket notification race
  rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
  rxrpc: Fix oob challenge leak in cleanup after notification failure
  rxrpc: Fix the reception of a reply packet before data transmission
  afs: Fix uncancelled rxrpc OOB message handler
  afs: Fix further netns teardown to cancel the preallocation charger
  rxrpc: Fix double unlock in rxrpc_recvmsg()
  rxrpc: Fix leak of connection from OOB challenge
  rxrpc: Fix ACKALL packet handling
  net: hns3: differentiate autoneg default values between copper and fiber
  net: hns3: fix permanent link down deadlock after reset
  net: hns3: refactor MAC autoneg and speed configuration
  net: hns3: unify copper port ksettings configuration path
  ...
7 days | Merge tag 'nf-26-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf | Jakub Kicinski | 2 | -0/+5
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Add a workaround to avoid a possible crash if nf_nat and nft_chain_nat are built-in and nf_nat fails to register, which lets nft_chain_nat access the wrong per-netns area. This crash is specific to all-built-in configurations. From Matias Krause.

2) Revisit the conncount GC optimization for confirmed conntracks: skip the GC round if IPS_ASSURED is set. This addresses a corner-case scenario involving locally generated traffic. No crash, just a functionality fix. From Fernando F. Mancera.

3) Validate iph->ihl in flowtable IPIP tunnel support, from Lorenzo Bianconi. This is a sanity check that bounces malformed IPIP packets back to the classic forwarding path.

4) Kdoc fixes for x_tables.h, from Randy Dunlap.

5) Use info->options so that nft_synproxy_tcp_options() stays on the same local snapshot; otherwise the eval path can observe an inconsistent mix of MSS and timestamps. From Runyu Xiao.

6) Add conntrack_sctp_collision.sh to cover SCTP INIT collisions. From Yi Chen.

7) In nft_compat, do not allow NFPROTO_UNSPEC targets if the family is NFPROTO_BRIDGE. Otherwise nonsensical targets such as xt_nat can be used, leading to a crash. From Florian Westphal.

8) Add a selftest for queueing from the bridge family. From Florian Westphal.

9) Do not allow resetting a conntrack helper via ctnetlink. This feature predates the conntrack-tools and is not used; since I don't have a use case for it, I prefer removing it to fixing it.

10) Add a deprecation warning for the IPv4-only conntrack helpers for PPTP and IRC. From Florian Westphal.

11) Store the master tuple in the expectation object and use it; otherwise SLAB_TYPESAFE_BY_RCU rules allow ctnetlink to display incorrect master tuple information.

12) Run expectation eviction when inserting an expectation with no helper. This is a fix for the nft_ct custom expectation support.

13) Fix nft_ct custom expectation timeouts: userspace provides the timeout in milliseconds, but the kernel assumed seconds. From Florian Westphal.

14) At helper registration, cap the maximum number of expectations per class at 255 per master conntrack. This restricts the number of expectations per master conntrack, which could otherwise be an issue for the new lazy GC expectation approach.

* tag 'nf-26-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_conntrack_helper: cap maximum number of expectation at helper registration
  netfilter: nft_ct: expectation timeouts are passed in milliseconds
  netfilter: nf_conntrack_expect: run expectation eviction with no helper
  netfilter: nf_conntrack_expect: store master_tuple in expectation
  netfilter: conntrack: add deprecation warnings for irc and pptp trackers
  netfilter: ctnetlink: do not allow to reset helper on existing conntrack
  selftests: nft_queue.sh: add a bridge queue test
  netfilter: nft_compat: ebtables emulation must reject non-bridge targets
  selftests: netfilter: conntrack_sctp_collision.sh: Introduce SCTP INIT collision test
  netfilter: nft_synproxy: stop bypassing the priv->info snapshot
  netfilter: x_tables.h: fix all kernel-doc warnings
  netfilter: flowtable: Validate iph->ihl in nf_flow_ip4_tunnel_proto()
  netfilter: nf_conncount: prevent connlimit drops for early confirmed ct
  netfilter: nf_nat: avoid invalid nat_net pointer use on failed nf_nat_init()
====================

Link: https://patch.msgid.link/20260623221548.701545-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days | Merge tag 'ipsec-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec | Jakub Kicinski | 1 | -4/+11
Steffen Klassert says:

====================
pull request (net): ipsec 2026-06-22

1) xfrm: use compat translator only for u64 alignment mismatch
   Gate the XFRM_USER_COMPAT translator on COMPAT_FOR_U64_ALIGNMENT so 32-bit compat tasks on arches whose 32-bit ABI already matches the native 64-bit layout are no longer rejected with -EOPNOTSUPP. From Sanman Pradhan.

2) net: af_key: initialize alg_key_len for IPComp states
   Initialize the alg_key_len to 0 in the IPComp branch of pfkey_msg2xfrm_state() so an uninitialized value cannot drive xfrm_alg_len() into a slab-out-of-bounds kmemdup during XFRM_MSG_MIGRATE. From Zijing Yin.

3) xfrm: Fix dev use-after-free in xfrm async resumption
   Stash the original skb->dev and extend the RCU critical section across xfrm_rcv_cb() and transport_finish() to prevent a tunnel-device UAF and original-device refcount leak when a callback replaces skb->dev. From Dong Chenchen.

4) xfrm: Fix xfrm state cache insertion race
   Move the state-validity check inside xfrm_state_lock in the input state cache insertion path so a state cannot be killed between the check and the insert. From Herbert Xu.

5) xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
   Add READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count and xfrm_policy_default to silence the KCSAN data race reported on net->xfrm.policy_count. From Eric Dumazet.

6) espintcp: use sk_msg_free_partial to fix partial send
   Replace the manual skmsg accounting in espintcp with sk_msg_free_partial() so the skmsg stays consistent on every iteration and the partial-send accounting bugs go away. From Sabrina Dubroca.

7) xfrm: validate selector family and prefixlen during match
   Reject mismatched address families in xfrm_selector_match() and bound prefixlen in addr4_match()/addr_match() to prevent the shift-out-of-bounds syzbot reported when an AF_UNSPEC selector with a large prefixlen is matched against an IPv4 flow. From Eric Dumazet.

* tag 'ipsec-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: validate selector family and prefixlen during match
  espintcp: use sk_msg_free_partial to fix partial send
  xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
  xfrm: Fix xfrm state cache insertion race
  xfrm: Fix dev use-after-free in xfrm async resumption
  net: af_key: initialize alg_key_len for IPComp states
  xfrm: use compat translator only for u64 alignment mismatch
====================

Link: https://patch.msgid.link/20260622075726.29685-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days | netfilter: nf_conntrack_expect: store master_tuple in expectation | Pablo Neira Ayuso | 1 | -0/+1
Store the master conntrack tuple in the expectation, since exp->master might refer to a different conntrack when accessed from an RCU read-side section, due to typesafe RCU rules.

Fixes: 02a3231b6d82 ("netfilter: nf_conntrack_expect: store netns and zone in expectation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
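To illustrate why a copy helps, here is a minimal sketch in plain C, assuming heavily simplified stand-in types: `struct tuple`, `struct conn`, `struct expectation` and `expect_init()` are hypothetical, not the kernel's `nf_conntrack_tuple` or `nf_conntrack_expect`. With typesafe RCU (`SLAB_TYPESAFE_BY_RCU`), a freed object's memory may be reused for another object of the same type while a reader still holds a pointer to it, so data read through `exp->master` can belong to a different conntrack. A snapshot taken when the expectation is created does not change underneath the reader.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

struct conn {
	struct tuple tuple;          /* rewritten if this slot is reused */
};

struct expectation {
	struct conn *master;         /* may end up pointing at a recycled object */
	struct tuple master_tuple;   /* snapshot taken at creation time */
};

static void expect_init(struct expectation *exp, struct conn *master)
{
	assert(master != NULL);
	exp->master = master;
	/* Copy the tuple now, while master is known to be the right object. */
	memcpy(&exp->master_tuple, &master->tuple, sizeof(exp->master_tuple));
}
```

Simulating slot reuse by rewriting the conntrack's tuple after `expect_init()` leaves `master_tuple` intact, while a read through `master` shows the new owner.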
9 days | netfilter: conntrack: add deprecation warnings for irc and pptp trackers | Florian Westphal | 1 | -0/+4
IRC Direct client-to-client requires plaintext. IRC over TLS should be preferred, making this helper ineffective. Add a deprecation warning and update the help text to better reflect that this is needed for the DCC extension, not IRC itself. PPTP is esoteric these days and it is the only helper that requires the destroy callback in the conntrack helper API. Removal would simplify the conntrack core. Both helpers are IPv4 only. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 days | ipv4: fib: Don't ignore error route in local/main tables. | Kuniyuki Iwashima | 1 | -4/+3
When CONFIG_IP_MULTIPLE_TABLES is enabled but no rule is added, fib_lookup() performs route lookup directly on two tables.

Since the first lookup does not properly bail out, the result of an error route in the merged local/main table could be overwritten by another route in the default table:

  # unshare -n
  # ip link set lo up
  # ip route add 192.168.0.0/24 dev lo table 253
  # ip route add unreachable 192.168.0.0/24
  # ip route get 192.168.0.1
  192.168.0.1 dev lo table default uid 0
      cache <local>

Once a random rule is added, the error route is respected:

  # ip rule add table 0
  # ip rule del table 0
  # ip route get 192.168.0.1
  RTNETLINK answers: No route to host

Let's fix the inconsistent behaviour.

Fixes: f4530fa574df ("ipv4: Avoid overhead when no custom FIB rules are installed.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260619212753.3367244-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
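A minimal model of the intended lookup order, assuming each table lookup yields one of three outcomes (miss, usable route, error route). The types and `lookup_two_tables()` are hypothetical and much simpler than the kernel's `fib_lookup()`; the point is only that an error route found in the first table must end the lookup instead of falling through to the default table. Table 253 is the default table, as in the `ip route` example; 254 is the main table.

```c
#include <assert.h>

/* Hypothetical outcome of looking up one routing table. */
enum lookup_res { LOOKUP_MISS, LOOKUP_OK, LOOKUP_UNREACHABLE };

struct table {
	enum lookup_res res;   /* what this table returns for the query */
	int tbl_id;
};

struct fib_res {
	enum lookup_res res;
	int tbl_id;
};

/*
 * Look up the merged local/main table first, then the default table.
 * Any non-miss result from the first table, including an error route
 * such as "unreachable", ends the lookup.
 */
static struct fib_res lookup_two_tables(const struct table *main_tbl,
					const struct table *default_tbl)
{
	struct fib_res r = { LOOKUP_MISS, -1 };

	assert(main_tbl && default_tbl);
	if (main_tbl->res != LOOKUP_MISS) {
		r.res = main_tbl->res;
		r.tbl_id = main_tbl->tbl_id;
		return r;            /* do not let the default table override it */
	}
	if (default_tbl->res != LOOKUP_MISS) {
		r.res = default_tbl->res;
		r.tbl_id = default_tbl->tbl_id;
	}
	return r;
}
```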
9 days | Merge tag 'nf-26-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf | Jakub Kicinski | 3 | -3/+16
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net. This batch mixes fixes for real crashes with trivial/correctness fixes. There is also a rework of the conntrack expectation timeout strategy to deal with a possible race when removing an expectation.

1) Fix the incorrect flowtable timeout extension for entries in hardware offload, from Adrian Bente. This corrects a functional defect; no crash.

2) Hold a reference to the device under the fake dst in br_netfilter, from Haoze Xie. This fixes a possible UaF if the device is removed while a packet is sitting in nfqueue.

3) Reject template conntracks in xt_cluster; otherwise uninitialized conntrack fields can be accessed, leading to a WARN_ON due to an unset layer 3 protocol. From Wyatt Feng.

4) Make sure the IPv6 tunnel header is in the linear skb data area before pulling. While at it, remove incomplete NEXTHDR_DEST support. From Lorenzo Bianconi. Without this, a crash is possible if the IPv4 header is not in the linear area.

5) Use test_bit_acquire() in ipset hash types to avoid reordering of subsequent memory accesses. This addresses an LLM-generated report; no crash has been observed. From Jozsef Kadlecsik.

6) Use test_bit_acquire() in ipset bitmap types too, for the same reason as in the previous patch, from Jozsef Kadlecsik.

7) Call kfree_rcu() after rcu_assign_pointer() to address a possible UaF if kfree_rcu() runs immediately, which to my understanding never happens. Never observed in practice; reported by an LLM. Also from Jozsef Kadlecsik.

8) Use disable_delayed_work_sync() instead of cancel_delayed_work_sync() so that the ipset GC handler cannot re-queue work, as reported by an LLM. From Jozsef Kadlecsik. This is a correctness fix.

9) Restore the check in nft_payload that rejects payload offsets exceeding 65535 bytes. From Florian Westphal. This fixes a silent truncation; not a big deal, but it is better to be strict and reject it.

10) Validate that NFT_META_BRI_IIFHWADDR can only be used from bridge prerouting. From Florian Westphal. Harmless, but it could allow reading bytes from skb->cb.

11) Zero out the destination hardware address during flowtable path setup, also from Florian. This is a correctness fix; an LLM pointed out that an infoleak is possible, but the topology needed to trigger it is not clear.

12) Skip IPv4 options, if present, when building the IPv4 reject reply. Otherwise bytes from the IPv4 options can be sent back to the origin where the ICMP header is expected. Again from Florian Westphal.

13) Replace the timer API for expectations with a GC worker approach. This implicitly fixes a race with nf_ct_remove_expectations(), which might fail to remove the expectation because timer_del() returns false when the timer has expired and its callback is running concurrently. This fix addresses a crash that has already been reported with a reproducer.

14) Check whether br_vlan_get_pvid_rcu() fails; otherwise a 4-byte stack infoleak is possible. From Florian Westphal.

* tag 'nf-26-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak
  netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
  netfilter: nf_reject: skip iphdr options when looking for icmp header
  netfilter: nft_flow_offload: zero device address for non-ether case
  netfilter: nft_meta_bridge: add validate callback for get operations
  netfilter: nft_payload: reject offsets exceeding 65535 bytes
  netfilter: ipset: make sure gc is properly stopped
  netfilter: ipset: fix order of kfree_rcu() and rcu_assign_pointer()
  netfilter: ipset: Don't use test_bit() in lockless RCU readers in bitmap types
  netfilter: ipset: Don't use test_bit() in lockless RCU readers in hash types
  netfilter: flowtable: fix and simplify IP6IP6 tunnel handling
  netfilter: xt_cluster: reject template conntracks in hash match
  netfilter: nf_queue: pin bridge device while NFQUEUE holds fake dst
  netfilter: flowtable: fix offloaded ct timeout never being extended
====================

Link: https://patch.msgid.link/20260620222738.112506-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days | net: dst_metadata: fix false-positive memcpy overflow in tun_dst_unclone | Ilya Maximets | 1 | -2/+5
kmalloc_flex() in metadata_dst_alloc() sets __counted_by for the structure to the options_len, which is then initialized to zero. Later, we're initializing the structure by copying the tunnel info together with the options, and this triggers a warning for a potential memcpy overflow, since the compiler estimates that the options can't fit into the structure, even though the memory for them is actually allocated.

  memcpy: detected buffer overflow: 104 byte write of buffer size 96
  WARNING: CPU: X PID: Y at lib/string_helpers.c:1036 __fortify_report
   skb_tunnel_info_unclone+0x179/0x190
   geneve_xmit+0x7fe/0xe00

The issue is triggered when built with clang and source fortification.

Fix that by doing the copy in two stages: first, the main data with the options_len, then the options. This way the correct length should be known at the time of the copy.

It would be better if the options_len never changed after allocation, but the allocation code is a little separate from the initialization and it would be awkward and potentially dangerous to return a struct with options_len set to a non-zero value from the metadata_dst_alloc(). Another option would be to use ip_tunnel_info_opts_set(), but it is doing too many unnecessary operations for the use case here.

Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
Reported-by: Johan Thomsen <write@ownrisk.dk>
Closes: https://lore.kernel.org/netdev/CAKv6aAM8_EWgXScnKmKYm_4SwGDVBK++dzfP+Y6msUXbp99QUw@mail.gmail.com/
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/20260616100332.1308294-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
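The two-stage copy can be sketched in userspace C. `struct tun_info`, `tun_info_alloc()` and `tun_info_copy()` are hypothetical simplifications of the metadata_dst/ip_tunnel_info layout, and the `__counted_by` annotation is left out so the example builds as plain C; in the kernel, that annotation is what lets a fortified `memcpy()` bound the write by `options_len`. The order mirrors the fix: copy the fixed part first, so the destination's `options_len` holds the real length, then copy exactly that many option bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified struct whose trailing options are sized by options_len. */
struct tun_info {
	int key;
	size_t options_len;
	unsigned char options[];     /* flexible array member */
};

static struct tun_info *tun_info_alloc(size_t opts)
{
	/* Room for opts bytes of options; options_len starts at 0. */
	return calloc(1, sizeof(struct tun_info) + opts);
}

/* Stage 1: fixed part (sets options_len). Stage 2: the options only. */
static void tun_info_copy(struct tun_info *dst, const struct tun_info *src)
{
	assert(dst && src);
	memcpy(dst, src, sizeof(*src));
	memcpy(dst->options, src->options, src->options_len);
}
```

The destination must have been allocated with room for at least `src->options_len` option bytes; the sketch leaves that check to the caller.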
10 days | Merge tag '9p-for-7.2-rc1' of https://github.com/martinetd/linux | Linus Torvalds | 1 | -0/+2
Pull 9p updates from Dominique Martinet:
 "Aside from the avalanche of LLM-driven fixes, there are a couple of big changes this cycle:

   - negative dentry and symlink cache

   - a way out of the unkillable "io_wait_event_killable" (because it looped around waiting for the request flush to come back from the server; this has been bugging the syzkaller folks since forever): I'm still not 100% sure about this patch, but I think it's as good as we'll ever get, and I will keep testing a bit further in the coming weeks

  The rest is noisier than usual, but shouldn't cause any trouble"

* tag '9p-for-7.2-rc1' of https://github.com/martinetd/linux:
  9p: Add missing read barrier in virtio zero-copy path
  net/9p: Replace strlen() strcpy() pair with strscpy()
  9p: skip nlink update in cacheless mode to fix WARN_ON
  net/9p: fix race condition on rdma->state in trans_rdma.c
  9p: v9fs_file_do_lock: replace WARN_ONCE with p9_debug
  9p: Enable symlink caching in page cache
  9p: Set default negative dentry retention time for cache=loose
  9p: Add mount option for negative dentry cache retention
  9p: Cache negative dentries for lookup performance
  9p: avoid returning ERR_PTR(0) from mkdir operations
  9p: avoid putting oldfid in p9_client_walk() error path
  net/9p: fix infinite loop in p9_client_rpc on fatal signal
  docs/filesystems/9p: fix broken external links
  9p: invalidate readdir buffer on seek
  9p: use kvzalloc for readdir buffer
  net/9p/usbg: Constify struct configfs_item_operations
11 days | 9p: Cache negative dentries for lookup performance | Remi Pommarel | 1 | -0/+2
Not caching negative dentries can result in poor performance for workloads that repeatedly look up non-existent paths. Each such lookup triggers a full 9P transaction with the server, adding unnecessary overhead. A typical example is source compilation, where multiple cc1 processes are spawned and repeatedly search for the same missing header files over and over again. This change enables caching of negative dentries, so that lookups for known non-existent paths do not require a full 9P transaction. The cached negative dentries are retained for a configurable duration (expressed in milliseconds), as specified by the ndentry_timeout field in struct v9fs_session_info. If set to -1, negative dentries are cached indefinitely. This optimization reduces lookup overhead and improves performance for workloads involving frequent access to non-existent paths. Signed-off-by: Remi Pommarel <repk@triplefau.lt> Message-ID: <e542317dd03bbadb5249abd3ea6aecfdca692c19.1779355927.git.repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
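The retention rule described above (a timeout in milliseconds, with -1 meaning "cache indefinitely") reduces to a small predicate. This is a sketch under assumptions: `neg_dentry_valid()` and the millisecond timestamps are hypothetical, and the real v9fs code may use a different time source and dentry fields.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: is a cached negative dentry still usable?
 * timeout_ms == -1 means the entry never expires.
 */
static bool neg_dentry_valid(long long now_ms, long long cached_at_ms,
			     int timeout_ms)
{
	assert(now_ms >= cached_at_ms);
	if (timeout_ms == -1)
		return true;
	return now_ms - cached_at_ms < timeout_ms;
}
```

With this definition a timeout of 0 never reuses a cached negative entry, so every lookup goes back to the server.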
11 days | netfilter: nf_conntrack_expect: use conntrack GC to reap expectations | Pablo Neira Ayuso | 1 | -3/+13
This patch replaces the timer API with a GC worker approach for expectations, as has already been done in many other subsystems. Use the existing conntrack GC worker to iterate over the local list of expectations in the master conntrack and reap expired expectations. Check IPS_HELPER_BIT to decide whether to run GC for expectations, and set it for nft_ct expectations, which never set it otherwise. Hold the expectation spinlock while iterating over the master conntrack expectation list to synchronize with nf_ct_remove_expectations().

This also performs runtime packet path garbage collection through the expectation insertion and lookup functions while walking over one of the chains of the global expectation hashtables. Unconfirmed conntrack entries are skipped since ct->ext can be reallocated, and dying entries are skipped since they will be gone soon.

Set IPS_HELPER_BIT when the helper ct extension is added, so the new GC worker does not need to bump the ct refcount to check whether the ct->ext helper is available.

This removes the extra refcount bump for expectation timers, which allows removing several nf_ct_expect_put() calls after the unlink; after this update, the refcount stays at 1 while the expectation is in the hashes.

This patch implicitly addresses a race in the existing timer API that allows an expectation to access a stale exp->master pointer, which has already been released, when expectation removal loses the race with an expiring timer, i.e. timer_del() reports false.

Add a new NF_CT_EXPECT_DEAD flag to reap such expectations via GC. This is needed by nf_conntrack_unexpect_related(), which is called in error paths to invalidate newly created expectations that have already been added to the hashes. These expectations cannot be released immediately, since GC or nf_ct_remove_expectations() could race to do so.

On expectation insert, the runtime GC reaps stale expectations before checking the expectation limit set by policy.

Set the current timestamp in nf_ct_expect_alloc(), then add the expectation policy timeout (or the custom timeout, if one is specified) on top of it to define the expectation lifetime.

Fixes: bffcaad9afdf ("netfilter: ctnetlink: ensure safe access to master conntrack")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
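The core of the GC approach (expiry computed as allocation timestamp plus timeout, and a pass that unlinks expired or dead entries) can be sketched as follows. `struct expect`, `expect_expired()` and `expect_gc()` are hypothetical; the `dead` field stands in for `NF_CT_EXPECT_DEAD`, and the caller is assumed to hold the lock that serializes against removal. Unsigned subtraction keeps the expiry test correct across counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-master expectation list entry. */
struct expect {
	struct expect *next;
	unsigned long start;     /* timestamp set at allocation */
	unsigned long timeout;   /* policy timeout or custom timeout */
	bool dead;               /* stands in for NF_CT_EXPECT_DEAD */
};

static bool expect_expired(const struct expect *e, unsigned long now)
{
	return now - e->start >= e->timeout;
}

/*
 * Unlink expired or dead entries from a master's list and return how
 * many were reaped. Freeing the unlinked entries is left to the caller.
 */
static size_t expect_gc(struct expect **head, unsigned long now)
{
	size_t reaped = 0;
	struct expect **pp;

	assert(head != NULL);
	pp = head;
	while (*pp) {
		struct expect *e = *pp;

		if (e->dead || expect_expired(e, now)) {
			*pp = e->next;   /* unlink */
			e->next = NULL;
			reaped++;
		} else {
			pp = &e->next;
		}
	}
	return reaped;
}
```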
11 days | netfilter: nft_meta_bridge: add validate callback for get operations | Florian Westphal | 1 | -0/+2
Blamed commit added NFT_META_BRI_IIFHWADDR to the set validate callback, yet this is a get operation. Add a get validate callback and move the NFT_META_BRI_IIFHWADDR key there.

AFAICS this is harmless: NFT_META_BRI_IIFHWADDR can deal with a NULL input device and the set handler ignores an NFT_META_BRI_IIFHWADDR operation, but it allows reading 4 bytes from the bridge skb->cb[].

Fixes: cbd2257dc96e ("netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
13 days | netfilter: nf_queue: pin bridge device while NFQUEUE holds fake dst | Haoze Xie | 1 | -0/+1
The br_netfilter fake rtable is embedded in struct net_bridge and is attached to bridged packets with skb_dst_set_noref(). If such a packet is queued to NFQUEUE, __nf_queue() upgrades that fake dst with skb_dst_force(). At that point the queued skb can hold a real dst reference after bridge teardown has started. The problem is not that every bridged packet needs its own dst reference. The problem is that NFQUEUE can keep the bridge private fake dst alive after unregister begins. Fix this by keeping the bridge fake dst model unchanged and pinning the bridge master device only while the packet sits in NFQUEUE. Record the bridge device in nf_queue_entry when the queued skb carries a bridge fake dst, take a device reference for the queue lifetime, and drop it when the queue entry is freed. Also make sure queued entries are reaped when that bridge device goes down, and drop the redundant nf_bridge_info_exists() test from the fake dst detection. This keeps netdev_priv(br->dev) alive until verdict completion, so the embedded fake rtable and its metrics backing storage cannot be freed out from under dst_release(). It also avoids the constant refcount bump and avoids using ipv4-specific dst helpers for IPv6 bridge traffic. Fixes: 34666d467cbf ("netfilter: bridge: move br_netfilter out of the core") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
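The lifetime rule can be modelled with a plain reference count: the queue entry takes a reference only when the skb carries the bridge fake dst and drops it when the entry is freed, so unregistering the bridge cannot free it while a packet is still queued. All names here are hypothetical stand-ins; in the kernel the reference would be taken with a helper such as `dev_hold()`/`dev_put()` on a `struct net_device`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a reference count; "freed" at zero. */
struct dev {
	int refcnt;
	bool freed;
};

static void dev_hold_sketch(struct dev *d) { d->refcnt++; }

static void dev_put_sketch(struct dev *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0)
		d->freed = true;     /* the embedded fake dst would go away here */
}

/* Queue entry that remembers the bridge device it pinned, if any. */
struct queue_entry {
	struct dev *bridge_dev;
};

static void entry_init(struct queue_entry *e, struct dev *br, bool has_fake_dst)
{
	e->bridge_dev = NULL;
	if (has_fake_dst) {
		dev_hold_sketch(br); /* pin for the queue lifetime only */
		e->bridge_dev = br;
	}
}

static void entry_free(struct queue_entry *e)
{
	if (e->bridge_dev)
		dev_put_sketch(e->bridge_dev);
	e->bridge_dev = NULL;
}
```

Packets without a bridge fake dst never touch the count, which matches the stated goal of avoiding a constant refcount bump.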
2026-06-18 | sctp: hold socket lock when dumping endpoints in sctp_diag | Xin Long | 1 | -1/+2
SCTP_DIAG endpoint dumping was traversing endpoint address lists without holding lock_sock(), while those lists could change concurrently via socket operations (e.g., bindx changes). This creates a race where nla_reserve() counts addresses under RCU protection, but the subsequent copy may see fewer entries, potentially leaking uninitialized memory to userspace.

Fix this by:
- Taking a reference on each endpoint during hash traversal
- Moving socket operations (lock_sock()) outside read_lock_bh()
- Serializing address list access during dump
- Reworking sctp_for_each_endpoint() to support restart-based traversal with (net, pos) tracking

Also:
- Add WARN_ON_ONCE() for inconsistent address counts
- Fix idiag_states filtering for LISTEN vs association cases
- Skip dumping endpoints being freed (ep->base.dead)
- Move dump position tracking into the iterator, removing cb->args[4] and its comment for sctp_ep_dump()
- Update the comment for cb->args[4] and remove the comment for the unused cb->args[5] for sctp_sock_dump()

Note: traversal is restart-based and may re-scan buckets multiple times, but this is acceptable due to small bucket sizes and is required to support sleeping-safe callbacks.

This issue was reported by Nico Yip (@_cyeaa_) working with TrendAI Zero Day Initiative.

Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/4c1b49ab87e0f7d552ebd8172b364b1994e913c9.1781552190.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
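Restart-based traversal can be sketched with per-bucket singly linked lists: instead of keeping a pointer into a list across a point where the lock is dropped, the walker remembers a position and re-walks the bucket from its head on every step. The types and functions are hypothetical, and the locking and endpoint reference counting appear only as comments; the repeated re-walk is the extra cost the note above accepts for small buckets.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 4

/* Hypothetical hash table with singly linked buckets. */
struct node {
	struct node *next;
	int id;
};

struct table {
	struct node *buckets[NBUCKETS];
};

/* Return the pos-th node of bucket b, re-walking from the head. */
static struct node *nth(const struct table *t, int b, int pos)
{
	struct node *n = t->buckets[b];

	while (n && pos-- > 0)
		n = n->next;
	return n;
}

/* Visit every node; cb may sleep because no lock is held across it. */
static size_t walk(const struct table *t, void (*cb)(struct node *))
{
	size_t visited = 0;

	assert(t != NULL);
	for (int b = 0; b < NBUCKETS; b++) {
		for (int pos = 0; ; pos++) {
			/* lock bucket, find entry pos, take a reference */
			struct node *n = nth(t, b, pos);
			/* unlock bucket */
			if (!n)
				break;
			if (cb)
				cb(n);
			/* drop the reference */
			visited++;
		}
	}
	return visited;
}
```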
2026-06-18 | net: ip_gre: require CAP_NET_ADMIN in the device netns for changelink | Maoyi Xie | 1 | -0/+2
A tunnel changelink() operates on at most two netns, dev_net(dev) and the tunnel link netns t->net. They differ once the device is created in or moved to a netns other than the one the request runs in. The rtnl changelink path checks CAP_NET_ADMIN only against dev_net(dev), so a caller privileged there but not in t->net can rewrite a tunnel that lives in t->net. Add rtnl_dev_link_net_capable() next to rtnl_get_net_ns_capable() in net/core/rtnetlink.c. It requires CAP_NET_ADMIN in the link netns and is skipped when the link netns is dev_net(dev), where the rtnl path already checked it. The other patches in this series use the same helper. Gate ipgre_changelink() and erspan_changelink() with it, at the top of the op before any attribute is parsed, because the parsers update live tunnel fields first. ipgre_netlink_parms() sets t->collect_md before ip_tunnel_changelink() runs. Commit 8b484efd5cb4 ("ip6: vti: Use ip6_tnl.net in vti6_siocdevprivate().") added the same check on the ioctl path. This adds it on RTM_NEWLINK. Reported-by: Xiao Liang <shaw.leon@gmail.com> Closes: https://lore.kernel.org/netdev/CABAhCOSzP1vaThGV35_VnsRCb=87_CPjPVsTHbq905k8A+BuUg@mail.gmail.com/ Fixes: b57708add314 ("gre: add x-netns support") Cc: stable@vger.kernel.org Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260612085941.3158249-2-maoyixie.tju@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
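The decision the new helper makes can be summarised in a few lines. This sketch uses a hypothetical `struct netns` carrying a precomputed "caller has CAP_NET_ADMIN here" flag in place of a real capability check, and `link_net_capable()` is not the kernel's `rtnl_dev_link_net_capable()`; it only shows the rule described above: skip the extra check when the link netns is `dev_net(dev)`, otherwise require the capability in the link netns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical netns with a precomputed capability result for the caller. */
struct netns {
	bool caller_has_net_admin;
};

static bool link_net_capable(const struct netns *dev_net,
			     const struct netns *link_net)
{
	assert(dev_net && link_net);
	if (link_net == dev_net)
		return true;     /* already checked on the rtnl path */
	return link_net->caller_has_net_admin;
}
```

Calling this at the top of a changelink handler, before any attribute is parsed, matches the ordering the commit requires.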
2026-06-17 | xfrm: validate selector family and prefixlen during match | Eric Dumazet | 1 | -0/+7
syzbot reported a shift-out-of-bounds in xfrm_selector_match() due to an AF_UNSPEC selector with a large prefixlen (e.g. 128) matched against an IPv4 flow (when XFRM_STATE_AF_UNSPEC is set).

Fix this by:
- Rejecting mismatched families in xfrm_selector_match().
- Returning false in addr4_match() if prefixlen > 32.
- Returning false in addr_match() if prefixlen > 128 (prevents overflow).

Fixes: 3f0ab59e6537 ("xfrm: validate new SA's prefixlen using SA family when sel.family is unset")
Reported-by: syzbot+9383b1ff0df4b29ca5e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2fbe35.be3f099c.2836ae.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
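The `addr4_match()` part of the fix can be illustrated with a standalone sketch on host-order `uint32_t` addresses (the kernel works on `__be32`; `addr4_match_sketch()` is a hypothetical name). In C, shifting a 32-bit value by 32 or more is undefined behaviour, so a prefix length of 0 is handled separately and lengths above 32 are rejected before the mask is built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Prefix match on host-order IPv4 addresses with a bounded prefixlen. */
static bool addr4_match_sketch(uint32_t a1, uint32_t a2, uint8_t prefixlen)
{
	if (prefixlen == 0)
		return true;             /* avoid a shift by 32 */
	if (prefixlen > 32)
		return false;            /* e.g. 128 from an AF_UNSPEC selector */

	uint32_t mask = UINT32_MAX << (32 - prefixlen);

	return ((a1 ^ a2) & mask) == 0;
}
```

For example, 192.168.0.1 and 192.168.0.254 match at /24 but not at /32, and any prefix length above 32 fails even for identical addresses.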
2026-06-17 | xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[] | Eric Dumazet | 1 | -4/+4
KCSAN reported a data race involving net->xfrm.policy_count access. Add missing READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count and xfrm_policy_default. Fixes: 2518c7c2b3d7 ("[XFRM]: Hash policies when non-prefixed.") Reported-by: syzbot+d85ba1c732720b9a4097@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a2b9e96.99669fcc.12a77b.0006.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-06-17 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 3 | -0/+19
Merge in late fixes in preparation for the net-next PR.

Conflicts:

net/tls/tls_sw.c
  406e8a651a7b ("net: skmsg: preserve sg.copy across SG transforms")
  79511603a65b ("tls: remove dead sockmap (psock) handling from the SW path")

drivers/net/ethernet/microsoft/mana/mana_en.c
  f8fd56977eeea ("net: mana: guard TX wq object destroy with INVALID_MANA_HANDLE check")
  d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
  https://lore.kernel.org/ajAPXu-C_PuTgV-a@sirena.org.uk

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-16 | tcp: rehash onto different local ECMP path on retransmit timeout | Neil Spring | 3 | -6/+40
Currently sk_rethink_txhash() re-rolls the socket's txhash on RTO, PLB, and spurious-retransmission events, but the cached route is reused and the new hash is not propagated into the ECMP path selection logic.

Two changes are needed to make rehash select a different local ECMP path:

1. Add __sk_dst_reset() alongside sk_rethink_txhash() in tcp_write_timeout(), tcp_rcv_spurious_retrans(), and tcp_plb_check_rehash() so the cached dst is invalidated and the next transmit triggers a fresh route lookup.

2. Set fl6->mp_hash from sk_txhash (or tcp_rsk(req)->txhash for SYN/ACK retransmits and syncookies) in tcp_v6_connect(), inet6_sk_rebuild_header(), inet6_csk_route_req(), inet6_csk_route_socket(), tcp_v6_send_response(), and cookie_v6_check() so fib6_select_path() picks a path based on the new hash.

The mp_hash override only applies to fib_multipath_hash_policy 0 (the default L3 policy). Its hash includes the flow label, but that is 0 by default -- np->flow_label is unset, and auto_flowlabels only computes the on-wire label later, per packet -- so flows to the same peer share one local path. Keying the hash on sk_txhash makes the local path per-connection and lets a rehash re-select it. Policies 1-3 are left unchanged.

The mp_hash assignment is factored into a small helper, ip6_ecmp_set_mp_hash(), shared by inet6_csk_route_req(), inet6_csk_route_socket(), tcp_v6_connect(), inet6_sk_rebuild_header(), tcp_v6_send_response(), and cookie_v6_check(). It applies (txhash >> 1) ?: 1 for policy 0 (the >> 1 keeps mp_hash in the 31-bit range; ?: 1 keeps it non-zero, since 0 would fall back to rt6_multipath_hash()). inet6_csk_route_socket() calls it only for sk_protocol == IPPROTO_TCP so that non-TCP callers (e.g., L2TP via inet6_csk_xmit) fall through to rt6_multipath_hash() and retain their existing flow-key-based ECMP behavior.
tcp_v6_send_response() also sets mp_hash from the response txhash so that a control packet (a RST from the full socket, or an ACK from a time-wait socket) selects the same local ECMP nexthop as the connection's txhash rather than falling back to the flow hash. The time-wait socket's tw_txhash is copied from sk_txhash when the connection enters TIME_WAIT, so it reflects any rehash that occurred. Setting mp_hash explicitly is necessary because the default ECMP hash derives from fl6->flowlabel via np->flow_label, which is not updated from sk_txhash (REPFLOW is off by default). ip6_make_flowlabel() cannot help either, as it runs after the route lookup. As a consequence, for policy 0 the local ECMP path of an IPv6 TCP flow follows sk_txhash even when fl6->flowlabel is non-zero, e.g. a reflected (REPFLOW) or explicitly set (IPV6_FLOWLABEL_MGR) flow label. This is intentional: only local path selection changes, so rehash can recover from a failed path; the on-wire flow label is unchanged. sk_set_txhash() is moved before ip6_dst_lookup_flow() in tcp_v6_connect() so the initial ECMP path is selected by the same txhash that subsequent route rebuilds will use. This avoids unintended path changes when the cached dst is naturally invalidated (e.g., by PMTU discovery or route changes). The rehash sites (tcp_write_timeout(), tcp_plb_check_rehash(), and tcp_rcv_spurious_retrans()) call __sk_rethink_txhash_reset_dst(), which re-rolls the txhash and, when it changed, drops the cached dst so the next transmit re-runs route selection. The dst reset is guarded by sk->sk_family == AF_INET6 since IPv4 ECMP does not currently use sk_txhash for path selection. For IPv4-mapped IPv6 sockets this produces a redundant dst reset on a cold path (RTO/PLB); the subsequent IPv4 route lookup returns the same result. 
The helper is deliberately separate from sk_rethink_txhash() itself: dst_negative_advice() calls sk_rethink_txhash() before its own dst op, so resetting the dst inside sk_rethink_txhash() would skip that op (e.g. rt6_remove_exception_rt()). For syncookies, cookie_init_sequence() computes the cookie value before route_req() and sets txhash so the SYN-ACK selects the same ECMP path that cookie_v6_check() will use when the full socket is created. cookie_tcp_reqsk_init() derives txhash from the cookie so the full socket's ECMP path matches the SYN-ACK. Both the SYN-ACK assignment in tcp_conn_request() and the full-socket assignment in cookie_tcp_reqsk_init() set txhash from the cookie for IPv4 and IPv6 alike. On IPv6 this drives ECMP path selection; on IPv4, which does not use sk_txhash for ECMP, it only affects TX-queue selection. That selection scales the hash by its high bits (reciprocal_scale()), which are uniform in the keyed secure_tcp_syn_cookie() output -- the MSS index only perturbs the low bits -- so the queue distribution matches net_tx_rndhash(). cookie_init_sequence() is split from the former version that also called tcp_synq_overflow() and incremented SYNCOOKIESSENT; those side effects are now in cookie_record_sent(), called after route_req() succeeds so they are not bumped when route_req() fails. cookie_record_sent() is guarded by CONFIG_SYN_COOKIES to match the guard on tcp_synq_overflow(). route_req() receives 0 as tw_isn for the syncookie path so that tcp_v6_init_req() still saves ireq->pktopts for REPFLOW flowlabel reflection and IPv6 cmsg options. The ecn_ok clear for syncookies without timestamps stays after tcp_ecn_create_request() so it takes precedence. Signed-off-by: Neil Spring <ntspring@meta.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260615042158.1600746-2-ntspring@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
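The TX-queue argument rests on how `reciprocal_scale()` works. It maps a 32-bit value into `[0, ep_ro)` by keeping the high 32 bits of a 64-bit product, so the top bits of the hash drive the result. The sketch below reimplements that arithmetic in userspace. `cookie_like()` is a hypothetical layout that ORs a 2-bit index into the low bits, and the real syncookie encoding differs. It only shows that such low-bit changes usually leave the queue unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): map val into
 * [0, ep_ro) using the high 32 bits of a 64-bit product. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Hypothetical layout for illustration only (not the real syncookie
 * encoding): keyed-hash bits on top, a 2-bit index in the low bits.
 * The low bits can tip the result only when the product sits right
 * at a queue boundary. */
static uint32_t cookie_like(uint32_t keyed_hash, uint32_t idx)
{
	return (keyed_hash & ~UINT32_C(3)) | (idx & 3u);
}
```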
2026-06-16 | sctp: correct CONFIG_SCTP_DBG_OBJCNT macro name in comment | Ethan Nelson-Moore | 1 | -1/+1
A comment in <net/sctp/sctp.h> incorrectly refers to CONFIG_SCTP_DBG_OBJCOUNT instead of CONFIG_SCTP_DBG_OBJCNT. Correct it. Discovered while searching for CONFIG_* symbols referenced in code but not defined in any Kconfig file. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20260613233725.162470-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-16 | Merge tag 'nf-next-26-06-14' of ↵ | Jakub Kicinski | 2 | -5/+31
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next. More specifically, this contains a conncount rework to address AI-related reports, assorted Netfilter updates and two small incremental updates on IPVS: 1) Replace obsolete workqueues (system_wq, system_unbound_wq) in IPVS, from Marco Crivellari. 2) Replace WARN_ON{_ONCE} with DEBUG_NET_WARN_ON_ONCE in nf_tables. In recent years, reporters have pointed out that WARN_ON{_ONCE} combined with panic_on_warn=1 results in DoS. Let's replace it with DEBUG_NET_WARN_ON_ONCE so this is only exercised by test infrastructure and fuzzers, while also providing context to AI agents. From Fernando F. Mancera. Five patches from Florian Westphal to address AI reports in the conncount infrastructure: 3) Fix a missing RCU read lock section when calling __ovs_ct_limit_get_zone_limit(). 4) Add a dedicated lock per rbtree; this increases memory usage but should improve scalability. 5) Add a helper function to find the rbtree node, no functional changes are intended. 6) Add a sequence counter to detect concurrent tree modifications and retry lookups. 7) Add locks to the GC conncount walk and address other nitpicks. Then, several assorted updates: 8) Defensive tree-wide addition of NULL checks for ct extensions. 9) Bail out if the flowtable bypass cannot be fully set up from the flow offload expression, instead of lazily building a likely incomplete one. 10) Fix documentation for the new conn_max sysctl toggle in IPVS. 11) Add nf_dev_xmit_recursion*() helpers and use them, to address recent AI reports.
* tag 'nf-next-26-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_dup_netdev: add nf_dev_xmit_recursion*() helpers and use them ipvs: fix doc syntax for conn_max sysctl netfilter: flowtable: bail out if forward path cannot be discovered netfilter: conntrack: check NULL when retrieving ct extension netfilter: nf_conncount: gc and rcu fixes netfilter: nf_conncount: add sequence counter to detect tree modifications netfilter: nf_conncount: split count_tree_node rbtree walk into helper netfilter: nf_conncount: use per nf_conncount_data spinlocks netfilter: nf_conncount: callers must hold rcu read lock netfilter: nf_tables: use DEBUG_NET_WARN_ON_ONCE in packet and control paths ipvs: Replace use of system_unbound_wq with system_dfl_long_wq ==================== Link: https://patch.msgid.link/20260614114605.474783-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
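Item 6 above (a sequence counter to detect concurrent tree modifications and retry lookups) follows the familiar seqcount pattern. The minimal C11 sketch below assumes writers are already serialized by a lock, as the per-tree locks in item 4 would do. `struct tree_seq` and its helpers are hypothetical and are not the nf_conncount code. A real reader also needs the tree nodes themselves to stay valid during the walk, e.g. under RCU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-tree sequence counter: even = stable, odd = a writer
 * is mid-update. Writers are assumed to be serialized by a lock; shared
 * data read under the counter should itself use relaxed atomics to be
 * race-free in C11. */
struct tree_seq {
	atomic_uint seq;
};

static void tree_write_begin(struct tree_seq *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* counter before data */
}

static void tree_write_end(struct tree_seq *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

static unsigned int tree_read_begin(struct tree_seq *s)
{
	unsigned int v;

	while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1u)
		;	/* writer in progress: wait for an even value */
	return v;
}

/* True if a writer ran since tree_read_begin(): retry the lookup. */
static bool tree_read_retry(struct tree_seq *s, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);	/* data before counter */
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}
```

A lookup would then loop as `do { start = tree_read_begin(&s); /* walk the tree */ } while (tree_read_retry(&s, start));`.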
2026-06-14 | netfilter: nf_dup_netdev: add nf_dev_xmit_recursion*() helpers and use them | Pablo Neira Ayuso | 1 | -5/+29
Update nft_dup and nft_fwd to use the nf_dev_xmit_recursion() helpers. This patch also disables BH while transmitting the skb, so the task cannot migrate to a different CPU between incrementing and decrementing the per-CPU recursion counters, which would leave them unbalanced. This is modeled after Florian Westphal's dev_xmit_recursion*() API, available since commit 97cdcf37b57e ("net: place xmit recursion in softnet data"), according to its current state in the tree. Fixes: 1d47b55b36d2 ("netfilter: nft_fwd_netdev: use recursion counter in neigh egress path") Fixes: f37ad9127039 ("netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
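Here is the recursion-counter idea as a hypothetical userspace model. A `_Thread_local` counter stands in for the per-CPU counter. The limit of 8 is assumed to match the kernel's XMIT_RECURSION_LIMIT, so treat that value as an assumption. Disabling BH in the kernel guarantees that the increment and decrement hit the same CPU's counter. A thread-local variable gives the same pairing here by construction.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_XMIT_RECURSION_LIMIT 8	/* assumed to match XMIT_RECURSION_LIMIT */

/* Hypothetical per-thread stand-in for the per-CPU recursion counter. */
static _Thread_local unsigned int model_xmit_recursion;

static bool model_xmit_recursion_exceeded(void)
{
	return model_xmit_recursion > MODEL_XMIT_RECURSION_LIMIT;
}

/* Hypothetical transmit path that re-enters itself `depth` more times,
 * e.g. a dup/fwd rule sending into a device that triggers it again. */
static int model_dev_xmit(int depth)
{
	int ret;

	if (model_xmit_recursion_exceeded())
		return -1;		/* drop rather than recurse further */
	model_xmit_recursion++;	/* inc and dec must hit the same counter */
	ret = depth > 0 ? model_dev_xmit(depth - 1) : 0;
	model_xmit_recursion--;
	return ret;
}
```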
2026-06-14 | netfilter: conntrack: check NULL when retrieving ct extension | Pablo Neira Ayuso | 1 | -0/+2
nf_ct_ext_find() might return NULL if the ct extension is not found. Also add NULL checks to: - nfct_help() - nfct_help_data() - nfct_seqadj() - nfct_nat() This is defensive, for safety reasons. nf_ct_ext_find() used to return NULL for unconfirmed conntracks with a stale extension, i.e. when the genid validation fails. Skip the NULL check in nf_nat_inet_fn(), since NULL is valid there for non-initialized ct nat extensions. While at it, fetch the ct helper area in nf_ct_expect_related_report() only once and pass it on to other ancillary functions. Replace WARN_ON() with WARN_ON_ONCE() in nf_ct_unlink_expect_report(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
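The defensive pattern amounts to propagating NULL through every accessor that sits on top of a lookup that may fail. The model below is hypothetical: the structs and `model_ext_find()` are invented for illustration. Only the shape of the check is meant to resemble nfct_help()/nfct_help_data().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extension ids and layouts (illustration only). */
enum model_ext_id { MODEL_EXT_HELPER, MODEL_EXT_NAT, MODEL_EXT_NUM };

struct model_help {
	unsigned char data[16];		/* helper private area */
};

struct model_conn {
	void *ext[MODEL_EXT_NUM];	/* NULL when the extension is absent */
};

static void *model_ext_find(const struct model_conn *ct, enum model_ext_id id)
{
	return ct->ext[id];
}

static struct model_help *model_help(const struct model_conn *ct)
{
	return model_ext_find(ct, MODEL_EXT_HELPER);
}

/* Propagate NULL instead of dereferencing a missing extension. */
static void *model_help_data(const struct model_conn *ct)
{
	struct model_help *help = model_help(ct);

	return help ? (void *)help->data : NULL;
}
```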
2026-06-14 | devlink: Implement devlink param multi attribute nested data values | Saeed Mahameed | 1 | -0/+8
The devlink param value attribute is not defined in the policy, since devlink validates and parses the value internally; this allows us to implement multi-attribute values without breaking any policies. Devlink param multi-attribute values are treated as dynamically sized arrays of u64 values. By introducing a new devlink param type, DEVLINK_PARAM_TYPE_U64_ARRAY, drivers and user space can set a variable count of u64 values in the DEVLINK_ATTR_PARAM_VALUE_DATA attribute. Implement get/set parsing and add it to the internal value structure passed to drivers. This is useful for devices that need to configure a list of values for a specific configuration. Example:
$ devlink dev param show pci/... name multi-value-param
name multi-value-param type driver-specific
  values:
    cmode permanent value: 0,1,2,3,4,5,6,7
$ devlink dev param set pci/... name multi-value-param \
      value 4,5,6,7,0,1,2,3 cmode permanent
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260609040453.711932-5-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
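On the user-space side, a value such as `4,5,6,7,0,1,2,3` must be turned into a variable-length list of u64 before it can be packed into DEVLINK_ATTR_PARAM_VALUE_DATA. Below is a hypothetical parser sketch. `parse_u64_list()` and its `max` cap are assumptions, not the devlink tool's code. The parser accepts only decimal values without whitespace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: "a,b,c" -> out[0..n-1]. Returns n, or -1 on an
 * empty list, a non-decimal token, overflow, or more than max values. */
static int parse_u64_list(const char *s, uint64_t *out, size_t max)
{
	size_t n = 0;

	for (;;) {
		char *end;
		unsigned long long v;

		if (n == max || *s < '0' || *s > '9')
			return -1;	/* also rejects "", ",," and a trailing ',' */
		errno = 0;
		v = strtoull(s, &end, 10);
		if (errno == ERANGE)
			return -1;
		out[n++] = (uint64_t)v;
		if (*end == '\0')
			return (int)n;
		if (*end != ',')
			return -1;
		s = end + 1;
	}
}
```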
2026-06-13 | Merge tag 'ipsec-next-2026-06-12' of ↵ | Jakub Kicinski | 1 | -16/+62
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2026-06-12 1) Replace the open-coded manual cleanup in xfrm_add_policy() error path with xfrm_policy_destroy() for consistency with xfrm_policy_construct(). From Deepanshu Kartikey. 2) Limit XFRMA_TFCPAD to a sensible maximum (max IP length, 64k) since u32 is excessive for traffic flow confidentiality padding. From David Ahern. 3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that allows migrating individual IPsec SAs independently of their policies. The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA migration, lacks SPI for unique SA identification, and cannot express reqid changes or migrate Transport mode selectors. The new interface identifies the SA via SPI and mark, supports reqid changes, address family changes, encap removal, and uses an atomic create+install flow under x->lock to prevent SN/IV reuse during AEAD SA migration. From Antony Antony. 
* tag 'ipsec-next-2026-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: add documentation for XFRM_MSG_MIGRATE_STATE xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration xfrm: make xfrm_dev_state_add xuo parameter const xfrm: extract address family and selector validation helpers xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper xfrm: move encap and xuo into struct xfrm_migrate xfrm: add error messages to state migration xfrm: add state synchronization after migration xfrm: check family before comparing addresses in migrate xfrm: split xfrm_state_migrate into create and install functions xfrm: rename reqid in xfrm_migrate xfrm: fix NAT-related field inheritance in SA migration xfrm: allow migration from UDP encapsulated to non-encapsulated ESP xfrm: add extack to xfrm_init_state xfrm: remove redundant assignments xfrm: Reject excessive values for XFRMA_TFCPAD xfrm: cleanup error path in xfrm_add_policy() ==================== Link: https://patch.msgid.link/20260612074725.1760473-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
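The point of the atomic create+install in item 3 can be shown with a toy model. The replacement SA inherits the output sequence counter, and the old SA is retired inside the same critical section. As a result, no sender can draw a sequence number (and hence an AEAD IV) from the old state after the copy. Everything here is hypothetical, with an `atomic_flag` spinlock standing in for x->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SA model: only the fields needed to show that the
 * sequence counter continues across migration. */
struct toy_sa {
	atomic_flag lock;		/* stands in for x->lock */
	uint32_t spi;
	uint32_t mark;
	uint64_t seq;			/* next output sequence number */
	bool dead;
	struct toy_sa *replaced_by;
};

static void toy_lock(struct toy_sa *x)
{
	while (atomic_flag_test_and_set_explicit(&x->lock, memory_order_acquire))
		;
}

static void toy_unlock(struct toy_sa *x)
{
	atomic_flag_clear_explicit(&x->lock, memory_order_release);
}

/* Sender path: take the next sequence number unless the SA is retired. */
static bool toy_next_seq(struct toy_sa *x, uint64_t *seq)
{
	bool ok;

	toy_lock(x);
	ok = !x->dead;
	if (ok)
		*seq = x->seq++;
	toy_unlock(x);
	return ok;
}

/* new_sa must not be visible to senders yet; its lock is pre-initialized. */
static void toy_migrate(struct toy_sa *old, struct toy_sa *new_sa)
{
	toy_lock(old);
	new_sa->spi = old->spi;
	new_sa->mark = old->mark;
	new_sa->seq = old->seq;		/* continue the counter, never restart */
	new_sa->dead = false;
	new_sa->replaced_by = NULL;
	old->dead = true;		/* no sender can draw from old after this */
	old->replaced_by = new_sa;
	toy_unlock(old);
}
```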
2026-06-13 | vsock: introduce vsock_pending_to_accept() helper | Raf Dickson | 1 | -0/+1
Add vsock_pending_to_accept() to move a socket directly from the pending list to the accept queue in a single operation, avoiding the sock_put/sock_hold dance and the sk_acceptq_removed()/ sk_acceptq_added() pair that would otherwise be needed when calling vsock_remove_pending() followed by vsock_enqueue_accept(). Use it in vmci_transport_recv_connecting_server() where a completed handshake transitions the socket from pending to accept queue. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Raf Dickson <rafdog35@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260612045216.105796-2-rafdog35@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
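The difference can be illustrated with a toy intrusive-list model. `toy_sock`, `toy_listener` and `toy_pending_to_accept()` are hypothetical. The sketch only shows that a direct move leaves the list's reference and the backlog count untouched, whereas removing and re-enqueueing would drop and re-take both.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list (illustration only). */
struct toy_node { struct toy_node *prev, *next; };

static void toy_list_init(struct toy_node *h) { h->prev = h->next = h; }

static void toy_list_del(struct toy_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void toy_list_add_tail(struct toy_node *n, struct toy_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct toy_sock {
	struct toy_node link;	/* on either the pending or the accept list */
	int refcnt;		/* reference held on behalf of that list */
};

struct toy_listener {
	struct toy_node pending, accept;
	int backlog;		/* children counted against the accept backlog */
};

/* Hypothetical single-step move: the list keeps the same reference and
 * the child stays counted, so no put/hold or removed/added pair runs. */
static void toy_pending_to_accept(struct toy_listener *l, struct toy_sock *s)
{
	toy_list_del(&s->link);
	toy_list_add_tail(&s->link, &l->accept);
}
```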
2026-06-13 | psp: add new netlink cmd for dev-assoc and dev-disassoc | Wei Wang | 1 | -0/+23
The main purpose of this cmd is to be able to associate a non-PSP-capable device (e.g. veth or netkit) with a PSP device. One use case: create a veth/netkit pair, move one end into a netns, and leave the other end in the default netns alongside a real PSP device (e.g. netdevsim or a physical PSP-capable NIC). With this command, the veth/netkit end inside the netns can be associated with the PSP device, so the virtual device can act as a PSP-capable device: it can initiate PSP connections, while PSP encryption/decryption is performed on the real PSP device. Signed-off-by: Wei Wang <weibunny@fb.com> Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20260608233118.2694144-3-weibunny.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-13 | tls: remove tls_toe and the related driver | Sabrina Dubroca | 2 | -78/+0
The tls_toe feature and its single user (chelsio chtls) have been unmaintained for multiple years. It also hooks into the core of the TCP implementation, and bypasses most of the networking stack. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/1f30e73275c07bf879f547589872d0916025a52e.1781165969.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-13 | tcp: clear sock_ops cb flags before force-closing a child socket | Sechang Lim | 1 | -0/+9
A child socket inherits the listener's bpf_sock_ops_cb_flags via sk_clone_lock(). If its setup fails in tcp_v4_syn_recv_sock() / tcp_v6_syn_recv_sock(), the child is freed through put_and_exit, where inet_csk_prepare_forced_close() drops the socket lock and tcp_done() runs without it. If BPF_SOCK_OPS_STATE_CB_FLAG was inherited, tcp_done() -> tcp_set_state() calls tcp_call_bpf(), which expects the lock and trips sock_owned_by_me(): WARNING: include/net/sock.h:1799 at tcp_set_state+0x433/0x550 RIP: 0010:tcp_set_state+0x433/0x550 include/net/sock.h:1799 Call Trace: <IRQ> tcp_done+0xba/0x250 net/ipv4/tcp.c:5095 tcp_v4_syn_recv_sock+0x850/0xa50 net/ipv4/tcp_ipv4.c:1787 tcp_check_req+0xf30/0x1360 net/ipv4/tcp_minisocks.c:926 tcp_v4_rcv+0x1047/0x1b50 net/ipv4/tcp_ipv4.c:2164 </IRQ> The child is freed before it is ever established, so it should run no sock_ops callback. Clear its cb flags in inet_csk_prepare_for_destroy_sock(), the common point for the IPv4, IPv6 and chtls forced-close paths and for the MPTCP ->syn_recv_sock() failure path (dispose_child), which reaches tcp_done() on a child that was never established too. Suggested-by: Jiayuan Chen <jiayuan.chen@linux.dev> Fixes: d44874910a26 ("bpf: Add BPF_SOCK_OPS_STATE_CB") Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260611092923.1895982-1-rhkrqnwk98@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
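The failure mode reduces to a state-change hook that asserts a lock, firing on a socket whose lock has already been dropped. The model below is hypothetical: the flag, struct and functions are illustrative. Only the ordering comes from the text: clear the inherited callback flags before the forced close reaches tcp_done(). TCP_CLOSE is 7 in Linux.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_STATE_CB_FLAG 0x1u	/* stand-in for BPF_SOCK_OPS_STATE_CB_FLAG */

struct toy_child {
	unsigned int cb_flags;	/* inherited from the listener on clone */
	bool locked;
	int state;
	int callbacks_run;
};

/* Mirrors the shape of tcp_set_state(): the hook runs only if the flag
 * is set, and it requires the socket lock. Returns false on violation. */
static bool toy_set_state(struct toy_child *sk, int state)
{
	sk->state = state;
	if (sk->cb_flags & TOY_STATE_CB_FLAG) {
		if (!sk->locked)
			return false;	/* would trip sock_owned_by_me() */
		sk->callbacks_run++;
	}
	return true;
}

/* Forced-close path for a child that was never established. */
static bool toy_forced_close(struct toy_child *sk)
{
	sk->cb_flags = 0;	/* the fix: clear inherited cb flags first */
	sk->locked = false;	/* lock is dropped before tcp_done() */
	return toy_set_state(sk, 7 /* TCP_CLOSE */);
}
```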
2026-06-12 | Merge tag 'for-net-next-2026-06-11' of ↵ | Jakub Kicinski | 9 | -36/+16
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - hci_sync: Add support for HCI_LE_Set_Host_Feature [v2] - SMP: Use AES-CMAC library API - sockets: convert to getsockopt_iter - Add SPDX id lines to some source files drivers: - btintel_pcie: Support Product level reset - btintel_pcie: Add support for smart trigger dump - btintel_pcie: Add 50 ms delay before MAC init on BlazarIW - btintel_pcie: Separate coredump work from RX work - btmtk: add event filter to filter specific event - btrtl: fix RTL8761B/BU broken LE extended scan - btusb: Add Realtek RTL8922AE VID/PID 0bda/d922 - btusb: Add Realtek RTL8922AE VID/PID 0bda/d923 - btusb: MT7922: Add VID/PID 0e8d/223c - btusb: MT7925: Add VID/PID 0e8d/8c38 - btusb: Add support for TP-Link TL-UB250 - btusb: Add Mercusys MA530 for Realtek RTL8761BUV - btusb: Add TP-Link UB600 for Realtek 8761BUV - btusb: Add support for Intel Lizard Peak 2 (0x8087:0x0040) - btusb: Add USB ID 2c4e:0128 for Mercusys MA60XNB - btusb: MT7925: Add VID/PID 13d3/3609 * tag 'for-net-next-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (49 commits) Bluetooth: btintel_pcie: Separate coredump work from RX work Bluetooth: btmtksdio: fix infinite loop in btmtksdio_txrx_work() Bluetooth: qca: Add BT FW build version to kernel log Bluetooth: vhci: validate devcoredump state before side effects Bluetooth: L2CAP: validate connectionless PSM length Bluetooth: hci: validate codec capability element length Bluetooth: L2CAP: Fix UAF in channel timeout by holding conn ref Bluetooth: btintel_pcie: Load IOSF debug regs by controller variant Bluetooth: btintel_pcie: Add 50 ms delay before MAC init on BlazarIW Bluetooth: Add SPDX id lines to some source files Bluetooth: btintel_pcie: Add support for smart trigger dump Bluetooth: hci_h5: reset hci_uart::priv in the close() method Bluetooth: btusb: clean up 
probe error handling Bluetooth: btusb: fix wakeup irq devres lifetime Bluetooth: btusb: fix wakeup source leak on probe failure Bluetooth: btusb: fix use-after-free on marvell probe failure Bluetooth: btusb: fix use-after-free on registration failure Bluetooth: btmtk: fix URB leak in alloc_mtk_intr_urb error path Bluetooth: hci_core: Fix UAF in hci_unregister_dev() Bluetooth: hci_event: fix simultaneous discovery stuck in FINDING ... ==================== Link: https://patch.msgid.link/20260611183358.176776-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-12 | tcp: allow mptcp to drop TS for some packets | Matthieu Baerts (NGI0) | 1 | -10/+3
With TCP timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port taking 30 bytes, the 40-byte limit for the TCP options is exceeded (12 + 30 = 42 bytes). In this case, the address cannot be signalled. The idea is to let MPTCP drop the TCP timestamps option for some specific packets, so that a pure ACK carrying more than 28 bytes of MPTCP options, such as this ADD_ADDR, can be sent. A new parameter is passed from tcp_established_options to the MPTCP side to indicate whether the TCP TS option is used, and whether it should be dropped. The next commit implements the MPTCP side; the change is split into two patches to help TCP maintainers identify the modifications on the TCP side. This feature will be controlled by a new add_addr_v6_port_drop_ts MPTCP sysctl knob. It is important to keep in mind that dropping the TCP timestamps option for one packet of the connection could disrupt some middleboxes: although unlikely, they could drop the packet or even block the connection. That's why this new feature will be controlled by a sysctl knob. Note that it would be technically possible to squeeze both options into the header if the ADD_ADDR is written first, followed by the TCP timestamps without the NOPs preceding them. But this means more modifications on the TCP side, plus some middleboxes could still be disrupted by that. In this implementation, an unused bit in the mptcp_out_options structure is used to avoid passing the address of a local variable. Reading and setting it needs CONFIG_MPTCP, so the whole block now has this #if condition: mptcp_established_options() is then no longer used without CONFIG_MPTCP. As for alternatives, instead of passing a new boolean (has_ts), another option would be to pass the whole option structure (opts), but 'struct tcp_out_options' is currently defined in tcp_output.c, and it would need to be exported. Plus, that means the removal of the TCP TS option would be done on the MPTCP side, and not here on the TCP side.
It feels clearer to remove other TCP options on the TCP side than to hide that on the MPTCP side. Yet another alternative would be to pass the size already taken by the other TCP options, with a way to drop them all when needed. But it feels better to target only the timestamps option, which should be safe to drop, even if it is currently the only option set before MPTCP when MPTCP is used. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-5-758e7ca73f4d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
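The option-space arithmetic behind this change, as a small sketch. The 40-, 12- and 30-byte figures come from the text, while the constant names and `fits_in_tcp_options()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Figures from the commit text; names are illustrative. */
enum {
	TOY_TCP_OPTION_SPACE = 40,	/* max TCP option bytes */
	TOY_TS_PADDED_LEN = 12,		/* NOP, NOP + 10-byte timestamps */
	TOY_ADD_ADDR6_PORT_LEN = 30,	/* ADD_ADDR with IPv6 address + port */
};

/* Does an MPTCP option of mptcp_len bytes fit, with or without TS? */
static bool fits_in_tcp_options(unsigned int mptcp_len, bool with_ts)
{
	unsigned int used = with_ts ? TOY_TS_PADDED_LEN : 0;

	return used + mptcp_len <= TOY_TCP_OPTION_SPACE;
}
```

Here 12 + 30 = 42 bytes does not fit, while 30 bytes alone does. Next to the padded timestamps, MPTCP can use at most 28 bytes, which matches the ">28 bytes" threshold above.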
2026-06-12 | net: fib_rules: Don't dump dying fib_rule in fib_rules_dump(). | Kuniyuki Iwashima | 1 | -0/+5
rocker_router_fib_event() calls fib_rule_get() during RCU dump. If the fib_rule is dying, refcount_inc() will complain about it. Let's call refcount_inc_not_zero() in fib_rules_dump(). Fixes: 5d7bfd141924 ("ipv4: fib_rules: Dump FIB rules when registering FIB notifier") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260610061744.2030996-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
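`refcount_inc_not_zero()` differs from `refcount_inc()` in that it refuses to take a reference on an object whose count has already dropped to zero. That is what makes it usable on entries found during an RCU walk. The simplified C11 model below omits saturation handling, unlike the kernel's refcount_t, and adds a hypothetical dump loop that skips dying entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of refcount_inc_not_zero(): no saturation checks. */
static bool toy_ref_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load_explicit(ref, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* dying: must not be resurrected */
	} while (!atomic_compare_exchange_weak_explicit(ref, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

struct toy_rule {
	atomic_uint ref;
	int id;
};

/* Hypothetical dump: report only rules that could be pinned; dying
 * rules (ref == 0) are skipped. Freeing on the last put is omitted. */
static size_t toy_dump(struct toy_rule *rules, size_t n, int *out_ids)
{
	size_t reported = 0;

	for (size_t i = 0; i < n; i++) {
		if (!toy_ref_inc_not_zero(&rules[i].ref))
			continue;
		out_ids[reported++] = rules[i].id;	/* "notify" while pinned */
		atomic_fetch_sub_explicit(&rules[i].ref, 1, memory_order_release);
	}
	return reported;
}
```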
2026-06-12 | ipv4: fib: Don't dump dying fib_info in fib_leaf_notify(). | Kuniyuki Iwashima | 1 | -0/+5
syzbot reported use-after-free in nsim_fib4_prepare_event(). [0] The problem is that the following functions call fib_info_hold() / refcount_inc() while dumping fib_info under RCU, which is unsafe. * mlxsw_sp_router_fib4_event() * rocker_router_fib_event() * nsim_fib4_prepare_event() refcount_inc_not_zero() must be used, but it would be too late there. Let's guarantee the lifetime of fib_info in fib_leaf_notify(). Note that IPv6 does not need the corresponding change since fib6_table_dump() holds fib6_table.tb6_lock. [0]: refcount_t: addition on 0; use-after-free. WARNING: lib/refcount.c:25 at refcount_warn_saturate+0x9f/0x110 lib/refcount.c:25, CPU#0: kworker/u8:15/3420 Modules linked in: CPU: 0 UID: 0 PID: 3420 Comm: kworker/u8:15 Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026 Workqueue: netns cleanup_net RIP: 0010:refcount_warn_saturate+0x9f/0x110 lib/refcount.c:25 Code: eb 66 85 db 74 3e 83 fb 01 75 4c e8 1b f1 22 fd 48 8d 3d 84 cb f1 0a 67 48 0f b9 3a eb 4a e8 08 f1 22 fd 48 8d 3d 81 cb f1 0a <67> 48 0f b9 3a eb 37 e8 f5 f0 22 fd 48 8d 3d 7e cb f1 0a 67 48 0f RSP: 0018:ffffc9000f2c7270 EFLAGS: 00010293 RAX: ffffffff84a18858 RBX: 0000000000000002 RCX: ffff888032ff9ec0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8f9353e0 RBP: 0000000000000000 R08: ffff888032ff9ec0 R09: 0000000000000005 R10: 0000000000000100 R11: 0000000000000004 R12: ffff8880570cc000 R13: dffffc0000000000 R14: ffff88802b40563c R15: ffff8880570cc000 FS: 0000000000000000(0000) GS:ffff888126173000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb1f4d5d000 CR3: 000000006072a000 CR4: 00000000003526f0 Call Trace: <TASK> __refcount_add include/linux/refcount.h:-1 [inline] __refcount_inc include/linux/refcount.h:366 [inline] refcount_inc include/linux/refcount.h:383 [inline] fib_info_hold include/net/ip_fib.h:629 [inline] nsim_fib4_prepare_event 
drivers/net/netdevsim/fib.c:930 [inline] nsim_fib_event_schedule_work drivers/net/netdevsim/fib.c:1000 [inline] nsim_fib_event_nb+0x1055/0x1240 drivers/net/netdevsim/fib.c:1043 call_fib_notifier+0x45/0x80 net/core/fib_notifier.c:25 call_fib_entry_notifier net/ipv4/fib_trie.c:90 [inline] fib_leaf_notify net/ipv4/fib_trie.c:2176 [inline] fib_table_notify net/ipv4/fib_trie.c:2194 [inline] fib_notify+0x36b/0x5e0 net/ipv4/fib_trie.c:2217 fib_net_dump net/core/fib_notifier.c:70 [inline] register_fib_notifier+0x184/0x360 net/core/fib_notifier.c:108 nsim_fib_create+0x85d/0x9f0 drivers/net/netdevsim/fib.c:1596 nsim_dev_reload_create drivers/net/netdevsim/dev.c:1604 [inline] nsim_dev_reload_up+0x374/0x7c0 drivers/net/netdevsim/dev.c:1058 devlink_reload+0x501/0x8d0 net/devlink/dev.c:475 devlink_pernet_pre_exit+0x1ff/0x420 net/devlink/core.c:558 ops_pre_exit_list net/core/net_namespace.c:161 [inline] ops_undo_list+0x187/0x940 net/core/net_namespace.c:234 cleanup_net+0x56e/0x800 net/core/net_namespace.c:702 process_one_work kernel/workqueue.c:3314 [inline] process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478 kthread+0x388/0x470 kernel/kthread.c:436 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Fixes: 0ae3eb7b4611 ("netdevsim: fib: Perform the route programming in a non-atomic context") Fixes: c3852ef7f2f8 ("ipv4: fib: Replay events when registering FIB notifier") Reported-by: syzbot+cb2aa2390ac024e25f5c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a290011.39669fcc.33b062.00b1.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260610061744.2030996-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-12 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 2 | -0/+2
Cross-merge networking fixes after downstream PR (net-7.1-rc8). Conflicts: drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c f67aead16e85 ("net: txgbe: rework service event handling") 57d39faed4c9 ("net: txgbe: improve functions of AML 40G devices") net/rds/info.c 512db8267b73 ("rds: mark snapshot pages dirty in rds_info_getsockopt()") 6e94eeb2a2a6 ("rds: convert to getsockopt_iter") Adjacent changes: include/net/sock.h 1ee90b77b727 ("net: guard timestamp cmsgs to real error queue skbs") f0de88303d5e ("net: make is_skb_wmem() available to modules") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-11 | Bluetooth: L2CAP: Fix UAF in channel timeout by holding conn ref | Marco Elver | 1 | -0/+1
l2cap_chan_timeout() runs asynchronously and accesses chan->conn. If the connection is torn down while the timer is running or pending, chan->conn can be freed, leading to a use-after-free when the timer worker attempts to lock conn->lock: | BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:112 [inline] | BUG: KASAN: slab-use-after-free in atomic_long_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:4456 [inline] | BUG: KASAN: slab-use-after-free in __mutex_trylock_fast kernel/locking/mutex.c:161 [inline] | BUG: KASAN: slab-use-after-free in mutex_lock+0x4f/0xa0 kernel/locking/mutex.c:318 | Write of size 8 at addr ffff8881298d9550 by task kworker/2:1/83 | | CPU: 2 UID: 0 PID: 83 Comm: kworker/2:1 Not tainted 7.1.0-rc6-next-20260601-dirty #6 PREEMPT(full) | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 | Workqueue: events l2cap_chan_timeout | Call Trace: | <TASK> | instrument_atomic_read_write include/linux/instrumented.h:112 [inline] | atomic_long_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:4456 [inline] | __mutex_trylock_fast kernel/locking/mutex.c:161 [inline] | mutex_lock+0x4f/0xa0 kernel/locking/mutex.c:318 | l2cap_chan_timeout+0x5d/0x1b0 net/bluetooth/l2cap_core.c:422 | process_one_work kernel/workqueue.c:3326 [inline] | process_scheduled_works+0x7c8/0xfb0 kernel/workqueue.c:3409 | worker_thread+0x8a9/0xcf0 kernel/workqueue.c:3490 | kthread+0x346/0x430 kernel/kthread.c:436 | ret_from_fork+0x1a3/0x470 arch/x86/kernel/process.c:158 | ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 | </TASK> | | Allocated by task 320: | l2cap_conn_add+0xa7/0x820 net/bluetooth/l2cap_core.c:7075 | l2cap_connect_cfm+0xdb/0xd70 net/bluetooth/l2cap_core.c:7452 | hci_connect_cfm include/net/bluetooth/hci_core.h:2139 [inline] | hci_remote_features_evt+0x52f/0x9f0 net/bluetooth/hci_event.c:3760 | hci_event_func net/bluetooth/hci_event.c:7796 [inline] | 
hci_event_packet+0x561/0xa70 net/bluetooth/hci_event.c:7847 | hci_rx_work+0x370/0x890 net/bluetooth/hci_core.c:4040 | process_one_work kernel/workqueue.c:3326 [inline] | process_scheduled_works+0x7c8/0xfb0 kernel/workqueue.c:3409 | worker_thread+0x8a9/0xcf0 kernel/workqueue.c:3490 | kthread+0x346/0x430 kernel/kthread.c:436 | ret_from_fork+0x1a3/0x470 arch/x86/kernel/process.c:158 | ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 | | Freed by task 322: | hci_disconn_cfm include/net/bluetooth/hci_core.h:2154 [inline] | hci_conn_hash_flush+0x101/0x1f0 net/bluetooth/hci_conn.c:2736 | hci_dev_close_sync+0x889/0xde0 net/bluetooth/hci_sync.c:5405 | hci_dev_do_close net/bluetooth/hci_core.c:502 [inline] | hci_unregister_dev+0x1f7/0x370 net/bluetooth/hci_core.c:2679 | vhci_release+0x12a/0x180 drivers/bluetooth/hci_vhci.c:690 | __fput+0x369/0x890 fs/file_table.c:510 | task_work_run+0x160/0x1d0 kernel/task_work.c:233 | get_signal+0xf5b/0x1120 kernel/signal.c:2810 | arch_do_signal_or_restart+0x4d/0x600 arch/x86/kernel/signal.c:337 | __exit_to_user_mode_loop kernel/entry/common.c:64 [inline] | exit_to_user_mode_loop+0x85/0x510 kernel/entry/common.c:98 | do_syscall_64+0x263/0x3d0 arch/x86/entry/syscall_64.c:100 | entry_SYSCALL_64_after_hwframe+0x77/0x7f | | The buggy address belongs to the object at ffff8881298d9400 | which belongs to the cache kmalloc-512 of size 512 | The buggy address is located 336 bytes inside of | freed 512-byte region [ffff8881298d9400, ffff8881298d9600) Fix it by having chan->conn hold a reference to l2cap_conn (via l2cap_conn_get) when the channel is added to the connection, and releasing it in the channel destructor. This ensures the l2cap_conn remains alive as long as the channel exists. A new FLAG_DEL channel flag is introduced to indicate that the channel has been deleted from its connection. l2cap_chan_del() atomically sets this flag using test_and_set_bit() instead of setting chan->conn to NULL. 
All asynchronous workers (l2cap_chan_timeout, l2cap_ack_timeout, l2cap_monitor_timeout, l2cap_retrans_timeout) and l2cap_chan_send() check FLAG_DEL to determine whether the channel has been torn down, rather than testing chan->conn for NULL. Fixes: 8c8e620467a7 ("Bluetooth: L2CAP: use chan timer to close channels in cleanup_listen()") Cc: <stable@vger.kernel.org> Cc: Siwei Zhang <oss@fourdim.xyz> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Assisted-by: Gemini:gemini-3.1-pro-preview Reported-by: https://sashiko.dev/#/patchset/20260521021249.3258069-1-oss%40fourdim.xyz Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
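The FLAG_DEL scheme combines two ideas. First, the channel pins its connection for its whole lifetime. Second, deletion is recorded by an atomic test-and-set, so exactly one caller performs teardown while late timer workers simply bail out. The C11 model below is hypothetical: `toy_chan`, `toy_conn` and the bit number are illustrative, not the L2CAP definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_FLAG_DEL 0u	/* bit index, illustrative */

struct toy_conn { int refcnt; };

struct toy_chan {
	atomic_ulong flags;
	struct toy_conn *conn;	/* pinned: never NULLed while the chan lives */
};

static bool toy_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return atomic_fetch_or(addr, mask) & mask;
}

static void toy_chan_add(struct toy_chan *c, struct toy_conn *conn)
{
	conn->refcnt++;		/* like l2cap_conn_get() */
	c->conn = conn;
	atomic_init(&c->flags, 0);
}

/* Returns true only for the caller that actually deleted the channel. */
static bool toy_chan_del(struct toy_chan *c)
{
	return !toy_test_and_set_bit(TOY_FLAG_DEL, &c->flags);
}

/* Timer worker: conn is still valid memory, but do nothing once deleted. */
static bool toy_chan_timeout(struct toy_chan *c)
{
	if (atomic_load(&c->flags) & (1UL << TOY_FLAG_DEL))
		return false;
	/* ... would lock c->conn and close the channel here ... */
	return true;
}

/* Channel destructor drops the pin on the connection. */
static void toy_chan_destroy(struct toy_chan *c)
{
	c->conn->refcnt--;	/* like l2cap_conn_put() */
}
```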
2026-06-11 | Bluetooth: Add SPDX id lines to some source files | Tim Bird | 9 | -36/+9
Many bluetooth source files are missing SPDX-License-Identifier lines. Add appropriate IDs to these files, and remove other license lines from the headers. Leave the warranty disclaimer in files where the license ID is GPL-2.0 but the wording of the disclaimer is slightly different from that of the GPL v2 disclaimer. It is not different enough to cause licensing conflicts, but is kept to honor the original contributors' legal intent. Signed-off-by: Tim Bird <tim.bird@sony.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-06-11 | Bluetooth: hci_sync: Add support for HCI_LE_Set_Host_Feature [v2] | Luiz Augusto von Dentz | 1 | -0/+6
This adds support for using HCI_LE_Set_Host_Feature [v2] instead of v1 if LL Extended Features is supported and the controller supports the command. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-06-11 | Merge tag 'nf-26-06-10' of ↵ | Paolo Abeni | 1 | -0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Revalidate bridge ports, add missing NULL checks to fetch the bridge device by the port. From Florian Westphal. 2) Fix netdevice refcount leak in the error path of the nft_fwd hardware offload function, also from Florian. 3) Unregister helper expectfn callback on conntrack helper module removal, otherwise a dangling pointer remains in place, from Weiming Shi. 4) Fix possible pointer infoleak in getsockopt() IPT_SO_GET_ENTRIES, from Kyle Zeng. 5) Validate that device MAC header is present before nf_syslog accesses it. From Xiang Mei. 6-8) Three patches to address a possible infoleak of stale stack data in three nf_tables expressions, due to a mismatch between the _init() and _eval() functions which is possible since 14fb07130c7d. From Davide Ornaghi and Florian Westphal. netfilter pull request 26-06-10 * tag 'nf-26-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_meta_bridge: fix stale stack leak via IIFHWADDR register netfilter: nft_fib: fix stale stack leak via the OIFNAME register netfilter: nft_exthdr: fix register tracking for F_PRESENT flag netfilter: nf_log: validate MAC header was set before dumping it netfilter: x_tables: avoid leaking percpu counter pointers netfilter: nf_conntrack: destroy stale expectfn expectations on unregister netfilter: nf_tables_offload: drop device refcount on error netfilter: revalidate bridge ports ==================== Link: https://patch.msgid.link/20260610161629.214092-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-06-10netfilter: nf_conntrack: destroy stale expectfn expectations on unregisterWeiming Shi1-0/+1
NAT helpers such as nf_nat_h323 store a raw pointer to module text in exp->expectfn (e.g. ip_nat_q931_expect). nf_ct_helper_expectfn_unregister() only unlinks the callback descriptor and never walks the expectation table, so an expectation pending at module removal survives with a dangling exp->expectfn into freed module text. When the expected connection arrives, init_conntrack() invokes exp->expectfn(), now a stale pointer into the unloaded module.

Reproduced on a KASAN build by loading the H.323 helpers, creating a Q.931 expectation, unloading nf_nat_h323, then connecting to the expected port:

  Oops: int3: 0000 [#1] SMP KASAN NOPTI
  RIP: 0010:0xffffffffa06102d1
   init_conntrack.isra.0 (net/netfilter/nf_conntrack_core.c:1862)
   nf_conntrack_in (net/netfilter/nf_conntrack_core.c:2049)
   ipv4_conntrack_local (net/netfilter/nf_conntrack_proto.c:223)
   nf_hook_slow (net/netfilter/core.c:619)
   __ip_local_out (net/ipv4/ip_output.c:120)
   __tcp_transmit_skb (net/ipv4/tcp_output.c:1715)
   tcp_connect (net/ipv4/tcp_output.c:4374)
   tcp_v4_connect (net/ipv4/tcp_ipv4.c:345)
   __sys_connect (net/socket.c:2167)
  Modules linked in: nf_conntrack_h323 [last unloaded: nf_nat_h323]

Reaching the dangling state requires CAP_SYS_MODULE in the initial user namespace to remove a NAT helper that still has live expectations, so this is a robustness fix; leaving an expectation pointing at freed text is wrong regardless.

Add nf_ct_helper_expectfn_destroy(), which walks the expectation table and drops every expectation whose ->expectfn matches the descriptor being torn down. Call it from each NAT helper's exit path after the existing RCU grace period, so no expectation outlives the code it points at and no extra synchronize_rcu() is introduced.

With the fix, the same reproducer runs to completion without the Oops.
Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") Reported-by: Xiang Mei <xmei5@asu.edu> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
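The fix walks the expectation table and drops every entry whose callback matches the helper being unloaded. The sketch below models that walk in plain userspace C over a singly linked list; `struct expect`, `expectfn_destroy()`, `push_expect()` and the two stub callbacks are hypothetical stand-ins, not the nf_conntrack API, and the RCU and locking the kernel needs are deliberately left out.

```c
#include <stdlib.h>

typedef void (*expectfn_t)(void *ct);

/* Hypothetical model of a pending expectation. */
struct expect {
	expectfn_t expectfn;
	struct expect *next;
};

/* Stand-ins for callbacks owned by two different helper modules. */
static void q931_expect_stub(void *ct) { (void)ct; }
static void other_expect_stub(void *ct) { (void)ct; }

/* Prepend an expectation; aborts on allocation failure for brevity. */
static struct expect *push_expect(struct expect *head, expectfn_t fn)
{
	struct expect *e = malloc(sizeof(*e));

	if (!e)
		abort();
	e->expectfn = fn;
	e->next = head;
	return e;
}

/* Unlink and free every expectation whose callback is fn.
 * Returns the number of entries dropped. */
static size_t expectfn_destroy(struct expect **head, expectfn_t fn)
{
	size_t dropped = 0;
	struct expect **pp = head;

	while (*pp) {
		struct expect *e = *pp;

		if (e->expectfn == fn) {
			*pp = e->next; /* unlink before freeing */
			free(e);
			dropped++;
		} else {
			pp = &e->next;
		}
	}
	return dropped;
}
```

Walking through a pointer-to-pointer lets the same code remove both the head and interior entries, and each entry is unlinked before it is freed, so the list never references released memory.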
2026-06-10Merge tag 'wireless-next-2026-06-10' of ↵Jakub Kicinski2-3/+6
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Quite a few last updates, notably:
 - b43: new support for an 11n device
 - mt76:
   - mt792x broken usb transport detection
   - mt7921 regd improvements
   - mt7927 support
 - iwlwifi:
   - more kunit tests
   - FW version updates
 - ath12k: WDS support
 - rtw89:
   - RTL8922AU support
   - USB 3 mode switch for performance
   - better monitor radiotap support
   - RTL8922DE preparations
 - cfg80211/mac80211:
   - update UHR to D1.4, UHR DBE support
   - finally remove 5/10 MHz support
   - S1G rate reporting
   - multicast encapsulation offload

* tag 'wireless-next-2026-06-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (285 commits)
  b43: add RF power offset for N-PHY r8 + radio 2057 r8
  b43: add channel info table for N-PHY r8 + radio 2057 r8
  b43: add IPA TX gain table for N-PHY r8 + radio 2057 r8
  b43: support radio 2057 rev 8
  b43: route d11 corerev 22 to 24-bit indirect radio access
  b43: add d11 core revision 0x16 to id table
  b43: add firmware mappings for rev22
  rfkill: Replace strcpy() with memcpy()
  wifi: brcmfmac: flowring: simplify flow allocation
  wifi: brcm80211: change current_bss to value
  wifi: ath12k: enable IEEE80211_VHT_EXT_NSS_BW_CAPABLE when NSS ratio is reported
  wifi: ath12k: fix EAPOL TX failure caused by stale tcl_metadata bits
  wifi: ath: Update copyright in testmode_i.h
  wifi: ath10k: Update Qualcomm copyrights
  wifi: ath11k: Update Qualcomm copyrights
  wifi: ath12k: Update Qualcomm copyrights
  wifi: mt76: Drop unneeded mt76_register_debugfs_fops() return checks
  wifi: mt76: mt7921: assert sniffer on chanctx change
  wifi: mt76: mt7996: fix potential tx_retries underflow
  wifi: mt76: mt7925: fix potential tx_retries underflow
  ...
====================

Link: https://patch.msgid.link/20260610103637.179340-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-10bonding: 3ad: add lacp_strict configuration knobLouis Scalbert2-0/+2
When an 802.3ad (LACP) bonding interface has no slaves in the collecting/distributing state, the bonding master still reports carrier as up as long as at least 'min_links' slaves have carrier. In this situation, only one slave is effectively used for TX/RX, while traffic received on other slaves is dropped. Upper-layer daemons therefore consider the interface operational, even though traffic may be blackholed if the lack of LACP negotiation means the partner is not ready to deal with traffic. Introduce a configuration knob to control this behavior. It allows the bonding master to assert carrier only when at least 'min_links' slaves are in Collecting_Distributing state. The default mode preserves the existing behavior. This patch only introduces the knob; its behavior is implemented in the subsequent commit. Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20260603150331.1919611-4-louis.scalbert@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
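A simplified model of the carrier decision this knob is meant to control, assuming a hypothetical `struct slave_state` and `bond_carrier_up()`, and ignoring aggregator selection and the other inputs the real 802.3ad code considers. Per the commit, this patch only adds the knob; the behavior itself lands in the following commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-slave view used only by this sketch. */
struct slave_state {
	bool carrier;   /* physical link is up */
	bool coll_dist; /* in Collecting_Distributing state */
};

/* Default mode counts slaves with carrier; strict mode counts only
 * slaves that LACP has moved to Collecting_Distributing. */
static bool bond_carrier_up(const struct slave_state *s, size_t n,
			    size_t min_links, bool lacp_strict)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (lacp_strict ? s[i].coll_dist : s[i].carrier)
			count++;
	}
	return count >= min_links;
}
```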
2026-06-10net: guard timestamp cmsgs to real error queue skbsKyle Zeng1-0/+1
skb_is_err_queue() treats PACKET_OUTGOING as the sole marker for an skb from sk_error_queue. That assumption is not true for AF_PACKET sockets: outgoing packet taps are also delivered to packet sockets with skb->pkt_type == PACKET_OUTGOING, but their skb->cb is owned by AF_PACKET instead of struct sock_exterr_skb. If such an skb is received with timestamping enabled, the generic timestamp cmsg path can read AF_PACKET control-buffer state as sock_exterr_skb::opt_stats. With SO_RXQ_OVFL enabled, the packet drop counter overlaps opt_stats. An odd drop count makes the path emit SCM_TIMESTAMPING_OPT_STATS with skb->len and skb->data. For non-linear skbs this copies past the linear head and can trigger hardened usercopy or disclose adjacent heap contents. Keep skb_is_err_queue() local to net/socket.c, but make it verify that the PACKET_OUTGOING marker is paired with the sock_rmem_free destructor installed by sock_queue_err_skb(). AF_PACKET receive skbs use normal receive ownership and no longer pass as error-queue skbs, while legitimate sk_error_queue entries keep the PACKET_OUTGOING marker and sock_rmem_free ownership. Fixes: 8605330aac5a ("tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs") Signed-off-by: Kyle Zeng <kylebot@openai.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260607021819.49698-1-kylebot@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
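The tightened check can be modeled as a predicate that requires both the `PACKET_OUTGOING` marker and the error-queue destructor. In this sketch, `struct model_skb`, `MODEL_PACKET_OUTGOING` and the two destructor stubs are hypothetical stand-ins for `struct sk_buff`, `PACKET_OUTGOING`, `sock_rmem_free` and normal receive ownership; the value 4 matches `PACKET_OUTGOING` in `<linux/if_packet.h>`.

```c
#include <stdbool.h>

#define MODEL_PACKET_OUTGOING 4 /* same value as PACKET_OUTGOING */

struct model_skb;
typedef void (*destructor_t)(struct model_skb *);

/* Hypothetical reduction of an skb to the two fields the check reads. */
struct model_skb {
	int pkt_type;
	destructor_t destructor;
};

/* Stand-in for sock_rmem_free (installed for sk_error_queue entries). */
static void model_sock_rmem_free(struct model_skb *skb) { (void)skb; }
/* Stand-in for normal receive ownership (e.g. an AF_PACKET tap copy). */
static void model_normal_rx_destructor(struct model_skb *skb) { (void)skb; }

/* Old behaviour: the pkt_type marker alone was trusted. */
static bool is_err_queue_old(const struct model_skb *skb)
{
	return skb->pkt_type == MODEL_PACKET_OUTGOING;
}

/* Fixed behaviour: the marker must be paired with the error-queue
 * destructor before skb->cb is interpreted as error-queue state. */
static bool is_err_queue_new(const struct model_skb *skb)
{
	return skb->pkt_type == MODEL_PACKET_OUTGOING &&
	       skb->destructor == model_sock_rmem_free;
}
```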
2026-06-10net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queriesErni Sri Satya Vennela1-0/+4
mana_query_link_cfg() sends an HWC command to firmware on every call, but the link speed and QoS values it returns only change when the driver explicitly calls mana_set_bw_clamp(). This function is called not only by userspace via ethtool get_link_ksettings, but also periodically by hv_netvsc through netvsc_get_link_ksettings and by the sysfs speed_show attribute via dev_attr_show, resulting in unnecessary HWC traffic every few minutes. Add a link_cfg_error field to mana_port_context to cache the query result. The field uses three states: 1 (not yet queried, initial value set during mana_probe_port), 0 (success, speed/max_speed are valid), or a negative errno for permanent errors like -EOPNOTSUPP when the hardware does not support the command. Transient errors and qos_unconfigured responses are not cached so that subsequent calls will retry. MANA is ops-locked because it implements net_shaper_ops, so the core already takes netdev_lock() around all ethtool_ops and net_shaper_ops entry points. Reuse that lock to serialize mana_query_link_cfg() and mana_set_bw_clamp(). This prevents a concurrent mana_set_bw_clamp() from racing with an in-flight query and publishing stale pre-clamp speed/max_speed. Invalidate the cache inside mana_set_bw_clamp() on success, so all current and future callers that change the link configuration automatically trigger a fresh query on the next mana_query_link_cfg() call. Also reset link_cfg_error during resume in mana_probe() under netdev_lock(), so that any query already in flight cannot later store 0 and silently overwrite the post-resume invalidation. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Link: https://patch.msgid.link/20260606133301.2180073-1-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
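The tri-state cache can be sketched as follows, assuming hypothetical names (`struct link_cache`, `query_link_cfg()`, `invalidate_link_cfg()` and the `fw_*` stubs that simulate HWC responses). Only `-EOPNOTSUPP` is treated as permanent here; the `qos_unconfigured` case and the `netdev_lock()` serialization described above are omitted.

```c
#include <errno.h>

/* 1 = not yet queried, 0 = cached success, <0 = cached permanent error. */
struct link_cache {
	int link_cfg_error;
	unsigned int speed;
	int hw_calls; /* counts simulated HWC round trips */
};

typedef int (*hwc_query_fn)(unsigned int *speed);

/* Simulated firmware responses for the sketch. */
static int fw_ok(unsigned int *speed) { *speed = 100000; return 0; }
static int fw_busy(unsigned int *speed) { (void)speed; return -EAGAIN; }
static int fw_unsupported(unsigned int *speed) { (void)speed; return -EOPNOTSUPP; }

static int query_link_cfg(struct link_cache *c, hwc_query_fn q,
			  unsigned int *speed)
{
	unsigned int s = 0;
	int err;

	if (c->link_cfg_error == 0) { /* cached success: no HWC traffic */
		*speed = c->speed;
		return 0;
	}
	if (c->link_cfg_error < 0)    /* cached permanent error */
		return c->link_cfg_error;

	c->hw_calls++;
	err = q(&s);
	if (err == 0) {
		c->speed = s;
		c->link_cfg_error = 0;
		*speed = s;
	} else if (err == -EOPNOTSUPP) {
		c->link_cfg_error = err; /* permanent: stop asking */
	}
	/* Any other error is treated as transient and not cached. */
	return err;
}

/* Called after a successful bandwidth clamp to force a fresh query. */
static void invalidate_link_cfg(struct link_cache *c)
{
	c->link_cfg_error = 1;
}
```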
2026-06-10net: mana: Add support for PF device 0x00C1Haiyang Zhang1-0/+2
Update the device id table to include the new device id 0x00C1. This device's BAR layout is similar to the VF's, so update mana_gd_init_registers() accordingly. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260605212302.2135499-1-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-10RDMA/mana_ib: Allocate interrupt contexts on EQsLong Li1-2/+5
Use the GIC functions to allocate interrupt contexts for RDMA EQs. These interrupt contexts may be shared with Ethernet EQs when MSI-X vectors are limited. The driver now supports allocating dedicated MSI-X for each EQ. Indicate this capability through driver capability bits. The RDMA EQs pass use_msi_bitmap=false to share MSI-X vectors with Ethernet, while the capability flag advertises that the driver supports per-vPort EQ separation when hardware has sufficient vectors. Populate eq.irq on all RDMA EQs for consistency with the Ethernet path. Also relocate the GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE define to its numeric BIT(6) position among the other capability flags. Signed-off-by: Long Li <longli@microsoft.com> Acked-by: Leon Romanovsky <leon@kernel.org> Link: https://patch.msgid.link/20260605005717.2059954-7-longli@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-10net: mana: Allocate interrupt context for each EQ when creating vPortLong Li1-0/+1
Use GIC functions to create a dedicated interrupt context or acquire a shared interrupt context for each EQ when setting up a vPort. The caller now owns the GIC reference across the EQ create/destroy lifecycle: mana_create_eq() calls mana_gd_get_gic() before creating each EQ and mana_destroy_eq() calls mana_gd_put_gic() after destroying it. The msix_index invalidation is moved from mana_gd_deregister_irq() to the mana_gd_create_eq() error path so that mana_destroy_eq() can read the index before teardown. Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/20260605005717.2059954-6-longli@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-10net: mana: Introduce GIC context with refcounting for interrupt managementLong Li1-0/+12
To allow Ethernet EQs to use dedicated or shared MSI-X vectors and RDMA EQs to share the same MSI-X, introduce a GIC (GDMA IRQ Context) with reference counting. This allows the driver to create an interrupt context on an assigned or unassigned MSI-X vector and share it across multiple EQ consumers. Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/20260605005717.2059954-4-longli@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
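A minimal userspace sketch of a refcounted, per-vector interrupt context, assuming a hypothetical fixed-size table and names (`struct gic_model`, `gic_get()`, `gic_put()`). The real `mana_gd_get_gic()`/`mana_gd_put_gic()` also register IRQs and need locking, which this sketch does not model.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a shared interrupt context. */
struct gic_model {
	int msix_index;
	unsigned int refcount;
};

#define MODEL_NUM_VECTORS 8
/* One slot per vector; NULL means no context exists yet. */
static struct gic_model *gic_table[MODEL_NUM_VECTORS];

/* Return the context for a vector, creating it on first use. */
static struct gic_model *gic_get(int msix_index)
{
	struct gic_model *gic;

	if (msix_index < 0 || msix_index >= MODEL_NUM_VECTORS)
		return NULL;
	gic = gic_table[msix_index];
	if (!gic) {
		gic = calloc(1, sizeof(*gic));
		if (!gic)
			return NULL;
		gic->msix_index = msix_index;
		gic_table[msix_index] = gic;
	}
	gic->refcount++;
	return gic;
}

/* Drop one reference; free the context when the last user is gone. */
static void gic_put(struct gic_model *gic)
{
	assert(gic->refcount > 0);
	if (--gic->refcount == 0) {
		gic_table[gic->msix_index] = NULL;
		free(gic);
	}
}
```

An Ethernet EQ and an RDMA EQ that land on the same vector share one context, and the context is freed only after the last EQ drops its reference, which fits the get-before-create, put-after-destroy ordering described in the vPort EQ commit above.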
2026-06-10net: mana: Query device capabilities and configure MSI-X sharing for EQsLong Li1-1/+12
When querying the device, adjust the max number of queues to allow dedicated MSI-X vectors for each vPort. The per-vPort queue count is clamped towards MANA_DEF_NUM_QUEUES but will not exceed the hardware maximum reported by the device. MSI-X sharing among vPorts is enabled when there are not enough MSI-X vectors for dedicated allocation, or when the platform does not support dynamic MSI-X allocation (in which case all vectors are pre-allocated at probe time and sharing is always used). The msi_sharing flag is reset at the top of mana_gd_query_max_resources() so it is recomputed from current hardware state on each probe or resume cycle. Clamp apc->max_queues to gc->max_num_queues_vport in mana_init_port() so that on resume, if max_num_queues_vport has decreased due to fewer MSI-X vectors, num_queues is reduced accordingly before EQ allocation. A device reporting zero ports now results in a fatal probe error since the per-vPort MSI-X math requires at least one port. Rename mana_query_device_cfg() to mana_gd_query_device_cfg() as it is used at GDMA device probe time for querying device capabilities. Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/20260605005717.2059954-3-longli@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
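The sharing decision can be sketched as below. The vector accounting (`num_ports * queues_per_vport` dedicated vectors) is an assumption made for illustration, as are the names `struct msix_plan` and `plan_msix()` and the choice of `-EINVAL` for the zero-port case; the commit does not spell out the exact arithmetic or errno.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result of the per-probe MSI-X planning step. */
struct msix_plan {
	unsigned int queues_per_vport;
	bool msi_sharing;
};

/*
 * avail:      MSI-X vectors available for data-path EQs (assumed input)
 * def_queues: stand-in for MANA_DEF_NUM_QUEUES
 * hw_max:     per-vPort queue maximum reported by the device
 * dyn_msix:   whether the platform supports dynamic MSI-X allocation
 */
static int plan_msix(unsigned int avail, unsigned int num_ports,
		     unsigned int def_queues, unsigned int hw_max,
		     bool dyn_msix, struct msix_plan *plan)
{
	if (num_ports == 0)
		return -EINVAL; /* per-vPort math needs at least one port */

	/* Recomputed from scratch on every probe or resume. */
	plan->msi_sharing = false;
	plan->queues_per_vport = def_queues < hw_max ? def_queues : hw_max;

	/* Without dynamic allocation all vectors are pre-allocated at
	 * probe time, so sharing is always used. */
	if (!dyn_msix || avail < num_ports * plan->queues_per_vport)
		plan->msi_sharing = true;

	return 0;
}
```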
2026-06-10net: mana: Create separate EQs for each vPortLong Li1-3/+12
To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ sharing among the vPorts and create dedicated EQs for each vPort. Move the EQ definition from struct mana_context to struct mana_port_context and update related support functions. Export mana_create_eq() and mana_destroy_eq() for use by the MANA RDMA driver. RSS QPs now take a vport reference via pd->vport_use_count to ensure EQs outlive all QP consumers. The vport must already be configured by a raw QP before an RSS QP can be created. EQs are only destroyed when the last QP (raw or RSS) on the PD releases its reference. Restrict each vport to a single RSS QP. The hardware only supports one steering configuration (indirection table / hash key) per vport, and mana_disable_vport_rx() on QP destroy disables RX globally for the vport. Previously, creating a second RSS QP would silently overwrite the first QP's steering config and destroy would blackhole all traffic. This is now explicitly rejected with -EBUSY. Existing applications (DPDK being the primary RDMA consumer) always create one RSS QP per vport, so no real-world flows are affected. Reject cross-port PD sharing for both raw and RSS QPs. Since EQs and vport configuration are per-port, a PD is bound to the port used by its first raw QP. Subsequent QPs on the same PD must use the same port or the creation fails with -EINVAL. Previously this was silently broken: with shared EQs it appeared to work, but with per-vPort EQs a cross-port PD would cause wrong-port EQ teardown and corruption. DPDK creates one PD per port so no existing flows are affected. Serialize mana_set_channels() and the async per-port queue reset handler against RDMA vport configuration to prevent RDMA from claiming the vport during the detach/attach window. A channel_changing flag is set under apc->vport_mutex before detach and checked by mana_cfg_vport() when called from the RDMA path, blocking RDMA from grabbing the vport during the entire window. 
When the port is down and RDMA already holds the vport, the channel change is rejected with -EBUSY. Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/20260605005717.2059954-2-longli@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
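The RDMA-side rules above (a PD binds to the port of its first raw QP, an RSS QP needs an already-configured vport, at most one RSS QP per vport, and EQ teardown only on the last reference) can be modeled as a small state machine. All names are hypothetical, and returning `-EINVAL` when the vport is not yet configured is an assumption; the commit only specifies `-EINVAL` for cross-port use and `-EBUSY` for a second RSS QP.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-vport state for this sketch. */
struct vport_model {
	bool configured;        /* set once a raw QP configured the vport */
	bool has_rss_qp;
	unsigned int use_count; /* QPs keeping the per-vport EQs alive */
};

/* Hypothetical PD state: -1 until the first raw QP binds a port. */
struct pd_model {
	int port;
};

static int create_raw_qp(struct pd_model *pd, struct vport_model *vp, int port)
{
	if (pd->port >= 0 && pd->port != port)
		return -EINVAL; /* cross-port PD sharing rejected */
	pd->port = port;
	vp->configured = true;
	vp->use_count++;
	return 0;
}

static int create_rss_qp(struct pd_model *pd, struct vport_model *vp, int port)
{
	if (pd->port < 0 || !vp->configured)
		return -EINVAL; /* assumed errno: no raw QP configured it */
	if (pd->port != port)
		return -EINVAL; /* cross-port PD sharing rejected */
	if (vp->has_rss_qp)
		return -EBUSY;  /* one steering config per vport */
	vp->has_rss_qp = true;
	vp->use_count++;
	return 0;
}

/* Returns true when the last QP reference is dropped, i.e. when the
 * per-vport EQs may be torn down. */
static bool release_qp(struct vport_model *vp, bool rss)
{
	if (rss)
		vp->has_rss_qp = false;
	return --vp->use_count == 0;
}
```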
2026-06-10net: ncsi: Set ncsi_stop_dev() to inline while NET_NCSI not enabledMinda Chen1-1/+1
When NET_NCSI is not enabled, the ncsi_stop_dev() stub is static but not inline, so a translation unit that includes the header without calling it triggers a compile warning:

  linux/include/net/ncsi.h:63:13: warning: 'ncsi_stop_dev' defined but not used [-Wunused-function]
  static void ncsi_stop_dev(struct ncsi_dev *nd)

Make ncsi_stop_dev() inline, like the other stub functions, to remove the compile warning. Signed-off-by: Minda Chen <minda.chen@starfivetech.com> Link: https://patch.msgid.link/20260605033607.37630-1-minda.chen@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
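The general pattern behind this fix: when a feature is compiled out, header stubs are declared `static inline`, because GCC's `-Wunused-function` warns about an unused non-inline `static` function but not about an unused `static inline` one. A sketch with a hypothetical `MODEL_NET_NCSI_ENABLED` switch and `model_ncsi_stop_dev()` in place of the real Kconfig test and header:

```c
struct ncsi_dev; /* opaque in this sketch */

/* Hypothetical switch standing in for the real Kconfig check. */
#define MODEL_NET_NCSI_ENABLED 0

#if MODEL_NET_NCSI_ENABLED
void model_ncsi_stop_dev(struct ncsi_dev *nd);
#else
/* static inline: an unused copy in a .c file produces no warning. */
static inline void model_ncsi_stop_dev(struct ncsi_dev *nd)
{
	(void)nd; /* nothing to stop when NCSI is compiled out */
}
#endif
```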