summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-01-07udp: call skb_orphan() before skb_attempt_defer_free()Eric Dumazet1-0/+1
Standard UDP receive path does not use skb->destructor. But skmsg layer does use it, since it calls skb_set_owner_sk_safe() from udp_read_skb(). This then triggers this warning in skb_attempt_defer_free(): DEBUG_NET_WARN_ON_ONCE(skb->destructor); We must call skb_orphan() to fix this issue. Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()") Reported-by: syzbot+3e68572cf2286ce5ebe9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/695b83bd.050a0220.1c9965.002b.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260105093630.1976085-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-07Revert "dsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable"Vladimir Oltean4-78/+0
This reverts commit 926eae604403acfa27ba5b072af458e87e634a50, which never could have produced the intended effect: https://lore.kernel.org/netdev/AM0PR06MB10396BBF8B568D77556FC46F8F7DEA@AM0PR06MB10396.eurprd06.prod.outlook.com/ The reason why it is broken beyond repair in this form is that the mv88e6xxx driver outsources its "tx-p2p-microvolt" property to the OF node of an external Ethernet PHY. This: (a) does not work if there is no external PHY (chip-to-chip connection, or SFP module) (b) pollutes the OF property namespace / bindings of said external PHY ("tx-p2p-microvolt" could have meaning for the Ethernet PHY's SerDes interface as well) We can revisit the idea of making SerDes amplitude configurable once we have proper bindings for the mv88e6xxx SerDes. Until then, remove the code that leaves us with unnecessary baggage. Fixes: 926eae604403 ("dsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable") Cc: Holger Brunck <holger.brunck@hitachienergy.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260104093952.486606-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-07MAINTAINERS: Add an additional maintainer to the AMD XGBE driverShyam Sundar S K1-0/+1
Add Raju Rangoju as an additional maintainer to support the AMD XGBE network device driver. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20251211112831.1781030-1-Shyam-sundar.S-k@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06net: fix memory leak in skb_segment_list for GRO packetsMohammad Heib1-3/+5
When skb_segment_list() is called during packet forwarding, it handles packets that were aggregated by the GRO engine. Historically, the segmentation logic in skb_segment_list assumes that individual segments are split from a parent SKB and may need to carry their own socket memory accounting. Accordingly, the code transfers truesize from the parent to the newly created segments. Prior to commit ed4cccef64c1 ("gro: fix ownership transfer"), this truesize subtraction in skb_segment_list() was valid because fragments still carry a reference to the original socket. However, commit ed4cccef64c1 ("gro: fix ownership transfer") changed this behavior by ensuring that fraglist entries are explicitly orphaned (skb->sk = NULL) to prevent illegal orphaning later in the stack. This change meant that the entire socket memory charge remained with the head SKB, but the corresponding accounting logic in skb_segment_list() was never updated. As a result, the current code unconditionally adds each fragment's truesize to delta_truesize and subtracts it from the parent SKB. Since the fragments are no longer charged to the socket, this subtraction results in an effective under-count of memory when the head is freed. This causes sk_wmem_alloc to remain non-zero, preventing socket destruction and leading to a persistent memory leak. The leak can be observed via KMEMLEAK when tearing down the networking environment: unreferenced object 0xffff8881e6eb9100 (size 2048): comm "ping", pid 6720, jiffies 4295492526 backtrace: kmem_cache_alloc_noprof+0x5c6/0x800 sk_prot_alloc+0x5b/0x220 sk_alloc+0x35/0xa00 inet6_create.part.0+0x303/0x10d0 __sock_create+0x248/0x640 __sys_socket+0x11b/0x1d0 Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST packets constructed by GRO, the truesize adjustment is removed. The call to skb_release_head_state() must be preserved. As documented in commit cf673ed0e057 ("net: fix fraglist segmentation reference count leak"), it is still required to correctly drop references to SKB extensions that may be overwritten during __copy_skb_header(). Fixes: ed4cccef64c1 ("gro: fix ownership transfer") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260104213101.352887-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06netlink: specs: netdev: clarify the page pool API a littleJakub Kicinski1-2/+4
The phrasing of the page-pool-get doc is very confusing. It's supposed to highlight that support depends on the driver doing its part but it sounds like orphaned page pools won't be visible. The description of the ifindex is completely wrong. We move the page pool to loopback and skip the attribute if ifindex is loopback. Link: https://lore.kernel.org/20260104084347.5de3a537@kernel.org Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20260104165232.710460-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06net: airoha: Fix npu rx DMA definitionsLorenzo Bianconi1-4/+4
Fix typos in npu rx DMA descriptor definitions. Fixes: b3ef7bdec66fb ("net: airoha: Add airoha_offload.h header") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260102-airoha-npu-dma-rx-def-fixes-v1-1-205fc6bf7d94@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06selftests: mptcp: Mark xerror and die_perror __noreturnAnkit Khushwaha5-6/+11
Compiler reports potential uses of uninitialized variables in mptcp_connect.c when xerror() is called from failure paths. mptcp_connect.c:1262:11: warning: variable 'raw_addr' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] xerror() terminates execution by calling exit(), but it is not visible to the compiler & assumes control flow may continue past the call. Annotate xerror() with __noreturn so the compiler can correctly reason about control flow and avoid false-positive uninitialized variable warnings. Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com> Link: https://patch.msgid.link/20260101172840.90186-1-ankitkhushwaha.linux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06Merge branch 'net-sched-fix-memory-leak-on-mirred-loop'Jakub Kicinski2-12/+59
Jamal Hadi Salim says: ==================== net/sched: Fix memory leak on mirred loop Initialize at_ingress earlier before the if statement. ==================== Link: https://patch.msgid.link/20260101135608.253079-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06selftests/tc-testing: Add test case redirecting to self on egressVictor Nogueira1-0/+47
Add single mirred test case that attempts to redirect to self on egress using clsact Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260101135608.253079-3-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06net/sched: act_mirred: Fix leak when redirecting to self on egressJamal Hadi Salim1-12/+12
Whenever a mirred redirect to self on egress happens, mirred allocates a new skb (skb_to_send). The loop to self check was done after that allocation, but was not freeing the newly allocated skb, causing a leak. Fix this by moving the if-statement to before the allocation of the new skb. The issue was found by running the accompanying tdc test in 2/2 with config kmemleak enabled. After a few minutes the kmemleak thread ran and reported the leak coming from mirred. Fixes: 1d856251a009 ("net/sched: act_mirred: fix loop detection") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260101135608.253079-2-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06Merge branch 'vsock-fix-so_zerocopy-on-accept-ed-vsocks'Jakub Kicinski2-0/+36
Michal Luczaj says: ==================== vsock: Fix SO_ZEROCOPY on accept()ed vsocks vsock has its own handling of setsockopt(SO_ZEROCOPY). Which works just fine unless socket comes from a call to accept(). Because SOCK_CUSTOM_SOCKOPT flag is missing, attempting to set the option always results in errno EOPNOTSUPP. ==================== Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-0-64778d6c4f88@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06vsock/test: Test setting SO_ZEROCOPY on accept()ed socketMichal Luczaj1-0/+32
Make sure setsockopt(SOL_SOCKET, SO_ZEROCOPY) on an accept()ed socket is handled by vsock's implementation. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-2-64778d6c4f88@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06vsock: Make accept()ed sockets use custom setsockopt()Michal Luczaj1-0/+4
SO_ZEROCOPY handling in vsock_connectible_setsockopt() does not get called on accept()ed sockets due to a missing flag. Flip it. Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY") Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-1-64778d6c4f88@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04MAINTAINERS: Update email address for Justin IurmanJustin Iurman2-1/+2
Due to a change of employer, I'll be using a permanent and personal email address. Signed-off-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260103165331.20120-1-justin.iurman@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04Merge tag 'nf-26-01-02' of ↵Jakub Kicinski8-12/+56
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net The following patchset contains Netfilter fixes for *net*: 1) Fix overlap detection for nf_tables with concatenated ranges. There are cases where element could not be added due to a conflict with existing range, while kernel reports success to userspace. 2) update selftest to cover this bug. 3) synproxy update path should use READ/WRITE once as we replace config struct while packet path might read it in parallel. This relies on said config struct to fit sizeof(long). From Fernando Fernandez Mancera. 4) Don't return -EEXIST from xtables in module load path, a pending patch to module infra will spot a warning if this happens. From Daniel Gomez. 5) Fix a memory leak in nf_tables when chain hits 2**32 users and rule is to be hw-offloaded, from Zilin Guan. 6) Avoid infinite list growth when insert rate is high in nf_conncount, also from Fernando. * tag 'nf-26-01-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conncount: update last_gc only when GC has been performed netfilter: nf_tables: fix memory leak in nf_tables_newrule() netfilter: replace -EEXIST with -EBUSY netfilter: nft_synproxy: avoid possible data-race on update operation selftests: netfilter: nft_concat_range.sh: add check for overlap detection bug netfilter: nft_set_pipapo: fix range overlap detection ==================== Link: https://patch.msgid.link/20260102114128.7007-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04inet: frags: drop fraglist conntrack referencesFlorian Westphal1-0/+2
Jakub added a warning in nf_conntrack_cleanup_net_list() to make debugging leaked skbs/conntrack references more obvious. syzbot reports this as triggering, and I can also reproduce this via ip_defrag.sh selftest: conntrack cleanup blocked for 60s WARNING: net/netfilter/nf_conntrack_core.c:2512 [..] conntrack clenups gets stuck because there are skbs with still hold nf_conn references via their frag_list. net.core.skb_defer_max=0 makes the hang disappear. Eric Dumazet points out that skb_release_head_state() doesn't follow the fraglist. ip_defrag.sh can only reproduce this problem since commit 6471658dc66c ("udp: use skb_attempt_defer_free()"), but AFAICS this problem could happen with TCP as well if pmtu discovery is off. The relevant problem path for udp is: 1. netns emits fragmented packets 2. nf_defrag_v6_hook reassembles them (in output hook) 3. reassembled skb is tracked (skb owns nf_conn reference) 4. ip6_output refragments 5. refragmented packets also own nf_conn reference (ip6_fragment calls ip6_copy_metadata()) 6. on input path, nf_defrag_v6_hook skips defragmentation: the fragments already have skb->nf_conn attached 7. skbs are reassembled via ipv6_frag_rcv() 8. skb_consume_udp -> skb_attempt_defer_free() -> skb ends up in pcpu freelist, but still has nf_conn reference. Possible solutions: 1 let defrag engine drop nf_conn entry, OR 2 export kick_defer_list_purge() and call it from the conntrack netns exit callback, OR 3 add skb_has_frag_list() check to skb_attempt_defer_free() 2 & 3 also solve ip_defrag.sh hang but share same drawback: Such reassembled skbs, queued to socket, can prevent conntrack module removal until userspace has consumed the packet. While both tcp and udp stack do call nf_reset_ct() before placing skb on socket queue, that function doesn't iterate frag_list skbs. Therefore drop nf_conn entries when they are placed in defrag queue. Keep the nf_conn entry of the first (offset 0) skb so that reassembled skb retains nf_conn entry for sake of TX path. Note that fixes tag is incorrect; it points to the commit introducing the 'ip_defrag.sh reproducible problem': no need to backport this patch to every stable kernel. Reported-by: syzbot+4393c47753b7808dac7d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/693b0fa7.050a0220.4004e.040d.GAE@google.com/ Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260102140030.32367-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04virtio_net: fix device mismatch in devm_kzalloc/devm_kfreeKommula Shiva Shankar1-3/+3
Initial rss_hdr allocation uses virtio_device->device, but virtnet_set_queues() frees using net_device->device. This device mismatch causing below devres warning [ 3788.514041] ------------[ cut here ]------------ [ 3788.514044] WARNING: drivers/base/devres.c:1095 at devm_kfree+0x84/0x98, CPU#16: vdpa/1463 [ 3788.514054] Modules linked in: octep_vdpa virtio_net virtio_vdpa [last unloaded: virtio_vdpa] [ 3788.514064] CPU: 16 UID: 0 PID: 1463 Comm: vdpa Tainted: G W 6.18.0 #10 PREEMPT [ 3788.514067] Tainted: [W]=WARN [ 3788.514069] Hardware name: Marvell CN106XX board (DT) [ 3788.514071] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 3788.514074] pc : devm_kfree+0x84/0x98 [ 3788.514076] lr : devm_kfree+0x54/0x98 [ 3788.514079] sp : ffff800084e2f220 [ 3788.514080] x29: ffff800084e2f220 x28: ffff0003b2366000 x27: 000000000000003f [ 3788.514085] x26: 000000000000003f x25: ffff000106f17c10 x24: 0000000000000080 [ 3788.514089] x23: ffff00045bb8ab08 x22: ffff00045bb8a000 x21: 0000000000000018 [ 3788.514093] x20: ffff0004355c3080 x19: ffff00045bb8aa00 x18: 0000000000080000 [ 3788.514098] x17: 0000000000000040 x16: 000000000000001f x15: 000000000007ffff [ 3788.514102] x14: 0000000000000488 x13: 0000000000000005 x12: 00000000000fffff [ 3788.514106] x11: ffffffffffffffff x10: 0000000000000005 x9 : ffff800080c8c05c [ 3788.514110] x8 : ffff800084e2eeb8 x7 : 0000000000000000 x6 : 000000000000003f [ 3788.514115] x5 : ffff8000831bafe0 x4 : ffff800080c8b010 x3 : ffff0004355c3080 [ 3788.514119] x2 : ffff0004355c3080 x1 : 0000000000000000 x0 : 0000000000000000 [ 3788.514123] Call trace: [ 3788.514125] devm_kfree+0x84/0x98 (P) [ 3788.514129] virtnet_set_queues+0x134/0x2e8 [virtio_net] [ 3788.514135] virtnet_probe+0x9c0/0xe00 [virtio_net] [ 3788.514139] virtio_dev_probe+0x1e0/0x338 [ 3788.514144] really_probe+0xc8/0x3a0 [ 3788.514149] __driver_probe_device+0x84/0x170 [ 3788.514152] driver_probe_device+0x44/0x120 [ 3788.514155] __device_attach_driver+0xc4/0x168 [ 3788.514158] bus_for_each_drv+0x8c/0xf0 [ 3788.514161] __device_attach+0xa4/0x1c0 [ 3788.514164] device_initial_probe+0x1c/0x30 [ 3788.514168] bus_probe_device+0xb4/0xc0 [ 3788.514170] device_add+0x614/0x828 [ 3788.514173] register_virtio_device+0x214/0x258 [ 3788.514175] virtio_vdpa_probe+0xa0/0x110 [virtio_vdpa] [ 3788.514179] vdpa_dev_probe+0xa8/0xd8 [ 3788.514183] really_probe+0xc8/0x3a0 [ 3788.514186] __driver_probe_device+0x84/0x170 [ 3788.514189] driver_probe_device+0x44/0x120 [ 3788.514192] __device_attach_driver+0xc4/0x168 [ 3788.514195] bus_for_each_drv+0x8c/0xf0 [ 3788.514197] __device_attach+0xa4/0x1c0 [ 3788.514200] device_initial_probe+0x1c/0x30 [ 3788.514203] bus_probe_device+0xb4/0xc0 [ 3788.514206] device_add+0x614/0x828 [ 3788.514209] _vdpa_register_device+0x58/0x88 [ 3788.514211] octep_vdpa_dev_add+0x104/0x228 [octep_vdpa] [ 3788.514215] vdpa_nl_cmd_dev_add_set_doit+0x2d0/0x3c0 [ 3788.514218] genl_family_rcv_msg_doit+0xe4/0x158 [ 3788.514222] genl_rcv_msg+0x218/0x298 [ 3788.514225] netlink_rcv_skb+0x64/0x138 [ 3788.514229] genl_rcv+0x40/0x60 [ 3788.514233] netlink_unicast+0x32c/0x3b0 [ 3788.514237] netlink_sendmsg+0x170/0x3b8 [ 3788.514241] __sys_sendto+0x12c/0x1c0 [ 3788.514246] __arm64_sys_sendto+0x30/0x48 [ 3788.514249] invoke_syscall.constprop.0+0x58/0xf8 [ 3788.514255] do_el0_svc+0x48/0xd0 [ 3788.514259] el0_svc+0x48/0x210 [ 3788.514264] el0t_64_sync_handler+0xa0/0xe8 [ 3788.514268] el0t_64_sync+0x198/0x1a0 [ 3788.514271] ---[ end trace 0000000000000000 ]--- Fix by using virtio_device->device consistently for allocation and deallocation Fixes: 4944be2f5ad8c ("virtio_net: Allocate rss_hdr with devres") Signed-off-by: Kommula Shiva Shankar <kshankar@marvell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20260102101900.692770-1-kshankar@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04bnxt_en: Fix potential data corruption with HW GRO/LROSrijit Bose2-6/+13
Fix the max number of bits passed to find_first_zero_bit() in bnxt_alloc_agg_idx(). We were incorrectly passing the number of long words. find_first_zero_bit() may fail to find a zero bit and cause a wrong ID to be used. If the wrong ID is already in use, this can cause data corruption. Sometimes an error like this can also be seen: bnxt_en 0000:83:00.0 enp131s0np0: TPA end agg_buf 2 != expected agg_bufs 1 Fix it by passing the correct number of bits MAX_TPA_P5. Use DECLARE_BITMAP() to more cleanly define the bitmap. Add a sanity check to warn if a bit cannot be found and reset the ring [MChan]. Fixes: ec4d8e7cf024 ("bnxt_en: Add TPA ID mapping logic for 57500 chips.") Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Srijit Bose <srijit.bose@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251231083625.3911652-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net: wwan: iosm: Fix memory leak in ipc_mux_deinit()Zilin Guan1-0/+6
Commit 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") allocated memory for pp_qlt in ipc_mux_init() but did not free it in ipc_mux_deinit(). This results in a memory leak when the driver is unloaded. Free the allocated memory in ipc_mux_deinit() to fix the leak. Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20251230071853.1062223-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net/ena: fix missing lock when update devlink paramsFrank Liang1-0/+4
Fix assert lock warning while calling devl_param_driverinit_value_set() in ena. WARNING: net/devlink/core.c:261 at devl_assert_locked+0x62/0x90, CPU#0: kworker/0:0/9 CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.19.0-rc2+ #1 PREEMPT(lazy) Hardware name: Amazon EC2 m8i-flex.4xlarge/, BIOS 1.0 10/16/2017 Workqueue: events work_for_cpu_fn RIP: 0010:devl_assert_locked+0x62/0x90 Call Trace: <TASK> devl_param_driverinit_value_set+0x15/0x1c0 ena_devlink_alloc+0x18c/0x220 [ena] ? __pfx_ena_devlink_alloc+0x10/0x10 [ena] ? trace_hardirqs_on+0x18/0x140 ? lockdep_hardirqs_on+0x8c/0x130 ? __raw_spin_unlock_irqrestore+0x5d/0x80 ? __raw_spin_unlock_irqrestore+0x46/0x80 ? devm_ioremap_wc+0x9a/0xd0 ena_probe+0x4d2/0x1b20 [ena] ? __lock_acquire+0x56a/0xbd0 ? __pfx_ena_probe+0x10/0x10 [ena] ? local_clock+0x15/0x30 ? __lock_release.isra.0+0x1c9/0x340 ? mark_held_locks+0x40/0x70 ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 ? trace_hardirqs_on+0x18/0x140 ? lockdep_hardirqs_on+0x8c/0x130 ? __raw_spin_unlock_irqrestore+0x5d/0x80 ? __raw_spin_unlock_irqrestore+0x46/0x80 ? __pfx_ena_probe+0x10/0x10 [ena] ...... </TASK> Fixes: 816b52624cf6 ("net: ena: Control PHC enable through devlink") Signed-off-by: Frank Liang <xiliang@redhat.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251231145808.6103-1-xiliang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04Merge branch 'mlx5-misc-fixes-2025-12-25'Jakub Kicinski4-12/+29
Mark Bloch says: ==================== mlx5 misc fixes 2025-12-25 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/20251225132717.358820-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net/mlx5e: Dealloc forgotten PSP RX modify headerCosmin Ratiu1-3/+11
The commit which added RX steering rules for PSP forgot to free a modify header HW object on the cleanup path, which lead to health errors when reloading the driver and uninitializing the device: mlx5_core 0000:08:00.0: poll_health:803:(pid 3021): Fatal error 3 detected Fix that by saving the modify header pointer in the PSP steering struct and deallocating it after freeing the rule which references it. Fixes: 9536fbe10c9d ("net/mlx5e: Add PSP steering in local NIC RX") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20251225132717.358820-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net/mlx5e: Don't print error message due to invalid moduleGal Pressman1-1/+2
Dumping module EEPROM on newer modules is supported through the netlink interface only. Querying with old userspace ethtool (or other tools, such as 'lshw') which still uses the ioctl interface results in an error message that could flood dmesg (in addition to the expected error return value). The original message was added under the assumption that the driver should be able to handle all module types, but now that such flows are easily triggered from userspace, it doesn't serve its purpose. Change the log level of the print in mlx5_query_module_eeprom() to debug. Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20251225132717.358820-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net/mlx5e: Fix NULL pointer dereference in ioctl module EEPROM queryGal Pressman1-2/+4
The mlx5_query_mcia() function unconditionally dereferences the status pointer to store the MCIA register status value. However, mlx5e_get_module_id() passes NULL since it doesn't need the status value. Add a NULL check before dereferencing the status pointer to prevent a NULL pointer dereference. Fixes: 2e4c44b12f4d ("net/mlx5: Refactor EEPROM query error handling to return status separately") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20251225132717.358820-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net/mlx5e: Don't gate FEC histograms on ppcnt_statistical_groupAlexei Lazar1-4/+5
Currently, the ppcnt_statistical_group capability check incorrectly gates access to FEC histogram statistics. This capability applies only to statistical and physical counter groups, not for histogram data. Restrict the ppcnt_statistical_group check to the Physical_Layer_Counters and Physical_Layer_Statistical_Counters groups. Histogram statistics access remains gated by the pphcr capability. The issue is harmless as of today, as it happens that ppcnt_statistical_group is set on all existing devices that have pphcr set. Fixes: 6b81b8a0b197 ("net/mlx5e: Don't query FEC statistics when FEC is disabled") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20251225132717.358820-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net/mlx5: Lag, multipath, give priority for routes with smaller network prefixPatrisious Haddad1-2/+7
Today multipath offload is controlled by a single route and the route controlling is selected if it meets one of the following criteria: 1. No controlling route is set. 2. New route destination is the same as old one. 3. New route metric is lower than old route metric. This can cause unwanted behaviour in case a new route is added with a smaller network prefix which should get the priority. Fix this by adding a new criteria to give priority to new route with a smaller network prefix. Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20251225132717.358820-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updatesDi Zhu1-1/+2
Directly increment the TSO features incurs a side effect: it will also directly clear the flags in NETIF_F_ALL_FOR_ALL on the master device, which can cause issues such as the inability to enable the nocache copy feature on the bonding driver. The fix is to include NETIF_F_ALL_FOR_ALL in the update mask, thereby preventing it from being cleared. Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master") Signed-off-by: Di Zhu <zhud@hygon.cn> Link: https://patch.msgid.link/20251224012224.56185-1-zhud@hygon.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net: sock: fix hardened usercopy panic in sock_recv_errqueueWeiming Shi1-3/+4
skbuff_fclone_cache was created without defining a usercopy region, [1] unlike skbuff_head_cache which properly whitelists the cb[] field. [2] This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled and the kernel attempts to copy sk_buff.cb data to userspace via sock_recv_errqueue() -> put_cmsg(). The crash occurs when: 1. TCP allocates an skb using alloc_skb_fclone() (from skbuff_fclone_cache) [1] 2. The skb is cloned via skb_clone() using the pre-allocated fclone [3] 3. The cloned skb is queued to sk_error_queue for timestamp reporting 4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE) 5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb [4] 6. __check_heap_object() fails because skbuff_fclone_cache has no usercopy whitelist [5] When cloned skbs allocated from skbuff_fclone_cache are used in the socket error queue, accessing the sock_exterr_skb structure in skb->cb via put_cmsg() triggers a usercopy hardening violation: [ 5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)! [ 5.382796] kernel BUG at mm/usercopy.c:102! [ 5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI [ 5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7 [ 5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 5.384903] RIP: 0010:usercopy_abort+0x6c/0x80 [ 5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490 [ 5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246 [ 5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74 [ 5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0 [ 5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74 [ 5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001 [ 5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00 [ 5.384903] FS: 0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000 [ 5.384903] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0 [ 5.384903] PKRU: 55555554 [ 5.384903] Call Trace: [ 5.384903] <TASK> [ 5.384903] __check_heap_object+0x9a/0xd0 [ 5.384903] __check_object_size+0x46c/0x690 [ 5.384903] put_cmsg+0x129/0x5e0 [ 5.384903] sock_recv_errqueue+0x22f/0x380 [ 5.384903] tls_sw_recvmsg+0x7ed/0x1960 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? schedule+0x6d/0x270 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? mutex_unlock+0x81/0xd0 [ 5.384903] ? __pfx_mutex_unlock+0x10/0x10 [ 5.384903] ? __pfx_tls_sw_recvmsg+0x10/0x10 [ 5.384903] ? _raw_spin_lock_irqsave+0x8f/0xf0 [ 5.384903] ? _raw_read_unlock_irqrestore+0x20/0x40 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 The crash offset 296 corresponds to skb2->cb within skbuff_fclones: - sizeof(struct sk_buff) = 232 - offsetof(struct sk_buff, cb) = 40 - offset of skb2.cb in fclones = 232 + 40 = 272 - crash offset 296 = 272 + 24 (inside sock_exterr_skb.ee) This patch uses a local stack variable as a bounce buffer to avoid the hardened usercopy check failure. [1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885 [2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104 [3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566 [4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491 [5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719 Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251223203534.1392218-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net: phy: mxl-86110: Add power management and soft reset supportStefano Radaelli1-0/+3
Implement soft_reset, suspend, and resume callbacks using genphy_soft_reset(), genphy_suspend(), and genphy_resume() to fix PHY initialization and power management issues. The soft_reset callback is needed to properly recover the PHY after an ifconfig down/up cycle. Without it, the PHY can remain in power-down state, causing MDIO register access failures during config_init(). The soft reset ensures the PHY is operational before configuration. The suspend/resume callbacks enable proper power management during system suspend/resume cycles. Fixes: b2908a989c59 ("net: phy: add driver for MaxLinear MxL86110 PHY") Signed-off-by: Stefano Radaelli <stefano.r@variscite.com> Link: https://patch.msgid.link/20251223120940.407195-1-stefano.r@variscite.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04inet: ping: Fix icmp out countingyuan.gao1-3/+1
When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO messages, ICMP_MIB_OUTMSGS is counted twice. ping_v4_sendmsg ping_v4_push_pending_frames ip_push_pending_frames ip_finish_skb __ip_make_skb icmp_out_count(net, icmp_type); // first count icmp_out_count(sock_net(sk), user_icmph.type); // second count However, when the ping program uses an IPPROTO_RAW socket, ICMP_MIB_OUTMSGS is counted correctly only once. Therefore, the first count should be removed. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: yuan.gao <yuan.gao@ucloud.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251224063145.3615282-1-yuan.gao@ucloud.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net: mscc: ocelot: Fix crash when adding interface under a lagJerry Wu1-2/+4
Commit 15faa1f67ab4 ("lan966x: Fix crash when adding interface under a lag") fixed a similar issue in the lan966x driver caused by a NULL pointer dereference. The ocelot_set_aggr_pgids() function in the ocelot driver has similar logic and is susceptible to the same crash. This issue specifically affects the ocelot_vsc7514.c frontend, which leaves unused ports as NULL pointers. The felix_vsc9959.c frontend is unaffected as it uses the DSA framework which registers all ports. Fix this by checking if the port pointer is valid before accessing it. Fixes: 528d3f190c98 ("net: mscc: ocelot: drop the use of the "lags" array") Signed-off-by: Jerry Wu <w.7erry@foxmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/tencent_75EF812B305E26B0869C673DD1160866C90A@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egressAlexandre Knecht1-4/+7
When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is incorrectly stripped from frames during egress processing. br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also moves any "next" VLAN from the payload into hwaccel: /* move next vlan tag to hw accel tag */ __skb_vlan_pop(skb, &vlan_tci); __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci); For QinQ frames where the C-VLAN sits in the payload, this moves it to hwaccel where it gets lost during VXLAN encapsulation. Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only the hwaccel S-VLAN and leaves the payload untouched. This path is only taken when vlan_tunnel is enabled and tunnel_info is configured, so 802.1Q bridges are unaffected. Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN preserved in VXLAN payload via tcpdump. Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net: bnge: add AUXILIARY_BUS to Kconfig dependenciesMarkus Blöchl1-0/+1
The build can currently fail with ld: drivers/net/ethernet/broadcom/bnge/bnge_auxr.o: in function `bnge_rdma_aux_device_add': bnge_auxr.c:(.text+0x366): undefined reference to `__auxiliary_device_add' ld: drivers/net/ethernet/broadcom/bnge/bnge_auxr.o: in function `bnge_rdma_aux_device_init': bnge_auxr.c:(.text+0x43c): undefined reference to `auxiliary_device_init' if BNGE is enabled but no other driver pulls in AUXILIARY_BUS. Select AUXILIARY_BUS in BNGE like in all other drivers which create an auxiliary_device. Fixes: 8ac050ec3b1c ("bng_en: Add RoCE aux device support") Signed-off-by: Markus Blöchl <markus@blochl.de> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Link: https://patch.msgid.link/20251228-bnge_aux_bus-v1-1-82e273ebfdac@blochl.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-04net: marvell: prestera: fix NULL dereference on devlink_alloc() failureAlok Tiwari1-0/+2
devlink_alloc() may return NULL on allocation failure, but prestera_devlink_alloc() unconditionally calls devlink_priv() on the returned pointer. This leads to a NULL pointer dereference if devlink allocation fails. Add a check for a NULL devlink pointer and return NULL early to avoid the crash. Fixes: 34dd1710f5a3 ("net: marvell: prestera: Add basic devlink support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: Elad Nachman <enachman@marvell.com> Link: https://patch.msgid.link/20251230052124.897012-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-02netfilter: nf_conncount: update last_gc only when GC has been performedFernando Fernandez Mancera1-1/+1
Currently last_gc is being updated everytime a new connection is tracked, that means that it is updated even if a GC wasn't performed. With a sufficiently high packet rate, it is possible to always bypass the GC, causing the list to grow infinitely. Update the last_gc value only when a GC has been actually performed. Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-01-02netfilter: nf_tables: fix memory leak in nf_tables_newrule()Zilin Guan1-1/+2
In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to the err_release_rule label without freeing the allocated flow, leading to a memory leak. Fix this by adding a new label err_destroy_flow and jumping to it when nft_use_inc() fails. This ensures that the flow is properly released in this error case. Fixes: 1689f25924ada ("netfilter: nf_tables: report use refcount overflow") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-01-01netfilter: replace -EEXIST with -EBUSYDaniel Gomez3-4/+4
The -EEXIST error code is reserved by the module loading infrastructure to indicate that a module is already loaded. When a module's init function returns -EEXIST, userspace tools like kmod interpret this as "module already loaded" and treat the operation as successful, returning 0 to the user even though the module initialization actually failed. Replace -EEXIST with -EBUSY to ensure correct error reporting in the module initialization path. Affected modules: * ebtable_broute ebtable_filter ebtable_nat arptable_filter * ip6table_filter ip6table_mangle ip6table_nat ip6table_raw * ip6table_security iptable_filter iptable_mangle iptable_nat * iptable_raw iptable_security Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-01-01netfilter: nft_synproxy: avoid possible data-race on update operationFernando Fernandez Mancera1-3/+3
During nft_synproxy eval we are reading nf_synproxy_info struct which can be modified on update operation concurrently. As nf_synproxy_info struct fits in 32 bits, use READ_ONCE/WRITE_ONCE annotations. Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-01-01selftests: netfilter: nft_concat_range.sh: add check for overlap detection bugFlorian Westphal1-1/+44
without 'netfilter: nft_set_pipapo: fix range overlap detection': reject overlapping range on add 0s [FAIL] Returned success for add { 1.2.3.4 . 1.2.4.1-1.2.4.2 } given set: table inet filter { [..] elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0, 1.2.3.0-1.2.3.4 . 1.2.4.2 counter packets 0 bytes 0 } } The element collides with existing ones and was not added, but kernel returned success to userspace. Signed-off-by: Florian Westphal <fw@strlen.de>
2026-01-01netfilter: nft_set_pipapo: fix range overlap detectionFlorian Westphal1-2/+2
set->klen has to be used, not sizeof(). The latter only compares a single register but a full check of the entire key is needed. Example: table ip t { map s { typeof iifname . ip saddr : verdict flags interval } } nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected nft add element t s '{ "lo" . 10.0.0.0/8 : drop }' # bug: no error The 3rd 'add element' should be rejected via -ENOTEMPTY, not -EEXIST, so userspace / nft can report an error to the user. The latter is only correct for the 2nd case (re-add of existing element). As-is, userspace is told that the command was successful, but no elements were added. After this patch, 3rd command gives: Error: Could not process rule: File exists add element t s { "lo" . 127.0.0.0/8 . "lo" : drop } ^^^^^^^^^^^^^^^^^^^^^^^^^ Fixes: 0eb4b5ee33f2 ("netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion") Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-30Merge tag 'net-6.19-rc4' of ↵Linus Torvalds71-194/+428
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth and WiFi. Notably this includes the fix for the iwlwifi issue you reported. Current release - regressions: - core: avoid prefetching NULL pointers - wifi: - iwlwifi: implement settime64 as stub for MVM/MLD PTP - mac80211: fix list iteration in ieee80211_add_virtual_monitor() - handshake: fix null-ptr-deref in handshake_complete() - eth: mana: fix use-after-free in reset service rescan path Previous releases - regressions: - openvswitch: avoid needlessly taking the RTNL on vport destroy - dsa: properly keep track of conduit reference - ipv4: - fix error route reference count leak with nexthop objects - fib: restore ECMP balance from loopback - mptcp: ensure context reset on disconnect() - bluetooth: fix potential UaF in btusb - nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write - eth: - gve: defer interrupt enabling until NAPI registration - i40e: fix scheduling in set_rx_mode - macb: relocate mog_init_rings() callback from macb_mac_link_up() to macb_open() - rtl8150: fix memory leak on usb_submit_urb() failure - wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc() Previous releases - always broken: - ip6_gre: make ip6gre_header() robust - ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT - af_unix: don't post cmsg for SO_INQ unless explicitly asked for - phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration - wifi: mac80211: discard beacon frames to non-broadcast address - eth: - iavf: fix off-by-one issues in iavf_config_rss_reg() - stmmac: fix the crash issue for zero copy XDP_TX action - team: fix check for port enabled when priority changes" * tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT net: rose: fix invalid array index in rose_kill_by_device() net: enetc: do not print error log if addr is 0 net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open() selftests: fib_test: Add test case for ipv4 multi nexthops net: fib: restore ECMP balance from loopback selftests: fib_nexthops: Add test cases for error routes deletion ipv4: Fix reference count leak when using error routes with nexthop objects net: usb: sr9700: fix incorrect command used to write single register ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr() usbnet: avoid a possible crash in dql_completed() gve: defer interrupt enabling until NAPI registration net: stmmac: fix the crash issue for zero copy XDP_TX action octeontx2-pf: fix "UBSAN: shift-out-of-bounds error" af_unix: don't post cmsg for SO_INQ unless explicitly asked for net: mana: Fix use-after-free in reset service rescan path net: avoid prefetching NULL pointers net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct net: nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write net: usb: asix: validate PHY address before use ...
2025-12-30ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RTJiayuan Chen1-1/+12
On PREEMPT_RT kernels, after rt6_get_pcpu_route() returns NULL, the current task can be preempted. Another task running on the same CPU may then execute rt6_make_pcpu_route() and successfully install a pcpu_rt entry. When the first task resumes execution, its cmpxchg() in rt6_make_pcpu_route() will fail because rt6i_pcpu is no longer NULL, triggering the BUG_ON(prev). It's easy to reproduce it by adding mdelay() after rt6_get_pcpu_route(). Using preempt_disable/enable is not appropriate here because ip6_rt_pcpu_alloc() may sleep. Fix this by handling the cmpxchg() failure gracefully on PREEMPT_RT: free our allocation and return the existing pcpu_rt installed by another task. The BUG_ON is replaced by WARN_ON_ONCE for non-PREEMPT_RT kernels where such races should not occur. Link: https://syzkaller.appspot.com/bug?extid=9b35e9bc0951140d13e6 Fixes: d2d6422f8bd1 ("x86: Allow to enable PREEMPT_RT.") Reported-by: syzbot+9b35e9bc0951140d13e6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6918cd88.050a0220.1c914e.0045.GAE@google.com/T/ Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20251223051413.124687-1-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30net: rose: fix invalid array index in rose_kill_by_device()Pwnverse1-1/+1
rose_kill_by_device() collects sockets into a local array[] and then iterates over them to disconnect sockets bound to a device being brought down. The loop mistakenly indexes array[cnt] instead of array[i]. For cnt < ARRAY_SIZE(array), this reads an uninitialized entry; for cnt == ARRAY_SIZE(array), it is an out-of-bounds read. Either case can lead to an invalid socket pointer dereference and also leaks references taken via sock_hold(). Fix the index to use i. Fixes: 64b8bc7d5f143 ("net/rose: fix races in rose_kill_by_device()") Co-developed-by: Fatma Alwasmi <falwasmi@purdue.edu> Signed-off-by: Fatma Alwasmi <falwasmi@purdue.edu> Signed-off-by: Pwnverse <stanksal@purdue.edu> Link: https://patch.msgid.link/20251222212227.4116041-1-ritviktanksalkar@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30net: enetc: do not print error log if addr is 0Wei Fang1-1/+7
A value of 0 for addr indicates that the IEB_LBCR register does not need to be configured, as its default value is 0. However, the driver will print an error log if addr is 0, so this issue needs to be fixed. Fixes: 50bfd9c06f0f ("net: enetc: set external PHY address in IERB for i.MX94 ENETC") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251222022628.4016403-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to ↵Xiaolei Wang1-1/+2
macb_open() In the non-RT kernel, local_bh_disable() merely disables preemption, whereas it maps to an actual spin lock in the RT kernel. Consequently, when attempting to refill RX buffers via netdev_alloc_skb() in macb_mac_link_up(), a deadlock scenario arises as follows: WARNING: possible circular locking dependency detected 6.18.0-08691-g2061f18ad76e #39 Not tainted ------------------------------------------------------ kworker/0:0/8 is trying to acquire lock: ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c but task is already holding lock: ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit +0x148/0xb7c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&queue->tx_ptr_lock){+...}-{3:3}: rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x148/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 -> #2 (_xmit_ETHER#2){+...}-{3:3}: rt_spin_lock+0x50/0x1f0 sch_direct_xmit+0x11c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 -> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}: lock_release+0x250/0x348 __local_bh_enable_ip+0x7c/0x240 __netdev_alloc_skb+0x1b4/0x1d8 gem_rx_refill+0xdc/0x240 gem_init_rings+0xb4/0x108 macb_mac_link_up+0x9c/0x2b4 phylink_resolve+0x170/0x614 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 -> #0 (&bp->lock){+.+.}-{3:3}: __lock_acquire+0x15a8/0x2084 lock_acquire+0x1cc/0x350 rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x808/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 other info that might help us debug this: Chain exists of: &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&queue->tx_ptr_lock); lock(_xmit_ETHER#2); lock(&queue->tx_ptr_lock); lock(&bp->lock); *** DEADLOCK *** Call trace: show_stack+0x18/0x24 (C) dump_stack_lvl+0xa0/0xf0 dump_stack+0x18/0x24 print_circular_bug+0x28c/0x370 check_noncircular+0x198/0x1ac __lock_acquire+0x15a8/0x2084 lock_acquire+0x1cc/0x350 rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x808/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 Notably, invoking the mog_init_rings() callback upon link establishment is unnecessary. Instead, we can exclusively call mog_init_rings() within the ndo_open() callback. This adjustment resolves the deadlock issue. Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings() when opening the network interface via at91ether_open(), moving mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC check. Fixes: 633e98a711ac ("net: macb: use resolved link config in mac_link_up()") Cc: stable@vger.kernel.org Suggested-by: Kevin Hao <kexin.hao@windriver.com> Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com> Link: https://patch.msgid.link/20251222015624.1994551-1-xiaolei.wang@windriver.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30selftests: fib_test: Add test case for ipv4 multi nexthopsVadim Fedorenko1-1/+69
The test checks that with multi nexthops route the preferred route is the one which matches source ip. In case when source ip is on dummy interface, it checks that the routes are balanced. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251221192639.3911901-2-vadim.fedorenko@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30net: fib: restore ECMP balance from loopbackVadim Fedorenko1-16/+10
Preference of nexthop with source address broke ECMP for packets with source addresses which are not in the broadcast domain, but rather added to loopback/dummy interfaces. Original behaviour was to balance over nexthops while now it uses the latest nexthop from the group. To fix the issue introduce next hop scoring system where next hops with source address equal to requested will always have higher priority. For the case with 198.51.100.1/32 assigned to dummy0 and routed using 192.0.2.0/24 and 203.0.113.0/24 networks: 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether d6:54:8a:ff:78:f5 brd ff:ff:ff:ff:ff:ff inet 198.51.100.1/32 scope global dummy0 valid_lft forever preferred_lft forever 7: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 06:ed:98:87:6d:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 192.0.2.2/24 scope global veth1 valid_lft forever preferred_lft forever inet6 fe80::4ed:98ff:fe87:6d8a/64 scope link proto kernel_ll valid_lft forever preferred_lft forever 9: veth3@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether ae:75:23:38:a0:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 203.0.113.2/24 scope global veth3 valid_lft forever preferred_lft forever inet6 fe80::ac75:23ff:fe38:a0d2/64 scope link proto kernel_ll valid_lft forever preferred_lft forever ~ ip ro list: default nexthop via 192.0.2.1 dev veth1 weight 1 nexthop via 203.0.113.1 dev veth3 weight 1 192.0.2.0/24 dev veth1 proto kernel scope link src 192.0.2.2 203.0.113.0/24 dev veth3 proto kernel scope link src 203.0.113.2 before: for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c: 255 veth3 after: for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c: 122 veth1 133 veth3 Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20251221192639.3911901-1-vadim.fedorenko@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30selftests: fib_nexthops: Add test cases for error routes deletionIdo Schimmel1-0/+15
Add test cases that check that error routes (e.g., blackhole) are deleted when their nexthop is deleted. Output without "ipv4: Fix reference count leak when using error routes with nexthop objects": # ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal" IPv4 functional ---------------------- [...] WARNING: Unexpected route entry TEST: Error route removed on nexthop deletion [FAIL] IPv6 ---------------------- [...] TEST: Error route removed on nexthop deletion [ OK ] Tests passed: 20 Tests failed: 1 Tests skipped: 0 Output with "ipv4: Fix reference count leak when using error routes with nexthop objects": # ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal" IPv4 functional ---------------------- [...] TEST: Error route removed on nexthop deletion [ OK ] IPv6 ---------------------- [...] TEST: Error route removed on nexthop deletion [ OK ] Tests passed: 21 Tests failed: 0 Tests skipped: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251221144829.197694-2-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30ipv4: Fix reference count leak when using error routes with nexthop objectsIdo Schimmel1-3/+4
When a nexthop object is deleted, it is marked as dead and then fib_table_flush() is called to flush all the routes that are using the dead nexthop. The current logic in fib_table_flush() is to only flush error routes (e.g., blackhole) when it is called as part of network namespace dismantle (i.e., with flush_all=true). Therefore, error routes are not flushed when their nexthop object is deleted: # ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip route add 198.51.100.1/32 nhid 1 # ip route add blackhole 198.51.100.2/32 nhid 1 # ip nexthop del id 1 # ip route show blackhole 198.51.100.2 nhid 1 dev dummy1 As such, they keep holding a reference on the nexthop object which in turn holds a reference on the nexthop device, resulting in a reference count leak: # ip link del dev dummy1 [ 70.516258] unregister_netdevice: waiting for dummy1 to become free. Usage count = 2 Fix by flushing error routes when their nexthop is marked as dead. IPv6 does not suffer from this problem. Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Closes: https://lore.kernel.org/netdev/d943f806-4da6-4970-ac28-b9373b0e63ac@I-love.SAKURA.ne.jp/ Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251221144829.197694-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30net: usb: sr9700: fix incorrect command used to write single registerEthan Nelson-Moore1-2/+2
This fixes the device failing to initialize with "error reading MAC address" for me, probably because the incorrect write of NCR_RST to SR_NCR is not actually resetting the device. Fixes: c9b37458e95629b1d1171457afdcc1bf1eb7881d ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Cc: stable@vger.kernel.org Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20251221082400.50688-1-enelsonmoore@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>