path: root/drivers/net
Age | Commit message | Author | Files | Lines
44 hours | net: phy: micrel: fix LAN8814 QSGMII soft reset | Robert Marko | 1 | -7/+8
[ Upstream commit e027c218c482c6a0ae1948129ccda3b0a2033368 ] LAN8814 QSGMII soft reset was moved into the probe function to avoid triggering it for each of the 4 PHYs in the package. However, that broke the QSGMII link between the MAC and PHY on most LAN8814 PHYs, specifically for us on the Microchip LAN969x switch. Reading the QSGMII status registers showed that the lanes were only partially synced. It looks like the reset timing is crucial, so let's move the reset back into the .config_init function but guard it with phy_package_init_once() to avoid it being triggered on each of the 4 PHYs in the package. Change the probe function to use phy_package_probe_once() for coma and PtP setup. Fixes: 96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Link: https://patch.msgid.link/20260428134138.1741253-1-robert.marko@sartura.hr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | octeontx2-af: validate body pcifunc in rvu_mbox_handler_rep_event_notify | Michael Bommarito | 3 | -1/+10
commit 2156a29aecfffa2eb7c558255690084efbe9f3b0 upstream. rvu_mbox_handler_rep_event_notify() in drivers/net/ethernet/marvell/ octeontx2/af/rvu_rep.c queues a sender-controlled REP_EVENT_NOTIFY request body verbatim, and rvu_rep_up_notify() then forwards event->pcifunc (the nested body field, distinct from the AF-normalised header pcifunc) into rvu_get_pfvf(), rvu_get_pf() and the AF->PF mailbox device index without any bounds check. A VF attached to a PF that has been put into switchdev representor mode reaches this path: the VF mailbox handler otx2_pfvf_mbox_handler() forwards every message id including MBOX_MSG_REP_EVENT_NOTIFY to AF without an allowlist, and the AF dispatcher rewrites only msg->pcifunc, leaving struct rep_event::pcifunc attacker-controlled. The sibling rvu_mbox_handler_esw_cfg() refuses requests whose header pcifunc is not rvu->rep_pcifunc; this handler has no equivalent gate. An out-of-range body pcifunc selects an &rvu->pf[]/&rvu->hwvf[] element past the allocated array and, for RVU_EVENT_MAC_ADDR_CHANGE, turns into a six-byte attacker-chosen OOB ether_addr_copy() target inside the queued worker; KASAN reports a slab-out-of-bounds write in rvu_rep_wq_handler. Reject malformed requests at the handler entry by gating on is_pf_func_valid(), which is already the canonical PF/VF range check in this driver; expose it via rvu.h so callers in rvu_rep.c can use it instead of open-coding the same range arithmetic. Fixes: b8fea84a0468 ("octeontx2-pf: Add support to sync link state between representor and VFs") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260520154157.1439319-1-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
44 hours | macsec: fix replay protection at XPN lower-PN wrap | Junrui Luo | 1 | -1/+2
commit e68842b3356471ba56c882209f324613dac47f64 upstream. In macsec_post_decrypt(), when pn is U32_MAX, pn + 1 overflows u32 to 0 and the first branch never fires. If next_pn_halves.lower is also in the upper half, pn_same_half(pn, lower) is true and the XPN else-if does not fire either, leaving next_pn_halves unchanged. An attacker that captures the legitimate frame carrying pn == 0xFFFFFFFF on an XPN association can then replay it indefinitely, since lowest_pn never rises above the captured pn and macsec_decrypt() reconstructs the same IV. Extend the XPN else-if to also fire when pn + 1 wraps to 0, so receipt of pn == U32_MAX advances next_pn_halves to (upper + 1, 0). Fixes: a21ecf0e0338 ("macsec: Support XPN frame handling - IEEE 802.1AEbw") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Link: https://patch.msgid.link/SYBPR01MB78813FD49E58F253B989F197AF012@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
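The wrap described in this message can be reproduced outside the kernel. The following is a minimal sketch, assuming a simplified `struct pn_halves` and a hypothetical `advance_next_pn()` helper that models only the two branches named in the commit message. The `fixed` flag is an illustration device for comparing the old and new conditions; it is not part of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a 64-bit XPN split into two 32-bit halves. */
struct pn_halves {
	uint32_t upper;
	uint32_t lower;
};

/* True when both values have the same most-significant bit. */
static bool pn_same_half(uint32_t a, uint32_t b)
{
	return (a >> 31) == (b >> 31);
}

/*
 * Hypothetical model of the next-PN update. With fixed == false the
 * XPN branch only fires on a half change; with fixed == true it also
 * fires when pn + 1 wraps to 0.
 */
static void advance_next_pn(struct pn_halves *next, uint32_t pn,
			    bool xpn, bool fixed)
{
	uint32_t next_lower = pn + 1; /* wraps to 0 for pn == UINT32_MAX */

	if (next_lower > next->lower) {
		next->lower = next_lower;
	} else if (xpn && (!pn_same_half(pn, next->lower) ||
			   (fixed && next_lower == 0))) {
		next->upper++;
		next->lower = next_lower;
	}
}
```

With `next->lower` in the upper half and `pn == UINT32_MAX`, the unfixed variant leaves the state untouched, while the fixed variant advances it to `(upper + 1, 0)`.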
44 hours | wireguard: send: append trailer after expanding head | Jason A. Donenfeld | 1 | -10/+10
commit f75e3eb08fe31d30a9af6ed80cdd22e6772837e2 upstream. With how this is currently written, we add the trailer, zero it out, and then add the header space on. If that header space requires a reallocation + copy, the zeros in the trailer aren't copied, because the skb len hasn't actually been expanded yet to cover the trailer. Instead, add the padding at the end of the process rather than at the beginning. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20260529173134.3080773-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
44 hours | net: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configuration | Frank Wunderlich | 1 | -0/+3
[ Upstream commit 422b5233b607476ac7176bfa2a101b9a103d7653 ] Commit 8871389da151 introduces common PCS DTS properties, which write rx=normal,tx=normal polarity to the SGMSYS_QPHY_WRAP_CTRL register of the switch. This register is initialized with the TX bit set, so the change inverts the polarity compared to before. It looks like the MT7531 has TX polarity inverted in hardware and sets the TX bit by default to restore the normal polarity. The MT7531 datasheet quite clearly states: Register 000050EC QPHY_WRAP_CTRL -- QPHY wrapper control Reset value: 0x00000501 BIT 1 RX_BIT_POLARITY -- RX bit polarity control 1'b0: normal 1'b1: inverted BIT 0 TX_BIT_POLARITY -- TX bit polarity control (TX default inversed in MT7531) 1'b0: normal 1'b1: inverted Until this patch, the register write was only performed when the mediatek,pnswap property was set, which cannot be done for the switch because the fw-node param was always NULL from the switch driver in the mtk_pcs_lynxi_create call. Do not configure the switch side, as was the case before. Fixes: 8871389da151 ("net: pcs: pcs-mtk-lynxi: deprecate "mediatek,pnswap"") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260526153239.30194-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | net: mana: Skip redundant detach on already-detached port | Dipayaan Roy | 1 | -0/+6
[ Upstream commit 5b05aa36ee24297d7296ca58dfd8c448d0e4cda3 ] When mana_per_port_queue_reset_work_handler() runs after a previous detach succeeded but attach failed, the port is left in a detached state with apc->tx_qp and apc->rxqs already freed. Calling mana_detach() again unconditionally leads to NULL pointer dereferences during queue teardown. Add an early exit in mana_detach() when the port is already in detached state (!netif_device_present) for non-close callers, making it safe to call idempotently. This allows the queue reset handler and other recovery paths to simply retry mana_attach() without redundant teardown. Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-3-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
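The idempotent-detach idea in this message can be sketched in isolation. This is a minimal sketch, assuming a hypothetical `struct port` with an `attached` flag standing in for the netif_device_present() check; none of these names come from the mana driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical port model: queues exist only while attached. */
struct port {
	bool attached;
	int *queues;
	size_t num_queues;
};

static int port_attach(struct port *p, size_t num_queues)
{
	p->queues = calloc(num_queues, sizeof(*p->queues));
	if (!p->queues)
		return -1;
	p->num_queues = num_queues;
	p->attached = true;
	return 0;
}

/* Safe to call repeatedly: a detached port has nothing to tear down. */
static int port_detach(struct port *p)
{
	if (!p->attached)
		return 0; /* early exit instead of touching freed queues */

	free(p->queues);
	p->queues = NULL;
	p->num_queues = 0;
	p->attached = false;
	return 0;
}
```

A recovery path can then call `port_detach()` unconditionally and retry `port_attach()`, regardless of whether an earlier attach attempt failed.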
44 hours | net: mana: Add NULL guards in teardown path to prevent panic on attach failure | Dipayaan Roy | 1 | -29/+41
[ Upstream commit 17bfe0a8c014ee1d542ad352cd6a0a505361664a ] When queue allocation fails partway through, the error cleanup frees and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as mana_remove(), mana_change_mtu() recovery, and internal error handling in mana_alloc_queues() can subsequently call into functions that dereference these pointers without NULL checks: - mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26). - mana_destroy_vport() iterates apc->rxqs without a NULL check. - mana_fence_rqs() iterates apc->rxqs without a NULL check. - mana_dealloc_queues() iterates apc->tx_qp without a NULL check. Add NULL guards for apc->rxqs in mana_fence_rqs(), mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue draining when TX queues were never allocated or already freed. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-2-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path | Jijie Shao | 1 | -3/+3
[ Upstream commit b545b6ea1802b32436fa97f1d2918718212cc831 ] The dma_rmb() barrier was placed before dma_sync_single_for_cpu(), which is incorrect. DMA sync must complete first to make the buffer accessible to the CPU, then the rmb barrier ensures subsequent descriptor reads observe the latest data written by the hardware. Reorder the operations so dma_sync_single_for_cpu() is called before dma_rmb() to guarantee the driver reads consistent data from the DMA buffer. Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260525144525.94884-3-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | net: hibmcge: disable Relaxed Ordering to fix RX packet corruption | Jijie Shao | 1 | -0/+3
[ Upstream commit 463a1271aa26eac992851b9d98cc75bc3cd4a1ed ] When SMMU is disabled, the hibmcge driver may receive corrupted packets. The hardware writes packet data and descriptors to the same page, but with Relaxed Ordering enabled, PCI write transactions may not be strictly ordered. This can cause the driver to observe a valid descriptor before the corresponding packet data is fully written. Fix this by clearing PCI_EXP_DEVCTL_RELAX_EN in the PCI bridge control register to ensure strict write ordering between packet data and descriptors. Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260525144525.94884-2-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | bonding: refuse to enslave CAN devices | Oliver Hartkopp | 1 | -0/+6
[ Upstream commit 8ba68464e4787b6a7ec938826e16124df20fd23d ] syzbot reported a kernel paging request crash in can_rx_unregister() inside net/can/af_can.c. The crash occurs because a virtual CAN device (vxcan) is being enslaved to a bonding master. During the enslavement process, the bonding driver modifies the network device state to fit an Ethernet-like aggregation model. However, CAN devices operate on a completely different Layer 2 architecture, relying on the CAN mid-layer private data structure (can_ml_priv) instead of standard Ethernet structures. Since bonding does not initialize or maintain these CAN structures, subsequent operations on the half-enslaved interface (such as closing associated sockets via isotp_release) lead to a null-pointer dereference when accessing the CAN receiver lists. Bonding CAN interfaces is architecturally invalid as CAN lacks MAC addresses, ARP capabilities, and standard Ethernet link-layer mechanisms. While generic loopback devices are blocked globally in net/core/dev.c, virtual CAN devices bypass this check because they do not carry the IFF_LOOPBACK flag, despite acting as local software loopbacks. Fix this by explicitly blocking network devices of type ARPHRD_CAN from being enslaved at the very beginning of bond_enslave(). This prevents illegal state mutations, eliminates the resulting KASAN crashes, and avoids potential memory leaks from incomplete socket cleanups. Because CAN support was added long after bonding, the Fixes tag points to the introduction of ARPHRD_CAN, which would have needed specific handling in bonding_main.c.
Fixes: cd05acfe65ed ("[CAN]: Allocate protocol numbers for PF_CAN") Reported-by: syzbot+8ed98cbd0161632bce95@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8ed98cbd0161632bce95 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20260526-bonding-candev-v1-1-ba1df400918a@hartkopp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | vxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu() | Eric Dumazet | 1 | -2/+2
[ Upstream commit 7d9ef0cb271555d8cf39fefe6c981e1493b25ecf ] skb_tunnel_check_pmtu() can change skb->head. Reusing old_iph after skb_tunnel_check_pmtu() can cause a UAF. Use ip_hdr(skb) instead, as done in drivers/net/bareudp.c and drivers/net/geneve.c. Found by Sashiko. Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Link: https://patch.msgid.link/20260525203642.2389723-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
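The hazard of caching a header pointer across a call that may reallocate the buffer can be shown with plain `realloc()`. This is a minimal userspace sketch, assuming a hypothetical `struct pkt`; it mirrors the pattern of re-deriving the pointer after the call and does not use any skb API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer; head may move when the buffer grows. */
struct pkt {
	unsigned char *head;
	size_t len;
	size_t hdr_off; /* header offset from head */
};

/* Always derive the header pointer from the current head. */
static unsigned char *pkt_hdr(const struct pkt *p)
{
	return p->head + p->hdr_off;
}

/* Grow the buffer; like any reallocating helper, this may move head. */
static int pkt_grow(struct pkt *p, size_t extra)
{
	unsigned char *n = realloc(p->head, p->len + extra);

	if (!n)
		return -1;
	memset(n + p->len, 0, extra);
	p->head = n;
	p->len += extra;
	return 0;
}

/* Read a header byte after growing, without reusing a stale pointer. */
static int hdr_byte_after_grow(struct pkt *p, size_t extra, size_t idx,
			       unsigned char *out)
{
	if (pkt_grow(p, extra))
		return -1;
	if (p->hdr_off + idx >= p->len)
		return -1;
	*out = pkt_hdr(p)[idx]; /* re-derived after the possible move */
	return 0;
}
```

A pointer obtained from `pkt_hdr()` before `pkt_grow()` must be treated as invalid afterwards, which is the same rule the commit applies to `old_iph`.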
44 hours | net/mlx5: HWS: Reject unsupported remove-header action | Prathamesh Deshpande | 1 | -1/+3
[ Upstream commit 86f1d0f063e423a5c1982db1e5e7a8eac511e603 ] mlx5_cmd_hws_packet_reformat_alloc() handles MLX5_REFORMAT_TYPE_REMOVE_HDR by looking up a matching HWS remove-header action. If mlx5_fs_get_action_remove_header_vlan() returns NULL, the code only logs an error and continues. The function then returns success with a NULL HWS action stored in the packet-reformat object. Return an error when no matching remove-header action is available. Fixes: aecd9d1020e3 ("net/mlx5: fs, add HWS packet reformat API function") Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260506000054.51797-1-prathameshdeshpande7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | tun: free page on build_skb failure in tun_xdp_one() | Weiming Shi | 1 | -0/+1
[ Upstream commit aa8963fdce667a42fb7f0bdd2909fadcab02f9a8 ] When build_skb() fails in tun_xdp_one(), the function sets ret to -ENOMEM and jumps to the out label, which returns without freeing the page that vhost_net_build_xdp() allocated for the frame. As with the short-frame rejection path, tun_sendmsg() discards the per-buffer error and still returns total_len, so vhost_tx_batch() takes the success path and never frees the page. Each build_skb() failure in a batch leaks one page-frag chunk. Free the page before taking the error path, matching the put_page() the other error exits of tun_xdp_one() already perform. Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260521163312.1479805-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
44 hours | tap: free page on error paths in tap_get_user_xdp() | Weiming Shi | 1 | -0/+2
[ Upstream commit 3bcf7aec6a9d16438f2cec29f5d7c8d5b8edf9b2 ] tap_get_user_xdp() rejects a frame shorter than ETH_HLEN with -EINVAL, and returns -ENOMEM when build_skb() fails. Both paths jump to the err label without freeing the page that vhost_net_build_xdp() allocated for the frame. tap_sendmsg() discards the per-buffer return value and always returns 0, so vhost_tx_batch() takes the success path and never frees the page; each rejected frame in a batch leaks one page-frag chunk. Free the page on both error paths, before the skb is built. This is the tap counterpart of the same leak in tun_xdp_one(). Fixes: 0efac27791ee ("tap: accept an array of XDP buffs through sendmsg()") Fixes: ed7f2afdd0e0 ("tap: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260521163230.1478627-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
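The ownership rule behind this fix is that a callee must free the buffer itself on any error path whose return value the caller ignores. The following is a minimal sketch, assuming hypothetical `page_get()`/`page_put()` helpers with a live-allocation counter; it models only the short-frame path and is not the tap code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_FRAME_LEN 14 /* Ethernet header length */

static int live_pages; /* pages allocated and not yet freed */

static void *page_get(void)
{
	void *page = malloc(4096);

	if (page)
		live_pages++;
	return page;
}

static void page_put(void *page)
{
	if (page) {
		free(page);
		live_pages--;
	}
}

/*
 * Takes ownership of page. On success the page is handed to *out;
 * on error it is freed here, because the caller ignores the result.
 */
static int submit_frame(void *page, size_t len, void **out)
{
	if (len < MIN_FRAME_LEN) {
		page_put(page); /* without this, each short frame leaks */
		return -EINVAL;
	}
	*out = page;
	return 0;
}
```

Tracking `live_pages` before and after a rejected frame makes the leak (or its absence) directly observable.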
44 hours | tun: free page on short-frame rejection in tun_xdp_one() | Weiming Shi | 1 | -1/+3
[ Upstream commit f4feb1e20058e407cb00f45aff47f5b7e19a6bbf ] tun_xdp_one() returns -EINVAL on a frame shorter than ETH_HLEN without freeing the page that vhost_net_build_xdp() allocated for it. tun_sendmsg() discards that -EINVAL and still returns total_len, so vhost_tx_batch() takes the success path and never frees the page; each short frame in a batch leaks one page-frag chunk. A local process that can open /dev/net/tun and /dev/vhost-net can hit this path: it attaches a tun/tap device as the vhost-net backend and feeds TX descriptors whose length minus the virtio-net header is below ETH_HLEN. Each kick leaks the page-frag chunks for that batch, and a tight submission loop exhausts host memory and triggers an OOM panic. Free the page before returning -EINVAL, matching the XDP-program error path in the same function. Fixes: 049584807f1d ("tun: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260520160020.375349-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net: enetc: fix missing error code when pf->vf_state allocation fails | Wei Fang | 1 | -1/+3
[ Upstream commit 5027266dea471e140f93dd534845c9c4f43219a3 ] In enetc_pf_probe(), when the memory allocation for pf->vf_state fails, the code jumps to the error handling label but the variable 'err' is not assigned an appropriate error code beforehand. This causes the function to return 0 (success) on an allocation failure path, misleading the caller into thinking the probe succeeded. So set err to -ENOMEM before jumping to the error handling label when the allocation for pf->vf_state returns NULL. Fixes: e15c5506dd39 ("net: enetc: allocate vf_state during PF probes") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260520064421.91569-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | pds_core: ensure null-termination for firmware version strings | Nikhil P. Rao | 1 | -2/+4
[ Upstream commit 3d4432d34c1992701289cbe12df9fd024f315998 ] The driver passes fw_version directly to devlink_info_version_stored_put() without ensuring null-termination. While current firmware null-terminates these strings, the driver should not rely on this behavior. Add explicit null-termination to prevent potential issues if firmware behavior changes. Fixes: 45d76f492938 ("pds_core: set up device and adminq") Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260520205842.1486718-1-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
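Defensive termination of a fixed-width firmware field can be sketched as a copy into a buffer one byte larger than the field. This is a minimal sketch: the 32-byte field width and the `fw_version_to_cstr()` name are assumptions for illustration, not values taken from the pds_core driver.

```c
#include <string.h>

#define FW_VER_FIELD_LEN 32 /* assumed width of the firmware field */

/*
 * Copy a fixed-width field that may lack a terminator into out,
 * which must hold FW_VER_FIELD_LEN + 1 bytes, and terminate it.
 */
static void fw_version_to_cstr(const char *field, char *out)
{
	memcpy(out, field, FW_VER_FIELD_LEN);
	out[FW_VER_FIELD_LEN] = '\0';
}
```

If the field is already terminated, string functions stop at the first terminator as before; if it is not, they stop at the added byte instead of reading past the field.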
10 days | net: airoha: Disable GDM2 forwarding before configuring GDM2 loopback | Lorenzo Bianconi | 1 | -4/+6
[ Upstream commit 985d4a55e64e43bd86eeb896b81ceba453301989 ] The hardware design requires GDM2 forwarding to be disabled before configuring GDM2 loopback in the airoha_set_gdm2_loopback routine. Fixes: 9cd451d414f6e ("net: airoha: Add loopback support for GDM2") Tested-by: Madhur Agrawal <madhur.agrawal@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260520-airoha-disable-gdm2-fwd-v1-1-1eeea5dffc2f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR | Weiming Shi | 1 | -1/+1
[ Upstream commit bddc09212c24934643bd44fc794748d2bbb3b6cd ] In the SIOCGIFHWADDR path, tap_ioctl() copies 16 bytes of an uninitialised on-stack struct sockaddr_storage to userspace via ifr_hwaddr, but netif_get_mac_address() only writes sa_family and dev->addr_len (6 for Ethernet) bytes, leaving sa_data[6..13] uninitialised. Those 8 trailing bytes leak kernel stack contents; SIOCGIFHWADDR on a macvtap chardev returns kernel .text and direct-map pointers, defeating KASLR. Initialise ss at declaration. Fixes: 3b23a32a6321 ("net: fix dev_ifsioc_locked() race condition") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260520075736.3415676-3-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
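The leak pattern is a partially filled on-stack structure copied out in full. The following is a minimal sketch, assuming a hypothetical 16-byte `struct hwaddr_reply` (a family tag plus 14 address bytes) in place of the real socket address types; it shows why zeroing at declaration closes the gap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reply layout: a family tag plus 14 address bytes. */
struct hwaddr_reply {
	unsigned short family;
	unsigned char data[14];
};

static struct hwaddr_reply build_hwaddr_reply(unsigned short family,
					      const unsigned char *addr,
					      size_t addr_len)
{
	struct hwaddr_reply reply;

	memset(&reply, 0, sizeof(reply)); /* no stale stack bytes survive */
	reply.family = family;
	if (addr_len > sizeof(reply.data))
		addr_len = sizeof(reply.data);
	memcpy(reply.data, addr, addr_len);
	return reply;
}
```

For a 6-byte Ethernet address, bytes 6 through 13 of `data` are never written by the copy, so they stay zero only because of the initial `memset()`.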
10 days | net: mana: validate rx_req_idx to prevent out-of-bounds array access | Aditya Garg | 1 | -0/+6
[ Upstream commit b809d0409991b75a6cff846a5ac27c3062953f84 ] In mana_hwc_rx_event_handler(), rx_req_idx is derived from sge->address in DMA-coherent memory. In Confidential VMs (SEV-SNP/TDX), this memory is shared unencrypted and HW can modify WQE contents at any time. No bounds check exists on rx_req_idx, which can lead to an out-of-bounds access into reqs[]. Add bounds check on rx_req_idx in mana_hwc_rx_event_handler() before using it to index the reqs[] array. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260520051553.857120-1-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
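An index read from memory that the device can rewrite has to be range-checked before it is used. This is a minimal sketch, assuming a hypothetical `struct rx_req` and `rx_req_lookup()` helper; the actual driver's types and error code may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request slot. */
struct rx_req {
	int id;
};

/* Resolve an untrusted index into the request array, or fail. */
static int rx_req_lookup(struct rx_req *reqs, size_t num_reqs,
			 uint64_t idx, struct rx_req **out)
{
	if (idx >= num_reqs)
		return -EINVAL; /* index came from shared memory: reject */
	*out = &reqs[idx];
	return 0;
}
```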
10 days | octeontx2-af: npc: Fix allmulticast skip logic for LBK and SDP VFs | Ratheesh Kannoth | 1 | -1/+1
[ Upstream commit 9eddc819f00b5b74bb4ac91396f80bd35f5f3561 ] When installing the allmulticast NPC rule, rvu_npc_install_allmulti_entry() should skip LBK and SDP VFs (only CGX PF/VF may add the entry). The code combined is_lbk_vf() and is_sdp_vf() with logical AND, which is never true for a single pcifunc, so the intended early return never ran. Use logical OR instead. Cc: Geetha sowjanya <gakula@marvell.com> Fixes: ae703539f49d2 ("octeontx2-af: Cleanup loopback device checks") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260520043036.1523798-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
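The logic error is easy to see when the two predicates are mutually exclusive. The sketch below uses a hypothetical `enum fn_kind` instead of real pcifunc decoding, so the helper names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical classification of a function; exactly one kind applies. */
enum fn_kind { FN_CGX, FN_LBK_VF, FN_SDP_VF };

static bool is_lbk_vf(enum fn_kind k)
{
	return k == FN_LBK_VF;
}

static bool is_sdp_vf(enum fn_kind k)
{
	return k == FN_SDP_VF;
}

/* Buggy form: both predicates can never hold for the same function. */
static bool skip_allmulti_and(enum fn_kind k)
{
	return is_lbk_vf(k) && is_sdp_vf(k);
}

/* Fixed form: skip when either predicate holds. */
static bool skip_allmulti_or(enum fn_kind k)
{
	return is_lbk_vf(k) || is_sdp_vf(k);
}
```

The AND form returns false for every kind, so the early return is dead code; the OR form skips LBK and SDP VFs and lets CGX functions through.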
10 days | net: stmmac: eswin: validate RGMII delay values | Zhi Li | 1 | -4/+25
[ Upstream commit c2e152f7ce3208b9333d212d41a87637ec1dd170 ] Validate rx-internal-delay-ps and tx-internal-delay-ps against the hardware capabilities of the EIC7700 MAC. The programmable RGMII delay supports 20 ps steps and a maximum value of 2540 ps. The driver previously accepted arbitrary values and silently truncated unsupported settings when converting them to hardware units. As a result, invalid device tree values could lead to unexpected delay programming and incorrect RGMII timing. Reject delay values that are not multiples of 20 ps or exceed the supported hardware range. Fixes: ea77dbbdbc4e ("net: stmmac: add Eswin EIC7700 glue driver") Signed-off-by: Zhi Li <lizhi2@eswincomputing.com> Link: https://patch.msgid.link/20260518022214.507-1-lizhi2@eswincomputing.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
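The validation rule stated in this message (20 ps steps, 2540 ps maximum) can be written as a small conversion helper. This is a minimal sketch: the macro and function names are hypothetical, and the choice of `-EINVAL` is an assumption about how rejection is reported.

```c
#include <errno.h>
#include <stdint.h>

#define RGMII_DELAY_STEP_PS 20u   /* one hardware step */
#define RGMII_DELAY_MAX_PS  2540u /* largest programmable delay */

/* Convert a delay in picoseconds to hardware steps, or reject it. */
static int rgmii_delay_ps_to_steps(uint32_t delay_ps, uint32_t *steps)
{
	if (delay_ps > RGMII_DELAY_MAX_PS)
		return -EINVAL;
	if (delay_ps % RGMII_DELAY_STEP_PS)
		return -EINVAL; /* would be silently truncated otherwise */
	*steps = delay_ps / RGMII_DELAY_STEP_PS;
	return 0;
}
```

The maximum of 2540 ps corresponds to 127 steps, and a value such as 30 ps is rejected rather than rounded down to one step.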
10 days | net: stmmac: eswin: correct RGMII delay granularity to 20 ps | Zhi Li | 1 | -4/+4
[ Upstream commit 6ffcef9bc1fc2ad8110777decd6d026e3cb468ce ] The EIC7700 MAC implements programmable RGMII delay adjustment with a granularity of 20 ps per hardware step. The driver previously converted rx-internal-delay-ps and tx-internal-delay-ps values using a 100 ps step size, resulting in incorrect delay programming. Update the conversion to use the correct 20 ps granularity so the programmed delay matches the values described in the device tree. Fixes: ea77dbbdbc4e ("net: stmmac: add Eswin EIC7700 glue driver") Signed-off-by: Zhi Li <lizhi2@eswincomputing.com> Link: https://patch.msgid.link/20260518022156.484-1-lizhi2@eswincomputing.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net: stmmac: eswin: clear TXD and RXD delay registers during initialization | Zhi Li | 1 | -0/+22
[ Upstream commit 6872fb088edc1a3c36792b301f8e4a1c35dd7c35 ] Clear the TXD and RXD delay control registers during EIC7700 DWMAC initialization. These registers may retain values programmed by the bootloader. If left unchanged, residual delays can alter the effective RGMII timing seen by the MAC and override the configuration described by the device tree. This may violate the expected RGMII timing model and can cause link instability or prevent the Ethernet controller from operating correctly. Explicitly clearing these registers ensures that the MAC delay settings are determined solely by the kernel configuration. The corresponding register offsets are optional, and the registers are only cleared when the offsets are provided in the device tree. Fixes: ea77dbbdbc4e ("net: stmmac: add Eswin EIC7700 glue driver") Signed-off-by: Zhi Li <lizhi2@eswincomputing.com> Link: https://patch.msgid.link/20260518022137.464-1-lizhi2@eswincomputing.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net: stmmac: eswin: fix HSP CSR init ordering after clock enable | Zhi Li | 1 | -32/+41
[ Upstream commit 23386defe949c0db4f746bed7098fc5e06746083 ] Fix the initialization ordering of the HSP CSR configuration in the EIC7700 DWMAC glue driver. The HSP CSR registers control MAC-side RGMII delay behavior and must only be accessed after the corresponding clocks are enabled. The previous implementation could trigger register access before clock enablement, leading to undefined behavior depending on boot state. Move the HSP CSR configuration into the post-clock-enable initialization path to ensure all register accesses occur under valid clock domains. This change ensures deterministic initialization and prevents clock-dependent register access failures during probe or resume. Fixes: ea77dbbdbc4e ("net: stmmac: add Eswin EIC7700 glue driver") Signed-off-by: Zhi Li <lizhi2@eswincomputing.com> Link: https://patch.msgid.link/20260518022055.444-1-lizhi2@eswincomputing.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net: ag71xx: check error for platform_get_irq | Rosen Penev | 1 | -0/+3
[ Upstream commit e7c70bf97e90d974cd575e4c90f8f9b07d056da3 ] Complete error handling for a failed platform_get_irq() call. Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260516212616.11758-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net/mlx5e: Fix eswitch mode block underflow on IPsec acquire SA | Prathamesh Deshpande | 1 | -3/+4
[ Upstream commit abe003b33223ff33552f291644bf35d9c2f992fb ] mlx5e_xfrm_add_state() handles acquire-flow temporary SAs by allocating software state and skipping hardware offload setup. That path jumps to the common success label before taking the eswitch mode block. After tunnel-mode validation was moved earlier, the common success label unconditionally calls mlx5_eswitch_unblock_mode(). For acquire SAs, this decrements esw->offloads.num_block_mode without a matching increment. Return directly after installing the acquire SA offload handle, so only the paths that successfully called mlx5_eswitch_block_mode() call the matching unblock. Fixes: 22239eb258bc ("net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed") Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260510225903.13184-1-prathameshdeshpande7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | wifi: wilc1000: fix dma_buffer leak on bus acquire failure | Shitalkumar Gandhi | 1 | -1/+1
[ Upstream commit dd7b6a8671939708cc4b7a46786d8c11297e8f69 ] wilc_wlan_firmware_download() allocates dma_buffer with kmalloc() at the top of the function and uses a 'fail:' label to free it via kfree(dma_buffer) on error. All later error paths correctly use 'goto fail' to route through this cleanup. However, the early failure path after the first acquire_bus() call uses a bare 'return ret;', which leaks dma_buffer whenever the bus acquire fails. Replace the early return with goto fail so the existing cleanup path runs. Found via a custom Coccinelle semantic patch hunting for kmalloc'd locals leaked on early-return error paths in driver firmware-download code. Fixes: 1241c5650ff7 ("wifi: wilc1000: Fill in missing error handling") Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260511042732.998311-1-shitalkumar.gandhi@cambiumnetworks.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
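The single-exit cleanup pattern described here can be sketched with a buffer counter that makes a leak visible. This is a minimal sketch: `firmware_download()`, the `acquire_fn` callback, and the two stand-in bus functions are hypothetical and only model the early-failure path.

```c
#include <errno.h>
#include <stdlib.h>

static int live_buffers; /* buffers allocated and not yet freed */

typedef int (*acquire_fn)(void);

/* Stand-ins for a bus acquire that succeeds or fails. */
static int bus_acquire_ok(void)
{
	return 0;
}

static int bus_acquire_busy(void)
{
	return -EBUSY;
}

static int firmware_download(acquire_fn acquire_bus)
{
	int ret;
	unsigned char *buf = malloc(4096);

	if (!buf)
		return -ENOMEM;
	live_buffers++;

	ret = acquire_bus();
	if (ret)
		goto fail; /* a bare "return ret" here would leak buf */

	/* ... the transfer itself would go here ... */
	ret = 0;
fail:
	free(buf);
	live_buffers--;
	return ret;
}
```

Every exit after the allocation goes through `fail:`, so the counter returns to zero whether or not the bus acquire succeeds.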
10 days | pds_core: fix debugfs_lookup dentry leak and error handling | Nikhil P. Rao | 1 | -1/+6
[ Upstream commit dc416e32baaeb620b9809e9e25fc7b30889686e9 ] debugfs_lookup() returns a dentry with an elevated reference count that must be released with dput(). The current code discards the returned dentry without calling dput(), causing a reference leak on every firmware reset recovery. Additionally, when CONFIG_DEBUG_FS is disabled, debugfs_lookup() returns ERR_PTR(-ENODEV), not NULL. The current check passes for error pointers and would call dput() on an invalid pointer, causing a crash. Fixes: bc90fbe0c318 ("pds_core: Rework teardown/setup flow to be more common") Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260515212907.998028-3-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | pds_core: fix error handling in pdsc_devcmd_wait | Nikhil P. Rao | 1 | -2/+9
[ Upstream commit 0e46b6635b03d29807f810c3b415c4755a3f958d ] Fix two cases where pdsc_devcmd_wait() returns stale success from the completion register instead of an error: 1. FW crash: If firmware stops running, the wait loop breaks early with running=false. The condition "if ((!done || timeout) && running)" is false, so error handling is bypassed and stale status is returned. Check !running first and return -ENXIO. 2. Timeout: If a command times out, err is set to -ETIMEDOUT but then overwritten by pdsc_err_to_errno(status) which reads stale status. Return -ETIMEDOUT immediately after cleaning up. Both errors now propagate to pdsc_devcmd_locked() which queues health_work for recovery. Fixes: 45d76f492938 ("pds_core: set up device and adminq") Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260515212907.998028-1-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
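The ordering of the two checks can be captured in a small pure function. This is a minimal sketch, assuming a hypothetical `devcmd_wait_result()` that receives the loop outcome as flags plus the already-translated completion status; it models only the decision order, not the wait loop.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide the return value of a device command wait.
 * status_err is the errno translated from the completion register
 * and is only trustworthy when the command actually completed.
 */
static int devcmd_wait_result(bool running, bool done, bool timed_out,
			      int status_err)
{
	if (!running)
		return -ENXIO;     /* firmware stopped: status is stale */
	if (!done || timed_out)
		return -ETIMEDOUT; /* do not overwrite with stale status */
	return status_err;
}
```

Checking `running` first and returning immediately on timeout keeps a stale "success" in the completion register from masking either failure.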
10 days | net: phy: honor eee_disabled_modes in phy_advertise_eee_all() | Nicolai Buchwitz | 1 | -1/+2
[ Upstream commit 8baa7506d793f0636e3f6f01b01ef7be19674d06 ] phy_advertise_eee_all() copies supported_eee into advertising_eee unconditionally, overwriting any filtering applied during phy_probe() based on DT eee-broken-* properties or driver-populated eee_disabled_modes. genphy_c45_ethtool_set_eee() calls this helper when user space passes an empty advertisement, undoing the filtering. Apply the same eee_disabled_modes mask in phy_advertise_eee_all() so the filtering survives the copy, matching the pattern in phy_probe() and phy_support_eee(). Fixes: b64691274f5d ("net: phy: add helper phy_advertise_eee_all") Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260518-devel-phy-support-eee-fix-v2-2-05b52626fa68@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net: phy: honor eee_disabled_modes in phy_support_eee() | Nicolai Buchwitz | 1 | -1/+2
[ Upstream commit 3655063e083889ed4b79b7dda9cec65478dce09a ] phy_support_eee() copies supported_eee into advertising_eee unconditionally, overwriting any filtering applied during phy_probe() based on DT eee-broken-* properties or driver-populated eee_disabled_modes. MAC drivers that call phy_support_eee() after probe (e.g. bcmgenet, fec, lan743x, lan78xx, r8169) then cause the PHY to advertise EEE for modes the user marked as broken. The symptom is that ethtool --show-eee on the local interface reports "not supported" (supported & ~eee_disabled_modes is empty) while the link partner sees EEE negotiated and active. phy_probe() already filters advertising_eee via eee_disabled_modes after calling of_set_phy_eee_broken(). Apply the same mask in phy_support_eee() so the filtering survives the copy. Fixes: 49168d1980e2 ("net: phy: Add phy_support_eee() indicating MAC support EEE") Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260518-devel-phy-support-eee-fix-v2-1-05b52626fa68@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
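The masking described above can be illustrated with plain integers as bitmasks. This is a sketch only: the bit assignments and function names are hypothetical and do not correspond to the kernel's link-mode bit numbers or helpers.

```python
# Hypothetical bit positions, chosen only for illustration.
EEE_100TX = 1 << 0
EEE_1000T = 1 << 1


def advertise_unfiltered(supported_eee, eee_disabled_modes):
    """Old behaviour: copy supported modes, ignoring the disabled mask."""
    return supported_eee


def advertise_filtered(supported_eee, eee_disabled_modes):
    """Fixed behaviour: drop every mode marked as disabled or broken."""
    return supported_eee & ~eee_disabled_modes


supported = EEE_100TX | EEE_1000T
disabled = EEE_1000T  # e.g. marked broken via a DT eee-broken-* property

print(bin(advertise_unfiltered(supported, disabled)))
print(bin(advertise_filtered(supported, disabled)))
```

With the unfiltered copy the broken mode is advertised again; with the mask applied only the remaining mode survives, which matches the symptom and fix described in the commit message.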
10 days | net: mana: Fix TOCTOU double-fetch of hwc_msg_id from DMA buffer | Erni Sri Satya Vennela | 1 | -10/+13
[ Upstream commit 35f0f0a2536a4d604b4dbad92c85c4a8fdebb870 ] In mana_hwc_rx_event_handler(), resp->response.hwc_msg_id is read from DMA-coherent memory and bounds-checked, then mana_hwc_handle_resp() re-reads the same field from the same DMA buffer for test_bit() and pointer arithmetic. DMA-coherent memory is mapped uncacheable on x86 and is shared, unencrypted, in Confidential VMs (SEV-SNP/TDX), so each load goes directly to host-visible memory. The hardware can therefore modify the value between the check and the use, bypassing the bounds validation. Fix this by reading hwc_msg_id exactly once using READ_ONCE() into a stack-local variable in mana_hwc_rx_event_handler(), and passing the validated value as a parameter to mana_hwc_handle_resp(). Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Link: https://patch.msgid.link/20260514194156.466823-1-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
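The double-fetch pattern can be reproduced in a small model. The sketch below is not the MANA driver: `SharedField`, `NUM_SLOTS`, and both handler functions are hypothetical names, and the "device" is simulated by a field whose value changes between reads.

```python
NUM_SLOTS = 4  # hypothetical number of valid message slots


class SharedField:
    """Stand-in for a field in device-writable memory.

    Each read returns the next scripted value, modelling a device that
    changes the field between two loads by the CPU.
    """

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def read(self):
        value = self._values[min(self.reads, len(self._values) - 1)]
        self.reads += 1
        return value


def handle_double_fetch(field):
    """Check one load, then use a second load: the check can be bypassed."""
    if field.read() >= NUM_SLOTS:
        return None
    return field.read()


def handle_single_fetch(field):
    """Load once into a local, then check and use that same local."""
    msg_id = field.read()
    if msg_id >= NUM_SLOTS:
        return None
    return msg_id


print(handle_double_fetch(SharedField([1, 99])))
print(handle_single_fetch(SharedField([1, 99])))
```

In the double-fetch variant the value that passes the bounds check (1) is not the value that ends up being used (99). Reading once into a local, as the commit does with READ_ONCE(), makes the checked value and the used value identical.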
10 days | net: dsa: mt7530: preserve VLAN tags on trapped link-local frames | Daniel Golle | 1 | -12/+15
[ Upstream commit 3ac85bcfd404b588298c95c6fba8aad4ad334f57 ] The BPC, RGAC1 and RGAC2 registers control the handling of link-local frames with reserved MAC DAs (01:80:C2:00:00:0x). These frames are correctly trapped to the CPU port, but the egress VLAN tag attribute was set to MT7530_VLAN_EG_UNTAGGED which causes the switch to strip any VLAN tags from trapped frames before they reach the CPU. This causes VLAN-tagged link-local frames (STP BPDUs, LLDP, PTP Peer Delay Requests) to arrive at the CPU without their VLAN tag, so they are delivered to the base network interface instead of the VLAN sub-interface. The DSA local_termination selftest confirms this: all link-local protocol tests on VLAN upper interfaces fail. Set the EG_TAG attribute to MT7530_VLAN_EG_DISABLED (system default) so that the switch does not modify VLAN tags in trapped frames. This way VLAN-tagged frames retain their original tag and are delivered to the correct VLAN sub-interface, matching the behavior of non-trapped frames which pass through without VLAN tag modification. Fixes: 69ddba9d170b ("net: dsa: mt7530: fix handling of all link-local frames") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Chester A. Unal <chester.a.unal@arinc9.com> Link: https://patch.msgid.link/891e0cd34db2a5fe20ceb73283a81fb5f71427ca.1778766629.git.daniel@makrotopia.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net: dsa: mt7530: fix FDB entries not aging out with short timeout | Daniel Golle | 1 | -6/+14
[ Upstream commit e824e40d0e841fab66ab7897d6c7b14dc81c66a7 ] The DSA forwarding selftests bridge_vlan_aware.sh and bridge_vlan_unaware.sh configure the bridge with ageing_time set to LOW_AGEING_TIME (1000 centiseconds, i.e. 10 seconds) and then run learning_test() in lib.sh, which expects a learned FDB entry to be removed after ageing_time + 10 seconds. On MT7530/MT7531 the entry persisted past the deadline and the "Found FDB record when should not" assertion failed. With msecs=10000, the algorithm in mt7530_set_ageing_time() finds AGE_CNT=0 and AGE_UNIT=9 as the first exact match (starting the search from tmp_age_count=0). The per-entry aging counter is initialized to AGE_CNT when a MAC address is learned, so with AGE_CNT=0 new entries start with a counter value of 0, which the hardware treats as "already aged" and never removes, effectively disabling aging. Fix this by starting the search from tmp_age_count=1 to ensure entries always have a non-zero initial aging counter. For a 10-second ageing time this yields AGE_CNT=1 and AGE_UNIT=4 instead: the timer ticks every 5 seconds and entries are removed after 2 ticks. Starting the search at AGE_CNT=1 raises the minimum representable ageing time from 1 to 2 seconds. Without bounds, a stale ageing_time of 1 second would now make the loop fall through without setting age_count and age_unit, leaving them uninitialized when written to the MT7530_AAC hardware register. Set ds->ageing_time_min and ds->ageing_time_max so the DSA core validates the range before the callback is invoked, and drop the now-redundant range check from mt7530_set_ageing_time(). Fixes: ea6d5c924e39 ("net: dsa: mt7530: support setting ageing time") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/7788ded12dc07b1bce329ec35fa70f4b45f3f9b7.1778766629.git.daniel@makrotopia.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
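The search behaviour described above can be checked with a short model. This is a sketch under stated assumptions: it assumes the applied ageing time is (AGE_CNT + 1) * (AGE_UNIT + 1) seconds, which is consistent with the numbers in the commit message, and the two maxima below are assumed register field limits, not values taken from the message. The function name is hypothetical.

```python
AGE_CNT_MAX = 0xFF    # assumed field maximum
AGE_UNIT_MAX = 0xFFF  # assumed field maximum


def find_age_params(secs, first_count):
    """Return (age_count, age_unit) closest to secs, or None if nothing fits.

    first_count is the AGE_CNT value the search starts from:
    0 models the old behaviour, 1 models the fixed behaviour.
    """
    best = None
    best_error = None
    for count in range(first_count, AGE_CNT_MAX + 1):
        unit = secs // (count + 1) - 1
        # A negative unit corresponds to unsigned wrap-around in C,
        # which would exceed the field maximum and be skipped.
        if unit < 0 or unit > AGE_UNIT_MAX:
            continue
        error = secs - (count + 1) * (unit + 1)
        if best_error is None or error < best_error:
            best_error = error
            best = (count, unit)
        if best_error == 0:
            break
    return best


print(find_age_params(10, first_count=0))
print(find_age_params(10, first_count=1))
print(find_age_params(1, first_count=1))
```

For 10 seconds the model yields AGE_CNT=0, AGE_UNIT=9 when starting from 0 and AGE_CNT=1, AGE_UNIT=4 when starting from 1, as stated in the commit message. For 1 second with the search starting at 1, no pair fits, which is the fall-through case that the new ageing_time_min bound is meant to exclude.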
10 days | igc: set tx buffer type for SMD frames | Kohei Enju | 1 | -0/+1
[ Upstream commit 5acc641e590e008caaed480ed9ffae47cf7ecbdf ] Sashiko pointed out that igc_fpe_init_smd_frame() initializes igc_tx_buffer fields for an SMD skb, but does not set the buffer type: https://sashiko.dev/#/patchset/20260415025226.114115-1-kohei%40enjuk.jp Since igc_tx_buffer entries are reused, a stale XDP or XSK type can remain and make TX completion use the wrong cleanup path. Set the buffer type to IGC_TX_BUFFER_TYPE_SKB. Fixes: 5422570c0010 ("igc: add support for frame preemption verification") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260515182419.1597859-9-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | ice: ptp: use primary NAC semaphore on E825 | Grzegorz Nitka | 1 | -0/+9
[ Upstream commit 7b28523546c7e4adbb8436f2986efcfc8382985e ] For E825 2xNAC configurations, PTP semaphore operations must hit the primary NAC register block so both sides coordinate on the same lock. Commit e2193f9f9ec9 ("ice: enable timesync operation on 2xNAC E825 devices") updated other primary-only PTP register accesses to use the primary NAC on non-primary functions, but left ice_ptp_lock() and ice_ptp_unlock() operating on the local NAC. As a result, secondary NAC PTP paths can take a different semaphore than the primary side. Select the primary hardware in ice_ptp_lock() and ice_ptp_unlock() when the current function is not primary, keeping semaphore operations symmetric and consistent with the rest of the 2xNAC PTP register access path. Fixes: e2193f9f9ec9 ("ice: enable timesync operation on 2xNAC E825 devices") Reviewed-by: Arkadiusz Kubalewski <Arkadiusz.kubalewski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260515182419.1597859-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | ice: ptp: serialize E825 PHY timer start with PTP lock | Grzegorz Nitka | 1 | -2/+13
[ Upstream commit 781ff8f2d575a794a2a4f11605288ae06757f5eb ] ice_start_phy_timer_eth56g() programs TIMETUS registers and issues INIT_INCVAL without holding the global PTP semaphore. This allows concurrent PTP command paths to interleave with PHY timer start, which can make the sequence fail and leave timer initialization inconsistent. Take the PTP lock around TIMETUS registers programming and INIT_INCVAL command execution, and make sure the lock is released on all error paths. Keep the subsequent sync step outside of this critical section, since ice_sync_phy_timer_eth56g() takes the same semaphore internally. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski <Arkadiusz.kubalewski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260515182419.1597859-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net/mlx5e: xsk: Fix unlocked writing to ICOSQ | Dragos Tatulea | 1 | -1/+1
[ Upstream commit c326f9c68921e2f14dfcecb2f6b4216313d50248 ] During napi poll, when the affinity changes and there's still XSK work to be done, we trigger an ICOSQ interrupt on the new CPU. However, this triggering on the ICOSQ is done unprotected. There are 2 such races: A) mlx5e_trigger_irq() is called while mlx5e_xsk_alloc_rx_mpwqe() is running from a different CPU due to affinity change. This can happen because IRQ triggering is done after napi_complete_done(). At this point the NAPI can be scheduled on a different CPU. Like this: CPU A (old affinity, NAPI tail) CPU B (new affinity, fresh NAPI) ------------------------------- -------------------------------- napi_complete_done() clears SCHED mlx5e_cq_arm(...) napi_schedule_prep() sets SCHED mlx5e_napi_poll() mlx5e_xsk_alloc_rx_mpwqe() mlx5e_icosq_sync_lock() // noop memcpy 640 B UMR body advance sq->pc by 10 mlx5e_trigger_irq(&c->icosq) wqe_info[pi] = {NOP, 1} mlx5e_post_nop() advances sq->pc B) mlx5e_trigger_irq() is called on the ICOSQ when mlx5e_trigger_napi_icosq() is running. The obvious fix would be to lock the ICOSQ. But ICOSQ has an optimized locking scheme that doesn't work for this scenario. Kick the async ICOSQ instead which is always locked. This issue was noticed in the wild with the following splat: netdevice: ge-0-0-1: Bad OP in ICOSQ CQE: 0xd WARNING: drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:826 [...] [...] Call Trace: <IRQ> mlx5e_napi_poll+0x11d/0x7f0 [mlx5_core] __napi_poll+0x30/0x200 ? skb_defer_free_flush+0x9c/0xc0 net_rx_action+0x2fe/0x3f0 handle_softirqs+0xd8/0x340 __irq_exit_rcu+0xbc/0xe0 common_interrupt+0x85/0xa0 </IRQ> <TASK> asm_common_interrupt+0x26/0x40 [...] 
---[ end trace 0000000000000000 ]--- mlx5_core 0000:08:00.0 ge-0-0-1: Error cqe on cqn 0x548, ci 0x2022, qn 0x8f4, opcode 0xd, syndrome 0x2, vendor syndrome 0x68 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 01 00 68 02 01 00 08 f4 de 14 59 d2 WQE DUMP: WQ size 16384 WQ cur size 0, WQE index 0x1e14, len: 64 00000000: 00 00 00 01 d9 ed 80 02 00 00 00 01 d9 ed 90 02 00000010: 00 00 00 01 d9 ed a0 02 00 00 00 01 d9 ed b0 02 00000020: 00 00 00 01 d9 ed c0 02 00 00 00 01 d9 ed d0 02 00000030: 00 00 00 01 d9 ed e0 02 00 00 00 01 d9 ed f0 02 mlx5_core 0000:08:00.0 ge-0-0-1: Error cqe on cqn 0x548, ci 0x2023, qn 0x8f4, opcode 0xd, syndrome 0x5, vendor syndrome 0xf9 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 01 00 f9 05 01 00 08 f4 de 15 cf d2 Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support") Reported-by: Paul Saab <ps@mu.org> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260513064613.334602-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | wifi: ath12k: fix EHT TX MCS limitation due to wrong 20 MHz-only parsing | Baochen Qiang | 1 | -2/+6
[ Upstream commit 60fb2cf51e77bb1c0261160b4be44209d68956b1 ] When connecting to an AP configured for EHT 20 MHz with a full EHT MCS/NSS map (supporting MCS 0-13) Supported EHT-MCS and NSS Set EHT-MCS Map (BW <= 80MHz): 0x444444 .... .... .... .... .... 0100 = Rx Max Nss That Supports EHT-MCS 0-9: 4 .... .... .... .... 0100 .... = Tx Max Nss That Supports EHT-MCS 0-9: 4 .... .... .... 0100 .... .... = Rx Max Nss That Supports EHT-MCS 10-11: 4 .... .... 0100 .... .... .... = Tx Max Nss That Supports EHT-MCS 10-11: 4 .... 0100 .... .... .... .... = Rx Max Nss That Supports EHT-MCS 12-13: 4 0100 .... .... .... .... .... = Tx Max Nss That Supports EHT-MCS 12-13: 4 TX throughput is observed to be significantly lower than expected. Investigation shows that TX rates are limited to EHT MCS 11, even though the AP advertises support for EHT MCS 12/13. The root cause is an incorrect parsing of the Supported EHT-MCS and NSS Set element in ath12k_peer_assoc_h_eht(). IEEE Std 802.11be-2024 Figure 9-1074as describes the format for 20 MHz-Only Non-AP STAs. IEEE Std 802.11be-2024 Figure 9-1074at describes the format for all other AP and non-AP STAs. Currently the first format is parsed when the peer advertises no wider HE channel width support, without considering whether it is an AP or a non-AP STA. This is incorrect: the peer AP's capabilities must be parsed using Figure 9-1074at even when it operates on 20 MHz only. Parsing it as Figure 9-1074as causes rx_tx_mcs13_max_nss to be interpreted as zero, which is then passed to firmware, leading firmware to assume the peer does not support MCS 13 and to limit TX rates at MCS 11. Fix this by parsing the Figure 9-1074as format only when the peer is a 20 MHz-Only non-AP STA, i.e. when the local interface operates as AP or mesh point. 
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3 Fixes: 6c95151e2e77 ("wifi: ath12k: Add EHT MCS/NSS rates to Peer Assoc") Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com> Link: https://patch.msgid.link/20260514-ath12k-fix-20mhz-only-mcs-map-v1-1-a38d4a9b21a2@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | wifi: ath11k: fix peer resolution on rx path when peer_id=0 | Matthew Leach | 2 | -6/+2
[ Upstream commit 2a2451a34afdf563b3102d36a4b6cf335cf813e2 ] It has been observed that on certain chipsets a peer can be assigned peer_id=0. For reception of non-aggregated MPDUs this is fine as ath11k_dp_rx_h_find_peer() has a fallback case where it locates the peer based upon the source MAC address. On an aggregated link, the mpdu_start header is only populated by hardware on the first sub-MSDU. This causes the peer resolution to be skipped for the subsequent MSDUs and the encryption type of these frames to be set to an incorrect value, resulting in these MSDUs being dropped by ieee80211. ath11k_pci 0000:03:00.0: data rx skb 000000002f4b704d len 1534 peer xx:xx:xx:xx:xx:xx 0 ucast sn 3063 he160 rate_idx 9 vht_nss 2 freq 5240 band 1 flag 0x40d1a fcs-err 0 mic-err 0 amsdu-more 0 peer_id 0 first_msdu 1 last_msdu 0 ath11k_pci 0000:03:00.0: data rx skb 0000000038acd580 len 1534 peer (null) 0 ucast sn 3063 he160 rate_idx 9 vht_nss 2 freq 5240 band 1 flag 0x40d00 fcs-err 0 mic-err 0 amsdu-more 0 peer_id 0 first_msdu 0 last_msdu 1 Remove the null peer_id checks in ath11k_dp_rx_h_find_peer() and ath11k_hal_rx_parse_mon_status_tlv(), allowing peers with an assigned ID of 0 to be resolved. Tested-on: QCA2066 hw2.1 PCI WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.9 Fixes: 2167fa606c0f ("ath11k: Add support for RX decapsulation offload") Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Signed-off-by: Matthew Leach <matthew.leach@collabora.com> Reviewed-by: P Praneesh <praneesh.p@oss.qualcomm.com> Link: https://patch.msgid.link/20260424-ath11k-null-peerid-workaround-v4-1-252b224d3cf6@collabora.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
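The effect of treating peer_id 0 as "no peer" can be shown with a dictionary-based model. This is a sketch, not the ath11k lookup code: the tables, function names, and the use of `None` for a missing source address (modelling a sub-MSDU without a populated mpdu_start header) are all hypothetical.

```python
def find_peer_old(peers_by_id, peers_by_addr, peer_id, src_addr):
    """Old behaviour: an ID of 0 is treated as 'no peer ID available'."""
    peer = None
    if peer_id:
        peer = peers_by_id.get(peer_id)
    if peer is None and src_addr is not None:
        peer = peers_by_addr.get(src_addr)
    return peer


def find_peer_new(peers_by_id, peers_by_addr, peer_id, src_addr):
    """Fixed behaviour: every ID, including 0, is looked up."""
    peer = peers_by_id.get(peer_id)
    if peer is None and src_addr is not None:
        peer = peers_by_addr.get(src_addr)
    return peer


by_id = {0: "peer-a"}
by_addr = {"02:00:00:00:00:01": "peer-a"}

# First sub-MSDU: header populated, so the address fallback still works.
print(find_peer_old(by_id, by_addr, 0, "02:00:00:00:00:01"))
# Later sub-MSDU: no usable address, so only the ID lookup can succeed.
print(find_peer_old(by_id, by_addr, 0, None))
print(find_peer_new(by_id, by_addr, 0, None))
```

The old variant resolves the peer only while the address fallback is available, which mirrors the "peer (null)" entry in the second log line quoted above.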
10 days | wifi: iwlwifi: mld: don't dereference a pointer before NULL checking it | Miri Korenblit | 1 | -7/+6
[ Upstream commit d733ed481fd20a8e7bfe5119c4e77761ba3f87ee ] In iwl_mld_remove_link(), link->fw_id is saved at the beginning of the function so that it is still available after the link is freed. But the link pointer can be NULL, and it is not checked before fw_id is stored. Fix it by simply freeing the link at the end of the function. Fixes: 0e66a39f4f0e ("wifi: iwlwifi: fix potential use after free in iwl_mld_remove_link()") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20260515151351.371f40fc6711.I6a82cfe9655564e9c5731af91c36493b26b1208e@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled | Cole Leavitt | 1 | -1/+4
[ Upstream commit 92cee08dc4f00e77fd1317e4343c5d458b0abab7 ] When the TLC notification disables AMSDU for a TID, the MLD driver sets max_tid_amsdu_len to the sentinel value 1. The TSO segmentation path in iwl_mld_tx_tso_segment() checks for zero but not for this sentinel, allowing it to reach the num_subframes calculation: num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad) = (1 + 2) / (1534 + 2) = 0 This zero propagates to iwl_tx_tso_segment() which sets: gso_size = num_subframes * mss = 0 Calling skb_gso_segment() with gso_size=0 creates over 32000 tiny segments from a single GSO skb. This floods the TX ring with ~1024 micro-frames (the rest are purged), creating a massive burst of TX completion events that can lead to memory corruption and a subsequent use-after-free in TCP's retransmit queue (refcount underflow in tcp_shifted_skb, NULL deref in tcp_rack_detect_loss). The MVM driver is immune because it checks mvmsta->amsdu_enabled before reaching the num_subframes calculation. The MLD driver has no equivalent bitmap check and relies solely on max_tid_amsdu_len, which does not catch the sentinel value. Fix this by detecting the sentinel value (max_tid_amsdu_len == 1) at the existing check and falling back to non-AMSDU TSO segmentation. Also add a WARN_ON_ONCE guard after the num_subframes division as defense-in-depth to catch any future code paths that produce zero through a different mechanism. Suggested-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver") Signed-off-by: Cole Leavitt <cole@unwrap.rs> Link: https://patch.msgid.link/20260405054145.1064152-3-cole@unwrap.rs Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
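The arithmetic in the message above is easy to reproduce. The following sketch uses the formula and numbers quoted in the commit; the function names, the constant name, and the 7000-byte comparison value are hypothetical and serve only to show the guard.

```python
AMSDU_DISABLED_SENTINEL = 1  # value the message says marks "AMSDU disabled"


def num_subframes(max_tid_amsdu_len, subf_len, pad):
    """Integer division as in the quoted formula."""
    return (max_tid_amsdu_len + pad) // (subf_len + pad)


def may_use_amsdu_tso(max_tid_amsdu_len, subf_len, pad):
    """Guarded variant: reject zero and the sentinel, then re-check the result."""
    if max_tid_amsdu_len in (0, AMSDU_DISABLED_SENTINEL):
        return False
    # Defence in depth: never continue with zero subframes.
    return num_subframes(max_tid_amsdu_len, subf_len, pad) > 0


mss = 1460  # hypothetical MSS, only used to show the resulting gso_size
subframes = num_subframes(1, 1534, 2)
print(subframes, subframes * mss)
print(may_use_amsdu_tso(1, 1534, 2))
print(may_use_amsdu_tso(7000, 1534, 2))
```

With the sentinel value the quotient is 0, so gso_size becomes 0 regardless of the MSS. The guarded variant refuses the AMSDU path in that case, which corresponds to the fallback to non-AMSDU segmentation described in the fix.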
10 days | net/mlx5: Skip disabled vports when setting max TX speed | Or Har-Toov | 3 | -0/+27
[ Upstream commit c6df9a65cbb0fe7808a4b2872095f4c849b3196a ] When setting vports max TX speed during LAG activation or bond state changes, the code iterates over all eswitch vports. However, some vports may not be enabled yet. Skip vports that are not enabled to avoid sending FW commands for uninitialized vports. Save the LAG aggregated speed in the vport struct so it can be applied when the vport is enabled later. Fixes: 50f1d188c580 ("net/mlx5: Propagate LAG effective max_tx_speed to vports") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260513063640.334132-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net/mlx5: Do not restore destination-less TC rules | Jeroen Massar | 1 | -1/+2
[ Upstream commit 8d0a5af8b1ba598e7340761729801624e7a9330e ] After IPsec policy/state TX rules are added, any TC flow rule, which forwards packets to uplink, is modified to forward to IPsec TX tables. As these tables are destroyed dynamically, whenever there is no reference to them, the destinations of this kind of rules must be restored to uplink, unless there is no destination for that rule. The flow rules FLOW_ACTION_ACCEPT, DROP, TRAP, GOTO and SAMPLE do not have a destination port, and thus out_count = 0. At cleanup time of the rules in mlx5_esw_ipsec_modify_flow_dests we call mlx5_eswitch_restore_ipsec_rule but as the above types do not have a destination we get an underflow of out_count, as the port is passed, which is esw_attr->out_count - 1. This change avoids calling mlx5_eswitch_restore_ipsec_rule when there are no output destinations and thus avoids the underflow. Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules") Signed-off-by: Jeroen Massar <jmassar@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260513063302.333761-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | ovpn: disable BHs when updating device stats | Ralf Lici | 4 | -12/+28
[ Upstream commit 0c0dddc07d272a8d25922e48041e8e4d2434df7e ] ovpn updates dev->dstats from both process and softirq contexts. In particular, TCP paths may run from socket callbacks, workqueues or strparser work, while UDP receive and ovpn's ndo_start_xmit path may update the same per-device dstats from BH context. Add ovpn device drop-stat helpers that disable BHs around dev_dstats_rx_dropped() and dev_dstats_tx_dropped(), and use them for drop accounting. The successful RX dev_dstats_rx_add() update is already covered by the BH-disabled section around gro_cells_receive(). For the successful TCP TX dev_dstats_tx_add() update, replace the existing preempt-disabled section with a BH-disabled one. Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | ovpn: fix race between deleting interface and adding new peer | Antonio Quartulli | 2 | -13/+20
[ Upstream commit 982422b11e6f95f766a8cd2c2b1cbdb77e234a61 ] While deleting an existing ovpn interface, there is a very narrow window where adding a new peer via netlink may cause the netdevice to hang and prevent its unregistration. It may happen during ovpn_dellink(), when all existing peers are freed and the device is queued for deregistration, but a CMD_PEER_NEW message comes in, adding a new peer that again takes a reference to the netdev. At this point there is no way to release the device because we are under the assumption that all peers were already released. Fix the race condition by releasing all peers in ndo_uninit(), when the netdevice has already been removed from the netdev list. ovpn_peer_add() also now has an extra check that forces the function to bail out if the device reg_state is not REGISTERED. This way any incoming CMD_PEER_NEW racing with the interface deletion routine will simply stop before adding the peer. Note that the above check happens while holding the netdev_lock to prevent racing netdev state changes. ovpn_dellink() is now empty and can be removed. Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Closes: https://lore.kernel.org/netdev/aaVgJ16edTfQkYbx@v4bel/ Suggested-by: Sabrina Dubroca <sd@queasysnail.net> Fixes: 80747caef33d ("ovpn: introduce the ovpn_peer object") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | ovpn: respect peer refcount in CMD_NEW_PEER error path | David Carlier | 3 | -5/+6
[ Upstream commit 1fef6614673ff0846d30acdeeaf3cf98bb5f6116 ] ovpn_nl_peer_new_doit()'s error path calls ovpn_peer_release() directly rather than ovpn_peer_put(), bypassing the kref. The accompanying comment ("peer was not yet hashed, thus it is not used in any context") holds for UDP but not for TCP. For UDP, the ovpn_socket union uses the .ovpn arm and never points back at a peer; UDP encap_recv looks up peers via the not-yet-populated hashtables, so the new peer is unreachable until ovpn_peer_add() publishes it. For TCP, ovpn_socket_new() sets ovpn_sock->peer and ovpn_tcp_socket_attach() publishes ovpn_sock via rcu_assign_sk_user_data(). From that moment until ovpn_socket_release() detaches in the error path, the TCP fd is fully wired: userspace recvmsg / sendmsg / close / poll on the fd, as well as the strparser-driven ovpn_tcp_rcv() path, can reach the peer through sk_user_data -> ovpn_sock->peer and bump its refcount via ovpn_peer_hold(). ovpn_tcp_socket_wait_finish() (called inside ovpn_socket_release()) drains strparser and the tx work, but does not synchronize with userspace syscall callers that already hold a peer reference. If ovpn_nl_peer_modify() or ovpn_peer_add() returns an error while such a caller is in flight - notably an ovpn_tcp_recvmsg() blocked in __skb_recv_datagram() on peer->tcp.user_queue - the direct ovpn_peer_release() destroys the peer while the caller still holds the reference, and the eventual ovpn_peer_put() from that caller operates on freed memory. Replace the direct destructor call with ovpn_peer_put() so the kref correctly defers destruction until the last reference is dropped. In the common case where no concurrent user is present, behaviour is unchanged: the kref hits zero immediately and ovpn_peer_release_kref() runs the same destructor. 
With this conversion ovpn_peer_release() has no callers outside peer.c - ovpn_peer_release_kref() in the same translation unit is the only remaining user - so make it static and drop its declaration from peer.h. Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
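The difference between calling the destructor directly and dropping a reference can be shown with a toy reference counter. This is a single-threaded model with hypothetical names, not the ovpn or kref implementation; the "concurrent caller" is simulated by taking an extra reference before the error path runs.

```python
class Peer:
    """Toy refcounted object: destroyed only when the last reference is put."""

    def __init__(self):
        self.refcount = 1  # reference owned by the creator
        self.destroyed = False

    def hold(self):
        if self.refcount == 0:
            return False
        self.refcount += 1
        return True

    def put(self):
        assert self.refcount > 0, "put on an object with no references"
        self.refcount -= 1
        if self.refcount == 0:
            self.release()

    def release(self):
        """Destructor; calling it directly bypasses the reference count."""
        self.destroyed = True


def error_path_direct_release(peer):
    peer.release()


def error_path_put(peer):
    peer.put()


# A second user (for example a blocked receive call) holds a reference.
buggy = Peer()
buggy.hold()
error_path_direct_release(buggy)
print(buggy.destroyed, buggy.refcount)

fixed = Peer()
fixed.hold()
error_path_put(fixed)
print(fixed.destroyed, fixed.refcount)
fixed.put()
print(fixed.destroyed)
```

In the direct-release case the object is destroyed while a reference is still outstanding, so the holder's later put would touch freed memory in the real driver. With put in the error path, destruction is deferred until the holder drops its reference, and when no other user exists the object is still destroyed immediately.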
10 days | ovpn: tcp - use cached peer pointer in ovpn_tcp_close() | David Carlier | 1 | -2/+7
[ Upstream commit 775d8d7ad02aa345e1588424a6a8b9ae49fb9012 ] ovpn_tcp_close() loads the ovpn_socket via rcu_dereference_sk_user_data() under rcu_read_lock(), takes a reference on sock->peer, caches the peer pointer in a local, and drops the read lock. It then passes sock->peer (rather than the cached local) to ovpn_peer_del(), re-dereferencing the ovpn_socket after the RCU read section has ended. Unlike ovpn_tcp_sendmsg(), which uses the same "load under RCU, use after unlock" pattern but is protected by lock_sock() held across the function, ovpn_tcp_close() runs without the socket lock: inet_release() invokes sk_prot->close() without taking lock_sock first. ovpn_socket_release() can therefore complete its kref_put -> detach -> synchronize_rcu -> kfree(sock) sequence concurrently, in the window after ovpn_tcp_close() drops rcu_read_lock() but before it dereferences sock->peer. The synchronize_rcu() in ovpn_socket_release() protects readers that use the dereferenced pointer inside the RCU read section, not those that escape the pointer to a local and use it afterwards. A reproducer follows the pattern of commit 94560267d6c4 ("ovpn: tcp - don't deref NULL sk_socket member after tcp_close()"): trigger a peer removal (keepalive expiration or netlink OVPN_CMD_DEL_PEER) at the same moment userspace closes the TCP fd. That commit fixed the detach-side of the same race window; this one fixes the close-side at a different victim. Tighten the entry block to read sock->peer exactly once into the cached peer local, and route all subsequent uses (the hold check, the ovpn_peer_del() call, and the prot->close() invocation) through that local. sock->peer is only ever written once in ovpn_socket_new() under lock_sock(), before rcu_assign_sk_user_data() publishes the ovpn_socket, and is never reassigned afterwards - but the previous multi-read pattern made that invariant implicit rather than explicit. 
The same multi-read shape exists in ovpn_tcp_recvmsg(), ovpn_tcp_sendmsg(), ovpn_tcp_data_ready() and ovpn_tcp_write_space(); those will be cleaned up via a dedicated helper in a follow-up net-next series. Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 days | net: phy: DP83TC811: add reading of abilities | Sven Schuchmann | 1 | -0/+1
[ Upstream commit c78bdba7b9666020c0832150a4fc4c0aebc7c6ac ] At this time the driver is not listing any speeds it supports. This should be ETHTOOL_LINK_MODE_100baseT1_Full_BIT for DP83TC811. Add the missing call for phylib to read the abilities. Fixes: b753a9faaf9a ("net: phy: DP83TC811: Introduce support for the DP83TC811 phy") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sven Schuchmann <schuchmann@schleissheimer.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260512071949.6218-1-schuchmann@schleissheimer.de [pabeni@redhat.com: dropped revision history] Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>