summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2026-03-25net: mvpp2: guard flow control update with global_tx_fc in buffer switchingMuhammad Hammad Ijaz1-2/+2
[ Upstream commit 8a63baadf08453f66eb582fdb6dd234f72024723 ] mvpp2_bm_switch_buffers() unconditionally calls mvpp2_bm_pool_update_priv_fc() when switching between per-cpu and shared buffer pool modes. This function programs CM3 flow control registers via mvpp2_cm3_read()/mvpp2_cm3_write(), which dereference priv->cm3_base without any NULL check. When the CM3 SRAM resource is not present in the device tree (the third reg entry added by commit 60523583b07c ("dts: marvell: add CM3 SRAM memory to cp11x ethernet device tree")), priv->cm3_base remains NULL and priv->global_tx_fc is false. Any operation that triggers mvpp2_bm_switch_buffers(), for example an MTU change that crosses the jumbo frame threshold, will crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits pc : readl+0x0/0x18 lr : mvpp2_cm3_read.isra.0+0x14/0x20 Call trace: readl+0x0/0x18 mvpp2_bm_pool_update_fc+0x40/0x12c mvpp2_bm_pool_update_priv_fc+0x94/0xd8 mvpp2_bm_switch_buffers.isra.0+0x80/0x1c0 mvpp2_change_mtu+0x140/0x380 __dev_set_mtu+0x1c/0x38 dev_set_mtu_ext+0x78/0x118 dev_set_mtu+0x48/0xa8 dev_ifsioc+0x21c/0x43c dev_ioctl+0x2d8/0x42c sock_ioctl+0x314/0x378 Every other flow control call site in the driver already guards hardware access with either priv->global_tx_fc or port->tx_fc. mvpp2_bm_switch_buffers() is the only place that omits this check. Add the missing priv->global_tx_fc guard to both the disable and re-enable calls in mvpp2_bm_switch_buffers(), consistent with the rest of the driver. Fixes: 3a616b92a9d1 ("net: mvpp2: Add TX flow control support for jumbo frames") Signed-off-by: Muhammad Hammad Ijaz <mhijaz@amazon.com> Reviewed-by: Gunnar Kudrjavets <gunnarku@amazon.com> Link: https://patch.msgid.link/20260316193157.65748-1-mhijaz@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net/mlx5e: Fix race condition during IPSec ESN updateJianbo Liu1-19/+14
[ Upstream commit beb6e2e5976a128b0cccf10d158124422210c5ef ] In IPSec full offload mode, the device reports an ESN (Extended Sequence Number) wrap event to the driver. The driver validates this event by querying the IPSec ASO and checking that the esn_event_arm field is 0x0, which indicates an event has occurred. After handling the event, the driver must re-arm the context by setting esn_event_arm back to 0x1. A race condition exists in this handling path. After validating the event, the driver calls mlx5_accel_esp_modify_xfrm() to update the kernel's xfrm state. This function temporarily releases and re-acquires the xfrm state lock. So, need to acknowledge the event first by setting esn_event_arm to 0x1. This prevents the driver from reprocessing the same ESN update if the hardware sends events for other reason. Since the next ESN update only occurs after nearly 2^31 packets are received, there's no risk of missing an update, as it will happen long after this handling has finished. Processing the event twice causes the ESN high-order bits (esn_msb) to be incremented incorrectly. The driver then programs the hardware with this invalid ESN state, which leads to anti-replay failures and a complete halt of IPSec traffic. Fix this by re-arming the ESN event immediately after it is validated, before calling mlx5_accel_esp_modify_xfrm(). This ensures that any spurious, duplicate events are correctly ignored, closing the race window. Fixes: fef06678931f ("net/mlx5e: Fix ESN update kernel panic") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260316094603.6999-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net/mlx5e: Prevent concurrent access to IPSec ASO contextJianbo Liu2-9/+9
[ Upstream commit 99b36850d881e2d65912b2520a1c80d0fcc9429a ] The query or updating IPSec offload object is through Access ASO WQE. The driver uses a single mlx5e_ipsec_aso struct for each PF, which contains a shared DMA-mapped context for all ASO operations. A race condition exists because the ASO spinlock is released before the hardware has finished processing WQE. If a second operation is initiated immediately after, it overwrites the shared context in the DMA area. When the first operation's completion is processed later, it reads this corrupted context, leading to unexpected behavior and incorrect results. This commit fixes the race by introducing a private context within each IPSec offload object. The shared ASO context is now copied to this private context while the ASO spinlock is held. Subsequent processing uses this saved, per-object context, ensuring its integrity is maintained. Fixes: 1ed78fc03307 ("net/mlx5e: Update IPsec soft and hard limits") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260316094603.6999-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net/mlx5: qos: Restrict RTNL area to avoid a lock cycleCosmin Ratiu1-14/+9
[ Upstream commit b7e3a5d9c0d66b7fb44f63aef3bd734821afa0c8 ] A lock dependency cycle exists where: 1. mlx5_ib_roce_init -> mlx5_core_uplink_netdev_event_replay -> mlx5_blocking_notifier_call_chain (takes notifier_rwsem) -> mlx5e_mdev_notifier_event -> mlx5_netdev_notifier_register -> register_netdevice_notifier_dev_net (takes rtnl) => notifier_rwsem -> rtnl 2. mlx5e_probe -> _mlx5e_probe -> mlx5_core_uplink_netdev_set (takes uplink_netdev_lock) -> mlx5_blocking_notifier_call_chain (takes notifier_rwsem) => uplink_netdev_lock -> notifier_rwsem 3: devlink_nl_rate_set_doit -> devlink_nl_rate_set -> mlx5_esw_devlink_rate_leaf_tx_max_set -> esw_qos_devlink_rate_to_mbps -> mlx5_esw_qos_max_link_speed_get (takes rtnl) -> mlx5_esw_qos_lag_link_speed_get_locked -> mlx5_uplink_netdev_get (takes uplink_netdev_lock) => rtnl -> uplink_netdev_lock => BOOM! (lock cycle) Fix that by restricting the rtnl-protected section to just the necessary part, the call to netdev_master_upper_dev_get and speed querying, so that the last lock dependency is avoided and the cycle doesn't close. This is safe because mlx5_uplink_netdev_get uses netdev_hold to keep the uplink netdev alive while its master device is queried. Use this opportunity to rename the ambiguously-named "hold_rtnl_lock" argument to "take_rtnl" and remove the "_locked" suffix from mlx5_esw_qos_lag_link_speed_get_locked. Fixes: 6b4be64fd9fe ("net/mlx5e: Harden uplink netdev access against device unbind") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260316094603.6999-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: macb: fix uninitialized rx_fs_lockFedor Pchelkin1-0/+3
[ Upstream commit 34b11cc56e4369bc08b1f4c4a04222d75ed596ce ] If hardware doesn't support RX Flow Filters, rx_fs_lock spinlock is not initialized leading to the following assertion splat triggerable via set_rxnfc callback. INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 1 PID: 949 Comm: syz.0.6 Not tainted 6.1.164+ #113 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xba lib/dump_stack.c:106 assign_lock_key kernel/locking/lockdep.c:974 [inline] register_lock_class+0x141b/0x17f0 kernel/locking/lockdep.c:1287 __lock_acquire+0x74f/0x6c40 kernel/locking/lockdep.c:4928 lock_acquire kernel/locking/lockdep.c:5662 [inline] lock_acquire+0x190/0x4b0 kernel/locking/lockdep.c:5627 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x33/0x50 kernel/locking/spinlock.c:162 gem_del_flow_filter drivers/net/ethernet/cadence/macb_main.c:3562 [inline] gem_set_rxnfc+0x533/0xac0 drivers/net/ethernet/cadence/macb_main.c:3667 ethtool_set_rxnfc+0x18c/0x280 net/ethtool/ioctl.c:961 __dev_ethtool net/ethtool/ioctl.c:2956 [inline] dev_ethtool+0x229c/0x6290 net/ethtool/ioctl.c:3095 dev_ioctl+0x637/0x1070 net/core/dev_ioctl.c:510 sock_do_ioctl+0x20d/0x2c0 net/socket.c:1215 sock_ioctl+0x577/0x6d0 net/socket.c:1320 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x18c/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:46 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:76 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 A more straightforward solution would be to always initialize rx_fs_lock, just like rx_fs_list. However, in this case the driver set_rxnfc callback would return with a rather confusing error code, e.g. -EINVAL. So deny set_rxnfc attempts directly if the RX filtering feature is not supported by hardware. Fixes: ae8223de3df5 ("net: macb: Added support for RX filtering") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Link: https://patch.msgid.link/20260316103826.74506-2-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25iavf: fix VLAN filter lost on add/delete racePetr Oros1-3/+6
[ Upstream commit fc9c69be594756b81b54c6bc40803fa6052f35ae ] When iavf_add_vlan() finds an existing filter in IAVF_VLAN_REMOVE state, it transitions the filter to IAVF_VLAN_ACTIVE assuming the pending delete can simply be cancelled. However, there is no guarantee that iavf_del_vlans() has not already processed the delete AQ request and removed the filter from the PF. In that case the filter remains in the driver's list as IAVF_VLAN_ACTIVE but is no longer programmed on the NIC. Since iavf_add_vlans() only picks up filters in IAVF_VLAN_ADD state, the filter is never re-added, and spoof checking drops all traffic for that VLAN. CPU0 CPU1 Workqueue ---- ---- --------- iavf_del_vlan(vlan 100) f->state = REMOVE schedule AQ_DEL_VLAN iavf_add_vlan(vlan 100) f->state = ACTIVE iavf_del_vlans() f is ACTIVE, skip iavf_add_vlans() f is ACTIVE, skip Filter is ACTIVE in driver but absent from NIC. Transition to IAVF_VLAN_ADD instead and schedule IAVF_FLAG_AQ_ADD_VLAN_FILTER so iavf_add_vlans() re-programs the filter. A duplicate add is idempotent on the PF. Fixes: 0c0da0e95105 ("iavf: refactor VLAN filter states") Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25igc: fix page fault in XDP TX timestamps handlingZdenek Bouska3-0/+42
[ Upstream commit 45b33e805bd39f615d9353a7194b2da5281332df ] If an XDP application that requested TX timestamping is shutting down while the link of the interface in use is still up the following kernel splat is reported: [ 883.803618] [ T1554] BUG: unable to handle page fault for address: ffffcfb6200fd008 ... [ 883.803650] [ T1554] Call Trace: [ 883.803652] [ T1554] <TASK> [ 883.803654] [ T1554] igc_ptp_tx_tstamp_event+0xdf/0x160 [igc] [ 883.803660] [ T1554] igc_tsync_interrupt+0x2d5/0x300 [igc] ... During shutdown of the TX ring the xsk_meta pointers are left behind, so that the IRQ handler is trying to touch them. This issue is now being fixed by cleaning up the stale xsk meta data on TX shutdown. TX timestamps on other queues remain unaffected. Fixes: 15fd021bc427 ("igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet") Signed-off-by: Zdenek Bouska <zdenek.bouska@siemens.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25igc: fix missing update of skb->tail in igc_xmit_frame()Kohei Enju1-5/+2
[ Upstream commit 0ffba246652faf4a36aedc66059c2f94e4c83ea5 ] igc_xmit_frame() misses updating skb->tail when the packet size is shorter than the minimum one. Use skb_put_padto() in alignment with other Intel Ethernet drivers. Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: airoha: Remove airoha_dev_stop() in airoha_remove()Lorenzo Bianconi1-1/+0
[ Upstream commit d4a533ad249e9fbdc2d0633f2ddd60a5b3a9a4ca ] Do not run airoha_dev_stop routine explicitly in airoha_remove() since ndo_stop() callback is already executed by unregister_netdev() in __dev_close_many routine if necessary and, doing so, we will end up causing an underflow in the qdma users atomic counters. Rely on networking subsystem to stop the device removing the airoha_eth module. Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260313-airoha-remove-ndo_stop-remove-net-v2-1-67542c3ceeca@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: airoha: Read completion queue data in airoha_qdma_tx_napi_poll()Lorenzo Bianconi1-18/+13
[ Upstream commit 3affa310de523d63e52ea8e2efb3c476df29e414 ] In order to avoid any possible race, read completion queue head and pending entry in airoha_qdma_tx_napi_poll routine instead of doing it in airoha_irq_handler. Remove unused airoha_tx_irq_queue unused fields. This is a preliminary patch to add Qdisc offload for airoha_eth driver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20241029-airoha-en7581-tx-napi-work-v1-1-96ad1686b946@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: d4a533ad249e ("net: airoha: Remove airoha_dev_stop() in airoha_remove()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: airoha: fix PSE memory configuration in airoha_fe_pse_ports_init()Lorenzo Bianconi1-2/+4
[ Upstream commit 8e38e08f2c560328a873c35aff1a0dbea6a7d084 ] Align PSE memory configuration to vendor SDK. In particular, increase initial value of PSE reserved memory in airoha_fe_pse_ports_init() routine by the value used for the second Packet Processor Engine (PPE2) and do not overwrite the default value. Introduced by commit 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001-airoha-eth-pse-fix-v2-2-9a56cdffd074@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: d4a533ad249e ("net: airoha: Remove airoha_dev_stop() in airoha_remove()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: airoha: read default PSE reserved pages value before updatingLorenzo Bianconi1-4/+10
[ Upstream commit 1f3e7ff4f296af1f4350f457d5bd82bc825e645a ] Store the default value for the number of PSE reserved pages in orig_val at the beginning of airoha_fe_set_pse_oq_rsv routine, before updating it with airoha_fe_set_pse_queue_rsv_pages(). Introduce airoha_fe_get_pse_all_rsv utility routine. Introduced by commit 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001-airoha-eth-pse-fix-v2-1-9a56cdffd074@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: d4a533ad249e ("net: airoha: Remove airoha_dev_stop() in airoha_remove()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: mana: fix use-after-free in mana_hwc_destroy_channel() by reordering ↵Dipayaan Roy1-3/+3
teardown [ Upstream commit fa103fc8f56954a60699a29215cb713448a39e87 ] A potential race condition exists in mana_hwc_destroy_channel() where hwc->caller_ctx is freed before the HWC's Completion Queue (CQ) and Event Queue (EQ) are destroyed. This allows an in-flight CQ interrupt handler to dereference freed memory, leading to a use-after-free or NULL pointer dereference in mana_hwc_handle_resp(). mana_smc_teardown_hwc() signals the hardware to stop but does not synchronize against IRQ handlers already executing on other CPUs. The IRQ synchronization only happens in mana_hwc_destroy_cq() via mana_gd_destroy_eq() -> mana_gd_deregister_irq(). Since this runs after kfree(hwc->caller_ctx), a concurrent mana_hwc_rx_event_handler() can dereference freed caller_ctx (and rxq->msg_buf) in mana_hwc_handle_resp(). Fix this by reordering teardown to reverse-of-creation order: destroy the TX/RX work queues and CQ/EQ before freeing hwc->caller_ctx. This ensures all in-flight interrupt handlers complete before the memory they access is freed. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/abHA3AjNtqa1nx9k@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: bcmgenet: increase WoL poll timeoutJustin Chen1-1/+1
[ Upstream commit 6cfc3bc02b977f2fba5f7268e6504d1931a774f7 ] Some systems require more than 5ms to get into WoL mode. Increase the timeout value to 50ms. Fixes: c51de7f3976b ("net: bcmgenet: add Wake-on-LAN support code") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20260312191852.3904571-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: macb: Reinitialize tx/rx queue pointer registers and rx ring during resumeKevin Hao1-0/+10
[ Upstream commit 718d0766ce4c7634ce62fa78b526ea7263487edd ] On certain platforms, such as AMD Versal boards, the tx/rx queue pointer registers are cleared after suspend, and the rx queue pointer register is also disabled during suspend if WOL is enabled. Previously, we assumed that these registers would be restored by macb_mac_link_up(). However, in commit bf9cf80cab81, macb_init_buffers() was moved from macb_mac_link_up() to macb_open(). Therefore, we should call macb_init_buffers() to reinitialize the tx/rx queue pointer registers during resume. Due to the reset of these two registers, we also need to adjust the tx/rx rings accordingly. The tx ring will be handled by gem_shuffle_tx_rings() in macb_mac_link_up(), so we only need to initialize the rx ring here. Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up") Reported-by: Quanyang Wang <quanyang.wang@windriver.com> Signed-off-by: Kevin Hao <haokexin@gmail.com> Tested-by: Quanyang Wang <quanyang.wang@windriver.com> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260312-macb-versal-v1-2-467647173fa4@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25net: macb: Introduce gem_init_rx_ring()Kevin Hao1-4/+9
[ Upstream commit 1a7124ecd655bcaf1845197fe416aa25cff4c3ea ] Extract the initialization code for the GEM RX ring into a new function. This change will be utilized in a subsequent patch. No functional changes are introduced. Signed-off-by: Kevin Hao <haokexin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260312-macb-versal-v1-1-467647173fa4@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 718d0766ce4c ("net: macb: Reinitialize tx/rx queue pointer registers and rx ring during resume") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25net: macb: fix use-after-free access to PTP clockFedor Pchelkin1-1/+3
commit 8da13e6d63c1a97f7302d342c89c4a56a55c7015 upstream. PTP clock is registered on every opening of the interface and destroyed on every closing. However it may be accessed via get_ts_info ethtool call which is possible while the interface is just present in the kernel. BUG: KASAN: use-after-free in ptp_clock_index+0x47/0x50 drivers/ptp/ptp_clock.c:426 Read of size 4 at addr ffff8880194345cc by task syz.0.6/948 CPU: 1 PID: 948 Comm: syz.0.6 Not tainted 6.1.164+ #109 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xba lib/dump_stack.c:106 print_address_description mm/kasan/report.c:316 [inline] print_report+0x17f/0x496 mm/kasan/report.c:420 kasan_report+0xd9/0x180 mm/kasan/report.c:524 ptp_clock_index+0x47/0x50 drivers/ptp/ptp_clock.c:426 gem_get_ts_info+0x138/0x1e0 drivers/net/ethernet/cadence/macb_main.c:3349 macb_get_ts_info+0x68/0xb0 drivers/net/ethernet/cadence/macb_main.c:3371 __ethtool_get_ts_info+0x17c/0x260 net/ethtool/common.c:558 ethtool_get_ts_info net/ethtool/ioctl.c:2367 [inline] __dev_ethtool net/ethtool/ioctl.c:3017 [inline] dev_ethtool+0x2b05/0x6290 net/ethtool/ioctl.c:3095 dev_ioctl+0x637/0x1070 net/core/dev_ioctl.c:510 sock_do_ioctl+0x20d/0x2c0 net/socket.c:1215 sock_ioctl+0x577/0x6d0 net/socket.c:1320 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x18c/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:46 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:76 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 </TASK> Allocated by task 457: kmalloc include/linux/slab.h:563 [inline] kzalloc include/linux/slab.h:699 [inline] ptp_clock_register+0x144/0x10e0 drivers/ptp/ptp_clock.c:235 gem_ptp_init+0x46f/0x930 drivers/net/ethernet/cadence/macb_ptp.c:375 macb_open+0x901/0xd10 drivers/net/ethernet/cadence/macb_main.c:2920 __dev_open+0x2ce/0x500 net/core/dev.c:1501 __dev_change_flags+0x56a/0x740 net/core/dev.c:8651 dev_change_flags+0x92/0x170 net/core/dev.c:8722 do_setlink+0xaf8/0x3a80 net/core/rtnetlink.c:2833 __rtnl_newlink+0xbf4/0x1940 net/core/rtnetlink.c:3608 rtnl_newlink+0x63/0xa0 net/core/rtnetlink.c:3655 rtnetlink_rcv_msg+0x3c6/0xed0 net/core/rtnetlink.c:6150 netlink_rcv_skb+0x15d/0x430 net/netlink/af_netlink.c:2511 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x6d7/0xa30 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x97e/0xeb0 net/netlink/af_netlink.c:1872 sock_sendmsg_nosec net/socket.c:718 [inline] __sock_sendmsg+0x14b/0x180 net/socket.c:730 __sys_sendto+0x320/0x3b0 net/socket.c:2152 __do_sys_sendto net/socket.c:2164 [inline] __se_sys_sendto net/socket.c:2160 [inline] __x64_sys_sendto+0xdc/0x1b0 net/socket.c:2160 do_syscall_x64 arch/x86/entry/common.c:46 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:76 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 938: kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1729 [inline] slab_free_freelist_hook mm/slub.c:1755 [inline] slab_free mm/slub.c:3687 [inline] __kmem_cache_free+0xbc/0x320 mm/slub.c:3700 device_release+0xa0/0x240 drivers/base/core.c:2507 kobject_cleanup lib/kobject.c:681 [inline] kobject_release lib/kobject.c:712 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x1cd/0x350 lib/kobject.c:729 put_device+0x1b/0x30 drivers/base/core.c:3805 ptp_clock_unregister+0x171/0x270 drivers/ptp/ptp_clock.c:391 gem_ptp_remove+0x4e/0x1f0 drivers/net/ethernet/cadence/macb_ptp.c:404 macb_close+0x1c8/0x270 drivers/net/ethernet/cadence/macb_main.c:2966 __dev_close_many+0x1b9/0x310 net/core/dev.c:1585 __dev_close net/core/dev.c:1597 [inline] __dev_change_flags+0x2bb/0x740 net/core/dev.c:8649 dev_change_flags+0x92/0x170 net/core/dev.c:8722 dev_ifsioc+0x151/0xe00 net/core/dev_ioctl.c:326 dev_ioctl+0x33e/0x1070 net/core/dev_ioctl.c:572 sock_do_ioctl+0x20d/0x2c0 net/socket.c:1215 sock_ioctl+0x577/0x6d0 net/socket.c:1320 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x18c/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:46 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:76 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Set the PTP clock pointer to NULL after unregistering. Fixes: c2594d804d5c ("macb: Common code to enable ptp support for MACB/GEM") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Link: https://patch.msgid.link/20260316103826.74506-1-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25net: stmmac: remove support for lpi_intr_oRussell King (Oracle)6-58/+0
commit 14eb64db8ff07b58a35b98375f446d9e20765674 upstream. The dwmac databook for v3.74a states that lpi_intr_o is a sideband signal which should be used to ungate the application clock, and this signal is synchronous to the receive clock. The receive clock can run at 2.5, 25 or 125MHz depending on the media speed, and can stop under the control of the link partner. This means that the time it takes to clear is dependent on the negotiated media speed, and thus can be 8, 40, or 400ns after reading the LPI control and status register. It has been observed with some aggressive link partners, this clock can stop while lpi_intr_o is still asserted, meaning that the signal remains asserted for an indefinite period that the local system has no direct control over. The LPI interrupts will still be signalled through the main interrupt path in any case, and this path is not dependent on the receive clock. This, since we do not gate the application clock, and the chances of adding clock gating in the future are slim due to the clocks being ill-defined, lpi_intr_o serves no useful purpose. Remove the code which requests the interrupt, and all associated code. Reported-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Tested-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> # Renesas RZ/V2H board Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnJbt-00000007YYN-28nm@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25Octeontx2-af: Add proper checks for fwdataHariprasad Kelam2-1/+4
[ Upstream commit 4a3dba48188208e4f66822800e042686784d29d1 ] firmware populates MAC address, link modes (supported, advertised) and EEPROM data in shared firmware structure which kernel access via MAC block(CGX/RPM). Accessing fwdata, on boards booted with out MAC block leading to kernel panics. Internal error: Oops: 0000000096000005 [#1] SMP [ 10.460721] Modules linked in: [ 10.463779] CPU: 0 UID: 0 PID: 174 Comm: kworker/0:3 Not tainted 6.19.0-rc5-00154-g76ec646abdf7-dirty #3 PREEMPT [ 10.474045] Hardware name: Marvell OcteonTX CN98XX board (DT) [ 10.479793] Workqueue: events work_for_cpu_fn [ 10.484159] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 10.491124] pc : rvu_sdp_init+0x18/0x114 [ 10.495051] lr : rvu_probe+0xe58/0x1d18 Fixes: 997814491cee ("Octeontx2-af: Fetch MAC channel info from firmware") Fixes: 5f21226b79fd ("Octeontx2-pf: ethtool: support multi advertise mode") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20260121094819.2566786-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Rajani Kantha <681739313@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25ice: fix devlink reload call tracePaul Greenwalt1-2/+1
[ Upstream commit d3f867e7a04678640ebcbfb81893c59f4af48586 ] Commit 4da71a77fc3b ("ice: read internal temperature sensor") introduced internal temperature sensor reading via HWMON. ice_hwmon_init() was added to ice_init_feature() and ice_hwmon_exit() was added to ice_remove(). As a result if devlink reload is used to reinit the device and then the driver is removed, a call trace can occur. BUG: unable to handle page fault for address: ffffffffc0fd4b5d Call Trace: string+0x48/0xe0 vsnprintf+0x1f9/0x650 sprintf+0x62/0x80 name_show+0x1f/0x30 dev_attr_show+0x19/0x60 The call trace repeats approximately every 10 minutes when system monitoring tools (e.g., sadc) attempt to read the orphaned hwmon sysfs attributes that reference freed module memory. The sequence is: 1. Driver load, ice_hwmon_init() gets called from ice_init_feature() 2. Devlink reload down, flow does not call ice_remove() 3. Devlink reload up, ice_hwmon_init() gets called from ice_init_feature() resulting in a second instance 4. Driver unload, ice_hwmon_exit() called from ice_remove() leaving the first hwmon instance orphaned with dangling pointer Fix this by moving ice_hwmon_exit() from ice_remove() to ice_deinit_features() to ensure proper cleanup symmetry with ice_hwmon_init(). Fixes: 4da71a77fc3b ("ice: read internal temperature sensor") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [ Adjust context. The context change is irrelevant to the current patch logic. ] Signed-off-by: Wenshan Lan <jetlan9@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25net: macb: Shuffle the tx ring before enabling txKevin Hao1-3/+95
[ Upstream commit 881a0263d502e1a93ebc13a78254e9ad19520232 ] Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board, the rootfs may take an extended time to recover after a suspend. Upon investigation, it was determined that the issue originates from a problem in the macb driver. According to the Zynq UltraScale TRM [1], when transmit is disabled, the transmit buffer queue pointer resets to point to the address specified by the transmit buffer queue base address register. In the current implementation, the code merely resets `queue->tx_head` and `queue->tx_tail` to '0'. This approach presents several issues: - Packets already queued in the tx ring are silently lost, leading to memory leaks since the associated skbs cannot be released. - Concurrent write access to `queue->tx_head` and `queue->tx_tail` may occur from `macb_tx_poll()` or `macb_start_xmit()` when these values are reset to '0'. - The transmission may become stuck on a packet that has already been sent out, with its 'TX_USED' bit set, but has not yet been processed. However, due to the manipulation of 'queue->tx_head' and 'queue->tx_tail', `macb_tx_poll()` incorrectly assumes there are no packets to handle because `queue->tx_head == queue->tx_tail`. This issue is only resolved when a new packet is placed at this position. This is the root cause of the prolonged recovery time observed for the NFS root filesystem. To resolve this issue, shuffle the tx ring and tx skb array so that the first unsent packet is positioned at the start of the tx ring. Additionally, ensure that updates to `queue->tx_head` and `queue->tx_tail` are properly protected with the appropriate lock. [1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up") Reported-by: Quanyang Wang <quanyang.wang@windriver.com> Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260307-zynqmp-v2-1-6ef98a70e1d0@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ adapted include block context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25gve: fix incorrect buffer cleanup in gve_tx_clean_pending_packets for QPLAnkit Garg1-30/+24
[ Upstream commit fb868db5f4bccd7a78219313ab2917429f715cea ] In DQ-QPL mode, gve_tx_clean_pending_packets() incorrectly uses the RDA buffer cleanup path. It iterates num_bufs times and attempts to unmap entries in the dma array. This leads to two issues: 1. The dma array shares storage with tx_qpl_buf_ids (union). Interpreting buffer IDs as DMA addresses results in attempting to unmap incorrect memory locations. 2. num_bufs in QPL mode (counting 2K chunks) can significantly exceed the size of the dma array, causing out-of-bounds access warnings (trace below is how we noticed this issue). UBSAN: array-index-out-of-bounds in drivers/net/ethernet/drivers/net/ethernet/google/gve/gve_tx_dqo.c:178:5 index 18 is out of range for type 'dma_addr_t[18]' (aka 'unsigned long long[18]') Workqueue: gve gve_service_task [gve] Call Trace: <TASK> dump_stack_lvl+0x33/0xa0 __ubsan_handle_out_of_bounds+0xdc/0x110 gve_tx_stop_ring_dqo+0x182/0x200 [gve] gve_close+0x1be/0x450 [gve] gve_reset+0x99/0x120 [gve] gve_service_task+0x61/0x100 [gve] process_scheduled_works+0x1e9/0x380 Fix this by properly checking for QPL mode and delegating to gve_free_tx_qpl_bufs() to reclaim the buffers. Cc: stable@vger.kernel.org Fixes: a6fb8d5a8b69 ("gve: Tx path for DQO-QPL") Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260220215324.1631350-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ netmem_dma_unmap_page_attrs() => dma_unmap_page() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25net: ethernet: arc: emac: quiesce interrupts before requesting IRQFan Wu1-0/+11
commit 2503d08f8a2de618e5c3a8183b250ff4a2e2d52c upstream. Normal RX/TX interrupts are enabled later, in arc_emac_open(), so probe should not see interrupt delivery in the usual case. However, hardware may still present stale or latched interrupt status left by firmware or the bootloader. If probe later unwinds after devm_request_irq() has installed the handler, such a stale interrupt can still reach arc_emac_intr() during teardown and race with release of the associated net_device. Avoid that window by putting the device into a known quiescent state before requesting the IRQ: disable all EMAC interrupt sources and clear any pending EMAC interrupt status bits. This keeps the change hardware-focused and minimal, while preventing spurious IRQ delivery from leftover state. Fixes: e4f2379db6c6 ("ethernet/arc/arc_emac - Add new driver") Cc: stable@vger.kernel.org Signed-off-by: Fan Wu <fanwu01@zju.edu.cn> Link: https://patch.msgid.link/20260309132409.584966-1-fanwu01@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25ice: fix retry for AQ command 0x06EEJakub Staniszewski2-21/+15
commit fb4903b3354aed4a2301180cf991226f896c87ed upstream. Executing ethtool -m can fail reporting a netlink I/O error while firmware link management holds the i2c bus used to communicate with the module. According to Intel(R) Ethernet Controller E810 Datasheet Rev 2.8 [1] Section 3.3.10.4 Read/Write SFF EEPROM (0x06EE) request should to be retried upon receiving EBUSY from firmware. Commit e9c9692c8a81 ("ice: Reimplement module reads used by ethtool") implemented it only for part of ice_get_module_eeprom(), leaving all other calls to ice_aq_sff_eeprom() vulnerable to returning early on getting EBUSY without retrying. Remove the retry loop from ice_get_module_eeprom() and add Admin Queue (AQ) command with opcode 0x06EE to the list of commands that should be retried on receiving EBUSY from firmware. Cc: stable@vger.kernel.org Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool") Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com> Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://www.intel.com/content/www/us/en/content-details/613875/intel-ethernet-controller-e810-datasheet.html [1] Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25net: mana: Ring doorbell at 4 CQ wraparoundsLong Li1-5/+18
commit dabffd08545ffa1d7183bc45e387860984025291 upstream. MANA hardware requires at least one doorbell ring every 8 wraparounds of the CQ. The driver rings the doorbell as a form of flow control to inform hardware that CQEs have been consumed. The NAPI poll functions mana_poll_tx_cq() and mana_poll_rx_cq() can poll up to CQE_POLLING_BUFFER (512) completions per call. If the CQ has fewer than 512 entries, a single poll call can process more than 4 wraparounds without ringing the doorbell. The doorbell threshold check also uses ">" instead of ">=", delaying the ring by one extra CQE beyond 4 wraparounds. Combined, these issues can cause the driver to exceed the 8-wraparound hardware limit, leading to missed completions and stalled queues. Fix this by capping the number of CQEs polled per call to 4 wraparounds of the CQ in both TX and RX paths. Also change the doorbell threshold from ">" to ">=" so the doorbell is rung as soon as 4 wraparounds are reached. Cc: stable@vger.kernel.org Fixes: 58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings") Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260226192833.1050807-1-longli@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25ixgbevf: fix link setup issueJedrzej Jagielski1-1/+2
commit feae40a6a178bb525a15f19288016e5778102a99 upstream. It may happen that VF spawned for E610 adapter has problem with setting link up. This happens when ixgbevf supporting mailbox API 1.6 cooperates with PF driver which doesn't support this version of API, and hence doesn't support new approach for getting PF link data. In that case VF asks PF to provide link data but as PF doesn't support it, returns -EOPNOTSUPP what leads to early bail from link configuration sequence. Avoid such situation by using legacy VFLINKS approach whenever negotiated API version is less than 1.6. To reproduce the issue just create VF and set its link up - adapter must be any from the E610 family, ixgbevf must support API 1.6 or higher while ixgbevf must not. Fixes: 53f0eb62b4d2 ("ixgbevf: fix getting link speed data for E610 devices") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Cc: stable@vger.kernel.org Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25ice: reintroduce retry mechanism for indirect AQJakub Staniszewski1-3/+9
commit 326256c0a72d4877cec1d4df85357da106233128 upstream. Add retry mechanism for indirect Admin Queue (AQ) commands. To do so we need to keep the command buffer. This technically reverts commit 43a630e37e25 ("ice: remove unused buffer copy code in ice_sq_send_cmd_retry()"), but combines it with a fix in the logic by using a kmemdup() call, making it more robust and less likely to break in the future due to programmer error. Cc: Michal Schmidt <mschmidt@redhat.com> Cc: stable@vger.kernel.org Fixes: 3056df93f7a8 ("ice: Re-send some AQ commands, as result of EBUSY AQ error") Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com> Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt statusAlok Tiwari1-2/+2
[ Upstream commit 87f7dff3ec75b91def0024ebaaf732457f47a63b ] The NIX RAS health report path uses nix_af_rvu_err when handling the NIX_AF_RVU_RAS case, so the report prints the ERR interrupt status rather than the RAS interrupt status. Use nix_af_rvu_ras for the NIX_AF_RVU_RAS report. Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20260310184824.1183651-2-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25octeontx2-af: devlink: fix NIX RAS reporter recovery conditionAlok Tiwari1-1/+1
[ Upstream commit dc26ca99b835e21e76a58b1463b84adb0ca34f58 ] The NIX RAS health reporter recovery routine checks nix_af_rvu_int to decide whether to re-enable NIX_AF_RAS interrupts. This is the RVU interrupt status field and is unrelated to RAS events, so the recovery flow may incorrectly skip re-enabling NIX_AF_RAS interrupts. Check nix_af_rvu_ras instead before writing NIX_AF_RAS_ENA_W1S. Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20260310184824.1183651-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net: bcmgenet: fix broken EEE by converting to phylib-managed stateNicolai Buchwitz3-28/+18
[ Upstream commit 908c344d5cfac4160f49715da9efacdf5b6a28bd ] The bcmgenet EEE implementation is broken in several ways. phy_support_eee() is never called, so the PHY never advertises EEE and phylib never sets phydev->enable_tx_lpi. bcmgenet_mac_config() checks priv->eee.eee_enabled to decide whether to enable the MAC LPI logic, but that field is never initialised to true, so the MAC never enters Low Power Idle even when EEE is negotiated - wasting the power savings EEE is designed to provide. The only way to get EEE working at all is a manual 'ethtool --set-eee eth0 eee on' after every link-up, and even then bcmgenet_get_eee() immediately clobbers the reported state because phy_ethtool_get_eee() overwrites eee_enabled and tx_lpi_enabled with the uninitialised PHY eee_cfg values. Finally, bcmgenet_mac_config() is only called on link-up, so EEE is never disabled in hardware on link-down. Fix all of this by removing the MAC-side EEE state tracking (priv->eee) and aligning with the pattern used by other non-phylink MAC drivers such as FEC. Call phy_support_eee() in bcmgenet_mii_probe() so the PHY advertises EEE link modes and phylib tracks negotiation state. Move the EEE hardware control to bcmgenet_mii_setup(), which is called on every link event, and drive it directly from phydev->enable_tx_lpi - the flag phylib sets when EEE is negotiated and the user has not disabled it. This enables EEE automatically once the link partner agrees and disables it cleanly on link-down. Make bcmgenet_get_eee() and bcmgenet_set_eee() pure passthroughs to phy_ethtool_get_eee() and phy_ethtool_set_eee(), with the MAC hardware register read/written for tx_lpi_timer. Drop struct ethtool_keee eee from struct bcmgenet_priv. Fixes: fe0d4fd9285e ("net: phy: Keep track of EEE configuration") Link: https://lore.kernel.org/netdev/d352039f-4cbb-41e6-9aeb-0b4f3941b54c@lunn.ch/ Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20260310054935.1238594-1-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25e1000/e1000e: Fix leak in DMA error cleanupMatt Vollrath2-4/+0
[ Upstream commit e94eaef11142b01f77bf8ba4d0b59720b7858109 ] If an error is encountered while mapping TX buffers, the driver should unmap any buffers already mapped for that skb. Because count is incremented after a successful mapping, it will always match the correct number of unmappings needed when dma_error is reached. Decrementing count before the while loop in dma_error causes an off-by-one error. If any mapping was successful before an unsuccessful mapping, exactly one DMA mapping would leak. In these commits, a faulty while condition caused an infinite loop in dma_error: Commit 03b1320dfcee ("e1000e: remove use of skb_dma_map from e1000e driver") Commit 602c0554d7b0 ("e1000: remove use of skb_dma_map from e1000 driver") Commit c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()") fixed the infinite loop, but introduced the off-by-one error. This issue may still exist in the igbvf driver, but I did not address it in this patch. Fixes: c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()") Assisted-by: Claude:claude-4.6-opus Signed-off-by: Matt Vollrath <tactii@gmail.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25i40e: fix src IP mask checks and memcpy argument names in cloud filterAlok Tiwari1-7/+7
[ Upstream commit e809085f492842ce7a519c9ef72d40f4bca89c13 ] Fix following issues in the IPv4 and IPv6 cloud filter handling logic in both the add and delete paths: - The source-IP mask check incorrectly compares mask.src_ip[0] against tcf.dst_ip[0]. Update it to compare against tcf.src_ip[0]. This likely goes unnoticed because the check is in an "else if" path that only executes when dst_ip is not set, most cloud filter use cases focus on destination-IP matching, and the buggy condition can accidentally evaluate true in some cases. - memcpy() for the IPv4 source address incorrectly uses ARRAY_SIZE(tcf.dst_ip) instead of ARRAY_SIZE(tcf.src_ip), although both arrays are the same size. - The IPv4 memcpy operations used ARRAY_SIZE(tcf.dst_ip) and ARRAY_SIZE (tcf.src_ip), Update these to use sizeof(cfilter->ip.v4.dst_ip) and sizeof(cfilter->ip.v4.src_ip) to ensure correct and explicit copy size. - In the IPv6 delete path, memcmp() uses sizeof(src_ip6) when comparing dst_ip6 fields. Replace this with sizeof(dst_ip6) to make the intent explicit, even though both fields are struct in6_addr. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25amd-xgbe: prevent CRC errors during RX adaptation with AN disabledRaju Rangoju3-2/+69
[ Upstream commit 27a4dd0c702b3b2b9cf2c045d100cc2fe8720b81 ] When operating in 10GBASE-KR mode with auto-negotiation disabled and RX adaptation enabled, CRC errors can occur during the RX adaptation process. This happens because the driver continues transmitting and receiving packets while adaptation is in progress. Fix this by stopping TX/RX immediately when the link goes down and RX adaptation needs to be re-triggered, and only re-enabling TX/RX after adaptation completes and the link is confirmed up. Introduce a flag to track whether TX/RX was disabled for adaptation so it can be restored correctly. This prevents packets from being transmitted or received during the RX adaptation window and avoids CRC errors from corrupted frames. The flag tracking the data path state is synchronized with hardware state in xgbe_start() to prevent stale state after device restarts. This ensures that after a restart cycle (where xgbe_stop disables TX/RX and xgbe_start re-enables them), the flag correctly reflects that the data path is active. Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260306111629.1515676-3-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25amd-xgbe: fix link status handling in xgbe_rx_adaptationRaju Rangoju1-5/+14
[ Upstream commit 6485cb96be5cd0f4bf39554737ba11322cc9b053 ] The link status bit is latched low to allow detection of momentary link drops. If the status indicates that the link is already down, read it again to obtain the current state. Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260306111629.1515676-2-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25bnxt_en: Fix RSS table size check when changing ethtool channelsPavan Chebbi1-2/+2
[ Upstream commit 0d9a60a0618d255530ca56072c5f39eb58e1ed4a ] When changing channels, the current check in bnxt_set_channels() is not checking for non-default RSS contexts when the RSS table size changes. The current check for IFF_RXFH_CONFIGURED is only sufficient for the default RSS context. Expand the check to include the presence of any non-default RSS contexts. Allowing such change will result in incorrect configuration of the context's RSS table when the table size changes. Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") Reported-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/netdev/20260303181535.2671734-1-bjorn@kernel.org/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260306225854.3575672-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net/mlx5e: Fix DMA FIFO desync on error CQE SQ recoveryGal Pressman1-1/+0
[ Upstream commit 1633111d69053512d099658d4a05fc736fab36b0 ] In case of a TX error CQE, a recovery flow is triggered, mlx5e_reset_txqsq_cc_pc() resets dma_fifo_cc to 0 but not dma_fifo_pc, desyncing the DMA FIFO producer and consumer. After recovery, the producer pushes new DMA entries at the old dma_fifo_pc, while the consumer reads from position 0. This causes us to unmap stale DMA addresses from before the recovery. The DMA FIFO is a purely software construct with no HW counterpart. At the point of reset, all WQEs have been flushed so dma_fifo_cc is already equal to dma_fifo_pc. There is no need to reset either counter, similar to how skb_fifo pc/cc are untouched. Remove the 'dma_fifo_cc = 0' reset. This fixes the following WARNING: WARNING: CPU: 0 PID: 0 at drivers/iommu/dma-iommu.c:1240 iommu_dma_unmap_page+0x79/0x90 Modules linked in: mlx5_vdpa vringh vdpa bonding mlx5_ib mlx5_vfio_pci ipip mlx5_fwctl tunnel4 mlx5_core ib_ipoib geneve ip6_gre ip_gre gre nf_tables ip6_tunnel rdma_ucm ib_uverbs ib_umad vfio_pci vfio_pci_core act_mirred act_skbedit act_vlan vhost_net vhost tap ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress vhost_iotlb iptable_raw tunnel6 vfio_iommu_type1 vfio openvswitch nsh rpcsec_gss_krb5 auth_rpcgss oid_registry xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter overlay zram zsmalloc rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core fuse [last unloaded: nf_tables] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc5_for_upstream_min_debug_2024_12_30_21_33 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:iommu_dma_unmap_page+0x79/0x90 Code: 2b 4d 3b 21 72 26 4d 3b 61 08 73 20 49 89 d8 44 89 f9 5b 4c 89 f2 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 c7 ae 9e ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00 Call Trace: <IRQ> ? __warn+0x7d/0x110 ? iommu_dma_unmap_page+0x79/0x90 ? report_bug+0x16d/0x180 ? handle_bug+0x4f/0x90 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? iommu_dma_unmap_page+0x79/0x90 ? iommu_dma_unmap_page+0x2e/0x90 dma_unmap_page_attrs+0x10d/0x1b0 mlx5e_tx_wi_dma_unmap+0xbe/0x120 [mlx5_core] mlx5e_poll_tx_cq+0x16d/0x690 [mlx5_core] mlx5e_napi_poll+0x8b/0xac0 [mlx5_core] __napi_poll+0x24/0x190 net_rx_action+0x32a/0x3b0 ? mlx5_eq_comp_int+0x7e/0x270 [mlx5_core] ? notifier_call_chain+0x35/0xa0 handle_softirqs+0xc9/0x270 irq_exit_rcu+0x71/0xd0 common_interrupt+0x7f/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 Fixes: db75373c91b0 ("net/mlx5e: Recover Send Queue (SQ) from error state") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305142634.1813208-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net/mlx5: Fix crash when moving to switchdev modePatrisious Haddad1-1/+1
[ Upstream commit 24b2795f9683e092dc22a68f487e7aaaf2ddafea ] When moving to switchdev mode when the device doesn't support IPsec, we try to clean up the IPsec resources anyway which causes the crash below, fix that by correctly checking for IPsec support before trying to clean up its resources. [27642.515799] WARNING: arch/x86/mm/fault.c:1276 at do_user_addr_fault+0x18a/0x680, CPU#4: devlink/6490 [27642.517159] Modules linked in: xt_conntrack xt_MASQUERADE ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat xt_addrtype rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_fwctl nfnetlink zram zsmalloc mlx5_ib fuse rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_core ib_core [27642.521358] CPU: 4 UID: 0 PID: 6490 Comm: devlink Not tainted 6.19.0-rc5_for_upstream_min_debug_2026_01_14_16_47 #1 NONE [27642.522923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [27642.524528] RIP: 0010:do_user_addr_fault+0x18a/0x680 [27642.525362] Code: ff 0f 84 75 03 00 00 48 89 ee 4c 89 e7 e8 5e b9 22 00 49 89 c0 48 85 c0 0f 84 a8 02 00 00 f7 c3 60 80 00 00 74 22 31 c9 eb ae <0f> 0b 48 83 c4 10 48 89 ea 48 89 de 4c 89 f7 5b 5d 41 5c 41 5d 41 [27642.528166] RSP: 0018:ffff88810770f6b8 EFLAGS: 00010046 [27642.529038] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff88810b980f00 [27642.530158] RDX: 00000000000000a0 RSI: 0000000000000002 RDI: ffff88810770f728 [27642.531270] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000000 [27642.532383] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888103f3c4c0 [27642.533499] R13: 0000000000000000 R14: ffff88810770f728 R15: 0000000000000000 [27642.534614] FS: 00007f197c741740(0000) GS:ffff88856a94c000(0000) knlGS:0000000000000000 [27642.535915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [27642.536858] CR2: 00000000000000a0 CR3: 000000011334c003 CR4: 0000000000172eb0 [27642.537982] Call Trace: [27642.538466] <TASK> [27642.538907] exc_page_fault+0x76/0x140 [27642.539583] asm_exc_page_fault+0x22/0x30 [27642.540282] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30 [27642.541134] Code: 07 85 c0 75 11 ba ff 00 00 00 f0 0f b1 17 75 06 b8 01 00 00 00 c3 31 c0 c3 90 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 7e 02 00 00 48 89 d8 5b [27642.543936] RSP: 0018:ffff88810770f7d8 EFLAGS: 00010046 [27642.544803] RAX: 0000000000000000 RBX: 0000000000000202 RCX: ffff888113ad96d8 [27642.545916] RDX: 0000000000000001 RSI: ffff88810770f818 RDI: 00000000000000a0 [27642.547027] RBP: 0000000000000098 R08: 0000000000000400 R09: ffff88810b980f00 [27642.548140] R10: 0000000000000001 R11: ffff888101845a80 R12: 00000000000000a8 [27642.549263] R13: ffffffffa02a9060 R14: 00000000000000a0 R15: ffff8881130d8a40 [27642.550379] complete_all+0x20/0x90 [27642.551010] mlx5e_ipsec_disable_events+0xb6/0xf0 [mlx5_core] [27642.552022] mlx5e_nic_disable+0x12d/0x220 [mlx5_core] [27642.552929] mlx5e_detach_netdev+0x66/0xf0 [mlx5_core] [27642.553822] mlx5e_netdev_change_profile+0x5b/0x120 [mlx5_core] [27642.554821] mlx5e_vport_rep_load+0x419/0x590 [mlx5_core] [27642.555757] ? xa_load+0x53/0x90 [27642.556361] __esw_offloads_load_rep+0x54/0x70 [mlx5_core] [27642.557328] mlx5_esw_offloads_rep_load+0x45/0xd0 [mlx5_core] [27642.558320] esw_offloads_enable+0xb4b/0xc90 [mlx5_core] [27642.559247] mlx5_eswitch_enable_locked+0x34e/0x4f0 [mlx5_core] [27642.560257] ? mlx5_rescan_drivers_locked+0x222/0x2d0 [mlx5_core] [27642.561284] mlx5_devlink_eswitch_mode_set+0x5ac/0x9c0 [mlx5_core] [27642.562334] ? devlink_rate_set_ops_supported+0x21/0x3a0 [27642.563220] devlink_nl_eswitch_set_doit+0x67/0xe0 [27642.564026] genl_family_rcv_msg_doit+0xe0/0x130 [27642.564816] genl_rcv_msg+0x183/0x290 [27642.565466] ? __devlink_nl_pre_doit.isra.0+0x160/0x160 [27642.566329] ? devlink_nl_eswitch_get_doit+0x290/0x290 [27642.567181] ? devlink_nl_pre_doit_parent_dev_optional+0x20/0x20 [27642.568147] ? genl_family_rcv_msg_dumpit+0xf0/0xf0 [27642.568966] netlink_rcv_skb+0x4b/0xf0 [27642.569629] genl_rcv+0x24/0x40 [27642.570215] netlink_unicast+0x255/0x380 [27642.570901] ? __alloc_skb+0xfa/0x1e0 [27642.571560] netlink_sendmsg+0x1f3/0x420 [27642.572249] __sock_sendmsg+0x38/0x60 [27642.572911] __sys_sendto+0x119/0x180 [27642.573561] ? __sys_recvmsg+0x5c/0xb0 [27642.574227] __x64_sys_sendto+0x20/0x30 [27642.574904] do_syscall_64+0x55/0xc10 [27642.575554] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [27642.576391] RIP: 0033:0x7f197c85e807 [27642.577050] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 08 0d 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0 [27642.579846] RSP: 002b:00007ffebd4e2248 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [27642.581082] RAX: ffffffffffffffda RBX: 000055cfcd9cd2a0 RCX: 00007f197c85e807 [27642.582200] RDX: 0000000000000038 RSI: 000055cfcd9cd490 RDI: 0000000000000003 [27642.583320] RBP: 00007ffebd4e2290 R08: 00007f197c942200 R09: 000000000000000c [27642.584437] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [27642.585555] R13: 000055cfcd9cd490 R14: 00007ffebd4e45d1 R15: 000055cfcd9cd2a0 [27642.586671] </TASK> [27642.587121] ---[ end trace 0000000000000000 ]--- [27642.587910] BUG: kernel NULL pointer dereference, address: 00000000000000a0 Fixes: 664f76be38a1 ("net/mlx5: Fix IPsec cleanup over MPV device") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305142634.1813208-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net/mlx5: Fix deadlock between devlink lock and esw->wqCosmin Ratiu3-8/+19
[ Upstream commit aed763abf0e905b4b8d747d1ba9e172961572f57 ] esw->work_queue executes esw_functions_changed_event_handler -> esw_vfs_changed_event_handler and acquires the devlink lock. .eswitch_mode_set (acquires devlink lock in devlink_nl_pre_doit) -> mlx5_devlink_eswitch_mode_set -> mlx5_eswitch_disable_locked -> mlx5_eswitch_event_handler_unregister -> flush_workqueue deadlocks when esw_vfs_changed_event_handler executes. Fix that by no longer flushing the work to avoid the deadlock, and using a generation counter to keep track of work relevance. This avoids an old handler manipulating an esw that has undergone one or more mode changes: - the counter is incremented in mlx5_eswitch_event_handler_unregister. - the counter is read and passed to the ephemeral mlx5_host_work struct. - the work handler takes the devlink lock and bails out if the current generation is different than the one it was scheduled to operate on. - mlx5_eswitch_cleanup does the final draining before destroying the wq. No longer flushing the workqueue has the side effect of maybe no longer cancelling pending vport_change_handler work items, but that's ok since those are disabled elsewhere: - mlx5_eswitch_disable_locked disables the vport eq notifier. - mlx5_esw_vport_disable disarms the HW EQ notification and marks vport->enabled under state_lock to false to prevent pending vport handler from doing anything. - mlx5_eswitch_cleanup destroys the workqueue and makes sure all events are disabled/finished. Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305081019.1811100-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25net/mlx5: Query to see if host PF is disabledDaniel Jurgens2-0/+24
[ Upstream commit 9e84de72aef9bcf0e751a0bff3ac91b0cf52366f ] The host PF can be disabled, query firmware to check if the host PF of this function exists. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1755112796-467444-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: aed763abf0e9 ("net/mlx5: Fix deadlock between devlink lock and esw->wq") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13i40e: use xdp.frame_sz as XDP RxQ info frag_sizeLarysa Zaremba1-3/+6
[ Upstream commit c69d22c6c46a1d792ba8af3d8d6356fdc0e6f538 ] The only user of frag_size field in XDP RxQ info is bpf_xdp_frags_increase_tail(). It clearly expects whole buffer size instead of DMA write size. Different assumptions in i40e driver configuration lead to negative tailroom. Set frag_size to the same value as frame_sz in shared pages mode, use new helper to set frag_size when AF_XDP ZC is active. Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2317394-7-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13i40e: fix registering XDP RxQ infoLarysa Zaremba2-17/+22
[ Upstream commit 8f497dc8a61429cc004720aa8e713743355d80cf ] Current way of handling XDP RxQ info in i40e has a problem, where frag_size is not updated when xsk_buff_pool is detached or when MTU is changed, this leads to growing tail always failing for multi-buffer packets. Couple XDP RxQ info registering with buffer allocations and unregistering with cleaning the ring. Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2317394-6-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13net: ethernet: mtk_eth_soc: Reset prog ptr to old_prog in case of error in ↵Lorenzo Bianconi1-3/+12
mtk_xdp_setup() [ Upstream commit 0abc73c8a40fd64ac1739c90bb4f42c418d27a5e ] Reset eBPF program pointer to old_prog and do not decrease its ref-count if mtk_open routine in mtk_xdp_setup() fails. Fixes: 7c26c20da5d42 ("net: ethernet: mtk_eth_soc: add basic XDP support") Suggested-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260303-mtk-xdp-prog-ptr-fix-v2-1-97b6dbbe240f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13net: stmmac: Fix error handling in VLAN add and delete pathsOvidiu Panait1-4/+14
[ Upstream commit 35dfedce442c4060cfe5b98368bc9643fb995716 ] stmmac_vlan_rx_add_vid() updates active_vlans and the VLAN hash register before writing the HW filter entry. If the filter write fails, it leaves a stale VID in active_vlans and the hash register. stmmac_vlan_rx_kill_vid() has the reverse problem: it clears active_vlans before removing the HW filter. On failure, the VID is gone from active_vlans but still present in the HW filter table. To fix this, reorder the operations to update the hash table first, then attempt the HW filter operation. If the HW filter fails, roll back both the active_vlans bitmap and the hash table by calling stmmac_vlan_update() again. Fixes: ed64639bc1e0 ("net: stmmac: Add support for VLAN Rx filtering") Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Link: https://patch.msgid.link/20260303145828.7845-2-ovidiu.panait.rb@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13amd-xgbe: fix sleep while atomic on suspend/resumeRaju Rangoju3-14/+0
[ Upstream commit e2f27363aa6d983504c6836dd0975535e2e9dba0 ] The xgbe_powerdown() and xgbe_powerup() functions use spinlocks (spin_lock_irqsave) while calling functions that may sleep: - napi_disable() can sleep waiting for NAPI polling to complete - flush_workqueue() can sleep waiting for pending work items This causes a "BUG: scheduling while atomic" error during suspend/resume cycles on systems using the AMD XGBE Ethernet controller. The spinlock protection in these functions is unnecessary as these functions are called from suspend/resume paths which are already serialized by the PM core Fix this by removing the spinlock. Since only code that takes this lock is xgbe_powerdown() and xgbe_powerup(), remove it completely. Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260302042124.1386445-1-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13iavf: fix netdev->max_mtu to respect actual hardware limitKohei Enju1-1/+16
[ Upstream commit b84852170153671bb0fa6737a6e48370addd8e1a ] iavf sets LIBIE_MAX_MTU as netdev->max_mtu, ignoring vf_res->max_mtu from PF [1]. This allows setting an MTU beyond the actual hardware limit, causing TX queue timeouts [2]. Set correct netdev->max_mtu using vf_res->max_mtu from the PF. Note that currently PF drivers such as ice/i40e set the frame size in vf_res->max_mtu, not MTU. Convert vf_res->max_mtu to MTU before setting netdev->max_mtu. [1] # ip -j -d link show $DEV | jq '.[0].max_mtu' 16356 [2] iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5692 ms iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 6: transmit queue 3 timed out 5312 ms iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex ... Fixes: 5fa4caff59f2 ("iavf: switch to Page Pool") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13octeon_ep_vf: avoid compiler and IQ/OQ reorderingVimlesh Kumar2-16/+33
[ Upstream commit 6c73126ecd1080351b468fe43353b2f705487f44 ] Utilize READ_ONCE and WRITE_ONCE APIs for IO queue Tx/Rx variable access to prevent compiler optimization and reordering. Additionally, ensure IO queue OUT/IN_CNT registers are flushed by performing a read-back after writing. The compiler could reorder reads/writes to pkts_pending, last_pkt_count, etc., causing stale values to be used when calculating packets to process or register updates to send to hardware. The Octeon hardware requires a read-back after writing to OUT_CNT/IN_CNT registers to ensure the write has been flushed through any posted write buffers before the interrupt resend bit is set. Without this, we have observed cases where the hardware didn't properly update its internal state. wmb/rmb only provides ordering guarantees but doesn't prevent the compiler from performing optimizations like caching in registers, load tearing etc. Fixes: 1cd3b407977c3 ("octeon_ep_vf: add Tx/Rx processing and interrupt support") Signed-off-by: Sathesh Edara <sedara@marvell.com> Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com> Link: https://patch.msgid.link/20260227091402.1773833-5-vimleshk@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13octeon_ep_vf: Relocate counter updates before NAPIVimlesh Kumar1-2/+15
[ Upstream commit 2ae7d20fb24f598f60faa8f6ecc856dac782261a ] Relocate IQ/OQ IN/OUT_CNTS updates to occur before NAPI completion. Moving the IQ/OQ counter updates before napi_complete_done ensures 1. Counter registers are updated before re-enabling interrupts. 2. Prevents a race where new packets arrive but counters aren't properly synchronized. Fixes: 1cd3b407977c3 ("octeon_ep_vf: add Tx/Rx processing and interrupt support") Signed-off-by: Sathesh Edara <sedara@marvell.com> Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com> Link: https://patch.msgid.link/20260227091402.1773833-4-vimleshk@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13octeon_ep: avoid compiler and IQ/OQ reorderingVimlesh Kumar2-16/+32
[ Upstream commit 43b3160cb639079a15daeb5f080120afbfbfc918 ] Utilize READ_ONCE and WRITE_ONCE APIs for IO queue Tx/Rx variable access to prevent compiler optimization and reordering. Additionally, ensure IO queue OUT/IN_CNT registers are flushed by performing a read-back after writing. The compiler could reorder reads/writes to pkts_pending, last_pkt_count, etc., causing stale values to be used when calculating packets to process or register updates to send to hardware. The Octeon hardware requires a read-back after writing to OUT_CNT/IN_CNT registers to ensure the write has been flushed through any posted write buffers before the interrupt resend bit is set. Without this, we have observed cases where the hardware didn't properly update its internal state. wmb/rmb only provides ordering guarantees but doesn't prevent the compiler from performing optimizations like caching in registers, load tearing etc. Fixes: 37d79d0596062 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Sathesh Edara <sedara@marvell.com> Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com> Link: https://patch.msgid.link/20260227091402.1773833-3-vimleshk@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13octeon_ep: Relocate counter updates before NAPIVimlesh Kumar1-4/+15
[ Upstream commit 18c04a808c436d629d5812ce883e3822a5f5a47f ] Relocate IQ/OQ IN/OUT_CNTS updates to occur before NAPI completion, and replace napi_complete with napi_complete_done. Moving the IQ/OQ counter updates before napi_complete_done ensures 1. Counter registers are updated before re-enabling interrupts. 2. Prevents a race where new packets arrive but counters aren't properly synchronized. napi_complete_done (vs napi_complete) allows for better interrupt coalescing. Fixes: 37d79d0596062 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Sathesh Edara <sedara@marvell.com> Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com> Link: https://patch.msgid.link/20260227091402.1773833-2-vimleshk@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13amd-xgbe: fix MAC_TCR_SS register width for 2.5G and 10M speedsRaju Rangoju1-1/+1
[ Upstream commit 9439a661c2e80485406ce2c90b107ca17858382d ] Extend the MAC_TCR_SS (Speed Select) register field width from 2 bits to 3 bits to properly support all speed settings. The MAC_TCR register's SS field encoding requires 3 bits to represent all supported speeds: - 0x00: 10Gbps (XGMII) - 0x02: 2.5Gbps (GMII) / 100Mbps - 0x03: 1Gbps / 10Mbps - 0x06: 2.5Gbps (XGMII) - P100a only With only 2 bits, values 0x04-0x07 cannot be represented, which breaks 2.5G XGMII mode on newer platforms and causes incorrect speed select values to be programmed. Fixes: 07445f3c7ca1 ("amd-xgbe: Add support for 10 Mbps speed") Co-developed-by: Guruvendra Punugupati <Guruvendra.Punugupati@amd.com> Signed-off-by: Guruvendra Punugupati <Guruvendra.Punugupati@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260226170753.250312-1-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>