summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
4 daysi40e: fix IRQ freeing in i40e_vsi_request_irq_msix error pathMichal Schmidt1-1/+1
[ Upstream commit 915470e1b44e71d1dd07ee067276f003c3521ee3 ] If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration later than the first, the error path wants to free the IRQs requested so far. However, it uses the wrong dev_id argument for free_irq(), so it does not free the IRQs correctly and instead triggers the warning: Trying to free already-free IRQ 173 WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0 Modules linked in: i40e(+) [...] CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy) Hardware name: [...] RIP: 0010:__free_irq+0x192/0x2c0 [...] Call Trace: <TASK> free_irq+0x32/0x70 i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e] i40e_vsi_request_irq+0x79/0x80 [i40e] i40e_vsi_open+0x21f/0x2f0 [i40e] i40e_open+0x63/0x130 [i40e] __dev_open+0xfc/0x210 __dev_change_flags+0x1fc/0x240 netif_change_flags+0x27/0x70 do_setlink.isra.0+0x341/0xc70 rtnl_newlink+0x468/0x860 rtnetlink_rcv_msg+0x375/0x450 netlink_rcv_skb+0x5c/0x110 netlink_unicast+0x288/0x3c0 netlink_sendmsg+0x20d/0x430 ____sys_sendmsg+0x3a2/0x3d0 ___sys_sendmsg+0x99/0xe0 __sys_sendmsg+0x8a/0xf0 do_syscall_64+0x82/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] </TASK> ---[ end trace 0000000000000000 ]--- Use the same dev_id for free_irq() as for request_irq(). I tested this with inserting code to fail intentionally. Fixes: 493fb30011b3 ("i40e: Move q_vectors from pointer to array to array of pointers") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysigb: fix link test skipping when interface is admin downKohei Enju1-4/+1
[ Upstream commit d709f178abca22a4d3642513df29afe4323a594b ] The igb driver incorrectly skips the link test when the network interface is admin down (if_running == false), causing the test to always report PASS regardless of the actual physical link state. This behavior is inconsistent with other drivers (e.g. i40e, ice, ixgbe, etc.) which correctly test the physical link state regardless of admin state. Remove the if_running check to ensure link test always reflects the physical link state. Fixes: 8d420a1b3ea6 ("igb: correct link test not being run when link is down") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnet: fec: Fix possible NPD in fec_enet_phy_reset_after_clk_enable()Stefan Wahren1-1/+2
[ Upstream commit 03e79de4608bdd48ad6eec272e196124cefaf798 ] The function of_phy_find_device may return NULL, so we need to take care before dereferencing phy_dev. Fixes: 64a632da538a ("net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com> Cc: Richard Leitner <richard.leitner@skidata.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250904091334.53965-1-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysnet/mlx5: HWS, change error flow on matcher disconnectYevgeny Kliteynik1-16/+8
commit 1ce840c7a659aa53a31ef49f0271b4fd0dc10296 upstream. Currently, when firmware failure occurs during matcher disconnect flow, the error flow of the function reconnects the matcher back and returns an error, which continues running the calling function and eventually frees the matcher that is being disconnected. This leads to a case where we have a freed matcher on the matchers list, which in turn leads to use-after-free and eventual crash. This patch fixes that by not trying to reconnect the matcher back when some FW command fails during disconnect. Note that we're dealing here with FW error. We can't overcome this problem. This might lead to bad steering state (e.g. wrong connection between matchers), and will also lead to resource leakage, as it is the case with any other error handling during resource destruction. However, the goal here is to allow the driver to continue and not crash the machine with use-after-free error. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jan Alexander Preissler <akendo@akendo.eu> Signed-off-by: Sujana Subramaniam <sujana.subramaniam@sap.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 daysmicrochip: lan865x: Fix LAN8651 autoloadingStefan Wahren1-0/+2
commit ca47c44d36a9ad3268d17f89789104a471c07f81 upstream. Add missing IDs for LAN8651 devices, which are also defined in the DT bindings. Fixes: 5cd2340cb6a3 ("microchip: lan865x: add driver support for Microchip's LAN865X MAC-PHY") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Cc: stable@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250827115341.34608-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 daysmicrochip: lan865x: Fix module autoloadingStefan Wahren1-2/+3
commit c7217963eb779be0a7627dd2121152fa6786ecf7 upstream. Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from spi_device_id table. While at this, fix the misleading variable name (spidev is unrelated to this driver). Fixes: 5cd2340cb6a3 ("microchip: lan865x: add driver support for Microchip's LAN865X MAC-PHY") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Cc: stable@kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250827115341.34608-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 dayse1000e: fix heap overflow in e1000_set_eepromVitaly Lifshits1-3/+7
commit 90fb7db49c6dbac961c6b8ebfd741141ffbc8545 upstream. Fix a possible heap overflow in e1000_set_eeprom function by adding input validation for the requested length of the change in the EEPROM. In addition, change the variable type from int to size_t for better code practices and rearrange declarations to RCT. Cc: stable@vger.kernel.org Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)") Co-developed-by: Mikael Wessel <post@mikaelkw.online> Signed-off-by: Mikael Wessel <post@mikaelkw.online> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 daysnet: ethernet: oa_tc6: Handle failure of spi_setupStefan Wahren1-1/+2
commit b3852ae3105ec1048535707545d23c1e519c190f upstream. There is no guarantee that spi_setup succeed, so properly handle the error case. Fixes: aa58bec064ab ("net: ethernet: oa_tc6: implement register write operation") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Cc: stable@kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250827115341.34608-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 daysnet: xilinx: axienet: Add error handling for RX metadata pointer retrievalAbin Joseph1-0/+10
[ Upstream commit 8bbceba7dc5090c00105e006ce28d1292cfda8dd ] Add proper error checking for dmaengine_desc_get_metadata_ptr() which can return an error pointer and lead to potential crashes or undefined behaviour if the pointer retrieval fails. Properly handle the error by unmapping DMA buffer, freeing the skb and returning early to prevent further processing with invalid data. Fixes: 6a91b846af85 ("net: axienet: Introduce dmaengine support") Signed-off-by: Abin Joseph <abin.joseph@amd.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20250903025213.3120181-1-abin.joseph@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnet: thunder_bgx: decrement cleanup index before useRosen Penev1-1/+1
[ Upstream commit 9e3d71a92e561ccc77025689dab25d201fee7a3e ] All paths in probe that call goto defer do so before assigning phydev and thus it makes sense to cleanup the prior index. It also fixes a bug where index 0 does not get cleaned up. Fixes: b7d3e3d3d21a ("net: thunderx: Don't leak phy device references on -EPROBE_DEFER condition.") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250901213314.48599-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnet: thunder_bgx: add a missing of_node_putRosen Penev1-7/+11
[ Upstream commit 9d28f94912589f04ab51fbccaef287d4f40e0d1f ] phy_np needs to get freed, just like the other child nodes. Fixes: 5fc7cf179449 ("net: thunderx: Cleanup PHY probing code.") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250901213018.47392-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnet: ethernet: mtk_eth_soc: fix tx vlan tag for llc packetsFelix Fietkau1-1/+9
[ Upstream commit d4736737110ffa83d29f1c5d17b26113864205f6 ] When sending llc packets with vlan tx offload, the hardware fails to actually add the tag. Deal with this by fixing it up in software. Fixes: 656e705243fd ("net-next: mediatek: add support for MT7623 ethernet") Reported-by: Thibaut VARENE <hacks@slashdirt.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250831182007.51619-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysixgbe: fix incorrect map used in eee linkmodeAlok Tiwari1-2/+2
[ Upstream commit b7e5c3e3bfa9dc8af75ff6d8633ad7070e1985e4 ] incorrectly used ixgbe_lp_map in loops intended to populate the supported and advertised EEE linkmode bitmaps based on ixgbe_ls_map. This results in incorrect bit setting and potential out-of-bounds access, since ixgbe_lp_map and ixgbe_ls_map have different sizes and purposes. ixgbe_lp_map[i] -> ixgbe_ls_map[i] Use ixgbe_ls_map for supported and advertised linkmodes, and keep ixgbe_lp_map usage only for link partner (lp_advertised) mapping. Fixes: 9356b6db9d05 ("net: ethernet: ixgbe: Convert EEE to use linkmodes") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysi40e: Fix potential invalid access when MAC list is emptyZhen Ni1-2/+2
[ Upstream commit a556f06338e1d5a85af0e32ecb46e365547f92b9 ] list_first_entry() never returns NULL - if the list is empty, it still returns a pointer to an invalid object, leading to potential invalid memory access when dereferenced. Fix this by using list_first_entry_or_null instead of list_first_entry. Fixes: e3219ce6a775 ("i40e: Add support for client interface for IWARP driver") Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysi40e: remove read access to debugfs filesJacob Keller1-104/+19
[ Upstream commit 9fcdb1c3c4ba134434694c001dbff343f1ffa319 ] The 'command' and 'netdev_ops' debugfs files are a legacy debugging interface supported by the i40e driver since its early days by commit 02e9c290814c ("i40e: debugfs interface"). Both of these debugfs files provide a read handler which is mostly useless, and which is implemented with questionable logic. They both use a static 256 byte buffer which is initialized to the empty string. In the case of the 'command' file this buffer is literally never used and simply wastes space. In the case of the 'netdev_ops' file, the last command written is saved here. On read, the files contents are presented as the name of the device followed by a colon and then the contents of their respective static buffer. For 'command' this will always be "<device>: ". For 'netdev_ops', this will be "<device>: <last command written>". But note the buffer is shared between all devices operated by this module. At best, it is mostly meaningless information, and at worse it could be accessed simultaneously as there doesn't appear to be any locking mechanism. We have also recently received multiple reports for both read functions about their use of snprintf and potential overflow that could result in reading arbitrary kernel memory. For the 'command' file, this is definitely impossible, since the static buffer is always zero and never written to. For the 'netdev_ops' file, it does appear to be possible, if the user carefully crafts the command input, it will be copied into the buffer, which could be large enough to cause snprintf to truncate, which then causes the copy_to_user to read beyond the length of the buffer allocated by kzalloc. A minimal fix would be to replace snprintf() with scnprintf() which would cap the return to the number of bytes written, preventing an overflow. A more involved fix would be to drop the mostly useless static buffers, saving 512 bytes and modifying the read functions to stop needing those as input. Instead, lets just completely drop the read access to these files. These are debug interfaces exposed as part of debugfs, and I don't believe that dropping read access will break any script, as the provided output is pretty useless. You can find the netdev name through other more standard interfaces, and the 'netdev_ops' interface can easily result in garbage if you issue simultaneous writes to multiple devices at once. In order to properly remove the i40e_dbg_netdev_ops_buf, we need to refactor its write function to avoid using the static buffer. Instead, use the same logic as the i40e_dbg_command_write, with an allocated buffer. Update the code to use this instead of the static buffer, and ensure we free the buffer on exit. This fixes simultaneous writes to 'netdev_ops' on multiple devices, and allows us to remove the now unused static buffer along with removing the read access. Fixes: 02e9c290814c ("i40e: debugfs interface") Reported-by: Kunwu Chan <chentao@kylinos.cn> Closes: https://lore.kernel.org/intel-wired-lan/20231208031950.47410-1-chentao@kylinos.cn/ Reported-by: Wang Haoran <haoranwangsec@gmail.com> Closes: https://lore.kernel.org/all/CANZ3JQRRiOdtfQJoP9QM=6LS1Jto8PGBGw6y7-TL=BcnzHQn1Q@mail.gmail.com/ Reported-by: Amir Mohammad Jahangirzad <a.jahangirzad@gmail.com> Closes: https://lore.kernel.org/all/20250722115017.206969-1-a.jahangirzad@gmail.com/ Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysidpf: set mac type when adding and removing MAC filtersEmil Tantilov2-3/+18
[ Upstream commit acf3a5c8be80fe238c1a7629db1c21c74a1f9dd4 ] On control planes that allow changing the MAC address of the interface, the driver must provide a MAC type to avoid errors such as: idpf 0000:0a:00.0: Transaction failed (op 535) idpf 0000:0a:00.0: Received invalid MAC filter payload (op 535) (len 0) idpf 0000:0a:00.0: Transaction failed (op 536) These errors occur during driver load or when changing the MAC via: ip link set <iface> address <mac> Add logic to set the MAC type when sending ADD/DEL (opcodes 535/536) to the control plane. Since only one primary MAC is supported per vport, the driver only needs to send an ADD opcode when setting it. Remove the old address by calling __idpf_del_mac_filter(), which skips the message and just clears the entry from the internal list. This avoids an error on DEL as it attempts to remove an address already cleared by the preceding ADD opcode. Fixes: ce1b75d0635c ("idpf: add ptypes and MAC filter support") Reported-by: Jian Liu <jianliu@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysice: fix NULL access of tx->in_use in ice_ll_ts_intrJacob Keller1-5/+7
[ Upstream commit f6486338fde3f04ed0ec59fe67a69a208c32734f ] Recent versions of the E810 firmware have support for an extra interrupt to handle report of the "low latency" Tx timestamps coming from the specialized low latency firmware interface. Instead of polling the registers, software can wait until the low latency interrupt is fired. This logic makes use of the Tx timestamp tracking structure, ice_ptp_tx, as it uses the same "ready" bitmap to track which Tx timestamps complete. Unfortunately, the ice_ll_ts_intr() function does not check if the tracker is initialized before its first access. This results in NULL dereference or use-after-free bugs similar to the issues fixed in the ice_ptp_ts_irq() function. Fix this by only checking the in_use bitmap (and other fields) if the tracker is marked as initialized. The reset flow will clear the init field under lock before it tears the tracker down, thus preventing any use-after-free or NULL access. Fixes: 82e71b226e0e ("ice: Enable SW interrupt from FW for LL TS") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysnet: macb: Fix tx_ptr_lock lockingSean Anderson1-12/+16
[ Upstream commit 6bc8a5098bf4a365c4086a4a4130bfab10a58260 ] macb_start_xmit and macb_tx_poll can be called with bottom-halves disabled (e.g. from softirq) as well as with interrupts disabled (with netpoll). Because of this, all other functions taking tx_ptr_lock must use spin_lock_irqsave. Fixes: 138badbc21a0 ("net: macb: use NAPI for TX completion path") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250829143521.1686062-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysbnxt_en: fix incorrect page count in RX aggr ring logAlok Tiwari1-1/+1
[ Upstream commit 7000f4fa9b24ae2511b07babd0d49e888db5d265 ] The warning in bnxt_alloc_one_rx_ring_netmem() reports the number of pages allocated for the RX aggregation ring. However, it mistakenly used bp->rx_ring_size instead of bp->rx_agg_ring_size, leading to confusing or misleading log output. Use the correct bp->rx_agg_ring_size value to fix this. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250830062331.783783-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
13 daysxirc2ps_cs: fix register access when enabling FullDuplexAlok Tiwari1-1/+1
[ Upstream commit b79e498080b170fd94fc83bca2471f450811549b ] The current code incorrectly passes (XIRCREG1_ECR | FullDuplex) as the register address to GetByte(), instead of fetching the register value and OR-ing it with FullDuplex. This results in an invalid register access. Fix it by reading XIRCREG1_ECR first, then or-ing with FullDuplex before writing it back. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250827192645.658496-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net: macb: Disable clocks onceNeil Mandir1-5/+2
[ Upstream commit dac978e51cce0c1f00a14c4a82f81d387f79b2d4 ] When the driver is removed the clocks are disabled twice: once in macb_remove and a second time by runtime pm. Disable wakeup in remove so all the clocks are disabled and skip the second call to macb_clks_disable. Always suspend the device as we always set it active in probe. Fixes: d54f89af6cc4 ("net: macb: Add pm runtime support") Signed-off-by: Neil Mandir <neil.mandir@seco.com> Co-developed-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250826143022.935521-1-sean.anderson@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04fbnic: Move phylink resume out of service_task and into open/closeAlexander Duyck2-2/+4
[ Upstream commit 6ede14a2c6365e7e5d855643c7c8390b5268c467 ] The fbnic driver was presenting with the following locking assert coming out of a PM resume: [ 42.208116][ T164] RTNL: assertion failed at drivers/net/phy/phylink.c (2611) [ 42.208492][ T164] WARNING: CPU: 1 PID: 164 at drivers/net/phy/phylink.c:2611 phylink_resume+0x190/0x1e0 [ 42.208872][ T164] Modules linked in: [ 42.209140][ T164] CPU: 1 UID: 0 PID: 164 Comm: bash Not tainted 6.17.0-rc2-virtme #134 PREEMPT(full) [ 42.209496][ T164] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014 [ 42.209861][ T164] RIP: 0010:phylink_resume+0x190/0x1e0 [ 42.210057][ T164] Code: 83 e5 01 0f 85 b0 fe ff ff c6 05 1c cd 3e 02 01 90 ba 33 0a 00 00 48 c7 c6 20 3a 1d a5 48 c7 c7 e0 3e 1d a5 e8 21 b8 90 fe 90 <0f> 0b 90 90 e9 86 fe ff ff e8 42 ea 1f ff e9 e2 fe ff ff 48 89 ef [ 42.210708][ T164] RSP: 0018:ffffc90000affbd8 EFLAGS: 00010296 [ 42.210983][ T164] RAX: 0000000000000000 RBX: ffff8880078d8400 RCX: 0000000000000000 [ 42.211235][ T164] RDX: 0000000000000000 RSI: 1ffffffff4f10938 RDI: 0000000000000001 [ 42.211466][ T164] RBP: 0000000000000000 R08: ffffffffa2ae79ea R09: fffffbfff4b3eb84 [ 42.211707][ T164] R10: 0000000000000003 R11: 0000000000000000 R12: ffff888007ad8000 [ 42.211997][ T164] R13: 0000000000000002 R14: ffff888006a18800 R15: ffffffffa34c59e0 [ 42.212234][ T164] FS: 00007f0dc8e39740(0000) GS:ffff88808f51f000(0000) knlGS:0000000000000000 [ 42.212505][ T164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.212704][ T164] CR2: 00007f0dc8e9fe10 CR3: 000000000b56d003 CR4: 0000000000772ef0 [ 42.213227][ T164] PKRU: 55555554 [ 42.213366][ T164] Call Trace: [ 42.213483][ T164] <TASK> [ 42.213565][ T164] __fbnic_pm_attach.isra.0+0x8e/0xa0 [ 42.213725][ T164] pci_reset_function+0x116/0x1d0 [ 42.213895][ T164] reset_store+0xa0/0x100 [ 42.214025][ T164] ? pci_dev_reset_attr_is_visible+0x50/0x50 [ 42.214221][ T164] ? sysfs_file_kobj+0xc1/0x1e0 [ 42.214374][ T164] ? sysfs_kf_write+0x65/0x160 [ 42.214526][ T164] kernfs_fop_write_iter+0x2f8/0x4c0 [ 42.214677][ T164] ? kernfs_vma_page_mkwrite+0x1f0/0x1f0 [ 42.214836][ T164] new_sync_write+0x308/0x6f0 [ 42.214987][ T164] ? __lock_acquire+0x34c/0x740 [ 42.215135][ T164] ? new_sync_read+0x6f0/0x6f0 [ 42.215288][ T164] ? lock_acquire.part.0+0xbc/0x260 [ 42.215440][ T164] ? ksys_write+0xff/0x200 [ 42.215590][ T164] ? perf_trace_sched_switch+0x6d0/0x6d0 [ 42.215742][ T164] vfs_write+0x65e/0xbb0 [ 42.215876][ T164] ksys_write+0xff/0x200 [ 42.215994][ T164] ? __ia32_sys_read+0xc0/0xc0 [ 42.216141][ T164] ? do_user_addr_fault+0x269/0x9f0 [ 42.216292][ T164] ? rcu_is_watching+0x15/0xd0 [ 42.216442][ T164] do_syscall_64+0xbb/0x360 [ 42.216591][ T164] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 42.216784][ T164] RIP: 0033:0x7f0dc8ea9986 A bit of digging showed that we were invoking the phylink_resume as a part of the fbnic_up path when we were enabling the service task while not holding the RTNL lock. We should be enabling this sooner as a part of the ndo_open path and then just letting the service task come online later. This will help to enforce the correct locking and brings the phylink interface online at the same time as the network interface, instead of at a later time. I tested this on QEMU to verify this was working by putting the system to sleep using "echo mem > /sys/power/state" to put the system to sleep in the guest and then using the command "system_wakeup" in the QEMU monitor. Fixes: 69684376eed5 ("eth: fbnic: Add link detection") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/175616257316.1963577.12238158800417771119.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net: stmmac: Set CIC bit only for TX queues with COERohan G Thomas1-2/+4
[ Upstream commit b1eded580ab28119de0b0f21efe37ee2b4419144 ] Currently, in the AF_XDP transmit paths, the CIC bit of TX Desc3 is set for all packets. Setting this bit for packets transmitting through queues that don't support checksum offloading causes the TX DMA to get stuck after transmitting some packets. This patch ensures the CIC bit of TX Desc3 is set only if the TX queue supports checksum offloading. Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/20250825-xgmac-minor-fixes-v3-3-c225fe4444c0@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net: stmmac: xgmac: Correct supported speed modesRohan G Thomas2-2/+16
[ Upstream commit 42ef11b2bff5b6a2910c28d2ea47cc00e0fbcaec ] Correct supported speed modes as per the XGMAC databook. Commit 9cb54af214a7 ("net: stmmac: Fix IP-cores specific MAC capabilities") removes support for 10M, 100M and 1000HD. 1000HD is not supported by XGMAC IP, but it does support 10M and 100M FD mode for XGMAC version >= 2_20, and it also supports 10M and 100M HD mode if the HDSEL bit is set in the MAC_HW_FEATURE0 reg. This commit enables support for 10M and 100M speed modes for XGMAC IP based on XGMAC version and MAC capabilities. Fixes: 9cb54af214a7 ("net: stmmac: Fix IP-cores specific MAC capabilities") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/20250825-xgmac-minor-fixes-v3-2-c225fe4444c0@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net: stmmac: xgmac: Do not enable RX FIFO Overflow interruptsRohan G Thomas1-4/+0
[ Upstream commit 4f23382841e67174211271a454811dd17c0ef3c5 ] Enabling RX FIFO Overflow interrupts is counterproductive and causes an interrupt storm when RX FIFO overflows. Disabling this interrupt has no side effect and eliminates interrupt storms when the RX FIFO overflows. Commit 8a7cb245cf28 ("net: stmmac: Do not enable RX FIFO overflow interrupts") disables RX FIFO overflow interrupts for DWMAC4 IP and removes the corresponding handling of this interrupt. This patch is doing the same thing for XGMAC IP. Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250825-xgmac-minor-fixes-v3-1-c225fe4444c0@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net/mlx5e: Set local Xoff after FW updateAlexei Lazar1-1/+2
[ Upstream commit aca0c31af61e0d5cf1675a0cbd29460b95ae693c ] The local Xoff value is being set before the firmware (FW) update. In case of a failure where the FW is not updated with the new value, there is no fallback to the previous value. Update the local Xoff value after the FW has been successfully set. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-12-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net/mlx5e: Update and set Xon/Xoff upon port speed setAlexei Lazar1-0/+2
[ Upstream commit d24341740fe48add8a227a753e68b6eedf4b385a ] Xon/Xoff sizes are derived from calculations that include the port speed. These settings need to be updated and applied whenever the port speed is changed. The port speed is typically set after the physical link goes down and is negotiated as part of the link-up process between the two connected interfaces. Xon/Xoff parameters being updated at the point where the new negotiated speed is established. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-11-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net/mlx5e: Update and set Xon/Xoff upon MTU setAlexei Lazar2-1/+28
[ Upstream commit ceddedc969f0532b7c62ca971ee50d519d2bc0cb ] Xon/Xoff sizes are derived from calculation that include the MTU size. Set Xon/Xoff when MTU is set. If Xon/Xoff fails, set the previous MTU. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-10-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net/mlx5: Nack sync reset when SFs are presentMoshe Shemesh3-0/+22
[ Upstream commit 26e42ec7712d392d561964514b1f253b1a96f42d ] If PF (Physical Function) has SFs (Sub-Functions), since the SFs are not taking part in the synchronization flow, sync reset can lead to fatal error on the SFs, as the function will be closed unexpectedly from the SF point of view. Add a check to prevent sync reset when there are SFs on a PF device which is not ECPF, as ECPF is teardowned gracefully before reset. Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net/mlx5: Fix lockdep assertion on sync reset unload eventMoshe Shemesh3-48/+63
[ Upstream commit 902a8bc23a24882200f57cadc270e15a2cfaf2bb ] Fix lockdep assertion triggered during sync reset unload event. When the sync reset flow is initiated using the devlink reload fw_activate option, the PF already holds the devlink lock while handling unload event. In this case, delegate sync reset unload event handling back to the devlink callback process to avoid double-locking and resolve the lockdep warning. Kernel log: WARNING: CPU: 9 PID: 1578 at devl_assert_locked+0x31/0x40 [...] Call Trace: <TASK> mlx5_unload_one_devl_locked+0x2c/0xc0 [mlx5_core] mlx5_sync_reset_unload_event+0xaf/0x2f0 [mlx5_core] process_one_work+0x222/0x640 worker_thread+0x199/0x350 kthread+0x10b/0x230 ? __pfx_worker_thread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x8e/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 7a9770f1bfea ("net/mlx5: Handle sync reset unload event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net/mlx5: Reload auxiliary drivers on fw_activateMoshe Shemesh1-1/+1
[ Upstream commit 34cc6a54914f478c93e176450fae6313404f9f74 ] The devlink reload fw_activate command performs firmware activation followed by driver reload, while devlink reload driver_reinit triggers only driver reload. However, the driver reload logic differs between the two modes, as on driver_reinit mode mlx5 also reloads auxiliary drivers, while in fw_activate mode the auxiliary drivers are suspended where applicable. Additionally, following the cited commit, if the device has multiple PFs, the behavior during fw_activate may vary between PFs: one PF may suspend auxiliary drivers, while another reloads them. Align devlink dev reload fw_activate behavior with devlink dev reload driver_reinit, to reload all auxiliary drivers. Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250825143435.598584-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04bnxt_en: Fix stats context reservation logicMichael Chan1-1/+2
[ Upstream commit b4fc8faacfea2538184a1dbd616ae9447a361f3d ] The HW resource reservation logic allows the L2 driver to use the RoCE resources if the RoCE driver is not registered. When calculating the stats contexts available for L2, we should not blindly subtract the stats contexts reserved for RoCE unless the RoCE driver is registered. This bug may cause the L2 rings to be less than the number requested when we are close to running out of stats contexts. Fixes: 2e4592dc9bee ("bnxt_en: Change MSIX/NQs allocation policy") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250825175927.459987-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04bnxt_en: Adjust TX rings if reservation is less than requestedMichael Chan1-0/+12
[ Upstream commit 1ee581c24dfdcbc6de25aac95a48c1f08e9a542c ] Before we accept an ethtool request to increase a resource (such as rings), we call the FW to check that the requested resource is likely available first before we commit. But it is still possible that the actual reservation or allocation can fail. The existing code is missing the logic to adjust the TX rings in case the reserved TX rings are less than requested. Add a warning message (a similar message for RX rings already exists) and add the logic to adjust the TX rings. Without this fix, the number of TX rings reported to the stack can exceed the actual TX rings and ethtool -l will report more than the actual TX rings. Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250825175927.459987-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04bnxt_en: Fix memory corruption when FW resources change during ifdownSreekanth Reddy1-5/+16
[ Upstream commit 2747328ba2714f1a7454208dbbc1dc0631990b4a ] bnxt_set_dflt_rings() assumes that it is always called before any TC has been created. So it doesn't take bp->num_tc into account and assumes that it is always 0 or 1. In the FW resource or capability change scenario, the FW will return flags in bnxt_hwrm_if_change() that will cause the driver to reinitialize and call bnxt_cancel_reservations(). This will lead to bnxt_init_dflt_ring_mode() calling bnxt_set_dflt_rings() and bp->num_tc may be greater than 1. This will cause bp->tx_ring[] to be sized too small and cause memory corruption in bnxt_alloc_cp_rings(). Fix it by properly scaling the TX rings by bp->num_tc in the code paths mentioned above. Add 2 helper functions to determine bp->tx_nr_rings and bp->tx_nr_rings_per_tc. Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250825175927.459987-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net: dlink: fix multicast stats being counted incorrectlyYeounsu Moon1-1/+1
[ Upstream commit 007a5ffadc4fd51739527f1503b7cf048f31c413 ] `McstFramesRcvdOk` counts the number of received multicast packets, and it reports the value correctly. However, reading `McstFramesRcvdOk` clears the register to zero. As a result, the driver was reporting only the packets since the last read, instead of the accumulated total. Fix this by updating the multicast statistics accumulatively instaed of instantaneously. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Tested-on: D-Link DGE-550T Rev-A3 Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250823182927.6063-3-yyyynoom@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04ice: fix incorrect counter for buffer allocation failuresMichal Kubiak1-1/+1
[ Upstream commit b1a0c977c6f1130f7dd125ee3db8c2435d7e3d41 ] Currently, the driver increments `alloc_page_failed` when buffer allocation fails in `ice_clean_rx_irq()`. However, this counter is intended for page allocation failures, not buffer allocation issues. This patch corrects the counter by incrementing `alloc_buf_failed` instead, ensuring accurate statistics reporting for buffer allocation failures. Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Reported-by: Jacob Keller <jacob.e.keller@intel.com> Suggested-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Priya Singh <priyax.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04ice: use fixed adapter index for E825C embedded devicesJacob Keller2-13/+40
[ Upstream commit 5c5e5b52bf05c7fe88768318c041052c5fac36b8 ] The ice_adapter structure is used by the ice driver to connect multiple physical functions of a device in software. It was introduced by commit 0e2bddf9e5f9 ("ice: add ice_adapter for shared data across PFs on the same NIC") and is primarily used for PTP support, as well as for handling certain cross-PF synchronization. The original design of ice_adapter used PCI address information to determine which devices should be connected. This was extended to support E825C devices by commit fdb7f54700b1 ("ice: Initial support for E825C hardware in ice_adapter"), which used the device ID for E825C devices instead of the PCI address. Later, commit 0093cb194a75 ("ice: use DSN instead of PCI BDF for ice_adapter index") replaced the use of Bus/Device/Function addressing with use of the device serial number. E825C devices may appear in "Dual NAC" configuration which has multiple physical devices tied to the same clock source and which need to use the same ice_adapter. Unfortunately, each "NAC" has its own NVM which has its own unique Device Serial Number. Thus, use of the DSN for connecting ice_adapter does not work properly. It "worked" in the pre-production systems because the DSN was not initialized on the test NVMs and all the NACs had the same zero'd serial number. Since we cannot rely on the DSN, lets fall back to the logic in the original E825C support which used the device ID. This is safe for E825C only because of the embedded nature of the device. It isn't a discreet adapter that can be plugged into an arbitrary system. All E825C devices on a given system are connected to the same clock source and need to be configured through the same PTP clock. To make this separation clear, reserve bit 63 of the 64-bit index values as a "fixed index" indicator. Always clear this bit when using the device serial number as an index. For E825C, use a fixed value defined as the 0x579C E825C backplane device ID bitwise ORed with the fixed index indicator. This is slightly different than the original logic of just using the device ID directly. Doing so prevents a potential issue with systems where only one of the NACs is connected with an external PHY over SGMII. In that case, one NAC would have the E825C_SGMII device ID, but the other would not. Separate the determination of the full 64-bit index from the 32-bit reduction logic. Provide both ice_adapter_index() and a wrapping ice_adapter_xa_index() which handles reducing the index to a long on 32-bit systems. As before, cache the full index value in the adapter structure to warn about collisions. This fixes issues with E825C not initializing PTP on both NACs, due to failure to connect the appropriate devices to the same ice_adapter. Fixes: 0093cb194a75 ("ice: use DSN instead of PCI BDF for ice_adapter index") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04ice: don't leave device non-functional if Tx scheduler config failsJacob Keller2-17/+43
[ Upstream commit 86aae43f21cf784c1d7f6a9af93e5116b0f232ab ] The ice_cfg_tx_topo function attempts to apply Tx scheduler topology configuration based on NVM parameters, selecting either a 5 or 9 layer topology. As part of this flow, the driver acquires the "Global Configuration Lock", which is a hardware resource associated with programming the DDP package to the device. This "lock" is implemented by firmware as a way to guarantee that only one PF can program the DDP for a device. Unlike a traditional lock, once a PF has acquired this lock, no other PF will be able to acquire it again (including that PF) until a CORER of the device. Future requests to acquire the lock report that global configuration has already completed. The following flow is used to program the Tx topology: * Read the DDP package for scheduler configuration data * Acquire the global configuration lock * Program Tx scheduler topology according to DDP package data * Trigger a CORER which clears the global configuration lock This is followed by the flow for programming the DDP package: * Acquire the global configuration lock (again) * Download the DDP package to the device * Release the global configuration lock. However, if configuration of the Tx topology fails, (i.e. ice_get_set_tx_topo returns an error code), the driver exits ice_cfg_tx_topo() immediately, and fails to trigger CORER. While the global configuration lock is held, the firmware rejects most AdminQ commands, as it is waiting for the DDP package download (or Tx scheduler topology programming) to occur. The current driver flows assume that the global configuration lock has been reset by CORER after programming the Tx topology. Thus, the same PF attempts to acquire the global lock again, and fails. This results in the driver reporting "an unknown error occurred when loading the DDP package". It then attempts to enter safe mode, but ultimately fails to finish ice_probe() since nearly all AdminQ command report error codes, and the driver stops loading the device at some point during its initialization. The only currently known way that ice_get_set_tx_topo() can fail is with certain older DDP packages which contain invalid topology configuration, on firmware versions which strictly validate this data. The most recent releases of the DDP have resolved the invalid data. However, it is still poor practice to essentially brick the device, and prevent access to the device even through safe mode or recovery mode. It is also plausible that this command could fail for some other reason in the future. We cannot simply release the global lock after a failed call to ice_get_set_tx_topo(). Releasing the lock indicates to firmware that global configuration (downloading of the DDP) has completed. Future attempts by this or other PFs to load the DDP will fail with a report that the DDP package has already been downloaded. Then, PFs will enter safe mode as they realize that the package on the device does not meet the minimum version requirement to load. The reported error messages are confusing, as they indicate the version of the default "safe mode" package in the NVM, rather than the version of the file loaded from /lib/firmware. Instead, we need to trigger CORER to clear global configuration. This is the lowest level of hardware reset which clears the global configuration lock and related state. It also clears any already downloaded DDP. Crucially, it does *not* clear the Tx scheduler topology configuration. Refactor ice_cfg_tx_topo() to always trigger a CORER after acquiring the global lock, regardless of success or failure of the topology configuration. We need to re-initialize the HW structure when we trigger the CORER. Thus, it makes sense for this to be the responsibility of ice_cfg_tx_topo() rather than its caller, ice_init_tx_topology(). This avoids needless re-initialization in cases where we don't attempt to update the Tx scheduler topology, such as if it has already been programmed. There is one catch: failure to re-initialize the HW struct should stop ice_probe(). If this function fails, we won't have a valid HW structure and cannot ensure the device is functioning properly. To handle this, ensure ice_cfg_tx_topo() returns a limited set of error codes. Set aside one specifically, -ENODEV, to indicate that the ice_init_tx_topology() should fail and stop probe. Other error codes indicate failure to apply the Tx scheduler topology. This is treated as a non-fatal error, with an informational message informing the system administrator that the updated Tx topology did not apply. This allows the device to load and function with the default Tx scheduler topology, rather than failing to load entirely. Note that this use of CORER will not result in loops with future PFs attempting to also load the invalid Tx topology configuration. The first PF will acquire the global configuration lock as part of programming the DDP. Each PF after this will attempt to acquire the global lock as part of programming the Tx topology, and will fail with the indication from firmware that global configuration is already complete. Tx scheduler topology configuration is only performed during driver init (probe or devlink reload) and not during cleanup for a CORER that happens after probe completes. Fixes: 91427e6d9030 ("ice: Support 5 layer topology") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net: macb: fix unregister_netdev call order in macb_remove()luoguangfei1-1/+1
[ Upstream commit 01b9128c5db1b470575d07b05b67ffa3cb02ebf1 ] When removing a macb device, the driver calls phy_exit() before unregister_netdev(). This leads to a WARN from kernfs: ------------[ cut here ]------------ kernfs: can not remove 'attached_dev', no directory WARNING: CPU: 1 PID: 27146 at fs/kernfs/dir.c:1683 Call trace: kernfs_remove_by_name_ns+0xd8/0xf0 sysfs_remove_link+0x24/0x58 phy_detach+0x5c/0x168 phy_disconnect+0x4c/0x70 phylink_disconnect_phy+0x6c/0xc0 [phylink] macb_close+0x6c/0x170 [macb] ... macb_remove+0x60/0x168 [macb] platform_remove+0x5c/0x80 ... The warning happens because the PHY is being exited while the netdev is still registered. The correct order is to unregister the netdev before shutting down the PHY and cleaning up the MDIO bus. Fix this by moving unregister_netdev() ahead of phy_exit() in macb_remove(). Fixes: 8b73fa3ae02b ("net: macb: Added ZynqMP-specific initialization") Signed-off-by: luoguangfei <15388634752@163.com> Link: https://patch.msgid.link/20250818232527.1316-1-15388634752@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28net/mlx5e: Preserve shared buffer capacity during headroom updatesArmen Ratner1-10/+8
[ Upstream commit 8b0587a885fdb34fd6090a3f8625cb7ac1444826 ] When port buffer headroom changes, port_update_shared_buffer() recalculates the shared buffer size and splits it in a 3:1 ratio (lossy:lossless) - Currently, the calculation is: lossless = shared / 4; lossy = (shared / 4) * 3; Meaning, the calculation dropped the remainder of shared % 4 due to integer division, unintentionally reducing the total shared buffer by up to three cells on each update. Over time, this could shrink the buffer below usable size. Fix it by changing the calculation to: lossless = shared / 4; lossy = shared - lossless; This retains all buffer cells while still approximating the intended 3:1 split, preventing capacity loss over time. While at it, perform headroom calculations in units of cells rather than in bytes for more accurate calculations avoiding extra divisions. Fixes: a440030d8946 ("net/mlx5e: Update shared buffer along with device buffer changes") Signed-off-by: Armen Ratner <armeng@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Alexei Lazar <alazar@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250820133209.389065-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28net/mlx5e: Query FW for buffer ownershipAlexei Lazar4-4/+31
[ Upstream commit 451d2849ea66659040b59ae3cb7e50cc97404733 ] The SW currently saves local buffer ownership when setting the buffer. This means that the SW assumes it has ownership of the buffer after the command is set. If setting the buffer fails and we remain in FW ownership, the local buffer ownership state incorrectly remains as SW-owned. This leads to incorrect behavior in subsequent PFC commands, causing failures. Instead of saving local buffer ownership in SW, query the FW for buffer ownership when setting the buffer. This ensures that the buffer ownership state is accurately reflected, avoiding the issues caused by incorrect ownership states. Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28net/mlx5: Relocate function declarations from port.h to mlx5_core.hShahar Shitrit2-20/+85
[ Upstream commit a2f61f1db85532e72fb8a3af51b06df94bb82912 ] The port header is a general file under include, yet it contains declarations for functions that are either not exported or exported but not used outside the mlx5_core driver. To enhance code organization, we move these declarations to mlx5_core.h, where they are more appropriately scoped. This refactor removes unnecessary exported symbols and prevents unexported functions from being inadvertently referenced outside of the mlx5_core driver. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 451d2849ea66 ("net/mlx5e: Query FW for buffer ownership") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28net/mlx5: Base ECVF devlink port attrs from 0Daniel Jurgens1-1/+3
[ Upstream commit bc17455bc843b2f4b206e0bb8139013eb3d3c08b ] Adjust the vport number by the base ECVF vport number so the port attributes start at 0. Previously the port attributes would start 1 after the maximum number of host VFs. Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28Octeontx2-af: Skip overlap check for SPI fieldHariprasad Kelam1-2/+2
[ Upstream commit 8c5d95988c34f0aeba1f34cd5e4ba69494c90c5f ] Octeontx2/CN10K silicon supports generating a 256-bit key per packet. The specific fields to be extracted from a packet for key generation are configurable via a Key Extraction (MKEX) Profile. The AF driver scans the configured extraction profile to ensure that fields from upper layers do not overwrite fields from lower layers in the key. Example Packet Field Layout: LA: DMAC + SMAC LB: VLAN LC: IPv4/IPv6 LD: TCP/UDP Valid MKEX Profile Configuration: LA -> DMAC -> key_offset[0-5] LC -> SIP -> key_offset[20-23] LD -> SPORT -> key_offset[30-31] Invalid MKEX profile configuration: LA -> DMAC -> key_offset[0-5] LC -> SIP -> key_offset[20-23] LD -> SPORT -> key_offset[2-3] // Overlaps with DMAC field In another scenario, if the MKEX profile is configured to extract the SPI field from both AH and ESP headers at the same key offset, the driver rejecting this configuration. In a regular traffic, ipsec packet will be having either AH(LD) or ESP (LE). This patch relaxes the check for the same. Fixes: 12aa0a3b93f3 ("octeontx2-af: Harden rule validation.") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20250820063919.1463518-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28igc: fix disabling L1.2 PCI-E link substate on I226 on initValdikSS1-7/+7
[ Upstream commit 1468c1f97cf32418e34dbb40b784ed9333b9e123 ] Device ID comparison in igc_is_device_id_i226 is performed before the ID is set, resulting in always failing check on init. Before the patch: * L1.2 is not disabled on init * L1.2 is properly disabled after suspend-resume cycle With the patch: * L1.2 is properly disabled both on init and after suspend-resume How to test: Connect to the 1G link with 300+ mbit/s Internet speed, and run the download speed test, such as: curl -o /dev/null http://speedtest.selectel.ru/1GB Without L1.2 disabled, the speed would be no more than ~200 mbit/s. With L1.2 disabled, the speed would reach 1 gbit/s. Note: it's required that the latency between your host and the remote be around 3-5 ms, the test inside LAN (<1 ms latency) won't trigger the issue. Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com Fixes: 0325143b59c6 ("igc: disable L1.2 PCI-E link substate to avoid performance issue") Signed-off-by: ValdikSS <iam@valdikss.org.ru> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250819222000.3504873-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28ixgbe: xsk: resolve the negative overflow of budget in ixgbe_xmit_zcJason Xing1-1/+3
[ Upstream commit 4d4d9ef9dfee877d494e5418f68a1016ef08cad6 ] Resolve the budget negative overflow which leads to returning true in ixgbe_xmit_zc even when the budget of descs are thoroughly consumed. Before this patch, when the budget is decreased to zero and finishes sending the last allowed desc in ixgbe_xmit_zc, it will always turn back and enter into the while() statement to see if it should keep processing packets, but in the meantime it unexpectedly decreases the value again to 'unsigned int (0--)', namely, UINT_MAX. Finally, the ixgbe_xmit_zc returns true, showing 'we complete cleaning the budget'. That also means 'clean_complete = true' in ixgbe_poll. The true theory behind this is if that budget number of descs are consumed, it implies that we might have more descs to be done. So we should return false in ixgbe_xmit_zc to tell napi poll to find another chance to start polling to handle the rest of descs. On the contrary, returning true here means job done and we know we finish all the possible descs this time and we don't intend to start a new napi poll. It is apparently against our expectations. Please also see how ixgbe_clean_tx_irq() handles the problem: it uses do..while() statement to make sure the budget can be decreased to zero at most and the negative overflow never happens. The patch adds 'likely' because we rarely would not hit the loop condition since the standard budget is 256. Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Priya Singh <priyax.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250819222000.3504873-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28microchip: lan865x: fix missing Timer Increment config for Rev.B0/B1Parthiban Veerasooran1-0/+19
[ Upstream commit 2cd58fec912acec273cb155911ab8f06ddbb131a ] Fix missing configuration for LAN865x silicon revisions B0 and B1 as per Microchip Application Note AN1760 (Rev F, June 2024). The Timer Increment register was not being set, which is required for accurate timestamping. As per the application note, configure the MAC to set timestamping at the end of the Start of Frame Delimiter (SFD), and set the Timer Increment register to 40 ns (corresponding to a 25 MHz internal clock). Link: https://www.microchip.com/en-us/application-notes/an1760 Fixes: 5cd2340cb6a3 ("microchip: lan865x: add driver support for Microchip's LAN865X MAC-PHY") Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250818060514.52795-3-parthiban.veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28microchip: lan865x: fix missing netif_start_queue() call on device openParthiban Veerasooran1-0/+2
[ Upstream commit 1683fd1b2fa79864d3c7a951d9cea0a9ba1a1923 ] This fixes an issue where the transmit queue is started implicitly only the very first time the device is registered. When the device is taken down and brought back up again (using `ip` or `ifconfig`), the transmit queue is not restarted, causing packet transmission to hang. Adding an explicit call to netif_start_queue() in lan865x_net_open() ensures the transmit queue is properly started every time the device is reopened. Fixes: 5cd2340cb6a3 ("microchip: lan865x: add driver support for Microchip's LAN865X MAC-PHY") Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Link: https://patch.msgid.link/20250818060514.52795-2-parthiban.veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28gve: prevent ethtool ops after shutdownJordan Rhee1-0/+2
[ Upstream commit 75a9a46d67f46d608205888f9b34e315c1786345 ] A crash can occur if an ethtool operation is invoked after shutdown() is called. shutdown() is invoked during system shutdown to stop DMA operations without performing expensive deallocations. It is discouraged to unregister the netdev in this path, so the device may still be visible to userspace and kernel helpers. In gve, shutdown() tears down most internal data structures. If an ethtool operation is dispatched after shutdown(), it will dereference freed or NULL pointers, leading to a kernel panic. While graceful shutdown normally quiesces userspace before invoking the reboot syscall, forced shutdowns (as observed on GCP VMs) can still trigger this path. Fix by calling netif_device_detach() in shutdown(). This marks the device as detached so the ethtool ioctl handler will skip dispatching operations to the driver. Fixes: 974365e51861 ("gve: Implement suspend/resume/shutdown") Signed-off-by: Jordan Rhee <jordanrhee@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250818211245.1156919-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28net: ti: icssg-prueth: Fix HSR and switch offload Enablement during firwmare ↵MD Danish Anwar1-31/+41
reload. [ Upstream commit 01792bc3e5bdafa171dd83c7073f00e7de93a653 ] To enable HSR / Switch offload, certain configurations are needed. Currently they are done inside icssg_change_mode(). This function only gets called if we move from one mode to another without bringing the links up / down. Once in HSR / Switch mode, if we bring the links down and bring it back up again. The callback sequence is, - emac_ndo_stop() Firmwares are stopped - emac_ndo_open() Firmwares are loaded In this path icssg_change_mode() doesn't get called and as a result the configurations needed for HSR / Switch is not done. To fix this, put all these configurations in a separate function icssg_enable_fw_offload() and call this from both icssg_change_mode() and emac_ndo_open() Fixes: 56375086d093 ("net: ti: icssg-prueth: Enable HSR Tx duplication, Tx Tag and Rx Tag offload") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20250814105106.1491871-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>