summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
8 daysMerge branch '10GbE' of ↵Jakub Kicinski1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ixgbe: bypass devlink phys_port_name generation Jedrzej adds option to skip phys_port_name generation and opts ixgbe into it as some configurations rely on pre-devlink naming which could end up broken as a result. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: prevent from unwanted interface name changes devlink: let driver opt out of automatic phys_port_name generation ==================== Link: https://patch.msgid.link/20250812205226.1984369-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysbnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZEDavid Wei1-3/+9
The data page pool always fills the HW rx ring with pages. On arm64 with 64K pages, this will waste _at least_ 32K of memory per entry in the rx ring. Fix by fragmenting the pages if PAGE_SIZE > BNXT_RX_PAGE_SIZE. This makes the data page pool the same as the header pool. Tested with iperf3 with a small (64 entries) rx ring to encourage buffer circulation. Fixes: cd1fafe7da1f ("eth: bnxt: add support rx side device memory TCP") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250812182907.1540755-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysixgbe: prevent from unwanted interface name changesJedrzej Jagielski1-0/+1
Users of the ixgbe driver report that after adding devlink support by the commit a0285236ab93 ("ixgbe: add initial devlink support") their configs got broken due to unwanted changes of interface names. It's caused by automatic phys_port_name generation during devlink port initialization flow. To prevent from that set no_phys_port_name flag for ixgbe devlink ports. Reported-by: David Howells <dhowells@redhat.com> Closes: https://lore.kernel.org/netdev/3452224.1745518016@warthog.procyon.org.uk/ Reported-by: David Kaplan <David.Kaplan@amd.com> Closes: https://lore.kernel.org/netdev/LV3PR12MB92658474624CCF60220157199470A@LV3PR12MB9265.namprd12.prod.outlook.com/ Fixes: a0285236ab93 ("ixgbe: add initial devlink support") Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
10 daysnet: stmmac: thead: Get and enable APB clock on initializationYao Zi1-0/+14
It's necessary to adjust the MAC TX clock when the linkspeed changes, but it's noted such adjustment always fails on TH1520 SoC, and reading back from APB glue registers that control clock generation results in garbage, causing broken link. With some testing, it's found a clock must be ungated for access to APB glue registers. Without any consumer, the clock is automatically disabled during late kernel startup. Let's get and enable it if it's described in devicetree. For backward compatibility with older devicetrees, probing won't fail if the APB clock isn't found. In this case, we emit a warning since the link will break if the speed changes. Fixes: 33a1a01e3afa ("net: stmmac: Add glue layer for T-HEAD TH1520 SoC") Signed-off-by: Yao Zi <ziyao@disroot.org> Tested-by: Drew Fustini <fustini@kernel.org> Reviewed-by: Drew Fustini <fustini@kernel.org> Link: https://patch.msgid.link/20250808093655.48074-4-ziyao@disroot.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 daysnet: stmmac: dwc-qos: fix clk prepare/enable leak on probe failureRussell King (Oracle)1-11/+2
dwc_eth_dwmac_probe() gets bulk clocks, and then prepares and enables them. Unfortunately, if dwc_eth_dwmac_config_dt() or stmmac_dvr_probe() fail, we leave the clocks prepared and enabled. Fix this by using devm_clk_bulk_get_all_enabled() to combine the steps and provide devm based release of the prepare and enable state. This also fixes a similar leakin dwc_eth_dwmac_remove() which wasn't correctly retrieving the struct plat_stmmacenet_data. This becomes unnecessary. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: a045e40645df ("net: stmmac: refactor clock management in EQoS driver") Link: https://patch.msgid.link/E1ukM1X-0086qu-Td@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet: stmmac: rk: put the PHY clock on removeRussell King (Oracle)1-1/+5
The PHY clock (bsp_priv->clk_phy) is obtained using of_clk_get(), which doesn't take part in the devm release. Therefore, when a device is unbound, this clock needs to be explicitly put. Fix this. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: fecd4d7eef8b ("net: stmmac: dwmac-rk: Add integrated PHY support") Link: https://patch.msgid.link/E1ukM1S-0086qo-PC@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysnet: ti: icss-iep: Fix incorrect type for return value in extts_enable()Alok Tiwari1-1/+2
The variable ret in icss_iep_extts_enable() was incorrectly declared as u32, while the function returns int and may return negative error codes. This will cause sign extension issues and incorrect error propagation. Update ret to be int to fix error handling. This change corrects the declaration to avoid potential type mismatch. Fixes: c1e0230eeaab ("net: ti: icss-iep: Add IEP driver") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250805142323.1949406-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysnet: page_pool: allow enabling recycling late, fix false positive warningJakub Kicinski1-1/+8
Page pool can have pages "directly" (locklessly) recycled to it, if the NAPI that owns the page pool is scheduled to run on the same CPU. To make this safe we check that the NAPI is disabled while we destroy the page pool. In most cases NAPI and page pool lifetimes are tied together so this happens naturally. The queue API expects the following order of calls: -> mem_alloc alloc new pp -> stop napi_disable -> start napi_enable -> mem_free free old pp Here we allocate the page pool in ->mem_alloc and free in ->mem_free. But the NAPIs are only stopped between ->stop and ->start. We created page_pool_disable_direct_recycling() to safely shut down the recycling in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't have to worry about recycling any more. Unfortunately, the page_pool_disable_direct_recycling() is not enough to deal with failures which necessitate freeing the _new_ page pool. If we hit a failure in ->mem_alloc or ->stop the new page pool has to be freed while the NAPI is active (assuming driver attaches the page pool to an existing NAPI instance and doesn't reallocate NAPIs). Freeing the new page pool is technically safe because it hasn't been used for any packets, yet, so there can be no recycling. But the check in napi_assert_will_not_race() has no way of knowing that. We could check if page pool is empty but that'd make the check much less likely to trigger during development. Add page_pool_enable_direct_recycling(), pairing with page_pool_disable_direct_recycling(). It will allow us to create the new page pools in "disabled" state and only enable recycling when we know the reconfig operation will not fail. Coincidentally it will also let us re-enable the recycling for the old pool, if the reconfig failed: -> mem_alloc (new) -> stop (old) # disables direct recycling for old -> start (new) # fail!! -> start (old) # go back to old pp but direct recycling is lost :( -> mem_free (new) The new helper is idempotent to make the life easier for drivers, which can operate in HDS mode and support zero-copy Rx. The driver can call the helper twice whether there are two pools or it has multiple references to a single pool. Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue") Tested-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250805003654.2944974-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysnet: ti: icssg-prueth: Fix emac link speed handlingMD Danish Anwar1-0/+6
When link settings are changed emac->speed is populated by emac_adjust_link(). The link speed and other settings are then written into the DRAM. However if both ports are brought down after this and brought up again or if the operating mode is changed and a firmware reload is needed, the DRAM is cleared by icssg_config(). As a result the link settings are lost. Fix this by calling emac_adjust_link() after icssg_config(). This re populates the settings in the DRAM after a new firmware load. Fixes: 9facce84f406 ("net: ti: icssg-prueth: Fix firmware load sequence.") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Message-ID: <20250805173812.2183161-1-danishanwar@ti.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysnet: hibmcge: fix the np_link_fail error reporting issueJijie Shao1-2/+13
Currently, after modifying device port mode, the np_link_ok state is immediately checked. At this point, the device may not yet ready, leading to the querying of an intermediate state. This patch will poll to check if np_link is ok after modifying device port mode, and only report np_link_fail upon timeout. Fixes: e0306637e85d ("net: hibmcge: Add support for mac link exception handling feature") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysnet: hibmcge: fix the division by zero issueJijie Shao1-1/+6
When the network port is down, the queue is released, and ring->len is 0. In debugfs, hbg_get_queue_used_num() will be called, which may lead to a division by zero issue. This patch adds a check, if ring->len is 0, hbg_get_queue_used_num() directly returns 0. Fixes: 40735e7543f9 ("net: hibmcge: Implement .ndo_start_xmit function") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysnet: hibmcge: fix rtnl deadlock issueJijie Shao1-9/+5
Currently, the hibmcge netdev acquires the rtnl_lock in pci_error_handlers.reset_prepare() and releases it in pci_error_handlers.reset_done(). However, in the PCI framework: pci_reset_bus - __pci_reset_slot - pci_slot_save_and_disable_locked - pci_dev_save_and_disable - err_handler->reset_prepare(dev); In pci_slot_save_and_disable_locked(): list_for_each_entry(dev, &slot->bus->devices, bus_list) { if (!dev->slot || dev->slot!= slot) continue; pci_dev_save_and_disable(dev); if (dev->subordinate) pci_bus_save_and_disable_locked(dev->subordinate); } This will iterate through all devices under the current bus and execute err_handler->reset_prepare(), causing two devices of the hibmcge driver to sequentially request the rtnl_lock, leading to a deadlock. Since the driver now executes netif_device_detach() before the reset process, it will not concurrently with other netdev APIs, so there is no need to hold the rtnl_lock now. Therefore, this patch removes the rtnl_lock during the reset process and adjusts the position of HBG_NIC_STATE_RESETTING to ensure that multiple resets are not executed concurrently. Fixes: 3f5a61f6d504f ("net: hibmcge: Add reset supported in this module") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-08Merge tag 'net-6.17-rc1' of ↵Linus Torvalds15-39/+86
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: Previous releases - regressions: - netlink: avoid infinite retry looping in netlink_unicast() Previous releases - always broken: - packet: fix a race in packet_set_ring() and packet_notifier() - ipv6: reject malicious packets in ipv6_gso_segment() - sched: mqprio: fix stack out-of-bounds write in tc entry parsing - net: drop UFO packets (injected via virtio) in udp_rcv_segment() - eth: mlx5: correctly set gso_segs when LRO is used, avoid false positive checksum validation errors - netpoll: prevent hanging NAPI when netcons gets enabled - phy: mscc: fix parsing of unicast frames for PTP timestamping - a number of device tree / OF reference leak fixes" * tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits) pptp: fix pptp_xmit() error path net: ti: icssg-prueth: Fix skb handling for XDP_PASS net: Update threaded state in napi config in netif_set_threaded selftests: netdevsim: Xfail nexthop test on slow machines eth: fbnic: Lock the tx_dropped update eth: fbnic: Fix tx_dropped reporting eth: fbnic: remove the debugging trick of super high page bias net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect dt-bindings: net: Replace bouncing Alexandru Tachici emails dpll: zl3073x: ZL3073X_I2C and ZL3073X_SPI should depend on NET net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing Revert "net: mdio_bus: Use devm for getting reset GPIO" selftests: net: packetdrill: xfail all problems on slow machines net/packet: fix a race in packet_set_ring() and packet_notifier() benet: fix BUG when creating VFs net: airoha: npu: Add missing MODULE_FIRMWARE macros net: devmem: fix DMA direction on unmapping ipa: fix compile-testing with qcom-mdt=m eth: fbnic: unlink NAPIs from queues on error to open net: Add locking to protect skb->dev access in ip_output ...
2025-08-06net: ti: icssg-prueth: Fix skb handling for XDP_PASSMeghana Malladi1-7/+8
emac_rx_packet() is a common function for handling traffic for both xdp and non-xdp use cases. Use common logic for handling skb with or without xdp to prevent any incorrect packet processing. This patch fixes ping working with XDP_PASS for icssg driver. Fixes: 62aa3246f4623 ("net: ti: icssg-prueth: Add XDP support") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250803180216.3569139-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-06eth: fbnic: Lock the tx_dropped updateMohsin Bashir1-0/+2
Wrap copying of drop stats on TX path from fbd->hw_stats by the hw_stats_lock. Currently, it is being performed outside the lock and another thread accessing fbd->hw_stats can lead to inconsistencies. Fixes: 5f8bd2ce8269 ("eth: fbnic: add support for TMI stats") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250802024636.679317-3-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-06eth: fbnic: Fix tx_dropped reportingMohsin Bashir1-4/+4
Correctly copy the tx_dropped stats from the fbd->hw_stats to the rtnl_link_stats64 struct. Fixes: 5f8bd2ce8269 ("eth: fbnic: add support for TMI stats") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250802024636.679317-2-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-06eth: fbnic: remove the debugging trick of super high page biasJakub Kicinski2-6/+4
Alex added page bias of LONG_MAX, which is admittedly quite a clever way of catching overflows of the pp ref count. The page pool code was "optimized" to leave the ref at 1 for freed pages so it can't catch basic bugs by itself any more. (Something we should probably address under DEBUG_NET...) Unfortunately for fbnic since commit f7dc3248dcfb ("skbuff: Optimization of SKB coalescing for page pool") core _may_ actually take two extra pp refcounts, if one of them is returned before driver gives up the bias the ret < 0 check in page_pool_unref_netmem() will trigger. While at it add a FBNIC_ to the name of the driver constant. Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Link: https://patch.msgid.link/20250801170754.2439577-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-06net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnectHeiner Kallweit1-3/+4
After the call to phy_disconnect() netdev->phydev is reset to NULL. So fixed_phy_unregister() would be called with a NULL pointer as argument. Therefore cache the phy_device before this call. Fixes: e24a6c874601 ("net: ftgmac100: Get link speed and duplex for NC-SI") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Link: https://patch.msgid.link/2b80a77a-06db-4dd7-85dc-3a8e0de55a1d@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-05benet: fix BUG when creating VFsMichal Schmidt1-1/+1
benet crashes as soon as SRIOV VFs are created: kernel BUG at mm/vmalloc.c:3457! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 4 UID: 0 PID: 7408 Comm: test.sh Kdump: loaded Not tainted 6.16.0+ #1 PREEMPT(voluntary) [...] RIP: 0010:vunmap+0x5f/0x70 [...] Call Trace: <TASK> __iommu_dma_free+0xe8/0x1c0 be_cmd_set_mac_list+0x3fe/0x640 [be2net] be_cmd_set_mac+0xaf/0x110 [be2net] be_vf_eth_addr_config+0x19f/0x330 [be2net] be_vf_setup+0x4f7/0x990 [be2net] be_pci_sriov_configure+0x3a1/0x470 [be2net] sriov_numvfs_store+0x20b/0x380 kernfs_fop_write_iter+0x354/0x530 vfs_write+0x9b9/0xf60 ksys_write+0xf3/0x1d0 do_syscall_64+0x8c/0x3d0 be_cmd_set_mac_list() calls dma_free_coherent() under a spin_lock_bh. Fix it by freeing only after the lock has been released. Fixes: 1a82d19ca2d6 ("be2net: fix sleeping while atomic bugs in be_ndo_bridge_getlink") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250801101338.72502-1-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-05net: airoha: npu: Add missing MODULE_FIRMWARE macrosLorenzo Bianconi1-0/+2
Introduce missing MODULE_FIRMWARE definitions for firmware autoload. Fixes: 23290c7bc190d ("net: airoha: Introduce Airoha NPU support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250801-airoha-npu-missing-module-firmware-v2-1-e860c824d515@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-05eth: fbnic: unlink NAPIs from queues on error to openJakub Kicinski1-1/+3
CI hit a UaF in fbnic in the AF_XDP portion of the queues.py test. The UaF is in the __sk_mark_napi_id_once() call in xsk_bind(), NAPI has been freed. Looks like the device failed to open earlier, and we lack clearing the NAPI pointer from the queue. Fixes: 557d02238e05 ("eth: fbnic: centralize the queue count and NAPI<>queue setting") Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250728163129.117360-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-04Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer||notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...
2025-08-02net: mvneta: fix typo "notifer"WangYuli1-1/+1
There is a spelling mistake of 'notifer' in the comment which should be 'notifier'. Link: https://lkml.kernel.org/r/0CB4300CB6F49007+20250722073431.21983-4-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02net/mlx5: Correctly set gso_segs when LRO is usedChristoph Paasch1-0/+1
When gso_segs is left at 0, a number of assumptions will end up being incorrect throughout the stack. For example, in the GRO-path, we set NAPI_GRO_CB()->count to gso_segs. So, if a non-LRO'ed packet followed by an LRO'ed packet is being processed in GRO, the first one will have NAPI_GRO_CB()->count set to 1 and the next one to 0 (in dev_gro_receive()). Since commit 531d0d32de3e ("net/mlx5: Correctly set gso_size when LRO is used") these packets will get merged (as their gso_size now matches). So, we end up in gro_complete() with NAPI_GRO_CB()->count == 1 and thus don't call inet_gro_complete(). Meaning, checksum-validation in tcp_checksum_complete() will fail with a "hw csum failure". Even before the above mentioned commit, incorrect gso_segs means that other things like TCP's accounting of incoming packets (tp->segs_in, data_segs_in, rcv_ooopack) will be incorrect. Which means that if one does bytes_received/data_segs_in, the result will be bigger than the MTU. Fix this by initializing gso_segs correctly when LRO is used. Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/netdev/6583783f-f0fb-4fb1-a415-feec8155bc69@nvidia.com/ Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250729-mlx5_gso_segs-v1-1-b48c480c1c12@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-02sfc: unfix not-a-typo in commentEdward Cree1-1/+1
Commit fe09560f8241 ("net: Fix typos") removed duplicated word 'fallback', but this was not a typo and change altered the semantic meaning of the comment. Partially revert, using the phrase 'fallback of the fallback' to make the meaning more clear to future readers so that they won't try to change it again. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250731144138.2637949-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-01net: airoha: Fix PPE table access in airoha_ppe_debugfs_foe_show()Lorenzo Bianconi1-6/+20
In order to avoid any possible race we need to hold the ppe_lock spinlock accessing the hw PPE table. airoha_ppe_foe_get_entry routine is always executed holding ppe_lock except in airoha_ppe_debugfs_foe_show routine. Fix the problem introducing airoha_ppe_foe_get_entry_locked routine. Fixes: 3fe15c640f380 ("net: airoha: Introduce PPE debugfs support") Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250731-airoha_ppe_foe_get_entry_locked-v2-1-50efbd8c0fd6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds5-5/+180
Pull rdma updates from Jason Gunthorpe: - Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1, rxe, erdma, mana_ib - Prefetch supprot for rxe ODP - Remove memory window support from hns as new device FW is no longer support it - Remove qib, it is very old and obsolete now, Cornelis wishes to restructure the hfi1/qib shared layer - Fix a race in destroying CQs where we can still end up with work running because the work is cancled before the driver stops triggering it - Improve interaction with namespaces: * Follow the devlink namespace for newly spawned RDMA devices * Create iopoib net devces in the parent IB device's namespace * Allow CAP_NET_RAW checks to pass in user namespaces - A new flow control scheme for IB MADs to try and avoid queue overflows in the network - Fix 2G message sizes in bnxt_re - Optimize mkey layout for mlx5 DMABUF - New "DMA Handle" concept to allow controlling PCI TPH and steering tags * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/siw: Change maintainer email address RDMA/mana_ib: add support of multiple ports RDMA/mlx5: Refactor optional counters steering code RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr IB: Extend UVERBS_METHOD_REG_MR to get DMAH RDMA/mlx5: Add DMAH object support RDMA/core: Introduce a DMAH object and its alloc/free APIs IB/core: Add UVERBS_METHOD_REG_MR on the MR object net/mlx5: Add support for device steering tag net/mlx5: Expose IFC bits for TPH PCI/TPH: Expose pcie_tph_get_st_table_size() RDMA/mlx5: Fix incorrect MKEY masking RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() RDMA/mlx5: remove redundant check on err on return expression RDMA/mana_ib: add additional port counters RDMA/mana_ib: Fix DSCP value in modify QP RDMA/efa: Add CQ with external memory support RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers RDMA/uverbs: Add a common way to create CQ with umem RDMA/mlx5: Optimize DMABUF mkey page size ...
2025-07-31net: ti: icss-iep: fix device and OF node leaks at probeJohan Hovold1-5/+18
Make sure to drop the references to the IEP OF node and device taken by of_parse_phandle() and of_find_device_by_node() when looking up IEP devices during probe. Drop the bogus additional reference taken on successful lookup so that the device is released correctly by icss_iep_put(). Fixes: c1e0230eeaab ("net: ti: icss-iep: Add IEP driver") Cc: stable@vger.kernel.org # 6.6 Cc: Roger Quadros <rogerq@kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250725171213.880-6-johan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-31net: mtk_eth_soc: fix device leak at probeJohan Hovold1-1/+0
The reference count to the WED devices has already been incremented when looking them up using of_find_device_by_node() so drop the bogus additional reference taken during probe. Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Cc: stable@vger.kernel.org # 5.19 Cc: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250725171213.880-5-johan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-31net: gianfar: fix device leak when querying time stamp infoJohan Hovold1-1/+3
Make sure to drop the reference to the ptp device taken by of_find_device_by_node() when querying the time stamping capabilities. Note that holding a reference to the ptp device does not prevent its driver data from going away. Fixes: 7349a74ea75c ("net: ethernet: gianfar_ethtool: get phc index through drvdata") Cc: stable@vger.kernel.org # 4.18 Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250725171213.880-4-johan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-31net: enetc: fix device and OF node leak at probeJohan Hovold1-2/+12
Make sure to drop the references to the IERB OF node and platform device taken by of_parse_phandle() and of_find_device_by_node() during probe. Fixes: e7d48e5fbf30 ("net: enetc: add a mini driver for the Integrated Endpoint Register Block") Cc: stable@vger.kernel.org # 5.13 Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250725171213.880-3-johan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-31net: dpaa: fix device leak when querying time stamp infoJohan Hovold1-1/+3
Make sure to drop the reference to the ptp device taken by of_find_device_by_node() when querying the time stamping capabilities. Note that holding a reference to the ptp device does not prevent its driver data from going away. Fixes: 17ae0b0ee9db ("dpaa_eth: add the get_ts_info interface for ethtool") Cc: stable@vger.kernel.org # 4.19 Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250725171213.880-2-johan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-30Merge tag 'net-next-6.17' of ↵Linus Torvalds560-21398/+27521
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
2025-07-30Merge tag 'timers-cleanups-2025-07-27' of ↵Linus Torvalds16-18/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide cleanup of struct cycle_counter const annotations. The initial idea of making them const was correct as they were seperate instances. When they got embedded into larger data structures, which are even modified by the callback this got moot. The only reason why this went unnoticed is that the required container_of() casts the const attribute forcefully away. Stop pretending that it is const" * tag 'timers-cleanups-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time/timecounter: Fix the lie that struct cyclecounter is const
2025-07-29Merge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds3-27/+32
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
2025-07-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski7-4/+42
Merge in late fixes to prepare for the 6.17 net-next PR. Conflicts: net/core/neighbour.c 1bbb76a89948 ("neighbour: Fix null-ptr-deref in neigh_flush_dev().") 13a936bb99fb ("neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.") 03dc03fa0432 ("neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries") Adjacent changes: drivers/net/usb/usbnet.c 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event") 2c04d279e857 ("net: usb: Convert tasklet API to new bottom half workqueue mechanism") net/ipv6/route.c 31d7d67ba127 ("ipv6: annotate data-races around rt->fib6_nsiblings") 1caf27297215 ("ipv6: adopt dst_dev() helper") 3b3ccf9ed05e ("net: Remove unnecessary NULL check for lwtunnel_fill_encap()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26net: fsl_pq_mdio: use dev_err_probeAlexander Stein1-2/+2
Silence deferred probes using dev_err_probe(). This can happen when the ethernet PHY uses an IRQ line attached to a i2c GPIO expander. If the i2c bus is not yet ready, a probe deferral can occur. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250725055615.259945-1-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26igb: xsk: solve negative overflow of nb_pkts in zerocopy modeJason Xing1-2/+1
There is no break time in the while() loop, so every time at the end of igb_xmit_zc(), negative overflow of nb_pkts will occur, which renders the return value always false. But theoretically, the result should be set after calling xsk_tx_peek_release_desc_batch(). We can take i40e_xmit_zc() as a good example. Returning false means we're not done with transmission and we need one more poll, which is exactly what igb_xmit_zc() always did before this patch. After this patch, the return value depends on the nb_pkts value. Two cases might happen then: 1. if (nb_pkts < budget), it means we process all the possible data, so return true and no more necessary poll will be triggered because of this. 2. if (nb_pkts == budget), it means we might have more data, so return false to let another poll run again. Fixes: f8e284a02afc ("igb: Add AF_XDP zero-copy Tx support") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250723142327.85187-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26stmmac: xsk: fix negative overflow of budget in zerocopy modeJason Xing1-1/+1
A negative overflow can happen when the budget number of descs are consumed. as long as the budget is decreased to zero, it will again go into while (budget-- > 0) statement and get decreased by one, so the overflow issue can happen. It will lead to returning true whereas the expected value should be false. In this case where all the budget is used up, it means zc function should return false to let the poll run again because normally we might have more data to process. Without this patch, zc function would return true instead. Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250723142327.85187-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26net: stmmac: dwmac-socfpga: Add xgmac support for Agilex5Mun Yew Tham1-0/+1
Add support for Agilex5 compatible value. Signed-off-by: Mun Yew Tham <mun.yew.tham@altera.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/20250724154052.205706-5-matthew.gerlach@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26Merge branch '100GbE' of ↵Jakub Kicinski58-2142/+1267
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== libie: commonize adminq structure Michal Swiatkowski says: It is a prework to allow reusing some specific Intel code (eq. fwlog). Move common *_aq_desc structure to libie header and changing it in ice, ixgbe, i40e and iavf. Only generic adminq commands can be easily moved to common header, as rest is slightly different. Format remains the same. It will be better to correctly move it when it will be needed to commonize other part of the code. Move *_aq_str() to new libie module (libie_adminq) and use it across drivers. The functions are exactly the same in each driver. Some more adminq helpers/functions can be moved to libie_adminq when needed. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: use libie_aq_str iavf: use libie_aq_str ice: use libie_aq_str libie: add adminq helper for converting err to str iavf: use libie adminq descriptors i40e: use libie adminq descriptors ixgbe: use libie adminq descriptors ice, libie: move generic adminq descriptors to lib ==================== Link: https://patch.msgid.link/20250724182826.3758850-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26net/mlx5e: Expose TIS via devlink tx reporter diagnoseFeng Liu1-0/+25
Underneath "TIS Config" tag expose TIS diagnostic information. Expose the tisn of each TC under each lag port. $ sudo devlink health diagnose auxiliary/mlx5_core.eth.2/131072 reporter tx ...... TIS Config: lag port: 0 tc: 0 tisn: 0 lag port: 1 tc: 0 tisn: 8 ...... Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1753194228-333722-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26net/mlx5e: Support routed networks during IPsec MACs initializationAlexandre Cassen1-2/+80
Remote IPsec tunnel endpoint may refer to a network segment that is not directly connected to the host. In such a case, IPsec tunnel endpoints are connected to a router and reachable via a routing path. In IPsec packet offload mode, HW is initialized with the MAC address of both IPsec tunnel endpoints. Extend the current IPsec init MACs procedure to resolve nexthop for routed networks. Direct neighbour lookup and probe is still used for directly connected networks and as a fallback mechanism if fib lookup fails. Signed-off-by: Alexandre Cassen <acassen@corp.free.fr> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1753194228-333722-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25octeontx2-af: use unsigned int as iterator for unsigned valuesSimon Horman1-3/+3
The local variable i is used to iterate over unsigned values. The lower bound of the loop is set to 0. While the upper bound is cgx->lmac_count, where they lmac_count is an u8. So the theoretical upper bound is 255. As is, GCC can't see this range of values and warns that a formatted string, which includes the %d representation of i, may overflow the buffer provided. GCC 15.1.0 says: .../cgx.c: In function 'cgx_lmac_init': .../cgx.c:1737:49: warning: '%d' directive writing between 1 and 11 bytes into a region of size between 4 and 6 [-Wformat-overflow=] 1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i); | ^~ .../cgx.c:1737:37: note: directive argument in the range [-2147483641, 254] 1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i); | ^~~~~~~~~~~~~~~ .../cgx.c:1737:17: note: 'sprintf' output between 12 and 24 bytes into a destination of size 16 1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Empirically, changing the type of i from (signed) int to unsigned int addresses this problem. I assume by allowing GCC to see the range of values described above. Also update the format specifiers for the integer values in the string in question from %d to %u. This seems appropriate as they are now both unsigned. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250724-octeontx2-af-unsigned-v1-1-c745c106e06f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25net: hibmcge: support for statistics of reset failuresJijie Shao4-0/+5
Add a statistical item to count the number of reset failures. This statistical item can be queried using ethtool -S or reported through diagnose information. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250723074826.2756135-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25net/mlx5e: Fix potential deadlock by deferring RX timeout recoveryShahar Shitrit3-1/+33
mlx5e_reporter_rx_timeout() is currently invoked synchronously in the driver's open error flow. This causes the thread holding priv->state_lock to attempt acquiring the devlink lock, which can result in a circular dependency with other devlink operations. For example: - Devlink health diagnose flow: - __devlink_nl_pre_doit() acquires the devlink lock. - devlink_nl_health_reporter_diagnose_doit() invokes the driver's diagnose callback. - mlx5e_rx_reporter_diagnose() then attempts to acquire priv->state_lock. - Driver open flow: - mlx5e_open() acquires priv->state_lock. - If an error occurs, devlink_health_reporter may be called, attempting to acquire the devlink lock. To prevent this circular locking scenario, defer the RX timeout recovery by scheduling it via a workqueue. This ensures that the recovery work acquires locks in a consistent order: first the devlink lock, then priv->state_lock. Additionally, make the recovery work acquire the netdev instance lock to safely synchronize with the open/close channel flows, similar to mlx5e_tx_timeout_work. Repeatedly attempt to acquire the netdev instance lock until it is taken or the target RQ is no longer active, as indicated by the MLX5E_STATE_CHANNELS_ACTIVE bit. Fixes: 32c57fb26863 ("net/mlx5e: Report and recover from rx timeout") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25net/mlx5e: Remove skb secpath if xfrm state is not foundJianbo Liu1-0/+4
Hardware returns a unique identifier for a decrypted packet's xfrm state, this state is looked up in an xarray. However, the state might have been freed by the time of this lookup. Currently, if the state is not found, only a counter is incremented. The secpath (sp) extension on the skb is not removed, resulting in sp->len becoming 0. Subsequently, functions like __xfrm_policy_check() attempt to access fields such as xfrm_input_state(skb)->xso.type (which dereferences sp->xvec[sp->len - 1]) without first validating sp->len. This leads to a crash when dereferencing an invalid state pointer. This patch prevents the crash by explicitly removing the secpath extension from the skb if the xfrm state is not found after hardware decryption. This ensures downstream functions do not operate on a zero-length secpath. BUG: unable to handle page fault for address: ffffffff000002c8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 282e067 P4D 282e067 PUD 0 Oops: Oops: 0000 [#1] SMP CPU: 12 UID: 0 PID: 0 Comm: swapper/12 Not tainted 6.15.0-rc7_for_upstream_min_debug_2025_05_27_22_44 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:__xfrm_policy_check+0x61a/0xa30 Code: b6 77 7f 83 e6 02 74 14 4d 8b af d8 00 00 00 41 0f b6 45 05 c1 e0 03 48 98 49 01 c5 41 8b 45 00 83 e8 01 48 98 49 8b 44 c5 10 <0f> b6 80 c8 02 00 00 83 e0 0c 3c 04 0f 84 0c 02 00 00 31 ff 80 fa RSP: 0018:ffff88885fb04918 EFLAGS: 00010297 RAX: ffffffff00000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffffffff8311af80 R08: 0000000000000020 R09: 00000000c2eda353 R10: ffff88812be2bbc8 R11: 000000001faab533 R12: ffff88885fb049c8 R13: ffff88812be2bbc8 R14: 0000000000000000 R15: ffff88811896ae00 FS: 0000000000000000(0000) GS:ffff8888dca82000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff000002c8 CR3: 0000000243050002 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? try_to_wake_up+0x108/0x4c0 ? udp4_lib_lookup2+0xbe/0x150 ? udp_lib_lport_inuse+0x100/0x100 ? __udp4_lib_lookup+0x2b0/0x410 __xfrm_policy_check2.constprop.0+0x11e/0x130 udp_queue_rcv_one_skb+0x1d/0x530 udp_unicast_rcv_skb+0x76/0x90 __udp4_lib_rcv+0xa64/0xe90 ip_protocol_deliver_rcu+0x20/0x130 ip_local_deliver_finish+0x75/0xa0 ip_local_deliver+0xc1/0xd0 ? ip_protocol_deliver_rcu+0x130/0x130 ip_sublist_rcv+0x1f9/0x240 ? ip_rcv_finish_core+0x430/0x430 ip_list_rcv+0xfc/0x130 __netif_receive_skb_list_core+0x181/0x1e0 netif_receive_skb_list_internal+0x200/0x360 ? mlx5e_build_rx_skb+0x1bc/0xda0 [mlx5_core] gro_receive_skb+0xfd/0x210 mlx5e_handle_rx_cqe_mpwrq+0x141/0x280 [mlx5_core] mlx5e_poll_rx_cq+0xcc/0x8e0 [mlx5_core] ? mlx5e_handle_rx_dim+0x91/0xd0 [mlx5_core] mlx5e_napi_poll+0x114/0xab0 [mlx5_core] __napi_poll+0x25/0x170 net_rx_action+0x32d/0x3a0 ? mlx5_eq_comp_int+0x8d/0x280 [mlx5_core] ? notifier_call_chain+0x33/0xa0 handle_softirqs+0xda/0x250 irq_exit_rcu+0x6d/0xc0 common_interrupt+0x81/0xa0 </IRQ> Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25net/mlx5e: Clear Read-Only port buffer size in PBMC before updateAlexei Lazar1-0/+3
When updating the PBMC register, we read its current value, modify desired fields, then write it back. The port_buffer_size field within PBMC is Read-Only (RO). If this RO field contains a non-zero value when read, attempting to write it back will cause the entire PBMC register update to fail. This commit ensures port_buffer_size is explicitly cleared to zero after reading the PBMC register but before writing back the modified value. This allows updates to other fields in the PBMC register to succeed. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25net: Fix typosBjorn Helgaas58-77/+77
Fix typos in comments and error messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25net/mlx5: Fix build -Wframe-larger-than warningsZhu Yanjun3-23/+58
When building, the following warnings will appear. " pci_irq.c: In function ‘mlx5_ctrl_irq_request’: pci_irq.c:494:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] pci_irq.c: In function ‘mlx5_irq_request_vector’: pci_irq.c:561:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] eq.c: In function ‘comp_irq_request_sf’: eq.c:897:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=] irq_affinity.c: In function ‘irq_pool_request_irq’: irq_affinity.c:74:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=] " These warnings indicate that the stack frame size exceeds 1024 bytes in these functions. To resolve this, instead of allocating large memory buffers on the stack, it is better to use kvzalloc to allocate memory dynamically on the heap. This approach reduces stack usage and eliminates these frame size warnings. Acked-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250722212023.244296-1-yanjun.zhu@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>