kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2025-06-18	net/mlx5e: SHAMPO: Improve hw gro capability checking	Saeed Mahameed	1	-6/+7
	Add missing HW capabilities, declare the feature in netdev->vlan_features, similar to other features in mlx5e_build_nic_netdev. No functional change here as all by default disabled features are explicitly disabled at the bottom of the function. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net/mlx5e: SHAMPO: Remove redundant params	Saeed Mahameed	3	-24/+20
	Two SHAMPO params are static and always the same, remove them from the global mlx5e_params struct. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc	Saeed Mahameed	2	-70/+66
	Drop redundant SHAMPO structure alloc/free functions. Gather together function calls pertaining to header split info, pass header per WQE (hd_per_wqe) as parameter to those function to avoid use before initialization future mistakes. Allocate HW GRO related info outside of the header related info scope. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	gve: Return error for unknown admin queue command	Alok Tiwari	1	-0/+1
	In gve_adminq_issue_cmd(), return -EINVAL instead of 0 when an unknown admin queue command opcode is encountered. This prevents the function from silently succeeding on invalid input and prevents undefined behavior by ensuring the function fails gracefully when an unrecognized opcode is provided. These changes improve error handling. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20250616054504.1644770-2-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	gve: Fix various typos and improve code comments	Alok Tiwari	5	-6/+6
	- Correct spelling and improves the clarity of comments "confiugration" -> "configuration" "spilt" -> "split" "It if is 0" -> "If it is 0" "DQ" -> "DQO" (correct abbreviation) - Clarify BIT(0) flag usage in gve_get_priv_flags() - Replaced hardcoded array size with GVE_NUM_PTYPES for clarity and maintainability. These changes are purely cosmetic and do not affect functionality. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250616054504.1644770-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: stmmac: visconti: make phy_intf_sel local	Russell King (Oracle)	1	-12/+11
	There is little need to have phy_intf_sel as a member of struct visconti_eth when we have the PHY interface mode available from phylink in visconti_eth_set_clk_tx_rate(). Without multiple interface support, phylink is fixed to supporting only plat->phy_interface, so we can be sure that "interface" passed into this function is the same as plat->phy_interface. Make phy_intf_sel local to visconti_eth_init_hw() and clean up. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uRH2G-004UyY-GD@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: stmmac: visconti: clean up code formatting	Russell King (Oracle)	1	-15/+20
	Ensure that code is wrapped prior to column 80, and shorten the needlessly long "clk_sel_val" to just "clk_sel". Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uRH2B-004UyS-Ch@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: stmmac: visconti: reorganise visconti_eth_set_clk_tx_rate()	Russell King (Oracle)	1	-22/+29
	Rather than testing dwmac->phy_intf_sel several times for the same values in this function, group the code together. The only part which was common was stopping the internal clock before programming the clock setting. This further improves the readability of this function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uRH26-004UyM-9G@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: stmmac: visconti: re-arrange speed decode	Russell King (Oracle)	1	-18/+26
	Re-arrange the speed decode in visconti_eth_set_clk_tx_rate() to be more readable by first checking to see if we're using RGMII or RMII and then decoding the speed, rather than decoding the speed and then testing the interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uRH21-004UyG-50@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	rtase: Link queues to NAPI instances	Justin Lai	2	-2/+18
	Link queues to NAPI instances with netif_queue_set_napi. This information can be queried with the netdev-genl API. Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250616032226.7318-3-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	rtase: Link IRQs to NAPI instances	Justin Lai	1	-6/+14
	Link IRQs to NAPI instances with netif_napi_set_irq. This information can be queried with the netdev-genl API. Also add support for persistent NAPI configuration using netif_napi_add_config(). Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250616032226.7318-2-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: bcmgenet: update PHY power down	Doug Berger	1	-1/+6
	The disable sequence in bcmgenet_phy_power_set() is updated to match the inverse sequence and timing (and spacing) of the enable sequence. This ensures that LEDs driven by the GENET IP are disabled when the GPHY is powered down. Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250614025817.3808354-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	bnxt_en: Improve comment wording and error return code	Alok Tiwari	3	-5/+5
	Improved wording and grammar in several comments for clarity. "the must belongs" -> "it must belong" "mininum" -> "minimum" "fileds" -> "fields" Replaced return -1 with -EINVAL in hwrm_ring_alloc_send_msg() to return a proper error code. These changes enhance code readability and consistent error handling. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250615154051.1365631-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: liquidio: Remove unused validate_cn23xx_pf_config_info()	Dr. David Alan Gilbert	2	-42/+0
	[Note, I'm wondering if actually this is a case of a missing call; the other similar function is called in __verify_octeon_config_info(), but I don't have or know the hardware.] validate_cn23xx_pf_config_info() was added in 2016 by commit 72c0091293c0 ("liquidio: CN23XX device init and sriov config") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250614234941.61769-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	bnxt_en: Update MRU and RSS table of RSS contexts on queue reset	Pavan Chebbi	1	-5/+51
	The commit under the Fixes tag below which updates the VNICs' RSS and MRU during .ndo_queue_start(), needs to be extended to cover any non-default RSS contexts which have their own VNICs. Without this step, packets that are destined to a non-default RSS context may be dropped after .ndo_queue_start(). We further optimize this scheme by updating the VNIC only if the RX ring being restarted is in the RSS table of the VNIC. Updating the VNIC (in particular setting the MRU to 0) will momentarily stop all traffic to all rings in the RSS table. Any VNIC that has the RX ring excluded from the RSS table can skip this step and avoid the traffic disruption. Note that this scheme is just an improvement. A VNIC with multiple rings in the RSS table will still see traffic disruptions to all rings in the RSS table when one of the rings is being restarted. We are working on a FW scheme that will improve upon this further. Fixes: 5ac066b7b062 ("bnxt_en: Fix queue start to update vnic RSS table") Reported-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	bnxt_en: Add a helper function to configure MRU and RSS	Pavan Chebbi	1	-11/+26
	Add a new helper function that will configure MRU and RSS table of a VNIC. This will be useful when we configure both on a VNIC when resetting an RX ring. This function will be used again in the next bug fix patch where we have to reconfigure VNICs for RSS contexts. Suggested-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	bnxt_en: Fix double invocation of bnxt_ulp_stop()/bnxt_ulp_start()	Kalesh AP	1	-14/+10
	Before the commit under the Fixes tag below, bnxt_ulp_stop() and bnxt_ulp_start() were always invoked in pairs. After that commit, the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop() has been called. This may result in the RoCE driver's aux driver .suspend() method being invoked twice. The 2nd bnxt_re_suspend() call will crash when it dereferences a NULL pointer: (NULL ib_device): Handle device suspend call BUG: kernel NULL pointer dereference, address: 0000000000000b78 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 20 UID: 0 PID: 181 Comm: kworker/u96:5 Tainted: G S 6.15.0-rc1 #4 PREEMPT(voluntary) Tainted: [S]=CPU_OUT_OF_SPEC Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 Workqueue: bnxt_pf_wq bnxt_sp_task [bnxt_en] RIP: 0010:bnxt_re_suspend+0x45/0x1f0 [bnxt_re] Code: 8b 05 a7 3c 5b f5 48 89 44 24 18 31 c0 49 8b 5c 24 08 4d 8b 2c 24 e8 ea 06 0a f4 48 c7 c6 04 60 52 c0 48 89 df e8 1b ce f9 ff <48> 8b 83 78 0b 00 00 48 8b 80 38 03 00 00 a8 40 0f 85 b5 00 00 00 RSP: 0018:ffffa2e84084fd88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffffb4b6b934 RDI: 00000000ffffffff RBP: ffffa1760954c9c0 R08: 0000000000000000 R09: c0000000ffffdfff R10: 0000000000000001 R11: ffffa2e84084fb50 R12: ffffa176031ef070 R13: ffffa17609775000 R14: ffffa17603adc180 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffa17daa397000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000b78 CR3: 00000004aaa30003 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> bnxt_ulp_stop+0x69/0x90 [bnxt_en] bnxt_sp_task+0x678/0x920 [bnxt_en] ? __schedule+0x514/0xf50 process_scheduled_works+0x9d/0x400 worker_thread+0x11c/0x260 ? __pfx_worker_thread+0x10/0x10 kthread+0xfe/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2b/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag is already set. This will preserve the original symmetrical bnxt_ulp_stop() and bnxt_ulp_start(). Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED flag after taking the mutex to avoid any race condition. And for symmetry, only proceed in bnxt_ulp_start() if the BNXT_EN_FLAG_ULP_STOPPED is set. Fixes: 3c163f35bd50 ("bnxt_en: Optimize recovery path ULP locking in the driver") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Co-developed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: ti: icssg-prueth: Fix packet handling for XDP_TX	Meghana Malladi	1	-17/+2
	While transmitting XDP frames for XDP_TX, page_pool is used to get the DMA buffers (already mapped to the pages) and need to be freed/reycled once the transmission is complete. This need not be explicitly done by the driver as this is handled more gracefully by the xdp driver while returning the xdp frame. __xdp_return() frees the XDP memory based on its memory type, under which page_pool memory is also handled. This change fixes the transmit queue timeout while running XDP_TX. logs: [ 309.069682] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 45860 ms [ 313.933780] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 50724 ms [ 319.053656] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 55844 ms ... Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250616063319.3347541-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: stmmac: rk: remove unnecessary clk_mac	Russell King (Oracle)	1	-8/+1
	The stmmac platform code already gets the "stmmaceth" clock, so there is no need for drivers to get it. Use the stored pointer in struct plat_stmmacenet_data instead of getting and storing our own pointer. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uR6sj-004Ku5-HR@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: stmmac: rk: use device rather than platform device in rk_priv_data	Russell King (Oracle)	1	-8/+8
	All the code in dwmac-rk uses &bsp_priv->pdev->dev, nothing uses bsp_priv->pdev directly. Store the struct device rather than the struct platform_device in struct rk_priv_data, and simplifying the code. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uR6se-004Ktz-Dx@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: stmmac: rk: fix code formmating issue	Russell King (Oracle)	1	-1/+1
	Fix a code formatting issue introduced in the previous series, no space after , before "int". Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uR6sZ-004Ktt-9y@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	Merge branch 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux	Jakub Kicinski	1	-85/+281
	Shradha Gupta says: ==================== Allow dyn MSI-X vector allocation of MANA In this patchset we want to enable the MANA driver to be able to allocate MSI-X vectors in PCI dynamically. The first patch exports pci_msix_prepare_desc() in PCI to be able to correctly prepare descriptors for dynamically added MSI-X vectors. The second patch adds the support of dynamic vector allocation in pci-hyperv PCI controller by enabling the MSI_FLAG_PCI_MSIX_ALLOC_DYN flag and using the pci_msix_prepare_desc() exported in first patch. The third patch adds a detailed description of the irq_setup(), to help understand the function design better. The fourth patch is a preparation patch for mana changes to support dynamic IRQ allocation. It contains changes in irq_setup() to allow skipping first sibling CPU sets, in case certain IRQs are already affinitized to them. The fifth patch has the changes in MANA driver to be able to allocate MSI-X vectors dynamically. If the support does not exist it defaults to older behavior. * 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux: net: mana: Allocate MSI-X vectors dynamically net: mana: Allow irq_setup() to skip cpus for affinity net: mana: explain irq_setup() algorithm PCI: hv: Allow dynamic MSI-X vector allocation PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations ==================== Link: https://patch.msgid.link/1749650984-9193-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13	Vitaly Lifshits	2	-6/+16
	On some systems with Nahum 11 and Nahum 13 the value of the XTAL clock in the software STRAP is incorrect. This causes the PTP timer to run at the wrong rate and can lead to synchronization issues. The STRAP value is configured by the system firmware, and a firmware update is not always possible. Since the XTAL clock on these systems always runs at 38.4MHz, the driver may ignore the STRAP and just set the correct value. Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Reviewed-by: Gil Fine <gil.fine@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17	ice: fix eswitch code memory leak in reset scenario	Grzegorz Nitka	1	-1/+5
	Add simple eswitch mode checker in attaching VF procedure and allocate required port representor memory structures only in switchdev mode. The reset flows triggers VF (if present) detach/attach procedure. It might involve VF port representor(s) re-creation if the device is configured is switchdev mode (not legacy one). The memory was blindly allocated in current implementation, regardless of the mode and not freed if in legacy mode. Kmemeleak trace: unreferenced object (percpu) 0x7e3bce5b888458 (size 40): comm "bash", pid 1784, jiffies 4295743894 hex dump (first 32 bytes on cpu 45): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): pcpu_alloc_noprof+0x4c4/0x7c0 ice_repr_create+0x66/0x130 [ice] ice_repr_create_vf+0x22/0x70 [ice] ice_eswitch_attach_vf+0x1b/0xa0 [ice] ice_reset_all_vfs+0x1dd/0x2f0 [ice] ice_pci_err_resume+0x3b/0xb0 [ice] pci_reset_function+0x8f/0x120 reset_store+0x56/0xa0 kernfs_fop_write_iter+0x120/0x1b0 vfs_write+0x31c/0x430 ksys_write+0x61/0xd0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Testing hints (ethX is PF netdev): - create at least one VF echo 1 > /sys/class/net/ethX/device/sriov_numvfs - trigger the reset echo 1 > /sys/class/net/ethX/device/reset Fixes: 415db8399d06 ("ice: make representor code generic") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17	net: ice: Perform accurate aRFS flow match	Krishna Kumar	1	-0/+48
	This patch fixes an issue seen in a large-scale deployment under heavy incoming pkts where the aRFS flow wrongly matches a flow and reprograms the NIC with wrong settings. That mis-steering causes RX-path latency spikes and noisy neighbor effects when many connections collide on the same hash (some of our production servers have 20-30K connections). set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by hashing the skb sized by the per rx-queue table size. This results in multiple connections (even across different rx-queues) getting the same hash value. The driver steer function modifies the wrong flow to use this rx-queue, e.g.: Flow#1 is first added: Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10 Later when a new flow needs to be added: Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20 The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This results in both flows getting un-optimized - packets for Flow#1 goes to q#20, and then reprogrammed back to q#10 later and so on; and Flow #2 programming is never done as Flow#1 is matched first for all misses. Many flows may wrongly share the same hash and reprogram rules of the original flow each with their own q#. Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS, enable aRFS (global value is 144 * rps_flow_cnt). Test notes about results from ice_rx_flow_steer(): --------------------------------------------------- 1. "Skip:" counter increments here: if (fltr_info->q_index == rxq_idx \|\| arfs_entry->fltr_state != ICE_ARFS_ACTIVE) goto out; 2. "Add:" counter increments here: ret = arfs_entry->fltr_info.fltr_id; INIT_HLIST_NODE(&arfs_entry->list_entry); 3. "Update:" counter increments here: /* update the queue to forward to on an already existing flow / Runtime comparison: original code vs with the patch for different rps_flow_cnt values. +-------------------------------+--------------+--------------+ \| rps_flow_cnt \| 512 \| 2048 \| +-------------------------------+--------------+--------------+ \| Ratio of Pkts on Good:Bad q's \| 214 vs 822K \| 1.1M vs 980K \| \| Avoid wrong aRFS programming \| 0 vs 310K \| 0 vs 30K \| \| CPU User \| 216 vs 183 \| 216 vs 206 \| \| CPU System \| 1441 vs 1171 \| 1447 vs 1320 \| \| CPU Softirq \| 1245 vs 920 \| 1238 vs 961 \| \| CPU Total \| 29 vs 22.7 \| 29 vs 24.9 \| \| aRFS Update \| 533K vs 59 \| 521K vs 32 \| \| aRFS Skip \| 82M vs 77M \| 7.2M vs 4.5M \| +-------------------------------+--------------+--------------+ A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections showed no performance degradation. Some points on the patch/aRFS behavior: 1. Enabling full tuple matching ensures flows are always correctly matched, even with smaller hash sizes. 2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs and fewer calls to driver for programming on misses. 3. Larger hash tables reduces mis-steering due to more unique flow hashes, but still has clashes. However, with larger per-device rps_flow_cnt, old flows take more time to expire and new aRFS flows cannot be added if h/w limits are reached (rps_may_expire_flow() succeeds when 10rps_flow_cnt pkts have been processed by this cpu that are not part of the flow). Fixes: 28bf26724fdb0 ("ice: Implement aRFS") Signed-off-by: Krishna Kumar <krikku@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17	Merge branch 'intel-next-queue-1GbE'	Paolo Abeni	7	-43/+189
	Tony Nguyen says: ==================== Faizal Rahim says: MAC Merge support for frame preemption was previously added for igc: https://lore.kernel.org/netdev/20250418163822.3519810-1-anthony.l.nguyen@intel.com/ This series builds on that work and adds support for: - Harmonizing taprio and mqprio queue priority behavior, based on past discussions and suggestions: https://lore.kernel.org/all/20250214102206.25dqgut5tbak2rkz@skbuf/ - Enabling preemptible queue support for both taprio and mqprio, with priority harmonization as a prerequisite. Patch organization: - Patches 1-3: Preparation work for patches 6 and 7 - Patches 4-5: Queue priority harmonization - Patches 6-7: Add preemptible queue support ==================== Link: https://patch.msgid.link/20250611180314.2059166-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-17	net: enetc: replace PCVLANR1/2 with SICVLANR1/2 and remove dead branch	Wei Fang	2	-7/+8
	Both PF and VF have rx-vlan-offload enabled, however, the PCVLANR1/2 registers are resources controlled by PF, so VF cannot access these two registers. Fortunately, the hardware provides SICVLANR1/2 registers for each SI to reflect the value of PCVLANR1/2 registers. Therefore, use SICVLANR1/2 instead of PCVLANR1/2. Note that this is not an issue in actual use, because the current driver does not support custom TPID, the driver will not access these two registers in actual use, so this modification is just an optimization. In addition, since ENETC_RXBD_FLAG_TPID is defined as GENMASK(1, 0), the possible values are only 0, 1, 2, 3, so the default branch will never be true, so remove the default branch. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250613093605.39277-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-17	sysfs: treewide: switch back to bin_attribute::read()/write()	Thomas Weißschuh	2	-22/+22
	The bin_attribute argument of bin_attribute::read() is now const. This makes the _new() callbacks unnecessary. Switch all users back. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-3-724bfcf05b99@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-17	net: mana: Allocate MSI-X vectors dynamically	Shradha Gupta	1	-82/+229
	Currently, the MANA driver allocates MSI-X vectors statically based on MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends up allocating more vectors than it needs. This is because, by this time we do not have a HW channel and do not know how many IRQs should be allocated. To avoid this, we allocate 1 MSI-X vector during the creation of HWC and after getting the value supported by hardware, dynamically add the remaining MSI-X vectors. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
2025-06-17	net: mana: Allow irq_setup() to skip cpus for affinity	Shradha Gupta	1	-4/+12
	In order to prepare the MANA driver to allocate the MSI-X IRQs dynamically, we need to enhance irq_setup() to allow skipping affinitizing IRQs to the first CPU sibling group. This would be for cases when the number of IRQs is less than or equal to the number of online CPUs. In such cases for dynamically added IRQs the first CPU sibling group would already be affinitized with HWC IRQ. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
2025-06-17	net: mana: explain irq_setup() algorithm	Yury Norov	1	-0/+41
	Commit 91bfe210e196 ("net: mana: add a function to spread IRQs per CPUs") added the irq_setup() function that distributes IRQs on CPUs according to a tricky heuristic. The corresponding commit message explains the heuristic. Duplicate it in the source code to make available for readers without digging git in history. Also, add more detailed explanation about how the heuristics is implemented. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
2025-06-17	eth: gianfar: migrate to new RXFH callbacks	Jakub Kicinski	1	-7/+17
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Uniquely, this driver supports only the SET operation. It does not support GET at all. The SET callback also always returns 0, even tho it checks a bunch of conditions, and if my quick reading is right, expects the user to insert filtering rules for given flow type first? Long story short it seems too convoluted to easily add the GET as part of the conversion. Link: https://patch.msgid.link/20250613172751.3754732-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: iavf: migrate to new RXFH callbacks	Jakub Kicinski	1	-41/+11
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). I'm deleting all the boilerplate kdoc from the affected functions. It is somewhere between pointless and incorrect, just a burden for people refactoring the code. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: ice: migrate to new RXFH callbacks	Jakub Kicinski	1	-41/+18
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). I'm deleting all the boilerplate kdoc from the affected functions. It is somewhere between pointless and incorrect, just a burden for people refactoring the code. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: i40e: migrate to new RXFH callbacks	Jakub Kicinski	1	-24/+14
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). I'm deleting all the boilerplate kdoc from the affected functions. It is somewhere between pointless and incorrect, just a burden for people refactoring the code. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: fm10k: migrate to new RXFH callbacks	Jakub Kicinski	1	-24/+10
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). .get callback moves out of the switch and set_rxnfc disappears as ETHTOOL_SRXFH as the only functionality. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: ixgbe: migrate to new RXFH callbacks	Jakub Kicinski	1	-10/+12
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: igc: migrate to new RXFH callbacks	Jakub Kicinski	1	-8/+10
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: igb: migrate to new RXFH callbacks	Jakub Kicinski	1	-10/+10
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: enetc: migrate to new RXFH callbacks	Jakub Kicinski	1	-6/+5
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). This driver's RXFH config is read only / fixed so the conversion is trivial. Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250614180638.4166766-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: e1000e: migrate to new RXFH callbacks	Jakub Kicinski	1	-42/+35
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). This driver's RXFH config is read only / fixed and it's the only get_rxnfc sub-command the driver supports. So convert the get_rxnfc handler into a get_rxfh_fields handler. Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250614180638.4166766-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: lan743x: migrate to new RXFH callbacks	Jakub Kicinski	1	-12/+19
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). This driver's RXFH config is read only / fixed so the conversion is purely factoring out the handling into a helper. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250614180638.4166766-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: cxgb4: migrate to new RXFH callbacks	Jakub Kicinski	1	-50/+55
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). This driver's RXFH config is read only / fixed so the conversion is purely factoring out the handling into a helper. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250614180638.4166766-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	eth: cisco: migrate to new RXFH callbacks	Jakub Kicinski	1	-4/+4
	Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). This driver's RXFH config is read only / fixed so the conversion is trivial. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250614180638.4166766-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	octeontx2-pf: CN20K mbox implementation between PF-VF	Sai Krishna	5	-14/+194
	This patch implements the CN20k MBOX communication between PF and it's VFs. CN20K silicon got extra interrupt of MBOX response for trigger interrupt. Also few of the CSR offsets got changed in CN20K against prior series of silicons. Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1749639716-13868-7-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	octeontx2-af: CN20K mbox implementation for AF's VF	Sai Krishna	11	-37/+380
	This patch implements the CN20k MBOX communication between AF and AF's VFs. This implementation uses separate trigger interrupts for request, response messages against using trigger message data in CN10K. Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1749639716-13868-6-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	octeontx2-pf: CN20K mbox REQ/ACK implementation for NIC PF	Sai Krishna	10	-22/+186
	This implementation uses separate trigger interrupts for request, response messages against using trigger message data in CN10K. This patch adds support for basic mbox implementation for CN20K from NIC PF side. Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1749639716-13868-5-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	octeontx2-af: CN20k mbox to support AF REQ/ACK functionality	Sai Krishna	8	-18/+358
	This implementation uses separate trigger interrupts for request, response MBOX messages against using trigger message data in CN10K. This patch adds support for basic mbox implementation for CN20K from AF side. Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1749639716-13868-4-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	octeontx2-af: CN20k basic mbox operations and structures	Sai Krishna	9	-15/+234
	This patch adds basic mbox operation APIs and structures to add support for mbox module on CN20k silicon. There are few CSR offsets, interrupts changed between CN20k and prior Octeon series of devices. Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/1749639716-13868-3-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17	octeontx2: Set appropriate PF, VF masks and shifts based on silicon	Subbaraya Sundeep	21	-189/+186
	Number of RVU PFs on CN20K silicon have increased to 96 from maximum of 32 that were supported on earlier silicons. Every RVU PF and VF is identified by HW using a 16bit PF_FUNC value. Due to the change in Max number of PFs in CN20K, the bit encoding of this PF_FUNC has changed. This patch handles the change by using helper functions(using silicon check) to use PF,VF masks and shifts to support both new silicon CN20K, OcteonTx series. These helper functions are used in different modules. Also moved the NIX AF register offset macros to other files which will be posted in coming patches. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Link: https://patch.msgid.link/1749639716-13868-2-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>