summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2024-11-16enic: Make MSI-X I/O interrupts come after the other required onesNelson Escobar2-9/+22
The VIC hardware has a constraint that the MSIX interrupt used for errors be specified as a 7 bit number. Before this patch, it was allocated after the I/O interrupts, which would cause a problem if 128 or more I/O interrupts are in use. So make the required interrupts come before the I/O interrupts to guarantee the error interrupt offset never exceeds 7 bits. Co-developed-by: John Daley <johndale@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-2-a34cf8570c67@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-16enic: Create enic_wq/rq structures to bundle per wq/rq dataNelson Escobar4-73/+81
Bundling the wq/rq specific data into dedicated enic_wq/rq structures cleans up the enic structure and simplifies future changes related to wq/rq. Co-developed-by: John Daley <johndale@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-1-a34cf8570c67@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-16gve: Flow steering trigger reset only for timeout errorZiwei Xiao1-2/+2
When configuring flow steering rules, the driver is currently going through a reset for all errors from the device. Instead, the driver should only reset when there's a timeout error from the device. Fixes: 57718b60df9b ("gve: Add flow steering adminq commands") Cc: stable@vger.kernel.org Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241113175930.2585680-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-16bnxt_en: optimize gettimex64Vadim Fedorenko2-6/+37
Current implementation of gettimex64() makes at least 3 PCIe reads to get current PHC time. It takes at least 2.2us to get this value back to userspace. At the same time there is cached value of upper bits of PHC available for packet timestamps already. This patch reuses cached value to speed up reading of PHC time. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241114114820.1411660-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15net: ethtool: only allow set_rxnfc with rss + ring_cookie if driver opts inEdward Cree2-0/+2
Ethtool ntuple filters with FLOW_RSS were originally defined as adding the base queue ID (ring_cookie) to the value from the indirection table, so that the same table could distribute over more than one set of queues when used by different filters. However, some drivers / hardware ignore the ring_cookie, and simply use the indirection table entries as queue IDs directly. Thus, for drivers which have not opted in by setting ethtool_ops.cap_rss_rxnfc_adds to declare that they support the original (addition) semantics, reject in ethtool_set_rxnfc any filter which combines FLOW_RSS and a nonzero ring. (For a ring_cookie of zero, both behaviours are equivalent.) Set the cap bit in sfc, as it is known to support this feature. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://patch.msgid.link/cc3da0844083b0e301a33092a6299e4042b65221.1731499022.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15i40e: Fix handling changed priv flagsPeter Große1-1/+1
After assembling the new private flags on a PF, the operation to determine the changed flags uses the wrong bitmaps. Instead of xor-ing orig_flags with new_flags, it uses the still unchanged pf->flags, thus changed_flags is always 0. Fix it by using the correct bitmaps. The issue was discovered while debugging why disabling source pruning stopped working with release 6.7. Although the new flags will be copied to pf->flags later on in that function, disabling source pruning requires a reset of the PF, which was skipped due to this bug. Disabling source pruning: $ sudo ethtool --set-priv-flags eno1 disable-source-pruning on $ sudo ethtool --show-priv-flags eno1 Private flags for eno1: MFP : off total-port-shutdown : off LinkPolling : off flow-director-atr : on veb-stats : off hw-atr-eviction : off link-down-on-close : off legacy-rx : off disable-source-pruning: on disable-fw-lldp : off rs-fec : off base-r-fec : off vf-vlan-pruning : off Regarding reproducing: I observed the issue with a rather complicated lab setup, where * two VLAN interfaces are created on eno1 * each with a different MAC address assigned * each moved into a separate namespace * both VLANs are bridged externally, so they form a single layer 2 network The external bridge is done via a channel emulator adding packet loss and delay and the application in the namespaces tries to send/receive traffic and measure the performance. Sender and receiver are separated by namespaces, yet the network card "sees its own traffic" send back to it. To make that work, source pruning has to be disabled. Cc: stable@vger.kernel.org Fixes: 70756d0a4727 ("i40e: Use DECLARE_BITMAP for flags and hw_features fields in i40e_pf") Signed-off-by: Peter Große <pegro@friiks.de> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20241113210705.1296408-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15net: sparx5: add missing lan969x Kconfig dependencyArnd Bergmann2-2/+2
The sparx5 switchdev driver can be built either with or without support for the Lan969x switch. However, it cannot be built-in when the lan969x driver is a loadable module because of a link-time dependency: arm-linux-gnueabi-ld: drivers/net/ethernet/microchip/sparx5/sparx5_main.o:(.rodata+0xd44): undefined reference to `lan969x_desc' Add a Kconfig dependency to reflect this in Kconfig, allowing all the valid configurations but forcing sparx5 to be a loadable module as well if lan969x is. Fixes: 98a01119608d ("net: sparx5: add compatible string for lan969x") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241113115513.4132548-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15net: enetc: clean up before returning in probe()Dan Carpenter1-3/+5
We recently added this error path. We need to call enetc_pci_remove() before returning. It cleans up the resources from enetc_pci_probe(). Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/93888efa-c838-4682-a7e5-e6bf318e844e@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15r8169: copy vendor driver 2.5G/5G EEE advertisement constraintsHeiner Kallweit2-12/+10
Vendor driver r8125 doesn't advertise 2.5G EEE on RTL8125A, and r8126 doesn't advertise 5G EEE. Likely there are compatibility issues, therefore do the same in r8169. With this change we don't have to disable 2.5G EEE advertisement in rtl8125a_config_eee_phy() any longer. We use new phylib accessor phy_set_eee_broken() to mark the respective EEE modes as broken. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/ce185e10-8a2f-4cf8-a49b-fd8fb3c3c8a1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski13-29/+100
Cross-merge networking fixes after downstream PR (net-6.12-rc8). Conflicts: tools/testing/selftests/net/.gitignore 252e01e68241 ("selftests: net: add netlink-dumps to .gitignore") be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw") https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/ drivers/net/phy/phylink.c 671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled") 7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"") Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14eth: fbnic: Add support to dump registersMohsin Bashir5-1/+186
Add support for the 'ethtool -d <dev>' command to retrieve and print a register dump for fbnic. The dump defaults to version 1 and consists of two parts: all the register sections that can be dumped linearly, and an RPC RAM section that is structured in an interleaved fashion and requires special handling. For each register section, the dump also contains the start and end boundary information which can simplify parsing. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20241112222605.3303211-1-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14net: ti: icssg-prueth: Fix 1 PPS syncMeghana Malladi2-2/+23
The first PPS latch time needs to be calculated by the driver (in rounded off seconds) and configured as the start time offset for the cycle. After synchronizing two PTP clocks running as master/slave, missing this would cause master and slave to start immediately with some milliseconds drift which causes the PPS signal to never synchronize with the PTP master. Fixes: 186734c15886 ("net: ti: icssg-prueth: add packet timestamping and ptp support") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20241111095842.478833-1-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routinesVitalii Mordan1-8/+17
If the clock dwmac->tx_clk was not enabled in intel_eth_plat_probe, it should not be disabled in any path. Conversely, if it was enabled in intel_eth_plat_probe, it must be disabled in all error paths to ensure proper cleanup. Found by Linux Verification Center (linuxtesting.org) with Klever. Fixes: 9efc9b2b04c7 ("net: stmmac: Add dwmac-intel-plat for GBE driver") Signed-off-by: Vitalii Mordan <mordan@ispras.ru> Link: https://patch.msgid.link/20241108173334.2973603-1-mordan@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wolNícolas F. R. A. Prado1-2/+2
The mediatek,mac-wol property is being handled backwards to what is described in the binding: it currently enables PHY WOL when the property is present and vice versa. Invert the driver logic so it matches the binding description. Fixes: fd1d62d80ebc ("net: stmmac: replace the use_phy_wol field with a flag") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://patch.msgid.link/20241109-mediatek-mac-wol-noninverted-v2-1-0e264e213878@collabora.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: dwmac_socfpga: This platform has GMACMaxime Chevallier1-0/+1
Indicate that dwmac_socfpga has a gmac. This will make sure that gmac-specific interrupt processing is done, including timestamp interrupt handling. Without this, the external snapshot interrupt is never ack'd and we have an interrupt storm on external snapshot event. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-10-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Configure only the relevant bits for timestamping setupMaxime Chevallier1-1/+14
The PTP_TCR (Timestamp Control Register) is used to configure several features related to packet timestamping. On one hand, it configures the 1588 packet processing, to indicate what types of frames should be timestamped (all, only 1588v1 or 1588v2, using L2 or L4 timestamping, on IPv4 or IPv6, etc.). This is congfigured usually through the ioctl / ndo dedicated for such setup. This configuration is done by setting some fields in that register, that seem to behave the same way on all dwmac variants, including DWMAC1000. On the other hand, and only on DWMAC1000 apparently, some fields in that register are used to configure external snapshots (bits 24/25). On DWMAC4 and others, these fields are reserved and external snapshots are configured through a dedicated register that simply doesn't seem to exist on DWMAC1000. This configuration is done in the dwmac1000-specific ptp_clock_info ops (cf dwmac1000_ptp_enable()). So to avoid the timestamping configuration interfering with the external snapshots, this commit makes sure that the config_hw_tstamping only configures the relevant bits in PTP_TCR, so that the DWMAC1000 timestamping can correctly rely on these otherwise reserved fields. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-9-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Don't include dwmac4 definitions in stmmac_ptpMaxime Chevallier1-1/+0
The stmmac_ptp code doesn't need the dwmac4 register definitions, remove the inclusion. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-8-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Enable timestamping interrupt on dwmac1000Maxime Chevallier1-0/+16
The default configuration for the interrupts on dwmac1000 have the timestamping interrupt masked. Now that the timestamping has been adapted to dwmac1000, enable the timestamping interrupt on these platforms. On dwmac1000, the external snapshot interrupt is configured through a dedicated bit, that is set as reserved on other dwmac variants. The timestaming interrupt is acknowledged by reading the GMAC3_X_TIMESTAMP_STATUS register. Make sure that this interrupt is enabled when snapshot is enabled, and masked when disabled. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-7-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Introduce dwmac1000 timestamping operationsMaxime Chevallier6-2/+65
In GMAC3_X, the timestamping configuration differs from GMAC4 in the layout of the registers accessed to grab the number of snapshots in FIFO as well as the register offset to grab the aux snapshot timestamp. Introduce dedicated ops to configure timestamping on dwmac100 and dwmac1000. The latency correction doesn't seem to exist on GMAC3, so its corresponding operation isn't populated. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-6-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Introduce dwmac1000 ptp_clock_info and operationsMaxime Chevallier6-2/+77
The PTP configuration for GMAC3_X differs from the other implementations in several ways : - There's only one external snapshot trigger - The snapshot configuration is done through the PTP_TCR register, whereas the other dwmac variants have a dedicated ACR (auxiliary control reg) for that purpose - The layout for the PTP_TCR register also differs, as bits 24/25 are used for the snapshot configuration. These bits are reserved on other variants. On GMAC3_X, we also can't discover the number of snapshot triggers automatically. The GMAC3_X has one PPS output, however it's configuration isn't supported yet so report 0 n_per_out for now. Introduce a dedicated set of ptp_clock_info ops and configuration parameters to reflect these differences specific to GMAC3_X. This was tested on dwmac_socfpga. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-5-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Only update the auto-discovered PTP clock featuresMaxime Chevallier1-2/+8
Some DWMAC variants such as dwmac1000 don't support discovering the number of output pps and auxiliary snapshots. Allow these parameters to be defined in default ptp_clock_info, and let them be updated only when the feature discovery yielded a result. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-4-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Use per-hw ptp clock opsMaxime Chevallier3-3/+14
The auxiliary snapshot configuration was found to differ depending on the dwmac version. To prepare supporting this, allow specifying the ptp_clock_info ops in the hwif array Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-3-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14net: stmmac: Don't modify the global ptp ops directlyMaxime Chevallier1-6/+7
The stmmac_ptp_clock_ops are copied into the stmmac_priv structure before being registered to the PTP core. Some adjustments are made prior to that, such as the number of snapshots or max adjustment parameters. Instead of modifying the global definition, then copying into the local private data, let's first copy then modify the local parameters. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241112170658.2388529-2-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13e1000: Hold RTNL when e1000_down can be calledJoe Damato1-1/+9
e1000_down calls netif_queue_set_napi, which assumes that RTNL is held. There are a few paths for e1000_down to be called in e1000 where RTNL is not currently being held: - e1000_shutdown (pci shutdown) - e1000_suspend (power management) - e1000_reinit_locked (via e1000_reset_task delayed work) - e1000_io_error_detected (via pci error handler) Hold RTNL in three places to fix this issue: - e1000_reset_task: igc, igb, and e100e all hold rtnl in this path. - e1000_io_error_detected (pci error handler): e1000e and ixgbe hold rtnl in this path. A patch has been posted for igc to do the same [1]. - __e1000_shutdown (which is called from both e1000_shutdown and e1000_suspend): igb, ixgbe, and e1000e all hold rtnl in the same path. The other paths which call e1000_down seemingly hold RTNL and are OK: - e1000_close (ndo_stop) - e1000_change_mtu (ndo_change_mtu) Based on the above analysis and mailing list discussion [2], I believe adding rtnl in the three places mentioned above is correct. Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs") Reported-by: Dmitry Antipov <dmantipov@yandex.ru> Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/ Link: https://lore.kernel.org/netdev/20241022215246.307821-3-jdamato@fastly.com/ [1] Link: https://lore.kernel.org/netdev/ZxgVRX7Ne-lTjwiJ@LQ3V64L9R2/ [2] Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13igbvf: remove unused spinlockWander Lairson Costa2-6/+0
tx_queue_lock and stats_lock are declared and initialized, but never used. Remove them. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13igb: Fix 2 typos in comments in igb_main.cJohnny Park1-2/+2
Fix 2 spelling mistakes in comments in `igb_main.c`. Signed-off-by: Johnny Park <pjohnny0508@gmail.com> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13igc: remove autoneg parameter from igc_mac_infoVitaly Lifshits6-195/+163
Since the igc driver doesn't support forced speed configuration and its current related hardware doesn't support it either, there is no use of the mac.autoneg parameter. Moreover, in one case this usage might result in a NULL pointer dereference due to an uninitialized function pointer, phy.ops.force_speed_duplex. Therefore, remove this parameter from the igc code. Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ixgbe: Break include dependency cycleDiomidis Spinellis7-14/+22
Header ixgbe_type.h includes ixgbe_mbx.h. Also, header ixgbe_mbx.h included ixgbe_type.h, thus introducing a circular dependency. - Remove ixgbe_mbx.h inclusion from ixgbe_type.h. - ixgbe_mbx.h requires the definition of struct ixgbe_mbx_operations so move its definition there. While at it, add missing argument identifier names. - Add required forward structure declarations. - Include ixgbe_mbx.h in the .c files that need it, for the following reasons: ixgbe_sriov.c uses ixgbe_check_for_msg ixgbe_main.c uses ixgbe_init_mbx_params_pf ixgbe_82599.c uses mbx_ops_generic ixgbe_x540.c uses mbx_ops_generic ixgbe_x550.c uses mbx_ops_generic Signed-off-by: Diomidis Spinellis <dds@aueb.gr> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: Unbind the workqueueFrederic Weisbecker1-1/+1
The ice workqueue doesn't seem to rely on any CPU locality and should therefore be able to run on any CPU. In practice this is already happening through the unbound ice_service_timer that may fire anywhere and queue the workqueue accordingly to any CPU. Make this official so that the ice workqueue is only ever queued to housekeeping CPUs on nohz_full. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: use stack variable for virtchnl_supported_rxdidsJacob Keller1-16/+4
The ice_vc_query_rxdid() function allocates memory to store the virtchnl_supported_rxdids structure used to communicate the bitmap of supported RXDIDs to a VF. This structure is only 8 bytes in size. The function must hold the allocated length on the stack as well as the pointer to the structure which itself is 8 bytes. Allocating this storage on the heap adds unnecessary overhead including a potential error path that must be handled in case kzalloc fails. Because this structure is so small, we're not saving stack space. Additionally, because we must ensure that we free the allocated memory, the return value from ice_vc_send_msg_to_vf() must also be saved in the stack ret variable. Depending on compiler optimization, this means allocating the 8-byte structure is requiring up to 16-bytes of stack memory! Simplify this function to keep the rxdid variable on the stack, saving memory and removing a potential failure exit path from this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: initialize pf->supported_rxdids immediately after loading DDPJacob Keller2-18/+33
The pf->supported_rxdids field is used to populate the list of valid RXDIDs that a VF may use when negotiating VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. The set of supported RXDIDs is dependent on the DDP, and can be read from the GLXFLXP_RXDID_FLAGS register. The PF needs to send this list to the VF upon receiving the VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. It also needs to use this list to validate the requested descriptor ID from the VF when programming the Rx queues. A future update to support VF live migration will also want to validate that the target VF can support the same descriptor ID when migrating. Currently, pf->supported_rxdids is initialized inside the ice_vc_query_rxdid() function. This means that it is only ever initialized if at least one VF actually tries to negotiate VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. It is also unnecessarily re-initialized every time the VF loads and requests the descriptor list. This worked before because the PF only checks pf->suppported_rxdids when programming the Rx queue if the VF actually negotiates the VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC feature. This will be problematic for VF live migration. We need the list of supported Rx descriptor IDs when migrating. It is possible that no VF on the target PF has ever actually issued a VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. Refactor the driver to initialize pf->supported_rxdids during driver initialization after the DDP is loaded. This is simpler, avoids unnecessary duplicate work, and avoids issues with the live migration process. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: only allow Tx promiscuous for multicastBrett Creeley2-10/+19
Currently when any VF is trusted and true promiscuous mode is enabled on the PF, the VF will receive all unicast traffic directed to the device's internal switch. This includes traffic external to the NIC and also from other VSI (i.e. VFs). This does not match the expected behavior as unicast traffic should only be visible from external sources in this case. Disable the Tx promiscuous mode bits for unicast promiscuous mode. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: Add support for persistent NAPI configJoe Damato2-3/+6
Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. This preserves NAPI config settings when queue counts are adjusted. Tested with an E810-2CQDA2 NIC. Begin by setting the queue count to 4: $ sudo ethtool -L eth4 combined 4 Check the queue settings: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of 1111: $ sudo ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}' None Check that worked: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now reduce the queue count to 2, which would destroy the queue with NAPI ID 8451: $ sudo ethtool -L eth4 combined 2 Check the queue settings, noting that NAPI ID 8451 is gone: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, increase the number of queues back to 4: $ sudo ethtool -L eth4 combined 4 Dump the settings, expecting to see the same NAPI IDs as above and for NAPI ID 8451 to have its gro-flush-timeout set to 1111: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: support optional flags in signature segment headerPrzemek Kitszel2-7/+20
An optional flag field has been added to the signature segment header. The field contains two flags, a "valid" bit, and a "last segment" bit that indicates whether the segment is the last segment that will be sent to firmware. If the flag field's valid bit is NOT set, then as was done before, assume that this is the last segment being downloaded. However, if the flag field's valid bit IS set, then use the last segment flag to determine if this segment is the last segment to download. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: refactor "last" segment of DDP pkgPrzemek Kitszel1-137/+151
Add ice_ddp_send_hunk() that buffers "sent FW hunk" calls to AQ in order to mark the "last" one in more elegant way. Next commit will add even more complicated "sent FW" flow, so it's better to untangle a bit before. Note that metadata buffers were not skipped for NOT-@indicate_last segments, this is fixed now. Minor: + use ice_is_buffer_metadata() instead of open coding it in ice_dwnld_cfg_bufs(); + ice_dwnld_cfg_bufs_no_lock() + dependencies were moved up a bit to have better git-diff, as this function was rewritten (in terms of git-blame) CC: Paul Greenwalt <paul.greenwalt@intel.com> CC: Dan Nowlin <dan.nowlin@intel.com> CC: Ahmed Zaki <ahmed.zaki@intel.com> CC: Simon Horman <horms@kernel.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: extend dump serdes equalizer values featureMateusz Polchlopek3-0/+51
Extend the work done in commit 70838938e89c ("ice: Implement driver functionality to dump serdes equalizer values") by adding the new set of Rx registers that can be read using command: $ ethtool -d interface_name Rx equalization parameters are E810 PHY registers used by end user to gather information about configuration and status to debug link and connection issues in the field. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: rework of dump serdes equalizer values featureMateusz Polchlopek2-77/+38
Refactor function ice_get_tx_rx_equa() to iterate over new table of params instead of multiple calls to ice_aq_get_phy_equalization(). Subsequent commit will extend that function by add more serdes equalizer values to dump. Shorten the fields of struct ice_serdes_equalization_to_ethtool for readability purposes. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13octeontx2-pf: Adds TC offload supportGeetha sowjanya7-17/+154
Implements tc offload support for rvu representors. Usage example: - Add tc rule to drop packets with vlan id 3 using port representor(Rpf1vf0). # tc filter add dev Rpf1vf0 protocol 802.1Q parent ffff: flower vlan_id 3 vlan_ethtype ipv4 skip_sw action drop - Redirect packets with vlan id 5 and IPv4 packets to eth1, after stripping vlan header. # tc filter add dev Rpf1vf0 ingress protocol 802.1Q flower vlan_id 5 vlan_ethtype ipv4 skip_sw action vlan pop action mirred ingress redirect dev eth1 Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Implement offload stats ndo for representorsGeetha sowjanya2-0/+43
Implement the offload stat ndo by fetching the HW stats of rx/tx queues attached to the representor. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Add devlink port supportGeetha sowjanya4-0/+98
Register devlink port for the rvu representors. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Add representors for sdp MACGeetha sowjanya8-26/+88
Hardware supports different types of MACs eg RPM, SDP, LBK. LBK is for internal Tx->Rx HW loopback path. RPM and SDP MACs support ingress/egress pkt IO on interfaces with different set of capabilities like interface modes. At the time of netdev driver registration PF will seek MAC related information from Admin function driver 'drivers/net/ethernet/marvell/octeontx2/af' and sets up ingress/egress queues etc such that pkt IO on the channels of these different MACs is possible. This patch add representors for SDP MAC. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Configure VF mtu via representorGeetha sowjanya2-0/+22
Adds support to manage the mtu configuration for VF through representor. On update of representor mtu a mbox notification is send to VF to update its mtu. This feature is implemented based on the "Network Function Representors" kernel documentation. " Setting an MTU on the representor should cause that same MTU to be reported to the representee. " Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Add support to sync link state between representor and VFsGeetha sowjanya8-3/+287
Implements the below requirement mentioned in the representors documentation. " The representee's link state is controlled through the representor. Setting the representor administratively UP or DOWN should cause carrier ON or OFF at the representee. " This patch enables - Reflecting the link state of representor based on the VF state and link state of VF based on representor. - On VF interface up/down a notification is sent via mbox to representor to update the link state. eg: ip link set eth0 up/down will disable carrier on/off of the corresponding representor(r0p1) interface. - On representor interface up/down will cause the link state update of VF. eg: ip link set r0p1 up/down will disable carrier on/off of the corresponding representee(eth0) interface. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Get VF stats via representorGeetha sowjanya7-54/+180
Adds support to export VF port statistics via representor netdev. Defines new mbox "NIX_LF_STATS" to fetch VF hw stats. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-af: Add packet path between representor and VFGeetha sowjanya8-8/+298
Current HW, do not support in-built switch which will forward pkts between representee and representor. When representor is put under a bridge and pkts needs to be sent to representee, then pkts from representor are sent on a HW internal loopback channel, which again will be punted to ingress pkt parser. Now the rules that this patch installs are the MCAM filters/rules which will match against these pkts and forward them to representee. The rules that this patch installs are for basic representor <=> representee path similar to Tun/TAP between VM and Host. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Add basic net_device_opsGeetha sowjanya5-11/+70
Implements basic set of net_device_ops. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: Create representor netdevGeetha sowjanya6-0/+285
Adds initial devlink support to set/get the switchdev mode. Representor netdevs are created for each rvu devices when the switch mode is set to 'switchdev'. These netdevs are be used to control and configure VFs. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13octeontx2-pf: RVU representor driverGeetha sowjanya11-12/+326
Adds basic driver for the RVU representor. Driver on probe does pci specific initialization and does hw resources configuration. Introduces RVU_ESWITCH kernel config to enable/disable the driver. Representor and NIC shares the code but representors netdev support subset of NIC functionality. Hence "otx2_rep_dev" API helps to skip the features initialization that are not supported by the representors. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13eth: bnxt: use page pool for head fragsJakub Kicinski2-48/+51
Testing small size RPCs (300B-400B) on a large AMD system suggests that page pool recycling is very useful even for just the head frags. With this patch (and copy break disabled) I see a 30% performance improvement (82Gbps -> 106Gbps). Convert bnxt from normal page frags to page pool frags for head buffers. On systems with small page size we can use the same pool as for TPA pages. On systems with large pages the frag allocation logic of the page pool is already used to split a large page into TPA chunks. TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB) and we always allocate the same sized chunks. Mixing allocation of TPA and head pages would lead to sub-optimal memory use. Plus Taehee's work on zero-copy / devmem will need to differentiate between TPA and non-TPA page pool, anyway. Conditionally allocate a new page pool for heads. Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13Revert "igb: Disable threaded IRQ for igb_msix_other"Wander Lairson Costa1-1/+1
This reverts commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f. Sebastian noticed the ISR indirectly acquires spin_locks, which are sleeping locks under PREEMPT_RT, which leads to kernel splats. Fixes: 338c4d3902feb ("igb: Disable threaded IRQ for igb_msix_other") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241106111427.7272-1-wander@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>