summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2024-10-15net: xilinx: axienet: fix potential memory leak in axienet_start_xmit()Wang Hai1-0/+2
The axienet_start_xmit() returns NETDEV_TX_OK without freeing skb in case of dma_map_single() fails, add dev_kfree_skb_any() to fix it. Fixes: 71791dc8bdea ("net: axienet: Check for DMA mapping errors") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20241014143704.31938-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: ethernet: fs_enet: Use %pa to format resource_size_tSimon Horman1-1/+1
The correct format string for resource_size_t is %pa which acts on the address of the variable to be formatted [1]. [1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229 Introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string") Flagged by gcc-14 as: drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c: In function 'fs_mii_bitbang_init': drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:126:46: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=] 126 | snprintf(bus->id, MII_BUS_ID_SIZE, "%x", res.start); | ~^ ~~~~~~~~~ | | | | | resource_size_t {aka long long unsigned int} | unsigned int | %llx No functional change intended. Compile tested only. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/ Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-2-dcc9afb8858b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: fec_mpc52xx_phy: Use %pa to format resource_size_tSimon Horman1-1/+1
The correct format string for resource_size_t is %pa which acts on the address of the variable to be formatted [1]. [1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229 Introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string") Flagged by gcc-14 as: drivers/net/ethernet/freescale/fec_mpc52xx_phy.c: In function 'mpc52xx_fec_mdio_probe': drivers/net/ethernet/freescale/fec_mpc52xx_phy.c:97:46: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=] 97 | snprintf(bus->id, MII_BUS_ID_SIZE, "%x", res.start); | ~^ ~~~~~~~~~ | | | | | resource_size_t {aka long long unsigned int} | unsigned int | %llx No functional change intended. Compile tested only. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/ Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-1-dcc9afb8858b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: txgbe: Pass string literal as format argument of alloc_workqueue()Simon Horman1-1/+1
Recently I noticed that both gcc-14 and clang-18 report that passing a non-string literal as the format argument of clkdev_create() is potentially insecure. E.g. clang-18 says: .../txgbe_phy.c:582:35: warning: format string is not a string literal (potentially insecure) [-Wformat-security] 581 | clock = clkdev_create(clk, NULL, clk_name); | ^~~~~~~~ .../txgbe_phy.c:582:35: note: treat the string as an argument to avoid this 581 | clock = clkdev_create(clk, NULL, clk_name); | ^ | "%s", It is always the case where the contents of clk_name is safe to pass as the format argument. That is, in my understanding, it never contains any format escape sequences. However, it seems better to be safe than sorry. And, as a bonus, compiler output becomes less verbose by addressing this issue as suggested by clang-18. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241014-string-thing-v2-2-b9b29625060a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: macb: Avoid 20s boot delay by skipping MDIO bus registration for ↵Oleksij Rempel1-3/+11
fixed-link PHY A boot delay was introduced by commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node"). This delay was caused by the call to `mdiobus_register()` in cases where a fixed-link PHY was present. The MDIO bus registration triggered unnecessary PHY address scans, leading to a 20-second delay due to attempts to detect Clause 45 (C45) compatible PHYs, despite no MDIO bus being attached. The commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node") was originally introduced to fix a regression caused by commit 7897b071ac3b4 ("net: macb: convert to phylink"), which caused the driver to misinterpret fixed-link nodes as PHY nodes. This resulted in warnings like: mdio_bus f0028000.ethernet-ffffffff: fixed-link has invalid PHY address mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 0 ... mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 31 This patch reworks the logic to avoid registering and allocation of the MDIO bus when: - The device tree contains a fixed-link node. - There is no "mdio" child node in the device tree. If a child node named "mdio" exists, the MDIO bus will be registered to support PHYs attached to the MACB's MDIO bus. Otherwise, with only a fixed-link, the MDIO bus is skipped. Tested on a sama5d35 based system with a ksz8863 switch attached to macb0. Fixes: 79540d133ed6 ("net: macb: Fix handling of fixed-link node") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241013052916.3115142-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()Wang Hai1-1/+2
The greth_start_xmit_gbit() returns NETDEV_TX_OK without freeing skb in case of skb->len being too long, add dev_kfree_skb() to fix it. Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20241012110434.49265-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: airoha: Implement BQL supportLorenzo Bianconi1-2/+6
Introduce BQL support in the airoha_eth driver reporting to the kernel info about tx hw DMA queues in order to avoid bufferbloat and keep the latency small. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20241012-en7581-bql-v2-1-4deb4efdb60b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15r8169: implement additional ethtool stats opsHeiner Kallweit1-0/+82
This adds support for ethtool standard statistics, and makes use of the extended hardware statistics being available from RTl8125. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/58e0da73-a7dd-4be3-82ae-d5b3f9069bde@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: gianfar: Use __be64 * to store pointers to big endian valuesSimon Horman1-3/+4
Timestamp values are read using pointers to 64-bit big endian values. But the type of these pointers is u64 *, host byte order. Use __be64 * instead. Flagged by Sparse: .../gianfar.c:2212:60: warning: cast to restricted __be64 .../gianfar.c:2475:53: warning: cast to restricted __be64 Introduced by commit cc772ab7cdca ("gianfar: Add hardware RX timestamping support"). Compile tested only. No functional change intended. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20241011-gianfar-be64-v1-1-a77ebe972176@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15octeontx2-af: Fix potential integer overflows on integer shiftsColin Ian King1-2/+2
The left shift int 32 bit integer constants 1 is evaluated using 32 bit arithmetic and then assigned to a 64 bit unsigned integer. In the case where the shift is 32 or more this can lead to an overflow. Avoid this by shifting using the BIT_ULL macro instead. Fixes: 019aba04f08c ("octeontx2-af: Modify SMQ flush sequence to drop packets") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/20241010154519.768785-1-colin.i.king@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15net: ethernet: ti: am65-cpsw: Enable USXGMII mode for J7200 CPSW5GSiddharth Vadapalli1-1/+2
TI's J7200 SoC supports USXGMII mode. Add USXGMII mode to the extra_modes member of the J7200 SoC data. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20241010150543.2620448-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15net: stmmac: dwmac-tegra: Fix link bring-up sequenceParitosh Dixit1-2/+12
The Tegra MGBE driver sometimes fails to initialize, reporting the following error, and as a result, it is unable to acquire an IP address with DHCP: tegra-mgbe 6800000.ethernet: timeout waiting for link to become ready As per the recommendation from the Tegra hardware design team, fix this issue by: - clearing the PHY_RDY bit before setting the CDR_RESET bit and then setting PHY_RDY bit before clearing CDR_RESET bit. This ensures valid data is present at UPHY RX inputs before starting the CDR lock. - adding the required delays when bringing up the UPHY lane. Note we need to use delays here because there is no alternative, such as polling, for these cases. Using the usleep_range() instead of ndelay() as sleeping is preferred over busy wait loop. Without this change we would see link failures on boot sometimes as often as 1 in 5 boots. With this fix we have not observed any failures in over 1000 boots. Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") Signed-off-by: Paritosh Dixit <paritoshd@nvidia.com> Link: https://patch.msgid.link/20241010142908.602712-1-paritoshd@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15net: mtk_eth_soc: use ethtool_putsRosen Penev1-4/+2
Allows simplifying get_strings and avoids manual pointer manipulation. Tested on Belkin RT1800. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20241011200225.7403-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: mvneta: use ethtool_putsRosen Penev1-3/+1
Allows simplifying get_strings and avoids manual pointer manipulation. Tested on Turris Omnia. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20241011195955.7065-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15mlx4: Add support for persistent NAPI config to RX CQsJoe Damato1-1/+2
Use netif_napi_add_config to assign persistent per-NAPI config when initializing RX CQ NAPIs. Presently, struct napi_config only has support for two fields used for RX, so there is no need to support them with TX CQs, yet. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-10-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15mlx5: Add support for persistent NAPI configJoe Damato1-1/+1
Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-9-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15bnxt: Add support for persistent NAPI configJoe Damato1-1/+2
Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-8-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15tg3: Address byte-order miss-matchesSimon Horman1-8/+14
Address byte-order miss-matches flagged by Sparse. In tg3_load_firmware_cpu() and tg3_get_device_address() this is done using appropriate types to store big endian values. In the cases of tg3_test_nvram(), where buf is an array which contains values of several different types, cast to __le32 before converting values to host byte order. Reported by Sparse as: .../tg3.c:3745:34: warning: cast to restricted __be32 .../tg3.c:13096:21: warning: cast to restricted __le32 .../tg3.c:13096:21: warning: cast from restricted __be32 .../tg3.c:13101:21: warning: cast to restricted __le32 .../tg3.c:13101:21: warning: cast from restricted __be32 .../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types) .../tg3.c:17070:63: expected restricted __be32 [usertype] *val .../tg3.c:17070:63: got unsigned int * dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types) .../tg3.c:17071:63: expected restricted __be32 [usertype] *val .../tg3.c:17071:63: got unsigned int * Also, address white-space issues on lines modified for the above. And, for consistency, lines adjacent to them. Compile tested only. No functional change intended. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-tg3-sparse-v1-1-6af38a7bf4ff@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: lan743x: Remove duplicate checkJinjie Ruan1-21/+14
Since timespec64_valid() has been checked in higher layer pc_clock_settime(), the duplicate check in lan743x_ptpci_settime64() can be removed. Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20241009072302.1754567-3-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14net: ethernet: ti: cpsw_ale: Remove unused accessor functionsSimon Horman1-9/+21
W=1 builds flag that some accessor functions for ALE fields are unused. Address this by splitting up the macros used to define these accessors to allow only those that are used to be declared. The warnings are verbose, but for example, the mcast_state case is flagged by clang-18 as: .../cpsw_ale.c:220:1: warning: unused function 'cpsw_ale_get_mcast_state' [-Wunused-function] 220 | DEFINE_ALE_FIELD(mcast_state, 62, 2) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .../cpsw_ale.c:145:19: note: expanded from macro 'DEFINE_ALE_FIELD' 145 | static inline int cpsw_ale_get_##name(u32 *ale_entry) \ | ^~~~~~~~~~~~~~~~~~~ <scratch space>:196:1: note: expanded from here 196 | cpsw_ale_get_mcast_state | ^~~~~~~~~~~~~~~~~~~~~~~~ Compile tested only. No functional change intended. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-14net: ethernet: ti: am65-cpsw: Use tstats instead of open coded versionSimon Horman2-93/+8
Make use of struct pcpu_sw_netstats and related helpers to handle existing per-cpu stats for this driver - the exact same counters are maintained. A side effect of this change is to address __percpu warnings flagged by Sparse: .../am65-cpsw-nuss.c:2658:55: warning: incorrect type in initializer (different address spaces) .../am65-cpsw-nuss.c:2658:55: expected struct am65_cpsw_ndev_stats [noderef] __percpu *stats .../am65-cpsw-nuss.c:2658:55: got void *data .../am65-cpsw-nuss.c:2781:15: warning: incorrect type in argument 3 (different address spaces) .../am65-cpsw-nuss.c:2781:15: expected void *data .../am65-cpsw-nuss.c:2781:15: got struct am65_cpsw_ndev_stats [noderef] __percpu *stats Compile tested only. No functional change intended. Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/all/20240911170643.7ecb1bbb@kernel.org/ Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-14net: ethernet: ti: am65-cpsw: Use __be64 type for id_tempSimon Horman1-1/+1
The id_temp local variable in am65_cpsw_nuss_probe() is used to hold a 64-bit big-endian value as it is assigned using cpu_to_be64(). It is read using memcpy(), where it is written as an identifier into a byte-array. So this can also be treated as big endian. As it's type is currently host byte order (u64), sparse flags an endian mismatch when compiling for little-endian systems: .../am65-cpsw-nuss.c:3454:17: warning: incorrect type in assignment (different base types) .../am65-cpsw-nuss.c:3454:17: expected unsigned long long [usertype] id_temp .../am65-cpsw-nuss.c:3454:17: got restricted __be64 [usertype] Address this by using __be64 as the type of id_temp. No functional change intended. Compile tested only. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-14r8169: enable SG/TSO on selected chip versions per defaultHeiner Kallweit1-5/+11
Due to problem reports in the past SG and TSO/TSO6 are disabled per default. It's not fully clear which chip versions are affected, so we may impact also users of unaffected chip versions, unless they know how to use ethtool for enabling SG/TSO/TSO6. Vendor drivers r8168/r8125 enable SG/TSO/TSO6 for selected chip versions per default, I'd interpret this as confirmation that these chip versions are unaffected. So let's do the same here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-12net: bcmasp: enable SW timestampingJustin Chen2-0/+4
Add skb_tx_timestamp() call and enable support for SW timestamping. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241010221506.802730-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: broadcom: remove select MII from brcmstb Ethernet driversJustin Chen1-3/+0
The MII driver isn't used by brcmstb Ethernet drivers. Remove it from the BCMASP, GENET, and SYSTEMPORT drivers. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241010191332.1074642-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: enetc: disable NAPI after all rings are disabledWei Fang1-6/+6
When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC on LS1028A, it was found that if the command was re-run multiple times, Rx could not receive the frames, and the result of xdp-bench showed that the rx rate was 0. root@ls1028ardb:~# ./xdp-bench tx eno0 Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc) Summary 2046 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s By observing the Rx PIR and CIR registers, CIR is always 0x7FF and PIR is always 0x7FE, which means that the Rx ring is full and can no longer accommodate other Rx frames. Therefore, the problem is caused by the Rx BD ring not being cleaned up. Further analysis of the code revealed that the Rx BD ring will only be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. Therefore, some debug logs were added to the driver and the current values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx BD ring was full. The logs are as follows. [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140 [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110 [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110 From the results, the max value of xdp_tx_in_flight has reached 2140. However, the size of the Rx BD ring is only 2048. So xdp_tx_in_flight did not drop to 0 after enetc_stop() is called and the driver does not clear it. The root cause is that NAPI is disabled too aggressively, without having waited for the pending XDP_TX frames to be transmitted, and their buffers recycled, so that xdp_tx_in_flight cannot naturally drop to 0. Later, enetc_free_tx_ring() does free those stale, unsent XDP_TX packets, but it is not coded up to also reset xdp_tx_in_flight, hence the manifestation of the bug. One option would be to cover this extra condition in enetc_free_tx_ring(), but now that the ENETC_TX_DOWN exists, we have created a window at the beginning of enetc_stop() where NAPI can still be scheduled, but any concurrent enqueue will be blocked. Therefore, enetc_wait_bdrs() and enetc_disable_tx_bdrs() can be called with NAPI still scheduled, and it is guaranteed that this will not wait indefinitely, but instead give us an indication that the pending TX frames have orderly dropped to zero. Only then should we call napi_disable(). This way, enetc_free_tx_ring() becomes entirely redundant and can be dropped as part of subsequent cleanup. The change also refactors enetc_start() so that it looks like the mirror opposite procedure of enetc_stop(). Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-5-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: enetc: disable Tx BD rings after they are emptyWei Fang1-10/+26
The Tx BD rings are disabled first in enetc_stop() and the driver waits for them to become empty. This operation is not safe while the ring is actively transmitting frames, and will cause the ring to not be empty and hardware exception. As described in the NETC block guide, software should only disable an active Tx ring after all pending ring entries have been consumed (i.e. when PI = CI). Disabling a transmit ring that is actively processing BDs risks a HW-SW race hazard whereby a hardware resource becomes assigned to work on one or more ring entries only to have those entries be removed due to the ring becoming disabled. When testing XDP_REDIRECT feautre, although all frames were blocked from being put into Tx rings during ring reconfiguration, the similar warning log was still encountered: fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear The reason is that when there are still unsent frames in the Tx ring, disabling the Tx ring causes the remaining frames to be unable to be sent out. And the Tx ring cannot be restored, which means that even if the xdp program is uninstalled, the Tx frames cannot be sent out anymore. Therefore, correct the operation order in enect_start() and enect_stop(). Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: enetc: block concurrent XDP transmissions during ring reconfigurationWei Fang2-0/+15
When testing the XDP_REDIRECT function on the LS1028A platform, we found a very reproducible issue that the Tx frames can no longer be sent out even if XDP_REDIRECT is turned off. Specifically, if there is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on, the console may display some warnings like "timeout for tx ring #6 clear", and all redirected frames will be dropped, the detailed log is as follows. root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2 Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc) [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear eno0->eno2 1420505 rx/s 1420590 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420590 drop/s 0 drv_err/s 15.71 bulk-avg eno0->eno2 1420484 rx/s 1420485 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420485 drop/s 0 drv_err/s 15.71 bulk-avg By analyzing the XDP_REDIRECT implementation of enetc driver, the driver will reconfigure Tx and Rx BD rings when a bpf program is installed or uninstalled, but there is no mechanisms to block the redirected frames when enetc driver reconfigures rings. Similarly, XDP_TX verdicts on received frames can also lead to frames being enqueued in the Tx rings. Because XDP ignores the state set by the netif_tx_wake_queue() API, so introduce the ENETC_TX_DOWN flag to suppress transmission of XDP frames. Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: enetc: remove xdp_drops statistic from enetc_xdp_drop()Wei Fang1-1/+1
The xdp_drops statistic indicates the number of XDP frames dropped in the Rx direction. However, enetc_xdp_drop() is also used in XDP_TX and XDP_REDIRECT actions. If frame loss occurs in these two actions, the frames loss count should not be included in xdp_drops, because there are already xdp_tx_drops and xdp_redirect_failures to count the frame loss of these two actions, so it's better to remove xdp_drops statistic from enetc_xdp_drop() and increase xdp_drops in XDP_DROP action. Fixes: 7ed2bc80074e ("net: enetc: add support for XDP_TX") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: sparx5: fix source port register when mirroringDaniel Machon1-6/+6
When port mirroring is added to a port, the bit position of the source port, needs to be written to the register ANA_AC_PROBE_PORT_CFG. This register is replicated for n_ports > 32, and therefore we need to derive the correct register from the port number. Before this patch, we wrongly calculate the register from portno / BITS_PER_BYTE, where the divisor ought to be 32, causing any port >=8 to be written to the wrong register. We fix this, by using do_div(), where the dividend is the register, the remainder is the bit position and the divisor is now 32. Fixes: 4e50d72b3b95 ("net: sparx5: add port mirroring implementation") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-mirroring-fix-v1-1-9ec962301989@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: emaclite: Adopt clock supportAbin Joseph1-0/+7
Adapt to use the clock framework. Add s_axi_aclk clock from the processor bus clock domain and make clk optional to keep DTB backward compatibility. Signed-off-by: Abin Joseph <abin.joseph@amd.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1728491303-1456171-4-git-send-email-radhey.shyam.pandey@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12net: emaclite: Replace alloc_etherdev() with devm_alloc_etherdev()Abin Joseph1-10/+4
Use device managed ethernet device allocation to simplify the error handling logic. No functional change. Signed-off-by: Abin Joseph <abin.joseph@amd.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1728491303-1456171-3-git-send-email-radhey.shyam.pandey@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11eth: remove the DLink/Sundance (ST201) driverJakub Kicinski3-2006/+0
Konstantin reports the maintainer's address bounces. There is no other maintainer and the driver is quite old. There is a good chance nobody is using this driver any more. Let's try to remove it completely, we can revert it back in if someone complains. Link: https://lore.kernel.org/20240925-bizarre-earwig-from-pluto-1484aa@lemu/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Denis Kirjanov <dkirjanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-11tg3: Link queues to NAPIsJoe Damato1-4/+35
Link queues to NAPIs using the netdev-genl API so this information is queryable. First, test with the default setting on my tg3 NIC at boot with 1 TX queue: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}] Now, adjust the number of TX queues to be 4 via ethtool: $ sudo ethtool -L eth0 tx 4 $ sudo ethtool -l eth0 | tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows the tg3 has renamed the IRQs to be combined: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Now query this via netlink to ensure the queues are linked properly to their NAPIs: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}] As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID 8960, and so on for each queue index up to 3. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11tg3: Link IRQs to NAPI instancesJoe Damato1-3/+5
Link IRQs to NAPI instances with netif_napi_set_irq. This information can be queried with the netdev-genl API. Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX queues. Compare the output of /proc/interrupts for my tg3 device with the output of netdev-genl after applying this patch: $ cat /proc/interrupts | grep eth0 343: [...] eth0-tx-0 344: [...] eth0-rx-1 345: [...] eth0-rx-2 346: [...] eth0-rx-3 347: [...] eth0-rx-4 As you can see above, tg3 has named the IRQs such that there is a dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8197, 'ifindex': 2, 'irq': 347}, {'id': 8196, 'ifindex': 2, 'irq': 346}, {'id': 8195, 'ifindex': 2, 'irq': 345}, {'id': 8194, 'ifindex': 2, 'irq': 344}, {'id': 8193, 'ifindex': 2, 'irq': 343}] Netlink displays the same IRQs as above, noting that each is mapped to a unique NAPI instance. Now, reconfigure the NIC to have 4 TX queues and 4 RX queues: $ sudo ethtool -L eth0 rx 4 tx 4 $ sudo ethtool -l eth0 | tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Examine /proc/interrupts once again, noting that tg3 will now rename the IRQs to suggest that they are combined tx and rx without allocating additional IRQs, so the total IRQ count in /proc/interrupts is unchanged: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Check the output from netlink again: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8973, 'ifindex': 2, 'irq': 347}, {'id': 8972, 'ifindex': 2, 'irq': 346}, {'id': 8971, 'ifindex': 2, 'irq': 345}, {'id': 8970, 'ifindex': 2, 'irq': 344}, {'id': 8969, 'ifindex': 2, 'irq': 343}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11r8169: remove original workaround for RTL8125 broken rx issueHeiner Kallweit1-4/+0
Now that we have b9c7ac4fe22c ("r8169: disable ALDPS per default for RTL8125"), the first attempt to fix the issue shouldn't be needed any longer. So let's effectively revert 621735f59064 ("r8169: fix rare issue with broken rx after link-down on RTL8125") and see whether anybody complains. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/382d8c88-cbce-400f-ad62-fda0181c7e38@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11r8169: don't apply UDP padding quirk on RTL8126AHeiner Kallweit1-4/+10
Vendor drivers r8125/r8126 indicate that this quirk isn't needed any longer for RTL8126A. Mimic this in r8169. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d1317187-aa81-4a69-b831-678436e4de62@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski34-117/+201
Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10Merge branch 'net-introduce-tx-h-w-shaping-api'Jakub Kicinski15-3/+726
Paolo Abeni says: ==================== net: introduce TX H/W shaping API We have a plurality of shaping-related drivers API, but none flexible enough to meet existing demand from vendors[1]. This series introduces new device APIs to configure in a flexible way TX H/W shaping. The new functionalities are exposed via a newly defined generic netlink interface and include introspection capabilities. Some self-tests are included, on top of a dummy netdevsim implementation. Finally a basic implementation for the iavf driver is provided. Some usage examples: * Configure shaping on a given queue: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do set --json '{"ifindex": '$IFINDEX', "shaper": {"handle": {"scope": "queue", "id":'$QUEUEID'}, "bw-max": 2000000}}' * Container B/W sharing The orchestration infrastructure wants to group the container-related queues under a RR scheduling and limit the aggregate bandwidth: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope":"node"}, "bw-max": 10000000}' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}} Q1 \ \ Q2 -- node 0 ------- netdev / (bw-max: 10M) Q3 / * Delegation A containers wants to limit the aggregate B/W bandwidth of 2 of the 3 queues it owns - the starting configuration is the one from the previous point: SPEC=Documentation/netlink/specs/net_shaper.yaml ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], "handle": {"scope": "node"}, "bw-max": 5000000 }' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}} Q1 -- node 1 --------\ / (bw-max: 5M) \ Q2 / node 0 ------- netdev /(bw-max: 10M) Q3 ------------------/ In a group operation, when prior to the op itself, the leaves have different parents, the user must specify the parent handle for the group. I.e., starting from the previous config: ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "bw-max": 3000000 }' Netlink error: Invalid argument nl_len = 96 (80) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'All the leaves shapers must have the same old parent'} ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "parent": {"scope": "node", "id": 1}, "bw-max": 3000000 } {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}} Q1 -- node 2 --- /(bw-max:3M)\ Q3 / \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) * Cleanup: Still starting from config 1To delete a single queue shaper ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID3'}}' Q1 -- node 2 --- (bw-max:3M)\ \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) Deleting a node shaper relinks all its leaves to the node's parent: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id":2}}' Q1 ---\ \ node 1----- \ / (bw-max: 5M)\ Q2----/ node 0 ------- netdev (bw-max: 10M) Deleting the last shaper under a node shaper deletes the node, too: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID1'}}' ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID2'}}' ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 1}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} Such delete recurses on parents that are left over with no leaves: ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 0}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com ==================== Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10iavf: add support to exchange qos capabilitiesSudheer Mogilappagari3-3/+150
During driver initialization VF determines QOS capability is allowed by PF and receives QOS parameters. After which quanta size for queues is configured which is not configurable and is set to 1KB currently. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/72cbeb9c88d40e557053c57d7531c96bed490576.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10iavf: Add net_shaper_ops supportSudheer Mogilappagari5-1/+182
Implement net_shaper_ops support for IAVF. This enables configuration of rate limiting on per queue basis. Customer intends to enforce bandwidth limit on Tx traffic steered to the queue by configuring rate limits on the queue. To set rate limiting for a queue, update shaper object of given queues in driver and send VIRTCHNL_OP_CONFIG_QUEUE_BW to PF to update HW configuration. Deleting shaper configured for queue is nothing but configuring shaper with bw_max 0. The PF restores the default rate limiting config when bw_max is zero. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/5a882cb51998c4c2c3d21fed521498eba1c8f079.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10ice: Support VF queue rate limit and quanta size configurationWenjun Wu10-0/+395
Add support to configure VF queue rate limit and quanta size. For quanta size configuration, the quanta profiles are divided evenly by PF numbers. For each port, the first quanta profile is reserved for default. When VF is asked to set queue quanta size, PF will search for an available profile, change the fields and assigned this profile to the queue. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/fddefc2c1ec3ab32b241ce444af401da19e834dd.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net/mlx5: Add support check for TSAR types in QoS schedulingCarolina Jubran4-2/+34
Introduce a new function, mlx5_qos_tsar_type_supported(), to handle the validation of TSAR types within QoS scheduling contexts. Refactor the existing code to use this new function, replacing direct checks for TSAR type support in the NIC scheduling hierarchy. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: Unify QoS element type checks across NIC and E-SwitchCarolina Jubran4-23/+44
Refactor the QoS element type support check by introducing a new function, mlx5_qos_element_type_supported(), which handles element type validation for both NIC and E-Switch schedulers. This change removes the redundant esw_qos_element_type_supported() function and unifies the element type checks into a single implementation. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Refactor locking to a qos domain mutexCosmin Ratiu5-39/+74
E-Switch qos changes used the esw state_lock to serialize qos changes. With the introduction of cross-esw scheduling, multiple E-Switches might be involved in a qos operation, so prepare for that by switching locking to use a qos domain mutex. Add three helper functions: - esw_qos_lock - esw_qos_unlock - esw_assert_qos_lock_held Convert existing direct lock/unlock/lockdep calls to them. Also call esw_assert_qos_lock_held in a couple more places. mlx5_esw_qos_set_vport_rate expected to be called with the esw state_lock already held. Change it to instead acquire the qos lock directly. mlx5_eswitch_get_vport_config also accessed qos properties with the esw state lock. Introduce a new function mlx5_esw_qos_get_vport_rate to access those with the correct lock and change get_vport_config to use it. Finally, mlx5_vport_disable is called from the cleanup path with the esw state_lock held, so have it additionally acquire the qos lock to make sure there are no races. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Store rate groups in a qos domainCosmin Ratiu4-11/+65
Groups are currently maintained as a list in their corresponding eswitch, protected by the esw state_lock. The upcoming cross-eswitch scheduling feature cannot work with this approach, as it would require acquiring multiple eswitch locks (in the correct order) in order to maintain group membership. This commit moves the rate groups into a new 'qos domain' struct and adds explicit qos init/cleanup steps to the eswitch init/cleanup. Upcoming patches will expand the qos domain struct and allow it to be shared between eswitches. For now, qos domains are private to each esw so there's only an extra indirection. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Rename rate group 'list' as 'parent_entry'Cosmin Ratiu1-5/+5
'list' is not very descriptive, I prefer list membership to clearly specify which list the entry belongs to. This commit renames the list entry into the esw groups list as 'parent_entry' to make the code more readable. This is a no-op change. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Add an explicit 'dev' to vport trace callsCosmin Ratiu2-13/+16
vport qos trace calls used vport->dev implicitly as the device to which the command was sent (and thus the device logged in traces). But that will no longer be the case for cross-esw scheduling, where the commands have to be sent to the group esw device instead. This commit corrects that. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Store the eswitch in a mlx5_esw_rate_groupCosmin Ratiu1-63/+52
The rate groups are about to be moved out of eswitches, so store a reference to the eswitch they belong to so things can still work later. This allows dropping the esw parameter from a couple of functions and simplifying some of the code. Use this opportunity to make sure that vport scheduling element commands are always sent to the group eswitch, because that will be relevant for cross-esw scheduling. For now though, the eswitches are not different. There is no functionality change here. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Drop 'esw' param from vport qos functionsCosmin Ratiu7-60/+57
The vport has a pointer to its own eswitch in vport->dev->priv.eswitch, so passing the same eswitch as a parameter to the various functions manipulating vport qos is superfluous at best and prone to errors at worst. More importantly, with the upcoming cross-esw scheduling changes, the eswitch that should receive the various scheduling element commands is NOT the same as the vport's eswitch, so the current code's assumptions will break. To avoid confusion and bugs, this commit drops the 'esw' parameter from all vport qos functions and uses the vport's own eswitch pointer instead. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>