summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-04-15Merge branch 'eth-fbnic-extend-hardware-stats-coverage'Paolo Abeni8-10/+685
Mohsin Bashir says: ==================== eth: fbnic: extend hardware stats coverage This patch series extends the coverage for hardware stats reported via `ethtool -S`, queue API, and rtnl link stats. The patchset is organized as follow: - The first patch adds locking support to protect hardware stats. - The second patch provides coverage to the hardware queue stats. - The third patch covers the RX buffer related stats. - The fourth patch covers the TMI (TX MAC Interface) stats. - The last patch cover the TTI (TX TEI Interface) stats. ==================== Link: https://patch.msgid.link/20250410070859.4160768-1-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15eth: fbnic: add support for TTI HW statsMohsin Bashir6-1/+99
Add coverage for the TX Extension (TEI) Interface (TTI) stats. We are tracking packets and control message drops because of credit exhaustion on the TX interface. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250410070859.4160768-6-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15eth: fbnic: add support for TMI statsMohsin Bashir6-0/+60
This patch add coverage for TMI stats including PTP stats and drop stats. PTP stats include illegal requests, bad timestamp and good timestamps. The bad timestamp and illegal request counters are reported under as `error` via `ethtool -T` Both these counters are individually being reported via `ethtool -S` The good timestamp stats are being reported as `pkts` via `ethtool -T` ethtool -S eth0 | grep "ptp" ptp_illegal_req: 0 ptp_good_ts: 0 ptp_bad_ts: 0 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250410070859.4160768-5-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15eth: fbnic: add coverage for RXB statsMohsin Bashir6-1/+356
This patch provides coverage to the RXB (RX Buffer) stats. RXB stats are divided into 3 sections: RXB enqueue, RXB FIFO, and RXB dequeue stats. The RXB enqueue/dequeue stats are indexed from 0-3 and cater for the input/output counters whereas, the RXB fifo stats are indexed from 0-7. The RXB also supports pause frame stats counters which we are leaving for a later patch. ethtool -S eth0 | grep rxb rxb_integrity_err0: 0 rxb_mac_err0: 0 rxb_parser_err0: 0 rxb_frm_err0: 0 rxb_drbo0_frames: 1433543 rxb_drbo0_bytes: 775949081 --- --- rxb_intf3_frames: 1195711 rxb_intf3_bytes: 739650210 rxb_pbuf3_frames: 1195711 rxb_pbuf3_bytes: 765948092 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250410070859.4160768-4-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15eth: fbnic: add coverage for hw queue statsMohsin Bashir6-8/+156
This patch provides support for hardware queue stats and covers packet errors for RX-DMA engine, RCQ drops and BDQ drops. The packet errors are also aggregated with the `rx_errors` stats in the `rtnl_link_stats` as well as with the `hw_drops` in the queue API. The RCQ and BDQ drops are aggregated with `rx_over_errors` in the `rtnl_link_stats` as well as with the `hw_drop_overruns` in the queue API. ethtool -S eth0 | grep -E 'rde' rde_0_pkt_err: 0 rde_0_pkt_cq_drop: 0 rde_0_pkt_bdq_drop: 0 --- --- rde_127_pkt_err: 0 rde_127_pkt_cq_drop: 0 rde_127_pkt_bdq_drop: 0 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250410070859.4160768-3-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15eth: fbnic: add locking support for hw statsMohsin Bashir3-3/+17
This patch adds lock protection for the hardware statistics for fbnic. The hardware statistics access via ndo_get_stats64 is not protected by the rtnl_lock(). Since these stats can be accessed from different places in the code such as service task, ethtool, Q-API, and net_device_ops, a lock-less approach can lead to races. Note that this patch is not a fix rather, just a prep for the subsequent changes in this series. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250410070859.4160768-2-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15net: mdio: Add RTL9300 MDIO driverChris Packham3-0/+530
Add a driver for the MDIO controller on the RTL9300 family of Ethernet switches with integrated SoC. There are 4 physical SMI interfaces on the RTL9300 however access is done using the switch ports. The driver takes the MDIO bus hierarchy from the DTS and uses this to configure the switch ports so they are associated with the correct PHY. This mapping is also used when dealing with software requests from phylib. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250409231554.3943115-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15Merge branch 'net-stmmac-qcom-ethqos-simplifications'Jakub Kicinski1-28/+15
Russell King says: ==================== net: stmmac: qcom-ethqos: simplifications Remove unnecessary code from the qcom-ethqos glue driver. Start by consistently using -> serdes_speed to set the speed of the serdes PHY rather than sometimes using ->serdes_speed and sometimes using ->speed. This then allows the removal of ->speed in the second patch. There is no need to set the maximum speed just because we're using 2500BASE-X - phylink already knows that 2500BASE-X can't support faster speeds. This then makes qcom_ethqos_speed_mode_2500() redundant as it's setting the interface mode to the value that was determined in the switch statement that already determined that the interface mode had this value. Not tested on hardware. ==================== Link: https://patch.msgid.link/Z_p0LzY2_HFupWK0@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: qcom-ethqos: remove speed_mode_2500() methodRussell King (Oracle)1-9/+0
qcom-ethqos doesn't need to implement the speed_mode_2500() method as it is only setting priv->plat->phy_interface to 2500BASE-X, which is already a pre-condition for assigning speed_mode_2500 in qcom_ethqos_probe(). So, qcom_ethqos_speed_mode_2500() has no effect. Remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1u3bYa-000EcW-H1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: qcom-ethqos: remove unnecessary setting max_speedRussell King (Oracle)1-1/+0
Phylink will already limit the MAC speed according to the interface, so if 2500BASE-X is selected, the maximum speed will be 2.5G. It is, therefore, not necessary to set a speed limit. Remove setting plat_dat->max_speed from this glue driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1u3bYV-000EcQ-Cv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: qcom-ethqos: remove ethqos->speedRussell King (Oracle)1-17/+14
Rather than ethqos_fix_mac_speed() storing the speed in struct qcom_ethqos and then functions that are only called from here reading that speed, pass the speed to the called functions instead. This removes all readers of this struct member, which then allows the removal of the two places that set its value and the struct member. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1u3bYQ-000EcK-9K@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: qcom-ethqos: set serdes speed using serdes_speedRussell King (Oracle)1-1/+1
ethqos->serdes_speed represents the current speed the serdes was configured for, which should be the same as ethqos->speed. Since we wish to remove ethqos->speed to simplify the code, switch to using the serdes_speed instead. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1u3bYL-000EcE-5c@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15Merge branch 'rxrpc-afs-add-afs-gssapi-security-class-to-af_rxrpc-and-kafs'Jakub Kicinski40-245/+4272
David Howells says: ==================== rxrpc, afs: Add AFS GSSAPI security class to AF_RXRPC and kafs Here's a set of patches to add basic support for the AFS GSSAPI security class to AF_RXRPC and kafs. It provides transport security for keys that match the security index 6 (YFS) for connections to the AFS fileserver and VL server. Note that security index 4 (OpenAFS) can also be supported using this, but it needs more work as it's slightly different. The patches also provide the ability to secure the callback channel - connections from the fileserver back to the client that are used to pass file change notifications, amongst other things. When challenged by the fileserver, kafs will generate a token specific to that server and include it in the RESPONSE packet as the appdata. The server then extracts this and uses it to send callback RPC calls back to the client. It can also be used to provide transport security on the callback channel, but a further set of patches is required to provide the token and key to set that up when the client responds to the fileserver's challenge. This makes use of the previously added crypto-krb5 library that is now upstream (last commit fc0cf10c04f4). This series of patches consist of the following parts: (0) Update kdoc comments to remove some kdoc builder warnings. (1) Push reponding to CHALLENGE packets over to recvmsg() or the kernel equivalent so that the application layer can include user-defined information in the RESPONSE packet. In a follow-up patch set, this will allow the callback channel to be secured by the AFS filesystem. (2) Add the AF_RXRPC RxGK security class that uses a key obtained from the AFS GSS security service to do Kerberos 5-based encryption instead of pcbc(fcrypt) and pcbc(des). (3) Add support for callback channel encryption in kafs. (4) Provide the test rxperf server module with some fixed krb5 keys. ==================== Link: https://patch.msgid.link/20250411095303.2316168-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: rxperf: Add test RxGK server keysDavid Howells1-3/+65
Add RxGK server keys of bytes containing { 0, 1, 2, 3, 4, ... } to the server keyring for the rxperf test server. This allows the rxperf test client to connect to it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-15-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Add more CHALLENGE/RESPONSE packet tracingDavid Howells4-1/+85
Add more tracing for CHALLENGE and RESPONSE packets. Currently, rxrpc only has client-relevant tracepoints (rx_challenge and tx_response), but add the server-side ones too. Further, record the service ID in the rx_challenge tracepoint as well. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-14-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15afs: Use rxgk RESPONSE to pass token for callback channelDavid Howells5-1/+276
Implement in kafs the hook for adding appdata into a RESPONSE packet generated in response to an RxGK CHALLENGE packet, and include the key for securing the callback channel so that notifications from the fileserver get encrypted. This will be necessary when more complex notifications are used that convey changed data around. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-13-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Display security params in the afs_cb_call tracepointDavid Howells8-2/+41
Make the afs_cb_call tracepoint display some security parameters to make debugging easier. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-12-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Allow the app to store private data on peer structsDavid Howells1-2/+2
Provide a way for the application (e.g. the afs filesystem) to store private data on the rxrpc_peer structs for later retrieval via the call object. This will allow afs to store a pointer to the afs_server object on the rxrpc_peer struct, thereby obviating the need for afs to keep lookup tables by which it can associate an incoming call with server that transmitted it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-11-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: rxgk: Implement connection rekeyingDavid Howells4-7/+181
Implement rekeying of connections with the RxGK security class. This involves regenerating the keys with a different key number as part of the input data after a certain amount of time or a certain amount of bytes encrypted. Rekeying may be triggered by either end. The LSW of the key number is inserted into the security-specific field in the RX header, and we try and expand it to 32-bits to make it last longer. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: Chuck Lever <chuck.lever@oracle.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-10-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)David Howells12-5/+1694
Implement the basic parts of the yfs-rxgk security class (security index 6) to support GSSAPI-negotiated security. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: Chuck Lever <chuck.lever@oracle.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-9-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: rxgk: Provide infrastructure and key derivationDavid Howells5-1/+364
Provide some infrastructure for implementing the RxGK transport security class: (1) A definition of an encoding type, including: - Relevant crypto-layer names - Lengths of the crypto keys and checksums involved - Crypto functions specific to the encoding type - Crypto scheme used for that type (2) A definition of a crypto scheme, including: - Underlying crypto handlers - The pseudo-random function, PRF, used in base key derivation - Functions for deriving usage keys Kc, Ke and Ki - Functions for en/decrypting parts of an sk_buff (3) A key context, with the usage keys required for a derivative of a transport key for a specific key number. This includes keys for securing packets for transmission, extracting received packets and dealing with response packets. (3) A function to look up an encoding type by number. (4) A function to set up a key context and derive the keys. (5) A function to set up the keys required to extract the ticket obtained from the GSS negotiation in the server. (6) Miscellaneous functions for context handling. The keys and key derivation functions are described in: tools.ietf.org/html/draft-wilkinson-afs3-rxgk-11 Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: Chuck Lever <chuck.lever@oracle.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-8-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Add YFS RxGK (GSSAPI) security classDavid Howells2-0/+202
Add support for the YFS-variant RxGK security class to support GSSAPI-derived authentication. This also allows the use of better crypto over the rxkad security class. The key payload is XDR encoded of the form: typedef int64_t opr_time; const AFSTOKEN_RK_TIX_MAX = 12000; /* Matches entry in rxkad.h */ struct token_rxkad { afs_int32 viceid; afs_int32 kvno; afs_int64 key; afs_int32 begintime; afs_int32 endtime; afs_int32 primary_flag; opaque ticket<AFSTOKEN_RK_TIX_MAX>; }; struct token_rxgk { opr_time begintime; opr_time endtime; afs_int64 level; afs_int64 lifetime; afs_int64 bytelife; afs_int64 enctype; opaque key<>; opaque ticket<>; }; const AFSTOKEN_UNION_NOAUTH = 0; const AFSTOKEN_UNION_KAD = 2; const AFSTOKEN_UNION_YFSGK = 6; union ktc_tokenUnion switch (afs_int32 type) { case AFSTOKEN_UNION_KAD: token_rxkad kad; case AFSTOKEN_UNION_YFSGK: token_rxgk gk; }; const AFSTOKEN_LENGTH_MAX = 16384; typedef opaque token_opaque<AFSTOKEN_LENGTH_MAX>; const AFSTOKEN_MAX = 8; const AFSTOKEN_CELL_MAX = 64; struct ktc_setTokenData { afs_int32 flags; string cell<AFSTOKEN_CELL_MAX>; token_opaque tokens<AFSTOKEN_MAX>; }; The parser for the basic token struct is already present, as is the rxkad token type. This adds a parser for the rxgk token type. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: Chuck Lever <chuck.lever@oracle.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Add the security index for yfs-rxgkDavid Howells3-0/+63
Add the security index and abort codes for the YFS variant of rxgk. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20250411095303.2316168-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSEDavid Howells23-165/+1188
Allow the app to request that CHALLENGEs be passed to it through an out-of-band queue that allows recvmsg() to pick it up so that the app can add data to it with sendmsg(). This will allow the application (AFS or userspace) to interact with the process if it wants to and put values into user-defined fields. This will be used by AFS when talking to a fileserver to supply that fileserver with a crypto key by which callback RPCs can be encrypted (ie. notifications from the fileserver to the client). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Remove some socket lock acquire/release annotationsDavid Howells3-4/+5
Remove some socket lock acquire/release annotations as lock_sock() and release_sock() don't have them and so the checker gets confused. Removing all of them, however, causes warnings about "context imbalance" and "wrong count at exit" to occur instead. Probably lock_sock() and release_sock() should have annotations on indicating their taking of sk_lock - there is a dep_map in socket_lock_t, but I don't know if that matters to the static checker. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: Pull out certain app callback funcs into an ops tableDavid Howells6-48/+55
A number of functions separately furnish an AF_RXRPC socket with callback function pointers into a kernel app (such as the AFS filesystem) that is using it. Replace most of these with an ops table for the entire socket. This makes it easier to add more callback functions. Note that the call incoming data processing callback is retaind as that gets set to different things, depending on the type of op. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15rxrpc: kdoc: Update function descriptions and add link from rxrpc.rstDavid Howells7-16/+61
Update the kerneldoc function descriptions to add "Return:" sections for AF_RXRPC exported functions that have return values to stop the kdoc builder from throwing warnings. Also add links from the rxrpc.rst API doc to add a function API reference at the end. (Note that the API doc really needs updating, but that's beyond this patchset). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15Merge branch 'net-mlx5-hws-refactor-action-ste-handling'Jakub Kicinski20-969/+972
Tariq Toukan says: ==================== net/mlx5: HWS, Refactor action STE handling This patch series by Vlad refactors how action STEs are handled for hardware steering. Definitions ---------- * STE (Steering Table Entry): a building block for steering rules. Simple rules consist of a single STE that specifies both the match value and what actions to do. For more complex rules we have one or more match STEs that point to one or more action STEs. It is these action STEs which this patch series is primarily concerned with. * RTC (Rule Table Context): a table that contains STEs. A matcher currently consists of a match RTC and, if necessary, an action RTC. This patch series decouples action RTCs from matchers and moves action RTCs to a central pool. * Matcher: a logical container for steering rules. While the items above describe hardware concepts, a matcher is purely a software construct. Current situation ----------------- As mentioned above, a matcher currently consists of a match RTC (or more, in case of complex matchers) and zero or one action STCs. An action STC is only allocated if the matcher contains sufficiently complicated action templates, or many actions. When adding a rule, we decide based on its action template whether it requires action STEs. If yes, we allocate the required number of action STEs from the matcher's action STE. When updating a rule, we need to prevent the rule ever being in an invalid state. So we need to allocate and write new action STEs first, then update the match STE to point to them, and finally release the old action STEs. So there is a state when a rule needs double the action STEs it normally uses. Thus, for a given matcher of log_sz=N, log_action_ste_sz=A, the action STC log_size is (N + A + 1). We need enough space to hold all the rules' action STEs, and effectively double that space to account for the not very common case of rules being updated. We could manage with much fewer extra action STEs, but RTCs are allocated in powers of two. This results in effective utilization of action RTCs of 50%, outside rule update cases. This is further complicated when resizing matchers. To avoid updating all the rules to point to new match STEs, we keep existing action RTCs around as resize_data, and only free them when the matcher is freed. Action STE pool --------------- This patch series decouples action RTCs from matchers by creating a per-queue pool. When a rule needs to allocate action STEs it does so from the pool, creating a new RTC if needed. During update two sets of action STEs are in use, possibly from different RTCs. The pool is sharded per-queue to avoid lock contention. Each per-queue pool consists of 3 elements, corresponding to rx-only, tx-only and rx-and-tx use cases. The series takes this approach because rules that are bidirectional require that their action STEs have the same index in the rx- and tx-RTCs, and using a single RTC would result in unidirectional rules wasting the STEs for the unused direction. Pool elements, in turn, consist of a list of RTCs. The driver progressively allocates larger RTCs as they are needed to amortize the cost of allocation. Allocation of elements (STEs) inside RTCs is modelled by an existing mechanism, somewhat confusingly also known as a pool. The first few patches in the series refactor this abstraction to simplify it and adapt it to the new schema. Finally, this series implements periodic cleanup of unused action RTCs as a new feature. Previously, once a matcher allocated an action RTC, it would only be freed when the matcher is freed. This resulted in a lot of wasted memory for matchers that had previously grown, but were now mostly unused. Conversely, action STE pools have a timestamp of when they were last used. A cleanup routine periodically checks all pools. If a pool's last usage was too far in the past, it is destroyed. Benchmarks ---------- The test module creates a batch of (1 << 18) rules per queue and then deletes them, in a loop. The rules are complex enough to require two action STEs per rule. Each queue is manipulated from a separate kernel workqueue, so there is a 1:1 correspondence between threads and queues. There are sleep statements between insert and delete batches so that memory usage can be evaluated using `free -m`. The numbers below are the diff between base memory usage (without the mlx5 module inserted) and peak usage while running a test. The values are rounded to the nearest hundred megabytes. The `queues` column lists how many queues the test used. queues mem_before mem_after 1 1300M 800M 4 4000M 2300M 8 7300M 3300M Across all of the tests, insertion and deletion rates are the same before and after these patches. Summary of the patches ---------------------- * Patch 1: Fix matcher action template attach to avoid overrunning the buffer and correctly report errors. * Patches 2-7: Cleanup the existing pool abstraction. Clarify semantics, and use cases, simplify API and callers. * Patch 8: Implement the new action STE pool structure. * Patch 9: Use the action STE pool when manipulating rules. * Patch 10: Remove action RTC from matcher. * Patch 11: Add logic to periodically check and free unused action RTCs. * Patch 12: Export action STE tables in debugfs for our dump tool. ==================== Link: https://patch.msgid.link/1744312662-356571-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Export action STE tables to debugfsVlad Dogaru2-1/+37
Introduce a new type of dump object and dump all action STE tables, along with information on their RTCs and STEs. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-13-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Free unused action STE tablesVlad Dogaru3-4/+96
Periodically check for unused action STE tables and free their associated resources. In order to do this safely, add a per-queue lock to synchronize the garbage collect work with regular operations on steering rules. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-12-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Cleanup matcher action STE tableVlad Dogaru8-401/+87
Remove the matcher action STE implementation now that the code uses per-queue action STE pools. This also allows simplifying matcher code because it is now only handling a single type of RTC/STE. The matcher resize data is also going away. Matchers were saving old action STE data because the rules still used it, but now that data lives in the action STE pool and is no longer coupled to a matcher. Furthermore, matchers no longer need to rehash a due to action template addition. If a new action template needs more action STEs, we simply update the matcher's num_of_action_stes and future rules will allocate the correct number. Existing rules are unaffected by such an operation and can continue to use their existing action STEs. The range action was using the matcher action STE implementation, but there was no reason to do this other than the container fitting the purpose. Extract that information to a separate structure. Finally, stop dumping per-matcher information about action RTCs, because they no longer exist. A later patch in this series will add support for dumping action STE pools. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-11-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Use the new action STE poolVlad Dogaru2-51/+30
Use the central action STE pool when creating / updating rules. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-10-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Implement action STE poolVlad Dogaru7-1/+457
Implement a per-queue pool of action STEs that match STEs can link to, regardless of matcher. The code relies on hints to optimize whether a given rule is added to rx-only, tx-only or both. Correspondingly, action STEs need to be added to different RTC for ingress or egress paths. For rx-and-tx rules, the current rule implementation dictates that the offsets for a given rule must be the same in both RTCs. To avoid wasting STEs, each action STE pool element holds 3 pools: rx-only, tx-only, and rx-and-tx, corresponding to the possible values of the pool optimization enum. The implementation then chooses at rule creation / update which of these elements to allocate from. Each element holds multiple action STE tables, which wrap an RTC, an STE range, the logic to buddy-allocate offsets from the range, and an STC that allows match STEs to point to this table. When allocating offsets from an element, we iterate through available action STE tables and, if needed, create a new table. Similar to the previous implementation, this iteration does not free any resources. This is implemented in a subsequent patch. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-9-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Fix pool size optimizationVlad Dogaru1-2/+2
The optimization to create a size-one STE range for the unused direction was broken. The hardware prevents us from creating RTCs over unallocated STE space, so the only reason this has worked so far is because the optimization was never used. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-8-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Add fullness tracking to poolVlad Dogaru2-0/+32
Future users will need to query whether a pool is empty. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-7-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Cleanup after pool refactoringVlad Dogaru5-42/+14
Remove members which are now no longer used. In fact, many of the `struct mlx5hws_pool_chunk` were not even written to beyond being initialized, but they were used in various internals. Also cleanup some local variables which made more sense when the API was thicker. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Refactor pool implementationVlad Dogaru5-337/+116
Refactor the pool implementation to remove unused flags and clarify its usage. A pool represents a single range of STEs or STCs which are allocated at pool creation time. Pools are used under three patterns: 1. STCs are allocated one at a time from a global pool using a bitmap based implementation. 2. Action STEs are allocated in power-of-two blocks using a buddy algorithm. 3. Match STEs do not use allocation, since insertion into these tables is based on hashes or direct addressing. In such cases we use a pool only to create the STE range. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Make pool single resourceVlad Dogaru5-178/+91
The pool implementation claimed to support multiple resources, but this does not really make sense in context. Callers always allocate a single STC or STE chunk of exactly the size provided. The code that handled multiple resources was unused (and likely buggy) due to the combination of flags passed by callers. Simplify the pool by having it handle a single resource. As a result of this simplification, chunks no longer contain a resource offset (there is now only one resource per pool), and the get_base_id functions no longer take a chunk parameter. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Remove unused element arrayVlad Dogaru2-38/+23
Remove the array of elements wrapped in a struct because in reality only the first element was ever used. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net/mlx5: HWS, Fix matcher action template attachVlad Dogaru5-24/+97
The procedure of attaching an action template to an existing matcher had a few issues: 1. Attaching accidentally overran the `at` array in bwc_matcher, which would result in memory corruption. This bug wasn't triggered, but it is possible to trigger it by attaching action templates beyond the initial buffer size of 8. Fix this by converting to a dynamically sized buffer and reallocating if needed. 2. Similarly, the `at` array inside the native matcher was never reallocated. Fix this the same as above. 3. The bwc layer treated any error in action template attach as a signal that the matcher should be rehashed to account for a larger number of action STEs. In reality, there are other unrelated errors that can arise and they should be propagated upstack. Fix this by adding a `need_rehash` output parameter that's orthogonal to error codes. Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling") Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1744312662-356571-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: dsa: microchip: add ETS scheduler support for KSZ88x3 switchesOleksij Rempel2-3/+113
Implement Enhanced Transmission Selection scheduler (ETS) support for KSZ88x3 devices, which support two fixed egress scheduling modes: Strict Priority and Weighted Fair Queuing (WFQ). Since the switch does not allow remapping priorities to queues or adjusting weights, this implementation only supports enabling strict priority mode. If strict mode is not explicitly requested, the switch falls back to its default WFQ mode. This patch introduces KSZ88x3-specific handlers for ETS add and delete operations and uses TXQ Split Control registers to toggle the WFQ enable bit per queue. Corresponding macros are also added for register access. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250410124249.2728568-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15Merge branch 'net-stmmac-remove-unnecessary-initialisation-of-1-s-tic-counter'Jakub Kicinski5-42/+1
Russell King says: ==================== net: stmmac: remove unnecessary initialisation of 1µs TIC counter In commit 8efbdbfa9938 ("net: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"), code to initialise the LPI 1us counter in dwmac4's initialisation was added, making the initialisation in glue drivers unnecessary. This series cleans up the now redundant initialisation. ==================== Link: https://patch.msgid.link/Z_oe0U5E0i3uZbop@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: remove GMAC_1US_TIC_COUNTER definitionRussell King (Oracle)1-1/+0
GMAC_1US_TIC_COUNTER is now no longer used, so remove the definition. This was duplicated by GMAC4_MAC_ONEUS_TIC_COUNTER further down in the same file. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u3Vv0-000E87-DQ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: remove eee_usecs_rateRussell King (Oracle)1-1/+0
plat_dat->eee_users_rate is now unused, so remove this member. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u3Vuv-000E7y-9k@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: intel-plat: remove eee_usecs_rate and hardware writeRussell King (Oracle)1-9/+0
Remove the write to GMAC_1US_TIC_COUNTER for two reasons: 1. during initialisation or reinitialisation of the DWMAC core, the core is reset, which sets this register back to its default value. Writing it prior to stmmac_dvr_probe() has no effect. 2. Since commit 8efbdbfa9938 ("net: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"), GMAC4/5 core code will set this register based on the rate of plat->stmmac_clk. This clock is fetched by devm_stmmac_probe_config_dt(), and plat->clk_ptp_rate will be set to its rate profided a "ptp_ref" clock is not provided. In any case, Marek's commit will set the effectual value of this register. Therefore, dwmac-intel-plat.c writing GMAC_1US_TIC_COUNTER serves no useful purpose and can be removed. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u3Vuq-000E7s-5Y@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: intel: remove eee_usecs_rate and hardware writeRussell King (Oracle)1-8/+0
Remove the write to GMAC_1US_TIC_COUNTER for two reasons: 1. during initialisation or reinitialisation of the DWMAC core, the core is reset, which sets this register back to its default value. Writing it prior to stmmac_dvr_probe() has no effect. 2. Since commit 8efbdbfa9938 ("net: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"), GMAC4/5 core code will set this register based on the rate of plat->stmmac_clk. This clock is created by the same code which initialises plat->eee_usecs_rate, which is also created to run at this same rate. Since Marek's commit, this will set this register appropriately using the rate of this clock. Therefore, dwmac-intel.c writing GMAC_1US_TIC_COUNTER serves no useful purpose and can be removed. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u3Vul-000E7m-1j@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: stmmac: dwc-qos: remove tegra_eqos_init()Russell King (Oracle)1-23/+1
tegra_eqos_init() initialises the 1US TIC counter for the EEE timers. However, the DWGMAC core is reset after this write, which clears this register to its default. However, dwmac4_core_init() configures this register using the same clock, which happens after reset - thus this is the write which ensures that the register is correctly configured. Therefore, tegra_eqos_init() is not required and is removed. This also means eqos->clk_slave can also be removed. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u3Vuf-000E7g-U4@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15Merge branch 'net-convert-exit_batch_rtnl-to-exit_rtnl'Jakub Kicinski19-300/+213
Kuniyuki Iwashima says: ==================== net: Convert ->exit_batch_rtnl() to ->exit_rtnl(). While converting nexthop to per-netns RTNL, there are two blockers to using rtnl_net_dereference(), flush_all_nexthops() and __unregister_nexthop_notifier(), both of which are called from ->exit_batch_rtnl(). Instead of spreading __rtnl_net_lock() over each ->exit_batch_rtnl(), we should convert all ->exit_batch_rtnl() to per-net ->exit_rtnl() and run it under __rtnl_net_lock() because all ->exit_batch_rtnl() functions do not have anything to factor out for batching. Patch 1 & 2 factorise the undo mechanism against ->init() into a single function, and Patch 3 adds ->exit_batch_rtnl(). Patch 4 ~ 13 convert all ->exit_batch_rtnl() users. Patch 14 removes ->exit_batch_rtnl(). Later, we can convert pfcp and ppp to use ->exit_rtnl(). v1: https://lore.kernel.org/all/20250410022004.8668-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20250411205258.63164-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: Remove ->exit_batch_rtnl().Kuniyuki Iwashima2-9/+1
There are no ->exit_batch_rtnl() users remaining. Let's remove the hook. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250411205258.63164-15-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15geneve: Convert geneve_exit_batch_rtnl() to ->exit_rtnl().Kuniyuki Iwashima1-12/+4
geneve_exit_batch_rtnl() iterates the dying netns list and performs the same operation for each. Let's use ->exit_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250411205258.63164-14-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>