summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorJakub Kicinski <kuba@kernel.org>2023-03-30 08:15:24 +0300
committerJakub Kicinski <kuba@kernel.org>2023-03-30 08:15:24 +0300
commit7079d5e61aaa14cd04fd2fe7a8a2b6eca7833fdb (patch)
treea2bdd68b78767853f248dcef3ce4230164d20acb /Documentation
parentc5370374bb1bf692167c7276be8b56c02565d535 (diff)
parent3905f8d64ccc2c640d8c1179f4452f2bf8f1df56 (diff)
downloadlinux-7079d5e61aaa14cd04fd2fe7a8a2b6eca7833fdb.tar.xz
Merge tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says: ==================== mlx5-updates-2023-03-28 Dragos Tatulea says: ==================== net/mlx5e: RX, Drop page_cache and fully use page_pool For page allocation on the rx path, the mlx5e driver has been using an internal page cache in tandem with the page pool. The internal page cache uses a queue for page recycling which has the issue of head of queue blocking. This patch series drops the internal page_cache altogether and uses the page_pool to implement everything that was done by the page_cache before: * Let the page_pool handle dma mapping and unmapping. * Use fragmented pages with fragment counter instead of tracking via page ref. * Enable skb recycling. The patch series has the following effects on the rx path: * Improved performance for the cases when there was low page recycling due to head of queue blocking in the internal page_cache. The test for this was running a single iperf TCP stream to a rx queue which is bound on the same cpu as the application. |-------------+--------+--------+------+---------| | rq type | before | after | unit | diff | |-------------+--------+--------+------+---------| | striding rq | 30.1 | 31.4 | Gbps | 4.14 % | | legacy rq | 30.2 | 33.0 | Gbps | 8.48 % | |-------------+--------+--------+------+---------| * Small XDP performance degradation. The test was is XDP drop program running on a single rx queue with small packets incoming it looks like this: |-------------+----------+----------+------+---------| | rq type | before | after | unit | diff | |-------------+----------+----------+------+---------| | striding rq | 19725449 | 18544617 | pps | -6.37 % | | legacy rq | 19879931 | 18631841 | pps | -6.70 % | |-------------+----------+----------+------+---------| This will be handled in a different patch series by adding support for multi-packet per page. * For other cases the performance is roughly the same. The above numbers were obtained on the following system: 24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz 32 GB RAM ConnectX-7 single port The breakdown on the patch series is the following: * Preparations for introducing the mlx5e_frag_page struct. * Delete the mlx5e_page_cache struct. * Enable dma mapping from page_pool. * Enable skb recycling and fragment counting. * Do deferred release of pages (just before alloc) to ensure better page_pool cache utilization. ==================== * tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats net/mlx5e: RX, Break the wqe bulk refill in smaller chunks net/mlx5e: RX, Increase WQE bulk size for legacy rq net/mlx5e: RX, Split off release path for xsk buffers for legacy rq net/mlx5e: RX, Defer page release in legacy rq for better recycling net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags net/mlx5e: RX, Defer page release in striding rq for better recycling net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name net/mlx5e: RX, Enable skb page recycling through the page_pool net/mlx5e: RX, Enable dma map and sync from page_pool allocator net/mlx5e: RX, Remove internal page_cache net/mlx5e: RX, Store SHAMPO header pages in array net/mlx5e: RX, Remove alloc unit layout constraint for striding rq net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation ==================== Link: https://lore.kernel.org/r/20230328205623.142075-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst26
1 files changed, 0 insertions, 26 deletions
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 4cd8e869762b..6b2d1fe74ecf 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -346,32 +346,6 @@ the software port.
- The number of receive packets with CQE compression on ring i [#accel]_.
- Acceleration
- * - `rx[i]_cache_reuse`
- - The number of events of successful reuse of a page from a driver's
- internal page cache.
- - Acceleration
-
- * - `rx[i]_cache_full`
- - The number of events of full internal page cache where driver can't put a
- page back to the cache for recycling (page will be freed).
- - Acceleration
-
- * - `rx[i]_cache_empty`
- - The number of events where cache was empty - no page to give. Driver
- shall allocate new page.
- - Acceleration
-
- * - `rx[i]_cache_busy`
- - The number of events where cache head was busy and cannot be recycled.
- Driver allocated new page.
- - Acceleration
-
- * - `rx[i]_cache_waive`
- - The number of cache evacuation. This can occur due to page move to
- another NUMA node or page was pfmemalloc-ed and should be freed as soon
- as possible.
- - Acceleration
-
* - `rx[i]_arfs_err`
- Number of flow rules that failed to be added to the flow table.
- Error