author | Jakub Kicinski <kuba@kernel.org> | 2023-03-30 08:15:24 +0300 |
---|---|---|
committer | Jakub Kicinski <kuba@kernel.org> | 2023-03-30 08:15:24 +0300 |
commit | 7079d5e61aaa14cd04fd2fe7a8a2b6eca7833fdb (patch) | |
tree | a2bdd68b78767853f248dcef3ce4230164d20acb /Documentation | |
parent | c5370374bb1bf692167c7276be8b56c02565d535 (diff) | |
parent | 3905f8d64ccc2c640d8c1179f4452f2bf8f1df56 (diff) | |
Merge tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-03-28
Dragos Tatulea says:
====================
net/mlx5e: RX, Drop page_cache and fully use page_pool
For page allocation on the rx path, the mlx5e driver has been using an
internal page cache in tandem with the page pool. The internal page
cache uses a queue for page recycling which has the issue of head of
queue blocking.
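The head of queue blocking described above can be illustrated with a toy model. This is only a sketch, not the driver's code: the class names, the `refs` field and the rule that only the head entry is inspected are assumptions made for the illustration. The stat names loosely mirror the `rx[i]_cache_*` counters that this merge removes from the documentation.

```python
from collections import deque


class Page:
    """Toy page; refs > 1 stands for 'still held elsewhere' (assumed rule)."""

    def __init__(self, name):
        self.name = name
        self.refs = 1


class FifoPageCache:
    """Hypothetical FIFO recycling queue that only ever inspects its head."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.stats = {"reuse": 0, "full": 0, "empty": 0, "busy": 0}

    def put(self, page):
        """Queue a page for recycling; fails when the queue is full."""
        if len(self.queue) >= self.capacity:
            self.stats["full"] += 1
            return False
        self.queue.append(page)
        return True

    def get(self):
        """Return the head page if it is free, else None (caller allocates)."""
        if not self.queue:
            self.stats["empty"] += 1
            return None
        if self.queue[0].refs > 1:
            # The head is still in use, so the free pages behind it
            # cannot be reached: this is the head of queue blocking.
            self.stats["busy"] += 1
            return None
        self.stats["reuse"] += 1
        return self.queue.popleft()


def run_demo():
    """One busy page at the head blocks three free pages behind it."""
    cache = FifoPageCache(capacity=4)
    busy = Page("busy")
    busy.refs = 2
    cache.put(busy)
    for i in range(3):
        cache.put(Page(f"free{i}"))
    blocked = [cache.get() for _ in range(3)]
    busy.refs = 1  # the other holder releases the page
    unblocked = cache.get()
    return cache.stats, blocked, unblocked
```

In this model every `get()` fails while the head page is busy, even though three reusable pages sit behind it, so the caller falls back to allocating new pages.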
This patch series drops the internal page_cache altogether and uses the
page_pool to implement everything that was done by the page_cache
before:
* Let the page_pool handle dma mapping and unmapping.
* Use fragmented pages with fragment counter instead of tracking via
page ref.
* Enable skb recycling.
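The idea of tracking a page through a fragment counter, rather than through the page reference count, can be sketched as follows. This is a toy model under stated assumptions: `FragPage`, `ToyPool` and `release_fragment` are hypothetical names for illustration and are not the page_pool or mlx5e API.

```python
class FragPage:
    """Toy page split into fragments and tracked by a single counter."""

    def __init__(self, num_frags):
        if num_frags < 1:
            raise ValueError("a page needs at least one fragment")
        self.frags_outstanding = num_frags


class ToyPool:
    """Hypothetical pool that takes a page back once all fragments are released."""

    def __init__(self):
        self.free_pages = []

    def release_fragment(self, page):
        """Drop one fragment; return True if this returned the page to the pool."""
        if page.frags_outstanding <= 0:
            raise RuntimeError("fragment released more often than handed out")
        page.frags_outstanding -= 1
        if page.frags_outstanding == 0:
            # Last user is gone, so the whole page becomes reusable.
            self.free_pages.append(page)
            return True
        return False
```

The point of the sketch is that the page becomes recyclable exactly when its last fragment is released, without consulting a separate reference count.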
The patch series has the following effects on the rx path:
* Improved performance for the cases where page recycling was low
due to head of queue blocking in the internal page_cache. The test
for this ran a single iperf TCP stream to an rx queue that is
bound to the same CPU as the application.
|-------------+--------+--------+------+---------|
| rq type | before | after | unit | diff |
|-------------+--------+--------+------+---------|
| striding rq | 30.1 | 31.4 | Gbps | 4.14 % |
| legacy rq | 30.2 | 33.0 | Gbps | 8.48 % |
|-------------+--------+--------+------+---------|
* Small XDP performance degradation. The test was an XDP drop
program running on a single rx queue with small incoming packets:
|-------------+----------+----------+------+---------|
| rq type | before | after | unit | diff |
|-------------+----------+----------+------+---------|
| striding rq | 19725449 | 18544617 | pps | -6.37 % |
| legacy rq | 19879931 | 18631841 | pps | -6.70 % |
|-------------+----------+----------+------+---------|
This will be handled in a different patch series by adding support for
multi-packet per page.
* For other cases the performance is roughly the same.
The above numbers were obtained on the following system:
24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
32 GB RAM
ConnectX-7 single port
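The diff columns in the two tables above appear to be computed relative to the "after" value, not the "before" value. That is an inference from the numbers, not something the commit message states. The short check below uses the table values as given; the function name is an assumption for illustration.

```python
def diff_percent_of_after(before, after):
    """Change from before to after, as a percentage of the after value."""
    return (after - before) / after * 100.0


# (rq type, before, after, unit), copied from the two tables above.
ROWS = [
    ("striding rq", 30.1, 31.4, "Gbps"),
    ("legacy rq", 30.2, 33.0, "Gbps"),
    ("striding rq", 19725449, 18544617, "pps"),
    ("legacy rq", 19879931, 18631841, "pps"),
]


def table_diffs():
    """Return the diff column, rounded to two decimals as in the tables."""
    return [round(diff_percent_of_after(b, a), 2) for _, b, a, _ in ROWS]


if __name__ == "__main__":
    for (name, _, _, unit), diff in zip(ROWS, table_diffs()):
        print(f"{name:12s} {unit:5s} {diff:+.2f} %")
```

Taken relative to the "before" value, the striding rq TCP gain would be about 4.32 % instead of 4.14 %, so it matters which baseline is meant when comparing these figures with other reports.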
The patch series breaks down as follows:
* Preparations for introducing the mlx5e_frag_page struct.
* Delete the mlx5e_page_cache struct.
* Enable dma mapping from page_pool.
* Enable skb recycling and fragment counting.
* Do deferred release of pages (just before alloc) to ensure better
page_pool cache utilization.
====================
* tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
net/mlx5e: RX, Increase WQE bulk size for legacy rq
net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
net/mlx5e: RX, Defer page release in legacy rq for better recycling
net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
net/mlx5e: RX, Defer page release in striding rq for better recycling
net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
net/mlx5e: RX, Enable skb page recycling through the page_pool
net/mlx5e: RX, Enable dma map and sync from page_pool allocator
net/mlx5e: RX, Remove internal page_cache
net/mlx5e: RX, Store SHAMPO header pages in array
net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
====================
Link: https://lore.kernel.org/r/20230328205623.142075-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst | 26 |
1 files changed, 0 insertions, 26 deletions
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 4cd8e869762b..6b2d1fe74ecf 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -346,32 +346,6 @@ the software port.
      - The number of receive packets with CQE compression on ring i [#accel]_.
      - Acceleration
 
-   * - `rx[i]_cache_reuse`
-     - The number of events of successful reuse of a page from a driver's
-       internal page cache.
-     - Acceleration
-
-   * - `rx[i]_cache_full`
-     - The number of events of full internal page cache where driver can't put a
-       page back to the cache for recycling (page will be freed).
-     - Acceleration
-
-   * - `rx[i]_cache_empty`
-     - The number of events where cache was empty - no page to give. Driver
-       shall allocate new page.
-     - Acceleration
-
-   * - `rx[i]_cache_busy`
-     - The number of events where cache head was busy and cannot be recycled.
-       Driver allocated new page.
-     - Acceleration
-
-   * - `rx[i]_cache_waive`
-     - The number of cache evacuation. This can occur due to page move to
-       another NUMA node or page was pfmemalloc-ed and should be freed as soon
-       as possible.
-     - Acceleration
-
    * - `rx[i]_arfs_err`
      - Number of flow rules that failed to be added to the flow table.
      - Error