author | Jakub Kicinski <kuba@kernel.org> | 2023-02-08 05:20:02 +0300 |
---|---|---|
committer | Jakub Kicinski <kuba@kernel.org> | 2023-02-08 05:20:03 +0300 |
commit | cc74ca303a658dc4fb69e75df5f79d03d7a9b7e5 (patch) | |
tree | e446248b006239a53bc21ef49dc7c75262be8870 /tools/perf/scripts/python/task-analyzer.py | |
parent | 383d9f87a06dd923c4fd0fdcb65b58258851f545 (diff) | |
parent | 2ac4980c57f54db7c5b416f7946d2921fc16d9d2 (diff) | |
download | linux-cc74ca303a658dc4fb69e75df5f79d03d7a9b7e5.tar.xz |
Merge branch 'sched-cpumask-improve-on-cpumask_local_spread-locality'
Yury Norov says:
====================
sched: cpumask: improve on cpumask_local_spread() locality
cpumask_local_spread() currently checks the local node for the i'th CPU
and, if it finds nothing, falls back to a flat search among all non-local
CPUs. We can do better by searching CPUs in order of NUMA hop distance.
This has significant performance implications on NUMA machines, for example
when using NUMA-aware allocated memory together with NUMA-aware IRQ
affinity hints.
Performance tests from patch 8 of this series for the Mellanox network
driver show:
TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
+-------------------------+-----------+------------------+------------------+
| | BW (Gbps) | TX side CPU util | RX side CPU util |
+-------------------------+-----------+------------------+------------------+
| Baseline | 52.3 | 6.4 % | 17.9 % |
+-------------------------+-----------+------------------+------------------+
| Applied on TX side only | 52.6 | 5.2 % | 18.5 % |
+-------------------------+-----------+------------------+------------------+
| Applied on RX side only | 94.9 | 11.9 % | 27.2 % |
+-------------------------+-----------+------------------+------------------+
| Applied on both sides | 95.1 | 8.4 % | 27.3 % |
+-------------------------+-----------+------------------+------------------+
The bottleneck on the RX side is released and line rate is reached (~1.8x speedup).
~30% less CPU utilization on TX.
====================
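The two summary figures can be checked against the table with a little arithmetic. The sketch below copies the numbers from the table; the dictionary keys and helper names are made up for illustration. The cover letter does not say which rows the "~30%" compares, so the pairing of "both sides" against "RX side only" is an assumption, chosen because it is the comparison that lands near that figure.

```python
# Figures copied from the results table in the cover letter.
# Keys and helper names are illustrative, not taken from the source.
results = {
    "baseline": {"bw_gbps": 52.3, "tx_cpu": 6.4, "rx_cpu": 17.9},
    "tx_only": {"bw_gbps": 52.6, "tx_cpu": 5.2, "rx_cpu": 18.5},
    "rx_only": {"bw_gbps": 94.9, "tx_cpu": 11.9, "rx_cpu": 27.2},
    "both": {"bw_gbps": 95.1, "tx_cpu": 8.4, "rx_cpu": 27.3},
}


def speedup(new: float, old: float) -> float:
    """Ratio of the new value to the old one."""
    return new / old


def relative_drop(new: float, old: float) -> float:
    """Fraction by which `new` is lower than `old`."""
    return (old - new) / old


# Bandwidth with the change on both sides versus the baseline.
bw_speedup = speedup(results["both"]["bw_gbps"],
                     results["baseline"]["bw_gbps"])

# Assumed reading of "~30% less cpu util on TX": both sides vs. RX only.
tx_drop_vs_rx_only = relative_drop(results["both"]["tx_cpu"],
                                   results["rx_only"]["tx_cpu"])

# Alternative reading, for comparison: TX only vs. baseline.
tx_drop_vs_baseline = relative_drop(results["tx_only"]["tx_cpu"],
                                    results["baseline"]["tx_cpu"])

print(f"bandwidth speedup: {bw_speedup:.2f}x")
print(f"TX CPU drop, both vs. RX only: {tx_drop_vs_rx_only:.1%}")
print(f"TX CPU drop, TX only vs. baseline: {tx_drop_vs_baseline:.1%}")
```

Under these assumptions the bandwidth ratio comes out at about 1.8x and the TX utilization drop at about 29%, in line with the summary. Comparing "TX side only" with the baseline instead gives a drop of just under 19%.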
Link: https://lore.kernel.org/r/20230121042436.2661843-1-yury.norov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
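The difference between the flat fallback and the hop-ordered search described in the cover letter can be illustrated with a small user-space model. This is a sketch only, not the kernel's code. The function names, the 4-node topology and the distance matrix are hypothetical. The model also assumes that the index wraps around the number of CPUs and that ties at the same distance are broken by CPU number.

```python
from typing import Dict, List


def local_spread_flat(i: int, cpu_node: Dict[int, int], local_node: int) -> int:
    """Model of the old behaviour: local-node CPUs first, then every
    other CPU in plain numeric order, ignoring NUMA distance."""
    cpus = sorted(cpu_node)
    local = [c for c in cpus if cpu_node[c] == local_node]
    remote = [c for c in cpus if cpu_node[c] != local_node]
    order = local + remote
    return order[i % len(order)]  # assumption: the index wraps


def local_spread_by_hops(i: int, cpu_node: Dict[int, int],
                         distance: List[List[int]], local_node: int) -> int:
    """Model of the improved behaviour: CPUs ordered by the NUMA distance
    of their node from local_node, nearest first."""
    order = sorted(cpu_node,
                   key=lambda c: (distance[local_node][cpu_node[c]], c))
    return order[i % len(order)]  # assumption: the index wraps


# Hypothetical machine: 4 nodes, 2 CPUs per node.
cpu_node = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}

# Hypothetical distance matrix: node 0 is close to node 2,
# node 1 is close to node 3, everything else is far.
distance = [
    [10, 30, 20, 30],
    [30, 10, 30, 20],
    [20, 30, 10, 30],
    [30, 20, 30, 10],
]

flat_order = [local_spread_flat(i, cpu_node, 0) for i in range(8)]
hops_order = [local_spread_by_hops(i, cpu_node, distance, 0) for i in range(8)]

print("flat:", flat_order)
print("hops:", hops_order)
```

For local node 0 the flat model hands out CPUs 2 and 3 (on the far node 1) right after the local ones. The hop-ordered model hands out CPUs 4 and 5 (on the nearby node 2) first. That locality difference is what matters when per-CPU resources such as IRQ affinity hints are spread starting from the device's node.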
Diffstat (limited to 'tools/perf/scripts/python/task-analyzer.py')
0 files changed, 0 insertions, 0 deletions