summaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/task-analyzer.py
diff options
context:
space:
mode:
authorJakub Kicinski <kuba@kernel.org>2023-02-08 05:20:02 +0300
committerJakub Kicinski <kuba@kernel.org>2023-02-08 05:20:03 +0300
commitcc74ca303a658dc4fb69e75df5f79d03d7a9b7e5 (patch)
treee446248b006239a53bc21ef49dc7c75262be8870 /tools/perf/scripts/python/task-analyzer.py
parent383d9f87a06dd923c4fd0fdcb65b58258851f545 (diff)
parent2ac4980c57f54db7c5b416f7946d2921fc16d9d2 (diff)
downloadlinux-cc74ca303a658dc4fb69e75df5f79d03d7a9b7e5.tar.xz
Merge branch 'sched-cpumask-improve-on-cpumask_local_spread-locality'
Yury Norov says: ==================== sched: cpumask: improve on cpumask_local_spread() locality cpumask_local_spread() currently checks local node for presence of i'th CPU, and then if it finds nothing makes a flat search among all non-local CPUs. We can do it better by checking CPUs per NUMA hops. This has significant performance implications on NUMA machines, for example when using NUMA-aware allocated memory together with NUMA-aware IRQ affinity hints. Performance tests from patch 8 of this series for mellanox network driver show: TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on). Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121 +-------------------------+-----------+------------------+------------------+ | | BW (Gbps) | TX side CPU util | RX side CPU util | +-------------------------+-----------+------------------+------------------+ | Baseline | 52.3 | 6.4 % | 17.9 % | +-------------------------+-----------+------------------+------------------+ | Applied on TX side only | 52.6 | 5.2 % | 18.5 % | +-------------------------+-----------+------------------+------------------+ | Applied on RX side only | 94.9 | 11.9 % | 27.2 % | +-------------------------+-----------+------------------+------------------+ | Applied on both sides | 95.1 | 8.4 % | 27.3 % | +-------------------------+-----------+------------------+------------------+ Bottleneck in RX side is released, reached linerate (~1.8x speedup). ~30% less cpu util on TX. ==================== Link: https://lore.kernel.org/r/20230121042436.2661843-1-yury.norov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'tools/perf/scripts/python/task-analyzer.py')
0 files changed, 0 insertions, 0 deletions