summaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/event_analyzing_sample.py
diff options
context:
space:
mode:
authorLiao Chang <liaochang1@huawei.com>2024-09-19 15:17:19 +0300
committerCatalin Marinas <catalin.marinas@arm.com>2024-11-08 19:25:21 +0300
commitbdf94836c22abc923af5ae83709c96ff4c3c77c9 (patch)
treea78c453af5b56897792ec493ef0097bcc1c9b410 /tools/perf/scripts/python/event_analyzing_sample.py
parent1a9de2f6fda69d5f105dd8af776856a66abdaa64 (diff)
downloadlinux-bdf94836c22abc923af5ae83709c96ff4c3c77c9.tar.xz
arm64: uprobes: Optimize cache flushes for xol slot
The profiling of single-thread selftests bench reveals a bottlenect in caches_clean_inval_pou() on ARM64. On my local testing machine, this function takes approximately 34% of CPU cycles for trig-uprobe-nop and trig-uprobe-push. This patch add a check to avoid unnecessary cache flush when writing instruction to the xol slot. If the instruction is same with the existing instruction in slot, there is no need to synchronize D/I cache. Since xol slot allocation and updates occur on the hot path of uprobe handling, The upstream kernel running on Kunpeng916 (Hi1616), 4 NUMA nodes, 64 cores@ 2.4GHz reveals this optimization has obvious gain for nop and push testcases. Before (next-20240918) ---------------------- uprobe-nop ( 1 cpus): 0.418 ± 0.001M/s ( 0.418M/s/cpu) uprobe-push ( 1 cpus): 0.411 ± 0.005M/s ( 0.411M/s/cpu) uprobe-ret ( 1 cpus): 2.052 ± 0.002M/s ( 2.052M/s/cpu) uretprobe-nop ( 1 cpus): 0.350 ± 0.000M/s ( 0.350M/s/cpu) uretprobe-push ( 1 cpus): 0.353 ± 0.000M/s ( 0.353M/s/cpu) uretprobe-ret ( 1 cpus): 1.074 ± 0.001M/s ( 1.074M/s/cpu) After ----- uprobe-nop ( 1 cpus): 0.926 ± 0.000M/s ( 0.926M/s/cpu) uprobe-push ( 1 cpus): 0.910 ± 0.001M/s ( 0.910M/s/cpu) uprobe-ret ( 1 cpus): 2.056 ± 0.001M/s ( 2.056M/s/cpu) uretprobe-nop ( 1 cpus): 0.653 ± 0.001M/s ( 0.653M/s/cpu) uretprobe-push ( 1 cpus): 0.645 ± 0.000M/s ( 0.645M/s/cpu) uretprobe-ret ( 1 cpus): 1.093 ± 0.001M/s ( 1.093M/s/cpu) Signed-off-by: Liao Chang <liaochang1@huawei.com> Link: https://lore.kernel.org/r/20240919121719.2148361-1-liaochang1@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Diffstat (limited to 'tools/perf/scripts/python/event_analyzing_sample.py')
0 files changed, 0 insertions, 0 deletions