diff options
author | Puranjay Mohan <puranjay@kernel.org> | 2024-05-03 20:18:47 +0300 |
---|---|---|
committer | Catalin Marinas <catalin.marinas@arm.com> | 2024-06-12 17:44:19 +0300 |
commit | bf0baa5bbdc9b99ea081d360f245e5f96e835612 (patch) | |
tree | 4acac0c94439b1e1feb92caca8521f1d012aabe6 /scripts/gdb/linux/tasks.py | |
parent | 7647e2b109f4d508fcb35bb8089a27c4fdd81f61 (diff) | |
download | linux-bf0baa5bbdc9b99ea081d360f245e5f96e835612.tar.xz |
arm64: implement raw_smp_processor_id() using thread_info
Historically, arm64 implemented raw_smp_processor_id() as a read of
current_thread_info()->cpu. This changed when arm64 moved thread_info into
task struct, as at the time CONFIG_THREAD_INFO_IN_TASK made core code use
thread_struct::cpu for the cpu number, and due to header dependencies
prevented using this in raw_smp_processor_id(). As a workaround, we moved to
using a percpu variable in commit:
57c82954e77fa12c ("arm64: make cpu number a percpu variable")
Since then, thread_info::cpu was reintroduced, and core code was made to use
this in commits:
001430c1910df65a ("arm64: add CPU field to struct thread_info")
bcf9033e5449bdca ("sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y")
Consequently it is possible to use current_thread_info()->cpu again.
This decreases the number of emitted instructions like in the following
example:
Dump of assembler code for function bpf_get_smp_processor_id:
0xffff8000802cd608 <+0>: nop
0xffff8000802cd60c <+4>: nop
0xffff8000802cd610 <+8>: adrp x0, 0xffff800082138000
0xffff8000802cd614 <+12>: mrs x1, tpidr_el1
0xffff8000802cd618 <+16>: add x0, x0, #0x8
0xffff8000802cd61c <+20>: ldrsw x0, [x0, x1]
0xffff8000802cd620 <+24>: ret
After this patch:
Dump of assembler code for function bpf_get_smp_processor_id:
0xffff8000802c9130 <+0>: nop
0xffff8000802c9134 <+4>: nop
0xffff8000802c9138 <+8>: mrs x0, sp_el0
0xffff8000802c913c <+12>: ldr w0, [x0, #24]
0xffff8000802c9140 <+16>: ret
A microbenchmark[1] was built to measure the performance improvement
provided by this change. It calls the following function given number of
times and finds the runtime overhead:
static noinline int get_cpu_id(void)
{
return smp_processor_id();
}
Run the benchmark like:
modprobe smp_processor_id nr_function_calls=1000000000
+--------------------------+------------------------+
| | Number of Calls | Time taken |
+--------+-----------------+------------------------+
| Before | 1000000000 | 1602888401ns |
+--------+-----------------+------------------------+
| After | 1000000000 | 1206212658ns |
+--------+-----------------+------------------------+
| Difference (decrease) | 396675743ns (24.74%) |
+---------------------------------------------------+
Remove the percpu variable cpu_number as it is used only in
set_smp_ipi_range() as a dummy variable to be passed to ipi_handler().
Use irq_stat in place of cpu_number here like arm32.
[1] https://github.com/puranjaymohan/linux/commit/77d3fdd
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20240503171847.68267-2-puranjay@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Diffstat (limited to 'scripts/gdb/linux/tasks.py')
0 files changed, 0 insertions, 0 deletions