diff options
author | Nicholas Piggin <npiggin@gmail.com> | 2020-05-11 13:19:52 +0300 |
---|---|---|
committer | Michael Ellerman <mpe@ellerman.id.au> | 2020-07-15 04:08:27 +0300 |
commit | 0138ba5783ae0dcc799ad401a1e8ac8333790df9 (patch) | |
tree | 3e7ba7e2e19810a618adb8b003aa6c36d8c811fd /arch/powerpc/include/asm/ppc-opcode.h | |
parent | b648a5132ca3237a0f1ce5d871fff342b0efcf8a (diff) | |
download | linux-0138ba5783ae0dcc799ad401a1e8ac8333790df9.tar.xz |
powerpc/64/signal: Balance return predictor stack in signal trampoline
Returning from an interrupt or syscall to a signal handler currently
begins execution directly at the handler's entry point, with LR set to
the address of the sigreturn trampoline. When the signal handler
function returns, it runs the trampoline. It looks like this:
# interrupt at user address xyz
# kernel stuff... signal is raised
rfid
# void handler(int sig)
addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
mflr 0
std 0,16(1)
stdu 1,-96(1)
# handler stuff
ld 0,16(1)
mtlr 0
blr
# __kernel_sigtramp_rt64
addi r1,r1,__SIGNAL_FRAMESIZE
li r0,__NR_rt_sigreturn
sc
# kernel executes rt_sigreturn
rfid
# back to user address xyz
Note the blr with no matching bl. This can corrupt the return
predictor.
Solve this by instead resuming execution at the signal trampoline
which then calls the signal handler. qtrace-tools link_stack checker
confirms the entire user/kernel/vdso cycle is balanced after this
patch, whereas it's not upstream.
Alan confirms the dwarf unwind info still looks good. gdb still
recognises the signal frame and can step into parent frames if it
break inside a signal handler.
Performance is pretty noisy, not a very significant change on a POWER9
here, but branch misses are consistently a lot lower on a
microbenchmark:
Performance counter stats for './signal':
13,085.72 msec task-clock # 1.000 CPUs utilized
45,024,760,101 cycles # 3.441 GHz
65,102,895,542 instructions # 1.45 insn per cycle
11,271,673,787 branches # 861.372 M/sec
59,468,979 branch-misses # 0.53% of all branches
12,989.09 msec task-clock # 1.000 CPUs utilized
44,692,719,559 cycles # 3.441 GHz
65,109,984,964 instructions # 1.46 insn per cycle
11,282,136,057 branches # 868.585 M/sec
39,786,942 branch-misses # 0.35% of all branches
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200511101952.1463138-1-npiggin@gmail.com
Diffstat (limited to 'arch/powerpc/include/asm/ppc-opcode.h')
-rw-r--r-- | arch/powerpc/include/asm/ppc-opcode.h | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 2a39c716c343..bce879fb9afd 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -332,6 +332,7 @@ #define PPC_INST_BLR 0x4e800020 #define PPC_INST_BLRL 0x4e800021 #define PPC_INST_BCTR 0x4e800420 +#define PPC_INST_BCTRL 0x4e800421 #define PPC_INST_MULLD 0x7c0001d2 #define PPC_INST_MULLW 0x7c0001d6 #define PPC_INST_MULHWU 0x7c000016 |