x86/mm: Print likely CPU at segfault time - kernel/linux.git


author	Rik van Riel <riel@surriel.com>	2022-08-05 17:16:44 +0300
committer	Borislav Petkov <bp@suse.de>	2022-08-24 13:48:05 +0300
commit	c926087eb38520b268515ae1a842db6db62554cc (patch)
tree	da2a492dc0049efcd7d9a32f3e56bf6f37ef2421 /Documentation/firmware-guide/acpi/osi.rst
parent	0db7058e8e23e6bbab1b4747ecabd1784c34f50b (diff)
download	linux-c926087eb38520b268515ae1a842db6db62554cc.tar.xz

x86/mm: Print likely CPU at segfault time

In a large enough fleet of computers, it is common to have a few bad CPUs. Those can often be identified by seeing that some commonly run kernel code, which runs fine everywhere else, keeps crashing on the same CPU core on one particular bad system.

However, the failure modes in CPUs that have gone bad over the years are often oddly specific, and the only bad behavior seen might be segfaults in programs like bash, python, or various system daemons that run fine everywhere else.

Add a printk() to show_signal_msg() to print the CPU, core, and socket at segfault time.

This is not perfect, since the task might get rescheduled on another CPU between when the fault hit, and when the message is printed, but in practice this has been good enough to help people identify several bad CPU cores.

For example:

  segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in \
  segfault[401000+1000] likely on CPU 0 (core 0, socket 0)

This printk can be controlled through /proc/sys/debug/exception-trace.

  [ bp: Massage a bit, add "likely" to the printed line to denote that the CPU number is not always reliable. ]

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220805101644.2e674553@imladris.surriel.com
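
As an illustration of how the new suffix might be used when hunting for a bad core, here is a minimal userspace sketch that counts segfault messages per (CPU, core, socket). It is not part of this patch. The regular expression is an assumption derived only from the example line in this commit message, and the names LIKELY_CPU and tally_segfault_cpus are hypothetical. Because the reported CPU is only "likely" correct, the counts are a hint rather than proof.

```python
import re
from collections import Counter

# Assumed format, taken from the example line in this commit message:
#   "... likely on CPU 0 (core 0, socket 0)"
LIKELY_CPU = re.compile(r"likely on CPU (\d+) \(core (\d+), socket (\d+)\)")


def tally_segfault_cpus(lines):
    """Count kernel log lines per (cpu, core, socket) tuple.

    Lines that do not carry the "likely on CPU" suffix are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LIKELY_CPU.search(line)
        if match:
            cpu, core, socket = (int(field) for field in match.groups())
            counts[(cpu, core, socket)] += 1
    return counts


# The example message from this commit, joined onto a single line.
sample = [
    "segfault[1349]: segfault at 0 ip 000000000040113a "
    "sp 00007ffc6d32e360 error 4 in segfault[401000+1000] "
    "likely on CPU 0 (core 0, socket 0)",
    "an unrelated kernel log line",
]

if __name__ == "__main__":
    for (cpu, core, socket), n in tally_segfault_cpus(sample).most_common():
        print(f"CPU {cpu} (core {core}, socket {socket}): {n} segfault(s)")
```

A core that keeps appearing at the top of such a tally across unrelated programs would be a candidate for closer inspection. Lines are fed in as plain strings, so the input could come from any saved copy of the kernel log.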

Diffstat (limited to 'Documentation/firmware-guide/acpi/osi.rst')

0 files changed, 0 insertions, 0 deletions

