diff options
author | Dave Hansen <dave.hansen@linux.intel.com> | 2017-12-04 17:07:35 +0300 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2017-12-23 23:12:59 +0300 |
commit | 8a09317b895f073977346779df52f67c1056d81d (patch) | |
tree | 22dfc208daba8068dfa0bee05fd3035c4a6691b9 /arch/x86/entry/calling.h | |
parent | c313ec66317d421fb5768d78c56abed2dc862264 (diff) | |
download | linux-8a09317b895f073977346779df52f67c1056d81d.tar.xz |
x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching
PAGE_TABLE_ISOLATION needs to switch to a different CR3 value when it
enters the kernel and switch back when it exits. This essentially needs to
be done before leaving assembly code.
This is extra challenging because the switching context is tricky: the
registers that can be clobbered can vary. It is also hard to store things
on the stack because there is an established ABI (ptregs) or the stack is
entirely unsafe to use.
Establish a set of macros that allow changing to the user and kernel CR3
values.
Interactions with SWAPGS:
Previous versions of the PAGE_TABLE_ISOLATION code relied on having
per-CPU scratch space to save/restore a register that can be used for the
CR3 MOV. The %GS register is used to index into our per-CPU space, so
SWAPGS *had* to be done before the CR3 switch. That scratch space is gone
now, but the semantic that SWAPGS must be done before the CR3 MOV is
retained. This is good to keep because it is not that hard to do and it
allows to do things like add per-CPU debugging information.
What this does in the NMI code is worth pointing out. NMIs can interrupt
*any* context and they can also be nested with NMIs interrupting other
NMIs. The comments below ".Lnmi_from_kernel" explain the format of the
stack during this situation. Changing the format of this stack is hard.
Instead of storing the old CR3 value on the stack, this depends on the
*regular* register save/restore mechanism and then uses %r14 to keep CR3
during the NMI. It is callee-saved and will not be clobbered by the C NMI
handlers that get called.
[ PeterZ: ESPFIX optimization ]
Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'arch/x86/entry/calling.h')
-rw-r--r-- | arch/x86/entry/calling.h | 66 |
1 files changed, 66 insertions, 0 deletions
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 3fd8bc560fae..a9d17a7686ab 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -1,6 +1,8 @@ /* SPDX-License-Identifier: GPL-2.0 */ #include <linux/jump_label.h> #include <asm/unwind_hints.h> +#include <asm/cpufeatures.h> +#include <asm/page_types.h> /* @@ -187,6 +189,70 @@ For 32-bit we have the following conventions - kernel is built with #endif .endm +#ifdef CONFIG_PAGE_TABLE_ISOLATION + +/* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two halves: */ +#define PTI_SWITCH_MASK (1<<PAGE_SHIFT) + +.macro ADJUST_KERNEL_CR3 reg:req + /* Clear "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */ + andq $(~PTI_SWITCH_MASK), \reg +.endm + +.macro ADJUST_USER_CR3 reg:req + /* Move CR3 up a page to the user page tables: */ + orq $(PTI_SWITCH_MASK), \reg +.endm + +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req + mov %cr3, \scratch_reg + ADJUST_KERNEL_CR3 \scratch_reg + mov \scratch_reg, %cr3 +.endm + +.macro SWITCH_TO_USER_CR3 scratch_reg:req + mov %cr3, \scratch_reg + ADJUST_USER_CR3 \scratch_reg + mov \scratch_reg, %cr3 +.endm + +.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req + movq %cr3, \scratch_reg + movq \scratch_reg, \save_reg + /* + * Is the switch bit zero? This means the address is + * up in real PAGE_TABLE_ISOLATION patches in a moment. + */ + testq $(PTI_SWITCH_MASK), \scratch_reg + jz .Ldone_\@ + + ADJUST_KERNEL_CR3 \scratch_reg + movq \scratch_reg, %cr3 + +.Ldone_\@: +.endm + +.macro RESTORE_CR3 save_reg:req + /* + * The CR3 write could be avoided when not changing its value, + * but would require a CR3 read *and* a scratch register. + */ + movq \save_reg, %cr3 +.endm + +#else /* CONFIG_PAGE_TABLE_ISOLATION=n: */ + +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req +.endm +.macro SWITCH_TO_USER_CR3 scratch_reg:req +.endm +.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req +.endm +.macro RESTORE_CR3 save_reg:req +.endm + +#endif + #endif /* CONFIG_X86_64 */ /* |