path: root/arch/x86/lib/retpoline.S
2024-04-17  x86/retpolines: Enable the default thunk warning only on relevant configs  [Borislav Petkov (AMD), 1 file changed, +7/-0]

    The using-default-thunk warning check makes sense only with
    configurations which actually enable the special return thunks.

    Otherwise, it fires on unrelated 32-bit configs on which the special
    return thunks won't even work (they're 64-bit only) and, what is more,
    those configs even go off into the weeds when booting in the
    alternatives patching code, leading to a dead machine.

    Fixes: 4461438a8405 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
    Reported-by: Klara Modin <klarasmodin@gmail.com>
    Reported-by: Erhard Furtner <erhard_f@mailbox.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Klara Modin <klarasmodin@gmail.com>
    Link: https://lore.kernel.org/r/78e0d19c-b77a-4169-a80f-2eef91f4a1d6@gmail.com
    Link: https://lore.kernel.org/r/20240413024956.488d474e@yea

2024-04-06  x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk  [Borislav Petkov (AMD), 1 file changed, +1/-0]

    srso_alias_untrain_ret() is special code, even if it is a dummy which
    is called in the !SRSO case, so annotate it like its real counterpart,
    to address the following objtool splat:

      vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0

    Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4")
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org

2024-04-03  x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO  [Borislav Petkov (AMD), 1 file changed, +4/-1]

    The srso_alias_untrain_ret() dummy thunk in the
    !CONFIG_MITIGATION_SRSO case is there only for the alternative in
    CALL_UNTRAIN_RET to have a symbol to resolve.

    However, testing with kernels which don't have CONFIG_MITIGATION_SRSO
    enabled causes the warning in patch_return() to fire:

      missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
      WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826)

    Put in a plain "ret" there so that gcc doesn't put a return thunk in
    its place, which is special and gets checked.

    In addition:

      ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined!
      make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Error 1
      make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Error 2
      make: *** [Makefile:240: __sub-make] Error 2

    since !SRSO builds would use the dummy return thunk, as reported by
    petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679.

    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2024-03-29  x86/bugs: Fix the SRSO mitigation on Zen3/4  [Borislav Petkov (AMD), 1 file changed, +6/-5]

    The original version of the mitigation would patch in the calls to
    the untraining routines directly.  That is, the alternative() in
    UNTRAIN_RET will patch in the CALL to srso_alias_untrain_ret()
    directly.

    However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
    mess") meant well in trying to clean up the situation, due to
    microarchitectural reasons, the untraining routine
    srso_alias_untrain_ret() must be the target of a CALL instruction and
    not of a JMP instruction as it is done now.

    Reshuffle the alternative macros to accomplish that.

    Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Ingo Molnar <mingo@kernel.org>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2024-02-12  x86/retpoline: Ensure default return thunk isn't used at runtime  [Josh Poimboeuf, 1 file changed, +6/-9]

    Make sure the default return thunk is not used after all return
    instructions have been patched by the alternatives, because the
    default return thunk is insufficient when it comes to mitigating
    Retbleed or SRSO.

    Fix based on an earlier version by David Kaplan <david.kaplan@amd.com>.

    [ bp: Fix the compilation error of warn_thunk_thunk being an invisible
      symbol, hoist thunk macro into calling.h ]

    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231010171020.462211-4-david.kaplan@amd.com
    Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local

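The mechanism, in rough C terms: once alternatives patching is complete, any execution of the default thunk indicates a bug, so the thunk is routed to a stub that funnels into a one-time warning. A hedged sketch, assuming the warn_thunk_thunk asm stub mentioned in the note above tail-calls a C helper; names and message here are illustrative, not a quote of the kernel source:

      #include <linux/bug.h>

      /* Illustrative helper; assumed to be reached via warn_thunk_thunk
       * only if a 'ret' somehow escaped alternatives patching. */
      void __warn_thunk(void)
      {
              WARN_ONCE(1, "Unpatched return thunk in use. This should not happen!\n");
      }
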
2024-01-10  x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK  [Breno Leitao, 1 file changed, +2/-2]

    Step 10/10 of the namespace unification of CPU mitigations related
    Kconfig options.

    [ mingo: Added one more case. ]

    Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20231121160740.1249350-11-leitao@debian.org

2024-01-10  x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO  [Breno Leitao, 1 file changed, +5/-5]

    Step 9/10 of the namespace unification of CPU mitigations related
    Kconfig options.

    Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20231121160740.1249350-10-leitao@debian.org

2024-01-10  x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY  [Breno Leitao, 1 file changed, +5/-5]

    Step 7/10 of the namespace unification of CPU mitigations related
    Kconfig options.

    Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20231121160740.1249350-8-leitao@debian.org

2024-01-10  x86/bugs: Rename CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING  [Breno Leitao, 1 file changed, +3/-3]

    Step 3/10 of the namespace unification of CPU mitigations related
    Kconfig options.

    Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20231121160740.1249350-4-leitao@debian.org

2023-10-31  Merge tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds, 1 file changed, +1/-1]

    Pull x86 header file cleanup from Ingo Molnar:

     "Replace <asm/export.h> uses with <linux/export.h> and then remove
      <asm/export.h>"

    * tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/headers: Remove <asm/export.h>
      x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>
      x86/headers: Remove unnecessary #include <asm/export.h>

2023-10-20  x86/retpoline: Document some thunk handling aspects  [Borislav Petkov (AMD), 1 file changed, +15/-0]

    After a lot of experimenting (see the thread the Link points to),
    document for now the issues and requirements for future improvements
    to the thunk handling and the potential issuing of a diagnostic when
    the default thunk hasn't been patched out.

    This documentation is only temporary; this close to the merge window,
    it is only a placeholder for those future improvements.

    Suggested-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231010171020.462211-1-david.kaplan@amd.com

2023-10-20  x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()  [Josh Poimboeuf, 1 file changed, +2/-2]

    For consistency with the other return thunks, rename
    __x86_return_skl() to call_depth_return_thunk().

    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/ae44e9f9976934e3b5b47a458d523ccb15867561.1693889988.git.jpoimboe@kernel.org

2023-10-20  x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros  [Josh Poimboeuf, 1 file changed, +4/-4]

    Macros already exist for unaligned code block symbols.  Use them.

    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/26d461bd509cc840af24c94586561c06d39812b2.1693889988.git.jpoimboe@kernel.org

2023-10-20  x86/srso: Disentangle rethunk-dependent options  [Josh Poimboeuf, 1 file changed, +87/-70]

    CONFIG_RETHUNK, CONFIG_CPU_UNRET_ENTRY and CONFIG_CPU_SRSO are all
    tangled up.  De-spaghettify the code a bit.

    Some of the rethunk-related code has been shuffled around within the
    '.text..__x86.return_thunk' section, but otherwise there are no
    functional changes.  srso_alias_untrain_ret() and srso_alias_safe_ret()
    (which are very address-sensitive) haven't moved.

    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/2845084ed303d8384905db3b87b77693945302b4.1693889988.git.jpoimboe@kernel.org

2023-10-20  x86/srso: Unexport untraining functions  [Josh Poimboeuf, 1 file changed, +2/-5]

    These functions aren't called outside of retpoline.S.

    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/1ae080f95ce7266c82cba6d2adde82349b832654.1693889988.git.jpoimboe@kernel.org

2023-10-20  x86/srso: Improve i-cache locality for alias mitigation  [Josh Poimboeuf, 1 file changed, +2/-3]

    Move srso_alias_return_thunk() to the same section as
    srso_alias_safe_ret() so they can share a cache line.

    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/eadaf5530b46a7ae8b936522da45ae555d2b3393.1693889988.git.jpoimboe@kernel.org

2023-10-03  x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>  [Masahiro Yamada, 1 file changed, +1/-1]

    The following commit:

      ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")

    deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>.

    Use <linux/export.h> in *.S as well as in *.c files.

    After all the <asm/export.h> lines are replaced, <asm/export.h> and
    <asm-generic/export.h> will be removed.

    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230806145958.380314-2-masahiroy@kernel.org

2023-08-16  x86/srso: Explain the untraining sequences a bit more  [Borislav Petkov (AMD), 1 file changed, +19/-0]

    The goal is to eventually have proper documentation about all this.

    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230814164447.GFZNpZ/64H4lENIe94@fat_crate.local

2023-08-16  x86/cpu: Cleanup the untrain mess  [Peter Zijlstra, 1 file changed, +7/-0]

    Since there can only be one active return_thunk, there only needs to
    be one (matching) untrain_ret.  It fundamentally doesn't make sense to
    allow multiple untrain_ret at the same time.

    Fold all the 3 different untrain methods into a single (temporary)
    helper stub.

    Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org

2023-08-16  x86/cpu: Rename srso_(.*)_alias to srso_alias_\1  [Peter Zijlstra, 1 file changed, +13/-13]

    For a more consistent namespace.

    [ bp: Fixup names in the doc too. ]

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230814121148.976236447@infradead.org

2023-08-16  x86/cpu: Rename original retbleed methods  [Peter Zijlstra, 1 file changed, +15/-15]

    Rename the original retbleed return thunk and untrain_ret to
    retbleed_return_thunk() and retbleed_untrain_ret().

    No functional changes.

    Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230814121148.909378169@infradead.org

2023-08-16  x86/cpu: Clean up SRSO return thunk mess  [Peter Zijlstra, 1 file changed, +42/-16]

    Use the existing configurable return thunk.  There is absolutely no
    justification for having created this __x86_return_thunk alternative.

    To clarify, the whole thing looks like:

    Zen3/4 does:

      srso_alias_untrain_ret:
          nop2
          lfence
          jmp srso_alias_return_thunk
          int3

      srso_alias_safe_ret: // aliases srso_alias_untrain_ret just so
          add $8, %rsp
          ret
          int3

      srso_alias_return_thunk:
          call srso_alias_safe_ret
          ud2

    While Zen1/2 does:

      srso_untrain_ret:
          movabs $foo, %rax
          lfence
          call srso_safe_ret             (jmp srso_return_thunk ?)
          int3

      srso_safe_ret: // embedded in movabs instruction
          add $8,%rsp
          ret
          int3

      srso_return_thunk:
          call srso_safe_ret
          ud2

    While retbleed does:

      zen_untrain_ret:
          test $0xcc, %bl
          lfence
          jmp zen_return_thunk
          int3

      zen_return_thunk: // embedded in the test instruction
          ret
          int3

    Where Zen1/2 flush the BTB entry using the instruction decoder trick
    (test, movabs), Zen3/4 use BTB aliasing.  SRSO adds a return sequence
    (srso_safe_ret()) which forces the function return instruction to
    speculate into a trap (UD2).  This RET will then mispredict and
    execution will continue at the return site read from the top of the
    stack.

    Pick one of three options at boot (every function can only ever
    return once).

    [ bp: Fixup commit message uarch details and add them in a comment in
      the code too. Add a comment about the srso_select_mitigation()
      dependency on retbleed_select_mitigation(). Add moar ifdeffery for
      32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
      32-bit alternatives needing the symbol. ]

    Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org

2023-08-16  x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()  [Peter Zijlstra, 1 file changed, +2/-2]

      vmlinux.o: warning: objtool: srso_untrain_ret() falls through to next function __x86_return_skl()
      vmlinux.o: warning: objtool: __x86_return_thunk() falls through to next function __x86_return_skl()

    This is because these functions (can) end with CALL, which objtool
    does not consider a terminating instruction.  Therefore, replace the
    INT3 instruction (which is a non-fatal trap) with UD2 (which is a
    fatal trap).  This indicates execution will not continue past this
    point.

    Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230814121148.637802730@infradead.org

2023-08-16  x86/cpu: Fix __x86_return_thunk symbol type  [Peter Zijlstra, 1 file changed, +3/-1]

    Commit fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow
    mitigation") reimplemented __x86_return_thunk with a mix of
    SYM_FUNC_START and SYM_CODE_END; this is not a sane combination.

    Since nothing should ever actually 'CALL' this, make it consistently
    CODE.

    Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230814121148.571027074@infradead.org

2023-08-14  x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG  [Petr Pavlu, 1 file changed, +4/-4]

    The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk
    sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows:

      .text {
        [...]
        TEXT_TEXT
        [...]
        __indirect_thunk_start = .;
        *(.text.__x86.*)
        __indirect_thunk_end = .;
        [...]
      }

    Macro TEXT_TEXT references TEXT_MAIN which normally expands to only
    ".text".  However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes
    ".text .text.[0-9a-zA-Z_]*" which wrongly matches also the thunk
    sections.  The output layout is then different than expected.  For
    instance, the currently defined range
    [__indirect_thunk_start, __indirect_thunk_end] becomes empty.

    Prevent the problem by using ".." as the first separator, for
    example, ".text..__x86.indirect_thunk".  This pattern is utilized by
    other explicit section names which start with one of the standard
    prefixes, such as ".text" or ".data", and that need to be
    individually selected in the linker script.

    [ nathan: Fix conflicts with SRSO and fold in fix issue brought up by
      Andrew Cooper in post-review:
      https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ]

    Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO")
    Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com

2023-08-14  x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()  [Sean Christopherson, 1 file changed, +3/-4]

    Use LEA instead of ADD when adjusting %rsp in srso_safe_ret{,_alias}()
    so as to avoid clobbering flags.  Drop one of the INT3 instructions to
    account for the LEA consuming one more byte than the ADD.

    KVM's emulator makes indirect calls into a jump table of sorts, where
    the destination of each call is a small blob of code that performs
    fast emulation by executing the target instruction with fixed
    operands.  E.g. to emulate ADC, fastop() invokes adcb_al_dl():

      adcb_al_dl:
        <+0>:  adc    %dl,%al
        <+2>:  jmp    <__x86_return_thunk>

    A major motivation for doing fast emulation is to leverage the CPU to
    handle consumption and manipulation of arithmetic flags, i.e. RFLAGS
    is both an input and output to the target of the call.  fastop()
    collects the RFLAGS result by pushing RFLAGS onto the stack and
    popping them back into a variable (held in %rdi in this case):

      asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"

      <+71>:  mov    0xc0(%r8),%rdx
      <+78>:  mov    0x100(%r8),%rcx
      <+85>:  push   %rdi
      <+86>:  popf
      <+87>:  call   *%rsi
      <+89>:  nop
      <+90>:  nop
      <+91>:  nop
      <+92>:  pushf
      <+93>:  pop    %rdi

    and then propagating the arithmetic flags into the vCPU's emulator
    state:

      ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);

      <+64>:  and    $0xfffffffffffff72a,%r9
      <+94>:  and    $0x8d5,%edi
      <+109>: or     %rdi,%r9
      <+122>: mov    %r9,0x10(%r8)

    The failures can be most easily reproduced by running the "emulator"
    test in KVM-Unit-Tests.

    If you're feeling a bit of deja vu, see commit b63f20a778c8
    ("x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386").

    In addition, this breaks booting of a clang-compiled guest on a
    gcc-compiled host where the host contains the %rsp-modifying SRSO
    mitigations.

    [ bp: Massage commit message, extend, remove addresses. ]

    Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
    Closes: https://lore.kernel.org/all/de474347-122d-54cd-eabf-9dcc95ab9eae@amd.com
    Reported-by: Srikanth Aithal <sraithal@amd.com>
    Reported-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Nathan Chancellor <nathan@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/20230810013334.GA5354@dev-arch.thelio-3990X/
    Link: https://lore.kernel.org/r/20230811155255.250835-1-seanjc@google.com

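The x86 detail behind the fix: ADD rewrites the arithmetic flags, while LEA performs the same %rsp adjustment as pure address arithmetic and leaves RFLAGS untouched. A stand-alone user-space demo of that difference (x86-64 gcc/clang inline asm; an illustration, not kernel code):

      #include <stdio.h>

      int main(void)
      {
              unsigned long f_add, f_lea, s = 0;

              /* Set CF, adjust a register with ADD: the flags are rewritten. */
              asm volatile("stc; add $8, %1; pushf; pop %0"
                           : "=r" (f_add), "+r" (s) : : "cc");

              s = 0;
              /* Set CF, do the same adjustment with LEA: CF survives. */
              asm volatile("stc; lea 8(%1), %1; pushf; pop %0"
                           : "=r" (f_lea), "+r" (s) : : "cc");

              printf("CF after add: %lu\n", f_add & 1);   /* prints 0 */
              printf("CF after lea: %lu\n", f_lea & 1);   /* prints 1 */
              return 0;
      }
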
2023-07-29  x86/srso: Add a forgotten NOENDBR annotation  [Borislav Petkov (AMD), 1 file changed, +1/-0]

    Fix:

      vmlinux.o: warning: objtool: .export_symbol+0x29e40: data relocation to !ENDBR: srso_untrain_ret_alias+0x0

    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>

2023-07-27  x86/srso: Add a Speculative RAS Overflow mitigation  [Borislav Petkov (AMD), 1 file changed, +78/-4]

    Add a mitigation for the speculative return address stack overflow
    vulnerability found on AMD processors.  The mitigation works by
    ensuring all RET instructions speculate to a controlled location,
    similar to how speculation is controlled in the retpoline sequence.
    To accomplish this, the __x86_return_thunk forces the CPU to
    mispredict every function return using a 'safe return' sequence.

    To ensure the safety of this mitigation, the kernel must ensure that
    the safe return sequence is itself free from attacker interference.
    In Zen3 and Zen4, this is accomplished by creating a BTB alias
    between the untraining function srso_untrain_ret_alias() and the safe
    return function srso_safe_ret_alias(), which results in evicting a
    potentially poisoned BTB entry and using that safe one for all
    function returns.

    In older Zen1 and Zen2, this is accomplished using a reinterpretation
    technique similar to the Retbleed one: srso_untrain_ret() and
    srso_safe_ret().

    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>

2023-05-17  x86/retbleed: Add __x86_return_thunk alignment checks  [Borislav Petkov (AMD), 1 file changed, +1/-1]

    Add a linker assertion and compute the 0xcc padding dynamically so
    that __x86_return_thunk is always cacheline-aligned.  Leave the
    SYM_START() macro in, as the untraining doesn't need ENDBR
    annotations anyway.

    Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Link: https://lore.kernel.org/r/20230515140726.28689-1-bp@alien8.de

2023-05-13  x86/retbleed: Fix return thunk alignment  [Borislav Petkov (AMD), 1 file changed, +2/-2]

    SYM_FUNC_START_LOCAL_NOALIGN() adds an endbr, leading to this layout
    (leaving only the last 2 bytes of the address):

      3bff <zen_untrain_ret>:
      3bff:  f3 0f 1e fa      endbr64
      3c03:  f6               test   $0xcc,%bl

      3c04 <__x86_return_thunk>:
      3c04:  c3               ret
      3c05:  cc               int3
      3c06:  0f ae e8         lfence

    However, "the RET at __x86_return_thunk must be on a 64 byte
    boundary, for alignment within the BTB."  Use SYM_START instead.

    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: <stable@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2023-03-24  x86,objtool: Split UNWIND_HINT_EMPTY in two  [Josh Poimboeuf, 1 file changed, +3/-3]

    Mark reported that the ORC unwinder incorrectly marks an unwind as
    reliable when the unwind terminates prematurely in the dark corners
    of return_to_handler() due to lack of information about the next
    frame.

    The problem is UNWIND_HINT_EMPTY is used in two different situations:

     1) The end of the kernel stack unwind before hitting user entry,
        boot code, or fork entry

     2) A blind spot in ORC coverage where the unwinder has to bail due
        to lack of information about the next frame

    The ORC unwinder has no way to tell the difference between the two.
    When it encounters an undefined stack state with 'end=1', it blindly
    marks the stack reliable, which can break the livepatch consistency
    model.

    Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and
    UNWIND_HINT_END_OF_STACK.

    Reported-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org

2022-10-17  x86/calldepth: Add ret/call counting for debug  [Thomas Gleixner, 1 file changed, +6/-1]

    Add a debugfs mechanism to validate the accounting, e.g. vs. call/ret
    balance, and to gather statistics about the stuffing to call ratio.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220915111148.204285506@infradead.org

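For orientation, a hedged sketch of the general shape such a debugfs statistics file takes; all names here (call_depth_stats, calls_seen, stuffs_done) are made up for illustration and are not the symbols this commit adds:

      #include <linux/debugfs.h>
      #include <linux/init.h>
      #include <linux/percpu.h>
      #include <linux/seq_file.h>

      static DEFINE_PER_CPU(u64, calls_seen);    /* bumped on call entry    */
      static DEFINE_PER_CPU(u64, stuffs_done);   /* bumped per RSB stuffing */

      static int call_depth_stats_show(struct seq_file *m, void *unused)
      {
              int cpu;

              for_each_online_cpu(cpu)
                      seq_printf(m, "cpu%d: calls %llu stuffs %llu\n", cpu,
                                 per_cpu(calls_seen, cpu),
                                 per_cpu(stuffs_done, cpu));
              return 0;
      }
      DEFINE_SHOW_ATTRIBUTE(call_depth_stats);

      static int __init call_depth_stats_init(void)
      {
              debugfs_create_file("call_depth_stats", 0444, NULL, NULL,
                                  &call_depth_stats_fops);
              return 0;
      }
      late_initcall(call_depth_stats_init);
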
2022-10-17  x86/retpoline: Add SKL retthunk retpolines  [Peter Zijlstra, 1 file changed, +63/-8]

    Ensure that retpolines do the proper call accounting so that the
    return accounting works correctly.

    Specifically, retpolines are used to replace both 'jmp *%reg' and
    'call *%reg'; however, these two cases do not have the same
    accounting requirements.  Therefore split things up and provide two
    different retpoline arrays for SKL.

    The 'jmp *%reg' case needs no accounting; the
    __x86_indirect_jump_thunk_array[] covers this.  The retpoline is
    changed to not use the return thunk; it's a simple call;ret construct.

    [ strictly speaking it should do:
        andq $(~0x1f), PER_CPU_VAR(__x86_call_depth)
      but we can argue this can be covered by the fuzz we already have in
      the accounting depth (12) vs the RSB depth (16) ]

    The 'call *%reg' case does need accounting; the
    __x86_indirect_call_thunk_array[] covers this.  Again, this retpoline
    avoids the use of the return-thunk, in this case to avoid double
    accounting.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220915111147.996634749@infradead.org

2022-10-17  x86/retbleed: Add SKL return thunk  [Thomas Gleixner, 1 file changed, +31/-0]

    To address the Intel SKL RSB underflow issue in software it's
    required to do call depth tracking.  Provide a return thunk for call
    depth tracking on Intel SKL CPUs.

    The tracking does not use a counter.  It uses arithmetic shift right
    on call entry and logical shift left on return.

    The depth tracking variable is initialized to 0x8000.... when the
    call depth is zero.  The arithmetic shift right sign extends the MSB
    and saturates after the 12th call.  The shift count is 5 so the
    tracking covers 12 nested calls.  On return the variable is shifted
    left logically so it becomes zero again.

           CALL                    RET
       0:  0x8000000000000000      0x0000000000000000
       1:  0xfc00000000000000      0xf000000000000000
      ...
      11:  0xfffffffffffffff8      0xfffffffffffffc00
      12:  0xffffffffffffffff      0xffffffffffffffe0

    After a return buffer fill the depth is credited 12 calls before the
    next stuffing has to take place.

    There is an inaccuracy for situations like this:

      10 calls
       5 returns
       3 calls
       4 returns
       3 calls
      ....

    The shift count might cause this to be off by one in either
    direction, but there is still a cushion vs. the RSB depth.  The
    algorithm does not claim to be perfect, but it should obfuscate the
    problem enough to make exploitation extremely difficult.

    The theory behind this is:

    RSB is a stack with depth 16 which is filled on every call.  On the
    return path speculation "pops" entries to speculate down the call
    chain.  Once the speculative RSB is empty it switches to other
    predictors, e.g. the Branch History Buffer, which can be mistrained
    by user space and misguide the speculation path to a gadget.

    Call depth tracking is designed to break this speculation path by
    stuffing speculation trap calls into the RSB which never get a
    corresponding return executed.  This stalls the prediction path until
    it gets resteered.

    The assumption is that stuffing at the 12th return is sufficient to
    break the speculation before it hits the underflow and the fallback
    to the other predictors.  Testing confirms that it works.  Johannes,
    one of the retbleed researchers, tried to attack this approach but
    failed.

    There is obviously no scientific proof that this will withstand
    future research progress, but all we can do right now is to speculate
    about it.

    The SAR/SHL usage was suggested by Andi Kleen.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220915111147.890071690@infradead.org

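The shift arithmetic is easy to sanity-check outside the kernel. A minimal user-space C simulation of the scheme as described above (SAR by 5 per call, logical SHL by 5 per return, reset value 0x8000000000000000); it mirrors the commit text, not the kernel's per-CPU implementation:

      #include <stdio.h>
      #include <stdint.h>
      #include <inttypes.h>

      int main(void)
      {
              /* '>>' on a negative int64_t is arithmetic (SAR) on gcc/clang */
              int64_t depth = (int64_t)0x8000000000000000ULL;   /* depth == 0 */

              for (int call = 1; call <= 13; call++) {
                      depth >>= 5;          /* CALL: the sign bit smears down */
                      printf("call %2d: 0x%016" PRIx64 "%s\n", call,
                             (uint64_t)depth, depth == -1 ? "  <- saturated" : "");
              }
              for (int ret = 1; ret <= 13; ret++) {
                      depth = (int64_t)((uint64_t)depth << 5);  /* RET: logical SHL */
                      printf("ret  %2d: 0x%016" PRIx64 "\n", ret, (uint64_t)depth);
              }
              return 0;
      }
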
2022-06-29  x86/retbleed: Add fine grained Kconfig knobs  [Peter Zijlstra, 1 file changed, +4/-0]

    Do fine-grained Kconfig for all the various retbleed parts.

    NOTE: if your compiler doesn't support return thunks this will
    silently 'upgrade' your mitigation to IBPB, you might not like this.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>

2022-06-27  x86: Add magic AMD return-thunk  [Peter Zijlstra, 1 file changed, +60/-4]

    Note: needs to be in a section distinct from Retpolines such that the
    Retpoline RET substitution cannot possibly use immediate jumps.

    ORC unwinding for zen_untrain_ret() and __x86_return_thunk() is a
    little tricky but works due to the fact that zen_untrain_ret()
    doesn't have any stack ops and as such will emit a single ORC entry
    at the start (+0x3f).  Meanwhile, unwinding an IP, including the
    __x86_return_thunk() one (+0x40), will search for the largest ORC
    entry smaller or equal to the IP; these will find the one ORC entry
    (+0x3f) and all works.

    [ Alexandre: SVM part. ]
    [ bp: Build fix, massages. ]

    Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>

2022-06-27  x86/retpoline: Use -mfunction-return  [Peter Zijlstra, 1 file changed, +13/-0]

    Utilize -mfunction-return=thunk-extern when available to have the
    compiler replace RET instructions with direct JMPs to the symbol
    __x86_return_thunk.  This does not affect assembler (.S) sources,
    only C sources.

    -mfunction-return=thunk-extern has been available since gcc 7.3 and
    clang 15.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Tested-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>

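A quick way to see the option in action on any trivial C file (assuming gcc >= 7.3 or clang >= 15, as stated above):

      /* demo.c -- build with:
       *     gcc -O2 -mfunction-return=thunk-extern -S demo.c
       *
       * In the generated assembly the function ends with
       *     jmp __x86_return_thunk
       * instead of 'ret'; the kernel supplies that symbol at link time.
       */
      int add_one(int x)
      {
              return x + 1;
      }
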
2022-06-27  x86/retpoline: Swizzle retpoline thunk  [Peter Zijlstra, 1 file changed, +3/-3]

    Put the actual retpoline thunk as the original code so that it can
    become more complicated.  Specifically, it allows RET to be a JMP,
    which can't be .altinstr_replacement since that doesn't do
    relocations (except for the very first instruction).

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>

2022-04-19  x86/retpoline: Add ANNOTATE_NOENDBR for retpolines  [Josh Poimboeuf, 1 file changed, +1/-1]

    The retpolines are exported, so they're referenced by ksymtab
    sections.  But they're never indirect-branched to, so add
    ANNOTATE_NOENDBR.

    Fixes: ed53a0d97192 ("x86/alternative: Use .ibt_endbr_seal to seal indirect calls")
    Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/b6ec963dfd9301b6b1d74ef7758fcb0b540d6c6c.1650300597.git.jpoimboe@redhat.com

2022-03-15  x86/ibt: Annotate text references  [Peter Zijlstra, 1 file changed, +1/-0]

    Annotate away some of the generic code references.  This is things
    where we take the address of a symbol for exception handling or
    return addresses (e.g. context switch).

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Link: https://lore.kernel.org/r/20220308154318.877758523@infradead.org

2022-02-21  x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE  [Peter Zijlstra (Intel), 1 file changed, +1/-1]

    The RETPOLINE_AMD name is unfortunate since it isn't necessarily AMD
    only; in fact Hygon also uses it.  Furthermore it will likely be
    sufficient for some Intel processors.  Therefore rename the thing to
    RETPOLINE_LFENCE to better describe what it is.

    Add the spectre_v2=retpoline,lfence option as an alias to
    spectre_v2=retpoline,amd to preserve existing setups.  However, the
    output of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be
    changed.

    [ bp: Fix typos, massage. ]

    Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

2021-12-09  x86: Add straight-line-speculation mitigation  [Peter Zijlstra, 1 file changed, +1/-1]

    Make use of an upcoming GCC feature to mitigate
    straight-line-speculation for x86:

      https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
      https://bugs.llvm.org/show_bug.cgi?id=52323

    It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11.

    Maintenance overhead of this should be fairly low due to objtool
    validation.

    Size overhead of all these additional int3 instructions comes to:

          text      data      bss       dec       hex      filename
      22267751   6933356   2011368   31212475   1dc43bb   defconfig-build/vmlinux
      22804126   6933356   1470696   31208178   1dc32f2   defconfig-build/vmlinux.sls

    Or roughly 2.4% additional text.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Link: https://lore.kernel.org/r/20211204134908.140103474@infradead.org

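The feature referenced above landed in gcc 12 as -mharden-sls (clang grew a compatible option). A hedged example of observing the emitted int3 padding on a trivial C file:

      /* sls.c -- build with:
       *     gcc -O2 -mharden-sls=all -S sls.c
       *
       * Every 'ret' (and indirect branch) in the output is followed by
       * an 'int3', so straight-line speculation past it hits a trap:
       *
       *     ret
       *     int3
       */
      int twice(int x)
      {
              return x * 2;
      }
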
2021-12-08  x86: Prepare asm files for straight-line-speculation  [Peter Zijlstra, 1 file changed, +1/-1]

    Replace all ret/retq instructions with RET in preparation of making
    RET a macro.  Since AS is case insensitive it's a big no-op without
    RET defined.

      find arch/x86/ -name \*.S | while read file
      do
        sed -i 's/\<ret[q]*\>/RET/' $file
      done

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org

2021-10-29  x86/retpoline: Create a retpoline thunk array  [Peter Zijlstra, 1 file changed, +9/-5]

    Stick all the retpolines in a single symbol and have the individual
    thunks as inner labels; this should guarantee thunk order and layout.

    Previously there were 16 (or rather 15, without rsp) separate symbols
    and a toolchain might reasonably expect it could displace them
    however it liked, with disregard for their relative position.

    However, now they're part of a larger symbol.  Any change to their
    relative position would disrupt this larger _array symbol and thus
    not be sound.

    This is the same reasoning used for data symbols.  On their own there
    is no guarantee about their relative position with regard to one
    another, but we're still able to do arrays because an array as a
    whole is a single larger symbol.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Borislav Petkov <bp@suse.de>
    Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Tested-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/r/20211026120310.169659320@infradead.org

2021-10-29  x86/asm: Fixup odd GEN-for-each-reg.h usage  [Peter Zijlstra, 1 file changed, +2/-2]

    Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any
    subsequent usage to start with #undef, which is rude.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Borislav Petkov <bp@suse.de>
    Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Tested-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/r/20211026120310.041792350@infradead.org

2021-10-29  x86/retpoline: Remove unused replacement symbols  [Peter Zijlstra, 1 file changed, +0/-42]

    Now that objtool no longer creates alternatives, these replacement
    symbols are no longer needed; remove them.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Borislav Petkov <bp@suse.de>
    Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Tested-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/r/20211026120309.915051744@infradead.org

2021-06-21  objtool/x86: Ignore __x86_indirect_alt_* symbols  [Peter Zijlstra, 1 file changed, +4/-0]

    Because the __x86_indirect_alt_* symbols are just that, objtool will
    try and validate them as regular symbols, instead of the alternative
    replacements that they are.

    This goes sideways for FRAME_POINTER=y builds, which generate a fair
    amount of warnings.

    Fixes: 9bc0bb50727c ("objtool/x86: Rewrite retpoline thunk calls")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/YNCgxwLBiK9wclYJ@hirez.programming.kicks-ass.net

2021-04-02  objtool/x86: Rewrite retpoline thunk calls  [Peter Zijlstra, 1 file changed, +40/-1]

    When the compiler emits: "CALL __x86_indirect_thunk_\reg" for an
    indirect call, have objtool rewrite it to:

      ALTERNATIVE "call __x86_indirect_thunk_\reg",
                  "call *%reg", ALT_NOT(X86_FEATURE_RETPOLINE)

    Additionally, in order to not emit endless identical
    .altinst_replacement chunks, use a global symbol for them, see
    __x86_indirect_alt_*.

    This also avoids objtool from having to do code generation.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Miroslav Benes <mbenes@suse.cz>
    Link: https://lkml.kernel.org/r/20210326151300.320177914@infradead.org

2021-04-02  x86/retpoline: Simplify retpolines  [Peter Zijlstra, 1 file changed, +17/-17]

    Due to:

      c9c324dc22aa ("objtool: Support stack layout changes in alternatives")

    it is now possible to simplify the retpolines.

    Currently our retpolines consist of 2 symbols:

     - __x86_indirect_thunk_\reg: the compiler target
     - __x86_retpoline_\reg:      the actual retpoline.

    Both are consecutive in code and aligned such that for any one
    register they both live in the same cacheline:

      0000000000000000 <__x86_indirect_thunk_rax>:
         0:   ff e0                   jmpq   *%rax
         2:   90                      nop
         3:   90                      nop
         4:   90                      nop

      0000000000000005 <__x86_retpoline_rax>:
         5:   e8 07 00 00 00          callq  11 <__x86_retpoline_rax+0xc>
         a:   f3 90                   pause
         c:   0f ae e8                lfence
         f:   eb f9                   jmp    a <__x86_retpoline_rax+0x5>
        11:   48 89 04 24             mov    %rax,(%rsp)
        15:   c3                      retq
        16:   66 2e 0f 1f 84 00 00 00 00 00  nopw   %cs:0x0(%rax,%rax,1)

    The thunk is an alternative_2, where one option is a JMP to the
    retpoline.  This was done so that objtool didn't need to deal with
    alternatives with stack ops.  But that problem has been solved, so
    now it is possible to fold the entire retpoline into the alternative
    to simplify and consolidate unused bytes:

      0000000000000000 <__x86_indirect_thunk_rax>:
         0:   ff e0                   jmpq   *%rax
         2:   90                      nop
         3:   90                      nop
         4:   90                      nop
         5:   90                      nop
         6:   90                      nop
         7:   90                      nop
         8:   90                      nop
         9:   90                      nop
         a:   90                      nop
         b:   90                      nop
         c:   90                      nop
         d:   90                      nop
         e:   90                      nop
         f:   90                      nop
        10:   90                      nop
        11:   66 66 2e 0f 1f 84 00 00 00 00 00  data16 nopw %cs:0x0(%rax,%rax,1)
        1c:   0f 1f 40 00             nopl   0x0(%rax)

    Notice that since the longest alternative sequence is now:

         0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
         5:   f3 90                   pause
         7:   0f ae e8                lfence
         a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
         c:   48 89 04 24             mov    %rax,(%rsp)
        10:   c3                      retq

    17 bytes, we have 15 bytes NOP at the end of our 32 byte slot.  (IOW,
    if we can shrink the retpoline by 1 byte we can pack it more
    densely.)

    [ bp: Massage commit message. ]

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lkml.kernel.org/r/20210326151259.506071949@infradead.org

2021-03-11  x86/alternative: Merge include files  [Juergen Gross, 1 file changed, +1/-1]

    Merge arch/x86/include/asm/alternative-asm.h into
    arch/x86/include/asm/alternative.h in order to make it easier to use
    common definitions later.

    Signed-off-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgross@suse.com