Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit:
e0ba94f14f74 ("x86/tlb_info: get last level TLB entry number of CPU")
introduced u16 "info" arrays for each TLB type.
Since 2012 and each array stores just one type of information: the
number of TLB entries for its respective TLB type.
Replace such arrays with simple variables.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250304085152.51092-8-darwi@linutronix.de
|
|
<asm/cpuid.h> uses static_assert() at multiple locations but it does not
include the CPP macro's definition at linux/build_bug.h.
Include the needed header to make <asm/cpuid.h> self-sufficient.
This gets triggered when cpuid.h is included in new C files, which is to
be done in further commits.
Fixes: 43d86e3cd9a7 ("x86/cpu: Provide cpuid_read() et al.")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250304085152.51092-5-darwi@linutronix.de
|
|
These macros used to abstract over CONFIG_X86_FEATURE_NAMES, but that
was removed in:
7583e8fbdc49 ("x86/cpu: Remove X86_FEATURE_NAMES")
Now they are just an unnecessary indirection, remove them.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250303-setcpuid-taint-louder-v1-1-8d255032cb4c@google.com
|
|
Retpoline mitigation for spectre-v2 uses thunks for indirect branches. To
support this mitigation compilers add a CS prefix with
-mindirect-branch-cs-prefix. For an indirect branch in asm, this needs to
be added manually.
CS prefix is already being added to indirect branches in asm files, but not
in inline asm. Add CS prefix to CALL_NOSPEC for inline asm as well. There
is no JMP_NOSPEC for inline asm.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250228-call-nospec-v3-2-96599fed0f33@linux.intel.com
|
|
CALL_NOSPEC macro is used to generate Spectre-v2 mitigation friendly
indirect branches. At compile time the macro defaults to indirect branch,
and at runtime those can be patched to thunk based mitigations.
This approach is opposite of what is done for the rest of the kernel, where
the compile time default is to replace indirect calls with retpoline thunk
calls.
Make CALL_NOSPEC consistent with the rest of the kernel, default to
retpoline thunk at compile time when CONFIG_MITIGATION_RETPOLINE is
enabled.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250228-call-nospec-v3-1-96599fed0f33@linux.intel.com
|
|
behavior
Fix some related issues (done in a single patch to avoid introducing
intermediate bisect warnings):
1) The SMP version of mwait_play_dead() doesn't return, but its
!SMP counterpart does. Make its calling behavior consistent by
resolving the !SMP version to a BUG(). It should never be called
anyway, this just enforces that at runtime and enables its callers
to be marked as __noreturn.
2) While the SMP definition of mwait_play_dead() is annotated as
__noreturn, the declaration isn't. Nor is it listed in
tools/objtool/noreturns.h. Fix that.
3) Similar to #1, the SMP version of acpi_processor_ffh_play_dead()
doesn't return but its !SMP counterpart does. Make the !SMP
version a BUG(). It should never be called.
4) acpi_processor_ffh_play_dead() doesn't return, but is lacking any
__noreturn annotations. Fix that.
This fixes the following objtool warnings:
vmlinux.o: warning: objtool: acpi_processor_ffh_play_dead+0x67: mwait_play_dead() is missing a __noreturn annotation
vmlinux.o: warning: objtool: acpi_idle_play_dead+0x3c: acpi_processor_ffh_play_dead() is missing a __noreturn annotation
Fixes: a7dd183f0b38 ("x86/smp: Allow calling mwait_play_dead with an arbitrary hint")
Fixes: 541ddf31e300 ("ACPI/processor_idle: Add FFH state handling")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/e885c6fa9e96a61471b33e48c2162d28b15b14c5.1740962711.git.jpoimboe@kernel.org
|
|
The safe_smp_processor_id() function was originally implemented in:
dc2bc768a009 ("stack overflow safe kdump: safe_smp_processor_id()")
to mitigate the CPU number corruption on a stack overflow. At the time,
x86-32 stored the CPU number in thread_struct, which was located at the
bottom of the task stack and thus vulnerable to an overflow.
The CPU number is now located in percpu memory, so this workaround
is no longer needed.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250303170115.2176553-1-brgerst@gmail.com
|
|
Commit:
263042e4630a ("Save user RSP in pt_regs->sp on SYSCALL64 fastpath")
simplified the 64-bit implementation of KSTK_ESP() which is
now identical to 32-bit. Merge them into a common definition.
No functional change.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250303183111.2245129-1-brgerst@gmail.com
|
|
When handling an "AP Create" event, return an error if the "requested" SEV
features for the vCPU don't exactly match KVM's view of the VM-scoped
features. There is no known use case for heterogeneous SEV features across
vCPUs, and while KVM can't actually enforce an exact match since the value
in RAX isn't guaranteed to match what the guest shoved into the VMSA, KVM
can at least avoid knowingly letting the guest run in an unsupported state.
E.g. if a VM is created with DebugSwap disabled, KVM will intercept #DBs
and DRs for all vCPUs, even if an AP is "created" with DebugSwap enabled in
its VMSA.
Note, the GHCB spec only "requires" that "AP use the same interrupt
injection mechanism as the BSP", but given the disaster that is DebugSwap
and SEV_FEATURES in general, it's safe to say that AMD didn't consider all
possible complications with mismatching features between the BSP and APs.
Opportunistically fold the check into the relevant request flavors; the
"request < AP_DESTROY" check is just a bizarre way of implementing the
AP_CREATE_ON_INIT => AP_CREATE fallthrough.
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20250227012541.3234589-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Retpoline mitigation for spectre-v2 uses thunks for indirect branches. To
support this mitigation compilers add a CS prefix with
-mindirect-branch-cs-prefix. For an indirect branch in asm, this needs to
be added manually.
CS prefix is already being added to indirect branches in asm files, but not
in inline asm. Add CS prefix to CALL_NOSPEC for inline asm as well. There
is no JMP_NOSPEC for inline asm.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250228-call-nospec-v3-2-96599fed0f33@linux.intel.com
|
|
CALL_NOSPEC macro is used to generate Spectre-v2 mitigation friendly
indirect branches. At compile time the macro defaults to indirect branch,
and at runtime those can be patched to thunk based mitigations.
This approach is opposite of what is done for the rest of the kernel, where
the compile time default is to replace indirect calls with retpoline thunk
calls.
Make CALL_NOSPEC consistent with the rest of the kernel, default to
retpoline thunk at compile time when CONFIG_MITIGATION_RETPOLINE is
enabled.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250228-call-nospec-v3-1-96599fed0f33@linux.intel.com
|
|
The last use of paravirt_disable_iospace() was removed in 2015 by
commit d1c29465b8a5 ("lguest: don't disable iospace.")
Remove it.
Note the comment above it about 'entry.S' is unrelated to this
but stayed when intervening code got deleted.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20250303004441.250451-1-linux@treblig.org
|
|
Robot yielded a .config that tripped:
vmlinux.o: warning: objtool: do_jit+0x276: relocation to !ENDBR: .noinstr.text+0x6a60
This is the result of using __bhi_args[1] in unreachable code; make
sure the compiler is able to determine this is unreachable and trigger
DCE.
Closes: https://lore.kernel.org/oe-kbuild-all/202503030704.H9KFysNS-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250303094911.GL5880@noisy.programming.kicks-ass.net
|
|
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Define independent macros for the RWX protection bits that are enumerated
via EXIT_QUALIFICATION for EPT Violations, and tie them to the RWX bits in
EPT entries via compile-time asserts. Piggybacking the EPTE defines works
for now, but it creates holes in the EPT_VIOLATION_xxx macros and will
cause headaches if/when KVM emulates Mode-Based Execution (MBEC), or any
other features that introduces additional protection information.
Opportunistically rename EPT_VIOLATION_RWX_MASK to EPT_VIOLATION_PROT_MASK
so that it doesn't become stale if/when MBEC support is added.
No functional change intended.
Cc: Jon Kohler <jon@nutanix.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250227000705.3199706-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Those defines are only used in the definition of the various
EPT_VIOLATIONS_ACC_* macros which are then used to extract respective
bits from vmexit error qualifications. Remove the _BIT defines and
redefine the _ACC ones via BIT() macro. No functional changes.
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250227000705.3199706-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Commit:
03b122da74b2 ("x86/sgx: Hook arch_memory_failure() into mainline code")
... added <linux/mm.h> to <asm/set_memory.h> to provide some helpers.
However the following commit:
b3fdf9398a16 ("x86/mce: relocate set{clear}_mce_nospec() functions")
... moved the inline definitions someplace else, and now <asm/set_memory.h>
just declares a bunch of mostly self-contained functions.
No need for the whole <linux/mm.h> inclusion to declare functions; just
remove that include. This helps avoid circular dependency headaches
(e.g. if <linux/mm.h> ends up including <linux/set_memory.h>).
This change requires a couple of include fixups not to break the
build:
* <asm/smp.h>: including <asm/thread_info.h> directly relies on
<linux/thread_info.h> having already been included, because the
former needs the BAD_STACK/NOT_STACK constants defined in the
latter. This is no longer the case when <asm/smp.h> is included from
some driver file - just include <linux/thread_info.h> to stay out
of trouble.
* sev-guest.c relies on <asm/set_memory.h> including <linux/mm.h>,
so we just need to make that include explicit.
[ mingo: Cleaned up the changelog ]
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20241212080904.2089632-3-kevin.brodsky@arm.com
|
|
__set_memory_prot() is unused since:
5c11f00b09c1 ("x86: remove memory hotplug support on X86_32")
Let's remove it.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20241212080904.2089632-2-kevin.brodsky@arm.com
|
|
Add AUTO mitigations for mds/taa/mmio/rfds to create consistent vulnerability
handling. These AUTO mitigations will be turned into the appropriate default
mitigations in the <vuln>_select_mitigation() functions. Later, these will be
used with the new attack vector controls to help select appropriate
mitigations.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250108202515.385902-4-david.kaplan@amd.com
|
|
All CPU vulnerabilities with command line options map to a single X86_BUG bit
except for Spectre V2 where both the spectre_v2 and spectre_v2_user command
line options are related to the same bug.
The spectre_v2 command line options mostly relate to user->kernel and
guest->host mitigations, while the spectre_v2_user command line options relate
to user->user or guest->guest protections.
Define a new X86_BUG bit for spectre_v2_user so each *_select_mitigation()
function in bugs.c is related to a unique X86_BUG bit.
No functional changes.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250108202515.385902-2-david.kaplan@amd.com
|
|
Replace X86_CMPXCHG64 with X86_CX8, as CX8 is the name of the CPUID
flag, thus to make it consistent with X86_FEATURE_CX8 defined in
<asm/cpufeatures.h>.
No functional change intended.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250228082338.73859-2-xin@zytor.com
|
|
Sometimes it can be very useful to run CPU vulnerability mitigations on
systems where they aren't known to mitigate any real-world
vulnerabilities. This can be handy for mundane reasons like debugging
HW-agnostic logic on whatever machine is to hand, but also for research
reasons: while some mitigations are focused on individual vulns and
uarches, others are fairly general, and it's strategically useful to
have an idea how they'd perform on systems where they aren't currently
needed.
As evidence for this being useful, a flag specifically for Retbleed was
added in:
5c9a92dec323 ("x86/bugs: Add retbleed=force").
Since CPU bugs are tracked using the same basic mechanism as features,
and there are already parameters for manipulating them by hand, extend
that mechanism to support bug as well as capabilities.
With this patch and setcpuid=srso, a QEMU guest running on an Intel host
will boot with Safe-RET enabled.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241220-force-cpu-bug-v2-3-7dc71bce742a@google.com
|
|
Minimum version of binutils required to compile the kernel is 2.25.
This version correctly handles the "lock" prefix, so it is possible
to remove the semicolon, which was used to support ancient versions
of GNU as.
Due to the semicolon, the compiler considers "lock; insn" as two
separate instructions. Removing the semicolon makes asm length
calculations more accurate, consequently making scheduling and
inlining decisions of the compiler more accurate.
Removing the semicolon also enables assembler checks involving lock
prefix. Trying to assemble e.g. "lock andl %eax, %ebx" results in:
Error: expecting lockable instruction after `lock'
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250228085149.2478245-1-ubizjak@gmail.com
|
|
Because calls to get_this_hybrid_cpu_type() and
get_this_hybrid_cpu_native_id() are not required now. cpu-type and
native-model-id are cached at boot in per-cpu struct cpuinfo_topology.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-4-2ae010f50370@linux.intel.com
|
|
get_this_hybrid_cpu_type() misses a case when cpu-type is populated
regardless of X86_FEATURE_HYBRID_CPU. This is particularly true for hybrid
variants that have P or E cores fused off.
Instead use the cpu-type cached in struct x86_topology, as it does not rely
on hybrid feature to enumerate cpu-type. This can also help avoid the
model-specific fixup get_hybrid_cpu_type(). Also replace the
get_this_hybrid_cpu_native_id() with its cached value in struct
x86_topology.
While at it, remove enum hybrid_cpu_type as it serves no purpose when we
have the exact cpu-types defined in enum intel_cpu_type. Also rename
atom_native_id to intel_native_id and move it to intel-family.h where
intel_cpu_type lives.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-3-2ae010f50370@linux.intel.com
|
|
ST ConneXt STA2x11 was an interface chip for Atom E6xx processors,
using a number of components usually found on Arm SoCs. Most of this
was merged upstream, but it was never complete enough to actually work
and has been abandoned for many years.
We already had an agreement on removing it in 2022, but nobody ever
submitted the patch to do it.
Without STA2x11, CONFIG_X86_32_NON_STANDARD no longer has any
use - remove it.
Suggested-by: Davide Ciminaghi <ciminaghi@gnudd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-10-arnd@kernel.org
|
|
With the maximum amount of RAM now 4GB, there is very little point
to still have PTE pages in highmem. Drop this for simplification.
The only other architecture supporting HIGHPTE is 32-bit arm, and
once that feature is removed as well, the highpte logic can be
dropped from common code as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-8-arnd@kernel.org
|
|
HIGHMEM64G support was added in linux-2.3.25 to support (then)
high-end Pentium Pro and Pentium III Xeon servers with more than 4GB of
addressing, NUMA and PCI-X slots started appearing.
I have found no evidence of this ever being used in regular dual-socket
servers or consumer devices, all the users seem obsolete these days,
even by i386 standards:
- Support for NUMA servers (NUMA-Q, IBM x440, unisys) was already
removed ten years ago.
- 4+ socket non-NUMA servers based on Intel 450GX/450NX, HP F8 and
ServerWorks ServerSet/GrandChampion could theoretically still work
with 8GB, but these were exceptionally rare even 20 years ago and
would have usually been equipped with than the maximum amount of
RAM.
- Some SKUs of the Celeron D from 2004 had 64-bit mode fused off but
could still work in a Socket 775 mainboard designed for the later
Core 2 Duo and 8GB. Apparently most BIOSes at the time only allowed
64-bit CPUs.
- The rare Xeon LV "Sossaman" came on a few motherboards with
registered DDR2 memory support up to 16GB.
- In the early days of x86-64 hardware, there was sometimes the need
to run a 32-bit kernel to work around bugs in the hardware drivers,
or in the syscall emulation for 32-bit userspace. This likely still
works but there should never be a need for this any more.
PAE mode is still required to get access to the 'NX' bit on Atom
'Pentium M' and 'Core Duo' CPUs.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-6-arnd@kernel.org
|
|
The x86 CPU selection menu is confusing for a number of reasons:
When configuring 32-bit kernels, it shows a small number of early 64-bit
microarchitectures (K8, Core 2) but not the regular generic 64-bit target
that is the normal default. There is no longer a reason to run 32-bit
kernels on production 64-bit systems, so only actual 32-bit CPUs need
to be shown here.
When configuring 64-bit kernels, the options also pointless as there is
no way to pick any CPU from the past 15 years, leaving GENERIC_CPU as
the only sensible choice.
Address both of the above by removing the obsolete options and making
all 64-bit kernels run on both Intel and AMD CPUs from any generation.
Testing generic 32-bit kernels on 64-bit hardware remains possible,
just not building a 32-bit kernel that requires a 64-bit CPU.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-5-arnd@kernel.org
|
|
We are going to apply a new series that conflicts with pending
work in x86/mm, so merge in x86/mm to avoid it, and also to
refresh the x86/cpu branch with fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
X86_FEATURE_USE_IBPB was introduced in:
2961298efe1e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
to have separate flags for when the CPU supports IBPB (i.e. X86_FEATURE_IBPB)
and when an IBPB is actually used to mitigate Spectre v2.
Ever since then, the uses of IBPB expanded. The name became confusing
because it does not control all IBPB executions in the kernel.
Furthermore, because its name is generic and it's buried within
indirect_branch_prediction_barrier(), it's easy to use it not knowing
that it is specific to Spectre v2.
X86_FEATURE_USE_IBPB is no longer needed because all the IBPB executions
it used to control are now controlled through other means (e.g.
switch_mm_*_ibpb static branches).
Remove the unused feature bit.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20250227012712.3193063-7-yosry.ahmed@linux.dev
|
|
Instead of using X86_FEATURE_USE_IBPB to guard the IBPB execution in KVM
when a new vCPU is loaded, introduce a static branch, similar to
switch_mm_*_ibpb.
This makes it obvious in spectre_v2_user_select_mitigation() what
exactly is being toggled, instead of the unclear X86_FEATURE_USE_IBPB
(which will be shortly removed). It also provides more fine-grained
control, making it simpler to change/add paths that control the IBPB in
the vCPU switch path without affecting other IBPBs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250227012712.3193063-5-yosry.ahmed@linux.dev
|
|
indirect_branch_prediction_barrier() only performs the MSR write if
X86_FEATURE_USE_IBPB is set, using alternative_msr_write(). In
preparation for removing X86_FEATURE_USE_IBPB, move the feature check
into the callers so that they can be addressed one-by-one, and use
X86_FEATURE_IBPB instead to guard the MSR write.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250227012712.3193063-2-yosry.ahmed@linux.dev
|
|
Add support for
CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1, it
indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate
SRSO.
Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Switch back to enabling the bit when virtualization is enabled and to
clear the bit when virtualization is disabled because using a MSR slot
would clear the bit when the guest is exited and any training the guest
has done, would potentially influence the host kernel when execution
enters the kernel and hasn't VMRUN the guest yet.
More detail on the public thread in Link below.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241202120416.6054-1-bp@kernel.org
|
|
Saves a CALL to an out-of-line thunk for the common case of 1
argument.
Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.927885784@infradead.org
|
|
While WAIT_FOR_ENDBR is specified to be a full speculation stop; it
has been shown that some implementations are 'leaky' to such an extend
that speculation can escape even the FineIBT preamble.
To deal with this, add additional hardening to the FineIBT preamble.
Notably, using a new LLVM feature:
https://github.com/llvm/llvm-project/commit/e223485c9b38a5579991b8cebb6a200153eee245
which encodes the number of arguments in the kCFI preamble's register.
Using this register<->arity mapping, have the FineIBT preamble CALL
into a stub clobbering the relevant argument registers in the
speculative case.
Scott sayeth thusly:
Microarchitectural attacks such as Branch History Injection (BHI) and
Intra-mode Branch Target Injection (IMBTI) [1] can cause an indirect
call to mispredict to an adversary-influenced target within the same
hardware domain (e.g., within the kernel). Instructions at the
mispredicted target may execute speculatively and potentially expose
kernel data (e.g., to a user-mode adversary) through a
microarchitectural covert channel such as CPU cache state.
CET-IBT [2] is a coarse-grained control-flow integrity (CFI) ISA
extension that enforces that each indirect call (or indirect jump)
must land on an ENDBR (end branch) instruction, even speculatively*.
FineIBT is a software technique that refines CET-IBT by associating
each function type with a 32-bit hash and enforcing (at the callee)
that the hash of the caller's function pointer type matches the hash
of the callee's function type. However, recent research [3] has
demonstrated that the conditional branch that enforces FineIBT's hash
check can be coerced to mispredict, potentially allowing an adversary
to speculatively bypass the hash check:
__cfi_foo:
ENDBR64
SUB R10d, 0x01234567
JZ foo # Even if the hash check fails and ZF=0, this branch could still mispredict as taken
UD2
foo:
...
The techniques demonstrated in [3] require the attacker to be able to
control the contents of at least one live register at the mispredicted
target. Therefore, this patch set introduces a sequence of CMOV
instructions at each indirect-callable target that poisons every live
register with data that the attacker cannot control whenever the
FineIBT hash check fails, thus mitigating any potential attack.
The security provided by this scheme has been discussed in detail on
an earlier thread [4].
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
[2] Intel Software Developer's Manual, Volume 1, Chapter 18
[3] https://www.vusec.net/projects/native-bhi/
[4] https://lore.kernel.org/lkml/20240927194925.707462984@infradead.org/
*There are some caveats for certain processors, see [1] for more info
Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.820402212@infradead.org
|
|
Add an array of code thunks, to be called from the FineIBT preamble,
clobbering the first 'n' argument registers for speculative execution.
Notably the 0th entry will clobber no argument registers and will never
be used, it exists so the array can be naturally indexed, while the 7th
entry will clobber all the 6 argument registers and also RSP in order to
mess up stack based arguments.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.717378681@infradead.org
|
|
Because overlapping code sequences are all the rage.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.486463917@infradead.org
|
|
FineIBT will start using 0xEA as #UD. Normally '0xEA' is a 'bad',
invalid instruction for the CPU.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.166774696@infradead.org
|
|
The call to mce_notify_irq() has been there since the initial version of
the soft inject mce machinery, introduced in
ea149b36c7f5 ("x86, mce: add basic error injection infrastructure").
At that time it was functional since injecting an MCE resulted in the
following call chain:
raise_mce()
->machine_check_poll()
->mce_log() - sets notfiy_user_bit
->mce_notify_user() (current mce_notify_irq) consumed the bit and called the
usermode helper.
However, with the introduction of
011d82611172 ("RAS: Add a Corrected Errors Collector")
the code got moved around and the usermode helper began to be called via the
early notifier mce_first_notifier() rendering the call in raise_local()
defunct as the mce_need_notify bit (ex notify_user) is only being set from the
early notifier.
Remove the noop call and make mce_notify_irq() static.
No functional changes.
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250225143348.268469-1-nik.borisov@suse.com
|
|
Add support for "Idle HLT" interception on AMD CPUs, and enable Idle HLT
interception instead of "normal" HLT interception for all VMs for which
HLT-exiting is enabled. Idle HLT provides a mild performance boost for
all VM types, by avoiding a VM-Exit in the scenario where KVM would
immediately "wake" and resume the vCPU.
Idle HLT makes HLT-exiting conditional on the vCPU not having a valid,
unmasked interrupt. Specifically, a VM-Exit occurs on execution of HLT
if and only if there are no pending V_IRQ or V_NMI events. Note, Idle
is a replacement for full HLT interception, i.e. enabling HLT interception
would result in all HLT instructions causing unconditional VM-Exits. Per
the APM:
When both HLT and Idle HLT intercepts are active at the same time, the
HLT intercept takes priority. This intercept occurs only if a virtual
interrupt is not pending (V_INTR or V_NMI).
For KVM's use of V_IRQ (also called V_INTR in the APM) to detect interrupt
windows, the net effect of enabling Idle HLT is that, if a virtual
interupt is pending and unmasked at the time of HLT, the vCPU will take
a V_IRQ intercept instead of a HLT intercept.
When AVIC is enabled, Idle HLT works as intended: the vCPU continues
unimpeded and services the pending virtual interrupt.
Note, the APM's description of V_IRQ interaction with AVIC is quite
confusing, and requires piecing together implied behavior. Per the APM,
when AVIC is enabled, V_IRQ *from the VMCB* is ignored:
When AVIC mode is enabled for a virtual processor, the V_IRQ, V_INTR_PRIO,
V_INTR_VECTOR, and V_IGN_TPR fields in the VMCB are ignored.
Which seems to contradict the behavior of Idle HLT:
This intercept occurs only if a virtual interrupt is not pending (V_INTR
or V_NMI).
What's not explicitly stated is that hardware's internal copy of V_IRQ
(and related fields) *are* still active, i.e. are presumably used to cache
information from the virtual APIC.
Handle Idle HLT exits as if they were normal HLT exits, e.g. don't try to
optimize the handling under the assumption that there isn't a pending IRQ.
Irrespective of AVIC, Idle HLT is inherently racy with respect to the vIRR,
as KVM can set vIRR bits asychronously.
No changes are required to support KVM's use Idle HLT while running
L2. In fact, supporting Idle HLT is actually a bug fix to some extent.
If L1 wants to intercept HLT, recalc_intercepts() will enable HLT
interception in vmcb02 and forward the intercept to L1 as normal.
But if L1 does not want to intercept HLT, then KVM will run L2 with Idle
HLT enabled and HLT interception disabled. If a V_IRQ or V_NMI for L2
becomes pending and L2 executes HLT, then use of Idle HLT will do the
right thing, i.e. not #VMEXIT and instead deliver the virtual event. KVM
currently doesn't handle this scenario correctly, e.g. doesn't check V_IRQ
or V_NMI in vmcs02 as part of kvm_vcpu_has_events().
Do not expose Idle HLT to L1 at this time, as supporting nested Idle HLT is
more complex than just enumerating the feature, e.g. requires KVM to handle
the aforementioned scenarios of V_IRQ and V_NMI at the time of exit.
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://bugzilla.kernel.org/attachment.cgi?id=306250
Link: https://lore.kernel.org/r/20250128124812.7324-3-manali.shukla@amd.com
[sean: rewrite changelog, drop nested "support"]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The Idle HLT Intercept feature allows for the HLT instruction
execution by a vCPU to be intercepted by the hypervisor only if there
are no pending events (V_INTR and V_NMI) for the vCPU. When the vCPU
is expected to service the pending events (V_INTR and V_NMI), the Idle
HLT intercept won’t trigger. The feature allows the hypervisor to
determine if the vCPU is idle and reduces wasteful VMEXITs.
In addition to the aforementioned use case, the Idle HLT intercept
feature is also used for enlightened guests who aim to securely manage
events without the hypervisor’s awareness. If a HLT occurs while
a virtual event is pending and the hypervisor is unaware of this
pending event (as could be the case with enlightened guests), the
absence of the Idle HLT intercept feature could result in a vCPU being
suspended indefinitely.
Presence of Idle HLT intercept feature for guests is indicated via CPUID
function 0x8000000A_EDX[30].
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250128124812.7324-2-manali.shukla@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Convert VMGEXIT SW_EXITINFO1 codes from plain numbers to proper defines.
Opportunistically update the comment for the malformed input "sub-error"
codes to state that they are defined by the GHCB, and to capure the
relationship to the malformed input response.
No functional change intended.
Signed-off-by: Melody Wang <huibo.wang@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Pavan Kumar Paluri <papaluri@amd.com>
Link: https://lore.kernel.org/r/20250225213937.2471419-2-huibo.wang@amd.com
[sean: update comments]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Construct __percpu_seg_override macro from __percpu_seg by
concatenating the later with __seg_ prefix to reduce ifdeffery.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250225200235.48007-1-ubizjak@gmail.com
|
|
Open code the filling of vcpu->arch.exception in kvm_requeue_exception()
instead of bouncing through kvm_multiple_exception(), as re-injection
doesn't actually share that much code with "normal" injection, e.g. the
VM-Exit interception check, payload delivery, and nested exception code
is all bypassed as those flows only apply during initial injection.
When FRED comes along, the special casing will only get worse, as FRED
explicitly tracks nested exceptions and essentially delivers the payload
on the stack frame, i.e. re-injection will need more inputs, and normal
injection will have yet more code that needs to be bypassed when KVM is
re-injecting an exception.
No functional change intended.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20241001050110.3643764-2-xin@zytor.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename send_user_only to avoid "user", because KVM's ABI is to not inject
page faults into CPL0, whereas "user" in x86 is specifically CPL3. Invert
the polarity to keep the naming simple and unambiguous. E.g. while KVM
often refers to CPL0 as "kernel", that terminology isn't ubiquitous, and
"send_kernel" could be misconstrued as "send only to kernel".
Link: https://lore.kernel.org/r/20250215010609.1199982-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Depending on the type of panics, it was found that the
__register_nmi_handler() function can be called in NMI context from
nmi_shootdown_cpus() leading to a lockdep splat:
WARNING: inconsistent lock state
inconsistent {INITIAL USE} -> {IN-NMI} usage.
lock(&nmi_desc[0].lock);
<Interrupt>
lock(&nmi_desc[0].lock);
Call Trace:
_raw_spin_lock_irqsave
__register_nmi_handler
nmi_shootdown_cpus
kdump_nmi_shootdown_cpus
native_machine_crash_shutdown
__crash_kexec
In this particular case, the following panic message was printed before:
Kernel panic - not syncing: Fatal hardware error!
This message seemed to be given out from __ghes_panic() running in
NMI context.
The __register_nmi_handler() function which takes the nmi_desc lock
with irq disabled shouldn't be called from NMI context as this can
lead to deadlock.
The nmi_shootdown_cpus() function can only be invoked once. After the
first invocation, all other CPUs should be stuck in the newly added
crash_nmi_callback() and cannot respond to a second NMI.
Fix it by adding a new emergency NMI handler to the nmi_desc
structure and provide a new set_emergency_nmi_handler() helper to set
crash_nmi_callback() in any context. The new emergency handler will
preempt other handlers in the linked list. That will eliminate the need
to take any lock and serve the panic in NMI use case.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250206191844.131700-1-longman@redhat.com
|
|
Unify __pcpu_op1_N() and __pcpu_op2_N() macros to __pcpu_op_N()
by applying the macro only to asm mnemonic, not to the mnemonic
plus its arguments.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250224071648.15913-1-ubizjak@gmail.com
|
|
Now that all KVM usage of the Xen HVM config information is buried behind
CONFIG_KVM_XEN=y, move the per-VM kvm_xen_hvm_config field out of kvm_arch
and into kvm_xen.
No functional change intended.
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20250215011437.1203084-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that all references to kvm_vcpu_arch.xen_hvm_config are wrapped with
CONFIG_KVM_XEN #ifdefs, bury the field itself behind CONFIG_KVM_XEN=y.
No functional change intended.
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20250215011437.1203084-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|