path: root/arch/x86
Age  Commit message  Author  Files  Lines
2025-02-28  x86/platform: Fix missing declaration of 'x86_apple_machine'  Zhang Kunbo  1  -0/+2
Fix this sparse warning: arch/x86/kernel/quirks.c:662:6: warning: symbol 'x86_apple_machine' was not declared. Should it be static? Signed-off-by: Zhang Kunbo <zhangkunbo@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20241126015636.3463994-1-zhangkunbo@huawei.com
2025-02-28  x86/irq: Fix missing declaration of 'io_apic_irqs'  Zhang Kunbo  1  -0/+1
Fix this sparse warning: arch/x86/kernel/i8259.c:57:15: warning: symbol 'io_apic_irqs' was not declared. Should it be static? Signed-off-by: Zhang Kunbo <zhangkunbo@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20241126020511.3464664-1-zhangkunbo@huawei.com
2025-02-28  x86/ia32: Leave NULL selector values 0~3 unchanged  Xin Li (Intel)  1  -19/+43
The first GDT descriptor is reserved as 'NULL descriptor'. As bits 0 and 1 of a segment selector, i.e., the RPL bits, are NOT used to index GDT, selector values 0~3 all point to the NULL descriptor, thus values 0, 1, 2 and 3 are all valid NULL selector values. When a NULL selector value is to be loaded into a segment register, reload_segments() sets its RPL bits. Later IRET zeros ES, FS, GS, and DS segment registers if any of them is found to have any nonzero NULL selector value. The two operations offset each other to actually effect a nop. Besides, zeroing of RPL in NULL selector values is an information leak in pre-FRED systems as userspace can spot any interrupt/exception by loading a nonzero NULL selector, and waiting for it to become zero. But there is nothing software can do to prevent it before FRED. ERETU, the only legit instruction to return to userspace from kernel under FRED, by design does NOT zero any segment register to avoid this problem behavior. As such, leave NULL selector values 0~3 unchanged and close the leak. Do the same on 32-bit kernel as well. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241126184529.1607334-1-xin@zytor.com
2025-02-27  x86/fpu/xstate: Simplify print_xstate_features()  Chang S. Bae  1  -21/+9
print_xstate_features() currently invokes print_xstate_feature() multiple times on separate lines, which can be simplified in a loop. print_xstate_feature() already checks the feature's enabled status and is only called within print_xstate_features(). Inline print_xstate_feature() and iterate over features in a loop to streamline the enabling message. No functional changes. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20250227184502.10288-2-chang.seok.bae@intel.com
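A minimal sketch of the loop-based shape described above (illustrative only: it assumes xfeature_names[] covers every index and fpu_kernel_cfg.max_features holds the enabled mask, as in the existing xstate code; this is not the literal patch):
  static void __init print_xstate_features(void)
  {
          int i;

          for (i = 0; i < XFEATURE_MAX; i++) {
                  /* skip features that are not enabled in this configuration */
                  if (!(fpu_kernel_cfg.max_features & BIT_ULL(i)))
                          continue;

                  pr_info("x86/fpu: Supporting XSAVE feature 0x%03llx: '%s'\n",
                          BIT_ULL(i), xfeature_names[i] ? : "unknown");
          }
  }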
2025-02-27  x86/fpu: Refine and simplify the magic number check during signal return  Chang S. Bae  1  -8/+3
Before restoring xstate from the user space buffer, the kernel performs sanity checks on these magic numbers: magic1 in the software reserved area, and magic2 at the end of XSAVE region. The position of magic2 is calculated based on the xstate size derived from the user space buffer. But, the in-kernel record is directly available and reliable for this purpose. This reliance on user space data is also inconsistent with the recent fix in: d877550eaf2d ("x86/fpu: Stop relying on userspace for info to fault in xsave buffer") Simply use fpstate->user_size, and then get rid of unnecessary size-evaluation code. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211014500.3738-1-chang.seok.bae@intel.com
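A sketch of the simplified check, assuming the usual FP_XSTATE_MAGIC2 word at the end of the user buffer; the function name and error handling are illustrative, not the exact kernel code:
  static bool check_xstate_magic2(struct fpstate *fpstate, void __user *buf_fx)
  {
          u32 magic2;

          /* magic2 sits right after the user xstate image: use the in-kernel size */
          if (__get_user(magic2, (__u32 __user *)(buf_fx + fpstate->user_size)))
                  return false;

          return magic2 == FP_XSTATE_MAGIC2;
  }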
2025-02-27  x86/percpu: Disable named address spaces for UBSAN_BOOL with KASAN for GCC < 14.2  Uros Bizjak  1  -9/+11
GCC < 14.2 does not correctly propagate address space qualifiers with -fsanitize=bool,enum. Together with the address sanitizer, this causes the load to be sanitized. Disable named address spaces for GCC < 14.2 when both UBSAN_BOOL and KASAN are enabled. Reported-by: Matt Fleming <matt@readmodwrite.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250227140715.2276353-1-ubizjak@gmail.com Closes: https://lore.kernel.org/lkml/20241213190119.3449103-1-matt@readmodwrite.com/
2025-02-27  x86/bootflag: Replace open-coded parity calculation with parity8()  Kuan-Wei Chiu  1  -15/+3
Refactor parity calculations to use the standard parity8() helper. This change eliminates redundant implementations and improves code efficiency. [ ubizjak: Updated the patch to apply to the latest x86 tree. ] Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250227125616.2253774-1-ubizjak@gmail.com
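Roughly the kind of transformation this enables, sketched with made-up helper names (assuming parity8() returns the parity of the eight bits of its argument):
  #include <linux/bitops.h>

  /* open-coded style being removed (illustrative): */
  static u8 csum_parity_open_coded(u8 byte)
  {
          u8 parity = 0;
          int i;

          for (i = 0; i < 8; i++)
                  parity ^= (byte >> i) & 1;

          return parity;
  }

  /* with the helper, the same computation is a one-liner: */
  static u8 csum_parity_helper(u8 byte)
  {
          return parity8(byte);
  }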
2025-02-27  x86/cpu: Remove get_this_hybrid_cpu_*()  Pawan Gupta  2  -45/+0
Remove get_this_hybrid_cpu_type() and get_this_hybrid_cpu_native_id(): calls to them are no longer required, because cpu-type and native-model-id are cached at boot in the per-cpu struct cpuinfo_topology. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-4-2ae010f50370@linux.intel.com
2025-02-27  perf/x86/intel: Use cache cpu-type for hybrid PMU selection  Pawan Gupta  3  -28/+25
get_this_hybrid_cpu_type() misses a case when cpu-type is populated regardless of X86_FEATURE_HYBRID_CPU. This is particularly true for hybrid variants that have P or E cores fused off. Instead use the cpu-type cached in struct x86_topology, as it does not rely on the hybrid feature to enumerate the cpu-type. This can also help avoid the model-specific fixup get_hybrid_cpu_type(). Also replace get_this_hybrid_cpu_native_id() with its cached value in struct x86_topology. While at it, remove enum hybrid_cpu_type as it serves no purpose when we have the exact cpu-types defined in enum intel_cpu_type. Also rename atom_native_id to intel_native_id and move it to intel-family.h where intel_cpu_type lives. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-3-2ae010f50370@linux.intel.com
2025-02-27  x86/cpu: Prefix hexadecimal values with 0x in cpu_debug_show()  Pawan Gupta  1  -2/+2
The hex values in the CPU debug interface are not prefixed with 0x. This may cause misinterpretation of values. Fix it. [ mingo: Restore previous vertical alignment of the output. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-1-2ae010f50370@linux.intel.com
2025-02-27  x86/platform: Only allow CONFIG_EISA for 32-bit  Arnd Bergmann  1  -1/+1
The CONFIG_EISA menu was cleaned up in 2018, but this inadvertently brought the option back on 64-bit machines: ISA remains guarded by a CONFIG_X86_32 check, but EISA no longer depends on ISA. The last Intel machines with EISA support used an 82375EB PCI/EISA bridge from 1993 that could be paired with the 440FX chipset on early Pentium-II CPUs, long before the first x86-64 products. Fixes: 6630a8e50105 ("eisa: consolidate EISA Kconfig entry in drivers/eisa") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-11-arnd@kernel.org
2025-02-27  x86/pci: Remove old STA2x11 support  Arnd Bergmann  4  -277/+3
ST ConneXt STA2x11 was an interface chip for Atom E6xx processors, using a number of components usually found on Arm SoCs. Most of this was merged upstream, but it was never complete enough to actually work and has been abandoned for many years. We already had an agreement on removing it in 2022, but nobody ever submitted the patch to do it. Without STA2x11, CONFIG_X86_32_NON_STANDARD no longer has any use - remove it. Suggested-by: Davide Ciminaghi <ciminaghi@gnudd.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-10-arnd@kernel.org
2025-02-27  x86/cpu: Document CONFIG_X86_INTEL_MID as 64-bit-only  Arnd Bergmann  1  -21/+29
The X86_INTEL_MID code was originally introduced for the 32-bit Moorestown/Medfield/Clovertrail platform; later the 64-bit Merrifield/Moorefield variants were added, but the final Morganfield 14nm platform was canceled before it hit the market. To help users understand what the option actually refers to, update the help text, and add a dependency on 64-bit kernels. Ferry confirmed that all the hardware can run 64-bit kernels these days, but he is still testing 32-bit kernels on the Intel Edison board, so this remains possible; it is now guarded by a CONFIG_EXPERT dependency, to gently push remaining users towards CONFIG_64BIT. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Shevchenko <andy@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-9-arnd@kernel.org
2025-02-27  x86/mm: Drop support for CONFIG_HIGHPTE  Arnd Bergmann  3  -40/+1
With the maximum amount of RAM now limited to 4GB, there is very little point in still having PTE pages in highmem. Drop this for simplification. The only other architecture supporting HIGHPTE is 32-bit arm, and once that feature is removed there as well, the highpte logic can be dropped from common code, too. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-8-arnd@kernel.org
2025-02-27  x86/mm: Drop CONFIG_SWIOTLB for PAE  Arnd Bergmann  1  -1/+0
Since kernels with and without CONFIG_X86_PAE are now limited to the low 4GB of physical address space, there is no need to use swiotlb any more, so stop selecting this. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-7-arnd@kernel.org
2025-02-27  x86/mm: Remove CONFIG_HIGHMEM64G support  Arnd Bergmann  4  -51/+10
HIGHMEM64G support was added in linux-2.3.25 to support (then) high-end Pentium Pro and Pentium III Xeon servers with more than 4GB of addressing, as NUMA and PCI-X slots started appearing. I have found no evidence of this ever being used in regular dual-socket servers or consumer devices; all the users seem obsolete these days, even by i386 standards:
 - Support for NUMA servers (NUMA-Q, IBM x440, Unisys) was already removed ten years ago.
 - 4+ socket non-NUMA servers based on Intel 450GX/450NX, HP F8 and ServerWorks ServerSet/GrandChampion could theoretically still work with 8GB, but these were exceptionally rare even 20 years ago and would usually have been equipped with less than the maximum amount of RAM.
 - Some SKUs of the Celeron D from 2004 had 64-bit mode fused off but could still work in a Socket 775 mainboard designed for the later Core 2 Duo and 8GB. Apparently most BIOSes at the time only allowed 64-bit CPUs.
 - The rare Xeon LV "Sossaman" came on a few motherboards with registered DDR2 memory support up to 16GB.
 - In the early days of x86-64 hardware, there was sometimes the need to run a 32-bit kernel to work around bugs in the hardware drivers, or in the syscall emulation for 32-bit userspace. This likely still works but there should never be a need for it any more.
PAE mode is still required to get access to the 'NX' bit on Atom, 'Pentium M' and 'Core Duo' CPUs. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-6-arnd@kernel.org
2025-02-27  x86/cpu: Drop configuration options for early 64-bit CPUs  Arnd Bergmann  4  -103/+17
The x86 CPU selection menu is confusing for a number of reasons: When configuring 32-bit kernels, it shows a small number of early 64-bit microarchitectures (K8, Core 2) but not the regular generic 64-bit target that is the normal default. There is no longer a reason to run 32-bit kernels on production 64-bit systems, so only actual 32-bit CPUs need to be shown here. When configuring 64-bit kernels, the options are also pointless, as there is no way to pick any CPU from the past 15 years, leaving GENERIC_CPU as the only sensible choice. Address both of the above by removing the obsolete options and making all 64-bit kernels run on both Intel and AMD CPUs from any generation. Testing generic 32-bit kernels on 64-bit hardware remains possible, just not building a 32-bit kernel that requires a 64-bit CPU. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-5-arnd@kernel.org
2025-02-27  x86/build: Rework CONFIG_GENERIC_CPU compiler flags  Arnd Bergmann  1  -2/+2
Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to run on all CPUs, but the Makefile does not actually pass an -march= argument, instead relying on the default that was used to configure the toolchain. In many cases, gcc will be configured to -march=x86-64 or -march=k8 for maximum compatibility, but in other cases a distribution default may be either raised to a more recent ISA, or set to -march=native to build for the CPU used for compilation. This still works in the case of building a custom kernel for the local machine. The point where it breaks down is building a kernel for another machine that is older than the default target. Changing the default to -march=x86-64 would make it work reliably, but possibly produce worse code on distros that intentionally default to a newer ISA. To allow reliably building a kernel for the oldest x86-64 CPUs, pass the -march=x86-64 flag to the compiler. This was not possible in early versions of x86-64 gcc, but works on all currently supported versions down to at least gcc-5. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-4-arnd@kernel.org
2025-02-27  x86/smp: Drop 32-bit "bigsmp" machine support  Arnd Bergmann  6  -169/+4
The x86-32 kernel used to support multiple platforms with more than eight logical CPUs, from the 1999-2003 timeframe: Sequent NUMA-Q, IBM Summit, Unisys ES7000 and HP F8. Support for all except the latter was dropped back in 2014, leaving only the F8 based DL740 and DL760 G2 machines in this category, with up to eight single-core Socket-603 Xeon-MP processors with hyperthreading. Like the already removed machines, the HP F8 servers at the time cost upwards of $100k in typical configurations, but were quickly obsoleted by their 64-bit Socket-604 cousins and the AMD Opteron. Earlier servers with up to 8 Pentium Pro or Xeon processors remain fully supported as they had no hyperthreading. Similarly, the more common 4-socket Xeon-MP machines with hyperthreading using Intel or ServerWorks chipsets continue to work without this, and all the multi-core Xeon processors also run 64-bit kernels. While the "bigsmp" support can also be used to run on later 64-bit machines (including VM guests), it seems best to discourage that and get any remaining users to update their kernels to 64-bit builds on these. As a side-effect of this, there is also no more need to support NUMA configurations on 32-bit x86, as all true 32-bit NUMA platforms are already gone. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-3-arnd@kernel.org
2025-02-27  x86/Kconfig: Add cmpxchg8b support back to Geode CPUs  Arnd Bergmann  1  -1/+1
An older cleanup of mine inadvertently removed geode-gx1 and geode-lx from the list of CPUs that are known to support a working cmpxchg8b. Fixes: 88a2b4edda3d ("x86/Kconfig: Rework CONFIG_X86_PAE dependency") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250226213714.4040853-2-arnd@kernel.org
2025-02-27  Merge branch 'x86/mm' into x86/cpu, to avoid conflicts  Ingo Molnar  47  -350/+605
We are going to apply a new series that conflicts with pending work in x86/mm, so merge in x86/mm to avoid it, and also to refresh the x86/cpu branch with fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-27  x86/bugs: Remove X86_FEATURE_USE_IBPB  Yosry Ahmed  2  -2/+0
X86_FEATURE_USE_IBPB was introduced in: 2961298efe1e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags") to have separate flags for when the CPU supports IBPB (i.e. X86_FEATURE_IBPB) and when an IBPB is actually used to mitigate Spectre v2. Ever since then, the uses of IBPB expanded. The name became confusing because it does not control all IBPB executions in the kernel. Furthermore, because its name is generic and it's buried within indirect_branch_prediction_barrier(), it's easy to use it not knowing that it is specific to Spectre v2. X86_FEATURE_USE_IBPB is no longer needed because all the IBPB executions it used to control are now controlled through other means (e.g. switch_mm_*_ibpb static branches). Remove the unused feature bit. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20250227012712.3193063-7-yosry.ahmed@linux.dev
2025-02-27  KVM: nVMX: Always use IBPB to properly virtualize IBRS  Yosry Ahmed  1  -2/+1
On synthesized nested VM-exits in VMX, an IBPB is performed if IBRS is advertised to the guest to properly provide separate prediction domains for L1 and L2. However, this is currently conditional on X86_FEATURE_USE_IBPB, which depends on the host spectre_v2_user mitigation. In short, if spectre_v2_user=no, IBRS is not virtualized correctly and L1 becomes susceptible to attacks from L2. Fix this by performing the IBPB regardless of X86_FEATURE_USE_IBPB. Fixes: 2e7eab81425a ("KVM: VMX: Execute IBPB on emulated VM-exit when guest has IBRS") Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Jim Mattson <jmattson@google.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250227012712.3193063-6-yosry.ahmed@linux.dev
2025-02-27  x86/bugs: Use a static branch to guard IBPB on vCPU switch  Yosry Ahmed  4  -2/+9
Instead of using X86_FEATURE_USE_IBPB to guard the IBPB execution in KVM when a new vCPU is loaded, introduce a static branch, similar to switch_mm_*_ibpb. This makes it obvious in spectre_v2_user_select_mitigation() what exactly is being toggled, instead of the unclear X86_FEATURE_USE_IBPB (which will be shortly removed). It also provides more fine-grained control, making it simpler to change/add paths that control the IBPB in the vCPU switch path without affecting other IBPBs. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250227012712.3193063-5-yosry.ahmed@linux.dev
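A minimal sketch of the static-branch pattern described above; the key name switch_vcpu_ibpb and the call sites are illustrative, not necessarily the exact symbols the patch adds:
  #include <linux/jump_label.h>

  DEFINE_STATIC_KEY_FALSE(switch_vcpu_ibpb);
  EXPORT_SYMBOL_GPL(switch_vcpu_ibpb);

  /* bugs.c side: flip the key once, from spectre_v2_user_select_mitigation() */
  static void __init enable_vcpu_switch_ibpb(void)
  {
          static_branch_enable(&switch_vcpu_ibpb);
  }

  /* KVM side: test the key on vCPU load instead of X86_FEATURE_USE_IBPB */
  static inline void maybe_ibpb_on_vcpu_load(void)
  {
          if (static_branch_unlikely(&switch_vcpu_ibpb))
                  indirect_branch_prediction_barrier();
  }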
2025-02-27  x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set()  Yosry Ahmed  1  -1/+1
If X86_FEATURE_USE_IBPB is not set, then both spectre_v2_user_ibpb and spectre_v2_user_stibp are set to SPECTRE_V2_USER_NONE in spectre_v2_user_select_mitigation(). Since ib_prctl_set() already checks for this before performing the IBPB, the X86_FEATURE_USE_IBPB check is redundant. Remove it. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20250227012712.3193063-4-yosry.ahmed@linux.dev
2025-02-27  x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation()  Yosry Ahmed  1  -4/+2
The check is performed when either switch_mm_cond_ibpb or switch_mm_always_ibpb is set. In both cases, X86_FEATURE_USE_IBPB is always set. Remove the redundant check. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20250227012712.3193063-3-yosry.ahmed@linux.dev
2025-02-27  x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers  Yosry Ahmed  6  -8/+12
indirect_branch_prediction_barrier() only performs the MSR write if X86_FEATURE_USE_IBPB is set, using alternative_msr_write(). In preparation for removing X86_FEATURE_USE_IBPB, move the feature check into the callers so that they can be addressed one-by-one, and use X86_FEATURE_IBPB instead to guard the MSR write. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250227012712.3193063-2-yosry.ahmed@linux.dev
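The caller-side shape this describes, sketched (the call site below is illustrative; the point is that the policy check moves out of the barrier helper, whose MSR write is now guarded by X86_FEATURE_IBPB):
  static void example_cond_ibpb(void)
  {
          if (cpu_feature_enabled(X86_FEATURE_USE_IBPB))
                  indirect_branch_prediction_barrier();
  }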
2025-02-27  x86/bootflag: Micro-optimize sbf_write()  Uros Bizjak  1  -2/+1
Flip the parity bit with XOR when !parity, instead of masking the bit out and conditionally setting it when !parity. Saves a couple of bytes in the object file. Co-developed-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226153709.6370-1-ubizjak@gmail.com
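An illustrative before/after of the XOR trick, with hypothetical helper and constant names (SBF_PARITY stands in for the real parity bit); it assumes the bit starts out clear while the checksum is being built:
  /* before: mask the bit out, then conditionally set it */
  static u8 sbf_apply_parity_old(u8 sbf, bool parity)
  {
          sbf &= ~SBF_PARITY;
          if (!parity)
                  sbf |= SBF_PARITY;
          return sbf;
  }

  /* after: flip the (known-clear) bit only when parity is missing */
  static u8 sbf_apply_parity_new(u8 sbf, bool parity)
  {
          if (!parity)
                  sbf ^= SBF_PARITY;
          return sbf;
  }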
2025-02-27  x86/mm: Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW  Matthew Wilcox (Oracle)  1  -3/+3
The bit pattern of _PAGE_DIRTY set and _PAGE_RW clear is used to mark shadow stacks. This is currently checked for in mk_pte() but not pfn_pte(). If we add the check to pfn_pte(), it catches vfree() calling set_direct_map_invalid_noflush() which calls __change_page_attr() which loads the old protection bits from the PTE, clears the specified bits and uses pfn_pte() to construct the new PTE. We should, therefore, for kernel mappings, clear the _PAGE_DIRTY bit consistently whenever we clear _PAGE_RW. I opted to do it in the callers in case we want to use __change_page_attr() to create shadow stacks inside the kernel at some point in the future. Arguably, we might also want to clear _PAGE_ACCESSED here. Note that the 3 functions involved:
  __set_pages_np()
  kernel_map_pages_in_pgd()
  kernel_unmap_pages_in_pgd()
only ever manipulate non-swappable kernel mappings, so maintaining the DIRTY:1|RW:0 special pattern for shadow stacks and DIRTY:0 pattern for non-shadow-stack entries can be maintained consistently and doesn't result in the unintended clearing of a live dirty bit that could corrupt (destroy) dirty bit information for user mappings. Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/174051422675.10177.13226545170101706336.tip-bot2@tip-bot2 Closes: https://lore.kernel.org/oe-lkp/202502241646.719f4651-lkp@intel.com
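A sketch of the resulting rule for the kernel-mapping callers; the helper is illustrative (the real change adjusts the clear masks passed into the CPA machinery):
  /*
   * When dropping write permission from a kernel mapping, drop dirty too,
   * so the DIRTY=1/RW=0 shadow-stack encoding is never created by accident.
   */
  static pgprot_t example_clear_rw(pgprot_t prot)
  {
          return __pgprot(pgprot_val(prot) & ~(_PAGE_RW | _PAGE_DIRTY));
  }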
2025-02-27  cxl: Add mce notifier to emit aliased address for extended linear cache  Dave Jiang  1  -0/+1
Below is an extended linear cache setup with an example memory region layout: a single 256G region of system memory, made up of 128G of DRAM and 128G of CXL memory. The kernel sees a region of 256G of system memory in total.
           128G DRAM                            128G CXL memory
  |-----------------------------------|-------------------------------------|
Data resides in either DRAM or far memory (FM) with no replication. Hot data is swapped into DRAM by the hardware behind the scenes. When an error is detected in one location, it is possible that the error also resides in the aliased location. Therefore, when a memory location that is flagged by MCE is part of the special region, the aliased memory location needs to be offlined as well. Add an MCE notify callback to identify whether the MCE address location is part of an extended linear cache region and handle it accordingly. Add a symbol export to set_mce_nospec() in x86 code in order to call set_mce_nospec() from the CXL MCE notify callback. Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20250226162224.3633792-5-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
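A sketch of what such a notifier could look like; cxl_addr_has_extended_linear_alias() is a hypothetical stand-in for the region lookup, while set_mce_nospec() is the newly exported x86 symbol mentioned above:
  static int cxl_mce_notify(struct notifier_block *nb, unsigned long val, void *data)
  {
          struct mce *mce = data;

          if (!mce || !cxl_addr_has_extended_linear_alias(mce->addr))
                  return NOTIFY_DONE;

          /* the aliased DRAM/CXL location may be bad as well: keep it out of use */
          set_mce_nospec(mce->addr >> PAGE_SHIFT);

          return NOTIFY_OK;
  }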
2025-02-26  perf: Remove unnecessary parameter of security check  Luo Gengkun  3  -3/+3
It seems that the attr parameter has never been used in security checks since it was first introduced by: commit da97e18458fb ("perf_event: Add support for LSM and SELinux checks") so remove it. Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-26  KVM: Drop kvm_arch_sync_events() now that all implementations are nops  Sean Christopherson  1  -5/+0
Remove kvm_arch_sync_events() now that x86 no longer uses it (no other arch has ever used it). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20250224235542.2562848-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-26  KVM: x86: Fold guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm()  Sean Christopherson  1  -3/+12
Fold the guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm(), as the kvmclock and PIT background workers only need to be stopped before destroying vCPUs (to avoid accessing vCPUs as they are being freed); it's a-ok for them to be running while the VM is visible on the global vm_list. Note, the PIT also needs to be stopped before IRQ routing is freed (because KVM's IRQ routing is garbage and assumes there is always non-NULL routing). Opportunistically add comments to explain why KVM stops/frees certain assets early. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250224235542.2562848-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-26  KVM: x86: Unload MMUs during vCPU destruction, not before  Sean Christopherson  1  -12/+3
When destroying a VM, unload a vCPU's MMUs as part of normal vCPU freeing, instead of as a separate preparatory action. Unloading MMUs ahead of time is a holdover from commit 7b53aa565084 ("KVM: Fix vcpu freeing for guest smp"), which "fixed" a rather egregious flaw where KVM would attempt to free *all* MMU pages when destroying a vCPU. At the time, KVM would spin on all MMU pages in a VM when freeing a single vCPU, and so would hang due to the way KVM pins and zaps root pages (roots are invalidated but not freed if they are pinned by a vCPU).
  static void free_mmu_pages(struct kvm_vcpu *vcpu)
  {
          struct kvm_mmu_page *page;

          while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
                  page = container_of(vcpu->kvm->active_mmu_pages.next,
                                      struct kvm_mmu_page, link);
                  kvm_mmu_zap_page(vcpu->kvm, page);
          }
          free_page((unsigned long)vcpu->mmu.pae_root);
  }
Now that KVM doesn't try to free all MMU pages when destroying a single vCPU, there's no need to unpin roots prior to destroying a vCPU. Note! While KVM mostly destroys all MMUs before calling kvm_arch_destroy_vm() (see commit f00be0cae4e6 ("KVM: MMU: do not free active mmu pages in free_mmu_pages()")), unpinning MMU roots during vCPU destruction will unfortunately trigger remote TLB flushes, i.e. will try to send requests to all vCPUs. Happily, thanks to commit 27592ae8dbe4 ("KVM: Move wiping of the kvm->vcpus array to common code"), that's a non-issue as freed vCPUs are naturally skipped by xa_for_each_range(), i.e. by kvm_for_each_vcpu(). Prior to that commit, KVM x86 rather stupidly freed vCPUs one-by-one, and _then_ nullified them, one-by-one. I.e. triggering a VM-wide request would hit a use-after-free. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250224235542.2562848-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-26  KVM: x86: Don't load/put vCPU when unloading its MMU during teardown  Sean Christopherson  1  -8/+1
Don't load (and then put) a vCPU when unloading its MMU during VM destruction, as nothing in kvm_mmu_unload() accesses vCPU state beyond the root page/address of each MMU, i.e. it can't possibly need to run with the vCPU loaded. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250224235542.2562848-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-26  static_call_inline: Provide trampoline address when updating sites  Christophe Leroy  1  -1/+1
In preparation of support of inline static calls on powerpc, provide trampoline address when updating sites, so that when the destination function is too far for a direct function call, the call site is patched with a call to the trampoline. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/5efe0cffc38d6f69b1ec13988a99f1acff551abf.1733245362.git.christophe.leroy@csgroup.eu
2025-02-26  x86/boot: Add missing has_cpuflag() prototype  Zhou Ding  1  -0/+1
We get a warning when building the kernel with W=1:
  arch/x86/boot/compressed/cpuflags.c:4:6: warning: no previous prototype for ‘has_cpuflag’ [-Werror=missing-prototypes]
      4 | bool has_cpuflag(int flag)
        |      ^~~~~~~~~~~
Add a function declaration to cpuflags.h. Signed-off-by: Zhou Ding <zhouding@cmss.chinamobile.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20241217162859.1167889-1-zhouding@cmss.chinamobile.com
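The fix amounts to adding the missing declaration next to the other helpers, i.e. something like this in arch/x86/boot/cpuflags.h (sketch, matching the signature shown in the warning):
  bool has_cpuflag(int flag);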
2025-02-26  x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct()  Benjamin Berg  1  -1/+6
The init_task instance of struct task_struct is statically allocated and may not contain the full FP state for userspace. As such, limit the copy to the valid area of both init_task and 'dst' and ensure all memory is initialized. Note that the FP state is only needed for userspace, and as such it is entirely reasonable for init_task to not contain parts of it. Fixes: 5aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250226133136.816901-1-benjamin@sipsolutions.net ---- v2: - Fix code if arch_task_struct_size < sizeof(init_task) by using memcpy_and_pad.
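A sketch of the v2 approach mentioned in the changelog, assuming memcpy_and_pad() is used to bound the copy to init_task's static size and zero-fill the rest; the function body is illustrative, not the final patch:
  int example_arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
  {
          /* init_task is statically sized and lacks the dynamic FPU area */
          if (unlikely(src == &init_task))
                  memcpy_and_pad(dst, arch_task_struct_size, src, sizeof(init_task), 0);
          else
                  memcpy(dst, src, arch_task_struct_size);

          /* (the real function performs additional per-arch fixups here) */
          return 0;
  }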
2025-02-26  x86/bugs: KVM: Add support for SRSO_MSR_FIX  Borislav Petkov  5  -4/+33
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1, it indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate SRSO. Enable BpSpecReduce to mitigate SRSO across guest/host boundaries. Switch back to enabling the bit when virtualization is enabled, and clear the bit when virtualization is disabled, because using an MSR slot would clear the bit when the guest is exited and any training the guest has done would potentially influence the host kernel when execution enters the kernel and hasn't VMRUN the guest yet. More detail on the public thread in Link below. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241202120416.6054-1-bp@kernel.org
2025-02-26  x86/ibt: Optimize the fineibt-bhi arity 1 case  Peter Zijlstra  2  -9/+54
Saves a CALL to an out-of-line thunk for the common case of 1 argument. Suggested-by: Scott Constable <scott.d.constable@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.927885784@infradead.org
2025-02-26  x86/ibt: Implement FineIBT-BHI mitigation  Peter Zijlstra  4  -19/+131
While WAIT_FOR_ENDBR is specified to be a full speculation stop, it has been shown that some implementations are 'leaky' to such an extent that speculation can escape even the FineIBT preamble. To deal with this, add additional hardening to the FineIBT preamble. Notably, using a new LLVM feature: https://github.com/llvm/llvm-project/commit/e223485c9b38a5579991b8cebb6a200153eee245 which encodes the number of arguments in the kCFI preamble's register. Using this register<->arity mapping, have the FineIBT preamble CALL into a stub clobbering the relevant argument registers in the speculative case. Scott sayeth thusly: Microarchitectural attacks such as Branch History Injection (BHI) and Intra-mode Branch Target Injection (IMBTI) [1] can cause an indirect call to mispredict to an adversary-influenced target within the same hardware domain (e.g., within the kernel). Instructions at the mispredicted target may execute speculatively and potentially expose kernel data (e.g., to a user-mode adversary) through a microarchitectural covert channel such as CPU cache state. CET-IBT [2] is a coarse-grained control-flow integrity (CFI) ISA extension that enforces that each indirect call (or indirect jump) must land on an ENDBR (end branch) instruction, even speculatively*. FineIBT is a software technique that refines CET-IBT by associating each function type with a 32-bit hash and enforcing (at the callee) that the hash of the caller's function pointer type matches the hash of the callee's function type. However, recent research [3] has demonstrated that the conditional branch that enforces FineIBT's hash check can be coerced to mispredict, potentially allowing an adversary to speculatively bypass the hash check:
  __cfi_foo:
          ENDBR64
          SUB R10d, 0x01234567
          JZ foo   # Even if the hash check fails and ZF=0, this branch could still mispredict as taken
          UD2
  foo:
          ...
The techniques demonstrated in [3] require the attacker to be able to control the contents of at least one live register at the mispredicted target. Therefore, this patch set introduces a sequence of CMOV instructions at each indirect-callable target that poisons every live register with data that the attacker cannot control whenever the FineIBT hash check fails, thus mitigating any potential attack. The security provided by this scheme has been discussed in detail on an earlier thread [4].
  [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
  [2] Intel Software Developer's Manual, Volume 1, Chapter 18
  [3] https://www.vusec.net/projects/native-bhi/
  [4] https://lore.kernel.org/lkml/20240927194925.707462984@infradead.org/
  *There are some caveats for certain processors, see [1] for more info
Suggested-by: Scott Constable <scott.d.constable@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.820402212@infradead.org
2025-02-26  x86/bhi: Add BHI stubs  Peter Zijlstra  3  -1/+153
Add an array of code thunks, to be called from the FineIBT preamble, clobbering the first 'n' argument registers for speculative execution. Notably, the 0th entry will clobber no argument registers and will never be used; it exists so the array can be naturally indexed, while the 7th entry will clobber all 6 argument registers and also RSP in order to mess up stack based arguments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.717378681@infradead.org
2025-02-26  Merge tag 'v6.14-rc4' into x86/fpu, to pick up fixes and refresh the branch  Ingo Molnar  24  -103/+218
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-26  x86/ibt: Add paranoid FineIBT mode  Peter Zijlstra  1  -6/+138
Due to concerns about circumvention attacks against FineIBT on 'naked' ENDBR, add an additional caller side hash check to FineIBT. This should make it impossible to pivot over such a 'naked' ENDBR instruction at the cost of an additional load. The specific pivot reported was against the SYSCALL entry site and FRED will have all those holes fixed up. https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/ This specific fineibt_paranoid_start[] sequence was concocted by Scott. Suggested-by: Scott Constable <scott.d.constable@intel.com> Reported-by: Jennifer Miller <jmill@asu.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.598033084@infradead.org
2025-02-26  x86/traps: Decode LOCK Jcc.d8 as #UD  Peter Zijlstra  2  -3/+25
Because overlapping code sequences are all the rage. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.486463917@infradead.org
2025-02-26  x86/ibt: Optimize the FineIBT instruction sequence  Peter Zijlstra  2  -24/+42
Scott notes that non-taken branches are faster. Abuse overlapping code that traps instead of explicit UD2 instructions. And LEA does not modify flags and will have fewer dependencies. Suggested-by: Scott Constable <scott.d.constable@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.371942555@infradead.org
2025-02-26  x86/traps: Allow custom fixups in handle_bug()  Peter Zijlstra  2  -7/+17
The normal fixup in handle_bug() is simply continuing at the next instruction. However upcoming patches make this the wrong thing, so allow handlers (specifically handle_cfi_failure()) to over-ride regs->ip. The callchain is such that the fixup needs to be done before it is determined if the exception is fatal, as such, revert any changes in that case. Additionally, have handle_cfi_failure() remember the regs->ip value it starts with for reporting. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.275223080@infradead.org
2025-02-26  x86/traps: Decode 0xEA instructions as #UD  Peter Zijlstra  2  -3/+18
FineIBT will start using 0xEA as #UD. Normally '0xEA' is a 'bad', invalid instruction for the CPU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.166774696@infradead.org
2025-02-26  x86/mce/inject: Remove call to mce_notify_irq()  Nikolay Borisov  3  -25/+22
The call to mce_notify_irq() has been there since the initial version of the soft inject mce machinery, introduced in ea149b36c7f5 ("x86, mce: add basic error injection infrastructure"). At that time it was functional, since injecting an MCE resulted in the following call chain:
  raise_mce()
    ->machine_check_poll()
      ->mce_log() - sets the notify_user bit
    ->mce_notify_user() (current mce_notify_irq) consumed the bit and called the usermode helper.
However, with the introduction of 011d82611172 ("RAS: Add a Corrected Errors Collector") the code got moved around and the usermode helper began to be called via the early notifier mce_first_notifier(), rendering the call in raise_local() defunct as the mce_need_notify bit (ex notify_user) is only being set from the early notifier. Remove the noop call and make mce_notify_irq() static. No functional changes. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250225143348.268469-1-nik.borisov@suse.com
2025-02-26  x86/alternatives: Clean up preprocessor conditional block comments  Ingo Molnar  1  -10/+10
When in the middle of a kernel source code file a kernel developer sees a lone #else or #endif:
  ...
  #else
  ...
It's not obvious at a glance what those preprocessor blocks are conditional on, if the starting #ifdef is outside visible range. So apply the standard pattern we use in such cases elsewhere in the kernel for large preprocessor blocks:
  #ifdef CONFIG_XXX
  ...
  ...
  ...
  #endif /* CONFIG_XXX */
  ...
  #ifdef CONFIG_XXX
  ...
  ...
  ...
  #else /* !CONFIG_XXX: */
  ...
  ...
  ...
  #endif /* !CONFIG_XXX */
( Note that in the #else case we use the /* !CONFIG_XXX */ marker in the final #endif, not /* CONFIG_XXX */, which serves as an easy visual marker to differentiate #else or #elif related #endif closures from singular #ifdef/#endif blocks. ) Also clean up __CFI_DEFAULT definition with a bit more vertical alignment applied, and a pointless tab converted to the standard space we use in such definitions. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: Peter Zijlstra (Intel) <peterz@infradead.org>