summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2026-03-04arm64: Silence sparse warnings caused by the type casting in (cmp)xchgCatalin Marinas1-5/+7
The arm64 xchg/cmpxchg() wrappers cast the arguments to (unsigned long) prior to invoking the static inline functions implementing the operation. Some restrictive type annotations (e.g. __bitwise) lead to sparse warnings like below: sparse warnings: (new ones prefixed by >>) fs/crypto/bio.c:67:17: sparse: sparse: cast from restricted blk_status_t >> fs/crypto/bio.c:67:17: sparse: sparse: cast to restricted blk_status_t Force the casting in the arm64 xchg/cmpxchg() wrappers to silence sparse. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602230947.uNRsPyBn-lkp@intel.com/ Link: https://lore.kernel.org/r/202602230947.uNRsPyBn-lkp@intel.com/ Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-04x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file pathsJan Stancek1-0/+1
CONFIG_EFI_SBAT_FILE can be a relative path. When compiling using a different output directory (O=) the build currently fails because it can't find the filename set in CONFIG_EFI_SBAT_FILE: arch/x86/boot/compressed/sbat.S: Assembler messages: arch/x86/boot/compressed/sbat.S:6: Error: file not found: kernel.sbat Add $(srctree) as include dir for sbat.o. [ bp: Massage commit message. ] Fixes: 61b57d35396a ("x86/efi: Implement support for embedding SBAT data for x86") Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/f4eda155b0cef91d4d316b4e92f5771cb0aa7187.1772047658.git.jstancek@redhat.com
2026-03-04powerpc/crash: adjust the elfcorehdr sizeSourabh Jain1-1/+13
With crash hotplug support enabled, additional memory is allocated to the elfcorehdr kexec segment to accommodate resources added during memory hotplug events. However, the kdump FDT is not updated with the same size, which can result in elfcorehdr corruption in the kdump kernel. Update elf_headers_sz (the kimage member representing the size of the elfcorehdr kexec segment) to reflect the total memory allocated for the elfcorehdr segment instead of the elfcorehdr buffer size at the time of kdump load. This allows of_kexec_alloc_and_setup_fdt() to reserve the full elfcorehdr memory in the kdump FDT and prevents elfcorehdr corruption. Fixes: 849599b702ef8 ("powerpc/crash: add crash memory hotplug support") Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260227171801.2238847-1-sourabhjain@linux.ibm.com
2026-03-04powerpc/kexec/core: use big-endian types for crash variablesSourabh Jain1-8/+9
Use explicit word-sized big-endian types for kexec and crash related variables. This makes the endianness unambiguous and avoids type mismatches that trigger sparse warnings. The change addresses sparse warnings like below (seen on both 32-bit and 64-bit builds): CHECK ../arch/powerpc/kexec/core.c sparse: expected unsigned int static [addressable] [toplevel] [usertype] crashk_base sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned int static [addressable] [toplevel] [usertype] crashk_size sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned long long static [addressable] [toplevel] mem_limit sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned int static [addressable] [toplevel] [usertype] kernel_end sparse: got restricted __be32 [usertype] No functional change intended. Fixes: ea961a828fe7 ("powerpc: Fix endian issues in kexec and crash dump code") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512221405.VHPKPjnp-lkp@intel.com/ Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251224151257.28672-1-sourabhjain@linux.ibm.com
2026-03-04powerpc/prom_init: Fixup missing #size-cells on PowerMac media-bay nodesRob Herring (Arm)1-1/+2
Similar to other PowerMac mac-io devices, the media-bay node is missing the "#size-cells" property. Depends-on: commit 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells handling") Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251029174047.1620073-1-robh@kernel.org
2026-03-04powerpc: dts: fsl: Drop unused .dtsi filesRob Herring (Arm)4-324/+0
These files are not included by anything and therefore don't get built or tested. There's also no upstream driver for the interlaken-lac stuff. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260128140222.1627203-1-robh@kernel.org
2026-03-04powerpc/uaccess: Fix inline assembly for clang build on PPC32Christophe Leroy (CS GROUP)1-1/+1
Test robot reports the following error with clang-16.0.6: In file included from kernel/rseq.c:75: include/linux/rseq_entry.h:141:3: error: invalid operand for instruction unsafe_get_user(offset, &ucs->post_commit_offset, efault); ^ include/linux/uaccess.h:608:2: note: expanded from macro 'unsafe_get_user' arch_unsafe_get_user(x, ptr, local_label); \ ^ arch/powerpc/include/asm/uaccess.h:518:2: note: expanded from macro 'arch_unsafe_get_user' __get_user_size_goto(__gu_val, __gu_addr, sizeof(*(p)), e); \ ^ arch/powerpc/include/asm/uaccess.h:284:2: note: expanded from macro '__get_user_size_goto' __get_user_size_allowed(x, ptr, size, __gus_retval); \ ^ arch/powerpc/include/asm/uaccess.h:275:10: note: expanded from macro '__get_user_size_allowed' case 8: __get_user_asm2(x, (u64 __user *)ptr, retval); break; \ ^ arch/powerpc/include/asm/uaccess.h:258:4: note: expanded from macro '__get_user_asm2' " li %1+1,0\n" \ ^ <inline asm>:7:5: note: instantiated into assembly here li 31+1,0 ^ 1 error generated. On PPC32, for 64 bits vars a pair of registers is used. Usually the lower register in the pair is the high part and the higher register is the low part. GCC uses r3/r4 ... r11/r12 ... r14/r15 ... r30/r31 In older kernel code inline assembly was using %1 and %1+1 to represent 64 bits values. However here it looks like clang uses r31 as high part, allthough r32 doesn't exist hence the error. Allthoug %1+1 should work, most places now use %L1 instead of %1+1, so let's do the same here. With that change, the build doesn't fail anymore and a disassembly shows clang uses r17/r18 and r31/r14 pair when GCC would have used r16/r17 and r30/r31: Disassembly of section .fixup: 00000000 <.fixup>: 0: 38 a0 ff f2 li r5,-14 4: 3a 20 00 00 li r17,0 8: 3a 40 00 00 li r18,0 c: 48 00 00 00 b c <.fixup+0xc> c: R_PPC_REL24 .text+0xbc 10: 38 a0 ff f2 li r5,-14 14: 3b e0 00 00 li r31,0 18: 39 c0 00 00 li r14,0 1c: 48 00 00 00 b 1c <.fixup+0x1c> 1c: R_PPC_REL24 .text+0x144 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602021825.otcItxGi-lkp@intel.com/ Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8ca3a657a650e497a96bfe7acde2f637dadab344.1770103646.git.chleroy@kernel.org
2026-03-04powerpc/e500: Always use 64 bits PTEChristophe Leroy5-110/+5
Today there are two PTE formats for e500: - The 64 bits format, used - On 64 bits kernel - On 32 bits kernel with 64 bits physical addresses - On 32 bits kernel with support of huge pages - The 32 bits format, used in other cases Maintaining two PTE formats means unnecessary maintenance burden because every change needs to be implemented and tested for both formats. Remove the 32 bits PTE format. The memory usage increase due to larger PTEs is minimal (approx. 0,1% of memory). This also means that from now on huge pages are supported also with 32 bits physical addresses. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/04a658209ea78dcc0f3dbde6b2c29cf1939adfe9.1767721208.git.chleroy@kernel.org
2026-03-04KVM/TDX: Rename KVM_SUPPORTED_TD_ATTRS to KVM_SUPPORTED_TDX_TD_ATTRSXiaoyao Li1-2/+2
Rename KVM_SUPPORTED_TD_ATTRS to KVM_SUPPORTED_TDX_TD_ATTRS to include "TDX" in the name, making it clear that it pertains to TDX. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-5-xiaoyao.li@intel.com
2026-03-04x86/tdx: Rename TDX_ATTR_* to TDX_TD_ATTR_*Xiaoyao Li4-44/+44
The macros TDX_ATTR_* and DEF_TDX_ATTR_* are related to TD attributes, which are TD-scope attributes. Naming them as TDX_ATTR_* can be somewhat confusing and might mislead people into thinking they are TDX global things. Rename TDX_ATTR_* to TDX_TD_ATTR_* to explicitly clarify they are TD-scope things. Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-4-xiaoyao.li@intel.com
2026-03-04KVM/TDX: Remove redundant definitions of TDX_TD_ATTR_*Xiaoyao Li2-8/+2
There are definitions of TD attributes bits inside asm/shared/tdx.h as TDX_ATTR_*. Remove KVM's definitions and use the ones in asm/shared/tdx.h Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-3-xiaoyao.li@intel.com
2026-03-04x86/tdx: Fix the typo in TDX_ATTR_MIGRTABLEXiaoyao Li2-3/+3
The TD scoped TDCS attributes are defined by bit positions. In the guest side of the TDX code, the 'tdx_attributes' string array holds pretty print names for these attributes, which are generated via macros and defines. Today these pretty print names are only used to print the attribute names to dmesg. Unfortunately there is a typo in the define for the migratable bit. Change the defines TDX_ATTR_MIGRTABLE* to TDX_ATTR_MIGRATABLE*. Update the sole user, the tdx_attributes array, to use the fixed name. Since these defines control the string printed to dmesg, the change is user visible. But the risk of breakage is almost zero since it is not exposed in any interface expected to be consumed programmatically. Fixes: 564ea84c8c14 ("x86/tdx: Dump attributes and TD_CTLS on boot") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-2-xiaoyao.li@intel.com
2026-03-03KVM: x86: Drop redundant call to kvm_deliver_exception_payload()Yosry Ahmed1-6/+4
In kvm_check_and_inject_events(), kvm_deliver_exception_payload() is called for pending #DB exceptions. However, shortly after, the per-vendor inject_exception callbacks are made. Both vmx_inject_exception() and svm_inject_exception() unconditionally call kvm_deliver_exception_payload(), so the call in kvm_check_and_inject_events() is redundant. Note that the extra call for pending #DB exceptions is harmless, as kvm_deliver_exception_payload() clears exception.has_payload after the first call. The call in kvm_check_and_inject_events() was added in commit f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery"). At that point, the call was likely needed because svm_queue_exception() checked whether an exception for L2 is intercepted by L1 before calling kvm_deliver_exception_payload(), as SVM did not have a check_nested_events callback. Since DR6 is updated before the #DB intercept in SVM (unlike VMX), it was necessary to deliver the DR6 payload before calling svm_queue_exception(). After that, commit 7c86663b68ba ("KVM: nSVM: inject exceptions via svm_check_nested_events") added a check_nested_events callback for SVM, which checked for L1 intercepts for L2's exceptions, and delivered the the payload appropriately before the intercept. At that point, svm_queue_exception() started calling kvm_deliver_exception_payload() unconditionally, and the call to kvm_deliver_exception_payload() from its caller became redundant. No functional change intended. Signed-off-by: Yosry Ahmed <yosry@kernel.org> Link: https://patch.msgid.link/20260302154249.784529-1-yosry@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: SVM: Skip OSVW MSR reads if current CPU doesn't support the featureSean Christopherson1-6/+2
Skip the OSVW RDMSRs if the current CPU doesn't enumerate support for the MSRs. In practice, checking only the boot CPU's capabilities is sufficient, as the RDMSRs should fault when unsupported, but there's no downside to being more precise, and checking only the boot CPU _looks_ wrong given the rather odd semantics of the MSRs. E.g. if a CPU doesn't support OVSW, then KVM must assume all errata are present. Link: https://patch.msgid.link/20251113231420.1695919-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: SVM: Skip OSVW variable updates if current CPU's errata are a subsetSean Christopherson1-12/+10
Elide the OSVW variable updates if the current CPU's set of errata are a subset of the errata tracked in the global values, i.e. if no update is needed. There's no danger of under-reporting errata due to bailing early as KVM is purely reducing the set of "known fixed" errata. I.e. a racing update on a different CPU with _more_ errata doesn't change anything if the current CPU has the same or fewer errata relative to the status quo. If another CPU is writing osvw_len, then "len" is guaranteed to be larger than the new osvw_len and so the osvw_len update would be skipped anyways. If another CPU is setting new bits in osvw_status, then "status" is guaranteed to be a subset of the new osvw_status and the bitwise-OR would be an effective nop anyways. Link: https://patch.msgid.link/20251113231420.1695919-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: SVM: Extract OS-visible workarounds setup to helper functionSean Christopherson1-41/+49
Move the initialization of the global OSVW variables to a helper function so that svm_enable_virtualization_cpu() isn't polluted with a pile of what is effectively legacy code. No functional change intended. Link: https://patch.msgid.link/20251113231420.1695919-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: SVM: Skip OSVW MSR reads if KVM is treating all errata as presentSean Christopherson1-2/+12
Don't bother reading the OSVW MSRs if osvw_len is already zero, i.e. if KVM is already treating all errata as present, in which case the positive path of the if-statement is one giant nop. Opportunistically update the comment to more thoroughly explain how the MSRs work and why the code does what it does. Link: https://patch.msgid.link/20251113231420.1695919-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: SVM: Serialize updates to global OS-Visible Workarounds variablesSean Christopherson1-3/+7
Guard writes to the global osvw_status and osvw_len variables with a spinlock to ensure enabling virtualization on multiple CPUs in parallel doesn't effectively drop any writes due to writing back stale data. Don't bother taking the lock when the boot CPU doesn't support the feature, as that check is constant for all CPUs, i.e. racing writes will always write the same value (zero). Note, the bug was inadvertently "fixed" by commit 9a798b1337af ("KVM: Register cpuhp and syscore callbacks when enabling hardware"), which effectively serialized calls to enable virtualization due to how the cpuhp framework "brings up" CPU. But KVM shouldn't rely on the mechanics of cphup to provide serialization. Link: https://patch.msgid.link/20251113231420.1695919-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86/mmu: Don't zero-allocate page table used for splitting a hugepageSean Christopherson1-1/+1
When splitting hugepages in the TDP MMU, don't zero the new page table on allocation since tdp_mmu_split_huge_page() is guaranteed to write every entry and thus every byte. Unless someone peeks at the memory between allocating the page table and writing the child SPTEs, no functional change intended. Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260218210820.2828896-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03xtensa: xtfpga: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)Andrew Davis1-7/+4
Function register_restart_handler() is deprecated. Using this new API removes our need to keep and manage a struct notifier_block. Signed-off-by: Andrew Davis <afd@ti.com> Message-ID: <20260302193543.275197-3-afd@ti.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2026-03-03xtensa: xt2000: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)Andrew Davis1-7/+4
Function register_restart_handler() is deprecated. Using this new API removes our need to keep and manage a struct notifier_block. Signed-off-by: Andrew Davis <afd@ti.com> Message-ID: <20260302193543.275197-2-afd@ti.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2026-03-03xtensa: ISS: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)Andrew Davis1-7/+4
Function register_restart_handler() is deprecated. Using this new API removes our need to keep and manage a struct notifier_block. Signed-off-by: Andrew Davis <afd@ti.com> Message-ID: <20260302193543.275197-1-afd@ti.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2026-03-03x86/cpu: Remove LASS restriction on EFISohil Mehta1-6/+1
The initial LASS enabling has been deferred to much later during boot, and EFI runtime services now run with LASS temporarily disabled. This removes LASS from the path of all EFI services. Remove the LASS restriction on EFI config, as the two can now coexist. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260120234730.2215498-4-sohil.mehta@intel.com
2026-03-03x86/efi: Disable LASS while executing runtime servicesSohil Mehta1-0/+35
Ideally, EFI runtime services should switch to kernel virtual addresses after SetVirtualAddressMap(). However, firmware implementations are known to be buggy in this regard and continue to access physical addresses. The kernel maintains a 1:1 mapping of all runtime services code and data regions to avoid breaking such firmware. LASS enforcement relies on bit 63 of the virtual address, which would block such accesses to the lower half. Unfortunately, not doing anything could lead to #GP faults when users update to a kernel with LASS enabled. One option is to use a STAC/CLAC pair to temporarily disable LASS data enforcement. However, there is no guarantee that the stray accesses would only touch data and not perform instruction fetches. Also, relying on the AC bit would depend on the runtime calls preserving RFLAGS, which is highly unlikely in practice. Instead, use the big hammer and switch off the entire LASS mechanism temporarily by clearing CR4.LASS. Runtime services are called in the context of efi_mm, which has explicitly unmapped any memory EFI isn't allowed to touch (including userspace). So, do this right after switching to efi_mm to avoid any security impact. Some runtime services can be invoked during boot when LASS isn't active. Use a global variable (similar to efi_mm) to save and restore the correct CR4.LASS state. The runtime calls are serialized with the efi_runtime_lock, so no concurrency issues are expected. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260120234730.2215498-3-sohil.mehta@intel.com
2026-03-03x86/cpu: Defer LASS enabling until userspace comes upSohil Mehta1-1/+22
LASS blocks any kernel access to the lower half of the virtual address space. Unfortunately, some EFI accesses happen during boot with bit 63 cleared, which causes a #GP fault when LASS is enabled. Notably, the SetVirtualAddressMap() call can only happen in EFI physical mode. Also, EFI_BOOT_SERVICES_CODE/_DATA could be accessed even after ExitBootServices(). The boot services memory is truly freed during efi_free_boot_services() after SVAM has completed. To prevent EFI from tripping LASS, at a minimum, LASS enabling must be deferred until EFI has completely finished entering virtual mode (including freeing boot services memory). Moving setup_lass() to arch_cpu_finalize_init() would do the trick, but that would make the implementation very fragile. Something else might come in the future that would need the LASS enabling to be moved again. In general, security features such as LASS provide limited value before userspace is up. They aren't necessary during early boot while only trusted ring0 code is executing. Introduce a generic late initcall to defer activating some CPU features until userspace is enabled. For now, only move the LASS CR4 programming to this initcall. As APs are already up by the time late initcalls run, some extra steps are needed to enable LASS on all CPUs. Use a CPU hotplug callback instead of on_each_cpu() or smp_call_function(). This ensures that LASS is enabled on every CPU that is currently online as well as any future CPUs that come online later. Note, even though hotplug callbacks run with preemption enabled, cr4_set_bits() would disable interrupts while updating CR4. Keep the existing logic in place to clear the LASS feature bits early. setup_clear_cpu_cap() must be called before boot_cpu_data is finalized and alternatives are patched. Eventually, the entire setup_lass() logic can go away once the restrictions based on vsyscall emulation and EFI are removed. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260120234730.2215498-2-sohil.mehta@intel.com
2026-03-03bpf, arm64: Use ORR-based MOV for general-purpose registersPuranjay Mohan1-1/+3
The A64_MOV macro unconditionally uses ADD Rd, Rn, #0 to implement register moves. While functionally correct, this is not the canonical encoding when both operands are general-purpose registers. On AArch64, MOV has two aliases depending on the operand registers: - MOV <Xd|SP>, <Xn|SP> → ADD <Xd|SP>, <Xn|SP>, #0 - MOV <Xd>, <Xn> → ORR <Xd>, XZR, <Xn> The ADD form is required when the stack pointer is involved (as ORR does not accept SP), while the ORR form is the preferred encoding for general-purpose registers. The ORR encoding is also measurably faster on modern microarchitectures. A microbenchmark [1] comparing dependent chains of MOV (ORR) vs ADD #0 on an ARM Neoverse-V2 (72-core, 3.4 GHz) shows: === mov (ORR Xd, XZR, Xn) === run1 cycles/op=0.749859456 run2 cycles/op=0.749991250 run3 cycles/op=0.749601847 avg cycles/op=0.749817518 === add0 (ADD Xd, Xn, #0) === run1 cycles/op=1.004777689 run2 cycles/op=1.004558266 run3 cycles/op=1.004806559 avg cycles/op=1.004714171 The ORR form completes in ~0.75 cycles/op vs ~1.00 cycles/op for ADD #0, a ~25% improvement. This is likely because the CPU's register renaming hardware can eliminate ORR-based moves, while ADD #0 must go through the ALU pipeline. Update A64_MOV to select the appropriate encoding at JIT time: use ADD when either register is A64_SP, and ORR (via aarch64_insn_gen_move_reg()) otherwise. Update verifier_private_stack selftests to expect "mov x7, x0" instead of "add x7, x0, #0x0" in the JITed instruction checks, matching the new ORR-based encoding. [1] https://github.com/puranjaymohan/scripts/blob/main/arm64/bench/run_mov_vs_add0.sh Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260225134339.2723288-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03bpf,s390: add fsession support for trampolinesMenglong Dong1-14/+60
Implement BPF_TRACE_FSESSION support for s390. The logic here is similar to what we did in x86_64. In order to simply the logic, we factor out the function invoke_bpf() for fentry and fexit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260224092208.1395085-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03bpf,s390: introduce emit_store_stack_imm64() for trampolineMenglong Dong1-11/+10
Introduce a helper to store 64-bit immediate on the trampoline stack with a help of a register. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20260224092208.1395085-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03s390: Introduce bpf_get_lowcore() kfuncIlya Leoshkevich2-0/+14
Implementing BPF version of preempt_count() requires accessing lowcore from BPF. Since lowcore can be relocated, open-coding (struct lowcore *)0 does not work, so add a kfunc. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20260217160813.100855-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03sparc/PCI: Initialize msi_addr_mask for OF-created PCI devicesNilay Shroff1-0/+7
Recent changes replaced the use of no_64bit_msi with msi_addr_mask, which is now expected to be initialized to DMA_BIT_MASK(64) during PCI device setup. On SPARC systems, this initialization was inadvertently missed for devices instantiated from device tree nodes, leaving msi_addr_mask unset for OF-created pci_dev instances. As a result, MSI address validation fails during probe, causing affected devices to fail initialization. Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev() so that MSI address validation succeeds and PCI device probing works as expected. Fixes: 386ced19e9a3 ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Han Gao <gaohan@iscas.ac.cn> # SPARC Enterprise T5220 Tested-by: Nathaniel Roach <nroach44@nroach44.id.au> # SPARC T5-2 Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Link: https://patch.msgid.link/20260220070239.1693303-3-nilay@linux.ibm.com
2026-03-03powerpc/pci: Initialize msi_addr_mask for OF-created PCI devicesNilay Shroff1-0/+7
Recent changes replaced the use of no_64bit_msi with msi_addr_mask. As a result, msi_addr_mask is now expected to be initialized to DMA_BIT_MASK(64) when a pci_dev is set up. However, this initialization was missed on powerpc due to differences in the device initialization path compared to other (x86) architecture. Due to this, now PCI device probe method fails on powerpc system. On powerpc systems, struct pci_dev instances are created from device tree nodes via of_create_pci_dev(). Because msi_addr_mask was not initialized there, it remained zero. Later, during MSI setup, msi_verify_entries() validates the programmed MSI address against pdev->msi_addr_mask. Since the mask was not set correctly, the validation fails, causing PCI driver probe failures for devices on powerpc systems. Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev() so that MSI address validation succeeds and device probe works as expected. Fixes: 386ced19e9a3 ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Tested-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260220070239.1693303-2-nilay@linux.ibm.com
2026-03-03s390/stackleak: Fix __stackleak_poison() inline assembly constraintHeiko Carstens1-1/+1
The __stackleak_poison() inline assembly comes with a "count" operand where the "d" constraint is used. "count" is used with the exrl instruction and "d" means that the compiler may allocate any register from 0 to 15. If the compiler would allocate register 0 then the exrl instruction would not or the value of "count" into the executed instruction - resulting in a stackframe which is only partially poisoned. Use the correct "a" constraint, which excludes register 0 from register allocation. Fixes: 2a405f6bb3a5 ("s390/stackleak: provide fast __stackleak_poison() implementation") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-4-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03s390/xor: Improve inline assembly constraintsHeiko Carstens1-4/+4
The inline assembly constraint for the "bytes" operand is "d" for all xor() inline assemblies. "d" means that any register from 0 to 15 can be used. If the compiler would use register 0 then the exrl instruction would not or the value of "bytes" into the executed instruction - resulting in an incorrect result. However all the xor() inline assemblies make hard-coded use of register 0, and it is correctly listed in the clobber list, so that this cannot happen. Given that this is quite subtle use the better "a" constraint, which excludes register 0 from register allocation in any case. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03s390/xor: Fix xor_xc_2() inline assembly constraintsHeiko Carstens1-2/+2
The inline assembly constraints for xor_xc_2() are incorrect. "bytes", "p1", and "p2" are input operands, while all three of them are modified within the inline assembly. Given that the function consists only of this inline assembly it seems unlikely that this may cause any problems, however fix this in any case. Fixes: 2cfc5f9ce7f5 ("s390/xor: optimized xor routing using the XC instruction") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-2-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03s390/xor: Fix xor_xc_5() inline assemblyVasily Gorbik1-1/+0
xor_xc_5() contains a larl 1,2f that is not used by the asm and is not declared as a clobber. This can corrupt a compiler-allocated value in %r1 and lead to miscompilation. Remove the instruction. Fixes: 745600ed6965 ("s390/lib: Use exrl instead of ex in xor functions") Cc: stable@vger.kernel.org Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03x86/PVH: Use boot params to pass RSDP address in start_info pageHou Wenlong1-6/+1
After commit e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available"), the RSDP address can be passed in boot params. Therefore, store the RSDP address in start_info page into boot params in the PVH entry instead of registering a different callback. This removes an absolute reference during the PVH entry and is more standardized. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <76675c4d49d3a8f72252076812ef8f22276230c2.1772282441.git.houwenlong.hwl@antgroup.com>
2026-03-03x86/xen: update outdated commentkexinsun1-1/+1
The function xen_flush_tlb_others() was renamed xen_flush_tlb_multi() by commit 4ce94eabac16 ("x86/mm/tlb: Flush remote and local TLBs concurrently"). Update the comment accordingly. Signed-off-by: kexinsun <kexinsun@smail.nju.edu.cn> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260224022424.1718-1-kexinsun@smail.nju.edu.cn>
2026-03-03x86/xen: Build identity mapping page tables dynamically for XENPVHou Wenlong3-30/+9
After commit 47ffe0578aee ("x86/pvh: Add 64bit relocation page tables"), the PVH entry uses a new set of page tables instead of the preconstructed page tables in head64.S. Since those preconstructed page tables are only used in XENPV now and XENPV does not actually need the preconstructed identity page tables directly, they can be filled in xen_setup_kernel_pagetable(). Therefore, build the identity mapping page table dynamically to remove the preconstructed page tables and make the code cleaner. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: "Borislav Petkov (AMD)" <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <453981eae7e8158307f971d1632d5023adbe03c3.1769074722.git.houwenlong.hwl@antgroup.com>
2026-03-03KVM: x86/mmu: Fix UBSAN warning when reading nx_huge_pages parameterGal Pressman1-0/+5
The nx_huge_pages parameter is stored as an int (initialized to -1 to indicate auto mode), but get_nx_huge_pages() calls param_get_bool() which expects a bool pointer. This causes UBSAN to report "load of value 255 is not a valid value for type '_Bool'" when the parameter is read via sysfs during a narrow time window. The issue occurs during module load: the module parameter is registered and its sysfs file becomes readable before the kvm_mmu_x86_module_init() function runs: 1. Module load begins, static variable initialized to -1 2. mod_sysfs_setup() creates /sys/module/kvm/parameters/nx_huge_pages 3. (Parameter readable, value = -1) 4. do_init_module() runs kvm_x86_init() 5. kvm_mmu_x86_module_init() resolves -1 to bool If userspace (e.g., sos report) reads the parameter during step 3, param_get_bool() dereferences the int as a bool, triggering the UBSAN warning. Fix that by properly reading and converting the -1 value into an 'auto' string. Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260225145050.2350278-3-gal@nvidia.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: SVM: Fix UBSAN warning when reading avic parameterGal Pressman1-1/+12
The avic parameter is stored as an int to support the special value -1 (AVIC_AUTO_MODE), but the cited commit changed it from bool to int while keeping param_get_bool() as the getter function. This causes UBSAN to report "load of value 255 is not a valid value for type '_Bool'" when the parameter is read via sysfs. The issue happens in two scenarios: 1. During module load: There's a time window between when module parameters are registered, and when avic_hardware_setup() runs to resolve the value, where the value is -1. 2. On non-AMD systems: On non-AMD hardware, the kvm_is_svm_supported() check returns early. The avic_hardware_setup() function never runs, so avic remains -1. Fix that by implementing a getter function that properly reads and converts the -1 value into a string. Triggered by sos report: UBSAN: invalid-load in kernel/params.c:323:33 load of value 255 is not a valid value for type '_Bool' CPU: 0 UID: 0 PID: 4667 Comm: sos Not tainted 6.19.0-rc5net_mlx5_1e86836 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x69/0xa0 ubsan_epilogue+0x5/0x2b __ubsan_handle_load_invalid_value.cold+0x47/0x4c ? lock_acquire+0x219/0x2c0 param_get_bool.cold+0xf/0x14 param_attr_show+0x51/0x80 module_attr_show+0x19/0x30 sysfs_kf_seq_show+0xac/0xf0 seq_read_iter+0x100/0x410 copy_splice_read+0x1b4/0x360 splice_direct_to_actor+0xbd/0x270 ? wait_for_space+0xb0/0xb0 do_splice_direct+0x72/0xb0 ? propagate_umount+0x870/0x870 do_sendfile+0x3a3/0x470 __x64_sys_sendfile64+0x5e/0xe0 do_syscall_64+0x70/0x8c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: ca2967de5a5b ("KVM: SVM: Enable AVIC by default for Zen4+ if x2AVIC is support") Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://patch.msgid.link/20260225145050.2350278-2-gal@nvidia.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Add helpers to prepare kvm_run for userspace MMIO exitSean Christopherson3-43/+37
Add helpers to fill kvm_run for userspace MMIO exits to deduplicate a variety of code, and to allow for a cleaner return path in emulator_read_write(). Opportunistically add a KVM_BUG_ON() to ensure the caller is limiting the length of a single MMIO access to 8 bytes (the largest size userspace is prepared to handled, as the ABI was baked before things like MOVDQ came along). No functional change intended. Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Binbin Wu <binbin.wu@linux.intel.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Michael Roth <michael.roth@amd.com> Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Don't panic the kernel if completing userspace I/O / MMIO goes ↵Sean Christopherson1-4/+8
sideways Kill the VM instead of the host kernel if KVM botches I/O and/or MMIO handling. There is zero danger to the host or guest, i.e. panicking the host isn't remotely justified. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Rename .read_write_emulate() to .read_write_guest()Sean Christopherson1-10/+28
Rename the ops and helpers to read/write guest memory to clarify that they do exactly that, i.e. aren't generic emulation flows and don't do anything related to emulated MMIO. Opportunistically add comments to explain the flow, e.g. it's not exactly obvious that KVM deliberately treats "failed" accesses to guest memory as emulated MMIO. No functional change intended. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Fold emulator_write_phys() into write_emulate()Sean Christopherson2-16/+7
Fold emulator_write_phys() into write_emulate() to drop a superfluous wrapper, and to provide more symmetry between the read and write paths. No functional change intended. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Bury emulator read/write ops in emulator_{read,write}_emulated()Sean Christopherson1-15/+14
Now that SEV-ES invokes vcpu_mmio_{read,write}() directly, bury the read vs. write emulator ops in the dedicated emulator callbacks so that they are colocated with their usage, and to make it harder for non-emulator code to use hooks that are intended for the emulator. Opportunistically rename the structures to get rid of the long-standing "emultor" typo. No functional change intended. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Consolidate SEV-ES MMIO emulation into a single public APISean Christopherson3-38/+17
Dedup kvm_sev_es_mmio_{read,write}() into a single API, as the "cost" of plumbing in a boolean is largely negligible since KVM pulls out a boolean for ops->write anyways, and consolidating the APIs will allow for additional cleanups. No functional change intended. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Dedup kvm_sev_es_mmio_{read,write}()Sean Christopherson1-39/+19
Dedup the SEV-ES emulated MMIO code by using the read vs. write emulator ops to handle the few differences between reads and writes. Opportunistically tweak the comment about fragments to call out that KVM should verify that userspace can actually handle MMIO requests that cross page boundaries. Unlike emulated MMIO, the request is made in the GPA space, not the GVA space, i.e. emulation across page boundaries can work generically, at least in theory. No functional change intended. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Harden SEV-ES MMIO against on-stack use-after-freeSean Christopherson1-4/+6
Add a sanity check to ensure KVM doesn't use an on-stack variable when handling an MMIO request for an SEV-ES guest. The source/destination for SEV-ES MMIO should _always_ be the #VMGEXIT scratch area. Opportunistically update the comment in the completion side of things to clarify that frag->data doesn't need to be copied anywhere, and the VMEGEXIT is trap-like (the current comment doesn't clarify *how* RIP is advanced). Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Move MMIO write tracing into vcpu_mmio_write()Sean Christopherson1-8/+5
Move the invocation of MMIO write tracepoint into vcpu_mmio_write() and drop its largely-useless wrapper to cull pointless code and to make the code symmetrical with respect to vcpu_mmio_read(). No functional change intended. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-03KVM: x86: Open code read vs. write userspace MMIO exits in emulator_read_write()Sean Christopherson1-20/+13
Open code the differences in read vs. write userspace MMIO exits instead of burying three lines of code behind indirect callbacks, as splitting the logic makes it extremely hard to track that KVM's handling of reads vs. write is _significantly_ different. Add a comment to explain why the semantics are different, and how on earth an MMIO write ends up triggering an exit to userspace. No functional change intended. Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>