path: root/arch/x86
Age | Commit message | Author | Files | Lines
2025-04-07crypto: chacha - centralize the skcipher wrappers for arch codeEric Biggers1-1/+7
Following the example of the crc32 and crc32c code, make the crypto subsystem register both generic and architecture-optimized chacha20, xchacha20, and xchacha12 skcipher algorithms, all implemented on top of the appropriate library functions. This eliminates the need for every architecture to implement the same skcipher glue code. To register the architecture-optimized skciphers only when architecture-optimized code is actually being used, add a function chacha_is_arch_optimized() and make each arch implement it. Change each architecture's ChaCha module_init function to arch_initcall so that the CPU feature detection is guaranteed to run before chacha_is_arch_optimized() gets called by crypto/chacha.c. In the case of s390, remove the CPU feature based module autoloading, which is no longer needed since the module just gets pulled in via function linkage. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
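As a concrete illustration, a per-arch hook along these lines satisfies the contract described above (the static key name and CPU feature check are illustrative assumptions, not the actual x86 code):

    #include <linux/jump_label.h>
    #include <linux/init.h>
    #include <linux/module.h>
    #include <asm/cpufeature.h>

    static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_simd);

    bool chacha_is_arch_optimized(void)
    {
            /* crypto/chacha.c registers the arch skciphers only when this is true */
            return static_key_enabled(&chacha_use_simd);
    }
    EXPORT_SYMBOL(chacha_is_arch_optimized);

    static int __init chacha_simd_mod_init(void)
    {
            if (boot_cpu_has(X86_FEATURE_SSSE3))
                    static_branch_enable(&chacha_use_simd);
            return 0;
    }
    /* arch_initcall rather than module_init so feature detection runs first */
    arch_initcall(chacha_simd_mod_init);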
2025-04-07crypto: x86/aes-xts - optimize _compute_first_set_of_tweaks for AVX-512Eric Biggers1-28/+62
Optimize the AVX-512 version of _compute_first_set_of_tweaks by using vectorized shifts to compute the first vector of tweak blocks, and by using byte-aligned shifts when multiplying by x^8. AES-XTS performance on AMD Ryzen 9 9950X (Zen 5) improves by about 2% for 4096-byte messages or 6% for 512-byte messages. AES-XTS performance on Intel Sapphire Rapids improves by about 1% for 4096-byte messages or 3% for 512-byte messages. Code size decreases by 75 bytes which outweighs the increase in rodata size of 16 bytes. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
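For background, the tweak recurrence being vectorized here is the GF(2^128) multiply-by-x step; a scalar sketch of it (illustrative, little-endian block convention, not the AVX-512 code) looks like:

    static void xts_next_tweak(unsigned char t[16])
    {
            unsigned char carry = t[15] >> 7;       /* bit shifted out of the block */
            int i;

            for (i = 15; i > 0; i--)
                    t[i] = (t[i] << 1) | (t[i - 1] >> 7);
            t[0] <<= 1;
            if (carry)
                    t[0] ^= 0x87;   /* reduce by x^128 + x^7 + x^2 + x + 1 */
    }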
2025-04-07crypto: x86 - Remove CONFIG_AS_AVX512 handlingUros Bizjak7-26/+9
Current minimum required version of binutils is 2.25, which supports AVX-512 instruction mnemonics. Remove check for assembler support of AVX-512 instructions and all relevant macros for conditional compilation. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
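The removed conditionals had this general shape (an illustrative sketch, not the actual macros):

    /* before: assembler support had to be probed at build time */
    #ifdef CONFIG_AS_AVX512
            asm volatile("vpxorq %%zmm0, %%zmm0, %%zmm0");
    #else
            /* hand-encoded .byte fallback or a non-AVX-512 code path */
    #endif

    /* after: binutils >= 2.25 guarantees the mnemonic always assembles */
            asm volatile("vpxorq %%zmm0, %%zmm0, %%zmm0");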
2025-04-07crypto: x86 - Remove CONFIG_AS_SHA256_NIUros Bizjak3-16/+1
Current minimum required version of binutils is 2.25, which supports SHA-256 instruction mnemonics. Remove check for assembler support of SHA-256 instructions and all relevant macros for conditional compilation. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86 - Remove CONFIG_AS_SHA1_NIUros Bizjak3-17/+1
Current minimum required version of binutils is 2.25, which supports SHA-1 instruction mnemonics. Remove check for assembler support of SHA-1 instructions and all relevant macros for conditional compilation. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/chacha - Remove SIMD fallback pathHerbert Xu1-35/+11
Get rid of the fallback path as SIMD is now always usable in softirq context. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/twofish - stop using the SIMD helperEric Biggers2-15/+7
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
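The resulting direct pattern is a minimal sketch like this (helper name illustrative):

    #include <asm/fpu/api.h>

    static void twofish_crypt_simd(void *dst, const void *src, unsigned int len)
    {
            /* Usable in both task and softirq context now that x86 always
             * supports kernel-mode FPU there; no crypto/simd.c wrapper. */
            kernel_fpu_begin();
            twofish_avx_crypt(dst, src, len);       /* illustrative asm helper */
            kernel_fpu_end();
    }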
2025-04-07crypto: x86/sm4 - stop using the SIMD helperEric Biggers3-42/+22
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/serpent - stop using the SIMD helperEric Biggers4-45/+21
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/cast - stop using the SIMD helperEric Biggers3-30/+13
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/camellia - stop using the SIMD helperEric Biggers3-29/+14
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aria - stop using the SIMD helperEric Biggers4-47/+20
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aes - stop using the SIMD helperEric Biggers2-100/+59
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aegis - stop using the SIMD helperEric Biggers2-10/+4
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR codeEric Biggers3-121/+74
Intel made a late change to the AVX10 specification that removes support for a 256-bit maximum vector length and enumeration of the maximum vector length. AVX10 will imply a maximum vector length of 512 bits. I.e. there won't be any such thing as AVX10/256 or AVX10/512; there will just be AVX10, and it will essentially just consolidate AVX512 features. As a result of this new development, my strategy of providing both *_avx10_256 and *_avx10_512 functions didn't turn out to be that useful. The only remaining motivation for the 256-bit AVX512 / AVX10 functions is to avoid downclocking on older Intel CPUs. But in the case of AES-XTS and AES-CTR, I already wrote *_avx2 code too (primarily to support CPUs without AVX512), which performs almost as well as *_avx10_256. So we should just use that. Therefore, remove the *_avx10_256 AES-XTS and AES-CTR functions and algorithms, and rename the *_avx10_512 AES-XTS and AES-CTR functions and algorithms to *_avx512. Make Ice Lake and Tiger Lake use *_avx2 instead of *_avx10_256 which they previously used. I've left AES-GCM unchanged for now. There is no VAES+AVX2 optimized AES-GCM in the kernel yet, so the path forward for that is not as clear. However, I did write a VAES+AVX2 optimized AES-GCM for BoringSSL. So one option is to port that to the kernel and then do the same cleanup. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-06x86/boot: Move the EFI mixed mode startup code back under arch/x86, into startup/Ard Biesheuvel2-0/+256
Linus expressed a strong preference for arch-specific asm code (i.e., virtually all of it) to reside under arch/ rather than anywhere else. So move the EFI mixed mode startup code back, and put it under arch/x86/boot/startup/ where all shared x86 startup code is going to live. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20250401133416.1436741-11-ardb+git@google.com
2025-04-06x86/boot: Move the 5-level paging trampoline into /startupArd Biesheuvel4-1/+5
The 5-level paging trampoline is used by both the EFI stub and the traditional decompressor. Move it out of the decompressor sources into the newly minted arch/x86/boot/startup/ sub-directory which will hold startup code that may be shared between the decompressor, the EFI stub and the kernel proper, and needs to tolerate being called during early boot, before the kernel virtual mapping has been created. This will allow the 5-level paging trampoline to be used by EFI boot images such as zboot that omit the traditional decompressor entirely. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250401133416.1436741-10-ardb+git@google.com
2025-04-06x86/boot/compressed: Merge the local pgtable.h include into <asm/boot.h>Ard Biesheuvel6-22/+10
Merge the local include "pgtable.h" -which declares the API of the 5-level paging trampoline- into <asm/boot.h> so that its implementation in la57toggle.S as well as the calling code can be decoupled from the traditional decompressor. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250401133416.1436741-9-ardb+git@google.com
2025-04-06x86/boot: Use __ALIGN_KERNEL_MASK() instead of open coded analogueAndy Shevchenko1-3/+1
LOAD_PHYSICAL_ADDR is calculated as an aligned (up) CONFIG_PHYSICAL_START with the respective alignment value CONFIG_PHYSICAL_ALIGN. However, the calculation is open-coded even though we have the __ALIGN_KERNEL_MASK() macro that does the same thing. This macro has nothing special in it, which is why it may be used in assembler code and linker scripts (whereas __ALIGN_KERNEL() may not). Switch to the macro. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250404165303.3657139-1-andriy.shevchenko@linux.intel.com
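For reference, the macros in include/uapi/linux/const.h are:

    #define __ALIGN_KERNEL(x, a)            __ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
    #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))

so the open-coded expression reduces to roughly:

    #define LOAD_PHYSICAL_ADDR \
            __ALIGN_KERNEL_MASK(CONFIG_PHYSICAL_START, CONFIG_PHYSICAL_ALIGN - 1)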
2025-04-06x86/cpuid: Add AMX and SPEC_CTRL dependenciesAndi Kleen1-0/+4
Add some missing dependencies to the CPUID dependency table: - All the AMX features depend on AMX_TILE - All the SPEC_CTRL features depend on SPEC_CTRL [ mingo: Keep the AMX part of the table grouped ... ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Ahmed S. Darwish <darwi@linutronix.de> Link: https://lore.kernel.org/r/20240924170128.2611854-1-ak@linux.intel.com
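The dependency table lives in arch/x86/kernel/cpu/cpuid-deps.c; the added entries take roughly this shape (feature list abbreviated and illustrative):

    static const struct cpuid_dep cpuid_deps[] = {
            /* ... existing entries ... */
            { X86_FEATURE_AMX_FP16,         X86_FEATURE_AMX_TILE },
            { X86_FEATURE_AMX_COMPLEX,      X86_FEATURE_AMX_TILE },
            { X86_FEATURE_SPEC_CTRL_SSBD,   X86_FEATURE_SPEC_CTRL },
            /* ... */
    };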
2025-04-06Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-5/+5
Pull timer cleanups from Thomas Gleixner: "A set of final cleanups for the timer subsystem: - Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). - The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are caught. Doing this right before rc1 ensures that new code which is merged post rc1 does not introduce new instances of the original functionality" * tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/timers: Rename the hrtimer_init event to hrtimer_setup hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() hrtimers: Rename debug_init() to debug_setup() hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() hrtimers: Make callback function pointer private hrtimers: Merge __hrtimer_init() into __hrtimer_setup() hrtimers: Switch to use __htimer_setup() hrtimers: Delete hrtimer_init() treewide: Convert new and leftover hrtimer_init() users treewide: Switch/rename to timer_delete[_sync]()
2025-04-06Merge tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull more irq updates from Thomas Gleixner: "A set of updates for the interrupt subsystem: - A treewide cleanup for the irq_domain code, which makes the naming consistent and gets rid of the original oddity of naming domains 'host'. This is a trivial mechanical change and is done late to ensure that all instances have been caught and new code merged post rc1 won't reintroduce new instances. - A trivial consistency fix in the migration code The recent introduction of irq_force_complete_move() in the core code causes a problem for the nostalgia crowd who maintains ia64 out of tree. The code assumes that hierarchical interrupt domains are enabled and dereferences irq_data::parent_data unconditionally. That works in mainline because both architectures which enable that code have hierarchical domains enabled. It breaks the ia64 build, though, which enables the functionality but does not have hierarchical domains. While it's not really a problem for mainline today, this unconditional dereference is inconsistent and trivially fixable by using the existing helper function irqd_get_parent_data(), which has the appropriate #ifdeffery in place" * tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move() irqdomain: Stop using 'host' for domain irqdomain: Rename irq_get_default_host() to irq_get_default_domain() irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
2025-04-06Merge tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds5-51/+7
Pull Kbuild updates from Masahiro Yamada: - Improve performance in gendwarfksyms - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS - Support CONFIG_HEADERS_INSTALL for ARCH=um - Use more relative paths to source files for better reproducibility - Support the loong64 Debian architecture - Add Kbuild bash completion - Introduce intermediate vmlinux.unstripped for architectures that need static relocations to be stripped from the final vmlinux - Fix versioning in Debian packages for -rc releases - Treat missing MODULE_DESCRIPTION() as an error - Convert Nios2 Makefiles to use the generic rule for built-in DTB - Add debuginfo support to the RPM package * tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits) kbuild: rpm-pkg: build a debuginfo RPM kconfig: merge_config: use an empty file as initfile nios2: migrate to the generic rule for built-in DTB rust: kbuild: skip `--remap-path-prefix` for `rustdoc` kbuild: pacman-pkg: hardcode module installation path kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally modpost: require a MODULE_DESCRIPTION() kbuild: make all file references relative to source root x86: drop unnecessary prefix map configuration kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS kbuild: Add a help message for "headers" kbuild: deb-pkg: remove "version" variable in mkdebian kbuild: deb-pkg: fix versioning for -rc releases Documentation/kbuild: Fix indentation in modules.rst example x86: Get rid of Makefile.postlink kbuild: Create intermediate vmlinux build with relocations preserved kbuild: Introduce Kconfig symbol for linking vmlinux with relocations kbuild: link-vmlinux.sh: Make output file name configurable kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y Revert "kheaders: Ignore silly-rename files" ...
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner2-5/+5
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
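The per-call-site change is mechanical, e.g. (struct and field names illustrative):

    /* before */
    del_timer(&port->timer);
    del_timer_sync(&port->timer);

    /* after */
    timer_delete(&port->timer);
    timer_delete_sync(&port->timer);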
2025-04-04irqdomain: Rename irq_set_default_host() to irq_set_default_domain()Jiri Slaby (SUSE)1-1/+1
Naming interrupt domains host is confusing at best and the irqdomain code uses both domain and host inconsistently. Therefore rename irq_set_default_host() to irq_set_default_domain(). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org
2025-04-04Merge tag 'x86-urgent-2025-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-25/+44
Pull x86 fixes from Ingo Molnar: - Fix a performance regression on AMD iGPU and dGPU drivers, related to the unintended activation of DMA bounce buffers that regressed game performance if KASLR disturbed things just enough - Fix a copy_user_generic() performance regression on certain older non-FSRM/ERMS CPUs - Fix a Clang build warning due to a semantic merge conflict the Kunit tree generated with the x86 tree - Fix FRED related system hang during S4 resume - Remove an unused API * tag 'x86-urgent-2025-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fred: Fix system hang during S4 resume with FRED enabled x86/platform/iosf_mbi: Remove unused iosf_mbi_unregister_pmic_bus_access_notifier() x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers x86/tools: Drop duplicate unlikely() definition in insn_decoder_test.c x86/uaccess: Improve performance by aligning writes to 8 bytes in copy_user_generic(), on non-FSRM/ERMS CPUs
2025-04-04Merge branch 'kvm-pi-fix-lockdep' into HEADPaolo Bonzini1-7/+30
2025-04-04KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positiveYan Zhao1-3/+29
Use a separate subclass when acquiring KVM's per-CPU posted interrupts wakeup lock in the scheduled out path, i.e. when adding a vCPU on the list of vCPUs to wake, to work around a false positive deadlock.

  Chain exists of:
   &p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)

  Possible unsafe locking scenario:

        CPU0                          CPU1
        ----                          ----
   lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
                                 lock(&rq->__lock);
                                 lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
   lock(&p->pi_lock);

   *** DEADLOCK ***

In the wakeup handler, the callchain is *always*:

  sysvec_kvm_posted_intr_wakeup_ipi()
  |
  --> pi_wakeup_handler()
      |
      --> kvm_vcpu_wake_up()
          |
          --> try_to_wake_up(),

and the lock order is: &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock.

For the schedule out path, the callchain is always (for all intents and purposes; if the kernel is preemptible, kvm_sched_out() can be called from something other than schedule(), but the beginning of the callchain will be the same point in vcpu_block()):

  vcpu_block()
  |
  --> schedule()
      |
      --> kvm_sched_out()
          |
          --> vmx_vcpu_put()
              |
              --> vmx_vcpu_pi_put()
                  |
                  --> pi_enable_wakeup_handler(),

and the lock order is: &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)

I.e. lockdep sees AB+BC ordering for schedule out, and CA ordering for wakeup, and complains about the A=>C versus C=>A inversion. In practice, deadlock can't occur between schedule out and the wakeup handler as they are mutually exclusive. The entirety of the schedule out code that runs with the problematic scheduler locks held does so with IRQs disabled, i.e. it can't run concurrently with the wakeup handler.

Use a subclass instead of disabling lockdep entirely, and tell lockdep that both subclasses are being acquired when loading a vCPU, as the sched_out and sched_in paths are NOT mutually exclusive, e.g.

        CPU 0              CPU 1
   ---------------    ---------------
   vCPU0 sched_out
                      vCPU1 sched_in
                      vCPU1 sched_out
   vCPU0 sched_in

where vCPU0's sched_in may race with vCPU1's sched_out, on CPU 0's wakeup list+lock. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-ID: <20250401154727.835231-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
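In code, the subclass trick is essentially this (sketch; subclass numbering illustrative):

    /* wakeup handler path: default subclass 0 */
    raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));

    /* sched_out path: separate subclass so lockdep keys it independently */
    raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu), 1);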
2025-04-04KVM: VMX: Assert that IRQs are disabled when putting vCPU on PI wakeup listSean Christopherson1-4/+1
Assert that IRQs are already disabled when putting a vCPU on a CPU's PI wakeup list, as opposed to saving/disabling+restoring IRQs. KVM relies on IRQs being disabled until the vCPU task is fully scheduled out, i.e. until the scheduler has dropped all of its per-CPU locks (e.g. for the runqueue), as attempting to wake the task while it's being scheduled out could lead to deadlock. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Message-ID: <20250401154727.835231-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04KVM: x86: Explicitly zero-initialize on-stack CPUID unionsSean Christopherson1-5/+3
Explicitly zero/empty-initialize the unions used for PMU related CPUID entries, instead of manually zeroing all fields (hopefully), or in the case of 0x80000022, relying on the compiler to clobber the uninitialized bitfields. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-ID: <20250315024102.2361628-1-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
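The fix is just the empty-initializer pattern, e.g. (treat the union and field names as assumptions):

    union cpuid_0x80000022_ebx ebx = { };   /* every field/bitfield starts at 0 */

    ebx.split.num_core_pmc = kvm_pmu_cap.num_counters_gp;
    entry->ebx = ebx.full;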
2025-04-04KVM: x86/mmu: Wrap sanity check on number of TDP MMU pages with KVM_PROVE_MMUSean Christopherson2-2/+13
Wrap the TDP MMU page counter in CONFIG_KVM_PROVE_MMU so that the sanity check is omitted from production builds, and more importantly to remove the atomic accesses to account pages. A one-off memory leak in production is relatively uninteresting, and a WARN_ON won't help mitigate a systemic issue; it's as much about helping triage memory leaks as it is about detecting them in the first place, and doesn't magically stop the leaks. I.e. production environments will be quite sad if a severe KVM bug escapes, regardless of whether or not KVM WARNs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250315023448.2358456-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accessesSean Christopherson1-0/+4
Acquire a lock on kvm->srcu when userspace is getting MP state to handle a rather extreme edge case where "accepting" APIC events, i.e. processing pending INIT or SIPI, can trigger accesses to guest memory. If the vCPU is in L2 with INIT *and* a TRIPLE_FAULT request pending, then getting MP state will trigger a nested VM-Exit by way of ->check_nested_events(), and emulating the nested VM-Exit can access guest memory. The splat was originally hit by syzkaller on a Google-internal kernel, and reproduced on an upstream kernel by hacking the triple_fault_event_test selftest to stuff a pending INIT, store an MSR on VM-Exit (to generate a memory access on VMX), and do vcpu_mp_state_get() to trigger the scenario.

  =============================
  WARNING: suspicious RCU usage
  6.14.0-rc3-b112d356288b-vmx/pi_lockdep_false_pos-lock #3 Not tainted
  -----------------------------
  include/linux/kvm_host.h:1058 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by triple_fault_ev/1256:
   #0: ffff88810df5a330 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x8b/0x9a0 [kvm]

  stack backtrace:
  CPU: 11 UID: 1000 PID: 1256 Comm: triple_fault_ev Not tainted 6.14.0-rc3-b112d356288b-vmx #3
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   <TASK>
   dump_stack_lvl+0x7f/0x90
   lockdep_rcu_suspicious+0x144/0x190
   kvm_vcpu_gfn_to_memslot+0x156/0x180 [kvm]
   kvm_vcpu_read_guest+0x3e/0x90 [kvm]
   read_and_check_msr_entry+0x2e/0x180 [kvm_intel]
   __nested_vmx_vmexit+0x550/0xde0 [kvm_intel]
   kvm_check_nested_events+0x1b/0x30 [kvm]
   kvm_apic_accept_events+0x33/0x100 [kvm]
   kvm_arch_vcpu_ioctl_get_mpstate+0x30/0x1d0 [kvm]
   kvm_vcpu_ioctl+0x33e/0x9a0 [kvm]
   __x64_sys_ioctl+0x8b/0xb0
   do_syscall_64+0x6c/0x170
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>

Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250401150504.829812-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
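The shape of the fix is a standard SRCU read-side critical section around the event processing; a minimal sketch (exact placement assumed, not the literal hunk):

    /* Hold kvm->srcu: emulating the nested VM-Exit may read guest memory. */
    idx = srcu_read_lock(&vcpu->kvm->srcu);
    r = kvm_apic_accept_events(vcpu);
    srcu_read_unlock(&vcpu->kvm->srcu, idx);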
2025-04-03Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds3-6/+8
Pull more MM updates from Andrew Morton: - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike Rapoport fixes a couple of issues with the just-merged "arch, mm: reduce code duplication in mem_init()" series - The series "MAINTAINERS: add my isub-entries to MM part." from Mike Rapoport does some maintenance on MAINTAINERS - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some cleanup work to the page mapping code - The series "mseal system mappings" from Jeff Xu permits sealing of "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode) - Plus the usual shower of singleton patches * tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits) mseal sysmap: add arch-support txt mseal sysmap: enable s390 selftest: test system mappings are sealed mseal sysmap: update mseal.rst mseal sysmap: uprobe mapping mseal sysmap: enable arm64 mseal sysmap: enable x86-64 mseal sysmap: generic vdso vvar mapping selftests: x86: test_mremap_vdso: skip if vdso is msealed mseal sysmap: kernel config and header change mm: pgtable: remove tlb_remove_page_ptdesc() x86: pgtable: convert to use tlb_remove_ptdesc() riscv: pgtable: unconditionally use tlb_remove_ptdesc() mm: pgtable: convert some architectures to use tlb_remove_ptdesc() mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc* mm: pgtable: make generic tlb_remove_table() use struct ptdesc microblaze/mm: put mm_cmdline_setup() in .init.text section mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range MAINTAINERS: mm: add entry for secretmem MAINTAINERS: mm: add entry for numa memblocks and numa emulation ...
2025-04-03x86/idle: Use MONITOR and MWAIT mnemonics in <asm/mwait.h>Uros Bizjak1-9/+12
Current minimum required version of binutils is 2.25, which supports MONITOR and MWAIT instruction mnemonics. Replace the byte-wise specification of MONITOR and MWAIT with these proper mnemonics. No functional change intended. Note: the LLVM assembler is not able to assemble correct forms of MONITOR and MWAIT instructions with explicit operands and reports:

  error: invalid operand for instruction
   monitor %rax,%ecx,%edx
           ^~~~

  # https://lore.kernel.org/oe-kbuild-all/202504030802.2lEVBSpN-lkp@intel.com/

Use instruction mnemonics with implicit operands to work around this issue. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250403125111.429805-1-ubizjak@gmail.com
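The implicit-operand wrappers end up looking like this (a sketch of the pattern):

    static __always_inline void __monitor(const void *eax, u32 ecx, u32 edx)
    {
            /* "monitor %eax/%rax, %ecx, %edx" via implicit register operands */
            asm volatile("monitor" :: "a" (eax), "c" (ecx), "d" (edx));
    }

    static __always_inline void __mwait(u32 eax, u32 ecx)
    {
            /* "mwait %eax, %ecx" via implicit register operands */
            asm volatile("mwait" :: "a" (eax), "c" (ecx));
    }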
2025-04-03x86/idle: Change arguments of mwait_idle_with_hints() to u32Uros Bizjak1-1/+1
All functions in mwait_idle_with_hints() cast eax and ecx arguments to u32. Propagate argument type to the enclosing function. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250403073105.245987-1-ubizjak@gmail.com
2025-04-03x86/tlb: Simplify choose_new_asid() and generate better codeBorislav Petkov (AMD)1-29/+34
Have it return the two things it does return:

 - a new ASID and
 - the need to flush the TLB or not,

in a struct which fits in a single 32-bit register and whack the IO parameters. Beyond being easier to read, this also helps the compiler generate better, more compact code:

  # arch/x86/mm/tlb.o:

     text    data     bss     dec     hex filename
     9341     753     516   10610    2972 tlb.o.before
     9213     753     516   10482    28f2 tlb.o.after

No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250403085623.20824-1-bp@kernel.org
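The packed return type is roughly (a sketch; field widths chosen to fit the 32-bit register described above):

    struct new_asid {
            unsigned int asid       : 16;
            unsigned int need_flush :  1;
    };

    static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen);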
2025-04-03x86/idle: Remove CONFIG_AS_TPAUSEUros Bizjak2-13/+2
There is not much point in CONFIG_AS_TPAUSE at all when the emitted assembly is always the same - it only obfuscates the __tpause() code in essence. Remove the TPAUSE insn mnemonic from __tpause() and leave only the equivalent byte-wise definition. This can then be changed back to insn mnemonic once binutils 2.31.1 is the minimum version to build the kernel. (Right now it's 2.25.) Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250402180827.3762-4-ubizjak@gmail.com
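The remaining byte-wise definition is equivalent to the mnemonic, roughly (a sketch, not the literal kernel code):

    static inline void __tpause(u32 ecx, u32 edx, u32 eax)
    {
            /* "tpause %ecx": opcode 0x66 0x0f 0xae, ModRM 0xf1 (/6, %ecx) */
            asm volatile(".byte 0x66, 0x0f, 0xae, 0xf1"
                         :: "c" (ecx), "d" (edx), "a" (eax));
    }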
2025-04-03x86/idle: Remove .s output beautifying delimiters from simpler asm() templatesUros Bizjak1-7/+7
Delimiters in asm() templates such as ';', '\t' or '\n' are not required syntactically, they were used historically in the Linux kernel to prettify the compiler's .s output for people who were looking at compiler generated .s output. Most x86 developers these days are primarily looking at: 1) objdump --disassemble-all .o 2) perf top's live kernel function annotation and disassembler feature that uses /dev/mem. ... because: - this kind of assembler output is standardized regardless of compiler used, - it's generally less messy looking, - it gives ground-truth instead of being some intermediate layer in the toolchain that might or might not be the real deal, - and on a live kernel it also sees through the kernel's various layers of runtime patching code obfuscation facilities, also known as: alternative-instructions, tracepoints and jump labels. There are some cases where the .s output is the most useful tool, such as alternatives() code generation, but other than that these delimiters used in simple asm() statements mostly add noise to the source code side, which isn't desirable for assembly code that is fragile enough already. Remove the delimiters for <asm/mwait.h>, which also happens to make the GCC inliner's asm() instruction length heuristics more accurate... [ mingo: Wrote a new changelog to give historic context and to give people a chance to object. :-) ] Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250402180827.3762-3-ubizjak@gmail.com
2025-04-03Merge tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds1-0/+1
Pull Compute Express Link (CXL) updates from Dave Jiang: - Add support for Global Persistent Flush (GPF) - Cleanup of DPA partition metadata handling: - Remove the CXL_DECODER_MIXED enum that's not needed anymore - Introduce helpers to access resource and perf meta data - Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' - Make cxl_dpa_alloc() DPA partition number agnostic - Remove cxl_decoder_mode - Cleanup partition size and perf helpers - Remove unused CXL partition values - Add logging support for CXL CPER endpoint and port protocol errors: - Prefix protocol error struct and function names with cxl_ - Move protocol error definitions and structures to a common location - Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h - Add support in GHES to process CXL CPER protocol errors - Process CXL CPER protocol errors - Add trace logging for CXL PCIe port RAS errors - Remove redundant gp_port init - Add validation of cxl device serial number - CXL ABI documentation updates/fixups - A series that uses guard() to clean up open-coded mutex locking and remove gotos for error handling. - Some followup patches to support dirty shutdown accounting: - Add helper to retrieve DVSEC offset for dirty shutdown registers - Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown() - Add support for dirty shutdown count via sysfs - cxl_test support for dirty shutdown - A series to support CXL mailbox Features commands. Mostly in preparation for CXL EDAC code to utilize the Features commands. It's also in preparation for CXL fwctl support to utilize the CXL Features. The commands include "Get Supported Features", "Get Feature", and "Set Feature". - A series to support extended linear cache support described by the ACPI HMAT table. The addition helps enumerate the cache and also provides additional RAS reporting support for configuration with extended linear cache. (and related fixes for the series). - An update to cxl_test to support a 3-way capable CFMWS - A documentation fix to remove unused "mixed mode" * tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits) cxl/region: Fix the first aliased address miscalculation cxl/region: Quiet some dev_warn()s in extended linear cache setup cxl/Documentation: Remove 'mixed' from sysfs mode doc cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems cxl/test: Define a CFMWS capable of a 3 way HB interleave cxl/mem: Do not return error if CONFIG_CXL_MCE unset tools/testing/cxl: Set Shutdown State support cxl/pmem: Export dirty shutdown count via sysfs cxl/pmem: Rename cxl_dirty_shutdown_state() cxl/pci: Introduce cxl_gpf_get_dvsec() cxl/pci: Support Global Persistent Flush (GPF) cxl: Document missing sysfs files cxl: Plug typos in ABI doc cxl/pmem: debug invalid serial number data cxl/cdat: Remove redundant gp_port initialization cxl/memdev: Remove unused partition values cxl/region: Drop goto pattern of construct_region() cxl/region: Drop goto pattern in cxl_dax_region_alloc() cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc() cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free() ...
2025-04-02x86/idle: Standardize argument types for MONITOR{,X} and MWAIT{,X} instruction wrappers on 'u32'Uros Bizjak1-8/+5
MONITOR and MONITORX expect 32-bit unsigned integer arguments in the %ecx and %edx registers. MWAIT and MWAITX expect 32-bit unsigned int arguments in the %eax and %ecx registers. Some of the helpers around these instructions in <asm/mwait.h> are using too wide types (long); standardize on u32 instead, which makes it clear that this is a hardware ABI. [ mingo: Cleaned up the changelog. ] Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250402180827.3762-1-ubizjak@gmail.com
2025-04-02x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt()Andrew Cooper2-12/+6
The following commit, 12 years ago: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers") added barriers around the CLFLUSH in mwait_idle_with_hints(), justified with: ... and add memory barriers around it since the documentation is explicit that CLFLUSH is only ordered with respect to MFENCE. This also triggered, 11 years ago, the same adjustment in: f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs") during development, although it failed to get the static_cpu_has_bug() treatment. X86_BUG_CLFLUSH_MONITOR (a.k.a. the AAI65 errata) is specific to Intel CPUs, and the SDM currently states: Executions of the CLFLUSH instruction are ordered with respect to each other and with respect to writes, locked read-modify-write instructions, and fence instructions[1]. With footnote 1 reading: Earlier versions of this manual specified that executions of the CLFLUSH instruction were ordered only by the MFENCE instruction. All processors implementing the CLFLUSH instruction also order it relative to the other operations enumerated above. I.e. the SDM was incorrect at the time, and barriers should not have been inserted. Double checking the original AAI65 errata (not available from intel.com any more) shows no mention of barriers either. Note: If this were a general codepath, the MFENCEs would be needed, because AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs. Remove the unnecessary barriers. Furthermore, use a plain alternative(), rather than static_cpu_has_bug() and/or no optimisation. The workaround is a single instruction. Use an explicit %rax pointer rather than a general memory operand, because MONITOR takes the pointer implicitly in the same way. [ mingo: Cleaned up the commit a bit. ] Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250402172458.1378112-1-andrew.cooper3@citrix.com
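Expressed as a plain alternative, the workaround is a single conditional CLFLUSH, along these lines (a sketch, not the literal hunk):

    /* CLFLUSH the monitored cacheline only on CPUs with the errata; the
     * pointer sits in "a" because MONITOR takes it implicitly in %rax. */
    alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR,
                      [addr] "a" (&current_thread_info()->flags));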
2025-04-02Merge tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linuxLinus Torvalds7-46/+47
Pull UML updates from Johannes Berg: - proper nofault accesses and read-only rodata - hostfs fix for host inode number reuse - fixes for host errno handling - various cleanups/small fixes * tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: Rewrite the sigio workaround based on epoll and tgkill um: Prohibit the VM_CLONE flag in run_helper_thread() um: Switch to the pthread-based helper in sigio workaround um: ubd: Switch to the pthread-based helper um: Add pthread-based helper support um: x86: clean up elf specific definitions um: Store full CSGSFS and SS register from mcontext um: virt-pci: Refactor virtio_pcidev into its own module um: work around sched_yield not yielding in time-travel mode um/locking: Remove semicolon from "lock" prefix um: Update min_low_pfn to match changes in uml_reserved um: use str_yes_no() to remove hardcoded "yes" and "no" um: hostfs: avoid issues on inode number reuse by host um: Allocate vdso page pointer statically um: remove copy_from_kernel_nofault_allowed um: mark rodata read-only and implement _nofault accesses um: Pass the correct Rust target and options with gcc
2025-04-02Merge tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds8-40/+78
Pull x86 TDX updates from Dave Hansen: "Avoid direct HLT instruction execution in TDX guests. TDX guests aren't expected to use the HLT instruction directly. It causes a virtualization exception (#VE). While the #VE _can_ be handled, the current handling is slow and buggy and the easiest thing is just to avoid HLT in the first place. Plus, the kernel already has paravirt infrastructure that makes it relatively painless. Make TDX guests require paravirt and add some TDX-specific paravirt handlers which avoid HLT in the normal halt routines. Also add a warning in case another HLT sneaks in. There was a report that this leads to a "major performance improvement" on specjbb2015, probably because of the extra #VE overhead or missed wakeups from the buggy HLT handling" * tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Emit warning if IRQs are enabled during HLT #VE handling x86/tdx: Fix arch_safe_halt() execution for TDX VMs x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
2025-04-02Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-14/+21
Pull objtool fixes from Ingo Molnar: "These are objtool fixes and updates by Josh Poimboeuf, centered around the fallout from the new CONFIG_OBJTOOL_WERROR=y feature, which, despite its default-off nature, increased the profile/impact of objtool warnings: - Improve error handling and the presentation of warnings/errors - Revert the new summary warning line that some test-bot tools interpreted as new regressions - Fix a number of objtool warnings in various drivers, core kernel code and architecture code. About half of them are potential problems related to out-of-bounds accesses or potential undefined behavior, the other half are additional objtool annotations - Update objtool to latest (known) compiler quirks and objtool bugs triggered by compiler code generation - Misc fixes" * tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) objtool/loongarch: Add unwind hints in prepare_frametrace() rcu-tasks: Always inline rcu_irq_work_resched() context_tracking: Always inline ct_{nmi,irq}_{enter,exit}() sched/smt: Always inline sched_smt_active() objtool: Fix verbose disassembly if CROSS_COMPILE isn't set objtool: Change "warning:" to "error: " for fatal errors objtool: Always fail on fatal errors Revert "objtool: Increase per-function WARN_FUNC() rate limit" objtool: Append "()" to function name in "unexpected end of section" warning objtool: Ignore end-of-section jumps for KCOV/GCOV objtool: Silence more KCOV warnings, part 2 objtool, drm/vmwgfx: Don't ignore vmw_send_msg() for ORC objtool: Fix STACK_FRAME_NON_STANDARD for cold subfunctions objtool: Fix segfault in ignore_unreachable_insn() objtool: Fix NULL printf() '%s' argument in builtin-check.c:save_argv() objtool, lkdtm: Obfuscate the do_nothing() pointer objtool, regulator: rk808: Remove potential undefined behavior in rk806_set_mode_dcdc() objtool, ASoC: codecs: wcd934x: Remove potential undefined behavior in wcd934x_slim_irq_handler() objtool, Input: cyapa - Remove undefined behavior in cyapa_update_fw_store() objtool, panic: Disable SMAP in __stack_chk_fail() ...
2025-04-02mseal sysmap: enable x86-64Jeff Xu2-2/+4
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64, covering the vdso, vvar, vvar_vclock. Production release testing passes on Android and Chrome OS. Link: https://lkml.kernel.org/r/20250305021711.3867874-4-jeffxu@google.com Signed-off-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Berg <benjamin@sipsolutions.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Elliot Hughes <enh@google.com> Cc: Florian Faineli <f.fainelli@gmail.com> Cc: Greg Ungerer <gerg@kernel.org> Cc: Guenter Roeck <groeck@chromium.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Linus Waleij <linus.walleij@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <mike.rapoport@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Röttger <sroettger@google.com> Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-02x86: pgtable: convert to use tlb_remove_ptdesc()Qi Zheng1-4/+4
The x86 has already been converted to use struct ptdesc, so convert it to use tlb_remove_ptdesc() instead of tlb_remove_table(). Link: https://lkml.kernel.org/r/36ad56b7e06fa4b17fb23c4fc650e8e0d72bb3cd.1740454179.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickens <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
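The conversion shape, as a hedged sketch (helper names per this series; treat the details as assumptions):

    static inline void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
    {
            struct ptdesc *ptdesc = page_ptdesc(pte);

            pagetable_dtor(ptdesc);
            tlb_remove_ptdesc(tlb, ptdesc); /* previously: tlb_remove_table() */
    }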
2025-04-01x86/mm: Stop prefetching current->mm->mmap_lock on page faultsMateusz Guzik1-3/+0
The prefetchw() dates back decades and the fundamental notion of doing something like this on a lock is shady. Moreover, for a few years now, faults in the fast path are handled with RCU + per-vma locking, hopefully not even looking at the lock to begin with. As such, just remove it. I did not see a point in benchmarking this; the fact that the lock is not expected to be looked at by default justifies not doing the prefetch. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250401143520.1113572-1-mjguzik@gmail.com
2025-04-01x86/mm: Simplify the pgd_leaf() and p4d_leaf() checks a bitBaoquan He1-2/+2
The functions return bool; simplify the checks. [ mingo: Split off from two other patches. ] Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250331081327.256412-6-bhe@redhat.com
2025-04-01x86/mm: Remove the arch-specific p4d_leaf() definitionBaoquan He1-7/+0
P4D huge pages are not supported yet; let's use the generic definition in <linux/pgtable.h>. [ mingo: Cleaned up the changelog. ] Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Link: https://lore.kernel.org/r/20250331081327.256412-7-bhe@redhat.com
2025-04-01x86/mm: Remove the arch-specific pgd_leaf() definitionBaoquan He1-3/+0
PGD huge pages are not supported yet; let's use the generic definition in <linux/pgtable.h>. [ mingo: Cleaned up the changelog. ] Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Link: https://lore.kernel.org/r/20250331081327.256412-6-bhe@redhat.com