Age | Commit message (Collapse) | Author | Files | Lines |
|
Following the example of the crc32 and crc32c code, make the crypto
subsystem register both generic and architecture-optimized chacha20,
xchacha20, and xchacha12 skcipher algorithms, all implemented on top of
the appropriate library functions. This eliminates the need for every
architecture to implement the same skcipher glue code.
To register the architecture-optimized skciphers only when
architecture-optimized code is actually being used, add a function
chacha_is_arch_optimized() and make each arch implement it. Change each
architecture's ChaCha module_init function to arch_initcall so that the
CPU feature detection is guaranteed to run before
chacha_is_arch_optimized() gets called by crypto/chacha.c. In the case
of s390, remove the CPU feature based module autoloading, which is no
longer needed since the module just gets pulled in via function linkage.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Optimize the AVX-512 version of _compute_first_set_of_tweaks by using
vectorized shifts to compute the first vector of tweak blocks, and by
using byte-aligned shifts when multiplying by x^8.
AES-XTS performance on AMD Ryzen 9 9950X (Zen 5) improves by about 2%
for 4096-byte messages or 6% for 512-byte messages. AES-XTS performance
on Intel Sapphire Rapids improves by about 1% for 4096-byte messages or
3% for 512-byte messages. Code size decreases by 75 bytes which
outweighs the increase in rodata size of 16 bytes.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Current minimum required version of binutils is 2.25,
which supports AVX-512 instruction mnemonics.
Remove check for assembler support of AVX-512 instructions
and all relevant macros for conditional compilation.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Current minimum required version of binutils is 2.25,
which supports SHA-256 instruction mnemonics.
Remove check for assembler support of SHA-256 instructions
and all relevant macros for conditional compilation.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Current minimum required version of binutils is 2.25,
which supports SHA-1 instruction mnemonics.
Remove check for assembler support of SHA-1 instructions
and all relevant macros for conditional compilation.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Get rid of the fallback path as SIMD is now always usable in softirq
context.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c). The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs. Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU. This has now been fixed. In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.
This simplifies the code and improves performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Intel made a late change to the AVX10 specification that removes support
for a 256-bit maximum vector length and enumeration of the maximum
vector length. AVX10 will imply a maximum vector length of 512 bits.
I.e. there won't be any such thing as AVX10/256 or AVX10/512; there will
just be AVX10, and it will essentially just consolidate AVX512 features.
As a result of this new development, my strategy of providing both
*_avx10_256 and *_avx10_512 functions didn't turn out to be that useful.
The only remaining motivation for the 256-bit AVX512 / AVX10 functions
is to avoid downclocking on older Intel CPUs. But in the case of
AES-XTS and AES-CTR, I already wrote *_avx2 code too (primarily to
support CPUs without AVX512), which performs almost as well as
*_avx10_256. So we should just use that.
Therefore, remove the *_avx10_256 AES-XTS and AES-CTR functions and
algorithms, and rename the *_avx10_512 AES-XTS and AES-CTR functions and
algorithms to *_avx512. Make Ice Lake and Tiger Lake use *_avx2 instead
of *_avx10_256 which they previously used.
I've left AES-GCM unchanged for now. There is no VAES+AVX2 optimized
AES-GCM in the kernel yet, so the path forward for that is not as clear.
However, I did write a VAES+AVX2 optimized AES-GCM for BoringSSL. So
one option is to port that to the kernel and then do the same cleanup.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
startup/
Linus expressed a strong preference for arch-specific asm code (i.e.,
virtually all of it) to reside under arch/ rather than anywhere else.
So move the EFI mixed mode startup code back, and put it under
arch/x86/boot/startup/ where all shared x86 startup code is going to
live.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20250401133416.1436741-11-ardb+git@google.com
|
|
The 5-level paging trampoline is used by both the EFI stub and the
traditional decompressor. Move it out of the decompressor sources into
the newly minted arch/x86/boot/startup/ sub-directory which will hold
startup code that may be shared between the decompressor, the EFI stub
and the kernel proper, and needs to tolerate being called during early
boot, before the kernel virtual mapping has been created.
This will allow the 5-level paging trampoline to be used by EFI boot
images such as zboot that omit the traditional decompressor entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250401133416.1436741-10-ardb+git@google.com
|
|
Merge the local include "pgtable.h" -which declares the API of the
5-level paging trampoline- into <asm/boot.h> so that its implementation
in la57toggle.S as well as the calling code can be decoupled from the
traditional decompressor.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250401133416.1436741-9-ardb+git@google.com
|
|
LOAD_PHYSICAL_ADDR is calculated as an aligned (up) CONFIG_PHYSICAL_START
with the respective alignment value CONFIG_PHYSICAL_ALIGN. However,
the code is written openly while we have __ALIGN_KERNEL_MASK() macro
that does the same. This macro has nothing special, that's why
it may be used in assembler code or linker scripts (on the contrary
__ALIGN_KERNEL() may not). Do it so.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250404165303.3657139-1-andriy.shevchenko@linux.intel.com
|
|
Add some missing dependencies to the CPUID dependency table:
- All the AMX features depend on AMX_TILE
- All the SPEC_CTRL features depend on SPEC_CTRL
[ mingo: Keep the AMX part of the table grouped ... ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Link: https://lore.kernel.org/r/20240924170128.2611854-1-ak@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A set of final cleanups for the timer subsystem:
- Convert all del_timer[_sync]() instances over to the new
timer_delete[_sync]() API and remove the legacy wrappers.
Conversion was done with coccinelle plus some manual fixups as
coccinelle chokes on scoped_guard().
- The final cleanup of the hrtimer_init() to hrtimer_setup()
conversion.
This has been delayed to the end of the merge window, so that all
patches which have been merged through other trees are in mainline
and all new users are catched.
Doing this right before rc1 ensures that new code which is merged post
rc1 is not introducing new instances of the original functionality"
* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/timers: Rename the hrtimer_init event to hrtimer_setup
hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
hrtimers: Rename debug_init() to debug_setup()
hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
hrtimers: Make callback function pointer private
hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
hrtimers: Switch to use __htimer_setup()
hrtimers: Delete hrtimer_init()
treewide: Convert new and leftover hrtimer_init() users
treewide: Switch/rename to timer_delete[_sync]()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more irq updates from Thomas Gleixner:
"A set of updates for the interrupt subsystem:
- A treewide cleanup for the irq_domain code, which makes the naming
consistent and gets rid of the original oddity of naming domains
'host'.
This is a trivial mechanical change and is done late to ensure that
all instances have been catched and new code merged post rc1 wont
reintroduce new instances.
- A trivial consistency fix in the migration code
The recent introduction of irq_force_complete_move() in the core
code, causes a problem for the nostalgia crowd who maintains ia64
out of tree.
The code assumes that hierarchical interrupt domains are enabled
and dereferences irq_data::parent_data unconditionally. That works
in mainline because both architectures which enable that code have
hierarchical domains enabled. Though it breaks the ia64 build,
which enables the functionality, but does not have hierarchical
domains.
While it's not really a problem for mainline today, this
unconditional dereference is inconsistent and trivially fixable by
using the existing helper function irqd_get_parent_data(), which
has the appropriate #ifdeffery in place"
* tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
irqdomain: Stop using 'host' for domain
irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Improve performance in gendwarfksyms
- Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS
- Support CONFIG_HEADERS_INSTALL for ARCH=um
- Use more relative paths to sources files for better reproducibility
- Support the loong64 Debian architecture
- Add Kbuild bash completion
- Introduce intermediate vmlinux.unstripped for architectures that need
static relocations to be stripped from the final vmlinux
- Fix versioning in Debian packages for -rc releases
- Treat missing MODULE_DESCRIPTION() as an error
- Convert Nios2 Makefiles to use the generic rule for built-in DTB
- Add debuginfo support to the RPM package
* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
kbuild: rpm-pkg: build a debuginfo RPM
kconfig: merge_config: use an empty file as initfile
nios2: migrate to the generic rule for built-in DTB
rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
kbuild: pacman-pkg: hardcode module installation path
kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
modpost: require a MODULE_DESCRIPTION()
kbuild: make all file references relative to source root
x86: drop unnecessary prefix map configuration
kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
kbuild: Add a help message for "headers"
kbuild: deb-pkg: remove "version" variable in mkdebian
kbuild: deb-pkg: fix versioning for -rc releases
Documentation/kbuild: Fix indentation in modules.rst example
x86: Get rid of Makefile.postlink
kbuild: Create intermediate vmlinux build with relocations preserved
kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
kbuild: link-vmlinux.sh: Make output file name configurable
kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
Revert "kheaders: Ignore silly-rename files"
...
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Naming interrupt domains host is confusing at best and the irqdomain code
uses both domain and host inconsistently.
Therefore rename irq_set_default_host() to irq_set_default_domain().
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix a performance regression on AMD iGPU and dGPU drivers, related to
the unintended activation of DMA bounce buffers that regressed game
performance if KASLR disturbed things just enough
- Fix a copy_user_generic() performance regression on certain older
non-FSRM/ERMS CPUs
- Fix a Clang build warning due to a semantic merge conflict the Kunit
tree generated with the x86 tree
- Fix FRED related system hang during S4 resume
- Remove an unused API
* tag 'x86-urgent-2025-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fred: Fix system hang during S4 resume with FRED enabled
x86/platform/iosf_mbi: Remove unused iosf_mbi_unregister_pmic_bus_access_notifier()
x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers
x86/tools: Drop duplicate unlikely() definition in insn_decoder_test.c
x86/uaccess: Improve performance by aligning writes to 8 bytes in copy_user_generic(), on non-FSRM/ERMS CPUs
|
|
|
|
Use a separate subclass when acquiring KVM's per-CPU posted interrupts
wakeup lock in the scheduled out path, i.e. when adding a vCPU on the list
of vCPUs to wake, to workaround a false positive deadlock.
Chain exists of:
&p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
lock(&rq->__lock);
lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
lock(&p->pi_lock);
*** DEADLOCK ***
In the wakeup handler, the callchain is *always*:
sysvec_kvm_posted_intr_wakeup_ipi()
|
--> pi_wakeup_handler()
|
--> kvm_vcpu_wake_up()
|
--> try_to_wake_up(),
and the lock order is:
&per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock.
For the schedule out path, the callchain is always (for all intents and
purposes; if the kernel is preemptible, kvm_sched_out() can be called from
something other than schedule(), but the beginning of the callchain will
be the same point in vcpu_block()):
vcpu_block()
|
--> schedule()
|
--> kvm_sched_out()
|
--> vmx_vcpu_put()
|
--> vmx_vcpu_pi_put()
|
--> pi_enable_wakeup_handler()
and the lock order is:
&rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)
I.e. lockdep sees AB+BC ordering for schedule out, and CA ordering for
wakeup, and complains about the A=>C versus C=>A inversion. In practice,
deadlock can't occur between schedule out and the wakeup handler as they
are mutually exclusive. The entirely of the schedule out code that runs
with the problematic scheduler locks held, does so with IRQs disabled,
i.e. can't run concurrently with the wakeup handler.
Use a subclass instead disabling lockdep entirely, and tell lockdep that
both subclasses are being acquired when loading a vCPU, as the sched_out
and sched_in paths are NOT mutually exclusive, e.g.
CPU 0 CPU 1
--------------- ---------------
vCPU0 sched_out
vCPU1 sched_in
vCPU1 sched_out vCPU 0 sched_in
where vCPU0's sched_in may race with vCPU1's sched_out, on CPU 0's wakeup
list+lock.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-ID: <20250401154727.835231-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Assert that IRQs are already disabled when putting a vCPU on a CPU's PI
wakeup list, as opposed to saving/disabling+restoring IRQs. KVM relies on
IRQs being disabled until the vCPU task is fully scheduled out, i.e. until
the scheduler has dropped all of its per-CPU locks (e.g. for the runqueue),
as attempting to wake the task while it's being scheduled out could lead
to deadlock.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20250401154727.835231-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Explicitly zero/empty-initialize the unions used for PMU related CPUID
entries, instead of manually zeroing all fields (hopefully), or in the
case of 0x80000022, relying on the compiler to clobber the uninitialized
bitfields.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-ID: <20250315024102.2361628-1-seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Wrap the TDP MMU page counter in CONFIG_KVM_PROVE_MMU so that the sanity
check is omitted from production builds, and more importantly to remove
the atomic accesses to account pages. A one-off memory leak in production
is relatively uninteresting, and a WARN_ON won't help mitigate a systemic
issue; it's as much about helping triage memory leaks as it is about
detecting them in the first place, and doesn't magically stop the leaks.
I.e. production environments will be quite sad if a severe KVM bug escapes,
regardless of whether or not KVM WARNs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250315023448.2358456-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Acquire a lock on kvm->srcu when userspace is getting MP state to handle a
rather extreme edge case where "accepting" APIC events, i.e. processing
pending INIT or SIPI, can trigger accesses to guest memory. If the vCPU
is in L2 with INIT *and* a TRIPLE_FAULT request pending, then getting MP
state will trigger a nested VM-Exit by way of ->check_nested_events(), and
emuating the nested VM-Exit can access guest memory.
The splat was originally hit by syzkaller on a Google-internal kernel, and
reproduced on an upstream kernel by hacking the triple_fault_event_test
selftest to stuff a pending INIT, store an MSR on VM-Exit (to generate a
memory access on VMX), and do vcpu_mp_state_get() to trigger the scenario.
=============================
WARNING: suspicious RCU usage
6.14.0-rc3-b112d356288b-vmx/pi_lockdep_false_pos-lock #3 Not tainted
-----------------------------
include/linux/kvm_host.h:1058 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by triple_fault_ev/1256:
#0: ffff88810df5a330 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x8b/0x9a0 [kvm]
stack backtrace:
CPU: 11 UID: 1000 PID: 1256 Comm: triple_fault_ev Not tainted 6.14.0-rc3-b112d356288b-vmx #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0x90
lockdep_rcu_suspicious+0x144/0x190
kvm_vcpu_gfn_to_memslot+0x156/0x180 [kvm]
kvm_vcpu_read_guest+0x3e/0x90 [kvm]
read_and_check_msr_entry+0x2e/0x180 [kvm_intel]
__nested_vmx_vmexit+0x550/0xde0 [kvm_intel]
kvm_check_nested_events+0x1b/0x30 [kvm]
kvm_apic_accept_events+0x33/0x100 [kvm]
kvm_arch_vcpu_ioctl_get_mpstate+0x30/0x1d0 [kvm]
kvm_vcpu_ioctl+0x33e/0x9a0 [kvm]
__x64_sys_ioctl+0x8b/0xb0
do_syscall_64+0x6c/0x170
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250401150504.829812-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
Rapoport fixes a couple of issues with the just-merged "arch, mm:
reduce code duplication in mem_init()" series
- The series "MAINTAINERS: add my isub-entries to MM part." from Mike
Rapoport does some maintenance on MAINTAINERS
- The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
cleanup work to the page mapping code
- The series "mseal system mappings" from Jeff Xu permits sealing of
"system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode)
- Plus the usual shower of singleton patches
* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
mseal sysmap: add arch-support txt
mseal sysmap: enable s390
selftest: test system mappings are sealed
mseal sysmap: update mseal.rst
mseal sysmap: uprobe mapping
mseal sysmap: enable arm64
mseal sysmap: enable x86-64
mseal sysmap: generic vdso vvar mapping
selftests: x86: test_mremap_vdso: skip if vdso is msealed
mseal sysmap: kernel config and header change
mm: pgtable: remove tlb_remove_page_ptdesc()
x86: pgtable: convert to use tlb_remove_ptdesc()
riscv: pgtable: unconditionally use tlb_remove_ptdesc()
mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
mm: pgtable: make generic tlb_remove_table() use struct ptdesc
microblaze/mm: put mm_cmdline_setup() in .init.text section
mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
MAINTAINERS: mm: add entry for secretmem
MAINTAINERS: mm: add entry for numa memblocks and numa emulation
...
|
|
Current minimum required version of binutils is 2.25,
which supports MONITOR and MWAIT instruction mnemonics.
Replace the byte-wise specification of MONITOR and
MWAIT with these proper mnemonics.
No functional change intended.
Note: LLVM assembler is not able to assemble correct forms of MONITOR
and MWAIT instructions with explicit operands and reports:
error: invalid operand for instruction
monitor %rax,%ecx,%edx
^~~~
# https://lore.kernel.org/oe-kbuild-all/202504030802.2lEVBSpN-lkp@intel.com/
Use instruction mnemonics with implicit operands to
work around this issue.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250403125111.429805-1-ubizjak@gmail.com
|
|
All functions in mwait_idle_with_hints() cast eax and ecx arguments
to u32. Propagate argument type to the enclosing function.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250403073105.245987-1-ubizjak@gmail.com
|
|
Have it return the two things it does return:
- a new ASID and
- the need to flush the TLB or not,
in a struct which fits in a single 32-bit register and whack the IO
parameters.
Beyond being easier to read, this also helps the compiler generate
better, more compact code:
# arch/x86/mm/tlb.o:
text data bss dec hex filename
9341 753 516 10610 2972 tlb.o.before
9213 753 516 10482 28f2 tlb.o.after
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250403085623.20824-1-bp@kernel.org
|
|
There is not much point in CONFIG_AS_TPAUSE at all when the emitted
assembly is always the same - it only obfuscates the __tpause() code
in essence.
Remove the TPAUSE insn mnemonic from __tpause() and leave only
the equivalent byte-wise definition. This can then be changed
back to insn mnemonic once binutils 2.31.1 is the minimum version
to build the kernel. (Right now it's 2.25.)
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402180827.3762-4-ubizjak@gmail.com
|
|
Delimiters in asm() templates such as ';', '\t' or '\n' are not
required syntactically, they were used historically in the Linux
kernel to prettify the compiler's .s output for people who were
looking at compiler generated .s output.
Most x86 developers these days are primarily looking at:
1) objdump --disassemble-all .o
2) perf top's live kernel function annotation and disassembler
feature that uses /dev/mem.
... because:
- this kind of assembler output is standardized regardless of
compiler used,
- it's generally less messy looking,
- it gives ground-truth instead of being some intermediate layer
in the toolchain that might or might not be the real deal,
- and on a live kernel it also sees through the kernel's various
layers of runtime patching code obfuscation facilities, also
known as: alternative-instructions, tracepoints and jump labels.
There are some cases where the .s output is the most useful
tool, such as alternatives() code generation, but other than
that these delimiters used in simple asm() statements mostly
add noise to the source code side, which isn't desirable for
assembly code that is fragile enough already.
Remove the delimiters for <asm/mwait.h>, which also happens to
make the GCC inliner's asm() instruction length heuristics
more accurate...
[ mingo: Wrote a new changelog to give historic context and
to give people a chance to object. :-) ]
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402180827.3762-3-ubizjak@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dave Jiang:
- Add support for Global Persistent Flush (GPF)
- Cleanup of DPA partition metadata handling:
- Remove the CXL_DECODER_MIXED enum that's not needed anymore
- Introduce helpers to access resource and perf meta data
- Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
- Make cxl_dpa_alloc() DPA partition number agnostic
- Remove cxl_decoder_mode
- Cleanup partition size and perf helpers
- Remove unused CXL partition values
- Add logging support for CXL CPER endpoint and port protocol errors:
- Prefix protocol error struct and function names with cxl_
- Move protocol error definitions and structures to a common location
- Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h
- Add support in GHES to process CXL CPER protocol errors
- Process CXL CPER protocol errors
- Add trace logging for CXL PCIe port RAS errors
- Remove redundant gp_port init
- Add validation of cxl device serial number
- CXL ABI documentation updates/fixups
- A series that uses guard() to clean up open coded mutex lockings and
remove gotos for error handling.
- Some followup patches to support dirty shutdown accounting:
- Add helper to retrieve DVSEC offset for dirty shutdown registers
- Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown()
- Add support for dirty shutdown count via sysfs
- cxl_test support for dirty shutdown
- A series to support CXL mailbox Features commands.
Mostly in preparation for CXL EDAC code to utilize the Features
commands. It's also in preparation for CXL fwctl support to utilize
the CXL Features. The commands include "Get Supported Features", "Get
Feature", and "Set Feature".
- A series to support extended linear cache support described by the
ACPI HMAT table.
The addition helps enumerate the cache and also provides additional
RAS reporting support for configuration with extended linear cache.
(and related fixes for the series).
- An update to cxl_test to support a 3-way capable CFMWS
- A documentation fix to remove unused "mixed mode"
* tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits)
cxl/region: Fix the first aliased address miscalculation
cxl/region: Quiet some dev_warn()s in extended linear cache setup
cxl/Documentation: Remove 'mixed' from sysfs mode doc
cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems
cxl/test: Define a CFMWS capable of a 3 way HB interleave
cxl/mem: Do not return error if CONFIG_CXL_MCE unset
tools/testing/cxl: Set Shutdown State support
cxl/pmem: Export dirty shutdown count via sysfs
cxl/pmem: Rename cxl_dirty_shutdown_state()
cxl/pci: Introduce cxl_gpf_get_dvsec()
cxl/pci: Support Global Persistent Flush (GPF)
cxl: Document missing sysfs files
cxl: Plug typos in ABI doc
cxl/pmem: debug invalid serial number data
cxl/cdat: Remove redundant gp_port initialization
cxl/memdev: Remove unused partition values
cxl/region: Drop goto pattern of construct_region()
cxl/region: Drop goto pattern in cxl_dax_region_alloc()
cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc()
cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free()
...
|
|
instruction wrappers on 'u32'
MONITOR and MONITORX expect 32-bit unsigned integer arguments in the %ecx
and %edx registers. MWAIT and MWAITX expect 32-bit usigned int
argument in %eax and %ecx registers.
Some of the helpers around these instructions in <asm/mwait.h> are using
too wide types (long), standardize on u32 instead that makes it clear that
this is a hardware ABI.
[ mingo: Cleaned up the changelog. ]
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402180827.3762-1-ubizjak@gmail.com
|
|
mwait_idle_with_hints() and prefer_mwait_c1_over_halt()
The following commit, 12 years ago:
7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
added barriers around the CLFLUSH in mwait_idle_with_hints(), justified with:
... and add memory barriers around it since the documentation is explicit
that CLFLUSH is only ordered with respect to MFENCE.
This also triggered, 11 years ago, the same adjustment in:
f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
during development, although it failed to get the static_cpu_has_bug() treatment.
X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel CPUs,
and the SDM currently states:
Executions of the CLFLUSH instruction are ordered with respect to each
other and with respect to writes, locked read-modify-write instructions,
and fence instructions[1].
With footnote 1 reading:
Earlier versions of this manual specified that executions of the CLFLUSH
instruction were ordered only by the MFENCE instruction. All processors
implementing the CLFLUSH instruction also order it relative to the other
operations enumerated above.
i.e. The SDM was incorrect at the time, and barriers should not have been
inserted. Double checking the original AAI65 errata (not available from
intel.com any more) shows no mention of barriers either.
Note: If this were a general codepath, the MFENCEs would be needed, because
AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
Remove the unnecessary barriers. Furthermore, use a plain alternative(),
rather than static_cpu_has_bug() and/or no optimisation. The workaround
is a single instruction.
Use an explicit %rax pointer rather than a general memory operand, because
MONITOR takes the pointer implicitly in the same way.
[ mingo: Cleaned up the commit a bit. ]
Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250402172458.1378112-1-andrew.cooper3@citrix.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Johannes Berg:
- proper nofault accesses and read-only rodata
- hostfs fix for host inode number reuse
- fixes for host errno handling
- various cleanups/small fixes
* tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Rewrite the sigio workaround based on epoll and tgkill
um: Prohibit the VM_CLONE flag in run_helper_thread()
um: Switch to the pthread-based helper in sigio workaround
um: ubd: Switch to the pthread-based helper
um: Add pthread-based helper support
um: x86: clean up elf specific definitions
um: Store full CSGSFS and SS register from mcontext
um: virt-pci: Refactor virtio_pcidev into its own module
um: work around sched_yield not yielding in time-travel mode
um/locking: Remove semicolon from "lock" prefix
um: Update min_low_pfn to match changes in uml_reserved
um: use str_yes_no() to remove hardcoded "yes" and "no"
um: hostfs: avoid issues on inode number reuse by host
um: Allocate vdso page pointer statically
um: remove copy_from_kernel_nofault_allowed
um: mark rodata read-only and implement _nofault accesses
um: Pass the correct Rust target and options with gcc
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
"Avoid direct HLT instruction execution in TDX guests.
TDX guests aren't expected to use the HLT instruction directly. It
causes a virtualization exception (#VE). While the #VE _can_ be
handled, the current handling is slow and buggy and the easiest thing
is just to avoid HLT in the first place. Plus, the kernel already has
paravirt infrastructure that makes it relatively painless.
Make TDX guests require paravirt and add some TDX-specific paravirt
handlers which avoid HLT in the normal halt routines. Also add a
warning in case another HLT sneaks in.
There was a report that this leads to a "major performance
improvement" on specjbb2015, probably because of the extra #VE
overhead or missed wakeups from the buggy HLT handling"
* tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Emit warning if IRQs are enabled during HLT #VE handling
x86/tdx: Fix arch_safe_halt() execution for TDX VMs
x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"These are objtool fixes and updates by Josh Poimboeuf, centered around
the fallout from the new CONFIG_OBJTOOL_WERROR=y feature, which,
despite its default-off nature, increased the profile/impact of
objtool warnings:
- Improve error handling and the presentation of warnings/errors
- Revert the new summary warning line that some test-bot tools
interpreted as new regressions
- Fix a number of objtool warnings in various drivers, core kernel
code and architecture code. About half of them are potential
problems related to out-of-bounds accesses or potential undefined
behavior, the other half are additional objtool annotations
- Update objtool to latest (known) compiler quirks and objtool bugs
triggered by compiler code generation
- Misc fixes"
* tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
objtool/loongarch: Add unwind hints in prepare_frametrace()
rcu-tasks: Always inline rcu_irq_work_resched()
context_tracking: Always inline ct_{nmi,irq}_{enter,exit}()
sched/smt: Always inline sched_smt_active()
objtool: Fix verbose disassembly if CROSS_COMPILE isn't set
objtool: Change "warning:" to "error: " for fatal errors
objtool: Always fail on fatal errors
Revert "objtool: Increase per-function WARN_FUNC() rate limit"
objtool: Append "()" to function name in "unexpected end of section" warning
objtool: Ignore end-of-section jumps for KCOV/GCOV
objtool: Silence more KCOV warnings, part 2
objtool, drm/vmwgfx: Don't ignore vmw_send_msg() for ORC
objtool: Fix STACK_FRAME_NON_STANDARD for cold subfunctions
objtool: Fix segfault in ignore_unreachable_insn()
objtool: Fix NULL printf() '%s' argument in builtin-check.c:save_argv()
objtool, lkdtm: Obfuscate the do_nothing() pointer
objtool, regulator: rk808: Remove potential undefined behavior in rk806_set_mode_dcdc()
objtool, ASoC: codecs: wcd934x: Remove potential undefined behavior in wcd934x_slim_irq_handler()
objtool, Input: cyapa - Remove undefined behavior in cyapa_update_fw_store()
objtool, panic: Disable SMAP in __stack_chk_fail()
...
|
|
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64, covering the
vdso, vvar, vvar_vclock.
Production release testing passes on Android and Chrome OS.
Link: https://lkml.kernel.org/r/20250305021711.3867874-4-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The x86 has already been converted to use struct ptdesc, so convert it to
use tlb_remove_ptdesc() instead of tlb_remove_table().
Link: https://lkml.kernel.org/r/36ad56b7e06fa4b17fb23c4fc650e8e0d72bb3cd.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The prefetchw() dates back decades and the fundamental notion of doing
something like this on a lock is shady.
Moreover, for a few years now in the fast path faults are handled with RCU
+ per-vma locking, hopefully not even looking at the lock to begin with.
As such just remove it.
I did not see a point benchmarking this. Given that it is not expected
to be looked at by default justifies not doing the prefetch.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250401143520.1113572-1-mjguzik@gmail.com
|
|
The functions return bool, simplify the checks.
[ mingo: Split off from two other patches. ]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250331081327.256412-6-bhe@redhat.com
|
|
P4D huge pages are not supported yet, let's use the generic definition
in <linux/pgtable.h>.
[ mingo: Cleaned up the changelog. ]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Link: https://lore.kernel.org/r/20250331081327.256412-7-bhe@redhat.com
|
|
PGD huge pages are not supported yet, let's use the generic definition
in <linux/pgtable.h>.
[ mingo: Cleaned up the changelog. ]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Link: https://lore.kernel.org/r/20250331081327.256412-6-bhe@redhat.com
|