path: root/arch/x86
2025-02-22  crypto: x86/ghash - Use proper helpers to clone request (Herbert Xu; 1 file, -6/+17)
Rather than copying a request by hand with memcpy, use the correct API helpers to set up the new request. This will matter once the API helpers start setting up chained requests, as a simple memcpy will break chaining. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
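As a rough illustration of the helper-based setup described above (a sketch only; ghash_clone_request() is a hypothetical name, not the function from the patch):

  #include <crypto/hash.h>

  /* Sketch: populate a new request via the API helpers instead of memcpy(). */
  static void ghash_clone_request(struct ahash_request *dst,
                                  struct ahash_request *src)
  {
          ahash_request_set_tfm(dst, crypto_ahash_reqtfm(src));
          ahash_request_set_callback(dst, ahash_request_flags(src),
                                     src->base.complete, src->base.data);
          ahash_request_set_crypt(dst, src->src, src->result, src->nbytes);
  }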
2025-02-22  crypto: lib/Kconfig - Fix lib built-in failure when arch is modular (Herbert Xu; 1 file, -3/+3)
The HAVE_ARCH Kconfig options in lib/crypto try to solve the modular versus built-in problem, but this still fails when the LIB option (e.g., CRYPTO_LIB_CURVE25519) is selected externally. Fix this by introducing a level of indirection with ARCH_MAY_HAVE Kconfig options; these then go on to select the ARCH_HAVE options if the ARCH Kconfig option matches that of the LIB option. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501230223.ikroNDr1-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22  crypto: x86/aes-xts - change license to Apache-2.0 OR BSD-2-Clause (Eric Biggers; 1 file, -8/+47)
As with the other AES modes I've implemented, I've received interest in my AES-XTS assembly code being reused in other projects. Therefore, change the license to Apache-2.0 OR BSD-2-Clause like what I used for AES-GCM. Apache-2.0 is the license of OpenSSL and BoringSSL. Note that it is difficult to *directly* share code between the kernel, OpenSSL, and BoringSSL for various reasons such as perlasm vs. plain asm, Windows ABI support, different divisions of responsibility between C and asm in each project, etc. So whether that will happen instead of just doing ports is still TBD. But this dual license should at least make it possible to port changes between the projects. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22  crypto: x86/aes-ctr - rewrite AESNI+AVX optimized CTR and add VAES support (Eric Biggers; 4 files, -792/+803)
Delete aes_ctrby8_avx-x86_64.S and add a new assembly file aes-ctr-avx-x86_64.S which follows a similar approach to aes-xts-avx-x86_64.S in that it uses a "template" to provide AESNI+AVX, VAES+AVX2, VAES+AVX10/256, and VAES+AVX10/512 code, instead of just AESNI+AVX. Wire it up to the crypto API accordingly.

This greatly improves the performance of AES-CTR and AES-XCTR on VAES-capable CPUs, with the best case being AMD Zen 5 where an over 230% increase in throughput is seen on long messages. Performance on non-VAES-capable CPUs remains about the same, and the non-AVX AES-CTR code (aesni_ctr_enc) is also kept as-is for now. There are some slight regressions (less than 10%) on some short message lengths on some CPUs; these are difficult to avoid, given how the previous code was so heavily unrolled by message length, and they are not particularly important. Detailed performance results are given in the tables below.

Both CTR and XCTR support is retained. The main loop remains 8-vector-wide, which differs from the 4-vector-wide main loops that are used in the XTS and GCM code. A wider loop is appropriate for CTR and XCTR since they have fewer other instructions (such as vpclmulqdq) to interleave with the AES instructions.

Similar to what was the case for AES-GCM, the new assembly code also has a much smaller binary size, as it fixes the excessive unrolling by data length and key length present in the old code. Specifically, the new assembly file compiles to about 9 KB of text vs. 28 KB for the old file. This is despite 4x as many implementations being included.

The tables below show the detailed performance results. The tables show percentage improvement in single-threaded throughput for repeated encryption of the given message length; an increase from 6000 MB/s to 12000 MB/s would be listed as 100%. They were collected by directly measuring the Linux crypto API performance using a custom kernel module. The tested CPUs were all server processors from Google Compute Engine except for Zen 5 which was a Ryzen 9 9950X desktop processor.

Table 1: AES-256-CTR throughput improvement, CPU microarchitecture vs. message length in bytes:

                     | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
---------------------+-------+-------+-------+-------+-------+-------+
AMD Zen 5            |  232% |  203% |  212% |  143% |   71% |   95% |
Intel Emerald Rapids |  116% |  116% |  117% |   91% |   78% |   79% |
Intel Ice Lake       |  109% |  103% |  107% |   81% |   54% |   56% |
AMD Zen 4            |  109% |   91% |  100% |   70% |   43% |   59% |
AMD Zen 3            |   92% |   78% |   87% |   57% |   32% |   43% |
AMD Zen 2            |    9% |    8% |   14% |   12% |    8% |   21% |
Intel Skylake        |    7% |    7% |    8% |    5% |    3% |    8% |

                     |   300 |   200 |    64 |    63 |    16 |
---------------------+-------+-------+-------+-------+-------+
AMD Zen 5            |   57% |   39% |   -9% |    7% |   -7% |
Intel Emerald Rapids |   37% |   42% |   -0% |   13% |   -8% |
Intel Ice Lake       |   39% |   30% |   -1% |   14% |   -9% |
AMD Zen 4            |   42% |   38% |   -0% |   18% |   -3% |
AMD Zen 3            |   38% |   35% |    6% |   31% |    5% |
AMD Zen 2            |   24% |   23% |    5% |   30% |    3% |
Intel Skylake        |    9% |    1% |   -4% |   10% |   -7% |

Table 2: AES-256-XCTR throughput improvement, CPU microarchitecture vs. message length in bytes:

                     | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
---------------------+-------+-------+-------+-------+-------+-------+
AMD Zen 5            |  240% |  201% |  216% |  151% |   75% |  108% |
Intel Emerald Rapids |  100% |   99% |  102% |   91% |   94% |  104% |
Intel Ice Lake       |   93% |   89% |   92% |   74% |   50% |   64% |
AMD Zen 4            |   86% |   75% |   83% |   60% |   41% |   52% |
AMD Zen 3            |   73% |   63% |   69% |   45% |   21% |   33% |
AMD Zen 2            |   -2% |   -2% |    2% |    3% |   -1% |   11% |
Intel Skylake        |   -1% |   -1% |    1% |    2% |   -1% |    9% |

                     |   300 |   200 |    64 |    63 |    16 |
---------------------+-------+-------+-------+-------+-------+
AMD Zen 5            |   78% |   56% |   -4% |   38% |   -2% |
Intel Emerald Rapids |   61% |   55% |    4% |   32% |   -5% |
Intel Ice Lake       |   57% |   42% |    3% |   44% |   -4% |
AMD Zen 4            |   35% |   28% |   -1% |   17% |   -3% |
AMD Zen 3            |   26% |   23% |   -3% |   11% |   -6% |
AMD Zen 2            |   13% |   24% |   -1% |   14% |   -3% |
Intel Skylake        |   16% |    8% |   -4% |   35% |   -3% |

Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22  hyperv: Change hv_root_partition into a function (Nuno Das Neves; 2 files, -27/+7)
Introduce hv_curr_partition_type to store the partition type as an enum. Right now this is limited to guest or root partition, but there will be other kinds in future and the enum is easily extensible. Set up hv_curr_partition_type early in Hyper-V initialization with hv_identify_partition_type(). hv_root_partition() just queries this value, and shouldn't be called before that. Making this check into a function sets the stage for adding a config option to gate the compilation of root partition code. In particular, hv_root_partition() can be stubbed out to always be false if root partition support isn't desired. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com>
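A minimal sketch of that pattern, assuming hypothetical enum value names and a hypothetical config gate (only hv_curr_partition_type, hv_identify_partition_type() and hv_root_partition() come from the changelog):

  enum hv_partition_type {
          HV_PARTITION_TYPE_GUEST,
          HV_PARTITION_TYPE_ROOT,
  };

  extern enum hv_partition_type hv_curr_partition_type;

  #ifdef CONFIG_HYPERV_ROOT_EXAMPLE      /* hypothetical gate, for illustration */
  static inline bool hv_root_partition(void)
  {
          return hv_curr_partition_type == HV_PARTITION_TYPE_ROOT;
  }
  #else
  /* Root-partition support compiled out: the check is stubbed to false. */
  static inline bool hv_root_partition(void) { return false; }
  #endif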
2025-02-22  x86/arch_prctl/64: Clean up ARCH_MAP_VDSO_32 (Brian Gerst; 1 file, -1/+1)
process_64.c is not built on native 32-bit, so CONFIG_X86_32 will never be set. No change in functionality intended. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250202202323.422113-3-brgerst@gmail.com
2025-02-22  x86/arch_prctl: Simplify sys_arch_prctl() (Brian Gerst; 5 files, -27/+6)
Use in_ia32_syscall() instead of a compat syscall entry. No change in functionality intended. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250202202323.422113-2-brgerst@gmail.com
2025-02-21  x86/efi/mixed: Move mixed mode startup code into libstub (Ard Biesheuvel; 2 files, -254/+0)
The EFI mixed mode code has been decoupled from the legacy decompressor, in order to be able to reuse it with generic EFI zboot images for x86. Move the source file into the libstub source directory to facilitate this. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21  x86/efi/mixed: Simplify and document thunking logic (Ard Biesheuvel; 1 file, -40/+37)
Now that the GDT/IDT and data segment selector preserve/restore logic has been removed from the boot-time EFI mixed mode thunking routines, the remaining logic to handle the function arguments can be simplified: the setup of the arguments on the stack can be moved into the 32-bit callee, which is able to use a more idiomatic sequence of PUSH instructions. This, in turn, allows the far call and far return to be issued using plain LCALL and LRET instructions, removing the need to set up the return explicitly. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21  x86/efi/mixed: Remove dependency on legacy startup_32 code (Ard Biesheuvel; 2 files, -169/+65)
The EFI mixed mode startup code calls into startup_32 in the legacy decompressor with a mocked up boot_params struct, only to get it to set up the 1:1 mapping of the lower 4 GiB of memory and switch to a GDT that supports 64-bit mode. In order to be able to reuse the EFI mixed mode startup code in EFI zboot images, which do not incorporate the legacy decompressor code, decouple it, by dealing with the GDT and IDT directly. Doing so makes it possible to construct a GDT that is compatible with the one the firmware uses, with one additional entry for a 64-bit mode code segment appended. This removes the need entirely to switch between GDTs and IDTs or data segment selector values and all of this code can be removed. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21  x86/efi/mixed: Set up 1:1 mapping of lower 4GiB in the stub (Ard Biesheuvel; 1 file, -0/+29)
In preparation for dropping the dependency on startup_32 entirely in the next patch, add the code that sets up the 1:1 mapping of the lower 4 GiB of system RAM to the mixed mode stub. The reload of CR3 after the long mode switch will be removed in a subsequent patch, when it is no longer needed. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21  x86/apic: Use str_disabled_enabled() helper in print_ipi_mode() (Thorsten Blum; 1 file, -1/+2)
Remove hard-coded strings by using the str_disabled_enabled() helper. No change in functionality intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250209210333.5666-2-thorsten.blum@linux.dev
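For reference, a hedged sketch of such a call site (the wrapper below is hypothetical; the helper comes from <linux/string_choices.h> and, following its naming convention, returns "disabled" when its argument is true):

  #include <linux/printk.h>
  #include <linux/string_choices.h>

  static void report_ipi_shorthand(bool shorthand_off)
  {
          /* Replaces an open-coded "disabled"/"enabled" ternary. */
          pr_info("IPI shorthand: %s\n", str_disabled_enabled(shorthand_off));
  }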
2025-02-21  x86/efi/mixed: Factor out and clean up long mode entry (Ard Biesheuvel; 1 file, -14/+15)
Entering long mode involves setting the EFER_LME and CR4.PAE bits before enabling paging by setting CR0.PG bit. It also involves disabling interrupts, given that the firmware's 32-bit IDT becomes invalid as soon as the CPU transitions into long mode. Reloading the CR3 register is not necessary at boot time, given that the EFI firmware as well as the kernel's EFI stub use a 1:1 mapping of the 32-bit addressable memory in the system. Break out this code into a separate helper for clarity, and so that it can be reused in a subsequent patch. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21  x86/efi/mixed: Check CPU compatibility without relying on verify_cpu() (Ard Biesheuvel; 1 file, -13/+9)
In order for the EFI mixed mode startup code to be reusable in a context where the legacy decompressor is not used, replace the call to verify_cpu() [which performs an elaborate set of checks] with a simple check against the 'long mode' bit in the appropriate CPUID leaf. This is reasonable, given that EFI support is implied when booting in this manner, and so there is no need to consider very old CPUs when performing this check. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
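For context, the 'long mode' flag is bit 29 of EDX in CPUID leaf 0x80000001. A freestanding C sketch of the check (using GCC's <cpuid.h>, not the actual stub assembly):

  #include <cpuid.h>
  #include <stdbool.h>

  static bool cpu_supports_long_mode(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* Extended leaf 0x80000001: EDX bit 29 is the LM (long mode) bit. */
          if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
                  return false;
          return edx & (1u << 29);
  }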
2025-02-21  x86/efistub: Merge PE and handover entrypoints (Ard Biesheuvel; 1 file, -15/+1)
The difference between the PE and handover entrypoints in the EFI stub is that the former allocates a struct boot_params whereas the latter expects one from the caller. Currently, these are two completely separate entrypoints, duplicating some logic and both relying on efi_exit() to return straight back to the firmware on an error. Simplify this by making the PE entrypoint call the handover entrypoint with NULL as the argument for the struct boot_params parameter. This makes the code easier to follow, and removes the need to support two different calling conventions in the mixed mode asm code. While at it, move the assignment of boot_params_ptr into the function that actually calls into the legacy decompressor, which is where its value is required. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21  x86/platform/olpc-xo1-sci: Don't include <linux/pm_wakeup.h> directly (Wolfram Sang; 1 file, -1/+0)
The header clearly states that it does not want to be included directly, only via <linux/(platform_)?device.h>. Which is already present, so delete the superfluous include. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250210113453.51825-2-wsa+renesas@sang-engineering.com
2025-02-21  x86/pat: Fix W=1 build warning when the within_inclusive() function is unused (Andy Shevchenko; 1 file, -2/+2)
The within_inclusive() function, in some cases, when CONFIG_X86_64=n, may not be used. This, in particular, prevents kernel builds with Clang, `make W=1` and CONFIG_WERROR=y:

  arch/x86/mm/pat/set_memory.c:215:1: error: unused function 'within_inclusive' [-Werror,-Wunused-function]

Fix this by guarding the definitions with the respective ifdeffery. See also: 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250211145721.1620552-1-andriy.shevchenko@linux.intel.com
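The fix follows the usual pattern of guarding the helper with the same ifdeffery as its only users; roughly (a sketch, not the literal diff):

  #ifdef CONFIG_X86_64
  /* Only referenced from 64-bit-only code; guard it so Clang's W=1
   * -Wunused-function check does not trip on 32-bit builds. */
  static inline bool within_inclusive(unsigned long addr,
                                      unsigned long start, unsigned long end)
  {
          return addr >= start && addr <= end;
  }
  #endif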
2025-02-21  x86/mm: Remove pv_ops.mmu.tlb_remove_table call (Rik van Riel; 6 files, -11/+0)
Every pv_ops.mmu.tlb_remove_table call ends up calling tlb_remove_table. Get rid of the indirection by simply calling tlb_remove_table directly, and not going through the paravirt function pointers. Suggested-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Manali Shukla <Manali.Shukla@amd.com> Tested-by: Brendan Jackman <jackmanb@google.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250213161423.449435-3-riel@surriel.com
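Illustratively (the wrapper below is hypothetical and stands in for the real page-table freeing callers):

  #include <asm/tlb.h>

  static void free_page_table(struct mmu_gather *tlb, void *table)
  {
          /*
           * Previously this went through pv_ops.mmu.tlb_remove_table();
           * every implementation ended up here anyway, so call the generic
           * helper directly and drop the paravirt indirection.
           */
          tlb_remove_table(tlb, table);
  }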
2025-02-21  x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional (Rik van Riel; 3 files, -40/+6)
Currently x86 uses CONFIG_MMU_GATHER_RCU_TABLE_FREE when using paravirt, and not when running on bare metal. There is no good reason to do things differently for each setup. Make them all the same. Currently get_user_pages_fast() synchronizes against page table freeing in two different ways:

 - on bare metal, by blocking IRQs, which block TLB flush IPIs
 - on paravirt, with MMU_GATHER_RCU_TABLE_FREE

This is done because some paravirt TLB flush implementations handle the TLB flush in the hypervisor, and will do the flush even when the target CPU has interrupts disabled. Always handle page table freeing with MMU_GATHER_RCU_TABLE_FREE. Using RCU synchronization between page table freeing and get_user_pages_fast() allows bare metal to also do TLB flushing while interrupts are disabled. Various places in the mm do still block IRQs or disable preemption as an implicit way to block RCU frees. That makes it safe to use INVLPGB on AMD CPUs. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Manali Shukla <Manali.Shukla@amd.com> Tested-by: Brendan Jackman <jackmanb@google.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250213161423.449435-2-riel@surriel.com
2025-02-21  x86/e820: Drop obsolete E820_TYPE_RESERVED_KERN and related code (Mike Rapoport (Microsoft); 6 files, -83/+4)
E820_TYPE_RESERVED_KERN is a relic from ancient history that was used to reserve setup_data early, see: 28bb22379513 ("x86: move reserve_setup_data to setup.c") Nowadays setup_data is reserved in memblock anyway and there is no point in carrying E820_TYPE_RESERVED_KERN, which behaves exactly like E820_TYPE_RAM but only complicates the code. A bonus for removing E820_TYPE_RESERVED_KERN is a small but measurable speedup of 20 microseconds in init_mem_mapping() on a VM with 32GB of RAM. Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20250214090651.3331663-5-rppt@kernel.org
2025-02-21  x86/boot: Split parsing of boot_params into the parse_boot_params() helper function (Mike Rapoport (Microsoft); 1 file, -31/+41)
Makes setup_arch() a bit easier to comprehend. No functional changes. Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250214090651.3331663-4-rppt@kernel.org
2025-02-21  x86/boot: Split kernel resources setup into the setup_kernel_resources() helper function (Mike Rapoport (Microsoft); 1 file, -14/+22)
Makes setup_arch() a bit easier to comprehend. No functional changes. Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250214090651.3331663-3-rppt@kernel.org
2025-02-21  x86/boot: Move setting of memblock parameters to e820__memblock_setup() (Mike Rapoport (Microsoft); 2 files, -25/+30)
Changing memblock parameters, namely bottom_up and allocation upper limit does not have any effect before memblock initialization in e820__memblock_setup(). Move the calls to memblock_set_bottom_up() and memblock_set_current_limit() to e820__memblock_setup() to group all the memblock initial setup and make setup_arch() more readable. No functional changes. Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250214090651.3331663-2-rppt@kernel.org
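A rough sketch of the resulting grouping (hedged; the real e820__memblock_setup() also registers the E820 RAM ranges, and the exact conditions and limit value are assumptions here):

  #include <linux/memblock.h>

  void __init e820__memblock_setup(void)
  {
          /* Honour "movable_node": allocate from low addresses upwards. */
          if (movable_node_is_enabled())
                  memblock_set_bottom_up(true);

          /* Keep very early allocations below the legacy ISA region. */
          memblock_set_current_limit(ISA_END_ADDRESS);

          /* ... memblock_add() calls for the E820 map follow here ... */
  }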
2025-02-21  x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations (Uros Bizjak; 2 files, -54/+55)
According to: https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html the usage of asm pseudo directives in the asm template can confuse the compiler into wrongly estimating the size of the generated code. The ALTERNATIVE macro expands to several asm pseudo directives, so its usage in {,try_}cmpxchg{64,128} causes the instruction length estimate to fail by an order of magnitude (a specially instrumented compiler reports the estimated length of these asm templates to be more than 20 instructions long). This incorrect estimate further causes suboptimal inlining decisions, suboptimal instruction scheduling and suboptimal code block alignments for functions that use these locking primitives.

Use asm_inline instead: https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html which is a feature that makes GCC pretend some inline assembler code is tiny (while it would think it is huge), instead of just asm. For code size estimation, the size of the asm is then taken as the minimum size of one instruction, ignoring how many instructions the compiler thinks it is.

The effect of this patch on the x86_64 target is minor, since 128-bit functions are rarely used on this target. The code size of the resulting defconfig object file stays the same:

  text      data     bss     dec       hex      filename
  27456612  4638523  814148  32909283  1f627e3  vmlinux-old.o
  27456612  4638523  814148  32909283  1f627e3  vmlinux-new.o

but the patch has a minor effect on code layout due to the different scheduling decisions in functions containing the changed macros.

There is no effect on the x86_32 target, where the code size of the resulting defconfig object file and the code layout stay the same:

  text      data     bss      dec       hex      filename
  18883870  2679275  1707916  23271061  1631695  vmlinux-old.o
  18883870  2679275  1707916  23271061  1631695  vmlinux-new.o

Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250214150929.5780-2-ubizjak@gmail.com
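For context, asm_inline only changes the compiler's size estimate of the statement, not the emitted code. A generic stand-in example (not one of the patched cmpxchg macros):

  #include <linux/compiler.h>
  #include <asm/alternative.h>
  #include <asm/cpufeatures.h>

  static inline void flush_line(void *addr)
  {
          /*
           * ALTERNATIVE() expands to several asm pseudo-directives, so a
           * plain asm() makes GCC over-estimate this statement's size.
           * asm_inline pins the estimate to the minimum of one instruction.
           */
          asm_inline volatile(ALTERNATIVE("clflush %0", "clflushopt %0",
                                          X86_FEATURE_CLFLUSHOPT)
                              : : "m" (*(volatile char *)addr) : "memory");
  }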
2025-02-21  x86/locking: Use ALT_OUTPUT_SP() for percpu_{,try_}cmpxchg{64,128}_op() (Uros Bizjak; 1 file, -14/+14)
percpu_{,try_}cmpxchg{64,128}() macros use CALL instruction inside asm statement in one of their alternatives. Use ALT_OUTPUT_SP() macro to add required dependence on %esp register. ALT_OUTPUT_SP() implements the above dependence by adding ASM_CALL_CONSTRAINT to its arguments. This constraint should be used for any inline asm which has a CALL instruction, otherwise the compiler may schedule the asm before the frame pointer gets set up by the containing function, causing objtool to print a "call without frame pointer save/setup" warning. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250214150929.5780-1-ubizjak@gmail.com
2025-02-21  x86/mm: Replace open-coded gap bounding with clamp() (Qasim Ijaz; 1 file, -8/+1)
Rather than manually bounding gap between gap_min and gap_max, use the well-known clamp() macro to make the code easier to read. Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250215125249.10729-1-qasdev00@gmail.com
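The pattern, as a self-contained sketch (the real code computes the mmap gap in arch/x86/mm/mmap.c):

  #include <linux/minmax.h>

  static unsigned long bound_gap(unsigned long gap,
                                 unsigned long gap_min, unsigned long gap_max)
  {
          /*
           * Equivalent to the open-coded form:
           *     if (gap < gap_min) gap = gap_min;
           *     else if (gap > gap_max) gap = gap_max;
           */
          return clamp(gap, gap_min, gap_max);
  }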
2025-02-21  x86/tsc: Always save/restore TSC sched_clock() on suspend/resume (Guilherme G. Piccoli; 1 file, -2/+2)
TSC could be reset in deep ACPI sleep states, even with invariant TSC. That's the reason we have sched_clock() save/restore functions, to deal with this situation. But what happens is that such functions are guarded with a check for the stability of sched_clock - if not considered stable, the save/restore routines aren't executed. On top of that, we have a clear comment in native_sched_clock() saying that *even* with TSC unstable, we continue using TSC for sched_clock due to its speed. In other words, if we have a situation where TSC gets detected as unstable, it marks the sched_clock as unstable as well, so subsequent S3 sleep cycles could bring bogus sched_clock values due to the lack of the save/restore mechanism, causing warnings like this:

  [22.954918] ------------[ cut here ]------------
  [22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
  [22.954923] If you just came from a suspend/resume,
  [22.954923] please switch to the trace global clock:
  [22.954923]   echo global > /sys/kernel/tracing/trace_clock
  [22.954923] or add trace_clock=global to the kernel command line
  [22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0

Notice that the above was reproduced even with "trace_clock=global". The fix for that is to _always_ save/restore the sched_clock on the suspend cycle _if TSC is used_ as sched_clock - only if we fall back to jiffies does the sched_clock_stable() check become relevant for the save/restore of the sched_clock. Debugged-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250215210314.351480-1-gpiccoli@igalia.com
2025-02-21  ACPI/processor_idle: Export acpi_processor_ffh_play_dead() (Artem Bityutskiy; 1 file, -0/+1)
The kernel test robot reported the following build error: >> ERROR: modpost: "acpi_processor_ffh_play_dead" [drivers/acpi/processor.ko] undefined! Caused by this recently merged commit: 541ddf31e300 ("ACPI/processor_idle: Add FFH state handling") The build failure is due to an oversight in the 'CONFIG_ACPI_PROCESSOR=m' case, the function export is missing. Add it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502151207.FA9UO1iX-lkp@intel.com/ Fixes: 541ddf31e300 ("ACPI/processor_idle: Add FFH state handling") Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/de5bf4f116779efde315782a15146fdc77a4a044.camel@linux.intel.com
2025-02-21  x86/mm: Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov; 2 files, -0/+11)
Currently memremap(MEMREMAP_WB) can produce a decrypted/shared mapping:

  memremap(MEMREMAP_WB)
    arch_memremap_wb()
      ioremap_cache()
        __ioremap_caller(.encrypted = false)

In such cases, the IORES_MAP_ENCRYPTED flag on the memory will determine if the resulting mapping is encrypted or decrypted. Creating a decrypted mapping without an explicit request from the caller is risky:

 - It can inadvertently expose the guest's data and compromise the guest.

 - Accessing private memory via a shared/decrypted mapping on TDX will either trigger implicit conversion to shared or #VE (depending on the VMM implementation). Implicit conversion is destructive: subsequent access to the same memory via a private mapping will trigger a hard-to-debug #VE crash.

The kernel already provides a way to request a decrypted mapping explicitly via the MEMREMAP_DEC flag. Modify memremap(MEMREMAP_WB) to produce an encrypted/private mapping by default unless MEMREMAP_DEC is specified or the kernel runs on a machine with SME enabled. This fixes the crash due to #VE on kexec in TDX guests if CONFIG_EISA is enabled. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20250217163822.343400-3-kirill.shutemov@linux.intel.com
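From the caller's point of view the new default looks roughly like this (phys_addr/size are placeholders):

  #include <linux/io.h>

  static void *map_ram_example(phys_addr_t phys_addr, size_t size)
  {
          /* Plain write-back mapping: now encrypted/private by default. */
          void *va = memremap(phys_addr, size, MEMREMAP_WB);

          /*
           * Callers that really need a shared/decrypted mapping must ask
           * for it explicitly:
           *     memremap(phys_addr, size, MEMREMAP_WB | MEMREMAP_DEC);
           */
          return va;
  }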
2025-02-21  Merge tag 'v6.14-rc3' into x86/mm, to pick up fixes before merging new changes (Ingo Molnar; 26 files, -111/+241)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21  Merge branch 'perf/urgent' into perf/core, to pick up fixes before merging new patches (Ingo Molnar; 26 files, -111/+241)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21  x86/fpu: Fix guest FPU state buffer allocation size (Stanislav Spassov; 1 file, -1/+1)
Ongoing work on an optimization to batch-preallocate vCPU state buffers for KVM revealed a mismatch between the allocation sizes used in fpu_alloc_guest_fpstate() and fpstate_realloc(). While the former allocates a buffer sized to fit the default set of XSAVE features in UABI form (as per fpu_user_cfg), the latter uses its ksize argument derived (for the requested set of features) in the same way as the sizes found in fpu_kernel_cfg, i.e. using the compacted in-kernel representation. The correct size to use for guest FPU state should indeed be the kernel one as seen in fpstate_realloc(). The original issue likely went unnoticed through a combination of UABI size typically being larger than or equal to kernel size, and/or both amounting to the same number of allocated 4K pages. Fixes: 69f6ed1d14c6 ("x86/fpu: Provide infrastructure for KVM FPU cleanup") Signed-off-by: Stanislav Spassov <stanspas@amazon.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250218141045.85201-1-stanspas@amazon.de
2025-02-21  x86/module: Remove unnecessary check in module_finalize() (Dan Carpenter; 1 file, -4/+2)
The "calls" pointer can no longer be NULL after the following commit: ab9fea59487d ("x86/alternative: Simplify callthunk patching") Delete this unnecessary check. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/fcbb2f57-0714-4139-b441-8817365c16a1@stanley.mountain
2025-02-21  x86/cpufeatures: Make AVX-VNNI depend on AVX (Eric Biggers; 1 file, -0/+1)
The 'noxsave' boot option disables support for AVX, but support for the AVX-VNNI feature was still declared on CPUs that support it. Fix this. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20250220060124.89622-1-ebiggers@kernel.org
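Feature dependencies like this are expressed in the CPUID dependency table; an abridged, illustrative excerpt:

  /* arch/x86/kernel/cpu/cpuid-deps.c (illustrative excerpt) */
  static const struct cpuid_dep cpuid_deps[] = {
          { X86_FEATURE_AVX,      X86_FEATURE_XSAVE },
          /* New: clearing AVX (e.g. via "noxsave") now also clears AVX-VNNI. */
          { X86_FEATURE_AVX_VNNI, X86_FEATURE_AVX   },
          {}
  };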
2025-02-21  x86/vdso/vdso2c: Remove page handling (Thomas Weißschuh; 4 files, -51/+0)
The values are not used anymore. Also the sanity checks performed by vdso2c can never trigger as they only validate invariants already enforced by the linker script. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-16-13a4669dfc8c@linutronix.de
2025-02-21  x86/vdso: Switch to generic storage implementation (Thomas Weißschuh; 6 files, -178/+19)
The generic storage implementation provides the same features as the custom one. However it can be shared between architectures, making maintenance easier. This switch also moves the random state data out of the time data page. The currently used hardcoded __VDSO_RND_DATA_OFFSET does not take into account changes to the time data page layout. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-15-13a4669dfc8c@linutronix.de
2025-02-21  vdso: Rename included Makefile (Thomas Weißschuh; 1 file, -1/+1)
As the Makefile is included into other Makefiles it can not be used to define objects to be built from the current source directory. However the generic datastore will introduce such a local source file. Rename the included Makefile so it is clear how it is to be used and to make room for a regular Makefile in lib/vdso/. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-4-13a4669dfc8c@linutronix.de
2025-02-21  x86/vdso: Fix latent bug in vclock_pages calculation (Thomas Weißschuh; 3 files, -2/+3)
The vclock pages are *after* the non-vclock pages. Currently there are two vclock and two non-vclock pages, so the existing logic works by accident. As soon as the number of pages changes, however, it will break. This will be the case with the introduction of the generic vDSO data storage. Use a macro to keep the calculation understandable and in sync between the linker script and mapping code. Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-1-13a4669dfc8c@linutronix.de
2025-02-20  perf/x86/intel: Fix event constraints for LNC (Kan Liang; 2 files, -14/+8)
According to the latest event list, update the event constraint tables for Lion Cove core. The general rule (the event codes < 0x90 are restricted to counters 0-3.) has been removed. There is no restriction for most of the performance monitoring events. Fixes: a932aa0e868f ("perf/x86: Add Lunar Lake and Arrow Lake support") Reported-by: Amiri Khalil <amiri.khalil@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250219141005.2446823-1-kan.liang@linux.intel.com
2025-02-20  genirq: Introduce common irq_force_complete_move() implementation (Thomas Gleixner; 1 file, -123/+108)
CONFIG_GENERIC_PENDING_IRQ requires an architecture-specific implementation of irq_force_complete_move() for CPU hotplug. At the moment, only x86 implements this unconditionally, but for RISC-V irq_force_complete_move() is only needed when the RISC-V IMSIC driver is in use and not needed otherwise. To allow runtime configuration of this mechanism, introduce a common irq_force_complete_move() implementation in the interrupt core code, which only invokes the completion function when an interrupt chip in the hierarchy implements it. Switch X86 over to the new mechanism. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250217085657.789309-5-apatel@ventanamicro.com
2025-02-19  x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings (Eric Biggers; 1 file, -0/+5)
The assembly functions generated by crc-pclmul-template.S are called only via static_call, so they do not need to begin with an endbr instruction. But objtool still warns about a missing endbr by default. Add ANNOTATE_NOENDBR to suppress these warnings:

  vmlinux.o: warning: objtool: crc32_x86_init+0x1c0: relocation to !ENDBR: crc32_lsb_vpclmul_avx10_256+0x0
  vmlinux.o: warning: objtool: crc64_x86_init+0x183: relocation to !ENDBR: crc64_msb_vpclmul_avx10_256+0x0
  vmlinux.o: warning: objtool: crc_t10dif_x86_init+0x183: relocation to !ENDBR: crc16_msb_vpclmul_avx10_256+0x0
  vmlinux.o: warning: objtool: __SCK__crc32_lsb_pclmul+0x0: data relocation to !ENDBR: crc32_lsb_pclmul_sse+0x0
  vmlinux.o: warning: objtool: __SCK__crc64_lsb_pclmul+0x0: data relocation to !ENDBR: crc64_lsb_pclmul_sse+0x0
  vmlinux.o: warning: objtool: __SCK__crc64_msb_pclmul+0x0: data relocation to !ENDBR: crc64_msb_pclmul_sse+0x0
  vmlinux.o: warning: objtool: __SCK__crc16_msb_pclmul+0x0: data relocation to !ENDBR: crc16_msb_pclmul_sse+0x0

Fixes: 8d2d3e72e35b ("x86/crc: add "template" for [V]PCLMULQDQ based CRC functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20250217170555.3d14df62@canb.auug.org.au/ Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250217193230.100443-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-18  x86/ACPI: CPPC: Add missing include (Mario Limonciello; 1 file, -0/+2)
Some minimal kernel configurations will fail with -Werror=implicit-function-declaration due to a missing header include. Add that header. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20250211203314.762755-1-superm1@kernel.org [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-18  x86: Move sysctls into arch/x86 (Joel Granados; 3 files, -2/+67)
Move the following sysctl tables into arch/x86/kernel/setup.c:

  panic_on_{unrecovered_nmi,io_nmi}
  bootloader_{type,version}
  io_delay_type
  unknown_nmi_panic
  acpi_realmode_flags

Variables moved from include/linux/ to arch/x86/include/asm/ because there is no longer a need for them outside arch/x86/kernel:

  acpi_realmode_flags
  panic_on_{unrecovered_nmi,io_nmi}

Include <asm/nmi.h> in arch/x86/kernel/setup.c in order to bring in panic_on_{io_nmi,unrecovered_nmi}. This is part of a greater effort to move ctl tables into their respective subsystems, which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250218-jag-mv_ctltables-v1-8-cd3698ab8d29@kernel.org
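The move follows the usual per-subsystem sysctl registration shape; a hedged sketch (table contents abridged, handler and initcall choices are assumptions):

  /* arch/x86/kernel/setup.c (sketch) */
  #include <linux/init.h>
  #include <linux/sysctl.h>
  #include <asm/nmi.h>

  static struct ctl_table x86_sysctl_table[] = {
          {
                  .procname     = "unknown_nmi_panic",
                  .data         = &unknown_nmi_panic,
                  .maxlen       = sizeof(int),
                  .mode         = 0644,
                  .proc_handler = proc_dointvec,
          },
          /* ... panic_on_io_nmi, bootloader_type, io_delay_type, etc. ... */
  };

  static int __init init_x86_sysctls(void)
  {
          register_sysctl_init("kernel", x86_sysctl_table);
          return 0;
  }
  arch_initcall(init_x86_sysctls);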
2025-02-18  Merge tag 'v6.14-rc3' into x86/core, to pick up fixes (Ingo Molnar; 23 files, -89/+209)
Pick up upstream x86 fixes before applying new patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-18  perf/x86: Switch to use hrtimer_setup() (Nam Cao; 2 files, -4/+2)
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/e84ad5a660b8e10867e547db8e64f7e99c48ebba.1738746821.git.namcao@linutronix.de
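The conversion pattern, roughly (names below are placeholders):

  #include <linux/hrtimer.h>

  static enum hrtimer_restart example_timer_fn(struct hrtimer *timer)
  {
          return HRTIMER_NORESTART;
  }

  static void example_timer_init(struct hrtimer *timer)
  {
          /*
           * Before:
           *     hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
           *     timer->function = example_timer_fn;
           *
           * After: one call sets up the timer, callback included.
           */
          hrtimer_setup(timer, example_timer_fn, CLOCK_MONOTONIC,
                        HRTIMER_MODE_REL);
  }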
2025-02-18  KVM: x86: Switch to use hrtimer_setup() (Nam Cao; 5 files, -12/+8)
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/5051cfe7ed48ef9913bf2583eeca6795cb53d6ae.1738746821.git.namcao@linutronix.de
2025-02-18  x86/percpu/64: Remove INIT_PER_CPU macros (Brian Gerst; 6 files, -33/+1)
Now that the load and link addresses of percpu variables are the same, these macros are no longer necessary. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-12-brgerst@gmail.com
2025-02-18  x86/boot/64: Remove inverse relocations (Brian Gerst; 2 files, -142/+2)
Inverse relocations were needed to offset the effects of relocation for RIP-relative accesses to zero-based percpu data. Now that the percpu section is linked normally as part of the kernel image, they are no longer needed. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-11-brgerst@gmail.com
2025-02-18  x86/percpu/64: Remove fixed_percpu_data (Brian Gerst; 4 files, -14/+0)
Now that the stack protector canary value is a normal percpu variable, fixed_percpu_data is unused and can be removed. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-10-brgerst@gmail.com
2025-02-18  x86/percpu/64: Use relative percpu offsets (Brian Gerst; 7 files, -64/+26)
The percpu section is currently linked at absolute address 0, because older compilers hard-coded the stack protector canary value at a fixed offset from the start of the GS segment. Now that the canary is a normal percpu variable, the percpu section does not need to be linked at a specific address. x86-64 will now calculate the percpu offsets as the delta between the initial percpu address and the dynamically allocated memory, like other architectures. Note that GSBASE is limited to the canonical address width (48 or 57 bits, sign-extended). As long as the kernel text, modules, and the dynamically allocated percpu memory are all in the negative address space, the delta will not overflow this limit. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-9-brgerst@gmail.com