path: root/arch/x86
Age | Commit message | Author | Files | Lines
4 days | Merge tag 'x86-urgent-2025-04-26' of ↵ | Linus Torvalds | 7 | -17/+28
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix 32-bit kernel boot crash if passed physical memory with more than 32 address bits - Fix Xen PV crash - Work around build bug in certain limited build environments - Fix CTEST instruction decoding in insn_decoder_test * tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Fix CTEST instruction decoding x86/boot: Work around broken busybox 'truncate' tool x86/mm: Fix _pgd_alloc() for Xen PV mode x86/e820: Discard high memory that can't be addressed by 32-bit systems
4 days | Merge tag 'perf-urgent-2025-04-26' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc perf events fixes from Ingo Molnar: - Use POLLERR for events in error state, instead of the ambiguous POLLHUP error value - Fix non-sampling (counting) events on certain x86 platforms * tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix non-sampling (counting) events on certain x86 platforms perf/core: Change to POLLERR for pinned events with error
4 days | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 5 | -61/+82
Pull kvm fixes from Paolo Bonzini: "ARM: - Single fix for broken usage of 'multi-MIDR' infrastructure in PI code, adding an open-coded erratum check for everyone's favorite pile of sand: Cavium ThunderX x86: - Bugfixes from a planned posted interrupt rework - Do not use kvm_rip_read() unconditionally to cater for guests with inaccessible register state" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Do not use kvm_rip_read() unconditionally for KVM_PROFILING KVM: x86: Do not use kvm_rip_read() unconditionally in KVM tracepoints KVM: SVM: WARN if an invalid posted interrupt IRTE entry is added iommu/amd: WARN if KVM attempts to set vCPU affinity without posted intrrupts iommu/amd: Return an error if vCPU affinity is set for non-vCPU IRTE KVM: x86: Take irqfds.lock when adding/deleting IRQ bypass producer KVM: x86: Explicitly treat routing entry type changes as changes KVM: x86: Reset IRTE to host control if *new* route isn't postable KVM: SVM: Allocate IR data using atomic allocation KVM: SVM: Don't update IRTEs if APICv/AVIC is disabled KVM: arm64, x86: make kvm_arch_has_irq_bypass() inline arm64: Rework checks for broken Cavium HW in the PI code
6 days | x86/insn: Fix CTEST instruction decoding | Kirill A. Shutemov | 1 | -2/+2
insn_decoder_test found a problem with decoding APX CTEST instructions: Found an x86 instruction decoder bug, please report this. ffffffff810021df 62 54 94 05 85 ff ctestneq objdump says 6 bytes, but insn_get_length() says 5 It happens because x86-opcode-map.txt doesn't specify arguments for the instruction and the decoder doesn't expect to see ModRM byte. Fixes: 690ca3a3067f ("x86/insn: Add support for APX EVEX instructions to the opcode map") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org # v6.10+ Link: https://lore.kernel.org/r/20250423065815.2003231-1-kirill.shutemov@linux.intel.com
6 days | perf/x86: Fix non-sampling (counting) events on certain x86 platforms | Luo Gengkun | 1 | -1/+1
perf stat does not work for hardware events on certain x86 platforms: $ perf stat -- sleep 1 Performance counter stats for 'sleep 1': 16.44 msec task-clock # 0.016 CPUs utilized 2 context-switches # 121.691 /sec 0 cpu-migrations # 0.000 /sec 54 page-faults # 3.286 K/sec <not supported> cycles <not supported> instructions <not supported> branches <not supported> branch-misses The reason is that the check in x86_pmu_hw_config() for sampling events is unexpectedly applied to counting events as well. The bug should only impact x86 platforms with limit_period used for non-PEBS events. For Intel platforms, it should only impact some older platforms, e.g., HSW, BDW and NHM. Fixes: 88ec7eedbbd2 ("perf/x86: Fix low freqency setting issue") Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250423064724.3716211-1-luogengkun@huaweicloud.com
6 days | x86/boot: Work around broken busybox 'truncate' tool | Ard Biesheuvel | 1 | -1/+1
The GNU coreutils version of truncate, which is the original, accepts a % prefix for the -s size argument which means the file in question should be padded to a multiple of the given size. This is currently used to pad the setup block of bzImage to a multiple of 4k before appending the decompressor. busybox reimplements truncate but does not support this idiom, and therefore fails the build since commit 9c54baab4401 ("x86/boot: Drop CRC-32 checksum and the build tool that generates it") Since very little build code within the kernel depends on the 'truncate' utility, work around this incompatibility by avoiding truncate altogether, and relying on dd to perform the padding. Fixes: 9c54baab4401 ("x86/boot: Drop CRC-32 checksum and the build tool that generates it") Reported-by: <phasta@kernel.org> Tested-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250424101917.1552527-2-ardb+git@google.com
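The padding that the commit message describes amounts to rounding the setup block's size up to the next multiple of 4k. A minimal sketch of that arithmetic follows; `pad_to_multiple` is a hypothetical helper name chosen for illustration and is not part of the kernel build.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round `size` up to the next multiple of `align`.
 * This models the effect of GNU `truncate -s %4096` on a file's length.
 * It is an illustration only, not the kernel's Makefile logic.
 */
static size_t pad_to_multiple(size_t size, size_t align)
{
    assert(align != 0);              /* a zero alignment is meaningless */

    size_t rem = size % align;       /* bytes past the last boundary */

    return rem == 0 ? size : size + (align - rem);
}
```

A size that is already aligned is returned unchanged, so applying the helper twice gives the same result as applying it once.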
6 days | KVM: x86: Do not use kvm_rip_read() unconditionally for KVM_PROFILING | Adrian Hunter | 1 | -1/+2
Not all VMs allow access to RIP. Check guest_state_protected before calling kvm_rip_read(). This avoids, for example, hitting WARN_ON_ONCE in vt_cache_reg() for TDX VMs. Fixes: 81bf912b2c15 ("KVM: TDX: Implement TDX vcpu enter/exit path") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Message-ID: <20250415104821.247234-3-adrian.hunter@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 days | KVM: x86: Do not use kvm_rip_read() unconditionally in KVM tracepoints | Adrian Hunter | 1 | -3/+10
Not all VMs allow access to RIP. Check guest_state_protected before calling kvm_rip_read(). This avoids, for example, hitting WARN_ON_ONCE in vt_cache_reg() for TDX VMs. Fixes: 81bf912b2c15 ("KVM: TDX: Implement TDX vcpu enter/exit path") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Message-ID: <20250415104821.247234-2-adrian.hunter@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
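The guard described in the two commits above can be sketched in plain C. The struct and function names below (`toy_vcpu`, `toy_trace_rip`) are hypothetical stand-ins, and reporting 0 for a protected guest is an assumption of this sketch rather than a statement about the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a vCPU; the field names are illustrative only. */
struct toy_vcpu {
    bool guest_state_protected;  /* true when register state is inaccessible */
    uint64_t rip;
};

/*
 * Return the guest RIP for tracing or profiling, or 0 when the guest's
 * register state is protected, instead of reading it unconditionally.
 */
static uint64_t toy_trace_rip(const struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);

    return vcpu->guest_state_protected ? 0 : vcpu->rip;
}
```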
6 days | KVM: SVM: WARN if an invalid posted interrupt IRTE entry is added | Sean Christopherson | 1 | -1/+4
Now that the AMD IOMMU doesn't signal success incorrectly, WARN if KVM attempts to track an AMD IRTE entry without metadata. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 days | KVM: x86: Take irqfds.lock when adding/deleting IRQ bypass producer | Sean Christopherson | 1 | -2/+15
Take irqfds.lock when adding/deleting an IRQ bypass producer to ensure irqfd->producer isn't modified while kvm_irq_routing_update() is running. The only lock held when a producer is added/removed is irqbypass's mutex. Fixes: 872768800652 ("KVM: x86: select IRQ_BYPASS_MANAGER") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 days | KVM: x86: Explicitly treat routing entry type changes as changes | Sean Christopherson | 1 | -1/+2
Explicitly treat type differences as GSI routing changes, as comparing MSI data between two entries could get a false negative, e.g. if userspace changed the type but left the type-specific data as-is. Fixes: 515a0c79e796 ("kvm: irqfd: avoid update unmodified entries of the routing") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
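The false negative mentioned above is easy to reproduce in a toy model. In the sketch below, the types and names (`toy_route`, `toy_route_changed`) are hypothetical and the payload layout is an assumption. The first function compares only the type-specific data, the second checks the type first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum toy_route_type { TOY_ROUTE_IRQCHIP = 1, TOY_ROUTE_MSI = 2 };

/* Toy routing entry: a type tag plus a type-specific payload. */
struct toy_route {
    enum toy_route_type type;
    uint32_t data[3];
};

/* Data-only comparison: misses a change of type if the payload is untouched. */
static bool toy_route_changed_data_only(const struct toy_route *old,
                                        const struct toy_route *next)
{
    assert(old && next);
    return memcmp(old->data, next->data, sizeof(old->data)) != 0;
}

/* Type-aware comparison: a type difference always counts as a change. */
static bool toy_route_changed(const struct toy_route *old,
                              const struct toy_route *next)
{
    assert(old && next);
    if (old->type != next->type)
        return true;
    return memcmp(old->data, next->data, sizeof(old->data)) != 0;
}
```

With two entries that differ only in `type`, the data-only variant reports no change, which is the situation the commit guards against.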
6 days | KVM: x86: Reset IRTE to host control if *new* route isn't postable | Sean Christopherson | 2 | -45/+41
Restore an IRTE back to host control (remapped or posted MSI mode) if the *new* GSI route prevents posting the IRQ directly to a vCPU, regardless of the GSI routing type. Updating the IRTE if and only if the new GSI is an MSI results in KVM leaving an IRTE posting to a vCPU. The dangling IRTE can result in interrupts being incorrectly delivered to the guest, and in the worst case scenario can result in use-after-free, e.g. if the VM is torn down, but the underlying host IRQ isn't freed. Fixes: efc644048ecd ("KVM: x86: Update IRTE for posted-interrupts") Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 days | KVM: SVM: Allocate IR data using atomic allocation | Sean Christopherson | 1 | -1/+1
Allocate SVM's interrupt remapping metadata using GFP_ATOMIC as svm_ir_list_add() is called with IRQs disabled and irqfds.lock held when kvm_irq_routing_update() reacts to GSI routing changes. Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 days | KVM: SVM: Don't update IRTEs if APICv/AVIC is disabled | Sean Christopherson | 1 | -2/+1
Skip IRTE updates if AVIC is disabled/unsupported, as forcing the IRTE into remapped mode (kvm_vcpu_apicv_active() will never be true) is unnecessary and wasteful. The IOMMU driver is responsible for putting IRTEs into remapped mode when an IRQ is allocated by a device, long before that device is assigned to a VM. I.e. the kernel as a whole has major issues if the IRTE isn't already in remapped mode. Opportunistically use kvm_arch_has_irq_bypass() to query for APICv/AVIC, so that all checks in KVM x86 incorporate the same information. Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250401161804.842968-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 days | KVM: arm64, x86: make kvm_arch_has_irq_bypass() inline | Paolo Bonzini | 2 | -5/+6
kvm_arch_has_irq_bypass() is a small function and even though it does not appear in any *really* hot paths, it's also not entirely rare. Make it inline---it also works out nicely in preparation for using it in kvm-intel.ko and kvm-amd.ko, since the function is not currently exported. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 days | x86/mm: Fix _pgd_alloc() for Xen PV mode | Juergen Gross | 4 | -14/+17
Recently _pgd_alloc() was switched from using __get_free_pages() to pagetable_alloc_noprof(), which might return a compound page in case the allocation order is larger than 0. On x86 this will be the case if CONFIG_MITIGATION_PAGE_TABLE_ISOLATION is set, even if PTI has been disabled at runtime. When running as a Xen PV guest (this will always disable PTI), using a compound page for a PGD will result in VM_BUG_ON_PGFLAGS being triggered when the Xen code tries to pin the PGD. Fix the Xen issue together with the unneeded 8k allocation for a PGD with PTI disabled by replacing PGD_ALLOCATION_ORDER with an inline helper returning the needed order for PGD allocations. Fixes: a9b3c355c2e6 ("asm-generic: pgalloc: provide generic __pgd_{alloc,free}") Reported-by: Petr Vaněk <arkamar@atlas.cz> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Petr Vaněk <arkamar@atlas.cz> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250422131717.25724-1-jgross%40suse.com
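The idea of replacing a compile-time order with a runtime helper can be sketched as follows. The names `toy_pgd_order` and `toy_alloc_bytes` are hypothetical, and the 4 KiB page size is an assumption of this sketch. Order 1 then corresponds to the 8k allocation mentioned above, and order 0 to a single, non-compound page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Runtime choice of PGD allocation order: two pages only when PTI is active. */
static unsigned int toy_pgd_order(bool pti_enabled)
{
    return pti_enabled ? 1 : 0;
}

/* Size in bytes of an allocation of the given order. */
static size_t toy_alloc_bytes(unsigned int order)
{
    assert(order < 16);       /* keep the shift well within range */
    return (size_t)TOY_PAGE_SIZE << order;
}
```

A guest that always runs with PTI disabled, such as the Xen PV case described above, would get order 0 in this model and never see a compound page.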
11 days | x86/e820: Discard high memory that can't be addressed by 32-bit systems | Mike Rapoport (Microsoft) | 1 | -0/+8
Dave Hansen reports the following crash on a 32-bit system with CONFIG_HIGHMEM=y and CONFIG_X86_PAE=y: > 0xf75fe000 is the mem_map[] entry for the first page >4GB. It > obviously wasn't allocated, thus the oops. BUG: unable to handle page fault for address: f75fe000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page *pdpt = 0000000002da2001 *pde = 000000000300c067 *pte = 0000000000000000 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-00288-ge618ee89561b-dirty #311 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 EIP: __free_pages_core+0x3c/0x74 ... Call Trace: memblock_free_pages+0x11/0x2c memblock_free_all+0x2ce/0x3a0 mm_core_init+0xf5/0x320 start_kernel+0x296/0x79c i386_start_kernel+0xad/0xb0 startup_32_smp+0x151/0x154 The mem_map[] is allocated up to the end of ZONE_HIGHMEM which is defined by max_pfn. The bug was introduced by this recent commit: 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing") Previously, freeing of high memory was also clamped to the end of ZONE_HIGHMEM but after this change, memblock_free_all() tries to free memory above the end of ZONE_HIGHMEM as well and that causes access to mem_map[] entries beyond the end of the memory map. To fix this, discard the memory after max_pfn from memblock on 32-bit systems so that core MM would be aware only of actually usable memory. Fixes: 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing") Reported-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andy@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davide Ciminaghi <ciminaghi@gnudd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/r/20250413080858.743221-1-rppt@kernel.org # discussion and submission
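Discarding memory above a limit, as this fix does for memory past max_pfn, can be modelled with a small region list. The sketch below is a userspace illustration under stated assumptions: `toy_region` and `toy_discard_above` are hypothetical names, regions are half-open `[start, end)` byte ranges, and the real memblock data structures are not represented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct toy_region {
    uint64_t start;
    uint64_t end;
};

/*
 * Drop everything at or above `limit`: regions entirely above it are
 * removed, regions straddling it are trimmed. Returns the new count.
 */
static size_t toy_discard_above(struct toy_region *r, size_t n, uint64_t limit)
{
    size_t kept = 0;

    assert(r != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (r[i].start >= limit)
            continue;                /* entirely above the limit: drop */
        r[kept] = r[i];
        if (r[kept].end > limit)
            r[kept].end = limit;     /* straddles the limit: trim */
        kept++;
    }
    return kept;
}
```

After the call, no surviving region extends past `limit`, so a later pass that frees every listed range cannot touch bookkeeping entries beyond the end of the map.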
11 days | Merge tag 'x86-urgent-2025-04-18' of ↵ | Linus Torvalds | 7 | -68/+43
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix hypercall detection on Xen guests - Extend the AMD microcode loader SHA check to Zen5, to block loading of any unreleased standalone Zen5 microcode patches - Add new Intel CPU model number for Bartlett Lake - Fix the workaround for AMD erratum 1054 - Fix buggy early memory acceptance between SEV-SNP guests and the EFI stub * tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/sev: Avoid shared GHCB page for early memory acceptance x86/cpu/amd: Fix workaround for erratum 1054 x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches x86/xen: Fix __xen_hypercall_setfunc()
11 days | Merge tag 'timers-urgent-2025-04-18' of ↵ | Linus Torvalds | 1 | -1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a lockdep false positive in the i8253 driver" * tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/i8253: Call clockevent_i8253_disable() with interrupts disabled
11 days | Merge tag 'perf-urgent-2025-04-18' of ↵ | Linus Torvalds | 3 | -104/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fixes from Ingo Molnar: "Miscellaneous fixes and a hardware-enabling change: - Fix Intel uncore PMU IIO free running counters on SPR, ICX and SNR systems - Fix Intel PEBS buffer overflow handling - Fix skid in Intel PEBS sampling of user-space general purpose registers - Enable Panther Lake PMU support - similar to Lunar Lake" * tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add Panther Lake support perf/x86/intel: Allow to update user space GPRs from PEBS records perf/x86/intel: Don't clear perf metrics overflow bit unconditionally perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR
12 days | x86/boot/sev: Avoid shared GHCB page for early memory acceptance | Ard Biesheuvel | 3 | -53/+21
Communicating with the hypervisor using the shared GHCB page requires clearing the C bit in the mapping of that page. When executing in the context of the EFI boot services, the page tables are owned by the firmware, and this manipulation is not possible. So switch to a different API for accepting memory in SEV-SNP guests, one which is actually supported at the point during boot where the EFI stub may need to accept memory, but the SEV-SNP init code has not executed yet. For simplicity, also switch the memory acceptance carried out by the decompressor when not booting via EFI - this only involves the allocation for the decompressed kernel, and is generally only called after kexec, as normal boot will jump straight into the kernel from the EFI stub. Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com # discussion thread #1 Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com # discussion thread #2 Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com # final submission
12 days | x86/cpu/amd: Fix workaround for erratum 1054 | Sandipan Das | 1 | -7/+12
Erratum 1054 affects AMD Zen processors that are a part of Family 17h Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However, when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected processors was incorrectly changed in a way that the IRPerfEn bit gets set only for unaffected Zen 1 processors. Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This includes a subset of Zen 1 (Family 17h Models 30h and above) and all later processors. Also clear X86_FEATURE_IRPERF on affected processors so that the IRPerfCount register is not used by other entities like the MSR PMU driver. Fixes: 232afb557835 ("x86/CPU/AMD: Add X86_FEATURE_ZEN1") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/caa057a9d6f8ad579e2f1abaa71efbd5bd4eaf6d.1744956467.git.sandipan.das@amd.com
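The affected range stated above (Family 17h, Models 00-2Fh) translates into a simple predicate. The following is a sketch only: the function names are hypothetical, and it ignores whether the CPU supports the IRPerf feature at all, which real code would also need to check.

```c
#include <assert.h>
#include <stdbool.h>

/* True for processors in the erratum 1054 range: Family 17h, Models 00-2Fh. */
static bool toy_erratum_1054_affected(unsigned int family, unsigned int model)
{
    assert(model <= 0xff);    /* the x86 display model fits in 8 bits */
    return family == 0x17 && model <= 0x2f;
}

/* HWCR[IRPerfEn] should be set for every processor that is NOT affected. */
static bool toy_should_set_irperfen(unsigned int family, unsigned int model)
{
    return !toy_erratum_1054_affected(family, model);
}
```

Written this way, the decision depends only on the affected range, so later families and the unaffected part of Family 17h (Models 30h and above) both end up with the bit set.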
13 days | Merge tag 'for-linus-6.15a-rc3-tag' of ↵ | Linus Torvalds | 3 | -16/+14
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "Just a single fix for the Xen multicall driver avoiding a percpu variable referencing initdata by its initializer" * tag 'for-linus-6.15a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: fix multicall debug feature
13 days | perf/x86/intel: Add Panther Lake support | Kan Liang | 1 | -2/+9
From PMU's perspective, Panther Lake is similar to the previous generation Lunar Lake. Both are hybrid platforms, with e-core and p-core. The key differences are the ARCH PEBS feature and several new events. The ARCH PEBS is supported in the following patches. The new events will be supported later in perf tool. Share the code path with the Lunar Lake. Only update the name. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20250415114428.341182-2-dapeng1.mi@linux.intel.com
13 days | perf/x86/intel: Allow to update user space GPRs from PEBS records | Dapeng Mi | 1 | -3/+5
Currently when a user samples user space GPRs (--user-regs option) with PEBS, the user space GPRs actually always come from software PMI instead of from PEBS hardware. This can make the sampled GPRs inaccurate in the single PEBS record case because of the skid between counter overflow and GPRs sampling on PMI. For the large PEBS case, it is even worse. If the user sets the exclude_kernel attribute, large PEBS would be used to sample user space GPRs, but since the PEBS GPRs group is not really enabled, all samples in the large PEBS record share the same set of user space GPRs, as this reproducer shows: $ perf record -e branches:pu --user-regs=ip,ax -c 100000 ./foo $ perf report -D | grep "AX" .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead So enable the GPRs group for user space GPRs sampling and prioritize reading GPRs from PEBS. If the PEBS sampled GPRs are not user space GPRs (single PEBS record case), perf_sample_regs_user() modifies them to user space GPRs. [ mingo: Clarified the changelog. ] Fixes: c22497f5838c ("perf/x86/intel: Support adaptive PEBS v4") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250415104135.318169-2-dapeng1.mi@linux.intel.com
13 days | perf/x86/intel: Don't clear perf metrics overflow bit unconditionally | Dapeng Mi | 1 | -2/+11
The code below unconditionally clears other status bits, such as the perf metrics overflow bit, once the PEBS buffer overflows: status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI; This is incorrect. The perf metrics overflow bit should be cleared only when fixed counter 3 is in the PEBS counter group. Otherwise a perf metrics overflow could go unhandled. Closes: https://lore.kernel.org/all/20250225110012.GK31462@noisy.programming.kicks-ass.net/ Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250415104135.318169-1-dapeng1.mi@linux.intel.com
13 days | perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR | Kan Liang | 1 | -57/+1
The scale of IIO bandwidth in free running counters is inherited from the ICX. The counter increments for every 32 bytes rather than 4 bytes. The IIO bandwidth out free running counters don't increment with a consistent size. The increment depends on the requested size. It's impossible to find a fixed increment. Remove it from the event_descs. Fixes: 0378c93a92e2 ("perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server") Reported-by: Tang Jun <dukang.tj@alibaba-inc.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250416142426.3933977-3-kan.liang@linux.intel.com
13 days | perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX | Kan Liang | 1 | -32/+1
There was a mistake in the ICX uncore spec too. The counter increments for every 32 bytes rather than 4 bytes. The same as SNR, there are 1 ioclk and 8 IIO bandwidth in free running counters. Reuse the snr_uncore_iio_freerunning_events(). Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support") Reported-by: Tang Jun <dukang.tj@alibaba-inc.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250416142426.3933977-2-kan.liang@linux.intel.com
13 days | perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR | Kan Liang | 1 | -8/+8
There was a mistake in the SNR uncore spec. The counter increments for every 32 bytes of data sent from the IO agent to the SOC, not 4 bytes which was documented in the spec. The event list has been updated: "EventName": "UNC_IIO_BANDWIDTH_IN.PART0_FREERUN", "BriefDescription": "Free running counter that increments for every 32 bytes of data sent from the IO agent to the SOC", Update the scale of the IIO bandwidth in free running counters as well. Fixes: 210cc5f9db7a ("perf/x86/intel/uncore: Add uncore support for Snow Ridge server") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250416142426.3933977-1-kan.liang@linux.intel.com
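The effect of the wrong increment size is plain arithmetic: a scale based on 4 bytes per increment under-reports bandwidth by a factor of 8 compared with the correct 32 bytes. A minimal sketch, in which `toy_iio_count_to_mib` is a hypothetical helper and reporting in MiB is an assumption of this example:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BYTES_PER_MIB (1024.0 * 1024.0)

/*
 * Convert a raw free running counter value to MiB, given how many bytes
 * one counter increment represents.
 */
static double toy_iio_count_to_mib(uint64_t count, unsigned int bytes_per_increment)
{
    assert(bytes_per_increment > 0);
    return (double)count * bytes_per_increment / TOY_BYTES_PER_MIB;
}
```

For a raw count of 32768, the 32-byte interpretation yields 1 MiB while the 4-byte interpretation yields 0.125 MiB. The per-increment scale factors are 32 / 2^20 = 3.0517578125e-5 and 4 / 2^20 ≈ 3.8147e-6 respectively.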
14 days | x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores | Pi Xiange | 1 | -0/+2
Bartlett Lake has a P-core only product with Raptor Cove. [ mingo: Switch around the define as pointed out by Christian Ludloff: Raptor Cove is the core, Bartlett Lake is the product. ] Signed-off-by: Pi Xiange <xiange.pi@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Christian Ludloff <ludloff@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: "Ahmed S. Darwish" <darwi@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250414032839.5368-1-xiange.pi@intel.com
2025-04-12 | x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any ↵ | Borislav Petkov (AMD) | 1 | -2/+7
unreleased standalone Zen5 microcode patches All Zen5 machines out there should get BIOS updates which update to the correct microcode patches addressing the microcode signature issue. However, silly people carve out random microcode blobs from BIOS packages and think they are doing other people a service this way... Block loading of any unreleased standalone Zen5 microcode patches. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250410114222.32523-1-bp@kernel.org
2025-04-11 | x86/xen: Fix __xen_hypercall_setfunc() | Jason Andryuk | 1 | -6/+1
Hypercall detection is failing with xen_hypercall_intel() chosen even on an AMD processor. Looking at the disassembly, the call to xen_get_vendor() was removed. The check for boot_cpu_has(X86_FEATURE_CPUID) was used as a proxy for the x86_vendor having been set. When CONFIG_X86_REQUIRED_FEATURE_CPUID=y (the default value), DCE eliminates the call to xen_get_vendor(). An uninitialized value 0 means X86_VENDOR_INTEL, so the Intel function is always returned. Remove the if and always call xen_get_vendor() to avoid this issue. Fixes: 3d37d9396eb3 ("x86/cpufeatures: Add {REQUIRED,DISABLED} feature configs") Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Xin Li (Intel)" <xin@zytor.com> Link: https://lore.kernel.org/r/20250410193106.16353-1-jason.andryuk@amd.com
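The failure mode described above, where a skipped detection step leaves a zero value that happens to mean "Intel", can be shown in a toy model. All names below are hypothetical. The `proxy_says_vendor_known` parameter stands in for the feature check that was used as a proxy, and the AMD enumerator value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vendor IDs; 0 doubles as "never initialized", as in the commit text. */
enum toy_vendor { TOY_VENDOR_INTEL = 0, TOY_VENDOR_AMD = 2 };
enum toy_hypercall { TOY_HYPERCALL_INTEL, TOY_HYPERCALL_AMD };

/*
 * Pick the hypercall flavour. If the proxy claims the vendor is already
 * known, detection is skipped and the cached value stays zero-initialized.
 */
static enum toy_hypercall toy_setfunc(bool proxy_says_vendor_known,
                                      enum toy_vendor real_vendor)
{
    enum toy_vendor cached = TOY_VENDOR_INTEL;   /* zero-initialized state */

    assert(real_vendor == TOY_VENDOR_INTEL || real_vendor == TOY_VENDOR_AMD);

    if (!proxy_says_vendor_known)
        cached = real_vendor;                    /* detection actually runs */

    return cached == TOY_VENDOR_AMD ? TOY_HYPERCALL_AMD : TOY_HYPERCALL_INTEL;
}
```

When the proxy is a compile-time constant `true`, the detection branch is dead code, and an AMD system is handed the Intel variant. Passing `false`, which models always running detection, gives the correct result for both vendors.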
2025-04-11 | xen: fix multicall debug feature | Juergen Gross | 3 | -16/+14
Initializing a percpu variable with the address of a struct tagged as .initdata is breaking the build with CONFIG_SECTION_MISMATCH_WARN_ONLY not set to "y". Fix that by using an access function instead returning the .initdata struct address if the percpu space of the struct hasn't been allocated yet. Fixes: 368990a7fe30 ("xen: fix multicall debug data referencing") Reported-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: "Borislav Petkov (AMD)" <bp@alien8.de> Tested-by: "Borislav Petkov (AMD)" <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250327190602.26015-1-jgross@suse.com>
2025-04-11 | x86/i8253: Call clockevent_i8253_disable() with interrupts disabled | Fernando Fernandez Mancera | 1 | -1/+2
There's a lockdep false positive warning related to i8253_lock: WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ... systemd-sleep/3324 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffffffffb2c23398 (i8253_lock){+.+.}-{2:2}, at: pcspkr_event+0x3f/0xe0 [pcspkr] ... ... which became HARDIRQ-irq-unsafe at: ... lock_acquire+0xd0/0x2f0 _raw_spin_lock+0x30/0x40 clockevent_i8253_disable+0x1c/0x60 pit_timer_init+0x25/0x50 hpet_time_init+0x46/0x50 x86_late_time_init+0x1b/0x40 start_kernel+0x962/0xa00 x86_64_start_reservations+0x24/0x30 x86_64_start_kernel+0xed/0xf0 common_startup_64+0x13e/0x141 ... Lockdep complains due to pit_timer_init() using the lock in an IRQ-unsafe fashion, but it's a false positive, because there is no deadlock possible at that point due to init ordering: at the point where pit_timer_init() is called there is no other possible usage of i8253_lock because the system is still in the very early boot stage with no interrupts. But in any case, pit_timer_init() should disable interrupts before calling clockevent_i8253_disable() out of general principle, and to keep lockdep working even in this scenario. Use scoped_guard() for that, as suggested by Thomas Gleixner. [ mingo: Cleaned up the changelog. ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/Z-uwd4Bnn7FcCShX@gmail.com
2025-04-11 | Merge tag 'x86-urgent-2025-04-10' of ↵ | Linus Torvalds | 10 | -120/+99
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix CPU topology related regression that limited Xen PV guests to a single CPU - Fix ancient e820__register_nosave_regions() bugs that were causing problems with kexec's artificial memory maps - Fix an S4 hibernation crash caused by two missing ENDBR's that were mistakenly removed in a recent commit - Fix a resctrl serialization bug - Fix early_printk documentation and comments - Fix RSB bugs, combined with preparatory updates to better match the code to vendor recommendations. - Add RSB mitigation document - Fix/update documentation - Fix the erratum_1386_microcode[] table to be NULL terminated * tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ibt: Fix hibernate x86/cpu: Avoid running off the end of an AMD erratum table Documentation/x86: Zap the subsection letters Documentation/x86: Update the naming of CPU features for /proc/cpuinfo x86/bugs: Add RSB mitigation document x86/bugs: Don't fill RSB on context switch with eIBRS x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier() x86/bugs: Use SBPB in write_ibpb() if applicable x86/bugs: Rename entry_ibpb() to write_ibpb() x86/early_printk: Use 'mmio32' for consistency, fix comments x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::name x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions() x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI
2025-04-11Merge tag 'objtool-urgent-2025-04-10' of ↵Linus Torvalds2-9/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc objtool fixes from Ingo Molnar: - Remove the recently introduced ANNOTATE_IGNORE_ALTERNATIVE noise from clac()/stac() code to make .s files more readable - Fix INSN_SYSCALL / INSN_SYSRET semantics - Fix various false-positive warnings * tag 'objtool-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix false-positive "ignoring unreachables" warning objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC objtool, xen: Fix INSN_SYSCALL / INSN_SYSRET semantics objtool: Stop UNRET validation on UD2 objtool: Split INSN_CONTEXT_SWITCH into INSN_SYSCALL and INSN_SYSRET objtool: Fix INSN_CONTEXT_SWITCH handling in validate_unret()
2025-04-10Merge tag 'for-linus-6.15a-rc2-tag' of ↵Linus Torvalds3-4/+28
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A simple fix adding the module description of the Xenbus frontend module - A fix correcting the xen-acpi-processor Kconfig dependency for PVH Dom0 support - A fix for the Xen balloon driver when running as Xen Dom0 in PVH mode - A fix for PVH Dom0 in order to avoid problems with CPU idle and frequency drivers conflicting with Xen * tag 'for-linus-6.15a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: disable CPU idle and frequency drivers for PVH dom0 x86/xen: fix balloon target initialization for PVH dom0 xen: Change xen-acpi-processor dom0 dependency xenbus: add module description
2025-04-09x86/ibt: Fix hibernatePeter Zijlstra1-2/+2
Todd reported, and Len confirmed, that commit 582077c94052 ("x86/cfi: Clean up linkage") broke S4 hibernate on a fair number of machines. Turns out these machines trip #CP when trying to restore the image. As it happens, the commit in question removes two ENDBR instructions in the hibernate code, and clearly got it wrong. Notably restore_image() does an indirect jump to relocated_restore_code(), which is a relocated copy of core_restore_code(). In turn, core_restore_code() will at the end do an indirect jump to restore_jump_address (r8), which is pointing at a relocated restore_registers(). So both sites do indeed need to be ENDBR. Fixes: 582077c94052 ("x86/cfi: Clean up linkage") Reported-by: Todd Brandt <todd.e.brandt@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Todd Brandt <todd.e.brandt@intel.com> Tested-by: Len Brown <len.brown@intel.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219998 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219998
2025-04-09x86/cpu: Avoid running off the end of an AMD erratum tableDave Hansen1-0/+1
The NULL array terminator at the end of erratum_1386_microcode was removed during the switch from x86_cpu_desc to x86_cpu_id. This causes readers to run off the end of the array. Replace the NULL. Fixes: f3f325152673 ("x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'") Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
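Why a missing terminator makes readers run off the end can be seen in a small sentinel-terminated lookup. The sketch below is illustrative only: `struct cpu_entry`, `erratum_table_sim[]`, `match_cpu_sim()` and every value in the table are invented for the example and do not reproduce the kernel's `x86_cpu_id` layout or the real erratum data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified table entry (not the kernel's x86_cpu_id). */
struct cpu_entry {
	unsigned int family;		/* 0 marks the terminator */
	unsigned int model;
	unsigned int min_microcode;
};

/* Illustrative values only. */
static const struct cpu_entry erratum_table_sim[] = {
	{ 0x17, 0x01, 0x1001 },
	{ 0x17, 0x31, 0x2002 },
	{ 0 },	/* terminator: without it the loop below reads past the array */
};

/*
 * Walk the table until the all-zero terminator. The loop has no length
 * argument, so the terminator is the only thing that stops it.
 */
static const struct cpu_entry *match_cpu_sim(const struct cpu_entry *table,
					     unsigned int family,
					     unsigned int model)
{
	assert(table != NULL);

	for (const struct cpu_entry *e = table; e->family != 0; e++) {
		if (e->family == family && e->model == model)
			return e;
	}
	return NULL;
}
```

A lookup for an entry that is not in the table is the case that depends on the terminator: a matching entry returns early, while a miss keeps scanning until the zeroed element is found.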
2025-04-09x86/bugs: Add RSB mitigation documentJosh Poimboeuf1-51/+13
Create a document to summarize hard-earned knowledge about RSB-related mitigations, with references, and replace the overly verbose yet incomplete comments with a reference to the document. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ab73f4659ba697a974759f07befd41ae605e33dd.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Don't fill RSB on context switch with eIBRSJosh Poimboeuf2-15/+15
User->user Spectre v2 attacks (including RSB) across context switches are already mitigated by IBPB in cond_mitigation(), if enabled globally or if either the prev or the next task has opted in to protection. RSB filling without IBPB serves no purpose for protecting user space, as indirect branches are still vulnerable. User->kernel RSB attacks are mitigated by eIBRS. In which case the RSB filling on context switch isn't needed, so remove it. Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Amit Shah <amit.shah@amd.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/98cdefe42180358efebf78e3b80752850c7a3e1b.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpolineJosh Poimboeuf1-4/+4
eIBRS protects against guest->host RSB underflow/poisoning attacks. Adding retpoline to the mix doesn't change that. Retpoline has a balanced CALL/RET anyway. So the current full RSB filling on VMEXIT with eIBRS+retpoline is overkill. Disable it or do the VMEXIT_LITE mitigation if needed. Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Amit Shah <amit.shah@amd.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Woodhouse <dwmw2@infradead.org> Link: https://lore.kernel.org/r/84a1226e5c9e2698eae1b5ade861f1b8bf3677dc.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier()Josh Poimboeuf2-4/+3
IBPB is expected to clear the RSB. However, if X86_BUG_IBPB_NO_RET is set, that doesn't happen. Make indirect_branch_prediction_barrier() take that into account by calling write_ibpb() which clears RSB on X86_BUG_IBPB_NO_RET: /* Make sure IBPB clears return stack preductions too. */ FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET Note that, as of the previous patch, write_ibpb() also reads 'x86_pred_cmd' in order to use SBPB when applicable: movl _ASM_RIP(x86_pred_cmd), %eax Therefore that existing behavior in indirect_branch_prediction_barrier() is not lost. Fixes: 50e4b3b94090 ("x86/entry: Have entry_ibpb() invalidate return predictions") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/bba68888c511743d4cd65564d1fc41438907523f.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Use SBPB in write_ibpb() if applicableJosh Poimboeuf1-1/+1
write_ibpb() does IBPB, which (among other things) flushes branch type predictions on AMD. If the CPU has SRSO_NO, or if the SRSO mitigation has been disabled, branch type flushing isn't needed, in which case the lighter-weight SBPB can be used. The 'x86_pred_cmd' variable already keeps track of whether IBPB or SBPB should be used. Use that instead of hardcoding IBPB. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/17c5dcd14b29199b75199d67ff7758de9d9a4928.1744148254.git.jpoimboe@kernel.org
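The selection rule described above (use the lighter SBPB when branch type flushing is not needed, otherwise IBPB) can be sketched as a pure function. This is a hedged illustration, not the kernel's code: `struct cpu_state_sim`, `select_pred_cmd_sim()` and the `_SIM` constants are hypothetical names, the bit positions are assumed for the example, and the `has_sbpb` check is an added assumption that the CPU must support SBPB at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Command bits, assumed for illustration. */
#define PRED_CMD_IBPB_SIM	(1u << 0)
#define PRED_CMD_SBPB_SIM	(1u << 7)

/* Hypothetical summary of the CPU and mitigation state. */
struct cpu_state_sim {
	bool has_sbpb;			/* assumption: CPU supports SBPB */
	bool srso_no;			/* CPU is not affected by SRSO */
	bool srso_mitigation_off;	/* SRSO mitigation disabled */
};

/*
 * Decide once which barrier command to use. Callers then write the
 * returned value instead of hardcoding IBPB.
 */
static uint32_t select_pred_cmd_sim(const struct cpu_state_sim *cpu)
{
	assert(cpu != NULL);

	if (cpu->has_sbpb && (cpu->srso_no || cpu->srso_mitigation_off))
		return PRED_CMD_SBPB_SIM;

	return PRED_CMD_IBPB_SIM;
}
```

Keeping the decision in one variable, as the message says 'x86_pred_cmd' does, means every barrier site picks up the same choice without repeating the feature checks.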
2025-04-09x86/bugs: Rename entry_ibpb() to write_ibpb()Josh Poimboeuf3-9/+10
There's nothing entry-specific about entry_ibpb(). In preparation for calling it from elsewhere, rename it to write_ibpb(). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/1e54ace131e79b760de3fe828264e26d0896e3ac.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/early_printk: Use 'mmio32' for consistency, fix commentsAndy Shevchenko1-5/+5
First of all, using 'mmio' prevents a proper implementation of 8-bit accessors. Second, it's simply inconsistent with the uart8250 set of options. Rename it to 'mmio32'. While at it, remove the rather misleading comment in the documentation. From now on mmio32 is self-explanatory and pciserial supports not only 32-bit MMIO accessors. Also, while at it, fix the comment for the "pciserial" case. The comment seems to be a copy'n'paste error, mentioning "serial" instead of "pciserial" (with double quotes). Fix this. With that, move it up, so we don't calculate 'buf' twice. Fixes: 3181424aeac2 ("x86/early_printk: Add support for MMIO-based UARTs") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Denis Mukhin <dmukhin@ford.com> Link: https://lore.kernel.org/r/20250407172214.792745-1-andriy.shevchenko@linux.intel.com
2025-04-09x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::nameJames Morse1-21/+27
Since 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.") a helper rdt_kn_name() that checks that rdtgroup_mutex is held has been used for all accesses to the kernfs node name. rdtgroup_mkdir() uses the name to determine if a valid monitor group is being created by checking the parent name is "mon_groups". This is done without holding rdtgroup_mutex, and now triggers the following warning: | WARNING: suspicious RCU usage | 6.15.0-rc1 #4465 Tainted: G E | ----------------------------- | arch/x86/kernel/cpu/resctrl/internal.h:408 suspicious rcu_dereference_check() usage! [...] | Call Trace: | <TASK> | dump_stack_lvl | lockdep_rcu_suspicious.cold | is_mon_groups | rdtgroup_mkdir | kernfs_iop_mkdir | vfs_mkdir | do_mkdirat | __x64_sys_mkdir | do_syscall_64 | entry_SYSCALL_64_after_hwframe Creating a control or monitor group calls mkdir_rdt_prepare(), which uses rdtgroup_kn_lock_live() to take the rdtgroup_mutex. To avoid taking and dropping the lock, move the check for the monitor group name and position into mkdir_rdt_prepare() so that it occurs under rdtgroup_mutex. Hoist is_mon_groups() earlier in the file. [ bp: Massage. ] Fixes: 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250407124637.2433230-1-james.morse@arm.com
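The shape of this fix, moving a name check into the function that already holds the lock, can be sketched in single-threaded userspace C. Everything here is a simplified stand-in: `group_mutex_held` replaces the real mutex, `node_name_sim()` returns NULL where the kernel helper would emit a warning, and `struct node_sim`, `is_mon_groups_sim()` and `mkdir_prepare_sim()` are hypothetical names, not the resctrl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for "the mutex is currently held". */
static bool group_mutex_held;

/* Hypothetical, minimal directory node. */
struct node_sim {
	const char *name;
};

/*
 * Name accessor that insists on the lock. The kernel helper warns on
 * misuse; this sketch returns NULL instead.
 */
static const char *node_name_sim(const struct node_sim *n)
{
	if (!group_mutex_held)
		return NULL;
	return n->name;
}

/* True if the parent directory is the monitor-groups directory. */
static bool is_mon_groups_sim(const struct node_sim *parent)
{
	const char *name = node_name_sim(parent);

	return name != NULL && strcmp(name, "mon_groups") == 0;
}

/*
 * Stand-in for the prepare step: it takes the lock anyway, so the name
 * check is done here rather than by the unlocked caller.
 */
static bool mkdir_prepare_sim(const struct node_sim *parent)
{
	bool is_mon;

	assert(parent != NULL);

	group_mutex_held = true;	/* lock */
	is_mon = is_mon_groups_sim(parent);
	group_mutex_held = false;	/* unlock */

	return is_mon;
}
```

Doing the check inside the prepare step avoids a separate lock and unlock around the name access, which is the trade-off the message describes.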
2025-04-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-14/+50
Pull kvm fixes from Paolo Bonzini: "ARM: - Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk stage-1 page tables) to align with the architecture. This avoids possibly taking an SEA at EL2 on the page table walk or using an architecturally UNKNOWN fault IPA - Use acquire/release semantics in the KVM FF-A proxy to avoid reading a stale value for the FF-A version - Fix KVM guest driver to match PV CPUID hypercall ABI - Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM selftests, which is the only memory type for which atomic instructions are architecturally guaranteed to work s390: - Don't use %pK for debug printing and tracepoints x86: - Use a separate subclass when acquiring KVM's per-CPU posted interrupts wakeup lock in the scheduled out path, i.e. when adding a vCPU on the list of vCPUs to wake, to work around a false positive deadlock. The schedule out code runs with a scheduler lock that the wakeup handler takes in the opposite order; but it does so with IRQs disabled and cannot run concurrently with a wakeup - Explicitly zero-initialize on-stack CPUID unions - Allow building irqbypass.ko as a module when kvm.ko is a module - Wrap relatively expensive sanity check with KVM_PROVE_MMU - Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses selftests: - Add more scenarios to the MONITOR/MWAIT test - Add option to rseq test to override /dev/cpu_dma_latency - Bring list of exit reasons up to date - Cleanup Makefile to list once tests that are valid on all architectures Other: - Documentation fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: arm64: Use acquire/release to communicate FF-A version negotiation KVM: arm64: selftests: Explicitly set the page attrs to Inner-Shareable KVM: arm64: selftests: Introduce and use hardware-definition macros KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positive KVM: VMX: Assert that IRQs are disabled when putting vCPU on PI 
wakeup list KVM: x86: Explicitly zero-initialize on-stack CPUID unions KVM: Allow building irqbypass.ko as as module when kvm.ko is a module KVM: x86/mmu: Wrap sanity check on number of TDP MMU pages with KVM_PROVE_MMU KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses Documentation: kvm: remove KVM_CAP_MIPS_TE Documentation: kvm: organize capabilities in the right section Documentation: kvm: fix some definition lists Documentation: kvm: drop "Capability" heading from capabilities Documentation: kvm: give correct name for KVM_CAP_SPAPR_MULTITCE Documentation: KVM: KVM_GET_SUPPORTED_CPUID now exposes TSC_DEADLINE selftests: kvm: list once tests that are valid on all architectures selftests: kvm: bring list of exit reasons up to date selftests: kvm: revamp MONITOR/MWAIT tests KVM: arm64: Don't translate FAR if invalid/unsafe ...
2025-04-08objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STACJosh Poimboeuf1-6/+6
ANNOTATE_IGNORE_ALTERNATIVE adds additional noise to the code generated by CLAC/STAC alternatives, hurting readability for those who read uaccess-related code generation on a regular basis. Remove the annotation specifically for the "NOP patched with CLAC/STAC" case in favor of a manual check. Leave the other uses of that annotation in place as they're less common and more difficult to detect. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/fc972ba4995d826fcfb8d02733a14be8d670900b.1744098446.git.jpoimboe@kernel.org
2025-04-08x86/xen: disable CPU idle and frequency drivers for PVH dom0Roger Pau Monne1-1/+18
When running as a PVH dom0 the ACPI tables exposed to Linux are (mostly) the native ones, thus exposing the C and P states, which can lead to attachment of CPU idle and frequency drivers. However the entity in control of the CPU C and P states is Xen, as dom0 doesn't have a full view of the system load, nor does it have all CPUs assigned and identity pinned. As is done for classic PV guests, prevent Linux from using idle or frequency state drivers when running as a PVH dom0. On an AMD EPYC 7543P system without this fix a Linux PVH dom0 will keep the host CPUs spinning at 100% even when dom0 is completely idle, as it's attempting to use the acpi_idle driver. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250407101842.67228-1-roger.pau@citrix.com>