summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2024-10-25x86: fix whitespace in runtime-const assembler outputLinus Torvalds1-2/+2
The x86 user pointer validation changes made me look at compiler output a lot, and the wrong indentation for the ".popsection" in the generated assembler triggered me. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-25x86: fix user address masking non-canonical speculation issueLinus Torvalds1-19/+24
It turns out that AMD has a "Meltdown Lite(tm)" issue with non-canonical accesses in kernel space. And so using just the high bit to decide whether an access is in user space or kernel space ends up with the good old "leak speculative data" if you have the right gadget using the result: CVE-2020-12965 “Transient Execution of Non-Canonical Accesses“ Now, the kernel surrounds the access with a STAC/CLAC pair, and those instructions end up serializing execution on older Zen architectures, which closes the speculation window. But that was true only up until Zen 5, which renames the AC bit [1]. That improves performance of STAC/CLAC a lot, but also means that the speculation window is now open. Note that this affects not just the new address masking, but also the regular valid_user_address() check used by access_ok(), and the asm version of the sign bit check in the get_user() helpers. It does not affect put_user() or clear_user() variants, since there's no speculative result to be used in a gadget for those operations. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Link: https://lore.kernel.org/all/80d94591-1297-4afb-b510-c665efd37f10@citrix.com/ Link: https://lore.kernel.org/all/20241023094448.GAZxjFkEOOF_DM83TQ@fat_crate.local/ [1] Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1010.html Link: https://arxiv.org/pdf/2108.10771 Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> # LAM case Fixes: 2865baf54077 ("x86: support user address masking instead of non-speculative conditional") Fixes: 6014bc27561f ("x86-64: make access_ok() independent of LAM") Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-20Merge tag 'x86_urgent_for_v6.12_rc4' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Explicitly disable the TSC deadline timer when going idle to address some CPU errata in that area - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the late microcode loading path - Clear CPU buffers later in the NMI exit path on 32-bit to avoid register clearing while they still contain sensitive data, for the RDFS mitigation - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit path on 32-bit - Fix parsing issues of memory bandwidth specification in sysfs for resctrl's memory bandwidth allocation feature - Other small cleanups and improvements * tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Always explicitly disarm TSC-deadline timer x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load x86/bugs: Use code segment selector for VERW operand x86/entry_32: Clear CPU buffers after register restore in NMI return x86/entry_32: Do not clobber user EFLAGS.ZF x86/resctrl: Annotate get_mem_config() functions as __init x86/resctrl: Avoid overflow in MB settings in bw_validate() x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
2024-10-10x86/cpufeatures: Add a IBPB_NO_RET BUG flagJohannes Wikner1-0/+1
Set this flag if the CPU has an IBPB implementation that does not invalidate return target predictions. Zen generations < 4 do not flush the RSB when executing an IBPB and this bug flag denotes that. [ bp: Massage. ] Signed-off-by: Johannes Wikner <kwikner@ethz.ch> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2024-10-10x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RETJim Mattson1-1/+2
AMD's initial implementation of IBPB did not clear the return address predictor. Beginning with Zen4, AMD's IBPB *does* clear the return address predictor. This behavior is enumerated by CPUID.80000008H:EBX.IBPB_RET[30]. Define X86_FEATURE_AMD_IBPB_RET for use in KVM_GET_SUPPORTED_CPUID, when determining cross-vendor capabilities. Suggested-by: Venkatesh Srinivas <venkateshs@chromium.org> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org>
2024-10-09x86/bugs: Use code segment selector for VERW operandPawan Gupta1-1/+10
Robert Gill reported below #GP in 32-bit mode when dosemu software was executing vm86() system call: general protection fault: 0000 [#1] PREEMPT SMP CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1 Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010 EIP: restore_all_switch_stack+0xbe/0xcf EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046 CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0 Call Trace: show_regs+0x70/0x78 die_addr+0x29/0x70 exc_general_protection+0x13c/0x348 exc_bounds+0x98/0x98 handle_exception+0x14d/0x14d exc_bounds+0x98/0x98 restore_all_switch_stack+0xbe/0xcf exc_bounds+0x98/0x98 restore_all_switch_stack+0xbe/0xcf This only happens in 32-bit mode when VERW based mitigations like MDS/RFDS are enabled. This is because segment registers with an arbitrary user value can result in #GP when executing VERW. Intel SDM vol. 2C documents the following behavior for VERW instruction: #GP(0) - If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user space. Use %cs selector to reference VERW operand. This ensures VERW will not #GP for an arbitrary user %ds. [ mingo: Fixed the SOB chain. ] Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition") Reported-by: Robert Gill <rtgill82@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com Cc: stable@vger.kernel.org # 5.10+ Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707 Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/ Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Suggested-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-10-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-2/+2
Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail - Make sure that the host's vector length is at capped by a value common to all CPUs - Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken - Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarly x86: - Fix compilation with KVM_INTEL=KVM_AMD=n - Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use Selftests: - Fix compilation on non-x86 architectures" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86/reboot: emergency callbacks are now registered by common KVM code KVM: x86: leave kvm.ko out of the build if no vendor module is requested KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU KVM: arm64: Fix kvm_has_feat*() handling of negative features KVM: selftests: Fix build on architectures other than x86_64 KVM: arm64: Another reviewer reshuffle KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
2024-10-06Merge tag 'kvmarm-fixes-6.12-1' of ↵Paolo Bonzini44-388/+327
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.12, take #1 - Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail - Make sure that the host's vector length is at capped by a value common to all CPUs - Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken - Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarly
2024-10-06x86/reboot: emergency callbacks are now registered by common KVM codePaolo Bonzini1-2/+2
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules. In practice this has no functional change, because CONFIG_KVM_X86_COMMON is set if and only if at least one vendor-specific module is being built. However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that are used in kvm.ko. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-03x86/ftrace: Include <asm/ptrace.h>Sami Tolvanen1-0/+2
<asm/ftrace.h> uses struct pt_regs in several places. Include <asm/ptrace.h> to ensure it's visible. This is needed to make sure object files that only include <asm/asm-prototypes.h> compile. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/20240916221557.846853-2-samitolvanen@google.com Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-30x86: kvm: fix build errorLinus Torvalds1-0/+2
The cpu_emergency_register_virt_callback() function is used unconditionally by the x86 kvm code, but it is declared (and defined) conditionally: #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD) void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback); ... leading to a build error when neither KVM_INTEL nor KVM_AMD support is enabled: arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’: arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration] 12517 | cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’: arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration] 12522 | cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix the build by defining empty helper functions the same way the old cpu_emergency_disable_virtualization() function was dealt with for the same situation. Maybe we could instead have made the call sites conditional, since the callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak fallback. I'll leave that to the kvm people to argue about, this at least gets the build going for that particular config. Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Kai Huang <kai.huang@intel.com> Cc: Chao Gao <chao.gao@intel.com> Cc: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-29Merge tag 'x86-urgent-2024-09-29' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix TDX MMIO #VE fault handling, and add two new Intel model numbers for 'Pantherlake' and 'Diamond Rapids'" * tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add two Intel CPU model numbers x86/tdx: Fix "in-kernel MMIO" check
2024-09-29Merge tag 'locking-urgent-2024-09-29' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "lockdep: - Fix potential deadlock between lockdep and RCU (Zhiguo Niu) - Use str_plural() to address Coccinelle warning (Thorsten Blum) - Add debuggability enhancement (Luis Claudio R. Goncalves) static keys & calls: - Fix static_key_slow_dec() yet again (Peter Zijlstra) - Handle module init failure correctly in static_call_del_module() (Thomas Gleixner) - Replace pointless WARN_ON() in static_call_module_notify() (Thomas Gleixner) <linux/cleanup.h>: - Add usage and style documentation (Dan Williams) rwsems: - Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS (Waiman Long) atomic ops, x86: - Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak) - Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros Bizjak)" Signed-off-by: Ingo Molnar <mingo@kernel.org> * tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS jump_label: Fix static_key_slow_dec() yet again static_call: Replace pointless WARN_ON() in static_call_module_notify() static_call: Handle module init failure correctly in static_call_del_module() locking/lockdep: Simplify character output in seq_line() lockdep: fix deadlock issue between lockdep and rcu lockdep: Use str_plural() to fix Coccinelle warning cleanup: Add usage and style documentation lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
2024-09-29Merge branch 'locking/core' into locking/urgent, to pick up pending commitsIngo Molnar1-4/+2
Merge all pending locking commits into a single branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-09-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-42/+94
Pull x86 kvm updates from Paolo Bonzini: "x86: - KVM currently invalidates the entirety of the page tables, not just those for the memslot being touched, when a memslot is moved or deleted. This does not traditionally have particularly noticeable overhead, but Intel's TDX will require the guest to re-accept private pages if they are dropped from the secure EPT, which is a non starter. Actually, the only reason why this is not already being done is a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs, so allow userspace to opt into the new behavior. - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon) - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work) - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit - Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the nested guest) - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU - Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding multi generation LRU support in KVM - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB - Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area - Remove unnecessary accounting of temporary nested VMCB related allocations - Set FINAL/PAGE in the page fault error code for EPT violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2 - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible) - Minor SGX fix and a cleanup - Misc cleanups Generic: - Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed - Enable virtualization when KVM is loaded, not right before the first VM is created Together with the previous change, this simplifies a lot the logic of the callbacks, because their very existence implies virtualization is enabled - Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase - Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs Selftests: - Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest - Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries - Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath - Misc cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) Documentation: KVM: fix warning in "make htmldocs" s390: Enable KVM_S390_UCONTROL config in debug_defconfig selftests: kvm: s390: Add VM run test case KVM: SVM: let alternatives handle the cases when RSB filling is required KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() KVM: x86: Update retry protection fields when forcing retry on emulation failure KVM: x86: Apply retry protection to "unprotect on failure" path KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn ...
2024-09-27Merge tag 'for-linus-6.12-rc1a-tag' of ↵Linus Torvalds1-1/+22
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: "A second round of Xen related changes and features: - a small fix of the xen-pciback driver for a warning issued by sparse - support PCI passthrough when using a PVH dom0 - enable loading the kernel in PVH mode at arbitrary addresses, avoiding conflicts with the memory map when running as a Xen dom0 using the host memory layout" * tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/pvh: Add 64bit relocation page tables x86/kernel: Move page table macros to header x86/pvh: Set phys_base when calling xen_prepare_pvh() x86/pvh: Make PVH entrypoint PIC for x86-64 xen: sync elfnote.h from xen tree xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t xen/privcmd: Add new syscall to get gsi from dev xen/pvh: Setup gsi for passthrough device xen/pci: Add a function to reset device for xen
2024-09-26x86/cpu: Add two Intel CPU model numbersTony Luck1-0/+5
Pantherlake is a mobile CPU. Diamond Rapids next generation Xeon. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240923173750.16874-1-tony.luck%40intel.com
2024-09-25x86/kernel: Move page table macros to headerJason Andryuk1-1/+22
The PVH entry point will need an additional set of prebuild page tables. Move the macros and defines to pgtable_64.h, so they can be re-used. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Message-ID: <20240823193630.2583107-5-jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-22Merge branch 'address-masking'Linus Torvalds1-0/+11
Merge user access fast validation using address masking. This allows architectures to optionally use a data dependent address masking model instead of a conditional branch for validating user accesses. That avoids the Spectre-v1 speculation barriers. Right now only x86-64 takes advantage of this, and not all architectures will be able to do it. It requires a guard region between the user and kernel address spaces (so that you can't overflow from one to the other), and an easy way to generate a guaranteed-to-fault address for invalid user pointers. Also note that this currently assumes that there is no difference between user read and write accesses. If extended to architectures like powerpc, we'll also need to separate out the user read-vs-write cases. * address-masking: x86: make the masked_user_access_begin() macro use its argument only once x86: do the user address masking outside the user access area x86: support user address masking instead of non-speculative conditional
2024-09-22x86: make the masked_user_access_begin() macro use its argument only onceLinus Torvalds1-2/+3
This doesn't actually matter for any of the current users, but before merging it mainline, make sure we don't have any surprising semantics. We don't actually want to use an inline function here, because we want to allow - but not require - const pointer arguments, and return them as such. But we already had a local auto-type variable, so let's just use it to avoid any possible double evaluation. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-21Merge tag 'mm-stable-2024-09-20-02-31' of ↵Linus Torvalds9-115/+118
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ...
2024-09-19Merge tag 'platform-drivers-x86-v6.12-1' of ↵Linus Torvalds3-69/+3
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers updates from Hans de Goede: - asus-wmi: Add support for vivobook fan profiles - dell-laptop: Add knobs to change battery charge settings - lg-laptop: Add operation region support - intel-uncore-freq: Add support for efficiency latency control - intel/ifs: Add SBAF test support - intel/pmc: Ignore all LTRs during suspend - platform/surface: Support for arm64 based Surface devices - wmi: Pass event data directly to legacy notify handlers - x86/platform/geode: switch GPIO buttons and LEDs to software properties - bunch of small cleanups, fixes, hw-id additions, etc. * tag 'platform-drivers-x86-v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (65 commits) MAINTAINERS: adjust file entry in INTEL MID PLATFORM platform/x86: x86-android-tablets: Adjust Xiaomi Pad 2 bottom bezel touch buttons LED platform/mellanox: mlxbf-pmc: fix lockdep warning platform/x86/amd: pmf: Add quirk for TUF Gaming A14 platform/x86: touchscreen_dmi: add nanote-next quirk platform/x86: asus-wmi: don't fail if platform_profile already registered platform/x86: asus-wmi: add debug print in more key places platform/x86: intel_scu_wdt: Move intel_scu_wdt.h to x86 subfolder platform/x86: intel_scu_ipc: Move intel_scu_ipc.h out of arch/x86/include/asm MAINTAINERS: Add Intel MID section platform/x86: panasonic-laptop: Add support for programmable buttons platform/olpc: Remove redundant null pointer checks in olpc_ec_setup_debugfs() platform/x86: intel/pmc: Ignore all LTRs during suspend platform/x86: wmi: Call both legacy and WMI driver notify handlers platform/x86: wmi: Merge get_event_data() with wmi_get_notify_data() platform/x86: wmi: Remove wmi_get_event_data() platform/x86: wmi: Pass event data directly to legacy notify handlers platform/x86: thinkpad_acpi: Fix uninitialized symbol 's' warning platform/x86: x86-android-tablets: Fix spelling in the comments platform/x86: ideapad-laptop: Make the scope_guard() clear of its scope ...
2024-09-19Merge tag 'for-linus-6.12-rc1-tag' of ↵Linus Torvalds2-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - fix a boot problem as a Xen dom0 on some AMD systems - fix Xen PVH boot problems with KASAN enabled - fix for a build warning - fixes to swiotlb-xen * tag 'for-linus-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/swiotlb: fix allocated size xen/swiotlb: add alignment check for dma buffers xen/pci: Avoid -Wflex-array-member-not-at-end warning xen/xenbus: Convert to use ERR_CAST() xen, pvh: fix unbootable VMs by inlining memset() in xen_prepare_pvh() x86/cpu: fix unbootable VMs by inlining memcmp() in hypervisor_cpuid_base() xen, pvh: fix unbootable VMs (PVH + KASAN - AMD_MEM_ENCRYPT) xen: tolerate ACPI NVS memory overlapping with Xen allocated memory xen: allow mapping ACPI data using a different physical address xen: add capability to remap non-RAM pages to different PFNs xen: move max_pfn in xen_memory_setup() out of function scope xen: move checks for e820 conflicts further up xen: introduce generic helper checking for memory map conflicts xen: use correct end address of kernel for conflict checking
2024-09-18Merge tag 'random-6.12-rc1-for-linus' of ↵Linus Torvalds3-16/+8
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "Originally I'd planned on sending each of the vDSO getrandom() architecture ports to their respective arch trees. But as we started to work on this, we found lots of interesting issues in the shared code and infrastructure, the fixes for which the various archs needed to base their work. So in the end, this turned into a nice collaborative effort fixing up issues and porting to 5 new architectures -- arm64, powerpc64, powerpc32, s390x, and loongarch64 -- with everybody pitching in and commenting on each other's code. It was a fun development cycle. This contains: - Numerous fixups to the vDSO selftest infrastructure, getting it running successfully on more platforms, and fixing bugs in it. - Additions to the vDSO getrandom & chacha selftests. Basically every time manual review unearthed a bug in a revision of an arch patch, or an ambiguity, the tests were augmented. By the time the last arch was submitted for review, s390x, v1 of the series was essentially fine right out of the gate. - Fixes to the the generic C implementation of vDSO getrandom, to build and run successfully on all archs, decoupling it from assumptions we had (unintentionally) made on x86_64 that didn't carry through to the other architectures. - Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by Huacai Chen. - Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked by Will Deacon. - Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit varieties, from Christophe Leroy and acked by Michael Ellerman. - Port of vDSO getrandom to S390X from Heiko Carstens, the arch maintainer. While it'd be natural for there to be things to fix up over the course of the development cycle, these patches got a decent amount of review from a fairly diverse crew of folks on the mailing lists, and, for the most part, they've been cooking in linux-next, which has been helpful for ironing out build issues. In terms of architectures, I think that mostly takes care of the important 64-bit archs with hardware still being produced and running production loads in settings where vDSO getrandom is likely to help. Arguably there's still RISC-V left, and we'll see for 6.13 whether they find it useful and submit a port" * tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits) selftests: vDSO: check cpu caps before running chacha test s390/vdso: Wire up getrandom() vdso implementation s390/vdso: Move vdso symbol handling to separate header file s390/vdso: Allow alternatives in vdso code s390/module: Provide find_section() helper s390/facility: Let test_facility() generate static branch if possible s390/alternatives: Remove ALT_FACILITY_EARLY s390/facility: Disable compile time optimization for decompressor code selftests: vDSO: fix vdso_config for s390 selftests: vDSO: fix ELF hash table entry size for s390x powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64 powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32 powerpc/vdso: Refactor CFLAGS for CVDSO build powerpc/vdso32: Add crtsavres mm: Define VM_DROPPABLE for powerpc/32 powerpc/vdso: Fix VDSO data access when running in a non-root time namespace selftests: vDSO: don't include generated headers for chacha test arm64: vDSO: Wire up getrandom() vDSO implementation arm64: alternative: make alternative_has_cap_likely() VDSO compatible selftests: vDSO: also test counter in vdso_test_chacha ...
2024-09-17Merge tag 'kvm-x86-vmx-6.12' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-0/+3
KVM VMX changes for 6.12: - Set FINAL/PAGE in the page fault error code for EPT Violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical. - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2. - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures. - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible). - Minor SGX fix and a cleanup.
2024-09-17Merge tag 'kvm-x86-svm-6.12' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-5/+15
KVM SVM changes for 6.12: - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB. - Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area. - Remove unnecessary accounting of temporary nested VMCB related allocations.
2024-09-17Merge tag 'kvm-x86-pat_vmx_msrs-6.12' of https://github.com/kvm-x86/linux ↵Paolo Bonzini2-24/+50
into HEAD KVM VMX and x86 PAT MSR macro cleanup for 6.12: - Add common defines for the x86 architectural memory types, i.e. the types that are shared across PAT, MTRRs, VMCSes, and EPTPs. - Clean up the various VMX MSR macros to make the code self-documenting (inasmuch as possible), and to make it less painful to add new macros.
2024-09-17Merge tag 'kvm-x86-mmu-6.12' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-5/+9
KVM x86 MMU changes for 6.12: - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop. - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU. - Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding MGLRU support in KVM. - Misc cleanups
2024-09-17Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini3-2/+6
KVM x86 misc changes for 6.12 - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon). - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs. This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work). - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value a the ICR offset. - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler. - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit. - Finally fix the RSM vs. nested VM-Enter WARN by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (architecturally, the SHUTDOWN is supposed to hit L1, not L2).
2024-09-17Merge branch 'kvm-redo-enable-virt' into HEADPaolo Bonzini3-5/+8
Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed. The primary motivation for this series is to simplify dealing with enabling virtualization for Intel's TDX, which needs to enable virtualization when kvm-intel.ko is loaded, i.e. long before the first VM is created. That said, this is a nice cleanup on its own. By registering the callbacks on-demand, the callbacks themselves don't need to check kvm_usage_count, because their very existence implies a non-zero count. Patch 1 (re)adds a dedicated lock for kvm_usage_count. This avoids a lock ordering issue between cpus_read_lock() and kvm_lock. The lock ordering issue still exist in very rare cases, and will be fixed for good by switching vm_list to an (S)RCU-protected list. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-17Merge branch 'kvm-memslot-zap-quirk' into HEADPaolo Bonzini2-1/+3
Today whenever a memslot is moved or deleted, KVM invalidates the entire page tables and generates fresh ones based on the new memslot layout. This behavior traditionally was kept because of a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs. It generally does not have a huge overhead, because the old MMU is able to reuse cached page tables and the new one is more scalabale and can resolve EPT violations/nested page faults in parallel, but it has worse performance if the guest frequently deletes and adds small memslots, and it's entirely not viable for TDX. This is because TDX requires re-accepting of private pages after page dropping. For non-TDX VMs, this series therefore introduces the KVM_X86_QUIRK_SLOT_ZAP_ALL quirk, enabling users to control the behavior of memslot zapping when a memslot is moved/deleted. The quirk is turned on by default, leading to the zapping of all SPTEs when a memslot is moved/deleted; users however have the option to turn off the quirk, which limits the zapping only to those SPTEs hat lie within the range of memslot being moved/deleted. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-17Merge tag 'x86-misc-2024-09-17' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Thomas Gleixner: - Rework kcpuid to handle the the autogenerated CSV file correctly and update the CSV file to cover the whole zoo of CPUID. - Avoid memcpy() for ia32 syscall_get_arguments() and use direct assignments as fortified memcpy() is unhappy about writing/reading beyond the end of the addresses destination/source struct member - A few new PCI IDs for AMD - Update MAINTAINERS to cover x86 specific selftests * tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Add selftests/x86 entry x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments() MAINTAINERS: Add x86 cpuid database entry tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file tools/x86/kcpuid: Parse subleaf ranges if provided tools/x86/kcpuid: Recognize all leaves with subleaves tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace tools/x86/kcpuid: Protect against faulty "max subleaf" values tools/x86/kcpuid: Set max possible subleaves count to 64 tools/x86/kcpuid: Properly align long-description columns tools/x86/kcpuid: Remove unused variable x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
2024-09-17Merge tag 'x86-platform-2024-09-17' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform update from Thomas Gleixner: "Remove a stale declaration from the UV platform code" * tag 'x86-platform-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Remove unused declaration uv_irq_2_mmr_info()
2024-09-17Merge tag 'x86-mm-2024-09-17' of ↵Linus Torvalds4-9/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 memory management updates from Thomas Gleixner: - Make LAM enablement safe vs. kernel threads using a process mm temporarily as switching back to the process would not update CR3 and therefore not enable LAM causing faults in user space when using tagged pointers. Cure it by synchronizing LAM enablement via IPIs to all CPUs which use the related mm. - Cure a LAM harmless inconsistency between CR3 and the state during context switch. It's both confusing and prone to lead to real bugs - Handle alt stack handling for threads which run with a non-zero protection key. The non-zero key prevents the kernel to access the alternate stack. Cure it by temporarily enabling all protection keys for the alternate stack setup/restore operations. - Provide a EFI config table identity mapping for kexec kernel to prevent kexec fails because the new kernel cannot access the config table array - Use GB pages only when a full GB is mapped in the identity map as otherwise the CPU can speculate into reserved areas after the end of memory which causes malfunction on UV systems. - Remove the noisy and pointless SRAT table dump during boot - Use is_ioremap_addr() for iounmap() address range checks instead of high_memory. is_ioremap_addr() is more precise. * tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioremap: Improve iounmap() address range checks x86/mm: Remove duplicate check from build_cr3() x86/mm: Remove unused NX related declarations x86/mm: Remove unused CR3_HW_ASID_BITS x86/mm: Don't print out SRAT table information x86/mm/ident_map: Use gbpages only where full GB page should be mapped. x86/kexec: Add EFI config table identity mapping for kexec kernel selftests/mm: Add new testcases for pkeys x86/pkeys: Restore altstack access in sigreturn() x86/pkeys: Update PKRU to enable all pkeys before XSAVE x86/pkeys: Add helper functions to update PKRU on the sigframe x86/pkeys: Add PKRU as a parameter in signal handling functions x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking x86/mm: Fix LAM inconsistency during context switch x86/mm: Use IPIs to synchronize LAM enablement
2024-09-17Merge tag 'x86-fred-2024-09-17' of ↵Linus Torvalds5-23/+47
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FRED updates from Thomas Gleixner: - Enable FRED right after init_mem_mapping() because at that point the early IDT fault handler is replaced by the real fault handler. The real fault handler retrieves the faulting address from the stack frame and not from CR2 when the FRED feature is set. But that obviously only works when FRED is enabled in the CPU as well. - Set SS to __KERNEL_DS when enabling FRED to prevent a corner case where ERETS can observe a SS mismatch and raises a #GP. * tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Set FRED RSP0 on return to userspace instead of context switch x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism x86/entry: Test ti_work for zero before processing individual bits x86/fred: Set SS to __KERNEL_DS when enabling FRED x86/fred: Enable FRED right after init_mem_mapping() x86/fred: Move FRED RSP initialization into a separate function x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param()
2024-09-17Merge tag 'x86-fpu-2024-09-17' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Thomas Gleixner: "Provide FPU buffer layout in core dumps: Debuggers have guess the FPU buffer layout in core dumps, which is error prone. This is because AMD and Intel layouts differ. To avoid buggy heuristics add a ELF section which describes the buffer layout which can be retrieved by tools" * tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/elf: Add a new FPU buffer layout info to x86 core files
2024-09-17Merge tag 'x86-core-2024-09-17' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core update from Thomas Gleixner: "Enable UBSAN traps for x86, which provides better reporting through metadata encodeded into UD1" * tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/traps: Enable UBSAN traps on x86
2024-09-17Merge tag 'x86-apic-2024-09-17' of ↵Linus Torvalds2-31/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Thomas Gleixner: - Handle an allocation failure in the IO/APIC code gracefully instead of crashing the machine. - Remove support for APIC local destination mode on 64bit Logical destination mode of the local APIC is used for systems with up to 8 CPUs. It has an advantage over physical destination mode as it allows to target multiple CPUs at once with IPIs. That advantage was definitely worth it when systems with up to 8 CPUs were state of the art for servers and workstations, but that's history. In the recent past there were quite some reports of new laptops failing to boot with logical destination mode, but they work fine with physical destination mode. That's not a suprise because physical destination mode is guaranteed to work as it's the only way to get a CPU up and running via the INIT/INIT/STARTUP sequence. Some of the affected systems were cured by BIOS updates, but not all OEMs provide them. As the number of CPUs keep increasing, logical destination mode becomes less used and the benefit for small systems, like laptops, is not really worth the trouble. So just remove logical destination mode support for 64bit and be done with it. - Code and comment cleanups in the APIC area. * tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Fix comment on IRQ vector layout x86/apic: Remove unused extern declarations x86/apic: Remove logical destination mode for 64-bit x86/apic: Remove unused inline function apic_set_eoi_cb() x86/ioapic: Cleanup remaining coding style issues x86/ioapic: Cleanup line breaks x86/ioapic: Cleanup bracket usage x86/ioapic: Cleanup comments x86/ioapic: Move replace_pin_at_irq_node() to the call site iommu/vt-d: Cleanup apic_printk() x86/mpparse: Cleanup apic_printk()s x86/ioapic: Cleanup guarded debug printk()s x86/ioapic: Cleanup apic_printk()s x86/apic: Cleanup apic_printk()s x86/apic: Provide apic_printk() helpers x86/ioapic: Use guard() for locking where applicable x86/ioapic: Cleanup structs x86/ioapic: Mark mp_alloc_timer_irq() __init x86/ioapic: Handle allocation failures gracefully
2024-09-17Merge tag 'x86-cleanups-2024-09-17' of ↵Linus Torvalds2-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Thomas Gleixner: "A set of cleanups across x86: - Use memremap() for the EISA probe instead of ioremap(). EISA is strictly memory and not MMIO - Cleanups and enhancement all over the place" * tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/EISA: Dereference memory directly instead of using readl() x86/extable: Remove unused declaration fixup_bug() x86/boot/64: Strip percpu address space when setting up GDT descriptors x86/cpu: Clarify the error message when BIOS does not support SGX x86/kexec: Add comments around swap_pages() assembly to improve readability x86/kexec: Fix a comment of swap_pages() assembly x86/sgx: Fix a W=1 build warning in function comment x86/EISA: Use memremap() to probe for the EISA BIOS signature x86/mtrr: Remove obsolete declaration for mtrr_bp_restore() x86/cpu_entry_area: Annotate percpu_setup_exception_stacks() as __init
2024-09-17Merge tag 'x86-build-2024-09-17' of ↵Linus Torvalds2-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Thomas Gleixner: "Updates for KCOV instrumentation on x86: - Prevent spurious KCOV coverage in common_interrupt() - Fixup the KCOV Makefile directive which got stale due to a source file rename - Exclude stack unwinding from KCOV as it creates large amounts of uninteresting coverage - Provide a self test to validate that KCOV coverage of the interrupt handling code starts not before preempt count got updated" * tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Ignore stack unwinding in KCOV module: Fix KCOV-ignored file name kcov: Add interrupt handling self test x86/entry: Remove unwanted instrumentation in common_interrupt()
2024-09-17mm/x86: support large pfn mappingsPeter Xu1-28/+52
Helpers to install and detect special pmd/pud entries. In short, bit 9 on x86 is not used for pmd/pud, so we can directly define them the same as the pte level. One note is that it's also used in _PAGE_BIT_CPA_TEST but that is only used in the debug test, and shouldn't conflict in this case. One note is that pxx_set|clear_flags() for pmd/pud will need to be moved upper so that they can be referenced by the new special bit helpers. There's no change in the code that was moved. Link: https://lkml.kernel.org/r/20240826204353.2228736-18-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-16Merge tag 'pm-6.12-rc1' of ↵Linus Torvalds2-3/+13
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "By the number of new lines of code, the most visible change here is the addition of hybrid CPU capacity scaling support to the intel_pstate driver. Next are the amd-pstate driver changes related to the calculation of the AMD boost numerator and preferred core detection. As far as new hardware support is concerned, the intel_idle driver will now handle Granite Rapids Xeon processors natively, the intel_rapl power capping driver will recognize family 1Ah of AMD processors and Intel ArrowLake-U chipos, and intel_pstate will handle Granite Rapids and Sierra Forest chips in the out-of-band (OOB) mode. Apart from the above, there is a usual collection of assorted fixes and code cleanups in many places and there are tooling updates. Specifics: - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef) - Add support for Granite Rapids and Sierra Forest in OOB mode to the intel_pstate cpufreq driver (Srinivas Pandruvada) - Add basic support for CPU capacity scaling on x86 and make the intel_pstate driver set asymmetric CPU capacity on hybrid systems without SMT (Rafael Wysocki) - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq driver (Jeff Johnson) - Several OF related cleanups in cpufreq drivers (Rob Herring) - Enable COMPILE_TEST for ARM drivers (Rob Herrring) - Introduce quirks for syscon failures and use socinfo to get revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon) - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay Ugwekar) - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers (Danila Tikhonov, Huacai Chen, and Liu Jing) - Make amd-pstate validate return of any attempt to update EPP limits, which fixes the masking hardware problems (Mario Limonciello) - Move the calculation of the AMD boost numerator outside of amd-pstate, correcting acpi-cpufreq on systems with preferred cores (Mario Limonciello) - Harden preferred core detection in amd-pstate to avoid potential false positives (Mario Limonciello) - Add extra unit test coverage for mode state machine (Mario Limonciello) - Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang Liu) - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy) - Disable promotion to C1E on Jasper Lake and Elkhart Lake in intel_idle (Kai-Heng Feng) - Use scoped device node handling to fix missing of_node_put() and simplify walking OF children in the riscv-sbi cpuidle driver (Krzysztof Kozlowski) - Remove dead code from cpuidle_enter_state() (Dhruva Gole) - Change an error pointer to NULL to fix error handling in the intel_rapl power capping driver (Dan Carpenter) - Fix off by one in get_rpi() in the intel_rapl power capping driver (Dan Carpenter) - Add support for ArrowLake-U to the intel_rapl power capping driver (Sumeet Pawnikar) - Fix the energy-pkg event for AMD CPUs in the intel_rapl power capping driver (Dhananjay Ugwekar) - Add support for AMD family 1Ah processors to the intel_rapl power capping driver (Dhananjay Ugwekar) - Remove unused stub for saveable_highmem_page() and remove deprecated macros from power management documentation (Andy Shevchenko) - Use ysfs_emit() and sysfs_emit_at() in "show" functions in the PM sysfs interface (Xueqin Luo) - Update the maintainers information for the operating-points-v2-ti-cpu DT binding (Dhruva Gole) - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring) - Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff Johnson) - Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand Moon) - Use of_property_present() instead of of_get_property() in the imx-bus devfreq driver (Rob Herring) - Update directory handling and installation process in the pm-graph Makefile and add .gitignore to ignore sleepgraph.py artifacts to pm-graph (Amit Vadhavana, Yo-Jung Lin) - Make cpupower display residency value in idle-info (Aboorva Devarajan) - Add missing powercap_set_enabled() stub function to cpupower (John B. Wyatt IV) - Add SWIG support to cpupower (John B. Wyatt IV)" * tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits) cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue cpufreq/amd-pstate-ut: Add test case for mode switches cpufreq/amd-pstate: Export symbols for changing modes amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking` cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore` cpufreq: amd-pstate: Optimize amd_pstate_update_limits() cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator() x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator() x86/amd: Move amd_get_highest_perf() out of amd-pstate ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn ACPI: CPPC: Drop check for non zero perf ratio x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator() ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c PM: hibernate: Remove unused stub for saveable_highmem_page() pm:cpupower: Add error warning when SWIG is not installed MAINTAINERS: Add Maintainers for SWIG Python bindings pm:cpupower: Include test_raw_pylibcpupower.py pm:cpupower: Add SWIG bindings files for libcpupower pm:cpupower: Add missing powercap_set_enabled() stub function ...
2024-09-16Merge tag 'x86_cpu_for_v6.12_rc1' of ↵Linus Torvalds2-103/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Add the final conversions to the new Intel VFM CPU model matching macros which include the vendor and finally drop the old ones which hardcode family 6 * tag 'x86_cpu_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/vfm: Delete all the *_FAM6_ CPU #defines x86/cpu/vfm: Delete X86_MATCH_INTEL_FAM6_MODEL[_STEPPING]() macros extcon: axp288: Switch to new Intel CPU model defines x86/cpu/intel: Replace PAT erratum model/family magic numbers with symbolic IFM references
2024-09-16Merge tag 'x86_sev_for_v6.12_rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - A bunch of cleanups to the sev-guest driver. All in preparation for future SEV work * tag 'x86_sev_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt: sev-guest: Ensure the SNP guest messages do not exceed a page virt: sev-guest: Fix user-visible strings virt: sev-guest: Rename local guest message variables virt: sev-guest: Replace dev_dbg() with pr_debug()
2024-09-16Merge tag 'ras_core_for_v6.12_rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Reorganize the struct mce populating functions so that MCA errors reported through BIOS' BERT method can report the correct CPU number the error has been detected on * tag 'ras_core_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error() x86/mce: Define mce_prep_record() helpers for common and per-CPU fields x86/mce: Rename mce_setup() to mce_prep_record()
2024-09-13random: vDSO: minimize and simplify header includesChristophe Leroy1-0/+1
Depending on the architecture, building a 32-bit vDSO on a 64-bit kernel is problematic when some system headers are included. Minimise the amount of headers by moving needed items, such as __{get,put}_unaligned_t, into dedicated common headers and in general use more specific headers, similar to what was done in commit 8165b57bca21 ("linux/const.h: Extract common header for vDSO") and commit 8c59ab839f52 ("lib/vdso: Enable common headers"). On some architectures this results in missing PAGE_SIZE, as was described by commit 8b3843ae3634 ("vdso/datapage: Quick fix - use asm/page-def.h for ARM64"), so define this if necessary, in the same way as done prior by commit cffaefd15a8f ("vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h"). Removing linux/time64.h leads to missing 'struct timespec64' in x86's asm/pvclock.h. Add a forward declaration of that struct in that file. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13random: vDSO: add __arch_get_k_vdso_rng_data() helper for data page accessChristophe Leroy1-3/+7
_vdso_data is specific to x86 and __arch_get_k_vdso_data() is provided so that all architectures can provide the requested pointer. Do the same with _vdso_rng_data, provide __arch_get_k_vdso_rng_data() and don't use x86 _vdso_rng_data directly. Until now vdso/vsyscall.h was only included by time/vsyscall.c but now it will also be included in char/random.c, leading to a duplicate declaration of _vdso_data and _vdso_rng_data. To fix this issue, move the declaration in a C file. vma.c looks like the most appropriate candidate. We don't need to replace the definitions in vsyscall.h by declarations as declarations are already in asm/vvar.h. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13random: vDSO: move prototype of arch chacha function to vdso/getrandom.hJason A. Donenfeld1-13/+0
Having the prototype for __arch_chacha20_blocks_nostack in arch/x86/include/asm/vdso/getrandom.h meant that the prototype and large doc comment were cloned by every architecture, which has been causing unnecessary churn. Instead move it into include/vdso/getrandom.h, where it can be shared by all archs implementing it. As a side bonus, this then lets us use that prototype in the vdso_test_chacha self test, to ensure that it matches the source, and indeed doing so turned up some inconsistencies, which are rectified here. Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-12x86/cpu: fix unbootable VMs by inlining memcmp() in hypervisor_cpuid_base()Alexey Dobriyan1-1/+6
If this memcmp() is not inlined then PVH early boot code can call into KASAN-instrumented memcmp() which results in unbootable VMs: pvh_start_xen xen_prepare_pvh xen_cpuid_base hypervisor_cpuid_base memcmp Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-2-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12xen: allow mapping ACPI data using a different physical addressJuergen Gross1-0/+8
When running as a Xen PV dom0 the system needs to map ACPI data of the host using host physical addresses, while those addresses can conflict with the guest physical addresses of the loaded linux kernel. The same problem might apply in case a PV guest is configured to use the host memory map. This conflict can be solved by mapping the ACPI data to a different guest physical address, but mapping the data via acpi_os_ioremap() must still be possible using the host physical address, as this address might be generated by AML when referencing some of the ACPI data. When configured to support running as a Xen PV domain, have an implementation of acpi_os_ioremap() being aware of the possibility to need above mentioned translation of a host physical address to the guest physical address. This modification requires to #include linux/acpi.h in some sources which need to include asm/acpi.h directly. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>