summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2026-01-14vdso: Remove struct getcpu_cacheThomas Weißschuh5-14/+7
The cache parameter of getcpu() is useless nowadays for various reasons. * It is never passed by userspace for either the vDSO or syscalls. * It is never used by the kernel. * It could not be made to work on the current vDSO architecture. * The structure definition is not part of the UAPI headers. * vdso_getcpu() is superseded by restartable sequences in any case. Remove the struct and its header. As a side-effect this gets rid of an unwanted inclusion of the linux/ header namespace from vDSO code. [ tglx: Adapt to s390 upstream changes */ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Link: https://patch.msgid.link/20251230-getcpu_cache-v3-1-fb9c5f880ebe@linutronix.de
2026-01-14Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds1-4/+2
Pull bpf fixes from Alexei Starovoitov: - Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK in riscv JIT (Menglong Dong) - Fix reference count leak in bpf_prog_test_run_xdp() (Tetsuo Handa) - Fix metadata size check in bpf_test_run() (Toke Høiland-Jørgensen) - Check that BPF insn array is not allowed as a map for const strings (Deepanshu Kartikey) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix reference count leak in bpf_prog_test_run_xdp() bpf: Reject BPF_MAP_TYPE_INSN_ARRAY in check_reg_const_str() selftests/bpf: Update xdp_context_test_run test to check maximum metadata size bpf, test_run: Subtract size of xdp_frame from allowed metadata size riscv, bpf: Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK
2026-01-14KVM: SVM: Assert that Hyper-V's HV_SVM_EXITCODE_ENL == SVM_EXIT_SWSean Christopherson1-0/+6
Add a build-time assertiont that Hyper-V's "enlightened" exit code is that, same as the AMD-defined "Reserved for Host" exit code, mostly to help readers connect the dots and understand why synthesizing a software-defined exit code is safe/ok. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://patch.msgid.link/20251230211347.4099600-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Harden exit_code against being used in Spectre-like attacksSean Christopherson1-0/+1
Explicitly clamp the exit code used to index KVM's exit handlers to guard against Spectre-like attacks, mainly to provide consistency between VMX and SVM (VMX was given the same treatment by commit c926f2f7230b ("KVM: x86: Protect exit_reason from being used in Spectre-v1/L1TF attacks"). For normal VMs, it's _extremely_ unlikely the exit code could be used to exploit a speculation vulnerability, as the exit code is set by hardware and unexpected/unknown exit codes should be quite well bounded (as is/was the case with VMX). But with SEV-ES+, the exit code is guest-controlled as it comes from the GHCB, not from hardware, i.e. an attack from the guest is at least somewhat plausible. Irrespective of SEV-ES+, hardening KVM is easy and inexpensive, and such an attack is theoretically possible. Link: https://patch.msgid.link/20251230211347.4099600-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Limit incorrect check on SVM_EXIT_ERR to running as a VMSean Christopherson1-1/+4
Limit KVM's incorrect check for VMXEXIT_INVALID, a.k.a. SVM_EXIT_ERR, to running as a VM, as detected by X86_FEATURE_HYPERVISOR. The exit_code and all failure codes, e.g. VMXEXIT_INVALID, are 64-bit values, and so checking only bits 31:0 could result in false positives when running on non-broken hardware, e.g. in the extremely unlikely scenario exit code 0xffffffffull is ever generated by hardware. Keep the 32-bit check to play nice with running on broken KVM (for years, KVM has not set bits 63:32 when synthesizing nested SVM VM-Exits). Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Treat exit_code as an unsigned 64-bit value through all of KVMSean Christopherson8-64/+38
Fix KVM's long-standing buggy handling of SVM's exit_code as a 32-bit value. Per the APM and Xen commit d1bd157fbc ("Big merge the HVM full-virtualisation abstractions.") (which is arguably more trustworthy than KVM), offset 0x70 is a single 64-bit value: 070h 63:0 EXITCODE Track exit_code as a single u64 to prevent reintroducing bugs where KVM neglects to correctly set bits 63:32. Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface") Cc: Jim Mattson <jmattson@google.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Filter out 64-bit exit codes when invoking exit handlers on bare metalSean Christopherson1-2/+16
Explicitly filter out 64-bit exit codes when invoking exit handlers, as svm_exit_handlers[] will never be sized with entries that use bits 63:32. Processing the non-failing exit code as a 32-bit value will allow tracking exit_code as a single 64-bit value (which it is, architecturally). This will also allow hardening KVM against Spectre-like attacks without needing to do silly things to avoid build failures on 32-bit kernels (array_index_nospec() rightly asserts that the index fits in an "unsigned long"). Omit the check when running as a VM, as KVM has historically failed to set bits 63:32 appropriately when synthesizing VM-Exits, i.e. KVM could get false positives when running as a VM on an older, broken KVM/kernel. From a functional perspective, omitting the check is "fine", as any unwanted collision between e.g. VMEXIT_INVALID and a 32-bit exit code will be fatal to KVM-on-KVM regardless of what KVM-as-L1 does. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Check for an unexpected VM-Exit after RETPOLINE "fast" handlingSean Christopherson1-6/+6
Check for an unexpected/unhandled VM-Exit after the manual RETPOLINE=y handling. The entire point of the RETPOLINE checks is to optimize for common VM-Exits, i.e. checking for the rare case of an unsupported VM-Exit is counter-productive. This also aligns SVM and VMX exit handling. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Open code handling of unexpected exits in svm_invoke_exit_handler()Sean Christopherson1-15/+10
Fold svm_check_exit_valid() and svm_handle_invalid_exit() into their sole caller, svm_invoke_exit_handler(), as having tiny single-use helpers makes the code unncessarily difficult to follow. This will also allow for additional cleanups in svm_invoke_exit_handler(). No functional change intended. Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://patch.msgid.link/20251230211347.4099600-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Add a helper to detect VMRUN failuresSean Christopherson3-11/+14
Add a helper to detect VMRUN failures so that KVM can guard against its own long-standing bug, where KVM neglects to set exitcode[63:32] when synthesizing a nested VMFAIL_INVALID VM-Exit. This will allow fixing KVM's mess of treating exitcode as two separate 32-bit values without breaking KVM-on-KVM when running on an older, unfixed KVM. Cc: Jim Mattson <jmattson@google.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: x86: align the code with kvm_x86_call()Jun Miao1-1/+1
The use of static_call_cond() is essentially the same as static_call() on x86 (e.g. static_call() now handles a NULL pointer as a NOP), and then the kvm_x86_call() is added to improve code readability and maintainability for keeping consistent code style. Fixes 8d032b683c29 ("KVM: TDX: create/destroy VM structure") Link: https://lore.kernel.org/all/3916caa1dcd114301a49beafa5030eca396745c1.1679456900.git.jpoimboe@kernel.org/ Link: https://lore.kernel.org/r/20240507133103.15052-3-wei.w.wang@intel.com Signed-off-by: Jun Miao <jun.miao@intel.com> Link: https://patch.msgid.link/20260105065423.1870622-1-jun.miao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: x86: Ignore -EBUSY when checking nested events from vcpu_block()Sean Christopherson1-2/+1
Ignore -EBUSY when checking nested events after exiting a blocking state while L2 is active, as exiting to userspace will generate a spurious userspace exit, usually with KVM_EXIT_UNKNOWN, and likely lead to the VM's demise. Continuing with the wakeup isn't perfect either, as *something* has gone sideways if a vCPU is awakened in L2 with an injected event (or worse, a nested run pending), but continuing on gives the VM a decent chance of surviving without any major side effects. As explained in the Fixes commits, it _should_ be impossible for a vCPU to be put into a blocking state with an already-injected event (exception, IRQ, or NMI). Unfortunately, userspace can stuff MP_STATE and/or injected events, and thus put the vCPU into what should be an impossible state. Don't bother trying to preserve the WARN, e.g. with an anti-syzkaller Kconfig, as WARNs can (hopefully) be added in paths where _KVM_ would be violating x86 architecture, e.g. by WARNing if KVM attempts to inject an exception or interrupt while the vCPU isn't running. Cc: Alessandro Ratti <alessandro@0x65c.net> Cc: stable@vger.kernel.org Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Fixes: 45405155d876 ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject") Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10d4261a580000 Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671bc7a7.050a0220.455e8.022a.GAE@google.com Link: https://patch.msgid.link/20260109030657.994759-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Tag sev_supported_vmsa_features as read-only after initSean Christopherson1-2/+2
Tag sev_supported_vmsa_features with __ro_after_init as it's configured by sev_hardware_setup() and never written after initial configuration (and if it were, that'd be a blatant bug). Opportunistically relocate the variable out of the module params area now that sev_es_debug_swap_enabled is gone (which largely motivated its original location). Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/20260109033101.1005769-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: SVM: Drop the module param to control SEV-ES DebugSwapSean Christopherson1-8/+3
Rip out the DebugSwap module param, as the sequence of events that led to its inclusion was one big mistake, the param no longer serves any purpose. Commit d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES") goofed by not adding a way for the userspace VMM to control the feature. Functionally, that was fine, but it broke attestation signatures because SEV_FEATURES are included in the signature. Commit 5abf6dceb066 ("SEV: disable SEV-ES DebugSwap by default") fixed that issue, but the underlying flaw of userspace not having a way to control SEV_FEATURES was still there. That flaw was addressed by commit 4f5defae7089 ("KVM: SEV: introduce KVM_SEV_INIT2 operation"), and so then 4dd5ecacb9a4 ("KVM: SEV: allow SEV-ES DebugSwap again") re-enabled DebugSwap by default. Now that the dust has settled, the module param doesn't serve any meaningful purpose. Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/20260109033101.1005769-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: x86: Update APICv ISR (a.k.a. SVI) as part of kvm_apic_update_apicv()Sean Christopherson3-27/+12
Fold the calls to .hwapic_isr_update() in kvm_apic_set_state(), kvm_lapic_reset(), and __kvm_vcpu_update_apicv() into kvm_apic_update_apicv(), as updating SVI is directly related to updating KVM's own cache of ISR information, e.g. SVI is more or less the APICv equivalent of highest_isr_cache. Note, calling .hwapic_isr_update() during kvm_apic_update_apicv() has benign side effects, as doing so changes the orders of the calls in kvm_lapic_reset() and kvm_apic_set_state(), specifically with respect to to the order between .hwapic_isr_update() and .apicv_post_state_restore(). However, the changes in ordering are glorified nops as the former hook is VMX-only and the latter is SVM-only. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: nVMX: Switch to vmcs01 to set virtual APICv mode on-demand if L2 is activeSean Christopherson3-13/+11
If L1's virtual APIC mode changes while L2 is active, e.g. because L1 doesn't intercept writes to the APIC_BASE MSR and L2 changes the mode, temporarily load vmcs01 and do all of the necessary actions instead of deferring the update until the next nested VM-Exit. This will help in fixing yet more issues related to updates while L2 is active, e.g. KVM neglects to update vmcs02 MSR intercepts if vmcs01's MSR intercepts are modified while L2 is active. Not updating x2APIC MSRs is benign because vmcs01's settings are not factored into vmcs02's bitmap, but deferring the x2APIC MSR updates would create a weird, inconsistent state. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: nVMX: Switch to vmcs01 to update APIC page on-demand if L2 is activeSean Christopherson3-11/+2
If the KVM-owned APIC-access page is migrated while L2 is running, temporarily load vmcs01 and immediately update APIC_ACCESS_ADDR instead of deferring the update until the next nested VM-Exit. Once changing the virtual APIC mode is converted to always do on-demand updates, all of the "defer until vmcs01 is active" logic will be gone. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: nVMX: Switch to vmcs01 to refresh APICv controls on-demand if L2 is activeSean Christopherson3-10/+1
If APICv is (un)inhibited while L2 is running, temporarily load vmcs01 and immediately refresh the APICv controls in vmcs01 instead of deferring the update until the next nested VM-Exit. This all but eliminates potential ordering issues due to vmcs01 not being synchronized with kvm_lapic.apicv_active, e.g. where KVM _thinks_ it refreshed APICv, but vmcs01 still contains stale state. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is activeSean Christopherson3-18/+7
If APICv is activated while L2 is running and triggers an SVI update, temporarily load vmcs01 and immediately update SVI instead of deferring the update until the next nested VM-Exit. This will eventually allow killing off kvm_apic_update_hwapic_isr(), and all of nVMX's deferred APICv updates. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: nVMX: Switch to vmcs01 to update TPR threshold on-demand if L2 is activeSean Christopherson3-11/+3
If KVM updates L1's TPR Threshold while L2 is active, temporarily load vmcs01 and immediately update TPR_THRESHOLD instead of deferring the update until the next nested VM-Exit. Deferring the TPR Threshold update is relatively straightforward, but for several APICv related updates, deferring updates creates ordering and state consistency problems, e.g. KVM at-large thinks APICv is enabled, but vmcs01 is still running with stale (and effectively unknown) state. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14KVM: nVMX: Switch to vmcs01 to update PML controls on-demand if L2 is activeSean Christopherson3-10/+36
If KVM toggles "CPU dirty logging", a.k.a. Page-Modification Logging (PML), while L2 is active, temporarily load vmcs01 and immediately update the relevant controls instead of deferring the update until the next nested VM-Exit. For PML, deferring the update is relatively straightforward, but for several APICv related updates, deferring updates creates ordering and state consistency problems, e.g. KVM at-large thinks APICv is enabled, but vmcs01 is still running with stale (and effectively unknown) state. Convert PML first precisely because it's the simplest case to handle: if something is broken with the vmcs01 <=> vmcs02 dance, then hopefully bugs will bisect here. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14x86/entry/vdso32: When using int $0x80, use it directlyH. Peter Anvin2-5/+17
When neither sysenter32 nor syscall32 is available (on either FRED-capable 64-bit hardware or old 32-bit hardware), there is no reason to do a bunch of stack shuffling in __kernel_vsyscall. Unfortunately, just overwriting the initial "push" instructions will mess up the CFI annotations, so suffer the 3-byte NOP if not applicable. Similarly, inline the int $0x80 when doing inline system calls in the vdso instead of calling __kernel_vsyscall. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-11-hpa@zytor.com
2026-01-14x86/cpufeature: Replace X86_FEATURE_SYSENTER32 with X86_FEATURE_SYSFAST32H. Peter Anvin11-31/+42
In most cases, the use of "fast 32-bit system call" depends either on X86_FEATURE_SEP or X86_FEATURE_SYSENTER32 || X86_FEATURE_SYSCALL32. However, nearly all the logic for both is identical. Define X86_FEATURE_SYSFAST32 which indicates that *either* SYSENTER32 or SYSCALL32 should be used, for either 32- or 64-bit kernels. This defaults to SYSENTER; use SYSCALL if the SYSCALL32 bit is also set. As this removes ALL existing uses of X86_FEATURE_SYSENTER32, which is a kernel-only synthetic feature bit, simply remove it and replace it with X86_FEATURE_SYSFAST32. This leaves an unused alternative for a true 32-bit kernel, but that should really not matter in any way. The clearing of X86_FEATURE_SYSCALL32 can be removed once the patches for automatically clearing disabled features has been merged. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-10-hpa@zytor.com
2026-01-14x86/vdso: Abstract out vdso system call internalsH. Peter Anvin2-100/+111
Abstract out the calling of true system calls from the vdso into macros. It has been a very long time since gcc did not allow %ebx or %ebp in inline asm in 32-bit PIC mode; remove the corresponding hacks. Remove the use of memory output constraints in gettimeofday.h in favor of "memory" clobbers. The resulting code is identical for the current use cases, as the system call is usually a terminal fallback anyway, and it merely complicates the macroization. This patch adds only a handful of more lines of code than it removes, and in fact could be made substantially smaller by removing the macros for the argument counts that aren't currently used, however, it seems better to be general from the start. [ v3: remove stray comment from prototyping; remove VDSO_SYSCALL6() since it would require special handling on 32 bits and is currently unused. (Uros Biszjak) Indent nested preprocessor directives. ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Uros Bizjak <ubizjak@gmail.com> Link: https://patch.msgid.link/20251216212606.1325678-9-hpa@zytor.com
2026-01-14x86/entry/vdso: Include GNU_PROPERTY and GNU_STACK PHDRsH. Peter Anvin1-15/+23
Currently the vdso doesn't include .note.gnu.property or a GNU noexec stack annotation (the -z noexecstack in the linker script is ineffective because we specify PHDRs explicitly.) The motivation is that the dynamic linker currently do not check these. However, this is a weak excuse: the vdso*.so are also supposed to be usable at link libraries, and there is no reason why the dynamic linker might not want or need to check these in the future, so add them back in -- it is trivial enough. Use symbolic constants for the PHDR permission flags. [ v4: drop unrelated formatting changes ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-8-hpa@zytor.com
2026-01-14x86/entry/vdso32: Remove open-coded DWARF in sigreturn.SH. Peter Anvin3-114/+39
The vdso32 sigreturn.S contains open-coded DWARF bytecode, which includes a hack for gdb to not try to step back to a previous call instruction when backtracing from a signal handler. Neither of those are necessary anymore: the backtracing issue is handled by ".cfi_entry simple" and ".cfi_signal_frame", both of which have been supported for a very long time now, which allows the remaining frame to be built using regular .cfi annotations. Add a few more register offsets to the signal frame just for good measure. Replace the nop on fallthrough of the system call (which should never, ever happen) with a ud2a trap. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-7-hpa@zytor.com
2026-01-14x86/entry/vdso32: Remove SYSCALL_ENTER_KERNEL macro in sigreturn.SH. Peter Anvin1-6/+2
A macro SYSCALL_ENTER_KERNEL was defined in sigreturn.S, with the ability of overriding it. The override capability, however, is not used anywhere, and the macro name is potentially confusing because it seems to imply that sysenter/syscall could be used here, which is NOT true: the sigreturn system calls MUST use int $0x80. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-6-hpa@zytor.com
2026-01-14x86/entry/vdso32: Don't rely on int80_landing_pad for adjusting ipH. Peter Anvin1-10/+6
There is no fundamental reason to use the int80_landing_pad symbol to adjust ip when moving the vdso. If ip falls within the vdso, and the vdso is moved, we should change the ip accordingly, regardless of mode or location within the vdso. This *currently* can only happen on 32 bits, but there isn't any reason not to do so generically. Note that if this is ever possible from a vdso-internal call, then the user space stack will also needed to be adjusted (as well as the shadow stack, if enabled.) Fortunately this is not currently the case. At the moment, we don't even consider other threads when moving the vdso. The assumption is that it is only used by process freeze/thaw for migration, where this is not an issue. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-5-hpa@zytor.com
2026-01-14x86/entry/vdso: Refactor the vdso buildH. Peter Anvin21-186/+180
- Separate out the vdso sources into common, vdso32, and vdso64 directories. - Build the 32- and 64-bit vdsos in their respective subdirectories; this greatly simplifies the build flags handling. - Unify the mangling of Makefile flags between the 32- and 64-bit vdso code as much as possible; all common rules are put in arch/x86/entry/vdso/common/Makefile.include. The remaining is very simple for 32 bits; the 64-bit one is only slightly more complicated because it contains the x32 generation rule. - Define __DISABLE_EXPORTS when building the vdso. This need seems to have been masked by different ordering compile flags before. - Change CONFIG_X86_64 to BUILD_VDSO32_64 in vdso32/system_call.S, to make it compatible with including fake_32bit_build.h. - The -fcf-protection= option was "leaking" from the kernel build, for reasons that was not clear to me. Furthermore, several distributions ship with it set to a default value other than "-fcf-protection=none". Make it match the configuration options for *user space*. Note that this patch may seem large, but the vast majority of it is simply code movement. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-4-hpa@zytor.com
2026-01-14x86/entry/vdso: Move vdso2c to arch/x86/toolsH. Peter Anvin5-10/+14
It is generally better to build tools in arch/x86/tools to keep host cflags proliferation down, and to reduce makefile sequencing issues. Move the vdso build tool vdso2c into arch/x86/tools in preparation for refactoring the vdso makefiles. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-3-hpa@zytor.com
2026-01-14x86/entry/vdso: Rename vdso_image_* to vdso*_imageH. Peter Anvin8-26/+23
The vdso .so files are named vdso*.so. These structures are binary images and descriptions of these files, so it is more consistent for them to have a naming that more directly mirrors the filenames. It is also very slightly more compact (by one character...) and simplifies the Makefile just a little bit. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-2-hpa@zytor.com
2026-01-13KVM: SVM: Fix a missing kunmap_local() in sev_gmem_post_populate()Yan Zhao1-0/+1
sev_gmem_post_populate() needs to unmap the target vaddr after copy_from_user() to the vaddr fails. Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Link: https://patch.msgid.link/20260108214622.1084057-2-michael.roth@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13arch/um: remove unused varible err in remove_files_and_dir()Alex Shi1-2/+1
err is duplicated with errno, and never used; remove it. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alex Shi <alexs@kernel.org> Link: https://patch.msgid.link/20260107064009.15380-1-alexs@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-13um: virtio_uml: Support adding devices via mconsoleTiwei Bie1-1/+50
It can be used when we want to add virtio devices to UML while it's up and running. Virtio devices can be added to UML using the same syntax as the command line option: (mconsole) config virtio_uml.device=<socket>:<virtio_id>[:<platform_id>] Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20260106004259.1604736-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-13um: Handle SIGCHLD in seccomp mode like other IRQ signalsTiwei Bie1-0/+3
In seccomp mode, SIGCHLD serves as the child reaper IRQ. So, let's handle it in the same way as other IRQ signals, including preventing them from nesting with each other and allowing interrupted syscalls to be automatically restarted. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20260106001228.1531146-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-13um: Preserve errno within signal handlerTiwei Bie1-3/+3
We rely on errno to determine whether a syscall has failed, so we need to ensure that accessing errno is async-signal-safe. Currently, we preserve the errno in sig_handler_common(), but it doesn't cover every possible case. Let's do it in hard_handler() instead, which is the signal handler we actually register. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20260106001228.1531146-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-13arm64: dts: qcom: qcs8300: Add GPU coolingGaurav Kohli1-0/+26
Unlike the CPU, the GPU does not throttle its speed automatically when it reaches high temperatures. Set up GPU cooling by throttling the GPU speed when reaching 115°C. Signed-off-by: Gaurav Kohli <quic_gkohli@quicinc.com> Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250903-a623-gpu-support-v5-4-5398585e2981@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-6/+54
Pull x86 kvm fixes from Paolo Bonzini: - Avoid freeing stack-allocated node in kvm_async_pf_queue_task - Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1 * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1 selftests: kvm: try getting XFD and XSAVE state out of sync selftests: kvm: replace numbered sync points with actions x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1 x86/kvm: Avoid freeing stack-allocated node in kvm_async_pf_queue_task
2026-01-13riscv: dts: allwinner: d1: Add RGB LEDs to boardsSamuel Holland2-0/+25
Some D1-based boards feature an onboard RGB LED. Enable them. Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Tested-by: Trevor Woerner <twoerner@gmail.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Link: https://patch.msgid.link/20231029212738.7871-6-samuel@sholland.org Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2026-01-13riscv: dts: allwinner: d1: Add LED controller nodeSamuel Holland2-0/+21
Allwinner D1 contains an LED controller. Add its devicetree node, as well as the pinmux used by the reference board design. Acked-by: Guo Ren <guoren@kernel.org> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Tested-by: Trevor Woerner <twoerner@gmail.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Link: https://patch.msgid.link/20231029212738.7871-5-samuel@sholland.org [wens@kernel.org: move "status" to end of properties] Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2026-01-13arm64: dts: allwinner: a100: Add LED controller nodeSamuel Holland1-0/+14
Allwinner A100 contains an LED controller. Add it to the devicetree. Acked-by: Guo Ren <guoren@kernel.org> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Link: https://patch.msgid.link/20231029212738.7871-4-samuel@sholland.org [wens@kernel.org: resolve conflict; move "status" to end of properties] Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2026-01-13x86/resctrl: Fix memory bandwidth counter width for HygonXiaochen Shen2-2/+16
The memory bandwidth calculation relies on reading the hardware counter and measuring the delta between samples. To ensure accurate measurement, the software reads the counter frequently enough to prevent it from rolling over twice between reads. The default Memory Bandwidth Monitoring (MBM) counter width is 24 bits. Hygon CPUs provide a 32-bit width counter, but they do not support the MBM capability CPUID leaf (0xF.[ECX=1]:EAX) to report the width offset (from 24 bits). Consequently, the kernel falls back to the 24-bit default counter width, which causes incorrect overflow handling on Hygon CPUs. Fix this by explicitly setting the counter width offset to 8 bits (resulting in a 32-bit total counter width) for Hygon CPUs. Fixes: d8df126349da ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251209062650.1536952-3-shenxiaochen@open-hieco.net
2026-01-13x86/resctrl: Add missing resctrl initialization for HygonXiaochen Shen1-2/+4
Hygon CPUs supporting Platform QoS features currently undergo partial resctrl initialization through resctrl_cpu_detect() in the Hygon BSP init helper and AMD/Hygon common initialization code. However, several critical data structures remain uninitialized for Hygon CPUs in the following paths: - get_mem_config()-> __rdt_get_mem_config_amd(): rdt_resource::membw,alloc_capable hw_res::num_closid - rdt_init_res_defs()->rdt_init_res_defs_amd(): rdt_resource::cache hw_res::msr_base,msr_update Add the missing AMD/Hygon common initialization to ensure proper Platform QoS functionality on Hygon CPUs. Fixes: d8df126349da ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251209062650.1536952-2-shenxiaochen@open-hieco.net
2026-01-13arm64: dts: qcom: sa8775p: Add reg and clocks for QoS configurationOdelu Kukatla1-72/+91
Add register addresses and clocks which need to be enabled for configuring QoS on sa8775p SoC. Signed-off-by: Odelu Kukatla <odelu.kukatla@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251001073344.6599-4-odelu.kukatla@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-13x86/crash: Use set_memory_p() instead of __set_memory_prot()Coiby Xu3-18/+1
set_memory_p() is available to use outside of its compilation unit since: 030ad7af9437 ("x86/mm: Regularize set_memory_p() parameters and make non-static"). There is no use for __set_memory_prot() anymore so drop it too. [ bp: Massage commit message. ] Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20260106095100.656292-1-coxu@redhat.com
2026-01-13arm64: dts: rockchip: Add Radxa CM3J on RPi CM4 IO BoardFUKAUMI Naoki2-0/+205
The Raspberry Pi Compute Module 4 IO Board is an application board for the Compute Module 4. [1] This patch adds support for the Radxa CM3J mounted on the RPi CM4 IO Board. Specification: - 12V 5521 DC jack - 2x full-size HDMI 2.0 connectors (only HDMI0 is supported with CM3J) - Gigabit Ethernet RJ45 with PoE support - 2x USB 2.0 connectors, with header for two more connectors - Micro USB connector - microSD card socket - PCIe Gen 2 x1 socket - 12V 4-pin PWM fan connector - External power connector (+5V, +12V) - 2x MIPI DSI connectors - 2x MIPI CSI-2 connectors - 40-pin GPIO header - RTC with battery socket - Red (power) and green (heartbeat) LEDs [1] https://datasheets.raspberrypi.com/cm4io/cm4io-datasheet.pdf Signed-off-by: FUKAUMI Naoki <naoki@radxa.com> Link: https://patch.msgid.link/20260108113341.14037-3-naoki@radxa.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2026-01-13arm64: dts: rockchip: Add Radxa CM3JFUKAUMI Naoki1-0/+558
The Radxa CM3J is a feature rich industrial compute module based on the Rockchip RK3568J SoC. [1] Specification: - Quad-core Cortex-A55 CPU - Mali-G52 2EE GPU - 1TOPS NPU - Up to 8GB LPDDR4x RAM - Up to 32GB eMMC (optional) - 16MB SPI flash (optional) - Wi-Fi 5 / BT 5.0 with external antenna connector - Gigabit Ethernet PHY - RK809 PMIC - Green (power) LED [1] https://dl.radxa.com/cm3j/docs/hw/radxa_cm3j_schematic_v1.2_20250115.pdf Signed-off-by: FUKAUMI Naoki <naoki@radxa.com> Link: https://patch.msgid.link/20260108113341.14037-2-naoki@radxa.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2026-01-13x86/pvlocks: Move paravirt spinlock functions into own headerJuergen Gross11-200/+197
Instead of having the pv spinlock function definitions in paravirt.h, move them into the new header paravirt-spinlock.h. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20260105110520.21356-22-jgross@suse.com
2026-01-13arm64: dts: rockchip: Make eeprom read-only for Radxa ROCK 3C/5A/5CFUKAUMI Naoki3-0/+3
The BL24C16 EEPROM implemented on the Radxa ROCK 3C, 5A, and 5C [1] [2] [3] is designed to have data written during factory programming (regardless of whether data is actually written or not), and we at Radxa permit users to read the data but not write to it. [4] Therefore, we will add a read-only property to the eeprom node. [1] https://dl.radxa.com/rock3/docs/hw/3c/v1400/radxa_rock_3c_v1400_schematic.pdf p.13 [2] https://dl.radxa.com/rock5/5a/docs/hw/radxa_rock5a_V1.1_sch.pdf p.19 [3] https://dl.radxa.com/rock5/5c/docs/hw/v1100/radxa_rock_5c_schematic_v1100.pdf p.18 [4] https://github.com/radxa/u-boot/blob/next-dev-v2024.10/drivers/misc/radxa-i2c-eeprom.c Signed-off-by: FUKAUMI Naoki <naoki@radxa.com> Link: https://patch.msgid.link/20260108034252.2713-1-naoki@radxa.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2026-01-13arm64: dts: rockchip: Add TS133 variant of the QNAP NAS seriesHeiko Stuebner2-0/+72
The TS133 is a one-bay NAS mostly similar to the other devices in the series. The main difference is that it is build around the RK3566 SoC instead of the RK3568 variant. The RK3566/RK3568 are mostly similar with only slight variants in both speed and some specific peripherals - the RK3568 has more. The specific for the NAS series stay the same though. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patch.msgid.link/20260104191448.2693309-6-heiko@sntech.de