summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2026-03-13ARM: clean up the memset64() C wrapperThomas Weißschuh1-5/+9
commit b52343d1cb47bb27ca32a3f4952cc2fd3cd165bf upstream. The current logic to split the 64-bit argument into its 32-bit halves is byte-order specific and a bit clunky. Use a union instead which is easier to read and works in all cases. GCC still generates the same machine code. While at it, rename the arguments of the __memset64() prototype to actually reflect their semantics. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Ben Hutchings <ben@decadent.org.uk> # for -stable Link: https://lore.kernel.org/all/1a11526ae3d8664f705b541b8d6ea57b847b49a8.camel@decadent.org.uk/ Suggested-by: https://lore.kernel.org/all/aZonkWMwpbFhzDJq@casper.infradead.org/ # for -stable Link: https://lore.kernel.org/all/aZonkWMwpbFhzDJq@casper.infradead.org/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-13x86/efi: defer freeing of boot services memoryMike Rapoport (Microsoft)3-5/+54
commit a4b0bf6a40f3c107c67a24fbc614510ef5719980 upstream. efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA using memblock_free_late(). There are two issue with that: memblock_free_late() should be used for memory allocated with memblock_alloc() while the memory reserved with memblock_reserve() should be freed with free_reserved_area(). More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y efi_free_boot_services() is called before deferred initialization of the memory map is complete. Benjamin Herrenschmidt reports that this causes a leak of ~140MB of RAM on EC2 t3a.nano instances which only have 512MB or RAM. If the freed memory resides in the areas that memory map for them is still uninitialized, they won't be actually freed because memblock_free_late() calls memblock_free_pages() and the latter skips uninitialized pages. Using free_reserved_area() at this point is also problematic because __free_page() accesses the buddy of the freed page and that again might end up in uninitialized part of the memory map. Delaying the entire efi_free_boot_services() could be problematic because in addition to freeing boot services memory it updates efi.memmap without any synchronization and that's undesirable late in boot when there is concurrency. More robust approach is to only defer freeing of the EFI boot services memory. Split efi_free_boot_services() in two. First efi_unmap_boot_services() collects ranges that should be freed into an array then efi_free_boot_services() later frees them after deferred init is complete. Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode") Cc: <stable@vger.kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-13LoongArch: Remove some extern variables in source filesTiezhu Yang3-7/+0
[ Upstream commit 0e6f596d6ac635e80bb265d587b2287ef8fa1cd6 ] There are declarations of the variable "eentry", "pcpu_handlers[]" and "exception_handlers[]" in asm/setup.h, the source files already include this header file directly or indirectly, so no need to declare them in the source files, just remove the code. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13LoongArch: Handle percpu handler address for ORC unwinderTiezhu Yang2-0/+19
[ Upstream commit 055c7e75190e0be43037bd663a3f6aced194416e ] After commit 4cd641a79e69 ("LoongArch: Remove unnecessary checks for ORC unwinder"), the system can not boot normally under some configs (such as enable KASAN), there are many error messages "cannot find unwind pc". The kernel boots normally with the defconfig, so no problem found out at the first time. Here is one way to reproduce: cd linux make mrproper defconfig -j"$(nproc)" scripts/config -e KASAN make olddefconfig all -j"$(nproc)" sudo make modules_install sudo make install sudo reboot The address that can not unwind is not a valid kernel address which is between "pcpu_handlers[cpu]" and "pcpu_handlers[cpu] + vec_sz" due to the code of eentry was copied to the new area of pcpu_handlers[cpu] in setup_tlb_handler(), handle this special case to get the valid address to unwind normally. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13LoongArch: Remove unnecessary checks for ORC unwinderTiezhu Yang1-11/+5
[ Upstream commit 4cd641a79e69270a062777f64a0dd330abb9044a ] According to the following function definitions, __kernel_text_address() already checks __module_text_address(), so it should remove the check of __module_text_address() in bt_address() at least. int __kernel_text_address(unsigned long addr) { if (kernel_text_address(addr)) return 1; ... return 0; } int kernel_text_address(unsigned long addr) { bool no_rcu; int ret = 1; ... if (is_module_text_address(addr)) goto out; ... return ret; } bool is_module_text_address(unsigned long addr) { guard(rcu)(); return __module_text_address(addr) != NULL; } Furthermore, there are two checks of __kernel_text_address(), one is in bt_address() and the other is after calling bt_address(), it looks like redundant. Handle the exception address first and then use __kernel_text_address() to validate the calculated address for exception or the normal address in bt_address(), then it can remove the check of __kernel_text_address() after calling bt_address(). Just remove unnecessary checks, no functional changes intended. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Stable-dep-of: 055c7e75190e ("LoongArch: Handle percpu handler address for ORC unwinder") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13LoongArch/orc: Use RCU in all users of __module_address().Sebastian Andrzej Siewior1-3/+1
[ Upstream commit f99d27d9feb755aee9350fc89f57814d7e1b4880 ] __module_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_address() with RCU. Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: loongarch@lists.linux.dev Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-19-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Stable-dep-of: 055c7e75190e ("LoongArch: Handle percpu handler address for ORC unwinder") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13uprobes: switch to RCU Tasks Trace flavor for better performanceAndrii Nakryiko1-0/+1
[ Upstream commit 87195a1ee332add27bd51448c6b54aad551a28f5 ] This patch switches uprobes SRCU usage to RCU Tasks Trace flavor, which is optimized for more lightweight and quick readers (at the expense of slower writers, which for uprobes is a fine tradeof) and has better performance and scalability with number of CPUs. Similarly to baseline vs SRCU, we've benchmarked SRCU-based implementation vs RCU Tasks Trace implementation. SRCU ==== uprobe-nop ( 1 cpus): 3.276 ± 0.005M/s ( 3.276M/s/cpu) uprobe-nop ( 2 cpus): 4.125 ± 0.002M/s ( 2.063M/s/cpu) uprobe-nop ( 4 cpus): 7.713 ± 0.002M/s ( 1.928M/s/cpu) uprobe-nop ( 8 cpus): 8.097 ± 0.006M/s ( 1.012M/s/cpu) uprobe-nop (16 cpus): 6.501 ± 0.056M/s ( 0.406M/s/cpu) uprobe-nop (32 cpus): 4.398 ± 0.084M/s ( 0.137M/s/cpu) uprobe-nop (64 cpus): 6.452 ± 0.000M/s ( 0.101M/s/cpu) uretprobe-nop ( 1 cpus): 2.055 ± 0.001M/s ( 2.055M/s/cpu) uretprobe-nop ( 2 cpus): 2.677 ± 0.000M/s ( 1.339M/s/cpu) uretprobe-nop ( 4 cpus): 4.561 ± 0.003M/s ( 1.140M/s/cpu) uretprobe-nop ( 8 cpus): 5.291 ± 0.002M/s ( 0.661M/s/cpu) uretprobe-nop (16 cpus): 5.065 ± 0.019M/s ( 0.317M/s/cpu) uretprobe-nop (32 cpus): 3.622 ± 0.003M/s ( 0.113M/s/cpu) uretprobe-nop (64 cpus): 3.723 ± 0.002M/s ( 0.058M/s/cpu) RCU Tasks Trace =============== uprobe-nop ( 1 cpus): 3.396 ± 0.002M/s ( 3.396M/s/cpu) uprobe-nop ( 2 cpus): 4.271 ± 0.006M/s ( 2.135M/s/cpu) uprobe-nop ( 4 cpus): 8.499 ± 0.015M/s ( 2.125M/s/cpu) uprobe-nop ( 8 cpus): 10.355 ± 0.028M/s ( 1.294M/s/cpu) uprobe-nop (16 cpus): 7.615 ± 0.099M/s ( 0.476M/s/cpu) uprobe-nop (32 cpus): 4.430 ± 0.007M/s ( 0.138M/s/cpu) uprobe-nop (64 cpus): 6.887 ± 0.020M/s ( 0.108M/s/cpu) uretprobe-nop ( 1 cpus): 2.174 ± 0.001M/s ( 2.174M/s/cpu) uretprobe-nop ( 2 cpus): 2.853 ± 0.001M/s ( 1.426M/s/cpu) uretprobe-nop ( 4 cpus): 4.913 ± 0.002M/s ( 1.228M/s/cpu) uretprobe-nop ( 8 cpus): 5.883 ± 0.002M/s ( 0.735M/s/cpu) uretprobe-nop (16 cpus): 5.147 ± 0.001M/s ( 0.322M/s/cpu) uretprobe-nop (32 cpus): 3.738 ± 0.008M/s ( 0.117M/s/cpu) uretprobe-nop (64 cpus): 4.397 ± 0.002M/s ( 0.069M/s/cpu) Peak throughput for uprobes increases from 8 mln/s to 10.3 mln/s (+28%!), and for uretprobes from 5.3 mln/s to 5.8 mln/s (+11%), as we have more work to do on uretprobes side. Even single-thread (no contention) performance is slightly better: 3.276 mln/s to 3.396 mln/s (+3.5%) for uprobes, and 2.055 mln/s to 2.174 mln/s (+5.8%) for uretprobes. We also select TASKS_TRACE_RCU for UPROBES in Kconfig due to the new dependency. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20240910174312.3646590-1-andrii@kernel.org Stable-dep-of: a56a38fd9196 ("uprobes: Fix incorrect lockdep condition in filter_chain()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13arm64: dts: rockchip: Fix rk3588 PCIe range mappingsShawn Lin2-5/+5
[ Upstream commit 46c56b737161060dfa468f25ae699749047902a2 ] The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so that there is no same address allocated from normal system memory. Otherwise it's broken if the same address assigned to the EP for DMA purpose.Fix it to sync with the vendor BSP. Fixes: 0acf4fa7f187 ("arm64: dts: rockchip: add PCIe3 support for rk3588") Fixes: 8d81b77f4c49 ("arm64: dts: rockchip: add rk3588 PCIe2 support") Cc: stable@vger.kernel.org Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://patch.msgid.link/1767600929-195341-2-git-send-email-shawn.lin@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13arm64: dts: rockchip: Fix rk356x PCIe range mappingsShawn Lin2-3/+3
[ Upstream commit f63ea193a404481f080ca2958f73e9f364682db9 ] The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so that there is no same address allocated from normal system memory. Otherwise it's broken if the same address assigned to the EP for DMA purpose.Fix it to sync with the vendor BSP. Fixes: 568a67e742df ("arm64: dts: rockchip: Fix rk356x PCIe register and range mappings") Fixes: 66b51ea7d70f ("arm64: dts: rockchip: Add rk3568 PCIe2x1 controller") Cc: stable@vger.kernel.org Cc: Andrew Powers-Holmes <aholmes@omnom.net> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://patch.msgid.link/1767600929-195341-1-git-send-email-shawn.lin@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13KVM: x86: Ignore -EBUSY when checking nested events from vcpu_block()Sean Christopherson1-2/+1
[ Upstream commit ead63640d4e72e6f6d464f4e31f7fecb79af8869 ] Ignore -EBUSY when checking nested events after exiting a blocking state while L2 is active, as exiting to userspace will generate a spurious userspace exit, usually with KVM_EXIT_UNKNOWN, and likely lead to the VM's demise. Continuing with the wakeup isn't perfect either, as *something* has gone sideways if a vCPU is awakened in L2 with an injected event (or worse, a nested run pending), but continuing on gives the VM a decent chance of surviving without any major side effects. As explained in the Fixes commits, it _should_ be impossible for a vCPU to be put into a blocking state with an already-injected event (exception, IRQ, or NMI). Unfortunately, userspace can stuff MP_STATE and/or injected events, and thus put the vCPU into what should be an impossible state. Don't bother trying to preserve the WARN, e.g. with an anti-syzkaller Kconfig, as WARNs can (hopefully) be added in paths where _KVM_ would be violating x86 architecture, e.g. by WARNing if KVM attempts to inject an exception or interrupt while the vCPU isn't running. Cc: Alessandro Ratti <alessandro@0x65c.net> Cc: stable@vger.kernel.org Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Fixes: 45405155d876 ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject") Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10d4261a580000 Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671bc7a7.050a0220.455e8.022a.GAE@google.com Link: https://patch.msgid.link/20260109030657.994759-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13x86/acpi/boot: Correct acpi_is_processor_usable() check againYazen Ghannam2-19/+8
[ Upstream commit adbf61cc47cb72b102682e690ad323e1eda652c2 ] ACPI v6.3 defined a new "Online Capable" MADT LAPIC flag. This bit is used in conjunction with the "Enabled" MADT LAPIC flag to determine if a CPU can be enabled/hotplugged by the OS after boot. Before the new bit was defined, the "Enabled" bit was explicitly described like this (ACPI v6.0 wording provided): "If zero, this processor is unusable, and the operating system support will not attempt to use it" This means that CPU hotplug (based on MADT) is not possible. Many BIOS implementations follow this guidance. They may include LAPIC entries in MADT for unavailable CPUs, but since these entries are marked with "Enabled=0" it is expected that the OS will completely ignore these entries. However, QEMU will do the same (include entries with "Enabled=0") for the purpose of allowing CPU hotplug within the guest. Comment from QEMU function pc_madt_cpu_entry(): /* ACPI spec says that LAPIC entry for non present * CPU may be omitted from MADT or it must be marked * as disabled. However omitting non present CPU from * MADT breaks hotplug on linux. So possible CPUs * should be put in MADT but kept disabled. */ Recent Linux topology changes broke the QEMU use case. A following fix for the QEMU use case broke bare metal topology enumeration. Rework the Linux MADT LAPIC flags check to allow the QEMU use case only for guests and to maintain the ACPI spec behavior for bare metal. Remove an unnecessary check added to fix a bare metal case introduced by the QEMU "fix". [ bp: Change logic as Michal suggested. ] [ mingo: Removed misapplied -stable tag. ] Fixes: fed8d8773b8e ("x86/acpi/boot: Correct acpi_is_processor_usable() check") Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package") Closes: https://lore.kernel.org/r/20251024204658.3da9bf3f.michal.pecio@gmail.com Reported-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Michal Pecio <michal.pecio@gmail.com> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://lore.kernel.org/20251111145357.4031846-1-yazen.ghannam@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearingFuad Tabba1-1/+1
[ Upstream commit ef06fd16d48704eac868441d98d4ef083d8f3d07 ] struct bpf_plt contains a u64 target field. Currently, the BPF JIT allocator requests an alignment of 4 bytes (sizeof(u32)) for the JIT buffer. Because the base address of the JIT buffer can be 4-byte aligned (e.g., ending in 0x4 or 0xc), the relative padding logic in build_plt() fails to ensure that target lands on an 8-byte boundary. This leads to two issues: 1. UBSAN reports misaligned-access warnings when dereferencing the structure. 2. More critically, target is updated concurrently via WRITE_ONCE() in bpf_arch_text_poke() while the JIT'd code executes ldr. On arm64, 64-bit loads/stores are only guaranteed to be single-copy atomic if they are 64-bit aligned. A misaligned target risks a torn read, causing the JIT to jump to a corrupted address. Fix this by increasing the allocation alignment requirement to 8 bytes (sizeof(u64)) in bpf_jit_binary_pack_alloc(). This anchors the base of the JIT buffer to an 8-byte boundary, allowing the relative padding math in build_plt() to correctly align the target field. Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20260226075525.233321-1-tabba@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13s390/vtime: Fix virtual timer forwardingHeiko Carstens1-16/+2
[ Upstream commit dbc0fb35679ed5d0adecf7d02137ac2c77244b3b ] Since delayed accounting of system time [1] the virtual timer is forwarded by do_account_vtime() but also vtime_account_kernel(), vtime_account_softirq(), and vtime_account_hardirq(). This leads to double accounting of system, guest, softirq, and hardirq time. Remove accounting from the vtime_account*() family to restore old behavior. There is only one user of the vtimer interface, which might explain why nobody noticed this so far. Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13s390/idle: Fix cpu idle exit cpu time accountingHeiko Carstens3-6/+18
[ Upstream commit 0d785e2c324c90662baa4fe07a0d02233ff92824 ] With the conversion to generic entry [1] cpu idle exit cpu time accounting was converted from assembly to C. This introduced an reversed order of cpu time accounting. On cpu idle exit the current accounting happens with the following call chain: -> do_io_irq()/do_ext_irq() -> irq_enter_rcu() -> account_hardirq_enter() -> vtime_account_irq() -> vtime_account_kernel() vtime_account_kernel() accounts the passed cpu time since last_update_timer as system time, and updates last_update_timer to the current cpu timer value. However the subsequent call of -> account_idle_time_irq() will incorrectly subtract passed cpu time from timer_idle_enter to the updated last_update_timer value from system_timer. Then last_update_timer is updated to a sys_enter_timer, which means that last_update_timer goes back in time. Subsequently account_hardirq_exit() will account too much cpu time as hardirq time. The sum of all accounted cpu times is still correct, however some cpu time which was previously accounted as system time is now accounted as hardirq time, plus there is the oddity that last_update_timer goes back in time. Restore previous behavior by extracting cpu time accounting code from account_idle_time_irq() into a new update_timer_idle() function and call it before irq_enter_rcu(). Fixes: 56e62a737028 ("s390: convert to generic entry") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13x86/fred: Correct speculative safety in fred_extint()Andrew Cooper1-3/+2
[ Upstream commit aa280a08e7d8fae58557acc345b36b3dc329d595 ] array_index_nospec() is no use if the result gets spilled to the stack, as it makes the believed safe-under-speculation value subject to memory predictions. For all practical purposes, this means array_index_nospec() must be used in the expression that accesses the array. As the code currently stands, it's the wrong side of irqentry_enter(), and 'index' is put into %ebp across the function call. Remove the index variable and reposition array_index_nospec(), so it's calculated immediately before the array access. Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260106131504.679932-1-andrew.cooper3@citrix.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13KVM: arm64: Hide S1POE from guests when not supported by the hostFuad Tabba1-0/+3
[ Upstream commit f66857bafd4f151c5cc6856e47be2e12c1721e43 ] When CONFIG_ARM64_POE is disabled, KVM does not save/restore POR_EL1. However, ID_AA64MMFR3_EL1 sanitisation currently exposes the feature to guests whenever the hardware supports it, ignoring the host kernel configuration. If a guest detects this feature and attempts to use it, the host will fail to context-switch POR_EL1, potentially leading to state corruption. Fix this by masking ID_AA64MMFR3_EL1.S1POE in the sanitised system registers, preventing KVM from advertising the feature when the host does not support it (i.e. system_supports_poe() is false). Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260213143815.1732675-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-05Revert "x86/kexec: add a sanity check on previous kernel's ima kexec buffer"Sasha Levin1-6/+0
This reverts commit c5489d04337b47e93c0623e8145fcba3f5739efd. The commit introduces a call to ima_validate_range() in arch/x86/kernel/setup.c, but the function declaration is not available in the 6.12 stable tree, resulting in build failures due to implicit function declaration errors across multiple stable branches. Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: Fix sampling the "stable" virtual counter in preemptible sectionMarc Zyngier1-1/+5
[ Upstream commit e5cb94ba5f96d691d8885175d4696d6ae6bc5ec9 ] Ben reports that when running with CONFIG_DEBUG_PREEMPT, using __arch_counter_get_cntvct_stable() results in well deserves warnings, as we access a per-CPU variable without preemption disabled. Fix the issue by disabling preemption on reading the counter. We can probably do a lot better by not disabling preemption on systems that do not require horrible workarounds to return a valid counter value, but this plugs the issue for the time being. Fixes: 29cc0f3aa7c6 ("arm64: Force the use of CNTVCT_EL0 in __delay()") Reported-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/aZw3EGs4rbQvbAzV@e134344.arm.com Tested-by: Ben Horgan <ben.horgan@arm.com> Tested-by: André Draszik <andre.draszik@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: Force the use of CNTVCT_EL0 in __delay()Marc Zyngier1-4/+15
[ Upstream commit 29cc0f3aa7c64d3b3cb9d94c0a0984ba6717bf72 ] Quentin forwards a report from Hyesoo Yu, describing an interesting problem with the use of WFxT in __delay() when a vcpu is loaded and that KVM is *not* in VHE mode (either nVHE or hVHE). In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the state of the guest virtual counter. At the same time, __delay() is using get_cycles() to read the counter value, which is indirected to reading CNTPCT_EL0. The core of the issue is that WFxT is using the *virtual* counter, while the kernel is using the physical counter, and that the offset introduces a really bad discrepancy between the two. Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent irrespective of the value of CNTVOFF_EL2. Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com> Reported-by: Quentin Perret <qperret@google.com> Reviewed-by: Quentin Perret <qperret@google.com> Fixes: 7d26b0516a0d ("arm64: Use WFxT for __delay() when possible") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od Cc: stable@vger.kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04x86/kexec: Copy ACPI root pointer address from config tableArd Biesheuvel1-0/+7
[ Upstream commit e00ac9e5afb5d80c0168ec88d8e8662a54af8249 ] Dave reports that kexec may fail when the first kernel boots via the EFI stub but without EFI runtime services, as in that case, the RSDP address field in struct bootparams is never assigned. Kexec copies this value into the version of struct bootparams that it provides to the incoming kernel, which may have no other means to locate the ACPI root pointer. So take the value from the EFI config tables if no root pointer has been set in the first kernel's struct bootparams. Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Cc: <stable@vger.kernel.org> # v6.1 Reported-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Link: https://lore.kernel.org/linux-efi/aZQg_tRQmdKNadCg@darkstar.users.ipa.redhat.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04LoongArch: Disable instrumentation for setup_ptwalker()Tiezhu Yang1-1/+1
[ Upstream commit 7cb37af61f09c9cfd90c43c9275307c16320cbf2 ] According to Documentation/dev-tools/kasan.rst, software KASAN modes use compiler instrumentation to insert validity checks. Such instrumentation might be incompatible with some parts of the kernel, and therefore needs to be disabled, just use the attribute __no_sanitize_address to disable instrumentation for the low level function setup_ptwalker(). Otherwise bringing up the secondary CPUs failed when CONFIG_KASAN is set (especially when PTW is enabled), here are the call chains: smpboot_entry() start_secondary() cpu_probe() per_cpu_trap_init() tlb_init() setup_tlb_handler() setup_ptwalker() The reason is the PGD registers are configured in setup_ptwalker(), but KASAN instrumentation may cause TLB exceptions before that. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04LoongArch: Guard percpu handler under !CONFIG_PREEMPT_RTTiezhu Yang1-1/+1
[ Upstream commit 70b0faae3590c628a98a627a10e5d211310169d4 ] After commit 88fd2b70120d ("LoongArch: Fix sleeping in atomic context for PREEMPT_RT"), it should guard percpu handler under !CONFIG_PREEMPT_RT to avoid redundant operations. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04LoongArch: Use %px to print unmodified unwinding addressTiezhu Yang1-1/+1
[ Upstream commit 77403a06d845db1caf9a6b0867b43e9dd8de8e4a ] Currently, use %p to prevent leaking information about the kernel memory layout when printing the PC address, but the kernel log messages are not useful to debug problem if bt_address() returns 0. Given that the type of "pc" variable is unsigned long, it should use %px to print the unmodified unwinding address. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04LoongArch: Prefer top-down allocation after arch_mem_init()Huacai Chen1-0/+1
[ Upstream commit 2172d6ebac9372eb01fe4505a53e18cb061e103b ] Currently we use bottom-up allocation after sparse_init(), the reason is sparse_init() need a lot of memory, and bottom-up allocation may exhaust precious low memory (below 4GB). On the other hand, SWIOTLB and CMA need low memories for DMA32, so swiotlb_init() and dma_contiguous_reserve() need bottom-up allocation. Since swiotlb_init() and dma_contiguous_reserve() are both called in arch_mem_init(), we no longer need bottom-up allocation after that. So we set the allocation policy to top-down at the end of arch_mem_init(), in order to avoid later memory allocations (such as KASAN) exhaust low memory. This solve at least two problems: 1. Some buggy BIOSes use 0xfd000000~0xfe000000 for secondary CPUs, but didn't reserve this range, which causes smpboot failures. 2. Some DMA32 devices, such as Loongson-DRM and OHCI, cannot work with KASAN enabled. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04LoongArch: Make cpumask_of_node() robust against NUMA_NO_NODEJohn Garry1-1/+1
[ Upstream commit 94b0c831eda778ae9e4f2164a8b3de485d8977bb ] The arch definition of cpumask_of_node() cannot handle NUMA_NO_NODE - which is a valid index - so add a check for this. Cc: stable@vger.kernel.org Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04MIPS: rb532: Fix MMIO UART resource registrationJiaxun Yang1-2/+3
[ Upstream commit e93bb4b76cfefb302534246e892c7667491cb8cc ] Since commit 6e690d54cfa8 ("serial: 8250: fix return error code in serial8250_request_std_resource()"), registering an 8250 MMIO port without mapbase no longer works, as the resource range is derived from mapbase/mapsize. Populate mapbase and mapsize accordingly. Also drop ugly membase KSEG1 pointer and set UPF_IOREMAP instead, letting the 8250 core perform the ioremap. Fixes: 6e690d54cfa8 ("serial: 8250: fix return error code in serial8250_request_std_resource()") Cc: stable@vger.kernel.org Reported-by: Waldemar Brodkorb <wbx@openadk.org> Link: https://lore.kernel.org/linux-mips/aX-d0ShTplHKZT33@waldemar-brodkorb.de/ Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04parisc: kernel: replace kfree() with put_device() in create_tree_node()Haoxiang Li1-1/+1
[ Upstream commit dcf69599c47f29ce0a99117eb3f9ddcd2c4e78b6 ] If device_register() fails, put_device() is the correct way to drop the device reference. Found by code review. Fixes: 1070c9655b90 ("[PA-RISC] Fix must_check warnings in drivers.c") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=yMarco Elver1-1/+1
[ Upstream commit bb0c99e08ab9aa6d04b40cb63c72db9950d51749 ] The implementation of __READ_ONCE() under CONFIG_LTO=y incorrectly qualified the fallback "once" access for types larger than 8 bytes, which are not atomic but should still happen "once" and suppress common compiler optimizations. The cast `volatile typeof(__x)` applied the volatile qualifier to the pointer type itself rather than the pointee. This created a volatile pointer to a non-volatile type, which violated __READ_ONCE() semantics. Fix this by casting to `volatile typeof(*__x) *`. With a defconfig + LTO + debug options build, we see the following functions to be affected: xen_manage_runstate_time (884 -> 944 bytes) xen_steal_clock (248 -> 340 bytes) ^-- use __READ_ONCE() to load vcpu_runstate_info structs Fixes: e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y") Cc: stable@vger.kernel.org Reviewed-by: Boqun Feng <boqun@kernel.org> Signed-off-by: Marco Elver <elver@google.com> Tested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()Guangshuo Li1-0/+2
[ Upstream commit 33c1c6d8a28a2761ac74b0380b2563cf546c2a3a ] As kcalloc() may fail, check its return value to avoid a NULL pointer dereference when passing it to of_property_read_u32_array(). Fixes: 790a1662d3a26 ("powerpc/smp: Parse ibm,thread-groups with multiple properties") Cc: stable@vger.kernel.org Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250923133235.1862108-1-lgs201920130244@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: poe: fix stale POR_EL0 values for ptraceJoey Gouly1-0/+3
[ Upstream commit 1f3b950492db411e6c30ee0076b61ef2694c100a ] If a process wrote to POR_EL0 and then crashed before a context switch happened, the coredump would contain an incorrect value for POR_EL0. The value read in poe_get() would be a stale value left in thread.por_el0. Fix this by reading the value from the system register, if the target thread is the current thread. This matches what gcs/fpsimd do. Fixes: 175198199262 ("arm64/ptrace: add support for FEAT_POE") Reported-by: David Spickett <david.spickett@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04x86/kexec: add a sanity check on previous kernel's ima kexec bufferHarshit Mogalapalli1-0/+6
[ Upstream commit c5489d04337b47e93c0623e8145fcba3f5739efd ] When the second-stage kernel is booted via kexec with a limiting command line such as "mem=<size>", the physical range that contains the carried over IMA measurement list may fall outside the truncated RAM leading to a kernel panic. BUG: unable to handle page fault for address: ffff97793ff47000 RIP: ima_restore_measurement_list+0xdc/0x45a #PF: error_code(0x0000) – not-present page Other architectures already validate the range with page_is_ram(), as done in commit cbf9c4b9617b ("of: check previous kernel's ima-kexec-buffer against memory bounds") do a similar check on x86. Without carrying the measurement list across kexec, the attestation would fail. Link: https://lkml.kernel.org/r/20251231061609.907170-4-harshit.m.mogalapalli@oracle.com Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Fixes: b69a2afd5afc ("x86/kexec: Carry forward IMA measurement log on kexec") Reported-by: Paul Webb <paul.x.webb@oracle.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Cc: Alexander Graf <graf@amazon.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Betkov <bp@alien8.de> Cc: guoweikang <guoweikang.kernel@gmail.com> Cc: Henry Willard <henry.willard@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Joel Granados <joel.granados@kernel.org> Cc: Jonathan McDowell <noodles@fb.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Yifei Liu <yifei.l.liu@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04KVM: x86: Add SRCU protection for reading PDPTRs in __get_sregs2()Vasiliy Kovalev1-0/+2
[ Upstream commit 95d848dc7e639988dbb385a8cba9b484607cf98c ] Add SRCU read-side protection when reading PDPTR registers in __get_sregs2(). Reading PDPTRs may trigger access to guest memory: kvm_pdptr_read() -> svm_cache_reg() -> load_pdptrs() -> kvm_vcpu_read_guest_page() -> kvm_vcpu_gfn_to_memslot() kvm_vcpu_gfn_to_memslot() dereferences memslots via __kvm_memslots(), which uses srcu_dereference_check() and requires either kvm->srcu or kvm->slots_lock to be held. Currently only vcpu->mutex is held, triggering lockdep warning: ============================= WARNING: suspicious RCU usage in kvm_vcpu_gfn_to_memslot 6.12.59+ #3 Not tainted include/linux/kvm_host.h:1062 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by syz.5.1717/15100: #0: ff1100002f4b00b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x1d5/0x1590 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xf0/0x120 lib/dump_stack.c:120 lockdep_rcu_suspicious+0x1e3/0x270 kernel/locking/lockdep.c:6824 __kvm_memslots include/linux/kvm_host.h:1062 [inline] __kvm_memslots include/linux/kvm_host.h:1059 [inline] kvm_vcpu_memslots include/linux/kvm_host.h:1076 [inline] kvm_vcpu_gfn_to_memslot+0x518/0x5e0 virt/kvm/kvm_main.c:2617 kvm_vcpu_read_guest_page+0x27/0x50 virt/kvm/kvm_main.c:3302 load_pdptrs+0xff/0x4b0 arch/x86/kvm/x86.c:1065 svm_cache_reg+0x1c9/0x230 arch/x86/kvm/svm/svm.c:1688 kvm_pdptr_read arch/x86/kvm/kvm_cache_regs.h:141 [inline] __get_sregs2 arch/x86/kvm/x86.c:11784 [inline] kvm_arch_vcpu_ioctl+0x3e20/0x4aa0 arch/x86/kvm/x86.c:6279 kvm_vcpu_ioctl+0x856/0x1590 virt/kvm/kvm_main.c:4663 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xbd/0x1d0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Suggested-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Fixes: 6dba94035203 ("KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2") Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://patch.msgid.link/20260123222801.646123-1-kovalev@altlinux.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: dts: rockchip: Do not enable hdmi_sound node on Pinebook ProJun Yan1-4/+0
[ Upstream commit b18247f9dab735c9c2d63823d28edc9011e7a1ad ] Remove the redundant enabling of the hdmi_sound node in the Pinebook Pro board dts file, because the HDMI output is unused on this device. [1][2] This change also eliminates the following kernel log warning, which is caused by the unenabled dependent node of hdmi_sound that ultimately results in the node's probe failure: platform hdmi-sound: deferred probe pending: asoc-simple-card: parse error [1] https://files.pine64.org/doc/PinebookPro/pinebookpro_v2.1_mainboard_schematic.pdf [2] https://files.pine64.org/doc/PinebookPro/pinebookpro_schematic_v21a_20220419.pdf Cc: stable@vger.kernel.org Fixes: 5a65505a69884 ("arm64: dts: rockchip: Add initial support for Pinebook Pro") Signed-off-by: Jun Yan <jerrysteve1101@gmail.com> Reviewed-by: Peter Robinson <pbrobinson@gmail.com> Reviewed-by: Dragan Simic <dsimic@manjaro.org> Link: https://patch.msgid.link/20260116151253.9223-1-jerrysteve1101@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04s390/pci: Handle futile config accesses of disabled devices directlyNiklas Schnelle1-8/+17
[ Upstream commit 84d875e69818bed600edccb09be4a64b84a34a54 ] On s390 PCI busses and slots with multiple functions may have holes because PCI functions are passed-through by the hypervisor on a per function basis and some functions may be in standby or reserved. This fact is indicated by returning true from the hypervisor_isolated_pci_functions() helper and triggers common code to scan all possible devfn values. Via pci_scan_single_device() this in turn causes config reads for the device and vendor IDs, even for PCI functions which are in standby and thereofore disabled. So far these futile config reads, as well as potentially writes, which can never succeed were handled by the PCI load/store instructions themselves. This works as the platform just returns an error for a disabled and thus not usable function handle. It does cause spamming of error logs and additional overhead though. Instead check if the used function handle is enabled in zpci_cfg_load() and zpci_cfg_write() and if not enable directly return -ENODEV. Also refactor zpci_cfg_load() and zpci_cfg_store() slightly to accommodate the new logic while meeting modern kernel style guidelines. Cc: stable@vger.kernel.org Fixes: a50297cf8235 ("s390/pci: separate zbus creation from scanning") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04KVM: nSVM: Always use vmcb01 in VMLOAD/VMSAVE emulationYosry Ahmed1-2/+3
[ Upstream commit 127ccae2c185f62e6ecb4bf24f9cb307e9b9c619 ] Commit cc3ed80ae69f ("KVM: nSVM: always use vmcb01 to for vmsave/vmload of guest state") made KVM always use vmcb01 for the fields controlled by VMSAVE/VMLOAD, but it missed updating the VMLOAD/VMSAVE emulation code to always use vmcb01. As a result, if VMSAVE/VMLOAD is executed by an L2 guest and is not intercepted by L1, KVM will mistakenly use vmcb02. Always use vmcb01 instead of the current VMCB. Fixes: cc3ed80ae69f ("KVM: nSVM: always use vmcb01 to for vmsave/vmload of guest state") Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20260110004821.3411245-2-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: dts: apple: t8112-j473: Keep the HDMI port powered onJanne Grunau1-0/+19
[ Upstream commit 3e4e729325131fe6f7473a0673f7d8cdde53f5a0 ] Add the display controller and DPTX phy power-domains to the framebuffer node to keep the framebuffer and display out working after device probing finished. The OS has more control about the display pipeline used for the HDMI output on M2 based devices. The HDMI output is driven by an integrated DisplayPort to HDMI converter (Parade PS190). The DPTX phy is now controlled by the OS and no longer by firmware running on the display co-processor. This allows using the second display controller on the second USB type-c port or tunneling 2 DisplayPort connections over USB4/Thunderbolt. The m1n1 bootloader uses the second display controller to drive the HDMI output. Adjust for this difference compared to the notebooks as well. Fixes: 2d5ce3fbef32 ("arm64: dts: apple: t8112: Initial t8112 (M2) device trees") Cc: stable@vger.kernel.org Signed-off-by: Janne Grunau <j@jannau.net> Link: https://patch.msgid.link/20260108-apple-dt-pmgr-fixes-v1-1-cfdce629c0a8@jannau.net Signed-off-by: Sven Peter <sven@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: Disable branch profiling for all arm64 codeBreno Leitao1-0/+4
[ Upstream commit f22c81bebf8bda6e54dc132df0ed54f6bf8756f9 ] The arm64 kernel doesn't boot with annotated branches (PROFILE_ANNOTATED_BRANCHES) enabled and CONFIG_DEBUG_VIRTUAL together. Bisecting it, I found that disabling branch profiling in arch/arm64/mm solved the problem. Narrowing down a bit further, I found that physaddr.c is the file that needs to have branch profiling disabled to get the machine to boot. I suspect that it might invoke some ftrace helper very early in the boot process and ftrace is still not enabled(!?). Rather than playing whack-a-mole with individual files, disable branch profiling for the entire arch/arm64 tree, similar to what x86 already does in arch/x86/Kbuild. Cc: stable@vger.kernel.org Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04KVM: nSVM: Remove a user-triggerable WARN on nested_svm_load_cr3() succeedingSean Christopherson1-2/+1
[ Upstream commit fc3ba56385d03501eb582e4b86691ba378e556f9 ] Drop the WARN in svm_set_nested_state() on nested_svm_load_cr3() failing as it is trivially easy to trigger from userspace by modifying CPUID after loading CR3. E.g. modifying the state restoration selftest like so: --- tools/testing/selftests/kvm/x86/state_test.c +++ tools/testing/selftests/kvm/x86/state_test.c @@ -280,7 +280,16 @@ int main(int argc, char *argv[]) /* Restore state in a new VM. */ vcpu = vm_recreate_with_one_vcpu(vm); - vcpu_load_state(vcpu, state); + + if (stage == 4) { + state->sregs.cr3 = BIT(44); + vcpu_load_state(vcpu, state); + + vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, 36); + __vcpu_nested_state_set(vcpu, &state->nested); + } else { + vcpu_load_state(vcpu, state); + } /* * Restore XSAVE state in a dummy vCPU, first without doing generates: WARNING: CPU: 30 PID: 938 at arch/x86/kvm/svm/nested.c:1877 svm_set_nested_state+0x34a/0x360 [kvm_amd] Modules linked in: kvm_amd kvm irqbypass [last unloaded: kvm] CPU: 30 UID: 1000 PID: 938 Comm: state_test Tainted: G W 6.18.0-rc7-58e10b63777d-next-vm Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:svm_set_nested_state+0x34a/0x360 [kvm_amd] Call Trace: <TASK> kvm_arch_vcpu_ioctl+0xf33/0x1700 [kvm] kvm_vcpu_ioctl+0x4e6/0x8f0 [kvm] __x64_sys_ioctl+0x8f/0xd0 do_syscall_64+0x61/0xad0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Simply delete the WARN instead of trying to prevent userspace from shoving "illegal" state into CR3. For better or worse, KVM's ABI allows userspace to set CPUID after SREGS, and vice versa, and KVM is very permissive when it comes to guest CPUID. I.e. attempting to enforce the virtual CPU model when setting CPUID could break userspace. Given that the WARN doesn't provide any meaningful protection for KVM or benefit for userspace, simply drop it even though the odds of breaking userspace are minuscule. Opportunistically delete a spurious newline. Fixes: b222b0b88162 ("KVM: nSVM: refactor the CR3 reload on migration") Cc: stable@vger.kernel.org Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251216161755.1775409-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04KVM: x86: Return "unsupported" instead of "invalid" on access to unsupported ↵Sean Christopherson1-20/+20
PV MSR [ Upstream commit 5bb9ac1865123356337a389af935d3913ee917ed ] Return KVM_MSR_RET_UNSUPPORTED instead of '1' (which for all intents and purposes means "invalid") when rejecting accesses to KVM PV MSRs to adhere to KVM's ABI of allowing host reads and writes of '0' to MSRs that are advertised to userspace via KVM_GET_MSR_INDEX_LIST, even if the vCPU model doesn't support the MSR. E.g. running a QEMU VM with -cpu host,-kvmclock,kvm-pv-enforce-cpuid yields: qemu: error: failed to set MSR 0x12 to 0x0 qemu: target/i386/kvm/kvm.c:3301: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://patch.msgid.link/20251230205948.4094097-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04ARM: omap2: Fix reference count leaks in omap_control_init()Wentao Liang1-4/+10
[ Upstream commit 93a04ab480c8bbcb7d9004be139c538c8a0c1bc8 ] The of_get_child_by_name() function increments the reference count of child nodes, causing multiple reference leaks in omap_control_init(): 1. scm_conf node never released in normal/error paths 2. clocks node leak when checking existence 3. Missing scm_conf release before np in error paths Fix these leaks by adding proper of_node_put() calls and separate error handling. Fixes: e5b635742e98 ("ARM: OMAP2+: control: add syscon support for register accesses") Cc: stable@vger.kernel.org Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Andreas Kemnade <andreas@kemnade.info> Link: https://patch.msgid.link/20251217142122.1861292-1-vulab@iscas.ac.cn Signed-off-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: dts: qcom: x1e80100: Add missing TCSR ref clock to the DP PHYsAbel Vesa1-4/+8
[ Upstream commit 0907cab01ff9746ecf08592edd9bd85d2636be58 ] The DP PHYs on X1E80100 need the ref clock which is provided by the TCSR CC. The current X Elite devices supported upstream work fine without this clock, because the boot firmware leaves this clock enabled. But we should not rely on that. Also, even though this change breaks the ABI, it is needed in order to make the driver disables this clock along with the other ones, for a proper bring-down of the entire PHY. So lets attach it to each of the DP PHYs in order to do that. Cc: stable@vger.kernel.org # v6.9 Fixes: 1940c25eaa63 ("arm64: dts: qcom: x1e80100: Add display nodes") Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20251224-phy-qcom-edp-add-missing-refclk-v5-3-3f45d349b5ac@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: hugetlbpage: avoid unused-but-set-parameter warning (gcc-16)Arnd Bergmann1-2/+7
[ Upstream commit 729a2e8e9ac47099a967567389cc9d73ef4194ca ] gcc-16 warns about an instance that older compilers did not: arch/arm64/mm/hugetlbpage.c: In function 'huge_pte_clear': arch/arm64/mm/hugetlbpage.c:369:57: error: parameter 'addr' set but not used [-Werror=unused-but-set-parameter=] The issue here is that __pte_clear() does not actually use its second argument, but when CONFIG_ARM64_CONTPTE is enabled it still gets updated. Replace the macro with an inline function to let the compiler see the argument getting passed down. Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04ARM: 9467/1: mm: Don't use %pK through printkThomas Weissschuh1-1/+1
[ Upstream commit 012ea376a5948b025f260aa45d2a6ec5d96674ea ] Restricted pointers ("%pK") were never meant to be used through printk(). They can acquire sleeping locks in atomic contexts. Switch to %px over the more secure %p as this usage is a debugging aid, gated behind CONFIG_DEBUG_VIRTUAL and used by WARN(). Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04MIPS: Loongson: Make cpumask_of_node() robust against NUMA_NO_NODEJohn Garry1-1/+1
[ Upstream commit d55d3fe2d1470ac5b6e93efe7998b728013c9fc8 ] The arch definition of cpumask_of_node() cannot handle NUMA_NO_NODE - which is a valid index - so add a check for this. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04m68k: nommu: fix memmove() with differently aligned src and dest for 68000Daniel Palmer1-0/+18
[ Upstream commit 590fe2f46c8698bb758f9002cb247ca10ce95569 ] 68000 has different alignment needs to 68020+. memcpy() checks if the destination is aligned and does a smaller copy to fix the alignment and then critically for 68000 it checks if the source is still unaligned and if it is reverts to smaller copies. memmove() does not currently do the second part and malfunctions if one of the pointers is aligned and the other isn't. This is apparently getting triggered by printk. If I put breakpoints into the new checks added by this commit the first hit looks like this: memmove (n=205, src=0x2f3971 <printk_shared_pbufs+205>, dest=0x2f3980 <printk_shared_pbufs+220>) at arch/m68k/lib/memmove.c:82 Signed-off-by: Daniel Palmer <daniel@thingy.jp> Signed-off-by: Greg Ungerer <gerg@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04riscv: vector: init vector context with proper vlenbSergey Matyukevich1-4/+8
[ Upstream commit ef3ff40346db8476a9ef7269fc9d1837e7243c40 ] The vstate in thread_struct is zeroed when the vector context is initialized. That includes read-only register vlenb, which holds the vector register length in bytes. Zeroed state persists until mstatus.VS becomes 'dirty' and a context switch saves the actual hardware values. This can expose the zero vlenb value to the user-space in early debug scenarios, e.g. when ptrace attaches to a traced process early, before any vector instruction except the first one was executed. Fix this by specifying proper vlenb on vector context init. Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com> Reviewed-by: Andy Chiu <andybnac@gmail.com> Tested-by: Andy Chiu <andybnac@gmail.com> Link: https://patch.msgid.link/20251214163537.1054292-3-geomatsi@gmail.com Signed-off-by: Paul Walmsley <pjw@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04openrisc: define arch-specific version of nop()Brian Masney1-0/+2
[ Upstream commit 0dfffa5479d6260d04d021f69203b1926f73d889 ] When compiling a driver written for MIPS on OpenRISC that uses the nop() function, it fails due to the following error: drivers/watchdog/pic32-wdt.c: Assembler messages: drivers/watchdog/pic32-wdt.c:125: Error: unrecognized instruction `nop' The driver currently uses the generic version of nop() from include/asm-generic/barrier.h: #ifndef nop #define nop() asm volatile ("nop") #endif Let's fix this on OpenRISC by defining an architecture-specific version of nop(). This was tested by performing an allmodconfig openrisc cross compile on an aarch64 host. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601180236.BVy480We-lkp@intel.com/ Signed-off-by: Brian Masney <bmasney@redhat.com> Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04parisc: Prevent interrupts during rebootHelge Deller1-0/+3
[ Upstream commit 35ac5a728c878594f2ea6c43b57652a16be3c968 ] Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04arm64: tegra: smaug: Add usb-role-switch supportDiogo Ivo1-0/+2
[ Upstream commit dfa93788dd8b2f9c59adf45ecf592082b1847b7b ] The USB2 port on Smaug is configured for OTG operation but lacked the required 'usb-role-switch' property, leading to a failed probe and a non-functioning USB port. Add the property along with setting the default role to host. Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04Revert "arm64: zynqmp: Add an OP-TEE node to the device tree"Tomas Melin1-5/+0
[ Upstream commit c197179990124f991fca220d97fac56779a02c6d ] This reverts commit 06d22ed6b6635b17551f386b50bb5aaff9b75fbe. OP-TEE logic in U-Boot automatically injects a reserved-memory node along with optee firmware node to kernel device tree. The injection logic is dependent on that there is no manually defined optee node. Having the node in zynqmp.dtsi effectively breaks OP-TEE's insertion of the reserved-memory node, causing memory access violations during runtime. Signed-off-by: Tomas Melin <tomas.melin@vaisala.com> Signed-off-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/20251125-revert-zynqmp-optee-v1-1-d2ce4c0fcaf6@vaisala.com Signed-off-by: Sasha Levin <sashal@kernel.org>