path: root/arch
Age | Commit message | Author | Files | Lines
4 days | ARM: dts: imx6q-ba16: fix RTC interrupt level | Ian Ray | 1 | -1/+1
[ Upstream commit e6a4eedd49ce27c16a80506c66a04707e0ee0116 ] RTC interrupt level should be set to "LOW". This was revealed by the introduction of commit: f181987ef477 ("rtc: m41t80: use IRQ flags obtained from fwnode") which changed the way IRQ type is obtained. Fixes: 56c27310c1b4 ("ARM: dts: imx: Add Advantech BA-16 Qseven module") Signed-off-by: Ian Ray <ian.ray@gehealthcare.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | arm64: dts: add off-on-delay-us for usdhc2 regulator | Haibo Chen | 1 | -0/+1
[ Upstream commit ca643894a37a25713029b36cfe7d1bae515cac08 ] According to the SD specification, a power-reset operation requires the card's supply voltage to drop below 0.5 V and stay there for more than 1 ms; otherwise, when the supply comes back up to 3.3 V, the card can no longer operate in SD3.0 mode. To meet this requirement on the imx8qm-mek board, add a 4.8 ms delay between SD power-off and power-on. Fixes: 307fd14d4b14 ("arm64: dts: imx: add imx8qm mek support") Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | alpha: don't reference obsolete termio struct for TC* constants | Sam James | 1 | -4/+4
[ Upstream commit 9aeed9041929812a10a6d693af050846942a1d16 ] Similar in nature to ab107276607af90b13a5994997e19b7b9731e251. glibc-2.42 drops the legacy termio struct, but the ioctls.h header still defines some TC* constants in terms of termio (via sizeof). Hardcode the values instead. This fixes building Python for example, which falls over like: ./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio' Link: https://bugs.gentoo.org/961769 Link: https://bugs.gentoo.org/962600 Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: Magnus Lindholm <linmag7@gmail.com> Link: https://lore.kernel.org/r/6ebd3451908785cad53b50ca6bc46cfe9d6bc03c.1764922497.git.sam@gentoo.org Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
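The shape of the fix, as a minimal sketch (the constant shown is simply what the old macro expanded to on Alpha; the real patch may spell it differently):

	/* Before: encodes sizeof(struct termio) into the ioctl number, which
	 * no longer compiles once glibc-2.42 drops the legacy struct. */
	#define TCGETA	_IOR('t', 23, struct termio)

	/* After: hard-code the ABI-frozen value the macro always produced. */
	#define TCGETA	0x40127417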
4 days | ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels | Sebastian Andrzej Siewior | 1 | -1/+1
[ Upstream commit fedadc4137234c3d00c4785eeed3e747fe9036ae ] gup_pgd_range() is invoked with disabled interrupts and invokes __kmap_local_page_prot() via pte_offset_map(), gup_p4d_range(). With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get() which uses a spinlock_t via lock_kmap_any(). This leads to a sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a sleeping lock and must not be acquired in atomic context. The loop in map_new_virtual() uses wait_queue_head_t for wake up which also is using a spinlock_t. Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT to allow the use of get_user_pages_fast(). [arnd: rework patch to turn off HIGHPTE instead of HAVE_FAST_GUP] Co-developed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | csky: fix csky_cmpxchg_fixup not working | Yang Li | 1 | -2/+2
[ Upstream commit 809ef03d6d21d5fea016bbf6babeec462e37e68c ] In the csky_cmpxchg_fixup function, it is incorrect to use the global variable csky_cmpxchg_stw to determine the address where the exception occurred. The global variable csky_cmpxchg_stw stores the opcode at the time of the exception, while &csky_cmpxchg_stw gives the address where the exception occurred. Signed-off-by: Yang Li <yang.li85200@gmail.com> Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
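A minimal sketch of the distinction (fixup() is a hypothetical stand-in for the real fixup logic):

	extern unsigned long csky_cmpxchg_stw;	/* label placed on the stw */

	/* Wrong: compares the trap PC with the opcode word at the label. */
	if (trap_pc == csky_cmpxchg_stw)
		fixup();

	/* Right: the label's address is where the exception occurred. */
	if (trap_pc == (unsigned long)&csky_cmpxchg_stw)
		fixup();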
4 days | x86: remove __range_not_ok() | Arnd Bergmann | 5 | -13/+9
commit 36903abedfe8d419e90ce349b2b4ce6dc2883e17 upstream. The __range_not_ok() helper is an x86 (and sparc64) specific interface that does roughly the same thing as __access_ok(), but with different calling conventions. Change this to use the normal interface in order for consistency as we clean up all access_ok() implementations. This changes the limit from TASK_SIZE to TASK_SIZE_MAX, which Al points out is the right thing to do here anyway. The callers have to use __access_ok() instead of the normal access_ok() though, because on x86 that contains a WARN_ON_IN_IRQ() check that cannot be used inside of NMI context while tracing. The check in copy_code() is not needed any more, because this one is already done by copy_from_user_nmi(). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/YgsUKcXGR7r4nINj@zeniv-ca.linux.org.uk/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Stable-dep-of: d319f344561d ("mm: Fix copy_from_user_nofault().") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
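The conversion pattern is roughly the following sketch (not the exact diff):

	/* Before: x86-private helper with a TASK_SIZE limit and an
	 * inverted return value. */
	if (__range_not_ok(ptr, size, TASK_SIZE))
		return -EFAULT;

	/* After: the common helper with the TASK_SIZE_MAX limit; unlike
	 * access_ok(), __access_ok() has no WARN_ON_IN_IRQ(), so it stays
	 * usable from NMI context while tracing. */
	if (!__access_ok(ptr, size))
		return -EFAULT;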
4 days | KVM: arm64: sys_regs: disable -Wuninitialized-const-pointer warning | Justin Stitt | 1 | -0/+3
A new warning in Clang 22 [1] complains that @clidr passed to get_clidr_el1() is an uninitialized const pointer. get_clidr_el1() doesn't really care since it casts away the const-ness anyways -- it is a false positive. | ../arch/arm64/kvm/sys_regs.c:2838:23: warning: variable 'clidr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer] | 2838 | get_clidr_el1(NULL, &clidr); /* Ugly... */ | | ^~~~~ This patch isn't needed for anything past 6.1 as this code section was reworked in Commit 7af0c2534f4c ("KVM: arm64: Normalize cache configuration"). Since there is no upstream equivalent, this patch just needs to be applied to 5.15. Disable this warning for sys_regs.o with an iron fist as it doesn't make sense to waste maintainer's time or potentially break builds by backporting large changelists from 6.2+. Cc: stable@vger.kernel.org Fixes: 7c8c5e6a9101e ("arm64: KVM: system register handling") Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d441f19b319e [1] Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | KVM: x86: Acquire kvm->srcu when handling KVM_SET_VCPU_EVENTS | Sean Christopherson | 1 | -0/+2
commit 4bcdd831d9d01e0fb64faea50732b59b2ee88da1 upstream. Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX reads guest memory. Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN via sync_regs(), which already holds SRCU. I.e. trying to precisely use kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause problems. Acquiring SRCU isn't all that expensive, so for simplicity, grab it unconditionally for KVM_SET_VCPU_EVENTS. ============================= WARNING: suspicious RCU usage 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted ----------------------------- include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by repro/1071: #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm] stack backtrace: CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x7f/0x90 lockdep_rcu_suspicious+0x13f/0x1a0 kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm] kvm_vcpu_read_guest+0x3e/0x90 [kvm] nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel] load_vmcs12_host_state+0x432/0xb40 [kvm_intel] vmx_leave_nested+0x30/0x40 [kvm_intel] kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm] kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm] ? mark_held_locks+0x49/0x70 ? kvm_vcpu_ioctl+0x7d/0x970 [kvm] ? kvm_vcpu_ioctl+0x497/0x970 [kvm] kvm_vcpu_ioctl+0x497/0x970 [kvm] ? lock_acquire+0xba/0x2d0 ? find_held_lock+0x2b/0x80 ? do_user_addr_fault+0x40c/0x6f0 ? lock_release+0xb7/0x270 __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x6c/0x170 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7ff11eb1b539 </TASK> Fixes: f7e570780efc ("KVM: x86: Forcibly leave nested virt when SMM state is toggled") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240723232055.3643811-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> [ Based on kernel 5.15 available functions, using srcu_read_lock/srcu_read_unlock instead of kvm_vcpu_srcu_read_lock/kvm_vcpu_srcu_read_unlock ] Signed-off-by: Rajani Kantha <681739313@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
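Per the backport note, the 5.15 ioctl path is bracketed with the older SRCU primitives, roughly:

	int idx = srcu_read_lock(&vcpu->kvm->srcu);

	r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events);
	srcu_read_unlock(&vcpu->kvm->srcu, idx);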
4 days | powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages | David Hildenbrand | 1 | -0/+1
[ Upstream commit 0da2ba35c0d532ca0fe7af698b17d74c4d084b9a ] Let's properly adjust BALLOON_MIGRATE like the other drivers. Note that the INFLATE/DEFLATE events are triggered from the core when enqueueing/dequeueing pages. This was found by code inspection. Link: https://lkml.kernel.org/r/20251021100606.148294-3-david@redhat.com Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
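The adjustment mirrors what the other balloon drivers (e.g. virtio_balloon) already do in their migration handlers; a condensed sketch:

	static int cmm_migratepage(struct balloon_dev_info *b_dev_info,
				   struct page *newpage, struct page *page,
				   enum migrate_mode mode)
	{
		/* ... swap newpage into the balloon, free the old page ... */
		__count_vm_event(BALLOON_MIGRATE);	/* the added line */
		return MIGRATEPAGE_SUCCESS;
	}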
4 days | mm/balloon_compaction: convert balloon_page_delete() to balloon_page_finalize() | David Hildenbrand | 1 | -1/+1
[ Upstream commit 15504b1163007bbfbd9a63460d5c14737c16e96d ] Let's move the removal of the page from the balloon list into the single caller, to remove the dependency on the PG_isolated flag and clarify locking requirements. Note that for now, balloon_page_delete() was used on two paths: (1) Removing a page from the balloon for deflation through balloon_page_list_dequeue() (2) Removing an isolated page from the balloon for migration in the per-driver migration handlers. Isolated pages were already removed from the balloon list during isolation. So instead of relying on the flag, we can just distinguish both cases directly and handle it accordingly in the caller. We'll shuffle the operations a bit such that they logically make more sense (e.g., remove from the list before clearing flags). In balloon migration functions we can now move the balloon_page_finalize() out of the balloon lock and perform the finalization just before dropping the balloon reference. Document that the page lock is currently required when modifying the movability aspects of a page; hopefully we can soon decouple this from the page lock. Link: https://lkml.kernel.org/r/20250704102524.326966-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brendan Jackman <jackmanb@google.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gregory Price <gourry@gourry.net> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 0da2ba35c0d5 ("powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | powerpc/64s/slb: Fix SLB multihit issue during SLB preload | Donet Tom | 5 | -97/+0
[ Upstream commit 00312419f0863964625d6dcda8183f96849412c6 ] On systems using the hash MMU, there is a software SLB preload cache that mirrors the entries loaded into the hardware SLB buffer. This preload cache is subject to periodic eviction — typically after every 256 context switches — to remove old entries. To optimize performance, the kernel skips switch_mmu_context() in switch_mm_irqs_off() when the prev and next mm_struct are the same. However, on hash MMU systems, this can lead to inconsistencies between the hardware SLB and the software preload cache. If an SLB entry for a process is evicted from the software cache on one CPU, and the same process later runs on another CPU without executing switch_mmu_context(), the hardware SLB may retain stale entries. If the kernel then attempts to reload that entry, it can trigger an SLB multi-hit error. The following timeline shows how stale SLB entries are created and can cause a multi-hit error when a process moves between CPUs without a MMU context switch. CPU 0 CPU 1 ----- ----- Process P exec swapper/1 load_elf_binary begin_new_exc activate_mm switch_mm_irqs_off switch_mmu_context switch_slb /* * This invalidates all * the entries in the HW * and setup the new HW * SLB entries as per the * preload cache. */ context_switch sched_migrate_task migrates process P to cpu-1 Process swapper/0 context switch (to process P) (uses mm_struct of Process P) switch_mm_irqs_off() switch_slb load_slb++ /* * load_slb becomes 0 here * and we evict an entry from * the preload cache with * preload_age(). We still * keep HW SLB and preload * cache in sync, that is * because all HW SLB entries * anyways gets evicted in * switch_slb during SLBIA. * We then only add those * entries back in HW SLB, * which are currently * present in preload_cache * (after eviction). */ load_elf_binary continues... setup_new_exec() slb_setup_new_exec() sched_switch event sched_migrate_task migrates process P to cpu-0 context_switch from swapper/0 to Process P switch_mm_irqs_off() /* * Since both prev and next mm struct are same we don't call * switch_mmu_context(). This will cause the HW SLB and SW preload * cache to go out of sync in preload_new_slb_context. Because there * was an SLB entry which was evicted from both HW and preload cache * on cpu-1. Now later in preload_new_slb_context(), when we will try * to add the same preload entry again, we will add this to the SW * preload cache and then will add it to the HW SLB. Since on cpu-0 * this entry was never invalidated, hence adding this entry to the HW * SLB will cause a SLB multi-hit error. */ load_elf_binary continues... START_THREAD start_thread preload_new_slb_context /* * This tries to add a new EA to preload cache which was earlier * evicted from both cpu-1 HW SLB and preload cache. This caused the * HW SLB of cpu-0 to go out of sync with the SW preload cache. The * reason for this was, that when we context switched back on CPU-0, * we should have ideally called switch_mmu_context() which will * bring the HW SLB entries on CPU-0 in sync with SW preload cache * entries by setting up the mmu context properly. But we didn't do * that since the prev mm_struct running on cpu-0 was same as the * next mm_struct (which is true for swapper / kernel threads). So * now when we try to add this new entry into the HW SLB of cpu-0, * we hit a SLB multi-hit error. 
*/ WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50 Modules linked in: CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000 REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty) MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000 CFAR: c0000000001543b0 IRQMASK: 3 <...> NIP [c00000000015426c] assert_slb_presence+0x2c/0x50 LR [c0000000001543b4] slb_insert_entry+0x124/0x390 Call Trace: 0x7fffceb5ffff (unreliable) preload_new_slb_context+0x100/0x1a0 start_thread+0x26c/0x420 load_elf_binary+0x1b04/0x1c40 bprm_execve+0x358/0x680 do_execveat_common+0x1f8/0x240 sys_execve+0x58/0x70 system_call_exception+0x114/0x300 system_call_common+0x160/0x2c4 From the above analysis, during early exec the hardware SLB is cleared, and entries from the software preload cache are reloaded into hardware by switch_slb. However, preload_new_slb_context and slb_setup_new_exec also attempt to load some of the same entries, which can trigger a multi-hit. In most cases, these additional preloads simply hit existing entries and add nothing new. Removing these functions avoids redundant preloads and eliminates the multi-hit issue. This patch removes these two functions. We tested process switching performance using the context_switch benchmark on POWER9/hash, and observed no regression. Without this patch: 129041 ops/sec With this patch: 129341 ops/sec We also measured SLB faults during boot, and the counts are essentially the same with and without this patch. SLB faults without this patch: 19727 SLB faults with this patch: 19786 Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache") cc: stable@vger.kernel.org Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION | David Hildenbrand | 1 | -1/+1
[ Upstream commit fc6bcf9ac4de76f5e7bcd020b3c0a86faff3f2d5 ] Patch series "powerpc/pseries/cmm: two smaller fixes". Two smaller fixes identified while doing a bigger rework. This patch (of 2): We always have to initialize the balloon_dev_info, even when compaction is not configured in: otherwise the containing list and the lock are left uninitialized. Likely not many such configs exist in practice, but let's CC stable to be sure. This was found by code inspection. Link: https://lkml.kernel.org/r/20251021100606.148294-1-david@redhat.com Link: https://lkml.kernel.org/r/20251021100606.148294-2-david@redhat.com Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ moved balloon_devinfo_init() call from inside cmm_balloon_compaction_init() to cmm_init() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
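Following the backport note, the call now runs unconditionally from cmm_init(); schematically:

	static int cmm_init(void)
	{
		/* Must happen even without CONFIG_BALLOON_COMPACTION, or
		 * b_dev_info's list head and lock stay uninitialized. */
		balloon_devinfo_init(&b_dev_info);

		return cmm_balloon_compaction_init(); /* compaction-only bits */
	}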
4 days | KVM: SVM: Mark VMCB_NPT as dirty on nested VMRUN | Jim Mattson | 1 | -0/+1
[ Upstream commit 7c8b465a1c91f674655ea9cec5083744ec5f796a ] Mark the VMCB_NPT bit as dirty in nested_vmcb02_prepare_save() on every nested VMRUN. If L1 changes the PAT MSR between two VMRUN instructions on the same L1 vCPU, the g_pat field in the associated vmcb02 will change, and the VMCB_NPT clean bit should be cleared. Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit") Cc: stable@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250922162935.621409-3-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> [ adapted vmcb02 local variable to svm->vmcb direct access pattern ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
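Adapted to the svm->vmcb access pattern mentioned in the backport note, the fix is essentially a one-liner:

	/* vmcb02's g_pat can change whenever L1 rewrites its PAT MSR, so
	 * the NPT clean bit must not survive a nested VMRUN. */
	vmcb_mark_dirty(svm->vmcb, VMCB_NPT);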
4 days | ARM: dts: microchip: sama7g5: fix uart fifo size to 32 | Nicolas Ferre | 1 | -2/+2
[ Upstream commit 5654889a94b0de5ad6ceae3793e7f5e0b61b50b6 ] On some flexcom nodes related to uart, the FIFO sizes were wrong: fix them to 32 data entries. Fixes: 7540629e2fc7 ("ARM: dts: at91: add sama7g5 SoC DT and sama7g5-ek") Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20251114103313.20220-2-nicolas.ferre@microchip.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit | Dongli Zhang | 3 | -2/+3
[ Upstream commit 29763138830916f46daaa50e83e7f4f907a3236b ] If an APICv status update was pended while L2 was active, immediately refresh vmcs01's controls instead of pending KVM_REQ_APICV_UPDATE as kvm_vcpu_update_apicv() only calls into vendor code if a change is necessary. E.g. if APICv is inhibited, and then activated while L2 is running: kvm_vcpu_update_apicv() | -> __kvm_vcpu_update_apicv() | -> apic->apicv_active = true | -> vmx_refresh_apicv_exec_ctrl() | -> vmx->nested.update_vmcs01_apicv_status = true | -> return Then L2 exits to L1: __nested_vmx_vmexit() | -> kvm_make_request(KVM_REQ_APICV_UPDATE) vcpu_enter_guest(): KVM_REQ_APICV_UPDATE -> kvm_vcpu_update_apicv() | -> __kvm_vcpu_update_apicv() | -> return // because if (apic->apicv_active == activate) Reported-by: Chao Gao <chao.gao@intel.com> Closes: https://lore.kernel.org/all/aQ2jmnN8wUYVEawF@intel.com Fixes: 7c69661e225c ("KVM: nVMX: Defer APICv updates while L2 is active until L1 is active") Cc: stable@vger.kernel.org Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> [sean: write changelog] Link: https://patch.msgid.link/20251205231913.441872-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> [ exported vmx_refresh_apicv_exec_ctrl() and added declaration in vmx.h ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
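Schematically, the nested VM-Exit path now refreshes vmcs01 directly instead of pending a request that __kvm_vcpu_update_apicv() may treat as a no-op (sketch):

	if (vmx->nested.update_vmcs01_apicv_status) {
		vmx->nested.update_vmcs01_apicv_status = false;
		/* was: kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu); */
		vmx_refresh_apicv_exec_ctrl(vcpu);
	}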
4 days | parisc: entry: set W bit for !compat tasks in syscall_restore_rfi() | Sven Schnelle | 2 | -1/+6
commit 5fb1d3ce3e74a4530042795e1e065422295f1371 upstream. When the kernel leaves to userspace via syscall_restore_rfi(), the W bit is not set in the new PSW. This doesn't cause any problems because there's no 64 bit userspace for parisc. Simple static binaries are usually loaded at addresses way below the 32 bit limit so the W bit doesn't matter. Fix this by setting the W bit when TIF_32BIT is not set. Signed-off-by: Sven Schnelle <svens@stackframe.org> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | parisc: entry.S: fix space adjustment on interruption for 64-bit userspace | Sven Schnelle | 1 | -3/+8
commit 1aa4524c0c1b54842c4c0a370171d11b12d0709b upstream. In wide mode, the IASQ contains the upper part of the GVA during interruption. This needs to be reversed before the space is used - otherwise it contains parts of the IAOQ. See "Processing Resources / Interruption Instruction Address Queues" in the Parisc 2.0 Architecture Manual, page 2-13, for an explanation. The IAOQ/IASQ space_adjust was skipped for other interruptions than itlb misses. However, the code in handle_interruption() checks whether iasq[0] contains a valid space. Due to the not masked out bits this match failed and the process was killed. Also add space_adjust for IAOQ1/IASQ1 so ptregs contains sane values. Signed-off-by: Sven Schnelle <svens@stackframe.org> Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | KVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits | Sean Christopherson | 2 | -3/+6
commit da01f64e7470988f8607776aa7afa924208863fb upstream. Explicitly clear exit_code_hi in the VMCB when synthesizing "normal" nested VM-Exits, as the full exit code is a 64-bit value (spoiler alert), and all exit codes for non-failing VMRUN use only bits 31:0. Cc: Jim Mattson <jmattson@google.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: stable@vger.kernel.org Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251113225621.1688428-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | KVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN) | Sean Christopherson | 1 | -2/+2
commit f402ecd7a8b6446547076f4bd24bd5d4dcc94481 upstream. Set exit_code_hi to -1u as a temporary band-aid to fix a long-standing (effectively since KVM's inception) bug where KVM treats the exit code as a 32-bit value, when in reality it's a 64-bit value. Per the APM, offset 0x70 is a single 64-bit value: 070h 63:0 EXITCODE And a sane reading of the error values defined in "Table C-1. SVM Intercept Codes" is that negative values use the full 64 bits: –1 VMEXIT_INVALID: Invalid guest state in VMCB. –2 VMEXIT_BUSY: BUSY bit was set in the VMSA. –3 VMEXIT_IDLE_REQUIRED: The sibling thread is not in an idle state. -4 VMEXIT_INVALID_PMC: Invalid PMC state. And that interpretation is confirmed by testing on Milan and Turin (by setting bits in CR0[63:32] to generate VMEXIT_INVALID on VMRUN). Furthermore, Xen has treated exitcode as a 64-bit value since HVM support was added in 2006 (see Xen commit d1bd157fbc ("Big merge the HVM full-virtualisation abstractions.")). Cc: Jim Mattson <jmattson@google.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: stable@vger.kernel.org Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251113225621.1688428-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
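Taken together with the previous patch, the synthesized exit codes end up as in this sketch:

	/* Normal synthesized nested exit: codes fit in bits 31:0. */
	vmcb12->control.exit_code    = exit_code;
	vmcb12->control.exit_code_hi = 0;

	/* Failed VMRUN: SVM_EXIT_ERR is really the 64-bit value -1. */
	vmcb12->control.exit_code    = SVM_EXIT_ERR;
	vmcb12->control.exit_code_hi = -1u;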
4 days | KVM: nSVM: Propagate SVM_EXIT_CR0_SEL_WRITE correctly for LMSW emulation | Yosry Ahmed | 1 | -9/+9
commit 5674a76db0213f9db1e4d08e847ff649b46889c0 upstream. When emulating L2 instructions, svm_check_intercept() checks whether a write to CR0 should trigger a synthesized #VMEXIT with SVM_EXIT_CR0_SEL_WRITE. For MOV-to-CR0, SVM_EXIT_CR0_SEL_WRITE is only triggered if any bit other than CR0.MP and CR0.TS is updated. However, according to the APM (24593—Rev. 3.42—March 2024, Table 15-7): The LMSW instruction treats the selective CR0-write intercept as a non-selective intercept (i.e., it intercepts regardless of the value being written). Skip checking the changed bits for x86_intercept_lmsw and always inject SVM_EXIT_CR0_SEL_WRITE. Fixes: cfec82cb7d31 ("KVM: SVM: Add intercept check for emulated cr accesses") Cc: stable@vger.kernel.org Reported-by: Matteo Rizzo <matteorizzo@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251024192918.3191141-3-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
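In svm_check_intercept() terms, the change looks roughly like this sketch:

	/* APM Table 15-7: LMSW treats the selective CR0-write intercept
	 * as non-selective, so skip the changed-bits test entirely. */
	if (info->intercept == x86_intercept_lmsw) {
		icpt_info.exit_code = SVM_EXIT_CR0_SEL_WRITE;
		break;
	}

	/* MOV-to-CR0 keeps the old rule: escalate to
	 * SVM_EXIT_CR0_SEL_WRITE only if bits other than MP/TS change. */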
4 days | KVM: x86: Fix VM hard lockup after prolonged inactivity with periodic HV timer | fuqiang wang | 1 | -5/+23
commit 18ab3fc8e880791aa9f7c000261320fc812b5465 upstream. When advancing the target expiration for the guest's APIC timer in periodic mode, set the expiration to "now" if the target expiration is in the past (similar to what is done in update_target_expiration()). Blindly adding the period to the previous target expiration can result in KVM generating a practically unbounded number of hrtimer IRQs due to programming an expired timer over and over. In extreme scenarios, e.g. if userspace pauses/suspends a VM for an extended duration, this can even cause hard lockups in the host. Currently, the bug only affects Intel CPUs when using the hypervisor timer (HV timer), a.k.a. the VMX preemption timer. Unlike the software timer, a.k.a. hrtimer, which KVM keeps running even on exits to userspace, the HV timer only runs while the guest is active. As a result, if the vCPU does not run for an extended duration, there will be a huge gap between the target expiration and the current time the vCPU resumes running. Because the target expiration is incremented by only one period on each timer expiration, this leads to a series of timer expirations occurring rapidly after the vCPU/VM resumes. More critically, when the vCPU first triggers a periodic HV timer expiration after resuming, advancing the expiration by only one period will result in a target expiration in the past. As a result, the delta may be calculated as a negative value. When the delta is converted into an absolute value (tscdeadline is an unsigned u64), the resulting value can overflow what the HV timer is capable of programming. I.e. the large value will exceed the VMX Preemption Timer's maximum bit width of cpu_preemption_timer_multi + 32, and thus cause KVM to switch from the HV timer to the software timer (hrtimers). After switching to the software timer, periodic timer expiration callbacks may be executed consecutively within a single clock interrupt handler, because hrtimers honors KVM's request for an expiration in the past and immediately re-invokes KVM's callback after reprogramming. And because the interrupt handler runs with IRQs disabled, restarting KVM's hrtimer over and over until the target expiration is advanced to "now" can result in a hard lockup. E.g. the following hard lockup was triggered in the host when running a Windows VM (only relevant because it used the APIC timer in periodic mode) after resuming the VM from a long suspend (in the host). NMI watchdog: Watchdog detected hard LOCKUP on cpu 45 ... RIP: 0010:advance_periodic_target_expiration+0x4d/0x80 [kvm] ... RSP: 0018:ff4f88f5d98d8ef0 EFLAGS: 00000046 RAX: fff0103f91be678e RBX: fff0103f91be678e RCX: 00843a7d9e127bcc RDX: 0000000000000002 RSI: 0052ca4003697505 RDI: ff440d5bfbdbd500 RBP: ff440d5956f99200 R08: ff2ff2a42deb6a84 R09: 000000000002a6c0 R10: 0122d794016332b3 R11: 0000000000000000 R12: ff440db1af39cfc0 R13: ff440db1af39cfc0 R14: ffffffffc0d4a560 R15: ff440db1af39d0f8 FS: 00007f04a6ffd700(0000) GS:ff440db1af380000(0000) knlGS:000000e38a3b8000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000d5651feff8 CR3: 000000684e038002 CR4: 0000000000773ee0 PKRU: 55555554 Call Trace: <IRQ> apic_timer_fn+0x31/0x50 [kvm] __hrtimer_run_queues+0x100/0x280 hrtimer_interrupt+0x100/0x210 ? 
ttwu_do_wakeup+0x19/0x160 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> Moreover, if the suspend duration of the virtual machine is not long enough to trigger a hard lockup in this scenario, since commit 98c25ead5eda ("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86"), KVM will continue using the software timer until the guest reprograms the APIC timer in some way. Since the periodic timer does not require frequent APIC timer register programming, the guest may continue to use the software timer in perpetuity. Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode") Cc: stable@vger.kernel.org Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com> [sean: massage comments and changelog] Link: https://patch.msgid.link/20251113205114.1647493-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
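A condensed sketch of the clamp the fix describes (mirroring what update_target_expiration() already does; details of the surrounding function are assumed, not the literal diff):

	static void advance_periodic_target_expiration(struct kvm_lapic *apic)
	{
		ktime_t now = ktime_get();

		apic->lapic_timer.target_expiration =
			ktime_add_ns(apic->lapic_timer.target_expiration,
				     apic->lapic_timer.period);

		/* After a long pause, don't chase the past one period at a
		 * time: snap the next expiration to "now" instead. */
		if (ktime_before(apic->lapic_timer.target_expiration, now))
			apic->lapic_timer.target_expiration = now;
	}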
4 days | KVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn() | fuqiang wang | 1 | -1/+1
commit 9633f180ce994ab293ce4924a9b7aaf4673aa114 upstream. When restarting an hrtimer to emulate the guest's APIC timer in periodic mode, explicitly set the expiration using the target expiration computed by advance_periodic_target_expiration() instead of adding the period to the existing timer. This will allow making adjustments to the expiration, e.g. to deal with expirations far in the past, without having to implement the same logic in both advance_periodic_target_expiration() and apic_timer_fn(). Cc: stable@vger.kernel.org Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com> [sean: split to separate patch, write changelog] Link: https://patch.msgid.link/20251113205114.1647493-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
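The restart path in apic_timer_fn() then becomes, roughly:

	advance_periodic_target_expiration(apic);
	/* was: hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); */
	hrtimer_set_expires(&ktimer->timer, ktimer->target_expiration);
	return HRTIMER_RESTART;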
4 days | KVM: x86: WARN if hrtimer callback for periodic APIC timer fires with period=0 | Sean Christopherson | 1 | -1/+1
commit 0ea9494be9c931ddbc084ad5e11fda91b554cf47 upstream. WARN and don't restart the hrtimer if KVM's callback runs with the guest's APIC timer in periodic mode but with a period of '0', as not advancing the hrtimer's deadline would put the CPU into an infinite loop of hrtimer events. Observing a period of '0' should be impossible, even when the hrtimer is running on a different CPU than the vCPU, as KVM is supposed to cancel the hrtimer before changing (or zeroing) the period, e.g. when switching from periodic to one-shot. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251113205114.1647493-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit | Eric Biggers | 1 | -2/+2
commit 2f22115709fc7ebcfa40af3367a508fbbd2f71e9 upstream. In the C code, the 'inc' argument to the assembly functions blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with type u32, matching blake2s_compress(). The assembly code then reads it from the 64-bit %rcx. However, the ABI doesn't guarantee zero-extension to 64 bits, nor do gcc or clang guarantee it. Therefore, fix these functions to read this argument from the 32-bit %ecx. In theory, this bug could have caused the wrong 'inc' value to be used, causing incorrect BLAKE2s hashes. In practice, probably not: I've fixed essentially this same bug in many other assembly files too, but there's never been a real report of it having caused a problem. In x86_64, all writes to 32-bit registers are zero-extended to 64 bits. That results in zero-extension in nearly all situations. I've only been able to demonstrate a lack of zero-extension with a somewhat contrived example involving truncation, e.g. when the C code has a u64 variable holding 0x1234567800000040 and passes it as a u32 expecting it to be truncated to 0x40 (64). But that's not what the real code does, of course. Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation") Cc: stable@vger.kernel.org Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102234209.62133-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | powerpc/addnote: Fix overflow on 32-bit builds | Ben Collins | 1 | -3/+4
[ Upstream commit 825ce89a3ef17f84cf2c0eacfa6b8dc9fd11d13f ] The PUT_64[LB]E() macros need to cast the value to unsigned long long like the GET_64[LB]E() macros. Caused lots of warnings when compiled on 32-bit, and clobbered addresses (36-bit P4080). Signed-off-by: Ben Collins <bcollins@kernel.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/2025042122-mustard-wrasse-694572@boujee-and-buff Signed-off-by: Sasha Levin <sashal@kernel.org>
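The fix mirrors the GET_64*E() macros by widening the value before the shift; on a 32-bit build the old version shifts a 32-bit value and loses the high word (a sketch, not the literal diff):

	/* Before: (v) >> 32L truncates when v is 32 bits wide. */
	#define PUT_64BE(off, v)	(PUT_32BE((off), (v) >> 32L),	\
					 PUT_32BE((off) + 4, (v)))

	/* After: cast first, exactly like GET_64BE() already does. */
	#define PUT_64BE(off, v)					\
		(PUT_32BE((off), (unsigned long long)(v) >> 32L),	\
		 PUT_32BE((off) + 4, (unsigned long long)(v)))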
4 days | x86/ptrace: Always inline trivial accessors | Peter Zijlstra | 1 | -10/+10
[ Upstream commit 1fe4002cf7f23d70c79bda429ca2a9423ebcfdfa ] A KASAN build bloats these single load/store helpers such that it fails to inline them: vmlinux.o: error: objtool: irqentry_exit+0x5e8: call to instruction_pointer_set() with UACCESS enabled Make sure the compiler isn't allowed to do stupid. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251031105435.GU4068168@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
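The change is mechanical: the trivial pt_regs accessors go from static inline to __always_inline so instrumented builds cannot outline them, e.g.:

	static __always_inline void instruction_pointer_set(struct pt_regs *regs,
							    unsigned long val)
	{
		regs->ip = val;
	}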
4 days | bpf, arm64: Do not audit capability check in do_jit() | Ondrej Mosnacek | 1 | -1/+1
[ Upstream commit 189e5deb944a6f9c7992355d60bffd8ec2e54a9c ] Analogously to the x86 commit 881a9c9cb785 ("bpf: Do not audit capability check in do_jit()"), change the capable() call to ns_capable_noaudit() in order to avoid spurious SELinux denials in audit log. The commit log from that commit applies here as well: """ The failure of this check only results in a security mitigation being applied, slightly affecting performance of the compiled BPF program. It doesn't result in a failed syscall, and thus auditing a failed LSM permission check for it is unwanted. For example with SELinux, it causes a denial to be reported for confined processes running as root, which tends to be flagged as a problem to be fixed in the policy. Yet dontauditing or allowing CAP_SYS_ADMIN to the domain may not be desirable, as it would allow/silence also other checks - either going against the principle of least privilege or making debugging potentially harder. Fix it by changing it from capable() to ns_capable_noaudit(), which instructs the LSMs to not audit the resulting denials. """ Fixes: f300769ead03 ("arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Link: https://lore.kernel.org/r/20251204125916.441021-1-omosnace@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
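A sketch of the one-line change in the JIT's mitigation decision:

	/* Before: capable(CAP_SYS_ADMIN) audits the check, so SELinux logs
	 * a denial even though no syscall fails.
	 * After: same answer, no audit record. */
	bool priv = ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN);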
4 days | ARM: 9464/1: fix input-only operand modification in load_unaligned_zeropad() | Liyuan Pang | 1 | -5/+5
[ Upstream commit edb924a7211c9aa7a4a415e03caee4d875e46b8e ] In the inline assembly inside load_unaligned_zeropad(), "addr" is constrained as an input-only operand. The compiler assumes that on exit from the asm statement these operands contain the same values as they had before executing the statement, but when a kernel page fault happens, the assembly fixup code "bic %2, %2, #0x3" modifies the value of "addr", which may lead to unexpected behavior. Use a temporary variable "tmp" to handle it, instead of modifying the input-only operand, just like what arm64's load_unaligned_zeropad() does. Fixes: b9a50f74905a ("ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs") Co-developed-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Liyuan Pang <pangliyuan1@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format | Ritesh Harjani (IBM) | 1 | -0/+6
[ Upstream commit eae40a6da63faa9fb63ff61f8fa2b3b57da78a84 ] HPTE format was changed since Power9 (ISA 3.0) onwards. While dumping kernel hash page tables, nothing gets printed on powernv P9+. This patch utilizes the helpers added in the patch tagged as fixes, to convert new format to old format and dump the hptes. This fix is only needed for native_find() (powernv), since pseries continues to work fine with the old format. Fixes: 6b243fcfb5f1e ("powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/4c2bb9e5b3cfbc0dd80b61b67cdd3ccfc632684c.1761834163.git.ritesh.list@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | powerpc/32: Fix unpaired stwcx. on interrupt exit | Christophe Leroy | 1 | -6/+4
[ Upstream commit 10e1c77c3636d815db802ceef588522c2d2d947c ] Commit b96bae3ae2cb ("powerpc/32: Replace ASM exception exit by C exception exit from ppc64") erroneously copied to powerpc/32 the logic from powerpc/64 based on feature CPU_FTR_STCX_CHECKS_ADDRESS which is always 0 on powerpc/32. Re-instate the logic implemented by commit b64f87c16f3c ("[POWERPC] Avoid unpaired stwcx. on some processors") which is based on the CPU_FTR_NEED_PAIRED_STWCX feature. Fixes: b96bae3ae2cb ("powerpc/32: Replace ASM exception exit by C exception exit from ppc64") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/6040b5dbcf5cdaa1cd919fcf0790f12974ea6e5a.1757666244.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | perf/x86/intel: Correct large PEBS flag check | Dapeng Mi | 1 | -1/+3
[ Upstream commit 5e4e355ae7cdeb0fef5dbe908866e1f895abfacc ] The current large PEBS flag check only checks whether sample_regs_user contains unsupported GPRs but doesn't check whether sample_regs_intr contains unsupported GPRs. Of course, current PEBS HW supports sampling all perf-supported GPRs, so the missing check doesn't cause a real issue. But that won't be true any more once subsequent patches support sampling the SSP register. SSP sampling is not supported by adaptive PEBS HW and won't be supported until arch-PEBS HW. So correct this issue. Fixes: a47ba4d77e12 ("perf/x86: Enable free running PEBS for REGS_USER/INTR") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-5-dapeng1.mi@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
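Conceptually, the eligibility test has to cover both register sets; a sketch using the PEBS_GP_REGS mask from the Fixes: commit:

	/* Large PEBS is only safe if neither user nor intr sampling asks
	 * for a GPR that the PEBS record cannot deliver. */
	if ((event->attr.sample_regs_user | event->attr.sample_regs_intr)
	    & ~PEBS_GP_REGS)
		return false;	/* fall back to single-record PEBS */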
4 days | x86/dumpstack: Prevent KASAN false positive warnings in __show_regs() | Tengda Wu | 1 | -2/+21
[ Upstream commit ced37e9ceae50e4cb6cd058963bd315ec9afa651 ] When triggering a stack dump via sysrq (echo t > /proc/sysrq-trigger), KASAN may report false-positive out-of-bounds access: BUG: KASAN: out-of-bounds in __show_regs+0x4b/0x340 Call Trace: dump_stack_lvl print_address_description.constprop.0 print_report __show_regs show_trace_log_lvl sched_show_task show_state_filter sysrq_handle_showstate __handle_sysrq write_sysrq_trigger proc_reg_write vfs_write ksys_write do_syscall_64 entry_SYSCALL_64_after_hwframe The issue occurs as follows: Task A (walk other tasks' stacks) Task B (running) 1. echo t > /proc/sysrq-trigger show_trace_log_lvl regs = unwind_get_entry_regs() show_regs_if_on_stack(regs) 2. The stack value pointed by `regs` keeps changing, and so are the tags in its KASAN shadow region. __show_regs(regs) regs->ax, regs->bx, ... 3. hit KASAN redzones, OOB When task A walks task B's stack without suspending it, the continuous changes in task B's stack (and corresponding KASAN shadow tags) may cause task A to hit KASAN redzones when accessing obsolete values on the stack, resulting in false positive reports. Simply stopping the task before unwinding is not a viable fix, as it would alter the state intended to inspect. This is especially true for diagnosing misbehaving tasks (e.g., in a hard lockup), where stopping might fail or hide the root cause by changing the call stack. Therefore, fix this by disabling KASAN checks during asynchronous stack unwinding, which is identified when the unwinding task does not match the current task (task != current). [ bp: Align arguments on function's opening brace. ] Fixes: 3b3fa11bc700 ("x86/dumpstack: Print any pt_regs found on the stack") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://patch.msgid.link/all/20251023090632.269121-1-wutengda@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
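The fix keys off the asynchronous case; a sketch of the idea (the actual patch may structure it differently):

	/* task != current: the target stack and its KASAN shadow can
	 * change under us, so suppress reports around the register dump. */
	if (task != current)
		kasan_disable_current();

	__show_regs(regs, SHOW_REGS_SHORT, log_lvl);

	if (task != current)
		kasan_enable_current();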
4 days | x86: kmsan: don't instrument stack walking functions | Alexander Potapenko | 2 | -0/+17
[ Upstream commit 37ad4ee8364255c73026a3c343403b5977fa7e79 ] Upon function exit, KMSAN marks local variables as uninitialized. Further function calls may result in the compiler creating the stack frame where these local variables resided. This results in frame pointers being marked as uninitialized data, which is normally correct, because they are not stack-allocated. However stack unwinding functions are supposed to read and dereference the frame pointers, in which case KMSAN might be reporting uses of uninitialized values. To work around that, we mark update_stack_state(), unwind_next_frame() and show_trace_log_lvl() with __no_kmsan_checks, preventing all KMSAN reports inside those functions and making them return initialized values. Link: https://lkml.kernel.org/r/20220915150417.722975-40-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: ced37e9ceae5 ("x86/dumpstack: Prevent KASAN false positive warnings in __show_regs()") Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | s390/smp: Fix fallback CPU detection | Heiko Carstens | 1 | -0/+1
[ Upstream commit 07a75d08cfa1b883a6e1256666e5f0617ee99231 ] In case SCLP CPU detection does not work a fallback mechanism using SIGP is in place. Since a cleanup this does not work correctly anymore: new CPUs are only considered if their type matches the boot CPU. Before the cleanup the information whether a CPU type should be considered was also part of a structure generated by the fallback mechanism and indicated that a CPU type should not be considered when adding CPUs. Since the rework a global SCLP state is used instead. If the global SCLP state indicates that the CPU type should be considered and the fallback mechanism is used, there may be a mismatch with CPU types if CPUs are added. This can lead to a system with only a single CPU even though there are many more CPUs. Address this by simply copying the boot cpu type into the generated data structure from the fallback mechanism. Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Fixes: d08d94306e90 ("s390/smp: cleanup core vs. cpu in the SCLP interface") Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | arm64: dts: imx8mm-venice-gw72xx: remove unused sdhc1 pinctrl | Tim Harvey | 1 | -11/+0
[ Upstream commit d949b8d12d6e8fa119bca10d3157cd42e810f6f7 ] The SDHC1 interface is not used on the imx8mm-venice-gw72xx. Remove the unused pinctrl_usdhc1 iomux node. Fixes: 6f30b27c5ef5 ("arm64: dts: imx8mm: Add Gateworks i.MX 8M Mini Development Kits") Signed-off-by: Tim Harvey <tharvey@gateworks.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07 | MIPS: mm: Prevent a TLB shutdown on initial uniquification | Maciej W. Rozycki | 1 | -37/+63
commit 9f048fa487409e364cf866c957cf0b0d782ca5a3 upstream. Depending on the particular CPU implementation a TLB shutdown may occur if multiple matching entries are detected upon the execution of a TLBP or the TLBWI/TLBWR instructions. Given that we don't know what entries we have been handed we need to be very careful with the initial TLB setup and avoid all these instructions. Therefore read all the TLB entries one by one with the TLBR instruction, bypassing the content addressing logic, and truncate any large pages in place so as to avoid a case in the second step where an incoming entry for a large page at a lower address overlaps with a replacement entry chosen at another index. Then preinitialize the TLB using addresses outside our usual unique range and avoiding clashes with any entries received, before making the usual call to local_flush_tlb_all(). This fixes (at least) R4x00 cores if TLBP hits multiple matching TLB entries (SGI IP22 PROM for examples sets up all TLBs to the same virtual address). Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 35ad7e181541 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init") Cc: stable@vger.kernel.org Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> # Boston I6400, M5150 sim Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-12-07 | Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()" | Jiri Olsa | 1 | -5/+5
commit 6d08340d1e354787d6c65a8c3cdd4d41ffb8a5ed upstream. This reverts commit 83f44ae0f8afcc9da659799db8693f74847e66b3. Currently we store the initial stacktrace entry twice for non-HW pt_regs, i.e. for callers that fail the perf_hw_regs(regs) condition in perf_callchain_kernel. It's easy to reproduce with this bpftrace: # bpftrace -e 'tracepoint:sched:sched_process_exec { print(kstack()); }' Attaching 1 probe... bprm_execve+1767 bprm_execve+1767 do_execveat_common.isra.0+425 __x64_sys_execve+56 do_syscall_64+133 entry_SYSCALL_64_after_hwframe+118 When perf_callchain_kernel calls unwind_start with first_frame, AFAICS we do not skip regs->ip, but it's added as part of the unwind process. Hence revert the extra perf_callchain_store for the non-HW regs leg. I was not able to bisect this, so I'm not really sure why this was needed in v5.2 and why it's not working anymore, but I could see double entries as far back as v5.10. I did the test for both ORC and framepointer unwind with and without this fix and except for the initial entry the stacktraces are the same. Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20251104215405.168643-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
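After the revert, the kernel callchain entry point stores regs->ip only for real HW regs and otherwise lets the unwinder produce the first frame, restoring roughly:

	if (perf_hw_regs(regs)) {
		if (perf_callchain_store(entry, regs->ip))
			return;
		unwind_start(&state, current, regs, NULL);
	} else {
		unwind_start(&state, current, NULL, (void *)regs->sp);
	}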
2025-12-07 | Makefile.compiler: replace cc-ifversion with compiler-specific macros | Nick Desaulniers | 2 | -3/+3
commit 88b61e3bff93f99712718db785b4aa0c1165f35c upstream. cc-ifversion is GCC specific. Replace it with compiler specific variants. Update the users of cc-ifversion to use these new macros. Link: https://github.com/ClangBuiltLinux/linux/issues/350 Link: https://lore.kernel.org/llvm/CAGG=3QWSAUakO42kubrCap8fp-gm1ERJJAYXTnP1iHk_wrH=BQ@mail.gmail.com/ Suggested-by: Bill Wendling <morbo@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> [nathan: Backport to 5.15 and eliminate instances of cc-ifversion that did not exist upstream when this change was original created] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-12-07 | MIPS: Malta: Fix !EVA SOC-it PCI MMIO | Maciej W. Rozycki | 1 | -7/+13
commit ebd729fef31620e0bf74cbf8a4c7fda73a2a4e7e upstream. Fix a regression that has caused accesses to the PCI MMIO window to complete unclaimed in non-EVA configurations with the SOC-it family of system controllers, preventing PCI devices from working that use MMIO. In the non-EVA case PHYS_OFFSET is set to 0, meaning that PCI_BAR0 is set with an empty mask (and PCI_HEAD4 matches addresses starting from 0 accordingly). Consequently all addresses are matched for incoming DMA accesses from PCI. This seems to confuse the system controller's logic and outgoing bus cycles targeting the PCI MMIO window seem not to make it to the intended devices. This happens as well when a wider mask is used with PCI_BAR0, such as 0x80000000 or 0xe0000000, that makes addresses match that overlap with the PCI MMIO window, which starts at 0x10000000 in our configuration. Set the mask in PCI_BAR0 to 0xf0000000 for non-EVA then, covering the non-EVA maximum 256 MiB of RAM, which is what YAMON does and which used to work correctly up to the offending commit. Set PCI_P2SCMSKL to match PCI_BAR0 as required by the system controller's specification, and match PCI_P2SCMAPL to PCI_HEAD4 for identity mapping. Verified with: Core board type/revision = 0x0d (Core74K) / 0x01 System controller/revision = MIPS SOC-it 101 OCP / 1.3 SDR-FW-4:1 Processor Company ID/options = 0x01 (MIPS Technologies, Inc.) / 0x1c Processor ID/revision = 0x97 (MIPS 74Kf) / 0x4c for non-EVA and with: Core board type/revision = 0x0c (CoreFPGA-5) / 0x00 System controller/revision = MIPS ROC-it2 / 0.0 FW-1:1 (CLK_unknown) GIC Processor Company ID/options = 0x01 (MIPS Technologies, Inc.) / 0x00 Processor ID/revision = 0xa0 (MIPS interAptiv UP) / 0x20 for EVA/non-EVA, fixing: defxx 0000:00:12.0: assign IRQ: got 10 defxx: v1.12 2021/03/10 Lawrence V. Stefani and others 0000:00:12.0: Could not read adapter factory MAC address! vs: defxx 0000:00:12.0: assign IRQ: got 10 defxx: v1.12 2021/03/10 Lawrence V. Stefani and others 0000:00:12.0: DEFPA at MMIO addr = 0x10142000, IRQ = 10, Hardware addr = 00-00-f8-xx-xx-xx 0000:00:12.0: registered as fddi0 for non-EVA and causing no change for EVA. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 422dd256642b ("MIPS: Malta: Allow PCI devices DMA to lower 2GB physical") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
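A condensed sketch of the resulting non-EVA programming, assuming the MSC_WRITE()/MSC01_PCI_* helpers the Malta PCI setup code uses (names and placement are assumptions, not the literal diff):

	unsigned int mask = 0xf0000000;	/* covers non-EVA max 256 MiB RAM */

	MSC_WRITE(MSC01_PCI_BAR0, mask);
	MSC_WRITE(MSC01_PCI_P2SCMSKL, mask);	/* must match BAR0 */
	MSC_WRITE(MSC01_PCI_P2SCMAPL, 0);	/* identity map, == HEAD4 */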
2025-12-07 | lib/crypto: arm/curve25519: Disable on CPU_BIG_ENDIAN | Eric Biggers | 1 | -1/+1
commit 44e8241c51f762aafa50ed116da68fd6ecdcc954 upstream. On big endian arm kernels, the arm optimized Curve25519 code produces incorrect outputs and fails the Curve25519 test. This has been true ever since this code was added. It seems that hardly anyone (or even no one?) actually uses big endian arm kernels. But as long as they're ostensibly supported, we should disable this code on them so that it's not accidentally used. Note: for future-proofing, use !CPU_BIG_ENDIAN instead of CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could get removed in the future if big endian support gets dropped. Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation") Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07 | RISC-V: clear hot-unplugged cores from all task mm_cpumasks to avoid rfence errors | Danil Skrebenkov | 1 | -0/+1
[ Upstream commit ae9e9f3d67dcef7582a4524047b01e33c5185ddb ] openSBI v1.7 adds hart checks for IPI operations. In particular, it compares the hmask passed as an argument from Linux with the mask of online harts (from the openSBI side). If they don't match, an error occurs. When a cpu goes offline, cpu_online_mask is explicitly cleared in __cpu_disable. However, there is no explicit clearing of mm_cpumask. mm_cpumask is used for rfence operations that call the openSBI RFENCE extension, which uses IPIs to remote harts. If a hart is offline, an error may occur if the mask from Linux does not match the mask of online harts in openSBI. This patch adds explicit clearing of mm_cpumask for the offline hart. Signed-off-by: Danil Skrebenkov <danil.skrebenkov@cloudbear.ru> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250919132849.31676-1-danil.skrebenkov@cloudbear.ru [pjw@kernel.org: rewrote subject line for clarity] Signed-off-by: Paul Walmsley <pjw@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
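A condensed sketch of where the one-line addition plausibly lands (the surrounding lines are assumed context, not the literal diff):

	int __cpu_disable(void)
	{
		unsigned int cpu = smp_processor_id();

		set_cpu_online(cpu, false);
		/* Added: drop this hart from every task's mm_cpumask so a
		 * later SBI RFENCE call never names an offline hart. */
		clear_tasks_mm_cpumask(cpu);
		irq_migrate_all_off_this_cpu();

		return 0;
	}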
2025-12-07riscv: ptdump: use seq_puts() in pt_dump_seq_puts() macroJosephine Pfeiffer1-1/+1
[ Upstream commit a74f038fa50e0d33b740f44f862fe856f16de6a8 ] The pt_dump_seq_puts() macro incorrectly uses seq_printf() instead of seq_puts(). This is both a performance issue and conceptually wrong, as the macro name suggests plain string output (puts) but the implementation uses formatted output (printf). The macro is used in ptdump.c:301 to output a newline character. Using seq_printf() adds unnecessary overhead for format string parsing when outputting this constant string. This bug was introduced in commit 59c4da8640cc ("riscv: Add support to dump the kernel page tables") in 2020, which copied the implementation pattern from other architectures that had the same bug. Fixes: 59c4da8640cc ("riscv: Add support to dump the kernel page tables") Signed-off-by: Josephine Pfeiffer <hi@josie.lol> Link: https://lore.kernel.org/r/20251018170451.3355496-1-hi@josie.lol Signed-off-by: Paul Walmsley <pjw@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
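The macro pair in arch/riscv/mm/ptdump.c has roughly this shape; the fix is the single-token switch from seq_printf() to seq_puts() in the second macro (a sketch, not the verbatim source):

	/* Formatted output, only when attached to a seq_file. */
	#define pt_dump_seq_printf(m, fmt, args...)	\
	({						\
		if (m)					\
			seq_printf(m, fmt, ##args);	\
	})

	/* Plain-string output: previously called seq_printf(m, fmt),
	 * paying format-string parsing cost for a constant string. */
	#define pt_dump_seq_puts(m, fmt)	\
	({					\
		if (m)				\
			seq_puts(m, fmt);	\
	})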
2025-12-07ARM: at91: pm: save and restore ACR during PLL disable/enableNicolas Ferre1-1/+7
[ Upstream commit 0c01fe49651d387776abed6a28541e80c8a93319 ] Add a new word in the assembly code to store the ACR value during the calls to the at91_plla_disable/at91_plla_enable macros, and use it when re-enabling the PLL. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> [cristian.birsan@microchip.com: remove ACR_DEFAULT_PLLA loading] Signed-off-by: Cristian Birsan <cristian.birsan@microchip.com> Link: https://lore.kernel.org/r/20250827145427.46819-4-nicolas.ferre@microchip.com Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07um: Fix help message for ssl-non-rawTiwei Bie1-1/+4
[ Upstream commit 725e9d81868fcedaeef775948e699955b01631ae ] Add the missing option name in the help message. Additionally, switch to __uml_help(), because this is a global option rather than a per-channel option. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07sparc/module: Add R_SPARC_UA64 relocation handlingKoakuma2-0/+2
[ Upstream commit 05457d96175d25c976ab6241c332ae2eb5e07833 ] This is needed so that the kernel can handle R_SPARC_UA64 relocations, which are emitted by LLVM's integrated assembler (IAS). Signed-off-by: Koakuma <koachan@protonmail.com> Reviewed-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
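Since sparc's apply_relocate_add() already stores R_SPARC_64 values byte by byte, which is valid at any alignment, the unaligned variant can plausibly share that path. A sketch under that assumption (the helper and dispatch function names are hypothetical; R_SPARC_UA64 is 54 per the SPARC ELF psABI):

	#ifndef R_SPARC_UA64
	#define R_SPARC_UA64	54	/* unaligned 64-bit, per the psABI */
	#endif

	/* Byte-wise store: no alignment requirement, which is exactly
	 * why R_SPARC_UA64 can reuse R_SPARC_64's implementation. */
	static void sparc_write_u64(u8 *location, u64 v)
	{
		location[0] = v >> 56;
		location[1] = v >> 48;
		location[2] = v >> 40;
		location[3] = v >> 32;
		location[4] = v >> 24;
		location[5] = v >> 16;
		location[6] = v >>  8;
		location[7] = v;
	}

	static int sparc_apply_one_rela(u32 r_type, u8 *location, u64 v)
	{
		switch (r_type) {
		case R_SPARC_64:
		case R_SPARC_UA64:	/* new: emitted by LLVM's IAS */
			sparc_write_u64(location, v);
			return 0;
		default:
			return -ENOEXEC;
		}
	}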
2025-12-07x86/kvm: Prefer native qspinlock for dedicated vCPUs irrespective of PV_UNHALTLi RongQing1-10/+10
[ Upstream commit 960550503965094b0babd7e8c83ec66c8a763b0b ] The commit b2798ba0b876 ("KVM: X86: Choose qspinlock when dedicated physical CPUs are available") states that when PV_DEDICATED=1 (each vCPU has a dedicated pCPU), qspinlock should be preferred regardless of PV_UNHALT. However, the current implementation doesn't reflect this: when PV_UNHALT=0, we still use virt_spin_lock() even with dedicated pCPUs. This is suboptimal because:
1. Native qspinlocks should outperform virt_spin_lock() for dedicated vCPUs irrespective of HALT exiting
2. virt_spin_lock() should only be preferred when vCPUs may be preempted (the non-dedicated case)
So reorder the PV spinlock checks to:
1. First handle the dedicated-pCPU case (disable virt_spin_lock_key)
2. Then check for the single-CPU and nopvspin configurations
3. Only then check for PV_UNHALT support
This ensures we always use native qspinlock for dedicated vCPUs, delivering real performance gains at high contention levels. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Wangyang Guo <wangyang.guo@intel.com> Link: https://lore.kernel.org/r/20250722110005.4988-1-lirongqing@baidu.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
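Reordered as described, kvm_spinlock_init() would look roughly like this; a sketch of the check ordering only, with the pv_ops wiring elided, not the exact upstream code:

	void __init kvm_spinlock_init(void)
	{
		/* 1. Dedicated pCPUs: native qspinlock wins regardless
		 *    of PV_UNHALT, so leave virt_spin_lock() behind. */
		if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
			pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints\n");
			goto out;
		}

		/* 2. No contention possible, or explicitly disabled. */
		if (num_possible_cpus() == 1) {
			pr_info("PV spinlocks disabled, single CPU\n");
			goto out;
		}
		if (nopvspin) {
			pr_info("PV spinlocks disabled, forced by \"nopvspin\" parameter\n");
			goto out;
		}

		/* 3. Only now does PV_UNHALT matter: without it, keep
		 *    virt_spin_lock() as the preemption-safe fallback. */
		if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
			pr_info("PV spinlocks disabled, no host support\n");
			return;
		}

		pr_info("PV spinlocks enabled\n");
		/* ... wire up pv_ops.lock to the PV qspinlock here ... */
		return;

	out:
		static_branch_disable(&virt_spin_lock_key);
	}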
2025-12-07mips: lantiq: xway: sysctrl: rename stp clockAleksander Jan Bajkowski1-1/+1
[ Upstream commit b0d04fe6a633ada2c7bc1b5ddd011cbd85961868 ] The binding requires a node name matching '^gpio@[0-9a-f]+$'. This patch changes the clock name from "stp" to "gpio". Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
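The change is plausibly just the device-name string in the xway sysctrl clock registration; the clkdev_add_pmu() argument list and the 1e100bb0 base address here are assumptions, so read this as a sketch rather than the exact diff:

	/* Before: clock registered under the old "stp" node name. */
	clkdev_add_pmu("1e100bb0.stp", NULL, 1, 0, PMU_STP);

	/* After: the DT node is now gpio@1e100bb0, so the clkdev
	 * lookup string follows the node name. */
	clkdev_add_pmu("1e100bb0.gpio", NULL, 1, 0, PMU_STP);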
2025-12-07mips: lantiq: danube: add missing device_type in pci nodeAleksander Jan Bajkowski1-0/+2
[ Upstream commit d66949a1875352d2ddd52b144333288952a9e36f ] This fixes the following warning:
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: pci@e105400 (lantiq,pci-xway): 'device_type' is a required property from schema $id: http://devicetree.org/schemas/pci/pci-bus-common.yaml#
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07mips: lantiq: danube: add missing properties to cpu nodeAleksander Jan Bajkowski1-0/+4
[ Upstream commit e8dee66c37085dc9858eb8608bc783c2900e50e7 ] This fixes the following warnings:
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: cpus: '#address-cells' is a required property from schema $id: http://devicetree.org/schemas/cpus.yaml#
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: cpus: '#size-cells' is a required property from schema $id: http://devicetree.org/schemas/cpus.yaml#
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: cpu@0 (mips,mips24Kc): 'reg' is a required property from schema $id: http://devicetree.org/schemas/mips/cpus.yaml#
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07powerpc/eeh: Use result of error_detected() in ueventNiklas Schnelle1-1/+1
[ Upstream commit 704e5dd1c02371dfc7d22e1520102b197a3b628b ] Ever since uevent support was added for AER and EEH with commit 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume"), EEH has reported PCI_ERS_RESULT_NONE as the uevent when recovery begins. Commit 7b42d97e99d3 ("PCI/ERR: Always report current recovery status for udev") subsequently amended AER to report the actual return value of error_detected(). Make the same change to EEH to align it with AER and s390. Suggested-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/linux-pci/aIp6LiKJor9KLVpv@wunner.de/ Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://patch.msgid.link/20250807-add_err_uevents-v5-3-adf85b0620b0@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
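In the EEH recovery path this amounts to forwarding the driver's verdict instead of the placeholder; a sketch assuming the eeh_report_error() flow in arch/powerpc/kernel/eeh_driver.c and the pci_uevent_ers() helper, not the verbatim diff:

	static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
						    struct pci_dev *pdev,
						    struct pci_driver *driver)
	{
		enum pci_ers_result rc;

		if (!driver->err_handler || !driver->err_handler->error_detected)
			return PCI_ERS_RESULT_NONE;

		rc = driver->err_handler->error_detected(pdev,
							 pci_channel_io_frozen);

		/* Was: pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE);
		 * now report what error_detected() actually returned,
		 * matching AER and s390. */
		pci_uevent_ers(pdev, rc);

		return rc;
	}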