kernel/linux.git/arch/powerpc, branch v7.0.11

powerpc/time: Remove redundant preempt_disable|enable() calls from arch_irq_work_raise()

2026-06-01T15:54:43+00:00

[ Upstream commit 31467b23823ffec1f6fff407f8e3ca9af8b7491a ] A kernel panic is observed when handling machine check exceptions from real mode. BUG: Unable to handle kernel data access on read at 0xc00000006be21300 Oops: Kernel access of bad area, sig: 11 [#1] MSR: 8000000000001003 CR: 88222248 XER: 00000005 CFAR: c00000000003ffc4 DAR: c00000006be21300 DSISR: 40000000 IRQMASK: 0 NIP [c000000000029e40] arch_irq_work_raise+0x10/0x70 LR [c00000000003ffc8] machine_check_queue_event+0xa8/0x150 Call Trace: [c0000000179d3c70] [c00000000003ff64] machine_check_queue_event+0x44/0x150 [c0000000179d3d30] [c0000000000084e0] machine_check_early_common+0x1f0/0x2c0 The crash occurs because arch_irq_work_raise() calls preempt_disable() from machine check exception (MCE) handlers running in real mode. In this context, accessing the preempt_count can fault, leading to the panic. The preempt_disable()/preempt_enable() pair in arch_irq_work_raise() was originally added by commit 0fe1ac48bef0 ("powerpc/perf_event: Fix oops due to perf_event_do_pending call") to avoid races while raising irq work from exception context. Later, commit 471ba0e686cb ("irq_work: Do not raise an IPI when queueing work on the local CPU") added preemption protection in irq_work_queue() path, while commit 20b876918c06 ("irq_work: Use per cpu atomics instead of regular atomics") added equivalent protection in irq_work_queue_on() before reaching arch_irq_work_raise(): irq_work_queue() / irq_work_queue_on() -> preempt_disable() -> __irq_work_queue_local() -> irq_work_raise() -> arch_irq_work_raise() As a result, callers other than mce_irq_work_raise() already execute with preemption disabled, making the additional preempt_disable()/preempt_enable() pair in arch_irq_work_raise() redundant. The arch_irq_work_raise() function executes in NMI context when called from MCE handler. Hence we will not be preempted or scheduled out since we are in NMI context with MSR[EE]=0. Therefore, it is safe to remove the preempt_disable()/preempt_enable() calls from here. Remove it to avoid accessing preempt_count from real mode context. Fixes: cc15ff327569 ("powerpc/mce: Avoid using irq_work_queue() in realmode") Suggested-by: Mahesh Salgaonkar Acked-by: Shrikanth Hegde Reviewed-by: Ritesh Harjani (IBM) Signed-off-by: Sayali Patil [Maddy: Fixed the commit title] Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260513081413.222490-1-sayalip@linux.ibm.com Signed-off-by: Sasha Levin

powerpc/hv-gpci: fix preempt count leak in sysfs show paths

2026-06-01T15:54:38+00:00

[ Upstream commit dbc30a57bd8e026995e9fa8e8c31cffd18542c01 ] Four sysfs show() callbacks in hv-gpci take get_cpu_var(hv_gpci_reqb) (which calls preempt_disable()) but only call the matching put_cpu_var() on the error path under the 'out:' label. Every successful read leaks one preempt_disable(): processor_bus_topology_show() processor_config_show() affinity_domain_via_virtual_processor_show() affinity_domain_via_domain_show() (affinity_domain_via_partition_show() was already correct.) On a CONFIG_PREEMPT=y kernel, repeated reads raise preempt_count and eventually return to userspace with preemption still disabled. The next user-mode page fault then hits faulthandler_disabled() == 1, gets forced to SIGSEGV, and the resulting coredump trips 'BUG: scheduling while atomic' in call_usermodehelper_exec -> wait_for_completion_state -> schedule: BUG: scheduling while atomic: //0x00000004 ... __schedule_bug+0x6c/0x90 __schedule+0x58c/0x13a0 schedule+0x48/0x1a0 schedule_timeout+0x104/0x170 wait_for_completion_state+0x16c/0x330 call_usermodehelper_exec+0x254/0x2d0 vfs_coredump+0x1050/0x2590 get_signal+0xb9c/0xc80 do_notify_resume+0xf8/0x470 Add an out_success label that calls put_cpu_var() before returning the byte count, mirroring affinity_domain_via_partition_show(). Fixes: 71f1c39647d8 ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor bus topology information") Fixes: 1a160c2a13c6 ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor config information") Fixes: 71a7ccb478fc ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via virtual processor information") Fixes: a69a57cac1ec ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via domain information") Signed-off-by: Aboorva Devarajan Reviewed-by: Ritesh Harjani (IBM) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260508041256.3447113-1-aboorvad@linux.ibm.com Signed-off-by: Sasha Levin

powerpc: fix dead default for GUEST_STATE_BUFFER_TEST

2026-06-01T15:54:38+00:00

[ Upstream commit aef656a0e6c01796190bb5bd2bdba1c644ed7811 ] The GUEST_STATE_BUFFER_TEST config option should default to KUNIT_ALL_TESTS so that if all tests are enabled then it is included, but currently the 'default KUNIT_ALL_TESTS' statement is shadowed by 'def_tristate n', meaning that this second default statement is currently dead code. It looks to me like the commit 6ccbbc33f06a ("KVM: PPC: Add helper library for Guest State Buffers") intended to set the default to KUNIT_ALL_TESTS, but mistakenly missed the def_tristate. This dead code was found by kconfirm, a static analysis tool for Kconfig. Fixes: 6ccbbc33f06a ("KVM: PPC: Add helper library for Guest State Buffers") Signed-off-by: Julian Braha Tested-by: Gautam Menghani Reviewed-by: Amit Machhiwal Reviewed-by: Harsh Prateek Bora Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260405161545.161006-1-julianbraha@gmail.com Signed-off-by: Sasha Levin

powerpc: 82xx: fix uninitialized pointers with free attribute

2026-06-01T15:54:38+00:00

[ Upstream commit acd1e47db03d4b528fd5efb8565dd0de1c79f62a ] Uninitialized pointers with `__free` attribute can cause undefined behavior as the memory allocated to the pointer is freed automatically when the pointer goes out of scope. powerpc/km82xx doesn't have any bugs related to this as of now, but, it is better to initialize and assign pointers with `__free` attribute in one statement to ensure proper scope-based cleanup Reported-by: Dan Carpenter Closes: https://lore.kernel.org/all/aPiG_F5EBQUjZqsl@stanley.mountain/ Signed-off-by: Ally Heev Fixes: 4aa5cc1e0012 ("powerpc-km82xx.c: replace of_node_put() with __free") Reviewed-by: Christophe Leroy (CS GROUP) Reviewed-by: Krzysztof Kozlowski Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20251116-aheev-uninitialized-free-attr-km82xx-v2-1-4307e2b5300d@gmail.com Signed-off-by: Sasha Levin

ring-buffer: Flush and stop persistent ring buffer on panic

2026-06-01T15:54:24+00:00

commit a494d3c8d5392bcdff83c2a593df0c160ff9f322 upstream. On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon Cc: Mathieu Desnoyers Cc: Ian Rogers Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) Acked-by: Catalin Marinas Acked-by: Geert Uytterhoeven Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman

powerpc/warp: Fix error handling in pika_dtm_thread

2026-05-23T11:09:40+00:00

commit 108d7f951271cbd36ca36efc5e5d106966f5180c upstream. pika_dtm_thread() acquires client through of_find_i2c_device_by_node() but fails to release it in error handling path. This could result in a reference count leak, preventing proper cleanup and potentially leading to resource exhaustion. Add put_device() to release the reference in the error handling path. Found by code review. Cc: stable@vger.kernel.org Fixes: 3984114f0562 ("powerpc/warp: Platform fix for i2c change") Signed-off-by: Ma Ke Reviewed-by: Christophe Leroy Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20251116024411.21968-1-make24@iscas.ac.cn Signed-off-by: Greg Kroah-Hartman

powerpc/crash: Update backup region offset in elfcorehdr on memory hotplug

2026-05-23T11:08:35+00:00

[ Upstream commit f53b24d1fa263f56155213eabab734c18d884aff ] When elfcorehdr is prepared for kdump, the program header representing the first 64 KB of memory is expected to have its offset point to the backup region. This is required because purgatory copies the first 64 KB of the crashed kernel memory to this backup region following a kernel crash. This allows the capture kernel to use the first 64 KB of memory to place the exception vectors and other required data. When elfcorehdr is recreated due to memory hotplug, the offset of the program header representing the first 64 KB is not updated. As a result, the capture kernel exports the first 64 KB at offset 0, even though the data actually resides in the backup region. Fix this by calling sync_backup_region_phdr() to update the program header offset in the elfcorehdr created during memory hotplug. sync_backup_region_phdr() works for images loaded via the kexec_file_load syscall. However, it does not work for kexec_load, because image->arch.backup_start is not initialized in that case. So introduce machine_kexec_post_load() to process the elfcorehdr prepared by kexec-tools and initialize image->arch.backup_start for kdump images loaded via kexec_load syscall. Rename update_backup_region_phdr() to sync_backup_region_phdr() and extend it to synchronize the backup region offset between the kdump image and the ELF core header. The helper now supports updating either the kdump image from the ELF program header or updating the ELF program header from the kdump image, avoiding code duplication. Define ARCH_HAS_KIMAGE_ARCH and struct kimage_arch when CONFIG_KEXEC_FILE or CONFIG_CRASH_DUMP is enabled so that kimage->arch.backup_start is available with the kexec_load system call. This patch depends on the patch titled "powerpc/crash: fix backup region offset update to elfcorehdr". Fixes: 849599b702ef ("powerpc/crash: add crash memory hotplug support") Reviewed-by: Aditya Gupta Signed-off-by: Sourabh Jain Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260312083051.1935737-3-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin

powerpc/crash: fix backup region offset update to elfcorehdr

2026-05-23T11:08:35+00:00

[ Upstream commit 789335cacdf37da93bb7c70322dff8c7e82881df ] update_backup_region_phdr() in file_load_64.c iterates over all the program headers in the kdump kernel’s elfcorehdr and updates the p_offset of the program header whose physical address starts at 0. However, the loop logic is incorrect because the program header pointer is not updated during iteration. Since elfcorehdr typically contains PT_NOTE entries first, the PT_LOAD program header with physical address 0 is never reached. As a result, its p_offset is not updated to point to the backup region. Because of this behavior, the capture kernel exports the first 64 KB of the crashed kernel’s memory at offset 0, even though that memory actually lives in the backup region. When a crash happens, purgatory copies the first 64 KB of the crashed kernel’s memory into the backup region so the capture kernel can safely use it. This has not caused problems so far because the first 64 KB is usually identical in both the crashed and capture kernels. However, this is just an assumption and is not guaranteed to always hold true. Fix update_backup_region_phdr() to correctly update the p_offset of the program header with a starting physical address of 0 by correcting the logic used to iterate over the program headers. Fixes: cb350c1f1f86 ("powerpc/kexec_file: Prepare elfcore header for crashing kernel") Reviewed-by: Aditya Gupta Signed-off-by: Sourabh Jain Reviewed-by: Hari Bathini Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260312083051.1935737-2-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin

powerpc/64s: Fix unmap race with PMD migration entries

2026-05-23T11:08:31+00:00

[ Upstream commit bbcbf045d6c778e82b47a35fc8728387708e9a3d ] The following race is possible with migration swap entries or device-private THP entries. e.g. when move_pages is called on a PMD THP page, then there maybe an intermediate state, where PMD entry acts as a migration swap entry (pmd_present() is true). Then if an munmap happens at the same time, then this VM_BUG_ON() can happen in pmdp_huge_get_and_clear_full(). This patch fixes that. Thread A: move_pages() syscall add_folio_for_migration() mmap_read_lock(mm) folio_isolate_lru(folio) mmap_read_unlock(mm) do_move_pages_to_node() migrate_pages() try_to_migrate_one() spin_lock(ptl) set_pmd_migration_entry() pmdp_invalidate() # PMD: _PAGE_INVALID | _PAGE_PTE | pfn set_pmd_at() # PMD: migration swap entry (pmd_present=0) spin_unlock(ptl) [page copy phase] # <--- RACE WINDOW --> Thread B: munmap() mmap_write_downgrade(mm) unmap_vmas() -> zap_pmd_range() zap_huge_pmd() __pmd_trans_huge_lock() pmd_is_huge(): # !pmd_present && !pmd_none -> TRUE (swap entry) pmd_lock() -> # spin_lock(ptl), waits for Thread A to release ptl pmdp_huge_get_and_clear_full() VM_BUG_ON(!pmd_present(*pmdp)) # HITS! [ 287.738700][ T1867] ------------[ cut here ]------------ [ 287.743843][ T1867] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:187! cpu 0x0: Vector: 700 (Program Check) at [c00000044037f4f0] pc: c000000000094ca4: pmdp_huge_get_and_clear_full+0x6c/0x23c lr: c000000000645dec: zap_huge_pmd+0xb0/0x868 sp: c00000044037f790 msr: 800000000282b033 current = 0xc0000004032c1a00 paca = 0xc000000004fe0000 irqmask: 0x03 irq_happened: 0x09 pid = 1867, comm = a.out kernel BUG at :187! Linux version 6.19.0-12136-g14360d4f917c-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #27 SMP PREEMPT Sun Feb 22 10:38:56 IST 2026 enter ? for help [link register ] c000000000645dec zap_huge_pmd+0xb0/0x868 [c00000044037f790] c00000044037f7d0 (unreliable) [c00000044037f7d0] c000000000645dcc zap_huge_pmd+0x90/0x868 [c00000044037f840] c0000000005724cc unmap_page_range+0x176c/0x1f40 [c00000044037fa00] c000000000572ea0 unmap_vmas+0xb0/0x1d8 [c00000044037fa90] c0000000005af254 unmap_region+0xb4/0x128 [c00000044037fb50] c0000000005af400 vms_complete_munmap_vmas+0x138/0x310 [c00000044037fbe0] c0000000005b0f1c do_vmi_align_munmap+0x1ec/0x238 [c00000044037fd30] c0000000005b3688 __vm_munmap+0x170/0x1f8 [c00000044037fdf0] c000000000587f74 sys_munmap+0x2c/0x40 [c00000044037fe10] c000000000032668 system_call_exception+0x128/0x350 [c00000044037fe50] c00000000000d05c system_call_vectored_common+0x15c/0x2ec ---- Exception: 3000 (System Call Vectored) at 0000000010064a2c SP (7fff9b1ee9c0) is in userspace 0:mon> zh commit a30b48bf1b24 ("mm/migrate_device: implement THP migration of zone device pages"), enabled migration for device-private PMD entries. Hence this is one other path where this warning could get trigger from. ------------[ cut here ]------------ WARNING: arch/powerpc/mm/book3s64/hash_pgtable.c:199 at hash__pmd_hugepage_update+0x48/0x284, CPU#3: hmm-tests/1905 Modules linked in: test_hmm CPU: 3 UID: 0 PID: 1905 Comm: hmm-tests Tainted: G B W L N 7.0.0-rc1-01438-g7e2f0ee7581c #21 PREEMPT Tainted: [B]=BAD_PAGE, [W]=WARN, [L]=SOFTLOCKUP, [N]=TEST Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries NIP [c000000000096b70] hash__pmd_hugepage_update+0x48/0x284 LR [c000000000096e7c] hash__pmdp_huge_get_and_clear+0xd0/0xd4 Call Trace: [c000000604707670] [c000000004e102b8] 0xc000000004e102b8 (unreliable) [c000000604707700] [c00000000064ec3c] set_pmd_migration_entry+0x414/0x498 [c000000604707760] [c00000000063e5a4] migrate_vma_collect_pmd+0x12e8/0x16c4 [c000000604707890] [c00000000059282c] walk_pgd_range+0x7fc/0xd2c [c000000604707990] [c000000000592e40] __walk_page_range+0xe4/0x2ac [c000000604707a10] [c000000000593534] walk_page_range_mm_unsafe+0x204/0x2a4 [c000000604707ab0] [c00000000063af10] migrate_vma_setup+0x1dc/0x2e8 [c000000604707b10] [c008000006a21838] dmirror_migrate_to_system.constprop.0+0x210/0x4b0 [test_hmm] [c000000604707c30] [c008000006a245b0] dmirror_fops_unlocked_ioctl+0x454/0xa5c [test_hmm] [c000000604707d20] [c0000000006aab84] sys_ioctl+0x4ec/0x1178 [c000000604707e10] [c0000000000326a8] system_call_exception+0x128/0x350 [c000000604707e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec ---- interrupt: 3000 at 0x7fffbe44f50c Fixes: 75358ea359e7c ("powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race") Fixes: a30b48bf1b24 ("mm/migrate_device: implement THP migration of zone device pages") Reported-by: Pavithra Prakash Signed-off-by: Ritesh Harjani (IBM) Tested-by: Venkat Rao Bagalkote Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/9437e5ef28d1e2f5cbdd7f8286350ce93c1d43c5.1773078178.git.ritesh.list@gmail.com Signed-off-by: Sasha Levin

powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy

2026-05-23T11:08:31+00:00

[ Upstream commit fda4d71651f71c44b35829d13f3c8bf920032f77 ] powerpc uses pt_frag_refcount as a reference counter for tracking it's pte and pmd page table fragments. For PTE table, in case of Hash with 64K pagesize, we have 16 fragments of 4K size in one 64K page. Patch series [1] "mm: free retracted page table by RCU" added pte_free_defer() to defer the freeing of PTE tables when retract_page_tables() is called for madvise MADV_COLLAPSE on shmem range. [1]: https://lore.kernel.org/all/7cd843a9-aa80-14f-5eb2-33427363c20@google.com/ pte_free_defer() sets the active flag on the corresponding fragment's folio & calls pte_fragment_free(), which reduces the pt_frag_refcount. When pt_frag_refcount reaches 0 (no active fragment using the folio), it checks if the folio active flag is set, if set, it calls call_rcu to free the folio, it the active flag is unset then it calls pte_free_now(). Now, this can lead to following problem in a corner case... [ 265.351553][ T183] BUG: Bad page state in process a.out pfn:20d62 [ 265.353555][ T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62 [ 265.355457][ T183] flags: 0x3ffff800000100(active|node=0|zone=0|lastcpupid=0x7ffff) [ 265.358719][ T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000 [ 265.360177][ T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000 [ 265.361438][ T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 265.362572][ T183] Modules linked in: [ 265.364622][ T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY [ 265.364785][ T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries [ 265.364908][ T183] Call Trace: [ 265.364955][ T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable) [ 265.365202][ T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8 [ 265.365384][ T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08 [ 265.365554][ T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310 [ 265.365729][ T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218 [ 265.365912][ T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820 [ 265.366080][ T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300 [ 265.366244][ T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508 [ 265.366421][ T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148 [ 265.366602][ T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178 [ 265.366780][ T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0 [ 265.366958][ T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec The bad page state error occurs when such a folio gets freed (with active flag set), from do_exit() path in parallel. ... this can happen when the pte fragment was allocated from this folio, but when all the fragments get freed, the pte_frag_refcount still had some unused fragments. Now, if this process exits, with such folio as it's cached pte_frag in mm->context, then during pte_frag_destroy(), we simply call pagetable_dtor() and pagetable_free(), meaning it doesn't clear the active flag. This, can lead to the above bug. Since we are anyway in do_exit() path, then if the refcount is 0, then I guess it should be ok to simply clear the folio active flag before calling pagetable_dtor() & pagetable_free(). Fixes: 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables sharing page") Reviewed-by: Christophe Leroy (CS GROUP) Signed-off-by: Ritesh Harjani (IBM) Tested-by: Venkat Rao Bagalkote Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/ee13e7f99b8f258019da2b37655b998e73e5ef8b.1773078178.git.ritesh.list@gmail.com Signed-off-by: Sasha Levin