summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2026-03-04powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()Guangshuo Li1-0/+2
[ Upstream commit 33c1c6d8a28a2761ac74b0380b2563cf546c2a3a ] As kcalloc() may fail, check its return value to avoid a NULL pointer dereference when passing it to of_property_read_u32_array(). Fixes: 790a1662d3a26 ("powerpc/smp: Parse ibm,thread-groups with multiple properties") Cc: stable@vger.kernel.org Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250923133235.1862108-1-lgs201920130244@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04powerpc/pseries: Fix MSI-X allocation failure when quota is exceededNam Cao1-3/+41
[ Upstream commit c0215e2d72debcd9cbc1c002fb012d50a3140387 ] Nilay reported that since commit daaa574aba6f ("powerpc/pseries/msi: Switch to msi_create_parent_irq_domain()"), the NVMe driver cannot enable MSI-X when the device's MSI-X table size is larger than the firmware's MSI quota for the device. This is because the commit changes how rtas_prepare_msi_irqs() is called: - Before, it is called when interrupts are allocated at the global interrupt domain with nvec_in being the number of allocated interrupts. rtas_prepare_msi_irqs() can return a positive number and the allocation will be retried. - Now, it is called at the creation of per-device interrupt domain with nvec_in being the number of interrupts that the device supports. If rtas_prepare_msi_irqs() returns positive, domain creation just fails. For Nilay's NVMe driver case, rtas_prepare_msi_irqs() returns a positive number (the quota). This causes per-device interrupt domain creation to fail and thus the NVMe driver cannot enable MSI-X. Rework to make this scenario works again: - pseries_msi_ops_prepare() only prepares as many interrupts as the quota permit. - pseries_irq_domain_alloc() fails if the device's quota is exceeded. Now, if the quota is exceeded, pseries_msi_ops_prepare() will only prepare as allowed by the quota. If device drivers attempt to allocate more interrupts than the quota permits, pseries_irq_domain_alloc() will return an error code and msi_handle_pci_fail() will allow device drivers a retry. Reported-by: Nilay Shroff <nilay@linux.ibm.com> Closes: https://lore.kernel.org/linuxppc-dev/6af2c4c2-97f6-4758-be33-256638ef39e5@linux.ibm.com/ Fixes: daaa574aba6f ("powerpc/pseries/msi: Switch to msi_create_parent_irq_domain()") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Tested-by: Nilay Shroff <nilay@linux.ibm.com> Acked-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260107100230.1466093-1-namcao@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27kallsyms/bpf: rename __bpf_address_lookup() to bpf_address_lookup()Petr Mladek1-1/+1
[ Upstream commit cd6735896d0343942cf3dafb48ce32eb79341990 ] bpf_address_lookup() has been used only in kallsyms_lookup_buildid(). It was supposed to set @modname and @modbuildid when the symbol was in a module. But it always just cleared @modname because BPF symbols were never in a module. And it did not clear @modbuildid because the pointer was not passed. The wrapper is no longer needed. Both @modname and @modbuildid are now always initialized to NULL in kallsyms_lookup_buildid(). Remove the wrapper and rename __bpf_address_lookup() to bpf_address_lookup() because this variant is used everywhere. [akpm@linux-foundation.org: fix loongarch] Link: https://lkml.kernel.org/r/20251128135920.217303-6-pmladek@suse.com Fixes: 9294523e3768 ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handlingNarayana Murty N3-9/+78
[ Upstream commit 815a8d2feb5615ae7f0b5befd206af0b0160614c ] The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device hotplug safe") restructured the EEH driver to improve synchronization with the PCI hotplug layer. However, it inadvertently moved pci_lock_rescan_remove() outside its intended scope in eeh_handle_normal_event(), leading to broken PCI error reporting and improper EEH event triggering. Specifically, eeh_handle_normal_event() acquired pci_lock_rescan_remove() before calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to acquire the same lock internally, causing nested locking and disrupting normal EEH event handling paths. This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(), with two public wrappers: eeh_pe_bus_get() with locking enabled. eeh_pe_bus_get_nolock() that skips locking. Callers that already hold pci_lock_rescan_remove() now use eeh_pe_bus_get_nolock() to avoid recursive lock acquisition. Additionally, pci_lock_rescan_remove() calls are restored to the correct position—after eeh_pe_bus_get() and immediately before iterating affected PEs and devices. This ensures EEH-triggered PCI removes occur under proper bus rescan locking without recursive lock contention. The eeh_pe_loc_get() function has been split into two functions: eeh_pe_loc_get(struct eeh_pe *pe) which retrieves the loc for given PE. eeh_pe_loc_get_bus(struct pci_bus *bus) which retrieves the location code for given bus. This resolves lockdep warnings such as: <snip> [ 84.964298] [ T928] ============================================ [ 84.964304] [ T928] WARNING: possible recursive locking detected [ 84.964311] [ T928] 6.18.0-rc3 #51 Not tainted [ 84.964315] [ T928] -------------------------------------------- [ 84.964320] [ T928] eehd/928 is trying to acquire lock: [ 84.964324] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964342] [ T928] but task is already holding lock: [ 84.964347] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964357] [ T928] other info that might help us debug this: [ 84.964363] [ T928] Possible unsafe locking scenario: [ 84.964367] [ T928] CPU0 [ 84.964370] [ T928] ---- [ 84.964373] [ T928] lock(pci_rescan_remove_lock); [ 84.964378] [ T928] lock(pci_rescan_remove_lock); [ 84.964383] [ T928] *** DEADLOCK *** [ 84.964388] [ T928] May be due to missing lock nesting notation [ 84.964393] [ T928] 1 lock held by eehd/928: [ 84.964397] [ T928] #0: c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964408] [ T928] stack backtrace: [ 84.964414] [ T928] CPU: 2 UID: 0 PID: 928 Comm: eehd Not tainted 6.18.0-rc3 #51 VOLUNTARY [ 84.964417] [ T928] Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries [ 84.964419] [ T928] Call Trace: [ 84.964420] [ T928] [c0000011a7157990] [c000000001705de4] dump_stack_lvl+0xc8/0x130 (unreliable) [ 84.964424] [ T928] [c0000011a71579d0] [c0000000002f66e0] print_deadlock_bug+0x430/0x440 [ 84.964428] [ T928] [c0000011a7157a70] [c0000000002fd0c0] __lock_acquire+0x1530/0x2d80 [ 84.964431] [ T928] [c0000011a7157ba0] [c0000000002fea54] lock_acquire+0x144/0x410 [ 84.964433] [ T928] [c0000011a7157cb0] [c0000011a7157cb0] __mutex_lock+0xf4/0x1050 [ 84.964436] [ T928] [c0000011a7157e00] [c000000000de21d8] pci_lock_rescan_remove+0x28/0x40 [ 84.964439] [ T928] [c0000011a7157e20] [c00000000004ed98] eeh_pe_bus_get+0x48/0xc0 [ 84.964442] [ T928] [c0000011a7157e50] [c000000000050434] eeh_handle_normal_event+0x64/0xa60 [ 84.964446] [ T928] [c0000011a7157f30] [c000000000051de8] eeh_event_handler+0xf8/0x190 [ 84.964450] [ T928] [c0000011a7157f90] [c0000000002747ac] kthread+0x16c/0x180 [ 84.964453] [ T928] [c0000011a7157fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18 </snip> Fixes: 1010b4c012b0 ("powerpc/eeh: Make EEH driver device hotplug safe") Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251210142559.8874-1-nnmlinux@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27powerpc/uaccess: Move barrier_nospec() out of allow_read_{from/write}_user()Christophe Leroy2-2/+4
[ Upstream commit 5fbc09eb0b4f4b1a4b33abebacbeee0d29f195e9 ] Commit 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") added a redundant barrier_nospec() in copy_from_user(), because powerpc is already calling barrier_nospec() in allow_read_from_user() and allow_read_write_user(). But on other architectures that call to barrier_nospec() was missing. So change powerpc instead of reverting the above commit and having to fix other architectures one by one. This is now possible because barrier_nospec() has also been added in copy_from_user_iter(). Move barrier_nospec() out of allow_read_from_user() and allow_read_write_user(). This will also allow reuse of those functions when implementing masked user access which doesn't require barrier_nospec(). Don't add it back in raw_copy_from_user() as it is already called by copy_from_user() and copy_from_user_iter(). Fixes: 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/f29612105c5fcbc8ceb7303808ddc1a781f0f6b5.1766574657.git.chleroy@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-08powerpc/pseries/cmm: call balloon_devinfo_init() also without ↵David Hildenbrand1-1/+1
CONFIG_BALLOON_COMPACTION commit fc6bcf9ac4de76f5e7bcd020b3c0a86faff3f2d5 upstream. Patch series "powerpc/pseries/cmm: two smaller fixes". Two smaller fixes identified while doing a bigger rework. This patch (of 2): We always have to initialize the balloon_dev_info, even when compaction is not configured in: otherwise the containing list and the lock are left uninitialized. Likely not many such configs exist in practice, but let's CC stable to be sure. This was found by code inspection. Link: https://lkml.kernel.org/r/20251021100606.148294-1-david@redhat.com Link: https://lkml.kernel.org/r/20251021100606.148294-2-david@redhat.com Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pagesDavid Hildenbrand1-0/+1
commit 0da2ba35c0d532ca0fe7af698b17d74c4d084b9a upstream. Let's properly adjust BALLOON_MIGRATE like the other drivers. Note that the INFLATE/DEFLATE events are triggered from the core when enqueueing/dequeueing pages. This was found by code inspection. Link: https://lkml.kernel.org/r/20251021100606.148294-3-david@redhat.com Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08powerpc/64s/slb: Fix SLB multihit issue during SLB preloadDonet Tom5-98/+0
commit 00312419f0863964625d6dcda8183f96849412c6 upstream. On systems using the hash MMU, there is a software SLB preload cache that mirrors the entries loaded into the hardware SLB buffer. This preload cache is subject to periodic eviction — typically after every 256 context switches — to remove old entry. To optimize performance, the kernel skips switch_mmu_context() in switch_mm_irqs_off() when the prev and next mm_struct are the same. However, on hash MMU systems, this can lead to inconsistencies between the hardware SLB and the software preload cache. If an SLB entry for a process is evicted from the software cache on one CPU, and the same process later runs on another CPU without executing switch_mmu_context(), the hardware SLB may retain stale entries. If the kernel then attempts to reload that entry, it can trigger an SLB multi-hit error. The following timeline shows how stale SLB entries are created and can cause a multi-hit error when a process moves between CPUs without a MMU context switch. CPU 0 CPU 1 ----- ----- Process P exec swapper/1 load_elf_binary begin_new_exc activate_mm switch_mm_irqs_off switch_mmu_context switch_slb /* * This invalidates all * the entries in the HW * and setup the new HW * SLB entries as per the * preload cache. */ context_switch sched_migrate_task migrates process P to cpu-1 Process swapper/0 context switch (to process P) (uses mm_struct of Process P) switch_mm_irqs_off() switch_slb load_slb++ /* * load_slb becomes 0 here * and we evict an entry from * the preload cache with * preload_age(). We still * keep HW SLB and preload * cache in sync, that is * because all HW SLB entries * anyways gets evicted in * switch_slb during SLBIA. * We then only add those * entries back in HW SLB, * which are currently * present in preload_cache * (after eviction). */ load_elf_binary continues... setup_new_exec() slb_setup_new_exec() sched_switch event sched_migrate_task migrates process P to cpu-0 context_switch from swapper/0 to Process P switch_mm_irqs_off() /* * Since both prev and next mm struct are same we don't call * switch_mmu_context(). This will cause the HW SLB and SW preload * cache to go out of sync in preload_new_slb_context. Because there * was an SLB entry which was evicted from both HW and preload cache * on cpu-1. Now later in preload_new_slb_context(), when we will try * to add the same preload entry again, we will add this to the SW * preload cache and then will add it to the HW SLB. Since on cpu-0 * this entry was never invalidated, hence adding this entry to the HW * SLB will cause a SLB multi-hit error. */ load_elf_binary continues... START_THREAD start_thread preload_new_slb_context /* * This tries to add a new EA to preload cache which was earlier * evicted from both cpu-1 HW SLB and preload cache. This caused the * HW SLB of cpu-0 to go out of sync with the SW preload cache. The * reason for this was, that when we context switched back on CPU-0, * we should have ideally called switch_mmu_context() which will * bring the HW SLB entries on CPU-0 in sync with SW preload cache * entries by setting up the mmu context properly. But we didn't do * that since the prev mm_struct running on cpu-0 was same as the * next mm_struct (which is true for swapper / kernel threads). So * now when we try to add this new entry into the HW SLB of cpu-0, * we hit a SLB multi-hit error. */ WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149] Modules linked in: CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000 REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty) MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000 CFAR: c0000000001543b0 IRQMASK: 3 <...> NIP [c00000000015426c] assert_slb_presence+0x2c/0x50 LR [c0000000001543b4] slb_insert_entry+0x124/0x390 Call Trace: 0x7fffceb5ffff (unreliable) preload_new_slb_context+0x100/0x1a0 start_thread+0x26c/0x420 load_elf_binary+0x1b04/0x1c40 bprm_execve+0x358/0x680 do_execveat_common+0x1f8/0x240 sys_execve+0x58/0x70 system_call_exception+0x114/0x300 system_call_common+0x160/0x2c4 >From the above analysis, during early exec the hardware SLB is cleared, and entries from the software preload cache are reloaded into hardware by switch_slb. However, preload_new_slb_context and slb_setup_new_exec also attempt to load some of the same entries, which can trigger a multi-hit. In most cases, these additional preloads simply hit existing entries and add nothing new. Removing these functions avoids redundant preloads and eliminates the multi-hit issue. This patch removes these two functions. We tested process switching performance using the context_switch benchmark on POWER9/hash, and observed no regression. Without this patch: 129041 ops/sec With this patch: 129341 ops/sec We also measured SLB faults during boot, and the counts are essentially the same with and without this patch. SLB faults without this patch: 19727 SLB faults with this patch: 19786 Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache") cc: stable@vger.kernel.org Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08powerpc, mm: Fix mprotect on book3s 32-bitDave Vasilevsky2-1/+13
commit 78fc63ffa7813e33681839bb33826c24195f0eb7 upstream. On 32-bit book3s with hash-MMUs, tlb_flush() was a no-op. This was unnoticed because all uses until recently were for unmaps, and thus handled by __tlb_remove_tlb_entry(). After commit 4a18419f71cd ("mm/mprotect: use mmu_gather") in kernel 5.19, tlb_gather_mmu() started being used for mprotect as well. This caused mprotect to simply not work on these machines: int *ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); *ptr = 1; // force HPTE to be created mprotect(ptr, 4096, PROT_READ); *ptr = 2; // should segfault, but succeeds Fixed by making tlb_flush() actually flush TLB pages. This finally agrees with the behaviour of boot3s64's tlb_flush(). Fixes: 4a18419f71cd ("mm/mprotect: use mmu_gather") Cc: stable@vger.kernel.org Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251116-vasi-mprotect-g3-v3-1-59a9bd33ba00@vasilevsky.ca Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08powerpc/tools: drop `-o pipefail` in gcc check scriptsJan Stancek2-2/+0
[ Upstream commit f1164534ad62f0cc247d99650b07bd59ad2a49fd ] Fixes: 0f71dcfb4aef ("powerpc/ftrace: Add support for -fpatchable-function-entry") Fixes: b71c9ffb1405 ("powerpc: Add arch/powerpc/tools directory") Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com> Fixes: 8c50b72a3b4f ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel") Fixes: abba759796f9 ("powerpc/kbuild: move -mprofile-kernel check to Kconfig") Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/cc6cdd116c3ad9d990df21f13c6d8e8a83815bbd.1758641374.git.jstancek@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-02powerpc/kexec: Enable SMT before waking offline CPUsNysal Jan K.A.1-0/+19
commit c2296a1e42418556efbeb5636c4fa6aa6106713a upstream. If SMT is disabled or a partial SMT state is enabled, when a new kernel image is loaded for kexec, on reboot the following warning is observed: kexec: Waking offline cpu 228. WARNING: CPU: 0 PID: 9062 at arch/powerpc/kexec/core_64.c:223 kexec_prepare_cpus+0x1b0/0x1bc [snip] NIP kexec_prepare_cpus+0x1b0/0x1bc LR kexec_prepare_cpus+0x1a0/0x1bc Call Trace: kexec_prepare_cpus+0x1a0/0x1bc (unreliable) default_machine_kexec+0x160/0x19c machine_kexec+0x80/0x88 kernel_kexec+0xd0/0x118 __do_sys_reboot+0x210/0x2c4 system_call_exception+0x124/0x320 system_call_vectored_common+0x15c/0x2ec This occurs as add_cpu() fails due to cpu_bootable() returning false for CPUs that fail the cpu_smt_thread_allowed() check or non primary threads if SMT is disabled. Fix the issue by enabling SMT and resetting the number of SMT threads to the number of threads per core, before attempting to wake up all present CPUs. Fixes: 38253464bc82 ("cpu/SMT: Create topology_smt_thread_allowed()") Reported-by: Sachin P Bappalige <sachinpb@linux.ibm.com> Cc: stable@vger.kernel.org # v6.6+ Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Tested-by: Samir M <samir@linux.ibm.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251028105516.26258-1-nysal@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-02powerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf()Finn Thain1-1/+2
commit b94b73567561642323617155bf4ee24ef0d258fe upstream. Since Linux v6.7, booting using BootX on an Old World PowerMac produces an early crash. Stan Johnson writes, "the symptoms are that the screen goes blank and the backlight stays on, and the system freezes (Linux doesn't boot)." Further testing revealed that the failure can be avoided by disabling CONFIG_BOOTX_TEXT. Bisection revealed that the regression was caused by a change to the font bitmap pointer that's used when btext_init() begins painting characters on the display, early in the boot process. Christophe Leroy explains, "before kernel text is relocated to its final location ... data is addressed with an offset which is added to the Global Offset Table (GOT) entries at the start of bootx_init() by function reloc_got2(). But the pointers that are located inside a structure are not referenced in the GOT and are therefore not updated by reloc_got2(). It is therefore needed to apply the offset manually by using PTRRELOC() macro." Cc: stable@vger.kernel.org Link: https://lists.debian.org/debian-powerpc/2025/10/msg00111.html Link: https://lore.kernel.org/linuxppc-dev/d81ddca8-c5ee-d583-d579-02b19ed95301@yahoo.com/ Reported-by: Cedar Maxwell <cedarmaxwell@mac.com> Closes: https://lists.debian.org/debian-powerpc/2025/09/msg00031.html Bisected-by: Stan Johnson <userm57@yahoo.com> Tested-by: Stan Johnson <userm57@yahoo.com> Fixes: 0ebc7feae79a ("powerpc: Use shared font data") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/22b3b247425a052b079ab84da926706b3702c2c7.1762731022.git.fthain@linux-m68k.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-02crash: let architecture decide crash memory export to iomem_resourceSourabh Jain1-0/+8
commit adc15829fb73e402903b7030729263b6ee4a7232 upstream. With the generic crashkernel reservation, the kernel emits the following warning on powerpc: WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/mem.c:341 add_system_ram_resources+0xfc/0x180 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-auto-12607-g5472d60c129f #1 VOLUNTARY Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries NIP: c00000000201de3c LR: c00000000201de34 CTR: 0000000000000000 REGS: c000000127cef8a0 TRAP: 0700 Not tainted (6.17.0-auto-12607-g5472d60c129f) MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84000840 XER: 20040010 CFAR: c00000000017eed0 IRQMASK: 0 GPR00: c00000000201de34 c000000127cefb40 c0000000016a8100 0000000000000001 GPR04: c00000012005aa00 0000000020000000 c000000002b705c8 0000000000000000 GPR08: 000000007fffffff fffffffffffffff0 c000000002db8100 000000011fffffff GPR12: c00000000201dd40 c000000002ff0000 c0000000000112bc 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000015a3808 GPR24: c00000000200468c c000000001699888 0000000000000106 c0000000020d1950 GPR28: c0000000014683f8 0000000081000200 c0000000015c1868 c000000002b9f710 NIP [c00000000201de3c] add_system_ram_resources+0xfc/0x180 LR [c00000000201de34] add_system_ram_resources+0xf4/0x180 Call Trace: add_system_ram_resources+0xf4/0x180 (unreliable) do_one_initcall+0x60/0x36c do_initcalls+0x120/0x220 kernel_init_freeable+0x23c/0x390 kernel_init+0x34/0x26c ret_from_kernel_user_thread+0x14/0x1c This warning occurs due to a conflict between crashkernel and System RAM iomem resources. The generic crashkernel reservation adds the crashkernel memory range to /proc/iomem during early initialization. Later, all memblock ranges are added to /proc/iomem as System RAM. If the crashkernel region overlaps with any memblock range, it causes a conflict while adding those memblock regions as iomem resources, triggering the above warning. The conflicting memblock regions are then omitted from /proc/iomem. For example, if the following crashkernel region is added to /proc/iomem: 20000000-11fffffff : Crash kernel then the following memblock regions System RAM regions fail to be inserted: 00000000-7fffffff : System RAM 80000000-257fffffff : System RAM Fix this by not adding the crashkernel memory to /proc/iomem on powerpc. Introduce an architecture hook to let each architecture decide whether to export the crashkernel region to /proc/iomem. For more info checkout commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem") and commit bce074bdbc36 ("powerpc: insert System RAM resource to prevent crashkernel conflict") Note: Before switching to the generic crashkernel reservation, powerpc never exported the crashkernel region to /proc/iomem. Link: https://lkml.kernel.org/r/20251016142831.144515-1-sourabhjain@linux.ibm.com Fixes: e3185ee438c2 ("powerpc/crash: use generic crashkernel reservation"). Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/90937fe0-2e76-4c82-b27e-7b8a7fe3ac69@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-02powerpc/addnote: Fix overflow on 32-bit buildsBen Collins1-3/+4
[ Upstream commit 825ce89a3ef17f84cf2c0eacfa6b8dc9fd11d13f ] The PUT_64[LB]E() macros need to cast the value to unsigned long long like the GET_64[LB]E() macros. Caused lots of warnings when compiled on 32-bit, and clobbered addresses (36-bit P4080). Signed-off-by: Ben Collins <bcollins@kernel.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/2025042122-mustard-wrasse-694572@boujee-and-buff Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE formatRitesh Harjani (IBM)1-0/+6
[ Upstream commit eae40a6da63faa9fb63ff61f8fa2b3b57da78a84 ] HPTE format was changed since Power9 (ISA 3.0) onwards. While dumping kernel hash page tables, nothing gets printed on powernv P9+. This patch utilizes the helpers added in the patch tagged as fixes, to convert new format to old format and dump the hptes. This fix is only needed for native_find() (powernv), since pseries continues to work fine with the old format. Fixes: 6b243fcfb5f1e ("powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/4c2bb9e5b3cfbc0dd80b61b67cdd3ccfc632684c.1761834163.git.ritesh.list@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limitRitesh Harjani (IBM)1-4/+6
[ Upstream commit 17b45ccf09882e0c808ad2cf62acdc90ad968746 ] When HV=0 & IR/DR=0, the Hash MMU is said to be in Virtual Real Addressing Mode during early boot. During this, we should ensure that memory region allocations for stress_hpt_struct should happen from within RMA region as otherwise the boot might get stuck while doing memset of this region. History behind why do we have RMA region limitation is better explained in these 2 patches [1] & [2]. This patch ensures that memset to stress_hpt_struct during early boot does not cross ppc64_rma_size boundary. [1]: https://lore.kernel.org/all/20190710052018.14628-1-sjitindarsingh@gmail.com/ [2]: https://lore.kernel.org/all/87wp54usvj.fsf@linux.vnet.ibm.com/ Fixes: 6b34a099faa12 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/ada1173933ea7617a994d6ee3e54ced8797339fc.1761834163.git.ritesh.list@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18powerpc/32: Fix unpaired stwcx. on interrupt exitChristophe Leroy1-6/+4
[ Upstream commit 10e1c77c3636d815db802ceef588522c2d2d947c ] Commit b96bae3ae2cb ("powerpc/32: Replace ASM exception exit by C exception exit from ppc64") erroneouly copied to powerpc/32 the logic from powerpc/64 based on feature CPU_FTR_STCX_CHECKS_ADDRESS which is always 0 on powerpc/32. Re-instate the logic implemented by commit b64f87c16f3c ("[POWERPC] Avoid unpaired stwcx. on some processors") which is based on CPU_FTR_NEED_PAIRED_STWCX feature. Fixes: b96bae3ae2cb ("powerpc/32: Replace ASM exception exit by C exception exit from ppc64") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/6040b5dbcf5cdaa1cd919fcf0790f12974ea6e5a.1757666244.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18powerpc/kdump: Fix size calculation for hot-removed memory rangesSourabh Jain1-1/+1
[ Upstream commit 7afe2383eff05f76f4ce2cfda658b7889c89f101 ] The elfcorehdr segment in the kdump image stores information about the memory regions (called crash memory ranges) that the kdump kernel must capture. When a memory hot-remove event occurs, the kernel regenerates the elfcorehdr for the currently loaded kdump image to remove the hot-removed memory from the crash memory ranges. Call chain: remove_mem_range() update_crash_elfcorehdr() arch_crash_handle_hotplug_event() crash_handle_hotplug_event() While removing the hot-removed memory from the crash memory ranges in remove_mem_range(), if the removed memory lies within an existing crash range, that range is split into two. During this split, the size of the second range was being calculated incorrectly. This leads to dump capture failure with makedumpfile with below error: $ makedumpfile -l -d 31 /proc/vmcore /tmp/vmcore readpage_elf: Attempt to read non-existent page at 0xbbdab0000. readmem: type_addr: 0, addr:c000000bbdab7f00, size:16 validate_mem_section: Can't read mem_section array. readpage_elf: Attempt to read non-existent page at 0xbbdab0000. readmem: type_addr: 0, addr:c000000bbdab7f00, size:8 get_mm_sparsemem: Can't get the address of mem_section. The updated crash memory range in PT_LOAD entry is holding incorrect data (checkout FileSiz and MemSiz): readelf -a /proc/vmcore <snip...> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x0000000b013d0000 0xc000000b80000000 0x0000000b80000000 0xffffffffc0000000 0xffffffffc0000000 RWE 0x0 <snip...> Update the size calculation for the new crash memory range to fix this issue. Note: This problem will not occur if the kdump kernel is loaded or reloaded after a memory hot-remove operation. Fixes: 849599b702ef ("powerpc/crash: add crash memory hotplug support") Reported-by: Shirisha G <shirisha@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251105033941.1752287-1-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-15mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlbDavid Hildenbrand (Red Hat)2-1/+1
In the past, CONFIG_ARCH_HAS_GIGANTIC_PAGE indicated that we support runtime allocation of gigantic hugetlb folios. In the meantime it evolved into a generic way for the architecture to state that it supports gigantic hugetlb folios. In commit fae7d834c43c ("mm: add __dump_folio()") we started using CONFIG_ARCH_HAS_GIGANTIC_PAGE to decide MAX_FOLIO_ORDER: whether we could have folios larger than what the buddy can handle. In the context of that commit, we started using MAX_FOLIO_ORDER to detect page corruptions when dumping tail pages of folios. Before that commit, we assumed that we cannot have folios larger than the highest buddy order, which was obviously wrong. In commit 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate"), we used MAX_FOLIO_ORDER to detect inconsistencies, and in fact, we found some now. Powerpc allows for configs that can allocate gigantic folio during boot (not at runtime), that do not set CONFIG_ARCH_HAS_GIGANTIC_PAGE and can exceed PUD_ORDER. To fix it, let's make powerpc select CONFIG_ARCH_HAS_GIGANTIC_PAGE with hugetlb on powerpc, and increase the maximum folio size with hugetlb to 16 GiB on 64bit (possible on arm64 and powerpc) and 1 GiB on 32 bit (powerpc). Note that on some powerpc configurations, whether we actually have gigantic pages depends on the setting of CONFIG_ARCH_FORCE_MAX_ORDER, but there is nothing really problematic about setting it unconditionally: we just try to keep the value small so we can better detect problems in __dump_folio() and inconsistencies around the expected largest folio in the system. Ideally, we'd have a better way to obtain the maximum hugetlb folio size and detect ourselves whether we really end up with gigantic folios. Let's defer bigger changes and fix the warnings first. While at it, handle gigantic DAX folios more clearly: DAX can only end up creating gigantic folios with HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD. Add a new Kconfig option HAVE_GIGANTIC_FOLIOS to make both cases clearer. In particular, worry about ARCH_HAS_GIGANTIC_PAGE only with HUGETLB_PAGE. Note: with enabling CONFIG_ARCH_HAS_GIGANTIC_PAGE on powerpc, we will now also allow for runtime allocations of folios in some more powerpc configs. I don't think this is a problem, but if it is we could handle it through __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED. While __dump_page()/__dump_folio was also problematic (not handling dumping of tail pages of such gigantic folios correctly), it doesn't seem critical enough to mark it as a fix. Link: https://lkml.kernel.org/r/20251114214920.2550676-1-david@kernel.org Fixes: 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate") Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Closes: https://lore.kernel.org/r/3e043453-3f27-48ad-b987-cc39f523060a@csgroup.eu/ Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com> Closes: https://lore.kernel.org/r/94377f5c-d4f0-4c0f-b0f6-5bf1cd7305b1@linux.ibm.com/ Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-13powerpc/fadump: skip parameter area allocation when fadump is disabledSourabh Jain1-0/+3
Fadump allocates memory to pass additional kernel command-line argument to the fadump kernel. However, this allocation is not needed when fadump is disabled. So avoid allocating memory for the additional parameter area in such cases. Fixes: f4892c68ecc1 ("powerpc/fadump: allocate memory for additional parameters early") Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Fixes: f4892c68ecc1 ("powerpc/fadump: allocate memory for additional parameters early") Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251008032934.262683-1-sourabhjain@linux.ibm.com
2025-10-13powerpc, ocxl: Fix extraction of struct xive_irq_dataNam Cao3-10/+6
Commit cc0cc23babc9 ("powerpc/xive: Untangle xive from child interrupt controller drivers") changed xive_irq_data to be stashed to chip_data instead of handler_data. However, multiple places are still attempting to read xive_irq_data from handler_data and get a NULL pointer deference bug. Update them to read xive_irq_data from chip_data. Non-XIVE files which touch xive_irq_data seem quite strange to me, especially the ocxl driver. I think there ought to be an alternative platform-independent solution, instead of touching XIVE's data directly. Therefore, I think this whole thing should be cleaned up. But perhaps I just misunderstand something. In any case, this cleanup would not be trivial; for now, just get things working again. Fixes: cc0cc23babc9 ("powerpc/xive: Untangle xive from child interrupt controller drivers") Reported-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Closes: https://lore.kernel.org/linuxppc-dev/68e48df8.170a0220.4b4b0.217d@mx.google.com/ Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251008081359.1382699-1-namcao@linutronix.de
2025-10-13powerpc/pseries/msi: Fix NULL pointer dereference at irq domain teardownNam Cao1-2/+1
pseries_msi_ops_teardown() reads pci_dev* from msi_alloc_info_t. However, pseries_msi_ops_prepare() does not populate this structure, thus it is all zeros. Consequently, pseries_msi_ops_teardown() triggers a NULL pointer dereference crash. struct pci_dev is available in struct irq_domain. Read it there instead. Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/linuxppc-dev/878d7651-433a-46fe-a28b-1b7e893fcbe0@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251010120307.3281720-1-namcao@linutronix.de
2025-10-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-1/+15
Pull x86 kvm updates from Paolo Bonzini: "Generic: - Rework almost all of KVM's exports to expose symbols only to KVM's x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko x86: - Rework almost all of KVM x86's exports to expose symbols only to KVM's vendor modules, i.e. to kvm-{amd,intel}.ko - Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). It is worth noting that while SHSTK and IBT can be enabled separately in CPUID, it is not really possible to virtualize them separately. Therefore, Intel processors will really allow both SHSTK and IBT under the hood if either is made visible in the guest's CPUID. The alternative would be to intercept XSAVES/XRSTORS, which is not feasible for performance reasons - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts when completing userspace I/O. KVM has already committed to allowing L2 to to perform I/O at that point - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios) - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI vmexits needed to faithfully emulate an I/O APIC for such guests - Many cleanups and minor fixes - Recover possible NX huge pages within the TDP MMU under read lock to reduce guest jitter when restoring NX huge pages - Return -EAGAIN during prefault if userspace concurrently deletes/moves the relevant memslot, to fix an issue where prefaulting could deadlock with the memslot update x86 (AMD): - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported - Require a minimum GHCB version of 2 when starting SEV-SNP guests via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors instead of latent guest failures - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents unauthorized CPU accesses from reading the ciphertext of SNP guest private memory, e.g. to attempt an offline attack. This feature splits the shared SEV-ES/SEV-SNP ASID space into separate ranges for SEV-ES and SEV-SNP guests, therefore a new module parameter is needed to control the number of ASIDs that can be used for VMs with CipherText Hiding vs. how many can be used to run SEV-ES guests - Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted host from tampering with the guest's TSC frequency, while still allowing the the VMM to configure the guest's TSC frequency prior to launch - Validate the XCR0 provided by the guest (via the GHCB) to avoid bugs resulting from bogus XCR0 values - Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to avoid leaving behind stale state (thankfully not consumed in KVM) - Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with them - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's desired TSC_AUX, to fix a bug where KVM was keeping a different vCPU's TSC_AUX in the host MSR until return to userspace KVM (Intel): - Preparation for FRED support - Don't retry in TDX's anti-zero-step mitigation if the target memslot is invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar to the aforementioned prefaulting case - Misc bugfixes and minor cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: x86: Export KVM-internal symbols for sub-modules only KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c KVM: Export KVM-internal symbols for sub-modules only KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent KVM: VMX: Make CR4.CET a guest owned bit KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported KVM: selftests: Add coverage for KVM-defined registers in MSRs test KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test KVM: selftests: Extend MSRs test to validate vCPUs without supported features KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test KVM: selftests: Add an MSR test to exercise guest/host and read/write KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors KVM: x86: Define Control Protection Exception (#CP) vector KVM: x86: Add human friendly formatting for #XM, and #VE KVM: SVM: Enable shadow stack virtualization for SVM KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid KVM: SVM: Pass through shadow stack MSRs as appropriate KVM: SVM: Update dump_vmcb with shadow stack save area additions KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 ...
2025-10-06Merge tag 'pci-v6.18-changes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Add PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() macros that take config space accessor functions. Implement pci_find_capability(), pci_find_ext_capability(), and dwc, dwc endpoint, and cadence capability search interfaces with them (Hans Zhang) - Leave parent unit address 0 in 'interrupt-map' so that when we build devicetree nodes to describe PCI functions that contain multiple peripherals, we can build this property even when interrupt controllers lack 'reg' properties (Lorenzo Pieralisi) - Add a Xeon 6 quirk to disable Extended Tags and limit Max Read Request Size to 128B to avoid a performance issue (Ilpo Järvinen) - Add sysfs 'serial_number' file to expose the Device Serial Number (Matthew Wood) - Fix pci_acpi_preserve_config() memory leak (Nirmoy Das) Resource management: - Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen) - Remove sparc pcibios_enable_device() implementations that don't do anything beyond what pci_enable_resources() does (Ilpo Järvinen) - Remove mips pcibios_enable_resources() and use pci_enable_resources() instead (Ilpo Järvinen) - Clean up bridge window sizing and assignment (Ilpo Järvinen), including: - Leave non-claimed bridge windows disabled - Enable bridges even if a window wasn't assigned because not all windows are required by downstream devices - Preserve bridge window type when releasing the resource, since the type is needed for reassignment - Consolidate selection of bridge windows into two new interfaces, pbus_select_window() and pbus_select_window_for_type(), so this is done consistently - Compute bridge window start and end earlier to avoid logging stale information MSI: - Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol Vives) Error handling: - Align AER with EEH by allowing drivers to request a Bus Reset on Non-Fatal Errors (in addition to the reset on Fatal Errors that we already do) (Lukas Wunner) - If error recovery fails, emit FAILED_RECOVERY uevents for the devices, not for the bridge leading to them. This makes them correspond to BEGIN_RECOVERY uevents (Lukas Wunner) - Align AER with EEH by calling err_handler.error_detected() callbacks to notify drivers if error recovery fails (Lukas Wunner) - Align AER with EEH by restoring device error_state to pci_channel_io_normal before the err_handler.slot_reset() callback. This is earlier than before the err_handler.resume() callback (Lukas Wunner) - Emit a BEGIN_RECOVERY uevent when driver's err_handler.error_detected() requests a reset, as well as when it says recovery is complete or can be done without a reset (Niklas Schnelle) - Align s390 with AER and EEH by emitting uevents during error recovery (Niklas Schnelle) - Align EEH with AER and s390 by emitting BEGIN_RECOVERY, SUCCESSFUL_RECOVERY, or FAILED_RECOVERY uevents depending on the result of err_handler.error_detected() (Niklas Schnelle) - Fix a NULL pointer dereference in aer_ratelimit() when ACPI GHES error information identifies a device without an AER Capability (Breno Leitao) - Update error decoding and TLP Log printing for new errors in current PCIe base spec (Lukas Wunner) - Update error recovery documentation to match the current code and use consistent nomenclature (Lukas Wunner) ASPM: - Enable all ClockPM and ASPM states for devicetree platforms, since there's typically no firmware that enables ASPM This is a risky change that may uncover hardware or configuration defects at boot-time rather than when users enable ASPM via sysfs later. Booting with "pcie_aspm=off" prevents this enabling (Manivannan Sadhasivam) - Remove the qcom code that enabled ASPM (Manivannan Sadhasivam) Power management: - If a device has already been disconnected, e.g., by a hotplug removal, don't bother trying to resume it to D0 when detaching the driver. This avoids annoying "Unable to change power state from D3cold to D0" messages (Mario Limonciello) - Ensure devices are powered up before config reads for 'max_link_width', 'current_link_speed', 'current_link_width', 'secondary_bus_number', and 'subordinate_bus_number' sysfs files. This prevents using invalid data (~0) in drivers or lspci and, depending on how the PCIe controller reports errors, may avoid error interrupts or crashes (Brian Norris) Virtualization: - Add rescan/remove locking when enabling/disabling SR-IOV, which avoids list corruption on s390, where disabling SR-IOV also generates hotplug events (Niklas Schnelle) Peer-to-peer DMA: - Free struct p2p_pgmap, not a member within it, in the pci_p2pdma_add_resource() error path (Sungho Kim) Endpoint framework: - Document sysfs interface for BAR assignment of vNTB endpoint functions (Jerome Brunet) - Fix array underflow in endpoint BAR test case (Dan Carpenter) - Skip endpoint IRQ test if the IRQ is out of range to avoid false errors (Christian Bruel) - Fix endpoint test case for controllers with fixed-size BARs smaller than requested by the test (Marek Vasut) - Restore inbound translation when disabling doorbell so the endpoint doorbell test case can be run more than once (Niklas Cassel) - Avoid a NULL pointer dereference when releasing DMA channels in endpoint DMA test case (Shin'ichiro Kawasaki) - Convert tegra194 interrupt number to MSI vector to fix endpoint Kselftest MSI_TEST test case (Niklas Cassel) - Reset tegra194 BARs when running in endpoint mode so the BAR tests don't overwrite the ATU settings in BAR4 (Niklas Cassel) - Handle errors in tegra194 BPMP transactions so we don't mistakenly skip future PERST# assertion (Vidya Sagar) AMD MDB PCIe controller driver: - Update DT binding example to separate PERST# to a Root Port stanza to make multiple Root Ports possible in the future (Sai Krishna Musham) - Add driver support for PERST# being described in a Root Port stanza, falling back to the host bridge if not found there (Sai Krishna Musham) Freescale i.MX6 PCIe controller driver: - Enable the 3.3V Vaux supply if available so devices can request wakeup with either Beacon or WAKE# (Richard Zhu) MediaTek PCIe Gen3 controller driver: - Add optional sys clock ready time setting to avoid sys_clk_rdy signal glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno) - Add DT binding and driver support for MT6991 and MT8196 (AngeloGioacchino Del Regno) NVIDIA Tegra PCIe controller driver: - When asserting PERST#, disable the controller instead of mistakenly disabling the PLL twice (Nagarjuna Kristam) - Convert struct tegra_msi mask_lock to raw spinlock to avoid a lock nesting error (Marek Vasut) Qualcomm PCIe controller driver: - Select PCI Power Control Slot driver so slot voltage rails can be turned on/off if described in Root Port devicetree node (Qiang Yu) - Parse only PCI bridge child nodes in devicetree, skipping unrelated nodes such as OPP (Operating Performance Points), which caused probe failures (Krishna Chaitanya Chundru) - Add 8.0 GT/s and 32.0 GT/s equalization settings (Ziyue Zhang) - Consolidate Root Port 'phy' and 'reset' properties in struct qcom_pcie_port, regardless of whether we got them from the Root Port node or the host bridge node (Manivannan Sadhasivam) - Fetch and map the ELBI register space in the DWC core rather than in each driver individually (Krishna Chaitanya Chundru) - Enable ECAM mechanism in DWC core by setting up iATU with 'CFG Shift Feature' and use this in the qcom driver (Krishna Chaitanya Chundru) - Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya Chundru) - Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm Glymur, which is compatible with X1E80100 but doesn't have the cnoc_sf_axi clock (Qiang Yu) Renesas R-Car PCIe controller driver: - Fix a typo that prevented correct PHY initialization (Marek Vasut) - Add a missing 1ms delay after PWR reset assertion as required by the V4H manual (Marek Vasut) - Assure reset has completed before DBI access to avoid SError (Marek Vasut) - Fix inverted PHY initialization check, which sometimes led to timeouts and failure to start the controller (Marek Vasut) - Pass the correct IRQ domain to generic_handle_domain_irq() to fix a regression when converting to msi_create_parent_irq_domain() (Claudiu Beznea) - Drop the spinlock protecting the PMSR register - it's no longer required since pci_lock already serializes accesses (Marek Vasut) - Convert struct rcar_msi mask_lock to raw spinlock to avoid a lock nesting error (Marek Vasut) SOPHGO PCIe controller driver: - Check for existence of struct cdns_pcie.ops before using it to allow Cadence drivers that don't need to supply ops (Chen Wang) - Add DT binding and driver for the SOPHGO SG2042 PCIe controller (Chen Wang) STMicroelectronics STM32MP25 PCIe controller driver: - Update pinctrl documentation of initial states and use in runtime suspend/resume (Christian Bruel) - Add pinctrl_pm_select_init_state() for use by stm32 driver, which needs it during resume (Christian Bruel) - Add devicetree bindings and drivers for the STMicroelectronics STM32MP25 in host and endpoint modes (Christian Bruel) Synopsys DesignWare PCIe controller driver: - Add support for x16 in devicetree 'num-lanes' property (Konrad Dybcio) - Verify that if DT specifies a single IRQ for all eDMA channels, it is named 'dma' (Niklas Cassel) TI J721E PCIe driver: - Add MODULE_DEVICE_TABLE() so driver can be autoloaded (Siddharth Vadapalli) - Power controller off before configuring the glue layer so the controller latches the correct values on power-on (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Use devm_request_irq() so 'ks-pcie-error-irq' is freed when driver exits with error (Siddharth Vadapalli) - Add Peripheral Virtualization Unit (PVU), which restricts DMA from PCIe devices to specific regions of host memory, to the ti,am65 binding (Jan Kiszka) Xilinx NWL PCIe controller driver: - Clear bootloader E_ECAM_CONTROL before merging in the new driver value to avoid writing invalid values (Jani Nurminen)" * tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (141 commits) PCI/AER: Avoid NULL pointer dereference in aer_ratelimit() MAINTAINERS: Add entry for ST STM32MP25 PCIe drivers PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25 dt-bindings: PCI: Add STM32MP25 PCIe Endpoint bindings PCI: stm32: Add PCIe host support for STM32MP25 PCI: xilinx-nwl: Fix ECAM programming PCI: j721e: Fix incorrect error message in probe() PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller PCI: dwc: Support 16-lane operation PCI: Add lockdep assertion in pci_stop_and_remove_bus_device() PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV PCI: rcar-host: Convert struct rcar_msi mask_lock into raw spinlock PCI: tegra194: Rename 'root_bus' to 'root_port_bus' in tegra_pcie_downstream_dev_to_D0() PCI: tegra: Convert struct tegra_msi mask_lock into raw spinlock PCI: rcar-gen4: Fix inverted break condition in PHY initialization PCI: rcar-gen4: Assure reset occurs before DBI access PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion PCI: Set up bridge resources earlier PCI: rcar-host: Drop PMSR spinlock ...
2025-10-04Merge tag 'dma-mapping-6.18-2025-09-30' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - Refactoring of DMA mapping API to physical addresses as the primary interface instead of page+offset parameters This gets much closer to Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is supporting memdesc project which seeks to substantially transform how struct page works. An advantage of this approach is the possibility of introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the common paths, what in turn lets to use recently introduced dma_iova_link() API to map PCI P2P MMIO without creating struct page Developped by Leon Romanovsky and Jason Gunthorpe - Minor clean-up by Petr Tesarik and Qianfeng Rong * tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: kmsan: fix missed kmsan_handle_dma() signature conversion mm/hmm: properly take MMIO path mm/hmm: migrate to physical address-based DMA mapping API dma-mapping: export new dma_*map_phys() interface xen: swiotlb: Open code map_resource callback dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs() kmsan: convert kmsan_handle_dma to use physical addresses dma-mapping: convert dma_direct_*map_page to be phys_addr_t based iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys dma-debug: refactor to use physical addresses for page mapping iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link(). dma-mapping: introduce new DMA attribute to indicate MMIO memory swiotlb: Remove redundant __GFP_NOWARN dma-direct: clean up the logic in __dma_direct_alloc_pages()
2025-10-03Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds15-38/+23
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
2025-10-02Merge tag 'for-6.18/block-20250929' of ↵Linus Torvalds1-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg) - MD pull request via Yu: - Add support for a lockless bitmap. A key feature for the new bitmap are that the IO fastpath is lockless. If a user issues lots of write IO to the same bitmap bit in a short time, only the first write has additional overhead to update bitmap bit, no additional overhead for the following writes. By supporting only resync or recover written data, means in the case creating new array or replacing with a new disk, there is no need to do a full disk resync/recovery. - Switch ->getgeo() and ->bios_param() to using struct gendisk rather than struct block_device. - Rust block changes via Andreas. This series adds configuration via configfs and remote completion to the rnull driver. The series also includes a set of changes to the rust block device driver API: a few cleanup patches, and a few features supporting the rnull changes. The series removes the raw buffer formatting logic from `kernel::block` and improves the logic available in `kernel::string` to support the same use as the removed logic. - floppy arch cleanups - Reduce the number of dereferencing needed for ublk commands - Restrict supported sockets for nbd. Mostly done to eliminate a class of issues perpetually reported by syzbot, by using nonsensical socket setups. - A few s390 dasd block fixes - Fix a few issues around atomic writes - Improve DMA interation for integrity requests - Improve how iovecs are treated with regards to O_DIRECT aligment constraints. We used to require each segment to adhere to the constraints, now only the request as a whole needs to. - Clean up and improve p2p support, enabling use of p2p for metadata payloads - Improve locking of request lookup, using SRCU where appropriate - Use page references properly for brd, avoiding very long RCU sections - Fix ordering of recursively submitted IOs - Clean up and improve updating nr_requests for a live device - Various fixes and cleanups * tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits) s390/dasd: enforce dma_alignment to ensure proper buffer validation s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request ublk: remove redundant zone op check in ublk_setup_iod() nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller blk-cgroup: fix possible deadlock while configuring policy blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path blk-mq: Fix more tag iteration function documentation selftests: ublk: fix behavior when fio is not installed ublk: don't access ublk_queue in ublk_unmap_io() ublk: pass ublk_io to __ublk_complete_rq() ublk: don't access ublk_queue in ublk_need_complete_req() ublk: don't access ublk_queue in ublk_check_commit_and_fetch() ublk: don't pass ublk_queue to ublk_fetch() ublk: don't access ublk_queue in ublk_config_io_buf() ublk: don't access ublk_queue in ublk_check_fetch_buf() ublk: pass q_id and tag to __ublk_check_and_get_req() ...
2025-10-02Merge tag 'kbuild-6.18-1' of ↵Linus Torvalds1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild updates from Nathan Chancellor: - Extend modules.builtin.modinfo to include module aliases from MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such as kmod) can verify that a particular module alias will be handled by a builtin module - Bump the minimum version of LLVM for building the kernel to 15.0.0 - Upgrade several userspace API checks in headers_check.pl to errors - Unify and consolidate CONFIG_WERROR / W=e handling - Turn assembler and linker warnings into errors with CONFIG_WERROR / W=e - Respect CONFIG_WERROR / W=e when building userspace programs (userprogs) - Enable -Werror unconditionally when building host programs (hostprogs) - Support copy_file_range() and data segment alignment in gen_init_cpio to improve performance on filesystems that support reflinks such as btrfs and XFS - Miscellaneous small changes to scripts and configuration files * tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (47 commits) modpost: Initialize builtin_modname to stop SIGSEGVs Documentation: kbuild: note CONFIG_DEBUG_EFI in reproducible builds kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o modpost: Create modalias for builtin modules modpost: Add modname to mod_device_table alias scsi: Always define blogic_pci_tbl structure kbuild: extract modules.builtin.modinfo from vmlinux.unstripped kbuild: keep .modinfo section in vmlinux.unstripped kbuild: always create intermediate vmlinux.unstripped s390: vmlinux.lds.S: Reorder sections KMSAN: Remove tautological checks objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY lib/Kconfig.debug: Drop CLANG_VERSION check from DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT riscv: Remove ld.lld version checks from many TOOLCHAIN_HAS configs riscv: Unconditionally use linker relaxation riscv: Remove version check for LTO_CLANG selects powerpc: Drop unnecessary initializations in __copy_inst_from_kernel_nofault() mips: Unconditionally select ARCH_HAS_CURRENT_STACK_POINTER arm64: Remove tautological LLVM Kconfig conditions ARM: Clean up definition of ARM_HAS_GROUP_RELOCS ...
2025-10-02Merge tag 'soc-drivers-6.18' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "Lots of platform specific updates for Qualcomm SoCs, including a new TEE subsystem driver for the Qualcomm QTEE firmware interface. Added support for the Apple A11 SoC in drivers that are shared with the M1/M2 series, among more updates for those. Smaller platform specific driver updates for Renesas, ASpeed, Broadcom, Nvidia, Mediatek, Amlogic, TI, Allwinner, and Freescale SoCs. Driver updates in the cache controller, memory controller and reset controller subsystems. SCMI firmware updates to add more features and improve robustness. This includes support for having multiple SCMI providers in a single system. TEE subsystem support for protected DMA-bufs, allowing hardware to access memory areas that managed by the kernel but remain inaccessible from the CPU in EL1/EL0" * tag 'soc-drivers-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (139 commits) soc/fsl/qbman: Use for_each_online_cpu() instead of for_each_cpu() soc: fsl: qe: Drop legacy-of-mm-gpiochip.h header from GPIO driver soc: fsl: qe: Change GPIO driver to a proper platform driver tee: fix register_shm_helper() pmdomain: apple: Add "apple,t8103-pmgr-pwrstate" dt-bindings: spmi: Add Apple A11 and T2 compatible serial: qcom-geni: Load UART qup Firmware from linux side spi: geni-qcom: Load spi qup Firmware from linux side i2c: qcom-geni: Load i2c qup Firmware from linux side soc: qcom: geni-se: Add support to load QUP SE Firmware via Linux subsystem soc: qcom: geni-se: Cleanup register defines and update copyright dt-bindings: qcom: se-common: Add QUP Peripheral-specific properties for I2C, SPI, and SERIAL bus Documentation: tee: Add Qualcomm TEE driver tee: qcom: enable TEE_IOC_SHM_ALLOC ioctl tee: qcom: add primordial object tee: add Qualcomm TEE driver tee: increase TEE_MAX_ARG_SIZE to 4096 tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF tee: add close_context to TEE driver operation ...
2025-10-01Merge tag 'timers-vdso-2025-09-29' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO updates from Thomas Gleixner: - Further consolidation of the VDSO infrastructure and the common data store - Simplification of the related Kconfig logic - Improve the VDSO selftest suite * tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests: vDSO: Drop vdso_test_clock_getres selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64() selftests: vDSO: vdso_test_abi: Test CPUTIME clocks selftests: vDSO: vdso_test_abi: Use explicit indices for name array selftests: vDSO: vdso_test_abi: Drop clock availability tests selftests: vDSO: vdso_test_abi: Use ksft_finished() selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper vdso: Add struct __kernel_old_timeval forward declaration to gettime.h vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO vdso: Drop Kconfig GENERIC_VDSO_TIME_NS vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE vdso: Drop kconfig GENERIC_COMPAT_VDSO vdso: Drop kconfig GENERIC_VDSO_32 riscv: vdso: Untangle Kconfig logic time: Build generic update_vsyscall() only with generic time vDSO vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs vdso: Move ENABLE_COMPAT_VDSO from core to arm64 ARM: VDSO: Remove cntvct_ok global variable vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30KVM: Export KVM-internal symbols for sub-modules onlySean Christopherson2-1/+15
Rework the vast majority of KVM's exports to expose symbols only to KVM submodules, i.e. to x86's kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko. With few exceptions, KVM's exported APIs are intended (and safe) for KVM- internal usage only. Keep kvm_get_kvm(), kvm_get_kvm_safe(), and kvm_put_kvm() as normal exports, as they are needed by VFIO, and are generally safe for external usage (though ideally even the get/put APIs would be KVM-internal, and VFIO would pin a VM by grabbing a reference to its associated file). Implement a framework in kvm_types.h in anticipation of providing a macro to restrict KVM-specific kernel exports, i.e. to provide symbol exports for KVM if and only if KVM is built as one or more modules. Link: https://lore.kernel.org/r/20250919003303.1355064-3-seanjc@google.com Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-30Merge tag 'sched-core-2025-09-26' of ↵Linus Torvalds4-24/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler changes: - Make migrate_{en,dis}able() inline, to improve performance (Menglong Dong) - Move STDL_INIT() functions out-of-line (Peter Zijlstra) - Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra) Fair scheduling: - Defer throttling to when tasks exit to user-space, to reduce the chance & impact of throttle-preemption with held locks and other resources (Aaron Lu, Valentin Schneider) - Get rid of sched_domains_curr_level hack for tl->cpumask(), as the warning was getting triggered on certain topologies (Peter Zijlstra) Misc cleanups & fixes: - Header cleanups (Menglong Dong) - Fix race in push_dl_task() (Harshit Agarwal)" * tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix some typos in include/linux/preempt.h sched: Make migrate_{en,dis}able() inline rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c sched/fair: Do not balance task to a throttled cfs_rq sched/fair: Do not special case tasks in throttled hierarchy sched/fair: update_cfs_group() for throttled cfs_rqs sched/fair: Propagate load for throttled cfs_rq sched/fair: Get rid of throttled_lb_pair() sched/fair: Task based throttle time accounting sched/fair: Switch to task based throttle model sched/fair: Implement throttle task work and related helpers sched/fair: Add related data structure for task based throttle sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig sched: Move STDL_INIT() functions out-of-line sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() sched/deadline: Fix race in push_dl_task()
2025-09-30Merge tag 'powerpc-6.18-1' of ↵Linus Torvalds140-731/+2438
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Madhavan Srinivasan: - powerpc support for BPF arena and arena atomics - Patches to switch to msi parent domain (per-device MSI domains) - Add a lock contention tracepoint in the queued spinlock slowpath - Fixes for underflow in pseries/powernv msi and pci paths - Switch from legacy-of-mm-gpiochip dependency to platform driver - Fixes for handling TLB misses - Introduce support for powerpc papr-hvpipe - Add vpa-dtl PMU driver for pseries platform - Misc fixes and cleanups Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence, Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao, Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters, Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao Bagalkote. * tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits) powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct genirq/msi: Remove msi_post_free() powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf powerpc/time: Expose boot_tb via accessor powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure powerpc/fprobe: fix updated fprobe for function-graph tracer powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL powerpc64/modules: replace stub allocation sentinel with an explicit counter powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs powerpc/ftrace: ensure ftrace record ops are always set for NOPs powerpc/603: Really copy kernel PGD entries into all PGDIRs powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler powerpc/pseries: HVPIPE changes to support migration powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS powerpc/pseries: Enable HVPIPE event message interrupt ...
2025-09-30Merge tag 'ffs-const-v6.18-rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull ffs const-attribute cleanups from Kees Cook: "While working on various hardening refactoring a while back we encountered inconsistencies in the application of __attribute_const__ on the ffs() family of functions. This series fixes this across all archs and adds KUnit tests. Notably, this found a theoretical underflow in PCI (also fixed here) and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I kept the series separate from the general hardening PR since it is a stand-alone "topic". - PCI: Fix theoretical underflow in use of ffs(). - Universally apply __attribute_const__ to all architecture's ffs()-family of functions. - Add KUnit tests for ffs() behavior and const-ness" * tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: KUnit: ffs: Validate all the __attribute_const__ annotations sparc: Add __attribute_const__ to ffs()-family implementations xtensa: Add __attribute_const__ to ffs()-family implementations s390: Add __attribute_const__ to ffs()-family implementations parisc: Add __attribute_const__ to ffs()-family implementations mips: Add __attribute_const__ to ffs()-family implementations m68k: Add __attribute_const__ to ffs()-family implementations openrisc: Add __attribute_const__ to ffs()-family implementations riscv: Add __attribute_const__ to ffs()-family implementations hexagon: Add __attribute_const__ to ffs()-family implementations alpha: Add __attribute_const__ to ffs()-family implementations sh: Add __attribute_const__ to ffs()-family implementations powerpc: Add __attribute_const__ to ffs()-family implementations x86: Add __attribute_const__ to ffs()-family implementations csky: Add __attribute_const__ to ffs()-family implementations bitops: Add __attribute_const__ to generic ffs()-family implementations KUnit: Introduce ffs()-family tests PCI: Test for bit underflow in pcie_set_readrq()
2025-09-30Merge tag 'libcrypto-for-linus' of ↵Linus Torvalds8-1332/+0
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: - Add a RISC-V optimized implementation of Poly1305. This code was written by Andy Polyakov and contributed by Zhihang Shao. - Migrate the MD5 code into lib/crypto/, and add KUnit tests for MD5. Yes, it's still the 90s, and several kernel subsystems are still using MD5 for legacy use cases. As long as that remains the case, it's helpful to clean it up in the same way as I've been doing for other algorithms. Later, I plan to convert most of these users of MD5 to use the new MD5 library API instead of the generic crypto API. - Simplify the organization of the ChaCha, Poly1305, BLAKE2s, and Curve25519 code. Consolidate these into one module per algorithm, and centralize the configuration and build process. This is the same reorganization that has already been successful for SHA-1 and SHA-2. - Remove the unused crypto_kpp API for Curve25519. - Migrate the BLAKE2s and Curve25519 self-tests to KUnit. - Always enable the architecture-optimized BLAKE2s code. * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (38 commits) crypto: md5 - Implement export_core() and import_core() wireguard: kconfig: simplify crypto kconfig selections lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS lib/crypto: curve25519: Consolidate into single module lib/crypto: curve25519: Move a couple functions out-of-line lib/crypto: tests: Add Curve25519 benchmark lib/crypto: tests: Migrate Curve25519 self-test to KUnit crypto: curve25519 - Remove unused kpp support crypto: testmgr - Remove curve25519 kpp tests crypto: x86/curve25519 - Remove unused kpp support crypto: powerpc/curve25519 - Remove unused kpp support crypto: arm/curve25519 - Remove unused kpp support crypto: hisilicon/hpre - Remove unused curve25519 kpp support lib/crypto: tests: Add KUnit tests for BLAKE2s lib/crypto: blake2s: Consolidate into single C translation unit lib/crypto: blake2s: Move generic code into blake2s.c lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code lib/crypto: blake2s: Remove obsolete self-test lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2 lib/crypto: chacha: Consolidate into single module ...
2025-09-29Merge tag 'vfs-6.18-rc1.async' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async directory updates from Christian Brauner: "This contains further preparatory changes for the asynchronous directory locking scheme: - Add lookup_one_positive_killable() which allows overlayfs to perform lookup that won't block on a fatal signal - Unify the mount idmap handling in struct renamedata as a rename can only happen within a single mount - Introduce kern_path_parent() for audit which sets the path to the parent and returns a dentry for the target without holding any locks on return - Rename kern_path_locked() as it is only used to prepare for the removal of an object from the filesystem: kern_path_locked() => start_removing_path() kern_path_create() => start_creating_path() user_path_create() => start_creating_user_path() user_path_locked_at() => start_removing_user_path_at() done_path_create() => end_creating_path() NA => end_removing_path()" * tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: debugfs: rename start_creating() to debugfs_start_creating() VFS: rename kern_path_locked() and related functions. VFS/audit: introduce kern_path_parent() for audit VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata VFS: discard err2 in filename_create() VFS/ovl: add lookup_one_positive_killable()
2025-09-29Merge tag 'kernel-6.18-rc1.clone3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull copy_process updates from Christian Brauner: "This contains the changes to enable support for clone3() on nios2 which apparently is still a thing. The more exciting part of this is that it cleans up the inconsistency in how the 64-bit flag argument is passed from copy_process() into the various other copy_*() helpers" [ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ] * tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nios2: implement architecture-specific portion of sys_clone3 arch: copy_thread: pass clone_flags as u64 copy_process: pass clone_flags as u64 across calltree copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
2025-09-29Merge tag 'vfs-6.18-rc1.inode' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "This contains a series I originally wrote and that Eric brought over the finish line. It moves out the i_crypt_info and i_verity_info pointers out of 'struct inode' and into the fs-specific part of the inode. So now the few filesytems that actually make use of this pay the price in their own private inode storage instead of forcing it upon every user of struct inode. The pointer for the crypt and verity info is simply found by storing an offset to its address in struct fsverity_operations and struct fscrypt_operations. This shrinks struct inode by 16 bytes. I hope to move a lot more out of it in the future so that struct inode becomes really just about very core stuff that we need, much like struct dentry and struct file, instead of the dumping ground it has become over the years. On top of this are a various changes associated with the ongoing inode lifetime handling rework that multiple people are pushing forward: - Stop accessing inode->i_count directly in f2fs and gfs2. They simply should use the __iget() and iput() helpers - Make the i_state flags an enum - Rework the iput() logic Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible We can rework this logic to simply dec i_count if it is not 1, and if it is do the time update while still holding the i_count reference Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above - Add an icount_read() helper and convert everyone that accesses inode->i_count directly for this purpose to use the helper - Expand dump_inode() to dump more information about an inode helping in debugging - Add some might_sleep() annotations to iput() and associated helpers" * tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add might_sleep() annotation to iput() and more fs: expand dump_inode() inode: fix whitespace issues fs: add an icount_read helper fs: rework iput logic fs: make the i_state flags an enum fs: stop accessing ->i_count directly in f2fs and gfs2 fsverity: check IS_VERITY() in fsverity_cleanup_inode() fs: remove inode::i_verity_info btrfs: move verity info pointer to fs-specific part of inode f2fs: move verity info pointer to fs-specific part of inode ext4: move verity info pointer to fs-specific part of inode fsverity: add support for info in fs-specific part of inode fs: remove inode::i_crypt_info ceph: move crypt info pointer to fs-specific part of inode ubifs: move crypt info pointer to fs-specific part of inode f2fs: move crypt info pointer to fs-specific part of inode ext4: move crypt info pointer to fs-specific part of inode fscrypt: add support for info in fs-specific part of inode fscrypt: replace raw loads of info pointer with helper function
2025-09-25arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.cMenglong Dong1-0/+1
The include/generated/asm-offsets.h is generated in Kbuild during compiling from arch/SRCARCH/kernel/asm-offsets.c. When we want to generate another similar offset header file, circular dependency can happen. For example, we want to generate a offset file include/generated/test.h, which is included in include/sched/sched.h. If we generate asm-offsets.h first, it will fail, as include/sched/sched.h is included in asm-offsets.c and include/generated/test.h doesn't exist; If we generate test.h first, it can't success neither, as include/generated/asm-offsets.h is included by it. In x86_64, the macro COMPILE_OFFSETS is used to avoid such circular dependency. We can generate asm-offsets.h first, and if the COMPILE_OFFSETS is defined, we don't include the "generated/test.h". And we define the macro COMPILE_OFFSETS for all the asm-offsets.c for this purpose. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-09-25Merge tag 'soc_fsl-6.18-1' of https://github.com/chleroy/linux into soc/driversArnd Bergmann1-1/+0
FSL SOC Changes for 6.18: - Use for_each_online_cpu() instead of for_each_cpu() in qbman - Update FSL QUICC ENGINE GPIO driver to a standard platform driver and stop using legacy-of-mm-gpiochip.h header - Misc fixes on bus/fsl-mc * tag 'soc_fsl-6.18-1' of https://github.com/chleroy/linux: soc/fsl/qbman: Use for_each_online_cpu() instead of for_each_cpu() soc: fsl: qe: Drop legacy-of-mm-gpiochip.h header from GPIO driver soc: fsl: qe: Change GPIO driver to a proper platform driver bus: fsl-mc: Replace snprintf and sprintf with sysfs_emit in sysfs show functions bus: fsl-mc: Check return value of platform_get_resource() Link: https://lore.kernel.org/r/26615a15-3494-435f-b0c1-861122b4b5e1@csgroup.eu Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-23VFS: rename kern_path_locked() and related functions.NeilBrown1-2/+2
kern_path_locked() is now only used to prepare for removing an object from the filesystem (and that is the only credible reason for wanting a positive locked dentry). Thus it corresponds to kern_path_create() and so should have a corresponding name. Unfortunately the name "kern_path_create" is somewhat misleading as it doesn't actually create anything. The recently added simple_start_creating() provides a better pattern I believe. The "start" can be matched with "end" to bracket the creating or removing. So this patch changes names: kern_path_locked -> start_removing_path kern_path_create -> start_creating_path user_path_create -> start_creating_user_path user_path_locked_at -> start_removing_user_path_at done_path_create -> end_creating_path and also introduces end_removing_path() which is identical to end_creating_path(). __start_removing_path (which was __kern_path_locked) is enhanced to call mnt_want_write() for consistency with the start_creating_path(). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-23powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr structHaren Myneni1-4/+4
Fix the the following build errors with CONFIG_UAPI_HEADER_TEST: ./usr/include/asm/papr-hvpipe.h:16:9: error: unknown type name 'u8' 16 | u8 version; ./usr/include/asm/papr-hvpipe.h:17:9: error: unknown type name 'u8' 17 | u8 reserved[3]; ./usr/include/asm/papr-hvpipe.h:18:9: error: unknown type name 'u32' 18 | u32 flags; ./usr/include/asm/papr-hvpipe.h:19:9: error: unknown type name 'u8' 19 | u8 reserved2[40]; Fixes: 043439ad1a23c ("powerpc/pseries: Define papr-hvpipe ioctl") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250922091108.1483970-1-haren@linux.ibm.com
2025-09-22soc: fsl: qe: Drop legacy-of-mm-gpiochip.h header from GPIO driverChristophe Leroy1-1/+0
Remove legacy-of-mm-gpiochip.h header file. The above mentioned file provides an OF API that's deprecated. There is no agnostic alternatives to it and we have to open code the logic which was hidden behind of_mm_gpiochip_add_data(). Note, most of the GPIO drivers are using their own labeling schemas and resource retrieval that only a few may gain of the code deduplication, so whenever alternative is appear we can move drivers again to use that one. As a side effect this change fixes a potential memory leak on an error path, if of_mm_gpiochip_add_data() fails. [Text copied from commit 34064c8267a6 ("powerpc/8xx: Drop legacy-of-mm-gpiochip.h header")] Suggested-by: Bartosz Golaszewski <brgl@bgdev.pl> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/e8a5d2c5b72233bd36da7fecc0a551ca54d39478.1758212309.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
2025-09-22powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is ↵Athira Rajeev1-2/+52
needed Handle the case when the aux buffer is going to be full and data needs to be written to the data file. perf_aux_output_begin() function checks if there is enough space depending on the values of aux_wakeup and aux_watermark which is part of "struct perf_buffer". Inorder to maintain where to write to aux buffer, add two fields to "struct vpa_pmu_buf". Field "threshold" to indicate total possible DTL entries that can be contained in aux buffer and field "full" to indicate anytime when buffer is full. In perf_aux_output_end, there is check to see if wake up is needed based on aux head value. In vpa_dtl_capture_aux(), check if there is enough space to contain the DTL data. If not, save the data for available memory and set full to true. Set head of private aux to zero when buffer is full so that next data will be copied to beginning of the buffer. The address used for copying to aux is "aux_copy_buf + buf->head". So once buffer is full, set head to zero, so that next time it will be written from start of the buffer. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250915102947.26681-7-atrajeev@linux.ibm.com
2025-09-22powerpc/perf/vpa-dtl: Add support to capture DTL data in aux bufferAthira Rajeev1-1/+130
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. When the hrtimer expires, in the timer handler, code is added to save the DTL data to perf event record via vpa_dtl_capture_aux() function. The DTL (Dispatch Trace Log) contains information about dispatch/preempt, enqueue time etc. We directly copy the DTL buffer data as part of auxiliary buffer. Data will be written to disk only when the allocated buffer is full. By this approach, all the DTL data will be present as-is in the perf.data. The data will be post-processed in perf tools side when doing perf report/perf script and this will avoid time taken to create samples in the kernel space. To corelate each DTL entry with other events across CPU's, we need to map timebase from "struct dtl_entry" which phyp provides with boot timebase. This also needs timebase frequency. Define "struct boottb_freq" to save these details. Added changes to capture the Dispatch Trace Log details to AUX buffer in vpa_dtl_dump_sample_data(). Boot timebase and frequency needs to be saved only at once, added field to indicate this as part of "vpa_pmu_buf" structure. perf_aux_output_begin: This function is called before writing to AUX area. This returns the pointer to aux area private structure, ie "struct vpa_pmu_buf". The function obtains the output handle (used in perf_aux_output_end). when capture completes in vpa_dtl_capture_aux(), call perf_aux_output_end() to commit the recorded data. perf_aux_output_end() is called to move the aux->head of "struct perf_buffer" to indicate size of data in aux buffer. aux_tail will be moved in perf tools side when writing the data from aux buffer to perf.data file in disk. It is responsiblity of PMU driver to make sure data is copied between perf_aux_output_begin and perf_aux_output_end. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250915102947.26681-6-atrajeev@linux.ibm.com
2025-09-22powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing ↵Athira Rajeev1-0/+77
DTL data vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. When the hrtimer expires, in the timer handler, code is added to save the DTL data to perf event record. DTL (Dispatch Trace Log) contains information about dispatch/preempt, enqueue time etc. We directly copy the DTL buffer data as part of auxiliary buffer and it will be postprocessed later. To enable the support for aux buffer, add the PMU callbacks for setup_aux and free_aux. In setup_aux, set up pmu-private data structures for an AUX area. rb_alloc_aux uses "alloc_pages_node" and returns pointer to each page address. Map these pages to contiguous space using vmap and use that as base address. The aux private data structure ie, "struct vpa_pmu_buf" mainly saves: 1. buf->base: aux buffer base address 2. buf->head: offset from base address where data will be written to. 3. buf->size: Size of allocated memory free_aux will free pmu-private AUX data structures. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250915102947.26681-5-atrajeev@linux.ibm.com
2025-09-22powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perfKajol Jain2-1/+341
The pseries Shared Processor Logical Partition(SPLPAR) machines can retrieve a log of dispatch and preempt events from the hypervisor using data from Disptach Trace Log(DTL) buffer. With this information, user can retrieve when and why each dispatch & preempt has occurred. Added an interface to expose the Virtual Processor Area(VPA) DTL counters via perf. The following events are available and exposed in sysfs: vpa_dtl/dtl_cede/ - Trace voluntary (OS initiated) virtual processor waits vpa_dtl/dtl_preempt/ - Trace time slice preempts vpa_dtl/dtl_fault/ - Trace virtual partition memory page faults. vpa_dtl/dtl_all/ - Trace all (dtl_cede/dtl_preempt/dtl_fault) Added interface defines supported event list, config fields for the event attributes and their corresponding bit values which are exported via sysfs. User could use the standard perf tool to access perf events exposed via vpa-dtl pmu. The VPA DTL PMU counters do not interrupt on overflow or generate any PMI interrupts. Therefore, the kernel needs to poll the counters, added hrtimer code to do that. The timer interval can be provided by user via sample_period field in nano seconds. There is one hrtimer added per vpa-dtl pmu thread. To ensure there are no other conflicting dtl users (example: debugfs dtl or /proc/powerpc/vcpudispatch_stats), interface added code to use "down_write_trylock" call to take the dtl_access_lock. The dtl_access_lock is defined in dtl.h file. Also added global reference count variable called "dtl_global_refc", to ensure dtl data can be captured per-cpu. Code also added global lock called "dtl_global_lock" to avoid race condition. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250915102947.26681-3-atrajeev@linux.ibm.com
2025-09-22powerpc/time: Expose boot_tb via accessorAboorva Devarajan2-1/+11
- Define accessor function get_boot_tb() to safely return boot_tb value, this is only needed when running in SPLPAR environments, so the accessor is built conditionally under CONFIG_PPC_SPLPAR. - Tag boot_tb as __ro_after_init since it is written once at initialized and never updated afterwards. Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250915102947.26681-2-atrajeev@linux.ibm.com
2025-09-22powerpc: stop calling page_address() in free_pages()Vishal Moola (Oracle)1-1/+1
free_pages() should be used when we only have a virtual address. We should call __free_pages() directly on our page instead. Link: https://lkml.kernel.org/r/20250903185921.1785167-6-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Justin Sanders <justin@coraid.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-22kasan: introduce ARCH_DEFER_KASAN and unify static key across modesSabyrzhan Tasbolatov5-19/+4
Patch series "kasan: unify kasan_enabled() and remove arch-specific implementations", v6. This patch series addresses the fragmentation in KASAN initialization across architectures by introducing a unified approach that eliminates duplicate static keys and arch-specific kasan_arch_is_ready() implementations. The core issue is that different architectures have inconsistent approaches to KASAN readiness tracking: - PowerPC, LoongArch, and UML arch, each implement own kasan_arch_is_ready() - Only HW_TAGS mode had a unified static key (kasan_flag_enabled) - Generic and SW_TAGS modes relied on arch-specific solutions or always-on behavior This patch (of 2): Introduce CONFIG_ARCH_DEFER_KASAN to identify architectures [1] that need to defer KASAN initialization until shadow memory is properly set up, and unify the static key infrastructure across all KASAN modes. [1] PowerPC, UML, LoongArch selects ARCH_DEFER_KASAN. The core issue is that different architectures haveinconsistent approaches to KASAN readiness tracking: - PowerPC, LoongArch, and UML arch, each implement own kasan_arch_is_ready() - Only HW_TAGS mode had a unified static key (kasan_flag_enabled) - Generic and SW_TAGS modes relied on arch-specific solutions or always-on behavior This patch addresses the fragmentation in KASAN initialization across architectures by introducing a unified approach that eliminates duplicate static keys and arch-specific kasan_arch_is_ready() implementations. Let's replace kasan_arch_is_ready() with existing kasan_enabled() check, which examines the static key being enabled if arch selects ARCH_DEFER_KASAN or has HW_TAGS mode support. For other arch, kasan_enabled() checks the enablement during compile time. Now KASAN users can use a single kasan_enabled() check everywhere. Link: https://lkml.kernel.org/r/20250810125746.1105476-1-snovitoll@gmail.com Link: https://lkml.kernel.org/r/20250810125746.1105476-2-snovitoll@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217049 Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> #powerpc Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: David Gow <davidgow@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Marco Elver <elver@google.com> Cc: Qing Zhang <zhangqing@loongson.cn> Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>