summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-05-25mm: fix copy_vma() error handling for hugetlb mappingsRicardo Cañuelo Navarro4-3/+22
If, during a mremap() operation for a hugetlb-backed memory mapping, copy_vma() fails after the source vma has been duplicated and opened (ie. vma_link() fails), the error is handled by closing the new vma. This updates the hugetlbfs reservation counter of the reservation map which at this point is referenced by both the source vma and the new copy. As a result, once the new vma has been freed and copy_vma() returns, the reservation counter for the source vma will be incorrect. This patch addresses this corner case by clearing the hugetlb private page reservation reference for the new vma and decrementing the reference before closing the vma, so that vma_close() won't update the reservation counter. This is also what copy_vma_and_data() does with the source vma if copy_vma() succeeds, so a helper function has been added to do the fixup in both functions. The issue was reported by a private syzbot instance and can be reproduced using the C reproducer in [1]. It's also a possible duplicate of public syzbot report [2]. The WARNING report is: ============================================================ page_counter underflow: -1024 nr_pages=1024 WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120 Modules linked in: CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:page_counter_cancel+0xf6/0x120 Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81 RSP: 0018:ffffc900025df6a0 EFLAGS: 00010246 RAX: 2edfc409ebb44e00 RBX: fffffffffffffc00 RCX: ffff8880155f0000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff81c4a23c R09: 1ffff1100330482a R10: dffffc0000000000 R11: ffffed100330482b R12: 0000000000000000 R13: ffff888058a882c0 R14: ffff888058a882c0 R15: 0000000000000400 FS: 0000000000000000(0000) GS:ffff88808fc53000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b33e0 CR3: 00000000076d6000 CR4: 00000000000006f0 Call Trace: <TASK> page_counter_uncharge+0x33/0x80 hugetlb_cgroup_uncharge_counter+0xcb/0x120 hugetlb_vm_op_close+0x579/0x960 ? __pfx_hugetlb_vm_op_close+0x10/0x10 remove_vma+0x88/0x130 exit_mmap+0x71e/0xe00 ? __pfx_exit_mmap+0x10/0x10 ? __mutex_unlock_slowpath+0x22e/0x7f0 ? __pfx_exit_aio+0x10/0x10 ? __up_read+0x256/0x690 ? uprobe_clear_state+0x274/0x290 ? mm_update_next_owner+0xa9/0x810 __mmput+0xc9/0x370 exit_mm+0x203/0x2f0 ? __pfx_exit_mm+0x10/0x10 ? taskstats_exit+0x32b/0xa60 do_exit+0x921/0x2740 ? do_raw_spin_lock+0x155/0x3b0 ? __pfx_do_exit+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? _raw_spin_lock_irq+0xc5/0x100 do_group_exit+0x20c/0x2c0 get_signal+0x168c/0x1720 ? __pfx_get_signal+0x10/0x10 ? schedule+0x165/0x360 arch_do_signal_or_restart+0x8e/0x7d0 ? __pfx_arch_do_signal_or_restart+0x10/0x10 ? __pfx___se_sys_futex+0x10/0x10 syscall_exit_to_user_mode+0xb8/0x2c0 do_syscall_64+0x75/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x422dcd Code: Unable to access opcode bytes at 0x422da3. RSP: 002b:00007ff266cdb208 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: 0000000000000001 RBX: 00007ff266cdbcdc RCX: 0000000000422dcd RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004c7bec RBP: 00007ff266cdb220 R08: 203a6362696c6720 R09: 203a6362696c6720 R10: 0000200000c00000 R11: 0000000000000246 R12: ffffffffffffffd0 R13: 0000000000000002 R14: 00007ffe1cb5f520 R15: 00007ff266cbb000 </TASK> ============================================================ Link: https://lkml.kernel.org/r/20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c [1] Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/ [2] Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Florent Revest <revest@google.com> Cc: Jann Horn <jannh@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25memcg: always call cond_resched() after fn()Breno Leitao1-4/+2
I am seeing soft lockup on certain machine types when a cgroup OOMs. This is happening because killing the process in certain machine might be very slow, which causes the soft lockup and RCU stalls. This happens usually when the cgroup has MANY processes and memory.oom.group is set. Example I am seeing in real production: [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) .... .... [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) .... [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982] .... Quick look at why this is so slow, it seems to be related to serial flush for certain machine types. For all the crashes I saw, the target CPU was at console_flush_all(). In the case above, there are thousands of processes in the cgroup, and it is soft locking up before it reaches the 1024 limit in the code (which would call the cond_resched()). So, cond_resched() in 1024 blocks is not sufficient. Remove the counter-based conditional rescheduling logic and call cond_resched() unconditionally after each task iteration, after fn() is called. This avoids the lockup independently of how slow fn() is. Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process") Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Rik van Riel <riel@surriel.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Michael van der Westhuizen <rmikey@meta.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb ↵Ge Yang1-0/+8
folios A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: only zero-init on vrealloc shrinkKees Cook1-5/+7
The common case is to grow reallocations, and since init_on_alloc will have already zeroed the whole allocation, we only need to zero when shrinking the allocation. Link: https://lkml.kernel.org/r/20250515214217.619685-2-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: actually use the in-place vrealloc regionKees Cook1-0/+1
Patch series "mm: vmalloc: Actually use the in-place vrealloc region". This fixes a performance regression[1] with vrealloc()[1]. The refactoring to not build a new vmalloc region only actually worked when shrinking. Actually return the resized area when it grows. Ugh. Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Closes: https://lore.kernel.org/all/20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1] Tested-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25alloc_tag: allocate percpu counters for module tags dynamicallySuren Baghdasaryan5-28/+88
When a module gets unloaded it checks whether any of its tags are still in use and if so, we keep the memory containing module's allocation tags alive until all tags are unused. However percpu counters referenced by the tags are freed by free_module(). This will lead to UAF if the memory allocated by a module is accessed after module was unloaded. To fix this we allocate percpu counters for module allocation tags dynamically and we keep it alive for tags which are still in use after module unloading. This also removes the requirement of a larger PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because percpu memory for counters does not need to be reserved anymore. Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Tested-by: David Wang <00107082@163.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25module: release codetag section when module load failsDavid Wang1-0/+1
When module load fails after memory for codetag section is ready, codetag section memory will not be properly released. This causes memory leak, and if next module load happens to get the same module address, codetag may pick the uninitialized section when manipulating tags during module unload, and leads to "unable to handle page fault" BUG. Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Signed-off-by: David Wang <00107082@163.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/cma: make detection of highmem_start more robustMike Rapoport (Microsoft)1-1/+4
Pratyush Yadav reports the following crash: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/physaddr.c:23! ception 0x06 IP 10:ffffffff812ebbf8 error 0 cr2 0xffff88903ffff000 CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc6+ #231 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:__phys_addr+0x58/0x60 Code: 01 48 89 c2 48 d3 ea 48 85 d2 75 05 e9 91 52 cf 00 0f 0b 48 3d ff ff ff 1f 77 0f 48 8b 05 20 54 55 01 48 01 d0 e9 78 52 cf 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 0000:ffffffff82803dd8 EFLAGS: 00010006 ORIG_RAX: 0000000000000000 RAX: 000000007fffffff RBX: 00000000ffffffff RCX: 0000000000000000 RDX: 000000007fffffff RSI: 0000000280000000 RDI: ffffffffffffffff RBP: ffffffff82803e68 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff83153180 R11: ffffffff82803e48 R12: ffffffff83c9aed0 R13: 0000000000000000 R14: 0000001040000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88903ffff000 CR3: 0000000002838000 CR4: 00000000000000b0 Call Trace: <TASK> ? __cma_declare_contiguous_nid+0x6e/0x340 ? cma_declare_contiguous_nid+0x33/0x70 ? dma_contiguous_reserve_area+0x2f/0x70 ? setup_arch+0x6f1/0x870 ? start_kernel+0x52/0x4b0 ? x86_64_start_reservations+0x29/0x30 ? x86_64_start_kernel+0x7c/0x80 ? common_startup_64+0x13e/0x141 The reason is that __cma_declare_contiguous_nid() does: highmem_start = __pa(high_memory - 1) + 1; If dma_contiguous_reserve_area() (or any other CMA declaration) is called before free_area_init(), high_memory is uninitialized. Without CONFIG_DEBUG_VIRTUAL, it will likely work but use the wrong value for highmem_start. The issue occurs because commit e120d1bc12da ("arch, mm: set high_memory in free_area_init()") moved initialization of high_memory after the call to dma_contiguous_reserve() -> __cma_declare_contiguous_nid() on several architectures. In the case CONFIG_HIGHMEM is enabled, some architectures that actually support HIGHMEM (arm, powerpc and x86) have initialization of high_memory before a possible call to __cma_declare_contiguous_nid() and some initialized high_memory late anyway (arc, csky, microblase, mips, sparc, xtensa) even before the commit e120d1bc12da so they are fine with using uninitialized value of high_memory. And in the case CONFIG_HIGHMEM is disabled high_memory essentially becomes the first address after memory end, so instead of relying on high_memory to calculate highmem_start use memblock_end_of_DRAM() and eliminate the dependency of CMA area creation on high_memory in majority of configurations. Link: https://lkml.kernel.org/r/20250519171805.1288393-1-rppt@kernel.org Fixes: e120d1bc12da ("arch, mm: set high_memory in free_area_init()") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reported-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21MAINTAINERS: add mm memory policy sectionLorenzo Stoakes1-0/+20
As part of the ongoing efforts to sub-divide memory management maintainership and reviewership, establish a section for memory policy and migration and add appropriate maintainers and reviewers. [lorenzo.stoakes@oracle.com: add Ying as reviewer] Link: https://lkml.kernel.org/r/ed6f0fc2-5608-4eea-b1be-07e3e19be263@lucifer.local Link: https://lkml.kernel.org/r/20250515191358.205684-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Rakie Kim <rakie.kim@sk.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Huang Ying <ying.huang@linux.alibaba.com> Acked-by: Byungchul Park <byungchul@sk.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gregory Price <gourry@gourry.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21MAINTAINERS: add mm ksm sectionLorenzo Stoakes1-0/+15
As part of the ongoing efforts to sub-divide memory management maintainership and reviewership, establish a section for Kernel Samepage Merging (KSM) and add appropriate maintainers and reviewers. Link: https://lkml.kernel.org/r/20250515190404.203596-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Xu Xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21kasan: avoid sleepable page allocation from atomic contextAlexander Gordeev1-14/+78
apply_to_pte_range() enters the lazy MMU mode and then invokes kasan_populate_vmalloc_pte() callback on each page table walk iteration. However, the callback can go into sleep when trying to allocate a single page, e.g. if an architecutre disables preemption on lazy MMU mode enter. On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable() and arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash occurs: [ 0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 [ 0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd [ 0.663358] preempt_count: 1, expected: 0 [ 0.663366] RCU nest depth: 0, expected: 0 [ 0.663375] no locks held by kthreadd/2. [ 0.663383] Preemption disabled at: [ 0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0 [ 0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT [ 0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux) [ 0.663409] Call Trace: [ 0.663410] [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140 [ 0.663413] [<0002f3284c507b9e>] __might_resched+0x66e/0x700 [ 0.663415] [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0 [ 0.663419] [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0 [ 0.663421] [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0 [ 0.663424] [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120 [ 0.663427] [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0 [ 0.663429] [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120 [ 0.663433] [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0 [ 0.663435] [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0 [ 0.663437] [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0 [ 0.663440] [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40 [ 0.663442] [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0 [ 0.663445] [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10 [ 0.663448] [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0 [ 0.663451] [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310 [ 0.663454] [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110 [ 0.663457] [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330 [ 0.663460] [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0 [ 0.663463] [<0002f3284c45be90>] copy_process+0x280/0x4b90 [ 0.663465] [<0002f3284c460940>] kernel_clone+0xd0/0x4b0 [ 0.663467] [<0002f3284c46115e>] kernel_thread+0xbe/0xe0 [ 0.663469] [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0 [ 0.663472] [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0 [ 0.663475] [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38 Instead of allocating single pages per-PTE, bulk-allocate the shadow memory prior to applying kasan_populate_vmalloc_pte() callback on a page range. Link: https://lkml.kernel.org/r/c61d3560297c93ed044f0b1af085610353a06a58.1747316918.git.agordeev@linux.ibm.com Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Daniel Axtens <dja@axtens.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21highmem: add folio_test_partial_kmap()Matthew Wilcox (Oracle)2-5/+12
In commit c749d9b7ebbc ("iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP"), Hugh correctly noted that if KMAP_LOCAL_FORCE_MAP is enabled, we must limit ourselves to PAGE_SIZE bytes per call to kmap_local(). The same problem exists in memcpy_from_folio(), memcpy_to_folio(), folio_zero_tail(), folio_fill_tail() and memcpy_from_file_folio(), so add folio_test_partial_kmap() to do this more succinctly. Link: https://lkml.kernel.org/r/20250514170607.3000994-2-willy@infradead.org Fixes: 00cdf76012ab ("mm: add memcpy_from_file_folio()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21MAINTAINERS: add hung-task detector sectionLance Yang1-0/+8
The hung-task detector is missing in MAINTAINERS. While it's been quiet recently, I'm actively working on it and volunteering to review patches. Adding this section will make it easier for contributors to know who to contact. Link: https://lkml.kernel.org/r/20250513052234.46463-1-lance.yang@linux.dev Signed-off-by: Lance Yang <lance.yang@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21taskstats: fix struct taskstats breaks backward compatibility since version 15Wang Yaxin1-18/+29
Problem ======== commit 658eb5ab916d ("delayacct: add delay max to record delay peak") - adding more fields commit f65c64f311ee ("delayacct: add delay min to record delay peak") - adding more fields commit b016d0873777 ("taskstats: modify taskstats version") - version bump to 15 Since version 15 (TASKSTATS_VERSION=15) the new layout of the structure adds fields in the middle of the structure, rendering all old software incompatible with newer kernels and software compiled against the new kernel headers incompatible with older kernels. Solution ========= move delay max and delay min to the end of taskstat, and bump the version to 16 after the change [wang.yaxin@zte.com.cn: adjust indentation] Link: https://lkml.kernel.org/r/202505192131489882NSciXV4EGd8zzjLuwoOK@zte.com.cn Link: https://lkml.kernel.org/r/20250510155413259V4JNRXxukdDgzsaL0Fo6a@zte.com.cn Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak") Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21mm/truncate: fix out-of-bounds when doing a right-aligned splitZhang Yi1-8/+12
When performing a right split on a folio, the split_at2 may point to a not-present page if the offset + length equals the original folio size, which will trigger the following error: BUG: unable to handle page fault for address: ffffea0006000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 143ffb9067 P4D 143ffb9067 PUD 143ffb8067 PMD 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 0 UID: 0 PID: 502640 Comm: fsx Not tainted 6.15.0-rc3-gc6156189fc6b #889 PR Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/4 RIP: 0010:truncate_inode_partial_folio+0x208/0x620 Code: ff 03 48 01 da e8 78 7e 13 00 48 83 05 10 b5 5a 0c 01 85 c0 0f 85 1c 02 001 RSP: 0018:ffffc90005bafab0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffea0005ffff00 RCX: 0000000000000002 RDX: 000000000000000c RSI: 0000000000013975 RDI: ffffc90005bafa30 RBP: ffffea0006000000 R08: 0000000000000000 R09: 00000000000009bf R10: 00000000000007e0 R11: 0000000000000000 R12: 0000000000001633 R13: 0000000000000000 R14: ffffea0005ffff00 R15: fffffffffffffffe FS: 00007f9f9a161740(0000) GS:ffff8894971fd000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0006000008 CR3: 000000017c2ae000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> truncate_inode_pages_range+0x226/0x720 truncate_pagecache+0x57/0x90 ... Fix this issue by skipping the split if truncation aligns with the folio size, make sure the split page number lies within the folio. Link: https://lkml.kernel.org/r/20250512062825.3533342-1-yi.zhang@huaweicloud.com Fixes: 7460b470a131 ("mm/truncate: use folio_split() in truncate operation") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: ErKun Yang <yangerkun@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21MAINTAINERS: add mm reclaim sectionLorenzo Stoakes1-0/+13
In furtherance of ongoing efforts to ensure people are aware of who de-facto maintains/has an interest in specific parts of mm, as well trying to avoid get_maintainers.pl listing only Andrew and the mailing list for mm files - establish a reclaim memory management section and add relevant maintainers/reviewers. This is a key part of memory management so sensibly deserves its own section. This encompasses both 'classical' reclaim and MGLRU and thus reflects this in the reviewers from both, as well as those who have contributed specifically on the memcg side of things. Link: https://lkml.kernel.org/r/20250512143122.87740-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21MAINTAINERS: update page allocator sectionLorenzo Stoakes1-3/+5
Make Vlastimil maintainer of this section (with thanks to Vlastimil for agreeing to this!) and add page isolation files for which this section seem most appropriate. We may wish to, in future, refactor/rename some of these files to more logically fit what is actually being performed, but for the time being this seems the most sensible place. Additionally, fix the alphabetical ordering of files. Link: https://lkml.kernel.org/r/20250512144603.90379-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=yFlorent Revest1-1/+1
On configs with CONFIG_ARM64_GCS=y, VM_SHADOW_STACK is bit 38. On configs with CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y (selected by CONFIG_ARM64 when CONFIG_USERFAULTFD=y), VM_UFFD_MINOR is _also_ bit 38. This bit being shared by two different VMA flags could lead to all sorts of unintended behaviors. Presumably, a process could maybe call into userfaultfd in a way that disables the shadow stack vma flag. I can't think of any attack where this would help (presumably, if an attacker tries to disable shadow stacks, they are trying to hijack control flow so can't arbitrarily call into userfaultfd yet anyway) but this still feels somewhat scary. Link: https://lkml.kernel.org/r/20250507131000.1204175-2-revest@chromium.org Fixes: ae80e1629aea ("mm: Define VM_SHADOW_STACK for arm64 when we support GCS") Signed-off-by: Florent Revest <revest@chromium.org> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Brendan Jackman <jackmanb@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabledIgnacio Moreno Gonzalez1-0/+2
commit c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") maps the mmap option MAP_STACK to VM_NOHUGEPAGE. This is also done if CONFIG_TRANSPARENT_HUGEPAGE is not defined. But in that case, the VM_NOHUGEPAGE does not make sense. I discovered this issue when trying to use the tool CRIU to checkpoint and restore a container. Our running kernel is compiled without CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of /proc/<pid>/smaps and saves the "nh" flag. When trying to restore the container, CRIU fails to restore the "nh" mappings, since madvise() MADV_NOHUGEPAGE always returns an error because CONFIG_TRANSPARENT_HUGEPAGE is not defined. Link: https://lkml.kernel.org/r/20250507-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-v5-1-c6c38cfefd6e@kuka.com Fixes: c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21MAINTAINERS: add myself as vmalloc co-maintainerUladzislau Rezki (Sony)1-1/+1
I have been working on the vmalloc code for several years, contributing to improvements and fixes. Add myself as co-maintainer ("M") alongside Andrew Morton. Link: https://lkml.kernel.org/r/20250507150257.61485-1-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Christop Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21mm/page_alloc.c: avoid infinite retries caused by cpuset raceTianyang Zhang1-0/+8
__alloc_pages_slowpath has no change detection for ac->nodemask in the part of retry path, while cpuset can modify it in parallel. For some processes that set mempolicy as MPOL_BIND, this results ac->nodemask changes, and then the should_reclaim_retry will judge based on the latest nodemask and jump to retry, while the get_page_from_freelist only traverses the zonelist from ac->preferred_zoneref, which selected by a expired nodemask and may cause infinite retries in some cases cpu 64: __alloc_pages_slowpath { /* ..... */ retry: /* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */ if (alloc_flags & ALLOC_KSWAPD) wake_all_kswapds(order, gfp_mask, ac); /* cpu 1: cpuset_write_resmask update_nodemask update_nodemasks_hier update_tasks_nodemask mpol_rebind_task mpol_rebind_policy mpol_rebind_nodemask // mempolicy->nodes has been modified, // which ac->nodemask point to */ /* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */ if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, did_some_progress > 0, &no_progress_loops)) goto retry; } Simultaneously starting multiple cpuset01 from LTP can quickly reproduce this issue on a multi node server when the maximum memory pressure is reached and the swap is enabled Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm: userfaultfd: correct dirty flags set for both present and swap pteBarry Song1-2/+10
As David pointed out, what truly matters for mremap and userfaultfd move operations is the soft dirty bit. The current comment and implementation—which always sets the dirty bit for present PTEs and fails to set the soft dirty bit for swap PTEs—are incorrect. This could break features like Checkpoint-Restore in Userspace (CRIU). This patch updates the behavior to correctly set the soft dirty bit for both present and swap PTEs in accordance with mremap. Link: https://lkml.kernel.org/r/20250508220912.7275-1-21cnbao@gmail.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reported-by: David Hildenbrand <david@redhat.com> Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12zsmalloc: don't underflow size calculation in zs_obj_write()Sergey Senozhatsky1-4/+4
Do not mix class->size and object size during offsets/sizes calculation in zs_obj_write(). Size classes can merge into clusters, based on objects-per-zspage and pages-per-zspage characteristics, so some size classes can store objects smaller than class->size. This becomes problematic when object size is much smaller than class->size. zsmalloc can falsely decide that object spans two physical pages, because a larger class->size value is used for that check, while the actual object is much smaller and fits the free space of the first physical page, so there is nothing to write to the second page and memcpy() size calculation underflows. Unable to handle kernel paging request at virtual address ffffc00081ff4000 pc : __memcpy+0x10/0x24 lr : zs_obj_write+0x1b0/0x1d0 [zsmalloc] Call trace: __memcpy+0x10/0x24 (P) zram_write_page+0x150/0x4fc [zram] zram_submit_bio+0x5e0/0x6a4 [zram] __submit_bio+0x168/0x220 submit_bio_noacct_nocheck+0x128/0x2c8 submit_bio_noacct+0x19c/0x2f8 This is mostly seen on system with larger page-sizes, because size class cluters of such systems hold wider size ranges than on 4K PAGE_SIZE systems. Assume a 16K PAGE_SIZE system, a write of 820 bytes object to a 864-bytes size class at offset 15560. 15560 + 864 is more than 16384 so zsmalloc attempts to memcpy() it to two physical pages. However, 16384 - 15560 = 824 which is more than 820, so the object in fact doesn't span two physical pages, and there is no data to write to the second physical page. We always know the exact size in bytes of the object that we are about to write (store), so use it instead of class->size. Link: https://lkml.kernel.org/r/20250507054312.4135983-1-senozhatsky@chromium.org Fixes: 44f76413496e ("zsmalloc: introduce new object mapping API") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reported-by: Igor Belousov <igor.b@beldev.am> Tested-by: Igor Belousov <igor.b@beldev.am> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm/page_alloc: fix race condition in unaccepted memory handlingKirill A. Shutemov3-49/+0
The page allocator tracks the number of zones that have unaccepted memory using static_branch_enc/dec() and uses that static branch in hot paths to determine if it needs to deal with unaccepted memory. Borislav and Thomas pointed out that the tracking is racy: operations on static_branch are not serialized against adding/removing unaccepted pages to/from the zone. Sanity checks inside static_branch machinery detects it: WARNING: CPU: 0 PID: 10 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x8e/0xa0 The comment around the WARN() explains the problem: /* * Warn about the '-1' case though; since that means a * decrement is concurrent with a first (0->1) increment. IOW * people are trying to disable something that wasn't yet fully * enabled. This suggests an ordering problem on the user side. */ The effect of this static_branch optimization is only visible on microbenchmark. Instead of adding more complexity around it, remove it altogether. Link: https://lkml.kernel.org/r/20250506133207.1009676-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Link: https://lore.kernel.org/all/20250506092445.GBaBnVXXyvnazly6iF@fat_crate.local Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Reported-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm/page_alloc: ensure try_alloc_pages() plays well with unaccepted memoryKirill A. Shutemov1-13/+15
try_alloc_pages() will not attempt to allocate memory if the system has *any* unaccepted memory. Memory is accepted as needed and can remain in the system indefinitely, causing the interface to always fail. Rather than immediately giving up, attempt to use already accepted memory on free lists. Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory for ALLOC_TRYLOCK requests. Found via code inspection - only BPF uses this at present and the runtime effects are unclear. Link: https://lkml.kernel.org/r/20250506112509.905147-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation") Cc: Alexei Starovoitov <ast@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12MAINTAINERS: add mm GUP sectionLorenzo Stoakes1-0/+12
As part of the ongoing efforts to sub-divide memory management maintainership and reviewership, establish a section for GUP (Get User Pages) support and add appropriate maintainers and reviewers. Link: https://lkml.kernel.org/r/20250506173601.97562-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm/codetag: move tag retrieval back upfront in __free_pages()David Wang2-9/+14
Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks") introduces a possible use-after-free scenario, when page is non-compound, page[0] could be released by other thread right after put_page_testzero failed in current thread, pgalloc_tag_sub_pages afterwards would manipulate an invalid page for accounting remaining pages: [timeline] [thread1] [thread2] | alloc_page non-compound V | get_page, rf counter inc V | in ___free_pages | put_page_testzero fails V | put_page, page released V | in ___free_pages, | pgalloc_tag_sub_pages | manipulate an invalid page V Restore __free_pages() to its state before, retrieve alloc tag beforehand. Link: https://lkml.kernel.org/r/20250505193034.91682-1-00107082@163.com Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks") Signed-off-by: David Wang <00107082@163.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm/memory: fix mapcount / refcount sanity check for mTHP reuseKairui Song1-1/+1
The following WARNING was triggered during swap stress test with mTHP enabled: [ 6609.335758] ------------[ cut here ]------------ [ 6609.337758] WARNING: CPU: 82 PID: 755116 at mm/memory.c:3794 do_wp_page+0x1084/0x10e0 [ 6609.340922] Modules linked in: zram virtiofs [ 6609.342699] CPU: 82 UID: 0 PID: 755116 Comm: sh Kdump: loaded Not tainted 6.15.0-rc1+ #1429 PREEMPT(voluntary) [ 6609.347620] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015 [ 6609.349909] RIP: 0010:do_wp_page+0x1084/0x10e0 [ 6609.351532] Code: ff ff 48 c7 c6 80 ba 49 82 4c 89 ef e8 95 fd fe ff 0f 0b bd f5 ff ff ff e9 43 fb ff ff 41 83 a9 bc 12 00 00 01 e9 5c fb ff ff <0f> 0b e9 a6 fc ff ff 65 ff 00 f0 48 0f b a 6d 00 1f 0f 83 82 fc ff [ 6609.357959] RSP: 0000:ffffc90002273d40 EFLAGS: 00010287 [ 6609.359915] RAX: 000000000000000f RBX: 0000000000000000 RCX: 000fffffffe00000 [ 6609.362606] RDX: 0000000000000010 RSI: 000055a119ac1000 RDI: ffffea000ae6ec00 [ 6609.365143] RBP: ffffea000ae6ec68 R08: 84000002b9bb1025 R09: 000055a119ab6000 [ 6609.367569] R10: ffff8881caa2ad80 R11: 0000000000000000 R12: ffff8881caa2ad80 [ 6609.370255] R13: ffffea000ae6ec00 R14: 000055a119ac1c9c R15: ffffc90002273dd8 [ 6609.373007] FS: 00007f08e467f740(0000) GS:ffff88a07c214000(0000) knlGS:0000000000000000 [ 6609.375999] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6609.377946] CR2: 000055a119ac1c9c CR3: 00000001adfd6005 CR4: 0000000000770eb0 [ 6609.380376] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6609.382853] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6609.385216] PKRU: 55555554 [ 6609.386141] Call Trace: [ 6609.387017] <TASK> [ 6609.387718] ? ___pte_offset_map+0x1b/0x110 [ 6609.389056] __handle_mm_fault+0xa51/0xf00 [ 6609.390363] ? exc_page_fault+0x6a/0x140 [ 6609.391629] handle_mm_fault+0x13d/0x360 [ 6609.392856] do_user_addr_fault+0x2f2/0x7f0 [ 6609.394160] ? sigprocmask+0x77/0xa0 [ 6609.395375] exc_page_fault+0x6a/0x140 [ 6609.396735] asm_exc_page_fault+0x26/0x30 [ 6609.398224] RIP: 0033:0x55a1050bc18b [ 6609.399567] Code: 8b 3f 4d 85 ff 74 40 41 39 5f 18 75 f2 49 8b 7f 08 44 38 27 75 e9 4c 89 c6 4c 89 45 c8 e8 bd 83 fa ff 4c 8b 45 c8 85 c0 75 d5 <41> 83 47 1c 01 48 83 c4 28 4c 89 f8 5b 4 1 5c 41 5d 41 5e 41 5f 5d [ 6609.405971] RSP: 002b:00007ffcf5f37d90 EFLAGS: 00010246 [ 6609.407737] RAX: 0000000000000000 RBX: 00000000182768fa RCX: 0000000000000000 [ 6609.410151] RDX: 00000000000000fa RSI: 000055a105175c7b RDI: 000055a119ac1c60 [ 6609.412606] RBP: 00007ffcf5f37de0 R08: 000055a105175c7b R09: 0000000000000000 [ 6609.414998] R10: 000000004d2dfb5a R11: 0000000000000246 R12: 0000000000000050 [ 6609.417193] R13: 00000000000000fa R14: 000055a119abaf60 R15: 000055a119ac1c80 [ 6609.419268] </TASK> [ 6609.419928] ---[ end trace 0000000000000000 ]--- The WARN_ON here is simply incorrect. The refcount here must be at least the mapcount, not the opposite. Each mapcount must have a corresponding refcount, but the refcount may increase if other components grab the folio, which is acceptable. Meanwhile, having a mapcount larger than refcount is a real problem. So fix the WARN_ON condition. Link: https://lkml.kernel.org/r/20250425074325.61833-1-ryncsn@gmail.com Fixes: 1da190f4d0a6 ("mm: Copy-on-Write (COW) reuse support for PTE-mapped THP") Signed-off-by: Kairui Song <kasong@tencent.com> Reported-by: Kairui Song <kasong@tencent.com> Closes: https://lore.kernel.org/all/CAMgjq7D+ea3eg9gRCVvRnto3Sv3_H3WVhupX4e=k8T5QAfBHbw@mail.gmail.com/ Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12kernel/fork: only call untrack_pfn_clear() on VMAs duplicated for fork()David Hildenbrand1-4/+5
Not intuitive, but vm_area_dup() located in kernel/fork.c is not only used for duplicating VMAs during fork(), but also for duplicating VMAs when splitting VMAs or when mremap()'ing them. VM_PFNMAP mappings can at least get ordinarily mremap()'ed (no change in size) and apparently also shrunk during mremap(), which implies duplicating the VMA in __split_vma() first. In case of ordinary mremap() (no change in size), we first duplicate the VMA in copy_vma_and_data()->copy_vma() to then call untrack_pfn_clear() on the old VMA: we effectively move the VM_PAT reservation. So the untrack_pfn_clear() call on the new VMA duplicating is wrong in that context. Splitting of VMAs seems problematic, because we don't duplicate/adjust the reservation when splitting the VMA. Instead, in memtype_erase() -- called during zapping/munmap -- we shrink a reservation in case only the end address matches: Assume we split a VMA into A and B, both would share a reservation until B is unmapped. So when unmapping B, the reservation would be updated to cover only A. When unmapping A, we would properly remove the now-shrunk reservation. That scenario describes the mremap() shrinking (old_size > new_size), where we split + unmap B, and the untrack_pfn_clear() on the new VMA when is wrong. What if we manage to split a VM_PFNMAP VMA into A and B and unmap A first? It would be broken because we would never free the reservation. Likely, there are ways to trigger such a VMA split outside of mremap(). Affecting other VMA duplication was not intended, vm_area_dup() being used outside of kernel/fork.c was an oversight. So let's fix that for; how to handle VMA splits better should be investigated separately. With a simple reproducer that uses mprotect() to split such a VMA I can trigger x86/PAT: pat_mremap:26448 freeing invalid memtype [mem 0x00000000-0x00000fff] Link: https://lkml.kernel.org/r/20250422144942.2871395-1-david@redhat.com Fixes: dc84bc2aba85 ("x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm: hugetlb: fix incorrect fallback for subpoolWupeng Ma1-6/+22
During our testing with hugetlb subpool enabled, we observe that hstate->resv_huge_pages may underflow into negative values. Root cause analysis reveals a race condition in subpool reservation fallback handling as follow: hugetlb_reserve_pages() /* Attempt subpool reservation */ gbl_reserve = hugepage_subpool_get_pages(spool, chg); /* Global reservation may fail after subpool allocation */ if (hugetlb_acct_memory(h, gbl_reserve) < 0) goto out_put_pages; out_put_pages: /* This incorrectly restores reservation to subpool */ hugepage_subpool_put_pages(spool, chg); When hugetlb_acct_memory() fails after subpool allocation, the current implementation over-commits subpool reservations by returning the full 'chg' value instead of the actual allocated 'gbl_reserve' amount. This discrepancy propagates to global reservations during subsequent releases, eventually causing resv_huge_pages underflow. This problem can be trigger easily with the following steps: 1. reverse hugepage for hugeltb allocation 2. mount hugetlbfs with min_size to enable hugetlb subpool 3. alloc hugepages with two task(make sure the second will fail due to insufficient amount of hugepages) 4. with for a few seconds and repeat step 3 which will make hstate->resv_huge_pages to go below zero. To fix this problem, return corrent amount of pages to subpool during the fallback after hugepage_subpool_get_pages is called. Link: https://lkml.kernel.org/r/20250410062633.3102457-1-mawupeng1@huawei.com Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools") Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Ma Wupeng <mawupeng1@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12Linux 6.15-rc6v6.15-rc6Linus Torvalds1-1/+1
2025-05-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds16-72/+200
Pull KVM fixes from Paolo Bonzini: "ARM: - Avoid use of uninitialized memcache pointer in user_mem_abort() - Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts to be taken while TGE=0 and fixing an ugly bug on AmpereOne that occurs when taking an interrupt while clearing the xMO bits (AC03_CPU_36) - Prevent VMMs from hiding support for AArch64 at any EL virtualized by KVM - Save/restore the host value for HCRX_EL2 instead of restoring an incorrect fixed value - Make host_stage2_set_owner_locked() check that the entire requested range is memory rather than just the first page RISC-V: - Add missing reset of smstateen CSRs x86: - Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the VMCB as its state is undefined after SHUTDOWN, emulating INIT is the least awful choice). - Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM KVM doesn't goof a sanity check in the future. - Free obsolete roots when (re)loading the MMU to fix a bug where pre-faulting memory can get stuck due to always encountering a stale root. - When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page to print state, so that KVM doesn't print stale/wrong information. - When changing memory attributes (e.g. shared <=> private), add potential hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM doesn't create a shared/private hugepage when the the corresponding attributes will become mixed (the attributes are commited *after* KVM finishes the invalidation). - Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is loaded led to very measurable performance regressions for non-KVM workloads" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions KVM: arm64: Fix memory check in host_stage2_set_owner_locked() KVM: arm64: Kill HCRX_HOST_FLAGS KVM: arm64: Properly save/restore HCRX_EL2 KVM: arm64: selftest: Don't try to disable AArch64 support KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort() KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields KVM: RISC-V: reset smstateen CSRs KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload() KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run() KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception
2025-05-11Merge tag 'mips-fixes_6.15_1' of ↵Linus Torvalds6-46/+54
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - Fix delayed timers - Fix NULL pointer deref - Fix wrong range check * tag 'mips-fixes_6.15_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Fix MAX_REG_OFFSET MIPS: CPS: Fix potential NULL pointer dereferences in cps_prepare_cpus() MIPS: rename rollback_handler with skipover_handler MIPS: Move r4k_wait() to .cpuidle.text section MIPS: Fix idle VS timer enqueue
2025-05-11Merge tag 'x86-urgent-2025-05-11' of ↵Linus Torvalds6-32/+41
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix a boot regression on very old x86 CPUs without CPUID support" * tag 'x86-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Consolidate the loader enablement checking
2025-05-11Merge tag 'timers-urgent-2025-05-11' of ↵Linus Torvalds5-16/+63
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc timers fixes from Ingo Molnar: - Fix time keeping bugs in CLOCK_MONOTONIC_COARSE clocks - Work around absolute relocations into vDSO code that GCC erroneously emits in certain arm64 build environments - Fix a false positive lockdep warning in the i8253 clocksource driver * tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable() arm64: vdso: Work around invalid absolute relocations from GCC timekeeping: Prevent coarse clocks going backwards
2025-05-11Merge tag 'input-for-v6.15-rc5' of ↵Linus Torvalds11-38/+65
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - Synaptics touchpad on multiple laptops (Dynabook Portege X30L-G, Dynabook Portege X30-D, TUXEDO InfinityBook Pro 14 v5, Dell Precision M3800, HP Elitebook 850 G1) switched from PS/2 to SMBus mode - a number of new controllers added to xpad driver: HORI Drum controller, PowerA Fusion Pro 4, PowerA MOGA XP-Ultra controller, 8BitDo Ultimate 2 Wireless Controller, 8BitDo Ultimate 3-mode Controller, Hyperkin DuchesS Xbox One controller - fixes to xpad driver to properly handle Mad Catz JOYTECH NEO SE Advanced and PDP Mirror's Edge Official controllers - fixes to xpad driver to properly handle "Share" button on some controllers - a fix for device initialization timing and for waking up the controller in cyttsp5 driver - a fix for hisi_powerkey driver to properly wake up from s2idle state - other assorted cleanups and fixes * tag 'input-for-v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - fix xpad_device sorting Input: xpad - add support for several more controllers Input: xpad - fix Share button on Xbox One controllers Input: xpad - fix two controller table values Input: hisi_powerkey - enable system-wakeup for s2idle Input: synaptics - enable InterTouch on Dell Precision M3800 Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5 Input: synaptics - enable InterTouch on Dynabook Portege X30L-G Input: synaptics - enable InterTouch on Dynabook Portege X30-D Input: synaptics - enable SMBus for HP Elitebook 850 G1 Input: mtk-pmic-keys - fix possible null pointer dereference Input: xpad - add support for 8BitDo Ultimate 2 Wireless Controller Input: cyttsp5 - fix power control issue on wakeup MAINTAINERS: .mailmap: update Mattijs Korpershoek's email address dt-bindings: mediatek,mt6779-keypad: Update Mattijs' email address Input: stmpe-ts - use module alias instead of device table Input: cyttsp5 - ensure minimum reset pulse width Input: sparcspkr - avoid unannotated fall-through input/joystick: magellan: Mark __nonstring look-up table
2025-05-11Merge tag 'fixes-2025-05-11' of ↵Linus Torvalds2-2/+9
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fixes from Mike Rapoport: - Mark set_high_memory() as __init to fix section mismatch - Accept memory allocated in memblock_double_array() to mitigate crash of SNP guests * tag 'fixes-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: Accept allocated memory before use in memblock_double_array() mm,mm_init: Mark set_high_memory as __init
2025-05-11Input: xpad - fix xpad_device sortingVicki Pfau1-1/+1
A recent commit put one entry in the wrong place. This just moves it to the right place. Signed-off-by: Vicki Pfau <vi@endrift.com> Link: https://lore.kernel.org/r/20250328234345.989761-5-vi@endrift.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-05-11Input: xpad - add support for several more controllersVicki Pfau1-0/+7
This adds support for several new controllers, all of which include Share buttons: - HORI Drum controller - PowerA Fusion Pro 4 - 8BitDo Ultimate 3-mode Controller - Hyperkin DuchesS Xbox One controller - PowerA MOGA XP-Ultra controller Signed-off-by: Vicki Pfau <vi@endrift.com> Link: https://lore.kernel.org/r/20250328234345.989761-4-vi@endrift.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-05-11Input: xpad - fix Share button on Xbox One controllersVicki Pfau1-15/+20
The Share button, if present, is always one of two offsets from the end of the file, depending on the presence of a specific interface. As we lack parsing for the identify packet we can't automatically determine the presence of that interface, but we can hardcode which of these offsets is correct for a given controller. More controllers are probably fixable by adding the MAP_SHARE_BUTTON in the future, but for now I only added the ones that I have the ability to test directly. Signed-off-by: Vicki Pfau <vi@endrift.com> Link: https://lore.kernel.org/r/20250328234345.989761-2-vi@endrift.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-05-11Input: xpad - fix two controller table valuesVicki Pfau1-2/+2
Two controllers -- Mad Catz JOYTECH NEO SE Advanced and PDP Mirror's Edge Official -- were missing the value of the mapping field, and thus wouldn't detect properly. Signed-off-by: Vicki Pfau <vi@endrift.com> Link: https://lore.kernel.org/r/20250328234345.989761-1-vi@endrift.com Fixes: 540602a43ae5 ("Input: xpad - add a few new VID/PID combinations") Fixes: 3492321e2e60 ("Input: xpad - add multiple supported devices") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-05-11Input: hisi_powerkey - enable system-wakeup for s2idleUlf Hansson1-1/+1
To wake up the system from s2idle when pressing the power-button, let's convert from using pm_wakeup_event() to pm_wakeup_dev_event(), as it allows us to specify the "hard" in-parameter, which needs to be set for s2idle. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20250306115021.797426-1-ulf.hansson@linaro.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-05-11Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of ↵Linus Torvalds23-95/+314
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates" * tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: fix folio_pte_batch() on XEN PV nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs() mm/hugetlb: copy the CMA flag when demoting mm, swap: fix false warning for large allocation with !THP_SWAP selftests/mm: fix a build failure on powerpc selftests/mm: fix build break when compiling pkey_util.c mm: vmalloc: support more granular vrealloc() sizing tools/testing/selftests: fix guard region test tmpfs assumption ocfs2: stop quota recovery before disabling quotas ocfs2: implement handshaking with ocfs2 recovery thread ocfs2: switch osb->disable_recovery to enum mailmap: map Uwe's BayLibre addresses to a single one MAINTAINERS: add mm THP section mm/userfaultfd: fix uninitialized output field for -EAGAIN race selftests/mm: compaction_test: support platform with huge mount of memory MAINTAINERS: add core mm section ocfs2: fix panic in failed foilio allocation mm/huge_memory: fix dereferencing invalid pmd migration entry MAINTAINERS: add reverse mapping section x86: disable image size check for test builds ...
2025-05-10Merge tag 'driver-core-6.15-rc6' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fix from Greg KH: "Here is a single driver core fix for a regression for platform devices that is a regression from a change that went into 6.15-rc1 that affected Pixel devices. It has been in linux-next for over a week with no reported problems" * tag 'driver-core-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: platform: Fix race condition during DMA configure at IOMMU probe time
2025-05-10Merge tag 'usb-6.15-rc6' of ↵Linus Torvalds21-95/+221
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB driver fixes for 6.15-rc6. Included in here are: - typec driver fixes - usbtmc ioctl fixes - xhci driver fixes - cdnsp driver fixes - some gadget driver fixes Nothing really major, just all little stuff that people have reported being issues. All of these have been in linux-next this week with no reported issues" * tag 'usb-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: xhci: dbc: Avoid event polling busyloop if pending rx transfers are inactive. usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue usb: usbtmc: Fix erroneous generic_read ioctl return usb: usbtmc: Fix erroneous wait_srq ioctl return usb: usbtmc: Fix erroneous get_stb ioctl error returns usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition USB: usbtmc: use interruptible sleep in usbtmc_read usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version usb: typec: ucsi: displayport: Fix NULL pointer access usb: typec: ucsi: displayport: Fix deadlock usb: misc: onboard_usb_dev: fix support for Cypress HX3 hubs usb: uhci-platform: Make the clock really optional usb: dwc3: gadget: Make gadget_wakeup asynchronous usb: gadget: Use get_status callback to set remote wakeup capability usb: gadget: f_ecm: Add get_status callback usb: host: tegra: Prevent host controller crash when OTG port is used usb: cdnsp: Fix issue with resuming from L1 usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN
2025-05-10Merge tag 'staging-6.15-rc6' of ↵Linus Torvalds2-11/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are three small staging driver fixes for 6.15-rc6. These are: - bcm2835-camera driver fix - two axis-fifo driver fixes All of these have been in linux-next for a few weeks with no reported issues" * tag 'staging-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: axis-fifo: Remove hardware resets for user errors staging: axis-fifo: Correct handling of tx_fifo_depth for size validation staging: bcm2835-camera: Initialise dev in v4l2_dev
2025-05-10Merge tag 'char-misc-6.15-rc6' of ↵Linus Torvalds29-96/+241
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/IIO driver fixes from Greg KH: "Here are a bunch of small driver fixes (mostly all IIO) for 6.15-rc6. Included in here are: - loads of tiny IIO driver fixes for reported issues - hyperv driver fix for a much-reported and worked on sysfs ring buffer creation bug All of these have been in linux-next for over a week (the IIO ones for many weeks now), with no reported issues" * tag 'char-misc-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits) Drivers: hv: Make the sysfs node size for the ring buffer dynamic uio_hv_generic: Fix sysfs creation path for ring buffer iio: adis16201: Correct inclinometer channel resolution iio: adc: ad7606: fix serial register access iio: pressure: mprls0025pa: use aligned_s64 for timestamp iio: imu: adis16550: align buffers for timestamp staging: iio: adc: ad7816: Correct conditional logic for store mode iio: adc: ad7266: Fix potential timestamp alignment issue. iio: adc: ad7768-1: Fix insufficient alignment of timestamp. iio: adc: dln2: Use aligned_s64 for timestamp iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64 iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer. iio: chemical: pms7003: use aligned_s64 for timestamp iio: chemical: sps30: use aligned_s64 for timestamp iio: imu: inv_mpu6050: align buffer for timestamp iio: imu: st_lsm6dsx: Fix wakeup source leaks on device unbind iio: adc: qcom-spmi-iadc: Fix wakeup source leaks on device unbind iio: accel: fxls8962af: Fix wakeup source leaks on device unbind iio: adc: ad7380: fix event threshold shift iio: hid-sensor-prox: Fix incorrect OFFSET calculation ...
2025-05-10Merge tag 'i2c-for-6.15-rc6' of ↵Linus Torvalds2-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - omap: use correct function to read from device tree - MAINTAINERS: remove Seth from ISMT maintainership * tag 'i2c-for-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: Remove entry for Seth Heasley i2c: omap: fix deprecated of_property_read_bool() use
2025-05-10Merge tag 'for-linus-6.15a-rc6-tag' of ↵Linus Torvalds6-14/+32
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A fix for the xenbus driver allowing to use a PVH Dom0 with Xenstore running in another domain - A fix for the xenbus driver addressing a rare race condition resulting in NULL dereferences and other problems - A fix for the xen-swiotlb driver fixing a problem seen on Arm platforms * tag 'for-linus-6.15a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xenbus: Use kref to track req lifetime xenbus: Allow PVH dom0 a non-local xenstore xen: swiotlb: Use swiotlb bouncing if kmalloc allocation demands it
2025-05-10Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds3-19/+17
Pull mount fixes from Al Viro: "A couple of races around legalize_mnt vs umount (both fairly old and hard to hit) plus two bugs in move_mount(2) - both around 'move detached subtree in place' logics" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix IS_MNT_PROPAGATING uses do_move_mount(): don't leak MNTNS_PROPAGATING on failures do_umount(): add missing barrier before refcount checks in sync case __legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock