summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-04-05mm: always inline __mk_vma_flags() and invoked functionsLorenzo Stoakes (Oracle)4-8/+12
Be explicit about __mk_vma_flags() (which is used by the mk_vma_flags() macro) always being inline, as we rely on the compiler to evaluate the loop in this function and determine that it can replace the code with the an equivalent constant value, e.g. that: __mk_vma_flags(2, (const vma_flag_t []){ VMA_WRITE_BIT, VMA_EXEC_BIT }); Can be replaced with: (1UL << VMA_WRITE_BIT) | (1UL << VMA_EXEC_BIT) = (1UL << 1) | (1UL << 2) = 6 Most likely an 'inline' will suffice for this, but be explicit as we can be. Also update all of the functions __mk_vma_flags() ultimately invokes to be always inline too. Note that test_bitmap_const_eval() asserts that the relevant bitmap functions result in build time constant values. Additionally, vma_flag_set() operates on a vma_flags_t type, so it is inconsistently named versus other VMA flags functions. We only use vma_flag_set() in __mk_vma_flags() so we don't need to worry about its new name being rather cumbersome, so rename it to vma_flags_set_flag() to disambiguate it from vma_flags_set(). Also update the VMA test headers to reflect the changes. Link: https://lkml.kernel.org/r/241f49c52074d436edbb9c6a6662a8dc142a8f43.1772704455.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Babu Moger <babu.moger@amd.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chao Yu <chao@kernel.org> Cc: Chatre, Reinette <reinette.chatre@intel.com> Cc: Chunhai Guo <guochunhai@vivo.com> Cc: Damien Le Maol <dlemoal@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dave Martin <dave.martin@arm.com> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hongbo Li <lihongbo22@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morse <james.morse@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naohiro Aota <naohiro.aota@wdc.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Yue Hu <zbestahu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: add vma_desc_test_all() and use itLorenzo Stoakes (Oracle)4-8/+31
erofs and zonefs are using vma_desc_test_any() twice to check whether all of VMA_SHARED_BIT and VMA_MAYWRITE_BIT are set, this is silly, so add vma_desc_test_all() to test all flags and update erofs and zonefs to use it. While we're here, update the helper function comments to be more consistent. Also add the same to the VMA test headers. Link: https://lkml.kernel.org/r/568c8f8d6a84ff64014f997517cba7a629f7eed6.1772704455.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Babu Moger <babu.moger@amd.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chao Yu <chao@kernel.org> Cc: Chatre, Reinette <reinette.chatre@intel.com> Cc: Chunhai Guo <guochunhai@vivo.com> Cc: Damien Le Maol <dlemoal@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dave Martin <dave.martin@arm.com> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hongbo Li <lihongbo22@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morse <james.morse@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naohiro Aota <naohiro.aota@wdc.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yue Hu <zbestahu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: rename VMA flag helpers to be more readableLorenzo Stoakes (Oracle)16-70/+72
Patch series "mm: vma flag tweaks". The ongoing work around introducing non-system word VMA flags has introduced a number of helper functions and macros to make life easier when working with these flags and to make conversions from the legacy use of VM_xxx flags more straightforward. This series improves these to reduce confusion as to what they do and to improve consistency and readability. Firstly the series renames vma_flags_test() to vma_flags_test_any() to make it abundantly clear that this function tests whether any of the flags are set (as opposed to vma_flags_test_all()). It then renames vma_desc_test_flags() to vma_desc_test_any() for the same reason. Note that we drop the 'flags' suffix here, as vma_desc_test_any_flags() would be cumbersome and 'test' implies a flag test. Similarly, we rename vma_test_all_flags() to vma_test_all() for consistency. Next, we have a couple of instances (erofs, zonefs) where we are now testing for vma_desc_test_any(desc, VMA_SHARED_BIT) && vma_desc_test_any(desc, VMA_MAYWRITE_BIT). This is silly, so this series introduces vma_desc_test_all() so these callers can instead invoke vma_desc_test_all(desc, VMA_SHARED_BIT, VMA_MAYWRITE_BIT). We then observe that quite a few instances of vma_flags_test_any() and vma_desc_test_any() are in fact only testing against a single flag. Using the _any() variant here is just confusing - 'any' of single item reads strangely and is liable to cause confusion. So in these instances the series reintroduces vma_flags_test() and vma_desc_test() as helpers which test against a single flag. The fact that vma_flags_t is a struct and that vma_flag_t utilises sparse to avoid confusion with vm_flags_t makes it impossible for a user to misuse these helpers without it getting flagged somewhere. The series also updates __mk_vma_flags() and functions invoked by it to explicitly mark them always inline to match expectation and to be consistent with other VMA flag helpers. It also renames vma_flag_set() to vma_flags_set_flag() (a function only used by __mk_vma_flags()) to be consistent with other VMA flag helpers. Finally it updates the VMA tests for each of these changes, and introduces explicit tests for vma_flags_test() and vma_desc_test() to assert that they behave as expected. This patch (of 6): On reflection, it's confusing to have vma_flags_test() and vma_desc_test_flags() test whether any comma-separated VMA flag bit is set, while also having vma_flags_test_all() and vma_test_all_flags() separately test whether all flags are set. Firstly, rename vma_flags_test() to vma_flags_test_any() to eliminate this confusion. Secondly, since the VMA descriptor flag functions are becoming rather cumbersome, prefer vma_desc_test*() to vma_desc_test_flags*(), and also rename vma_desc_test_flags() to vma_desc_test_any(). Finally, rename vma_test_all_flags() to vma_test_all() to keep the VMA-specific helper consistent with the VMA descriptor naming convention and to help avoid confusion vs. vma_flags_test_all(). While we're here, also update whitespace to be consistent in helper functions. Link: https://lkml.kernel.org/r/cover.1772704455.git.ljs@kernel.org Link: https://lkml.kernel.org/r/0f9cb3c511c478344fac0b3b3b0300bb95be95e9.1772704455.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Suggested-by: Pedro Falcato <pfalcato@suse.de> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Babu Moger <babu.moger@amd.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chao Yu <chao@kernel.org> Cc: Chatre, Reinette <reinette.chatre@intel.com> Cc: Chunhai Guo <guochunhai@vivo.com> Cc: Damien Le Maol <dlemoal@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dave Martin <dave.martin@arm.com> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hongbo Li <lihongbo22@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morse <james.morse@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naohiro Aota <naohiro.aota@wdc.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Yue Hu <zbestahu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05kasan: fix bug type classification for SW_TAGS modeAndrey Ryabinin1-4/+9
kasan_non_canonical_hook() derives orig_addr from kasan_shadow_to_mem(), but the pointer tag may remain in the top byte. In SW_TAGS mode this tagged address is compared against PAGE_SIZE and TASK_SIZE, which leads to incorrect bug classification. As a result, NULL pointer dereferences may be reported as "wild-memory-access". Strip the tag before performing these range checks and use the untagged value when reporting addresses in these ranges. Before: [ ] Unable to handle kernel paging request at virtual address ffef800000000000 [ ] KASAN: maybe wild-memory-access in range [0xff00000000000000-0xff0000000000000f] After: [ ] Unable to handle kernel paging request at virtual address ffef800000000000 [ ] KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f] Link: https://lkml.kernel.org/r/20260305185659.20807-1-ryabinin.a.a@gmail.com Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/vmscan: fix unintended mtc->nmask mutation in alloc_demote_folio()Bing Jiao1-9/+5
In alloc_demote_folio(), mtc->nmask is set to NULL for the first allocation. If that succeeds, it returns without restoring mtc->nmask to allowed_mask. For subsequent allocations from the migrate_pages() batch, mtc->nmask will be NULL. If the target node then becomes full, the fallback allocation will use nmask = NULL, allocating from any node allowed by the task cpuset, which for kswapd is all nodes. To address this issue, use a local copy of the mtc structure with nmask = NULL for the first allocation attempt specifically, ensuring the original mtc remains unmodified. Link: https://lkml.kernel.org/r/20260303052519.109244-1-bingjiao@google.com Fixes: 320080272892 ("mm/demotion: demote pages according to allocation fallback order") Signed-off-by: Bing Jiao <bingjiao@google.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/oom_kill.c: simpilfy rcu call with guard(rcu)Maninder Singh1-6/+3
guard(rcu)() simplifies code readability and there is no need of extra goto labels. Thus replacing rcu_read_lock/unlock with guard(rcu)(). Link: https://lkml.kernel.org/r/20260303102600.105255-1-maninder1.s@samsung.com Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/page_reporting: change page_reporting_order to ↵Yuvraj Sakshith1-2/+2
PAGE_REPORTING_ORDER_UNSPECIFIED page_reporting_order when uninitialised, holds a magic number -1. Since we now maintain PAGE_REPORTING_ORDER_UNSPECIFIED as -1, which is also a flag, set page_reporting_order to this flag. Link: https://lkml.kernel.org/r/20260303113032.3008371-6-yuvraj.sakshith@oss.qualcomm.com Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Liu <wei.liu@kernel.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/page_reporting: change PAGE_REPORTING_ORDER_UNSPECIFIED to -1Yuvraj Sakshith1-1/+1
PAGE_REPORTING_ORDER_UNSPECIFIED is now set to zero. This means, pages of order zero cannot be reported to a client/driver -- as zero is used to signal a fallback to MAX_PAGE_ORDER. Change PAGE_REPORTING_ORDER_UNSPECIFIED to (-1), so that zero can be used as a valid order with which pages can be reported. Link: https://lkml.kernel.org/r/20260303113032.3008371-5-yuvraj.sakshith@oss.qualcomm.com Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Liu <wei.liu@kernel.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05hv_balloon: set unspecified page reporting orderYuvraj Sakshith1-1/+1
Explicitly mention page reporting order to be set to default value using PAGE_REPORTING_ORDER_UNSPECIFIED fallback value. Link: https://lkml.kernel.org/r/20260303113032.3008371-4-yuvraj.sakshith@oss.qualcomm.com Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Liu <wei.liu@kernel.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05virtio_balloon: set unspecified page reporting orderYuvraj Sakshith1-0/+2
virtio_balloon page reporting order is set to MAX_PAGE_ORDER implicitly as vb->prdev.order is never initialised and is auto-set to zero. Explicitly mention usage of default page order by making use of PAGE_REPORTING_ORDER_UNSPECIFIED fallback value. Link: https://lkml.kernel.org/r/20260303113032.3008371-3-yuvraj.sakshith@oss.qualcomm.com Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Liu <wei.liu@kernel.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/page_reporting: add PAGE_REPORTING_ORDER_UNSPECIFIEDYuvraj Sakshith2-1/+3
Patch series "Allow order zero pages in page reporting", v4. Today, page reporting sets page_reporting_order in two ways: (1) page_reporting.page_reporting_order cmdline parameter (2) Driver can pass order while registering itself. In both cases, order zero is ignored by free page reporting because it is used to set page_reporting_order to a default value, like MAX_PAGE_ORDER. In some cases we might want page_reporting_order to be zero. For instance, when virtio-balloon runs inside a guest with tiny memory (say, 16MB), it might not be able to find a order 1 page (or in the worst case order MAX_PAGE_ORDER page) after some uptime. Page reporting should be able to return order zero pages back for optimal memory relinquishment. This patch changes the default fallback value from '0' to '-1' in all possible clients of free page reporting (hv_balloon and virtio-balloon) together with allowing '0' as a valid order in page_reporting_register(). This patch (of 5): Drivers can pass order of pages to be reported while registering itself. Today, this is a magic number, 0. Label this with PAGE_REPORTING_ORDER_UNSPECIFIED and check for it when the driver is being registered. This macro will be used in relevant drivers next. [akpm@linux-foundation.org: tweak whitespace, per David] Link: https://lkml.kernel.org/r/20260303113032.3008371-1-yuvraj.sakshith@oss.qualcomm.com Link: https://lkml.kernel.org/r/20260303113032.3008371-2-yuvraj.sakshith@oss.qualcomm.com Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Liu <wei.liu@kernel.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: memcg: separate slab stat accounting from objcg charge cacheJohannes Weiner1-39/+61
Cgroup slab metrics are cached per-cpu the same way as the sub-page charge cache. However, the intertwined code to manage those dependent caches right now is quite difficult to follow. Specifically, cached slab stat updates occur in consume() if there was enough charge cache to satisfy the new object. If that fails, whole pages are reserved, and slab stats are updated when the remainder of those pages, after subtracting the size of the new slab object, are put into the charge cache. This already juggles a delicate mix of the object size, the page charge size, and the remainder to put into the byte cache. Doing slab accounting in this path as well is fragile, and has recently caused a bug where the input parameters between the two caches were mixed up. Refactor the consume() and refill() paths into unlocked and locked variants that only do charge caching. Then let the slab path manage its own lock section and open-code charging and accounting. This makes the slab stat cache subordinate to the charge cache: __refill_obj_stock() is called first to prepare it; __account_obj_stock() follows to hitch a ride. This results in a minor behavioral change: previously, a mismatching percpu stock would always be drained for the purpose of setting up slab account caching, even if there was no byte remainder to put into the charge cache. Now, the stock is left alone, and slab accounting takes the uncached path if there is a mismatch. This is exceedingly rare, and it was probably never worth draining the whole stock just to cache the slab stat update. Link: https://lkml.kernel.org/r/20260302195305.620713-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Hao Li <hao.li@linux.dev> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Johannes Weiner <jweiner@meta.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: memcontrol: use __account_obj_stock() in the !locked pathJohannes Weiner1-1/+5
Make __account_obj_stock() usable for the case where the local trylock failed. Then switch refill_obj_stock() over to it. This consolidates the mod_objcg_mlstate() call into one place and will make the next patch easier to follow. Link: https://lkml.kernel.org/r/20260302195305.620713-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Hao Li <hao.li@linux.dev> Cc: Johannes Weiner <jweiner@meta.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: memcontrol: split out __obj_cgroup_charge()Johannes Weiner1-5/+16
Move the page charge and remainder calculation into its own function. It will make the slab stat refactor easier to follow. Link: https://lkml.kernel.org/r/20260302195305.620713-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Hao Li <hao.li@linux.dev> Cc: Johannes Weiner <jweiner@meta.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: memcg: simplify objcg charge size and stock remainder mathJohannes Weiner1-10/+6
Use PAGE_ALIGN() and a more natural cache remainder calculation. Link: https://lkml.kernel.org/r/20260302195305.620713-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Hao Li <hao.li@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: memcg: factor out trylock_stock() and unlock_stock()Johannes Weiner1-6/+19
Patch series "memcg: obj stock and slab stat caching cleanups". This is a follow-up to `[PATCH] memcg: fix slab accounting in refill_obj_stock() trylock path`. The way the slab stat cache and the objcg charge cache interact appears a bit too fragile. This series factors those paths apart as much as practical. This patch (of 5): Consolidate the local lock acquisition and the local stock lookup. This allows subsequent patches to use !!stock as an easy way to disambiguate the locked vs. contended cases through the callstack. Link: https://lkml.kernel.org/r/20260302195305.620713-1-hannes@cmpxchg.org Link: https://lkml.kernel.org/r/20260302195305.620713-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Hao Li <hao.li@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05arm64: mm: implement the architecture-specific test_and_clear_young_ptes()Baolin Wang1-6/+12
Implement the Arm64 architecture-specific test_and_clear_young_ptes() to enable batched checking of young flags, improving performance during large folio reclamation when MGLRU is enabled. While we're at it, simplify ptep_test_and_clear_young() by calling test_and_clear_young_ptes(). Since callers guarantee that PTEs are present before calling these functions, we can use pte_cont() to check the CONT_PTE flag instead of pte_valid_cont(). Performance testing: Enable MGLRU, then allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to reclaim 8G file-backed folios via the memory.reclaim interface. I can observe 60%+ performance improvement on my Arm64 32-core server (and about 15% improvement on my X86 machine). W/o patchset: real 0m0.470s user 0m0.000s sys 0m0.470s W/ patchset: real 0m0.180s user 0m0.001s sys 0m0.179s Link: https://lkml.kernel.org/r/7f891d42a720cc2e57862f3b79e4f774404f313c.1772778858.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: support batched checking of the young flag for MGLRUBaolin Wang4-33/+49
Use the batched helper test_and_clear_young_ptes_notify() to check and clear the young flag to improve the performance during large folio reclamation when MGLRU is enabled. Meanwhile, we can also support batched checking the young and dirty flag when MGLRU walks the mm's pagetable to update the folios' generation counter. Since MGLRU also checks the PTE dirty bit, use folio_pte_batch_flags() with FPB_MERGE_YOUNG_DIRTY set to detect batches of PTEs for a large folio. Then we can remove the ptep_test_and_clear_young_notify() since it has no users now. Note that we also update the 'young' counter and 'mm_stats[MM_LEAF_YOUNG]' counter with the batched count in the lru_gen_look_around() and walk_pte_range(). However, the batched operations may inflate these two counters, because in a large folio not all PTEs may have been accessed. (Additionally, tracking how many PTEs have been accessed within a large folio is not very meaningful, since the mm core actually tracks access/dirty on a per-folio basis, not per page). The impact analysis is as follows: 1. The 'mm_stats[MM_LEAF_YOUNG]' counter has no functional impact and is mainly for debugging. 2. The 'young' counter is used to decide whether to place the current PMD entry into the bloom filters by suitable_to_scan() (so that next time we can check whether it has been accessed again), which may set the hash bit in the bloom filters for a PMD entry that hasn't seen much access. However, bloom filters inherently allow some error, so this effect appears negligible. Link: https://lkml.kernel.org/r/378f4acf7d07410aa7c2e4b49d56bb165918eb34.1772778858.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: add a batched helper to clear the young flag for large foliosBaolin Wang2-5/+48
Currently, MGLRU will call ptep_test_and_clear_young_notify() to check and clear the young flag for each PTE sequentially, which is inefficient for large folios reclamation. Moreover, on Arm64 architecture, which supports contiguous PTEs, the Arm64- specific ptep_test_and_clear_young() already implements an optimization to clear the young flags for PTEs within a contiguous range. However, this is not sufficient. Similar to the Arm64 specific clear_flush_young_ptes(), we can extend this to perform batched operations for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE). Thus, we can introduce a new batched helper: test_and_clear_young_ptes() and its wrapper test_and_clear_young_ptes_notify() which are consistent with the existing functions, to perform batched checking of the young flags for large folios, which can help improve performance during large folio reclamation when MGLRU is enabled. And it will be overridden by the architecture that implements a more efficient batch operation in the following patches. Link: https://lkml.kernel.org/r/23ec671bfcc06cd24ee0fbff8e329402742274a0.1772778858.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced()Baolin Wang1-0/+1
The folio_referenced() is used to test whether a folio was referenced during reclaim. Moreover, ZONE_DEVICE folios are controlled by their device driver, have a lifetime tied to that driver, and are never placed on the LRU list. That means we should never try to reclaim ZONE_DEVICE folios, so add a warning to catch this unexpected behavior in folio_referenced() to avoid confusion, as discussed in the previous thread[1]. [1] https://lore.kernel.org/all/16fb7985-ec0f-4b56-91e7-404c5114f899@kernel.org/ Link: https://lkml.kernel.org/r/64d6fb2a33f7101e1d4aca2c9052e0758b76d492.1772778858.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: rename ptep/pmdp_clear_young_notify() to ↵Baolin Wang2-8/+8
ptep/pmdp_test_and_clear_young_notify() Rename ptep/pmdp_clear_young_notify() to ptep/pmdp_test_and_clear_young_notify() to make the function names consistent. Link: https://lkml.kernel.org/r/b3454077ce88745e6f88386b1763721746884565.1772778858.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Suggested-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: use inline helper functions instead of ugly macrosBaolin Wang2-54/+52
Patch series "support batched checking of the young flag for MGLRU", v3. This is a follow-up to the previous work [1], to support batched checking of the young flag for MGLRU. Similarly, batched checking of young flag for large folios can improve performance during large-folio reclamation when MGLRU is enabled. I observed noticeable performance improvements (see patch 5) on an Arm64 machine that supports contiguous PTEs. All mm-selftests are passed. Patch 1 - 3: cleanup patches. Patch 4: add a new generic batched PTE helper: test_and_clear_young_ptes(). Patch 5: support batched young flag checking for MGLRU. Patch 6: implement the Arm64 arch-specific test_and_clear_young_ptes(). This patch (of 6): People have already complained that these *_clear_young_notify() related macros are very ugly, so let's use inline helpers to make them more readable. In addition, we cannot implement these inline helper functions in the mmu_notifier.h file, because some arch-specific files will include the mmu_notifier.h, which introduces header compilation dependencies and causes build errors (e.g., arch/arm64/include/asm/tlbflush.h). Moreover, since these functions are only used in the mm, implementing these inline helpers in the mm/internal.h header seems reasonable. Link: https://lkml.kernel.org/r/cover.1772778858.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/ea14af84e7967ccebb25082c28a8669d6da8fe57.1772778858.git.baolin.wang@linux.alibaba.com Link: https://lore.kernel.org/all/cover.1770645603.git.baolin.wang@linux.alibaba.com/ [1] Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: support VM_MIXEDMAP in zap_special_vma_range()David Hildenbrand (Arm)1-2/+2
There is demand for also zapping page table entries by drivers in VM_MIXEDMAP VMAs[1]. Nothing really speaks against supporting VM_MIXEDMAP for driver use. We just don't want arbitrary drivers to zap in ordinary (non-special) VMAs. Link: https://lkml.kernel.org/r/20260227200848.114019-17-david@kernel.org Link: https://lore.kernel.org/r/aYSKyr7StGpGKNqW@google.com [1] Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: rename zap_vma_ptes() to zap_special_vma_range()David Hildenbrand (Arm)7-18/+16
zap_vma_ptes() is the only zapping function we export to modules. It's essentially a wrapper around zap_vma_range(), however, with some safety checks: * That the passed range fits fully into the VMA * That it's only used for VM_PFNMAP We will add support for VM_MIXEDMAP next, so use the more-generic term "special vma", although "special" is a bit overloaded. Maybe we'll later just support any VM_SPECIAL flag. While at it, improve the kerneldoc. Link: https://lkml.kernel.org/r/20260227200848.114019-16-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Leon Romanovsky <leon@kernel.org> [drivers/infiniband] Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: rename zap_page_range_single() to zap_vma_range()David Hildenbrand (Arm)10-22/+22
Let's rename it to make it better match our new naming scheme. While at it, polish the kerneldoc. [akpm@linux-foundation.org: fix rustfmtcheck] Link: https://lkml.kernel.org/r/20260227200848.114019-15-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Puranjay Mohan <puranjay@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: rename zap_page_range_single_batched() to zap_vma_range_batched()David Hildenbrand (Arm)3-14/+16
Let's make the naming more consistent with our new naming scheme. While at it, polish the kerneldoc a bit. Link: https://lkml.kernel.org/r/20260227200848.114019-14-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: rename zap_vma_pages() to zap_vma()David Hildenbrand (Arm)5-5/+9
Let's rename it to an even simpler name. While at it, add some simplistic kernel doc. Link: https://lkml.kernel.org/r/20260227200848.114019-13-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: inline unmap_page_range() into __zap_vma_range()David Hildenbrand (Arm)1-20/+12
Let's inline it into the single caller to reduce the number of confusing unmap/zap helpers. Get rid of the unnecessary BUG_ON(). [david@kernel.org: call the local variable simply "addr", per Lorenzo] Link: https://lkml.kernel.org/r/f7732d1c-0e85-4a14-948a-912c417018b5@kernel.org Link: https://lkml.kernel.org/r/20260227200848.114019-12-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: use __zap_vma_range() in zap_vma_for_reaping()David Hildenbrand (Arm)2-4/+10
Let's call __zap_vma_range() instead of unmap_page_range() to prepare for further cleanups. To keep the existing behavior, whereby we do not call uprobe_munmap() which could block, add a new "reaping" member to zap_details and use it. Likely we should handle the possible blocking in uprobe_munmap() differently, but for now keep it unchanged. Link: https://lkml.kernel.org/r/20260227200848.114019-11-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: convert details->even_cows into details->skip_cowsDavid Hildenbrand (Arm)3-8/+7
The current semantics are confusing: simply because someone specifies an empty zap_detail struct suddenly makes should_zap_cows() behave differently. The default should be to also zap CoW'ed anonymous pages. Really only unmap_mapping_pages() and friends want to skip zapping of these anon folios. So let's invert the meaning; turn the confusing "reclaim_pt" check that overrides other properties in should_zap_cows() into a safety check. Note that the only caller that sets reclaim_pt=true is madvise_dontneed_single_vma(), which wants to zap any pages. Link: https://lkml.kernel.org/r/20260227200848.114019-10-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: move adjusting of address range to unmap_vmas()David Hildenbrand (Arm)1-35/+23
__zap_vma_range() has two callers, whereby zap_page_range_single_batched() documents that the range must fit into the VMA range. So move adjusting the range to unmap_vmas() where it is actually required and add a safety check in __zap_vma_range() instead. In unmap_vmas(), we'd never expect to have empty ranges (otherwise, why have the vma in there in the first place). __zap_vma_range() will no longer be called with start == end, so cleanup the function a bit. While at it, simplify the overly long comment to its core message. We will no longer call uprobe_munmap() for start == end, which actually seems to be the right thing to do. Note that hugetlb_zap_begin()->...->adjust_range_if_pmd_sharing_possible() cannot result in the range exceeding the vma range. Link: https://lkml.kernel.org/r/20260227200848.114019-9-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: rename unmap_single_vma() to __zap_vma_range()David Hildenbrand (Arm)1-3/+3
Let's rename it to better fit our new naming scheme. Link: https://lkml.kernel.org/r/20260227200848.114019-8-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/oom_kill: factor out zapping of VMA into zap_vma_for_reaping()David Hildenbrand (Arm)3-22/+34
Let's factor it out so we can turn unmap_page_range() into a static function instead, and so oom reaping has a clean interface to call. Note that hugetlb is not supported, because it would require a bunch of hugetlb-specific further actions (see zap_page_range_single_batched()). Link: https://lkml.kernel.org/r/20260227200848.114019-7-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/oom_kill: use MMU_NOTIFY_CLEAR in __oom_reap_task_mm()David Hildenbrand (Arm)1-1/+1
In commit 7269f999934b ("mm/mmu_notifier: use correct mmu_notifier events for each invalidation") we converted all MMU_NOTIFY_UNMAP to MMU_NOTIFY_CLEAR, except the ones that actually perform munmap() or mremap() as documented. __oom_reap_task_mm() behaves much more like MADV_DONTNEED. So use MMU_NOTIFY_CLEAR as well. This is a preparation for further changes. Link: https://lkml.kernel.org/r/20260227200848.114019-6-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: simplify calculation in unmap_mapping_range_tree()David Hildenbrand (Arm)3-12/+10
Let's simplify the calculation a bit further to make it easier to get, reusing vma_last_pgoff() which we move from interval_tree.c to mm.h. Link: https://lkml.kernel.org/r/20260227200848.114019-5-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: inline unmap_mapping_range_vma() into unmap_mapping_range_tree()David Hildenbrand (Arm)1-16/+7
Let's remove the number of unmap-related functions that cause confusion by inlining unmap_mapping_range_vma() into its single caller. The end result looks pretty readable. Link: https://lkml.kernel.org/r/20260227200848.114019-4-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/memory: remove "zap_details" parameter from zap_page_range_single()David Hildenbrand (Arm)9-22/+20
Nobody except memory.c should really set that parameter to non-NULL. So let's just drop it and make unmap_mapping_range_vma() use zap_page_range_single_batched() instead. [david@kernel.org: format on a single line] Link: https://lkml.kernel.org/r/8a27e9ac-2025-4724-a46d-0a7c90894ba7@kernel.org Link: https://lkml.kernel.org/r/20260227200848.114019-3-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Puranjay Mohan <puranjay@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/madvise: drop range checks in madvise_free_single_vma()David Hildenbrand (Arm)1-9/+4
Patch series "mm: cleanups around unmapping / zapping". A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions. With this series, we'll have the following high-level zap/unmap functions (excluding high-level folio zapping): * unmap_vmas() for actual unmapping (vmas will go away) * zap_vma(): zap all page table entries in a vma * zap_vma_for_reaping(): zap_vma() that must not block * zap_vma_range(): zap a range of page table entries * zap_vma_range_batched(): zap_vma_range() with more options and batching * zap_special_vma_range(): limited zap_vma_range() for modules * __zap_vma_range(): internal helper Patch #1 is not about unmapping/zapping, but I stumbled over it while verifying MADV_DONTNEED range handling. Patch #16 is related to [1], but makes sense even independent of that. This patch (of 16): madvise_vma_behavior()-> madvise_dontneed_free()->madvise_free_single_vma() is only called from madvise_walk_vmas() (a) After try_vma_read_lock() confirmed that the whole range falls into a single VMA (see is_vma_lock_sufficient()). (b) After adjusting the range to the VMA in the loop afterwards. madvise_dontneed_free() might drop the MM lock when handling userfaultfd, but it properly looks up the VMA again to adjust the range. So in madvise_free_single_vma(), the given range should always fall into a single VMA and should also span at least one page. Let's drop the error checks. The code now matches what we do in madvise_dontneed_single_vma(), where we call zap_vma_range_batched() that documents: "The range must fit into one VMA.". Although that function still adjusts that range, we'll change that soon. Link: https://lkml.kernel.org/r/20260227200848.114019-1-david@kernel.org Link: https://lkml.kernel.org/r/20260227200848.114019-2-david@kernel.org Link: https://lore.kernel.org/r/aYSKyr7StGpGKNqW@google.com [1] Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05kasan: docs: SLUB is the only remaining slab implementationDavid Hildenbrand (Arm)1-3/+0
We have only the SLUB implementation left in the kernel (referred to as "slab"). Therefore, there is nothing special regarding KASAN modes when it comes to the slab allocator anymore. Drop the stale comment regarding differing SLUB vs. SLAB support. Link: https://lkml.kernel.org/r/20260303120416.62580-1-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05vmalloc: support __GFP_RETRY_MAYFAIL and __GFP_NORETRYMichal Hocko1-5/+12
__GFP_RETRY_MAYFAIL and __GFP_NORETRY haven't been supported so far because their semantic (i.e. to not trigger OOM killer) is not possible with the existing vmalloc page table allocation which is allowing for the OOM killer. Example: __vmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL); <snip> vmalloc_test/55 invoked oom-killer: gfp_mask=0x40dc0( GFP_KERNEL|__GFP_ZERO|__GFP_COMP), order=0, oom_score_adj=0 active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:0 dirty:0 writeback:0 slab_reclaimable:700 slab_unreclaimable:33708 mapped:0 shmem:0 pagetables:5174 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:850 free_pcp:319 free_cma:0 CPU: 4 UID: 0 PID: 639 Comm: vmalloc_test/55 ... Hardware name: QEMU Standard PC (i440FX + PIIX, ... Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 dump_header+0x43/0x1b3 out_of_memory.cold+0x8/0x78 __alloc_pages_slowpath.constprop.0+0xef5/0x1130 __alloc_frozen_pages_noprof+0x312/0x330 alloc_pages_mpol+0x7d/0x160 alloc_pages_noprof+0x50/0xa0 __pte_alloc_kernel+0x1e/0x1f0 ... <snip> There are usecases for these modifiers when a large allocation request should rather fail than trigger OOM killer which wouldn't be able to handle the situation anyway [1]. While we cannot change existing page table allocation code easily we can piggy back on scoped NOWAIT allocation for them that we already have in place. The rationale is that the bulk of the consumed memory is sitting in pages backing the vmalloc allocation. Page tables are only participating a tiny fraction. Moreover page tables for virtually allocated areas are never reclaimed so the longer the system runs to less likely they are. It makes sense to allow an approximation of __GFP_RETRY_MAYFAIL and __GFP_NORETRY even if the page table allocation part is much weaker. This doesn't break the failure mode while it allows for the no OOM semantic. [1] https://lore.kernel.org/all/32bd9bed-a939-69c4-696d-f7f9a5fe31d8@redhat.com/T/#u Link: https://lkml.kernel.org/r/20260302114740.2668450-2-urezki@gmail.com Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/vmalloc: fix incorrect size reporting on allocation failureUladzislau Rezki (Sony)1-1/+1
When __vmalloc_area_node() fails to allocate pages, the failure message may report an incorrect allocation size, for example: vmalloc error: size 0, failed to allocate pages, ... This happens because the warning prints area->nr_pages * PAGE_SIZE. At this point, area->nr_pages may be zero or partly populated thus it is not valid. Report the originally requested allocation size instead by using nr_small_pages * PAGE_SIZE, which reflects the actual number of pages being requested by user. Link: https://lkml.kernel.org/r/20260302114740.2668450-1-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05Documentation: fix a hugetlbfs reservation statementJane Chu1-1/+1
Documentation/mm/hugetlbfs_reserv.rst has if (resv_needed <= (resv_huge_pages - free_huge_pages)) resv_huge_pages += resv_needed; which describes this code in gather_surplus_pages() needed = (h->resv_huge_pages + delta) - h->free_huge_pages; if (needed <= 0) { h->resv_huge_pages += delta; return 0; } which means if there are enough free hugepages to account for the new reservation, simply update the global reservation count without further action. But the description is backwards, it should be if (resv_needed <= (free_huge_pages - resv_huge_pages)) instead. Link: https://lkml.kernel.org/r/20260302201015.1824798-1-jane.chu@oracle.com Fixes: 70bc0dc578b3 ("Documentation: vm, add hugetlbfs reservation overview") Signed-off-by: Jane Chu <jane.chu@oracle.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: make ref_unless functions unless_zero onlyGladyshev Ilya2-7/+7
There are no users of (folio/page)_ref_add_unless(page, nr, u) with u != 0 [1] and all current users are "internal" for page refcounting API. This allows us to safely drop this parameter and reduce function semantics to the "unless zero" cases only. If needed, these functions for the u!=0 cases can be trivially reintroduced later using the same atomic_add_unless operations as before. [1]: The last user was dropped in v5.18 kernel, commit 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page refcount"). There is no trace of discussion as to why this cleanup wasn't done earlier. Link: https://lkml.kernel.org/r/a0c89b49d38c671a0bdd35069d15ee13e08314d2.1772370066.git.gladyshev.ilya1@h-partners.com Co-developed-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com> Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com> Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/page_alloc: remove pcpu_spin_* wrappersVlastimil Babka1-15/+9
We only ever use pcpu_spin_trylock()/unlock() with struct per_cpu_pages so refactor the helpers to remove the generic layer. No functional change intended. Link: https://lkml.kernel.org/r/20260227-b4-pcp-locking-cleanup-v1-3-f7e22e603447@kernel.org Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/page_alloc: remove IRQ saving/restoring from pcp lockingVlastimil Babka1-30/+16
Effectively revert commit 038a102535eb ("mm/page_alloc: prevent pcp corruption with SMP=n"). The original problem is now avoided by pcp_spin_trylock() always failing on CONFIG_SMP=n, so we do not need to disable IRQs anymore. It's not a complete revert, because keeping the pcp_spin_(un)lock() wrappers is useful. Rename them from _maybe_irqsave/restore to _nopin. The difference from pcp_spin_trylock()/pcp_spin_unlock() is that the _nopin variants don't perform pcpu_task_pin/unpin(). Link: https://lkml.kernel.org/r/20260227-b4-pcp-locking-cleanup-v1-2-f7e22e603447@kernel.org Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/page_alloc: effectively disable pcp with CONFIG_SMP=nVlastimil Babka1-58/+34
Patch series "mm/page_alloc: pcp locking cleanup". This is a followup to the hotfix 038a102535eb ("mm/page_alloc: prevent pcp corruption with SMP=n"), to simplify the code and deal with the original issue properly. The previous RFC attempt [1] argued for changing the UP spinlock implementation, which was discouraged, but thanks to David's off-list suggestion, we can achieve the goal without changing the spinlock implementation. The main change in Patch 1 relies on the fact that on UP we don't need the pcp lists for scalability, so just make them always bypassed during alloc/free by making the pcp trylock an unconditional failure. The various drain paths that use pcp_spin_lock_maybe_irqsave() continue to exist but will never do any work in practice. In Patch 2 we can again remove the irq saving from them that commit 038a102535eb added. Besides simpler code with all the ugly UP_flags removed, we get less bloat with CONFIG_SMP=n for mm/page_alloc.o as a result: add/remove: 25/28 grow/shrink: 4/5 up/down: 2105/-6665 (-4560) Function old new delta get_page_from_freelist 5689 7248 +1559 free_unref_folios 2006 2324 +318 make_alloc_exact 270 286 +16 __zone_watermark_ok 306 322 +16 drain_pages_zone.isra 119 109 -10 decay_pcp_high 181 149 -32 setup_pcp_cacheinfo 193 147 -46 __free_frozen_pages 1339 1089 -250 alloc_pages_bulk_noprof 1054 419 -635 free_frozen_page_commit 907 - -907 try_to_claim_block 1975 - -1975 __rmqueue_pcplist 2614 - -2614 Total: Before=54624, After=50064, chg -8.35% This patch (of 3): The page allocator has been using a locking scheme for its percpu page caches (pcp) based on spin_trylock() with no _irqsave() part. The trick is that if we interrupt the locked section, we fail the trylock and just fallback to the slowpath taking the zone lock. That's more expensive, but rare, so we don't need to pay the irqsave/restore cost all the time in the fastpaths. It's similar to but not exactly local_trylock_t (which is also newer anyway) because in some cases we do lock the pcp of a non-local cpu to drain it, in a way that's cheaper than using IPI or queue_work_on(). The complication of this scheme has been UP non-debug spinlock implementation which assumes spin_trylock() can't fail on UP and has no state to track whether it's locked. It just doesn't anticipate this usage scenario. So to work around that we disable IRQs only on UP, complicating the implementation. Also recently we found years old bug in where we didn't disable IRQs in related paths - see 038a102535eb ("mm/page_alloc: prevent pcp corruption with SMP=n"). We can avoid this UP complication by realizing that we do not need the pcp caching for scalability on UP in the first place. Removing it completely with #ifdefs is not worth the trouble either. Just make pcp_spin_trylock() return NULL unconditionally on CONFIG_SMP=n. This makes the slowpaths unconditional, and we can remove the IRQ save/restore handling in pcp_spin_trylock()/unlock() completely. Link: https://lkml.kernel.org/r/20260227-b4-pcp-locking-cleanup-v1-0-f7e22e603447@kernel.org Link: https://lkml.kernel.org/r/20260227-b4-pcp-locking-cleanup-v1-1-f7e22e603447@kernel.org Link: https://lore.kernel.org/all/d762c46b-36f0-471a-b5b4-23c8cf5628ae@suse.cz/ [1] Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Suggested-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/damon/test/core-kunit: add damon_apply_min_nr_regions() testSeongJae Park1-0/+52
Add a kunit test for the functionality of damon_apply_min_nr_regions(). Link: https://lkml.kernel.org/r/20260228222831.7232-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/damon/vaddr: do not split regions for min_nr_regionsSeongJae Park2-144/+2
The previous commit made DAMON core split regions at the beginning for min_nr_regions. The virtual address space operation set (vaddr) does similar work on its own, for a case user delegates entire initial monitoring regions setup to vaddr. It is unnecessary now, as DAMON core will do similar work for any case. Remove the duplicated work in vaddr. Also, remove a helper function that was being used only for the work, and the test code of the helper function. Link: https://lkml.kernel.org/r/20260228222831.7232-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/damon/core: split regions for min_nr_regionsSeongJae Park1-6/+39
Patch series "mm/damon: strictly respect min_nr_regions". DAMON core respects min_nr_regions only at merge operation. DAMON API callers are therefore responsible to respect or ignore that. Only vaddr ops is respecting that, but only for initial start time. DAMON sysfs interface allows users to setup the initial regions that DAMON core also respects. But, again, it works for only the initial time. Users setting the regions for min_nr_regions can be difficult and inefficient, when the min_nr_regions value is high. There was actually a report [1] from a user. The use case was page granular access monitoring with a large aggregation interval. Make the following three changes for resolving the issue. First (patch 1), make DAMON core split regions at the beginning and every aggregation interval, to respect the min_nr_regions. Second (patch 2), drop the vaddr's split operations and related code that are no more needed. Third (patch 3), add a kunit test for the newly introduced function. This patch (of 3): DAMON core layer respects the min_nr_regions parameter by setting the maximum size of each region as total monitoring region size divided by the parameter value. And the limit is applied by preventing merge of regions that result in a region larger than the maximum size. The limit is updated per ops update interval, because vaddr updates the monitoring regions on the ops update callback. It does nothing for the beginning state. That's because the users can set the initial monitoring regions as they want. That is, if the users really care about the min_nr_regions, they are supposed to set the initial monitoring regions to have more than min_nr_regions regions. The virtual address space operation set, vaddr, has an exceptional case. Users can ask the ops set to configure the initial regions on its own. For the case, vaddr sets up the initial regions to meet the min_nr_regions. So, vaddr has exceptional support, but basically users are required to set the regions on their own if they want min_nr_regions to be respected. When 'min_nr_regions' is high, such initial setup is difficult. If DAMON sysfs interface is used for that, the memory for saving the initial setup is also a waste. Even if the user forgives the setup, DAMON will eventually make more than min_nr_regions regions by splitting operations. But it will take time. If the aggregation interval is long, the delay could be problematic. There was actually a report [1] of the case. The reporter wanted to do page granular monitoring with a large aggregation interval. Also, DAMON is doing nothing for online changes on monitoring regions and min_nr_regions. For example, the user can remove a monitoring region or increase min_nr_regions while DAMON is running. Split regions larger than the size at the beginning of the kdamond main loop, to fix the initial setup issue. Also do the split every aggregation interval, for online changes. This means the behavior is slightly changed. It is difficult to imagine a use case that actually depends on the old behavior, though. So this change is arguably fine. Note that the size limit is aligned by damon_ctx->min_region_sz and cannot be zero. That is, if min_nr_region is larger than the total size of monitoring regions divided by ->min_region_sz, that cannot be respected. Link: https://lkml.kernel.org/r/20260228222831.7232-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260228222831.7232-2-sj@kernel.org Link: https://lore.kernel.org/CAC5umyjmJE9SBqjbetZZecpY54bHpn2AvCGNv3aF6J=1cfoPXQ@mail.gmail.com [1] Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/kasan: fix double free for kasan pXdsRitesh Harjani (IBM)1-4/+4
kasan_free_pxd() assumes the page table is always struct page aligned. But that's not always the case for all architectures. E.g. In case of powerpc with 64K pagesize, PUD table (of size 4096) comes from slab cache named pgtable-2^9. Hence instead of page_to_virt(pxd_page()) let's just directly pass the start of the pxd table which is passed as the 1st argument. This fixes the below double free kasan issue seen with PMEM: radix-mmu: Mapped 0x0000047d10000000-0x0000047f90000000 with 2.00 MiB pages ================================================================== BUG: KASAN: double-free in kasan_remove_zero_shadow+0x9c4/0xa20 Free of addr c0000003c38e0000 by task ndctl/2164 CPU: 34 UID: 0 PID: 2164 Comm: ndctl Not tainted 6.19.0-rc1-00048-gea1013c15392 #157 VOLUNTARY Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_012) hv:phyp pSeries Call Trace: dump_stack_lvl+0x88/0xc4 (unreliable) print_report+0x214/0x63c kasan_report_invalid_free+0xe4/0x110 check_slab_allocation+0x100/0x150 kmem_cache_free+0x128/0x6e0 kasan_remove_zero_shadow+0x9c4/0xa20 memunmap_pages+0x2b8/0x5c0 devm_action_release+0x54/0x70 release_nodes+0xc8/0x1a0 devres_release_all+0xe0/0x140 device_unbind_cleanup+0x30/0x120 device_release_driver_internal+0x3e4/0x450 unbind_store+0xfc/0x110 drv_attr_store+0x78/0xb0 sysfs_kf_write+0x114/0x140 kernfs_fop_write_iter+0x264/0x3f0 vfs_write+0x3bc/0x7d0 ksys_write+0xa4/0x190 system_call_exception+0x190/0x480 system_call_vectored_common+0x15c/0x2ec ---- interrupt: 3000 at 0x7fff93b3d3f4 NIP: 00007fff93b3d3f4 LR: 00007fff93b3d3f4 CTR: 0000000000000000 REGS: c0000003f1b07e80 TRAP: 3000 Not tainted (6.19.0-rc1-00048-gea1013c15392) MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 48888208 XER: 00000000 <...> NIP [00007fff93b3d3f4] 0x7fff93b3d3f4 LR [00007fff93b3d3f4] 0x7fff93b3d3f4 ---- interrupt: 3000 The buggy address belongs to the object at c0000003c38e0000 which belongs to the cache pgtable-2^9 of size 4096 The buggy address is located 0 bytes inside of 4096-byte region [c0000003c38e0000, c0000003c38e1000) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3c38c head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:c0000003bfd63e01 flags: 0x63ffff800000040(head|node=6|zone=0|lastcpupid=0x7ffff) page_type: f5(slab) raw: 063ffff800000040 c000000140058980 5deadbeef0000122 0000000000000000 raw: 0000000000000000 0000000080200020 00000000f5000000 c0000003bfd63e01 head: 063ffff800000040 c000000140058980 5deadbeef0000122 0000000000000000 head: 0000000000000000 0000000080200020 00000000f5000000 c0000003bfd63e01 head: 063ffff800000002 c00c000000f0e301 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000004 page dumped because: kasan: bad access detected [ 138.953636] [ T2164] Memory state around the buggy address: [ 138.953643] [ T2164] c0000003c38dff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 138.953652] [ T2164] c0000003c38dff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 138.953661] [ T2164] >c0000003c38e0000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 138.953669] [ T2164] ^ [ 138.953675] [ T2164] c0000003c38e0080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 138.953684] [ T2164] c0000003c38e0100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 138.953692] [ T2164] ================================================================== [ 138.953701] [ T2164] Disabling lock debugging due to kernel taint Link: https://lkml.kernel.org/r/2f9135c7866c6e0d06e960993b8a5674a9ebc7ec.1771938394.git.ritesh.list@gmail.com Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>