<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/rmap.c, branch v7.1-rc5</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-04-05T20:53:33+00:00</updated>
<entry>
<title>mm/mglru: fix cgroup OOM during MGLRU state switching</title>
<updated>2026-04-05T20:53:33+00:00</updated>
<author>
<name>Leno Hou</name>
<email>lenohou@gmail.com</email>
</author>
<published>2026-03-18T16:30:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a6a8c087dce00eac0c6d03e560b0fa3d529afa5f'/>
<id>urn:sha1:a6a8c087dce00eac0c6d03e560b0fa3d529afa5f</id>
<content type='text'>
When the Multi-Gen LRU (MGLRU) state is toggled dynamically, a race
condition exists between the state switching and the memory reclaim path. 
This can lead to unexpected cgroup OOM kills, even when plenty of
reclaimable memory is available.

Problem Description
==================
The issue arises from a "reclaim vacuum" during the transition.

1. When disabling MGLRU, lru_gen_change_state() sets lrugen-&gt;enabled to
   false before the pages are drained from MGLRU lists back to traditional
   LRU lists.
2. Concurrent reclaimers in shrink_lruvec() see lrugen-&gt;enabled as false
   and skip the MGLRU path.
3. However, these pages might not have reached the traditional LRU lists
   yet, or the changes are not yet visible to all CPUs due to a lack
   of synchronization.
4. get_scan_count() subsequently finds traditional LRU lists empty,
   concludes there is no reclaimable memory, and triggers an OOM kill.

A similar race can occur during enablement, where the reclaimer sees the
new state but the MGLRU lists haven't been populated via fill_evictable()
yet.

Solution
========
Introduce a 'switching' state (`lru_switch`) to bridge the transition.
When transitioning, the system enters this intermediate state where
the reclaimer is forced to attempt both MGLRU and traditional reclaim
paths sequentially. This ensures that folios remain visible to at least
one reclaim mechanism until the transition is fully materialized across
all CPUs.

Race &amp; Mitigation
================
A race window exists between checking the 'draining' state and performing
the actual list operations. For instance, a reclaimer might observe the
draining state as false just before it changes, leading to a suboptimal
reclaim path decision.

However, this impact is effectively mitigated by the kernel's reclaim
retry mechanism (e.g., in do_try_to_free_pages). If a reclaimer pass fails
to find eligible folios due to a state transition race, subsequent retries
in the loop will observe the updated state and correctly direct the scan
to the appropriate LRU lists. This ensures the transient inconsistency
does not escalate into a terminal OOM kill.

This effectively reduce the race window that previously triggered OOMs
under high memory pressure.

This fix has been verified on v7.0.0-rc1; dynamic toggling of MGLRU
functions correctly without triggering unexpected OOM kills.

Link: https://lkml.kernel.org/r/20260319-b4-switch-mglru-v2-v5-1-8898491e5f17@gmail.com
Signed-off-by: Leno Hou &lt;lenohou@gmail.com&gt;
Acked-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reviewed-by: Barry Song &lt;baohua@kernel.org&gt;
Reviewed-by: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Jialing Wang &lt;wjl.linux@gmail.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Bingfang Guo &lt;bfguo@icloud.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: support batched checking of the young flag for MGLRU</title>
<updated>2026-04-05T20:53:16+00:00</updated>
<author>
<name>Baolin Wang</name>
<email>baolin.wang@linux.alibaba.com</email>
</author>
<published>2026-03-06T06:43:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=56e5b60b2114dee967c971f08dd29ef193bd3a2d'/>
<id>urn:sha1:56e5b60b2114dee967c971f08dd29ef193bd3a2d</id>
<content type='text'>
Use the batched helper test_and_clear_young_ptes_notify() to check and
clear the young flag to improve the performance during large folio
reclamation when MGLRU is enabled.

Meanwhile, we can also support batched checking the young and dirty flag
when MGLRU walks the mm's pagetable to update the folios' generation
counter.  Since MGLRU also checks the PTE dirty bit, use
folio_pte_batch_flags() with FPB_MERGE_YOUNG_DIRTY set to detect batches
of PTEs for a large folio.

Then we can remove the ptep_test_and_clear_young_notify() since it has no
users now.

Note that we also update the 'young' counter and 'mm_stats[MM_LEAF_YOUNG]'
counter with the batched count in the lru_gen_look_around() and
walk_pte_range().  However, the batched operations may inflate these two
counters, because in a large folio not all PTEs may have been accessed. 
(Additionally, tracking how many PTEs have been accessed within a large
folio is not very meaningful, since the mm core actually tracks
access/dirty on a per-folio basis, not per page).  The impact analysis is
as follows:

1. The 'mm_stats[MM_LEAF_YOUNG]' counter has no functional impact and
   is mainly for debugging.

2. The 'young' counter is used to decide whether to place the current
   PMD entry into the bloom filters by suitable_to_scan() (so that next
   time we can check whether it has been accessed again), which may set
   the hash bit in the bloom filters for a PMD entry that hasn't seen much
   access.  However, bloom filters inherently allow some error, so this
   effect appears negligible.

Link: https://lkml.kernel.org/r/378f4acf7d07410aa7c2e4b49d56bb165918eb34.1772778858.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes (Oracle) &lt;ljs@kernel.org&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced()</title>
<updated>2026-04-05T20:53:15+00:00</updated>
<author>
<name>Baolin Wang</name>
<email>baolin.wang@linux.alibaba.com</email>
</author>
<published>2026-03-06T06:43:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=83ec1286b173e6ff54d0dd0291fbc517d65a8d5b'/>
<id>urn:sha1:83ec1286b173e6ff54d0dd0291fbc517d65a8d5b</id>
<content type='text'>
The folio_referenced() is used to test whether a folio was referenced
during reclaim.  Moreover, ZONE_DEVICE folios are controlled by their
device driver, have a lifetime tied to that driver, and are never placed
on the LRU list.  That means we should never try to reclaim ZONE_DEVICE
folios, so add a warning to catch this unexpected behavior in
folio_referenced() to avoid confusion, as discussed in the previous
thread[1].

[1] https://lore.kernel.org/all/16fb7985-ec0f-4b56-91e7-404c5114f899@kernel.org/
Link: https://lkml.kernel.org/r/64d6fb2a33f7101e1d4aca2c9052e0758b76d492.1772778858.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Alistair Popple &lt;apopple@nvidia.com&gt;
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes (Oracle) &lt;ljs@kernel.org&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add folio_test_lazyfree helper</title>
<updated>2026-04-05T20:53:03+00:00</updated>
<author>
<name>Vernon Yang</name>
<email>yanglincheng@kylinos.cn</email>
</author>
<published>2026-02-21T09:39:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6cc153f90b7cf07db2b49469dfd79141b145036a'/>
<id>urn:sha1:6cc153f90b7cf07db2b49469dfd79141b145036a</id>
<content type='text'>
Add folio_test_lazyfree() function to identify lazy-free folios to improve
code readability.

Link: https://lkml.kernel.org/r/20260221093918.1456187-4-vernon2gm@gmail.com
Signed-off-by: Vernon Yang &lt;yanglincheng@kylinos.cn&gt;
Acked-by: David Hildenbrand (Red Hat) &lt;david@kernel.org&gt;
Reviewed-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Reviewed-by: Dev Jain &lt;dev.jain@arm.com&gt;
Reviewed-by: Barry Song &lt;baohua@kernel.org&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Nico Pache &lt;npache@redhat.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: clear vma-&gt;anon_vma on error</title>
<updated>2026-03-22T00:36:33+00:00</updated>
<author>
<name>Lorenzo Stoakes (Oracle)</name>
<email>ljs@kernel.org</email>
</author>
<published>2026-03-18T12:26:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a206a8649f83bec99a3517da5e7dac9c138875e'/>
<id>urn:sha1:3a206a8649f83bec99a3517da5e7dac9c138875e</id>
<content type='text'>
Commit 542eda1a8329 ("mm/rmap: improve anon_vma_clone(),
unlink_anon_vmas() comments, add asserts") alters the way errors are
handled, but overlooked one important aspect of clean up.

When a VMA encounters an error state in anon_vma_clone() (that is, on
attempted allocation of anon_vma_chain objects), it cleans up partially
established state in cleanup_partial_anon_vmas(), before returning an
error.

However, this occurs prior to anon_vma-&gt;num_active_vmas being incremented,
and it also fails to clear the VMA's vma-&gt;anon_vma field, which remains in
place.

This is immediately an inconsistent state, because
anon_vma-&gt;num_active_vmas is supposed to track the number of VMAs whose
vma-&gt;anon_vma field references that anon_vma, and now that count is
off-by-negative-1 for each VMA for which this error state has occurred.

When VMAs are unlinked from this anon_vma, unlink_anon_vmas() will
eventually underflow anon_vma-&gt;num_active_vmas, which will trigger a
warning.

This will always eventually happen, as we unlink anon_vma's at process
teardown.

It could also cause maybe_reuse_anon_vma() to incorrectly permit the reuse
of an anon_vma which has active VMAs attached, which will lead to a
persistently invalid state.

The solution is to clear the VMA's anon_vma field when we clean up partial
state, as the fact we are doing so indicates clearly that the VMA is not
correctly integrated into the anon_vma tree and thus this field is
invalid.

Link: https://lkml.kernel.org/r/20260318122632.63404-1-ljs@kernel.org
Fixes: 542eda1a8329 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
Signed-off-by: Lorenzo Stoakes (Oracle) &lt;ljs@kernel.org&gt;
Reported-by: Sasha Levin &lt;sashal@kernel.org&gt;
Closes: https://lore.kernel.org/linux-mm/20260302151547.2389070-1-sashal@kernel.org/
Reported-by: Jiakai Xu &lt;jiakaipeanut@gmail.com&gt;
Closes: https://lore.kernel.org/linux-mm/CAFb8wJvRhatRD-9DVmr5v5pixTMPEr3UKjYBJjCd09OfH55CKg@mail.gmail.com/
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Acked-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Tested-by: Jiakai Xu &lt;jiakaipeanut@gmail.com&gt;
Acked-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Sasha Levin (Microsoft) &lt;sashal@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared THP</title>
<updated>2026-03-10T23:01:49+00:00</updated>
<author>
<name>Wei Yang</name>
<email>richard.weiyang@gmail.com</email>
</author>
<published>2026-03-05T01:50:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=939080834fef3ce42fdbcfef33fd29c9ffe5bbed'/>
<id>urn:sha1:939080834fef3ce42fdbcfef33fd29c9ffe5bbed</id>
<content type='text'>
Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
split_huge_pmd_locked()") return false unconditionally after
split_huge_pmd_locked().  This may fail try_to_migrate() early when
TTU_SPLIT_HUGE_PMD is specified.

The reason is the above commit adjusted try_to_migrate_one() to, when a
PMD-mapped THP entry is found, and TTU_SPLIT_HUGE_PMD is specified (for
example, via unmap_folio()), return false unconditionally.  This breaks
the rmap walk and fail try_to_migrate() early, if this PMD-mapped THP is
mapped in multiple processes.

The user sensible impact of this bug could be:

  * On memory pressure, shrink_folio_list() may split partially mapped
    folio with split_folio_to_list(). Then free unmapped pages without IO.
    If failed, it may not be reclaimed.
  * On memory failure, memory_failure() would call try_to_split_thp_page()
    to split folio contains the bad page. If succeed, the PG_has_hwpoisoned
    bit is only set in the after-split folio contains @split_at. By doing
    so, we limit bad memory. If failed to split, the whole folios is not
    usable.

One way to reproduce:

    Create an anonymous THP range and fork 512 children, so we have a
    THP shared mapped in 513 processes. Then trigger folio split with
    /sys/kernel/debug/split_huge_pages debugfs to split the THP folio to
    order 0.

Without the above commit, we can successfully split to order 0.  With the
above commit, the folio is still a large folio.

And currently there are two core users of TTU_SPLIT_HUGE_PMD:

  * try_to_unmap_one()
  * try_to_migrate_one()

try_to_unmap_one() would restart the rmap walk, so only
try_to_migrate_one() is affected.

We can't simply revert commit 60fbb14396d5 ("mm/huge_memory: adjust
try_to_migrate_one() and split_huge_pmd_locked()"), since it removed some
duplicated check covered by page_vma_mapped_walk().

This patch fixes this by restart page_vma_mapped_walk() after
split_huge_pmd_locked().  Since we cannot simply return "true" to fix the
problem, as that would affect another case:

    When invoking folio_try_share_anon_rmap_pmd() from
    split_huge_pmd_locked(), the latter can fail and leave a large folio
    mapped through PTEs, in which case we ought to return true from
    try_to_migrate_one(). This might result in unnecessary walking of the
    rmap but is relatively harmless.

Link: https://lkml.kernel.org/r/20260305015006.27343-1-richard.weiyang@gmail.com
Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
Signed-off-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Tested-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Reviewed-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Reviewed-by: Gavin Guo &lt;gavinguo@igalia.com&gt;
Acked-by: David Hildenbrand (arm) &lt;david@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes (Oracle) &lt;ljs@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: fix incorrect pte restoration for lazyfree folios</title>
<updated>2026-03-10T23:01:49+00:00</updated>
<author>
<name>Dev Jain</name>
<email>dev.jain@arm.com</email>
</author>
<published>2026-03-03T06:15:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=29f40594a28114b9a9bc87f6cf7bbee9609628f2'/>
<id>urn:sha1:29f40594a28114b9a9bc87f6cf7bbee9609628f2</id>
<content type='text'>
We batch unmap anonymous lazyfree folios by folio_unmap_pte_batch.  If the
batch has a mix of writable and non-writable bits, we may end up setting
the entire batch writable.  Fix this by respecting writable bit during
batching.

Although on a successful unmap of a lazyfree folio, the soft-dirty bit is
lost, preserve it on pte restoration by respecting the bit during
batching, to make the fix consistent w.r.t both writable bit and
soft-dirty bit.

I was able to write the below reproducer and crash the kernel. 
Explanation of reproducer (set 64K mTHP to always):

Fault in a 64K large folio.  Split the VMA at mid-point with
MADV_DONTFORK.  fork() - parent points to the folio with 8 writable ptes
and 8 non-writable ptes.  Merge the VMAs with MADV_DOFORK so that
folio_unmap_pte_batch() can determine all the 16 ptes as a batch.  Do
MADV_FREE on the range to mark the folio as lazyfree.  Write to the memory
to dirty the pte, eventually rmap will dirty the folio.  Then trigger
reclaim, we will hit the pte restoration path, and the kernel will crash
with the trace given below.

The BUG happens at:

	BUG_ON(atomic_inc_return(&amp;ptc-&gt;anon_map_count) &gt; 1 &amp;&amp; rw);

The code path is asking for anonymous page to be mapped writable into the
pagetable.  The BUG_ON() firing implies that such a writable page has been
mapped into the pagetables of more than one process, which breaks
anonymous memory/CoW semantics.

[   21.134473] kernel BUG at mm/page_table_check.c:118!
[   21.134497] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
[   21.135917] Modules linked in:
[   21.136085] CPU: 1 UID: 0 PID: 1735 Comm: dup-lazyfree Not tainted 7.0.0-rc1-00116-g018018a17770 #1028 PREEMPT
[   21.136858] Hardware name: linux,dummy-virt (DT)
[   21.137019] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   21.137308] pc : page_table_check_set+0x28c/0x2a8
[   21.137607] lr : page_table_check_set+0x134/0x2a8
[   21.137885] sp : ffff80008a3b3340
[   21.138124] x29: ffff80008a3b3340 x28: fffffdffc3d14400 x27: ffffd1a55e03d000
[   21.138623] x26: 0040000000000040 x25: ffffd1a55f7dd000 x24: 0000000000000001
[   21.139045] x23: 0000000000000001 x22: 0000000000000001 x21: ffffd1a55f217f30
[   21.139629] x20: 0000000000134521 x19: 0000000000134519 x18: 005c43e000040000
[   21.140027] x17: 0001400000000000 x16: 0001700000000000 x15: 000000000000ffff
[   21.140578] x14: 000000000000000c x13: 005c006000000000 x12: 0000000000000020
[   21.140828] x11: 0000000000000000 x10: 005c000000000000 x9 : ffffd1a55c079ee0
[   21.141077] x8 : 0000000000000001 x7 : 005c03e000040000 x6 : 000000004000ffff
[   21.141490] x5 : ffff00017fffce00 x4 : 0000000000000001 x3 : 0000000000000002
[   21.141741] x2 : 0000000000134510 x1 : 0000000000000000 x0 : ffff0000c08228c0
[   21.141991] Call trace:
[   21.142093]  page_table_check_set+0x28c/0x2a8 (P)
[   21.142265]  __page_table_check_ptes_set+0x144/0x1e8
[   21.142441]  __set_ptes_anysz.constprop.0+0x160/0x1a8
[   21.142766]  contpte_set_ptes+0xe8/0x140
[   21.142907]  try_to_unmap_one+0x10c4/0x10d0
[   21.143177]  rmap_walk_anon+0x100/0x250
[   21.143315]  try_to_unmap+0xa0/0xc8
[   21.143441]  shrink_folio_list+0x59c/0x18a8
[   21.143759]  shrink_lruvec+0x664/0xbf0
[   21.144043]  shrink_node+0x218/0x878
[   21.144285]  __node_reclaim.constprop.0+0x98/0x338
[   21.144763]  user_proactive_reclaim+0x2a4/0x340
[   21.145056]  reclaim_store+0x3c/0x60
[   21.145216]  dev_attr_store+0x20/0x40
[   21.145585]  sysfs_kf_write+0x84/0xa8
[   21.145835]  kernfs_fop_write_iter+0x130/0x1c8
[   21.145994]  vfs_write+0x2b8/0x368
[   21.146119]  ksys_write+0x70/0x110
[   21.146240]  __arm64_sys_write+0x24/0x38
[   21.146380]  invoke_syscall+0x50/0x120
[   21.146513]  el0_svc_common.constprop.0+0x48/0xf8
[   21.146679]  do_el0_svc+0x28/0x40
[   21.146798]  el0_svc+0x34/0x110
[   21.146926]  el0t_64_sync_handler+0xa0/0xe8
[   21.147074]  el0t_64_sync+0x198/0x1a0
[   21.147225] Code: f9400441 b4fff241 17ffff94 d4210000 (d4210000)
[   21.147440] ---[ end trace 0000000000000000 ]---

#define _GNU_SOURCE
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/wait.h&gt;
#include &lt;sched.h&gt;
#include &lt;fcntl.h&gt;

void write_to_reclaim() {
    const char *path = "/sys/devices/system/node/node0/reclaim";
    const char *value = "409600000000";
    int fd = open(path, O_WRONLY);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    if (write(fd, value, sizeof("409600000000") - 1) == -1) {
        perror("write");
        close(fd);
        exit(EXIT_FAILURE);
    }

    printf("Successfully wrote %s to %s\n", value, path);
    close(fd);
}

int main()
{
	char *ptr = mmap((void *)(1UL &lt;&lt; 30), 1UL &lt;&lt; 16, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if ((unsigned long)ptr != (1UL &lt;&lt; 30)) {
		perror("mmap");
		return 1;
	}

	/* a 64K folio gets faulted in */
	memset(ptr, 0, 1UL &lt;&lt; 16);

	/* 32K half will not be shared into child */
	if (madvise(ptr, 1UL &lt;&lt; 15, MADV_DONTFORK)) {
		perror("madvise madv dontfork");
		return 1;
	}

	pid_t pid = fork();

	if (pid &lt; 0) {
		perror("fork");
		return 1;
	} else if (pid == 0) {
		sleep(15);
	} else {
		/* merge VMAs. now first half of the 16 ptes are writable, the other half not. */
		if (madvise(ptr, 1UL &lt;&lt; 15, MADV_DOFORK)) {
			perror("madvise madv fork");
			return 1;
		}
		if (madvise(ptr, (1UL &lt;&lt; 16), MADV_FREE)) {
			perror("madvise madv free");
			return 1;
		}

		/* dirty the large folio */
		(*ptr) += 10;

		write_to_reclaim();
		// sleep(10);
		waitpid(pid, NULL, 0);

	}
}

Link: https://lkml.kernel.org/r/20260303061528.2429162-1-dev.jain@arm.com
Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Signed-off-by: Dev Jain &lt;dev.jain@arm.com&gt;
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reviewed-by: Barry Song &lt;baohua@kernel.org&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Tested-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: rmap: support batched unmapping for file large folios</title>
<updated>2026-02-12T23:43:01+00:00</updated>
<author>
<name>Baolin Wang</name>
<email>baolin.wang@linux.alibaba.com</email>
</author>
<published>2026-02-09T14:07:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a67fe41e214fcfd8b66dfb73d3abe4b7a4b3751d'/>
<id>urn:sha1:a67fe41e214fcfd8b66dfb73d3abe4b7a4b3751d</id>
<content type='text'>
Similar to folio_referenced_one(), we can apply batched unmapping for file
large folios to optimize the performance of file folios reclamation.

Barry previously implemented batched unmapping for lazyfree anonymous
large folios[1] and did not further optimize anonymous large folios or
file-backed large folios at that stage.  As for file-backed large folios,
the batched unmapping support is relatively straightforward, as we only
need to clear the consecutive (present) PTE entries for file-backed large
folios.

Note that it's not ready to support batched unmapping for uffd case, so
let's still fallback to per-page unmapping for the uffd case.

Performance testing:
Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and
try to reclaim 8G file-backed folios via the memory.reclaim interface.  I
can observe 75% performance improvement on my Arm64 32-core server (and
50%+ improvement on my X86 machine) with this patch.

W/o patch:
real    0m1.018s
user    0m0.000s
sys     0m1.018s

W/ patch:
real	0m0.249s
user	0m0.000s
sys	0m0.249s

[1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
Link: https://lkml.kernel.org/r/b53a16f67c93a3fe65e78092069ad135edf00eff.1770645603.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Acked-by: Barry Song &lt;baohua@kernel.org&gt;
Reviewed-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: rmap: support batched checks of the references for large folios</title>
<updated>2026-02-12T23:43:00+00:00</updated>
<author>
<name>Baolin Wang</name>
<email>baolin.wang@linux.alibaba.com</email>
</author>
<published>2026-02-09T14:07:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=52e054f7184097bea009963e033cdd54af7bf8a2'/>
<id>urn:sha1:52e054f7184097bea009963e033cdd54af7bf8a2</id>
<content type='text'>
Patch series "support batch checking of references and unmapping for large
folios", v6.


Currently, folio_referenced_one() always checks the young flag for each
PTE sequentially, which is inefficient for large folios.  This
inefficiency is especially noticeable when reclaiming clean file-backed
large folios, where folio_referenced() is observed as a significant
performance hotspot.

Moreover, on Arm architecture, which supports contiguous PTEs, there is
already an optimization to clear the young flags for PTEs within a
contiguous range.  However, this is not sufficient.  We can extend this to
perform batched operations for the entire large folio (which might exceed
the contiguous range: CONT_PTE_SIZE).

Similar to folio_referenced_one(), we can also apply batched unmapping for
large file folios to optimize the performance of file folio reclamation. 
By supporting batched checking of the young flags, flushing TLB entries,
and unmapping, I can observed a significant performance improvements in my
performance tests for file folios reclamation.  Please check the
performance data in the commit message of each patch.


This patch (of 5):

Currently, folio_referenced_one() always checks the young flag for each
PTE sequentially, which is inefficient for large folios.  This
inefficiency is especially noticeable when reclaiming clean file-backed
large folios, where folio_referenced() is observed as a significant
performance hotspot.

Moreover, on Arm64 architecture, which supports contiguous PTEs, there is
already an optimization to clear the young flags for PTEs within a
contiguous range.  However, this is not sufficient.  We can extend this to
perform batched operations for the entire large folio (which might exceed
the contiguous range: CONT_PTE_SIZE).

Introduce a new API: clear_flush_young_ptes() to facilitate batched
checking of the young flags and flushing TLB entries, thereby improving
performance during large folio reclamation.  And it will be overridden by
the architecture that implements a more efficient batch operation in the
following patches.

While we are at it, rename ptep_clear_flush_young_notify() to
clear_flush_young_ptes_notify() to indicate that this is a batch
operation.

Link: https://lkml.kernel.org/r/cover.1770645603.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/12132694536834262062d1fb304f8f8a064b6750.1770645603.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Reviewed-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add SPDX id lines to some mm source files</title>
<updated>2026-02-06T23:47:16+00:00</updated>
<author>
<name>Tim Bird</name>
<email>tim.bird@sony.com</email>
</author>
<published>2026-02-04T21:31:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ef24e0aa078fa4965c6e925209780a32b325c0d8'/>
<id>urn:sha1:ef24e0aa078fa4965c6e925209780a32b325c0d8</id>
<content type='text'>
Some of the memory management source files are missing
SPDX-License-Identifier lines.  Add appropriate IDs
to these files (mostly GPL-2.0, but one LGPL-2.1).

Link: https://lkml.kernel.org/r/20260204213101.1754183-1-tim.bird@sony.com
Signed-off-by: Tim Bird &lt;tim.bird@sony.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
