<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/rmap.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-09-09T23:39:03+00:00</updated>
<entry>
<title>mm: remap unused subpages to shared zeropage when splitting isolated thp</title>
<updated>2024-09-09T23:39:03+00:00</updated>
<author>
<name>Yu Zhao</name>
<email>yuzhao@google.com</email>
</author>
<published>2024-08-30T10:03:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b1f202060afeb7fcb98473929d26fd3d2093b067'/>
<id>urn:sha1:b1f202060afeb7fcb98473929d26fd3d2093b067</id>
<content type='text'>
Patch series "mm: split underused THPs", v5.

The current upstream default policy for THP is "always".  However, Meta
uses "madvise" in production because the THP=always policy vastly
overprovisions THPs in sparsely accessed memory areas, resulting in
excessive memory pressure and premature OOM killing.  Using madvise +
relying on khugepaged has certain drawbacks over THP=always.  Using
madvise hints means THPs aren't "transparent" and require userspace
changes.  Waiting for khugepaged to scan memory and collapse pages into
a THP can be slow and unpredictable in terms of performance (i.e., you
don't know when the collapse will happen), while production environments
require predictable performance.  If there is enough memory available,
it is better for both performance and predictability to have a THP from
fault time, i.e., THP=always, rather than to wait for khugepaged to
collapse it, and to deal with sparsely populated THPs when the system is
running out of memory.

This patch series is an attempt to mitigate the issue of running out of
memory when THP is always enabled.  During runtime, whenever a THP is
faulted in or collapsed by khugepaged, the THP is added to a list.
Whenever memory reclaim happens, the kernel runs the deferred_split
shrinker, which goes through the list and checks whether the THP is
underused, i.e., how many of the base 4K pages of the entire THP are
zero-filled.  If this number goes above a certain threshold, the
shrinker will attempt to split that THP.  Then, at remap time, the pages
that were zero-filled are mapped to the shared zeropage, hence saving
memory.  This method avoids the downside of wasting memory in areas
where THP is sparsely filled when THP is always enabled, while still
providing the upsides of THPs, like reduced TLB misses, without having
to use madvise.
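
A minimal sketch of the underused check described above (illustrative
only, not the actual kernel implementation; "threshold" stands in for
the max_ptes_none cutoff) might look like:

static bool thp_is_underused(struct folio *folio, int threshold)
{
	int i, nr_zero = 0;

	/* Walk each base 4K page of the THP and test for all-zero contents. */
	for (i = 0; i &lt; folio_nr_pages(folio); i++) {
		void *kaddr = kmap_local_folio(folio, i * PAGE_SIZE);

		if (!memchr_inv(kaddr, 0, PAGE_SIZE))
			nr_zero++;
		kunmap_local(kaddr);
	}

	/* Underused if more than "threshold" subpages are zero-filled. */
	return nr_zero &gt; threshold;
}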

Meta production workloads that were CPU bound (&gt;99% CPU utilization)
were tested with the THP shrinker.  The results after 2 hours are as
follows:

                            | THP=madvise |  THP=always   | THP=always
                            |             |               | + shrinker series
                            |             |               | + max_ptes_none=409
-----------------------------------------------------------------------------
Performance improvement     |      -      |    +1.8%      |     +1.7%
(over THP=madvise)          |             |               |
-----------------------------------------------------------------------------
Memory usage                |    54.6G    | 58.8G (+7.7%) |   55.9G (+2.4%)
-----------------------------------------------------------------------------
max_ptes_none=409 means that any THP that has more than 409 out of 512
(80%) zero-filled pages will be split.

To test out the patches, the below commands without the shrinker will
invoke OOM killer immediately and kill stress, but will not fail with the
shrinker:

echo 450 &gt; /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
mkdir /sys/fs/cgroup/test
echo $$ &gt; /sys/fs/cgroup/test/cgroup.procs
echo 20M &gt; /sys/fs/cgroup/test/memory.max
echo 0 &gt; /sys/fs/cgroup/test/memory.swap.max
# allocate twice memory.max for each stress worker and touch 40/512 of
# each THP, i.e. vm-stride 50K.
# With the shrinker, max_ptes_none of 470 and below won't invoke OOM
# killer.
# Without the shrinker, OOM killer is invoked immediately irrespective
# of max_ptes_none value and kills stress.
stress --vm 1 --vm-bytes 40M --vm-stride 50K


This patch (of 5):

Here, "unused" means containing only zeros and being inaccessible to
userspace.  When splitting an isolated THP under reclaim or migration,
the unused subpages can be mapped to the shared zeropage, hence saving
memory.  This is particularly helpful when the internal fragmentation of
a THP is high, i.e., it has many untouched subpages.
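
The remap step can be sketched as follows (illustrative; it assumes the
usual page table helpers and an in-progress page_vma_mapped_walk
"pvmw"):

	/* Map a zero-filled subpage to the shared zeropage. */
	pte_t newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw-&gt;address),
					     pvmw-&gt;vma-&gt;vm_page_prot));

	set_pte_at(pvmw-&gt;vma-&gt;vm_mm, pvmw-&gt;address, pvmw-&gt;pte, newpte);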

This is also a prerequisite for the THP low-utilization shrinker, which
will be introduced in later patches: underutilized THPs are split, and
the zero-filled pages are freed, saving memory.

Link: https://lkml.kernel.org/r/20240830100438.3623486-1-usamaarif642@gmail.com
Link: https://lkml.kernel.org/r/20240830100438.3623486-3-usamaarif642@gmail.com
Signed-off-by: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Usama Arif &lt;usamaarif642@gmail.com&gt;
Tested-by: Shuang Zhai &lt;zhais@google.com&gt;
Cc: Alexander Zhu &lt;alexlzhu@fb.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Nico Pache &lt;npache@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Shuang Zhai &lt;szhai2@cs.rochester.edu&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: use folio-&gt;_mapcount for small folios</title>
<updated>2024-09-04T04:15:37+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2024-08-16T10:32:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d0b003ce97ad6518dea4b0ed70081df95de04179'/>
<id>urn:sha1:d0b003ce97ad6518dea4b0ed70081df95de04179</id>
<content type='text'>
We have some cases left where we operate on small folios and still refer
to page-&gt;_mapcount.  Let's just use folio-&gt;_mapcount instead, which
currently still overlays page-&gt;_mapcount, so this is not a change in
behavior.

This change will make it easier to later spot any remaining users of
page-&gt;_mapcount that target tail pages.
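
The conversion follows one simple pattern; a schematic example (not a
hunk from this commit) is:

-	ret = atomic_add_negative(-1, &amp;page-&gt;_mapcount);
+	ret = atomic_add_negative(-1, &amp;folio-&gt;_mapcount);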

Link: https://lkml.kernel.org/r/20240816103246.719209-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: extend rmap flags arguments for folio_add_new_anon_rmap</title>
<updated>2024-07-04T02:30:18+00:00</updated>
<author>
<name>Barry Song</name>
<email>v-songbaohua@oppo.com</email>
</author>
<published>2024-06-17T23:11:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=15bde4abab734c687c1f81704886aba3a70c268e'/>
<id>urn:sha1:15bde4abab734c687c1f81704886aba3a70c268e</id>
<content type='text'>
Patch series "mm: clarify folio_add_new_anon_rmap() and
__folio_add_anon_rmap()", v2.

This patchset is preparatory work for mTHP swapin.

folio_add_new_anon_rmap() assumes that new anon rmaps are always
exclusive.  However, this assumption doesn't hold true for cases like
do_swap_page(), where a new anon folio might be added to the swapcache
and is not necessarily exclusive.

The patchset extends the rmap flags to allow folio_add_new_anon_rmap() to
handle both exclusive and non-exclusive new anon folios.  The
do_swap_page() function is updated to use this extended API with rmap
flags.  Consequently, all new anon folios now consistently use
folio_add_new_anon_rmap().  The special case for !folio_test_anon() in
__folio_add_anon_rmap() can be safely removed.

In conclusion, new anon folios always use folio_add_new_anon_rmap(),
regardless of exclusivity.  Old anon folios continue to use
__folio_add_anon_rmap() via folio_add_anon_rmap_pmd() and
folio_add_anon_rmap_ptes().


This patch (of 3):

In the case of a swap-in, a new anonymous folio is not necessarily
exclusive.  This patch updates the rmap flags to allow a new anonymous
folio to be treated as either exclusive or non-exclusive.  To maintain the
existing behavior, we always use EXCLUSIVE as the default setting.
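
The resulting call pattern can be sketched as follows (illustrative,
using the RMAP_EXCLUSIVE/RMAP_NONE rmap_t flags from rmap.h):

	/* Newly faulted, exclusive anon folio (the existing behavior): */
	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);

	/* Swapcache folio that may end up shared: pass no flag. */
	folio_add_new_anon_rmap(folio, vma, address, RMAP_NONE);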

[akpm@linux-foundation.org: cleanup and constifications per David and akpm]
[v-songbaohua@oppo.com: fix missing doc for flags of folio_add_new_anon_rmap()]
  Link: https://lkml.kernel.org/r/20240619210641.62542-1-21cnbao@gmail.com
[v-songbaohua@oppo.com: enhance doc for extend rmap flags arguments for folio_add_new_anon_rmap]
  Link: https://lkml.kernel.org/r/20240622030256.43775-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240617231137.80726-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240617231137.80726-2-21cnbao@gmail.com
Signed-off-by: Barry Song &lt;v-songbaohua@oppo.com&gt;
Suggested-by: David Hildenbrand &lt;david@redhat.com&gt;
Tested-by: Shuai Yuan &lt;yuanshuai@oppo.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove page_mkclean()</title>
<updated>2024-07-04T02:30:17+00:00</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2024-06-04T11:48:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a929e0d10f3db1a53668f6b9845db27d7fb63759'/>
<id>urn:sha1:a929e0d10f3db1a53668f6b9845db27d7fb63759</id>
<content type='text'>
There are no more users of page_mkclean(), so remove it and update the
documentation and comments.
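
Any future caller is expected to use the folio variant instead, e.g.
(sketch):

	cleaned = folio_mkclean(page_folio(page));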

Link: https://lkml.kernel.org/r/20240604114822.2089819-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory-failure: move some function declarations into internal.h</title>
<updated>2024-07-04T02:30:11+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2024-06-12T07:18:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a78f77fd1fb82c32cb7971a8a97c5cbc83ab69e'/>
<id>urn:sha1:3a78f77fd1fb82c32cb7971a8a97c5cbc83ab69e</id>
<content type='text'>
There are some functions only used inside mm.  Move them into internal.h. 
No functional change intended.

Link: https://lkml.kernel.org/r/20240612071835.157004-11-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202405251049.hxjwX7zO-lkp@intel.com/
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop</title>
<updated>2024-07-04T02:30:08+00:00</updated>
<author>
<name>Lance Yang</name>
<email>ioworker0@gmail.com</email>
</author>
<published>2024-06-14T01:51:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=29e847d2ade3cdff36afe095fdbeb9b5f71a197a'/>
<id>urn:sha1:29e847d2ade3cdff36afe095fdbeb9b5f71a197a</id>
<content type='text'>
In preparation for allowing try_to_unmap_one() to unmap PMD-mapped
folios, start the pagewalk first, then call split_huge_pmd_address() to
split the folio.
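
The reordered flow has roughly this shape (a sketch, not the actual
hunk):

	while (page_vma_mapped_walk(&amp;pvmw)) {
		if (!pvmw.pte) {
			/*
			 * Hit a PMD mapping: split it from within the
			 * walk instead of splitting up front before the
			 * walk starts.
			 */
			split_huge_pmd_address(vma, pvmw.address, false, folio);
			continue;
		}
		/* ... existing PTE-level unmap logic ... */
	}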

Link: https://lkml.kernel.org/r/20240614015138.31461-3-ioworker0@gmail.com
Signed-off-by: Lance Yang &lt;ioworker0@gmail.com&gt;
Suggested-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Suggested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Bang Li &lt;libang.li@antgroup.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Fangrui Song &lt;maskray@google.com&gt;
Cc: Jeff Xie &lt;xiehuan09@gmail.com&gt;
Cc: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yin Fengwei &lt;fengwei.yin@intel.com&gt;
Cc: Zach O'Keefe &lt;zokeefe@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>rmap: remove DEFINE_PAGE_VMA_WALK()</title>
<updated>2024-07-04T02:29:59+00:00</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2024-05-24T05:36:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1df07243483c1a32c8f2f93b06dc52a583a4dd1c'/>
<id>urn:sha1:1df07243483c1a32c8f2f93b06dc52a583a4dd1c</id>
<content type='text'>
There have been no users since commit 40d707f33db5 ("mm/ksm: use folio
in write_protect_page"), so remove DEFINE_PAGE_VMA_WALK().

Link: https://lkml.kernel.org/r/20240524053618.208895-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Alex Shi &lt;alexs@kernel.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: sanity check that zeropages are not passed to RMAP</title>
<updated>2024-07-04T02:29:56+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2024-05-22T12:57:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ad28e7e52e2bae76b1d3eb4466999d04d50db92'/>
<id>urn:sha1:6ad28e7e52e2bae76b1d3eb4466999d04d50db92</id>
<content type='text'>
Using insert_page(), we might previously have ended up passing the
zeropage into rmap code.  Make sure that won't happen again.

Note that we won't check the huge zeropage for now, which might still end
up in RMAP code.
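
The check itself can be a one-liner of roughly this form (a sketch,
assuming the is_zero_folio() helper):

	VM_WARN_ON_FOLIO(is_zero_folio(folio), folio);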

Link: https://lkml.kernel.org/r/20240522125713.775114-4-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: return the address from page_mapped_in_vma()</title>
<updated>2024-05-06T00:53:45+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2024-04-12T19:35:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=37bc2ff506b184411e4cc80f111c638b2b4c83d4'/>
<id>urn:sha1:37bc2ff506b184411e4cc80f111c638b2b4c83d4</id>
<content type='text'>
The only user of this function calls page_address_in_vma() immediately
after page_mapped_in_vma(), which has already calculated the address
internally but only uses it to return true/false.  Return the address
instead, allowing memory-failure to skip the call to
page_address_in_vma().
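
For the caller, the simplification looks roughly like this (a sketch;
the exact "not mapped" return value is per the patch):

	/* Before: a boolean test plus a redundant address lookup. */
	if (page_mapped_in_vma(page, vma))
		addr = page_address_in_vma(page, vma);

	/* After: the address computed during the walk is returned. */
	addr = page_mapped_in_vma(page, vma);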

Link: https://lkml.kernel.org/r/20240412193510.2356957-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: track mapcount of large folios in single value</title>
<updated>2024-05-06T00:53:28+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2024-04-09T19:22:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=05c5323b2a344c19c51cd1b91a4ab9ae90853794'/>
<id>urn:sha1:05c5323b2a344c19c51cd1b91a4ab9ae90853794</id>
<content type='text'>
Let's track the mapcount of large folios in a single value.  The mapcount
of a large folio currently corresponds to the sum of the entire mapcount
and all page mapcounts.

This sum is what we actually want to know in folio_mapcount() and it is
also sufficient for implementing folio_mapped().

With PTE-mapped THP becoming more important and more widely used, we
want to avoid looping over all pages of a folio just to obtain the
mapcount of large folios.  The comment "In the common case, avoid the
loop when no pages mapped by PTE" in folio_total_mapcount() no longer
holds for mTHPs that are always mapped by PTE.

Further, we are planning on using folio_mapcount() more frequently, and
might even want to remove page mapcounts for large folios in some kernel
configs.  Therefore, allow for reading the mapcount of large folios
efficiently and atomically without looping over any pages.

Maintain the mapcount also for hugetlb pages for simplicity.  Use the new
mapcount to implement folio_mapcount() and folio_mapped().  Make
page_mapped() simply call folio_mapped().  We can now get rid of
folio_large_is_mapped().

_nr_pages_mapped is now only used in rmap code and for debugging purposes.
Keep folio_nr_pages_mapped() around, but document that its use should be
limited to rmap internals and debugging purposes.

This change implies one additional atomic add/sub whenever
mapping/unmapping (parts of) a large folio.

As we now batch RMAP operations for PTE-mapped THP during fork(), during
unmap/zap, and when PTE-remapping a PMD-mapped THP, and we adjust the
large mapcount for a PTE batch only once, the added overhead in the
common case is small.  Only when unmapping individual pages of a large
folio (e.g., during COW) might the overhead be bigger in comparison, but
it's essentially one additional atomic operation.

Note that before the new mapcount could overflow, our refcount would
already overflow: each mapping requires a folio reference.  Extend the
documentation of folio_mapcount().
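
The read side can then be sketched as follows (illustrative, assuming
the folio-wide _large_mapcount counter this series introduces):

static inline int folio_mapcount_sketch(const struct folio *folio)
{
	/* Small folio: the per-page mapcount, offset by its -1 bias. */
	if (likely(!folio_test_large(folio)))
		return atomic_read(&amp;folio-&gt;_mapcount) + 1;

	/* Large folio: a single atomic read, no loop over subpages. */
	return atomic_read(&amp;folio-&gt;_large_mapcount) + 1;
}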

Link: https://lkml.kernel.org/r/20240409192301.907377-5-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Yin Fengwei &lt;fengwei.yin@intel.com&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Richard Chang &lt;richardycc@google.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
