<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/memory-failure.c, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-02-05T22:10:00+00:00</updated>
<entry>
<title>mm/memory-failure: reject unsupported non-folio compound page</title>
<updated>2026-02-05T22:10:00+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2026-02-05T07:53:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ae9fd76c111bb4a5293c2831a1ad373605251dd6'/>
<id>urn:sha1:ae9fd76c111bb4a5293c2831a1ad373605251dd6</id>
<content type='text'>
When !CONFIG_TRANSPARENT_HUGEPAGE, a non-folio compound page can appear in
a userspace mapping via either the vm_insert_*() functions or
vm_operations_struct-&gt;fault().  Such pages are not folios and thus must
not be subjected to folio operations like split.  To reject these pages,
make sure get_hwpoison_page() is always called, as HWPoisonHandlable()
performs the right check.
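
A minimal sketch of the intended ordering (hedged; control flow and
error handling are simplified, not the exact kernel code):

    /* Take the page reference via get_hwpoison_page() first, so that
     * HWPoisonHandlable() can reject non-folio compound pages before
     * any folio operation (e.g. split) is attempted. */
    res = get_hwpoison_page(p, flags);
    if (res &lt;= 0)
            return action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
    folio = page_folio(p);  /* only now safe to treat p as a folio */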

[Some commit log borrowed from Zi Yan. Thanks.]

Link: https://lkml.kernel.org/r/20260205075328.523211-1-linmiaohe@huawei.com
Fixes: 689b8986776c ("mm/memory-failure: improve large block size folio handling")
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reported-by: 是参差 &lt;shicenci@gmail.com&gt;
Closes: https://lore.kernel.org/all/PS1PPF7E1D7501F1E4F4441E7ECD056DEADAB98A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com/
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Tested-by: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn</title>
<updated>2026-01-27T03:03:46+00:00</updated>
<author>
<name>Jane Chu</name>
<email>jane.chu@oracle.com</email>
</author>
<published>2026-01-20T23:22:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=057a6f2632c956483e2b2628477f0fcd1cd8a844'/>
<id>urn:sha1:057a6f2632c956483e2b2628477f0fcd1cd8a844</id>
<content type='text'>
When a hugetlb folio is poisoned again, try_memory_failure_hugetlb()
passes the head pfn to kill_accessing_process(), which is not right.  The
precise pfn of the poisoned page should be used in order to determine the
precise vaddr for the SIGBUS payload.
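
A hedged sketch of the fix in try_memory_failure_hugetlb() (simplified):

    if (flags &amp; MF_ACTION_REQUIRED) {
            /* was: kill_accessing_process(current, folio_pfn(folio), flags);
             * the head pfn loses the subpage offset needed for the vaddr */
            res = kill_accessing_process(current, pfn, flags);
    }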

This issue has already been taken care of in the normal path, that is,
hwpoison_user_mappings(); see [1][2].  Furthermore, for [3] to work
correctly in the hugetlb repoisoning case, it is essential to inform the
VM of the precise poisoned page, not the head page.

[1] https://lkml.kernel.org/r/20231218135837.3310403-1-willy@infradead.org
[2] https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com
[3] https://lore.kernel.org/lkml/20251116013223.1557158-1-jiaqiyan@google.com/

Link: https://lkml.kernel.org/r/20260120232234.3462258-2-jane.chu@oracle.com
Signed-off-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: William Roche &lt;william.roche@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory-failure: fix missing -&gt;mf_stats count in hugetlb poison</title>
<updated>2026-01-27T03:03:46+00:00</updated>
<author>
<name>Jane Chu</name>
<email>jane.chu@oracle.com</email>
</author>
<published>2026-01-20T23:22:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a148a2040191b12b45b82cb29c281cb3036baf90'/>
<id>urn:sha1:a148a2040191b12b45b82cb29c281cb3036baf90</id>
<content type='text'>
When a newly poisoned subpage ends up in an already poisoned hugetlb
folio, 'num_poisoned_pages' is incremented, but the per-node -&gt;mf_stats is
not.  Fix the inconsistency by designating action_result() to update them
both.

While at it, define the __get_huge_page_for_hwpoison() return values as
named symbols for better readability.  Also rename
folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison(), since the
function does more than conventional bit setting and is expected to
return three possible values.
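
A hedged sketch of the consolidated accounting (names follow the commit
text; the body is abbreviated):

    static int action_result(unsigned long pfn, enum mf_action_page_type type,
                             enum mf_result result)
    {
            trace_memory_failure_event(pfn, type, result);
            num_poisoned_pages_inc(pfn);
            update_per_node_mf_stats(pfn, result);  /* per-node -&gt;mf_stats */
            pr_err("%#lx: recovery action for %s: %s\n",
                   pfn, action_page_types[type], action_name[result]);
            return result == MF_RECOVERED ? 0 : -EBUSY;
    }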

Link: https://lkml.kernel.org/r/20260120232234.3462258-1-jane.chu@oracle.com
Fixes: 18f41fa616ee ("mm: memory-failure: bump memory failure stats to pglist_data")
Signed-off-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: William Roche &lt;william.roche@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fixup pfnmap memory failure handling to use pgoff</title>
<updated>2025-12-23T19:23:15+00:00</updated>
<author>
<name>Ankit Agrawal</name>
<email>ankita@nvidia.com</email>
</author>
<published>2025-12-11T07:06:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e6dbcb7c0e7b508d443a9aa6f77f63a2f83b1ae4'/>
<id>urn:sha1:e6dbcb7c0e7b508d443a9aa6f77f63a2f83b1ae4</id>
<content type='text'>
The memory failure handling implementation for PFNMAP memory with no
struct pages is faulty.  The VA of the mapping is determined based on the
PFN.  It should instead be based on the file mapping offset.

When poison occurs, memory_failure_pfn() is triggered on the poisoned
PFN.  Introduce a callback function that allows mm to translate the PFN
to the corresponding file page offset.  The kernel module using the
registration API must implement the callback function and provide the
translation.  The translated value is then used to determine the VA
information and to send SIGBUS to the usermode processes mapping the
poisoned PFN.

The callback is also useful for notifying the driver of the poisoned PFN,
which the driver may then track.
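
A hypothetical sketch of the callback shape (struct and member names are
illustrative, not the exact API):

    struct pfn_address_space {
            struct interval_tree_node node;  /* registered device SPA range */
            struct address_space *mapping;   /* used to locate mapping VMAs */
            /* translate a poisoned pfn into a pgoff within -&gt;mapping */
            pgoff_t (*failure_pgoff)(struct pfn_address_space *pas,
                                     unsigned long pfn);
    };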

Link: https://lkml.kernel.org/r/20251211070603.338701-2-ankita@nvidia.com
Fixes: 2ec41967189c ("mm: handle poisoning of pfn without struct pages")
Signed-off-by: Ankit Agrawal &lt;ankita@nvidia.com&gt;
Suggested-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Kevin Tian &lt;kevin.tian@intel.com&gt;
Cc: Matthew R. Ochs &lt;mochs@nvidia.com&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Neo Jia &lt;cjia@nvidia.com&gt;
Cc: Vikram Sethi &lt;vsethi@nvidia.com&gt;
Cc: Yishai Hadas &lt;yishaih@nvidia.com&gt;
Cc: Zhi Wang &lt;zhiw@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory-failure: remove the selection of RAS</title>
<updated>2025-11-24T23:08:55+00:00</updated>
<author>
<name>Xie Yuanbin</name>
<email>xieyuanbin1@huawei.com</email>
</author>
<published>2025-11-19T09:59:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=31807483d3952059d395c2a73b1fa9625db9b366'/>
<id>urn:sha1:31807483d3952059d395c2a73b1fa9625db9b366</id>
<content type='text'>
commit 97f0b13452198290799f ("tracing: add trace event for
memory-failure") introduced the selection of RAS by memory-failure.  That
commit only added a tracing feature; in reality, there is no dependency
between memory-failure and RAS.  Selecting RAS increases the size of the
bzImage by 8k, which matters for embedded devices.

Move the memory-failure tracing code from ras_event.h to memory-failure.h
and remove the selection of RAS.
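
The Kconfig side of the change, sketched (illustrative, abbreviated):

    config MEMORY_FAILURE
            depends on MMU
            depends on ARCH_SUPPORTS_MEMORY_FAILURE
            bool "Enable recovery from hardware memory errors"
            # 'select RAS' is dropped: the trace event now lives in
            # mm/memory-failure.h instead of ras_event.h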

Link: https://lkml.kernel.org/r/20251119095943.67125-1-xieyuanbin1@huawei.com
Signed-off-by: Xie Yuanbin &lt;xieyuanbin1@huawei.com&gt;
Acked-by: David Hildenbrand (Red Hat) &lt;david@kernel.org&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: replace remaining pte_to_swp_entry() with softleaf_from_pte()</title>
<updated>2025-11-24T23:08:52+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2025-11-10T22:21:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a3a3e215c9c140c08760d4d96ba4e8bc485d0f14'/>
<id>urn:sha1:a3a3e215c9c140c08760d4d96ba4e8bc485d0f14</id>
<content type='text'>
There are straggler invocations of pte_to_swp_entry() lying around;
replace all of these with the software leaf entry equivalent,
softleaf_from_pte().

With those removed, eliminate pte_to_swp_entry() altogether.

No functional change intended.
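
An illustrative before/after (the softleaf type and helper names follow
the commit text; exact signatures are assumed):

    /* before */
    swp_entry_t entry = pte_to_swp_entry(pte);

    /* after */
    softleaf_t entry = softleaf_from_pte(pte);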

Link: https://lkml.kernel.org/r/d8ee5ccefe4c42d7c4fe1a2e46f285ac40421cd3.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Byungchul Park &lt;byungchul@sk.com&gt;
Cc: Chengming Zhou &lt;chengming.zhou@linux.dev&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Cc: Gregory Price &lt;gourry@gourry.net&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Joshua Hahn &lt;joshua.hahnjy@gmail.com&gt;
Cc: Kairui Song &lt;kasong@tencent.com&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: Lance Yang &lt;lance.yang@linux.dev&gt;
Cc: Leon Romanovsky &lt;leon@kernel.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Nico Pache &lt;npache@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rakie Kim &lt;rakie.kim@sk.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: xu xin &lt;xu.xin16@zte.com.cn&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: eliminate further swapops predicates</title>
<updated>2025-11-24T23:08:52+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2025-11-10T22:21:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=93976a20345b4aff1ac7598ec1223d65ca33d49c'/>
<id>urn:sha1:93976a20345b4aff1ac7598ec1223d65ca33d49c</id>
<content type='text'>
Having converted so much of the code base to software leaf entries, we can
mop up some remaining cases.

We replace is_pfn_swap_entry(), pfn_swap_entry_to_page(),
is_writable_device_private_entry(), is_device_exclusive_entry(),
is_migration_entry(), is_writable_migration_entry(),
is_readable_migration_entry(), swp_offset_pfn() and pfn_swap_entry_folio()
with softleaf equivalents.

No functional change intended.
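
For example (the softleaf predicate name is assumed from the commit
text):

    /* before */
    if (is_migration_entry(entry))
            migration_entry_wait(mm, pmd, address);

    /* after */
    if (softleaf_is_migration(entry))
            migration_entry_wait(mm, pmd, address);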

Link: https://lkml.kernel.org/r/956bc9c031604811c0070d2f4bf2f1373f230213.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Byungchul Park &lt;byungchul@sk.com&gt;
Cc: Chengming Zhou &lt;chengming.zhou@linux.dev&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Cc: Gregory Price &lt;gourry@gourry.net&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Joshua Hahn &lt;joshua.hahnjy@gmail.com&gt;
Cc: Kairui Song &lt;kasong@tencent.com&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: Lance Yang &lt;lance.yang@linux.dev&gt;
Cc: Leon Romanovsky &lt;leon@kernel.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Nico Pache &lt;npache@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rakie Kim &lt;rakie.kim@sk.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: xu xin &lt;xu.xin16@zte.com.cn&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory-failure: improve large block size folio handling</title>
<updated>2025-11-24T23:08:49+00:00</updated>
<author>
<name>Zi Yan</name>
<email>ziy@nvidia.com</email>
</author>
<published>2025-10-31T16:20:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=689b8986776c823161fb4955cc7dc303f78a1962'/>
<id>urn:sha1:689b8986776c823161fb4955cc7dc303f78a1962</id>
<content type='text'>
Large block size (LBS) folios cannot be split to order-0 folios, only
down to min_order_for_folio().  The current code fails the split
outright, which is not optimal.  Instead, split the folio to
min_order_for_folio(), so that after the split only the folio containing
the poisoned page becomes unusable.

For soft offline, do not split the large folio if its
min_order_for_folio() is not 0, since the folio is still accessible from
userspace and a premature split might cause a performance loss.
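
A hedged sketch of the resulting policy (helper names per the commit
text; control flow simplified):

    const unsigned int min_order = min_order_for_folio(folio);

    if (soft_offline &amp;&amp; min_order)
            return -EBUSY;  /* keep the still-accessible LBS folio intact */

    /* memory-failure path: split down to min_order, not necessarily 0 */
    ret = split_huge_page_to_list_to_order(page, NULL, min_order);
    if (ret)
            return ret;
    /* success: only the min_order folio holding the poisoned page is lost */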

Link: https://lkml.kernel.org/r/20251031162001.670503-3-ziy@nvidia.com
Signed-off-by: Zi Yan &lt;ziy@nvidia.com&gt;
Suggested-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Reviewed-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: Barry Song &lt;baohua@kernel.org&gt;
Reviewed-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Nico Pache &lt;npache@redhat.com&gt;
Cc: Pankaj Raghav &lt;kernel@pankajraghav.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: handle poisoning of pfn without struct pages</title>
<updated>2025-11-17T01:28:29+00:00</updated>
<author>
<name>Ankit Agrawal</name>
<email>ankita@nvidia.com</email>
</author>
<published>2025-11-02T18:44:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2ec41967189cd65a8f79c760dd1b50c4f56e8ac6'/>
<id>urn:sha1:2ec41967189cd65a8f79c760dd1b50c4f56e8ac6</id>
<content type='text'>
Poison (or ECC) errors can be very common on a large cluster.  The kernel
MM currently does not handle ECC errors / poison on a memory region that
is not backed by struct pages.  If a memory region is mapped using
remap_pfn_range(), for example, but not added to the kernel, MM will have
no associated struct pages.  Add a new mechanism to handle memory failure
on such memory.

Make the kernel MM expose a function that allows modules managing device
memory to register the device memory SPA and the address space associated
with it.  MM maintains this information as an interval tree.  On poison,
MM can search for the range to which the poisoned PFN belongs and use the
address_space to determine the mapping VMA.
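
A hedged sketch of the registration record and driver-side call
(illustrative names; the registration function name is assumed):

    struct pfn_range_reg {
            struct interval_tree_node node;  /* [start_pfn, last_pfn] SPA */
            struct address_space *mapping;   /* for VMA lookup on poison */
    };

    reg-&gt;node.start = start_pfn;
    reg-&gt;node.last  = start_pfn + nr_pages - 1;
    reg-&gt;mapping    = vma-&gt;vm_file-&gt;f_mapping;
    register_pfn_address_space(reg);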

In this implementation, kernel MM follows the following sequence that is
largely similar to the memory_failure() handler for struct page backed
memory:

1. memory_failure() is triggered on reception of a poison error.  An
   absence of struct page is detected and consequently
   memory_failure_pfn() is executed.

2. memory_failure_pfn() collects the processes mapped to the PFN.

3. memory_failure_pfn() sends SIGBUS to all the processes mapping the
   faulty PFN using kill_procs().

Note that there is one primary difference versus the handling of poison
on struct pages: unmapping of the faulty PFN is skipped.  This is done to
accommodate the huge PFNMAP support added recently [1], which enables
VM_PFNMAP vmas to map at the PMD or PUD level.  A poison to a PFN mapped
in such a way would require breaking the PMD/PUD mapping into PTEs, which
would be mirrored into the S2.  This can greatly increase the cost of
table walks and have a major performance impact.
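
A hedged sketch of the dispatch (names per the commit text; the
struct-page-backed helper name is assumed and bodies are heavily
simplified):

    int memory_failure(unsigned long pfn, int flags)
    {
            struct page *p = pfn_to_online_page(pfn);

            if (!p)  /* no struct page backs this pfn */
                    return memory_failure_pfn(pfn, flags);

            return memory_failure_page(p, flags);  /* existing path */
    }

    static int memory_failure_pfn(unsigned long pfn, int flags)
    {
            /* 1) look up the registered range covering pfn (interval tree)
             * 2) collect the tasks mapping it via the range's -&gt;mapping
             * 3) kill_procs() with SIGBUS; the pfn is left mapped (no
             *    unmap, to avoid splitting huge PFNMAP PMD/PUD entries) */
            return 0;
    }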

Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20251102184434.2406-3-ankita@nvidia.com
Signed-off-by: Ankit Agrawal &lt;ankita@nvidia.com&gt;
Cc: Aniket Agashe &lt;aniketa@nvidia.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Hanjun Guo &lt;guohanjun@huawei.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Kevin Tian &lt;kevin.tian@intel.com&gt;
Cc: Kirti Wankhede &lt;kwankhede@nvidia.com&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Matthew R. Ochs &lt;mochs@nvidia.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Neo Jia &lt;cjia@nvidia.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Cc: Smita Koralahalli Channabasappa &lt;smita.koralahallichannabasappa@amd.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tarun Gupta &lt;targupta@nvidia.com&gt;
Cc: Uwe Kleine-König &lt;u.kleine-koenig@baylibre.com&gt;
Cc: Vikram Sethi &lt;vsethi@nvidia.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Zhi Wang &lt;zhiw@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: always call rmap_walk() on locked folios</title>
<updated>2025-11-17T01:28:00+00:00</updated>
<author>
<name>Lokesh Gidra</name>
<email>lokeshgidra@google.com</email>
</author>
<published>2025-09-23T07:10:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=95b34d66480bbc9bc31e78c26b1d5be47358ffc0'/>
<id>urn:sha1:95b34d66480bbc9bc31e78c26b1d5be47358ffc0</id>
<content type='text'>
Patch series "Improve UFFDIO_MOVE scalability by removing anon_vma lock", v2.

Userfaultfd has a scalability issue in its UFFDIO_MOVE ioctl, which is
heavily used on Android, where the Java garbage collector uses it for
concurrent heap compaction.

The issue arises because UFFDIO_MOVE updates folio-&gt;mapping to an anon_vma
with a different root, in order to move the folio from a src VMA to a dst
VMA.  It performs the operation with the folio locked, but this is
insufficient, because rmap_walk() can be performed on non-KSM anonymous
folios without the folio lock.

This means that UFFDIO_MOVE has to acquire the anon_vma write lock of the
root anon_vma belonging to the folio it wishes to move.

This causes a scalability bottleneck when multiple threads perform
UFFDIO_MOVE simultaneously on distinct pages of the same src VMA.  In
field traces of arm64 Android devices, we have observed janky user
interactions due to long (sometimes over ~50ms) uninterruptible sleeps on
the main UI thread caused by anon_vma lock contention in UFFDIO_MOVE.
This is particularly severe during the beginning of the GC's compaction
phase, when multiple threads are likely to be involved.

This patch resolves the issue by removing the exception in rmap_walk() for
non-KSM anon folios by ensuring that all folios are locked during rmap
walk.  This is less problematic than it might seem, as the only major
caller which utilises this mode is shrink_active_list(), which is covered
in detail in the first patch of this series.

As a result of changing our approach to locking, we can remove all the
code that took steps to acquire an anon_vma write lock instead of a folio
lock.  This results in a significant simplification and scalability
improvement of the code (currently only in UFFDIO_MOVE).  Furthermore, as
a side-effect, folio_lock_anon_vma_read() gets simpler as we don't need to
worry that folio-&gt;mapping may have changed under us.


This patch (of 2):

Guarantee that rmap_walk() is called on locked folios so that threads
changing folio-&gt;mapping and folio-&gt;index for non-KSM anon folios can
serialize on fine-grained folio lock rather than anon_vma lock.  Other
folio types are already always locked before rmap_walk().  With this, we
are going from 'not necessarily' locking the non-KSM anon folio to
'definitely' locking it during rmap walks.
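
A hedged sketch of the walker-side pattern (simplified; the four callers
changed by this patch follow the same shape):

    if (!folio_test_locked(folio)) {
            if (!folio_trylock(folio))
                    return 0;  /* best-effort: skip this folio */
            we_locked = 1;
    }
    rmap_walk(folio, &amp;rwc);
    if (we_locked)
            folio_unlock(folio);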

This patch is in preparation for removing anon_vma write-lock from
UFFDIO_MOVE.

With this patch, three functions are now expected to be called with a
locked folio.  To be sure no case is missed, here is the exhaustive list
of all their callers.

1) rmap_walk() is called from:

a) folio_referenced()
b) damon_folio_mkold()
c) damon_folio_young()
d) page_idle_clear_pte_refs()
e) try_to_unmap()
f) try_to_migrate()
g) folio_mkclean()
h) remove_migration_ptes()

In the above list, the first 4 are changed in this patch to try-lock
non-KSM anon folios, as is done for other types of folios.  The remaining
functions in the list already hold the folio lock when calling
rmap_walk().

2) folio_lock_anon_vma_read() is called from following functions:

a) collect_procs_anon()
b) page_idle_clear_pte_refs()
c) damon_folio_mkold()
d) damon_folio_young()
e) folio_referenced()
f) try_to_unmap()
g) try_to_migrate()

All the functions in the above list, except collect_procs_anon(), are
covered by the rmap_walk() list above.  For collect_procs_anon(),
changing kill_procs_now() to take the folio lock in this patch ensures
that all callers of folio_lock_anon_vma_read() now hold the lock.

3) folio_get_anon_vma() is called from following functions, all of which
   already hold the folio lock:

a) move_pages_huge_pmd()
b) __folio_split()
c) move_pages_ptes()
d) migrate_folio_unmap()
e) unmap_and_move_huge_page()

Functionally, this patch doesn't break the logic, because rmap walkers
generally perform some other check to confirm that what was expected to
be mapped actually is, or otherwise treat things as best-effort.

Among the 4 functions changed in this patch, folio_referenced() is the
only core-mm function, and it is also frequently invoked.  To assess the
impact of locking non-KSM anon folios in the
shrink_active_list()-&gt;folio_referenced() path, we performed an app cycle
test on an arm64 Android device.  Over the whole duration of the test
there were over 140k invocations of shrink_active_list(), out of which
over 29k had at least one non-KSM anon folio on which folio_referenced()
was called.  In none of these invocations did folio_trylock() fail.

Of course, we now take a lock where we previously wouldn't have.  In the
past this would have had a major impact, causing a CoW write fault to
copy a page in do_wp_page(), as commit 09854ba94c6a ("mm: do_wp_page()
simplification") made a failure to obtain the folio lock result in a page
copy even when one wasn't necessary.

However, since commit 6c287605fd56 ("mm: remember exclusively mapped
anonymous pages with PG_anon_exclusive"), and the introduction of the
folio anon exclusive flag, this issue is significantly mitigated.

The only case remaining that we might worry about from this perspective is
that of read-only folios immediately after fork where the anon exclusive
bit will not have been set yet.

We note, however, that in the case of read-only just-forked folios,
wp_can_reuse_anon_folio() will notice the raised reference count
established by shrink_active_list() via isolate_lru_folios() and refuse
to reuse in any case, so this will in fact have no impact; the folio lock
is ultimately immaterial here.

All-in-all it appears that there is little opportunity for meaningful
negative impact from this change.

Link: https://lkml.kernel.org/r/20250923071019.775806-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20250923071019.775806-2-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra &lt;lokeshgidra@google.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Nicolas Geoffray &lt;ngeoffray@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
