<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/sparse-vmemmap.c, branch v7.1-rc5</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-04-05T20:53:34+00:00</updated>
<entry>
<title>mm: mark early-init static variables with __meminitdata</title>
<updated>2026-04-05T20:53:34+00:00</updated>
<author>
<name>Kaitao Cheng</name>
<email>chengkaitao@kylinos.cn</email>
</author>
<published>2026-03-21T12:08:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4a9439a5a372c6c0eb7cd2bc9dbb2494699e98d'/>
<id>urn:sha1:c4a9439a5a372c6c0eb7cd2bc9dbb2494699e98d</id>
<content type='text'>
Static variables defined inside __meminit functions should also be marked
with __meminitdata, so that their storage is placed in the .init.data
section and reclaimed with free_initmem(), thereby reducing permanent .bss
memory usage when CONFIG_MEMORY_HOTPLUG is disabled.

Link: https://lkml.kernel.org/r/20260321120847.8159-1-pilgrimtao@gmail.com
Signed-off-by: Kaitao Cheng &lt;chengkaitao@kylinos.cn&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes (Oracle) &lt;ljs@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/sparse: move memory hotplug bits to sparse-vmemmap.c</title>
<updated>2026-04-05T20:53:33+00:00</updated>
<author>
<name>David Hildenbrand (Arm)</name>
<email>david@kernel.org</email>
</author>
<published>2026-03-20T22:13:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=738de20c4fafe64290c5086d683254f60e837db6'/>
<id>urn:sha1:738de20c4fafe64290c5086d683254f60e837db6</id>
<content type='text'>
Let's move all memory hoptplug related code to sparse-vmemmap.c.

We only have to expose sparse_index_init().  While at it, drop the
definition of sparse_index_init() for !CONFIG_SPARSEMEM, which is unused,
and place the declaration in internal.h.

Link: https://lkml.kernel.org/r/20260320-sparsemem_cleanups-v2-15-096addc8800d@kernel.org
Signed-off-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes (Oracle) &lt;ljs@kernel.org&gt;
Reviewed-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@kernel.org&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: remove fake head pages</title>
<updated>2026-04-05T20:53:09+00:00</updated>
<author>
<name>Kiryl Shutsemau</name>
<email>kas@kernel.org</email>
</author>
<published>2026-02-27T19:42:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=622026e87c4019e609010811757e31193cc23847'/>
<id>urn:sha1:622026e87c4019e609010811757e31193cc23847</id>
<content type='text'>
HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
vmemmap pages for huge pages and remapping the freed range to a single
page containing the struct page metadata.

With the new mask-based compound_info encoding (for power-of-2 struct page
sizes), all tail pages of the same order are now identical regardless of
which compound page they belong to.  This means the tail pages can be
truly shared without fake heads.

Allocate a single page of initialized tail struct pages per zone per order
in the vmemmap_tails[] array in struct zone.  All huge pages of that order
in the zone share this tail page, mapped read-only into their vmemmap. 
The head page remains unique per huge page.

Redefine MAX_FOLIO_ORDER using ilog2().  The define has to produce a
compile-constant as it is used to specify vmemmap_tail array size.  For
some reason, compiler is not able to solve get_order() at compile-time,
but ilog2() works.

Avoid PUD_ORDER to define MAX_FOLIO_ORDER as it adds dependency to
&lt;linux/pgtable.h&gt; which generates hard-to-break include loop.

This eliminates fake heads while maintaining the same memory savings, and
simplifies compound_head() by removing fake head detection.

Link: https://lkml.kernel.org/r/20260227194302.274384-13-kas@kernel.org
Signed-off-by: Kiryl Shutsemau &lt;kas@kernel.org&gt;
Reviewed-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Lameter &lt;cl@gentwo.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: WANG Xuerui &lt;kernel@xen0n.name&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: defer vmemmap population for bootmem hugepages</title>
<updated>2026-04-05T20:53:08+00:00</updated>
<author>
<name>Kiryl Shutsemau (Meta)</name>
<email>kas@kernel.org</email>
</author>
<published>2026-02-27T19:42:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=209e6d9eb13aaf1b6e0fc6f76afc00d055e5ba12'/>
<id>urn:sha1:209e6d9eb13aaf1b6e0fc6f76afc00d055e5ba12</id>
<content type='text'>
Currently, the vmemmap for bootmem-allocated gigantic pages is populated
early in hugetlb_vmemmap_init_early().  However, the zone information is
only available after zones are initialized.  If it is later discovered
that a page spans multiple zones, the HVO mapping must be undone and
replaced with a normal mapping using vmemmap_undo_hvo().

Defer the actual vmemmap population to hugetlb_vmemmap_init_late().  At
this stage, zones are already initialized, so it can be checked if the
page is valid for HVO before deciding how to populate the vmemmap.

This allows us to remove vmemmap_undo_hvo() and the complex logic required
to rollback HVO mappings.

In hugetlb_vmemmap_init_late(), if HVO population fails or if the zones
are invalid, fall back to a normal vmemmap population.

Postponing population until hugetlb_vmemmap_init_late() also makes zone
information available from within vmemmap_populate_hvo().

Link: https://lkml.kernel.org/r/20260227194302.274384-10-kas@kernel.org
Signed-off-by: Kiryl Shutsemau (Meta) &lt;kas@kernel.org&gt;
Acked-by: David Hildenbrand (Arm) &lt;david@kernel.org&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Lameter &lt;cl@gentwo.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: WANG Xuerui &lt;kernel@xen0n.name&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: replace READ_ONCE() with standard page table accessors</title>
<updated>2025-11-17T01:27:56+00:00</updated>
<author>
<name>Anshuman Khandual</name>
<email>anshuman.khandual@arm.com</email>
</author>
<published>2025-10-07T06:31:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c0efdb373c3aaacb32db59cadb0710cac13e44ae'/>
<id>urn:sha1:c0efdb373c3aaacb32db59cadb0710cac13e44ae</id>
<content type='text'>
Replace all READ_ONCE() with a standard page table accessors i.e
pxdp_get() that defaults into READ_ONCE() in cases where platform does not
override.

Link: https://lkml.kernel.org/r/20251007063100.2396936-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Reviewed-by: Dev Jain &lt;dev.jain@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: introduce and use {pgd,p4d}_populate_kernel()</title>
<updated>2025-08-28T05:45:44+00:00</updated>
<author>
<name>Harry Yoo</name>
<email>harry.yoo@oracle.com</email>
</author>
<published>2025-08-18T02:02:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2d2f9598ebb0158a3fe17cda0106d7752e654a2'/>
<id>urn:sha1:f2d2f9598ebb0158a3fe17cda0106d7752e654a2</id>
<content type='text'>
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space.  These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.

Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner.  For
example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").

However, this approach has proven fragile for following reasons:

  1) It is easy to forget to perform the necessary page table
     synchronization when introducing new changes.
     For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
     savings for compound devmaps") overlooked the need to synchronize
     page tables for the vmemmap area.

  2) It is also easy to overlook that the vmemmap and direct mapping areas
     must not be accessed before explicit page table synchronization.
     For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
     sub-pmd ranges")) caused crashes by accessing the vmemmap area
     before calling sync_global_pgds().

To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables.  These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.

They reuse existing infrastructure for vmalloc and ioremap. 
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().

This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced.  Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.

In theory, PUD and PMD level helpers can be added later if needed by other
architectures.  For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.

[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()]
  Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Suggested-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Kiryl Shutsemau &lt;kas@kernel.org&gt;
Reviewed-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: bibo mao &lt;maobibo@loongson.cn&gt;
Cc: Borislav Betkov &lt;bp@alien8.de&gt;
Cc: Christoph Lameter (Ampere) &lt;cl@gentwo.org&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Dmitriy Vyukov &lt;dvyukov@google.com&gt;
Cc: Gwan-gyeong Mun &lt;gwan-gyeong.mun@intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: Thomas Huth &lt;thuth@redhat.com&gt;
Cc: "Uladzislau Rezki (Sony)" &lt;urezki@gmail.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix accounting of memmap pages</title>
<updated>2025-08-28T05:45:43+00:00</updated>
<author>
<name>Sumanth Korikkar</name>
<email>sumanthk@linux.ibm.com</email>
</author>
<published>2025-08-07T18:35:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c3576889d87b603cb66b417e08844a53c1077a37'/>
<id>urn:sha1:c3576889d87b603cb66b417e08844a53c1077a37</id>
<content type='text'>
For !CONFIG_SPARSEMEM_VMEMMAP, memmap page accounting is currently done
upfront in sparse_buffer_init().  However, sparse_buffer_alloc() may
return NULL in failure scenario.

Also, memmap pages may be allocated either from the memblock allocator
during early boot or from the buddy allocator.  When removed via
arch_remove_memory(), accounting of memmap pages must reflect the original
allocation source.

To ensure correctness:
* Account memmap pages after successful allocation in sparse_init_nid()
  and section_activate().
* Account memmap pages in section_deactivate() based on allocation
  source.

Link: https://lkml.kernel.org/r/20250807183545.1424509-1-sumanthk@linux.ibm.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Sumanth Korikkar &lt;sumanthk@linux.ibm.com&gt;
Suggested-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: do pre-HVO for bootmem allocated pages</title>
<updated>2025-03-17T05:06:29+00:00</updated>
<author>
<name>Frank van der Linden</name>
<email>fvdl@google.com</email>
</author>
<published>2025-02-28T18:29:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b1222550fbf73840e31a103305d03ad53b8f5a59'/>
<id>urn:sha1:b1222550fbf73840e31a103305d03ad53b8f5a59</id>
<content type='text'>
For large systems, the overhead of vmemmap pages for hugetlb is
substantial.  It's about 1.5% of memory, which is about 45G for a 3T
system.  If you want to configure most of that system for hugetlb (e.g. 
to use as backing memory for VMs), there is a chance of running out of
memory on boot, even though you know that the 45G will become available
later.

To avoid this scenario, and since it's a waste to first allocate and then
free that 45G during boot, do pre-HVO for hugetlb bootmem allocated pages
('gigantic' pages).

pre-HVO is done by adding functions that are called from
sparse_init_nid_early and sparse_init_nid_late.  The first is called
before memmap allocation, so it takes care of allocating memmap HVO-style.
The second verifies that all bootmem pages look good, specifically it
checks that they do not intersect with multiple zones.  This can only be
done from sparse_init_nid_late path, when zones have been initialized.

The hugetlb page size must be aligned to the section size, and aligned to
the size of memory described by the number of page structures contained in
one PMD (since pre-HVO is not prepared to split PMDs).  This should be
true for most 'gigantic' pages, it is for 1G pages on x86, where both of
these alignment requirements are 128M.

This will only have an effect if hugetlb_bootmem_alloc was called early in
boot.  If not, it won't do anything, and HVO for bootmem hugetlb pages
works as before.

Link: https://lkml.kernel.org/r/20250228182928.2645936-20-fvdl@google.com
Signed-off-by: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin (Cruise) &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/sparse: add vmemmap_*_hvo functions</title>
<updated>2025-03-17T05:06:28+00:00</updated>
<author>
<name>Frank van der Linden</name>
<email>fvdl@google.com</email>
</author>
<published>2025-02-28T18:29:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9eb6207b7812cee13431831c9661d21b53f3e93f'/>
<id>urn:sha1:9eb6207b7812cee13431831c9661d21b53f3e93f</id>
<content type='text'>
Add a few functions to enable early HVO:

vmemmap_populate_hvo
vmemmap_undo_hvo
vmemmap_wrprotect_hvo

The populate and undo functions are expected to be used in early init,
from the sparse_init_nid_early() function.  The wrprotect function is to
be used, potentially, later.

To implement these functions, mostly re-use the existing compound pages
vmemmap logic used by DAX.  vmemmap_populate_address has its argument
changed a bit in this commit: the page structure passed in to be reused in
the mapping is replaced by a PFN and a flag.  The flag indicates whether
an extra ref should be taken on the vmemmap page containing the head page
structure.  Taking the ref is appropriate to for DAX / ZONE_DEVICE, but
not for HugeTLB HVO.

The HugeTLB vmemmap optimization maps tail page structure pages read-only.
The vmemmap_wrprotect_hvo function that does this is implemented
separately, because it cannot be guaranteed that reserved page structures
will not be write accessed during memory initialization.  Even with
CONFIG_DEFERRED_STRUCT_PAGE_INIT, they might still be written to (if they
are at the bottom of a zone).  So, vmemmap_populate_hvo leaves the tail
page structure pages RW initially, and then later during initialization,
after memmap init is fully done, vmemmap_wrprotect_hvo must be called to
finish the job.

Subsequent commits will use these functions for early HugeTLB HVO.

Link: https://lkml.kernel.org/r/20250228182928.2645936-15-fvdl@google.com
Signed-off-by: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin (Cruise) &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/sparse: allow for alternate vmemmap section init at boot</title>
<updated>2025-03-17T05:06:27+00:00</updated>
<author>
<name>Frank van der Linden</name>
<email>fvdl@google.com</email>
</author>
<published>2025-02-28T18:29:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d65917c42373f70159a3fc453f8f028fd665e04f'/>
<id>urn:sha1:d65917c42373f70159a3fc453f8f028fd665e04f</id>
<content type='text'>
Add functions that are called just before the per-section memmap is
initialized and just before the memmap page structures are initialized. 
They are called sparse_vmemmap_init_nid_early and
sparse_vmemmap_init_nid_late, respectively.

This allows for mm subsystems to add calls to initialize memmap and page
structures in a specific way, if using SPARSEMEM_VMEMMAP.  Specifically,
hugetlb can pre-HVO bootmem allocated pages that way, so that no time and
resources are wasted on allocating vmemmap pages, only to free them later
(and possibly unnecessarily running the system out of memory in the
process).

Refactor some code and export a few convenience functions for external
use.

In sparse_init_nid, skip any sections that are already initialized, e.g. 
they have been initialized by sparse_vmemmap_init_nid_early already.

The hugetlb code to use these functions will be added in a later commit.

Export section_map_size, as any alternate memmap init code will want to
use it.

The internal config option to enable this is SPARSEMEM_VMEMMAP_PREINIT,
which is selected if an architecture-specific option,
ARCH_WANT_HUGETLB_VMEMMAP_PREINIT, is set.  In the future, if other
subsystems want to do preinit too, they can do it in a similar fashion.

The internal config option is there because a section flag is used, and
the number of flags available is architecture-dependent (see mmzone.h). 
Architecures can decide if there is room for the flag when enabling
options that select SPARSEMEM_VMEMMAP_PREINIT.

Fortunately, as of right now, all sparse vmemmap using architectures do
have room.

Link: https://lkml.kernel.org/r/20250228182928.2645936-11-fvdl@google.com
Signed-off-by: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin (Cruise) &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
