<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/hugetlb.h, branch v7.0-rc7</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-02-19T04:50:32+00:00</updated>
<entry>
<title>Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2026-02-19T04:50:32+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-19T04:50:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eeccf287a2a517954b57cf9d733b3cf5d47afa34'/>
<id>urn:sha1:eeccf287a2a517954b57cf9d733b3cf5d47afa34</id>
<content type='text'>
Pull more MM  updates from Andrew Morton:

 - "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
   couple of issues in the demotion code - pages were failed demotion
   and were finding themselves demoted into disallowed nodes (Bing Jiao)

 - "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
   mapledtree race and performs a number of cleanups (Liam Howlett)

 - "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
   them" implements a lot of cleanups following on from the conversion
   of the VMA flags into a bitmap (Lorenzo Stoakes)

 - "support batch checking of references and unmapping for large folios"
   implements batching to greatly improve the performance of reclaiming
   clean file-backed large folios (Baolin Wang)

 - "selftests/mm: add memory failure selftests" does as claimed (Miaohe
   Lin)

* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
  mm/page_alloc: clear page-&gt;private in free_pages_prepare()
  selftests/mm: add memory failure dirty pagecache test
  selftests/mm: add memory failure clean pagecache test
  selftests/mm: add memory failure anonymous page test
  mm: rmap: support batched unmapping for file large folios
  arm64: mm: implement the architecture-specific clear_flush_young_ptes()
  arm64: mm: support batch clearing of the young flag for large folios
  arm64: mm: factor out the address and ptep alignment into a new helper
  mm: rmap: support batched checks of the references for large folios
  tools/testing/vma: add VMA userland tests for VMA flag functions
  tools/testing/vma: separate out vma_internal.h into logical headers
  tools/testing/vma: separate VMA userland tests into separate files
  mm: make vm_area_desc utilise vma_flags_t only
  mm: update all remaining mmap_prepare users to use vma_flags_t
  mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
  mm: update secretmem to use VMA flags on mmap_prepare
  mm: update hugetlbfs to use VMA flags on mmap_prepare
  mm: add basic VMA flag operation helper functions
  tools: bitmap: add missing bitmap_[subset(), andnot()]
  mm: add mk_vma_flags() bitmap flag macro helper
  ...
</content>
</entry>
<entry>
<title>Merge tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl</title>
<updated>2026-02-18T18:45:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-18T18:45:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=23b0f90ba871f096474e1c27c3d14f455189d2d9'/>
<id>urn:sha1:23b0f90ba871f096474e1c27c3d14f455189d2d9</id>
<content type='text'>
Pull sysctl updates from Joel Granados:

 - Remove macros from proc handler converters

   Replace the proc converter macros with "regular" functions. Though it
   is more verbose than the macro version, it helps when debugging and
   better aligns with coding-style.rst.

 - General cleanup

   Remove superfluous ctl_table forward declarations. Const qualify the
   memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
   Add missing kernel doc to proc_dointvec_conv.

 - Testing

   This series was run through sysctl selftests/kunit test suite in
   x86_64. And went into linux-next after rc4, giving it a good 3 weeks
   of testing

* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
  sysctl: Replace unidirectional INT converter macros with functions
  sysctl: Add kernel doc to proc_douintvec_conv
  sysctl: Replace UINT converter macros with functions
  sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
  sysctl: clarify proc_douintvec_minmax doc
  sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
  sysctl: Remove unused ctl_table forward declarations
  loadpin: Implement custom proc_handler for enforce
  alloc_tag: move memory_allocation_profiling_sysctls into .rodata
  sysctl: Add missing kernel-doc for proc_dointvec_conv
</content>
</entry>
<entry>
<title>mm: update hugetlbfs to use VMA flags on mmap_prepare</title>
<updated>2026-02-12T23:42:57+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2026-01-22T16:06:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=097e8db5e22b03d6791abc243183f597d0f76a7b'/>
<id>urn:sha1:097e8db5e22b03d6791abc243183f597d0f76a7b</id>
<content type='text'>
In order to update all mmap_prepare users to utilising the new VMA flags
type vma_flags_t and associated helper functions, we start by updating
hugetlbfs which has a lot of additional logic that requires updating to
make this change.

This is laying the groundwork for eliminating the vm_flags_t from struct
vm_area_desc and using vma_flags_t only, which further lays the ground for
removing the deprecated vm_flags_t type altogether.

No functional changes intended.

Link: https://lkml.kernel.org/r/9226bec80c9aa3447cc2b83354f733841dba8a50.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Cc: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Cc: Jarkko Sakkinen &lt;jarkko@kernel.org&gt;
Cc: Yury Norov &lt;ynorov@nvidia.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Pedro Falcato &lt;pfalcato@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "mm/hugetlb: deal with multiple calls to hugetlb_bootmem_alloc"</title>
<updated>2026-01-27T04:02:20+00:00</updated>
<author>
<name>Mike Rapoport (Microsoft)</name>
<email>rppt@kernel.org</email>
</author>
<published>2026-01-11T08:21:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6'/>
<id>urn:sha1:743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6</id>
<content type='text'>
This reverts commit d58b2498200724e4f8c12d71a5953da03c8c8bdf.

hugetlb_bootmem_alloc() is called only once, no need to check if it was
called already at its entry.

Other checks performed during HVO initialization are also no longer
necessary because sparse_init() that calls hugetlb_vmemmap_init_early()
and hugetlb_vmemmap_init_late() is always called after
hugetlb_bootmem_alloc().

Link: https://lkml.kernel.org/r/20260111082105.290734-30-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Acked-by: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Alex Shi &lt;alexs@kernel.org&gt;
Cc: Andreas Larsson &lt;andreas@gaisler.com&gt;
Cc: "Borislav Petkov (AMD)" &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Dinh Nguyen &lt;dinguyen@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Klara Modin &lt;klarasmodin@gmail.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Magnus Lindholm &lt;linmag7@gmail.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Pratyush Yadav &lt;pratyush@kernel.org&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: "Ritesh Harjani (IBM)" &lt;ritesh.list@gmail.com&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Stafford Horne &lt;shorne@gmail.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vineet Gupta &lt;vgupta@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, arch: consolidate hugetlb CMA reservation</title>
<updated>2026-01-27T04:02:19+00:00</updated>
<author>
<name>Mike Rapoport (Microsoft)</name>
<email>rppt@kernel.org</email>
</author>
<published>2026-01-11T08:21:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9fac145b6d3fe570277438f8d860eabf229dc545'/>
<id>urn:sha1:9fac145b6d3fe570277438f8d860eabf229dc545</id>
<content type='text'>
Every architecture that supports hugetlb_cma command line parameter
reserves CMA areas for hugetlb during setup_arch().

This obfuscates the ordering of hugetlb CMA initialization with respect to
the rest initialization of the core MM.

Introduce arch_hugetlb_cma_order() callback to allow architectures report
the desired order-per-bit of CMA areas and provide a week implementation
of arch_hugetlb_cma_order() for architectures that don't support hugetlb
with CMA.

Use this callback in hugetlb_cma_reserve() instead if passing the order as
parameter and call hugetlb_cma_reserve() from mm_core_init_early() rather
than have it spread over architecture specific code.

Link: https://lkml.kernel.org/r/20260111082105.290734-28-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Alex Shi &lt;alexs@kernel.org&gt;
Cc: Andreas Larsson &lt;andreas@gaisler.com&gt;
Cc: "Borislav Petkov (AMD)" &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Dinh Nguyen &lt;dinguyen@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Klara Modin &lt;klarasmodin@gmail.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Magnus Lindholm &lt;linmag7@gmail.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Pratyush Yadav &lt;pratyush@kernel.org&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: "Ritesh Harjani (IBM)" &lt;ritesh.list@gmail.com&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Stafford Horne &lt;shorne@gmail.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vineet Gupta &lt;vgupta@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, hugetlb: implement movable_gigantic_pages sysctl</title>
<updated>2026-01-21T03:24:50+00:00</updated>
<author>
<name>Gregory Price</name>
<email>gourry@gourry.net</email>
</author>
<published>2025-12-21T12:56:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e80e66ddaf736e5ca80cba8adf8d497bd53092f'/>
<id>urn:sha1:9e80e66ddaf736e5ca80cba8adf8d497bd53092f</id>
<content type='text'>
This reintroduces a concept removed by: commit d6cb41cc44c6 ("mm, hugetlb:
remove hugepages_treat_as_movable sysctl")

This sysctl provides flexibility between ZONE_MOVABLE use cases:
1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
2) onlining memory in ZONE_MOVABLE to make hugepage allocate reliable

When ZONE_MOVABLE is used to make huge page allocation more reliable,
disallowing gigantic pages memory in this region is pointless.  If hotplug
is not a requirement, we can loosen the restrictions to allow 1GB gigantic
pages in ZONE_MOVABLE.

Since 1GB can be difficult to migrate / has impacts on compaction /
defragmentation, we don't enable this by default.  Notably, 1GB pages can
only be migrated if another 1GB page is available - so hot-unplug will
fail if such a page cannot be found.

However, since there are scenarios where gigantic pages are migratable, we
should allow use of these on movable regions.

When not valid 1GB is available for migration, hot-unplug will retry
indefinitely (or until interrupted).  For example:

  echo 0 &gt; node0/hugepages/..-1GB/nr_hugepages  # clear node0 1GB pages
  echo 1 &gt; node1/hugepages/..-1GB/nr_hugepages  # reserve node1 1GB page
  ./alloc_huge_node1 &amp;    # Allocate a 1GB page on node1
  ./node1_offline  &amp;      # attempt to offline all node1 memory
  echo 1 &gt; node0/hugepages/..-1GB/nr_hugepages  # reserve node0 1GB page

In this example, node1_offline will block indefinitely until the final
step, when a node0 1GB page is made available.

Note: Boot-time CMA is not possible for driver-managed hotplug memory, as
CMA requires the memory to be registered as SystemRAM at boot time. 
Additionally, 1GB huge pages are not supported by THP.

Link: https://lkml.kernel.org/r/20251221125603.2364174-1-gourry@gourry.net
Signed-off-by: Gregory Price &lt;gourry@gourry.net&gt;
Suggested-by: David Rientjes &lt;rientjes@google.com&gt;
Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
Acked-by: David Hildenbrand (Red Hat) &lt;david@kernel.org&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: "David Hildenbrand (Red Hat)" &lt;david@kernel.org&gt;
Cc: Gregory Price &lt;gourry@gourry.net&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather</title>
<updated>2026-01-20T17:34:26+00:00</updated>
<author>
<name>David Hildenbrand (Red Hat)</name>
<email>david@kernel.org</email>
</author>
<published>2025-12-23T21:40:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ce720d5bd91e9dc16db3604aa4b1bf76770a9a1'/>
<id>urn:sha1:8ce720d5bd91e9dc16db3604aa4b1bf76770a9a1</id>
<content type='text'>
As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix
huge_pmd_unshare() vs GUP-fast race") we can end up in some situations
where we perform so many IPI broadcasts when unsharing hugetlb PMD page
tables that it severely regresses some workloads.

In particular, when we fork()+exit(), or when we munmap() a large
area backed by many shared PMD tables, we perform one IPI broadcast per
unshared PMD table.

There are two optimizations to be had:

(1) When we process (unshare) multiple such PMD tables, such as during
    exit(), it is sufficient to send a single IPI broadcast (as long as
    we respect locking rules) instead of one per PMD table.

    Locking prevents that any of these PMD tables could get reused before
    we drop the lock.

(2) When we are not the last sharer (&gt; 2 users including us), there is
    no need to send the IPI broadcast. The shared PMD tables cannot
    become exclusive (fully unshared) before an IPI will be broadcasted
    by the last sharer.

    Concurrent GUP-fast could walk into a PMD table just before we
    unshared it. It could then succeed in grabbing a page from the
    shared page table even after munmap() etc succeeded (and supressed
    an IPI). But there is not difference compared to GUP-fast just
    sleeping for a while after grabbing the page and re-enabling IRQs.

    Most importantly, GUP-fast will never walk into page tables that are
    no-longer shared, because the last sharer will issue an IPI
    broadcast.

    (if ever required, checking whether the PUD changed in GUP-fast
     after grabbing the page like we do in the PTE case could handle
     this)

So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
infrastructure so we can implement these optimizations and demystify the
code at least a bit. Extend the mmu_gather infrastructure to be able to
deal with our special hugetlb PMD table sharing implementation.

To make initialization of the mmu_gather easier when working on a single
VMA (in particular, when dealing with hugetlb), provide
tlb_gather_mmu_vma().

We'll consolidate the handling for (full) unsharing of PMD tables in
tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
in "struct mmu_gather" whether we had (full) unsharing of PMD tables.

Because locking is very special (concurrent unsharing+reuse must be
prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
require an explicit earlier call to tlb_flush_unshared_tables().

From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
that the expected lock protecting us from concurrent unsharing+reuse is
still held.

Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
tlb_flush_unshared_tables() was properly called earlier.

Document it all properly.

Notes about tlb_remove_table_sync_one() interaction with unsharing:

There are two fairly tricky things:

(1) tlb_remove_table_sync_one() is a NOP on architectures without
    CONFIG_MMU_GATHER_RCU_TABLE_FREE.

    Here, the assumption is that the previous TLB flush would send an
    IPI to all relevant CPUs. Careful: some architectures like x86 only
    send IPIs to all relevant CPUs when tlb-&gt;freed_tables is set.

    The relevant architectures should be selecting
    MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
    kernels and it might have been problematic before this patch.

    Also, the arch flushing behavior (independent of IPIs) is different
    when tlb-&gt;freed_tables is set. Do we have to enlighten them to also
    take care of tlb-&gt;unshared_tables? So far we didn't care, so
    hopefully we are fine. Of course, we could be setting
    tlb-&gt;freed_tables as well, but that might then unnecessarily flush
    too much, because the semantics of tlb-&gt;freed_tables are a bit
    fuzzy.

    This patch changes nothing in this regard.

(2) tlb_remove_table_sync_one() is not a NOP on architectures with
    CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.

    Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
    we still issue IPIs during TLB flushes and don't actually need the
    second tlb_remove_table_sync_one().

    This optimized can be implemented on top of this, by checking e.g., in
    tlb_remove_table_sync_one() whether we really need IPIs. But as
    described in (1), it really must honor tlb-&gt;freed_tables then to
    send IPIs to all relevant CPUs.

Notes on TLB flushing changes:

(1) Flushing for non-shared PMD tables

    We're converting from flush_hugetlb_tlb_range() to
    tlb_remove_huge_tlb_entry(). Given that we properly initialize the
    MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to
    __unmap_hugepage_range(), that should be fine.

(2) Flushing for shared PMD tables

    We're converting from various things (flush_hugetlb_tlb_range(),
    tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range().

    tlb_flush_pmd_range() achieves the same that
    tlb_remove_huge_tlb_entry() would achieve in these scenarios.
    Note that tlb_remove_huge_tlb_entry() also calls
    __tlb_remove_tlb_entry(), however that is only implemented on
    powerpc, which does not support PMD table sharing.

    Similar to (1), tlb_gather_mmu_vma() should make sure that TLB
    flushing keeps on working as expected.

Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a
concern, as we are holding the i_mmap_lock the whole time, preventing
concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed
separately as a cleanup later.

There are plenty more cleanups to be had, but they have to wait until
this is fixed.

[david@kernel.org: fix kerneldoc]
  Link: https://lkml.kernel.org/r/f223dd74-331c-412d-93fc-69e360a5006c@kernel.org
Link: https://lkml.kernel.org/r/20251223214037.580860-5-david@kernel.org
Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race")
Signed-off-by: David Hildenbrand (Red Hat) &lt;david@kernel.org&gt;
Reported-by: Uschakow, Stanislav" &lt;suschako@amazon.de&gt;
Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/
Tested-by: Laurence Oberman &lt;loberman@redhat.com&gt;
Acked-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Lance Yang &lt;lance.yang@linux.dev&gt;
Cc: Liu Shixin &lt;liushixin2@huawei.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: fix hugetlb_pmd_shared()</title>
<updated>2026-01-20T17:34:26+00:00</updated>
<author>
<name>David Hildenbrand (Red Hat)</name>
<email>david@kernel.org</email>
</author>
<published>2025-12-23T21:40:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca1a47cd3f5f4c46ca188b1c9a27af87d1ab2216'/>
<id>urn:sha1:ca1a47cd3f5f4c46ca188b1c9a27af87d1ab2216</id>
<content type='text'>
Patch series "mm/hugetlb: fixes for PMD table sharing (incl.  using
mmu_gather)", v3.

One functional fix, one performance regression fix, and two related
comment fixes.

I cleaned up my prototype I recently shared [1] for the performance fix,
deferring most of the cleanups I had in the prototype to a later point. 
While doing that I identified the other things.

The goal of this patch set is to be backported to stable trees "fairly"
easily. At least patch #1 and #4.

Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing
Patch #2 + #3 are simple comment fixes that patch #4 interacts with.
Patch #4 is a fix for the reported performance regression due to excessive
IPI broadcasts during fork()+exit().

The last patch is all about TLB flushes, IPIs and mmu_gather.
Read: complicated

There are plenty of cleanups in the future to be had + one reasonable
optimization on x86. But that's all out of scope for this series.

Runtime tested, with a focus on fixing the performance regression using
the original reproducer [2] on x86.


This patch (of 4):

We switched from (wrongly) using the page count to an independent shared
count.  Now, shared page tables have a refcount of 1 (excluding
speculative references) and instead use ptdesc-&gt;pt_share_count to identify
sharing.

We didn't convert hugetlb_pmd_shared(), so right now, we would never
detect a shared PMD table as such, because sharing/unsharing no longer
touches the refcount of a PMD table.

Page migration, like mbind() or migrate_pages() would allow for migrating
folios mapped into such shared PMD tables, even though the folios are not
exclusive.  In smaps we would account them as "private" although they are
"shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the
pagemap interface.

Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared().

Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org
Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1]
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2]
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: David Hildenbrand (Red Hat) &lt;david@kernel.org&gt;
Reviewed-by: Rik van Riel &lt;riel@surriel.com&gt;
Reviewed-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Tested-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Reviewed-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Tested-by: Laurence Oberman &lt;loberman@redhat.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Liu Shixin &lt;liushixin2@huawei.com&gt;
Cc: Uschakow, Stanislav" &lt;suschako@amazon.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sysctl: Remove unused ctl_table forward declarations</title>
<updated>2026-01-05T12:54:41+00:00</updated>
<author>
<name>Joel Granados</name>
<email>joel.granados@kernel.org</email>
</author>
<published>2025-11-03T21:29:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f7386f545e49e5e6229a14d92b39340d155b0b3f'/>
<id>urn:sha1:f7386f545e49e5e6229a14d92b39340d155b0b3f</id>
<content type='text'>
Remove superfluous forward declarations of ctl_table from header files
where they are no longer needed. These declarations were left behind
after sysctl code refactoring and cleanup.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Muchun Song &lt;muchun.song@linux.dev&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Joel Granados &lt;joel.granados@kernel.org&gt;
</content>
</entry>
<entry>
<title>hugetlb: add __read_mostly to sysctl_hugetlb_shm_group</title>
<updated>2025-11-24T23:08:55+00:00</updated>
<author>
<name>Gregory Price</name>
<email>gourry@gourry.net</email>
</author>
<published>2025-11-21T19:48:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=348ced3da52b3161f5ceec8868e81973ce48e11d'/>
<id>urn:sha1:348ced3da52b3161f5ceec8868e81973ce48e11d</id>
<content type='text'>
sysctl bits are mostly-read values.

Link: https://lkml.kernel.org/r/20251121194859.265259-2-gourry@gourry.net
Signed-off-by: Gregory Price &lt;gourry@gourry.net&gt;
Suggested-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
