<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/page_alloc.c, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-21T18:31:29+00:00</updated>
<entry>
<title>mm/page_alloc: only update NUMA min ratios on sysctl write</title>
<updated>2026-06-21T18:31:29+00:00</updated>
<author>
<name>Jianlin Shi</name>
<email>shijianlin11@foxmail.com</email>
</author>
<published>2026-06-04T04:01:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=874611d193f29666184bcbcb8638eff87d63d019'/>
<id>urn:sha1:874611d193f29666184bcbcb8638eff87d63d019</id>
<content type='text'>
The sysctl handlers for min_unmapped_ratio and min_slab_ratio invoke
setup_min_unmapped_ratio() and setup_min_slab_ratio() unconditionally
after proc_dointvec_minmax(), even for read operations.

These setup functions first zero all per-node thresholds
(min_unmapped_pages and min_slab_pages) and then recalculate them.
Merely reading the sysctl files under /proc therefore resets the node
reclaim thresholds to zero for the duration of the recomputation, which
may disturb __node_reclaim() and node_reclaim() if they run during it.

Fix this by only calling the setup functions when the sysctl is actually
written (write == 1), matching the behavior of existing sysctl handlers
like min_free_kbytes and watermark_scale_factor.

This only affects systems with CONFIG_NUMA.
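
A minimal userspace sketch of the pattern follows.  It is not the kernel
code: the two-node, four-zone layout, the page counts, the nr_recomputes
counter and the handler name min_unmapped_ratio_handler() are
hypothetical, and proc_dointvec_minmax() parsing is reduced to a
new_value argument.  It only mirrors the zero-then-accumulate shape of
setup_min_unmapped_ratio() and the added write check.

```c
/*
 * Userspace model, not kernel code.  Layout, page counts and names are
 * hypothetical; only the "zero, then accumulate per zone" shape and the
 * write check follow the commit message.
 */
#define NR_NODES 2
#define NR_ZONES 4

static const int zone_nid[NR_ZONES] = { 0, 0, 1, 1 };
static const long zone_managed[NR_ZONES] = { 1000, 3000, 2000, 2000 };

static long node_min_unmapped_pages[NR_NODES];
static int sysctl_min_unmapped_ratio = 1;	/* percent */
static int nr_recomputes;			/* how often setup ran */

/* Zero every node, then add each zone's share.  Between the two loops a
 * concurrent reader would observe a threshold of 0. */
static void setup_min_unmapped_ratio(void)
{
	int i;

	nr_recomputes++;
	for (i = 0; i != NR_NODES; i++)
		node_min_unmapped_pages[i] = 0;

	for (i = 0; i != NR_ZONES; i++)
		node_min_unmapped_pages[zone_nid[i]] +=
			zone_managed[i] * sysctl_min_unmapped_ratio / 100;
}

/* new_value models what proc_dointvec_minmax() would parse on a write. */
static int min_unmapped_ratio_handler(int write, int new_value)
{
	int rc = 0;

	if (write) {
		if (0 > new_value || new_value > 100)
			rc = -22;	/* -EINVAL */
		else
			sysctl_min_unmapped_ratio = new_value;
	}

	/* The fix: errors and plain reads leave the thresholds alone. */
	if (rc || !write)
		return rc;

	setup_min_unmapped_ratio();
	return 0;
}
```

With the check in place a read returns before setup_min_unmapped_ratio()
runs, so the thresholds never pass through zero just because someone
looked at the sysctl.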

Link: https://lore.kernel.org/tencent_5891052AF9A4C2D490A62F478D446F74AB09@qq.com
Signed-off-by: Jianlin Shi &lt;shijianlin11@foxmail.com&gt;
Cc: Brendan Jackman &lt;jackmanb@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@kernel.org&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list</title>
<updated>2026-06-21T18:31:28+00:00</updated>
<author>
<name>Hao Ge</name>
<email>hao.ge@linux.dev</email>
</author>
<published>2026-06-04T02:40:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c3a350d13ce8bb7c3427597819b0dfbc19ba242'/>
<id>urn:sha1:0c3a350d13ce8bb7c3427597819b0dfbc19ba242</id>
<content type='text'>
Pages allocated before page_ext is available have their codetag left
uninitialized.  Track these early PFNs and clear their codetag in
clear_early_alloc_pfn_tag_refs() to avoid "alloc_tag was not set" warnings
when they are freed later.

Currently a fixed-size array of 8192 entries is used, with a warning if
the limit is exceeded.  However, the number of early allocations depends
on the number of CPUs and can be larger than 8192.

Replace the fixed-size array with a dynamically allocated linked list of
pfn_pool structs.  Each node is allocated via alloc_page() and mapped to a
pfn_pool containing a next pointer, an atomic slot counter, and a PFN
array that fills the remainder of the page.

The tracking pages themselves are allocated via alloc_page(), which would
trigger __pgalloc_tag_add() -&gt; alloc_tag_add_early_pfn() and recurse
indefinitely.  Introduce __GFP_NO_CODETAG (reuses the %__GFP_NO_OBJ_EXT
bit) and pass gfp_flags through pgalloc_tag_add() so that the early path
can skip recording allocations that carry this flag.
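
The list structure described above can be sketched in userspace as
follows.  This is an illustration under assumptions, not the patch:
nodes come from calloc() instead of alloc_page() with __GFP_NO_CODETAG,
the slot counter is a plain unsigned int because the model is
single-threaded (the kernel uses an atomic counter), POOL_PAGE_SIZE is
assumed to be 4096, and record_early_pfn() and drain_early_pfns() are
hypothetical names.

```c
#include "stdlib.h"

/*
 * Userspace sketch of the pfn_pool idea: a singly linked list of
 * page-sized nodes, each holding a small header plus as many PFNs as fit.
 * Single-threaded model; the kernel version uses an atomic slot counter.
 */
#define POOL_PAGE_SIZE 4096

struct pfn_pool {
	struct pfn_pool *next;
	unsigned int count;		/* slots used in this node */
	unsigned long pfns[];		/* fills the rest of the page */
};

#define PFNS_PER_POOL \
	((POOL_PAGE_SIZE - sizeof(struct pfn_pool)) / sizeof(unsigned long))

static struct pfn_pool *pool_head;

/* Record one early PFN, pushing a new node when the head node is full.
 * In the kernel this allocation must carry __GFP_NO_CODETAG, otherwise
 * it would be recorded itself and recurse. */
static int record_early_pfn(unsigned long pfn)
{
	struct pfn_pool *pool = pool_head;

	if (!pool || pool->count == PFNS_PER_POOL) {
		pool = calloc(1, POOL_PAGE_SIZE);
		if (!pool)
			return -12;	/* -ENOMEM */
		pool->next = pool_head;
		pool_head = pool;
	}
	pool->pfns[pool->count++] = pfn;
	return 0;
}

/* Visit every recorded PFN (where the kernel would clear the codetag),
 * free the nodes, and return how many PFNs were visited. */
static unsigned long drain_early_pfns(void (*clear_tag)(unsigned long pfn))
{
	unsigned long visited = 0;

	while (pool_head) {
		struct pfn_pool *pool = pool_head;
		unsigned int i;

		for (i = 0; i != pool->count; i++) {
			if (clear_tag)
				clear_tag(pool->pfns[i]);
			visited++;
		}
		pool_head = pool->next;
		free(pool);
	}
	return visited;
}
```

Because each node is a whole page, capacity grows one page at a time
instead of being capped at a fixed 8192 entries.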

Link: https://lore.kernel.org/20260604024008.46592-1-hao.ge@linux.dev
Signed-off-by: Hao Ge &lt;hao.ge@linux.dev&gt;
Suggested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: fix deferred compaction accounting</title>
<updated>2026-06-09T01:21:26+00:00</updated>
<author>
<name>fujunjie</name>
<email>fujunjie1@qq.com</email>
</author>
<published>2026-05-26T09:12:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d4f1a54160046d5059ec6c5f2152e054e7b12d7'/>
<id>urn:sha1:3d4f1a54160046d5059ec6c5f2152e054e7b12d7</id>
<content type='text'>
COMPACT_DEFERRED means compaction did not start because past failures
caused the zone to be deferred.  try_to_compact_pages() returns the
maximum result seen while walking the zonelist, so a final
COMPACT_DEFERRED result means no later zone reported that compaction
actually ran.

__alloc_pages_direct_compact() skips COMPACTSTALL and COMPACTFAIL
accounting when try_to_compact_pages() returns COMPACT_SKIPPED, but not
when it returns COMPACT_DEFERRED.  A deferred-only direct compaction
attempt can therefore look like a stall, and then a failure if the
allocation still cannot be satisfied.

Treat COMPACT_DEFERRED like COMPACT_SKIPPED in this accounting path.  If a
later zone runs compaction and returns a result above COMPACT_DEFERRED, or
compact_zone_order() reports COMPACT_SUCCESS for a captured page, the
final result is not COMPACT_DEFERRED and the existing accounting still
runs.
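
The accounting rule can be illustrated with a small model.  The enum
below is assumed to follow the ordering of enum compact_result in
include/linux/compaction.h, and max_zone_result() and
count_compact_stall() are hypothetical helpers standing in for the
relevant parts of try_to_compact_pages() and
__alloc_pages_direct_compact().

```c
/* Ordering assumed to mirror enum compact_result; treat as a sketch. */
enum compact_result {
	COMPACT_NOT_SUITABLE_ZONE,
	COMPACT_SKIPPED,
	COMPACT_DEFERRED,
	COMPACT_NO_SUITABLE_PAGE,
	COMPACT_CONTINUE,
	COMPACT_COMPLETE,
	COMPACT_PARTIAL_SKIPPED,
	COMPACT_CONTENDED,
	COMPACT_SUCCESS,
};

/* Like the zonelist walk: keep the maximum per-zone result. */
static enum compact_result max_zone_result(const enum compact_result *zone_rc,
					   int nr_zones)
{
	enum compact_result rc = COMPACT_SKIPPED;
	int i;

	for (i = 0; i != nr_zones; i++)
		if (zone_rc[i] > rc)
			rc = zone_rc[i];
	return rc;
}

/* Returns 1 if COMPACTSTALL (and later possibly COMPACTFAIL) should be
 * counted.  Before the fix only COMPACT_SKIPPED was excluded here. */
static int count_compact_stall(enum compact_result rc)
{
	if (rc == COMPACT_SKIPPED || rc == COMPACT_DEFERRED)
		return 0;
	return 1;
}
```

A walk in which every zone was deferred or skipped ends at
COMPACT_DEFERRED and is no longer counted, while any zone that actually
ran compaction lifts the maximum above COMPACT_DEFERRED and keeps the
existing accounting.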

Link: https://lore.kernel.org/tencent_368AF1F3821E46232637BE16D65C45CF3308@qq.com
Fixes: 06dac2f467fe ("mm: compaction: update the COMPACT[STALL|FAIL] events properly")
Signed-off-by: fujunjie &lt;fujunjie1@qq.com&gt;
Reviewed-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Brendan Jackman &lt;jackmanb@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/compaction: respect cpusets when checking retry suitability</title>
<updated>2026-06-09T01:21:25+00:00</updated>
<author>
<name>fujunjie</name>
<email>fujunjie1@qq.com</email>
</author>
<published>2026-05-26T12:22:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ed384eb3a3e121c1d6d5c5d36950fbd286b92026'/>
<id>urn:sha1:ed384eb3a3e121c1d6d5c5d36950fbd286b92026</id>
<content type='text'>
should_compact_retry() handles COMPACT_SKIPPED by asking
compaction_zonelist_suitable() whether reclaim can make a later compaction
attempt worthwhile.  That answer is used for the current allocation, so it
should follow the same zone eligibility rules as the allocation itself.

When cpusets are enabled, allocator slowpath decisions are marked with
ALLOC_CPUSET.  The allocation path, direct compaction and reclaim retry
all skip zones rejected by __cpuset_zone_allowed().

compaction_zonelist_suitable() does not apply that filter.  It only walks
ac-&gt;zonelist/ac-&gt;nodemask, so it can return true because a zone that is
not usable for the current allocation would pass __compaction_suitable().

This does not let the allocation use the disallowed zone, because the
later allocation and direct compaction paths still apply cpuset
filtering.  However, it can make should_compact_retry() retry based on
memory that this allocation cannot use.

Pass gfp_mask down and apply the same ALLOC_CPUSET check in
compaction_zonelist_suitable().  This keeps the retry decision aligned
with the zones that the allocation is allowed to use.

The change was also checked with a temporary debugfs probe that called
both the old and the new compaction_zonelist_suitable() predicates in a
two-node NUMA guest.  The task was restricted to mems=0 while
ac-&gt;nodemask covered nodes 0-1.  After putting memory pressure on node0,
node0 failed __compaction_suitable() for an order-10 request, while node1
passed it but was rejected by __cpuset_zone_allowed().  In that state the
old predicate returned true and the patched predicate returned false.
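
The probe scenario above can be replayed with a tiny model of the
predicate.  Everything here is hypothetical: struct zone_model,
zonelist_suitable(), the mems_allowed[] array standing in for
__cpuset_zone_allowed(), and the check_cpuset flag standing in for
ALLOC_CPUSET being set with cpusets enabled.

```c
/*
 * Hypothetical model: each zone carries its node id and the outcome of
 * __compaction_suitable(); mems_allowed[nid] models the cpuset check.
 */
struct zone_model {
	int nid;
	int compaction_suitable;
};

static int zonelist_suitable(const struct zone_model *zones, int nr_zones,
			     const int *mems_allowed, int check_cpuset)
{
	int i;

	for (i = 0; i != nr_zones; i++) {
		/* The fix: skip zones the cpuset forbids, as allocation does. */
		if (check_cpuset) {
			if (!mems_allowed[zones[i].nid])
				continue;
		}
		if (zones[i].compaction_suitable)
			return 1;
	}
	return 0;
}
```

With node0 unsuitable and node1 suitable but outside the cpuset, the
unfiltered walk returns 1 and the filtered walk returns 0, matching the
old and patched results reported for the probe.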

Link: https://lore.kernel.org/tencent_F59F2BA2CC5779308E10DF54593C736D3E0A@qq.com
Fixes: 435b3894e742 ("mm:page_alloc: fix the NULL ac-&gt;nodemask in __alloc_pages_slowpath()")
Signed-off-by: fujunjie &lt;fujunjie1@qq.com&gt;
Reviewed-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Brendan Jackman &lt;jackmanb@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: remove VM_BUG_ON()s from pindex helpers</title>
<updated>2026-06-04T21:45:03+00:00</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2026-05-26T11:28:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=62f272d2fbffa7494e4d01c35a3a7b30d71b30a1'/>
<id>urn:sha1:62f272d2fbffa7494e4d01c35a3a7b30d71b30a1</id>
<content type='text'>
Vlastimil pointed out that VM_BUG_ON() assertions have fallen out of
favour, so remove them from the pindex helpers.

Link: https://lore.kernel.org/20260526-page_alloc-unmapped-prep-v2-1-412f4d486115@google.com
Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Suggested-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Link: https://lore.kernel.org/all/4074a816-9e75-45a6-8141-25459bcc106b@kernel.org/
Reviewed-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: fix defrag_mode for non-reclaimable allocations</title>
<updated>2026-06-04T21:44:59+00:00</updated>
<author>
<name>Dmitry Ilvokhin</name>
<email>d@ilvokhin.com</email>
</author>
<published>2026-05-20T12:22:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4c0ed883e0516aee79496b6277cbea63a08b2676'/>
<id>urn:sha1:4c0ed883e0516aee79496b6277cbea63a08b2676</id>
<content type='text'>
When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
migratetype fallbacks and keep pageblocks clean.  The allocator relies on
reclaim and compaction to free pages of the correct type before allowing
fallback as a last resort.

However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
direct reclaim or compaction.  With defrag_mode=1, these allocations hit
the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.

This causes a large number of SLUB allocation failures for
skbuff_head_cache under network-heavy workloads, despite free memory being
available in other migratetype freelists.

We observed this on a few of the Meta workloads that adopted
defrag_mode=1.

For one service under load there were 85509 SLUB allocation failure
messages in dmesg within 2 hours.  All of them were GFP_ATOMIC
allocations for skbuff_head_cache, despite free pages being available
in other migratetype freelists (~13 GB free).

Since this is the networking path, in practice it means dropped packets,
failed RPC requests, tail latency spikes and overall service
degradation.

Clear ALLOC_NOFRAGMENT and retry for allocations that request kswapd
reclaim but cannot do direct reclaim themselves (GFP_ATOMIC).  Purely
speculative allocations like GFP_TRANSHUGE_LIGHT that don't set
__GFP_KSWAPD_RECLAIM are left to fail, since they have reasonable
fallbacks and should not cause fragmentation.
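
The decision at the !can_direct_reclaim bailout can be summarized with a
simplified model.  It is not the patch itself: the real code operates on
gfp_t and alloc_flags bitmasks, while here each relevant flag is an int
field, and struct alloc_req, next_step() and the enum names are
hypothetical.

```c
/*
 * Simplified model of the !can_direct_reclaim bailout.  Each relevant
 * gfp/alloc flag is an int field; all names here are hypothetical.
 */
struct alloc_req {
	int direct_reclaim;	/* __GFP_DIRECT_RECLAIM set */
	int kswapd_reclaim;	/* __GFP_KSWAPD_RECLAIM set */
	int nofragment;		/* ALLOC_NOFRAGMENT, enforced by defrag_mode=1 */
};

enum slowpath_step {
	STEP_RECLAIM_AND_COMPACT,
	STEP_RETRY_WITH_FALLBACK,
	STEP_FAIL,
};

static enum slowpath_step next_step(struct alloc_req *req)
{
	if (req->direct_reclaim)
		return STEP_RECLAIM_AND_COMPACT;

	/* GFP_ATOMIC-like: drop ALLOC_NOFRAGMENT once and retry, which
	 * allows a migratetype fallback. */
	if (req->nofragment) {
		if (req->kswapd_reclaim) {
			req->nofragment = 0;
			return STEP_RETRY_WITH_FALLBACK;
		}
	}

	/* Speculative requests such as GFP_TRANSHUGE_LIGHT still fail. */
	return STEP_FAIL;
}
```

A GFP_ATOMIC-like request (no direct reclaim, kswapd reclaim set) gets
one retry with ALLOC_NOFRAGMENT cleared, while a GFP_TRANSHUGE_LIGHT-like
request (neither reclaim flag) keeps ALLOC_NOFRAGMENT and fails.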

Link: https://lore.kernel.org/20260520122228.201550-1-d@ilvokhin.com
Fixes: e3aa7df331bc ("mm: page_alloc: defrag_mode")
Signed-off-by: Dmitry Ilvokhin &lt;d@ilvokhin.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Brendan Jackman &lt;jackmanb@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: document that alloc_pages_nolock() uses RCU</title>
<updated>2026-06-02T22:22:20+00:00</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2026-05-19T14:17:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=47166f2199557e57cbab2882b033fb2949818fbb'/>
<id>urn:sha1:47166f2199557e57cbab2882b033fb2949818fbb</id>
<content type='text'>
The allocator interacts with cgroups, which rely on RCU.  RCU does not work
everywhere, so the "any context" claim is slightly overstated here.

This should already be enforced by objtool: since this function is not
marked noinstr, the x86 build should fail if it is called from a place
where RCU is not watching.  But expecting readers to make that connection
for themselves seems a bit cruel (I don't think there is any
documentation of what noinstr means at all, let alone of its connection
with RCU).

Note that this does not claim that any cgroup code called from the
allocator would actually break if this restriction were violated; it
could very well be that there is no real way for the allocator to act on
a cgroup that can disappear concurrently.  But since it's likely that
nobody has verified this one way or the other, it is better to be safe
and declare that RCU is required.  Allocating from an RCU-unsafe context
seems a bit crazy anyway.

Link: https://lore.kernel.org/20260519-nolock-rcu-comment-v1-1-4a630c8794e5@google.com
Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Suggested-by: Junaid Shahid &lt;junaids@google.com&gt;
Acked-by: Harry Yoo (Oracle) &lt;harry@kernel.org&gt;
Acked-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: drop a misleading __always_inline</title>
<updated>2026-06-02T22:22:20+00:00</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2026-05-17T23:37:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d231522bf07287c5bcf7c6af6960f476663324b5'/>
<id>urn:sha1:d231522bf07287c5bcf7c6af6960f476663324b5</id>
<content type='text'>
get_pfnblock_migratetype() is called from outside page_alloc.c, so it
cannot always be inlined.  Remove the annotation to avoid misleading
readers.

At least in my minimal config, with GCC, this doesn't change
mm/page_alloc.o at all.

Link: https://lore.kernel.org/all/20260517-b4-drop-always-inline-v1-1-97b90930e8b8@google.com/
Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Suggested-by: Vlastimil Babka &lt;vbabka@kernel.org&gt;
Link: https://lore.kernel.org/all/016c8bef-57ef-44ef-bf60-86dbfd368dcd@kernel.org/
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: SeongJae Park &lt;sj@kernel.org&gt;
Reviewed-by: Vishal Moola &lt;vishal.moola@gmail.com&gt;
Reviewed-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: remove ifdefs from pindex helpers</title>
<updated>2026-06-02T22:22:19+00:00</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2026-05-13T12:35:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=248b144a8a6dc534d8bc1c1470efe571de5b7ae6'/>
<id>urn:sha1:248b144a8a6dc534d8bc1c1470efe571de5b7ae6</id>
<content type='text'>
The ifdefs are not technically needed here: everything they guard is
always defined.

Switching to IS_ENABLED() makes the code a bit less tiresome to read.
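
As a userspace illustration of why IS_ENABLED() can replace #ifdef here,
the sketch below copies a simplified form of the preprocessor trick from
include/linux/kconfig.h (built-in options only; the real IS_ENABLED()
also checks the _MODULE variant).  CONFIG_FEATURE_ON, CONFIG_FEATURE_OFF
and the two helper functions are hypothetical.

```c
/* Simplified kconfig.h trick: expands to 1 if the option is defined to
 * 1, else 0.  The real IS_ENABLED() also checks option##_MODULE. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x) ___is_defined(x)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define IS_ENABLED(option) __is_defined(option)

#define CONFIG_FEATURE_ON 1
/* CONFIG_FEATURE_OFF is deliberately left undefined. */

/* Hypothetical helpers: the guarded branch is always parsed and
 * type-checked, and can be optimized out when the option is off. */
static unsigned int pindex_on_example(int order)
{
	if (IS_ENABLED(CONFIG_FEATURE_ON)) {
		if (order > 3)
			return 100 + order;
	}
	return (unsigned int)order;
}

static unsigned int pindex_off_example(int order)
{
	if (IS_ENABLED(CONFIG_FEATURE_OFF)) {
		if (order > 3)
			return 100 + order;
	}
	return (unsigned int)order;
}
```

Because IS_ENABLED() expands to a constant 0 or 1, plain C conditionals
can stand in for preprocessor blocks whenever everything they reference
is always defined, and the compiler still checks both sides.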

Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-4-dacdf5402be8@google.com
Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Reviewed-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;kasong@tencent.com&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Liam R. Howlett &lt;liam@infradead.org&gt;
Cc: Lorenzo Stoakes &lt;ljs@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: rejig pageblock mask definitions</title>
<updated>2026-06-02T22:22:19+00:00</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2026-05-13T12:35:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3687c0fd67249cb971990b382a47f02f19ed9f67'/>
<id>urn:sha1:3687c0fd67249cb971990b382a47f02f19ed9f67</id>
<content type='text'>
- Add a PAGEBLOCK_ prefix to the names to avoid polluting the "global
  namespace" too much.

- With the new prefix, MIGRATETYPE_AND_ISO_MASK would get rather long.
  That global mask only exists for a very specific purpose and is an odd
  thing to give a name to anyway, so drop it and use the newly defined
  PAGEBLOCK_ISO_MASK instead.

Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-3-dacdf5402be8@google.com
Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Reviewed-by: Vlastimil Babka (SUSE) &lt;vbabka@kernel.org&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;kasong@tencent.com&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Liam R. Howlett &lt;liam@infradead.org&gt;
Cc: Lorenzo Stoakes &lt;ljs@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
