<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/vmstat.c, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-23T10:21:36+00:00</updated>
<entry>
<title>mm/page_alloc/vmstat: simplify refresh_cpu_vm_stats change detection</title>
<updated>2026-01-23T10:21:36+00:00</updated>
<author>
<name>Joshua Hahn</name>
<email>joshua.hahnjy@gmail.com</email>
</author>
<published>2025-10-14T14:50:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a72a8ddf1888d56744ad7242be8f88d0c9df17b'/>
<id>urn:sha1:2a72a8ddf1888d56744ad7242be8f88d0c9df17b</id>
<content type='text'>
commit 0acc67c4030c39f39ac90413cc5d0abddd3a9527 upstream.

Patch series "mm/page_alloc: Batch callers of free_pcppages_bulk", v5.

Motivation &amp; Approach
=====================

While testing workloads with high sustained memory pressure on large
machines in the Meta fleet (1Tb memory, 316 CPUs), we saw an unexpectedly
high number of softlockups.  Further investigation showed that the zone
lock in free_pcppages_bulk was being held for a long time, and was called
to free 2k+ pages over 100 times just during boot.

This causes starvation in other processes for the zone lock, which can
lead to the system stalling as multiple threads cannot make progress
without the locks.  We can see these issues manifesting as warnings:

[ 4512.591979] rcu: INFO: rcu_sched self-detected stall on CPU
[ 4512.604370] rcu:     20-....: (9312 ticks this GP) idle=a654/1/0x4000000000000000 softirq=309340/309344 fqs=5426
[ 4512.626401] rcu:              hardirqs   softirqs   csw/system
[ 4512.638793] rcu:      number:        0        145            0
[ 4512.651177] rcu:     cputime:       30      10410          174   ==&gt; 10558(ms)
[ 4512.666657] rcu:     (t=21077 jiffies g=783665 q=1242213 ncpus=316)

While these warnings don't indicate a crash or a kernel panic, they do
point to the underlying issue of lock contention.  To prevent starvation
in both locks, batch the freeing of pages using pcp-&gt;batch.

Because free_pcppages_bulk is called with the pcp lock and acquires the
zone lock, relinquishing and reacquiring the locks are only effective when
both of them are broken together (unless the system was built with queued
spinlocks).  Thus, instead of modifying free_pcppages_bulk to break both
locks, batch the freeing from its callers instead.

A similar fix has been implemented in the Meta fleet, and we have seen
significantly less softlockups.

Testing
=======
The following are a few synthetic benchmarks, made on three machines. The
first is a large machine with 754GiB memory and 316 processors.
The second is a relatively smaller machine with 251GiB memory and 176
processors. The third and final is the smallest of the three, which has 62GiB
memory and 36 processors.

On all machines, I kick off a kernel build with -j$(nproc).
Negative delta is better (faster compilation).

Large machine (754GiB memory, 316 processors)
make -j$(nproc)
+------------+---------------+-----------+
| Metric (s) | Variation (%) | Delta(%)  |
+------------+---------------+-----------+
| real       |        0.8070 |  - 1.4865 |
| user       |        0.2823 |  + 0.4081 |
| sys        |        5.0267 |  -11.8737 |
+------------+---------------+-----------+

Medium machine (251GiB memory, 176 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.2806 |  +0.0351 |
| user       |        0.0994 |  +0.3170 |
| sys        |        0.6229 |  -0.6277 |
+------------+---------------+----------+

Small machine (62GiB memory, 36 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.1503 |  -2.6585 |
| user       |        0.0431 |  -2.2984 |
| sys        |        0.1870 |  -3.2013 |
+------------+---------------+----------+

Here, variation is the coefficient of variation, i.e.  standard deviation
/ mean.

Based on these results, it seems like there are varying degrees to how
much lock contention this reduces.  For the largest and smallest machines
that I ran the tests on, it seems like there is quite some significant
reduction.  There is also some performance increases visible from
userspace.

Interestingly, the performance gains don't scale with the size of the
machine, but rather there seems to be a dip in the gain there is for the
medium-sized machine.  One possible theory is that because the high
watermark depends on both memory and the number of local CPUs, what
impacts zone contention the most is not these individual values, but
rather the ratio of mem:processors.


This patch (of 5):

Currently, refresh_cpu_vm_stats returns an int, indicating how many
changes were made during its updates.  Using this information, callers
like vmstat_update can heuristically determine if more work will be done
in the future.

However, all of refresh_cpu_vm_stats's callers either (a) ignore the
result, only caring about performing the updates, or (b) only care about
whether changes were made, but not *how many* changes were made.

Simplify the code by returning a bool instead to indicate if updates
were made.

In addition, simplify fold_diff and decay_pcp_high to return a bool
for the same reason.

Link: https://lkml.kernel.org/r/20251014145011.3427205-1-joshua.hahnjy@gmail.com
Link: https://lkml.kernel.org/r/20251014145011.3427205-2-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn &lt;joshua.hahnjy@gmail.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Brendan Jackman &lt;jackmanb@google.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Stable-dep-of: 038a102535eb ("mm/page_alloc: prevent pcp corruption with SMP=n")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: re-enable kswapd when memory pressure subsides or demotion is toggled</title>
<updated>2025-09-21T21:22:29+00:00</updated>
<author>
<name>Chanwon Park</name>
<email>flyinrm@gmail.com</email>
</author>
<published>2025-09-08T10:04:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e7a5f249e6db3b41a4618763e0f840639f3578f4'/>
<id>urn:sha1:e7a5f249e6db3b41a4618763e0f840639f3578f4</id>
<content type='text'>
If kswapd fails to reclaim pages from a node MAX_RECLAIM_RETRIES in a
row, kswapd on that node gets disabled. That is, the system won't wakeup
kswapd for that node until page reclamation is observed at least once.
That reclamation is mostly done by direct reclaim, which in turn enables
kswapd back.

However, on systems with CXL memory nodes, workloads with high anon page
usage can disable kswapd indefinitely, without triggering direct
reclaim. This can be reproduced with following steps:

   numa node 0   (32GB memory, 48 CPUs)
   numa node 2~5 (512GB CXL memory, 128GB each)
   (numa node 1 is disabled)
   swap space 8GB

   1) Set /sys/kernel/mm/demotion_enabled to 0.
   2) Set /proc/sys/kernel/numa_balancing to 0.
   3) Run a process that allocates and random accesses 500GB of anon
      pages.
   4) Let the process exit normally.

During 3), free memory on node 0 gets lower than low watermark, and
kswapd runs and depletes swap space. Then, kswapd fails consecutively
and gets disabled. Allocation afterwards happens on CXL memory, so node
0 never gains more memory pressure to trigger direct reclaim.

After 4), kswapd on node 0 remains disabled, and tasks running on that
node are unable to swap. If you turn on NUMA_BALANCING_MEMORY_TIERING
and demotion now, it won't work properly since kswapd is disabled.

To mitigate this problem, reset kswapd_failures to 0 on following
conditions:

   a) ZONE_BELOW_HIGH bit of a zone in hopeless node with a fallback
      memory node gets cleared.
   b) demotion_enabled is changed from false to true.

Rationale for a):
   ZONE_BELOW_HIGH bit being cleared might be a sign that the node may
   be reclaimable afterwards. This won't help much if the memory-hungry
   process keeps running without freeing anything, but at least the node
   will go back to reclaimable state when the process exits.

Rationale for b):
   When demotion_enabled is false, kswapd can only reclaim anon pages by
   swapping them out to swap space. If demotion_enabled is turned on,
   kswapd can demote anon pages to another node for reclaiming. So, the
   original failure count for determining reclaimability is no longer
   valid.

Since kswapd_failures resets may be missed by ++ operation, it is
changed from int to atomic_t.

[akpm@linux-foundation.org: tweak whitespace]
Link: https://lkml.kernel.org/r/aL6qGi69jWXfPc4D@pcw-MS-7D22
Signed-off-by: Chanwon Park &lt;flyinrm@gmail.com&gt;
Cc: Brendan Jackman &lt;jackmanb@google.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add vmstat for kernel_file pages</title>
<updated>2025-09-13T23:55:20+00:00</updated>
<author>
<name>Boris Burkov</name>
<email>boris@bur.io</email>
</author>
<published>2025-08-21T21:55:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e3a9ac4e866ea746475b4026819a8c08ec1142e6'/>
<id>urn:sha1:e3a9ac4e866ea746475b4026819a8c08ec1142e6</id>
<content type='text'>
Kernel file pages are tricky to track because they are indistinguishable
from files whose usage is accounted to the root cgroup.

To maintain good accounting, introduce a vmstat counter tracking kernel
file pages.

Confirmed that these work as expected at a high level by mounting a btrfs
using AS_KERNEL_FILE for metadata pages, and seeing the counter rise with
fs usage then go back to a minimal level after drop_caches and finally
down to 0 after unmounting the fs.

Link: https://lkml.kernel.org/r/08ff633e3a005ed5f7691bfd9f58a5df8e474339.1755812945.git.boris@bur.io
Signed-off-by: Boris Burkov &lt;boris@bur.io&gt;
Suggested-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Tested-by: syzbot@syzkaller.appspotmail.com
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Qu Wenruo &lt;wqu@suse.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting</title>
<updated>2025-09-13T23:54:42+00:00</updated>
<author>
<name>Ruan Shiyang</name>
<email>ruansy.fnst@fujitsu.com</email>
</author>
<published>2025-07-29T03:51:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=337135e6124b6d37d7ef1cd5a6c0b9681938c5ee'/>
<id>urn:sha1:337135e6124b6d37d7ef1cd5a6c0b9681938c5ee</id>
<content type='text'>
Goto-san reported confusing pgpromote statistics where the
pgpromote_success count significantly exceeded pgpromote_candidate.

On a system with three nodes (nodes 0-1: DRAM 4GB, node 2: NVDIMM 4GB):
 # Enable demotion only
 echo 1 &gt; /sys/kernel/mm/numa/demotion_enabled
 numactl -m 0-1 memhog -r200 3500M &gt;/dev/null &amp;
 pid=$!
 sleep 2
 numactl memhog -r100 2500M &gt;/dev/null &amp;
 sleep 10
 kill -9 $pid # terminate the 1st memhog
 # Enable promotion
 echo 2 &gt; /proc/sys/kernel/numa_balancing

After a few seconds, we observeed `pgpromote_candidate &lt; pgpromote_success`
$ grep -e pgpromote /proc/vmstat
pgpromote_success 2579
pgpromote_candidate 0

In this scenario, after terminating the first memhog, the conditions for
pgdat_free_space_enough() are quickly met, and triggers promotion. 
However, these migrated pages are only counted for in PGPROMOTE_SUCCESS,
not in PGPROMOTE_CANDIDATE.

To solve these confusing statistics, introduce PGPROMOTE_CANDIDATE_NRL to
count the missed promotion pages.  And also, not counting these pages into
PGPROMOTE_CANDIDATE is to avoid changing the existing algorithm or
performance of the promotion rate limit.

Link: https://lkml.kernel.org/r/20250901090122.124262-1-ruansy.fnst@fujitsu.com
Link: https://lkml.kernel.org/r/20250729035101.1601407-1-ruansy.fnst@fujitsu.com
Fixes: c6833e10008f ("memory tiering: rate limit NUMA migration throughput")
Co-developed-by: Li Zhijian &lt;lizhijian@fujitsu.com&gt;
Signed-off-by: Li Zhijian &lt;lizhijian@fujitsu.com&gt;
Signed-off-by: Ruan Shiyang &lt;ruansy.fnst@fujitsu.com&gt;
Reported-by: Yasunori Gotou (Fujitsu) &lt;y-goto@fujitsu.com&gt;
Suggested-by: Huang Ying &lt;ying.huang@linux.alibaba.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Huang Ying &lt;ying.huang@linux.alibaba.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Valentin Schneider &lt;vschneid@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, vmstat: remove the NR_WRITEBACK_TEMP node_stat_item counter</title>
<updated>2025-07-20T01:59:47+00:00</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2025-06-25T15:51:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8356a5a3b078ca89c526dd6d71e9a76fec571c37'/>
<id>urn:sha1:8356a5a3b078ca89c526dd6d71e9a76fec571c37</id>
<content type='text'>
The only user of the counter (FUSE) was removed in commit 0c58a97f919c
("fuse: remove tmp folio for writebacks and internal rb tree") so follow
the established pattern of removing the counter and hardcoding 0 in
meminfo output, as done recently with NR_BOUNCE.  Update documentation for
procfs, including for the value for Bounce that was missed when removing
its counter.

Also remove the mention of NR_WRITEBACK_TEMP implications from a comment
in wb_position_ratio(). The rest of the comment there about fuse setting
bdi-&gt;max_ratio to 1% is still correct.

[vbabka@suse.cz: v2]
  Link: https://lkml.kernel.org/r/5a848e15-6a57-4ecb-a015-d4f358b8a5d3@suse.cz
Link: https://lkml.kernel.org/r/20250625-nr_writeback_removal-v1-1-7f2a0df70faa@suse.cz
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Brendan Jackman &lt;jackmanb@google.com&gt;
Cc: Danilo Krummrich &lt;dakr@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Joanne Koong &lt;joannelkoong@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kirill A. Shuemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Maxim Patlasov &lt;mpatlasov@parallels.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zach O'Keefe &lt;zokeefe@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmstat: utilize designated initializers for the vmstat_text array</title>
<updated>2025-07-20T01:59:46+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2025-06-03T08:45:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ed6a9068a0fc01c27f49f97a84b3b80b0fda2787'/>
<id>urn:sha1:ed6a9068a0fc01c27f49f97a84b3b80b0fda2787</id>
<content type='text'>
The vmstat_text array defines labels for counters displayed in
/proc/vmstat.  The current definition of the array implies a specific
order of the counters in their enums, making it fragile.

To make it clear which counter the label is for, use designated
initializers.

Link: https://lkml.kernel.org/r/20250603084556.113975-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: strictly check vmstat_text array size</title>
<updated>2025-07-20T01:59:46+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2025-05-29T11:05:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8a63ff68e5ead16ab384dcbed6103f9ad371e246'/>
<id>urn:sha1:8a63ff68e5ead16ab384dcbed6103f9ad371e246</id>
<content type='text'>
/proc/vmstat displays counters from various sources.  It is easy to forget
to add or remove a label when a new counter is added or removed.

There is a BUILD_BUG_ON() function that catches missing labels.  However,
for some reason, it ignores extra labels.

Let's make the check strict.  This would help to catch issues when a
counter is removed.

Link: https://lkml.kernel.org/r/20250529110541.2960330-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Kirill A. Shuemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmstat: make MEMCG select VM_EVENT_COUNTERS</title>
<updated>2025-07-20T01:59:46+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2025-06-04T09:51:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fdc5001b002eaca9989b5f3563245662fd1e4d40'/>
<id>urn:sha1:fdc5001b002eaca9989b5f3563245662fd1e4d40</id>
<content type='text'>
The vmstat_text array contains labels for counters displayed in
/proc/vmstat.  It is important to keep the labels in sync with the
counters.

There is a BUILD_BUG_ON() check in vmstat_start() that ensures the size of
the vmstat_text is not smaller than VM_EVENT_COUNTERS.  This helps to
catch cases where a new counter is added but the label is not.  However,
it does not help if a counter is removed but the label remains.

It would be nice to make the BUILD_BUG_ON() check more strict to catch
such cases.  However, when compiling with MEMCG enabled but
VM_EVENT_COUNTERS disabled, the vmstat_text array is larger than
NR_VMSTAT_ITEMS.

This issue arises because some elements of the vmstat_text array are
present when either MEMCG or VM_EVENT_COUNTERS is enabled, but
NR_VMSTAT_ITEMS only accounts for these elements if VM_EVENT_COUNTERS is
enabled.

Instead of adjusting the NR_VMSTAT_ITEMS definition to account for MEMCG,
make MEMCG select VM_EVENT_COUNTERS.  VM_EVENT_COUNTERS is enabled in most
configurations anyway.

Link: https://lkml.kernel.org/r/20250604095111.533783-1-kirill.shutemov@linux.intel.com
Fixes: ebc5d83d0443 ("mm/memcontrol: use vmstat names for printing statistics")
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Tested-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "sched/numa: add statistics of numa balance task"</title>
<updated>2025-07-10T04:07:56+00:00</updated>
<author>
<name>Chen Yu</name>
<email>yu.c.chen@intel.com</email>
</author>
<published>2025-07-04T13:56:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db6cc3f4ac2e6cdc898fc9cbc8b32ae1bf56bdad'/>
<id>urn:sha1:db6cc3f4ac2e6cdc898fc9cbc8b32ae1bf56bdad</id>
<content type='text'>
This reverts commit ad6b26b6a0a79166b53209df2ca1cf8636296382.

This commit introduces per-memcg/task NUMA balance statistics, but
unfortunately it introduced a NULL pointer exception due to the following
race condition: After a swap task candidate was chosen, its mm_struct
pointer was set to NULL due to task exit.  Later, when performing the
actual task swapping, the p-&gt;mm caused the problem.

CPU0                                   CPU1
:
...
task_numa_migrate
     task_numa_find_cpu
      task_numa_compare
        # a normal task p is chosen
        env-&gt;best_task = p

                                          # p exit:
                                          exit_signals(p);
                                             p-&gt;flags |= PF_EXITING
                                          exit_mm
                                             p-&gt;mm = NULL;

      migrate_swap_stop
        __migrate_swap_task((arg-&gt;src_task, arg-&gt;dst_cpu)
         count_memcg_event_mm(p-&gt;mm, NUMA_TASK_SWAP)# p-&gt;mm is NULL

task_lock() should be held and the PF_EXITING flag needs to be checked to
prevent this from happening.  After discussion, the conclusion was that
adding a lock is not worthwhile for some statistics calculations.  Revert
the change and rely on the tracepoint for this purpose.

Link: https://lkml.kernel.org/r/20250704135620.685752-1-yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/20250708064917.BBD13C4CEED@smtp.kernel.org
Fixes: ad6b26b6a0a7 ("sched/numa: add statistics of numa balance task")
Signed-off-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Reported-by: Jirka Hladky &lt;jhladky@redhat.com&gt;
Closes: https://lore.kernel.org/all/CAE4VaGBLJxpd=NeRJXpSCuw=REhC5LWJpC29kDy-Zh2ZDyzQZA@mail.gmail.com/
Reported-by: Srikanth Aithal &lt;Srikanth.Aithal@amd.com&gt;
Reported-by: Suneeth D &lt;Suneeth.D@amd.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Hladky &lt;jhladky@redhat.com&gt;
Cc: Libo Chen &lt;libo.chen@oracle.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix vmstat after removing NR_BOUNCE</title>
<updated>2025-06-06T05:02:22+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2025-05-29T10:38:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=684088173b1852e5f5891a7b2a124b7974cb827d'/>
<id>urn:sha1:684088173b1852e5f5891a7b2a124b7974cb827d</id>
<content type='text'>
Hongyu noticed that the nr_unaccepted counter kept growing even in the
absence of unaccepted memory on the machine.

This happens due to a commit that removed NR_BOUNCE: it removed the
counter from the enum zone_stat_item, but left it in the vmstat_text
array.

As a result, all counters below nr_bounce in /proc/vmstat are shifted by
one line, causing the numa_hit counter to be labeled as nr_unaccepted.

To fix this issue, remove nr_bounce from the vmstat_text array.

Link: https://lkml.kernel.org/r/20250529103832.2937460-1-kirill.shutemov@linux.intel.com
Fixes: 194df9f66db8 ("mm: remove NR_BOUNCE zone stat")
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-by: Hongyu Ning &lt;hongyu.ning@linux.intel.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hannes Reinecke &lt;hare@suse.de&gt;
Cc: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
