<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/vm_event_item.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-11-11T08:00:37+00:00</updated>
<entry>
<title>mm: count zeromap read and set for swapout and swapin</title>
<updated>2024-11-11T08:00:37+00:00</updated>
<author>
<name>Barry Song</name>
<email>v-songbaohua@oppo.com</email>
</author>
<published>2024-11-07T01:12:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e7ac4daeed91a25382091e73818ea0cddb1afd5e'/>
<id>urn:sha1:e7ac4daeed91a25382091e73818ea0cddb1afd5e</id>
<content type='text'>
When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling.  However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in.  In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.

On the other hand, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some app in Android
have more than 6% zero-filled pages.  Before commit 0ca0c24e3211 ("mm:
store zero pages to be swapped out in a bitmap"), both zswap and zRAM
implemented similar optimizations, leading to these optimized-out pages
being counted in either zswap or zRAM counters (with pswpin/pswpout also
increasing for zRAM).  With zeromap functioning prior to both zswap and
zRAM, userspace will no longer detect these swap-out and swap-in actions.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.

2. Use pswpin/pswpout accounting, treating the zero map as a standard
   backend.  This approach aligns with zRAM's current handling of
   same-page fills at the device level.  However, it would mean losing the
   optimized-out page counters previously available in zRAM and would not
   align with systems using zswap.  Additionally, as noted by Nhat Pham,
   pswpin/pswpout counters apply only to I/O done directly to the backend
   device.

3. Count zeromap pages under zswap, aligning with system behavior when
   zswap is enabled.  However, this would not be consistent with zRAM, nor
   would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects
option 1.

We can find these counters from /proc/vmstat (counters for the whole
system) and memcg's memory.stat (counters for the interested memcg).

For example:

$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536

$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985

This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing and
may mislead user-space agents that rely on changes in these counters as
indicators.  Therefore, we add a Fixes tag to encourage the inclusion of
this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com

Link: https://lkml.kernel.org/r/20241107011246.59137-1-21cnbao@gmail.com
Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar &lt;kanchana.p.sridhar@intel.com&gt;
Signed-off-by: Barry Song &lt;v-songbaohua@oppo.com&gt;
Reviewed-by: Nhat Pham &lt;nphamcs@gmail.com&gt;
Reviewed-by: Chengming Zhou &lt;chengming.zhou@linux.dev&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Cc: Hailong Liu &lt;hailong.liu@oppo.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Kairui Song &lt;kasong@tencent.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: split underused THPs</title>
<updated>2024-09-09T23:39:04+00:00</updated>
<author>
<name>Usama Arif</name>
<email>usamaarif642@gmail.com</email>
</author>
<published>2024-08-30T10:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dafff3f4c850c98e15501bc6ee20581f9a013dcc'/>
<id>urn:sha1:dafff3f4c850c98e15501bc6ee20581f9a013dcc</id>
<content type='text'>
This is an attempt to mitigate the issue of running out of memory when THP
is always enabled.  During runtime whenever a THP is being faulted in
(__do_huge_pmd_anonymous_page) or collapsed by khugepaged
(collapse_huge_page), the THP is added to _deferred_list.  Whenever memory
reclaim happens in linux, the kernel runs the deferred_split shrinker
which goes through the _deferred_list.

If the folio was partially mapped, the shrinker attempts to split it.  If
the folio is not partially mapped, the shrinker checks if the THP was
underused, i.e.  how many of the base 4K pages of the entire THP were
zero-filled.  If this number goes above a certain threshold (decided by
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none), the
shrinker will attempt to split that THP.  Then at remap time, the pages
that were zero-filled are mapped to the shared zeropage, hence saving
memory.

Link: https://lkml.kernel.org/r/20240830100438.3623486-6-usamaarif642@gmail.com
Signed-off-by: Usama Arif &lt;usamaarif642@gmail.com&gt;
Suggested-by: Rik van Riel &lt;riel@surriel.com&gt;
Co-authored-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Alexander Zhu &lt;alexlzhu@fb.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Nico Pache &lt;npache@redhat.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Shuang Zhai &lt;zhais@google.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Shuang Zhai &lt;szhai2@cs.rochester.edu&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmstat: kernel stack usage histogram</title>
<updated>2024-09-02T03:25:49+00:00</updated>
<author>
<name>Pasha Tatashin</name>
<email>pasha.tatashin@soleen.com</email>
</author>
<published>2024-07-24T20:33:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4a6fce85640beb4f79e8217272d1ef79839cc17'/>
<id>urn:sha1:c4a6fce85640beb4f79e8217272d1ef79839cc17</id>
<content type='text'>
As part of the dynamic kernel stack project, we need to know the amount of
data that can be saved by reducing the default kernel stack size [1].

Provide a kernel stack usage histogram to aid in optimizing kernel stack
sizes and minimizing memory waste in large-scale environments.  The
histogram divides stack usage into power-of-two buckets and reports the
results in /proc/vmstat.  This information is especially valuable in
environments with millions of machines, where even small optimizations can
have a significant impact.

The histogram data is presented in /proc/vmstat with entries like
"kstack_1k", "kstack_2k", and so on, indicating the number of threads that
exited with stack usage falling within each respective bucket.

Example outputs:
Intel:
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0

ARM with 64K page_size:
$ grep kstack /proc/vmstat
kstack_1k 1
kstack_2k 340
kstack_4k 25212
kstack_8k 1659
kstack_16k 0
kstack_32k 0
kstack_64k 0

Note: once the dynamic kernel stack is implemented it will depend on the
implementation the usability of this feature: On hardware that supports
faults on kernel stacks, we will have other metrics that show the total
number of pages allocated for stacks.  On hardware where faults are not
supported, we will most likely have some optimization where only some
threads are extended, and for those, these metrics will still be very
useful.

[1] https://lwn.net/Articles/974367

Link: https://lkml.kernel.org/r/20240730150158.832783-3-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240724203322.2765486-3-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Reviewed-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Cc: Li Zhijian &lt;lizhijian@fujitsu.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add node_reclaim successes to VM event counters</title>
<updated>2024-09-02T03:25:43+00:00</updated>
<author>
<name>Matthew Cassell</name>
<email>mcassell411@gmail.com</email>
</author>
<published>2024-07-22T17:13:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5fe690a5946442f7f2d08448341dd621367f80bc'/>
<id>urn:sha1:5fe690a5946442f7f2d08448341dd621367f80bc</id>
<content type='text'>
/proc/vmstat currently shows the number of node_reclaim() failures when
vm.zone_reclaim_mode is set appropriately.  It would be convenient to have
the number of successes right next to zone_reclaim_failed (similar to
compaction and migration).

While just a trivially addition to the vmstat file.  It was helpful during
benchmarking to not have to probe node_reclaim() to observe the
success/failure ratio.

Link: https://lkml.kernel.org/r/20240722171316.7517-1-mcassell411@gmail.com
Signed-off-by: Matthew Cassell &lt;mcassell411@gmail.com&gt;
Cc: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Li Zhijian &lt;lizhijian@fujitsu.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcg: add per-memcg zswap writeback stat</title>
<updated>2023-12-12T18:57:02+00:00</updated>
<author>
<name>Domenico Cerasuolo</name>
<email>cerasuolodomenico@gmail.com</email>
</author>
<published>2023-11-30T19:40:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7108cc3f765cafd48a6a35f8add140beaecfa75b'/>
<id>urn:sha1:7108cc3f765cafd48a6a35f8add140beaecfa75b</id>
<content type='text'>
Since zswap now writes back pages from memcg-specific LRUs, we now need a
new stat to show writebacks count for each memcg.

[nphamcs@gmail.com: rename ZSWP_WB to ZSWPWB]
  Link: https://lkml.kernel.org/r/20231205193307.2432803-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-5-nphamcs@gmail.com
Suggested-by: Nhat Pham &lt;nphamcs@gmail.com&gt;
Signed-off-by: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Signed-off-by: Nhat Pham &lt;nphamcs@gmail.com&gt;
Tested-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Reviewed-by: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Dan Streetman &lt;ddstreet@ieee.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Seth Jennings &lt;sjenning@redhat.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Cc: Vitaly Wool &lt;vitaly.wool@konsulko.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmstat: move pgdemote_* to per-node stats</title>
<updated>2023-12-11T00:51:31+00:00</updated>
<author>
<name>Li Zhijian</name>
<email>lizhijian@fujitsu.com</email>
</author>
<published>2023-11-03T03:14:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=23e9f0138963ceef2a252d887534923a0502b2da'/>
<id>urn:sha1:23e9f0138963ceef2a252d887534923a0502b2da</id>
<content type='text'>
Demotion will migrate pages across nodes.  Previously, only the global
demotion statistics were accounted for.  Changed them to per-node
statistics, making it easier to observe where demotion occurs on each
node.

This will help to identify which nodes are under pressure.

This patch also make pgdemote_* behind CONFIG_NUMA_BALANCING, since
demotion is not available for !CONFIG_NUMA_BALANCING

With this patch, here is a sample where node0 node1 are DRAM,
node3 is PMEM:
Global stats:
$ grep demote /proc/vmstat
pgdemote_kswapd 254288
pgdemote_direct 113497
pgdemote_khugepaged 0

Per-node stats:
$ grep demote /sys/devices/system/node/node0/vmstat # demotion source
pgdemote_kswapd 68454
pgdemote_direct 83431
pgdemote_khugepaged 0
$ grep demote /sys/devices/system/node/node1/vmstat # demotion source
pgdemote_kswapd 185834
pgdemote_direct 30066
pgdemote_khugepaged 0
$ grep demote /sys/devices/system/node/node3/vmstat # demotion target
pgdemote_kswapd 0
pgdemote_direct 0
pgdemote_khugepaged 0

Link: https://lkml.kernel.org/r/20231103031450.1456523-1-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian &lt;lizhijian@fujitsu.com&gt;
Acked-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: introduce per-VMA lock statistics</title>
<updated>2023-04-06T03:03:01+00:00</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2023-02-27T17:36:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=52f238653e452e0fda61e880f263a173d219acd1'/>
<id>urn:sha1:52f238653e452e0fda61e880f263a173d219acd1</id>
<content type='text'>
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra statistics
about handling page fault under VMA lock.

Link: https://lkml.kernel.org/r/20230227173632.3292573-29-surenb@google.com
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: vmscan: split khugepaged stats from direct reclaim stats</title>
<updated>2022-11-30T23:58:41+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2022-10-26T18:01:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=57e9cc50f4dd926d6c38751799d25cad89fb2bd9'/>
<id>urn:sha1:57e9cc50f4dd926d6c38751799d25cad89fb2bd9</id>
<content type='text'>
Direct reclaim stats are useful for identifying a potential source for
application latency, as well as spotting issues with kswapd.  However,
khugepaged currently distorts the picture: as a kernel thread it doesn't
impose allocation latencies on userspace, and it explicitly opts out of
kswapd reclaim.  Its activity showing up in the direct reclaim stats is
misleading.  Counting it as kswapd reclaim could also cause confusion when
trying to understand actual kswapd behavior.

Break out khugepaged from the direct reclaim counters into new
pgsteal_khugepaged, pgdemote_khugepaged, pgscan_khugepaged counters.

Test with a huge executable (CONFIG_READ_ONLY_THP_FOR_FS):

pgsteal_kswapd 1342185
pgsteal_direct 0
pgsteal_khugepaged 3623
pgscan_kswapd 1345025
pgscan_direct 0
pgscan_khugepaged 3623

Link: https://lkml.kernel.org/r/20221026180133.377671-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Eric Bergen &lt;ebergen@meta.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove vmacache</title>
<updated>2022-09-27T02:46:18+00:00</updated>
<author>
<name>Liam R. Howlett</name>
<email>Liam.Howlett@Oracle.com</email>
</author>
<published>2022-09-06T19:48:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7964cf8caa4dfa42c4149f3833d3878713cda3dc'/>
<id>urn:sha1:7964cf8caa4dfa42c4149f3833d3878713cda3dc</id>
<content type='text'>
By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code.  Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Tested-by: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add DEVICE_ZONE to FOR_ALL_ZONES</title>
<updated>2022-08-20T22:17:45+00:00</updated>
<author>
<name>Hao Lee</name>
<email>haolee.swjtu@gmail.com</email>
</author>
<published>2022-08-07T15:44:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a39c5d3ce03dd890ab6a9be44b21177cec32da55'/>
<id>urn:sha1:a39c5d3ce03dd890ab6a9be44b21177cec32da55</id>
<content type='text'>
FOR_ALL_ZONES should be consistent with enum zone_type.  Otherwise,
__count_zid_vm_events have the potential to add count to wrong item when
zid is ZONE_DEVICE.

Link: https://lkml.kernel.org/r/20220807154442.GA18167@haolee.io
Signed-off-by: Hao Lee &lt;haolee.swjtu@gmail.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
