<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/memcontrol.h, branch v4.14.217</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.217</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.217'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-01-09T12:37:37+00:00</updated>
<entry>
<title>mm: memcontrol: fix excessive complexity in memory.stat reporting</title>
<updated>2021-01-09T12:37:37+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2020-12-29T02:38:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aa93e9471afe103812badfd10be6509554312e72'/>
<id>urn:sha1:aa93e9471afe103812badfd10be6509554312e72</id>
<content type='text'>
commit a983b5ebee57209c99f68c8327072f25e0e6e3da upstream

We've seen memory.stat reads in top-level cgroups take up to fourteen
seconds during a userspace bug that created tens of thousands of ghost
cgroups pinned by lingering page cache.

Even with a more reasonable number of cgroups, aggregating memory.stat
is unnecessarily heavy.  The complexity is this:

	nr_cgroups * nr_stat_items * nr_possible_cpus

where the stat items are ~70 at this point.  With 128 cgroups and 128
CPUs - decent, not enormous setups - reading the top-level memory.stat
has to aggregate over a million per-cpu counters.  This doesn't scale.

Instead of spreading the source of truth across all CPUs, use the
per-cpu counters merely to batch updates to shared atomic counters.

This is the same as the per-cpu stocks we use for charging memory to the
shared atomic page_counters, and also the way the global vmstat counters
are implemented.

Vmstat has elaborate spilling thresholds that depend on the number of
CPUs, amount of memory, and memory pressure - carefully balancing the
cost of counter updates with the amount of per-cpu error.  That's
because the vmstat counters are system-wide, but also used for decisions
inside the kernel (e.g.  NR_FREE_PAGES in the allocator).  Neither is
true for the memory controller.

Use the same static batch size we already use for page_counter updates
during charging.  The per-cpu error in the stats will be 128k, which is
an acceptable ratio of cores to memory accounting granularity.

[hannes@cmpxchg.org: fix warning in __this_cpu_xchg() calls]
  Link: http://lkml.kernel.org/r/20171201135750.GB8097@cmpxchg.org
Link: http://lkml.kernel.org/r/20171103153336.24044-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: stable@vger.kernel.org
[shaoyi@amazon.com: resolved the conflict brought by commit 17ffa29c355658c8e9b19f56cbf0388500ca7905 in mm/memcontrol.c by contextual fix]
Signed-off-by: Shaoying Xu &lt;shaoyi@amazon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: implement lruvec stat functions on top of each other</title>
<updated>2021-01-09T12:37:37+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2020-12-29T02:35:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=955f8bc9eb69b3be9a7785015c726f7004ec36b9'/>
<id>urn:sha1:955f8bc9eb69b3be9a7785015c726f7004ec36b9</id>
<content type='text'>
commit 284542656e22c43fdada8c8cc0ca9ede8453eed7 upstream

The implementation of the lruvec stat functions and their variants for
accounting through a page, or accounting from a preemptible context, are
mostly identical and needlessly repetitive.

Implement the lruvec_page functions by looking up the page's lruvec and
then using the lruvec function.

Implement the functions for preemptible contexts by disabling preemption
before calling the atomic context functions.

Link: http://lkml.kernel.org/r/20171103153336.24044-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Shaoying Xu &lt;shaoyi@amazon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: eliminate raw access to stat and event counters</title>
<updated>2021-01-09T12:37:37+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2020-12-29T02:33:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fdcda71d87acfe8f645f3ce5e9303688cad8b8f1'/>
<id>urn:sha1:fdcda71d87acfe8f645f3ce5e9303688cad8b8f1</id>
<content type='text'>
commit c9019e9bf42e66d028d70d2da6206cad4dd9250d upstream

Replace all raw 'this_cpu_' modifications of the stat and event per-cpu
counters with API functions such as mod_memcg_state().

This makes the code easier to read, but is also in preparation for the
next patch, which changes the per-cpu implementation of those counters.

Link: http://lkml.kernel.org/r/20171103153336.24044-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Shaoying Xu &lt;shaoyi@amazon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: use int for event/state parameter in several functions</title>
<updated>2017-09-07T00:27:27+00:00</updated>
<author>
<name>Matthias Kaehlcke</name>
<email>mka@chromium.org</email>
</author>
<published>2017-09-06T23:22:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=04fecbf51b3cb79a628d50b65797c4866342b8d2'/>
<id>urn:sha1:04fecbf51b3cb79a628d50b65797c4866342b8d2</id>
<content type='text'>
Several functions use an enum type as parameter for an event/state, but
are called in some locations with an argument of a different enum type.
Adjust the interface of these functions to reality by changing the
parameter to int.

This fixes a ton of enum-conversion warnings that are generated when
building the kernel with clang.

[mka@chromium.org: also change parameter type of inc/dec/mod_memcg_page_state()]
  Link: http://lkml.kernel.org/r/20170728213442.93823-1-mka@chromium.org
Link: http://lkml.kernel.org/r/20170727211004.34435-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke &lt;mka@chromium.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()</title>
<updated>2017-08-18T22:32:01+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2017-08-18T22:15:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=739f79fc9db1b38f96b5a5109b247a650fbebf6d'/>
<id>urn:sha1:739f79fc9db1b38f96b5a5109b247a650fbebf6d</id>
<content type='text'>
Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
to update the memcg stats:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
    IP: test_clear_page_writeback+0x12e/0x2c0
    [...]
    RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
    Call Trace:
     &lt;IRQ&gt;
     end_page_writeback+0x47/0x70
     f2fs_write_end_io+0x76/0x180 [f2fs]
     bio_endio+0x9f/0x120
     blk_update_request+0xa8/0x2f0
     scsi_end_request+0x39/0x1d0
     scsi_io_completion+0x211/0x690
     scsi_finish_command+0xd9/0x120
     scsi_softirq_done+0x127/0x150
     __blk_mq_complete_request_remote+0x13/0x20
     flush_smp_call_function_queue+0x56/0x110
     generic_smp_call_function_single_interrupt+0x13/0x30
     smp_call_function_single_interrupt+0x27/0x40
     call_function_single_interrupt+0x89/0x90
    RIP: 0010:native_safe_halt+0x6/0x10

    (gdb) l *(test_clear_page_writeback+0x12e)
    0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
    614		mod_node_page_state(page_pgdat(page), idx, val);
    615		if (mem_cgroup_disabled() || !page-&gt;mem_cgroup)
    616			return;
    617		mod_memcg_state(page-&gt;mem_cgroup, idx, val);
    618		pn = page-&gt;mem_cgroup-&gt;nodeinfo[page_to_nid(page)];
    619		this_cpu_add(pn-&gt;lruvec_stat-&gt;count[idx], val);
    620	}
    621
    622	unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
    623							gfp_t gfp_mask,

The issue is that writeback doesn't hold a page reference and the page
might get freed after PG_writeback is cleared (and the mapping is
unlocked) in test_clear_page_writeback().  The stat functions looking up
the page's node or zone are safe, as those attributes are static across
allocation and free cycles.  But page-&gt;mem_cgroup is not, and it will
get cleared if we race with truncation or migration.

It appears this race window has been around for a while, but less likely
to trigger when the memcg stats were updated first thing after
PG_writeback is cleared.  Recent changes reshuffled this code to update
the global node stats before the memcg ones, though, stretching the race
window out to an extent where people can reproduce the problem.

Update test_clear_page_writeback() to look up and pin page-&gt;mem_cgroup
before clearing PG_writeback, then not use that pointer afterward.  It
is a partial revert of 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
but leaves the pageref-holding callsites that aren't affected alone.

Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
Fixes: 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Tested-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Reported-by: Bradley Bolen &lt;bradleybolen@gmail.com&gt;
Tested-by: Brad Bolen &lt;bradleybolen@gmail.com&gt;
Cc: Vladimir Davydov &lt;vdavydov@virtuozzo.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.6+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: per-lruvec stats infrastructure</title>
<updated>2017-07-06T23:24:35+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2017-07-06T22:40:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=00f3ca2c2d6635d85108571c4dd9a29088668662'/>
<id>urn:sha1:00f3ca2c2d6635d85108571c4dd9a29088668662</id>
<content type='text'>
lruvecs are at the intersection of the NUMA node and memcg, which is the
scope for most paging activity.

Introduce a convenient accounting infrastructure that maintains
statistics per node, per memcg, and the lruvec itself.

Then convert over accounting sites for statistics that are already
tracked in both nodes and memcgs and can be easily switched.

[hannes@cmpxchg.org: fix crash in the new cgroup stat keeping code]
  Link: http://lkml.kernel.org/r/20170531171450.GA10481@cmpxchg.org
[hannes@cmpxchg.org: don't track uncharged pages at all
  Link: http://lkml.kernel.org/r/20170605175254.GA8547@cmpxchg.org
[hannes@cmpxchg.org: add missing free_percpu()]
  Link: http://lkml.kernel.org/r/20170605175354.GB8547@cmpxchg.org
[linux@roeck-us.net: hexagon: fix build error caused by include file order]
  Link: http://lkml.kernel.org/r/20170617153721.GA4382@roeck-us.net
Link: http://lkml.kernel.org/r/20170530181724.27197-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: use generic mod_memcg_page_state for kmem pages</title>
<updated>2017-07-06T23:24:35+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2017-07-06T22:40:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ed52be7bfd45533b194b429f43361493d24599a7'/>
<id>urn:sha1:ed52be7bfd45533b194b429f43361493d24599a7</id>
<content type='text'>
The kmem-specific functions do the same thing.  Switch and drop.

Link: http://lkml.kernel.org/r/20170530181724.27197-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: use the node-native slab memory counters</title>
<updated>2017-07-06T23:24:35+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2017-07-06T22:40:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=320492961c1cf21da5547b00c23e525851c1d16f'/>
<id>urn:sha1:320492961c1cf21da5547b00c23e525851c1d16f</id>
<content type='text'>
Now that the slab counters are moved from the zone to the node level we
can drop the private memcg node stats and use the official ones.

Link: http://lkml.kernel.org/r/20170530181724.27197-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/oom_kill: count global and memory cgroup oom kills</title>
<updated>2017-07-06T23:24:35+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2017-07-06T22:40:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8e675f7af50747e1e9e96538e8706767e4f80e2c'/>
<id>urn:sha1:8e675f7af50747e1e9e96538e8706767e4f80e2c</id>
<content type='text'>
Show count of oom killer invocations in /proc/vmstat and count of
processes killed in memory cgroup in knob "memory.events" (in
memory.oom_control for v1 cgroup).

Also describe difference between "oom" and "oom_kill" in memory cgroup
documentation.  Currently oom in memory cgroup kills tasks iff shortage
has happened inside page fault.

These counters helps in monitoring oom kills - for now the only way is
grepping for magic words in kernel log.

[akpm@linux-foundation.org: fix for mem_cgroup_count_vm_event() rename]
[akpm@linux-foundation.org: fix comment, per Konstantin]
Link: http://lkml.kernel.org/r/149570810989.203600.9492483715840752937.stgit@buzz
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Roman Guschin &lt;guroan@gmail.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: per-cgroup memory reclaim stats</title>
<updated>2017-07-06T23:24:35+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2017-07-06T22:40:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2262185c5b287f2758afda79c149b7cf6bee165c'/>
<id>urn:sha1:2262185c5b287f2758afda79c149b7cf6bee165c</id>
<content type='text'>
Track the following reclaim counters for every memory cgroup: PGREFILL,
PGSCAN, PGSTEAL, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE and PGLAZYFREED.

These values are exposed using the memory.stats interface of cgroup v2.

The meaning of each value is the same as for global counters, available
using /proc/vmstat.

Also, for consistency, rename mem_cgroup_count_vm_event() to
count_memcg_event_mm().

Link: http://lkml.kernel.org/r/1494530183-30808-1-git-send-email-guro@fb.com
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Suggested-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
