<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/memcontrol.h, branch v5.10.89</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.89</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.89'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-08-26T12:35:57+00:00</updated>
<entry>
<title>mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim</title>
<updated>2021-08-26T12:35:57+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2021-08-20T02:04:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8132fc2bf4b70d19f666e82e7ce28dc871ed3b92'/>
<id>urn:sha1:8132fc2bf4b70d19f666e82e7ce28dc871ed3b92</id>
<content type='text'>
[ Upstream commit f56ce412a59d7d938b81de8878faef128812482c ]

We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups.  This is unexpected and undesirable as memory.low is
supposed to express non-OOMing memory priorities between cgroups.

The reason for this is proportional memory.low reclaim.  When cgroups
are below their memory.low threshold, reclaim passes them over in the
first round, and then retries if it couldn't find pages anywhere else.
But when cgroups are slightly above their memory.low setting, page scan
force is scaled down and diminished in proportion to the overage, to the
point where it can cause reclaim to fail as well - only in that case we
currently don't retry, and instead trigger OOM.

To fix this, hook proportional reclaim into the same retry logic we have
in place for when cgroups are skipped entirely.  This way if reclaim
fails and some cgroups were scanned with diminished pressure, we'll try
another full-force cycle before giving up and OOMing.

[akpm@linux-foundation.org: coding-style fixes]

Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Leon Yang &lt;lnyng@fb.com&gt;
Reviewed-by: Rik van Riel &lt;riel@surriel.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;		[5.4+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument</title>
<updated>2021-03-30T12:31:47+00:00</updated>
<author>
<name>Zhou Guanghui</name>
<email>zhouguanghui1@huawei.com</email>
</author>
<published>2021-03-13T05:08:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6143a1d193e9ecc18250516594655c5a6fbc3a7b'/>
<id>urn:sha1:6143a1d193e9ecc18250516594655c5a6fbc3a7b</id>
<content type='text'>
commit be6c8982e4ab9a41907555f601b711a7e2a17d4c upstream.

Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly pass
in page number argument.

In this way, the interface name is more common and can be used by
potential users.  In addition, the complete info(memcg and flag) of the
memcg needs to be set to the tail pages.

Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui &lt;zhouguanghui1@huawei.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: Hanjun Guo &lt;guohanjun@huawei.com&gt;
Cc: Tianhong Ding &lt;dingtianhong@huawei.com&gt;
Cc: Weilong Chen &lt;chenweilong@huawei.com&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcg: relayout structure mem_cgroup to avoid cache interference</title>
<updated>2020-11-26T17:35:49+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@intel.com</email>
</author>
<published>2020-11-25T05:22:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4df910620bebb5cfe234af16ac8f6474b60215fd'/>
<id>urn:sha1:4df910620bebb5cfe234af16ac8f6474b60215fd</id>
<content type='text'>
0day reported one -22.7% regression for will-it-scale page_fault2
case [1] on a 4 sockets 144 CPU platform, and bisected to it to be
caused by Waiman's optimization (commit bd0b230fe1) of saving one
'struct page_counter' space for 'struct mem_cgroup'.

Initially we thought it was due to the cache alignment change introduced
by the patch, but further debug shows that it is due to some hot data
members ('vmstats_local', 'vmstats_percpu', 'vmstats') sit in 2 adjacent
cacheline (2N and 2N+1 cacheline), and when adjacent cache line prefetch
is enabled, it triggers an "extended level" of cache false sharing for
2 adjacent cache lines.

So exchange the 2 member blocks, while keeping mostly the original
cache alignment, which can restore and even enhance the performance,
and save 64 bytes of space for 'struct mem_cgroup' (from 2880 to 2816,
with 0day's default RHEL-8.3 kernel config)

[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Fixes: bd0b230fe145 ("mm/memcg: unify swap and memsw page counters")
Reported-by: kernel test robot &lt;rong.a.chen@intel.com&gt;
Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: fix missing wakeup polling thread</title>
<updated>2020-11-14T19:26:04+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2020-11-14T06:52:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8b21ca0218d29cc6bb7028125c7e5a10dfb4730c'/>
<id>urn:sha1:8b21ca0218d29cc6bb7028125c7e5a10dfb4730c</id>
<content type='text'>
When we poll the swap.events, we can miss being woken up when the swap
event occurs.  Because we didn't notify.

Fixes: f3a53a3a1e5b ("mm, memcontrol: implement memory.swap.events")
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Cc: Chris Down &lt;chris@chrisdown.name&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lkml.kernel.org/r/20201105161936.98312-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: kmem: enable kernel memcg accounting from interrupt contexts</title>
<updated>2020-10-18T16:27:09+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-10-17T23:13:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4127c6504f25c4fcff52dc996efda2ef859dd661'/>
<id>urn:sha1:4127c6504f25c4fcff52dc996efda2ef859dd661</id>
<content type='text'>
If a memcg to charge can be determined (using remote charging API), there
are no reasons to exclude allocations made from an interrupt context from
the accounting.

Such allocations will pass even if the resulting memcg size will exceed
the hard limit, but it will affect the application of the memory pressure
and an inability to put the workload under the limit will eventually
trigger the OOM.

To use active_memcg() helper, memcg_kmem_bypass() is moved back to
memcontrol.c.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Link: http://lkml.kernel.org/r/20200827225843.1270629-5-guro@fb.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memcg: unify swap and memsw page counters</title>
<updated>2020-10-14T01:38:30+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2020-10-13T23:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bd0b230fe14554bfffbae54e19038716f96f5a41'/>
<id>urn:sha1:bd0b230fe14554bfffbae54e19038716f96f5a41</id>
<content type='text'>
The swap page counter is v2 only while memsw is v1 only.  As v1 and v2
controllers cannot be active at the same time, there is no point to keep
both swap and memsw page counters in mem_cgroup.  The previous patch has
made sure that memsw page counter is updated and accessed only when in v1
code paths.  So it is now safe to alias the v1 memsw page counter to v2
swap page counter.  This saves 14 long's in the size of mem_cgroup.  This
is a saving of 112 bytes for 64-bit archs.

While at it, also document which page counters are used in v1 and/or v2.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Chris Down &lt;chris@chrisdown.name&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Link: https://lkml.kernel.org/r/20200914024452.19167-4-longman@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memcontrol: fix a data race in scan count</title>
<updated>2020-08-15T02:56:57+00:00</updated>
<author>
<name>Qian Cai</name>
<email>cai@lca.pw</email>
</author>
<published>2020-08-15T00:31:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e0e3f42fd96cf830c61aa720f0abb4eef41c5ce3'/>
<id>urn:sha1:e0e3f42fd96cf830c61aa720f0abb4eef41c5ce3</id>
<content type='text'>
struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size

 write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
  mem_cgroup_update_lru_size+0x11c/0x1d0
  mem_cgroup_update_lru_size at mm/memcontrol.c:1266
  isolate_lru_pages+0x6a9/0xf30
  shrink_active_list+0x123/0xcc0
  shrink_lruvec+0x8fd/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
  lruvec_lru_size+0xbb/0x270
  mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
  (inlined by) lruvec_lru_size at mm/vmscan.c:326
  shrink_lruvec+0x1d0/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_current+0xa6/0x120
  alloc_slab_page+0x3b1/0x540
  allocate_slab+0x70/0x660
  new_slab+0x46/0x70
  ___slab_alloc+0x4ad/0x7d0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2c3/0x420
  getname_flags+0x4c/0x230
  getname+0x22/0x30
  do_sys_openat2+0x205/0x3b0
  do_sys_open+0x9a/0xf0
  __x64_sys_openat+0x62/0x80
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The write is under lru_lock, but the read is done as lockless.  The scan
count is used to determine how aggressively the anon and file LRU lists
should be scanned.  Load tearing could generate an inefficient heuristic,
so fix it by adding READ_ONCE() for the read.

Signed-off-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pw
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>include/linux/memcontrol.h: drop duplicate word and fix spello</title>
<updated>2020-08-12T17:57:57+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2020-08-12T01:32:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0845f83122d93ae3bafa7ea10209de2148a3b9bc'/>
<id>urn:sha1:0845f83122d93ae3bafa7ea10209de2148a3b9bc</id>
<content type='text'>
Drop the doubled word "for" in a comment.
Fix spello of "incremented".

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Link: http://lkml.kernel.org/r/b04aa2e4-7c95-12f0-599d-43d07fb28134@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcg/percpu: per-memcg percpu memory statistics</title>
<updated>2020-08-12T17:57:55+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-08-12T01:30:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=772616b031f06e05846488b01dab46a7c832da13'/>
<id>urn:sha1:772616b031f06e05846488b01dab46a7c832da13</id>
<content type='text'>
Percpu memory can represent a noticeable chunk of the total memory
consumption, especially on big machines with many CPUs.  Let's track
percpu memory usage for each memcg and display it in memory.stat.

A percpu allocation is usually scattered over multiple pages (and nodes),
and can be significantly smaller than a page.  So let's add a byte-sized
counter on the memcg level: MEMCG_PERCPU_B.  Byte-sized vmstat infra
created for slabs can be perfectly reused for percpu case.

[guro@fb.com: v3]
  Link: http://lkml.kernel.org/r/20200623184515.4132564-4-guro@fb.com

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tobin C. Harding &lt;tobin@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Bixuan Cui &lt;cuibixuan@huawei.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Link: http://lkml.kernel.org/r/20200608230819.832349-4-guro@fb.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, memcg: decouple e{low,min} state mutations from protection checks</title>
<updated>2020-08-07T18:33:25+00:00</updated>
<author>
<name>Chris Down</name>
<email>chris@chrisdown.name</email>
</author>
<published>2020-08-07T06:22:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45c7f7e1ef17f09fe70bad4b705ce43772153fd7'/>
<id>urn:sha1:45c7f7e1ef17f09fe70bad4b705ce43772153fd7</id>
<content type='text'>
mem_cgroup_protected currently is both used to set effective low and min
and return a mem_cgroup_protection based on the result.  As a user, this
can be a little unexpected: it appears to be a simple predicate function,
if not for the big warning in the comment above about the order in which
it must be executed.

This change makes it so that we separate the state mutations from the
actual protection checks, which makes it more obvious where we need to be
careful mutating internal state, and where we are simply checking and
don't need to worry about that.

[mhocko@suse.com - don't check protection on root memcgs]

Suggested-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Chris Down &lt;chris@chrisdown.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Link: http://lkml.kernel.org/r/ff3f915097fcee9f6d7041c084ef92d16aaeb56a.1594638158.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
