<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/percpu.c, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-11-24T09:30:10+00:00</updated>
<entry>
<title>mm, percpu: do not consider sleepable allocations atomic</title>
<updated>2025-11-24T09:30:10+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2025-11-17T09:30:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94b01ef5186cbc61dc81b37b92b6f25effc21fef'/>
<id>urn:sha1:94b01ef5186cbc61dc81b37b92b6f25effc21fef</id>
<content type='text'>
[ Upstream commit 9a5b183941b52f84c0f9e5f27ce44e99318c9e0f ]

28307d938fb2 ("percpu: make pcpu_alloc() aware of current gfp context")
has fixed a reclaim recursion for scoped GFP_NOFS context.  It has done
that by avoiding taking pcpu_alloc_mutex.  This is a correct solution as
the worker context with full GFP_KERNEL allocation/reclaim power and which
is using the same lock cannot block the NOFS pcpu_alloc caller.

On the other hand this is a very conservative approach that could lead to
failures because pcpu_alloc lockless implementation is quite limited.

We have a bug report about premature failures when scsi array of 193
devices is scanned.  Sometimes (not consistently) the scanning aborts
because the iscsid daemon fails to create the queue for a random scsi
device during the scan.  iscsid itself is running with PR_SET_IO_FLUSHER
set so all allocations from this process context are GFP_NOIO.  This in
turn makes any pcpu_alloc lockless (without pcpu_alloc_mutex) which leads
to pre-mature failures.

It has turned out that iscsid has worked around this by dropping
PR_SET_IO_FLUSHER (https://github.com/open-iscsi/open-iscsi/pull/382) when
scanning host.  But we can do better in this case on the kernel side and
use pcpu_alloc_mutex for NOIO resp.  NOFS constrained allocation scopes
too.  We just need the WQ worker to never trigger IO/FS reclaim.  Achieve
that by enforcing scoped GFP_NOIO for the whole execution of
pcpu_balance_workfn (this will imply NOFS constrain as well).  This will
remove the dependency chain and preserve the full allocation power of the
pcpu_alloc call.

While at it make is_atomic really test for blockable allocations.

Link: https://lkml.kernel.org/r/20250206122633.167896-1-mhocko@kernel.org
Fixes: 28307d938fb2 ("percpu: make pcpu_alloc() aware of current gfp context")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Filipe David Manana &lt;fdmanana@suse.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: chenxin &lt;chenxinxin@xiaomi.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: introduce and use {pgd,p4d}_populate_kernel()</title>
<updated>2025-09-19T14:32:01+00:00</updated>
<author>
<name>Harry Yoo</name>
<email>harry.yoo@oracle.com</email>
</author>
<published>2025-08-18T02:02:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d699575d00ffc0b92e8e7cbddb897cf751a60cda'/>
<id>urn:sha1:d699575d00ffc0b92e8e7cbddb897cf751a60cda</id>
<content type='text'>
commit f2d2f9598ebb0158a3fe17cda0106d7752e654a2 upstream.

Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space.  These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.

Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner.  For
example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").

However, this approach has proven fragile for following reasons:

  1) It is easy to forget to perform the necessary page table
     synchronization when introducing new changes.
     For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
     savings for compound devmaps") overlooked the need to synchronize
     page tables for the vmemmap area.

  2) It is also easy to overlook that the vmemmap and direct mapping areas
     must not be accessed before explicit page table synchronization.
     For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
     sub-pmd ranges")) caused crashes by accessing the vmemmap area
     before calling sync_global_pgds().

To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables.  These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.

They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().

This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced.  Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.

In theory, PUD and PMD level helpers can be added later if needed by other
architectures.  For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.

[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()]
  Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Suggested-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Kiryl Shutsemau &lt;kas@kernel.org&gt;
Reviewed-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: bibo mao &lt;maobibo@loongson.cn&gt;
Cc: Borislav Betkov &lt;bp@alien8.de&gt;
Cc: Christoph Lameter (Ampere) &lt;cl@gentwo.org&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Dmitriy Vyukov &lt;dvyukov@google.com&gt;
Cc: Gwan-gyeong Mun &lt;gwan-gyeong.mun@intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: Thomas Huth &lt;thuth@redhat.com&gt;
Cc: "Uladzislau Rezki (Sony)" &lt;urezki@gmail.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
[ Adjust context ]
Signed-off-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: Introduce flush_cache_vmap_early()</title>
<updated>2024-02-16T18:10:52+00:00</updated>
<author>
<name>Alexandre Ghiti</name>
<email>alexghiti@rivosinc.com</email>
</author>
<published>2023-12-12T21:34:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4a05cf0ed782588cb3a2d9b2f5b539027108d06'/>
<id>urn:sha1:c4a05cf0ed782588cb3a2d9b2f5b539027108d06</id>
<content type='text'>
[ Upstream commit 7a92fc8b4d20680e4c20289a670d8fca2d1f2c1b ]

The pcpu setup when using the page allocator sets up a new vmalloc
mapping very early in the boot process, so early that it cannot use the
flush_cache_vmap() function which may depend on structures not yet
initialized (for example in riscv, we currently send an IPI to flush
other cpus TLB).

But on some architectures, we must call flush_cache_vmap(): for example,
in riscv, some uarchs can cache invalid TLB entries so we need to flush
the new established mapping to avoid taking an exception.

So fix this by introducing a new function flush_cache_vmap_early() which
is called right after setting the new page table entry and before
accessing this new mapping. This new function implements a local flush
tlb on riscv and is no-op for other architectures (same as today).

Signed-off-by: Alexandre Ghiti &lt;alexghiti@rivosinc.com&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/percpu.c: print error message too if atomic alloc failed</title>
<updated>2023-08-25T15:04:59+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2023-07-28T03:02:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f7d77dfc91f747f64cb00884fd6d7940c3b49fca'/>
<id>urn:sha1:f7d77dfc91f747f64cb00884fd6d7940c3b49fca</id>
<content type='text'>
The variable 'err' is assgigned to an error message if atomic alloc
failed, while it has no chance to be printed if is_atomic is true.

Here change to print error message too if atomic alloc failed, while
avoid to call dump_stack() if that case.

Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Dennis Zhou &lt;dennis@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/percpu.c: optimize the code in pcpu_setup_first_chunk() a little bit</title>
<updated>2023-08-25T15:04:59+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2023-07-22T01:14:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7ee1e758bebe13d96217bcfd5230892ed44760e7'/>
<id>urn:sha1:7ee1e758bebe13d96217bcfd5230892ed44760e7</id>
<content type='text'>
This removes the need of local varibale 'chunk', and optimize the code
calling pcpu_alloc_first_chunk() to initialize reserved chunk and
dynamic chunk to make it simpler.

Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
[Dennis: reworded first chunk init comment]
Signed-off-by: Dennis Zhou &lt;dennis@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/percpu.c: remove redundant check</title>
<updated>2023-08-25T15:04:59+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2023-07-21T13:17:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5b672085e70c2ea40f4c9d6a23848079bf0ff700'/>
<id>urn:sha1:5b672085e70c2ea40f4c9d6a23848079bf0ff700</id>
<content type='text'>
The conditional check "(ai-&gt;dyn_size &lt; PERCPU_DYNAMIC_EARLY_SIZE) has
covered the check '(!ai-&gt;dyn_size)'.

Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Dennis Zhou &lt;dennis@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/percpu: Remove some local variables in pcpu_populate_pte</title>
<updated>2023-08-25T15:04:59+00:00</updated>
<author>
<name>Bibo Mao</name>
<email>maobibo@loongson.cn</email>
</author>
<published>2023-07-12T03:16:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=41fd59b7f9bdde2a473450680411c2016017b992'/>
<id>urn:sha1:41fd59b7f9bdde2a473450680411c2016017b992</id>
<content type='text'>
In function pcpu_populate_pte there are already variable defined,
it can be reused for later use, here remove duplicated local
variables.

Signed-off-by: Bibo Mao &lt;maobibo@loongson.cn&gt;
Signed-off-by: Dennis Zhou &lt;dennis@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: rename memcg_kmem_enabled()</title>
<updated>2023-02-17T04:43:56+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>roman.gushchin@linux.dev</email>
</author>
<published>2023-02-13T19:29:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f7a449f779608efe1941a0e0c4bd7b5f57000be7'/>
<id>urn:sha1:f7a449f779608efe1941a0e0c4bd7b5f57000be7</id>
<content type='text'>
Currently there are two kmem-related helper functions with a confusing
semantics: memcg_kmem_enabled() and mem_cgroup_kmem_disabled().

The problem is that an obvious expectation
memcg_kmem_enabled() == !mem_cgroup_kmem_disabled(),
can be false.

mem_cgroup_kmem_disabled() is similar to mem_cgroup_disabled(): it returns
true only if CONFIG_MEMCG_KMEM is not set or the kmem accounting is
disabled using a boot time kernel option "cgroup.memory=nokmem".  It never
changes the value dynamically.

memcg_kmem_enabled() is different: it always returns false until the first
non-root memory cgroup will get online (assuming the kernel memory
accounting is enabled).  It's goal is to improve the performance on
systems without the cgroupfs mounted/memory controller enabled or on the
systems with only the root memory cgroup.

To make things more obvious and avoid potential bugs, let's rename
memcg_kmem_enabled() to memcg_kmem_online().

Link: https://lkml.kernel.org/r/20230213192922.1146370-1-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Acked-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/percpu.c: remove the lcm code since block size is fixed at page size</title>
<updated>2022-11-08T06:59:18+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2022-10-24T08:14:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3289e0533e70aafa9fb6d128fd4452db1b8befe8'/>
<id>urn:sha1:3289e0533e70aafa9fb6d128fd4452db1b8befe8</id>
<content type='text'>
Since commit b239f7daf553 ("percpu: set PCPU_BITMAP_BLOCK_SIZE to
PAGE_SIZE"), the PCPU_BITMAP_BLOCK_SIZE has been set to page size
fixedly. So the lcm code in pcpu_alloc_first_chunk() doesn't make
sense any more, clean it up.

Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Dennis Zhou &lt;dennis@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/percpu: replace the goto with break</title>
<updated>2022-11-08T06:59:15+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2022-10-24T08:14:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=83d261fc9e5fb03e8c32e365ca4ee53952611a2b'/>
<id>urn:sha1:83d261fc9e5fb03e8c32e365ca4ee53952611a2b</id>
<content type='text'>
In function pcpu_reclaim_populated(), the line of goto jumping is
unnecessary since the label 'end_chunk' is near the end of the for
loop, use break instead.

Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Dennis Zhou &lt;dennis@kernel.org&gt;
</content>
</entry>
</feed>
