<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/cpuset.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-08-30T20:00:16+00:00</updated>
<entry>
<title>cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1</title>
<updated>2024-08-30T20:00:16+00:00</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2024-08-30T10:02:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1abab1ba0775036bb67c6c57945c637be644c04f'/>
<id>urn:sha1:1abab1ba0775036bb67c6c57945c637be644c04f</id>
<content type='text'>
This patch introduces CONFIG_CPUSETS_V1 and guard cpuset-v1 code under
CONFIG_CPUSETS_V1. The default value of CONFIG_CPUSETS_V1 is N, so that
user who adopted v2 don't have 'pay' for cpuset v1.

Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Remove cpuset_slab_spread_rotor</title>
<updated>2024-07-30T22:29:08+00:00</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-07-13T08:59:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c149c4a48b19afbf0c383614e57b452d39b154de'/>
<id>urn:sha1:c149c4a48b19afbf0c383614e57b452d39b154de</id>
<content type='text'>
Since the SLAB implementation was removed in v6.8, so the
cpuset_slab_spread_rotor is no longer used and can be removed.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Make cpuset hotplug processing synchronous</title>
<updated>2024-04-08T17:39:16+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-04-04T13:47:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2125c0034c5dfd61171b494bd309bb7637bff6eb'/>
<id>urn:sha1:2125c0034c5dfd61171b494bd309bb7637bff6eb</id>
<content type='text'>
Since commit 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside
get_online_cpus()"), cpuset hotplug was done asynchronously via a work
function. This is to avoid recursive locking of cgroup_mutex.

Since then, the cgroup locking scheme has changed quite a bit. A
cpuset_mutex was introduced to protect cpuset specific operations.
The cpuset_mutex is then replaced by a cpuset_rwsem. With commit
d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock
order"), cpu_hotplug_lock is acquired before cpuset_rwsem. Later on,
cpuset_rwsem is reverted back to cpuset_mutex. All these locking changes
allow the hotplug code to call into cpuset core directly.

The following commits were also merged due to the asynchronous nature
of cpuset hotplug processing.

  - commit b22afcdf04c9 ("cpu/hotplug: Cure the cpusets trainwreck")
  - commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume
    bugs")
  - commit 28b89b9e6f7b ("cpuset: handle race between CPU hotplug and
    cpuset_hotplug_work")

Clean up all these bandages by making cpuset hotplug
processing synchronous again with the exception that the call to
cgroup_transfer_tasks() to transfer tasks out of an empty cgroup v1
cpuset, if necessary, will still be done via a work function due to the
existing cgroup_mutex -&gt; cpu_hotplug_lock dependency. It is possible
to reverse that dependency, but that will require updating a number of
different cgroup controllers. This special hotplug code path should be
rarely taken anyway.

As all the cpuset states will be updated by the end of the hotplug
operation, we can revert most the above commits except commit
50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs")
which is partially reverted.  Also removing some cpus_read_lock trylock
attempts in the cpuset partition code as they are no longer necessary
since the cpu_hotplug_lock is now held for the whole duration of the
cpuset hotplug code path.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Tested-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Remove cpuset_do_slab_mem_spread()</title>
<updated>2024-02-29T20:25:07+00:00</updated>
<author>
<name>Xiongwei Song</name>
<email>xiongwei.song@windriver.com</email>
</author>
<published>2024-02-29T14:20:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61a182ab61a6dbb8a8782d52347168dc2aa0aa0a'/>
<id>urn:sha1:61a182ab61a6dbb8a8782d52347168dc2aa0aa0a</id>
<content type='text'>
The SLAB allocator has been removed sine 6.8-rc1 [1], so there is no user
with SLAB_MEM_SPREAD and cpuset_do_slab_mem_spread(). Then SLAB_MEM_SPREAD
is marked as unused by [2]. Here we can remove
cpuset_do_slab_mem_spread(). For more details, please check [3].

[1] https://lore.kernel.org/linux-mm/20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz/
[2] https://lore.kernel.org/linux-kernel/20240223-slab-cleanup-flags-v2-0-02f1753e8303@suse.cz/T/
[3] https://lore.kernel.org/lkml/32bc1403-49da-445a-8c00-9686a3b0d6a3@redhat.com/T/#mf14b838c5e0e77f4756d436bac3d8c0447ea4350

Signed-off-by: Xiongwei Song &lt;xiongwei.song@windriver.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Include isolated cpuset CPUs in cpu_is_isolated() check</title>
<updated>2023-12-06T19:37:28+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-12-05T22:21:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3232e7aad11e541da86bbb1fa5ea5737b30bd006'/>
<id>urn:sha1:3232e7aad11e541da86bbb1fa5ea5737b30bd006</id>
<content type='text'>
Currently, the cpu_is_isolated() function checks only the statically
isolated CPUs specified via the "isolcpus" and "nohz_full" kernel
command line options. This function is used by vmstat and memcg to
reduce interference with isolated CPUs by not doing stat flushing
or scheduling works on those CPUs.

Workloads running on isolated CPUs within isolated cpuset
partitions should receive the same treatment to reduce unnecessary
interference. This patch introduces a new cpuset_cpu_is_isolated()
function to be called by cpu_is_isolated() so that the set of dynamically
created cpuset isolated CPUs will be included in the check.

Assuming that testing a bit in a cpumask is atomic, no synchronization
primitive is currently used to synchronize access to the cpuset's
isolated_cpus mask.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets</title>
<updated>2023-05-08T23:22:33+00:00</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2023-05-08T07:58:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c24849f5515e4966d94fa5279bdff4acf2e9489'/>
<id>urn:sha1:6c24849f5515e4966d94fa5279bdff4acf2e9489</id>
<content type='text'>
Qais reported that iterating over all tasks when rebuilding root domains
for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume).

To fix the problem keep track of the number of DEADLINE tasks belonging
to each cpuset and then use this information (followup patch) to only
perform the above iteration if DEADLINE tasks are actually present in
the cpuset for which a corresponding root domain is being rebuilt.

Reported-by: Qais Yousef &lt;qyousef@layalina.io&gt;
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/cpuset: Bring back cpuset_mutex</title>
<updated>2023-05-08T23:22:33+00:00</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2023-05-08T07:58:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=111cd11bbc54850f24191c52ff217da88a5e639b'/>
<id>urn:sha1:111cd11bbc54850f24191c52ff217da88a5e639b</id>
<content type='text'>
Turns out percpu_cpuset_rwsem - commit 1243dc518c9d ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea,
as it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently and it is also not implementing
priority inheritance (which causes troubles with realtime workloads).

Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).

Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cpuset: Clean up cpuset_node_allowed</title>
<updated>2023-03-24T02:02:27+00:00</updated>
<author>
<name>Haifeng Xu</name>
<email>haifeng.xu@shopee.com</email>
</author>
<published>2023-02-28T08:35:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8e4645226b4931e96d55546a1fb3863aa50b5e62'/>
<id>urn:sha1:8e4645226b4931e96d55546a1fb3863aa50b5e62</id>
<content type='text'>
Commit 002f290627c2 ("cpuset: use static key better and convert to new API")
has used __cpuset_node_allowed() instead of cpuset_node_allowed() to check
whether we can allocate on a memory node. Now this function isn't used by
anyone, so we can do the follow things to clean up it.

1. remove unused codes
2. rename __cpuset_node_allowed() to cpuset_node_allowed()
3. update comments in mm/page_alloc.c

Suggested-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Haifeng Xu &lt;haifeng.xu@shopee.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: detect allocation forbidden by cpuset and bail out early</title>
<updated>2021-11-06T20:30:38+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@intel.com</email>
</author>
<published>2021-11-05T20:40:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ca1b5a49885f0c0c486544da46a9e0ac790831d'/>
<id>urn:sha1:8ca1b5a49885f0c0c486544da46a9e0ac790831d</id>
<content type='text'>
There was a report that starting an Ubuntu in docker while using cpuset
to bind it to movable nodes (a node only has movable zone, like a node
for hotplug or a Persistent Memory node in normal usage) will fail due
to memory allocation failure, and then OOM is involved and many other
innocent processes got killed.

It can be reproduced with command:

    $ docker run -it --rm --cpuset-mems 4 ubuntu:latest bash -c "grep Mems_allowed /proc/self/status"

(where node 4 is a movable node)

  runc:[2:INIT] invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=0
  CPU: 8 PID: 8291 Comm: runc:[2:INIT] Tainted: G        W I E     5.8.2-0.g71b519a-default #1 openSUSE Tumbleweed (unreleased)
  Hardware name: Dell Inc. PowerEdge R640/0PHYDR, BIOS 2.6.4 04/09/2020
  Call Trace:
   dump_stack+0x6b/0x88
   dump_header+0x4a/0x1e2
   oom_kill_process.cold+0xb/0x10
   out_of_memory.part.0+0xaf/0x230
   out_of_memory+0x3d/0x80
   __alloc_pages_slowpath.constprop.0+0x954/0xa20
   __alloc_pages_nodemask+0x2d3/0x300
   pipe_write+0x322/0x590
   new_sync_write+0x196/0x1b0
   vfs_write+0x1c3/0x1f0
   ksys_write+0xa7/0xe0
   do_syscall_64+0x52/0xd0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Mem-Info:
  active_anon:392832 inactive_anon:182 isolated_anon:0
   active_file:68130 inactive_file:151527 isolated_file:0
   unevictable:2701 dirty:0 writeback:7
   slab_reclaimable:51418 slab_unreclaimable:116300
   mapped:45825 shmem:735 pagetables:2540 bounce:0
   free:159849484 free_pcp:73 free_cma:0
  Node 4 active_anon:1448kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB all_unreclaimable? no
  Node 4 Movable free:130021408kB min:9140kB low:139160kB high:269180kB reserved_highatomic:0KB active_anon:1448kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:130023424kB managed:130023424kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:292kB local_pcp:84kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0 0
  Node 4 Movable: 1*4kB (M) 0*8kB 0*16kB 1*32kB (M) 0*64kB 0*128kB 1*256kB (M) 1*512kB (M) 1*1024kB (M) 0*2048kB 31743*4096kB (M) = 130021156kB

  oom-kill:constraint=CONSTRAINT_CPUSET,nodemask=(null),cpuset=docker-9976a269caec812c134fa317f27487ee36e1129beba7278a463dd53e5fb9997b.scope,mems_allowed=4,global_oom,task_memcg=/system.slice/containerd.service,task=containerd,pid=4100,uid=0
  Out of memory: Killed process 4100 (containerd) total-vm:4077036kB, anon-rss:51184kB, file-rss:26016kB, shmem-rss:0kB, UID:0 pgtables:676kB oom_score_adj:0
  oom_reaper: reaped process 8248 (docker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  oom_reaper: reaped process 2054 (node_exporter), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  oom_reaper: reaped process 1452 (systemd-journal), now anon-rss:0kB, file-rss:8564kB, shmem-rss:4kB
  oom_reaper: reaped process 2146 (munin-node), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  oom_reaper: reaped process 8291 (runc:[2:INIT]), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The reason is that in this case, the target cpuset nodes only have
movable zone, while the creation of an OS in docker sometimes needs to
allocate memory in non-movable zones (dma/dma32/normal) like
GFP_HIGHUSER, and the cpuset limit forbids the allocation, then
out-of-memory killing is involved even when normal nodes and movable
nodes both have many free memory.

The OOM killer cannot help to resolve the situation as there is no
usable memory for the request in the cpuset scope.  The only reasonable
measure to take is to fail the allocation right away and have the caller
to deal with it.

So add a check for cases like this in the slowpath of allocation, and
bail out early returning NULL for the allocation.

As page allocation is one of the hottest path in kernel, this check will
hurt all users with sane cpuset configuration, add a static branch check
and detect the abnormal config in cpuset memory binding setup so that
the extra check cost in page allocation is not paid by everyone.

[thanks to Micho Hocko and David Rientjes for suggesting not handling
 it inside OOM code, adding cpuset check, refining comments]

Link: https://lkml.kernel.org/r/1632481657-68112-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Suggested-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zefan Li &lt;lizefan.x@bytedance.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()</title>
<updated>2021-08-20T10:32:59+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2021-07-30T11:24:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=97c0054dbe2c3c59d1156fd233f2d44e91981c8e'/>
<id>urn:sha1:97c0054dbe2c3c59d1156fd233f2d44e91981c8e</id>
<content type='text'>
select_fallback_rq() only needs to recheck for an allowed CPU if the
affinity mask of the task has changed since the last check.

Return a 'bool' from cpuset_cpus_allowed_fallback() to indicate whether
the affinity mask was updated, and use this to elide the allowed check
when the mask has been left alone.

No functional change.

Suggested-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Link: https://lore.kernel.org/r/20210730112443.23245-5-will@kernel.org
</content>
</entry>
</feed>
