<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/cgroup-defs.h, branch v7.0-rc7</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-24T20:21:40+00:00</updated>
<entry>
<title>cgroup: Wait for dying tasks to leave on rmdir</title>
<updated>2026-03-24T20:21:40+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-24T20:21:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1b164b876c36c3eb5561dd9b37702b04401b0166'/>
<id>urn:sha1:1b164b876c36c3eb5561dd9b37702b04401b0166</id>
<content type='text'>
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") hid PF_EXITING
tasks from cgroup.procs so that systemd doesn't see tasks that have already
been reaped via waitpid(). However, the populated counter (nr_populated_csets)
is only decremented when the task later passes through cgroup_task_dead() in
finish_task_switch(). This means cgroup.procs can appear empty while the
cgroup is still populated, causing rmdir to fail with -EBUSY.

Fix this by making cgroup_rmdir() wait for dying tasks to fully leave. If the
cgroup is populated but all remaining tasks have PF_EXITING set (the task
iterator returns none due to the existing filter), wait for a kick from
cgroup_task_dead() and retry. The wait is brief, as tasks are removed from
the cgroup's css_set between PF_EXITING being set in do_exit() and
cgroup_task_dead() being called from finish_task_switch().

v2: The cgroup_is_populated() true-to-false transition happens under
    css_set_lock, not cgroup_mutex, so retest under css_set_lock before
    sleeping to avoid missed wakeups (Sebastian).
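
A minimal Python model of the wait-and-retry logic, assuming a single
lock stands in for both cgroup_mutex and css_set_lock. CgroupModel,
task_exiting() and task_dead() are hypothetical names, not kernel
APIs. It shows why the v2 change matters: the populated check and the
sleep happen under the same lock the waker takes, so a kick sent
between them cannot be lost.

```python
import threading


class CgroupModel:
    """Toy cgroup: the populated count only drops when a dying task
    reaches task_dead(), mirroring the window described above."""

    def __init__(self):
        self.lock = threading.Lock()            # stands in for css_set_lock
        self.kick = threading.Condition(self.lock)
        self.nr_populated = 0                   # tasks still in the css_set
        self.visible = set()                    # tasks cgroup.procs would list

    def add_task(self, pid):
        with self.lock:
            self.nr_populated += 1
            self.visible.add(pid)

    def task_exiting(self, pid):
        # PF_EXITING analogue: hidden from the iterator, still counted.
        with self.lock:
            self.visible.discard(pid)

    def task_dead(self, pid):
        # cgroup_task_dead() analogue: drop the count, kick any waiter.
        with self.lock:
            self.nr_populated -= 1
            self.kick.notify_all()

    def rmdir(self, timeout=1.0):
        with self.lock:
            while self.nr_populated:
                if self.visible:
                    return "EBUSY"              # live tasks remain
                # Only dying tasks are left. wait() releases the lock
                # atomically, so a kick cannot slip in unnoticed.
                if not self.kick.wait(timeout):
                    return "EBUSY"              # gave up waiting
            return "removed"


cg = CgroupModel()
cg.add_task(1)
cg.task_exiting(1)
timer = threading.Timer(0.05, cg.task_dead, args=(1,))
timer.start()
result = cg.rmdir()
timer.join()
print(result)  # removed
```

A condition variable bound to the lock gives the retest-before-sleep
behaviour directly; checking without the lock and then sleeping would
leave a window in which the kick could arrive and be missed.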

Fixes: a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202603222104.2c81684e-lkp@intel.com
Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Bert Karwatzki &lt;spasswolf@web.de&gt;
Cc: Michal Koutny &lt;mkoutny@suse.com&gt;
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2026-02-11T21:20:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-11T21:20:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ff661eeee26038f15ed9dd33c91809632e11d9eb'/>
<id>urn:sha1:ff661eeee26038f15ed9dd33c91809632e11d9eb</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cpuset changes:

    - Continue separating v1 and v2 implementations by moving more
      v1-specific logic into cpuset-v1.c

    - Improve partition handling. Sibling partitions are no longer
      invalidated on cpuset.cpus conflict, cpuset.cpus changes no longer
      fail in v2, and effective_xcpus computation is made consistent

    - Fix partition effective CPUs overlap that caused a warning on
      cpuset removal when sibling partitions shared CPUs

 - Increase the maximum cgroup subsystem count from 16 to 32 to
   accommodate future subsystem additions

 - Misc cleanups and selftest improvements including switching to
   css_is_online() helper, removing dead code and stale documentation
   references, using lockdep_assert_cpuset_lock_held() consistently, and
   adding polling helpers for asynchronously updated cgroup statistics

* tag 'cgroup-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cpuset: fix overlap of partition effective CPUs
  cgroup: increase maximum subsystem count from 16 to 32
  cgroup: Remove stale cpu.rt.max reference from documentation
  cpuset: replace direct lockdep_assert_held() with lockdep_assert_cpuset_lock_held()
  cgroup/cpuset: Move the v1 empty cpus/mems check to cpuset1_validate_change()
  cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
  cgroup/cpuset: Don't fail cpuset.cpus change in v2
  cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
  cgroup/cpuset: Streamline rm_siblings_excl_cpus()
  cpuset: remove dead code in cpuset-v1.c
  cpuset: remove v1-specific code from generate_sched_domains
  cpuset: separate generate_sched_domains for v1 and v2
  cpuset: move update_domain_attr_tree to cpuset_v1.c
  cpuset: add cpuset1_init helper for v1 initialization
  cpuset: add cpuset1_online_css helper for v1-specific operations
  cpuset: add lockdep_assert_cpuset_lock_held helper
  cpuset: Remove unnecessary checks in rebuild_sched_domains_locked
  cgroup: switch to css_is_online() helper
  selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants
  selftests: cgroup: make test_memcg_sock robust against delayed sock stats
  ...
</content>
</entry>
<entry>
<title>cgroup: increase maximum subsystem count from 16 to 32</title>
<updated>2026-02-01T16:34:15+00:00</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2026-01-31T03:05:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5eab8c588bf37b7eb498f23a2ac3fb135c258e17'/>
<id>urn:sha1:5eab8c588bf37b7eb498f23a2ac3fb135c258e17</id>
<content type='text'>
The current cgroup subsystem limit of 16 is insufficient, as the number of
existing subsystems has already reached this limit. When adding a new
subsystem that is not yet in the mainline kernel, building with
`make allmodconfig` requires first bypassing the
`BUILD_BUG_ON(CGROUP_SUBSYS_COUNT &gt; 16)` restriction to allow compilation
to succeed. However, the kernel still fails to boot afterward.

This patch increases the maximum number of supported cgroup subsystems from
16 to 32, providing enough room for future subsystem additions.
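
As an illustration only, the limit behaves like a per-subsystem bitmask
of fixed width: once subsystem IDs reach the mask width, the next
subsystem has no bit to occupy. The helper subsys_mask() is
hypothetical, and modelling the limit as a mask width is an assumption
rather than a description of the actual cgroup code.

```python
def subsys_mask(ssids, width):
    """Return a bitmask with one bit per subsystem ID.

    Hypothetical helper: raises OverflowError when an ID needs a bit
    beyond the given mask width.
    """
    mask = 0
    for ssid in ssids:
        if ssid >= width:
            raise OverflowError(f"ssid {ssid} does not fit in a {width}-bit mask")
        mask |= 2 ** ssid                       # set the bit for this ID
    return mask


ssids = range(17)                               # a 17th subsystem gets ID 16
try:
    subsys_mask(ssids, 16)
except OverflowError as err:
    print(err)                                  # ssid 16 does not fit in a 16-bit mask
print(hex(subsys_mask(ssids, 32)))              # 0x1ffff
```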

Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Tested-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Acked-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Eliminate cgrp_ancestor_storage in cgroup_root</title>
<updated>2026-01-08T01:11:03+00:00</updated>
<author>
<name>Michal Koutný</name>
<email>mkoutny@suse.com</email>
</author>
<published>2026-01-07T16:59:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ef56578274d2b98423c8ef82bb450223f5811b59'/>
<id>urn:sha1:ef56578274d2b98423c8ef82bb450223f5811b59</id>
<content type='text'>
The cgrp_ancestor_storage has two drawbacks:
- it's not guaranteed that the member immediately follows the cgrp member
  (struct cgroup) in cgroup_root (the root cgroup's ancestors[0] might thus
  point to padding rather than into cgrp_ancestor_storage proper),
- this idiom raises warnings with -Wflex-array-member-not-at-end.

Instead of relying on the auxiliary member in cgroup_root, define the
0th-level ancestor inside struct cgroup (needed for static allocation
of cgrp_dfl_root), while deeper cgroups allocate the flexible
_low_ancestors[] array. A unionized alias through ancestors[]
transparently joins the two ranges.

The above change would still leave the flexible array at the end of
struct cgroup inside cgroup_root, so also move cgrp towards the end of
cgroup_root to resolve the -Wflex-array-member-not-at-end warning.

Link: https://lore.kernel.org/r/5fb74444-2fbb-476e-b1bf-3f3e279d0ced@embeddedor.com/
Reported-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
Closes: https://lore.kernel.org/r/b3eb050d-9451-4b60-b06c-ace7dab57497@embeddedor.com/
Cc: David Laight &lt;david.laight.linux@gmail.com&gt;
Acked-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Signed-off-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Fix seqcount lockdep assertion in cgroup freezer</title>
<updated>2025-10-03T14:30:28+00:00</updated>
<author>
<name>Nirbhay Sharma</name>
<email>nirbhay.lkd@gmail.com</email>
</author>
<published>2025-10-03T11:45:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=93a4b36ef3cf4ce5e6a7e7a7686181de76e246a1'/>
<id>urn:sha1:93a4b36ef3cf4ce5e6a7e7a7686181de76e246a1</id>
<content type='text'>
The commit afa3701c0e45 ("cgroup: cgroup.stat.local time accounting")
introduced a seqcount to track freeze timing but initialized it as a
plain seqcount_t using seqcount_init().

However, the write-side critical section in cgroup_do_freeze() holds
the css_set_lock spinlock while calling write_seqcount_begin(). On
PREEMPT_RT kernels, spinlocks do not disable preemption, causing the
lockdep assertion for a plain seqcount_t, which checks for preemption
being disabled, to fail.

This triggers the following warning:
  WARNING: CPU: 0 PID: 9692 at include/linux/seqlock.h:221

Fix this by changing the type to seqcount_spinlock_t and initializing
it with seqcount_spinlock_init() to associate css_set_lock with the
seqcount. This allows lockdep to correctly validate that the spinlock
is held during write operations, resolving the assertion failure on all
kernel configurations.
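
The difference between a plain seqcount_t and seqcount_spinlock_t can be
sketched in Python: the associated-lock variant knows which lock
serializes its writers, so it can check that lock instead of checking
that preemption is disabled. SeqcountLocked and its methods are
hypothetical names; the check uses threading.Lock.locked(), which only
reports whether some thread holds the lock, a weaker test than lockdep's.

```python
import threading


class SeqcountLocked:
    """Toy sequence counter tied to the lock that serializes its writers,
    loosely modelled on seqcount_spinlock_t (hypothetical class)."""

    def __init__(self, lock):
        self.lock = lock
        self.seq = 0

    def write_begin(self):
        # Check the associated lock rather than the preemption state.
        # Lock.locked() only says some thread holds it: weaker than lockdep.
        if not self.lock.locked():
            raise RuntimeError("writer must hold the associated lock")
        self.seq += 1                           # odd: write in progress

    def write_end(self):
        self.seq += 1                           # even: write complete

    def read_begin(self):
        while True:
            start = self.seq
            if start % 2 == 0:                  # no writer mid-update
                return start

    def read_retry(self, start):
        return self.seq != start                # changed: reread


css_set_lock = threading.Lock()
freeze_seq = SeqcountLocked(css_set_lock)
freeze_time = 0

with css_set_lock:                              # writer holds the lock
    freeze_seq.write_begin()
    freeze_time += 5
    freeze_seq.write_end()

while True:                                     # reader retry loop
    start = freeze_seq.read_begin()
    value = freeze_time
    if not freeze_seq.read_retry(start):
        break
print(value)                                    # 5

try:
    freeze_seq.write_begin()                    # lock not held
except RuntimeError as err:
    print(err)
```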

Reported-by: syzbot+27a2519eb4dad86d0156@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=27a2519eb4dad86d0156
Fixes: afa3701c0e45 ("cgroup: cgroup.stat.local time accounting")
Signed-off-by: Nirbhay Sharma &lt;nirbhay.lkd@gmail.com&gt;
Link: https://lore.kernel.org/r/20251002165510.KtY3IT--@linutronix.de/
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace global percpu_rwsem with per-threadgroup rwsem when writing to cgroup.procs</title>
<updated>2025-09-10T17:44:51+00:00</updated>
<author>
<name>Yi Tao</name>
<email>escape@linux.alibaba.com</email>
</author>
<published>2025-09-10T06:59:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0568f89d4fb82d98001baeb870e92f43cd1f7317'/>
<id>urn:sha1:0568f89d4fb82d98001baeb870e92f43cd1f7317</id>
<content type='text'>
The static usage pattern of creating a cgroup, enabling controllers,
and then seeding it with CLONE_INTO_CGROUP doesn't require
write-locking cgroup_threadgroup_rwsem and thus doesn't benefit from
this patch.

To avoid affecting other users, the per-threadgroup rwsem is only used
when favordynmods is enabled.

As computer hardware advances, modern systems are typically equipped
with many CPU cores and large amounts of memory, enabling the deployment
of numerous applications. On such systems, container creation and
deletion become frequent operations, making cgroup process migration no
longer a cold path. This leads to noticeable contention with common
process operations such as fork, exec, and exit.

To alleviate the contention between cgroup process migration and
operations like process fork, this patch changes the locking to take
the write lock on signal_struct-&gt;group_rwsem when writing a PID to
cgroup.procs/threads, instead of holding a global write lock.

Cgroup process migration has historically relied on
signal_struct-&gt;group_rwsem to protect thread group integrity. Commit
1ed1328792ff ("sched, cgroup: replace signal_struct-&gt;group_rwsem with
a global percpu_rwsem") changed this to a global
cgroup_threadgroup_rwsem. The advantage of using a global lock was
simplified handling of process group migrations. This patch retains the
use of the global lock for protecting process group migration, while
reducing contention by using a per-thread-group lock during
cgroup.procs/threads writes.

The locking behavior is as follows:

write cgroup.procs/threads  | process fork,exec,exit | process group migration
------------------------------------------------------------------------------
cgroup_lock()               | down_read(&amp;g_rwsem)    | cgroup_lock()
down_write(&amp;p_rwsem)        | down_read(&amp;p_rwsem)    | down_write(&amp;g_rwsem)
critical section            | critical section       | critical section
up_write(&amp;p_rwsem)          | up_read(&amp;p_rwsem)      | up_write(&amp;g_rwsem)
cgroup_unlock()             | up_read(&amp;g_rwsem)      | cgroup_unlock()

g_rwsem denotes cgroup_threadgroup_rwsem, p_rwsem denotes
signal_struct-&gt;group_rwsem.
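
The table can be turned into a small conflict check. This is a
hypothetical Python model, not kernel code: each operation is described
by the locks it takes and in which mode, with cgroup_lock() treated as
an exclusive lock and the per-thread-group rwsem keyed by a thread group
ID. Two operations block each other when they share a lock and at least
one of them takes it exclusively.

```python
def write_procs(tgid):
    # Write to cgroup.procs/threads with favordynmods enabled.
    return {"cgroup_lock": "w", ("p_rwsem", tgid): "w"}


def fork_exec_exit(tgid):
    return {"g_rwsem": "r", ("p_rwsem", tgid): "r"}


def group_migration():
    return {"cgroup_lock": "w", "g_rwsem": "w"}


def conflicts(a, b):
    """True if a and b share a lock and either takes it for writing."""
    shared = set(a).intersection(b)
    return any("w" in (a[lock], b[lock]) for lock in shared)


print(conflicts(write_procs(100), fork_exec_exit(200)))   # False
print(conflicts(write_procs(100), fork_exec_exit(100)))   # True
print(conflicts(group_migration(), fork_exec_exit(200)))  # True
```

Before this patch, a cgroup.procs write took the global write lock, the
same locks as group_migration() in this model, so the first call would
also have returned True; that is the fork contention the patch removes
for unrelated thread groups.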

This patch eliminates contention between cgroup migration and fork
operations for threads that belong to different thread groups, thereby
reducing the long-tail latency of cgroup migrations and lowering system
load.

With this patch, under heavy fork and exec interference, the long-tail
latency of cgroup migration has been reduced from milliseconds to
microseconds. Under heavy cgroup migration interference, the multi-CPU
score of the spawn test case in UnixBench increased by 9%.

tj: Update comment in cgroup_favor_dynmods() and switch WARN_ONCE() to
    pr_warn_once().

Signed-off-by: Yi Tao &lt;escape@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: refactor the cgroup_attach_lock code to make it clearer</title>
<updated>2025-09-10T17:26:15+00:00</updated>
<author>
<name>Yi Tao</name>
<email>escape@linux.alibaba.com</email>
</author>
<published>2025-09-10T06:59:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a1ffc8ad3165fa1cf6a60c6a4b4e00dfd6603cf2'/>
<id>urn:sha1:a1ffc8ad3165fa1cf6a60c6a4b4e00dfd6603cf2</id>
<content type='text'>
Dynamic cgroup migration involving threadgroup locks can be in one of
two states: no lock held, or holding the global lock. Explicitly
declare the different lock modes to make the code easier to understand
and to facilitate future extensions of the lock modes.

Signed-off-by: Yi Tao &lt;escape@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Remove unused cgroup_subsys::post_attach</title>
<updated>2025-09-04T17:25:20+00:00</updated>
<author>
<name>Chuyi Zhou</name>
<email>zhouchuyi@bytedance.com</email>
</author>
<published>2025-09-04T07:45:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d8b269e009bbc471cb2735b5f737839495efce3b'/>
<id>urn:sha1:d8b269e009bbc471cb2735b5f737839495efce3b</id>
<content type='text'>
The cgroup_subsys::post_attach callback was introduced in commit 5cf1cacb49ae
("cgroup, cpuset: replace cpuset_post_attach_flush() with
cgroup_subsys-&gt;post_attach callback"), and only cpuset used this
callback, to wait for the mm migration to complete at the end of
__cgroup_procs_write(). Since the previous patch defers the flush operation
until returning to userspace, no one uses this callback now. Remove it
from cgroup_subsys.

Signed-off-by: Chuyi Zhou &lt;zhouchuyi@bytedance.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: cgroup.stat.local time accounting</title>
<updated>2025-08-22T17:50:43+00:00</updated>
<author>
<name>Tiffany Yang</name>
<email>ynaffit@google.com</email>
</author>
<published>2025-08-22T01:37:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=afa3701c0e45ecb9e4d160048ca4e353c7489948'/>
<id>urn:sha1:afa3701c0e45ecb9e4d160048ca4e353c7489948</id>
<content type='text'>
There isn't yet a clear way to identify a set of "lost" time that
everyone (or at least a wider group of users) cares about. However,
users can perform some delay accounting by iterating over components of
interest. This patch allows cgroup v2 freezing time to be one of those
components.

Track the cumulative time that each v2 cgroup spends freezing and expose
it to userland via a new local stat file in cgroupfs. Thank you to
Michal, who provided the ASCII art in the updated documentation.

To access this value:
  $ mkdir /sys/fs/cgroup/test
  $ cat /sys/fs/cgroup/test/cgroup.stat.local
  freeze_time_total 0

Ensure consistent freeze time reads with freeze_seq, a per-cgroup
sequence counter. Writes are serialized using the css_set_lock.
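
A rough sketch of the accounting, assuming the total is kept as the sum
of completed freeze intervals plus, while a freeze is in progress, the
time since it started. FreezeClock is a hypothetical class with an
injectable clock so the arithmetic can be checked; the freeze_seq retry
loop around reads is left out, and the units are whatever the clock
returns.

```python
class FreezeClock:
    """Hypothetical per-cgroup freeze timer with an injectable clock."""

    def __init__(self, now):
        self.now = now                 # callable returning the current time
        self.freeze_start = None       # set while a freeze is in progress
        self.completed = 0             # sum of finished freeze intervals

    def freeze(self):
        if self.freeze_start is None:
            self.freeze_start = self.now()

    def unfreeze(self):
        if self.freeze_start is not None:
            self.completed += self.now() - self.freeze_start
            self.freeze_start = None

    def freeze_time_total(self):
        total = self.completed
        if self.freeze_start is not None:      # still freezing: add elapsed
            total += self.now() - self.freeze_start
        return total


ticks = iter([10, 25, 30, 40])
clk = FreezeClock(lambda: next(ticks))
clk.freeze()                           # starts at 10
clk.unfreeze()                         # 25 - 10 = 15 completed
clk.freeze()                           # starts again at 30
total = clk.freeze_time_total()        # 15 + (40 - 30)
print(total)                           # 25
```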

Signed-off-by: Tiffany Yang &lt;ynaffit@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: remove per-cpu per-subsystem locks</title>
<updated>2025-06-17T20:01:18+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2025-06-17T19:57:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6af89c6ca71742e9227e6f8172a86ce1ee16aa85'/>
<id>urn:sha1:6af89c6ca71742e9227e6f8172a86ce1ee16aa85</id>
<content type='text'>
The rstat update side used to insert the cgroup whose stats were updated
into the update tree, and the read side flushed the update tree to get
the latest stats. The per-cpu per-subsystem locks were used to
synchronize the update and flush sides. However, the update side now
does not access the update tree but uses per-cpu lockless lists, so
there is no need for locks to synchronize the update and flush sides.
Let's remove them.

Suggested-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Tested-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
