kernel/linux.git/include/linux/cgroup-defs.h, branch v6.19.11

cgroup: Eliminate cgrp_ancestor_storage in cgroup_root

2026-01-08T01:11:03+00:00

The cgrp_ancestor_storage has two drawbacks: - it's not guaranteed that the member immediately follows struct cgrp in cgroup_root (root cgroup's ancestors[0] might thus point to a padding and not in cgrp_ancestor_storage proper), - this idiom raises warnings with -Wflex-array-member-not-at-end. Instead of relying on the auxiliary member in cgroup_root, define the 0-th level ancestor inside struct cgroup (needed for static allocation of cgrp_dfl_root), deeper cgroups would allocate flexible _low_ancestors[]. Unionized alias through ancestors[] will transparently join the two ranges. The above change would still leave the flexible array at the end of struct cgroup inside cgroup_root, so move cgrp also towards the end of cgroup_root to resolve the -Wflex-array-member-not-at-end. Link: https://lore.kernel.org/r/5fb74444-2fbb-476e-b1bf-3f3e279d0ced@embeddedor.com/ Reported-by: Gustavo A. R. Silva Closes: https://lore.kernel.org/r/b3eb050d-9451-4b60-b06c-ace7dab57497@embeddedor.com/ Cc: David Laight Acked-by: Gustavo A. R. Silva Signed-off-by: Michal Koutný Signed-off-by: Tejun Heo

cgroup: Fix seqcount lockdep assertion in cgroup freezer

2025-10-03T14:30:28+00:00

The commit afa3701c0e45 ("cgroup: cgroup.stat.local time accounting") introduced a seqcount to track freeze timing but initialized it as a plain seqcount_t using seqcount_init(). However, the write-side critical section in cgroup_do_freeze() holds the css_set_lock spinlock while calling write_seqcount_begin(). On PREEMPT_RT kernels, spinlocks do not disable preemption, causing the lockdep assertion for a plain seqcount_t, which checks for preemption being disabled, to fail. This triggers the following warning: WARNING: CPU: 0 PID: 9692 at include/linux/seqlock.h:221 Fix this by changing the type to seqcount_spinlock_t and initializing it with seqcount_spinlock_init() to associate css_set_lock with the seqcount. This allows lockdep to correctly validate that the spinlock is held during write operations, resolving the assertion failure on all kernel configurations. Reported-by: syzbot+27a2519eb4dad86d0156@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=27a2519eb4dad86d0156 Fixes: afa3701c0e45 ("cgroup: cgroup.stat.local time accounting") Signed-off-by: Nirbhay Sharma Link: https://lore.kernel.org/r/20251002165510.KtY3IT--@linutronix.de/ Acked-by: Michal Koutný Signed-off-by: Tejun Heo

cgroup: replace global percpu_rwsem with per threadgroup resem when writing to cgroup.procs

2025-09-10T17:44:51+00:00

The static usage pattern of creating a cgroup, enabling controllers, and then seeding it with CLONE_INTO_CGROUP doesn't require write locking cgroup_threadgroup_rwsem and thus doesn't benefit from this patch. To avoid affecting other users, the per threadgroup rwsem is only used when the favordynmods is enabled. As computer hardware advances, modern systems are typically equipped with many CPU cores and large amounts of memory, enabling the deployment of numerous applications. On such systems, container creation and deletion become frequent operations, making cgroup process migration no longer a cold path. This leads to noticeable contention with common process operations such as fork, exec, and exit. To alleviate the contention between cgroup process migration and operations like process fork, this patch modifies lock to take the write lock on signal_struct->group_rwsem when writing pid to cgroup.procs/threads instead of holding a global write lock. Cgroup process migration has historically relied on signal_struct->group_rwsem to protect thread group integrity. In commit <1ed1328792ff> ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem"), this was changed to a global cgroup_threadgroup_rwsem. The advantage of using a global lock was simplified handling of process group migrations. This patch retains the use of the global lock for protecting process group migration, while reducing contention by using per thread group lock during cgroup.procs/threads writes. The locking behavior is as follows: write cgroup.procs/threads | process fork,exec,exit | process group migration ------------------------------------------------------------------------------ cgroup_lock() | down_read(&g_rwsem) | cgroup_lock() down_write(&p_rwsem) | down_read(&p_rwsem) | down_write(&g_rwsem) critical section | critical section | critical section up_write(&p_rwsem) | up_read(&p_rwsem) | up_write(&g_rwsem) cgroup_unlock() | up_read(&g_rwsem) | cgroup_unlock() g_rwsem denotes cgroup_threadgroup_rwsem, p_rwsem denotes signal_struct->group_rwsem. This patch eliminates contention between cgroup migration and fork operations for threads that belong to different thread groups, thereby reducing the long-tail latency of cgroup migrations and lowering system load. With this patch, under heavy fork and exec interference, the long-tail latency of cgroup migration has been reduced from milliseconds to microseconds. Under heavy cgroup migration interference, the multi-CPU score of the spawn test case in UnixBench increased by 9%. tj: Update comment in cgroup_favor_dynmods() and switch WARN_ONCE() to pr_warn_once(). Signed-off-by: Yi Tao Signed-off-by: Tejun Heo

cgroup: refactor the cgroup_attach_lock code to make it clearer

2025-09-10T17:26:15+00:00

Dynamic cgroup migration involving threadgroup locks can be in one of two states: no lock held, or holding the global lock. Explicitly declaring the different lock modes to make the code easier to understand and facilitates future extensions of the lock modes. Signed-off-by: Yi Tao Signed-off-by: Tejun Heo

cgroup: Remove unused cgroup_subsys::post_attach

2025-09-04T17:25:20+00:00

cgroup_subsys::post_attach callback was introduced in commit 5cf1cacb49ae ("cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback") and only cpuset would use this callback to wait for the mm migration to complete at the end of __cgroup_procs_write(). Since the previous patch defer the flush operation until returning to userspace, no one use this callback now. Remove this callback from cgroup_subsys. Signed-off-by: Chuyi Zhou Acked-by: Waiman Long Signed-off-by: Tejun Heo

cgroup: cgroup.stat.local time accounting

2025-08-22T17:50:43+00:00

There isn't yet a clear way to identify a set of "lost" time that everyone (or at least a wider group of users) cares about. However, users can perform some delay accounting by iterating over components of interest. This patch allows cgroup v2 freezing time to be one of those components. Track the cumulative time that each v2 cgroup spends freezing and expose it to userland via a new local stat file in cgroupfs. Thank you to Michal, who provided the ASCII art in the updated documentation. To access this value: $ mkdir /sys/fs/cgroup/test $ cat /sys/fs/cgroup/test/cgroup.stat.local freeze_time_total 0 Ensure consistent freeze time reads with freeze_seq, a per-cgroup sequence counter. Writes are serialized using the css_set_lock. Signed-off-by: Tiffany Yang Cc: Tejun Heo Cc: Michal Koutný Signed-off-by: Tejun Heo

cgroup: remove per-cpu per-subsystem locks

2025-06-17T20:01:18+00:00

The rstat update side used to insert the cgroup whose stats are updated in the update tree and the read side flush the update tree to get the latest uptodate stats. The per-cpu per-subsystem locks were used to synchronize the update and flush side. However now the update side does not access update tree but uses per-cpu lockless lists. So there is no need for locks to synchronize update and flush side. Let's remove them. Suggested-by: JP Kobryn Signed-off-by: Shakeel Butt Tested-by: JP Kobryn Signed-off-by: Tejun Heo

cgroup: support to enable nmi-safe css_rstat_updated

2025-06-17T19:58:45+00:00

Add necessary infrastructure to enable the nmi-safe execution of css_rstat_updated(). Currently css_rstat_updated() takes a per-cpu per-css raw spinlock to add the given css in the per-cpu per-css update tree. However the kernel can not spin in nmi context, so we need to remove the spinning on the raw spinlock in css_rstat_updated(). To support lockless css_rstat_updated(), let's add necessary data structures in the css and ss structures. Signed-off-by: Shakeel Butt Tested-by: JP Kobryn Signed-off-by: Tejun Heo

cgroup: Drop sock_cgroup_classid() dummy implementation

2025-06-03T18:26:36+00:00

The semantic of returning 0 is unclear when !CONFIG_CGROUP_NET_CLASSID. Since there are no callers of sock_cgroup_classid() with that config anymore we can undefine the helper at all and enforce all (future) callers to handle cases when !CONFIG_CGROUP_NET_CLASSID. Signed-off-by: Michal Koutný Link: https://lore.kernel.org/r/Z_52r_v9-3JUzDT7@calendula/ Acked-by: Tejun Heo Reviewed-by: Waiman Long Signed-off-by: Tejun Heo

cgroup: document the rstat per-cpu initialization

2025-05-19T20:30:02+00:00

The calls to css_rstat_init() occur at different places depending on the context. Document the conditions that determine which point of initialization is used. Signed-off-by: JP Kobryn Signed-off-by: Tejun Heo