From cfc1da7e1127b4c8787f4dc25d59987c10c9107f Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Mon, 4 May 2026 14:51:20 -1000
Subject: cgroup: Add per-subsys-css kill_css_finish deferral

93618edf7538 ("cgroup: Defer css percpu_ref kill on rmdir until cgroup is
depopulated") deferred kill_css_finish() at the cgroup level: rmdir waits
for the entire cgroup's populated count to drop to zero, then fires
kill_css_finish() on every subsystem css at once.

Replace that with per-subsys-css deferral. Each subsystem css now tracks
its own hierarchical populated count and independently defers its
kill_css_finish() until its own subtree drains.

The rmdir-race fix carries through unchanged in shape. The dying css's
->css_offline() still waits until no PF_EXITING task references it, and
v2's cgroup-level machinery goes away.

cgroup_apply_control_disable() has the same race shape (PF_EXITING tasks
pinning a css whose ->css_offline() is about to run) and stays synchronous
here. This patch lays the groundwork for fixing it - per-cgroup waiting
can't gate one subsys css being killed while the rest of the cgroup stays
live, but per-css can.

Subtree-wide invariant preserved: a dying ancestor css stays populated
through nr_populated_children until every dying descendant's task drains,
so the walker fires the ancestor's kill_finish_work only after all
descendants have drained.

Add paired smp_mb()s in kill_css_sync() and css_update_populated() to
fence the StoreLoad on (CSS_DYING, populated counter), guaranteeing that
either the walker queues kill_finish_work or the caller fires
synchronously. cgroup_destroy_locked() was implicitly fenced by an
unrelated css_set_lock pair; cgroup_apply_control_disable() in the next
patch is not.

Signed-off-by: Tejun Heo
---
 include/linux/cgroup-defs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index c4929f7bbe5a..de2cd6238c2a 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -262,6 +262,9 @@ struct cgroup_subsys_state {
 	int nr_populated_csets;
 	int nr_populated_children;
 
+	/* deferred kill_css_finish() queued by css_update_populated() */
+	struct work_struct kill_finish_work;
+
 	/*
 	 * A singly-linked list of css structures to be rstat flushed.
 	 * This is a scratch field to be used exclusively by
@@ -615,9 +618,6 @@ struct cgroup {
 	/* used to wait for offlining of csses */
 	wait_queue_head_t offline_waitq;
 
-	/* defers killing csses after removal until cgroup is depopulated */
-	struct work_struct finish_destroy_work;
-
 	/* used to schedule release agent */
 	struct work_struct release_agent_work;
 
-- 
cgit v1.2.3
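
Illustrative note (not part of the patch): the paired-barrier argument in the
commit message is the classic store-buffering pattern. Each side stores its
own variable, issues a full barrier, then loads the other side's variable;
with both barriers in place, at least one side must observe the other's store.
The following is a minimal userspace sketch of that reasoning using C11
atomics and pthreads. All names here (css_dying, nr_populated, fired, killer,
walker, run_trial) are hypothetical stand-ins, not the kernel's code, and
atomic_thread_fence(memory_order_seq_cst) is assumed as a rough analog of
smp_mb(). The sketch does not model how the kernel handles the case where both
sides see each other's store.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins, not the kernel's fields. */
static atomic_int css_dying;     /* models the CSS_DYING flag */
static atomic_int nr_populated;  /* models the css's populated count */
static atomic_int fired;         /* how many sides claimed the finish step */

/* Models the killing side: mark dying, then check whether already drained. */
static void *killer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&css_dying, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* assumed analog of smp_mb() */
	if (atomic_load_explicit(&nr_populated, memory_order_relaxed) == 0)
		atomic_fetch_add(&fired, 1);       /* caller fires synchronously */
	return NULL;
}

/* Models the depopulating side: drop to zero, then check for dying. */
static void *walker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&nr_populated, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* assumed analog of smp_mb() */
	if (atomic_load_explicit(&css_dying, memory_order_relaxed))
		atomic_fetch_add(&fired, 1);       /* walker would queue the work */
	return NULL;
}

/* Runs one race between the two sides; returns how many of them fired. */
int run_trial(void)
{
	pthread_t a, b;
	int rc;

	atomic_store(&css_dying, 0);
	atomic_store(&nr_populated, 1);
	atomic_store(&fired, 0);

	rc = pthread_create(&a, NULL, killer, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, walker, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return atomic_load(&fired);
}
```

Calling run_trial() repeatedly from a main() (built with -pthread) should
never return 0: the two fences forbid the outcome where both loads miss the
other side's store. Dropping either fence makes a return value of 0 permitted
by the memory model, which corresponds to the lost-wakeup case the commit
message describes.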