author: Baokun Li <libaokun@linux.alibaba.com> 2026-05-21 12:50:16 +0300
committer: Christian Brauner <brauner@kernel.org> 2026-05-22 13:06:35 +0300
commit: 31c1d19ead2c26a63859a2757d8b786765ba9cdd
tree: eaf2bf1530d119356ec790fddb0175ac6f601842 /scripts/Makefile.thinlto
parent: e90a6d668e26e00a72df2d09c173b563468f09c9
writeback: use a per-sb counter to drain inode wb switches at umount

Tracking in-flight inode wb switches with a single global counter
(isw_nr_in_flight) plus a synchronize_rcu() based wait in
cgroup_writeback_umount() forces every umount to take a global hit
whenever any other superblock on the system has wb switches in flight,
even if the superblock being unmounted has none of its own.

Replace the global synchronize_rcu()/flush_workqueue() pair with a
per-sb counter, s_isw_nr_in_flight, plus three small helpers:

  - cgroup_writeback_pin(sb)   - increment counter
  - cgroup_writeback_unpin(sb) - decrement and wake drainer if last
  - cgroup_writeback_drain(sb) - wait for counter to reach zero

The wiring is:

  - inode_prepare_wbs_switch() pins before checking SB_ACTIVE and
    grabbing the inode; failure paths unpin before returning. A
    lockless SB_ACTIVE check at the top of the function lets us skip
    the atomic_inc/smp_mb dance once SB_ACTIVE has been cleared (it is
    monotonic and never set back).
  - process_inode_switch_wbs() unpins after the matching iput().
  - cgroup_writeback_umount() drains the per-sb counter via
    wait_var_event().

The smp_mb() pair between inode_prepare_wbs_switch() and
cgroup_writeback_umount() keeps the SB_ACTIVE / counter ordering:
either the umounter sees a non-zero counter and waits, or the switcher
sees SB_ACTIVE cleared and aborts before grabbing the inode.

The global isw_nr_in_flight is left in place, since it is still used
to throttle in-flight switches via WB_FRN_MAX_IN_FLIGHT.

The rcu_read_lock() extension in inode_switch_wbs() and
cleanup_offline_cgwb() that the race fix added is no longer needed and
is reverted; the synchronize_rcu() that the race fix added to
cgroup_writeback_umount() is dropped as well.

The following numbers were measured on a 16 vCPU QEMU guest with 4
background superblocks each churning "create memcg -> write 1 MiB ->
rmdir memcg" to keep the global isw_nr_in_flight non-zero. Latencies
are wall-clock around umount(8); only the target sb's umount is
measured.

Target sb runs its own cgwb churn:

                                 p50       p95       p99       max
  global synchronize_rcu()   67.6 ms   88.3 ms   88.3 ms   96.8 ms
  per-sb counter (this)       7.9 ms   10.0 ms   10.0 ms   10.1 ms

Idle target umount latency under cross-sb cgwb-switch pressure:

                                 p50       p95       p99       max
  global synchronize_rcu()   62.7 ms   95.4 ms  108.1 ms  108.6 ms
  per-sb counter (this)       5.3 ms    6.9 ms    7.4 ms    7.4 ms
  no-pressure baseline        4.9 ms    5.9 ms    6.3 ms    6.7 ms

8 concurrent umounts of idle sbs under the same pressure:

                                 p50       p95       max
  global synchronize_rcu()   61.3 ms   99.5 ms  113.7 ms
  per-sb counter (this)       8.1 ms    9.1 ms    9.5 ms

In-kernel cgroup_writeback_umount() time across the same run
(bpftrace, ~340 calls covering all scenarios):

  global synchronize_rcu()   12371 ms total  (~36 ms / call)
  per-sb counter (this)       1.37 ms total  ( ~4 us / call)

Suggested-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/177910456953.488929.2169908940676707307.b4-review@b4
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260521095016.2791354-4-libaokun@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
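
Illustration (not part of the patch): the pin/unpin/drain scheme and the
SB_ACTIVE ordering described in the commit message can be modelled in
userspace. The sketch below is an assumption-laden stand-in, not the kernel
code: struct sb_model and every model_* function are hypothetical names,
sequentially consistent C11 atomics stand in for atomic_inc() + smp_mb(),
and a pthread condition variable stands in for wait_var_event() and its
wake-up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant super_block state. */
struct sb_model {
	atomic_bool active;            /* models SB_ACTIVE: only goes true -> false */
	atomic_int isw_nr_in_flight;   /* models the per-sb s_isw_nr_in_flight */
	pthread_mutex_t lock;          /* protects the wait on 'drained' */
	pthread_cond_t drained;        /* signalled when the counter hits zero */
};

static void sb_model_init(struct sb_model *sb)
{
	atomic_init(&sb->active, true);
	atomic_init(&sb->isw_nr_in_flight, 0);
	pthread_mutex_init(&sb->lock, NULL);
	pthread_cond_init(&sb->drained, NULL);
}

/* Models cgroup_writeback_pin(): seq_cst RMW acts as increment + full barrier. */
static void model_pin(struct sb_model *sb)
{
	atomic_fetch_add(&sb->isw_nr_in_flight, 1);
}

/* Models cgroup_writeback_unpin(): the last unpin wakes the drainer. */
static void model_unpin(struct sb_model *sb)
{
	if (atomic_fetch_sub(&sb->isw_nr_in_flight, 1) == 1) {
		pthread_mutex_lock(&sb->lock);
		pthread_cond_broadcast(&sb->drained);
		pthread_mutex_unlock(&sb->lock);
	}
}

/* Models cgroup_writeback_drain(): wait until the counter reaches zero. */
static void model_drain(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	while (atomic_load(&sb->isw_nr_in_flight) != 0)
		pthread_cond_wait(&sb->drained, &sb->lock);
	pthread_mutex_unlock(&sb->lock);
}

/*
 * Switcher side, modelled on the description of inode_prepare_wbs_switch():
 * cheap lockless check first, then pin, then re-check after the barrier.
 * Returns true with the sb pinned; the caller must model_unpin() when done.
 */
static bool model_prepare_switch(struct sb_model *sb)
{
	if (!atomic_load(&sb->active))
		return false;          /* already unmounting: skip the atomics */
	model_pin(sb);
	if (!atomic_load(&sb->active)) {
		model_unpin(sb);       /* failure path unpins before returning */
		return false;
	}
	return true;
}

/* Umount side: clear the flag first, then wait for in-flight switches. */
static void model_umount(struct sb_model *sb)
{
	atomic_store(&sb->active, false);
	model_drain(sb);
}
```

In this model, either model_umount() observes a non-zero counter and
waits, or model_prepare_switch() observes the cleared flag and backs out.
That is the same either/or guarantee the commit message attributes to the
smp_mb() pair. Because the counter lives in each sb_model, draining one
instance never waits on activity in another.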
Diffstat (limited to 'scripts/Makefile.thinlto')
0 files changed, 0 insertions, 0 deletions