summaryrefslogtreecommitdiff
path: root/include/linux
diff options
context:
space:
mode:
authorPeter Zijlstra <peterz@infradead.org>2025-05-23 18:28:00 +0300
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2025-08-15 13:08:46 +0300
commitb1c6b13f48a8f5bf147e91e42da1db740b0a7b1f (patch)
tree41f4bfc8c3ad68f105e6d65c1be9285aa16605c5 /include/linux
parentdabe5fc2ee7a43c900a9e6a5de9424d6f7e0c87d (diff)
downloadlinux-b1c6b13f48a8f5bf147e91e42da1db740b0a7b1f.tar.xz
sched/psi: Optimize psi_group_change() cpu_clock() usage
[ Upstream commit 570c8efd5eb79c3725ba439ce105ed1bedc5acd9 ] Dietmar reported that commit 3840cbe24cf0 ("sched: psi: fix bogus pressure spikes from aggregation race") caused a regression for him on a high context switch rate benchmark (schbench) due to the now repeating cpu_clock() calls. In particular the problem is that get_recent_times() will extrapolate the current state to 'now'. But if an update uses a timestamp from before the start of the update, it is possible to get two reads with inconsistent results. It is effectively back-dating an update. (note that this all hard-relies on the clock being synchronized across CPUs -- if this is not the case, all bets are off). Combine this problem with the fact that there are per-group-per-cpu seqcounts, the commit in question pushed the clock read into the group iteration, causing tree-depth cpu_clock() calls. On architectures where cpu_clock() has appreciable overhead, this hurts. Instead move to a per-cpu seqcount, which allows us to have a single clock read for all group updates, increasing internal consistency and lowering update overhead. This comes at the cost of a longer update side (proportional to the tree depth) which can cause the read side to retry more often. Fixes: 3840cbe24cf0 ("sched: psi: fix bogus pressure spikes from aggregation race") Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>, Link: https://lkml.kernel.org/20250522084844.GC31726@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
Diffstat (limited to 'include/linux')
-rw-r--r--include/linux/psi_types.h6
1 files changed, 2 insertions, 4 deletions
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index f1fd3a8044e0..dd10c22299ab 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -84,11 +84,9 @@ enum psi_aggregators {
struct psi_group_cpu {
/* 1st cacheline updated by the scheduler */
- /* Aggregator needs to know of concurrent changes */
- seqcount_t seq ____cacheline_aligned_in_smp;
-
/* States of the tasks belonging to this group */
- unsigned int tasks[NR_PSI_TASK_COUNTS];
+ unsigned int tasks[NR_PSI_TASK_COUNTS]
+ ____cacheline_aligned_in_smp;
/* Aggregate pressure state derived from the tasks */
u32 state_mask;