| author | Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> | 2026-04-19 16:26:54 +0300 |
|---|---|---|
| committer | Rafael J. Wysocki <rafael.j.wysocki@intel.com> | 2026-05-26 20:06:14 +0300 |
| commit | 24fc5870808dea4290e9563746cb5f2146043c6c (patch) | |
| tree | 38053c32e8d81a44c981681972bde5308d303bb5 /include/linux/stackprotector.h | |
| parent | 91f5d698478f3d07230cf9ca4dfaf67e0316a53d (diff) | |
| download | linux-24fc5870808dea4290e9563746cb5f2146043c6c.tar.xz | |
cpufreq: governor: Fix stale prev_cpu_nice spike when enabling ignore_nice_load
When ignore_nice_load is toggled from 0 to 1 via sysfs, dbs_update() may
run concurrently and observe the new tunable value while prev_cpu_nice
still holds a stale baseline, producing a spurious massive idle_time that
results in an incorrect CPU load value.
The race can be illustrated with two concurrent paths:

  Path A (sysfs write, holds attr_set->update_lock):

    governor_store()
      mutex_lock(&attr_set->update_lock)
      ignore_nice_load_store()
        dbs_data->ignore_nice_load = 1                 /* (A1) */
        gov_update_cpu_data(dbs_data)
          mutex_lock(&policy_dbs->update_mutex)        /* (A2) */
          j_cdbs->prev_cpu_nice = kcpustat_field(...)
          mutex_unlock(&policy_dbs->update_mutex)
      mutex_unlock(&attr_set->update_lock)

  Path B (work queue, wins the race between A1 and A2):

    dbs_work_handler()
      mutex_lock(&policy_dbs->update_mutex)            /* acquired before A2 */
      dbs_update()
        ignore_nice = dbs_data->ignore_nice_load       /* sees new value: 1 */
        cur_nice = kcpustat_field(...)
        idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, ..)  /* stale */
        j_cdbs->prev_cpu_nice = cur_nice
      mutex_unlock(&policy_dbs->update_mutex)

Fix this by unconditionally sampling cur_nice and advancing prev_cpu_nice
in dbs_update() on every call, regardless of ignore_nice. With
prev_cpu_nice always reflecting the most recent sample, enabling
ignore_nice_load can never produce a stale-baseline spike: the delta will
always be the nice time accumulated in the last sampling interval, not
since boot. The additional kcpustat_field() call per CPU per sample is
negligible given that the sampling path already reads idle and load
accounting.
To keep prev_cpu_nice handling consistent with the always-tracking
semantics introduced above:
- gov_update_cpu_data() unconditionally resets prev_cpu_nice alongside
  prev_cpu_idle, so both baselines share the same timestamp when
  io_is_busy changes. This prevents an interval mismatch between
  idle_time and nice_delta on the next dbs_update() when
  ignore_nice_load is enabled.
- cpufreq_dbs_governor_start() unconditionally initializes prev_cpu_nice
  so the baseline is always valid from the first dbs_update() call;
  remove the ignore_nice guard and the now-unused ignore_nice variable.
Fixes: ee88415caf736b ("[CPUFREQ] Cleanup locking in conservative governor")
Fixes: 5a75c82828e7c0 ("[CPUFREQ] Cleanup locking in ondemand governor")
Fixes: 326c86deaed54a ("[CPUFREQ] Remove unneeded locks")
Signed-off-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Link: https://patch.msgid.link/20260419132655.3800673-3-zhongqiu.han@oss.qualcomm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
