<feed xmlns='http://www.w3.org/2005/Atom'>
<title>BMC/Intel-BMC/linux.git/kernel/sched, branch dev-5.14-intel</title>
<subtitle>Intel OpenBMC Linux kernel source tree (mirror)</subtitle>
<id>https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-5.14-intel</id>
<link rel='self' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-5.14-intel'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/'/>
<updated>2021-10-07T05:53:16+00:00</updated>
<entry>
<title>sched/fair: Null terminate buffer when updating tunable_scaling</title>
<updated>2021-10-07T05:53:16+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2021-09-27T11:46:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=9ea06d55278eb87b02189d0a03e7a4ec82466e4f'/>
<id>urn:sha1:9ea06d55278eb87b02189d0a03e7a4ec82466e4f</id>
<content type='text'>
[ Upstream commit 703066188f63d66cc6b9d678e5b5ef1213c5938e ]

This patch null-terminates the temporary buffer in sched_scaling_write()
so kstrtouint() does not return failure and checks the value is valid.

Before:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 &gt; /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1

After:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 &gt; /sys/kernel/debug/sched/tunable_scaling
  $ cat /sys/kernel/debug/sched/tunable_scaling
  0
  $ echo 3 &gt; /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument

Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20210927114635.GH3959@techsingularity.net
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Add ancestors of unthrottled undecayed cfs_rq</title>
<updated>2021-10-07T05:53:16+00:00</updated>
<author>
<name>Michal Koutný</name>
<email>mkoutny@suse.com</email>
</author>
<published>2021-09-17T15:30:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=fce08b03923ed829ee021f27c443d0fb5c8d2f7a'/>
<id>urn:sha1:fce08b03923ed829ee021f27c443d0fb5c8d2f7a</id>
<content type='text'>
[ Upstream commit 2630cde26711dab0d0b56a8be1616475be646d13 ]

Since commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to
list on unthrottle") we add cfs_rqs with no runnable tasks but not fully
decayed into the load (leaf) list. We may ignore adding some ancestors
and therefore breaking tmp_alone_branch invariant. This broke LTP test
cfs_bandwidth01 and it was partially fixed in commit fdaba61ef8a2
("sched/fair: Ensure that the CFS parent is added after unthrottling").

I noticed the named test still fails even with the fix (but with low
probability, 1 in ~1000 executions of the test). The reason is when
bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of
the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly.

Fix this by adding ancestors if we notice the unthrottled cfs_rq was
added to the load list.

Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Signed-off-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Odin Ugedal &lt;odin@uged.al&gt;
Link: https://lore.kernel.org/r/20210917153037.11176-1-mkoutny@suse.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cpufreq: schedutil: Use kobject release() method to free sugov_tunables</title>
<updated>2021-10-07T05:53:04+00:00</updated>
<author>
<name>Kevin Hao</name>
<email>haokexin@gmail.com</email>
</author>
<published>2021-08-05T07:29:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=8d62aec52a8c5b1d25a2364b243fcc5098a2ede9'/>
<id>urn:sha1:8d62aec52a8c5b1d25a2364b243fcc5098a2ede9</id>
<content type='text'>
[ Upstream commit e5c6b312ce3cc97e90ea159446e6bfa06645364d ]

The struct sugov_tunables is protected by the kobject, so we can't free
it directly. Otherwise we would get a call trace like this:
  ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x30
  WARNING: CPU: 3 PID: 720 at lib/debugobjects.c:505 debug_print_object+0xb8/0x100
  Modules linked in:
  CPU: 3 PID: 720 Comm: a.sh Tainted: G        W         5.14.0-rc1-next-20210715-yocto-standard+ #507
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
  pc : debug_print_object+0xb8/0x100
  lr : debug_print_object+0xb8/0x100
  sp : ffff80001ecaf910
  x29: ffff80001ecaf910 x28: ffff00011b10b8d0 x27: ffff800011043d80
  x26: ffff00011a8f0000 x25: ffff800013cb3ff0 x24: 0000000000000000
  x23: ffff80001142aa68 x22: ffff800011043d80 x21: ffff00010de46f20
  x20: ffff800013c0c520 x19: ffff800011d8f5b0 x18: 0000000000000010
  x17: 6e6968207473696c x16: 5f72656d6974203a x15: 6570797420746365
  x14: 6a626f2029302065 x13: 303378302f307830 x12: 2b6e665f72656d69
  x11: ffff8000124b1560 x10: ffff800012331520 x9 : ffff8000100ca6b0
  x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 0000000000000001
  x5 : ffff800011d8c000 x4 : ffff800011d8c740 x3 : 0000000000000000
  x2 : ffff0001108301c0 x1 : ab3c90eedf9c0f00 x0 : 0000000000000000
  Call trace:
   debug_print_object+0xb8/0x100
   __debug_check_no_obj_freed+0x1c0/0x230
   debug_check_no_obj_freed+0x20/0x88
   slab_free_freelist_hook+0x154/0x1c8
   kfree+0x114/0x5d0
   sugov_exit+0xbc/0xc0
   cpufreq_exit_governor+0x44/0x90
   cpufreq_set_policy+0x268/0x4a8
   store_scaling_governor+0xe0/0x128
   store+0xc0/0xf0
   sysfs_kf_write+0x54/0x80
   kernfs_fop_write_iter+0x128/0x1c0
   new_sync_write+0xf0/0x190
   vfs_write+0x2d4/0x478
   ksys_write+0x74/0x100
   __arm64_sys_write+0x24/0x30
   invoke_syscall.constprop.0+0x54/0xe0
   do_el0_svc+0x64/0x158
   el0_svc+0x2c/0xb0
   el0t_64_sync_handler+0xb0/0xb8
   el0t_64_sync+0x198/0x19c
  irq event stamp: 5518
  hardirqs last  enabled at (5517): [&lt;ffff8000100cbd7c&gt;] console_unlock+0x554/0x6c8
  hardirqs last disabled at (5518): [&lt;ffff800010fc0638&gt;] el1_dbg+0x28/0xa0
  softirqs last  enabled at (5504): [&lt;ffff8000100106e0&gt;] __do_softirq+0x4d0/0x6c0
  softirqs last disabled at (5483): [&lt;ffff800010049548&gt;] irq_exit+0x1b0/0x1b8

So split the original sugov_tunables_free() into two functions,
sugov_clear_global_tunables() is just used to clear the global_tunables
and the new sugov_tunables_free() is used as kobj_type::release to
release the sugov_tunables safely.

Fixes: 9bdcb44e391d ("cpufreq: schedutil: New governor based on scheduler utilization data")
Cc: 4.7+ &lt;stable@vger.kernel.org&gt; # 4.7+
Signed-off-by: Kevin Hao &lt;haokexin@gmail.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/idle: Make the idle timer expire in hard interrupt context</title>
<updated>2021-09-26T12:10:25+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2021-09-06T11:30:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=cacfce79af9bc0f03a3b33377718d76a138651f5'/>
<id>urn:sha1:cacfce79af9bc0f03a3b33377718d76a138651f5</id>
<content type='text'>
[ Upstream commit 9848417926353daa59d2b05eb26e185063dbac6e ]

The intel powerclamp driver will setup a per-CPU worker with RT
priority. The worker will then invoke play_idle() in which it remains in
the idle poll loop until it is stopped by the timer it started earlier.

That timer needs to expire in hard interrupt context on PREEMPT_RT.
Otherwise the timer will expire in ksoftirqd as a SOFT timer but that task
won't be scheduled on the CPU because its priority is lower than the
priority of the worker which is in the idle loop.

Always expire the idle timer in hard interrupt context.

Reported-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210906113034.jgfxrjdvxnjqgtmc@linutronix.de
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Prevent balance_push() on remote runqueues</title>
<updated>2021-09-18T11:43:24+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-28T13:55:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=5b7992c06c54f76f71ff5fa0135a80bc41153689'/>
<id>urn:sha1:5b7992c06c54f76f71ff5fa0135a80bc41153689</id>
<content type='text'>
commit 868ad33bfa3bf39960982682ad3a0f8ebda1656e upstream.

sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
callback after changing priorities or the scheduling class of a task. The
run-queue for which the callback is invoked can be local or remote.

That's not a problem for the regular rq::push_work which is serialized with
a busy flag in the run-queue struct, but for the balance_push() work which
is only valid to be invoked on the outgoing CPU that's wrong. It not only
triggers the debug warning, but also leaves the per CPU variable push_work
unprotected, which can result in double enqueues on the stop machine list.

Remove the warning and validate that the function is invoked on the
outgoing CPU.

Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
Reported-by: Sebastian Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87zgt1hdw7.ffs@tglx
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix UCLAMP_FLAG_IDLE setting</title>
<updated>2021-09-15T08:02:07+00:00</updated>
<author>
<name>Quentin Perret</name>
<email>qperret@google.com</email>
</author>
<published>2021-08-05T10:21:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=ba3396c359a0fdefe1860667f2a5947192f4eea0'/>
<id>urn:sha1:ba3396c359a0fdefe1860667f2a5947192f4eea0</id>
<content type='text'>
[ Upstream commit ca4984a7dd863f3e1c0df775ae3e744bff24c303 ]

The UCLAMP_FLAG_IDLE flag is set on a runqueue when dequeueing the last
uclamp active task (that is, when buckets.tasks reaches 0 for all
buckets) to maintain the last uclamp.max and prevent blocked util from
suddenly becoming visible.

However, there is an asymmetry in how the flag is set and cleared which
can lead to having the flag set whilst there are active tasks on the rq.
Specifically, the flag is cleared in the uclamp_rq_inc() path, which is
called at enqueue time, but set in uclamp_rq_dec_id() which is called
both when dequeueing a task _and_ in the update_uclamp_active() path. As
a result, when both uclamp_rq_{dec,ind}_id() are called from
update_uclamp_active(), the flag ends up being set but not cleared,
hence leaving the runqueue in a broken state.

Fix this by clearing the flag in update_uclamp_active() as well.

Fixes: e496187da710 ("sched/uclamp: Enforce last task's UCLAMP_MAX")
Reported-by: Rick Yiu &lt;rickyiu@google.com&gt;
Signed-off-by: Quentin Perret &lt;qperret@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Link: https://lore.kernel.org/r/20210805102154.590709-2-qperret@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/numa: Fix is_core_idle()</title>
<updated>2021-09-15T08:02:07+00:00</updated>
<author>
<name>Mika Penttilä</name>
<email>mika.penttila@gmail.com</email>
</author>
<published>2021-07-22T06:39:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=54ae668378baed204fb019d682735e0d6d71a369'/>
<id>urn:sha1:54ae668378baed204fb019d682735e0d6d71a369</id>
<content type='text'>
[ Upstream commit 1c6829cfd3d5124b125e6df41158665aea413b35 ]

Use the loop variable instead of the function argument to test the
other SMT siblings for idle.

Fixes: ff7db0bf24db ("sched/numa: Prefer using an idle CPU as a migration target instead of comparing tasks")
Signed-off-by: Mika Penttilä &lt;mika.penttila@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Pankaj Gupta &lt;pankaj.gupta@ionos.com&gt;
Link: https://lkml.kernel.org/r/20210722063946.28951-1-mika.penttila@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/debug: Don't update sched_domain debug directories before sched_debug_init()</title>
<updated>2021-09-15T08:02:06+00:00</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2021-05-18T13:07:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=d2964558f741790b8d0f8bcdc2ab669875a431c6'/>
<id>urn:sha1:d2964558f741790b8d0f8bcdc2ab669875a431c6</id>
<content type='text'>
[ Upstream commit 459b09b5a3254008b63382bf41a9b36d0b590f57 ]

Since CPU capacity asymmetry can stem purely from maximum frequency
differences (e.g. Pixel 1), a rebuild of the scheduler topology can be
issued upon loading cpufreq, see:

  arch_topology.c::init_cpu_capacity_callback()

Turns out that if this rebuild happens *before* sched_debug_init() is
run (which is a late initcall), we end up messing up the sched_domain debug
directory: passing a NULL parent to debugfs_create_dir() ends up creating
the directory at the debugfs root, which in this case creates
/sys/kernel/debug/domains (instead of /sys/kernel/debug/sched/domains).

This currently doesn't happen on asymmetric systems which use cpufreq-scpi
or cpufreq-dt drivers, as those are loaded via
deferred_probe_initcall() (it is also a late initcall, but appears to be
ordered *after* sched_debug_init()).

Ionela has been working on detecting maximum frequency asymmetry via ACPI,
and that actually happens via a *device* initcall, thus before
sched_debug_init(), and causes the aforementionned debugfs mayhem.

One option would be to punt sched_debug_init() down to
fs_initcall_sync(). Preventing update_sched_domain_debugfs() from running
before sched_debug_init() appears to be the safer option.

Fixes: 3b87f136f8fc ("sched,debug: Convert sysctl sched_domains to debugfs")
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/r/20210514095339.12979-1-ionela.voinescu@arm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/topology: Skip updating masks for non-online nodes</title>
<updated>2021-09-15T08:02:04+00:00</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2021-08-18T07:43:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=34c033400a88625519f640b845e9cb51a8fa945b'/>
<id>urn:sha1:34c033400a88625519f640b845e9cb51a8fa945b</id>
<content type='text'>
[ Upstream commit 0083242c93759dde353a963a90cb351c5c283379 ]

The scheduler currently expects NUMA node distances to be stable from
init onwards, and as a consequence builds the related data structures
once-and-for-all at init (see sched_init_numa()).

Unfortunately, on some architectures node distance is unreliable for
offline nodes and may very well change upon onlining.

Skip over offline nodes during sched_init_numa(). Track nodes that have
been onlined at least once, and trigger a build of a node's NUMA masks
when it is first onlined post-init.

Reported-by: Geetika Moolchandani &lt;Geetika.Moolchandani1@ibm.com&gt;
Signed-off-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210818074333.48645-1-srikar@linux.vnet.ibm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Fix missing clock update in migrate_task_rq_dl()</title>
<updated>2021-09-15T08:02:02+00:00</updated>
<author>
<name>Dietmar Eggemann</name>
<email>dietmar.eggemann@arm.com</email>
</author>
<published>2021-08-04T13:59:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=daf2ceb70199283fc350365a61aa3adfd9ab7c23'/>
<id>urn:sha1:daf2ceb70199283fc350365a61aa3adfd9ab7c23</id>
<content type='text'>
[ Upstream commit b4da13aa28d4fd0071247b7b41c579ee8a86c81a ]

A missing clock update is causing the following warning:

rq-&gt;clock_update_flags &lt; RQCF_ACT_SKIP
WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453
sub_running_bw.isra.0+0x190/0x1a0
...
CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W 5.14.0-rc1 #1
Hardware name: WIWYNN Mt.Jade Server System
B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP:
1.06.20210526) 2021/05/26
...
Call trace:
  sub_running_bw.isra.0+0x190/0x1a0
  migrate_task_rq_dl+0xf8/0x1e0
  set_task_cpu+0xa8/0x1f0
  try_to_wake_up+0x150/0x3d4
  wake_up_q+0x64/0xc0
  __up_write+0xd0/0x1c0
  up_write+0x4c/0x2b0
  cppc_set_perf+0x120/0x2d0
  cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq]
  __cpufreq_driver_target+0x74/0x140
  sugov_work+0x64/0x80
  kthread_worker_fn+0xe0/0x230
  kthread+0x138/0x140
  ret_from_fork+0x10/0x18

The task causing this is the `cppc_fie` DL task introduced by
commit 1eb5dde674f5 ("cpufreq: CPPC: Add support for frequency
invariance").

With CONFIG_ACPI_CPPC_CPUFREQ_FIE=y and schedutil cpufreq governor on
slow-switching system (like on this Ampere Altra WIWYNN Mt. Jade Arm
Server):

DL task `curr=sugov:112` lets `p=cppc_fie` migrate and since the latter
is in `non_contending` state, migrate_task_rq_dl() calls

  sub_running_bw()-&gt;__sub_running_bw()-&gt;cpufreq_update_util()-&gt;
  rq_clock()-&gt;assert_clock_updated()

on p.

Fix this by updating the clock for a non_contending task in
migrate_task_rq_dl() before calling sub_running_bw().

Reported-by: Bruno Goncalves &lt;bgoncalv@redhat.com&gt;
Signed-off-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Acked-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Link: https://lore.kernel.org/r/20210804135925.3734605-1-dietmar.eggemann@arm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
