diff options
author | Valentin Schneider <valentin.schneider@arm.com> | 2022-01-27 18:40:59 +0300 |
---|---|---|
committer | Peter Zijlstra <peterz@infradead.org> | 2022-03-01 18:18:38 +0300 |
commit | 49bef33e4b87b743495627a529029156c6e09530 (patch) | |
tree | 1022a0794c2ffbb6d35f75f9e014d3a2937cb21c /kernel/sched/deadline.c | |
parent | 3eba0505d03a9c1eb30d40c2330c0880b22d1b3a (diff) | |
download | linux-49bef33e4b87b743495627a529029156c6e09530.tar.xz |
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem down convert_prio().
This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.
My current suspected sequence is something along the lines of the below,
with the demoted task being current.
mark_wakeup_next_waiter()
rt_mutex_adjust_prio()
rt_mutex_setprio() // deboost originally-CFS task
check_class_changed()
switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
switched_to_fair() // Sets need_resched
__balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
raw_spin_rq_unlock(rq)
// need_resched is set, so task_woken_rt() can't
// invoke push_rt_tasks(). Best I can come up with is
// local CPU has rt_nr_migratory >= 2 after the demotion, so stays
// in the rto_mask, and then:
<some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
push_rt_task()
// breakage follows here as rq->curr is CFS
Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.
Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20220127154059.974729-1-valentin.schneider@arm.com
Diffstat (limited to 'kernel/sched/deadline.c')
-rw-r--r-- | kernel/sched/deadline.c | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index d2c072b0ef01..62f0cf842277 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2240,12 +2240,6 @@ static int push_dl_task(struct rq *rq) return 0; retry: - if (is_migration_disabled(next_task)) - return 0; - - if (WARN_ON(next_task == rq->curr)) - return 0; - /* * If next_task preempts rq->curr, and rq->curr * can move away, it makes sense to just reschedule @@ -2258,6 +2252,12 @@ retry: return 0; } + if (is_migration_disabled(next_task)) + return 0; + + if (WARN_ON(next_task == rq->curr)) + return 0; + /* We might release rq lock */ get_task_struct(next_task); |