From 4b680b3fc6f319860652532084b35b11d3868008 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Sat, 22 May 2021 00:56:23 +0900 Subject: rcu/tree: Handle VM stoppage in stall detection [ Upstream commit ccfc9dd6914feaa9a81f10f9cce56eb0f7712264 ] The soft watchdog timer function checks if a virtual machine was suspended and hence what looks like a lockup in fact is a false positive. This is what kvm_check_and_clear_guest_paused() does: it tests guest PVCLOCK_GUEST_STOPPED (which is set by the host) and if it's set then we need to touch all watchdogs and bail out. Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED check works fine. There is, however, one more watchdog that runs from IRQ, so watchdog timer fn races with it, and that watchdog is not aware of PVCLOCK_GUEST_STOPPED - RCU stall detector. apic_timer_interrupt() smp_apic_timer_interrupt() hrtimer_interrupt() __hrtimer_run_queues() tick_sched_timer() tick_sched_handle() update_process_times() rcu_sched_clock_irq() This triggers RCU stalls on our devices during VM resume. If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU before watchdog_timer_fn()->kvm_check_and_clear_guest_paused() then there is nothing on this VCPU that touches watchdogs and RCU reads stale gp stall timestamp and new jiffies value, which makes it think that RCU has stalled. Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and don't report RCU stalls when we resume the VM. Signed-off-by: Sergey Senozhatsky Signed-off-by: Signed-off-by: Paul E. McKenney Signed-off-by: Sasha Levin --- kernel/rcu/tree_stall.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'kernel/rcu/tree_stall.h') diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index ca21d28a0f98..0435e5e716a8 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -7,6 +7,8 @@ * Author: Paul E. McKenney */ +#include + ////////////////////////////////////////////////////////////////////////////// // // Controlling CPU stall warnings, including delay calculation. @@ -633,6 +635,14 @@ static void check_cpu_stall(struct rcu_data *rdp) (READ_ONCE(rnp->qsmask) & rdp->grpmask) && cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { + /* + * If a virtual machine is stopped by the host it can look to + * the watchdog like an RCU stall. Check to see if the host + * stopped the vm. + */ + if (kvm_check_and_clear_guest_paused()) + return; + /* We haven't checked in, so go dump stack. */ print_cpu_stall(gps); if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) @@ -642,6 +652,14 @@ static void check_cpu_stall(struct rcu_data *rdp) ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) && cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { + /* + * If a virtual machine is stopped by the host it can look to + * the watchdog like an RCU stall. Check to see if the host + * stopped the vm. + */ + if (kvm_check_and_clear_guest_paused()) + return; + /* They had a few time units to dump stack, so complain. */ print_other_cpu_stall(gs2, gps); if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) -- cgit v1.2.3 From 527b56d7856fc9a6dd5a0a96bca8ce394b97bfa8 Mon Sep 17 00:00:00 2001 From: Yanfei Xu Date: Sun, 16 May 2021 00:45:11 +0800 Subject: rcu: Fix to include first blocked task in stall warning [ Upstream commit e6a901a44f76878ed1653626c9ff4cfc5a3f58f8 ] The for loop in rcu_print_task_stall() always omits ts[0], which points to the first task blocking the stalled grace period. This in turn fails to count this first task, which means that ndetected will be equal to zero when all CPUs have passed through their quiescent states and only one task is blocking the stalled grace period. This zero value for ndetected will in turn result in an incorrect "All QSes seen" message: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: Tasks blocked on level-1 rcu_node (CPUs 12-23): (detected by 15, t=6504 jiffies, g=164777, q=9011209) rcu: All QSes seen, last rcu_preempt kthread activity 1 (4295252379-4295252378), jiffies_till_next_fqs=1, root ->qsmask 0x2 BUG: sleeping function called from invalid context at include/linux/uaccess.h:156 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 70613, name: msgstress04 INFO: lockdep is turned off. Preemption disabled at: [] create_object.isra.0+0x204/0x4b0 CPU: 15 PID: 70613 Comm: msgstress04 Kdump: loaded Not tainted 5.12.2-yoctodev-standard #1 Hardware name: Marvell OcteonTX CN96XX board (DT) Call trace: dump_backtrace+0x0/0x2cc show_stack+0x24/0x30 dump_stack+0x110/0x188 ___might_sleep+0x214/0x2d0 __might_sleep+0x7c/0xe0 This commit therefore fixes the loop to include ts[0]. Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled") Tested-by: Qais Yousef Signed-off-by: Yanfei Xu Signed-off-by: Paul E. McKenney Signed-off-by: Sasha Levin --- kernel/rcu/tree_stall.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'kernel/rcu/tree_stall.h') diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index 0435e5e716a8..cdfaa44ffd70 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -275,8 +275,8 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags) break; } raw_spin_unlock_irqrestore_rcu_node(rnp, flags); - for (i--; i; i--) { - t = ts[i]; + while (i) { + t = ts[--i]; if (!try_invoke_on_locked_down_task(t, check_slow_task, &rscr)) pr_cont(" P%d", t->pid); else -- cgit v1.2.3 From ea5e5bc881a4de9b5e09a16af950af2584dd67d2 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 19 Nov 2020 10:13:06 -0800 Subject: rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees [ Upstream commit a649d25dcc671a33b9cc3176411920fdc5fbd98e ] This commit adds a number of lockdep_assert_irqs_disabled() calls to rcu_sched_clock_irq() and a number of the functions that it calls. The point of this is to help track down a situation where lockdep appears to be insisting that interrupts are enabled within these functions, which should only ever be invoked from the scheduling-clock interrupt handler. Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/ Signed-off-by: Paul E. McKenney Signed-off-by: Sasha Levin --- kernel/rcu/tree.c | 4 ++++ kernel/rcu/tree_plugin.h | 1 + kernel/rcu/tree_stall.h | 8 ++++++++ 3 files changed, 13 insertions(+) (limited to 'kernel/rcu/tree_stall.h') diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 8c3ba0185082..8c81c05c4236 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2561,6 +2561,7 @@ static void rcu_do_batch(struct rcu_data *rdp) void rcu_sched_clock_irq(int user) { trace_rcu_utilization(TPS("Start scheduler-tick")); + lockdep_assert_irqs_disabled(); raw_cpu_inc(rcu_data.ticks_this_gp); /* The load-acquire pairs with the store-release setting to true. */ if (smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs))) { @@ -2574,6 +2575,7 @@ void rcu_sched_clock_irq(int user) rcu_flavor_sched_clock_irq(user); if (rcu_pending(user)) invoke_rcu_core(); + lockdep_assert_irqs_disabled(); trace_rcu_utilization(TPS("End scheduler-tick")); } @@ -3730,6 +3732,8 @@ static int rcu_pending(int user) struct rcu_data *rdp = this_cpu_ptr(&rcu_data); struct rcu_node *rnp = rdp->mynode; + lockdep_assert_irqs_disabled(); + /* Check for CPU stalls, if enabled. */ check_cpu_stall(rdp); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 7d4f78bf4057..574aeaac9272 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -682,6 +682,7 @@ static void rcu_flavor_sched_clock_irq(int user) { struct task_struct *t = current; + lockdep_assert_irqs_disabled(); if (user || rcu_is_cpu_rrupt_from_idle()) { rcu_note_voluntary_context_switch(current); } diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index cdfaa44ffd70..3fc21617546d 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -262,6 +262,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags) struct task_struct *t; struct task_struct *ts[8]; + lockdep_assert_irqs_disabled(); if (!rcu_preempt_blocked_readers_cgp(rnp)) return 0; pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):", @@ -286,6 +287,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags) ".q"[rscr.rs.b.need_qs], ".e"[rscr.rs.b.exp_hint], ".l"[rscr.on_blkd_list]); + lockdep_assert_irqs_disabled(); put_task_struct(t); ndetected++; } @@ -474,6 +476,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) struct rcu_node *rnp; long totqlen = 0; + lockdep_assert_irqs_disabled(); + /* Kick and suppress, if so configured. */ rcu_stall_kick_kthreads(); if (rcu_stall_is_suppressed()) @@ -495,6 +499,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) } } ndetected += rcu_print_task_stall(rnp, flags); // Releases rnp->lock. + lockdep_assert_irqs_disabled(); } for_each_possible_cpu(cpu) @@ -540,6 +545,8 @@ static void print_cpu_stall(unsigned long gps) struct rcu_node *rnp = rcu_get_root(); long totqlen = 0; + lockdep_assert_irqs_disabled(); + /* Kick and suppress, if so configured. */ rcu_stall_kick_kthreads(); if (rcu_stall_is_suppressed()) @@ -594,6 +601,7 @@ static void check_cpu_stall(struct rcu_data *rdp) unsigned long js; struct rcu_node *rnp; + lockdep_assert_irqs_disabled(); if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) || !rcu_gp_in_progress()) return; -- cgit v1.2.3 From 497f3d9c3f58fba01a45fabe93fa8679a367b48f Mon Sep 17 00:00:00 2001 From: Yanfei Xu Date: Sun, 16 May 2021 17:50:10 +0800 Subject: rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock [ Upstream commit dc87740c8a6806bd2162bfb441770e4e53be5601 ] If rcu_print_task_stall() is invoked on an rcu_node structure that does not contain any tasks blocking the current grace period, it takes an early exit that fails to release that rcu_node structure's lock. This results in a self-deadlock, which is detected by lockdep. To reproduce this bug: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0" This will also result in other complaints, including RCU's scheduler hook complaining about blocking rather than preemption and an rcutorture writer stall. Only a partial RCU CPU stall warning message will be printed because of the self-deadlock. This commit therefore releases the lock on the rcu_print_task_stall() function's early exit path. Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled") Tested-by: Qais Yousef Signed-off-by: Yanfei Xu Signed-off-by: Paul E. McKenney Signed-off-by: Sasha Levin --- kernel/rcu/tree_stall.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'kernel/rcu/tree_stall.h') diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index 3fc21617546d..251a9af3709a 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -263,8 +263,10 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags) struct task_struct *ts[8]; lockdep_assert_irqs_disabled(); - if (!rcu_preempt_blocked_readers_cgp(rnp)) + if (!rcu_preempt_blocked_readers_cgp(rnp)) { + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); return 0; + } pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):", rnp->level, rnp->grplo, rnp->grphi); t = list_entry(rnp->gp_tasks->prev, -- cgit v1.2.3