-rw-r--r-- | Documentation/RCU/Design/Requirements/Requirements.rst | 61
-rw-r--r-- | kernel/rcu/tree_plugin.h                                | 17
2 files changed, 24 insertions, 54 deletions
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index fd5e2cbc4935..75b8ca007a11 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier.
 Scheduler and RCU
 ~~~~~~~~~~~~~~~~~
 
-RCU depends on the scheduler, and the scheduler uses RCU to protect some
-of its data structures. The preemptible-RCU ``rcu_read_unlock()``
-implementation must therefore be written carefully to avoid deadlocks
-involving the scheduler's runqueue and priority-inheritance locks. In
-particular, ``rcu_read_unlock()`` must tolerate an interrupt where the
-interrupt handler invokes both ``rcu_read_lock()`` and
-``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()``
-to use negative nesting levels to avoid destructive recursion via
-interrupt handler's use of RCU.
-
-This scheduler-RCU requirement came as a `complete
-surprise <https://lwn.net/Articles/453002/>`__.
-
-As noted above, RCU makes use of kthreads, and it is necessary to avoid
-excessive CPU-time accumulation by these kthreads. This requirement was
-no surprise, but RCU's violation of it when running context-switch-heavy
-workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a
-surprise
+RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
+accumulation by these kthreads. This requirement was no surprise, but
+RCU's violation of it when running context-switch-heavy workloads when
+built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise
 [PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__.
 RCU has made good progress towards meeting this requirement, even for
 context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
 room for further improvement.
 
-It is forbidden to hold any of scheduler's runqueue or
-priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless
-interrupts have been disabled across the entire RCU read-side critical
-section, that is, up to and including the matching ``rcu_read_lock()``.
-Violating this restriction can result in deadlocks involving these
-scheduler spinlocks. There was hope that this restriction might be
-lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started
-deferring the reporting of the resulting RCU-preempt quiescent state
-until the end of the corresponding interrupts-disabled region.
-Unfortunately, timely reporting of the corresponding quiescent state to
-expedited grace periods requires a call to ``raise_softirq()``, which
-can acquire these scheduler spinlocks. In addition, real-time systems
-using RCU priority boosting need this restriction to remain in effect
-because deferred quiescent-state reporting would also defer deboosting,
-which in turn would degrade real-time latencies.
-
-In theory, if a given RCU read-side critical section could be guaranteed
-to be less than one second in duration, holding a scheduler spinlock
-across that critical section's ``rcu_read_unlock()`` would require only
-that preemption be disabled across the entire RCU read-side critical
-section, not interrupts. Unfortunately, given the possibility of vCPU
-preemption, long-running interrupts, and so on, it is not possible in
-practice to guarantee that a given RCU read-side critical section will
-complete in less than one second. Therefore, as noted above, if
-scheduler spinlocks are held across a given call to
-``rcu_read_unlock()``, interrupts must be disabled across the entire RCU
-read-side critical section.
+There is no longer any prohibition against holding any of
+scheduler's runqueue or priority-inheritance spinlocks across an
+``rcu_read_unlock()``, even if interrupts and preemption were enabled
+somewhere within the corresponding RCU read-side critical section.
+Therefore, it is now perfectly legal to execute ``rcu_read_lock()``
+with preemption enabled, acquire one of the scheduler locks, and hold
+that lock across the matching ``rcu_read_unlock()``.
+
+Similarly, the RCU flavor consolidation has removed the need for negative
+nesting. The fact that interrupt-disabled regions of code act as RCU
+read-side critical sections implicitly avoids earlier issues that used
+to result in destructive recursion via interrupt handler's use of RCU.
 
 Tracing and RCU
 ~~~~~~~~~~~~~~~
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 097635c41135..ccad77639d80 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -615,19 +615,18 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 		struct rcu_node *rnp = rdp->mynode;
 
-		exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
-		      (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
-		      tick_nohz_full_cpu(rdp->cpu);
+		exp = (t->rcu_blocked_node &&
+		       READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
+		      (rdp->grpmask & READ_ONCE(rnp->expmask));
 		// Need to defer quiescent state until everything is enabled.
-		if (irqs_were_disabled && use_softirq &&
-		    (in_interrupt() ||
-		     (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
-			// Using softirq, safe to awaken, and we get
-			// no help from enabling irqs, unlike bh/preempt.
+		if (use_softirq && (in_irq() || (exp && !irqs_were_disabled))) {
+			// Using softirq, safe to awaken, and either the
+			// wakeup is free or there is an expedited GP.
 			raise_softirq_irqoff(RCU_SOFTIRQ);
 		} else {
 			// Enabling BH or preempt does reschedule, so...
-			// Also if no expediting or NO_HZ_FULL, slow is OK.
+			// Also if no expediting, slow is OK.
+			// Plus nohz_full CPUs eventually get tick enabled.
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
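
As a reading aid, and not part of the patch itself, the following minimal kernel-style C sketch shows the usage pattern that the updated Requirements.rst wording declares legal: an RCU read-side critical section entered with preemption and interrupts enabled, with a spinlock of the scheduler-lock class held across the matching rcu_read_unlock(). The names example_lock and example_reader() are hypothetical, and the raw_spinlock_t merely stands in for the scheduler's runqueue and priority-inheritance locks, which are acquired only through scheduler-internal helpers.

#include <linux/rcupdate.h>
#include <linux/spinlock.h>

/* Hypothetical stand-in for a scheduler runqueue/priority-inheritance lock. */
static DEFINE_RAW_SPINLOCK(example_lock);

static void example_reader(void)
{
	unsigned long flags;

	rcu_read_lock();		/* preemption and irqs still enabled here */
	raw_spin_lock_irqsave(&example_lock, flags);
	/*
	 * With the consolidated RCU flavors, ending the read-side
	 * critical section while the lock is held no longer risks a
	 * deadlock against scheduler locks; any quiescent-state report
	 * that cannot be made immediately is simply deferred.
	 */
	rcu_read_unlock();
	raw_spin_unlock_irqrestore(&example_lock, flags);
}

Under the pre-consolidation rules quoted in the removed text above, this pattern would instead have required interrupts to be disabled across the entire critical section, up to and including the rcu_read_lock().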