author | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2017-07-19 19:52:58 +0300 |
---|---|---|
committer | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2017-08-17 17:31:14 +0300 |
commit | 850bf6d59265a5b868ede7eb6c28cd1ad4640a7e | |
tree | d78d7f5c806d8d9eadae3b29dcba5d2c2c26bb51 | |
parent | 8a597d636f3ef2ddd31019b11da1c52f118babff | |
doc: Set down RCU's scheduling-clock-interrupt needs
This commit documents the situations in which RCU needs the
scheduling-clock interrupt to be enabled, along with the consequences
of failing to meet RCU's needs in this area.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-rw-r--r-- | Documentation/RCU/Design/Requirements/Requirements.html | 130 |
1 files changed, 130 insertions, 0 deletions
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 95b30fa25d56..62e847bcdcdd 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -2080,6 +2080,8 @@ Some of the relevant points of interest are as follows:
 <li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
 <li> <a href="#Tracing and RCU">Tracing and RCU</a>.
 <li> <a href="#Energy Efficiency">Energy Efficiency</a>.
+<li> <a href="#Scheduling-Clock Interrupts and RCU">
+	Scheduling-Clock Interrupts and RCU</a>.
 <li> <a href="#Memory Efficiency">Memory Efficiency</a>.
 <li> <a href="#Performance, Scalability, Response Time, and Reliability">
 	Performance, Scalability, Response Time, and Reliability</a>.
@@ -2532,6 +2534,134 @@ I learned of many of these requirements via angry phone calls:
 Flaming me on the Linux-kernel mailing list was apparently not
 sufficient to fully vent their ire at RCU's energy-efficiency bugs!
 
+<h3><a name="Scheduling-Clock Interrupts and RCU">
+Scheduling-Clock Interrupts and RCU</a></h3>
+
+<p>
+The kernel transitions between in-kernel non-idle execution, userspace
+execution, and the idle loop.
+Depending on kernel configuration, RCU handles these states differently:
+
+<table border=3>
+<tr><th><tt>HZ</tt> Kconfig</th>
+	<th>In-Kernel</th>
+	<th>Usermode</th>
+	<th>Idle</th></tr>
+<tr><th align="left"><tt>HZ_PERIODIC</tt></th>
+	<td>Can rely on scheduling-clock interrupt.</td>
+	<td>Can rely on scheduling-clock interrupt and its
+	detection of interrupt from usermode.</td>
+	<td>Can rely on RCU's dyntick-idle detection.</td></tr>
+<tr><th align="left"><tt>NO_HZ_IDLE</tt></th>
+	<td>Can rely on scheduling-clock interrupt.</td>
+	<td>Can rely on scheduling-clock interrupt and its
+	detection of interrupt from usermode.</td>
+	<td>Can rely on RCU's dyntick-idle detection.</td></tr>
+<tr><th align="left"><tt>NO_HZ_FULL</tt></th>
+	<td>Can only sometimes rely on scheduling-clock interrupt.
+	In other cases, it is necessary to bound kernel execution
+	times and/or use IPIs.</td>
+	<td>Can rely on RCU's dyntick-idle detection.</td>
+	<td>Can rely on RCU's dyntick-idle detection.</td></tr>
+</table>
+
+<table>
+<tr><th>&nbsp;</th></tr>
+<tr><th align="left">Quick Quiz:</th></tr>
+<tr><td>
+	Why can't <tt>NO_HZ_FULL</tt> in-kernel execution rely on the
+	scheduling-clock interrupt, just like <tt>HZ_PERIODIC</tt>
+	and <tt>NO_HZ_IDLE</tt> do?
+</td></tr>
+<tr><th align="left">Answer:</th></tr>
+<tr><td bgcolor="#ffffff"><font color="ffffff">
+	Because, as a performance optimization, <tt>NO_HZ_FULL</tt>
+	does not necessarily re-enable the scheduling-clock interrupt
+	on entry to each and every system call.
+</font></td></tr>
+<tr><td>&nbsp;</td></tr>
+</table>
+
+<p>
+However, RCU must be reliably informed as to whether any given
+CPU is currently in the idle loop, and, for <tt>NO_HZ_FULL</tt>,
+also whether that CPU is executing in usermode, as discussed
+<a href="#Energy Efficiency">earlier</a>.
+It also requires that the scheduling-clock interrupt be enabled when
+RCU needs it to be:
+
+<ol>
+<li> If a CPU is either idle or executing in usermode, and RCU believes
+	it is non-idle, the scheduling-clock tick had better be running.
+	Otherwise, you will get RCU CPU stall warnings. Or at best,
+	very long (11-second) grace periods, with a pointless IPI waking
+	the CPU from time to time.
+<li> If a CPU is in a portion of the kernel that executes RCU read-side
+	critical sections, and RCU believes this CPU to be idle, you will get
+	random memory corruption. <b>DON'T DO THIS!!!</b>
+
+	<br>This is one reason to test with lockdep, which will complain
+	about this sort of thing.
+<li> If a CPU is in a portion of the kernel that is absolutely
+	positively no-joking guaranteed to never execute any RCU read-side
+	critical sections, and RCU believes this CPU to be idle,
+	no problem. This sort of thing is used by some architectures
+	for light-weight exception handlers, which can then avoid the
+	overhead of <tt>rcu_irq_enter()</tt> and <tt>rcu_irq_exit()</tt>
+	at exception entry and exit, respectively.
+	Some go further and avoid the entireties of <tt>irq_enter()</tt>
+	and <tt>irq_exit()</tt>.
+
+	<br>Just make very sure you are running some of your tests with
+	<tt>CONFIG_PROVE_RCU=y</tt>, just in case one of your code paths
+	was in fact joking about not doing RCU read-side critical sections.
+<li> If a CPU is executing in the kernel with the scheduling-clock
+	interrupt disabled and RCU believes this CPU to be non-idle,
+	and if the CPU goes idle (from an RCU perspective) every few
+	jiffies, no problem. It is usually OK for there to be the
+	occasional gap between idle periods of up to a second or so.
+
+	<br>If the gap grows too long, you get RCU CPU stall warnings.
+<li> If a CPU is either idle or executing in usermode, and RCU believes
+	it to be idle, of course no problem.
+<li> If a CPU is executing in the kernel, the kernel code
+	path is passing through quiescent states at a reasonable
+	frequency (preferably about once per few jiffies, but the
+	occasional excursion to a second or so is usually OK) and the
+	scheduling-clock interrupt is enabled, of course no problem.
+
+	<br>If the gap between a successive pair of quiescent states grows
+	too long, you get RCU CPU stall warnings.
+</ol>
+
+<table>
+<tr><th>&nbsp;</th></tr>
+<tr><th align="left">Quick Quiz:</th></tr>
+<tr><td>
+	But what if my driver has a hardware interrupt handler
+	that can run for many seconds?
+	I cannot invoke <tt>schedule()</tt> from a hardware
+	interrupt handler, after all!
+</td></tr>
+<tr><th align="left">Answer:</th></tr>
+<tr><td bgcolor="#ffffff"><font color="ffffff">
+	One approach is to do <tt>rcu_irq_exit();rcu_irq_enter();</tt>
+	every so often.
+	But given that long-running interrupt handlers can cause
+	other problems, not least for response time, shouldn't you
+	work to keep your interrupt handler's runtime within reasonable
+	bounds?
+</font></td></tr>
+<tr><td>&nbsp;</td></tr>
+</table>
+
+<p>
+But as long as RCU is properly informed of kernel state transitions between
+in-kernel execution, usermode execution, and idle, and as long as the
+scheduling-clock interrupt is enabled when RCU needs it to be, you
+can rest assured that the bugs you encounter will be in some other
+part of RCU or some other part of the kernel!
+
 <h3><a name="Memory Efficiency">Memory Efficiency</a></h3>
 
 <p>