From a00e0d714fbded07a7a2254391ce9ed5a5cb9d82 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 8 Feb 2011 17:14:39 -0800 Subject: rcu: Remove conditional compilation for RCU CPU stall warnings The RCU CPU stall warnings can now be controlled using the rcu_cpu_stall_suppress boot-time parameter or via the same parameter from sysfs. There is therefore no longer any reason to have kernel config parameters for this feature. This commit therefore removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE kernel config parameters. The RCU_CPU_STALL_TIMEOUT parameter remains to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter remains to allow task-stall information to be suppressed if desired. Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/RCU/00-INDEX | 2 +- Documentation/RCU/stallwarn.txt | 23 +++++++++++++---------- 2 files changed, 14 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX index 71b6f500ddb9..1d7a885761f5 100644 --- a/Documentation/RCU/00-INDEX +++ b/Documentation/RCU/00-INDEX @@ -21,7 +21,7 @@ rcu.txt RTFP.txt - List of RCU papers (bibliography) going back to 1980. stallwarn.txt - - RCU CPU stall warnings (CONFIG_RCU_CPU_STALL_DETECTOR) + - RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress) torture.txt - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) trace.txt diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 862c08ef1fde..4e959208f736 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -1,22 +1,25 @@ Using RCU's CPU Stall Detector -The CONFIG_RCU_CPU_STALL_DETECTOR kernel config parameter enables -RCU's CPU stall detector, which detects conditions that unduly delay -RCU grace periods. The stall detector's idea of what constitutes -"unduly delayed" is controlled by a set of C preprocessor macros: +The rcu_cpu_stall_suppress module parameter enables RCU's CPU stall +detector, which detects conditions that unduly delay RCU grace periods. +This module parameter enables CPU stall detection by default, but +may be overridden via boot-time parameter or at runtime via sysfs. +The stall detector's idea of what constitutes "unduly delayed" is +controlled by a set of kernel configuration variables and cpp macros: -RCU_SECONDS_TILL_STALL_CHECK +CONFIG_RCU_CPU_STALL_TIMEOUT - This macro defines the period of time that RCU will wait from - the beginning of a grace period until it issues an RCU CPU - stall warning. This time period is normally ten seconds. + This kernel configuration parameter defines the period of time + that RCU will wait from the beginning of a grace period until it + issues an RCU CPU stall warning. This time period is normally + ten seconds. RCU_SECONDS_TILL_STALL_RECHECK This macro defines the period of time that RCU will wait after issuing a stall warning until it issues another stall warning - for the same stall. This time period is normally set to thirty - seconds. + for the same stall. This time period is normally set to three + times the check interval plus thirty seconds. RCU_STALL_RAT_DELAY -- cgit v1.2.3 From e59fb3120becfb36b22ddb8bd27d065d3cdca499 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 7 Sep 2010 10:38:22 -0700 Subject: rcu: Decrease memory-barrier usage based on semi-formal proof Commit d09b62d fixed grace-period synchronization, but left some smp_mb() invocations in rcu_process_callbacks() that are no longer needed, but sheer paranoia prevented them from being removed. This commit removes them and provides a proof of correctness in their absence. It also adds a memory barrier to rcu_report_qs_rsp() immediately before the update to rsp->completed in order to handle the theoretical possibility that the compiler or CPU might move massive quantities of code into a lock-based critical section. This also proves that the sheer paranoia was not entirely unjustified, at least from a theoretical point of view. In addition, the old dyntick-idle synchronization depended on the fact that grace periods were many milliseconds in duration, so that it could be assumed that no dyntick-idle CPU could reorder a memory reference across an entire grace period. Unfortunately for this design, the addition of expedited grace periods breaks this assumption, which has the unfortunate side-effect of requiring atomic operations in the functions that track dyntick-idle state for RCU. (There is some hope that the algorithms used in user-level RCU might be applied here, but some work is required to handle the NMIs that user-space applications can happily ignore. For the short term, better safe than sorry.) This proof assumes that neither compiler nor CPU will allow a lock acquisition and release to be reordered, as doing so can result in deadlock. The proof is as follows: 1. A given CPU declares a quiescent state under the protection of its leaf rcu_node's lock. 2. If there is more than one level of rcu_node hierarchy, the last CPU to declare a quiescent state will also acquire the ->lock of the next rcu_node up in the hierarchy, but only after releasing the lower level's lock. The acquisition of this lock clearly cannot occur prior to the acquisition of the leaf node's lock. 3. Step 2 repeats until we reach the root rcu_node structure. Please note again that only one lock is held at a time through this process. The acquisition of the root rcu_node's ->lock must occur after the release of that of the leaf rcu_node. 4. At this point, we set the ->completed field in the rcu_state structure in rcu_report_qs_rsp(). However, if the rcu_node hierarchy contains only one rcu_node, then in theory the code preceding the quiescent state could leak into the critical section. We therefore precede the update of ->completed with a memory barrier. All CPUs will therefore agree that any updates preceding any report of a quiescent state will have happened before the update of ->completed. 5. Regardless of whether a new grace period is needed, rcu_start_gp() will propagate the new value of ->completed to all of the leaf rcu_node structures, under the protection of each rcu_node's ->lock. If a new grace period is needed immediately, this propagation will occur in the same critical section that ->completed was set in, but courtesy of the memory barrier in #4 above, is still seen to follow any pre-quiescent-state activity. 6. When a given CPU invokes __rcu_process_gp_end(), it becomes aware of the end of the old grace period and therefore makes any RCU callbacks that were waiting on that grace period eligible for invocation. If this CPU is the same one that detected the end of the grace period, and if there is but a single rcu_node in the hierarchy, we will still be in the single critical section. In this case, the memory barrier in step #4 guarantees that all callbacks will be seen to execute after each CPU's quiescent state. On the other hand, if this is a different CPU, it will acquire the leaf rcu_node's ->lock, and will again be serialized after each CPU's quiescent state for the old grace period. On the strength of this proof, this commit therefore removes the memory barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp(). The effect is to reduce the number of memory barriers by one and to reduce the frequency of execution from about once per scheduling tick per CPU to once per grace period. Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/RCU/trace.txt | 48 +++++++--------- kernel/rcutree.c | 130 +++++++++++++++++++------------------------- kernel/rcutree.h | 9 +-- kernel/rcutree_plugin.h | 7 +-- kernel/rcutree_trace.c | 12 ++-- 5 files changed, 88 insertions(+), 118 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 6a8c73f55b80..e731ad20d166 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -21,23 +21,23 @@ rcu_pending() function decided that there was core RCU work to do). The output of "cat rcu/rcudata" looks as follows: rcu_sched: - 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10 - 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10 - 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10 - 3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10 - 4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10 - 5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10 - 6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10 - 7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10 + 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1/0 df=1101 of=0 ri=36 ql=0 b=10 + 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1/0 df=1015 of=0 ri=0 ql=0 b=10 + 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1/0 df=1839 of=0 ri=0 ql=0 b=10 + 3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1/0 df=1545 of=0 ri=0 ql=0 b=10 + 4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1/0 df=1992 of=0 ri=0 ql=0 b=10 + 5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1/0 df=3331 of=0 ri=4 ql=2 b=10 + 6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1/0 df=3224 of=0 ri=0 ql=0 b=10 + 7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1/0 df=1818 of=0 ri=0 ql=2 b=10 rcu_bh: - 0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10 - 1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10 - 2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 - 3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10 - 4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 - 5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 - 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 - 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 + 0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1/0 df=0 of=0 ri=0 ql=0 b=10 + 1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1/0 df=13 of=0 ri=0 ql=0 b=10 + 2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1/0 df=15 of=0 ri=0 ql=0 b=10 + 3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1/0 df=9 of=0 ri=0 ql=0 b=10 + 4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1/0 df=15 of=0 ri=0 ql=0 b=10 + 5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1/0 df=15 of=0 ri=0 ql=0 b=10 + 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1/0 df=15 of=0 ri=0 ql=0 b=10 + 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1/0 df=15 of=0 ri=0 ql=0 b=10 The first section lists the rcu_data structures for rcu_sched, the second for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an @@ -85,18 +85,10 @@ o "qp" indicates that RCU still expects a quiescent state from o "dt" is the current value of the dyntick counter that is incremented when entering or leaving dynticks idle state, either by the - scheduler or by irq. The number after the "/" is the interrupt - nesting depth when in dyntick-idle state, or one greater than - the interrupt-nesting depth otherwise. - - This field is displayed only for CONFIG_NO_HZ kernels. - -o "dn" is the current value of the dyntick counter that is incremented - when entering or leaving dynticks idle state via NMI. If both - the "dt" and "dn" values are even, then this CPU is in dynticks - idle mode and may be ignored by RCU. If either of these two - counters is odd, then RCU must be alert to the possibility of - an RCU read-side critical section running on this CPU. + scheduler or by irq. The number after the first "/" is the + interrupt nesting depth when in dyntick-idle state, or one + greater than the interrupt-nesting depth otherwise. The number + after the second "/" is the NMI nesting depth. This field is displayed only for CONFIG_NO_HZ kernels. diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 18f7a593d4c7..90104a19c564 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -128,7 +128,7 @@ void rcu_note_context_switch(int cpu) #ifdef CONFIG_NO_HZ DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = { .dynticks_nesting = 1, - .dynticks = 1, + .dynticks = ATOMIC_INIT(1), }; #endif /* #ifdef CONFIG_NO_HZ */ @@ -262,13 +262,25 @@ void rcu_enter_nohz(void) unsigned long flags; struct rcu_dynticks *rdtp; - smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */ local_irq_save(flags); rdtp = &__get_cpu_var(rcu_dynticks); - rdtp->dynticks++; - rdtp->dynticks_nesting--; - WARN_ON_ONCE(rdtp->dynticks & 0x1); + if (--rdtp->dynticks_nesting) { + local_irq_restore(flags); + return; + } + /* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */ + smp_mb__before_atomic_inc(); /* See above. */ + atomic_inc(&rdtp->dynticks); + smp_mb__after_atomic_inc(); /* Force ordering with next sojourn. */ + WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1); local_irq_restore(flags); + + /* If the interrupt queued a callback, get out of dyntick mode. */ + if (in_irq() && + (__get_cpu_var(rcu_sched_data).nxtlist || + __get_cpu_var(rcu_bh_data).nxtlist || + rcu_preempt_needs_cpu(smp_processor_id()))) + set_need_resched(); } /* @@ -284,11 +296,16 @@ void rcu_exit_nohz(void) local_irq_save(flags); rdtp = &__get_cpu_var(rcu_dynticks); - rdtp->dynticks++; - rdtp->dynticks_nesting++; - WARN_ON_ONCE(!(rdtp->dynticks & 0x1)); + if (rdtp->dynticks_nesting++) { + local_irq_restore(flags); + return; + } + smp_mb__before_atomic_inc(); /* Force ordering w/previous sojourn. */ + atomic_inc(&rdtp->dynticks); + /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */ + smp_mb__after_atomic_inc(); /* See above. */ + WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1)); local_irq_restore(flags); - smp_mb(); /* CPUs seeing ++ must see later RCU read-side crit sects */ } /** @@ -302,11 +319,15 @@ void rcu_nmi_enter(void) { struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks); - if (rdtp->dynticks & 0x1) + if (rdtp->dynticks_nmi_nesting == 0 && + (atomic_read(&rdtp->dynticks) & 0x1)) return; - rdtp->dynticks_nmi++; - WARN_ON_ONCE(!(rdtp->dynticks_nmi & 0x1)); - smp_mb(); /* CPUs seeing ++ must see later RCU read-side crit sects */ + rdtp->dynticks_nmi_nesting++; + smp_mb__before_atomic_inc(); /* Force delay from prior write. */ + atomic_inc(&rdtp->dynticks); + /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */ + smp_mb__after_atomic_inc(); /* See above. */ + WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1)); } /** @@ -320,11 +341,14 @@ void rcu_nmi_exit(void) { struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks); - if (rdtp->dynticks & 0x1) + if (rdtp->dynticks_nmi_nesting == 0 || + --rdtp->dynticks_nmi_nesting != 0) return; - smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */ - rdtp->dynticks_nmi++; - WARN_ON_ONCE(rdtp->dynticks_nmi & 0x1); + /* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */ + smp_mb__before_atomic_inc(); /* See above. */ + atomic_inc(&rdtp->dynticks); + smp_mb__after_atomic_inc(); /* Force delay to next write. */ + WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1); } /** @@ -335,13 +359,7 @@ void rcu_nmi_exit(void) */ void rcu_irq_enter(void) { - struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks); - - if (rdtp->dynticks_nesting++) - return; - rdtp->dynticks++; - WARN_ON_ONCE(!(rdtp->dynticks & 0x1)); - smp_mb(); /* CPUs seeing ++ must see later RCU read-side crit sects */ + rcu_exit_nohz(); } /** @@ -353,18 +371,7 @@ void rcu_irq_enter(void) */ void rcu_irq_exit(void) { - struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks); - - if (--rdtp->dynticks_nesting) - return; - smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */ - rdtp->dynticks++; - WARN_ON_ONCE(rdtp->dynticks & 0x1); - - /* If the interrupt queued a callback, get out of dyntick mode. */ - if (__this_cpu_read(rcu_sched_data.nxtlist) || - __this_cpu_read(rcu_bh_data.nxtlist)) - set_need_resched(); + rcu_enter_nohz(); } #ifdef CONFIG_SMP @@ -376,19 +383,8 @@ void rcu_irq_exit(void) */ static int dyntick_save_progress_counter(struct rcu_data *rdp) { - int ret; - int snap; - int snap_nmi; - - snap = rdp->dynticks->dynticks; - snap_nmi = rdp->dynticks->dynticks_nmi; - smp_mb(); /* Order sampling of snap with end of grace period. */ - rdp->dynticks_snap = snap; - rdp->dynticks_nmi_snap = snap_nmi; - ret = ((snap & 0x1) == 0) && ((snap_nmi & 0x1) == 0); - if (ret) - rdp->dynticks_fqs++; - return ret; + rdp->dynticks_snap = atomic_add_return(0, &rdp->dynticks->dynticks); + return 0; } /* @@ -399,16 +395,11 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp) */ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) { - long curr; - long curr_nmi; - long snap; - long snap_nmi; + unsigned long curr; + unsigned long snap; - curr = rdp->dynticks->dynticks; - snap = rdp->dynticks_snap; - curr_nmi = rdp->dynticks->dynticks_nmi; - snap_nmi = rdp->dynticks_nmi_snap; - smp_mb(); /* force ordering with cpu entering/leaving dynticks. */ + curr = (unsigned long)atomic_add_return(0, &rdp->dynticks->dynticks); + snap = (unsigned long)rdp->dynticks_snap; /* * If the CPU passed through or entered a dynticks idle phase with @@ -418,8 +409,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) * read-side critical section that started before the beginning * of the current RCU grace period. */ - if ((curr != snap || (curr & 0x1) == 0) && - (curr_nmi != snap_nmi || (curr_nmi & 0x1) == 0)) { + if ((curr & 0x1) == 0 || ULONG_CMP_GE(curr, snap + 2)) { rdp->dynticks_fqs++; return 1; } @@ -841,6 +831,12 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags) __releases(rcu_get_root(rsp)->lock) { WARN_ON_ONCE(!rcu_gp_in_progress(rsp)); + + /* + * Ensure that all grace-period and pre-grace-period activity + * is seen before the assignment to rsp->completed. + */ + smp_mb(); /* See above block comment. */ rsp->completed = rsp->gpnum; rsp->signaled = RCU_GP_IDLE; rcu_start_gp(rsp, flags); /* releases root node's rnp->lock. */ @@ -1367,25 +1363,11 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp) */ static void rcu_process_callbacks(struct softirq_action *unused) { - /* - * Memory references from any prior RCU read-side critical sections - * executed by the interrupted code must be seen before any RCU - * grace-period manipulations below. - */ - smp_mb(); /* See above block comment. */ - __rcu_process_callbacks(&rcu_sched_state, &__get_cpu_var(rcu_sched_data)); __rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data)); rcu_preempt_process_callbacks(); - /* - * Memory references from any later RCU read-side critical sections - * executed by the interrupted code must be seen after any RCU - * grace-period manipulations above. - */ - smp_mb(); /* See above block comment. */ - /* If we are last CPU on way to dyntick-idle mode, accelerate it. */ rcu_needs_cpu_flush(); } diff --git a/kernel/rcutree.h b/kernel/rcutree.h index e1a6663c0155..bd891def3303 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -84,11 +84,9 @@ * Dynticks per-CPU state. */ struct rcu_dynticks { - int dynticks_nesting; /* Track nesting level, sort of. */ - int dynticks; /* Even value for dynticks-idle, else odd. */ - int dynticks_nmi; /* Even value for either dynticks-idle or */ - /* not in nmi handler, else odd. So this */ - /* remains even for nmi from irq handler. */ + int dynticks_nesting; /* Track irq/process nesting level. */ + int dynticks_nmi_nesting; /* Track NMI nesting level. */ + atomic_t dynticks; /* Even value for dynticks-idle, else odd. */ }; /* @@ -218,7 +216,6 @@ struct rcu_data { /* 3) dynticks interface. */ struct rcu_dynticks *dynticks; /* Shared per-CPU dynticks state. */ int dynticks_snap; /* Per-GP tracking for dynticks. */ - int dynticks_nmi_snap; /* Per-GP tracking for dynticks_nmi. */ #endif /* #ifdef CONFIG_NO_HZ */ /* 4) reasons this CPU needed to be kicked by force_quiescent_state */ diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index 38426ef1bcd6..764b5fcc7c56 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -1182,7 +1182,6 @@ int rcu_needs_cpu(int cpu) { int c = 0; int snap; - int snap_nmi; int thatcpu; /* Check for being in the holdoff period. */ @@ -1193,10 +1192,10 @@ int rcu_needs_cpu(int cpu) for_each_online_cpu(thatcpu) { if (thatcpu == cpu) continue; - snap = per_cpu(rcu_dynticks, thatcpu).dynticks; - snap_nmi = per_cpu(rcu_dynticks, thatcpu).dynticks_nmi; + snap = atomic_add_return(0, &per_cpu(rcu_dynticks, + thatcpu).dynticks); smp_mb(); /* Order sampling of snap with end of grace period. */ - if (((snap & 0x1) != 0) || ((snap_nmi & 0x1) != 0)) { + if ((snap & 0x1) != 0) { per_cpu(rcu_dyntick_drain, cpu) = 0; per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1; return rcu_needs_cpu_quick_check(cpu); diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c index c8e97853b970..4a21ca55ef7c 100644 --- a/kernel/rcutree_trace.c +++ b/kernel/rcutree_trace.c @@ -57,10 +57,10 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp) rdp->passed_quiesc, rdp->passed_quiesc_completed, rdp->qs_pending); #ifdef CONFIG_NO_HZ - seq_printf(m, " dt=%d/%d dn=%d df=%lu", - rdp->dynticks->dynticks, + seq_printf(m, " dt=%d/%d/%d df=%lu", + atomic_read(&rdp->dynticks->dynticks), rdp->dynticks->dynticks_nesting, - rdp->dynticks->dynticks_nmi, + rdp->dynticks->dynticks_nmi_nesting, rdp->dynticks_fqs); #endif /* #ifdef CONFIG_NO_HZ */ seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi); @@ -115,9 +115,9 @@ static void print_one_rcu_data_csv(struct seq_file *m, struct rcu_data *rdp) rdp->qs_pending); #ifdef CONFIG_NO_HZ seq_printf(m, ",%d,%d,%d,%lu", - rdp->dynticks->dynticks, + atomic_read(&rdp->dynticks->dynticks), rdp->dynticks->dynticks_nesting, - rdp->dynticks->dynticks_nmi, + rdp->dynticks->dynticks_nmi_nesting, rdp->dynticks_fqs); #endif /* #ifdef CONFIG_NO_HZ */ seq_printf(m, ",%lu,%lu", rdp->offline_fqs, rdp->resched_ipi); @@ -130,7 +130,7 @@ static int show_rcudata_csv(struct seq_file *m, void *unused) { seq_puts(m, "\"CPU\",\"Online?\",\"c\",\"g\",\"pq\",\"pqc\",\"pq\","); #ifdef CONFIG_NO_HZ - seq_puts(m, "\"dt\",\"dt nesting\",\"dn\",\"df\","); + seq_puts(m, "\"dt\",\"dt nesting\",\"dt NMI nesting\",\"df\","); #endif /* #ifdef CONFIG_NO_HZ */ seq_puts(m, "\"of\",\"ri\",\"ql\",\"b\",\"ci\",\"co\",\"ca\"\n"); #ifdef CONFIG_TREE_PREEMPT_RCU -- cgit v1.2.3 From 12f5f524cafef3ab689929b118f2dfb8bf2be321 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 29 Nov 2010 21:56:39 -0800 Subject: rcu: merge TREE_PREEPT_RCU blocked_tasks[] lists Combine the current TREE_PREEMPT_RCU ->blocked_tasks[] lists in the rcu_node structure into a single ->blkd_tasks list with ->gp_tasks and ->exp_tasks tail pointers. This is in preparation for RCU priority boosting, which will add a third dimension to the combinatorial explosion in the ->blocked_tasks[] case, but simply a third pointer in the new ->blkd_tasks case. Also update documentation to reflect blocked_tasks[] merge Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/RCU/trace.txt | 29 +++++--- kernel/rcutree.c | 5 +- kernel/rcutree.h | 21 ++++-- kernel/rcutree_plugin.h | 163 ++++++++++++++++++++++++++------------------ kernel/rcutree_trace.c | 11 ++- 5 files changed, 135 insertions(+), 94 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index e731ad20d166..5a704ffd0bbc 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -166,14 +166,14 @@ o "gpnum" is the number of grace periods that have started. It is The output of "cat rcu/rcuhier" looks as follows, with very long lines: c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 -1/1 .>. 0:127 ^0 -3/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 -3/3f .>. 0:5 ^0 2/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 +1/1 ..>. 0:127 ^0 +3/3 ..>. 0:35 ^0 0/0 ..>. 36:71 ^1 0/0 ..>. 72:107 ^2 0/0 ..>. 108:127 ^3 +3/3f ..>. 0:5 ^0 2/3 ..>. 6:11 ^1 0/0 ..>. 12:17 ^2 0/0 ..>. 18:23 ^3 0/0 ..>. 24:29 ^4 0/0 ..>. 30:35 ^5 0/0 ..>. 36:41 ^0 0/0 ..>. 42:47 ^1 0/0 ..>. 48:53 ^2 0/0 ..>. 54:59 ^3 0/0 ..>. 60:65 ^4 0/0 ..>. 66:71 ^5 0/0 ..>. 72:77 ^0 0/0 ..>. 78:83 ^1 0/0 ..>. 84:89 ^2 0/0 ..>. 90:95 ^3 0/0 ..>. 96:101 ^4 0/0 ..>. 102:107 ^5 0/0 ..>. 108:113 ^0 0/0 ..>. 114:119 ^1 0/0 ..>. 120:125 ^2 0/0 ..>. 126:127 ^3 rcu_bh: c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 -0/1 .>. 0:127 ^0 -0/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 -0/3f .>. 0:5 ^0 0/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 +0/1 ..>. 0:127 ^0 +0/3 ..>. 0:35 ^0 0/0 ..>. 36:71 ^1 0/0 ..>. 72:107 ^2 0/0 ..>. 108:127 ^3 +0/3f ..>. 0:5 ^0 0/3 ..>. 6:11 ^1 0/0 ..>. 12:17 ^2 0/0 ..>. 18:23 ^3 0/0 ..>. 24:29 ^4 0/0 ..>. 30:35 ^5 0/0 ..>. 36:41 ^0 0/0 ..>. 42:47 ^1 0/0 ..>. 48:53 ^2 0/0 ..>. 54:59 ^3 0/0 ..>. 60:65 ^4 0/0 ..>. 66:71 ^5 0/0 ..>. 72:77 ^0 0/0 ..>. 78:83 ^1 0/0 ..>. 84:89 ^2 0/0 ..>. 90:95 ^3 0/0 ..>. 96:101 ^4 0/0 ..>. 102:107 ^5 0/0 ..>. 108:113 ^0 0/0 ..>. 114:119 ^1 0/0 ..>. 120:125 ^2 0/0 ..>. 126:127 ^3 This is once again split into "rcu_sched" and "rcu_bh" portions, and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional @@ -232,13 +232,20 @@ o Each element of the form "1/1 0:127 ^0" represents one struct current grace period. o The characters separated by the ">" indicate the state - of the blocked-tasks lists. A "T" preceding the ">" + of the blocked-tasks lists. A "G" preceding the ">" indicates that at least one task blocked in an RCU read-side critical section blocks the current grace - period, while a "." preceding the ">" indicates otherwise. - The character following the ">" indicates similarly for - the next grace period. A "T" should appear in this - field only for rcu-preempt. + period, while a "E" preceding the ">" indicates that + at least one task blocked in an RCU read-side critical + section blocks the current expedited grace period. + A "T" character following the ">" indicates that at + least one task is blocked within an RCU read-side + critical section, regardless of whether any current + grace period (expedited or normal) is inconvenienced. + A "." character appears if the corresponding condition + does not hold, so that "..>." indicates that no tasks + are blocked. In contrast, "GE>T" indicates maximal + inconvenience from blocked tasks. o The numbers separated by the ":" are the range of CPUs served by this struct rcu_node. This can be helpful diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 90104a19c564..0ac1cc03f935 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1901,10 +1901,7 @@ static void __init rcu_init_one(struct rcu_state *rsp, j / rsp->levelspread[i - 1]; } rnp->level = i; - INIT_LIST_HEAD(&rnp->blocked_tasks[0]); - INIT_LIST_HEAD(&rnp->blocked_tasks[1]); - INIT_LIST_HEAD(&rnp->blocked_tasks[2]); - INIT_LIST_HEAD(&rnp->blocked_tasks[3]); + INIT_LIST_HEAD(&rnp->blkd_tasks); } } diff --git a/kernel/rcutree.h b/kernel/rcutree.h index bd891def3303..5a439c180e69 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -107,7 +107,7 @@ struct rcu_node { /* an rcu_data structure, otherwise, each */ /* bit corresponds to a child rcu_node */ /* structure. */ - unsigned long expmask; /* Groups that have ->blocked_tasks[] */ + unsigned long expmask; /* Groups that have ->blkd_tasks */ /* elements that need to drain to allow the */ /* current expedited grace period to */ /* complete (only for TREE_PREEMPT_RCU). */ @@ -120,11 +120,20 @@ struct rcu_node { u8 grpnum; /* CPU/group number for next level up. */ u8 level; /* root is at level 0. */ struct rcu_node *parent; - struct list_head blocked_tasks[4]; - /* Tasks blocked in RCU read-side critsect. */ - /* Grace period number (->gpnum) x blocked */ - /* by tasks on the (x & 0x1) element of the */ - /* blocked_tasks[] array. */ + struct list_head blkd_tasks; + /* Tasks blocked in RCU read-side critical */ + /* section. Tasks are placed at the head */ + /* of this list and age towards the tail. */ + struct list_head *gp_tasks; + /* Pointer to the first task blocking the */ + /* current grace period, or NULL if there */ + /* is no such task. */ + struct list_head *exp_tasks; + /* Pointer to the first task blocking the */ + /* current expedited grace period, or NULL */ + /* if there is no such task. If there */ + /* is no current expedited grace period, */ + /* then there can cannot be any such task. */ } ____cacheline_internodealigned_in_smp; /* diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index 764b5fcc7c56..774f010a4619 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -130,12 +130,12 @@ static void rcu_preempt_qs(int cpu) * We have entered the scheduler, and the current task might soon be * context-switched away from. If this task is in an RCU read-side * critical section, we will no longer be able to rely on the CPU to - * record that fact, so we enqueue the task on the appropriate entry - * of the blocked_tasks[] array. The task will dequeue itself when - * it exits the outermost enclosing RCU read-side critical section. - * Therefore, the current grace period cannot be permitted to complete - * until the blocked_tasks[] entry indexed by the low-order bit of - * rnp->gpnum empties. + * record that fact, so we enqueue the task on the blkd_tasks list. + * The task will dequeue itself when it exits the outermost enclosing + * RCU read-side critical section. Therefore, the current grace period + * cannot be permitted to complete until the blkd_tasks list entries + * predating the current grace period drain, in other words, until + * rnp->gp_tasks becomes NULL. * * Caller must disable preemption. */ @@ -143,7 +143,6 @@ static void rcu_preempt_note_context_switch(int cpu) { struct task_struct *t = current; unsigned long flags; - int phase; struct rcu_data *rdp; struct rcu_node *rnp; @@ -165,15 +164,26 @@ static void rcu_preempt_note_context_switch(int cpu) * (i.e., this CPU has not yet passed through a quiescent * state for the current grace period), then as long * as that task remains queued, the current grace period - * cannot end. + * cannot end. Note that there is some uncertainty as + * to exactly when the current grace period started. + * We take a conservative approach, which can result + * in unnecessarily waiting on tasks that started very + * slightly after the current grace period began. C'est + * la vie!!! * * But first, note that the current CPU must still be * on line! */ WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0); WARN_ON_ONCE(!list_empty(&t->rcu_node_entry)); - phase = (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1; - list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]); + if ((rnp->qsmask & rdp->grpmask) && rnp->gp_tasks != NULL) { + list_add(&t->rcu_node_entry, rnp->gp_tasks->prev); + rnp->gp_tasks = &t->rcu_node_entry; + } else { + list_add(&t->rcu_node_entry, &rnp->blkd_tasks); + if (rnp->qsmask & rdp->grpmask) + rnp->gp_tasks = &t->rcu_node_entry; + } raw_spin_unlock_irqrestore(&rnp->lock, flags); } @@ -210,10 +220,7 @@ EXPORT_SYMBOL_GPL(__rcu_read_lock); */ static int rcu_preempted_readers(struct rcu_node *rnp) { - int phase = rnp->gpnum & 0x1; - - return !list_empty(&rnp->blocked_tasks[phase]) || - !list_empty(&rnp->blocked_tasks[phase + 2]); + return rnp->gp_tasks != NULL; } /* @@ -252,6 +259,21 @@ static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags) rcu_report_qs_rnp(mask, &rcu_preempt_state, rnp_p, flags); } +/* + * Advance a ->blkd_tasks-list pointer to the next entry, instead + * returning NULL if at the end of the list. + */ +static struct list_head *rcu_next_node_entry(struct task_struct *t, + struct rcu_node *rnp) +{ + struct list_head *np; + + np = t->rcu_node_entry.next; + if (np == &rnp->blkd_tasks) + np = NULL; + return np; +} + /* * Handle special cases during rcu_read_unlock(), such as needing to * notify RCU core processing or task having blocked during the RCU @@ -262,6 +284,7 @@ static void rcu_read_unlock_special(struct task_struct *t) int empty; int empty_exp; unsigned long flags; + struct list_head *np; struct rcu_node *rnp; int special; @@ -305,7 +328,12 @@ static void rcu_read_unlock_special(struct task_struct *t) empty = !rcu_preempted_readers(rnp); empty_exp = !rcu_preempted_readers_exp(rnp); smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */ + np = rcu_next_node_entry(t, rnp); list_del_init(&t->rcu_node_entry); + if (&t->rcu_node_entry == rnp->gp_tasks) + rnp->gp_tasks = np; + if (&t->rcu_node_entry == rnp->exp_tasks) + rnp->exp_tasks = np; t->rcu_blocked_node = NULL; /* @@ -361,18 +389,16 @@ EXPORT_SYMBOL_GPL(__rcu_read_unlock); static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp) { unsigned long flags; - struct list_head *lp; - int phase; struct task_struct *t; - if (rcu_preempted_readers(rnp)) { - raw_spin_lock_irqsave(&rnp->lock, flags); - phase = rnp->gpnum & 0x1; - lp = &rnp->blocked_tasks[phase]; - list_for_each_entry(t, lp, rcu_node_entry) - sched_show_task(t); - raw_spin_unlock_irqrestore(&rnp->lock, flags); - } + if (!rcu_preempted_readers(rnp)) + return; + raw_spin_lock_irqsave(&rnp->lock, flags); + t = list_entry(rnp->gp_tasks, + struct task_struct, rcu_node_entry); + list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) + sched_show_task(t); + raw_spin_unlock_irqrestore(&rnp->lock, flags); } /* @@ -402,16 +428,14 @@ static void rcu_print_detail_task_stall(struct rcu_state *rsp) */ static void rcu_print_task_stall(struct rcu_node *rnp) { - struct list_head *lp; - int phase; struct task_struct *t; - if (rcu_preempted_readers(rnp)) { - phase = rnp->gpnum & 0x1; - lp = &rnp->blocked_tasks[phase]; - list_for_each_entry(t, lp, rcu_node_entry) - printk(" P%d", t->pid); - } + if (!rcu_preempted_readers(rnp)) + return; + t = list_entry(rnp->gp_tasks, + struct task_struct, rcu_node_entry); + list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) + printk(" P%d", t->pid); } /* @@ -430,10 +454,15 @@ static void rcu_preempt_stall_reset(void) * period that still has RCU readers blocked! This function must be * invoked -before- updating this rnp's ->gpnum, and the rnp's ->lock * must be held by the caller. + * + * Also, if there are blocked tasks on the list, they automatically + * block the newly created grace period, so set up ->gp_tasks accordingly. */ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp) { WARN_ON_ONCE(rcu_preempted_readers(rnp)); + if (!list_empty(&rnp->blkd_tasks)) + rnp->gp_tasks = rnp->blkd_tasks.next; WARN_ON_ONCE(rnp->qsmask); } @@ -457,45 +486,49 @@ static int rcu_preempt_offline_tasks(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp) { - int i; struct list_head *lp; struct list_head *lp_root; int retval = 0; struct rcu_node *rnp_root = rcu_get_root(rsp); - struct task_struct *tp; + struct task_struct *t; if (rnp == rnp_root) { WARN_ONCE(1, "Last CPU thought to be offlined?"); return 0; /* Shouldn't happen: at least one CPU online. */ } - WARN_ON_ONCE(rnp != rdp->mynode && - (!list_empty(&rnp->blocked_tasks[0]) || - !list_empty(&rnp->blocked_tasks[1]) || - !list_empty(&rnp->blocked_tasks[2]) || - !list_empty(&rnp->blocked_tasks[3]))); + + /* If we are on an internal node, complain bitterly. */ + WARN_ON_ONCE(rnp != rdp->mynode); /* - * Move tasks up to root rcu_node. Rely on the fact that the - * root rcu_node can be at most one ahead of the rest of the - * rcu_nodes in terms of gp_num value. This fact allows us to - * move the blocked_tasks[] array directly, element by element. + * Move tasks up to root rcu_node. Don't try to get fancy for + * this corner-case operation -- just put this node's tasks + * at the head of the root node's list, and update the root node's + * ->gp_tasks and ->exp_tasks pointers to those of this node's, + * if non-NULL. This might result in waiting for more tasks than + * absolutely necessary, but this is a good performance/complexity + * tradeoff. */ if (rcu_preempted_readers(rnp)) retval |= RCU_OFL_TASKS_NORM_GP; if (rcu_preempted_readers_exp(rnp)) retval |= RCU_OFL_TASKS_EXP_GP; - for (i = 0; i < 4; i++) { - lp = &rnp->blocked_tasks[i]; - lp_root = &rnp_root->blocked_tasks[i]; - while (!list_empty(lp)) { - tp = list_entry(lp->next, typeof(*tp), rcu_node_entry); - raw_spin_lock(&rnp_root->lock); /* irqs already disabled */ - list_del(&tp->rcu_node_entry); - tp->rcu_blocked_node = rnp_root; - list_add(&tp->rcu_node_entry, lp_root); - raw_spin_unlock(&rnp_root->lock); /* irqs remain disabled */ - } + lp = &rnp->blkd_tasks; + lp_root = &rnp_root->blkd_tasks; + while (!list_empty(lp)) { + t = list_entry(lp->next, typeof(*t), rcu_node_entry); + raw_spin_lock(&rnp_root->lock); /* irqs already disabled */ + list_del(&t->rcu_node_entry); + t->rcu_blocked_node = rnp_root; + list_add(&t->rcu_node_entry, lp_root); + if (&t->rcu_node_entry == rnp->gp_tasks) + rnp_root->gp_tasks = rnp->gp_tasks; + if (&t->rcu_node_entry == rnp->exp_tasks) + rnp_root->exp_tasks = rnp->exp_tasks; + raw_spin_unlock(&rnp_root->lock); /* irqs still disabled */ } + rnp->gp_tasks = NULL; + rnp->exp_tasks = NULL; return retval; } @@ -586,8 +619,7 @@ static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex); */ static int rcu_preempted_readers_exp(struct rcu_node *rnp) { - return !list_empty(&rnp->blocked_tasks[2]) || - !list_empty(&rnp->blocked_tasks[3]); + return rnp->exp_tasks != NULL; } /* @@ -647,12 +679,13 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp) static void sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp) { - int must_wait; + int must_wait = 0; raw_spin_lock(&rnp->lock); /* irqs already disabled */ - list_splice_init(&rnp->blocked_tasks[0], &rnp->blocked_tasks[2]); - list_splice_init(&rnp->blocked_tasks[1], &rnp->blocked_tasks[3]); - must_wait = rcu_preempted_readers_exp(rnp); + if (!list_empty(&rnp->blkd_tasks)) { + rnp->exp_tasks = rnp->blkd_tasks.next; + must_wait = 1; + } raw_spin_unlock(&rnp->lock); /* irqs remain disabled */ if (!must_wait) rcu_report_exp_rnp(rsp, rnp); @@ -661,9 +694,7 @@ sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp) /* * Wait for an rcu-preempt grace period, but expedite it. The basic idea * is to invoke synchronize_sched_expedited() to push all the tasks to - * the ->blocked_tasks[] lists, move all entries from the first set of - * ->blocked_tasks[] lists to the second set, and finally wait for this - * second set to drain. + * the ->blkd_tasks lists and wait for this list to drain. */ void synchronize_rcu_expedited(void) { @@ -695,7 +726,7 @@ void synchronize_rcu_expedited(void) if ((ACCESS_ONCE(sync_rcu_preempt_exp_count) - snap) > 0) goto unlock_mb_ret; /* Others did our work for us. */ - /* force all RCU readers onto blocked_tasks[]. */ + /* force all RCU readers onto ->blkd_tasks lists. */ synchronize_sched_expedited(); raw_spin_lock_irqsave(&rsp->onofflock, flags); @@ -707,7 +738,7 @@ void synchronize_rcu_expedited(void) raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */ } - /* Snapshot current state of ->blocked_tasks[] lists. */ + /* Snapshot current state of ->blkd_tasks lists. */ rcu_for_each_leaf_node(rsp, rnp) sync_rcu_preempt_exp_init(rsp, rnp); if (NUM_RCU_NODES > 1) @@ -715,7 +746,7 @@ void synchronize_rcu_expedited(void) raw_spin_unlock_irqrestore(&rsp->onofflock, flags); - /* Wait for snapshotted ->blocked_tasks[] lists to drain. */ + /* Wait for snapshotted ->blkd_tasks lists to drain. */ rnp = rcu_get_root(rsp); wait_event(sync_rcu_preempt_exp_wq, sync_rcu_preempt_exp_done(rnp)); diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c index 4a21ca55ef7c..1cedf94e2c4f 100644 --- a/kernel/rcutree_trace.c +++ b/kernel/rcutree_trace.c @@ -161,7 +161,6 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp) { unsigned long gpnum; int level = 0; - int phase; struct rcu_node *rnp; gpnum = rsp->gpnum; @@ -178,13 +177,11 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp) seq_puts(m, "\n"); level = rnp->level; } - phase = gpnum & 0x1; - seq_printf(m, "%lx/%lx %c%c>%c%c %d:%d ^%d ", + seq_printf(m, "%lx/%lx %c%c>%c %d:%d ^%d ", rnp->qsmask, rnp->qsmaskinit, - "T."[list_empty(&rnp->blocked_tasks[phase])], - "E."[list_empty(&rnp->blocked_tasks[phase + 2])], - "T."[list_empty(&rnp->blocked_tasks[!phase])], - "E."[list_empty(&rnp->blocked_tasks[!phase + 2])], + ".G"[rnp->gp_tasks != NULL], + ".E"[rnp->exp_tasks != NULL], + ".T"[!list_empty(&rnp->blkd_tasks)], rnp->grplo, rnp->grphi, rnp->grpnum); } seq_puts(m, "\n"); -- cgit v1.2.3 From a26ac2455ffcf3be5c6ef92bc6df7182700f2114 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 12 Jan 2011 14:10:23 -0800 Subject: rcu: move TREE_RCU from softirq to kthread If RCU priority boosting is to be meaningful, callback invocation must be boosted in addition to preempted RCU readers. Otherwise, in presence of CPU real-time threads, the grace period ends, but the callbacks don't get invoked. If the callbacks don't get invoked, the associated memory doesn't get freed, so the system is still subject to OOM. But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit moves the callback invocations to a kthread, which can be boosted easily. Also add comments and properly synchronized all accesses to rcu_cpu_kthread_task, as suggested by Lai Jiangshan. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/filesystems/proc.txt | 1 - include/linux/interrupt.h | 1 - include/trace/events/irq.h | 3 +- kernel/rcutree.c | 340 +++++++++++++++++++++++++++++++++++- kernel/rcutree.h | 8 + kernel/rcutree_plugin.h | 4 +- kernel/softirq.c | 2 +- tools/perf/util/trace-event-parse.c | 1 - 8 files changed, 348 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index b0b814d75ca1..60740e8ecb37 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -836,7 +836,6 @@ Provides counts of softirq handlers serviced since boot time, for each cpu. TASKLET: 0 0 0 290 SCHED: 27035 26983 26971 26746 HRTIMER: 0 0 0 0 - RCU: 1678 1769 2178 2250 1.3 IDE devices in /proc/ide diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index bea0ac750712..6c12989839d9 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -414,7 +414,6 @@ enum TASKLET_SOFTIRQ, SCHED_SOFTIRQ, HRTIMER_SOFTIRQ, - RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */ NR_SOFTIRQS }; diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h index 1c09820df585..ae045ca7d356 100644 --- a/include/trace/events/irq.h +++ b/include/trace/events/irq.h @@ -20,8 +20,7 @@ struct softirq_action; softirq_name(BLOCK_IOPOLL), \ softirq_name(TASKLET), \ softirq_name(SCHED), \ - softirq_name(HRTIMER), \ - softirq_name(RCU)) + softirq_name(HRTIMER)) /** * irq_handler_entry - called immediately before the irq action handler diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 0ac1cc03f935..18e33313873e 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -47,6 +47,8 @@ #include #include #include +#include +#include #include "rcutree.h" @@ -82,6 +84,20 @@ DEFINE_PER_CPU(struct rcu_data, rcu_bh_data); int rcu_scheduler_active __read_mostly; EXPORT_SYMBOL_GPL(rcu_scheduler_active); +/* + * Control variables for per-CPU and per-rcu_node kthreads. These + * handle all flavors of RCU. + */ +static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task); +static DEFINE_PER_CPU(wait_queue_head_t, rcu_cpu_wq); +static DEFINE_PER_CPU(char, rcu_cpu_has_work); +static char rcu_kthreads_spawnable; + +static void rcu_node_kthread_setaffinity(struct rcu_node *rnp); +static void invoke_rcu_kthread(void); + +#define RCU_KTHREAD_PRIO 1 /* RT priority for per-CPU kthreads. */ + /* * Return true if an RCU grace period is in progress. The ACCESS_ONCE()s * permit this function to be invoked without holding the root rcu_node @@ -1009,6 +1025,8 @@ static void rcu_send_cbs_to_online(struct rcu_state *rsp) /* * Remove the outgoing CPU from the bitmasks in the rcu_node hierarchy * and move all callbacks from the outgoing CPU to the current one. + * There can only be one CPU hotplug operation at a time, so no other + * CPU can be attempting to update rcu_cpu_kthread_task. */ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp) { @@ -1017,6 +1035,14 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp) int need_report = 0; struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); struct rcu_node *rnp; + struct task_struct *t; + + /* Stop the CPU's kthread. */ + t = per_cpu(rcu_cpu_kthread_task, cpu); + if (t != NULL) { + per_cpu(rcu_cpu_kthread_task, cpu) = NULL; + kthread_stop(t); + } /* Exclude any attempts to start a new grace period. */ raw_spin_lock_irqsave(&rsp->onofflock, flags); @@ -1054,6 +1080,19 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp) raw_spin_unlock_irqrestore(&rnp->lock, flags); if (need_report & RCU_OFL_TASKS_EXP_GP) rcu_report_exp_rnp(rsp, rnp); + + /* + * If there are no more online CPUs for this rcu_node structure, + * kill the rcu_node structure's kthread. Otherwise, adjust its + * affinity. + */ + t = rnp->node_kthread_task; + if (t != NULL && + rnp->qsmaskinit == 0) { + kthread_stop(t); + rnp->node_kthread_task = NULL; + } else + rcu_node_kthread_setaffinity(rnp); } /* @@ -1151,7 +1190,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp) /* Re-raise the RCU softirq if there are callbacks remaining. */ if (cpu_has_callbacks_ready_to_invoke(rdp)) - raise_softirq(RCU_SOFTIRQ); + invoke_rcu_kthread(); } /* @@ -1197,7 +1236,7 @@ void rcu_check_callbacks(int cpu, int user) } rcu_preempt_check_callbacks(cpu); if (rcu_pending(cpu)) - raise_softirq(RCU_SOFTIRQ); + invoke_rcu_kthread(); } #ifdef CONFIG_SMP @@ -1361,7 +1400,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp) /* * Do softirq processing for the current CPU. */ -static void rcu_process_callbacks(struct softirq_action *unused) +static void rcu_process_callbacks(void) { __rcu_process_callbacks(&rcu_sched_state, &__get_cpu_var(rcu_sched_data)); @@ -1372,6 +1411,281 @@ static void rcu_process_callbacks(struct softirq_action *unused) rcu_needs_cpu_flush(); } +/* + * Wake up the current CPU's kthread. This replaces raise_softirq() + * in earlier versions of RCU. Note that because we are running on + * the current CPU with interrupts disabled, the rcu_cpu_kthread_task + * cannot disappear out from under us. + */ +static void invoke_rcu_kthread(void) +{ + unsigned long flags; + wait_queue_head_t *q; + int cpu; + + local_irq_save(flags); + cpu = smp_processor_id(); + per_cpu(rcu_cpu_has_work, cpu) = 1; + if (per_cpu(rcu_cpu_kthread_task, cpu) == NULL) { + local_irq_restore(flags); + return; + } + q = &per_cpu(rcu_cpu_wq, cpu); + wake_up(q); + local_irq_restore(flags); +} + +/* + * Timer handler to initiate the waking up of per-CPU kthreads that + * have yielded the CPU due to excess numbers of RCU callbacks. + */ +static void rcu_cpu_kthread_timer(unsigned long arg) +{ + unsigned long flags; + struct rcu_data *rdp = (struct rcu_data *)arg; + struct rcu_node *rnp = rdp->mynode; + struct task_struct *t; + + raw_spin_lock_irqsave(&rnp->lock, flags); + rnp->wakemask |= rdp->grpmask; + t = rnp->node_kthread_task; + if (t == NULL) { + raw_spin_unlock_irqrestore(&rnp->lock, flags); + return; + } + wake_up_process(t); + raw_spin_unlock_irqrestore(&rnp->lock, flags); +} + +/* + * Drop to non-real-time priority and yield, but only after posting a + * timer that will cause us to regain our real-time priority if we + * remain preempted. Either way, we restore our real-time priority + * before returning. + */ +static void rcu_yield(int cpu) +{ + struct rcu_data *rdp = per_cpu_ptr(rcu_sched_state.rda, cpu); + struct sched_param sp; + struct timer_list yield_timer; + + setup_timer_on_stack(&yield_timer, rcu_cpu_kthread_timer, (unsigned long)rdp); + mod_timer(&yield_timer, jiffies + 2); + sp.sched_priority = 0; + sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp); + schedule(); + sp.sched_priority = RCU_KTHREAD_PRIO; + sched_setscheduler_nocheck(current, SCHED_FIFO, &sp); + del_timer(&yield_timer); +} + +/* + * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU. + * This can happen while the corresponding CPU is either coming online + * or going offline. We cannot wait until the CPU is fully online + * before starting the kthread, because the various notifier functions + * can wait for RCU grace periods. So we park rcu_cpu_kthread() until + * the corresponding CPU is online. + * + * Return 1 if the kthread needs to stop, 0 otherwise. + * + * Caller must disable bh. This function can momentarily enable it. + */ +static int rcu_cpu_kthread_should_stop(int cpu) +{ + while (cpu_is_offline(cpu) || + !cpumask_equal(¤t->cpus_allowed, cpumask_of(cpu)) || + smp_processor_id() != cpu) { + if (kthread_should_stop()) + return 1; + local_bh_enable(); + schedule_timeout_uninterruptible(1); + if (!cpumask_equal(¤t->cpus_allowed, cpumask_of(cpu))) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); + local_bh_disable(); + } + return 0; +} + +/* + * Per-CPU kernel thread that invokes RCU callbacks. This replaces the + * earlier RCU softirq. + */ +static int rcu_cpu_kthread(void *arg) +{ + int cpu = (int)(long)arg; + unsigned long flags; + int spincnt = 0; + wait_queue_head_t *wqp = &per_cpu(rcu_cpu_wq, cpu); + char work; + char *workp = &per_cpu(rcu_cpu_has_work, cpu); + + for (;;) { + wait_event_interruptible(*wqp, + *workp != 0 || kthread_should_stop()); + local_bh_disable(); + if (rcu_cpu_kthread_should_stop(cpu)) { + local_bh_enable(); + break; + } + local_irq_save(flags); + work = *workp; + *workp = 0; + local_irq_restore(flags); + if (work) + rcu_process_callbacks(); + local_bh_enable(); + if (*workp != 0) + spincnt++; + else + spincnt = 0; + if (spincnt > 10) { + rcu_yield(cpu); + spincnt = 0; + } + } + return 0; +} + +/* + * Spawn a per-CPU kthread, setting up affinity and priority. + * Because the CPU hotplug lock is held, no other CPU will be attempting + * to manipulate rcu_cpu_kthread_task. There might be another CPU + * attempting to access it during boot, but the locking in kthread_bind() + * will enforce sufficient ordering. + */ +static int __cpuinit rcu_spawn_one_cpu_kthread(int cpu) +{ + struct sched_param sp; + struct task_struct *t; + + if (!rcu_kthreads_spawnable || + per_cpu(rcu_cpu_kthread_task, cpu) != NULL) + return 0; + t = kthread_create(rcu_cpu_kthread, (void *)(long)cpu, "rcuc%d", cpu); + if (IS_ERR(t)) + return PTR_ERR(t); + kthread_bind(t, cpu); + WARN_ON_ONCE(per_cpu(rcu_cpu_kthread_task, cpu) != NULL); + per_cpu(rcu_cpu_kthread_task, cpu) = t; + wake_up_process(t); + sp.sched_priority = RCU_KTHREAD_PRIO; + sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); + return 0; +} + +/* + * Per-rcu_node kthread, which is in charge of waking up the per-CPU + * kthreads when needed. We ignore requests to wake up kthreads + * for offline CPUs, which is OK because force_quiescent_state() + * takes care of this case. + */ +static int rcu_node_kthread(void *arg) +{ + int cpu; + unsigned long flags; + unsigned long mask; + struct rcu_node *rnp = (struct rcu_node *)arg; + struct sched_param sp; + struct task_struct *t; + + for (;;) { + wait_event_interruptible(rnp->node_wq, rnp->wakemask != 0 || + kthread_should_stop()); + if (kthread_should_stop()) + break; + raw_spin_lock_irqsave(&rnp->lock, flags); + mask = rnp->wakemask; + rnp->wakemask = 0; + raw_spin_unlock_irqrestore(&rnp->lock, flags); + for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) { + if ((mask & 0x1) == 0) + continue; + preempt_disable(); + t = per_cpu(rcu_cpu_kthread_task, cpu); + if (!cpu_online(cpu) || t == NULL) { + preempt_enable(); + continue; + } + per_cpu(rcu_cpu_has_work, cpu) = 1; + sp.sched_priority = RCU_KTHREAD_PRIO; + sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); + preempt_enable(); + } + } + return 0; +} + +/* + * Set the per-rcu_node kthread's affinity to cover all CPUs that are + * served by the rcu_node in question. + */ +static void rcu_node_kthread_setaffinity(struct rcu_node *rnp) +{ + cpumask_var_t cm; + int cpu; + unsigned long mask = rnp->qsmaskinit; + + if (rnp->node_kthread_task == NULL || + rnp->qsmaskinit == 0) + return; + if (!alloc_cpumask_var(&cm, GFP_KERNEL)) + return; + cpumask_clear(cm); + for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) + if (mask & 0x1) + cpumask_set_cpu(cpu, cm); + set_cpus_allowed_ptr(rnp->node_kthread_task, cm); + free_cpumask_var(cm); +} + +/* + * Spawn a per-rcu_node kthread, setting priority and affinity. + */ +static int __cpuinit rcu_spawn_one_node_kthread(struct rcu_state *rsp, + struct rcu_node *rnp) +{ + int rnp_index = rnp - &rsp->node[0]; + struct sched_param sp; + struct task_struct *t; + + if (!rcu_kthreads_spawnable || + rnp->qsmaskinit == 0 || + rnp->node_kthread_task != NULL) + return 0; + t = kthread_create(rcu_node_kthread, (void *)rnp, "rcun%d", rnp_index); + if (IS_ERR(t)) + return PTR_ERR(t); + rnp->node_kthread_task = t; + wake_up_process(t); + sp.sched_priority = 99; + sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); + return 0; +} + +/* + * Spawn all kthreads -- called as soon as the scheduler is running. + */ +static int __init rcu_spawn_kthreads(void) +{ + int cpu; + struct rcu_node *rnp; + + rcu_kthreads_spawnable = 1; + for_each_possible_cpu(cpu) { + init_waitqueue_head(&per_cpu(rcu_cpu_wq, cpu)); + per_cpu(rcu_cpu_has_work, cpu) = 0; + if (cpu_online(cpu)) + (void)rcu_spawn_one_cpu_kthread(cpu); + } + rcu_for_each_leaf_node(&rcu_sched_state, rnp) { + init_waitqueue_head(&rnp->node_wq); + (void)rcu_spawn_one_node_kthread(&rcu_sched_state, rnp); + } + return 0; +} +early_initcall(rcu_spawn_kthreads); + static void __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu), struct rcu_state *rsp) @@ -1771,6 +2085,19 @@ static void __cpuinit rcu_online_cpu(int cpu) rcu_preempt_init_percpu_data(cpu); } +static void __cpuinit rcu_online_kthreads(int cpu) +{ + struct rcu_data *rdp = per_cpu_ptr(rcu_sched_state.rda, cpu); + struct rcu_node *rnp = rdp->mynode; + + /* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */ + if (rcu_kthreads_spawnable) { + (void)rcu_spawn_one_cpu_kthread(cpu); + if (rnp->node_kthread_task == NULL) + (void)rcu_spawn_one_node_kthread(&rcu_sched_state, rnp); + } +} + /* * Handle CPU online/offline notification events. */ @@ -1778,11 +2105,17 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) { long cpu = (long)hcpu; + struct rcu_data *rdp = per_cpu_ptr(rcu_sched_state.rda, cpu); + struct rcu_node *rnp = rdp->mynode; switch (action) { case CPU_UP_PREPARE: case CPU_UP_PREPARE_FROZEN: rcu_online_cpu(cpu); + rcu_online_kthreads(cpu); + break; + case CPU_ONLINE: + rcu_node_kthread_setaffinity(rnp); break; case CPU_DYING: case CPU_DYING_FROZEN: @@ -1923,7 +2256,6 @@ void __init rcu_init(void) rcu_init_one(&rcu_sched_state, &rcu_sched_data); rcu_init_one(&rcu_bh_state, &rcu_bh_data); __rcu_init_preempt(); - open_softirq(RCU_SOFTIRQ, rcu_process_callbacks); /* * We don't need protection against CPU-hotplug here because diff --git a/kernel/rcutree.h b/kernel/rcutree.h index 5a439c180e69..c0213802d164 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -111,6 +111,7 @@ struct rcu_node { /* elements that need to drain to allow the */ /* current expedited grace period to */ /* complete (only for TREE_PREEMPT_RCU). */ + unsigned long wakemask; /* CPUs whose kthread needs to be awakened. */ unsigned long qsmaskinit; /* Per-GP initial value for qsmask & expmask. */ unsigned long grpmask; /* Mask to apply to parent qsmask. */ @@ -134,6 +135,13 @@ struct rcu_node { /* if there is no such task. If there */ /* is no current expedited grace period, */ /* then there can cannot be any such task. */ + struct task_struct *node_kthread_task; + /* kthread that takes care of this rcu_node */ + /* structure, for example, awakening the */ + /* per-CPU kthreads as needed. */ + wait_queue_head_t node_wq; + /* Wait queue on which to park the per-node */ + /* kthread. */ } ____cacheline_internodealigned_in_smp; /* diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index 774f010a4619..b9bd69a5a4fe 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -1206,7 +1206,7 @@ static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff); * * Because it is not legal to invoke rcu_process_callbacks() with irqs * disabled, we do one pass of force_quiescent_state(), then do a - * raise_softirq() to cause rcu_process_callbacks() to be invoked later. + * invoke_rcu_kthread() to cause rcu_process_callbacks() to be invoked later. * The per-cpu rcu_dyntick_drain variable controls the sequencing. */ int rcu_needs_cpu(int cpu) @@ -1257,7 +1257,7 @@ int rcu_needs_cpu(int cpu) /* If RCU callbacks are still pending, RCU still needs this CPU. */ if (c) - raise_softirq(RCU_SOFTIRQ); + invoke_rcu_kthread(); return c; } diff --git a/kernel/softirq.c b/kernel/softirq.c index 174f976c2874..13960170cad4 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -58,7 +58,7 @@ DEFINE_PER_CPU(struct task_struct *, ksoftirqd); char *softirq_to_name[NR_SOFTIRQS] = { "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL", - "TASKLET", "SCHED", "HRTIMER", "RCU" + "TASKLET", "SCHED", "HRTIMER" }; /* diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c index 0a7ed5b5e281..1e88485c16a0 100644 --- a/tools/perf/util/trace-event-parse.c +++ b/tools/perf/util/trace-event-parse.c @@ -2187,7 +2187,6 @@ static const struct flag flags[] = { { "TASKLET_SOFTIRQ", 6 }, { "SCHED_SOFTIRQ", 7 }, { "HRTIMER_SOFTIRQ", 8 }, - { "RCU_SOFTIRQ", 9 }, { "HRTIMER_NORESTART", 0 }, { "HRTIMER_RESTART", 1 }, -- cgit v1.2.3 From 2fa218d8bbcff239302f9f36e19d7187077dd636 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Sun, 27 Mar 2011 21:37:58 -0700 Subject: rcu: Update RCU's trace.txt documentation for new format The trace.txt file had obsolete output for the debugfs rcu/rcudata file, so update it. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/RCU/trace.txt | 65 ++++++++++++++++++++++++--------------------- 1 file changed, 34 insertions(+), 31 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 5a704ffd0bbc..13dbf9bccdbf 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -21,23 +21,23 @@ rcu_pending() function decided that there was core RCU work to do). The output of "cat rcu/rcudata" looks as follows: rcu_sched: - 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1/0 df=1101 of=0 ri=36 ql=0 b=10 - 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1/0 df=1015 of=0 ri=0 ql=0 b=10 - 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1/0 df=1839 of=0 ri=0 ql=0 b=10 - 3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1/0 df=1545 of=0 ri=0 ql=0 b=10 - 4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1/0 df=1992 of=0 ri=0 ql=0 b=10 - 5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1/0 df=3331 of=0 ri=4 ql=2 b=10 - 6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1/0 df=3224 of=0 ri=0 ql=0 b=10 - 7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1/0 df=1818 of=0 ri=0 ql=2 b=10 + 0!c=423090 g=423091 pq=1 pqc=423090 qp=1 dt=86475/1/0 df=16319 of=163 ri=1519 ql=0 b=10 ci=1460693 co=1648 ca=6448 + 1!c=423329 g=423330 pq=1 pqc=423329 qp=1 dt=90875/1/0 df=16231 of=157 ri=1249 ql=0 b=10 ci=1459002 co=1614 ca=3310 + 2!c=423370 g=423371 pq=1 pqc=423370 qp=1 dt=69661/1/0 df=16125 of=163 ri=1469 ql=0 b=10 ci=1610701 co=2015 ca=2378 + 3!c=422967 g=422968 pq=1 pqc=422967 qp=1 dt=70349/1/0 df=12528 of=163 ri=1450 ql=0 b=10 ci=1427543 co=1430 ca=897 + 4!c=423196 g=423197 pq=1 pqc=423196 qp=0 dt=38935/1/0 df=10959 of=177 ri=1657 ql=0 b=10 ci=1562249 co=1896 ca=533 + 5!c=422950 g=422951 pq=1 pqc=422950 qp=0 dt=25127/1/0 df=5895 of=167 ri=1549 ql=0 b=10 ci=1777260 co=2137 ca=274 + 6!c=423396 g=423397 pq=1 pqc=423396 qp=1 dt=22639/1/0 df=4590 of=149 ri=1572 ql=0 b=10 ci=1471186 co=1530 ca=243 + 7 c=460203 g=460203 pq=1 pqc=460202 qp=0 dt=937087/1/0 df=3298 of=149 ri=1584 ql=6 b=10 ci=4026154 co=1948 ca=135 rcu_bh: - 0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1/0 df=0 of=0 ri=0 ql=0 b=10 - 1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1/0 df=13 of=0 ri=0 ql=0 b=10 - 2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1/0 df=15 of=0 ri=0 ql=0 b=10 - 3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1/0 df=9 of=0 ri=0 ql=0 b=10 - 4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1/0 df=15 of=0 ri=0 ql=0 b=10 - 5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1/0 df=15 of=0 ri=0 ql=0 b=10 - 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1/0 df=15 of=0 ri=0 ql=0 b=10 - 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1/0 df=15 of=0 ri=0 ql=0 b=10 + 0!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=86475/1/0 df=11 of=0 ri=0 ql=0 b=10 ci=112 co=0 ca=0 + 1!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=90875/1/0 df=15 of=0 ri=0 ql=0 b=10 ci=143 co=0 ca=0 + 2!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=69661/1/0 df=21 of=0 ri=1 ql=0 b=10 ci=88 co=0 ca=0 + 3!c=18446744073709551494 g=18446744073709551494 pq=1 pqc=18446744073709551493 qp=0 dt=70349/1/0 df=13 of=0 ri=0 ql=0 b=10 ci=100 co=0 ca=0 + 4!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=38935/1/0 df=17 of=0 ri=0 ql=0 b=10 ci=36 co=0 ca=0 + 5!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=25127/1/0 df=7 of=0 ri=0 ql=0 b=10 ci=32 co=0 ca=0 + 6!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=22639/1/0 df=9 of=0 ri=0 ql=0 b=10 ci=44 co=0 ca=0 + 7 c=182 g=182 pq=1 pqc=181 qp=0 dt=937087/1/0 df=14 of=0 ri=1 ql=0 b=10 ci=627 co=0 ca=0 The first section lists the rcu_data structures for rcu_sched, the second for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an @@ -52,17 +52,18 @@ o The number at the beginning of each line is the CPU number. substantially larger than the number of actual CPUs. o "c" is the count of grace periods that this CPU believes have - completed. CPUs in dynticks idle mode may lag quite a ways - behind, for example, CPU 4 under "rcu_sched" above, which has - slept through the past 25 RCU grace periods. It is not unusual - to see CPUs lagging by thousands of grace periods. + completed. Offlined CPUs and CPUs in dynticks idle mode may + lag quite a ways behind, for example, CPU 6 under "rcu_sched" + above, which has been offline through not quite 40,000 RCU grace + periods. It is not unusual to see CPUs lagging by thousands of + grace periods. o "g" is the count of grace periods that this CPU believes have - started. Again, CPUs in dynticks idle mode may lag behind. - If the "c" and "g" values are equal, this CPU has already - reported a quiescent state for the last RCU grace period that - it is aware of, otherwise, the CPU believes that it owes RCU a - quiescent state. + started. Again, offlined CPUs and CPUs in dynticks idle mode + may lag behind. If the "c" and "g" values are equal, this CPU + has already reported a quiescent state for the last RCU grace + period that it is aware of, otherwise, the CPU believes that it + owes RCU a quiescent state. o "pq" indicates that this CPU has passed through a quiescent state for the current grace period. It is possible for "pq" to be @@ -81,14 +82,16 @@ o "pqc" indicates which grace period the last-observed quiescent the next grace period! o "qp" indicates that RCU still expects a quiescent state from - this CPU. + this CPU. Offlined CPUs and CPUs in dyntick idle mode might + well have qp=1, which is OK: RCU is still ignoring them. o "dt" is the current value of the dyntick counter that is incremented when entering or leaving dynticks idle state, either by the - scheduler or by irq. The number after the first "/" is the - interrupt nesting depth when in dyntick-idle state, or one - greater than the interrupt-nesting depth otherwise. The number - after the second "/" is the NMI nesting depth. + scheduler or by irq. This number is even if the CPU is in + dyntick idle mode and odd otherwise. The number after the first + "/" is the interrupt nesting depth when in dyntick-idle state, + or one greater than the interrupt-nesting depth otherwise. + The number after the second "/" is the NMI nesting depth. This field is displayed only for CONFIG_NO_HZ kernels. @@ -100,7 +103,7 @@ o "df" is the number of times that some other CPU has forced a o "of" is the number of times that some other CPU has forced a quiescent state on behalf of this CPU due to this CPU being - offline. In a perfect world, this might neve happen, but it + offline. In a perfect world, this might never happen, but it turns out that offlining and onlining a CPU can take several grace periods, and so there is likely to be an extended period of time when RCU believes that the CPU is online when it really is not. -- cgit v1.2.3 From 0ac3d136b2e3cdf1161178223bc5da14a06241d0 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 28 Mar 2011 15:47:07 -0700 Subject: rcu: add callback-queue information to rcudata output This commit adds an indication of the state of the callback queue using a string of four characters following the "ql=" integer queue length. The first character is "N" if there are callbacks that have been queued that are not yet ready to be handled by the next grace period, or "." otherwise. The second character is "R" if there are callbacks queued that are ready to be handled by the next grace period, or "." otherwise. The third character is "W" if there are callbacks waiting for the current grace period, or "." otherwise. Finally, the fourth character is "D" if there are callbacks that have been handled by a prior grace period and are waiting to be invoked, or ".". Note that callbacks that are in the process of being invoked are not shown. These callbacks would have been removed from the rcu_data structure's list by rcu_do_batch() prior to being executed. (These callbacks are also not reflected in the "ql=" total, FWIW.) Also, document the new callback-queue trace information. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/RCU/trace.txt | 58 ++++++++++++++++++++++++++++++++------------- kernel/rcutree_trace.c | 21 ++++++++++++++-- 2 files changed, 61 insertions(+), 18 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 13dbf9bccdbf..5aefd5fa7af5 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -21,23 +21,23 @@ rcu_pending() function decided that there was core RCU work to do). The output of "cat rcu/rcudata" looks as follows: rcu_sched: - 0!c=423090 g=423091 pq=1 pqc=423090 qp=1 dt=86475/1/0 df=16319 of=163 ri=1519 ql=0 b=10 ci=1460693 co=1648 ca=6448 - 1!c=423329 g=423330 pq=1 pqc=423329 qp=1 dt=90875/1/0 df=16231 of=157 ri=1249 ql=0 b=10 ci=1459002 co=1614 ca=3310 - 2!c=423370 g=423371 pq=1 pqc=423370 qp=1 dt=69661/1/0 df=16125 of=163 ri=1469 ql=0 b=10 ci=1610701 co=2015 ca=2378 - 3!c=422967 g=422968 pq=1 pqc=422967 qp=1 dt=70349/1/0 df=12528 of=163 ri=1450 ql=0 b=10 ci=1427543 co=1430 ca=897 - 4!c=423196 g=423197 pq=1 pqc=423196 qp=0 dt=38935/1/0 df=10959 of=177 ri=1657 ql=0 b=10 ci=1562249 co=1896 ca=533 - 5!c=422950 g=422951 pq=1 pqc=422950 qp=0 dt=25127/1/0 df=5895 of=167 ri=1549 ql=0 b=10 ci=1777260 co=2137 ca=274 - 6!c=423396 g=423397 pq=1 pqc=423396 qp=1 dt=22639/1/0 df=4590 of=149 ri=1572 ql=0 b=10 ci=1471186 co=1530 ca=243 - 7 c=460203 g=460203 pq=1 pqc=460202 qp=0 dt=937087/1/0 df=3298 of=149 ri=1584 ql=6 b=10 ci=4026154 co=1948 ca=135 + 0!c=423090 g=423091 pq=1 pqc=423090 qp=1 dt=86475/1/0 df=16319 of=163 ri=1519 ql=0 qs=.... b=10 ci=1460693 co=1648 ca=6448 + 1!c=423329 g=423330 pq=1 pqc=423329 qp=1 dt=90875/1/0 df=16231 of=157 ri=1249 ql=0 qs=.... b=10 ci=1459002 co=1614 ca=3310 + 2!c=423370 g=423371 pq=1 pqc=423370 qp=1 dt=69661/1/0 df=16125 of=163 ri=1469 ql=0 qs=.... b=10 ci=1610701 co=2015 ca=2378 + 3!c=422967 g=422968 pq=1 pqc=422967 qp=1 dt=70349/1/0 df=12528 of=163 ri=1450 ql=0 qs=.... b=10 ci=1427543 co=1430 ca=897 + 4!c=423196 g=423197 pq=1 pqc=423196 qp=0 dt=38935/1/0 df=10959 of=177 ri=1657 ql=0 qs=.... b=10 ci=1562249 co=1896 ca=533 + 5!c=422950 g=422951 pq=1 pqc=422950 qp=0 dt=25127/1/0 df=5895 of=167 ri=1549 ql=0 qs=.... b=10 ci=1777260 co=2137 ca=274 + 6!c=423396 g=423397 pq=1 pqc=423396 qp=1 dt=22639/1/0 df=4590 of=149 ri=1572 ql=0 qs=.... b=10 ci=1471186 co=1530 ca=243 + 7 c=460203 g=460203 pq=1 pqc=460202 qp=0 dt=937087/1/0 df=3298 of=149 ri=1584 ql=6 qs=N.W. b=10 ci=4026154 co=1948 ca=135 rcu_bh: - 0!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=86475/1/0 df=11 of=0 ri=0 ql=0 b=10 ci=112 co=0 ca=0 - 1!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=90875/1/0 df=15 of=0 ri=0 ql=0 b=10 ci=143 co=0 ca=0 - 2!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=69661/1/0 df=21 of=0 ri=1 ql=0 b=10 ci=88 co=0 ca=0 - 3!c=18446744073709551494 g=18446744073709551494 pq=1 pqc=18446744073709551493 qp=0 dt=70349/1/0 df=13 of=0 ri=0 ql=0 b=10 ci=100 co=0 ca=0 - 4!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=38935/1/0 df=17 of=0 ri=0 ql=0 b=10 ci=36 co=0 ca=0 - 5!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=25127/1/0 df=7 of=0 ri=0 ql=0 b=10 ci=32 co=0 ca=0 - 6!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=22639/1/0 df=9 of=0 ri=0 ql=0 b=10 ci=44 co=0 ca=0 - 7 c=182 g=182 pq=1 pqc=181 qp=0 dt=937087/1/0 df=14 of=0 ri=1 ql=0 b=10 ci=627 co=0 ca=0 + 0!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=86475/1/0 df=11 of=0 ri=0 ql=0 qs=.... b=10 ci=112 co=0 ca=0 + 1!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=90875/1/0 df=15 of=0 ri=0 ql=0 qs=.... b=10 ci=143 co=0 ca=0 + 2!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=69661/1/0 df=21 of=0 ri=1 ql=0 qs=.... b=10 ci=88 co=0 ca=0 + 3!c=18446744073709551494 g=18446744073709551494 pq=1 pqc=18446744073709551493 qp=0 dt=70349/1/0 df=13 of=0 ri=0 ql=0 qs=.... b=10 ci=100 co=0 ca=0 + 4!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=38935/1/0 df=17 of=0 ri=0 ql=0 qs=.... b=10 ci=36 co=0 ca=0 + 5!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=25127/1/0 df=7 of=0 ri=0 ql=0 qs=.... b=10 ci=32 co=0 ca=0 + 6!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=22639/1/0 df=9 of=0 ri=0 ql=0 qs=.... b=10 ci=44 co=0 ca=0 + 7 c=182 g=182 pq=1 pqc=181 qp=0 dt=937087/1/0 df=14 of=0 ri=1 ql=0 qs=.... b=10 ci=627 co=0 ca=0 The first section lists the rcu_data structures for rcu_sched, the second for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an @@ -120,6 +120,32 @@ o "ql" is the number of RCU callbacks currently residing on of what state they are in (new, waiting for grace period to start, waiting for grace period to end, ready to invoke). +o "qs" gives an indication of the state of the callback queue + with four characters: + + "N" Indicates that there are callbacks queued that are not + ready to be handled by the next grace period, and thus + will be handled by the grace period following the next + one. + + "R" Indicates that there are callbacks queued that are + ready to be handled by the next grace period. + + "W" Indicates that there are callbacks queued that are + waiting on the current grace period. + + "D" Indicates that there are callbacks queued that have + already been handled by a prior grace period, and are + thus waiting to be invoked. Note that callbacks in + the process of being invoked are not counted here. + Callbacks in the process of being invoked are those + that have been removed from the rcu_data structures + queues by rcu_do_batch(), but which have not yet been + invoked. + + If there are no callbacks in a given one of the above states, + the corresponding character is replaced by ".". + o "b" is the batch limit for this CPU. If more than this number of RCU callbacks is ready to invoke, then the remainder will be deferred. diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c index ead5736f99b5..afd262f91262 100644 --- a/kernel/rcutree_trace.c +++ b/kernel/rcutree_trace.c @@ -64,7 +64,16 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp) rdp->dynticks_fqs); #endif /* #ifdef CONFIG_NO_HZ */ seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi); - seq_printf(m, " ql=%ld b=%ld", rdp->qlen, rdp->blimit); + seq_printf(m, " ql=%ld qs=%c%c%c%c b=%ld", + rdp->qlen, + ".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] != + rdp->nxttail[RCU_NEXT_TAIL]], + ".R"[rdp->nxttail[RCU_WAIT_TAIL] != + rdp->nxttail[RCU_NEXT_READY_TAIL]], + ".W"[rdp->nxttail[RCU_DONE_TAIL] != + rdp->nxttail[RCU_WAIT_TAIL]], + ".D"[&rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL]], + rdp->blimit); seq_printf(m, " ci=%lu co=%lu ca=%lu\n", rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted); } @@ -121,7 +130,15 @@ static void print_one_rcu_data_csv(struct seq_file *m, struct rcu_data *rdp) rdp->dynticks_fqs); #endif /* #ifdef CONFIG_NO_HZ */ seq_printf(m, ",%lu,%lu", rdp->offline_fqs, rdp->resched_ipi); - seq_printf(m, ",%ld,%ld", rdp->qlen, rdp->blimit); + seq_printf(m, ",%ld,\"%c%c%c%c\",%ld", rdp->qlen, + ".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] != + rdp->nxttail[RCU_NEXT_TAIL]], + ".R"[rdp->nxttail[RCU_WAIT_TAIL] != + rdp->nxttail[RCU_NEXT_READY_TAIL]], + ".W"[rdp->nxttail[RCU_DONE_TAIL] != + rdp->nxttail[RCU_WAIT_TAIL]], + ".D"[&rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL]], + rdp->blimit); seq_printf(m, ",%lu,%lu,%lu\n", rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted); } -- cgit v1.2.3 From 90e6ac3657fd3b0446d585082000af3cf46439a7 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 6 Apr 2011 15:20:47 -0700 Subject: rcu: update tracing documentation for new rcutorture and rcuboost This commit documents the new debugfs rcu/rcutorture and rcu/rcuboost trace files. The description has been updated as suggested by Josh Triplett. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney --- Documentation/RCU/trace.txt | 186 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 161 insertions(+), 25 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 5aefd5fa7af5..40b530dd0fc7 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -10,34 +10,46 @@ for rcutree and next for rcutiny. CONFIG_TREE_RCU and CONFIG_TREE_PREEMPT_RCU debugfs Files and Formats -These implementations of RCU provides five debugfs files under the -top-level directory RCU: rcu/rcudata (which displays fields in struct -rcu_data), rcu/rcudata.csv (which is a .csv spreadsheet version of -rcu/rcudata), rcu/rcugp (which displays grace-period counters), -rcu/rcuhier (which displays the struct rcu_node hierarchy), and -rcu/rcu_pending (which displays counts of the reasons that the -rcu_pending() function decided that there was core RCU work to do). +These implementations of RCU provides several debugfs files under the +top-level directory "rcu": + +rcu/rcudata: + Displays fields in struct rcu_data. +rcu/rcudata.csv: + Comma-separated values spreadsheet version of rcudata. +rcu/rcugp: + Displays grace-period counters. +rcu/rcuhier: + Displays the struct rcu_node hierarchy. +rcu/rcu_pending: + Displays counts of the reasons rcu_pending() decided that RCU had + work to do. +rcu/rcutorture: + Displays rcutorture test progress. +rcu/rcuboost: + Displays RCU boosting statistics. Only present if + CONFIG_RCU_BOOST=y. The output of "cat rcu/rcudata" looks as follows: rcu_sched: - 0!c=423090 g=423091 pq=1 pqc=423090 qp=1 dt=86475/1/0 df=16319 of=163 ri=1519 ql=0 qs=.... b=10 ci=1460693 co=1648 ca=6448 - 1!c=423329 g=423330 pq=1 pqc=423329 qp=1 dt=90875/1/0 df=16231 of=157 ri=1249 ql=0 qs=.... b=10 ci=1459002 co=1614 ca=3310 - 2!c=423370 g=423371 pq=1 pqc=423370 qp=1 dt=69661/1/0 df=16125 of=163 ri=1469 ql=0 qs=.... b=10 ci=1610701 co=2015 ca=2378 - 3!c=422967 g=422968 pq=1 pqc=422967 qp=1 dt=70349/1/0 df=12528 of=163 ri=1450 ql=0 qs=.... b=10 ci=1427543 co=1430 ca=897 - 4!c=423196 g=423197 pq=1 pqc=423196 qp=0 dt=38935/1/0 df=10959 of=177 ri=1657 ql=0 qs=.... b=10 ci=1562249 co=1896 ca=533 - 5!c=422950 g=422951 pq=1 pqc=422950 qp=0 dt=25127/1/0 df=5895 of=167 ri=1549 ql=0 qs=.... b=10 ci=1777260 co=2137 ca=274 - 6!c=423396 g=423397 pq=1 pqc=423396 qp=1 dt=22639/1/0 df=4590 of=149 ri=1572 ql=0 qs=.... b=10 ci=1471186 co=1530 ca=243 - 7 c=460203 g=460203 pq=1 pqc=460202 qp=0 dt=937087/1/0 df=3298 of=149 ri=1584 ql=6 qs=N.W. b=10 ci=4026154 co=1948 ca=135 + 0!c=423090 g=423091 pq=1 pqc=423090 qp=1 dt=86475/1/0 df=16319 of=163 ri=1519 ql=0 qs=.... kt=0/W b=10 ci=1460693 co=1648 ca=6448 + 1!c=423329 g=423330 pq=1 pqc=423329 qp=1 dt=90875/1/0 df=16231 of=157 ri=1249 ql=0 qs=.... kt=0/W b=10 ci=1459002 co=1614 ca=3310 + 2!c=423370 g=423371 pq=1 pqc=423370 qp=1 dt=69661/1/0 df=16125 of=163 ri=1469 ql=0 qs=.... kt=0/W b=10 ci=1610701 co=2015 ca=2378 + 3!c=422967 g=422968 pq=1 pqc=422967 qp=1 dt=70349/1/0 df=12528 of=163 ri=1450 ql=0 qs=.... kt=0/W b=10 ci=1427543 co=1430 ca=897 + 4!c=423196 g=423197 pq=1 pqc=423196 qp=0 dt=38935/1/0 df=10959 of=177 ri=1657 ql=0 qs=.... kt=0/W b=10 ci=1562249 co=1896 ca=533 + 5!c=422950 g=422951 pq=1 pqc=422950 qp=0 dt=25127/1/0 df=5895 of=167 ri=1549 ql=0 qs=.... kt=0/W b=10 ci=1777260 co=2137 ca=274 + 6!c=423396 g=423397 pq=1 pqc=423396 qp=1 dt=22639/1/0 df=4590 of=149 ri=1572 ql=0 qs=.... kt=0/W b=10 ci=1471186 co=1530 ca=243 + 7 c=460203 g=460203 pq=1 pqc=460202 qp=0 dt=937087/1/0 df=3298 of=149 ri=1584 ql=6 qs=N.W. kt=0/W b=10 ci=4026154 co=1948 ca=135 rcu_bh: - 0!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=86475/1/0 df=11 of=0 ri=0 ql=0 qs=.... b=10 ci=112 co=0 ca=0 - 1!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=90875/1/0 df=15 of=0 ri=0 ql=0 qs=.... b=10 ci=143 co=0 ca=0 - 2!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=69661/1/0 df=21 of=0 ri=1 ql=0 qs=.... b=10 ci=88 co=0 ca=0 - 3!c=18446744073709551494 g=18446744073709551494 pq=1 pqc=18446744073709551493 qp=0 dt=70349/1/0 df=13 of=0 ri=0 ql=0 qs=.... b=10 ci=100 co=0 ca=0 - 4!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=38935/1/0 df=17 of=0 ri=0 ql=0 qs=.... b=10 ci=36 co=0 ca=0 - 5!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=25127/1/0 df=7 of=0 ri=0 ql=0 qs=.... b=10 ci=32 co=0 ca=0 - 6!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=22639/1/0 df=9 of=0 ri=0 ql=0 qs=.... b=10 ci=44 co=0 ca=0 - 7 c=182 g=182 pq=1 pqc=181 qp=0 dt=937087/1/0 df=14 of=0 ri=1 ql=0 qs=.... b=10 ci=627 co=0 ca=0 + 0!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=86475/1/0 df=11 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=112 co=0 ca=0 + 1!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=90875/1/0 df=15 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=143 co=0 ca=0 + 2!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=69661/1/0 df=21 of=0 ri=1 ql=0 qs=.... kt=0/W b=10 ci=88 co=0 ca=0 + 3!c=18446744073709551494 g=18446744073709551494 pq=1 pqc=18446744073709551493 qp=0 dt=70349/1/0 df=13 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=100 co=0 ca=0 + 4!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=38935/1/0 df=17 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=36 co=0 ca=0 + 5!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=25127/1/0 df=7 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=32 co=0 ca=0 + 6!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=22639/1/0 df=9 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=44 co=0 ca=0 + 7 c=182 g=182 pq=1 pqc=181 qp=0 dt=937087/1/0 df=14 of=0 ri=1 ql=0 qs=.... kt=0/W b=10 ci=627 co=0 ca=0 The first section lists the rcu_data structures for rcu_sched, the second for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an @@ -146,6 +158,23 @@ o "qs" gives an indication of the state of the callback queue If there are no callbacks in a given one of the above states, the corresponding character is replaced by ".". +o "kt" is the per-CPU kernel-thread state. The digit preceding + the slash is zero if there is no work pending and 1 otherwise. + The character after the slash is as follows: + + "S" The kernel thread is stopped, in other words, all + CPUs corresponding to this rcu_node structure are + offline. + + "R" The kernel thread is running. + + "W" The kernel thread is waiting because there is no work + for it to do. + + "Y" The kernel thread is yielding to avoid hogging CPU. + + "?" Unknown value, indicates a bug. + o "b" is the batch limit for this CPU. If more than this number of RCU callbacks is ready to invoke, then the remainder will be deferred. @@ -356,6 +385,113 @@ o "nn" is the number of times that this CPU needed nothing. Alert is due to short-circuit evaluation in rcu_pending(). +The output of "cat rcu/rcutorture" looks as follows: + +rcutorture test sequence: 0 (test in progress) +rcutorture update version number: 615 + +The first line shows the number of rcutorture tests that have completed +since boot. If a test is currently running, the "(test in progress)" +string will appear as shown above. The second line shows the number of +update cycles that the current test has started, or zero if there is +no test in progress. + + +The output of "cat rcu/rcuboost" looks as follows: + +0:5 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=2f95 bt=300f + balk: nt=0 egt=989 bt=0 nb=0 ny=0 nos=16 +6:7 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=2f95 bt=300f + balk: nt=0 egt=225 bt=0 nb=0 ny=0 nos=6 + +This information is output only for rcu_preempt. Each two-line entry +corresponds to a leaf rcu_node strcuture. The fields are as follows: + +o "n:m" is the CPU-number range for the corresponding two-line + entry. In the sample output above, the first entry covers + CPUs zero through five and the second entry covers CPUs 6 + and 7. + +o "tasks=TNEB" gives the state of the various segments of the + rnp->blocked_tasks list: + + "T" This indicates that there are some tasks that blocked + while running on one of the corresponding CPUs while + in an RCU read-side critical section. + + "N" This indicates that some of the blocked tasks are preventing + the current normal (non-expedited) grace period from + completing. + + "E" This indicates that some of the blocked tasks are preventing + the current expedited grace period from completing. + + "B" This indicates that some of the blocked tasks are in + need of RCU priority boosting. + + Each character is replaced with "." if the corresponding + condition does not hold. + +o "kt" is the state of the RCU priority-boosting kernel + thread associated with the corresponding rcu_node structure. + The state can be one of the following: + + "S" The kernel thread is stopped, in other words, all + CPUs corresponding to this rcu_node structure are + offline. + + "R" The kernel thread is running. + + "W" The kernel thread is waiting because there is no work + for it to do. + + "Y" The kernel thread is yielding to avoid hogging CPU. + + "?" Unknown value, indicates a bug. + +o "ntb" is the number of tasks boosted. + +o "neb" is the number of tasks boosted in order to complete an + expedited grace period. + +o "nnb" is the number of tasks boosted in order to complete a + normal (non-expedited) grace period. When boosting a task + that was blocking both an expedited and a normal grace period, + it is counted against the expedited total above. + +o "j" is the low-order 16 bits of the jiffies counter in + hexadecimal. + +o "bt" is the low-order 16 bits of the value that the jiffies + counter will have when we next start boosting, assuming that + the current grace period does not end beforehand. This is + also in hexadecimal. + +o "balk: nt" counts the number of times we didn't boost (in + other words, we balked) even though it was time to boost because + there were no blocked tasks to boost. This situation occurs + when there is one blocked task on one rcu_node structure and + none on some other rcu_node structure. + +o "egt" counts the number of times we balked because although + there were blocked tasks, none of them were blocking the + current grace period, whether expedited or otherwise. + +o "bt" counts the number of times we balked because boosting + had already been initiated for the current grace period. + +o "nb" counts the number of times we balked because there + was at least one task blocking the current non-expedited grace + period that never had blocked. If it is already running, it + just won't help to boost its priority! + +o "ny" counts the number of times we balked because it was + not yet time to start boosting. + +o "nos" counts the number of times we balked for other + reasons, e.g., the grace period ended first. + + CONFIG_TINY_RCU and CONFIG_TINY_PREEMPT_RCU debugfs Files and Formats These implementations of RCU provides a single debugfs file under the @@ -422,9 +558,9 @@ o "neb" is the number of expedited grace periods that have had o "nnb" is the number of normal grace periods that have had to resort to RCU priority boosting since boot. -o "j" is the low-order 12 bits of the jiffies counter in hexadecimal. +o "j" is the low-order 16 bits of the jiffies counter in hexadecimal. -o "bt" is the low-order 12 bits of the value that the jiffies counter +o "bt" is the low-order 16 bits of the value that the jiffies counter will have at the next time that boosting is scheduled to begin. o In the line beginning with "normal balk", the fields are as follows: -- cgit v1.2.3 From 15ba0ba860871cf74b48b1bb47c26c91a66126f3 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 6 Apr 2011 16:01:16 -0700 Subject: rcu: add grace-period age and more kthread state to tracing This commit adds the age in jiffies of the current grace period along with the duration in jiffies of the longest grace period since boot to the rcu/rcugp debugfs file. It also adds an additional "O" state to kthread tracing to differentiate between the kthread waiting due to having nothing to do on the one hand and waiting due to being on the wrong CPU on the other hand. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney --- Documentation/RCU/trace.txt | 12 ++++++++++-- kernel/rcutree.c | 10 ++++++++++ kernel/rcutree.h | 7 +++++-- kernel/rcutree_trace.c | 37 +++++++++++++++++++++++++++++-------- 4 files changed, 54 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 40b530dd0fc7..fd4bffb6e8c9 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -159,8 +159,8 @@ o "qs" gives an indication of the state of the callback queue the corresponding character is replaced by ".". o "kt" is the per-CPU kernel-thread state. The digit preceding - the slash is zero if there is no work pending and 1 otherwise. - The character after the slash is as follows: + the first slash is zero if there is no work pending and 1 + otherwise. The character between the slashes is as follows: "S" The kernel thread is stopped, in other words, all CPUs corresponding to this rcu_node structure are @@ -171,10 +171,18 @@ o "kt" is the per-CPU kernel-thread state. The digit preceding "W" The kernel thread is waiting because there is no work for it to do. + "O" The kernel thread is waiting because it has been + forced off of its designated CPU or because its + ->cpus_allowed mask permits it to run on other than + its designated CPU. + "Y" The kernel thread is yielding to avoid hogging CPU. "?" Unknown value, indicates a bug. + The number after the final slash is the CPU that the kthread + is actually running on. + o "b" is the batch limit for this CPU. If more than this number of RCU callbacks is ready to invoke, then the remainder will be deferred. diff --git a/kernel/rcutree.c b/kernel/rcutree.c index bb84deca3319..27b6d8de82f6 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -92,6 +92,7 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active); */ static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task); DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status); +DEFINE_PER_CPU(int, rcu_cpu_kthread_cpu); static DEFINE_PER_CPU(wait_queue_head_t, rcu_cpu_wq); DEFINE_PER_CPU(char, rcu_cpu_has_work); static char rcu_kthreads_spawnable; @@ -888,6 +889,8 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags) static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags) __releases(rcu_get_root(rsp)->lock) { + unsigned long gp_duration; + WARN_ON_ONCE(!rcu_gp_in_progress(rsp)); /* @@ -895,6 +898,9 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags) * is seen before the assignment to rsp->completed. */ smp_mb(); /* See above block comment. */ + gp_duration = jiffies - rsp->gp_start; + if (gp_duration > rsp->gp_max) + rsp->gp_max = gp_duration; rsp->completed = rsp->gpnum; rsp->signaled = RCU_GP_IDLE; rcu_start_gp(rsp, flags); /* releases root node's rnp->lock. */ @@ -1583,12 +1589,15 @@ static int rcu_cpu_kthread_should_stop(int cpu) smp_processor_id() != cpu) { if (kthread_should_stop()) return 1; + per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU; + per_cpu(rcu_cpu_kthread_cpu, cpu) = raw_smp_processor_id(); local_bh_enable(); schedule_timeout_uninterruptible(1); if (!cpumask_equal(¤t->cpus_allowed, cpumask_of(cpu))) set_cpus_allowed_ptr(current, cpumask_of(cpu)); local_bh_disable(); } + per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu; return 0; } @@ -1656,6 +1665,7 @@ static int __cpuinit rcu_spawn_one_cpu_kthread(int cpu) if (IS_ERR(t)) return PTR_ERR(t); kthread_bind(t, cpu); + per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu; WARN_ON_ONCE(per_cpu(rcu_cpu_kthread_task, cpu) != NULL); per_cpu(rcu_cpu_kthread_task, cpu) = t; wake_up_process(t); diff --git a/kernel/rcutree.h b/kernel/rcutree.h index 67341dbebd95..37502a27a072 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -93,8 +93,9 @@ struct rcu_dynticks { #define RCU_KTHREAD_STOPPED 0 #define RCU_KTHREAD_RUNNING 1 #define RCU_KTHREAD_WAITING 2 -#define RCU_KTHREAD_YIELDING 3 -#define RCU_KTHREAD_MAX 3 +#define RCU_KTHREAD_OFFCPU 3 +#define RCU_KTHREAD_YIELDING 4 +#define RCU_KTHREAD_MAX 4 /* * Definition for node within the RCU grace-period-detection hierarchy. @@ -383,6 +384,8 @@ struct rcu_state { /* but in jiffies. */ unsigned long jiffies_stall; /* Time at which to check */ /* for CPU stalls. */ + unsigned long gp_max; /* Maximum GP duration in */ + /* jiffies. */ char *name; /* Name of structure. */ }; diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c index 3baa235786b5..564b8fef2a7e 100644 --- a/kernel/rcutree_trace.c +++ b/kernel/rcutree_trace.c @@ -47,13 +47,14 @@ #include "rcutree.h" DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status); +DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_cpu); DECLARE_PER_CPU(char, rcu_cpu_has_work); static char convert_kthread_status(unsigned int kthread_status) { if (kthread_status > RCU_KTHREAD_MAX) return '?'; - return "SRWY"[kthread_status]; + return "SRWOY"[kthread_status]; } static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp) @@ -74,7 +75,7 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp) rdp->dynticks_fqs); #endif /* #ifdef CONFIG_NO_HZ */ seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi); - seq_printf(m, " ql=%ld qs=%c%c%c%c kt=%d/%c b=%ld", + seq_printf(m, " ql=%ld qs=%c%c%c%c kt=%d/%c/%d b=%ld", rdp->qlen, ".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] != rdp->nxttail[RCU_NEXT_TAIL]], @@ -86,6 +87,7 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp) per_cpu(rcu_cpu_has_work, rdp->cpu), convert_kthread_status(per_cpu(rcu_cpu_kthread_status, rdp->cpu)), + per_cpu(rcu_cpu_kthread_cpu, rdp->cpu), rdp->blimit); seq_printf(m, " ci=%lu co=%lu ca=%lu\n", rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted); @@ -312,16 +314,35 @@ static const struct file_operations rcuhier_fops = { .release = single_release, }; +static void show_one_rcugp(struct seq_file *m, struct rcu_state *rsp) +{ + unsigned long flags; + unsigned long completed; + unsigned long gpnum; + unsigned long gpage; + unsigned long gpmax; + struct rcu_node *rnp = &rsp->node[0]; + + raw_spin_lock_irqsave(&rnp->lock, flags); + completed = rsp->completed; + gpnum = rsp->gpnum; + if (rsp->completed == rsp->gpnum) + gpage = 0; + else + gpage = jiffies - rsp->gp_start; + gpmax = rsp->gp_max; + raw_spin_unlock_irqrestore(&rnp->lock, flags); + seq_printf(m, "%s: completed=%ld gpnum=%lu age=%ld max=%ld\n", + rsp->name, completed, gpnum, gpage, gpmax); +} + static int show_rcugp(struct seq_file *m, void *unused) { #ifdef CONFIG_TREE_PREEMPT_RCU - seq_printf(m, "rcu_preempt: completed=%ld gpnum=%lu\n", - rcu_preempt_state.completed, rcu_preempt_state.gpnum); + show_one_rcugp(m, &rcu_preempt_state); #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */ - seq_printf(m, "rcu_sched: completed=%ld gpnum=%lu\n", - rcu_sched_state.completed, rcu_sched_state.gpnum); - seq_printf(m, "rcu_bh: completed=%ld gpnum=%lu\n", - rcu_bh_state.completed, rcu_bh_state.gpnum); + show_one_rcugp(m, &rcu_sched_state); + show_one_rcugp(m, &rcu_bh_state); return 0; } -- cgit v1.2.3 From 5ece5bab3ed8594ce2c85c6c6e6b82109db36ca7 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 22 Apr 2011 18:08:51 -0700 Subject: rcu: Add forward-progress diagnostic for per-CPU kthreads Increment a per-CPU counter on each pass through rcu_cpu_kthread()'s service loop, and add it to the rcudata trace output. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/RCU/trace.txt | 39 ++++++++++++++++++++++----------------- kernel/rcutree.c | 2 ++ kernel/rcutree_trace.c | 4 +++- 3 files changed, 27 insertions(+), 18 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index fd4bffb6e8c9..8173cec473aa 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -33,23 +33,23 @@ rcu/rcuboost: The output of "cat rcu/rcudata" looks as follows: rcu_sched: - 0!c=423090 g=423091 pq=1 pqc=423090 qp=1 dt=86475/1/0 df=16319 of=163 ri=1519 ql=0 qs=.... kt=0/W b=10 ci=1460693 co=1648 ca=6448 - 1!c=423329 g=423330 pq=1 pqc=423329 qp=1 dt=90875/1/0 df=16231 of=157 ri=1249 ql=0 qs=.... kt=0/W b=10 ci=1459002 co=1614 ca=3310 - 2!c=423370 g=423371 pq=1 pqc=423370 qp=1 dt=69661/1/0 df=16125 of=163 ri=1469 ql=0 qs=.... kt=0/W b=10 ci=1610701 co=2015 ca=2378 - 3!c=422967 g=422968 pq=1 pqc=422967 qp=1 dt=70349/1/0 df=12528 of=163 ri=1450 ql=0 qs=.... kt=0/W b=10 ci=1427543 co=1430 ca=897 - 4!c=423196 g=423197 pq=1 pqc=423196 qp=0 dt=38935/1/0 df=10959 of=177 ri=1657 ql=0 qs=.... kt=0/W b=10 ci=1562249 co=1896 ca=533 - 5!c=422950 g=422951 pq=1 pqc=422950 qp=0 dt=25127/1/0 df=5895 of=167 ri=1549 ql=0 qs=.... kt=0/W b=10 ci=1777260 co=2137 ca=274 - 6!c=423396 g=423397 pq=1 pqc=423396 qp=1 dt=22639/1/0 df=4590 of=149 ri=1572 ql=0 qs=.... kt=0/W b=10 ci=1471186 co=1530 ca=243 - 7 c=460203 g=460203 pq=1 pqc=460202 qp=0 dt=937087/1/0 df=3298 of=149 ri=1584 ql=6 qs=N.W. kt=0/W b=10 ci=4026154 co=1948 ca=135 + 0 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=545/1/0 df=50 of=0 ri=0 ql=163 qs=NRW. kt=0/W/0 ktl=ebc3 b=10 ci=153737 co=0 ca=0 + 1 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=967/1/0 df=58 of=0 ri=0 ql=634 qs=NRW. kt=0/W/1 ktl=58c b=10 ci=191037 co=0 ca=0 + 2 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=1081/1/0 df=175 of=0 ri=0 ql=74 qs=N.W. kt=0/W/2 ktl=da94 b=10 ci=75991 co=0 ca=0 + 3 c=20942 g=20943 pq=1 pqc=20942 qp=1 dt=1846/0/0 df=404 of=0 ri=0 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=72261 co=0 ca=0 + 4 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=369/1/0 df=83 of=0 ri=0 ql=48 qs=N.W. kt=0/W/4 ktl=e0e7 b=10 ci=128365 co=0 ca=0 + 5 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=381/1/0 df=64 of=0 ri=0 ql=169 qs=NRW. kt=0/W/5 ktl=fb2f b=10 ci=164360 co=0 ca=0 + 6 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=1037/1/0 df=183 of=0 ri=0 ql=62 qs=N.W. kt=0/W/6 ktl=d2ad b=10 ci=65663 co=0 ca=0 + 7 c=20897 g=20897 pq=1 pqc=20896 qp=0 dt=1572/0/0 df=382 of=0 ri=0 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=75006 co=0 ca=0 rcu_bh: - 0!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=86475/1/0 df=11 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=112 co=0 ca=0 - 1!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=90875/1/0 df=15 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=143 co=0 ca=0 - 2!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=69661/1/0 df=21 of=0 ri=1 ql=0 qs=.... kt=0/W b=10 ci=88 co=0 ca=0 - 3!c=18446744073709551494 g=18446744073709551494 pq=1 pqc=18446744073709551493 qp=0 dt=70349/1/0 df=13 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=100 co=0 ca=0 - 4!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=38935/1/0 df=17 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=36 co=0 ca=0 - 5!c=18446744073709551494 g=18446744073709551494 pq=0 pqc=18446744073709551493 qp=1 dt=25127/1/0 df=7 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=32 co=0 ca=0 - 6!c=18446744073709551496 g=18446744073709551496 pq=1 pqc=18446744073709551495 qp=0 dt=22639/1/0 df=9 of=0 ri=0 ql=0 qs=.... kt=0/W b=10 ci=44 co=0 ca=0 - 7 c=182 g=182 pq=1 pqc=181 qp=0 dt=937087/1/0 df=14 of=0 ri=1 ql=0 qs=.... kt=0/W b=10 ci=627 co=0 ca=0 + 0 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=545/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/0 ktl=ebc3 b=10 ci=0 co=0 ca=0 + 1 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=967/1/0 df=3 of=0 ri=1 ql=0 qs=.... kt=0/W/1 ktl=58c b=10 ci=151 co=0 ca=0 + 2 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1081/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/2 ktl=da94 b=10 ci=0 co=0 ca=0 + 3 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1846/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=0 co=0 ca=0 + 4 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=369/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/4 ktl=e0e7 b=10 ci=0 co=0 ca=0 + 5 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=381/1/0 df=4 of=0 ri=1 ql=0 qs=.... kt=0/W/5 ktl=fb2f b=10 ci=0 co=0 ca=0 + 6 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1037/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/6 ktl=d2ad b=10 ci=0 co=0 ca=0 + 7 c=1474 g=1474 pq=1 pqc=1473 qp=0 dt=1572/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=0 co=0 ca=0 The first section lists the rcu_data structures for rcu_sched, the second for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an @@ -160,7 +160,8 @@ o "qs" gives an indication of the state of the callback queue o "kt" is the per-CPU kernel-thread state. The digit preceding the first slash is zero if there is no work pending and 1 - otherwise. The character between the slashes is as follows: + otherwise. The character between the first pair of slashes is + as follows: "S" The kernel thread is stopped, in other words, all CPUs corresponding to this rcu_node structure are @@ -183,6 +184,10 @@ o "kt" is the per-CPU kernel-thread state. The digit preceding The number after the final slash is the CPU that the kthread is actually running on. +o "ktl" is the low-order 16 bits (in hexadecimal) of the count of + the number of times that this CPU's per-CPU kthread has gone + through its loop servicing invoke_rcu_cpu_kthread() requests. + o "b" is the batch limit for this CPU. If more than this number of RCU callbacks is ready to invoke, then the remainder will be deferred. diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 27b6d8de82f6..575d6414763e 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -93,6 +93,7 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active); static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task); DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status); DEFINE_PER_CPU(int, rcu_cpu_kthread_cpu); +DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_loops); static DEFINE_PER_CPU(wait_queue_head_t, rcu_cpu_wq); DEFINE_PER_CPU(char, rcu_cpu_has_work); static char rcu_kthreads_spawnable; @@ -1625,6 +1626,7 @@ static int rcu_cpu_kthread(void *arg) break; } *statusp = RCU_KTHREAD_RUNNING; + per_cpu(rcu_cpu_kthread_loops, cpu)++; local_irq_save(flags); work = *workp; *workp = 0; diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c index 564b8fef2a7e..9678cc3650f5 100644 --- a/kernel/rcutree_trace.c +++ b/kernel/rcutree_trace.c @@ -48,6 +48,7 @@ DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status); DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_cpu); +DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_loops); DECLARE_PER_CPU(char, rcu_cpu_has_work); static char convert_kthread_status(unsigned int kthread_status) @@ -75,7 +76,7 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp) rdp->dynticks_fqs); #endif /* #ifdef CONFIG_NO_HZ */ seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi); - seq_printf(m, " ql=%ld qs=%c%c%c%c kt=%d/%c/%d b=%ld", + seq_printf(m, " ql=%ld qs=%c%c%c%c kt=%d/%c/%d ktl=%x b=%ld", rdp->qlen, ".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] != rdp->nxttail[RCU_NEXT_TAIL]], @@ -88,6 +89,7 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp) convert_kthread_status(per_cpu(rcu_cpu_kthread_status, rdp->cpu)), per_cpu(rcu_cpu_kthread_cpu, rdp->cpu), + per_cpu(rcu_cpu_kthread_loops, rdp->cpu) & 0xffff, rdp->blimit); seq_printf(m, " ci=%lu co=%lu ca=%lu\n", rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted); -- cgit v1.2.3