summaryrefslogtreecommitdiff
path: root/kernel/rcu/tree_plugin.h
AgeCommit message (Collapse)AuthorFilesLines
2018-08-31rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special unionPaul E. McKenney1-9/+4
The ->b.exp_need_qs field is now set only to false, so this commit removes it. The job this field used to do is now done by the rcu_data structure's ->deferred_qs field, which is a consequence of a better split between task-based (the rcu_node structure's ->exp_tasks field) and CPU-based (the aforementioned rcu_data structure's ->deferred_qs field) tracking of quiescent states for RCU-preempt expedited grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-31rcu: Allow processing deferred QSes for exiting RCU-preempt readersPaul E. McKenney1-1/+1
If an RCU-preempt read-side critical section is exiting, that is, ->rcu_read_lock_nesting is negative, then it is a good time to look at the possibility of reporting deferred quiescent states. This commit therefore updates the checks in rcu_preempt_need_deferred_qs() to allow exiting critical sections to report deferred quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-31rcu: Defer reporting RCU-preempt quiescent states when disabledPaul E. McKenney1-35/+109
This commit defers reporting of RCU-preempt quiescent states at rcu_read_unlock_special() time when any of interrupts, softirq, or preemption are disabled. These deferred quiescent states are reported at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline operation. Of course, if another RCU read-side critical section has started in the meantime, the reporting of the quiescent state will be further deferred. This also means that disabling preemption, interrupts, and/or softirqs will act as an RCU-preempt read-side critical section. This is enforced by checking preempt_count() as needed. Some special cases must be handled on an ad-hoc basis, for example, context switch is a quiescent state even though both the scheduler and do_exit() disable preemption. In these cases, additional calls to rcu_preempt_deferred_qs() override the preemption disabling. Similar logic overrides disabled interrupts in rcu_preempt_check_callbacks() because in this case the quiescent state happened just before the corresponding scheduling-clock interrupt. In theory, this change lifts a long-standing restriction that required that if interrupts were disabled across a call to rcu_read_unlock() that the matching rcu_read_lock() also be contained within that interrupts-disabled region of code. Because the reporting of the corresponding RCU-preempt quiescent state is now deferred until after interrupts have been enabled, it is no longer possible for this situation to result in deadlocks involving the scheduler's runqueue and priority-inheritance locks. This may allow some code simplification that might reduce interrupt latency a bit. Unfortunately, in practice this would also defer deboosting a low-priority task that had been subjected to RCU priority boosting, so real-time-response considerations might well force this restriction to remain in place. Because RCU-preempt grace periods are now blocked not only by RCU read-side critical sections, but also by disabling of interrupts, preemption, and softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may require some additional plumbing to provide the network denial-of-service guarantees that have been traditionally provided by RCU-bh. Once these are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched. This would mean that all kernels would have but one flavor of RCU, which would open the door to significant code cleanup. Moving to a single flavor of RCU would also have the beneficial effect of reducing the NOCB kthreads by at least a factor of two. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback from Joel Fernandes. ] [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in response to bug reports from kbuild test robot. ] [ paulmck: Fix bug located by kbuild test robot involving recursion via rcu_preempt_deferred_qs(). ]
2018-08-13Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Cleanup and improvement of NUMA balancing - Refactoring and improvements to the PELT (Per Entity Load Tracking) code - Watchdog simplification and related cleanups - The usual pile of small incremental fixes and improvements * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) watchdog: Reduce message verbosity stop_machine: Reflow cpu_stop_queue_two_works() sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() sched/numa: Use group_weights to identify if migration degrades locality sched/numa: Update the scan period without holding the numa_group lock sched/numa: Remove numa_has_capacity() sched/numa: Modify migrate_swap() to accept additional parameters sched/numa: Remove unused task_capacity from 'struct numa_stats' sched/numa: Skip nodes that are at 'hoplimit' sched/debug: Reverse the order of printing faults sched/numa: Use task faults only if numa_group is not yet set up sched/numa: Set preferred_node based on best_cpu sched/numa: Simplify load_too_imbalanced() sched/numa: Evaluate move once per node sched/numa: Remove redundant field sched/debug: Show the sum wait time of a task group sched/fair: Remove #ifdefs from scale_rt_capacity() sched/core: Remove get_cpu() from sched_fork() sched/cpufreq: Clarify sugov_get_util() sched/sysctl: Remove unused sched_time_avg_ms sysctl ...
2018-07-13rcu: Print stall-warning NMI dyntick state in hexadecimalPaul E. McKenney1-1/+1
The ->dynticks_nmi_nesting field records the nesting depth of both interrupt and NMI handlers. Because the kernel can enter interrupts and never leave them (and vice versa) and because NMIs can interrupt manipulation of the ->dynticks_nmi_nesting field, the values in this field must be both chosen and maniupated very carefully. As a result, although the value is zero when the corresponding CPU is executing neither an interrupt nor an NMI handler, it is 4,611,686,018,427,387,906 on 64-bit systems when there is a single level of interrupt/NMI handling in progress. This number is difficult to remember and interpret, so this commit switches the output to hexadecimal, resulting in the much nicer 0x4000000000000002. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Remove unused local variable "cpu"Paul E. McKenney1-2/+0
One danger of using __maybe_unused is that the compiler doesn't yell at you when you remove the last reference, witness rcu_bind_gp_kthread() and its local variable "cpu". This commit removes this local variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Remove unused rcu_kick_nohz_cpu() functionPaul E. McKenney1-17/+0
The rcu_kick_nohz_cpu() function is no longer used, and the functionality it used to provide is now provided by a call to resched_cpu() in the force-quiescent-state function rcu_implicit_dynticks_qs(). This commit therefore removes rcu_kick_nohz_cpu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Clarify and correct the rcu_preempt_qs() header commentPaul E. McKenney1-6/+10
The rcu_preempt_qs() function only applies to the CPU, not the task. A task really is allowed to invoke this function while in an RCU-preempt read-side critical section, but only if it has first added itself to some leaf rcu_node structure's ->blkd_tasks list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Mark task as .need_qs less aggressivelyPaul E. McKenney1-1/+4
If any scheduling-clock interrupt interrupts an RCU-preempt read-side critical section, the interrupted task's ->rcu_read_unlock_special.b.need_qs field is set. This causes the outermost rcu_read_unlock() to incur the extra overhead of calling into rcu_read_unlock_special(). This commit reduces that overhead by setting ->rcu_read_unlock_special.b.need_qs only if the grace period has been in effect for more than one second. Why one second? Because this is comfortably smaller than the minimum RCU CPU stall-warning timeout of three seconds, but long enough that the .need_qs marking should happen quite rarely. And if your RCU read-side critical section has run on-CPU for a full second, it is not unreasonable to invest some CPU time in ending the grace period quickly. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Use pr_fmt to prefix "rcu: " to logging outputJoe Perches1-4/+6
This commit also adjusts some whitespace while in the area. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]
2018-07-13rcu: Make rcu_read_unlock_special() staticPaul E. McKenney1-1/+2
Because rcu_read_unlock_special() is no longer used outside of kernel/rcu/tree_plugin.h, this commit makes it static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Add CPU online/offline state to dump_blkd_tasks()Paul E. McKenney1-5/+20
Interactions between CPU-hotplug operations and grace-period initialization can result in dump_blkd_tasks(). One of the first debugging actions in this case is to search back in dmesg to work out which of the affected rcu_node structure's CPUs are online and to determine the last CPU-hotplug operation affecting any of those CPUs. This can be laborious and error-prone, especially when console output is lost. This commit therefore causes dump_blkd_tasks() to dump the state of the affected rcu_node structure's CPUs and the last grace period during which the last offline and online operation affected each of these CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Add up-tree information to dump_blkd_tasks() diagnosticsPaul E. McKenney1-4/+6
This commit updates dump_blkd_tasks() to print out quiescent-state bitmasks for the rcu_node structures further up the tree. This information helps debugging of interactions between CPU-hotplug operations and RCU grace-period initialization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Add RCU-preempt check for waiting on newly onlined CPUPaul E. McKenney1-0/+3
RCU should only be waiting on CPUs that were online at the time that the current grace period started. Failure to abide by this rule can result in confusing splats during grace-period cleanup and initialization. This commit therefore adds a check to RCU-preempt's preempted-task queuing that checks for waiting on newly onlined CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Suppress false-positive splats from mid-init task resumePaul E. McKenney1-1/+2
Consider the following sequence of events in a PREEMPT=y kernel: 1. All CPUs corresponding to a given leaf rcu_node structure are offline. 2. The first phase of the rcu_gp_init() function's grace-period initialization runs, and sets that rcu_node structure's ->qsmaskinit to zero, as it should. 3. One of the CPUs corresponding to that rcu_node structure comes back online. Note that because this CPU came online after the grace period started, this grace period can safely ignore this newly onlined CPU. 4. A task running on the newly onlined CPU enters an RCU-preempt read-side critical section, and is then preempted. Because the corresponding rcu_node structure's ->qsmask is zero, rcu_preempt_ctxt_queue() leaves the rcu_node structure's ->gp_tasks field NULL, as it should. 5. The rcu_gp_init() function continues running the second phase of grace-period initialization. The ->qsmask field of the parent of the aforementioned leaf rcu_node structure is set to not expect a quiescent state from the leaf, as is only right and proper. However, when rcu_gp_init() reaches the leaf, it invokes rcu_preempt_check_blocked_tasks(), which sees that the leaf's ->blkd_tasks list is non-empty, and therefore sets the leaf's ->gp_tasks field to reference the first task on that list. 6. The grace period ends before the preempted task resumes, which is perfectly fine, given that this grace period was under no obligation to wait for that task to exit its late-starting RCU-preempt read-side critical section. Unfortunately, the leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats. After all, it appears to rcu_gp_cleanup() that the grace period failed to wait for a task that was supposed to be blocking that grace period. This commit avoids this false-positive splat by adding a check of both ->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks(). If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must have entered its RCU-preempt read-side critical section late (after all, the CPU that it is running on was not online at that time), which means that the upper-level rcu_node structure won't be waiting for anything on the leaf anyway. If ->wait_blkd_tasks is non-zero, then there is at least one task on ths rcu_node structure's ->blkd_tasks list whose RCU read-side critical section predates the current grace period. If ->qsmaskinit is non-zero, there is at least one CPU that was online at the start of the current grace period. Thus, if both are zero, there is nothing to wait for. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Fix typo and add additional debugPaul E. McKenney1-2/+8
This commit fixes a typo and adds some additional debugging to the message emitted when a task blocking the current grace period is listed as blocking it when either that grace period ends or the next grace period begins. This commit also reformats the console message for readability. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Remove ->gpnum and ->completedPaul E. McKenney1-1/+1
Now that everything has been converted to use ->gp_seq instead of ->gpnum and ->completed, this commit removes ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seqPaul E. McKenney1-1/+1
This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Convert rcu_unlock_preempted_task tracepoint to ->gp_seqPaul E. McKenney1-2/+2
This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Convert rcu_preempt_task tracepoint to ->gp_seqPaul E. McKenney1-2/+2
This commit makes the rcu_preempt_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Convert rcu_grace_period tracepoint to gp_seqPaul E. McKenney1-1/+1
This commit makes the rcu_grace_period tracepoint use gp_seq instead of ->gpnum or ->completed. It also introduces a "cpuofl-bgp" string to less obscurely indicate when a CPU has gone offline while a grace period is waiting on it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Make rcu_nocb_wait_gp() check if GP already requestedPaul E. McKenney1-5/+10
This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see if the current CPU already knows about the needed grace period having already been requested. If so, it avoids acquiring the corresponding leaf rcu_node structure's ->lock, thus decreasing contention. This optimization is intended for cases where either multiple leader rcuo kthreads are running on the same CPU or these kthreads are running on a non-offloaded (e.g., housekeeping) CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ] [ paulmck: Fix caching of furthest-future requested grace period. ]
2018-07-13rcu: Make RCU CPU stall warnings use ->gp_seqPaul E. McKenney1-4/+4
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(), print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Convert grace-period requests to ->gp_seqPaul E. McKenney1-3/+3
This commit converts the grace-period request code paths from ->completed and ->gpnum to ->gp_seq. The need_future_gp_element() macro encapsulates the shift operation required to use ->gp_seq as an index to the ->need_future_gp[] array. The rcu_cbs_completed() function is removed in favor of the rcu_seq_snap() function. The rcu_start_this_gp() gets some temporary consistency checks and uses rcu_seq_done(), rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place of the earlier open-coded comparisons of ->gpnum and ->completed. The rcu_future_gp_cleanup() function replaces use of ->completed with ->gp_seq. The rcu_accelerate_cbs() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap(). The rcu_advance_cbs() function replaces an access to >completed with one to ->gp_seq and adds some temporary warnings. The rcu_nocb_wait_gp() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded comparison with rcu_seq_done(). The temporary warnings will be removed when the various ->gpnum and ->completed fields are removed. Their purpose is to locate code who might still be using ->gpnum and ->completed. (Much easier that way than trying to trace down the causes of too-short grace periods and grace-period hangs!) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Convert ->completedqs to ->gp_seqPaul E. McKenney1-2/+2
This commit switches the quiescent-state no-backtracking checks from ->gpnum and ->completed to ->gp_seq. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Convert ->rcu_iw_gpnum to ->gp_seqPaul E. McKenney1-1/+1
This commit switches the interrupt-disabled detection mechanism to ->gp_seq. This mechanism is used as part of RCU CPU stall warnings, and detects cases where the stall is due to a CPU having interrupts disabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Move rcu_nocb_gp_get() to ->gp_seqPaul E. McKenney1-1/+1
This commit makes rcu_try_advance_all_cbs() use ->gp_seq. It uses rcu_seq_ctr() in order to shift away the state bits, so that the low-order bits of the result may safely be used to index ->nocb_gp_wq[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-13rcu: Move rcu_try_advance_all_cbs() to ->gp_seqPaul E. McKenney1-1/+2
This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the exception of tracing, which will be converted later. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26rcu: Use the proper lockdep annotation in dump_blkd_tasks()Boqun Feng1-1/+1
Sparse reported this: | kernel/rcu/tree_plugin.h:814:9: warning: incorrect type in argument 1 (different modifiers) | kernel/rcu/tree_plugin.h:814:9: expected struct lockdep_map const *lock | kernel/rcu/tree_plugin.h:814:9: got struct lockdep_map [noderef] *<noident> This is caused by using vanilla lockdep annotations on rcu_node::lock, and that requires accessing ->lock of rcu_node directly. However we need to keep rcu_node::lock __private to avoid breaking its extra ordering guarantee. And we have a dedicated lockdep annotation for rcu_node::lock, so use it. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26rcu: Add debugging info to assertionPaul E. McKenney1-2/+36
The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in rcu_gp_cleanup() triggers (inexplicably, of course) every so often. This commit therefore extracts more information. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-20sched/swait: Rename to exclusivePeter Zijlstra1-6/+6
Since swait basically implemented exclusive waits only, make sure the API reflects that. $ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]*" -e "\<prepare_to_swait\>" | while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]*/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done With a few manual touch-ups. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
2018-05-15Merge branches 'exp.2018.05.15a', 'fixes.2018.05.15a', 'lock.2018.05.15a' ↵Paul E. McKenney1-40/+58
and 'torture.2018.05.15a' into HEAD exp.2018.05.15a: Parallelize expedited grace-period initialization. fixes.2018.05.15a: Miscellaneous fixes. lock.2018.05.15a: Decrease lock contention on root rcu_node structure, which is a step towards merging RCU flavors. torture.2018.05.15a: Torture-test updates.
2018-05-15rcu: Make rcu_start_future_gp() caller select grace periodPaul E. McKenney1-4/+5
The rcu_accelerate_cbs() function selects a grace-period target, which it uses to have rcu_segcblist_accelerate() assign numbers to recently queued callbacks. Then it invokes rcu_start_future_gp(), which selects a grace-period target again, which is a bit pointless. This commit therefore changes rcu_start_future_gp() to take the grace-period target as a parameter, thus avoiding double selection. This commit also changes the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect this change in functionality, and also makes a similar change to the name of trace_rcu_future_gp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Make rcu_gp_cleanup() more accurately predict need for new GPPaul E. McKenney1-17/+0
Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset state to reflect the end of the grace period. It also checks to see whether a new grace period is needed, but in a number of cases, rather than directly cause the new grace period to be immediately started, it instead leaves the grace-period-needed state where various fail-safes can find it. This works fine, but results in higher contention on the root rcu_node structure's ->lock, which is undesirable, and contention on that lock has recently become noticeable. This commit therefore makes rcu_gp_cleanup() immediately start a new grace period if there is any need for one. It is quite possible that it will later be necessary to throttle the grace-period rate, but that can be dealt with when and if. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Add accessor macros for the ->need_future_gp[] arrayPaul E. McKenney1-1/+1
Accessors for the ->need_future_gp[] array are currently open-coded, which makes them difficult to change. To improve maintainability, this commit adds need_future_gp_mask() to compute the indexing mask from the array size, need_future_gp_element() to access the element corresponding to the specified grace-period number, and need_any_future_gp() to determine if any future grace period is needed. This commit also applies need_future_gp_element() to existing open-coded single-element accesses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Add leaf-node macrosPaul E. McKenney1-2/+2
This commit adds rcu_first_leaf_node() that returns a pointer to the first leaf rcu_node structure in the specified RCU flavor and an rcu_is_leaf_node() that returns true iff the specified rcu_node structure is a leaf. This commit also uses these macros where appropriate. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Update rcu_bind_gp_kthread() header commentPaul E. McKenney1-2/+1
The header comment for rcu_bind_gp_kthread() refers to sysidle, which is no longer with us. However, it is still important to bind RCU's grace-period kthreads to the housekeeping CPU(s), so rather than remove rcu_bind_gp_kthread(), this commit updates the comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Move __rcu_read_lock() and __rcu_read_unlock() to tree_plugin.hPaul E. McKenney1-0/+44
The __rcu_read_lock() and __rcu_read_unlock() functions were moved to kernel/rcu/update.c in order to implement tiny preemptible RCU. However, tiny preemptible RCU was removed from the kernel a long time ago, so this commit belatedly moves them back into the only remaining preemptible-RCU code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs()Paul E. McKenney1-2/+2
Commit e31d28b6ab8f ("trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()") substituted cond_resched() for the earlier call to cond_resched_rcu_qs(). However, the new-age cond_resched() does not do anything to help RCU-tasks grace periods because (1) RCU-tasks is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a complete no-op when preemption is enabled. This situation results in hangs when running the trace benchmarks. A number of potential fixes were discussed on LKML (https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home), including making cond_resched() not be a no-op; making cond_resched() not be a no-op, but only when running tracing benchmarks; reverting the aforementioned commit (which works because cond_resched_rcu_qs() does provide an RCU-tasks quiescent state; and adding a call to the scheduler/RCU rcu_note_voluntary_context_switch() function. All were deemed unsatisfactory, either due to added cond_resched() overhead or due to magic functions inviting cargo culting. This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(), which provides a clear hint as to what this function is doing and why and where it should be used, and then replaces the call to cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's benchmark_event_kthread() function. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Call wake_nocb_leader_defer() with 'FORCE' when nocb_q_count is highByungchul Park1-1/+1
If an excessive number of callbacks have been queued, but the NOCB leader kthread's wakeup must be deferred, then we should wake up the leader unconditionally once it is safe to do so. This was handled correctly in commit fbce7497ee ("rcu: Parallelize and economize NOCB kthread wakeups"), but then commit 8be6e1b15c ("rcu: Use timer as backstop for NOCB deferred wakeups") passed RCU_NOCB_WAKE instead of the correct RCU_NOCB_WAKE_FORCE to wake_nocb_leader_defer(). As an interesting aside, RCU_NOCB_WAKE_FORCE is never passed to anything, which should have been taken as a hint. ;-) This commit therefore passes RCU_NOCB_WAKE_FORCE instead of RCU_NOCB_WAKE to wake_nocb_leader_defer() when a callback is queued onto a NOCB CPU that already has an excessive number of callbacks pending. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Don't allocate rcu_nocb_mask if no one needs itPaul E. McKenney1-2/+2
Commit 44c65ff2e3b0 ("rcu: Eliminate NOCBs CPU-state Kconfig options") made allocation of rcu_nocb_mask depend only on the rcu_nocbs=, nohz_full=, or isolcpus= kernel boot parameters. However, it failed to change the initial value of rcu_init_nohz()'s local variable need_rcu_nocb_mask to false, which can result in useless allocation of an all-zero rcu_nocb_mask. This commit therefore fixes this bug by changing the initial value of need_rcu_nocb_mask to false. While we are in the area, also correct the error message that is printed when someone specifies that can-never-exist CPUs should be NOCBs CPUs. Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Byungchul Park <byungchul.park@lge.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15rcu: Inline rcu_preempt_do_callback() into its sole callerByungchul Park1-10/+1
The rcu_preempt_do_callbacks() function was introduced in commit 09223371dea(rcu: Use softirq to address performance regression), where it was necessary to handle kernel builds both containing and not containing RCU-preempt. Since then, various changes (most notably f8b7fc6b51 ("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have resulted in this function being invoked only from rcu_kthread_do_work(), which is present only in kernels containing RCU-preempt, which in turn means that the rcu_preempt_do_callbacks() function is no longer needed. This commit therefore inlines rcu_preempt_do_callbacks() into its sole remaining caller and also removes the rcu_state_p and rcu_data_p indirection for added clarity. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> [ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-02-21rcu: Use wrapper for lockdep assertsMatthew Wilcox1-2/+2
Commits c0b334c5bfa9 and ea9b0c8a26a2 introduced new sparse warnings by accessing rcu_node->lock directly and ignoring the __private marker. Introduce a new wrapper and use it. Also fix a similar problem in srcutree.c introduced by a3883df3935e. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-21rcu: Remove obsolete callback-invocation statistics for debugfsPaul E. McKenney1-1/+0
The debugfs interface displayed statistics on RCU callback invocation but this interface has since been removed. This commit therefore removes the no-longer-used rcu_data structure's ->n_cbs_invoked and ->n_nocbs_invoked fields along with their updates. If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-21rcu: Remove obsolete boost statistics for debugfsPaul E. McKenney1-6/+2
The debugfs interface displayed statistics on RCU priority boosting, but this interface has since been removed. This commit therefore removes the no-longer-used rcu_data structure's ->n_tasks_boosted, ->n_exp_boosts, and ->n_exp_boosts and their updates. If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-21rcu: Call touch_nmi_watchdog() while printing stall warningsTejun Heo1-1/+13
When RCU stall warning triggers, it can print out a lot of messages while holding spinlocks. If the console device is slow (e.g. an actual or IPMI serial console), it may end up triggering NMI hard lockup watchdog like the following. *** CPU printking while holding RCU spinlock PID: 4149739 TASK: ffff881a46baa880 CPU: 13 COMMAND: "CPUThreadPool8" #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0 #1 [ffff881fff945e58] nmi_handle at ffffffff81020653 #2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36 #3 [ffff881fff945ed0] do_nmi at ffffffff81020d32 #4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: io_serial_in+21] RIP: ffffffff81630e55 RSP: ffff881fff943b88 RFLAGS: 00000002 RAX: 000000000000ca00 RBX: ffffffff8230e188 RCX: 0000000000000000 RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffff8230e188 RBP: ffff881fff943bb0 R8: 0000000000000000 R9: ffffffff820cb3c4 R10: 0000000000000019 R11: 0000000000002000 R12: 00000000000026e1 R13: 0000000000000020 R14: ffffffff820cd398 R15: 0000000000000035 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 --- <NMI exception stack> --- #5 [ffff881fff943b88] io_serial_in at ffffffff81630e55 #6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c #7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc #8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00 #9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691 #10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2 #11 [ffff881fff943c90] console_unlock at ffffffff810dfc55 #12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5 #13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf #14 [ffff881fff943d60] vprintk_func at ffffffff810e1127 #15 [ffff881fff943d70] printk at ffffffff8119a8a4 #16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c #17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133 #18 [ffff881fff943ee8] update_process_times at ffffffff810f3497 #19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037 #20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38 #21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b *** CPU triggering the hardlockup watchdog PID: 4149709 TASK: ffff88010f88c380 CPU: 26 COMMAND: "CPUThreadPool35" #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874 #1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc #2 [ffff883fff105af0] __crash_kexec at ffffffff81111795 #3 [ffff883fff105b08] panic at ffffffff8119a6ae #4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd #5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866 #6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4 #7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265 #8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f #9 [ffff883fff105e58] nmi_handle at ffffffff81020653 #10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94 #11 [ffff883fff105ed0] do_nmi at ffffffff81020d32 #12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: queued_spin_lock_slowpath+248] RIP: ffffffff810da958 RSP: ffff883fff103e68 RFLAGS: 00000046 RAX: 0000000000000000 RBX: 0000000000000046 RCX: 00000000006d0000 RDX: ffff883fff49a950 RSI: 0000000000d10101 RDI: ffffffff81e54300 RBP: ffff883fff103e80 R8: ffff883fff11a950 R9: 0000000000000000 R10: 000000000e5873ba R11: 000000000000010f R12: ffffffff81e54300 R13: 0000000000000000 R14: ffff88010f88c380 R15: ffffffff81e54300 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958 #14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b #15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18 #16 [ffff883fff103ee8] update_process_times at ffffffff810f3497 #17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037 #18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38 #19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b --- <IRQ stack> --- Avoid spuriously triggering NMI hardlockup watchdog by touching it from the print functions. show_state_filter() shares the same problem and solution. v2: Relocate the comment to where it belongs. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-21rcu: Fix CPU offload boot message when no CPUs are offloadedPaul E. McKenney1-2/+5
In CONFIG_RCU_NOCB_CPU=y kernels, if the boot parameters indicate that none of the CPUs should in fact be offloaded, the following somewhat obtuse message appears: Offload RCU callbacks from CPUs: . This commit therefore makes the message at least grammatically correct in this case: Offload RCU callbacks from CPUs: (none) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11Merge branches 'cond_resched.2017.12.04a', 'dyntick.2017.11.28a', ↵Paul E. McKenney1-8/+5
'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD cond_resched.2017.12.04a: Convert cond_resched_rcu_qs() to cond_resched() dyntick.2017.11.28a: Make RCU dynticks handle interrupts from NMI fixes.2017.12.11a: Miscellaneous fixes srbd.2017.12.05a: Remove now-redundant smp_read_barrier_depends() torture.2017.12.11a: Torture-testing update
2017-12-11rcu: Remove have_rcu_nocb_mask from tree_plugin.hRakib Mullick1-7/+4
Currently have_rcu_nocb_mask is used to avoid double allocation of rcu_nocb_mask during boot up. Due to different representation of cpumask_var_t on different kernel config CPUMASK=y(or n) it was okay. But now we have a helper cpumask_available(), which can be utilized to check whether rcu_nocb_mask has been allocated or not without using a variable. Removing the variable also reduces vmlinux size. Unpatched version: text data bss dec hex filename 13050393 7852470 14543408 35446271 21cddff vmlinux Patched version: text data bss dec hex filename 13050390 7852438 14543408 35446236 21cdddc vmlinux Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-29rcu: Shrink ->dynticks_{nmi_,}nesting from long long to longPaul E. McKenney1-1/+1
Because the ->dynticks_nesting field now only contains the process-based nesting level instead of a value encoding both the process nesting level and the irq "nesting" level, we no longer need a long long, even on 32-bit systems. This commit therefore changes both the ->dynticks_nesting and ->dynticks_nmi_nesting fields to long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>