Age | Commit message (Collapse) | Author | Files | Lines |
|
'rcutorture.2013.12.03a' and 'sparse.2013.12.12a' into HEAD
doc.2013.12.03a: Topic branch for documentation changes.
fixes.2013.12.12a: Topic branch for miscellaneous fixes.
rcutorture.2013.12.03a: Topic branch for new rcutorture/KVM scripting.
sparse.2013.12.12a: Topic branch for sparse-RCU changes.
|
|
Function prototypes don't need to have the "extern" keyword since this
is the default behavior. Its explicit use is redundant. This commit
therefore removes them.
Signed-off-by: Teodora Baluta <teobaluta@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Function prototypes don't need to have the "extern" keyword since this
is the default behavior. Its explicit use is redundant. This commit
therefore removes them.
Signed-off-by: Teodora Baluta <teobaluta@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
If the rcutorture SRCU output exceeds 4096 bytes, for example, if you
have more than about 75 CPUs, it will overflow the current statically
allocated buffer. This commit therefore replaces this static buffer
with a dynamically buffer whose size is based on the number of CPUs.
Benefits:
- Avoids both buffer overflow and output truncation.
- Handles an arbitrarily large number of CPUs.
- Straightforward implementation.
Shortcomings:
- Some memory is wasted:
1 cpu now comsumes 50 - 60 bytes, and this patch provides 200 bytes.
Therefore, for 1K CPUs, roughly 100KB of memory will be wasted.
However, the memory is freed immediately after printing, so this
wastage should not be a problem in practice.
Testing (Fedora16 2 CPUs, 2GB RAM x86_64):
- as module, with/without "torture_type=srcu".
- build-in not boot runnable, with/without "torture_type=srcu".
- build-in let boot runnable, with/without "torture_type=srcu".
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see
if the RCU core needs anything from this CPU. If so, RCU raises
RCU_SOFTIRQ to carry out any needed processing.
This approach has worked well historically, but it is undesirable on
NO_HZ_FULL CPUs. Such CPUs are expected to spend almost all of their time
in userspace, so that scheduling-clock interrupts can be disabled while
there is only one runnable task on the CPU in question. Unfortunately,
raising any softirq has the potential to wake up ksoftirqd, which would
provide the second runnable task on that CPU, preventing disabling of
scheduling-clock interrupts.
What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone,
relying on the grace-period kthreads' quiescent-state forcing to
do any needed RCU work on behalf of those CPUs.
This commit therefore refrains from raising RCU_SOFTIRQ on any
NO_HZ_FULL CPUs during any grace periods that have been in effect
for less than one second. The one-second limit handles the case
where an inappropriate workload is running on a NO_HZ_FULL CPU
that features lots of scheduling-clock interrupts, but no idle
or userspace time.
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <bitbucket@online.de>
Toasted-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
After commit #10f39bb1b2c1 (rcu: protect __rcu_read_unlock() against
scheduler-using irq handlers), it is no longer possible to enter
the main body of rcu_read_lock_special() from an NMI, interrupt, or
softirq handler. In theory, this implies that the check for "in_irq()
|| in_serving_softirq()" must always fail, so that in theory this check
could be removed entirely.
In practice, this commit wraps this condition with a WARN_ON_ONCE().
If this warning never triggers, then the condition will be removed
entirely.
[ paulmck: And one way of triggering the WARN_ON() is if a scheduling
clock interrupt occurs in an RCU read-side critical section, setting
RCU_READ_UNLOCK_NEED_QS, which is handled by rcu_read_unlock_special().
Updated this commit to return if only that bit was set. ]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
There is currently no way to initialize a global RCU-protected pointer
without either putting up with sparse complaints or open-coding an
obscure cast. This commit therefore creates RCU_INITIALIZER(), which
is intended to be used as follows:
struct foo __rcu *p = RCU_INITIALIZER(&my_rcu_structure);
This commit also applies RCU_INITIALIZER() to eliminate repeated
open-coded obscure casts in __rcu_assign_pointer(), RCU_INIT_POINTER(),
and RCU_POINTER_INITIALIZER(). This commit also inlines
__rcu_assign_pointer() into its only caller, rcu_assign_pointer().
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The rcu_assign_pointer() primitive needs to use ACCESS_ONCE to make
the assignment to the destination pointer volatile, to protect against
compilers too clever for their own good.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Although rcu_assign_pointer() can be used to assign a constant
NULL pointer, doing so gets you an unnecessary memory barrier and
in some circumstances, sparse warnings. This commit therefore
changes the rcu_assign_pointer() of NULL in __bond_release_one() to
RCU_INIT_POINTER().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The rcu_assign_pointer() macro, as with most cpp macros, must not evaluate
its argument more than once. And it in fact does not. But this might
not be obvious to the casual observer, because one of the arguments
appears no less than three times. However, but one expansion is only
visible to sparse (__CHECKER__), and one lives inside a typeof (where
it will never be evaluated), so this is in fact safe.
This commit therefore adds a comment making this explicit.
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Currently blocking in an RCU callback function will result in
"scheduling while atomic", which could be triggered for any number
of reasons. To aid debugging, this patch introduces a rcu_callback_map
that is used to tie the inappropriate voluntary context switch back
to the fact that the function is being invoked from within a callback.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
This commit documents the memory-barrier guarantees provided by
synchronize_srcu() and call_srcu().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Each element of the rcu_state structure's ->levelspread[] array
is intended to contain the per-level fanout, where the zero-th
element corresponds to the root of the rcu_node tree, and the last
element corresponds to the leaves. In the CONFIG_RCU_FANOUT_EXACT
case, this means that the last element should be filled in
from CONFIG_RCU_FANOUT_LEAF (or from the rcu_fanout_leaf boot
parameter, if provided) and that the remaining elements should
be filled in from CONFIG_RCU_FANOUT. Unfortunately, the current
code in rcu_init_levelspread() takes the opposite approach, placing
CONFIG_RCU_FANOUT_LEAF in the zero-th element and CONFIG_RCU_FANOUT in
the remaining elements.
For typical power-of-two values, this generates odd but functional
rcu_node trees. However, other values, for example CONFIG_RCU_FANOUT=3
and CONFIG_RCU_FANOUT_LEAF=2, generate trees that can leave some CPUs
out of the grace-period computation, resulting in too-short grace periods
and therefore a broken RCU implementation.
This commit therefore fixes rcu_init_levelspread() to set the last
->levelspread[] array element from CONFIG_RCU_FANOUT_LEAF and the
remaining elements from CONFIG_RCU_FANOUT, thus generating the
intended rcu_node trees.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
This commit fixes the following coccinelle warning:
kernel/rcu/tree.c:712:9-10: WARNING: return of 0/1 in function
'rcu_lockdep_current_cpu_online' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: coccinelle/misc/boolreturn.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
All of the rcutorture scripts has the usual GPL header, which contains
a long-obsolete postal address for FSF. To avoid the need to track the
FSF office's movements, this commit substitutes the URL where GPL may
be found.
Reported-by: Greg KH <gregkh@linuxfoundation.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The output of the rcutorture scripts often requires interpretation, so
this commit simplifies this interpretation by tagging messages as
BUGs (colored red) or WARNINGs (colored yellow).
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Repeatedly running a given test, for example, by repeating the name
as in "--configs "TREE08 TREE08 TREE08" records the results only of
the last run of this test. This is because the earlier results are
overwritten by the later results.
This commit therefore checks for earlier results, using numbered
file extensions to distinguish multiple runs. The earlier example
would therefore create directories TREE01, TREE01.2, and TREE01.3.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The commit causes kvm.sh to invoke kvm-recheck.sh at the end of each
run, and causes kvm-recheck.sh to print only the name of the test, not
the full path to the corresponding Kconfig file.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The TREE08 Kconfig fragment does not enable tracing, which is appropriate
for its test case. However, this can be inconvenient in cases where
TREE08 locates RCU bugs. This commit therefore adds a TREE08-T that
differs from TREE08 only in enabling CONFIG_RCU_TRACE.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commit adds the --kmake-arg to kvm.sh, which allows passing in
things like "V=1" to see the build commands, as well as enabling the
CROSS_COMPILE= make macro used for cross-building.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commit adds the --no-initrd argument to kvm.sh, which permits
initrd to be contained in a root partition specified by the --bootargs
argument. Without --no-initrd, the kernel build expects an initrd
directory in the same rcutorture directory that contains bin and configs.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commits adds the --qemu-args argument to kvm.sh that is required
to pass boot devices down through to qemu.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commit allows easy specification of trace_event lists, among other
things.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commit adds --buildonly, which does the builds specified by the
--configs argument, but does not boot or test the resulting kernels.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Don't grab the configuration fragment from the configs directory because
it might well have been changed since the test was run. Instead, use
the ConfigFragment file that was placed in the results directory.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
As it stands, the default kernel boot parameters generated from
the Kconfig fragment will override any supplied with the .boot
file that can optionally accompany a Kconfig fragment. Rearrange
ordering to permit the specific .boot arguments to override those
generated by analyzing the Kconfig fragment.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commit expands the checks for what architecture is running to generate
additional qemu-system- commands, then uses the resulting qemu-system-
command name to choose different qemu arguments as needed for different
architectures.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The --rcu-kvm argument was intended to allow the scripts to live in
an alternate location. Unfortunately, this prevents the kvm.sh script
from using common functions until after it finished parsing arguments,
because it doesn't know where to find them until then. However, "cp -a"
and "ln -s" work pretty well, so lack of an --rcu-kvm argument can be
easily worked around.
This commit therefore removes this argument.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The qemu -name argument doesn't seem to be useful in this environment,
so this commit removes it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The task of working out which flavor of qemu to use gets more complex
as more types of CPUs are supported. Adding Power makes three in addition
to 32-bit and 64-bit x86, so it is time to pull this out into a function.
This commit therefore creates an identify_qemu function and also adds
a --qemu-cmd command-line argument for the inevitable case where the
identify_qemu cannot figure it out.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The commit uses configcheck.sh from within configinit.sh, replacing the
imperfect inline expansion that was there before.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commit drops no-longer-needed diagnostics from the output. Some of
them are retained in logfiles, in case they are ever needed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The TINY_RCU test cases were first put in place many years ago, and have
been incrementally modified rather than being reworked. This commit
therefore completes a long-overdue reworking of the TINY_RCU test cases.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The TREE_RCU test cases were first put in place many years ago, and have
been incrementally modified rather than being reworked. This commit
therefore completes a long-overdue reworking of the TREE_RCU test cases.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Use .boot facility to ease inclusion of SRCU into automated testing.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
The v3.12 version of the kernel added the CONFIG_NO_HZ_FULL_SYSIDLE
Kconfig parameter, so this commit adds a version transition at that
point.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Some Kconfig fragments require rcutorture module parameters to
do optimal testing, for example, a configuration for SRCU would
need rcutorture.torture_type=srcu. This commit therefore adds a
per-Kconfig-fragment boot-parameter capability.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Different Kconfig parameters apply to different kernel versions, as
do different rcutorture module parameters. This commit allows the
rcutorture test scripts to adjust for different kernel versions.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Allow datestamp to be specified to allow tests to be broken up and run
in parallel.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
This commit adds the test framework that I used to test RCU under KVM.
This consists of a group of scripts and Kconfig fragments.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
|
|
Some RCU bugs have been specific to the layout of the rcu_node tree,
but RCU will silently adjust the tree at boot time if appropriate.
This obscures valuable debugging information, so print a message when
this happens.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The srcu_barrier() docbook header left out the "sp" argument, so this
commit adds that argument's docbook text.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The current task-level idle entry/exit code forces an entry/exit on
each call, regardless of the nesting level. This commit therefore
properly accounts for nesting.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
Dave Jones got the following lockdep splat:
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.12.0-rc3+ #92 Not tainted
> -------------------------------------------------------
> trinity-child2/15191 is trying to acquire lock:
> (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
>
> but task is already holding lock:
> (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&ctx->lock){-.-...}:
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
> [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
> [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
> [<ffffffff81732052>] __schedule+0x1d2/0xa20
> [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
> [<ffffffff817352b6>] retint_kernel+0x26/0x30
> [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
> [<ffffffff813f0504>] pty_write+0x54/0x60
> [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
> [<ffffffff813e5838>] tty_write+0x158/0x2d0
> [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
> [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
> [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> -> #2 (&rq->lock){-.-.-.}:
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
> [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
> [<ffffffff81054336>] do_fork+0x126/0x460
> [<ffffffff81054696>] kernel_thread+0x26/0x30
> [<ffffffff8171ff93>] rest_init+0x23/0x140
> [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
> [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
>
> -> #1 (&p->pi_lock){-.-.-.}:
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
> [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
> [<ffffffff81097d62>] default_wake_function+0x12/0x20
> [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
> [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
> [<ffffffff8108ff59>] __wake_up+0x39/0x50
> [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
> [<ffffffff81111450>] __call_rcu+0x140/0x820
> [<ffffffff81111b8d>] call_rcu+0x1d/0x20
> [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
> [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
> [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
> [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
> [<ffffffff817200be>] kernel_init+0xe/0x190
> [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
>
> -> #0 (&rdp->nocb_wq){......}:
> [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
> [<ffffffff8108ff43>] __wake_up+0x23/0x50
> [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
> [<ffffffff81111450>] __call_rcu+0x140/0x820
> [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
> [<ffffffff81149abf>] put_ctx+0x4f/0x70
> [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
> [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
> [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
> [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
> [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> other info that might help us debug this:
>
> Chain exists of:
> &rdp->nocb_wq --> &rq->lock --> &ctx->lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&ctx->lock);
> lock(&rq->lock);
> lock(&ctx->lock);
> lock(&rdp->nocb_wq);
>
> *** DEADLOCK ***
>
> 1 lock held by trinity-child2/15191:
> #0: (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> stack backtrace:
> CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
> ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
> ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
> ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
> Call Trace:
> [<ffffffff8172a363>] dump_stack+0x4e/0x82
> [<ffffffff81726741>] print_circular_bug+0x200/0x20f
> [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
> [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
> [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
> [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
> [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
> [<ffffffff8108ff43>] __wake_up+0x23/0x50
> [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
> [<ffffffff81111450>] __call_rcu+0x140/0x820
> [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
> [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
> [<ffffffff81149abf>] put_ctx+0x4f/0x70
> [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
> [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
> [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
> [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
> [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
> [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
The underlying problem is that perf is invoking call_rcu() with the
scheduler locks held, but in NOCB mode, call_rcu() will with high
probability invoke the scheduler -- which just might want to use its
locks. The reason that call_rcu() needs to invoke the scheduler is
to wake up the corresponding rcuo callback-offload kthread, which
does the job of starting up a grace period and invoking the callbacks
afterwards.
One solution (championed on a related problem by Lai Jiangshan) is to
simply defer the wakeup to some point where scheduler locks are no longer
held. Since we don't want to unnecessarily incur the cost of such
deferral, the task before us is threefold:
1. Determine when it is likely that a relevant scheduler lock is held.
2. Defer the wakeup in such cases.
3. Ensure that all deferred wakeups eventually happen, preferably
sooner rather than later.
We use irqs_disabled_flags() as a proxy for relevant scheduler locks
being held. This works because the relevant locks are always acquired
with interrupts disabled. We may defer more often than needed, but that
is at least safe.
The wakeup deferral is tracked via a new field in the per-CPU and
per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
This flag is checked by the RCU core processing. The __rcu_pending()
function now checks this flag, which causes rcu_check_callbacks()
to initiate RCU core processing at each scheduling-clock interrupt
where this flag is set. Of course this is not sufficient because
scheduling-clock interrupts are often turned off (the things we used to
be able to count on!). So the flags are also checked on entry to any
state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
state and NO_HZ_FULL user-mode-execution state.
This approach should allow call_rcu() to be invoked regardless of what
locks you might be holding, the key word being "should".
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
|
|
It is all too easy to forget that wait_event() does not necessarily
imply a full memory barrier. The case where it does not is where the
condition transitions to true just as wait_event() starts execution.
This is actually a feature: The standard use of wait_event() involves
locking, in which case the locks provide the needed ordering (you hold a
lock across the wake_up() and acquire that same lock after wait_event()
returns).
Given that I did forget that wait_event() does not necessarily imply a
full memory barrier in one case, this commit fixes that case. This commit
also adds comments calling out the placement of existing memory barriers
relied on by wait_event() calls.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on
itself. This can help move the grace period forward in some situations,
but it would be even better to do this -before- the RCU CPU stall warning.
This commit therefore causes resched_cpu() to be called every five jiffies
once the system is halfway to an RCU CPU stall warning.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
This commit replaces full barriers by targeted use of load-acquire and
store-release.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Restore comments as suggested by David Howells. ]
|