author: Linus Torvalds <torvalds@linux-foundation.org> 2020-08-04 00:31:33 +0300
committer: Linus Torvalds <torvalds@linux-foundation.org> 2020-08-04 00:31:33 +0300
commit: 8f0cb6660acb0d4756df880a3e60e73daa9c244e
tree: 9f14396bed65f82b94f7bb2425ffb2f0cbd4519a /Documentation
parent: 5ece08178d6567db5ef0090b1ae7f795c3c36161
parent: c1cc4784ce6e8cceff1013709abd74bcbf7fbf24
Merge tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:

 - kfree_rcu updates
 - RCU tasks updates
 - Read-side scalability tests
 - SRCU updates
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes

* tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits)
  torture: Remove obsolete "cd $KVM"
  torture: Avoid duplicate specification of qemu command
  torture: Dump ftrace at shutdown only if requested
  torture: Add kvm-tranform.sh script for qemu-cmd files
  torture: Add more tracing crib notes to kvm.sh
  torture: Improve diagnostic for KCSAN-incapable compilers
  torture: Correctly summarize build-only runs
  torture: Pass --kmake-arg to all make invocations
  rcutorture: Check for unwatched readers
  torture: Abstract out console-log error detection
  torture: Add a stop-run capability
  torture: Create qemu-cmd in --buildonly runs
  rcu/rcutorture: Replace 0 with false
  torture: Add --allcpus argument to the kvm.sh script
  torture: Remove whitespace from identify_qemu_vcpus output
  rcutorture: NULL rcu_torture_current earlier in cleanup code
  rcutorture: Handle non-statistic bang-string error messages
  torture: Set configfile variable to current scenario
  rcutorture: Add races with task-exit processing
  locktorture: Use true and false to assign to bool variables
  ...
Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/RCU/Design/Requirements/Requirements.rst  7
-rw-r--r--  Documentation/RCU/checklist.rst (renamed from Documentation/RCU/checklist.txt)  17
-rw-r--r--  Documentation/RCU/index.rst  9
-rw-r--r--  Documentation/RCU/lockdep-splat.rst (renamed from Documentation/RCU/lockdep-splat.txt)  109
-rw-r--r--  Documentation/RCU/lockdep.rst (renamed from Documentation/RCU/lockdep.txt)  12
-rw-r--r--  Documentation/RCU/rculist_nulls.rst  200
-rw-r--r--  Documentation/RCU/rculist_nulls.txt  172
-rw-r--r--  Documentation/RCU/rcuref.rst (renamed from Documentation/RCU/rcuref.txt)  199
-rw-r--r--  Documentation/RCU/stallwarn.rst (renamed from Documentation/RCU/stallwarn.txt)  62
-rw-r--r--  Documentation/RCU/torture.rst (renamed from Documentation/RCU/torture.txt)  117
-rw-r--r--  Documentation/admin-guide/kernel-parameters.txt  68
-rw-r--r--  Documentation/locking/locktorture.rst  2
12 files changed, 569 insertions, 405 deletions
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 50d5c43c48b0..8f41ad0aa753 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2583,7 +2583,12 @@ not work to have these markers in the trampoline itself, because there
would need to be instructions following ``rcu_read_unlock()``. Although
``synchronize_rcu()`` would guarantee that execution reached the
``rcu_read_unlock()``, it would not be able to guarantee that execution
-had completely left the trampoline.
+had completely left the trampoline. Worse yet, in some situations
+the trampoline's protection must extend a few instructions *prior* to
+execution reaching the trampoline. For example, these few instructions
+might calculate the address of the trampoline, so that entering the
+trampoline would be pre-ordained a surprisingly long time before execution
+actually reached the trampoline itself.
The solution, in the form of `Tasks
RCU <https://lwn.net/Articles/607117/>`__, is to have implicit read-side
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.rst
index e98ff261a438..2efed9926c3f 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
Review Checklist for RCU Patches
+================================
This document contains a checklist for producing and reviewing patches
@@ -411,18 +415,21 @@ over a rather long period of time, but improvements are always welcome!
__rcu sparse checks to validate your RCU code. These can help
find problems as follows:
- CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
+ CONFIG_PROVE_LOCKING:
+ check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
are appropriate.
- CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
+ CONFIG_DEBUG_OBJECTS_RCU_HEAD:
+ check that you don't pass the
same object to call_rcu() (or friends) before an RCU
grace period has elapsed since the last time that you
passed that same object to call_rcu() (or friends).
- __rcu sparse checks: tag the pointer to the RCU-protected data
+ __rcu sparse checks:
+ tag the pointer to the RCU-protected data
structure with __rcu, and sparse will warn you if you
access that pointer without the services of one of the
variants of rcu_dereference().
@@ -442,8 +449,8 @@ over a rather long period of time, but improvements are always welcome!
You instead need to use one of the barrier functions:
- o call_rcu() -> rcu_barrier()
- o call_srcu() -> srcu_barrier()
+ - call_rcu() -> rcu_barrier()
+ - call_srcu() -> srcu_barrier()
However, these barrier functions are absolutely -not- guaranteed
to wait for a grace period. In fact, if there are no call_rcu()
diff --git a/Documentation/RCU/index.rst b/Documentation/RCU/index.rst
index 81a0a1e5f767..e703d3dbe60c 100644
--- a/Documentation/RCU/index.rst
+++ b/Documentation/RCU/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
.. _rcu_concepts:
============
@@ -8,10 +10,17 @@ RCU concepts
:maxdepth: 3
arrayRCU
+ checklist
+ lockdep
+ lockdep-splat
rcubarrier
rcu_dereference
whatisRCU
rcu
+ rculist_nulls
+ rcuref
+ torture
+ stallwarn
listRCU
NMI-RCU
UP
diff --git a/Documentation/RCU/lockdep-splat.txt b/Documentation/RCU/lockdep-splat.rst
index b8096316fd11..2a5c79db57dc 100644
--- a/Documentation/RCU/lockdep-splat.txt
+++ b/Documentation/RCU/lockdep-splat.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Lockdep-RCU Splat
+=================
+
Lockdep-RCU was added to the Linux kernel in early 2010
(http://lwn.net/Articles/371986/). This facility checks for some common
misuses of the RCU API, most notably using one of the rcu_dereference()
@@ -12,55 +18,54 @@ overwriting or worse. There can of course be false positives, this
being the real world and all that.
So let's look at an example RCU lockdep splat from 3.0-rc5, one that
-has long since been fixed:
-
-=============================
-WARNING: suspicious RCU usage
------------------------------
-block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
-
-other info that might help us debug this:
-
-
-rcu_scheduler_active = 1, debug_locks = 0
-3 locks held by scsi_scan_6/1552:
- #0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
-scsi_scan_host_selected+0x5a/0x150
- #1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
-elevator_exit+0x22/0x60
- #2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
-cfq_exit_queue+0x43/0x190
-
-stack backtrace:
-Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
-Call Trace:
- [<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
- [<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
- [<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
- [<ffffffff812a5046>] elevator_exit+0x36/0x60
- [<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
- [<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
- [<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
- [<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
- [<ffffffff817da069>] ? error_exit+0x29/0xb0
- [<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
- [<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
- [<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
- [<ffffffff817da069>] ? error_exit+0x29/0xb0
- [<ffffffff812bcc60>] ? kobject_del+0x40/0x40
- [<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
- [<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
- [<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
- [<ffffffff8145f170>] do_scan_async+0x20/0x160
- [<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
- [<ffffffff810975b6>] kthread+0xa6/0xb0
- [<ffffffff817db154>] kernel_thread_helper+0x4/0x10
- [<ffffffff81066430>] ? finish_task_switch+0x80/0x110
- [<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
- [<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
- [<ffffffff817db150>] ? gs_change+0xb/0xb
-
-Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:
+has long since been fixed::
+
+ =============================
+ WARNING: suspicious RCU usage
+ -----------------------------
+ block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
+
+other info that might help us debug this::
+
+ rcu_scheduler_active = 1, debug_locks = 0
+ 3 locks held by scsi_scan_6/1552:
+ #0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
+ scsi_scan_host_selected+0x5a/0x150
+ #1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
+ elevator_exit+0x22/0x60
+ #2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
+ cfq_exit_queue+0x43/0x190
+
+ stack backtrace:
+ Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
+ Call Trace:
+ [<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
+ [<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
+ [<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
+ [<ffffffff812a5046>] elevator_exit+0x36/0x60
+ [<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
+ [<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
+ [<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
+ [<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
+ [<ffffffff817da069>] ? error_exit+0x29/0xb0
+ [<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
+ [<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
+ [<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
+ [<ffffffff817da069>] ? error_exit+0x29/0xb0
+ [<ffffffff812bcc60>] ? kobject_del+0x40/0x40
+ [<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
+ [<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
+ [<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
+ [<ffffffff8145f170>] do_scan_async+0x20/0x160
+ [<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
+ [<ffffffff810975b6>] kthread+0xa6/0xb0
+ [<ffffffff817db154>] kernel_thread_helper+0x4/0x10
+ [<ffffffff81066430>] ? finish_task_switch+0x80/0x110
+ [<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
+ [<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
+ [<ffffffff817db150>] ? gs_change+0xb/0xb
+
+Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows::
if (rcu_dereference(ioc->ioc_data) == cic) {
@@ -70,7 +75,7 @@ case. Instead, we hold three locks, one of which might be RCU related.
And maybe that lock really does protect this reference. If so, the fix
is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to
take the struct request_queue "q" from cfq_exit_queue() as an argument,
-which would permit us to invoke rcu_dereference_protected as follows:
+which would permit us to invoke rcu_dereference_protected as follows::
if (rcu_dereference_protected(ioc->ioc_data,
lockdep_is_held(&q->queue_lock)) == cic) {
@@ -85,7 +90,7 @@ On the other hand, perhaps we really do need an RCU read-side critical
section. In this case, the critical section must span the use of the
return value from rcu_dereference(), or at least until there is some
reference count incremented or some such. One way to handle this is to
-add rcu_read_lock() and rcu_read_unlock() as follows:
+add rcu_read_lock() and rcu_read_unlock() as follows::
rcu_read_lock();
if (rcu_dereference(ioc->ioc_data) == cic) {
@@ -102,7 +107,7 @@ above lockdep-RCU splat.
But in this particular case, we don't actually dereference the pointer
returned from rcu_dereference(). Instead, that pointer is just compared
to the cic pointer, which means that the rcu_dereference() can be replaced
-by rcu_access_pointer() as follows:
+by rcu_access_pointer() as follows::
if (rcu_access_pointer(ioc->ioc_data) == cic) {
diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.rst
index 89db949eeca0..f1fc8ae3846a 100644
--- a/Documentation/RCU/lockdep.txt
+++ b/Documentation/RCU/lockdep.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
RCU and lockdep checking
+========================
All flavors of RCU have lockdep checking available, so that lockdep is
aware of when each task enters and leaves any flavor of RCU read-side
@@ -8,7 +12,7 @@ tracking to include RCU state, which can sometimes help when debugging
deadlocks and the like.
In addition, RCU provides the following primitives that check lockdep's
-state:
+state::
rcu_read_lock_held() for normal RCU.
rcu_read_lock_bh_held() for RCU-bh.
@@ -63,7 +67,7 @@ checking of rcu_dereference() primitives:
The rcu_dereference_check() check expression can be any boolean
expression, but would normally include a lockdep expression. However,
any boolean expression can be used. For a moderately ornate example,
-consider the following:
+consider the following::
file = rcu_dereference_check(fdt->fd[fd],
lockdep_is_held(&files->file_lock) ||
@@ -82,7 +86,7 @@ RCU read-side critical sections, in case (2) the ->file_lock prevents
any change from taking place, and finally, in case (3) the current task
is the only task accessing the file_struct, again preventing any change
from taking place. If the above statement was invoked only from updater
-code, it could instead be written as follows:
+code, it could instead be written as follows::
file = rcu_dereference_protected(fdt->fd[fd],
lockdep_is_held(&files->file_lock) ||
@@ -105,7 +109,7 @@ false and they are called from outside any RCU read-side critical section.
For example, the workqueue for_each_pwq() macro is intended to be used
either within an RCU read-side critical section or with wq->mutex held.
-It is thus implemented as follows:
+It is thus implemented as follows::
#define for_each_pwq(pwq, wq)
list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,
diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
new file mode 100644
index 000000000000..a9fc774bc400
--- /dev/null
+++ b/Documentation/RCU/rculist_nulls.rst
@@ -0,0 +1,200 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================================
+Using RCU hlist_nulls to protect list and objects
+=================================================
+
+This section describes how to use hlist_nulls to
+protect read-mostly linked lists and
+objects using SLAB_TYPESAFE_BY_RCU allocations.
+
+Please read the basics in Documentation/RCU/listRCU.rst
+
+Using 'nulls'
+=============
+
+Using special makers (called 'nulls') is a convenient way
+to solve following problem :
+
+A typical RCU linked list managing objects which are
+allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
+use following algos :
+
+1) Lookup algo
+--------------
+
+::
+
+ rcu_read_lock()
+ begin:
+ obj = lockless_lookup(key);
+ if (obj) {
+ if (!try_get_ref(obj)) // might fail for free objects
+ goto begin;
+ /*
+ * Because a writer could delete object, and a writer could
+ * reuse these object before the RCU grace period, we
+ * must check key after getting the reference on object
+ */
+ if (obj->key != key) { // not the object we expected
+ put_ref(obj);
+ goto begin;
+ }
+ }
+ rcu_read_unlock();
+
+Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
+but a version with an additional memory barrier (smp_rmb())
+
+::
+
+ lockless_lookup(key)
+ {
+ struct hlist_node *node, *next;
+ for (pos = rcu_dereference((head)->first);
+ pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
+ ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+ pos = rcu_dereference(next))
+ if (obj->key == key)
+ return obj;
+ return NULL;
+ }
+
+And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
+
+ struct hlist_node *node;
+ for (pos = rcu_dereference((head)->first);
+ pos && ({ prefetch(pos->next); 1; }) &&
+ ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+ pos = rcu_dereference(pos->next))
+ if (obj->key == key)
+ return obj;
+ return NULL;
+
+Quoting Corey Minyard::
+
+ "If the object is moved from one list to another list in-between the
+ time the hash is calculated and the next field is accessed, and the
+ object has moved to the end of a new list, the traversal will not
+ complete properly on the list it should have, since the object will
+ be on the end of the new list and there's not a way to tell it's on a
+ new list and restart the list traversal. I think that this can be
+ solved by pre-fetching the "next" field (with proper barriers) before
+ checking the key."
+
+2) Insert algo
+--------------
+
+We need to make sure a reader cannot read the new 'obj->obj_next' value
+and previous value of 'obj->key'. Or else, an item could be deleted
+from a chain, and inserted into another chain. If new chain was empty
+before the move, 'next' pointer is NULL, and lockless reader can
+not detect it missed following items in original chain.
+
+::
+
+ /*
+ * Please note that new inserts are done at the head of list,
+ * not in the middle or end.
+ */
+ obj = kmem_cache_alloc(...);
+ lock_chain(); // typically a spin_lock()
+ obj->key = key;
+ /*
+ * we need to make sure obj->key is updated before obj->next
+ * or obj->refcnt
+ */
+ smp_wmb();
+ atomic_set(&obj->refcnt, 1);
+ hlist_add_head_rcu(&obj->obj_node, list);
+ unlock_chain(); // typically a spin_unlock()
+
+
+3) Remove algo
+--------------
+Nothing special here, we can use a standard RCU hlist deletion.
+But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
+very very fast (before the end of RCU grace period)
+
+::
+
+ if (put_last_reference_on(obj) {
+ lock_chain(); // typically a spin_lock()
+ hlist_del_init_rcu(&obj->obj_node);
+ unlock_chain(); // typically a spin_unlock()
+ kmem_cache_free(cachep, obj);
+ }
+
+
+
+--------------------------------------------------------------------------
+
+Avoiding extra smp_rmb()
+========================
+
+With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
+and extra smp_wmb() in insert function.
+
+For example, if we choose to store the slot number as the 'nulls'
+end-of-list marker for each slot of the hash table, we can detect
+a race (some writer did a delete and/or a move of an object
+to another chain) checking the final 'nulls' value if
+the lookup met the end of chain. If final 'nulls' value
+is not the slot number, then we must restart the lookup at
+the beginning. If the object was moved to the same chain,
+then the reader doesn't care : It might eventually
+scan the list again without harm.
+
+
+1) lookup algo
+--------------
+
+::
+
+ head = &table[slot];
+ rcu_read_lock();
+ begin:
+ hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
+ if (obj->key == key) {
+ if (!try_get_ref(obj)) // might fail for free objects
+ goto begin;
+ if (obj->key != key) { // not the object we expected
+ put_ref(obj);
+ goto begin;
+ }
+ goto out;
+ }
+ /*
+ * if the nulls value we got at the end of this lookup is
+ * not the expected one, we must restart lookup.
+ * We probably met an item that was moved to another chain.
+ */
+ if (get_nulls_value(node) != slot)
+ goto begin;
+ obj = NULL;
+
+ out:
+ rcu_read_unlock();
+
+2) Insert function
+------------------
+
+::
+
+ /*
+ * Please note that new inserts are done at the head of list,
+ * not in the middle or end.
+ */
+ obj = kmem_cache_alloc(cachep);
+ lock_chain(); // typically a spin_lock()
+ obj->key = key;
+ /*
+ * changes to obj->key must be visible before refcnt one
+ */
+ smp_wmb();
+ atomic_set(&obj->refcnt, 1);
+ /*
+ * insert obj in RCU way (readers might be traversing chain)
+ */
+ hlist_nulls_add_head_rcu(&obj->obj_node, list);
+ unlock_chain(); // typically a spin_unlock()
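Note on the rculist_nulls.rst hunk above: the end-of-chain check in its "Avoiding extra smp_rmb()" section can be tried outside the kernel. What follows is a minimal single-threaded userspace sketch, not the kernel's implementation. It assumes the marker is stored as an odd value in the next pointer, with the slot number in the bits above the low bit. Every `demo_` name is hypothetical and exists only for this illustration. There is no RCU, locking, or reference counting here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a chain ended by a 'nulls' marker. */
struct demo_node {
	struct demo_node *next;	/* a real pointer, or an odd-valued marker */
	int key;
};

/* Encode a slot number as an end-of-chain marker (assumed: low bit set). */
static inline struct demo_node *demo_make_nulls(unsigned long slot)
{
	return (struct demo_node *)(uintptr_t)((slot << 1) | 1UL);
}

/* Nodes are at least 2-byte aligned, so an odd value cannot be a node. */
static inline int demo_is_nulls(const struct demo_node *p)
{
	return (int)((uintptr_t)p & 1UL);
}

static inline unsigned long demo_nulls_value(const struct demo_node *p)
{
	return (unsigned long)((uintptr_t)p >> 1);
}

/*
 * Walk one chain looking for key. Returns the matching node, or NULL.
 * When NULL is returned, *end_slot holds the slot number stored in the
 * marker that ended the walk, so the caller can compare it with the slot
 * it started from and restart the lookup on a mismatch.
 */
static inline struct demo_node *demo_walk(struct demo_node *first, int key,
					  unsigned long *end_slot)
{
	struct demo_node *pos = first;

	while (!demo_is_nulls(pos)) {
		if (pos->key == key)
			return pos;
		pos = pos->next;
	}
	*end_slot = demo_nulls_value(pos);
	return NULL;
}
```

A caller that started in slot 3 and gets NULL with `*end_slot == 5` has wandered onto another chain and should restart, which is the condition the documented lookup tests with `get_nulls_value(node) != slot`.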
diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
deleted file mode 100644
index 23f115dc87cf..000000000000
--- a/Documentation/RCU/rculist_nulls.txt
+++ /dev/null
@@ -1,172 +0,0 @@
-Using hlist_nulls to protect read-mostly linked lists and
-objects using SLAB_TYPESAFE_BY_RCU allocations.
-
-Please read the basics in Documentation/RCU/listRCU.rst
-
-Using special makers (called 'nulls') is a convenient way
-to solve following problem :
-
-A typical RCU linked list managing objects which are
-allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
-use following algos :
-
-1) Lookup algo
---------------
-rcu_read_lock()
-begin:
-obj = lockless_lookup(key);
-if (obj) {
- if (!try_get_ref(obj)) // might fail for free objects
- goto begin;
- /*
- * Because a writer could delete object, and a writer could
- * reuse these object before the RCU grace period, we
- * must check key after getting the reference on object
- */
- if (obj->key != key) { // not the object we expected
- put_ref(obj);
- goto begin;
- }
-}
-rcu_read_unlock();
-
-Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
-but a version with an additional memory barrier (smp_rmb())
-
-lockless_lookup(key)
-{
- struct hlist_node *node, *next;
- for (pos = rcu_dereference((head)->first);
- pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
- ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
- pos = rcu_dereference(next))
- if (obj->key == key)
- return obj;
- return NULL;
-
-And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb() :
-
- struct hlist_node *node;
- for (pos = rcu_dereference((head)->first);
- pos && ({ prefetch(pos->next); 1; }) &&
- ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
- pos = rcu_dereference(pos->next))
- if (obj->key == key)
- return obj;
- return NULL;
-}
-
-Quoting Corey Minyard :
-
-"If the object is moved from one list to another list in-between the
- time the hash is calculated and the next field is accessed, and the
- object has moved to the end of a new list, the traversal will not
- complete properly on the list it should have, since the object will
- be on the end of the new list and there's not a way to tell it's on a
- new list and restart the list traversal. I think that this can be
- solved by pre-fetching the "next" field (with proper barriers) before
- checking the key."
-
-2) Insert algo :
-----------------
-
-We need to make sure a reader cannot read the new 'obj->obj_next' value
-and previous value of 'obj->key'. Or else, an item could be deleted
-from a chain, and inserted into another chain. If new chain was empty
-before the move, 'next' pointer is NULL, and lockless reader can
-not detect it missed following items in original chain.
-
-/*
- * Please note that new inserts are done at the head of list,
- * not in the middle or end.
- */
-obj = kmem_cache_alloc(...);
-lock_chain(); // typically a spin_lock()
-obj->key = key;
-/*
- * we need to make sure obj->key is updated before obj->next
- * or obj->refcnt
- */
-smp_wmb();
-atomic_set(&obj->refcnt, 1);
-hlist_add_head_rcu(&obj->obj_node, list);
-unlock_chain(); // typically a spin_unlock()
-
-
-3) Remove algo
---------------
-Nothing special here, we can use a standard RCU hlist deletion.
-But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
-very very fast (before the end of RCU grace period)
-
-if (put_last_reference_on(obj) {
- lock_chain(); // typically a spin_lock()
- hlist_del_init_rcu(&obj->obj_node);
- unlock_chain(); // typically a spin_unlock()
- kmem_cache_free(cachep, obj);
-}
-
-
-
---------------------------------------------------------------------------
-With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
-and extra smp_wmb() in insert function.
-
-For example, if we choose to store the slot number as the 'nulls'
-end-of-list marker for each slot of the hash table, we can detect
-a race (some writer did a delete and/or a move of an object
-to another chain) checking the final 'nulls' value if
-the lookup met the end of chain. If final 'nulls' value
-is not the slot number, then we must restart the lookup at
-the beginning. If the object was moved to the same chain,
-then the reader doesn't care : It might eventually
-scan the list again without harm.
-
-
-1) lookup algo
-
- head = &table[slot];
- rcu_read_lock();
-begin:
- hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
- if (obj->key == key) {
- if (!try_get_ref(obj)) // might fail for free objects
- goto begin;
- if (obj->key != key) { // not the object we expected
- put_ref(obj);
- goto begin;
- }
- goto out;
- }
-/*
- * if the nulls value we got at the end of this lookup is
- * not the expected one, we must restart lookup.
- * We probably met an item that was moved to another chain.
- */
- if (get_nulls_value(node) != slot)
- goto begin;
- obj = NULL;
-
-out:
- rcu_read_unlock();
-
-2) Insert function :
---------------------
-
-/*
- * Please note that new inserts are done at the head of list,
- * not in the middle or end.
- */
-obj = kmem_cache_alloc(cachep);
-lock_chain(); // typically a spin_lock()
-obj->key = key;
-/*
- * changes to obj->key must be visible before refcnt one
- */
-smp_wmb();
-atomic_set(&obj->refcnt, 1);
-/*
- * insert obj in RCU way (readers might be traversing chain)
- */
-hlist_nulls_add_head_rcu(&obj->obj_node, list);
-unlock_chain(); // typically a spin_unlock()
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.rst
index 5e6429d66c24..b33aeb14fde3 100644
--- a/Documentation/RCU/rcuref.txt
+++ b/Documentation/RCU/rcuref.rst
@@ -1,4 +1,8 @@
-Reference-count design for elements of lists/arrays protected by RCU.
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================================================
+Reference-count design for elements of lists/arrays protected by RCU
+====================================================================
Please note that the percpu-ref feature is likely your first
@@ -12,32 +16,33 @@ please read on.
Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straightforward:
-CODE LISTING A:
-1. 2.
-add() search_and_reference()
-{ {
- alloc_object read_lock(&list_lock);
- ... search_for_element
- atomic_set(&el->rc, 1); atomic_inc(&el->rc);
- write_lock(&list_lock); ...
- add_element read_unlock(&list_lock);
- ... ...
- write_unlock(&list_lock); }
-}
-
-3. 4.
-release_referenced() delete()
-{ {
- ... write_lock(&list_lock);
- if(atomic_dec_and_test(&el->rc)) ...
- kfree(el);
- ... remove_element
-} write_unlock(&list_lock);
- ...
- if (atomic_dec_and_test(&el->rc))
- kfree(el);
- ...
- }
+CODE LISTING A::
+
+ 1. 2.
+ add() search_and_reference()
+ { {
+ alloc_object read_lock(&list_lock);
+ ... search_for_element
+ atomic_set(&el->rc, 1); atomic_inc(&el->rc);
+ write_lock(&list_lock); ...
+ add_element read_unlock(&list_lock);
+ ... ...
+ write_unlock(&list_lock); }
+ }
+
+ 3. 4.
+ release_referenced() delete()
+ { {
+ ... write_lock(&list_lock);
+ if(atomic_dec_and_test(&el->rc)) ...
+ kfree(el);
+ ... remove_element
+ } write_unlock(&list_lock);
+ ...
+ if (atomic_dec_and_test(&el->rc))
+ kfree(el);
+ ...
+ }
If this list/array is made lock free using RCU as in changing the
write_lock() in add() and delete() to spin_lock() and changing read_lock()
@@ -46,34 +51,35 @@ search_and_reference() could potentially hold reference to an element which
has already been deleted from the list/array. Use atomic_inc_not_zero()
in this scenario as follows:
-CODE LISTING B:
-1. 2.
-add() search_and_reference()
-{ {
- alloc_object rcu_read_lock();
- ... search_for_element
- atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
- spin_lock(&list_lock); rcu_read_unlock();
- return FAIL;
- add_element }
- ... ...
- spin_unlock(&list_lock); rcu_read_unlock();
-} }
-3. 4.
-release_referenced() delete()
-{ {
- ... spin_lock(&list_lock);
- if (atomic_dec_and_test(&el->rc)) ...
- call_rcu(&el->head, el_free); remove_element
- ... spin_unlock(&list_lock);
-} ...
- if (atomic_dec_and_test(&el->rc))
- call_rcu(&el->head, el_free);
- ...
- }
+CODE LISTING B::
+
+ 1. 2.
+ add() search_and_reference()
+ { {
+ alloc_object rcu_read_lock();
+ ... search_for_element
+ atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
+ spin_lock(&list_lock); rcu_read_unlock();
+ return FAIL;
+ add_element }
+ ... ...
+ spin_unlock(&list_lock); rcu_read_unlock();
+ } }
+ 3. 4.
+ release_referenced() delete()
+ { {
+ ... spin_lock(&list_lock);
+ if (atomic_dec_and_test(&el->rc)) ...
+ call_rcu(&el->head, el_free); remove_element
+ ... spin_unlock(&list_lock);
+ } ...
+ if (atomic_dec_and_test(&el->rc))
+ call_rcu(&el->head, el_free);
+ ...
+ }
Sometimes, a reference to the element needs to be obtained in the
-update (write) stream. In such cases, atomic_inc_not_zero() might be
+update (write) stream. In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock. One might instead
use atomic_inc() in such cases.
@@ -82,39 +88,40 @@ search_and_reference() code path. In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:
-CODE LISTING C:
-1. 2.
-add() search_and_reference()
-{ {
- alloc_object rcu_read_lock();
- ... search_for_element
- atomic_set(&el->rc, 1); atomic_inc(&el->rc);
- spin_lock(&list_lock); ...
-
- add_element rcu_read_unlock();
- ... }
- spin_unlock(&list_lock); 4.
-} delete()
-3. {
-release_referenced() spin_lock(&list_lock);
-{ ...
- ... remove_element
- if (atomic_dec_and_test(&el->rc)) spin_unlock(&list_lock);
- kfree(el); ...
- ... call_rcu(&el->head, el_free);
-} ...
-5. }
-void el_free(struct rcu_head *rhp)
-{
- release_referenced();
-}
+CODE LISTING C::
+
+ 1. 2.
+ add() search_and_reference()
+ { {
+ alloc_object rcu_read_lock();
+ ... search_for_element
+ atomic_set(&el->rc, 1); atomic_inc(&el->rc);
+ spin_lock(&list_lock); ...
+
+ add_element rcu_read_unlock();
+ ... }
+ spin_unlock(&list_lock); 4.
+ } delete()
+ 3. {
+ release_referenced() spin_lock(&list_lock);
+ { ...
+ ... remove_element
+ if (atomic_dec_and_test(&el->rc)) spin_unlock(&list_lock);
+ kfree(el); ...
+ ... call_rcu(&el->head, el_free);
+ } ...
+ 5. }
+ void el_free(struct rcu_head *rhp)
+ {
+ release_referenced();
+ }
The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal. This means that
search_and_reference() cannot find this element, which means that the value
of el->rc cannot increase. Thus, once it reaches zero, there are no
-readers that can or ever will be able to reference the element. The
-element can therefore safely be freed. This in turn guarantees that if
+readers that can or ever will be able to reference the element. The
+element can therefore safely be freed. This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.
@@ -130,21 +137,21 @@ the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.
In cases where delete() can sleep, synchronize_rcu() can be called from
-delete(), so that el_free() can be subsumed into delete as follows:
-
-4.
-delete()
-{
- spin_lock(&list_lock);
- ...
- remove_element
- spin_unlock(&list_lock);
- ...
- synchronize_rcu();
- if (atomic_dec_and_test(&el->rc))
- kfree(el);
- ...
-}
+delete(), so that el_free() can be subsumed into delete as follows::
+
+ 4.
+ delete()
+ {
+ spin_lock(&list_lock);
+ ...
+ remove_element
+ spin_unlock(&list_lock);
+ ...
+ synchronize_rcu();
+ if (atomic_dec_and_test(&el->rc))
+ kfree(el);
+ ...
+ }
As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by
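
The lifetime rule in listing C can be modeled in user space. The following is a minimal, single-threaded sketch, not kernel code: `struct el`, `on_list`, `deferred`, `freed_count`, `delete_element()` and `grace_period_elapsed()` are hypothetical names introduced for illustration. The grace period that `call_rcu()` would wait for is simulated by an explicit function call, and the list and its lock are reduced to a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element; "rc" plays the role of el->rc in listing C. */
struct el {
	atomic_int rc;
	bool on_list;                    /* stands in for list membership */
	int key;
	void (*deferred)(struct el *);   /* stands in for the rcu_head callback */
};

static int freed_count;              /* lets callers observe when a free happens */

/* 1. add(): the list itself owns the initial reference. */
static struct el *add(int key)
{
	struct el *el = malloc(sizeof(*el));

	assert(el != NULL);
	atomic_init(&el->rc, 1);
	el->on_list = true;
	el->key = key;
	el->deferred = NULL;
	return el;
}

/* 2. search_and_reference(): no check of rc is needed before incrementing,
 * because a findable element still holds the list's initial reference. */
static bool search_and_reference(struct el *el, int key)
{
	if (!el->on_list || el->key != key)
		return false;
	atomic_fetch_add(&el->rc, 1);
	return true;
}

/* 3. release_referenced(): free on the transition to zero. */
static void release_referenced(struct el *el)
{
	if (atomic_fetch_sub(&el->rc, 1) == 1) {
		free(el);
		freed_count++;
	}
}

/* 4. delete(): unlink now, drop the initial reference only later. */
static void delete_element(struct el *el)
{
	el->on_list = false;
	el->deferred = release_referenced;   /* models call_rcu(&el->head, el_free) */
}

/* Simulated end of the grace period: run the deferred callback once. */
static void grace_period_elapsed(struct el *el)
{
	void (*cb)(struct el *) = el->deferred;

	if (cb) {
		el->deferred = NULL;   /* clear before the callback may free el */
		cb(el);
	}
}
```

In this model, a reader that took its reference before `delete_element()` keeps the element alive past the simulated grace period, and the element is freed only when that reader calls `release_referenced()`.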
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.rst
index a360a8796710..c9ab6af4d3be 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
Using RCU's CPU Stall Detector
+==============================
This document first discusses what sorts of issues RCU's CPU stall
detector can locate, and then discusses kernel parameters and Kconfig
@@ -7,39 +11,40 @@ this document explains the stall detector's "splat" format.
What Causes RCU CPU Stall Warnings?
+===================================
So your kernel printed an RCU CPU stall warning. The next question is
"What caused it?" The following problems can result in RCU CPU stall
warnings:
-o A CPU looping in an RCU read-side critical section.
+- A CPU looping in an RCU read-side critical section.
-o A CPU looping with interrupts disabled.
+- A CPU looping with interrupts disabled.
-o A CPU looping with preemption disabled.
+- A CPU looping with preemption disabled.
-o A CPU looping with bottom halves disabled.
+- A CPU looping with bottom halves disabled.
-o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
+- For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule(). If the looping in the kernel is
really expected and desirable behavior, you might need to add
some calls to cond_resched().
-o Booting Linux using a console connection that is too slow to
+- Booting Linux using a console connection that is too slow to
keep up with the boot-time console-message rate. For example,
a 115Kbaud serial console can be -way- too slow to keep up
with boot-time message rates, and will frequently result in
RCU CPU stall warning messages. Especially if you have added
debug printk()s.
-o Anything that prevents RCU's grace-period kthreads from running.
+- Anything that prevents RCU's grace-period kthreads from running.
This can result in the "All QSes seen" console-log message.
This message will include information on when the kthread last
ran and how often it should be expected to run. It can also
- result in the "rcu_.*kthread starved for" console-log message,
+ result in the ``rcu_.*kthread starved for`` console-log message,
which will include additional debugging information.
-o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
+- A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
@@ -48,7 +53,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
While the system is in the process of running itself out of
memory, you might see stall-warning messages.
-o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
+- A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
is running at a higher priority than the RCU softirq threads.
This will prevent RCU callbacks from ever being invoked,
and in a CONFIG_PREEMPT_RCU kernel will further prevent
@@ -63,7 +68,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
can increase your system's context-switch rate and thus degrade
performance.
-o A periodic interrupt whose handler takes longer than the time
+- A periodic interrupt whose handler takes longer than the time
interval between successive pairs of interrupts. This can
prevent RCU's kthreads and softirq handlers from running.
Note that certain high-overhead debugging options, for example
@@ -71,20 +76,27 @@ o A periodic interrupt whose handler takes longer than the time
considerably longer than normal, which can in turn result in
RCU CPU stall warnings.
-o Testing a workload on a fast system, tuning the stall-warning
+- Testing a workload on a fast system, tuning the stall-warning
timeout down to just barely avoid RCU CPU stall warnings, and then
running the same workload with the same stall-warning timeout on a
slow system. Note that thermal throttling and on-demand governors
can cause a single system to be sometimes fast and sometimes slow!
-o A hardware or software issue shuts off the scheduler-clock
+- A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to
result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
-o A bug in the RCU implementation.
+- A hardware or software issue that prevents time-based wakeups
+ from occurring. These issues can range from misconfigured or
+ buggy timer hardware through bugs in the interrupt or exception
+ path (whether hardware, firmware, or software) through bugs
+ in Linux's timer subsystem through bugs in the scheduler, and,
+ yes, even including bugs in RCU itself.
+
+- A bug in the RCU implementation.
-o A hardware failure. This is quite unlikely, but has occurred
+- A hardware failure. This is quite unlikely, but has occurred
at least once in real life. A CPU failed in a running system,
becoming unresponsive, but not causing an immediate crash.
This resulted in a series of RCU CPU stall warnings, eventually
@@ -109,6 +121,7 @@ see include/trace/events/rcu.h.
Fine-Tuning the RCU CPU Stall Detector
+======================================
The rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's
CPU stall detector, which detects conditions that unduly delay RCU grace
@@ -118,6 +131,7 @@ The stall detector's idea of what constitutes "unduly delayed" is
controlled by a set of kernel configuration variables and cpp macros:
CONFIG_RCU_CPU_STALL_TIMEOUT
+----------------------------
This kernel configuration parameter defines the period of time
that RCU will wait from the beginning of a grace period until it
@@ -137,6 +151,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
RCU_STALL_DELAY_DELTA
+---------------------
Although the lockdep facility is extremely useful, it does add
some overhead. Therefore, under CONFIG_PROVE_RCU, the
@@ -145,6 +160,7 @@ RCU_STALL_DELAY_DELTA
macro, not a kernel configuration parameter.)
RCU_STALL_RAT_DELAY
+-------------------
The CPU stall detector tries to make the offending CPU print its
own warnings, as this often gives better-quality stack traces.
@@ -155,6 +171,7 @@ RCU_STALL_RAT_DELAY
parameter.)
rcupdate.rcu_task_stall_timeout
+-------------------------------
This boot/sysfs parameter controls the RCU-tasks stall warning
interval. A value of zero or less suppresses RCU-tasks stall
@@ -168,9 +185,10 @@ rcupdate.rcu_task_stall_timeout
Interpreting RCU's CPU Stall-Detector "Splats"
+==============================================
For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
-it will print a message similar to the following:
+it will print a message similar to the following::
INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
@@ -223,7 +241,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs
(625 in this case).
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
-for each CPU:
+for each CPU::
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
@@ -235,7 +253,7 @@ processing is enabled.
If the grace period ends just as the stall warning starts printing,
there will be a spurious stall-warning message, which will include
-the following:
+the following::
INFO: Stall ended before state dump start
@@ -248,7 +266,7 @@ which is overkill for this sort of problem.
If all CPUs and tasks have passed through quiescent states, but the
grace period has nevertheless failed to end, the stall-warning splat
-will include something like the following:
+will include something like the following::
All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0
@@ -261,7 +279,7 @@ which is way less than 23807. Finally, the root rcu_node structure's
If the relevant grace-period kthread has been unable to run prior to
the stall warning, as was the case in the "All QSes seen" line above,
-the following additional line is printed:
+the following additional line is printed::
kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
@@ -276,6 +294,7 @@ kthread last ran on CPU 5.
Multiple Warnings From One Stall
+================================
If a stall lasts long enough, multiple stall-warning messages will be
printed for it. The second and subsequent messages are printed at
@@ -285,9 +304,10 @@ of the stall and the first message.
Stall Warnings for Expedited Grace Periods
+==========================================
If an expedited grace period detects a stall, it will place a message
-like the following in dmesg:
+like the following in dmesg::
INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.
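
The numbers in the "All QSes seen" line shown earlier can be cross-checked mechanically. The following is a small user-space sketch that assumes the line has exactly the layout of that sample; `struct qs_line`, `parse_all_qses_seen()` and `kthread_overdue()` are hypothetical helpers, not part of any kernel tool. It checks that the reported inactivity equals the difference of the two jiffies values, and it compares that inactivity against `jiffies_till_next_fqs` as a rough heuristic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields extracted from an "All QSes seen" line (hypothetical layout). */
struct qs_line {
	unsigned long long activity;       /* jiffies since the kthread last ran */
	unsigned long long now;            /* jiffies when the warning was printed */
	unsigned long long last;           /* jiffies of the last kthread activity */
	unsigned long long till_next_fqs;  /* jiffies_till_next_fqs */
};

/* Returns 0 on success, -1 if the line does not match the sample layout. */
static int parse_all_qses_seen(const char *line, struct qs_line *out)
{
	const char *p = strstr(line, "kthread activity ");

	if (!p)
		return -1;
	if (sscanf(p, "kthread activity %llu (%llu-%llu), jiffies_till_next_fqs=%llu",
		   &out->activity, &out->now, &out->last,
		   &out->till_next_fqs) != 4)
		return -1;
	return 0;
}

/* Heuristic only: the kthread has been idle far longer than its wakeup interval. */
static int kthread_overdue(const struct qs_line *q)
{
	return q->activity > q->till_next_fqs;
}
```

For the sample line, 4297905177 - 4297881370 = 23807, which matches the reported activity value and is far larger than the `jiffies_till_next_fqs` value of 3.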
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.rst
index af712a3c5b6a..a90147713062 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.rst
@@ -1,7 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
RCU Torture Test Operation
+==========================
CONFIG_RCU_TORTURE_TEST
+=======================
The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
implementations. It creates an rcutorture kernel module that can
@@ -13,9 +18,10 @@ when the module is loaded, and stops when the module is unloaded.
Module parameters are prefixed by "rcutorture." in
Documentation/admin-guide/kernel-parameters.txt.
-OUTPUT
+Output
+======
-The statistics output is as follows:
+The statistics output is as follows::
rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767
@@ -36,53 +42,53 @@ automatic determination as to whether RCU operated correctly.
The entries are as follows:
-o "rtc": The hexadecimal address of the structure currently visible
+* "rtc": The hexadecimal address of the structure currently visible
to readers.
-o "ver": The number of times since boot that the RCU writer task
+* "ver": The number of times since boot that the RCU writer task
has changed the structure visible to readers.
-o "tfle": If non-zero, indicates that the "torture freelist"
+* "tfle": If non-zero, indicates that the "torture freelist"
containing structures to be placed into the "rtc" area is empty.
This condition is important, since it can fool you into thinking
that RCU is working when it is not. :-/
-o "rta": Number of structures allocated from the torture freelist.
+* "rta": Number of structures allocated from the torture freelist.
-o "rtaf": Number of allocations from the torture freelist that have
+* "rtaf": Number of allocations from the torture freelist that have
failed due to the list being empty. It is not unusual for this
to be non-zero, but it is bad for it to be a large fraction of
the value indicated by "rta".
-o "rtf": Number of frees into the torture freelist.
+* "rtf": Number of frees into the torture freelist.
-o "rtmbe": A non-zero value indicates that rcutorture believes that
+* "rtmbe": A non-zero value indicates that rcutorture believes that
rcu_assign_pointer() and rcu_dereference() are not working
correctly. This value should be zero.
-o "rtbe": A non-zero value indicates that one of the rcu_barrier()
+* "rtbe": A non-zero value indicates that one of the rcu_barrier()
family of functions is not working correctly.
-o "rtbke": rcutorture was unable to create the real-time kthreads
+* "rtbke": rcutorture was unable to create the real-time kthreads
used to force RCU priority inversion. This value should be zero.
-o "rtbre": Although rcutorture successfully created the kthreads
+* "rtbre": Although rcutorture successfully created the kthreads
used to force RCU priority inversion, it was unable to set them
to the real-time priority level of 1. This value should be zero.
-o "rtbf": The number of times that RCU priority boosting failed
+* "rtbf": The number of times that RCU priority boosting failed
to resolve RCU priority inversion.
-o "rtb": The number of times that rcutorture attempted to force
+* "rtb": The number of times that rcutorture attempted to force
an RCU priority inversion condition. If you are testing RCU
priority boosting via the "test_boost" module parameter, this
value should be non-zero.
-o "nt": The number of times rcutorture ran RCU read-side code from
+* "nt": The number of times rcutorture ran RCU read-side code from
within a timer handler. This value should be non-zero only
if you specified the "irqreader" module parameter.
-o "Reader Pipe": Histogram of "ages" of structures seen by readers.
+* "Reader Pipe": Histogram of "ages" of structures seen by readers.
If any entries past the first two are non-zero, RCU is broken.
And rcutorture prints the error flag string "!!!" to make sure
you notice. The age of a newly allocated structure is zero,
@@ -94,14 +100,14 @@ o "Reader Pipe": Histogram of "ages" of structures seen by readers.
RCU. If you want to see what it looks like when broken, break
it yourself. ;-)
-o "Reader Batch": Another histogram of "ages" of structures seen
+* "Reader Batch": Another histogram of "ages" of structures seen
by readers, but in terms of counter flips (or batches) rather
than in terms of grace periods. The legal number of non-zero
entries is again two. The reason for this separate view is that
it is sometimes easier to get the third entry to show up in the
"Reader Batch" list than in the "Reader Pipe" list.
-o "Free-Block Circulation": Shows the number of torture structures
+* "Free-Block Circulation": Shows the number of torture structures
that have reached a given point in the pipeline. The first element
should closely correspond to the number of structures allocated,
the second to the number that have been removed from reader view,
@@ -112,7 +118,7 @@ o "Free-Block Circulation": Shows the number of torture structures
Different implementations of RCU can provide implementation-specific
additional information. For example, Tree SRCU provides the following
-additional line:
+additional line::
srcud-torture: Tree SRCU per-CPU(idx=0): 0(35,-21) 1(-4,24) 2(1,1) 3(-26,20) 4(28,-47) 5(-9,4) 6(-10,14) 7(-14,11) T(1,6)
@@ -123,15 +129,15 @@ using a dynamically allocated srcu_struct (hence "srcud-" rather than
"old" and "current" values to the underlying array, and is useful for
debugging. The final "T" entry contains the totals of the counters.
-
-USAGE ON SPECIFIC KERNEL BUILDS
+Usage on Specific Kernel Builds
+===============================
It is sometimes desirable to torture RCU on a specific kernel build,
for example, when preparing to put that kernel build into production.
In that case, the kernel should be built with CONFIG_RCU_TORTURE_TEST=m
so that the test can be started using modprobe and terminated using rmmod.
-For example, the following script may be used to torture RCU:
+For example, the following script may be used to torture RCU::
#!/bin/sh
@@ -148,7 +154,8 @@ two are self-explanatory, while the last indicates that while there
were no RCU failures, CPU-hotplug problems were detected.
-USAGE ON MAINLINE KERNELS
+Usage on Mainline Kernels
+=========================
When using rcutorture to test changes to RCU itself, it is often
necessary to build a number of kernels in order to test that change
@@ -180,16 +187,16 @@ to Tree SRCU might run only the SRCU-N and SRCU-P scenarios using the
--configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'".
Large systems can run multiple copies of the full set of scenarios,
for example, a system with 448 hardware threads can run five instances
-of the full set concurrently. To make this happen:
+of the full set concurrently. To make this happen::
kvm.sh --cpus 448 --configs '5*CFLIST'
Alternatively, such a system can run 56 concurrent instances of a single
-eight-CPU scenario:
+eight-CPU scenario::
kvm.sh --cpus 448 --configs '56*TREE04'
-Or 28 concurrent instances of each of two eight-CPU scenarios:
+Or 28 concurrent instances of each of two eight-CPU scenarios::
kvm.sh --cpus 448 --configs '28*TREE03 28*TREE04'
@@ -199,14 +206,14 @@ values for memory may require disabling the callback-flooding tests
using the --bootargs parameter discussed below.
Sometimes additional debugging is useful, and in such cases the --kconfig
-parameter to kvm.sh may be used, for example, "--kconfig 'CONFIG_KASAN=y'".
+parameter to kvm.sh may be used, for example, ``--kconfig 'CONFIG_KASAN=y'``.
Kernel boot arguments can also be supplied, for example, to control
rcutorture's module parameters. For example, to test a change to RCU's
CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'".
This will of course result in the scripting reporting a failure, namely
the resulting RCU CPU stall warning. As noted above, reducing memory may
-require disabling rcutorture's callback-flooding tests:
+require disabling rcutorture's callback-flooding tests::
kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \
--bootargs 'rcutorture.fwd_progress=0'
@@ -225,7 +232,7 @@ is listed at the end of the kvm.sh output, which you really should redirect
to a file. The build products and console output of each run is kept in
tools/testing/selftests/rcutorture/res in timestamped directories. A
given directory can be supplied to kvm-find-errors.sh in order to have
-it cycle you through summaries of errors and full error logs. For example:
+it cycle you through summaries of errors and full error logs. For example::
tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \
tools/testing/selftests/rcutorture/res/2020.01.20-15.54.23
@@ -245,38 +252,42 @@ that was tested and any uncommitted changes in diff format.
The most frequently used files in each per-scenario-run directory are:
-.config: This file contains the Kconfig options.
+.config:
+ This file contains the Kconfig options.
-Make.out: This contains build output for a specific scenario.
+Make.out:
+ This contains build output for a specific scenario.
-console.log: This contains the console output for a specific scenario.
+console.log:
+ This contains the console output for a specific scenario.
This file may be examined once the kernel has booted, but
it might not exist if the build failed.
-vmlinux: This contains the kernel, which can be useful with tools like
+vmlinux:
+ This contains the kernel, which can be useful with tools like
objdump and gdb.
A number of additional files are available, but are less frequently used.
Many are intended for debugging of rcutorture itself or of its scripting.
As of v5.4, a successful run with the default set of scenarios produces
-the following summary at the end of the run on a 12-CPU system:
-
-SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
-SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ]
-SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ]
-SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ]
-TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ]
-TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ]
-TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ]
-TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198
-TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631
-TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ]
-TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844
-TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497
-CPU count limited from 16 to 12
-TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961
-TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
-TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
-CPU count limited from 16 to 12
-TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
+the following summary at the end of the run on a 12-CPU system::
+
+ SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
+ SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ]
+ SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ]
+ SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ]
+ TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ]
+ TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ]
+ TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ]
+ TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198
+ TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631
+ TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ]
+ TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844
+ TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497
+ CPU count limited from 16 to 12
+ TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961
+ TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
+ TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
+ CPU count limited from 16 to 12
+ TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
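
Each scenario line in the summary above pairs a grace-period count with a rate, so dividing one by the other gives an approximate elapsed time. The following user-space sketch assumes the line layout shown in the sample and assumes that the rate is the count divided by elapsed seconds; `struct gp_summary`, `parse_summary()` and `implied_seconds()` are hypothetical names, not part of the rcutorture scripting.

```c
#include <assert.h>
#include <stdio.h>

/* One scenario line of the end-of-run summary (hypothetical layout). */
struct gp_summary {
	char scenario[32];
	unsigned long gps;       /* grace periods completed */
	double gps_per_sec;      /* reported rate */
};

/* Returns 0 on success, -1 for lines that are not scenario summaries,
 * such as "CPU count limited from 16 to 12". */
static int parse_summary(const char *line, struct gp_summary *out)
{
	int n = sscanf(line, "%31s %*s %lu GPs (%lf/s)",
		       out->scenario, &out->gps, &out->gps_per_sec);

	return n == 3 ? 0 : -1;
}

/* Elapsed time implied by count / rate, under the assumption stated above. */
static double implied_seconds(const struct gp_summary *s)
{
	assert(s->gps_per_sec > 0.0);
	return (double)s->gps / s->gps_per_sec;
}
```

A caller would feed each summary line to `parse_summary()`, skip the lines it rejects, and compare `implied_seconds()` across scenarios as a sanity check that all of them ran for a similar length of time.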
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad81c79..d35fd3ced0db 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4038,6 +4038,14 @@
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
+ rcutree.rcu_min_cached_objs= [KNL]
+ Minimum number of objects that are cached and
+ maintained per CPU. Each object is PAGE_SIZE
+ bytes in size. The cache reduces the
+ pressure on the page allocator and also helps
+ the whole algorithm behave better under
+ low-memory conditions.
+
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
@@ -4258,6 +4266,20 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
+ rcutorture.read_exit= [KNL]
+ Set the number of read-then-exit kthreads used
+ to test the interaction of RCU updaters and
+ task-exit processing.
+
+ rcutorture.read_exit_burst= [KNL]
+ The number of times in a given read-then-exit
+ episode that a set of read-then-exit kthreads
+ is spawned.
+
+ rcutorture.read_exit_delay= [KNL]
+ The delay, in seconds, between successive
+ read-then-exit testing episodes.
+
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
@@ -4407,6 +4429,45 @@
reboot_cpu is s[mp]#### with #### being the processor
to be used for rebooting.
+ refscale.holdoff= [KNL]
+ Set test-start holdoff period. The purpose of
+ this parameter is to delay the start of the
+ test until boot completes in order to avoid
+ interference.
+
+ refscale.loops= [KNL]
+ Set the number of loops over the synchronization
+ primitive under test. Increasing this number
+ reduces noise due to loop start/end overhead,
+ but the default has already reduced the per-pass
+ noise to a handful of picoseconds on ca. 2020
+ x86 laptops.
+
+ refscale.nreaders= [KNL]
+ Set number of readers. The default value of -1
+ selects N, where N is roughly 75% of the number
+ of CPUs. A value of zero is an interesting choice.
+
+ refscale.nruns= [KNL]
+ Set number of runs, each of which is dumped onto
+ the console log.
+
+ refscale.readdelay= [KNL]
+ Set the read-side critical-section duration,
+ measured in microseconds.
+
+ refscale.scale_type= [KNL]
+ Specify the read-protection implementation to test.
+
+ refscale.shutdown= [KNL]
+ Shut down the system at the end of the performance
+ test. This defaults to 1 (shut it down) when
+ refscale is built into the kernel and to 0 (leave
+ it running) when refscale is built as a module.
+
+ refscale.verbose= [KNL]
+ Enable additional printk() statements.
+
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
@@ -5082,6 +5143,13 @@
Prevent the CPU-hotplug component of torturing
until after init has spawned.
+ torture.ftrace_dump_at_shutdown= [KNL]
+ Dump the ftrace buffer at torture-test shutdown,
+ even if there were no errors. This can be a
+ very costly operation when many torture tests
+ are running concurrently, especially on systems
+ with rotating-rust storage.
+
tp720= [HW,PS2]
tpm_suspend_pcr=[HW,TPM]
diff --git a/Documentation/locking/locktorture.rst b/Documentation/locking/locktorture.rst
index 8012a74555e7..dfaf9fc883f4 100644
--- a/Documentation/locking/locktorture.rst
+++ b/Documentation/locking/locktorture.rst
@@ -166,4 +166,4 @@ checked for such errors. The "rmmod" command forces a "SUCCESS",
two are self-explanatory, while the last indicates that while there
were no locking failures, CPU-hotplug problems were detected.
-Also see: Documentation/RCU/torture.txt
+Also see: Documentation/RCU/torture.rst