<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/raid1.h, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-03-08T23:16:54+00:00</updated>
<entry>
<title>md: raid1/raid10: drop pending_cnt</title>
<updated>2022-03-08T23:16:54+00:00</updated>
<author>
<name>Mariusz Tkaczyk</name>
<email>mariusz.tkaczyk@linux.intel.com</email>
</author>
<published>2022-01-17T11:38:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=daae161fd2e568b4f481b177b8be34374df98b68'/>
<id>urn:sha1:daae161fd2e568b4f481b177b8be34374df98b68</id>
<content type='text'>
Those counters are not necessary after commit 11bb45e8aaf6 ("md: drop queue
limitation for RAID1 and RAID10"). Remove them from all code (conf and
plug structs). raid1_plug_cb and raid10_plug_cb are identical, so move
definition of raid1_plug_cb to common raid1-10 definitions and use it for
RAID10 too.

Signed-off-by: Mariusz Tkaczyk &lt;mariusz.tkaczyk@linux.intel.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>md/raid1: enable io accounting</title>
<updated>2021-06-15T05:32:07+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>jgq516@gmail.com</email>
</author>
<published>2021-05-25T09:46:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a0159832e51e3af03b89ecc5d6b9db451e529b5f'/>
<id>urn:sha1:a0159832e51e3af03b89ecc5d6b9db451e529b5f</id>
<content type='text'>
For raid1, we record the start time between split bio and clone bio,
and finish the accounting in the final endio.

Also introduce start_time in r1bio accordingly.

Signed-off-by: Guoqing Jiang &lt;jiangguoqing@kylinos.cn&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>md/raid1: Replace zero-length array with flexible-array</title>
<updated>2020-05-13T19:02:23+00:00</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2020-05-07T19:22:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=358369f03ac94637c9fd9d8f94a2dfde86b9f25f'/>
<id>urn:sha1:358369f03ac94637c9fd9d8f94a2dfde86b9f25f</id>
<content type='text'>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>md: convert to bioset_init()/mempool_init()</title>
<updated>2018-05-30T21:33:32+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@gmail.com</email>
</author>
<published>2018-05-20T22:25:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=afeee514ce7f4cab605beedd03be71ebaf0c5fc8'/>
<id>urn:sha1:afeee514ce7f4cab605beedd03be71ebaf0c5fc8</id>
<content type='text'>
Convert md to embedded bio sets.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>md: document lifetime of internal rdev pointer.</title>
<updated>2018-02-18T18:22:27+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2018-02-02T22:19:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2785b527cda46314805123ddcbc871655b7c4c4'/>
<id>urn:sha1:f2785b527cda46314805123ddcbc871655b7c4c4</id>
<content type='text'>
The rdev pointer kept in the local 'config' for each for
raid1, raid10, raid4/5/6 has non-obvious lifetime rules.
Sometimes RCU is needed, sometimes a lock, something nothing.

Add documentation to explain this.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;sh.li@alibaba-inc.com&gt;
</content>
</entry>
<entry>
<title>License cleanup: add SPDX GPL-2.0 license identifier to files with no license</title>
<updated>2017-11-02T10:10:55+00:00</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2017-11-01T14:07:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b24413180f5600bcb3bb70fbed5cf186b60864bd'/>
<id>urn:sha1:b24413180f5600bcb3bb70fbed5cf186b60864bd</id>
<content type='text'>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode &amp; Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained &gt;5
   lines of source
 - File already had some variant of a license header in it (even if &lt;5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Reviewed-by: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md/raid1: Use a new variable to count flighting sync requests</title>
<updated>2017-04-27T21:01:16+00:00</updated>
<author>
<name>Xiao Ni</name>
<email>xni@redhat.com</email>
</author>
<published>2017-04-27T08:28:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=43ac9b84a399bc10210a2d9f4e0778b7c6059c07'/>
<id>urn:sha1:43ac9b84a399bc10210a2d9f4e0778b7c6059c07</id>
<content type='text'>
In new barrier codes, raise_barrier waits if conf-&gt;nr_pending[idx] is not zero.
After all the conditions are true, the resync request can go on be handled. But
it adds conf-&gt;nr_pending[idx] again. The next resync request hit the same bucket
idx need to wait the resync request which is submitted before. The performance
of resync/recovery is degraded.
So we should use a new variable to count sync requests which are in flight.

I did a simple test:
1. Without the patch, create a raid1 with two disks. The resync speed:
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00  166.00    0.00    10.38     0.00   128.00     0.03    0.20    0.20    0.00   0.19   3.20
sdc               0.00     0.00    0.00  166.00     0.00    10.38   128.00     0.96    5.77    0.00    5.77   5.75  95.50
2. With the patch, the result is:
sdb            2214.00     0.00  766.00    0.00   185.69     0.00   496.46     2.80    3.66    3.66    0.00   1.03  79.10
sdc               0.00  2205.00    0.00  769.00     0.00   186.44   496.52     5.25    6.84    0.00    6.84   1.30 100.10

Suggested-by: Shaohua Li &lt;shli@kernel.org&gt;
Signed-off-by: Xiao Ni &lt;xni@redhat.com&gt;
Acked-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>md/raid1: simplify the splitting of requests.</title>
<updated>2017-04-11T17:07:27+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-04-05T04:05:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c230e7e53526c223a3e1caf40747d6e37c0e4394'/>
<id>urn:sha1:c230e7e53526c223a3e1caf40747d6e37c0e4394</id>
<content type='text'>
raid1 currently splits requests in two different ways for
two different reasons.

First, bio_split() is used to ensure the bio fits within a
resync accounting region.
Second, multiple r1bios are allocated for each bio to handle
the possiblity of known bad blocks on some devices.

This can be simplified to just use bio_split() once, and not
use multiple r1bios.
We delay the split until we know a maximum bio size that can
be handled with a single r1bio, and then split the bio and
queue the remainder for later handling.

This avoids all loops inside raid1.c request handling.  Just
a single read, or a single set of writes, is submitted to
lower-level devices for each bio that comes from
generic_make_request().

When the bio needs to be split, generic_make_request() will
do the necessary looping and call md_make_request() multiple
times.

raid1_make_request() no longer queues request for raid1 to handle,
so we can remove that branch from the 'if'.

This patch also creates a new private bio_set
(conf-&gt;bio_split) for splitting bios.  Using fs_bio_set
is wrong, as it is meant to be used by filesystems, not
block devices.  Using it inside md can lead to deadlocks
under high memory pressure.

Delete unused variable in raid1_write_request() (Shaohua)
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>md: raid1: improve write behind</title>
<updated>2017-03-24T17:41:37+00:00</updated>
<author>
<name>Ming Lei</name>
<email>tom.leiming@gmail.com</email>
</author>
<published>2017-03-16T16:12:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=841c1316c7da6199a7df473893c141943991a756'/>
<id>urn:sha1:841c1316c7da6199a7df473893c141943991a756</id>
<content type='text'>
This patch improve handling of write behind in the following ways:

- introduce behind master bio to hold all write behind pages
- fast clone bios from behind master bio
- avoid to change bvec table directly
- use bio_copy_data() and make code more clean

Suggested-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>RAID1: avoid unnecessary spin locks in I/O barrier code</title>
<updated>2017-02-20T06:04:25+00:00</updated>
<author>
<name>colyli@suse.de</name>
<email>colyli@suse.de</email>
</author>
<published>2017-02-17T19:05:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=824e47daddbfc6ebe1006b8659f080620472a136'/>
<id>urn:sha1:824e47daddbfc6ebe1006b8659f080620472a136</id>
<content type='text'>
When I run a parallel reading performan testing on a md raid1 device with
two NVMe SSDs, I observe very bad throughput in supprise: by fio with 64KB
block size, 40 seq read I/O jobs, 128 iodepth, overall throughput is
only 2.7GB/s, this is around 50% of the idea performance number.

The perf reports locking contention happens at allow_barrier() and
wait_barrier() code,
 - 41.41%  fio [kernel.kallsyms]     [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
         + 89.92% allow_barrier
         + 9.34% __wake_up
 - 37.30%  fio [kernel.kallsyms]     [k] _raw_spin_lock_irq
   - _raw_spin_lock_irq
         - 100.00% wait_barrier

The reason is, in these I/O barrier related functions,
 - raise_barrier()
 - lower_barrier()
 - wait_barrier()
 - allow_barrier()
They always hold conf-&gt;resync_lock firstly, even there are only regular
reading I/Os and no resync I/O at all. This is a huge performance penalty.

The solution is a lockless-like algorithm in I/O barrier code, and only
holding conf-&gt;resync_lock when it has to.

The original idea is from Hannes Reinecke, and Neil Brown provides
comments to improve it. I continue to work on it, and make the patch into
current form.

In the new simpler raid1 I/O barrier implementation, there are two
wait barrier functions,
 - wait_barrier()
   Which calls _wait_barrier(), is used for regular write I/O. If there is
   resync I/O happening on the same I/O barrier bucket, or the whole
   array is frozen, task will wait until no barrier on same barrier bucket,
   or the whold array is unfreezed.
 - wait_read_barrier()
   Since regular read I/O won't interfere with resync I/O (read_balance()
   will make sure only uptodate data will be read out), it is unnecessary
   to wait for barrier in regular read I/Os, waiting in only necessary
   when the whole array is frozen.

The operations on conf-&gt;nr_pending[idx], conf-&gt;nr_waiting[idx], conf-&gt;
barrier[idx] are very carefully designed in raise_barrier(),
lower_barrier(), _wait_barrier() and wait_read_barrier(), in order to
avoid unnecessary spin locks in these functions. Once conf-&gt;
nr_pengding[idx] is increased, a resync I/O with same barrier bucket index
has to wait in raise_barrier(). Then in _wait_barrier() if no barrier
raised in same barrier bucket index and array is not frozen, the regular
I/O doesn't need to hold conf-&gt;resync_lock, it can just increase
conf-&gt;nr_pending[idx], and return to its caller. wait_read_barrier() is
very similar to _wait_barrier(), the only difference is it only waits when
array is frozen. For heavy parallel reading I/Os, the lockless I/O barrier
code almostly gets rid of all spin lock cost.

This patch significantly improves raid1 reading peroformance. From my
testing, a raid1 device built by two NVMe SSD, runs fio with 64KB
blocksize, 40 seq read I/O jobs, 128 iodepth, overall throughput
increases from 2.7GB/s to 4.6GB/s (+70%).

Changelog
V4:
- Change conf-&gt;nr_queued[] to atomic_t.
- Define BARRIER_BUCKETS_NR_BITS by (PAGE_SHIFT - ilog2(sizeof(atomic_t)))
V3:
- Add smp_mb__after_atomic() as Shaohua and Neil suggested.
- Change conf-&gt;nr_queued[] from atomic_t to int.
- Change conf-&gt;array_frozen from atomic_t back to int, and use
  READ_ONCE(conf-&gt;array_frozen) to check value of conf-&gt;array_frozen
  in _wait_barrier() and wait_read_barrier().
- In _wait_barrier() and wait_read_barrier(), add a call to
  wake_up(&amp;conf-&gt;wait_barrier) after atomic_dec(&amp;conf-&gt;nr_pending[idx]),
  to fix a deadlock between  _wait_barrier()/wait_read_barrier and
  freeze_array().
V2:
- Remove a spin_lock/unlock pair in raid1d().
- Add more code comments to explain why there is no racy when checking two
  atomic_t variables at same time.
V1:
- Original RFC patch for comments.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: Shaohua Li &lt;shli@fb.com&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Cc: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Reviewed-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
</feed>
