<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/raid5.h, branch v5.15.7</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.7</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-09-24T23:44:44+00:00</updated>
<entry>
<title>md/raid5: let multiple devices of stripe_head share page</title>
<updated>2020-09-24T23:44:44+00:00</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2020-08-20T13:22:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=046169f048cdea84fd327469239e2dfa317981b5'/>
<id>urn:sha1:046169f048cdea84fd327469239e2dfa317981b5</id>
<content type='text'>
In current implementation, grow_buffers() uses alloc_page() to
allocate the buffers for each stripe_head, i.e. allocate a page
for each dev[i] in stripe_head.

After setting stripe_size as a configurable value by writing
sysfs entry, it means that we always allocate 64K buffers, but
just use 4K of them when stripe_size is 4K in 64KB arm64.

To avoid wasting memory, we try to let multiple sh-&gt;dev share
one real page. That means, multiple sh-&gt;dev[i].page will point
to the only page with different offset. Example of 64K PAGE_SIZE
and 4K stripe_size as following:

                    64K PAGE_SIZE
          +---+---+---+---+------------------------------+
          |   |   |   |   |
          |   |   |   |   |
          +-+-+-+-+-+-+-+-+------------------------------+
            ^   ^   ^   ^
            |   |   |   +----------------------------+
            |   |   |                                |
            |   |   +-------------------+            |
            |   |                       |            |
            |   +----------+            |            |
            |              |            |            |
            +-+            |            |            |
              |            |            |            |
        +-----+-----+------+-----+------+-----+------+------+
sh      | offset(0) | offset(4K) | offset(8K) | offset(12K) |
 +      +-----------+------------+------------+-------------+
 +----&gt; dev[0].page  dev[1].page  dev[2].page  dev[3].page

A new 'pages' array will be added into stripe_head to record shared
page used by this stripe_head. Allocate them when grow_buffers()
and free them when shrink_buffers().

After trying to share page, the users of sh-&gt;dev[i].page need to take
care of the related page offset: page of issued bio and page passed
to xor compution functions. But thanks for previous different page offset
supported. Here, we just need to set correct dev[i].offset.

Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>md/raid5: add a new member of offset into r5dev</title>
<updated>2020-09-24T23:44:44+00:00</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2020-08-20T13:22:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7aba13b7152979450270da5602f16f6f28fd35b8'/>
<id>urn:sha1:7aba13b7152979450270da5602f16f6f28fd35b8</id>
<content type='text'>
Add a new member of offset into struct r5dev. It indicates the
offset of related dev[i].page. For now, since each device have a
privated page, the value is always 0. Thus, we set offset as 0
when allcate page in grow_buffers() and resize_stripes().

To support following different page offset, we try to use the page
offset rather than '0' directly for async_memcpy() and ops_run_io().

We try to support different page offset for xor compution functions
in the following. To avoid repeatly allocate a new array each time,
we add a memory region into scribble buffer to record offset.

No functional change.

Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-08-11T02:07:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-11T02:07:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=97d052ea3fa853b9aabcc4baca1a605cb1188611'/>
<id>urn:sha1:97d052ea3fa853b9aabcc4baca1a605cb1188611</id>
<content type='text'>
Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce &lt;asm/smp.h&gt; header dependencies by moving XTP bits into the new &lt;asm/xtp.h&gt; header
  x86/headers: Remove APIC headers from &lt;asm/smp.h&gt;
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
</content>
</entry>
<entry>
<title>raid5: Use sequence counter with associated spinlock</title>
<updated>2020-07-29T14:14:27+00:00</updated>
<author>
<name>Ahmed S. Darwish</name>
<email>a.darwish@linutronix.de</email>
</author>
<published>2020-07-20T15:55:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a87b25ff2eb6169403c88b0d5f3c97bdaa3c930'/>
<id>urn:sha1:0a87b25ff2eb6169403c88b0d5f3c97bdaa3c930</id>
<content type='text'>
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish &lt;a.darwish@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200720155530.1173732-20-a.darwish@linutronix.de
</content>
</entry>
<entry>
<title>md/raid5: set default stripe_size as 4096</title>
<updated>2020-07-22T00:18:17+00:00</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2020-07-18T09:29:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e236858243d7a8e0ac60972d2f9522146088a736'/>
<id>urn:sha1:e236858243d7a8e0ac60972d2f9522146088a736</id>
<content type='text'>
In RAID5, if issued bio size is bigger than stripe_size, it will be
split in the unit of stripe_size and process them one by one. Even
for size less then stripe_size, RAID5 also request data from disk at
least of stripe_size.

Nowdays, stripe_size is equal to the value of PAGE_SIZE. Since filesystem
usually issue bio in the unit of 4KB, there is no problem for PAGE_SIZE
as 4KB. But, for 64KB PAGE_SIZE, bio from filesystem requests 4KB data
while RAID5 issue IO at least stripe_size (64KB) each time. That will
waste resource of disk bandwidth and compute xor.

To avoding the waste, we want to make stripe_size configurable. This
patch just set default stripe_size as 4096. User can also set the value
bigger than 4KB for some special requirements, such as we know the
issued io size is more than 4KB.

To evaluate the new feature, we create raid5 device '/dev/md5' with
4 SSD disk and test it on arm64 machine with 64KB PAGE_SIZE.

1) We format /dev/md5 with mkfs.ext4 and mount ext4 with default
 configure on /mnt directory. Then, trying to test it by dbench with
 command: dbench -D /mnt -t 1000 10. Result show as:

 'stripe_size = 64KB'

  Operation      Count    AvgLat    MaxLat
  ----------------------------------------
  NTCreateX    9805011     0.021    64.728
  Close        7202525     0.001     0.120
  Rename        415213     0.051    44.681
  Unlink       1980066     0.079    93.147
  Deltree          240     1.793     6.516
  Mkdir            120     0.004     0.007
  Qpathinfo    8887512     0.007    37.114
  Qfileinfo    1557262     0.001     0.030
  Qfsinfo      1629582     0.012     0.152
  Sfileinfo     798756     0.040    57.641
  Find         3436004     0.019    57.782
  WriteX       4887239     0.021    57.638
  ReadX        15370483     0.005    37.818
  LockX          31934     0.003     0.022
  UnlockX        31933     0.001     0.021
  Flush         687205    13.302   530.088

 Throughput 307.799 MB/sec  10 clients  10 procs  max_latency=530.091 ms
 -------------------------------------------------------

 'stripe_size = 4KB'

  Operation      Count    AvgLat    MaxLat
  ----------------------------------------
  NTCreateX    11999166     0.021    36.380
  Close        8814128     0.001     0.122
  Rename        508113     0.051    29.169
  Unlink       2423242     0.070    38.141
  Deltree          300     1.885     7.155
  Mkdir            150     0.004     0.006
  Qpathinfo    10875921     0.007    35.485
  Qfileinfo    1905837     0.001     0.032
  Qfsinfo      1994304     0.012     0.125
  Sfileinfo     977450     0.029    26.489
  Find         4204952     0.019     9.361
  WriteX       5981890     0.019    27.804
  ReadX        18809742     0.004    33.491
  LockX          39074     0.003     0.025
  UnlockX        39074     0.001     0.014
  Flush         841022    10.712   458.848

 Throughput 376.777 MB/sec  10 clients  10 procs  max_latency=458.852 ms
 -------------------------------------------------------

 It show that setting stripe_size as 4KB has higher thoughput, i.e.
 (376.777 vs 307.799) and has smaller latency than that setting as 64KB.

 2) We try to evaluate IO throughput for /dev/md5 by fio with config:

 [4KB randwrite]
 direct=1
 numjob=2
 iodepth=64
 ioengine=libaio
 filename=/dev/md5
 bs=4KB
 rw=randwrite

 [64KB write]
 direct=1
 numjob=2
 iodepth=64
 ioengine=libaio
 filename=/dev/md5
 bs=1MB
 rw=write

 The result as follow:

               +                   +
               | stripe_size(64KB) | stripe_size(4KB)
 +----------------------------------------------------+
 4KB randwrite |     15MB/s        |      100MB/s
 +----------------------------------------------------+
 1MB write     |   1000MB/s        |      700MB/s

 The result show that when size of io is bigger than 4KB (64KB),
 64KB stripe_size has much higher IOPS. But for 4KB randwrite, that
 means, size of io issued to device are smaller, 4KB stripe_size
 have better performance.

Normally, default value (4096) can get relatively good performance.
But if each issued io is bigger than 4096, setting value more than
4096 may get better performance.

Here, we just set default stripe_size as 4096, and we will try to
support setting different stripe_size by sysfs interface in the
following patch.

Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>md/raid456: convert macro STRIPE_* to RAID5_STRIPE_*</title>
<updated>2020-07-22T00:18:12+00:00</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2020-07-18T09:29:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c911c46c017c745e0f5ece9626d0fbfaff5a1f97'/>
<id>urn:sha1:c911c46c017c745e0f5ece9626d0fbfaff5a1f97</id>
<content type='text'>
Convert macro STRIPE_SIZE, STRIPE_SECTORS and STRIPE_SHIFT to
RAID5_STRIPE_SIZE(), RAID5_STRIPE_SECTORS() and RAID5_STRIPE_SHIFT().

This patch is prepare for the following adjustable stripe_size.
It will not change any existing functionality.

Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>raid5: use bio_end_sector in r5_next_bio</title>
<updated>2019-09-13T20:14:43+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>guoqing.jiang@cloud.ionos.com</email>
</author>
<published>2019-09-12T10:10:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=067df25c83902e2950ef24fed713f0fa38282f34'/>
<id>urn:sha1:067df25c83902e2950ef24fed713f0fa38282f34</id>
<content type='text'>
Actually, we calculate bio's end sector here, so use the common
way for the purpose.

Signed-off-by: Guoqing Jiang &lt;guoqing.jiang@cloud.ionos.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>raid5: remove STRIPE_OPS_REQ_PENDING</title>
<updated>2019-09-13T20:14:39+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>guoqing.jiang@cloud.ionos.com</email>
</author>
<published>2019-09-12T10:10:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=feb9bf9849e2aa0dd2833285af7c25aee07df7bb'/>
<id>urn:sha1:feb9bf9849e2aa0dd2833285af7c25aee07df7bb</id>
<content type='text'>
This stripe state is not used anymore after commit 51acbcec6c42b24
("md: remove CONFIG_MULTICORE_RAID456"), so remove the obsoleted
state.

gjiang@nb01257:~/md$ grep STRIPE_OPS_REQ_PENDING drivers/md/ -r
drivers/md/raid5.c:					  (1 &lt;&lt; STRIPE_OPS_REQ_PENDING) |
drivers/md/raid5.h:	STRIPE_OPS_REQ_PENDING,

Signed-off-by: Guoqing Jiang &lt;guoqing.jiang@cloud.ionos.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>md: convert to kvmalloc</title>
<updated>2019-03-12T17:04:02+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@gmail.com</email>
</author>
<published>2019-03-12T06:31:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b330e6a49dc3e9145de5c986b29bbbb884351e92'/>
<id>urn:sha1:b330e6a49dc3e9145de5c986b29bbbb884351e92</id>
<content type='text'>
The code really just wants a big flat buffer, so just do that.

Link: http://lkml.kernel.org/r/20181217131929.11727-3-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Reviewed-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Eric Paris &lt;eparis@parisplace.org&gt;
Cc: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Cc: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md</title>
<updated>2018-06-09T19:01:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-06-09T19:01:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d60dafdca4b463405e5586df923f05b10e9ac2f9'/>
<id>urn:sha1:d60dafdca4b463405e5586df923f05b10e9ac2f9</id>
<content type='text'>
Pull MD updates from Shaohua Li:
 "A few fixes of MD for this merge window. Mostly bug fixes:

   - raid5 stripe batch fix from Amy

   - Read error handling for raid1 FailFast device from Gioh

   - raid10 recovery NULL pointer dereference fix from Guoqing

   - Support write hint for raid5 stripe cache from Mariusz

   - Fixes for device hot add/remove from Neil and Yufen

   - Improve flush bio scalability from Xiao"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  MD: fix lock contention for flush bios
  md/raid5: Assigning NULL to sh-&gt;batch_head before testing bit R5_Overlap of a stripe
  md/raid1: add error handling of read error from FailFast device
  md: fix NULL dereference of mddev-&gt;pers in remove_and_add_spares()
  raid5: copy write hint from origin bio to stripe
  md: fix two problems with setting the "re-add" device state.
  raid10: check bio in r10buf_pool_free to void NULL pointer dereference
  md: fix an error code format and remove unsed bio_sector
</content>
</entry>
</feed>
