<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/raid5.c, branch v4.17.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.17.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.17.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-04-05T21:27:02+00:00</updated>
<entry>
<title>Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block</title>
<updated>2018-04-05T21:27:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-04-05T21:27:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3526dd0c7832f1011a0477cc6d903662bae05ea8'/>
<id>urn:sha1:3526dd0c7832f1011a0477cc6d903662bae05ea8</id>
<content type='text'>
Pull block layer updates from Jens Axboe:
 "It's a pretty quiet round this time, which is nice. This contains:

   - series from Bart, cleaning up the way we set/test/clear atomic
     queue flags.

   - series from Bart, fixing races between gendisk and queue
     registration and removal.

   - set of bcache fixes and improvements from various folks, by way of
     Michael Lyle.

   - set of lightnvm updates from Matias, most of it being the 1.2 to
     2.0 transition.

   - removal of unused DIO flags from Nikolay.

   - blk-mq/sbitmap memory ordering fixes from Omar.

   - divide-by-zero fix for BFQ from Paolo.

   - minor documentation patches from Randy.

   - timeout fix from Tejun.

   - Alpha "can't write a char atomically" fix from Mikulas.

   - set of NVMe fixes by way of Keith.

   - bsg and bsg-lib improvements from Christoph.

   - a few sed-opal fixes from Jonas.

   - cdrom check-disk-change deadlock fix from Maurizio.

   - various little fixes, comment fixes, etc from various folks"

* tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits)
  blk-mq: Directly schedule q-&gt;timeout_work when aborting a request
  blktrace: fix comment in blktrace_api.h
  lightnvm: remove function name in strings
  lightnvm: pblk: remove some unnecessary NULL checks
  lightnvm: pblk: don't recover unwritten lines
  lightnvm: pblk: implement 2.0 support
  lightnvm: pblk: implement get log report chunk
  lightnvm: pblk: rename ppaf* to addrf*
  lightnvm: pblk: check for supported version
  lightnvm: implement get log report chunk helpers
  lightnvm: make address conversions depend on generic device
  lightnvm: add support for 2.0 address format
  lightnvm: normalize geometry nomenclature
  lightnvm: complete geo structure with maxoc*
  lightnvm: add shorten OCSSD version in geo
  lightnvm: add minor version to generic geometry
  lightnvm: simplify geometry structure
  lightnvm: pblk: refactor init/exit sequences
  lightnvm: Avoid validation of default op value
  lightnvm: centralize permission check for lightnvm ioctl
  ...
</content>
</entry>
<entry>
<title>block: Use blk_queue_flag_*() in drivers instead of queue_flag_*()</title>
<updated>2018-03-08T21:13:48+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@wdc.com</email>
</author>
<published>2018-03-08T01:10:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8b904b5b6b58b9a29dcf3f82d936d9e7fd69fda6'/>
<id>urn:sha1:8b904b5b6b58b9a29dcf3f82d936d9e7fd69fda6</id>
<content type='text'>
This patch has been generated as follows:

for verb in set_unlocked clear_unlocked set clear; do
  replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \
    $(git grep -lw queue_flag_${verb} drivers block/bsg*)
done

Except for protecting all queue flag changes with the queue lock
this patch does not change any functionality.

Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Shaohua Li &lt;shli@fb.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hannes Reinecke &lt;hare@suse.de&gt;
Cc: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>md: fix a potential deadlock of raid5/raid10 reshape</title>
<updated>2018-02-25T18:39:15+00:00</updated>
<author>
<name>BingJing Chang</name>
<email>bingjingc@synology.com</email>
</author>
<published>2018-02-22T05:34:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8876391e440ba615b10eef729576e111f0315f87'/>
<id>urn:sha1:8876391e440ba615b10eef729576e111f0315f87</id>
<content type='text'>
There is a potential deadlock if mount/umount happens when
raid5_finish_reshape() tries to grow the size of emulated disk.

How the deadlock happens?
1) The raid5 resync thread finished reshape (expanding array).
2) The mount or umount thread holds VFS sb-&gt;s_umount lock and tries to
   write through critical data into raid5 emulated block device. So it
   waits for raid5 kernel thread handling stripes in order to finish it
   I/Os.
3) In the routine of raid5 kernel thread, md_check_recovery() will be
   called first in order to reap the raid5 resync thread. That is,
   raid5_finish_reshape() will be called. In this function, it will try
   to update conf and call VFS revalidate_disk() to grow the raid5
   emulated block device. It will try to acquire VFS sb-&gt;s_umount lock.
The raid5 kernel thread cannot continue, so no one can handle mount/
umount I/Os (stripes). Once the write-through I/Os cannot be finished,
mount/umount will not release sb-&gt;s_umount lock. The deadlock happens.

The raid5 kernel thread is an emulated block device. It is responible to
handle I/Os (stripes) from upper layers. The emulated block device
should not request any I/Os on itself. That is, it should not call VFS
layer functions. (If it did, it will try to acquire VFS locks to
guarantee the I/Os sequence.) So we have the resync thread to send
resync I/O requests and to wait for the results.

For solving this potential deadlock, we can put the size growth of the
emulated block device as the final step of reshape thread.

2017/12/29:
Thanks to Guoqing Jiang &lt;gqjiang@suse.com&gt;,
we confirmed that there is the same deadlock issue in raid10. It's
reproducible and can be fixed by this patch. For raid10.c, we can remove
the similar code to prevent deadlock as well since they has been called
before.

Reported-by: Alex Wu &lt;alexwu@synology.com&gt;
Reviewed-by: Alex Wu &lt;alexwu@synology.com&gt;
Reviewed-by: Chung-Chiang Cheng &lt;cccheng@synology.com&gt;
Signed-off-by: BingJing Chang &lt;bingjingc@synology.com&gt;
Signed-off-by: Shaohua Li &lt;sh.li@alibaba-inc.com&gt;
</content>
</entry>
<entry>
<title>md: raid5: avoid string overflow warning</title>
<updated>2018-02-21T17:49:15+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-02-20T13:09:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=53b8d89ddbdbb0e4625a46d2cdbb6f106c52f801'/>
<id>urn:sha1:53b8d89ddbdbb0e4625a46d2cdbb6f106c52f801</id>
<content type='text'>
gcc warns about a possible overflow of the kmem_cache string, when adding
four characters to a string of the same length:

drivers/md/raid5.c: In function 'setup_conf':
drivers/md/raid5.c:2207:34: error: '-alt' directive writing 4 bytes into a region of size between 1 and 32 [-Werror=format-overflow=]
  sprintf(conf-&gt;cache_name[1], "%s-alt", conf-&gt;cache_name[0]);
                                  ^~~~
drivers/md/raid5.c:2207:2: note: 'sprintf' output between 5 and 36 bytes into a destination of size 32
  sprintf(conf-&gt;cache_name[1], "%s-alt", conf-&gt;cache_name[0]);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If I'm counting correctly, we need 11 characters for the fixed part
of the string and 18 characters for a 64-bit pointer (when no gendisk
is used), so that leaves three characters for conf-&gt;level, which should
always be sufficient.

This makes the code use snprintf() with the correct length, to
make the code more robust against changes, and to get the compiler
to shut up.

In commit f4be6b43f1ac ("md/raid5: ensure we create a unique name for
kmem_cache when mddev has no gendisk") from 2010, Neil said that
the pointer could be removed "shortly" once devices without gendisk
are disallowed. I have no idea if that happened, but if it did, that
should probably be changed as well.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Shaohua Li &lt;sh.li@alibaba-inc.com&gt;
</content>
</entry>
<entry>
<title>md/raid5: simplify uninitialization of shrinker</title>
<updated>2018-02-17T20:35:34+00:00</updated>
<author>
<name>Aliaksei Karaliou</name>
<email>akaraliou.dev@gmail.com</email>
</author>
<published>2017-12-23T18:20:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=565e0450129647df5112bff3df3ffd02b0c08e32'/>
<id>urn:sha1:565e0450129647df5112bff3df3ffd02b0c08e32</id>
<content type='text'>
Don't use shrinker.nr_deferred to check whether shrinker was
initialized or not. Now this check was integrated into
unregister_shrinker(), so it is safe to call it against
unregistered shrinker.

Signed-off-by: Aliaksei Karaliou &lt;akaraliou.dev@gmail.com&gt;
Signed-off-by: Shaohua Li &lt;sh.li@alibaba-inc.com&gt;
</content>
</entry>
<entry>
<title>raid5-ppl: PPL support for disks with write-back cache enabled</title>
<updated>2018-01-15T22:29:42+00:00</updated>
<author>
<name>Tomasz Majchrzak</name>
<email>tomasz.majchrzak@intel.com</email>
</author>
<published>2017-12-27T09:31:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1532d9e87e8b2377f12929f9e40724d5fbe6ecc5'/>
<id>urn:sha1:1532d9e87e8b2377f12929f9e40724d5fbe6ecc5</id>
<content type='text'>
In order to provide data consistency with PPL for disks with write-back
cache enabled all data has to be flushed to disks before next PPL
entry. The disks to be flushed are marked in the bitmap. It's modified
under a mutex and it's only read after PPL io unit is submitted.

A limitation of 64 disks in the array has been introduced to keep data
structures and implementation simple. RAID5 arrays with so many disks are
not likely due to high risk of multiple disks failure. Such restriction
should not be a real life limitation.

With write-back cache disabled next PPL entry is submitted when data write
for current one completes. Data flush defers next log submission so trigger
it when there are no stripes for handling found.

As PPL assures all data is flushed to disk at request completion, just
acknowledge flush request when PPL is enabled.

Signed-off-by: Tomasz Majchrzak &lt;tomasz.majchrzak@intel.com&gt;
Signed-off-by: Shaohua Li &lt;sh.li@alibaba-inc.com&gt;
</content>
</entry>
<entry>
<title>md: introduce new personality funciton start()</title>
<updated>2017-12-11T16:52:34+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2017-11-20T06:17:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d5d885fd514fcebc9da5503c88aa0112df7514ef'/>
<id>urn:sha1:d5d885fd514fcebc9da5503c88aa0112df7514ef</id>
<content type='text'>
In do_md_run(), md threads should not wake up until the array is fully
initialized in md_run(). However, in raid5_run(), raid5-cache may wake
up mddev-&gt;thread to flush stripes that need to be written back. This
design doesn't break badly right now. But it could lead to bad bug in
the future.

This patch tries to resolve this problem by splitting start up work
into two personality functions, run() and start(). Tasks that do not
require the md threads should go into run(), while task that require
the md threads go into start().

r5l_load_log() is moved to raid5_start(), so it is not called until
the md threads are started in do_md_run().

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>md/raid5: correct degraded calculation in raid5_error</title>
<updated>2017-12-01T19:27:32+00:00</updated>
<author>
<name>bingjingc</name>
<email>bingjingc@synology.com</email>
</author>
<published>2017-11-17T02:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aff69d89bdebc39235cddb4445371eb979b49685'/>
<id>urn:sha1:aff69d89bdebc39235cddb4445371eb979b49685</id>
<content type='text'>
When disk failure occurs on new disks for reshape, mddev-&gt;degraded
is not calculated correctly. Faulty bit of the failure device is not
set before raid5_calc_degraded(conf).

mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/loop[012]
mdadm /dev/md0 -a /dev/loop3
mdadm /dev/md0 --grow -n4
mdadm /dev/md0 -f /dev/loop3 # simulating disk failure

cat /sys/block/md0/md/degraded # it outputs 0, but it should be 1.

However, mdadm -D /dev/md0 will show that it is degraded. It's a bug.
It can be fixed by moving the resources raid5_calc_degraded() depends
on before it.

Reported-by: Roy Chung &lt;roychung@synology.com&gt;
Reviewed-by: Alex Wu &lt;alexwu@synology.com&gt;
Signed-off-by: BingJing Chang &lt;bingjingc@synology.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md</title>
<updated>2017-11-15T00:07:26+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-15T00:07:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=47f521ba18190e4bfbb65ead3977af5756884427'/>
<id>urn:sha1:47f521ba18190e4bfbb65ead3977af5756884427</id>
<content type='text'>
Pull MD update from Shaohua Li:
 "This update mostly includes bug fixes:

   - md-cluster now supports raid10 from Guoqing

   - raid5 PPL fixes from Artur

   - badblock regression fix from Bo

   - suspend hang related fixes from Neil

   - raid5 reshape fixes from Neil

   - raid1 freeze deadlock fix from Nate

   - memleak fixes from Zdenek

   - bitmap related fixes from Me and Tao

   - other fixes and cleanups"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (33 commits)
  md: free unused memory after bitmap resize
  md: release allocated bitset sync_set
  md/bitmap: clear BITMAP_WRITE_ERROR bit before writing it to sb
  md: be cautious about using -&gt;curr_resync_completed for -&gt;recovery_offset
  badblocks: fix wrong return value in badblocks_set if badblocks are disabled
  md: don't check MD_SB_CHANGE_CLEAN in md_allow_write
  md-cluster: update document for raid10
  md: remove redundant variable q
  raid1: remove obsolete code in raid1_write_request
  md-cluster: Use a small window for raid10 resync
  md-cluster: Suspend writes in RAID10 if within range
  md-cluster/raid10: set "do_balance = 0" if area is resyncing
  md: use lockdep_assert_held
  raid1: prevent freeze_array/wait_all_barriers deadlock
  md: use TASK_IDLE instead of blocking signals
  md: remove special meaning of -&gt;quiesce(.., 2)
  md: allow metadata update while suspending.
  md: use mddev_suspend/resume instead of -&gt;quiesce()
  md: move suspend_hi/lo handling into core md code
  md: don't call bitmap_create() while array is quiesced.
  ...
</content>
</entry>
<entry>
<title>md: be cautious about using -&gt;curr_resync_completed for -&gt;recovery_offset</title>
<updated>2017-11-09T15:29:40+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-10-17T05:18:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db0505d320660b6ad92418847e7eca6b61b246ac'/>
<id>urn:sha1:db0505d320660b6ad92418847e7eca6b61b246ac</id>
<content type='text'>
The -&gt;recovery_offset shows how much of a non-InSync device is actually
in sync - how much has been recoveryed.

When performing a recovery, -&gt;curr_resync and -&gt;curr_resync_completed
follow the device address being recovered and so can be used to update
-&gt;recovery_offset.

When performing a reshape, -&gt;curr_resync* might follow the device
addresses (raid5) or might follow array addresses (raid10), so cannot
in general be used to set -&gt;recovery_offset.  When reshaping backwards,
-&gt;curre_resync* measures from the *end* of the array-or-device, so is
particularly unhelpful.

So change the common code in md.c to only use -&gt;curr_resync_complete
for the simple recovery case, and add code to raid5.c to update
-&gt;recovery_offset during a forwards reshape.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
</feed>
