<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/dm.c, branch v4.4.171</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.4.171</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.4.171'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-02-22T14:45:01+00:00</updated>
<entry>
<title>dm: correctly handle chained bios in dec_pending()</title>
<updated>2018-02-22T14:45:01+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2018-02-15T09:00:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bb18512819952a468c8f618fbf0b9a470a854a06'/>
<id>urn:sha1:bb18512819952a468c8f618fbf0b9a470a854a06</id>
<content type='text'>
commit 8dd601fa8317243be887458c49f6c29c2f3d719f upstream.

dec_pending() is given an error status (possibly 0) to be recorded
against a bio.  It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io-&gt;status.  However, when it then assigns io-&gt;status to bio-&gt;bi_status,
it is not careful and could overwrite a genuine error status with 0.

This can happen when chained bios are in use.  If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio-&gt;bi_status before the dm_io completes.

This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da84354 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.

A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
-&gt;bi_status, and the other fragment will normally complete a little
later, and will clear -&gt;bi_status.

The fix is simply to only assign io_error to bio-&gt;bi_status when
io_error is not zero.
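
A minimal sketch of that rule, written in Python rather than kernel C.
Bio, DmIo and this dec_pending() are hypothetical stand-ins that model
only the status handling; the real function also tracks a pending count.

```python
class Bio:
    """Hypothetical stand-in for struct bio; 0 means no error."""
    def __init__(self):
        self.bi_status = 0


class DmIo:
    """Hypothetical stand-in for struct dm_io."""
    def __init__(self, bio):
        self.bio = bio
        self.status = 0


def dec_pending(io, error):
    # Record only non-zero errors against the dm_io.
    if error != 0:
        io.status = error
    # The fix: copy to the bio only when there is an error to report,
    # so a status already set by a chained child is not cleared.
    io_error = io.status
    if io_error != 0:
        io.bio.bi_status = io_error


bio = Bio()
io = DmIo(bio)
bio.bi_status = 5      # a chained child bio completed first with an error
dec_pending(io, 0)     # the dm_io itself later completes successfully
print(bio.bi_status)   # the child's error is still recorded
```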

Reported-and-tested-by: Milan Broz &lt;gmazyland@gmail.com&gt;
Cc: stable@vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: fix race between dm_get_from_kobject() and __dm_destroy()</title>
<updated>2017-11-30T08:37:20+00:00</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2017-11-01T07:42:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4e82464aa4a398207e2ecbc4877c82319ecdbafa'/>
<id>urn:sha1:4e82464aa4a398207e2ecbc4877c82319ecdbafa</id>
<content type='text'>
commit b9a41d21dceadf8104812626ef85dc56ee8a60ed upstream.

The following BUG_ON was hit when testing repeated creation and removal of
DM devices:

    kernel BUG at drivers/md/dm.c:2919!
    CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
    Call Trace:
     [&lt;ffffffff81649e8b&gt;] dm_get_from_kobject+0x34/0x3a
     [&lt;ffffffff81650ef1&gt;] dm_attr_show+0x2b/0x5e
     [&lt;ffffffff817b46d1&gt;] ? mutex_lock+0x26/0x44
     [&lt;ffffffff811df7f5&gt;] sysfs_kf_seq_show+0x83/0xcf
     [&lt;ffffffff811de257&gt;] kernfs_seq_show+0x23/0x25
     [&lt;ffffffff81199118&gt;] seq_read+0x16f/0x325
     [&lt;ffffffff811de994&gt;] kernfs_fop_read+0x3a/0x13f
     [&lt;ffffffff8117b625&gt;] __vfs_read+0x26/0x9d
     [&lt;ffffffff8130eb59&gt;] ? security_file_permission+0x3c/0x44
     [&lt;ffffffff8117bdb8&gt;] ? rw_verify_area+0x83/0xd9
     [&lt;ffffffff8117be9d&gt;] vfs_read+0x8f/0xcf
     [&lt;ffffffff81193e34&gt;] ? __fdget_pos+0x12/0x41
     [&lt;ffffffff8117c686&gt;] SyS_read+0x4b/0x76
     [&lt;ffffffff817b606e&gt;] system_call_fastpath+0x12/0x71

The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING &amp; DMF_DELETING and dm_get() in
dm_get_from_kobject().

To fix it, we need to ensure the test of DMF_FREEING &amp; DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.

The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invokes it after increasing
md-&gt;open_count.
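
The check-then-get pattern can be sketched as follows, in Python rather
than kernel C.  MappedDevice and both functions are hypothetical models,
and the assumption that teardown sets its flag under the same lock is
labeled in the code.

```python
import threading

_minor_lock = threading.Lock()  # plays the role of dm.c's _minor_lock


class MappedDevice:
    """Hypothetical stand-in for struct mapped_device."""
    def __init__(self):
        self.freeing = False    # models DMF_FREEING
        self.deleting = False   # models DMF_DELETING
        self.holders = 0


def dm_get_from_kobject(md):
    # Test the flags and take the reference in one critical section,
    # so a destroy cannot slip in between the two steps.
    with _minor_lock:
        if md.freeing or md.deleting:
            return None
        md.holders += 1
        return md


def dm_destroy(md):
    # Assumption: the teardown path sets its flag under the same lock.
    with _minor_lock:
        md.freeing = True


md = MappedDevice()
first = dm_get_from_kobject(md)
dm_destroy(md)
second = dm_get_from_kobject(md)
print(first is md, second is None, md.holders)
```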

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>blk: Ensure users for current-&gt;bio_list can see the full list.</title>
<updated>2017-04-08T07:53:32+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-03-10T06:00:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5cca175b6cda16b68b18967210872327b1cadf4f'/>
<id>urn:sha1:5cca175b6cda16b68b18967210872327b1cadf4f</id>
<content type='text'>
commit f5fe1b51905df7cfe4fdfd85c5fb7bc5b71a094f upstream.

Commit 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
changed current-&gt;bio_list so that it did not contain *all* of the
queued bios, but only those submitted by the currently running
make_request_fn.

There are two places which walk the list and requeue selected bios,
and others that check if the list is empty.  These are no longer
correct.

So redefine current-&gt;bio_list to point to an array of two lists, which
contain all queued bios, and adjust various code to test or walk both
lists.
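
A rough Python model of what testing or walking both lists means.  The
helper names and the use of strings as bios are hypothetical, not the
kernel code.

```python
from collections import deque


def all_lists_empty(bio_lists):
    # A check of only one list would miss bios parked on the other.
    return all(len(lst) == 0 for lst in bio_lists)


def requeue_selected(bio_lists, should_requeue):
    # Walk both lists, pulling out the bios the caller selects and
    # keeping the rest in their original order.
    picked = []
    for lst in bio_lists:
        kept = deque()
        for bio in lst:
            (picked if should_requeue(bio) else kept).append(bio)
        lst.clear()
        lst.extend(kept)
    return picked


# Hypothetical state: strings stand in for queued bios.
bio_lists = [deque(["a1", "b1"]), deque(["a2"])]
picked = requeue_selected(bio_lists, lambda bio: bio.startswith("a"))
print(picked, all_lists_empty(bio_lists))
```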

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Fixes: 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
[jwang: backport to 4.4]
Signed-off-by: Jack Wang &lt;jinpu.wang@profitbricks.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
[bwh: Restore changes in device-mapper from upstream version]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
</content>
</entry>
<entry>
<title>dm: flush queued bios when process blocks to avoid deadlock</title>
<updated>2017-03-18T11:09:58+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2017-02-15T16:26:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cd8ad4d9eb6d9ee04e77b42c6a7a15eabada85ac'/>
<id>urn:sha1:cd8ad4d9eb6d9ee04e77b42c6a7a15eabada85ac</id>
<content type='text'>
commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
by redirecting bios queued on current-&gt;bio_list to the workqueue if the
system is low on memory.  However other deadlocks (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current-&gt;bio_list (rather than submitting them).

** the related dm-snapshot deadlock is detailed here:
https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html

Fix this deadlock by redirecting any bios on current-&gt;bio_list to the
bio_set's rescue workqueue on every schedule() call.  Consequently,
when the process blocks on a mutex, the bios queued on
current-&gt;bio_list are dispatched to independent workqueues and they can
complete without waiting for the mutex to be available.

The structure blk_plug contains an entry cb_list and this list can contain
arbitrary callback functions that are called when the process blocks.
To implement this fix DM (ab)uses the onstack plug's cb_list interface
to get its flush_current_bio_list() called at schedule() time.

This fixes the snapshot deadlock - if the map method blocks,
flush_current_bio_list() will be called and it redirects bios waiting
on current-&gt;bio_list to appropriate workqueues.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers")
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: free io_barrier after blk_cleanup_queue call</title>
<updated>2016-11-10T15:36:34+00:00</updated>
<author>
<name>Tahsin Erdogan</name>
<email>tahsin@google.com</email>
</author>
<published>2016-10-10T12:35:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cb270a3f16668efb80141e530f5235035301cc11'/>
<id>urn:sha1:cb270a3f16668efb80141e530f5235035301cc11</id>
<content type='text'>
commit d09960b0032174eb493c4c13be5b9c9ef36dc9a7 upstream.

dm_old_request_fn() has paths that access md-&gt;io_barrier.  The party
destroying io_barrier should ensure that no future execution of
dm_old_request_fn() is possible.  Move io_barrier destruction to below
blk_cleanup_queue() to ensure this and avoid a NULL pointer crash during
request-based DM device shutdown.

Signed-off-by: Tahsin Erdogan &lt;tahsin@google.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: return correct error code in dm_resume()'s retry loop</title>
<updated>2016-10-28T07:01:27+00:00</updated>
<author>
<name>Minfei Huang</name>
<email>mnghuan@gmail.com</email>
</author>
<published>2016-09-06T08:00:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf74a108c67947a2c72d16786338b89d75119a48'/>
<id>urn:sha1:bf74a108c67947a2c72d16786338b89d75119a48</id>
<content type='text'>
commit 8dc23658b7aaa8b6b0609c81c8ad75e98b612801 upstream.

dm_resume() will return success (0) rather than -EINVAL if
!dm_suspended_md() upon retry within dm_resume().

Reset the error code at the start of dm_resume()'s retry loop.
Also, remove a useless assignment at the end of dm_resume().
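
The control flow can be sketched in Python as below.  The `passes`
argument is a hypothetical way to script what each trip through the
retry loop observes; it is not part of the kernel interface.

```python
import errno


def dm_resume(passes):
    # `passes` is a hypothetical list of (suspended, suspended_internally)
    # pairs, one per trip through the retry loop.
    r = -errno.EINVAL
    for suspended, suspended_internally in passes:
        r = -errno.EINVAL      # the fix: reset the error code on every pass
        if not suspended:
            return r           # not suspended, so resume is invalid
        if suspended_internally:
            r = 0              # models a successful wait before retrying
            continue
        return 0               # resumed normally
    return r


# First pass waits on an internal suspend; by the second pass the device
# is no longer suspended, so the call must fail rather than return 0.
result = dm_resume([(True, True), (False, False)])
print(result == -errno.EINVAL)
```

Without the reset inside the loop, the 0 left over from the wait would
be returned on the second pass.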

Fixes: ffcc393641 ("dm: enhance internal suspend and resume interface")
Signed-off-by: Minfei Huang &lt;mnghuan@gmail.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: mark request_queue dead before destroying the DM device</title>
<updated>2016-10-28T07:01:27+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@sandisk.com</email>
</author>
<published>2016-08-31T22:17:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=90be7f1538fb0ab22582f018e42115f18315eb8d'/>
<id>urn:sha1:90be7f1538fb0ab22582f018e42115f18315eb8d</id>
<content type='text'>
commit 3b785fbcf81c3533772c52b717f77293099498d3 upstream.

This prevents new requests from being queued while __dm_destroy() is in
progress.

Signed-off-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: set DMF_SUSPENDED* _before_ clearing DMF_NOFLUSH_SUSPENDING</title>
<updated>2016-08-20T16:09:19+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2016-08-02T17:07:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fb76628b66f88b6c8206fa906f524362869b5c03'/>
<id>urn:sha1:fb76628b66f88b6c8206fa906f524362869b5c03</id>
<content type='text'>
commit eaf9a7361f47727b166688a9f2096854eef60fbe upstream.

Otherwise, there is potential for both DMF_SUSPENDED* and
DMF_NOFLUSH_SUSPENDING to not be set during dm_suspend() -- which is
definitely _not_ a valid state.

This fix, in conjunction with "dm rq: fix the starting and stopping of
blk-mq queues", addresses the potential for request-based DM multipath's
__multipath_map() to see !dm_noflush_suspending() during suspend.

Reported-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request()</title>
<updated>2016-04-12T16:08:40+00:00</updated>
<author>
<name>Bryn M. Reeves</name>
<email>bmr@redhat.com</email>
</author>
<published>2016-03-14T21:04:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8907d8a6fd3f21992283efd67002aea719396f2b'/>
<id>urn:sha1:8907d8a6fd3f21992283efd67002aea719396f2b</id>
<content type='text'>
commit 98dbc9c6c61698792e3a66f32f3bf066201d42d7 upstream.

An "old" (.request_fn) DM 'struct request' stores a pointer to the
associated 'struct dm_rq_target_io' in rq-&gt;special.

dm_requeue_original_request(), previously named
dm_requeue_unmapped_original_request(), called dm_unprep_request() to
reset rq-&gt;special to NULL.  But rq_end_stats() would go on to hit a NULL
pointer dereference because its call to tio_from_request() returned NULL.

Fix this by calling rq_end_stats() _before_ dm_unprep_request().

Signed-off-by: Bryn M. Reeves &lt;bmr@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Fixes: e262f34741 ("dm stats: add support for request-based DM devices")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: fix excessive dm-mq context switching</title>
<updated>2016-04-12T16:08:40+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2016-02-05T13:49:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5504a47088034573d0839120751b1aec46204aab'/>
<id>urn:sha1:5504a47088034573d0839120751b1aec46204aab</id>
<content type='text'>
commit 6acfe68bac7e6f16dc312157b1fa6e2368985013 upstream.

Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly.  One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true.  This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request.  The ftrace
function_graph tracer showed:

  kworker-2013  =&gt;   fio-12190
  fio-12190    =&gt;  kworker-2013
  ...
  kworker-2013  =&gt;   fio-12190
  fio-12190    =&gt;  kworker-2013
  ...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1)  don't blk_mq_run_hw_queues in blk-mq request completion

    Motivated by desire to reduce overhead of dm-mq, punting to kblockd
    just increases context switches.

    In my testing against a really fast null_blk device there was no benefit
    to running blk_mq_run_hw_queues() on completion (and no other blk-mq
    driver does this).  So hopefully this change doesn't induce the need for
    yet another revert like commit 621739b00e16ca2d !

2)  use blk_mq_complete_request() in dm_complete_request()

    blk_complete_request() doesn't offer the traditional q-&gt;mq_ops vs
    .request_fn branching pattern that other historic block interfaces
    do (e.g. blk_get_request).  Using blk_mq_complete_request() for
    blk-mq requests is important for performance.  It should be noted
    that, like blk_complete_request(), blk_mq_complete_request() doesn't
    natively handle partial completions -- but the request-based
    DM-multipath target does provide the required partial completion
    support by dm.c:end_clone_bio() triggering requeueing of the request
    via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Reported-by: Sagi Grimberg &lt;sagig@dev.mellanox.co.il&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
