<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block/blk-core.c, branch v4.4.69</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.4.69</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.4.69'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2017-04-08T07:53:32+00:00</updated>
<entry>
<title>blk: Ensure users for current-&gt;bio_list can see the full list.</title>
<updated>2017-04-08T07:53:32+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-03-10T06:00:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5cca175b6cda16b68b18967210872327b1cadf4f'/>
<id>urn:sha1:5cca175b6cda16b68b18967210872327b1cadf4f</id>
<content type='text'>
commit f5fe1b51905df7cfe4fdfd85c5fb7bc5b71a094f upstream.

Commit 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
changed current-&gt;bio_list so that it did not contain *all* of the
queued bios, but only those submitted by the currently running
make_request_fn.

There are two places which walk the list and requeue selected bios,
and others that check if the list is empty.  These are no longer
correct.

So redefine current-&gt;bio_list to point to an array of two lists, which
contain all queued bios, and adjust various code to test or walk both
lists.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Fixes: 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
[jwang: backport to 4.4]
Signed-off-by: Jack Wang &lt;jinpu.wang@profitbricks.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
[bwh: Restore changes in device-mapper from upstream version]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
</content>
</entry>
<entry>
<title>blk: improve order of bio handling in generic_make_request()</title>
<updated>2017-04-08T07:53:32+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-03-07T20:38:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2cbd78f4239bd28b86c6ff8e3b7867db72762f1a'/>
<id>urn:sha1:2cbd78f4239bd28b86c6ff8e3b7867db72762f1a</id>
<content type='text'>
commit 79bd99596b7305ab08109a8bf44a6a4511dbf1cd upstream.

To avoid recursion on the kernel stack when stacked block devices
are in use, generic_make_request() will, when called recursively,
queue new requests for later handling.  They will be handled when the
make_request_fn for the current bio completes.

If any bios are submitted by a make_request_fn, these will ultimately
be handled seqeuntially.  If the handling of one of those generates
further requests, they will be added to the end of the queue.

This strict first-in-first-out behaviour can lead to deadlocks in
various ways, normally because a request might need to wait for a
previous request to the same device to complete.  This can happen when
they share a mempool, and can happen due to interdependencies
particular to the device.  Both md and dm have examples where this happens.

These deadlocks can be erradicated by more selective ordering of bios.
Specifically by handling them in depth-first order.  That is: when the
handling of one bio generates one or more further bios, they are
handled immediately after the parent, before any siblings of the
parent.  That way, when generic_make_request() calls make_request_fn
for some particular device, we can be certain that all previously
submited requests for that device have been completely handled and are
not waiting for anything in the queue of requests maintained in
generic_make_request().

An easy way to achieve this would be to use a last-in-first-out stack
instead of a queue.  However this will change the order of consecutive
bios submitted by a make_request_fn, which could have unexpected consequences.
Instead we take a slightly more complex approach.
A fresh queue is created for each call to a make_request_fn.  After it completes,
any bios for a different device are placed on the front of the main queue, followed
by any bios for the same device, followed by all bios that were already on
the queue before the make_request_fn was called.
This provides the depth-first approach without reordering bios on the same level.

This, by itself, it not enough to remove all deadlocks.  It just makes
it possible for drivers to take the extra step required themselves.

To avoid deadlocks, drivers must never risk waiting for a request
after submitting one to generic_make_request.  This includes never
allocing from a mempool twice in the one call to a make_request_fn.

A common pattern in drivers is to call bio_split() in a loop, handling
the first part and then looping around to possibly split the next part.
Instead, a driver that finds it needs to split a bio should queue
(with generic_make_request) the second part, handle the first part,
and then return.  The new code in generic_make_request will ensure the
requests to underlying bios are processed first, then the second bio
that was split off.  If it splits again, the same process happens.  In
each case one bio will be completely handled before the next one is attempted.

With this is place, it should be possible to disable the
punt_bios_to_recover() recovery thread for many block devices, and
eventually it may be possible to remove it completely.

Ref: http://www.spinics.net/lists/raid/msg54680.html
Tested-by: Jinpu Wang &lt;jinpu.wang@profitbricks.com&gt;
Inspired-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
[jwang: backport to 4.4]
Signed-off-by: Jack Wang &lt;jinpu.wang@profitbricks.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: Fix race triggered by blk_set_queue_dying()</title>
<updated>2016-09-15T06:27:51+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@sandisk.com</email>
</author>
<published>2016-08-16T23:48:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d3a6bd7b77ce9c9c19b37226ab0640a8918c4663'/>
<id>urn:sha1:d3a6bd7b77ce9c9c19b37226ab0640a8918c4663</id>
<content type='text'>
commit 1b856086813be9371929b6cc62045f9fd470f5a0 upstream.

blk_set_queue_dying() can be called while another thread is
submitting I/O or changing queue flags, e.g. through dm_stop_queue().
Hence protect the QUEUE_FLAG_DYING flag change with locking.

Signed-off-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: fix excessive dm-mq context switching</title>
<updated>2016-04-12T16:08:40+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2016-02-05T13:49:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5504a47088034573d0839120751b1aec46204aab'/>
<id>urn:sha1:5504a47088034573d0839120751b1aec46204aab</id>
<content type='text'>
commit 6acfe68bac7e6f16dc312157b1fa6e2368985013 upstream.

Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly.  One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true.  This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request.  The ftrace
function_graph tracer showed:

  kworker-2013  =&gt;   fio-12190
  fio-12190    =&gt;  kworker-2013
  ...
  kworker-2013  =&gt;   fio-12190
  fio-12190    =&gt;  kworker-2013
  ...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1)  don't blk_mq_run_hw_queues in blk-mq request completion

    Motivated by desire to reduce overhead of dm-mq, punting to kblockd
    just increases context switches.

    In my testing against a really fast null_blk device there was no benefit
    to running blk_mq_run_hw_queues() on completion (and no other blk-mq
    driver does this).  So hopefully this change doesn't induce the need for
    yet another revert like commit 621739b00e16ca2d !

2)  use blk_mq_complete_request() in dm_complete_request()

    blk_complete_request() doesn't offer the traditional q-&gt;mq_ops vs
    .request_fn branching pattern that other historic block interfaces
    do (e.g. blk_get_request).  Using blk_mq_complete_request() for
    blk-mq requests is important for performance.  It should be noted
    that, like blk_complete_request(), blk_mq_complete_request() doesn't
    natively handle partial completions -- but the request-based
    DM-multipath target does provide the required partial completion
    support by dm.c:end_clone_bio() triggering requeueing of the request
    via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Reported-by: Sagi Grimberg &lt;sagig@dev.mellanox.co.il&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: add blk_start_queue_async()</title>
<updated>2015-12-28T20:07:07+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2015-12-28T20:01:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=21491412f2ec6f13d4104de734dec0ba659d092e'/>
<id>urn:sha1:21491412f2ec6f13d4104de734dec0ba659d092e</id>
<content type='text'>
We currently only have an inline/sync helper to restart a stopped
queue. If drivers need an async version, they have to roll their
own. Add a generic helper instead.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: ensure to split after potentially bouncing a bio</title>
<updated>2015-12-22T17:26:53+00:00</updated>
<author>
<name>Junichi Nomura</name>
<email>j-nomura@ce.jp.nec.com</email>
</author>
<published>2015-12-22T17:23:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=23688bf4f830a89866fd0ed3501e342a7360fe4f'/>
<id>urn:sha1:23688bf4f830a89866fd0ed3501e342a7360fe4f</id>
<content type='text'>
blk_queue_bio() does split then bounce, which makes the segment
counting based on pages before bouncing and could go wrong. Move
the split to after bouncing, like we do for blk-mq, and the we
fix the issue of having the bio count for segments be wrong.

Fixes: 54efd50bfd87 ("block: make generic_make_request handle arbitrarily sized bios")
Cc: stable@vger.kernel.org
Tested-by: Artem S. Tashkinov &lt;t.artem@lycos.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>SCSI: Fix NULL pointer dereference in runtime PM</title>
<updated>2015-12-04T03:35:02+00:00</updated>
<author>
<name>Ken Xue</name>
<email>ken.xue@amd.com</email>
</author>
<published>2015-12-01T06:45:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4fd41a8552afc01054d9d9fc7f1a63c324867d27'/>
<id>urn:sha1:4fd41a8552afc01054d9d9fc7f1a63c324867d27</id>
<content type='text'>
The routines in scsi_pm.c assume that if a runtime-PM callback is
invoked for a SCSI device, it can only mean that the device's driver
has asked the block layer to handle the runtime power management (by
calling blk_pm_runtime_init(), which among other things sets q-&gt;dev).

However, this assumption turns out to be wrong for things like the ses
driver.  Normally ses devices are not allowed to do runtime PM, but
userspace can override this setting.  If this happens, the kernel gets
a NULL pointer dereference when blk_post_runtime_resume() tries to use
the uninitialized q-&gt;dev pointer.

This patch fixes the problem by checking q-&gt;dev in block layer before
handle runtime PM. Since ses doesn't define any PM callbacks and call
blk_pm_runtime_init(), the crash won't occur.

This fixes Bugzilla #101371.
https://bugzilla.kernel.org/show_bug.cgi?id=101371

More discussion can be found from below link.
http://marc.info/?l=linux-scsi&amp;m=144163730531875&amp;w=2

Signed-off-by: Ken Xue &lt;Ken.Xue@amd.com&gt;
Acked-by: Alan Stern &lt;stern@rowland.harvard.edu&gt;
Cc: Xiangliang Yu &lt;Xiangliang.Yu@amd.com&gt;
Cc: James E.J. Bottomley &lt;JBottomley@odin.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Michael Terry &lt;Michael.terry@canonical.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: Always check queue limits for cloned requests</title>
<updated>2015-11-29T21:37:27+00:00</updated>
<author>
<name>Hannes Reinecke</name>
<email>hare@suse.de</email>
</author>
<published>2015-11-26T07:46:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf4e6b4e757488dee1b6a581f49c7ac34cd217f8'/>
<id>urn:sha1:bf4e6b4e757488dee1b6a581f49c7ac34cd217f8</id>
<content type='text'>
When a cloned request is retried on other queues it always needs
to be checked against the queue limits of that queue.
Otherwise the calculations for nr_phys_segments might be wrong,
leading to a crash in scsi_init_sgtable().

To clarify this the patch renames blk_rq_check_limits()
to blk_cloned_rq_check_limits() and removes the symbol
export, as the new function should only be used for
cloned requests and never exported.

Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Ewan Milne &lt;emilne@redhat.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Fixes: e2a60da74 ("block: Clean up special command handling logic")
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: fix blk-core.c kernel-doc warning</title>
<updated>2015-11-11T16:36:57+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2015-10-31T01:36:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ccc2600b8a28f3eb0c126cd00312baba1c22cccb'/>
<id>urn:sha1:ccc2600b8a28f3eb0c126cd00312baba1c22cccb</id>
<content type='text'>
Fix kernel-doc warning in blk-core.c:

Warning(..//block/blk-core.c:1549): No description found for parameter 'same_queue_rq'

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block</title>
<updated>2015-11-11T01:23:49+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-11T01:23:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3419b45039c6b799c974a8019361c045e7ca232c'/>
<id>urn:sha1:3419b45039c6b799c974a8019361c045e7ca232c</id>
<content type='text'>
Pull block IO poll support from Jens Axboe:
 "Various groups have been doing experimentation around IO polling for
  (really) fast devices.  The code has been reviewed and has been
  sitting on the side for a few releases, but this is now good enough
  for coordinated benchmarking and further experimentation.

  Currently O_DIRECT sync read/write are supported.  A framework is in
  the works that allows scalable stats tracking so we can auto-tune
  this.  And we'll add libaio support as well soon.  Fow now, it's an
  opt-in feature for test purposes"

* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
  direct-io: be sure to assign dio-&gt;bio_bdev for both paths
  directio: add block polling support
  NVMe: add blk polling support
  block: add block polling support
  blk-mq: return tag/queue combo in the make_request_fn handlers
  block: change -&gt;make_request_fn() and users to return a queue cookie
</content>
</entry>
</feed>
