<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block, branch v4.20.9</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.20.9</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.20.9'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-01-22T20:09:57+00:00</updated>
<entry>
<title>block: use rcu_work instead of call_rcu to avoid sleep in softirq</title>
<updated>2019-01-22T20:09:57+00:00</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2018-11-28T08:42:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f3631a8b2fcb27648453dce4bec25123160095fa'/>
<id>urn:sha1:f3631a8b2fcb27648453dce4bec25123160095fa</id>
<content type='text'>
commit 94a2c3a32b62e868dc1e3d854326745a7f1b8c7a upstream.

We recently got a stack by syzkaller like this:

BUG: sleeping function called from invalid context at mm/slab.h:361
in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
INFO: lockdep is turned off.
CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
 0000000000000000 0000000000000001 0000000000000004 0000000000000000
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff81cb2194&gt;] __dump_stack lib/dump_stack.c:15 [inline]
 &lt;IRQ&gt;  [&lt;ffffffff81cb2194&gt;] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
 [&lt;ffffffff8129a981&gt;] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
 [&lt;ffffffff8129ac33&gt;] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
 [&lt;ffffffff81794c13&gt;] slab_pre_alloc_hook mm/slab.h:361 [inline]
 [&lt;ffffffff81794c13&gt;] slab_alloc_node mm/slub.c:2610 [inline]
 [&lt;ffffffff81794c13&gt;] slab_alloc mm/slub.c:2692 [inline]
 [&lt;ffffffff81794c13&gt;] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
 [&lt;ffffffff81cbe9a7&gt;] kmalloc include/linux/slab.h:479 [inline]
 [&lt;ffffffff81cbe9a7&gt;] kzalloc include/linux/slab.h:623 [inline]
 [&lt;ffffffff81cbe9a7&gt;] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
 [&lt;ffffffff81cbf84f&gt;] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
 [&lt;ffffffff81cbb5b9&gt;] kobject_cleanup lib/kobject.c:633 [inline]
 [&lt;ffffffff81cbb5b9&gt;] kobject_release+0x229/0x440 lib/kobject.c:675
 [&lt;ffffffff81cbb0a2&gt;] kref_sub include/linux/kref.h:73 [inline]
 [&lt;ffffffff81cbb0a2&gt;] kref_put include/linux/kref.h:98 [inline]
 [&lt;ffffffff81cbb0a2&gt;] kobject_put+0x72/0xd0 lib/kobject.c:692
 [&lt;ffffffff8216f095&gt;] put_device+0x25/0x30 drivers/base/core.c:1237
 [&lt;ffffffff81c4cc34&gt;] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
 [&lt;ffffffff813c08bc&gt;] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [&lt;ffffffff813c08bc&gt;] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
 [&lt;ffffffff813c08bc&gt;] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
 [&lt;ffffffff813c08bc&gt;] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
 [&lt;ffffffff813c08bc&gt;] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
 [&lt;ffffffff8120f509&gt;] __do_softirq+0x299/0xe20 kernel/softirq.c:273
 [&lt;ffffffff81210496&gt;] invoke_softirq kernel/softirq.c:350 [inline]
 [&lt;ffffffff81210496&gt;] irq_exit+0x216/0x2c0 kernel/softirq.c:391
 [&lt;ffffffff82c2cd7b&gt;] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
 [&lt;ffffffff82c2cd7b&gt;] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
 [&lt;ffffffff82c2bc25&gt;] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
 &lt;EOI&gt;  [&lt;ffffffff814cbf40&gt;] ? audit_kill_trees+0x180/0x180
 [&lt;ffffffff8187d2f7&gt;] fd_install+0x57/0x80 fs/file.c:626
 [&lt;ffffffff8180989e&gt;] do_sys_open+0x45e/0x550 fs/open.c:1043
 [&lt;ffffffff818099c2&gt;] SYSC_open fs/open.c:1055 [inline]
 [&lt;ffffffff818099c2&gt;] SyS_open+0x32/0x40 fs/open.c:1050
 [&lt;ffffffff82c299e1&gt;] entry_SYSCALL_64_fastpath+0x1e/0x9a

In softirq context, we call rcu callback function delete_partition_rcu_cb(),
which may allocate memory by kzalloc with GFP_KERNEL flag. If the
allocation cannot be satisfied, it may sleep. However, That is not allowed
in softirq contex.

Although we found this problem on linux 4.4, the latest kernel version
seems to have this problem as well. And it is very similar to the
previous one:
	https://lkml.org/lkml/2018/7/9/391

Fix it by using RCU workqueue, which allows sleep.

Reviewed-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: mq-deadline: Fix write completion handling</title>
<updated>2019-01-13T08:24:05+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>damien.lemoal@wdc.com</email>
</author>
<published>2018-12-17T06:14:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d902258a8997d9e1007c41da5205a831756858c7'/>
<id>urn:sha1:d902258a8997d9e1007c41da5205a831756858c7</id>
<content type='text'>
commit 7211aef86f79583e59b88a0aba0bc830566f7e8e upstream.

For a zoned block device using mq-deadline, if a write request for a
zone is received while another write was already dispatched for the same
zone, dd_dispatch_request() will return NULL and the newly inserted
write request is kept in the scheduler queue waiting for the ongoing
zone write to complete. With this behavior, when no other request has
been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to
__blk_mq_free_request() call of blk_mq_sched_restart() to not run the
queue when the already dispatched write request completes. The newly
dispatched request stays stuck in the scheduler queue until eventually
another request is submitted.

This problem does not affect SCSI disk as the SCSI stack handles queue
restart on request completion. However, this problem is can be triggered
the nullblk driver with zoned mode enabled.

Fix this by always requesting a queue restart in dd_dispatch_request()
if no request was dispatched while WRITE requests are queued.

Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

Add missing export of blk_mq_sched_restart()

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;

</content>
</entry>
<entry>
<title>block: deactivate blk_stat timer in wbt_disable_default()</title>
<updated>2019-01-13T08:24:05+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2018-12-12T11:44:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7571b18bcad56132570f86695b641b56f403991e'/>
<id>urn:sha1:7571b18bcad56132570f86695b641b56f403991e</id>
<content type='text'>
commit 544fbd16a461a318cd80537d1331c0df5c6cf930 upstream.

rwb_enabled() can't be changed when there is any inflight IO.

wbt_disable_default() may set rwb-&gt;wb_normal as zero, however the
blk_stat timer may still be pending, and the timer function will update
wrb-&gt;wb_normal again.

This patch introduces blk_stat_deactivate() and applies it in
wbt_disable_default(), then the following IO hang triggered when running
parted &amp; switching io scheduler can be fixed:

[  369.937806] INFO: task parted:3645 blocked for more than 120 seconds.
[  369.938941]       Not tainted 4.20.0-rc6-00284-g906c801e5248 #498
[  369.939797] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.940768] parted          D    0  3645   3239 0x00000000
[  369.941500] Call Trace:
[  369.941874]  ? __schedule+0x6d9/0x74c
[  369.942392]  ? wbt_done+0x5e/0x5e
[  369.942864]  ? wbt_cleanup_cb+0x16/0x16
[  369.943404]  ? wbt_done+0x5e/0x5e
[  369.943874]  schedule+0x67/0x78
[  369.944298]  io_schedule+0x12/0x33
[  369.944771]  rq_qos_wait+0xb5/0x119
[  369.945193]  ? karma_partition+0x1c2/0x1c2
[  369.945691]  ? wbt_cleanup_cb+0x16/0x16
[  369.946151]  wbt_wait+0x85/0xb6
[  369.946540]  __rq_qos_throttle+0x23/0x2f
[  369.947014]  blk_mq_make_request+0xe6/0x40a
[  369.947518]  generic_make_request+0x192/0x2fe
[  369.948042]  ? submit_bio+0x103/0x11f
[  369.948486]  ? __radix_tree_lookup+0x35/0xb5
[  369.949011]  submit_bio+0x103/0x11f
[  369.949436]  ? blkg_lookup_slowpath+0x25/0x44
[  369.949962]  submit_bio_wait+0x53/0x7f
[  369.950469]  blkdev_issue_flush+0x8a/0xae
[  369.951032]  blkdev_fsync+0x2f/0x3a
[  369.951502]  do_fsync+0x2e/0x47
[  369.951887]  __x64_sys_fsync+0x10/0x13
[  369.952374]  do_syscall_64+0x89/0x149
[  369.952819]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  369.953492] RIP: 0033:0x7f95a1e729d4
[  369.953996] Code: Bad RIP value.
[  369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[  369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4
[  369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004
[  369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0
[  369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380
[  369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008

Cc: stable@vger.kernel.org
Cc: Paolo Valente &lt;paolo.valente@linaro.org&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: Fix null_blk_zoned creation failure with small number of zones</title>
<updated>2018-12-11T23:19:38+00:00</updated>
<author>
<name>Shin'ichiro Kawasaki</name>
<email>shinichiro.kawasaki@wdc.com</email>
</author>
<published>2018-12-11T12:08:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=927b6b2d69b4cc900fa50d7e46d8f1fa91c91b3a'/>
<id>urn:sha1:927b6b2d69b4cc900fa50d7e46d8f1fa91c91b3a</id>
<content type='text'>
null_blk_zoned creation fails if the number of zones specified is equal to or is
smaller than 64 due to a memory allocation failure in blk_alloc_zones(). With
such a small number of zones, the required memory size for all zones descriptors
fits in a single page, and the page order for alloc_pages_node() is zero. Allow
this value in blk_alloc_zones() for the allocation to succeed.

Fixes: bf5054569653 "block: Introduce blk_revalidate_disk_zones()"
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Shin'ichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block/bio: Do not zero user pages</title>
<updated>2018-12-10T20:37:20+00:00</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2018-12-10T15:44:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f55adad601c6a97c8c9628195453e0fb23b4a0ae'/>
<id>urn:sha1:f55adad601c6a97c8c9628195453e0fb23b4a0ae</id>
<content type='text'>
We don't need to zero fill the bio if not using kernel allocated pages.

Fixes: f3587d76da05 ("block: Clear kernel memory before copying to user") # v4.20-rc2
Reported-by: Todd Aiken &lt;taiken@mvtech.ca&gt;
Cc: Laurence Oberman &lt;loberman@redhat.com&gt;
Cc: stable@vger.kernel.org
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Tested-by: Laurence Oberman &lt;loberman@redhat.com&gt;
Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: punt failed direct issue to dispatch list</title>
<updated>2018-12-07T15:16:11+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-12-07T05:17:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c616cbee97aed4bc6178f148a7240206dcdb85a6'/>
<id>urn:sha1:c616cbee97aed4bc6178f148a7240206dcdb85a6</id>
<content type='text'>
After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.

Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
-&gt;queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.

This basically reverts ffe81d45322c and is based on a patch from Ming,
but with the list insert case covered as well.

Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
Cc: stable@vger.kernel.org
Suggested-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reported-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Tested-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block, bfq: fix decrement of num_active_groups</title>
<updated>2018-12-07T14:40:07+00:00</updated>
<author>
<name>Paolo Valente</name>
<email>paolo.valente@linaro.org</email>
</author>
<published>2018-12-06T18:18:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ba7aeae5539c7a7cccc4cf07a2bc61281a93c50e'/>
<id>urn:sha1:ba7aeae5539c7a7cccc4cf07a2bc61281a93c50e</id>
<content type='text'>
Since commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios
detection")', if there are process groups with I/O requests waiting for
completion, then BFQ tags the scenario as 'asymmetric'. This detection
is needed for preserving service guarantees (for details, see comments
on the computation * of the variable asymmetric_scenario in the
function bfq_better_to_idle).

Unfortunately, commit '2d29c9f89fcd ("block, bfq: improve asymmetric
scenarios detection")' contains an error exactly in the updating of
the number of groups with I/O requests waiting for completion: if a
group has more than one descendant process, then the above number of
groups, which is renamed from num_active_groups to a more appropriate
num_groups_with_pending_reqs by this commit, may happen to be wrongly
decremented multiple times, namely every time one of the descendant
processes gets all its pending I/O requests completed.

A correct, complete solution should work as follows. Consider a group
that is inactive, i.e., that has no descendant process with pending
I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
is still accounting for this group, because the group still has some
descendant process with some I/O request still in
flight. num_groups_with_pending_reqs should be decremented when the
in-flight request of the last descendant process is finally completed
(assuming that nothing else has changed for the group in the meantime,
in terms of composition of the group and active/inactive state of
child groups and processes). To accomplish this, an additional
pending-request counter must be added to entities, and must be
updated correctly.

To avoid this additional field and operations, this commit resorts to
the following tradeoff between simplicity and accuracy: for an
inactive group that is still counted in num_groups_with_pending_reqs,
this commit decrements num_groups_with_pending_reqs when the first
descendant process of the group remains with no request waiting for
completion.

This simplified scheme provides a fix to the unbalanced decrements
introduced by 2d29c9f89fcd. Since this error was also caused by lack
of comments on this non-trivial issue, this commit also adds related
comments.

Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")
Reported-by: Steven Barrett &lt;steven@liquorix.net&gt;
Tested-by: Steven Barrett &lt;steven@liquorix.net&gt;
Tested-by: Lucjan Lucjanov &lt;lucjan.lucjanov@gmail.com&gt;
Reviewed-by: Federico Motta &lt;federico@willer.it&gt;
Signed-off-by: Paolo Valente &lt;paolo.valente@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix corruption with direct issue</title>
<updated>2018-12-05T03:06:48+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-12-05T03:06:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ffe81d45322cc3cb140f0db080a4727ea284661e'/>
<id>urn:sha1:ffe81d45322cc3cb140f0db080a4727ea284661e</id>
<content type='text'>
If we attempt a direct issue to a SCSI device, and it returns BUSY, then
we queue the request up normally. However, the SCSI layer may have
already setup SG tables etc for this particular command. If we later
merge with this request, then the old tables are no longer valid. Once
we issue the IO, we only read/write the original part of the request,
not the new state of it.

This causes data corruption, and is most often noticed with the file
system complaining about the just read data being invalid:

[  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)

because most of it is garbage...

This doesn't happen from the normal issue path, as we will simply defer
the request to the hardware queue dispatch list if we fail. Once it's on
the dispatch list, we never merge with it.

Fix this from the direct issue path by flagging the request as
REQ_NOMERGE so we don't change the size of it before issue.

See also:
  https://bugzilla.kernel.org/show_bug.cgi?id=201685

Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: fix single range discard merge</title>
<updated>2018-11-30T17:07:57+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2018-11-30T16:38:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a5cf35cd6c56b2924bce103413ad3381bdc31fa'/>
<id>urn:sha1:2a5cf35cd6c56b2924bce103413ad3381bdc31fa</id>
<content type='text'>
There are actually two kinds of discard merge:

- one is the normal discard merge, just like normal read/write request,
and call it single-range discard

- another is the multi-range discard, queue_max_discard_segments(rq-&gt;q) &gt; 1

For the former case, queue_max_discard_segments(rq-&gt;q) is 1, and we
should handle this kind of discard merge like the normal read/write
request.

This patch fixes the following kernel panic issue[1], which is caused by
not removing the single-range discard request from elevator queue.

Guangwu has one raid discard test case, in which this issue is a bit
easier to trigger, and I verified that this patch can fix the kernel
panic issue in Guangwu's test case.

[1] kernel panic log from Jens's report

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
 PGD 0 P4D 0.
 Oops: 0000 [#1] SMP PTI
 CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \
4.20.0-rc3-00649-ge64d9a554a91-dirty #14  Hardware name: Wiwynn \
Leopard-Orv2/Leopard-DDR BW, BIOS LBM08   03/03/2017       Workqueue: kblockd \
blk_mq_run_work_fn                                            RIP: \
0010:blk_mq_get_driver_tag+0x81/0x120                                       Code: 24 \
10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \
0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 &lt;48&gt; 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \
f6 87 b0 00 00 00 02  RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246                     \
  RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
 RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
 R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
 R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
 FS:  0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  blk_mq_dispatch_rq_list+0xec/0x480
  ? elv_rb_del+0x11/0x30
  blk_mq_do_dispatch_sched+0x6e/0xf0
  blk_mq_sched_dispatch_requests+0xfa/0x170
  __blk_mq_run_hw_queue+0x5f/0xe0
  process_one_work+0x154/0x350
  worker_thread+0x46/0x3c0
  kthread+0xf5/0x130
  ? process_one_work+0x350/0x350
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x1f/0x30
 Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \
kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \
cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \
button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \
nvme_core fuse sg loop efivarfs autofs4  CR2: 0000000000000148                        \

 ---[ end trace 340a1fb996df1b9b ]---
 RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
 Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \
00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 &lt;48&gt; 8b 87 48 01 00 00 8b 40 04 39 43 \
20 72 37 f6 87 b0 00 00 00 02

Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached")
Reported-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Guangwu Zhang &lt;guazhang@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Jianchao Wang &lt;jianchao.w.wang@oracle.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>SCSI: fix queue cleanup race before queue initialization is done</title>
<updated>2018-11-14T15:19:10+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2018-11-14T08:25:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a'/>
<id>urn:sha1:8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a</id>
<content type='text'>
c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has
already fixed this race, however the implied synchronize_rcu()
in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
performance regression.

Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
tried to quiesce queue for avoiding unnecessary synchronize_rcu()
only when queue initialization is done, because it is usual to see
lots of inexistent LUNs which need to be probed.

However, turns out it isn't safe to quiesce queue only when queue
initialization is done. Because when one SCSI command is completed,
the user of sending command can be waken up immediately, then the
scsi device may be removed, meantime the run queue in scsi_end_request()
is still in-progress, so kernel panic can be caused.

In Red Hat QE lab, there are several reports about this kind of kernel
panic triggered during kernel booting.

This patch tries to address the issue by grabing one queue usage
counter during freeing one request and the following run queue.

Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
Cc: Andrew Jones &lt;drjones@redhat.com&gt;
Cc: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: James E.J. Bottomley &lt;jejb@linux.vnet.ibm.com&gt;
Cc: stable &lt;stable@vger.kernel.org&gt;
Cc: jianchao.wang &lt;jianchao.w.wang@oracle.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
