kernel/linux.git/block, branch v5.4.61

iocost: Fix check condition of iocg abs_vdebt

2020-08-19T06:15:58+00:00

[ Upstream commit d9012a59db54442d5b2fcfdfcded35cf566397d3 ] We shouldn't skip iocg when its abs_vdebt is not zero. Fixes: 0b80f9866e6b ("iocost: protect iocg->abs_vdebt with iocg->waitq.lock") Signed-off-by: Chengming Zhou Acked-by: Tejun Heo Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: fix get_max_segment_size() overflow on 32bit arch

2020-07-22T07:33:17+00:00

commit 4a2f704eb2d831a2d73d7f4cdd54f45c49c3c353 upstream. Commit 429120f3df2d starts to take account of segment's start dma address when computing max segment size, and data type of 'unsigned long' is used to do that. However, the segment mask may be 0xffffffff, so the figured out segment size may be overflowed in case of zero physical address on 32bit arch. Fix the issue by returning queue_max_segment_size() directly when that happens. Fixes: 429120f3df2d ("block: fix splitting segments on boundary masks") Reported-by: Guenter Roeck Tested-by: Guenter Roeck Cc: Christoph Hellwig Tested-by: Steven Rostedt (VMware) Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

block: fix splitting segments on boundary masks

2020-07-22T07:33:17+00:00

commit 429120f3df2dba2bf3a4a19f4212a53ecefc7102 upstream. We ran into a problem with a mpt3sas based controller, where we would see random (and hard to reproduce) file corruption). The issue seemed specific to this controller, but wasn't specific to the file system. After a lot of debugging, we find out that it's caused by segments spanning a 4G memory boundary. This shouldn't happen, as the default setting for segment boundary masks is 4G. Turns out there are two issues in get_max_segment_size(): 1) The default segment boundary mask is bypassed 2) The segment start address isn't taken into account when checking segment boundary limit Fix these two issues by removing the bypass of the segment boundary check even if the mask is set to the default value, and taking into account the actual start address of the request when checking if a segment needs splitting. Cc: stable@vger.kernel.org # v5.1+ Reviewed-by: Chris Mason Tested-by: Chris Mason Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: Ming Lei Signed-off-by: Greg Kroah-Hartman Dropped const on the page pointer, ppc page_to_phys() doesn't mark the page as const... Signed-off-by: Jens Axboe

blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flags

2020-07-22T07:32:52+00:00

[ Upstream commit bfe373f608cf81b7626dfeb904001b0e867c5110 ] Else there may be magic numbers in /sys/kernel/debug/block/*/state. Signed-off-by: Hou Tao Reviewed-by: Bart Van Assche Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-mq: consider non-idle request as "inflight" in blk_mq_rq_inflight()

2020-07-16T06:16:47+00:00

commit 05a4fed69ff00a8bd83538684cb602a4636b07a7 upstream. dm-multipath is the only user of blk_mq_queue_inflight(). When dm-multipath calls blk_mq_queue_inflight() to check if it has outstanding IO it can get a false negative. The reason for this is blk_mq_rq_inflight() doesn't consider requests that are no longer MQ_RQ_IN_FLIGHT but that are now MQ_RQ_COMPLETE (->complete isn't called or finished yet) as "inflight". This causes request-based dm-multipath's dm_wait_for_completion() to return before all outstanding dm-multipath requests have actually completed. This breaks DM multipath's suspend functionality because blk-mq requests complete after DM's suspend has finished -- which shouldn't happen. Fix this by considering any request not in the MQ_RQ_IDLE state (so either MQ_RQ_COMPLETE or MQ_RQ_IN_FLIGHT) as "inflight" in blk_mq_rq_inflight(). Fixes: 3c94d83cb3526 ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()") Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

block: release bip in a right way in error path

2020-07-16T06:16:36+00:00

[ Upstream commit 0b8eb629a700c0ef15a437758db8255f8444e76c ] Release bip using kfree() in error path when that was allocated by kmalloc(). Signed-off-by: Chengguang Xu Reviewed-by: Christoph Hellwig Acked-by: Martin K. Petersen Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: update hctx map when use multiple maps

2020-06-30T19:37:06+00:00

[ Upstream commit fe35ec58f0d339221643287bbb7cee15c93a5389 ] There is an issue when tune the number for read and write queues, if the total queue count was not changed. The hctx->type cannot be updated, since __blk_mq_update_nr_hw_queues will return directly if the total queue count has not been changed. Reproduce: dmesg | grep "default/read/poll" [ 2.607459] nvme nvme0: 48/0/0 default/read/poll queues cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c 48 default tune the write queues to 24: echo 24 > /sys/module/nvme/parameters/write_queues echo 1 > /sys/block/nvme0n1/device/reset_controller dmesg | grep "default/read/poll" [ 433.547235] nvme nvme0: 24/24/0 default/read/poll queues cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c 48 default The driver's hardware queue mapping is not same as block layer. Signed-off-by: Weiping Zhang Reviewed-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed

2020-06-30T19:36:43+00:00

commit a75ca9303175d36af93c0937dd9b1a6422908b8d upstream. commit e7bf90e5afe3 ("block/bio-integrity: fix a memory leak bug") added a kfree() for 'buf' if bio_integrity_add_page() returns '0'. However, the object will be freed in bio_integrity_free() since 'bio->bi_opf' and 'bio->bi_integrity' were set previousy in bio_integrity_alloc(). Fixes: commit e7bf90e5afe3 ("block/bio-integrity: fix a memory leak bug") Signed-off-by: yu kuai Reviewed-by: Ming Lei Reviewed-by: Bob Liu Acked-by: Martin K. Petersen Signed-off-by: Jens Axboe Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman

iocost: don't let vrate run wild while there's no saturation signal

2020-06-22T07:31:06+00:00

[ Upstream commit 81ca627a933063fa63a6d4c66425de822a2ab7f5 ] When the QoS targets are met and nothing is being throttled, there's no way to tell how saturated the underlying device is - it could be almost entirely idle, at the cusp of saturation or anywhere inbetween. Given that there's no information, it's best to keep vrate as-is in this state. Before 7cd806a9a953 ("iocost: improve nr_lagging handling"), this was the case - if the device isn't missing QoS targets and nothing is being throttled, busy_level was reset to zero. While fixing nr_lagging handling, 7cd806a9a953 ("iocost: improve nr_lagging handling") broke this. Now, while the device is hitting QoS targets and nothing is being throttled, vrate keeps getting adjusted according to the existing busy_level. This led to vrate keeping climing till it hits max when there's an IO issuer with limited request concurrency if the vrate started low. vrate starts getting adjusted upwards until the issuer can issue IOs w/o being throttled. From then on, QoS targets keeps getting met and nothing on the system needs throttling and vrate keeps getting increased due to the existing busy_level. This patch makes the following changes to the busy_level logic. * Reset busy_level if nr_shortages is zero to avoid the above scenario. * Make non-zero nr_lagging block lowering nr_level but still clear positive busy_level if there's clear non-saturation signal - QoS targets are met and nr_shortages is non-zero. nr_lagging's role is preventing adjusting vrate upwards while there are long-running commands and it shouldn't keep busy_level positive while there's clear non-saturation signal. * Restructure code for clarity and add comments. Signed-off-by: Tejun Heo Reported-by: Andy Newell Fixes: 7cd806a9a953 ("iocost: improve nr_lagging handling") Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: reset mapping if failed to update hardware queue count

2020-06-22T07:30:53+00:00

[ Upstream commit aa880ad690ab6d4c53934af85fb5a43e69ecb0f5 ] When we increase hardware queue count, blk_mq_update_queue_map will reset the mapping between cpu and hardware queue base on the hardware queue count(set->nr_hw_queues). The mapping cannot be reset if it encounters error in blk_mq_realloc_hw_ctxs, but the fallback flow will continue using it, then blk_mq_map_swqueue will touch a invalid memory, because the mapping points to a wrong hctx. blktest block/030: null_blk: module loaded Increasing nr_hw_queues to 8 fails, fallback to 1 ================================================================== BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x2f2/0x830 Read of size 8 at addr 0000000000000128 by task nproc/8541 CPU: 5 PID: 8541 Comm: nproc Not tainted 5.7.0-rc4-dbg+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0xa5/0xe6 __kasan_report.cold+0x65/0xbb kasan_report+0x45/0x60 check_memory_region+0x15e/0x1c0 __kasan_check_read+0x15/0x20 blk_mq_map_swqueue+0x2f2/0x830 __blk_mq_update_nr_hw_queues+0x3df/0x690 blk_mq_update_nr_hw_queues+0x32/0x50 nullb_device_submit_queues_store+0xde/0x160 [null_blk] configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x14b/0x2d0 ksys_write+0xdd/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x310 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Signed-off-by: Weiping Zhang Tested-by: Bart van Assche Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin