kernel/linux.git/block, branch v5.2.20

block: fix null pointer dereference in blk_mq_rq_timed_out()

2019-10-05T11:14:12+00:00

commit 8d6996630c03d7ceeabe2611378fea5ca1c3f1b3 upstream. We got a null pointer deference BUG_ON in blk_mq_rq_timed_out() as following: [ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040 [ 108.827059] PGD 0 P4D 0 [ 108.827313] Oops: 0000 [#1] SMP PTI [ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431 [ 108.829503] Workqueue: kblockd blk_mq_timeout_work [ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330 [ 108.838191] Call Trace: [ 108.838406] bt_iter+0x74/0x80 [ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450 [ 108.839074] ? __switch_to_asm+0x34/0x70 [ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40 [ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40 [ 108.840273] ? syscall_return_via_sysret+0xf/0x7f [ 108.840732] blk_mq_timeout_work+0x74/0x200 [ 108.841151] process_one_work+0x297/0x680 [ 108.841550] worker_thread+0x29c/0x6f0 [ 108.841926] ? rescuer_thread+0x580/0x580 [ 108.842344] kthread+0x16a/0x1a0 [ 108.842666] ? kthread_flush_work+0x170/0x170 [ 108.843100] ret_from_fork+0x35/0x40 The bug is caused by the race between timeout handle and completion for flush request. When timeout handle function blk_mq_rq_timed_out() try to read 'req->q->mq_ops', the 'req' have completed and reinitiated by next flush request, which would call blk_rq_init() to clear 'req' as 0. After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"), normal requests lifetime are protected by refcount. Until 'rq->ref' drop to zero, the request can really be free. Thus, these requests cannot been reused before timeout handle finish. However, flush request has defined .end_io and rq->end_io() is still called even if 'rq->ref' doesn't drop to zero. After that, the 'flush_rq' can be reused by the next flush request handle, resulting in null pointer deference BUG ON. We fix this problem by covering flush request with 'rq->ref'. If the refcount is not zero, flush_end_io() return and wait the last holder recall it. To record the request status, we add a new entry 'rq_status', which will be used in flush_end_io(). Cc: Christoph Hellwig Cc: Keith Busch Cc: Bart Van Assche Cc: stable@vger.kernel.org # v4.18+ Reviewed-by: Ming Lei Reviewed-by: Bob Liu Signed-off-by: Yufen Yu Signed-off-by: Greg Kroah-Hartman ------- v2: - move rq_status from struct request to struct blk_flush_queue v3: - remove unnecessary '{}' pair. v4: - let spinlock to protect 'fq->rq_status' v5: - move rq_status after flush_running_idx member of struct blk_flush_queue Signed-off-by: Jens Axboe

block: mq-deadline: Fix queue restart handling

2019-10-05T11:14:12+00:00

commit cb8acabbe33b110157955a7425ee876fb81e6bbc upstream. Commit 7211aef86f79 ("block: mq-deadline: Fix write completion handling") added a call to blk_mq_sched_mark_restart_hctx() in dd_dispatch_request() to make sure that write request dispatching does not stall when all target zones are locked. This fix left a subtle race when a write completion happens during a dispatch execution on another CPU: CPU 0: Dispatch CPU1: write completion dd_dispatch_request() lock(&dd->lock); ... lock(&dd->zone_lock); dd_finish_request() rq = find request lock(&dd->zone_lock); unlock(&dd->zone_lock); zone write unlock unlock(&dd->zone_lock); ... __blk_mq_free_request check restart flag (not set) -> queue not run ... if (!rq && have writes) blk_mq_sched_mark_restart_hctx() unlock(&dd->lock) Since the dispatch context finishes after the write request completion handling, marking the queue as needing a restart is not seen from __blk_mq_free_request() and blk_mq_sched_restart() not executed leading to the dispatch stall under 100% write workloads. Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from dd_dispatch_request() into dd_finish_request() under the zone lock to ensure full mutual exclusion between write request dispatch selection and zone unlock on write request completion. Fixes: 7211aef86f79 ("block: mq-deadline: Fix write completion handling") Cc: stable@vger.kernel.org Reported-by: Hans Holmberg Reviewed-by: Hans Holmberg Reviewed-by: Christoph Hellwig Signed-off-by: Damien Le Moal Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

block: make rq sector size accessible for block stats

2019-10-05T11:13:55+00:00

[ Upstream commit 3d24430694077313c75c6b89f618db09943621e4 ] Currently rq->data_len will be decreased by partial completion or zeroed by completion, so when blk_stat_add() is invoked, data_len will be zero and there will never be samples in poll_cb because blk_mq_poll_stats_bkt() will return -1 if data_len is zero. We could move blk_stat_add() back to __blk_mq_complete_request(), but that would make the effort of trying to call ktime_get_ns() once in vain. Instead we can reuse throtl_size field, and use it for both block stats and block throttle, and adjust the logic in blk_mq_poll_stats_bkt() accordingly. Fixes: 4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()") Tested-by: Pavel Begunkov Signed-off-by: Hou Tao Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-mq: Fix memory leak in blk_mq_init_allocated_queue error handling

2019-10-05T11:13:38+00:00

[ Upstream commit 73d9c8d4c0017e21e1ff519474ceb1450484dc9a ] If blk_mq_init_allocated_queue->elevator_init_mq fails, need to release the previously requested resources. Fixes: d34849913819 ("blk-mq-sched: allow setting of default IO scheduler") Signed-off-by: zhengbin Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block, bfq: handle NULL return value by bfq_init_rq()

2019-08-29T06:30:18+00:00

[ Upstream commit fd03177c33b287c6541f4048f1d67b7b45a1abc9 ] As reported in [1], the call bfq_init_rq(rq) may return NULL in case of OOM (in particular, if rq->elv.icq is NULL because memory allocation failed in failed in ioc_create_icq()). This commit handles this circumstance. [1] https://lkml.org/lkml/2019/7/22/824 Cc: Hsin-Yi Wang Cc: Nicolas Boichat Cc: Doug Anderson Reported-by: Guenter Roeck Reported-by: Hsin-Yi Wang Reviewed-by: Guenter Roeck Signed-off-by: Paolo Valente Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-mq: move cancel of requeue_work to the front of blk_exit_queue

2019-08-25T14:10:24+00:00

commit e26cc08265dda37d2acc8394604f220ef412299d upstream. blk_exit_queue will free elevator_data, while blk_mq_requeue_work will access it. Move cancel of requeue_work to the front of blk_exit_queue to avoid use-after-free. blk_exit_queue blk_mq_requeue_work __elevator_exit blk_mq_run_hw_queues blk_mq_exit_sched blk_mq_run_hw_queue dd_exit_queue blk_mq_hctx_has_pending kfree(elevator_data) blk_mq_sched_has_work dd_has_work Fixes: fbc2a15e3433 ("blk-mq: move cancel of requeue_work into blk_mq_release") Cc: stable@vger.kernel.org Reviewed-by: Ming Lei Signed-off-by: zhengbin Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

rq-qos: use a mb for got_token

2019-08-16T08:11:00+00:00

[ Upstream commit ac38297f7038cd5b80d66f8809c7bbf5b70031f3 ] Oleg noticed that our checking of data.got_token is unsafe in the cleanup case, and should really use a memory barrier. Use a wmb on the write side, and a rmb() on the read side. We don't need one in the main loop since we're saved by set_current_state(). Reviewed-by: Oleg Nesterov Signed-off-by: Josef Bacik Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule

2019-08-16T08:11:00+00:00

[ Upstream commit d14a9b389a86a5154b704bc88ce8dd37c701456a ] In case we get a spurious wakeup we need to make sure to re-set ourselves to TASK_UNINTERRUPTIBLE so we don't busy wait. Reviewed-by: Oleg Nesterov Signed-off-by: Josef Bacik Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

rq-qos: don't reset has_sleepers on spurious wakeups

2019-08-16T08:11:00+00:00

[ Upstream commit 64e7ea875ef63b2801be7954cf7257d1bfccc266 ] If we raced with somebody else getting an inflight counter we could fail to get an inflight counter with no sleepers on the list, and thus need to go to sleep. In this case has_sleepers should be true because we are now relying on the waker to get our inflight counter for us. And in the case of spurious wakeups we'd still want this to be the case. So set has_sleepers to true if we went to sleep to make sure we're woken up the proper way. Reviewed-by: Oleg Nesterov Signed-off-by: Josef Bacik Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block/bio-integrity: fix a memory leak bug

2019-07-31T05:24:53+00:00

[ Upstream commit e7bf90e5afe3aa1d1282c1635a49e17a32c4ecec ] In bio_integrity_prep(), a kernel buffer is allocated through kmalloc() to hold integrity metadata. Later on, the buffer will be attached to the bio structure through bio_integrity_add_page(), which returns the number of bytes of integrity metadata attached. Due to unexpected situations, bio_integrity_add_page() may return 0. As a result, bio_integrity_prep() needs to be terminated with 'false' returned to indicate this error. However, the allocated kernel buffer is not freed on this execution path, leading to a memory leak. To fix this issue, free the allocated buffer before returning from bio_integrity_prep(). Reviewed-by: Ming Lei Acked-by: Martin K. Petersen Signed-off-by: Wenwen Wang Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin