kernel/linux.git/block, branch v4.0.7

blk-mq: free hctx->ctxs in queue's release handler

2015-06-23T00:03:36+00:00

commit c3b4afca7023b5aa0531912364246e67f79b3010 upstream. Now blk_cleanup_queue() can be called before calling del_gendisk()[1], inside which hctx->ctxs is touched from blk_mq_unregister_hctx(), but the variable has been freed by blk_cleanup_queue() at that time. So this patch moves freeing of hctx->ctxs into queue's release handler for fixing the oops reported by Stefan. [1], 6cd18e711dd8075 (block: destroy bdi before blockdev is unregistered) Reported-by: Stefan Seyfried Cc: NeilBrown Cc: Christoph Hellwig Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

block: discard bdi_unregister() in favour of bdi_destroy()

2015-06-23T00:03:29+00:00

commit aad653a0bc09dd4ebcb5579f9f835bbae9ef2ba3 upstream. bdi_unregister() now contains very little functionality. It contains a "WARN_ON" if bdi->dev is NULL. This warning is of no real consequence as bdi->dev isn't needed by anything else in the function, and it triggers if blk_cleanup_queue() -> bdi_destroy() is called before bdi_unregister, which happens since Commit: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.") So this isn't wanted. It also calls bdi_set_min_ratio(). This needs to be called after writes through the bdi have all been flushed, and before the bdi is destroyed. Calling it early is better than calling it late as it frees up a global resource. Calling it immediately after bdi_wb_shutdown() in bdi_destroy() perfectly fits these requirements. So bdi_unregister() can be discarded with the important content moved to bdi_destroy(), as can the writeback_bdi_unregister event which is already not used. Reported-by: Mike Snitzer Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info") Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.") Acked-by: Peter Zijlstra (Intel) Acked-by: Dan Williams Tested-by: Nicholas Moulin Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

block: fix ext_dev_lock lockdep report

2015-06-23T00:03:28+00:00

commit 4d66e5e9b6d720d8463e11d027bd4ad91c8b1318 upstream. ================================= [ INFO: inconsistent lock state ] 4.1.0-rc7+ #217 Tainted: G O --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (ext_devt_lock){+.?...}, at: [] blk_free_devt+0x3c/0x70 {SOFTIRQ-ON-W} state was registered at: [] __lock_acquire+0x461/0x1e70 [] lock_acquire+0xb7/0x290 [] _raw_spin_lock+0x38/0x50 [] blk_alloc_devt+0x6d/0xd0 <-- take the lock in process context [..] [] __lock_acquire+0x3fe/0x1e70 [] ? __lock_acquire+0xe5d/0x1e70 [] lock_acquire+0xb7/0x290 [] ? blk_free_devt+0x3c/0x70 [] _raw_spin_lock+0x38/0x50 [] ? blk_free_devt+0x3c/0x70 [] blk_free_devt+0x3c/0x70 <-- take the lock in softirq [] part_release+0x1c/0x50 [] device_release+0x36/0xb0 [] kobject_cleanup+0x7b/0x1a0 [] kobject_put+0x30/0x70 [] put_device+0x17/0x20 [] delete_partition_rcu_cb+0x16c/0x180 [] ? read_dev_sector+0xa0/0xa0 [] rcu_process_callbacks+0x2ff/0xa90 [] ? rcu_process_callbacks+0x2bf/0xa90 [] __do_softirq+0xde/0x600 Neil sees this in his tests and it also triggers on pmem driver unbind for the libnvdimm tests. This fix is on top of an initial fix by Keith for incorrect usage of mutex_lock() in this path: 2da78092dda1 "block: Fix dev_t minor allocation lifetime". Both this and 2da78092dda1 are candidates for -stable. Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime") Cc: Keith Busch Reported-by: NeilBrown Signed-off-by: Dan Williams Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: fix CPU hotplug handling

2015-05-17T16:55:07+00:00

commit 2a34c0872adf252f23a6fef2d051a169ac796cef upstream. hctx->tags has to be set as NULL in case that it is to be unmapped no matter if set->tags[hctx->queue_num] is NULL or not in blk_mq_map_swqueue() because shared tags can be freed already from another request queue. The same situation has to be considered during handling CPU online too. Unmapped hw queue can be remapped after CPU topo is changed, so we need to allocate tags for the hw queue in blk_mq_map_swqueue(). Then tags allocation for hw queue can be removed in hctx cpu online notifier, and it is reasonable to do that after mapping is updated. Reported-by: Dongsu Park Tested-by: Dongsu Park Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: fix race between timeout and CPU hotplug

2015-05-17T16:55:07+00:00

commit f054b56c951bf1731ba7314a4c7f1cc0b2977cc9 upstream. Firstly during CPU hotplug, even queue is freezed, timeout handler still may come and access hctx->tags, which may cause use after free, so this patch deactivates timeout handler inside CPU hotplug notifier. Secondly, tags can be shared by more than one queues, so we have to check if the hctx has been unmapped, otherwise still use-after-free on tags can be triggered. Reported-by: Dongsu Park Tested-by: Dongsu Park Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

block: destroy bdi before blockdev is unregistered.

2015-05-17T16:55:07+00:00

commit 6cd18e711dd8075da9d78cfc1239f912ff28968a upstream. Because of the peculiar way that md devices are created (automatically when the device node is opened), a new device can be created and registered immediately after the blk_unregister_region(disk_devt(disk), disk->minors); call in del_gendisk(). Therefore it is important that all visible artifacts of the previous device are removed before this call. In particular, the 'bdi'. Since: commit c4db59d31e39ea067c32163ac961e9c80198fd37 Author: Christoph Hellwig fs: don't reassign dirty inodes to default_backing_dev_info moved the device_unregister(bdi->dev); call from bdi_unregister() to bdi_destroy() it has been quite easy to lose a race and have a new (e.g.) "md127" be created after the blk_unregister_region() call and before bdi_destroy() is ultimately called by the final 'put_disk', which must come after del_gendisk(). The new device finds that the bdi name is already registered in sysfs and complains > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70() > [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127' We can fix this by moving the bdi_destroy() call out of blk_release_queue() (which can happen very late when a refcount reaches zero) and into blk_cleanup_queue() - which happens exactly when the md device driver calls it. Then it is only necessary for md to call blk_cleanup_queue() before del_gendisk(). As loop.c devices are also created on demand by opening the device node, we make the same change there. Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37 Reported-by: Azat Khuzhin Cc: Christoph Hellwig Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: initialize 'struct request' and associated data to zero

2015-04-11T20:42:16+00:00

Jan Engelhardt reports a strange oops with an invalid ->sense_buffer pointer in scsi_init_cmd_errh() with the blk-mq code. The sense_buffer pointer should have been initialized by the call to scsi_init_request() from blk_mq_init_rq_map(), but there seems to be some non-repeatable memory corruptor. This patch makes sure we initialize the whole struct request allocation (and the associated 'struct scsi_cmnd' for the SCSI case) to zero, by using __GFP_ZERO in the allocation. The old code initialized a couple of individual fields, leaving the rest undefined (although many of them are then initialized in later phases, like blk_mq_rq_ctx_init() etc. It's not entirely clear why this matters, but it's the rigth thing to do regardless, and with 4.0 imminent this is the defensive "let's just make sure everything is initialized properly" patch. Tested-by: Jan Engelhardt Acked-by: Jens Axboe Signed-off-by: Linus Torvalds

block: fix blk_stack_limits() regression due to lcm() change

2015-03-31T15:45:50+00:00

Linux 3.19 commit 69c953c ("lib/lcm.c: lcm(n,0)=lcm(0,n) is 0, not n") caused blk_stack_limits() to not properly stack queue_limits for stacked devices (e.g. DM). Fix this regression by establishing lcm_not_zero() and switching blk_stack_limits() over to using it. DM uses blk_set_stacking_limits() to establish the initial top-level queue_limits that are then built up based on underlying devices' limits using blk_stack_limits(). In the case of optimal_io_size (io_opt) blk_set_stacking_limits() establishes a default value of 0. With commit 69c953c, lcm(0, n) is no longer n, which compromises proper stacking of the underlying devices' io_opt. Test: $ modprobe scsi_debug dev_size_mb=10 num_tgts=1 opt_blks=1536 $ cat /sys/block/sde/queue/optimal_io_size 786432 $ dmsetup create node --table "0 100 linear /dev/sde 0" Before this fix: $ cat /sys/block/dm-5/queue/optimal_io_size 0 After this fix: $ cat /sys/block/dm-5/queue/optimal_io_size 786432 Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org # 3.19+ Acked-by: Martin K. Petersen Signed-off-by: Jens Axboe

Fix bug in blk_rq_merge_ok

2015-03-20T14:50:41+00:00

Use the right array index to reference the last element of rq->biotail->bi_io_vec[] Signed-off-by: Wenbo Wang Reviewed-by: Chong Yuan Fixes: 66cb45aa41315 ("block: add support for limiting gaps in SG lists") Cc: stable@kernel.org Signed-off-by: Jens Axboe

blkmq: Fix NULL pointer deref when all reserved tags in

2015-03-18T23:06:18+00:00

When allocating from the reserved tags pool, bt_get() is called with a NULL hctx. If all tags are in use, the hw queue is kicked to push out any pending IO, potentially freeing tags, and tag allocation is retried. The problem is that blk_mq_run_hw_queue() doesn't check for a NULL hctx. So we avoid it with a simple NULL hctx test. Tested by hammering mtip32xx with concurrent smartctl/hdparm. Signed-off-by: Sam Bradshaw Signed-off-by: Selvan Mani Fixes: b32232073e80 ("blk-mq: fix hang in bt_get()") Cc: stable@kernel.org Added appropriate comment. Signed-off-by: Jens Axboe