summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)AuthorFilesLines
2016-03-11dm thin metadata: don't issue prefetches if a transaction abort has failedJoe Thornber1-1/+4
If a transaction abort has failed then we can no longer use the metadata device. Typically this happens if the superblock is unreadable. This fix addresses a crash seen during metadata device failure testing. Fixes: 8a01a6af75 ("dm thin: prefetch missing metadata pages") Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm snapshot: disallow the COW and origin devices from being identicalDingXiang2-12/+33
Otherwise loading a "snapshot" table using the same device for the origin and COW devices, e.g.: echo "0 20971520 snapshot 253:3 253:3 P 8" | dmsetup create snap will trigger: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 [ 1958.979934] IP: [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1958.989655] PGD 0 [ 1958.991903] Oops: 0000 [#1] SMP ... [ 1959.059647] CPU: 9 PID: 3556 Comm: dmsetup Tainted: G IO 4.5.0-rc5.snitm+ #150 ... [ 1959.083517] task: ffff8800b9660c80 ti: ffff88032a954000 task.ti: ffff88032a954000 [ 1959.091865] RIP: 0010:[<ffffffffa040efba>] [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1959.104295] RSP: 0018:ffff88032a957b30 EFLAGS: 00010246 [ 1959.110219] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001 [ 1959.118180] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880329334a00 [ 1959.126141] RBP: ffff88032a957b50 R08: 0000000000000000 R09: 0000000000000001 [ 1959.134102] R10: 000000000000000a R11: f000000000000000 R12: ffff880330884d80 [ 1959.142061] R13: 0000000000000008 R14: ffffc90001c13088 R15: ffff880330884d80 [ 1959.150021] FS: 00007f8926ba3840(0000) GS:ffff880333440000(0000) knlGS:0000000000000000 [ 1959.159047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1959.165456] CR2: 0000000000000098 CR3: 000000032f48b000 CR4: 00000000000006e0 [ 1959.173415] Stack: [ 1959.175656] ffffc90001c13040 ffff880329334a00 ffff880330884ed0 ffff88032a957bdc [ 1959.183946] ffff88032a957bb8 ffffffffa040f225 ffff880329334a30 ffff880300000000 [ 1959.192233] ffffffffa04133e0 ffff880329334b30 0000000830884d58 00000000569c58cf [ 1959.200521] Call Trace: [ 1959.203248] [<ffffffffa040f225>] dm_exception_store_create+0x1d5/0x240 [dm_snapshot] [ 1959.211986] [<ffffffffa040d310>] snapshot_ctr+0x140/0x630 [dm_snapshot] [ 1959.219469] [<ffffffffa0005c44>] ? dm_split_args+0x64/0x150 [dm_mod] [ 1959.226656] [<ffffffffa0005ea7>] dm_table_add_target+0x177/0x440 [dm_mod] [ 1959.234328] [<ffffffffa0009203>] table_load+0x143/0x370 [dm_mod] [ 1959.241129] [<ffffffffa00090c0>] ? retrieve_status+0x1b0/0x1b0 [dm_mod] [ 1959.248607] [<ffffffffa0009e35>] ctl_ioctl+0x255/0x4d0 [dm_mod] [ 1959.255307] [<ffffffff813304e2>] ? memzero_explicit+0x12/0x20 [ 1959.261816] [<ffffffffa000a0c3>] dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 1959.268615] [<ffffffff81215eb6>] do_vfs_ioctl+0xa6/0x5c0 [ 1959.274637] [<ffffffff81120d2f>] ? __audit_syscall_entry+0xaf/0x100 [ 1959.281726] [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70 [ 1959.288814] [<ffffffff81216449>] SyS_ioctl+0x79/0x90 [ 1959.294450] [<ffffffff8167e4ae>] entry_SYSCALL_64_fastpath+0x12/0x71 ... [ 1959.323277] RIP [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1959.333090] RSP <ffff88032a957b30> [ 1959.336978] CR2: 0000000000000098 [ 1959.344121] ---[ end trace b049991ccad1169e ]--- Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1195899 Cc: stable@vger.kernel.org Signed-off-by: Ding Xiang <dingxiang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm cache: make the 'mq' policy an alias for 'smq'Joe Thornber4-1492/+85
smq seems to be performing better than the old mq policy in all situations, as well as using a quarter of the memory. Make 'mq' an alias for 'smq' when choosing a cache policy. The tunables that were present for the old mq are faked, and have no effect. mq should be considered deprecated now. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm: drop unnecessary assignment of md->queueBob Liu1-6/+1
md->queue and q are the same thing in dm_old_init_request_queue() and dm_mq_init_request_queue(). Also drop the temporary 'struct request_queue *q' in dm_old_init_request_queue(). Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm: reorder 'struct mapped_device' members to fix alignment and holesMike Snitzer1-16/+18
Saves 16 bytes by eliminating 4 4byte holes but more importantly: numerous members that crossed cachelines were fixed. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm: remove dummy definition of 'struct dm_table'Mike Snitzer1-11/+3
Change the map pointer in 'struct mapped_device' from 'struct dm_table __rcu *' to 'void __rcu *' to avoid the need for the dummy definition. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm: add 'dm_numa_node' module parameterMike Snitzer1-8/+44
Allows user to control which NUMA node the memory for DM device structures (e.g. mapped_device, request_queue, gendisk, blk_mq_tag_set) is allocated from. Defaults to NUMA_NO_NODE (-1). Allowable range is from -1 until the last online NUMA node id. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm thin metadata: remove needless newline from subtree_dec() DMERR messageMike Snitzer1-1/+1
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-11dm mpath: cleanup reinstate_path() et al based on code reviewMike Snitzer1-6/+2
fail_path() will print a "Failing path ..." message but reinstate_path() doesn't print a "Reinstating path ...". Add that message to reinstate_path() to add symmetry and aid system debugging. Remove reinstate_path()'s check for the path_selector providing .reinstate_path hook. All path selectors provide this and any future ones must too. activate_path() calls pg_init_done() with SCSI_DH_DEV_OFFLINED but pg_init_done() doesn't expicitly handle it in its swicth statement. Add SCSI_DH_DEV_OFFLINED to the default case. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: remove __pgpath_busy forward declaration, rename to pgpath_busyMike Snitzer1-4/+2
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: switch from 'unsigned' to 'bool' for flags where appropriateMike Snitzer1-50/+51
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm round robin: use percpu 'repeat_count' and 'current_path'Mike Snitzer1-12/+48
Now that dm-mpath core is lockless in the per-IO fast path it is critical, for performance, to have the .select_path hook (rr_select_path) also be as lockless as possible. The new percpu members of 'struct selector' allow for lockless support of 'repeat_count' governed repeat use of a previously selected path. If a path fails while it is 'current_path' the worst case is concurrent IO might be mapped to the failed path until the .fail_path hook (rr_fail_path) is called. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm path selector: remove 'repeat_count' return from .select_path hookMike Snitzer5-18/+4
If a path selector has any use for a repeat_count it should be handled locally and not depend on the dm-mpath core to be concerned with it. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: push path selector locking down to path selectorsMike Snitzer3-11/+59
Proper locking of the lists used by the path selectors should be handled within the selectors (relying on dm-mpath.c code's use of the m->lock spinlock was reckless). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: remove repeat_count support from multipath coreMike Snitzer4-10/+24
Preparation for making __multipath_map() avoid taking the m->lock spinlock -- in favor of using RCU locking. repeat_count was primarily for bio-based DM multipath's benefit. There is really no need for it anymore now that DM multipath is request-based. As such, repeat_count > 1 is no longer honored and a warning is displayed if the user attempts to use a value > 1. This is a temporary change for the round-robin path-selector (as a later commit will restore its support for repeat_count > 1). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: remove unnecessary casts in front of ti->privateMike Snitzer1-5/+5
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: use blk_mq_alloc_request() and blk_mq_free_request() directlyMike Snitzer1-3/+4
There isn't any need to support both old .request_fn and blk-mq paths in the blk-mq specific portion of __multipath_map(). Call blk_mq_alloc_request() directly rather than use blk_get_request(). Similarly, call blk_mq_free_request(), rather than blk_put_request(), in multipath_release_clone(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: cleanup 'struct dm_mpath_io' management codeMike Snitzer1-12/+17
Refactor and rename existing interfaces to be more specific and self-documenting. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm mpath: use blk-mq pdu for per-request 'struct dm_mpath_io'Mike Snitzer1-10/+29
Allow the multipath target to avoid making small allocations for each 'struct dm_mpath_io' that is needed for each request. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm: allow immutable request-based targets to use blk-mq pduMike Snitzer3-9/+28
This will allow DM multipath to use a portion of the blk-mq pdu space for target data (e.g. struct dm_mpath_io). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm: rename target's per_bio_data_size to per_io_data_sizeMike Snitzer12-22/+22
Request-based DM will also make use of per_bio_data_size. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm: distinquish old .request_fn (dm-old) vs dm-mq request-based DMMike Snitzer2-50/+58
Rename various methods to have either a "dm_old" or "dm_mq" prefix. Improve code comments to assist with understanding the duality of code that handles both "dm_old" and "dm_mq" cases. It is no much easier to quickly look at the code and _know_ that a given method is either 1) "dm_old" only 2) "dm_mq" only 3) common to both. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm: remove support for stacking dm-mq on .request_fn device(s)Mike Snitzer2-40/+20
Remove all fiddley code that propped up this support for a blk-mq request-queue ontop of all .request_fn devices. Testing has proven this niche request-based dm-mq mode to be buggy, when testing fault tolerance with DM multipath, and there is no point trying to preserve it. Should help improve efficiency of pure dm-mq code and make code maintenance less delicate. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-23dm: fix a couple locking issues with use of block interfacesMike Snitzer1-7/+21
old_stop_queue() was checking blk_queue_stopped() without holding the q->queue_lock. dm_requeue_original_request() needed to check blk_queue_stopped(), with q->queue_lock held, before calling blk_mq_kick_requeue_list(). And a side-effect of that change is start_queue() must also call blk_mq_kick_requeue_list(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: allocate blk_mq_tag_set rather than embed in mapped_deviceMike Snitzer1-18/+27
The blk_mq_tag_set is only needed for dm-mq support. There is point wasting space in 'struct mapped_device' for non-dm-mq devices. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # check kzalloc return
2016-02-22dm: add 'dm_mq_nr_hw_queues' and 'dm_mq_queue_depth' module paramsMike Snitzer1-2/+25
Allow user to change these values via module params or sysfs. 'dm_mq_nr_hw_queues' defaults to 1 (max 32). 'dm_mq_queue_depth' defaults to 2048 (up from 64, which proved far too small under moderate sized workloads -- the dm-multipath device would continuously block waiting for tags (requests) to become available). The maximum is BLK_MQ_MAX_DEPTH (currently 10240). Keep in mind the total number of pre-allocated requests per request-based dm-mq device is 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth' (currently 2048). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: optimize dm_request_fn()Mike Snitzer1-30/+17
DM multipath is the only request-based DM target -- which only supports tables with a single target that is immutable. Leverage this fact in dm_request_fn(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: optimize dm_mq_queue_rq()Mike Snitzer4-23/+31
DM multipath is the only dm-mq target. But that aside, request-based DM only supports tables with a single target that is immutable. Leverage this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in the mapped_device when the table was made active. This saves the need to even take the read-side of the SRCU via dm_{get,put}_live_table. If the active DM table does not have an immutable target (e.g. "error" target was swapped in) then fallback to the slow-path where the target is looked up from the live table. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: set DM_TARGET_WILDCARD feature on "error" targetMike Snitzer4-2/+19
The DM_TARGET_WILDCARD feature indicates that the "error" target may replace any target; even immutable targets. This feature will be useful to preserve the ability to replace the "multipath" target even once it is formally converted over to having the DM_TARGET_IMMUTABLE feature. Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that .map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined in the target_type. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: cleanup dm_any_congested()Mike Snitzer1-9/+8
The request-based DM support for checking queue congestion doesn't require access to the live DM table. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: remove unused dm_get_rq_mapinfo()Mike Snitzer1-8/+0
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: fix excessive dm-mq context switchingMike Snitzer1-7/+6
Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower than if an underlying null_blk device were used directly. One of the reasons for this drop in performance is that blk_insert_clone_request() was calling blk_mq_insert_request() with @async=true. This forced the use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues which ushered in ping-ponging between process context (fio in this case) and kblockd's kworker to submit the cloned request. The ftrace function_graph tracer showed: kworker-2013 => fio-12190 fio-12190 => kworker-2013 ... kworker-2013 => fio-12190 fio-12190 => kworker-2013 ... Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to _not_ use kblockd to submit the cloned requests isn't enough to eliminate the observed context switches. In addition to this dm-mq specific blk-core fix, there are 2 DM core fixes to dm-mq that (when paired with the blk-core fix) completely eliminate the observed context switching: 1) don't blk_mq_run_hw_queues in blk-mq request completion Motivated by desire to reduce overhead of dm-mq, punting to kblockd just increases context switches. In my testing against a really fast null_blk device there was no benefit to running blk_mq_run_hw_queues() on completion (and no other blk-mq driver does this). So hopefully this change doesn't induce the need for yet another revert like commit 621739b00e16ca2d ! 2) use blk_mq_complete_request() in dm_complete_request() blk_complete_request() doesn't offer the traditional q->mq_ops vs .request_fn branching pattern that other historic block interfaces do (e.g. blk_get_request). Using blk_mq_complete_request() for blk-mq requests is important for performance. It should be noted that, like blk_complete_request(), blk_mq_complete_request() doesn't natively handle partial completions -- but the request-based DM-multipath target does provide the required partial completion support by dm.c:end_clone_bio() triggering requeueing of the request via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE. dm-mq fix #2 is _much_ more important than #1 for eliminating the context switches. Before: cpu : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475 After: cpu : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472 With these changes multithreaded async read IOPs improved from ~950K to ~1350K for this dm-mq stacked on null_blk test-case. The raw read IOPs of the underlying null_blk device for the same workload is ~1950K. Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()") Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM") Cc: stable@vger.kernel.org # 4.1+ Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Jens Axboe <axboe@kernel.dk>
2016-02-22dm: fix sparse "unexpected unlock" warnings in ioctl codeMike Snitzer1-27/+29
Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it do the dm_{get,put}_live_table() rather than split those operations. The dm_grab_bdev_for_ioctl() callers only care about the block_device associated with a singleton DM device so there isn't any need to retain a reference to the live DM table. It is sufficient to: 1) dm_get_live_table() 2) bdgrab() the bdev associated with the singleton table's target 3) dm_put_live_table() 4) bdput() the bdev Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: do not return target from dm_get_live_table_for_ioctl()Mike Snitzer1-20/+13
None of the callers actually used the returned target. Also, just reuse bdev pointer passed to dm_blk_ioctl(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq pathsMike Snitzer1-0/+2
Using request-based DM mpath configured with the following stacking (.request_fn DM mpath ontop of scsi-mq paths): echo Y > /sys/module/scsi_mod/parameters/use_blk_mq echo N > /sys/module/dm_mod/parameters/use_blk_mq 'struct dm_rq_target_io' would leak if a request is requeued before a blk-mq clone is allocated (or fails to allocate). free_rq_tio() wasn't being called. kmemleak reported: unreferenced object 0xffff8800b90b98c0 (size 112): comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s) hex dump (first 32 bytes): 00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff ..\,....@....... e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00 ...4............ backtrace: [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0 [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20 [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170 [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod] [<ffffffff812fbd78>] blk_peek_request+0x168/0x290 [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod] [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40 [<ffffffff812f9585>] blk_delay_work+0x25/0x40 [<ffffffff81096fff>] process_one_work+0x14f/0x3d0 [<ffffffff81097715>] worker_thread+0x125/0x4b0 [<ffffffff8109ce88>] kthread+0xd8/0xf0 [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70 [<ffffffffffffffff>] 0xffffffffffffffff crash> struct -o dm_rq_target_io struct dm_rq_target_io { ... } SIZE: 112 Fixes: e5863d9ad7 ("dm: allocate requests in target when stacking on blk-mq devices") Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-04Merge branch 'mymd/for-next' into mymd/for-linusShaohua Li6-56/+58
2016-01-25md-cluster: delete useless codeShaohua Li1-4/+0
page->index already considers node offset. The node_offset calculation in write_sb_page is useless and confusion. Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: NeilBrown <neilb@suse.com> Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-01-25md-cluster: fix missing memory freeShaohua Li1-1/+5
There are several places we allocate dlm_lock_resource, but not free it. leave() need free a lock resource too (from Guoqing) Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Guoqing Jiang <gqjiang@suse.com> Cc: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-01-22Merge branch 'for-4.5/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds4-13/+48
Pull block driver updates from Jens Axboe: "This is the block driver pull request for 4.5, with the exception of NVMe, which is in a separate branch and will be posted after this one. This pull request contains: - A set of bcache stability fixes, which have been acked by Kent. These have been used and tested for more than a year by the community, so it's about time that they got in. - A set of drbd updates from the drbd team (Andreas, Lars, Philipp) and Markus Elfring, Oleg Drokin. - A set of fixes for xen blkback/front from the usual suspects, (Bob, Konrad) as well as community based fixes from Kiri, Julien, and Peng. - A 2038 time fix for sx8 from Shraddha, with a fix from me. - A small mtip32xx cleanup from Zhu Yanjun. - A null_blk division fix from Arnd" * 'for-4.5/drivers' of git://git.kernel.dk/linux-block: (71 commits) null_blk: use sector_div instead of do_div mtip32xx: restrict variables visible in current code module xen/blkfront: Fix crash if backend doesn't follow the right states. xen/blkback: Fix two memory leaks. xen/blkback: make st_ statistics per ring xen/blkfront: Handle non-indirect grant with 64KB pages xen-blkfront: Introduce blkif_ring_get_request xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule() xen/blkback: Free resources if connect_ring failed. xen/blocks: Return -EXX instead of -1 xen/blkback: make pool of persistent grants and free pages per-queue xen/blkback: get the number of hardware queues/rings from blkfront xen/blkback: pseudo support for multi hardware queues/rings xen/blkback: separate ring information out of struct xen_blkif xen/blkfront: correct setting for xen_blkif_max_ring_order xen/blkfront: make persistent grants pool per-queue xen/blkfront: Remove duplicate setting of ->xbdev. xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors. xen/blkfront: negotiate number of queues/rings to be used with backend xen/blkfront: split per device io_lock ...
2016-01-21MD: rename some functionsShaohua Li4-51/+53
These short function names are hard to search. Rename them to make vim happy. Signed-off-by: Shaohua Li <shli@fb.com>
2016-01-15Merge tag 'md/4.5' of git://neil.brown.name/mdLinus Torvalds9-142/+418
Pull md updates from Neil Brown: "Mostly clustered-raid1 and raid5 journal updates. one Y2038 fix and other minor stuff. One patch removes me from the MAINTAINERS file and adds a record of my md maintainership to Credits" Many thanks to Neil, who has been around for a _looong_ time. * tag 'md/4.5' of git://neil.brown.name/md: (26 commits) md/raid: only permit hot-add of compatible integrity profiles Remove myself as MD Maintainer, and add to Credits. raid5-cache: handle journal hotadd in quiesce MD: add journal with array suspended md: set MD_HAS_JOURNAL in correct places md: Remove 'ready' field from mddev. md: remove unnecesary md_new_event_inintr raid5: allow r5l_io_unit allocations to fail raid5-cache: use a mempool for the metadata block raid5-cache: use a bio_set raid5-cache: add journal hot add/remove support drivers: md: use ktime_get_real_seconds() md: avoid warning for 32-bit sector_t raid5-cache: free meta_page earlier raid5-cache: simplify r5l_move_io_unit_list md: update comment for md_allow_write md-cluster: update comments for MD_CLUSTER_SEND_LOCKED_ALREADY md-cluster: Protect communication with mutexes md-cluster: Defer MD reloading to mddev->thread md-cluster: update the documentation ...
2016-01-15Merge branch 'for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: floppy: make local variable non-static exynos: fixes an incorrect header guard dt-bindings: fixes some incorrect header guards cpufreq-dt: correct dead link in documentation cpufreq: ARM big LITTLE: correct dead link in documentation treewide: Fix typos in printk Documentation: filesystem: Fix typo in fs/eventfd.c fs/super.c: use && instead of & for warn_on condition Documentation: fix sysfs-ptp lib: scatterlist: fix Kconfig description
2016-01-14Merge tag 'libnvdimm-for-4.5' of ↵Linus Torvalds2-528/+28
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The bulk of this has appeared in -next and independently received a build success notification from the kbuild robot. The 'for-4.5/block- dax' topic branch was rebased over the weekend to drop the "block device end-of-life" rework that Al would like to see re-implemented with a notifier, and to address bug reports against the badblocks integration. There is pending feedback against "libnvdimm: Add a poison list and export badblocks" received last week. Linda identified some localized fixups that we will handle incrementally. Summary: - Media error handling: The 'badblocks' implementation that originated in md-raid is up-levelled to a generic capability of a block device. This initial implementation is limited to being consulted in the pmem block-i/o path. Later, 'badblocks' will be consulted when creating dax mappings. - Raw block device dax: For virtualization and other cases that want large contiguous mappings of persistent memory, add the capability to dax-mmap a block device directly. - Increased /dev/mem restrictions: Add an option to treat all io-memory as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access while a driver is actively using an address range. This behavior is controlled via the new CONFIG_IO_STRICT_DEVMEM option and can be overridden by the existing "iomem=relaxed" kernel command line option. - Miscellaneous fixes include a 'pfn'-device huge page alignment fix, block device shutdown crash fix, and other small libnvdimm fixes" * tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (32 commits) block: kill disk_{check|set|clear|alloc}_badblocks libnvdimm, pmem: nvdimm_read_bytes() badblocks support pmem, dax: disable dax in the presence of bad blocks pmem: fail io-requests to known bad blocks libnvdimm: convert to statically allocated badblocks libnvdimm: don't fail init for full badblocks list block, badblocks: introduce devm_init_badblocks block: clarify badblocks lifetime badblocks: rename badblocks_free to badblocks_exit libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h libnvdimm: Add a poison list and export badblocks nfit_test: Enable DSMs for all test NFITs md: convert to use the generic badblocks code block: Add badblock management for gendisks badblocks: Add core badblock management code block: fix del_gendisk() vs blkdev_ioctl crash block: enable dax for raw block devices block: introduce bdev_file_inode() restrict /dev/mem to idle io memory ranges arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug ...
2016-01-14md/raid: only permit hot-add of compatible integrity profilesDan Williams5-22/+26
It is not safe for an integrity profile to be changed while i/o is in-flight in the queue. Prevent adding new disks or otherwise online spares to an array if the device has an incompatible integrity profile. The original change to the blk_integrity_unregister implementation in md, commmit c7bfced9a671 "md: suspend i/o during runtime blk_integrity_unregister" introduced an immediate hang regression. This policy of disallowing changes the integrity profile once one has been established is shared with DM. Here is an abbreviated log from a test run that: 1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [ 59.076127] 2/ Tries to add an integrity-disabled device (pmem1m) [ 90.489209] 3/ Retries with an integrity-enabled device (pmem1s) [ 205.671277] [ 59.076127] md/raid1:md0: active with 1 out of 2 mirrors [ 59.078302] md: data integrity enabled on md0 [..] [ 90.489209] md0: incompatible integrity profile for pmem1m [..] [ 205.671277] md: super_written gets error=-5 [ 205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device. [ 205.677386] md/raid1:md0: Operation continuing on 1 devices. [ 205.683037] RAID1 conf printout: [ 205.684699] --- wd:1 rd:2 [ 205.685972] disk 0, wo:0, o:1, dev:pmem0s [ 205.687562] disk 1, wo:1, o:1, dev:pmem1s [ 205.691717] md: recovery of RAID array md0 Fixes: c7bfced9a671 ("md: suspend i/o during runtime blk_integrity_unregister") Cc: <stable@vger.kernel.org> Cc: Mike Snitzer <snitzer@redhat.com> Reported-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-14raid5-cache: handle journal hotadd in quiesceShaohua Li1-0/+7
Handle journal hotadd in quiesce to avoid creating duplicated threads. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-14MD: add journal with array suspendedShaohua Li1-1/+6
Hot add journal disk in recovery thread context brings a lot of trouble as IO could be running. Unlike spare disk hot add, adding journal disk with array suspended makes more sense and implmentation is much easier. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-14md: set MD_HAS_JOURNAL in correct placesShaohua Li2-4/+6
Set MD_HAS_JOURNAL when a array is loaded or journal is initialized. This is to avoid the flags set too early in journal disk hotadd. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-13Merge branch 'work.misc' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I *am* asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- | This is an automated patch using | | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/' | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/' | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/' | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/' | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/' | | with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...
2016-01-12Merge tag 'dm-4.5-changes' of ↵Linus Torvalds18-354/+1619
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - The most significant set of changes this cycle is the Forward Error Correction (FEC) support that has been added to the DM verity target. Google uses DM verity on all Android devices and it is believed that this FEC support will enable DM verity to recover from storage failures seen since DM verity was first deployed as part of Android. - A stable fix for a race in the destruction of DM thin pool's workqueue - A stable fix for hung IO if a DM snapshot copy hit an error - A few small cleanups in DM core and DM persistent data. - A couple DM thinp range discard improvements (address atomicity of finding a range and the efficiency of discarding a partially mapped thin device) - Add ability to debug DM bufio leaks by recording stack trace when a buffer is allocated. Upon detected leak the recorded stack is dumped. * tag 'dm-4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm snapshot: fix hung bios when copy error occurs dm thin: bump thin and thin-pool target versions dm thin: fix race condition when destroying thin pool workqueue dm space map metadata: remove unused variable in brb_pop() dm verity: add ignore_zero_blocks feature dm verity: add support for forward error correction dm verity: factor out verity_for_bv_block() dm verity: factor out structures and functions useful to separate object dm verity: move dm-verity.c to dm-verity-target.c dm verity: separate function for parsing opt args dm verity: clean up duplicate hashing code dm btree: factor out need_insert() helper dm bufio: use BUG_ON instead of conditional call to BUG dm bufio: store stacktrace in buffers to help find buffer leaks dm bufio: return NULL to improve code clarity dm block manager: cleanup code that prints stacktrace dm: don't save and restore bi_private dm thin metadata: make dm_thin_find_mapped_range() atomic dm thin metadata: speed up discard of partially mapped volumes
2016-01-09badblocks: rename badblocks_free to badblocks_exitDan Williams1-1/+1
For symmetry with badblocks_init() make it clear that this path only destroys incremental allocations of a badblocks instance, and does not free the badblocks instance itself. Signed-off-by: Dan Williams <dan.j.williams@intel.com>