summaryrefslogtreecommitdiff
path: root/drivers/md/persistent-data
AgeCommit message (Collapse)AuthorFilesLines
2016-03-03dm space map metadata: remove unused variable in brb_pop()Mike Snitzer1-3/+0
commit 512167788a6fe9481a33a3cce5f80b684631a1bb upstream. Remove the unused struct block_op pointer that was inadvertantly introduced, via cut-and-paste of previous brb_op() code, as part of commit 50dd842ad. (Cc'ing stable@ because commit 50dd842ad did) Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map") Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2016-03-03dm btree: fix bufio buffer leaks in dm_btree_del() error pathJoe Thornber1-1/+15
commit ed8b45a3679eb49069b094c0711b30833f27c734 upstream. If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2016-03-03dm space map metadata: fix ref counting bug when bootstrapping a new space mapJoe Thornber1-10/+22
commit 50dd842ad83b43bed71790efb31cfb2f6c05c9c1 upstream. When applying block operations (BOPs) do not remove them from the uncommitted BOP ring-buffer until after they've been applied -- in case we recurse. Also, perform BOP_INC operation, in dm_sm_metadata_create() and sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather than using direct calls to sm_ll_inc(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2016-02-12dm btree: fix leak of bufio-backed block in btree_split_sibling error pathMike Snitzer1-1/+3
commit 30ce6e1cc5a0f781d60227e9096c86e188d2c2bd upstream. The block allocated at the start of btree_split_sibling() is never released if later insert_at() fails. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-11-12dm btree: fix leak of bufio-backed block in btree_split_beneath error pathMike Snitzer1-1/+1
commit 4dcb8b57df3593dcb20481d9d6cf79d1dc1534be upstream. btree_split_beneath()'s error path had an outstanding FIXME that speaks directly to the potential for _not_ cleaning up a previously allocated bufio-backed block. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-11-03dm btree remove: fix a bug when rebalancing nodes after removalJoe Thornber1-6/+11
commit 2871c69e025e8bc507651d5a9cf81a8a7da9d24b upstream. Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't a complete fix for redistribute3(). The redistribute3 function takes 3 btree nodes and shares out the entries evenly between them. If the three nodes in total contained (MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in the center. Fix this issue by being more careful about calculating the target number of entries for the left and right nodes. Unit tested in userspace using this program: https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-10-28dm btree: add ref counting ops for the leaves of top level btreesJoe Thornber4-15/+47
commit b0dc3c8bc157c60b1d470163882be8c13e1950af upstream. When using nested btrees, the top leaves of the top levels contain block addresses for the root of the next tree down. If we shadow a shared leaf node the leaf values (sub tree roots) should be incremented accordingly. This is only an issue if there is metadata sharing in the top levels. Which only occurs if metadata snapshots are being used (as is possible with dm-thinp). And could result in a block from the thinp metadata snap being reused early, thus corrupting the thinp metadata snap. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-08-04dm btree: silence lockdep lock inversion in dm_btree_del()Joe Thornber1-1/+1
commit 1c7518794a3647eb345d59ee52844e8a40405198 upstream. Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del() can be called via an ioctl and we don't want to recurse into the FS or block layer. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-08-04dm btree remove: fix bug in redistribute3Dennis Yang1-3/+3
commit 4c7e309340ff85072e96f529582d159002c36734 upstream. redistribute3() shares entries out across 3 nodes. Some entries were being moved the wrong way, breaking the ordering. This manifested as a BUG() in dm-btree-remove.c:shift() when entries were removed from the btree. For additional context see: https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-08-04dm space map metadata: fix occasional leak of a metadata block on resizeJoe Thornber1-15/+35
commit 6096d91af0b65a3967139b32d5adbb3647858a26 upstream. The metadata space map has a simplified 'bootstrap' mode that is operational when extending the space maps. Whilst in this mode it's possible for some refcount decrement operations to become queued (eg, as a result of shadowing one of the bitmap indexes). These decrements were not being applied when switching out of bootstrap mode. The effect of this bug was the leaking of a 4k metadata block. This is detected by the latest version of thin_check as a non fatal error. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-01-07dm space map metadata: fix sm_bootstrap_get_nr_blocks()Dan Carpenter1-1/+3
commit c1c6156fe4d4577444b769d7edd5dd503e57bbc9 upstream. This function isn't right and it causes a static checker warning: drivers/md/dm-thin.c:3016 maybe_resize_data_dev() error: potentially using uninitialized 'sb_data_size'. It should set "*count" and return zero on success the same as the sm_metadata_get_nr_blocks() function does earlier. Fixes: 3241b1d3e0aa ('dm: add persistent data library') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-11-19dm btree: fix a recursion depth bug in btree walking codeJoe Thornber3-15/+17
commit 9b460d3699324d570a4d4161c3741431887f102f upstream. The walk code was using a 'ro_spine' to hold it's locked btree nodes. But this data structure is designed for the rolling lock scheme, and as such automatically unlocks blocks that are two steps up the call chain. This is not suitable for the simple recursive walk algorithm, which retraces its steps. This code is only used by the persistent array code, which in turn is only used by dm-cache. In order to trigger it you need to have a mapping tree that is more than 2 levels deep; which equates to 8-16 million cache blocks. For instance a 4T ssd with a very small block size of 32k only just triggers this bug. The fix just places the locked blocks on the stack, and stops using the ro_spine altogether. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-05-15dm transaction manager: fix corruption due to non-atomic transaction commitJoe Thornber4-26/+14
commit a9d45396f5956d0b615c7ae3b936afd888351a47 upstream. The persistent-data library used by dm-thin, dm-cache, etc is transactional. If anything goes wrong, such as an io error when writing new metadata or a power failure, then we roll back to the last transaction. Atomicity when committing a transaction is achieved by: a) Never overwriting data from the previous transaction. b) Writing the superblock last, after all other metadata has hit the disk. This commit and the following commit ("dm: take care to copy the space map roots before locking the superblock") fix a bug associated with (b). When committing it was possible for the superblock to still be written in spite of an io error occurring during the preceeding metadata flush. With these commits we're careful not to take the write lock out on the superblock until after the metadata flush has completed. Change the transaction manager's semantics for dm_tm_commit() to assume all data has been flushed _before_ the single superblock that is passed in. As a prerequisite, split the block manager's block unlocking and flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now the unlocking must be done separately. This issue was discovered by forcing io errors at the crucial time using dm-flakey. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24dm space map metadata: fix refcount decrement below 0 which caused corruptionJoe Thornber1-21/+92
commit cebc2de44d3bce53e46476e774126c298ca2c8a9 upstream. This has been a relatively long-standing issue that wasn't nailed down until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014, see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html From that report: "When decreasing the reference count of a metadata block with its reference count equals 3, we will call dm_btree_remove() to remove this enrty from the B+tree which keeps the reference count info in metadata device. The B+tree will try to rebalance the entry of the child nodes in each node it traversed, and the rebalance process contains the following steps. (1) Finding the corresponding children in current node (shadow_current(s)) (2) Shadow the children block (issue BOP_INC) (3) redistribute keys among children, and free children if necessary (issue BOP_DEC) Since the update of a metadata block's reference count could be recursive, we will stash these reference count update operations in smm->uncommitted and then process them in a FILO fashion. The problem is that step(3) could free the children which is created in step(2), so the BOP_DEC issued in step(3) will be carried out before the BOP_INC issued in step(2) since these BOPs will be processed in FILO fashion. Once the BOP_DEC from step(3) tries to decrease the reference count of newly shadow block, it will report failure for its reference equals 0 before decreasing. It looks like we can solve this issue by processing these BOPs in a FIFO fashion instead of FILO." Commit 5b564d80 ("dm space map: disallow decrementing a reference count below zero") changed the code to report an error for this temporary refcount decrement below zero. So what was previously a harmless invalid refcount became a hard failure due to the new error path: device-mapper: space map common: unable to decrement a reference count below 0 device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22 device-mapper: thin: 253:6: switching pool to read-only mode This bug is in dm persistent-data code that is common to the DM thin and cache targets. So any users of those targets should apply this fix. Fix this by applying recursive space map operations in FIFO order rather than FILO. Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801 Reported-by: Apollon Oikonomopoulos <apoikos@debian.org> Reported-by: edwillam1007@gmail.com Reported-by: Teng-Feng Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-02-14dm space map metadata: fix bug in resizing of thin metadataJoe Thornber1-4/+14
commit fca028438fb903852beaf7c3fe1cd326651af57d upstream. This bug was introduced in commit 7e664b3dec431e ("dm space map metadata: fix extending the space map"). When extending a dm-thin metadata volume we: - Switch the space map into a simple bootstrap mode, which allocates all space linearly from the newly added space. - Add new bitmap entries for the new space - Increment the reference counts for those newly allocated bitmap entries - Commit changes to disk - Switch back out of bootstrap mode. But, the disk commit may allocate space itself, if so this fact will be lost when switching out of bootstrap mode. The bug exhibited itself as an error when the bitmap_root, with an erroneous ref count of 0, was subsequently decremented as part of a later disk commit. This would cause the disk commit to fail, and thinp to enter read_only mode. The metadata was not damaged (thin_check passed). The fix is to put the increments + commit into a loop, running until the commit has not allocated extra space. In practise this loop only runs twice. With this fix the following device mapper testsuite test passes: dmtest run --suite thin-provisioning -n thin_remove_works_after_resize Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-14dm space map metadata: fix extending the space mapJoe Thornber1-5/+13
commit 7e664b3dec431eebf0c5df5ff704d6197634cf35 upstream. When extending a metadata space map we should do the first commit whilst still in bootstrap mode -- a mode where all blocks get allocated in the new area. That way the commit overhead is allocated from the newly added space. Otherwise we risk running out of space. With this fix, and the previous commit "dm space map common: make sure new space is used during extend", the following device mapper testsuite test passes: dmtest run --suite thin-provisioning -n /resize_metadata_no_io/ Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-14dm space map common: make sure new space is used during extendJoe Thornber1-1/+5
commit 12c91a5c2d2a8e8cc40a9552313e1e7b0a2d9ee3 upstream. When extending a low level space map we should update nr_blocks at the start so the new space is used for the index entries. Otherwise extend can fail, e.g.: sm_metadata_extend call sequence that fails: -> sm_ll_extend -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block => returns -ENOSPC because smm->begin == smm->ll.nr_blocks Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20dm thin: allow pool in read-only mode to transition to read-write modeJoe Thornber2-3/+10
commit 9b7aaa64f96f7ca280d75326fca42f42017b89ef upstream. A thin-pool may be in read-only mode because the pool's data or metadata space was exhausted. To allow for recovery, by adding more space to the pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE mode. Otherwise, running out of space will render the pool permanently read-only. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20dm space map: disallow decrementing a reference count below zeroJoe Thornber1-9/+23
commit 5b564d80f8bc21094c0cd2b19b679d983aabcc29 upstream. The old behaviour, returning -EINVAL if a ref_count of 0 would be decremented, was removed in commit f722063 ("dm space map: optimise sm_ll_dec and sm_ll_inc"). To fix this regression we return an error code from the mutator function pointer passed to sm_ll_mutate() and have dec_ref_count() return -EINVAL if the old ref_count is 0. Add a DMERR to reflect the potential seriousness of this error. Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path. With this fix the following dmts regression test now passes: dmtest run --suite cache -n /metadata_use_kernel/ The next patch fixes the higher-level dm-array code that exposed this regression. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20dm space map metadata: return on failure in sm_metadata_new_blockMike Snitzer1-2/+6
commit f62b6b8f498658a9d537c7d380e9966f15e1b2a1 upstream. Commit 2fc48021f4afdd109b9e52b6eef5db89ca80bac7 ("dm persistent metadata: add space map threshold callback") introduced a regression to the metadata block allocation path that resulted in errors being ignored. This regression was uncovered by running the following device-mapper-test-suite test: dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/ The ignored error codes in sm_metadata_new_block() could crash the kernel through use of either the dm-thin or dm-cache targets, e.g.: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event. device-mapper: space map metadata: unable to allocate new metadata block general protection fault: 0000 [#1] SMP ... Workqueue: dm-thin do_worker [dm_thin_pool] task: ffff880035ce2ab0 ti: ffff88021a054000 task.ti: ffff88021a054000 RIP: 0010:[<ffffffffa0331385>] [<ffffffffa0331385>] metadata_ll_load_ie+0x15/0x30 [dm_persistent_data] RSP: 0018:ffff88021a055a68 EFLAGS: 00010202 RAX: 003fc8243d212ba0 RBX: ffff88021a780070 RCX: ffff88021a055a78 RDX: ffff88021a055a78 RSI: 0040402222a92a80 RDI: ffff88021a780070 RBP: ffff88021a055a68 R08: ffff88021a055ba4 R09: 0000000000000010 R10: 0000000000000000 R11: 00000002a02e1000 R12: ffff88021a055ad4 R13: 0000000000000598 R14: ffffffffa0338470 R15: ffff88021a055ba4 FS: 0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f467c0291b8 CR3: 0000000001a0b000 CR4: 00000000000007e0 Stack: ffff88021a055ab8 ffffffffa0332020 ffff88021a055b30 0000000000000001 ffff88021a055b30 0000000000000000 ffff88021a055b18 0000000000000000 ffff88021a055ba4 ffff88021a055b98 ffff88021a055ae8 ffffffffa033304c Call Trace: [<ffffffffa0332020>] sm_ll_lookup_bitmap+0x40/0xa0 [dm_persistent_data] [<ffffffffa033304c>] sm_metadata_count_is_more_than_one+0x8c/0xc0 [dm_persistent_data] [<ffffffffa0333825>] dm_tm_shadow_block+0x65/0x110 [dm_persistent_data] [<ffffffffa0331b00>] sm_ll_mutate+0x80/0x300 [dm_persistent_data] [<ffffffffa0330e60>] ? set_ref_count+0x10/0x10 [dm_persistent_data] [<ffffffffa0331dba>] sm_ll_inc+0x1a/0x20 [dm_persistent_data] [<ffffffffa0332270>] sm_disk_new_block+0x60/0x80 [dm_persistent_data] [<ffffffff81520036>] ? down_write+0x16/0x40 [<ffffffffa001e5c4>] dm_pool_alloc_data_block+0x54/0x80 [dm_thin_pool] [<ffffffffa001b23c>] alloc_data_block+0x9c/0x130 [dm_thin_pool] [<ffffffffa001c27e>] provision_block+0x4e/0x180 [dm_thin_pool] [<ffffffffa001fe9a>] ? dm_thin_find_block+0x6a/0x110 [dm_thin_pool] [<ffffffffa001c57a>] process_bio+0x1ca/0x1f0 [dm_thin_pool] [<ffffffff8111e2ed>] ? mempool_free+0x8d/0xa0 [<ffffffffa001d755>] process_deferred_bios+0xc5/0x230 [dm_thin_pool] [<ffffffffa001d911>] do_worker+0x51/0x60 [dm_thin_pool] [<ffffffff81067872>] process_one_work+0x182/0x3b0 [<ffffffff81068c90>] worker_thread+0x120/0x3a0 [<ffffffff81068b70>] ? manage_workers+0x160/0x160 [<ffffffff8106eb2e>] kthread+0xce/0xe0 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70 Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20dm array: fix a reference counting bug in shadow_ablockJoe Thornber1-1/+9
commit ed9571f0cf1fe09d3506302610f3ccdfa1d22c4a upstream. An old array block could have its reference count decremented below zero when it is being replaced in the btree by a new array block. The fix is to increment the old ablock's reference count just before inserting a new ablock into the btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04dm array: fix bug in growing arrayJoe Thornber1-1/+4
commit 9c1d4de56066e4d6abc66ec188faafd7b303fb08 upstream. Entries would be lost if the old tail block was partially filled. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-23dm space map: optimise sm_ll_dec and sm_ll_incJoe Thornber1-28/+49
Prior to this patch these methods did a lookup followed by an insert. Instead they now call a common mutate function that adjusts the value according to a callback function. This avoids traversing the data structures twice and hence improves performance. Also factor out sm_ll_lookup_big_ref_count() for use by both sm_ll_lookup() and sm_ll_mutate(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-08-23dm btree: prefetch child nodes when walking tree for a dm_btree_delJoe Thornber3-5/+31
dm-btree now takes advantage of dm-bufio's ability to prefetch data via dm_bm_prefetch(). Prior to this change many btree node visits were causing a synchronous read. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-08-23dm btree: use pop_frame in dm_btree_del to cleanup codeJoe Thornber1-1/+1
Remove a visited leaf straight away from the stack, rather than marking all it's children as visited and letting it get removed on the next iteration. May also offer a micro optimisation in dm_btree_del. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent metadata: add space map threshold callbackJoe Thornber1-1/+76
Add a threshold callback to dm persistent data space maps. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent data: add threshold callback to space mapJoe Thornber3-3/+29
Add a threshold callback function to the persistent data space map interface for a subsequent patch to use. dm-thin and dm-cache are interested in knowing when they're getting low on metadata or data blocks. This patch introduces a new method for registering a callback against a threshold. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent data: support space map resizingJoe Thornber1-6/+32
Support extending a dm persistent data metadata space map. The extend itself is implemented by switching back to the boostrap allocator and pointing to the new space. The extra bitmap indexes are then allocated from the new space, and finally we switch back to the proper space map ops and tweak the reference counts. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent data: fix error message typosJoe Thornber1-4/+4
Fix some typos in dm-space-map-metadata.c error messages. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20dm thin: fix discard corruptionJoe Thornber1-22/+24
Fix a bug in dm_btree_remove that could leave leaf values with incorrect reference counts. The effect of this was that removal of a shared block could result in the space maps thinking the block was no longer used. More concretely, if you have a thin device and a snapshot of it, sending a discard to a shared region of the thin could corrupt the snapshot. Thinp uses a 2-level nested btree to store it's mappings. This first level is indexed by thin device, and the second level by logical block. Often when we're removing an entry in this mapping tree we need to rebalance nodes, which can involve shadowing them, possibly creating a copy if the block is shared. If we do create a copy then children of that node need to have their reference counts incremented. In this way reference counts percolate down the tree as shared trees diverge. The rebalance functions were incrementing the children at the appropriate time, but they were always assuming the children were internal nodes. This meant the leaf values (in our case packed block/flags entries) were not being incremented. Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-02dm: add cache targetJoe Thornber1-0/+1
Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-02dm persistent data: add bitsetJoe Thornber3-0/+329
Add a persistent bitset as a wrapper around dm-array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-02dm persistent data: add transactional arrayJoe Thornber3-0/+975
Add a transactional array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-02dm persistent data: add btree_walkJoe Thornber4-0/+69
Add dm_btree_walk to iterate through the contents of a btree. This will be used by the dm cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-02dm persistent data: set some btree fn parms constMike Snitzer1-3/+3
Mark some constant parameters constant in some dm-btree functions. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-02dm persistent data: remove CONFIG_EXPERIMENTALKees Cook1-1/+1
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-02-28hlist: drop the node parameter from iteratorsSasha Levin1-4/+3
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24drivers/md/persistent-data/dm-transaction-manager.c: rename HASH_SIZEAndrew Morton1-7/+7
Fix the warning: drivers/md/persistent-data/dm-transaction-manager.c:28:1: warning: "HASH_SIZE" redefined In file included from include/linux/elevator.h:5, from include/linux/blkdev.h:216, from drivers/md/persistent-data/dm-block-manager.h:11, from drivers/md/persistent-data/dm-transaction-manager.h:10, from drivers/md/persistent-data/dm-transaction-manager.c:6: include/linux/hashtable.h:22:1: warning: this is the location of the previous definition Cc: Alasdair Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-22Merge tag 'dm-3.8-fixes' of ↵Linus Torvalds7-70/+77
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull dm update from Alasdair G Kergon: "Miscellaneous device-mapper fixes, cleanups and performance improvements. Of particular note: - Disable broken WRITE SAME support in all targets except linear and striped. Use it when kcopyd is zeroing blocks. - Remove several mempools from targets by moving the data into the bio's new front_pad area(which dm calls 'per_bio_data'). - Fix a race in thin provisioning if discards are misused. - Prevent userspace from interfering with the ioctl parameters and use kmalloc for the data buffer if it's small instead of vmalloc. - Throttle some annoying error messages when I/O fails." * tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits) dm stripe: add WRITE SAME support dm: remove map_info dm snapshot: do not use map_context dm thin: dont use map_context dm raid1: dont use map_context dm flakey: dont use map_context dm raid1: rename read_record to bio_record dm: move target request nr to dm_target_io dm snapshot: use per_bio_data dm verity: use per_bio_data dm raid1: use per_bio_data dm: introduce per_bio_data dm kcopyd: add WRITE SAME support to dm_kcopyd_zero dm linear: add WRITE SAME support dm: add WRITE SAME support dm: prepare to support WRITE SAME dm ioctl: use kmalloc if possible dm ioctl: remove PF_MEMALLOC dm persistent data: improve improve space map block alloc failure message dm thin: use DMERR_LIMIT for errors ...
2012-12-22dm persistent data: improve improve space map block alloc failure messageJoe Thornber1-1/+1
Improve space map error message when unable to allocate a new metadata block. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-22dm persistent data: use DMERR_LIMIT for errorsMike Snitzer3-21/+20
Nearly all of persistent-data is in the IO path so throttle error messages with DMERR_LIMIT to limit the amount logged when something has gone wrong. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-22dm block manager: reinstate message when validator failsMike Snitzer1-1/+4
Reinstate a useful error message when the block manager buffer validator fails. This was mistakenly eliminated when the block manager was converted to use dm-bufio. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-22dm persistent data: fix nested btree deletionJoe Thornber1-2/+7
When deleting nested btrees, the code forgets to delete the innermost btree. The thin-metadata code serendipitously compensates for this by claiming there is one extra layer in the tree. This patch corrects both problems. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-22dm persistent data: rename node to btree_nodeMikulas Patocka4-47/+47
This patch fixes a compilation failure on sparc32 by renaming struct node. struct node is already defined in include/linux/node.h. On sparc32, it happens to be included through other dependencies and persistent-data doesn't compile because of conflicting declarations. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-10-30md: Fix typo in drivers/mdMasanari Iida2-3/+3
Correct spelling typo in drivers/md. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-10-12dm persistent data: convert to use le32_add_cpuWei Yongjun1-2/+2
Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu(). dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27dm persistent data: introduce dm_bm_set_read_onlyJoe Thornber2-0/+31
Introduce dm_bm_set_read_only to switch the block manager into a read-only mode. To be used when dm-thin degrades due to io errors on the metadata device. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27dm persistent data: stop using dm_bm_unlock_move when shadowing blocks in tmJoe Thornber3-27/+14
Stop using dm_bm_unlock_move when shadowing blocks in the transaction manager as an optimisation and remove the function as it is then no longer used. Some code, such as the space maps, keeps using on-disk data structures from the previous transaction. It can do this because blocks won't be reallocated until the subsequent transaction. Using dm_bm_unlock_move to copy blocks sounds like a win, but it forces a synchronous read should the old block be accessed. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27dm persistent data: tidy transaction manager creation fnsJoe Thornber3-40/+19
Tidy the transaction manager creation functions. They no longer lock the superblock. Superblock locking is pulled out to the caller. Also export dm_bm_write_lock_zero. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27dm persistent data: create new dm_block_manager structJoe Thornber1-20/+38
This patch introduces a separate struct for the block_manager. It also uses IS_ERR to check the return value of dm_bufio_client_create instead of testing incorrectly for NULL. Prior to this patch a struct dm_block_manager was really an alias for a struct dm_bufio_client. We want to add some functionality to the block manager that will require extra fields, so this one to one mapping is no longer valid. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>