summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)AuthorFilesLines
2015-08-08Merge tag 'dm-4.2-fixes-4' of ↵Linus Torvalds2-17/+11
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - stable fix for a dm_merge_bvec() regression on 32 bit Fedora systems. - fix for a 4.2 DM thinp discard regression due to inability to properly delete a range of blocks in a data mapping btree. * tag 'dm-4.2-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm btree remove: fix bug in remove_one() dm: fix dm_merge_bvec regression on 32 bit systems
2015-08-07dm btree remove: fix bug in remove_one()Joe Thornber1-0/+1
remove_one() was not incrementing the key for the beginning of the range, so not all entries were being removed. This resulted in discards that were not unmapping all blocks. Fixes: 4ec331c3ea ("dm btree: add dm_btree_remove_leaves()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-04dm: fix dm_merge_bvec regression on 32 bit systemsMike Snitzer1-17/+10
A DM regression on 32 bit systems was reported against v4.2-rc3 here: https://lkml.org/lkml/2015/7/29/401 Fix this by reverting both commit 1c220c69 ("dm: fix casting bug in dm_merge_bvec()") and 148e51ba ("dm: improve documentation and code clarity in dm_merge_bvec"). This combined revert is done to eliminate the possibility of a partial revert in stable@ kernels. In hindsight the correct fix, at the time 1c220c69 was applied to fix the regression that 148e51ba introduced, should've been to simply revert 148e51ba. Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Tested-by: Adam Williamson <awilliam@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.19+
2015-08-03md/raid5: don't let shrink_slab shrink too far.NeilBrown1-2/+3
I have a report of drop_one_stripe() called from raid5_cache_scan() apparently finding ->max_nr_stripes == 0. This should not be allowed. So add a test to keep max_nr_stripes above min_nr_stripes. Also use a 'mask' rather than a 'mod' in drop_one_stripe to ensure 'hash' is valid even if max_nr_stripes does reach zero. Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.") Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569b665) Reported-by: Tomas Papan <tomas.papan@gmail.com> Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03md: use kzalloc() when bitmap is disabledBenjamin Randazzo1-1/+1
In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a mdu_bitmap_file_t called "file". 5769 file = kmalloc(sizeof(*file), GFP_NOIO); 5770 if (!file) 5771 return -ENOMEM; This structure is copied to user space at the end of the function. 5786 if (err == 0 && 5787 copy_to_user(arg, file, sizeof(*file))) 5788 err = -EFAULT But if bitmap is disabled only the first byte of "file" is initialized with zero, so it's possible to read some bytes (up to 4095) of kernel space memory from user space. This is an information leak. 5775 /* bitmap disabled, zero the first byte and copy out */ 5776 if (!mddev->bitmap_info.file) 5777 file->pathname[0] = '\0'; Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr> Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03md/raid1: extend spinlock to protect raid1_end_read_request against ↵NeilBrown1-4/+6
inconsistencies raid1_end_read_request() assumes that the In_sync bits are consistent with the ->degaded count. raid1_spare_active updates the In_sync bit before the ->degraded count and so exposes an inconsistency, as does error() So extend the spinlock in raid1_spare_active() and error() to hide those inconsistencies. This should probably be part of Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from last working device'.") as it addresses the same issue. It fixes the same bug and should go to -stable for same reasons. Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Cc: stable@vger.kernel.org (v3.0+) Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29dm cache: fix device destroy hang due to improper prealloc_used accountingMike Snitzer1-3/+3
Commit 665022d72f9 ("dm cache: avoid calls to prealloc_free_structs() if possible") introduced a regression that caused the removal of a DM cache device to hang in cache_postsuspend()'s call to wait_for_migrations() with the following stack trace: [<ffffffff81651457>] schedule+0x37/0x80 [<ffffffffa041e21b>] cache_postsuspend+0xbb/0x470 [dm_cache] [<ffffffff810ba970>] ? prepare_to_wait_event+0xf0/0xf0 [<ffffffffa0006f77>] dm_table_postsuspend_targets+0x47/0x60 [dm_mod] [<ffffffffa0001eb5>] __dm_destroy+0x215/0x250 [dm_mod] [<ffffffffa0004113>] dm_destroy+0x13/0x20 [dm_mod] [<ffffffffa00098cd>] dev_remove+0x10d/0x170 [dm_mod] [<ffffffffa00097c0>] ? dev_suspend+0x240/0x240 [dm_mod] [<ffffffffa0009f85>] ctl_ioctl+0x255/0x4d0 [dm_mod] [<ffffffff8127ac00>] ? SYSC_semtimedop+0x280/0xe10 [<ffffffffa000a213>] dm_ctl_ioctl+0x13/0x20 [dm_mod] [<ffffffff811fd432>] do_vfs_ioctl+0x2d2/0x4b0 [<ffffffff81117d5f>] ? __audit_syscall_entry+0xaf/0x100 [<ffffffff81022636>] ? do_audit_syscall_entry+0x66/0x70 [<ffffffff811fd689>] SyS_ioctl+0x79/0x90 [<ffffffff81023e58>] ? syscall_trace_leave+0xb8/0x110 [<ffffffff81654f6e>] entry_SYSCALL_64_fastpath+0x12/0x71 Fix this by accounting for the call to prealloc_data_structs() immediately _before_ the call as opposed to after. This is needed because it is possible to break out of the control loop after the call to prealloc_data_structs() but before prealloc_used was set to true. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-29Revert "dm cache: do not wake_worker() in free_migration()"Mike Snitzer1-0/+1
This reverts commit 386cb7cdeeef97e0bf082a8d6bbfc07a2ccce07b. Taking the wake_worker() out of free_migration() will slow writeback dramatically, and hence adaptability. Say we have 10k blocks that need writing back, but are only able to issue 5 concurrently due to the migration bandwidth: it's imperative that we wake_worker() immediately after migration completion; waiting for the next 1 second wake up (via do_waker) means it'll take a long time to write that all back. Reported-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-27dm crypt: update wiki page URLBaruch Siach1-1/+1
Cryptsetup moved to gitlab. This is a leftover from commit e44f23b32dc7 (dm crypt: update URLs to new cryptsetup project page, 2015-04-05). Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-27dm cache policy smq: fix alloc_bitset check that always evaluates as falseColin Ian King1-1/+1
static analysis by cppcheck has found a check on alloc_bitset that always evaluates as false and hence never finds an allocation failure: [drivers/md/dm-cache-policy-smq.c:1689]: (warning) Logical conjunction always evaluates to false: !EXPR && EXPR. Fix this by removing the incorrect mq->cache_hit_bits check Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-27dm thin: return -ENOSPC when erroring retry list due to out of data spaceMike Snitzer1-3/+8
Otherwise -EIO would be returned when -ENOSPC should be used consistently. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-25Merge tag 'md/4.2-fixes' of git://neil.brown.name/mdLinus Torvalds8-27/+71
Pull md fixes from Neil Brown: "Some md fixes for 4.2 Several are tagged for -stable. A few aren't because they are not very, serious or because they are in the 'experimental' cluster code" * tag 'md/4.2-fixes' of git://neil.brown.name/md: md/raid5: clear R5_NeedReplace when no longer needed. Fix read-balancing during node failure md-cluster: fix bitmap sub-offset in bitmap_read_sb md: Return error if request_module fails and returns positive value md: Skip cluster setup in case of error while reading bitmap md/raid1: fix test for 'was read error from last working device'. md: Skip cluster setup for dm-raid md: flush ->event_work before stopping array. md/raid10: always set reshape_safe when initializing reshape_position. md/raid5: avoid races when changing cache size.
2015-07-24md/raid5: clear R5_NeedReplace when no longer needed.NeilBrown1-1/+3
This flag is currently never cleared, which can in rare cases trigger a warn-on if it is still set but the block isn't InSync. So clear it when it isn't need, which includes if the replacement device has failed. Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24Fix read-balancing during node failureGoldwyn Rodrigues3-5/+16
During a node failure, We need to suspend read balancing so that the reads are directed to the first device and stale data is not read. Suspending writes is not required because these would be recorded and synced eventually. A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep(). area_resyncing() will respond true for the entire devices if this flag is set and the request type is READ. The flag is cleared in recover_done(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reported-By: David Teigland <teigland@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24md-cluster: fix bitmap sub-offset in bitmap_read_sbGoldwyn Rodrigues1-7/+4
bitmap_read_sb is modifying mddev->bitmap_info.offset. This works for the first bitmap read. However, when multiple bitmaps need to be opened by the same node, it ends up corrupting the offset. Fix it by using a local variable. Also, bitmap_read_sb is not required in bitmap_copy_from_slot since it is called in bitmap_create. Remove bitmap_read_sb(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24md: Return error if request_module fails and returns positive valueGoldwyn Rodrigues1-1/+1
request_module() can return 256 (process exited) in some cases, which is not as specified in the documentation before the request_module() definition. Convert the error to -ENOENT. The positive error number results in bitmap_create() returning a value that is meant to be an error but doesn't look like one, so it is dereferenced as a point and causes a crash. (not needed for stable as this is "experimental" code) Fixes: edb39c9deda8 ("Introduce md_cluster_operations to handle cluster functions") Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24md: Skip cluster setup in case of error while reading bitmapGoldwyn Rodrigues1-1/+1
If the bitmap read fails, the error code set is -EINVAL. However, we don't check for errors and go ahead with cluster_setup. Skip the cluster setup in case of error. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24md/raid1: fix test for 'was read error from last working device'.NeilBrown1-1/+1
When we get a read error from the last working device, we don't try to repair it, and don't fail the device. We simple report a read error to the caller. However the current test for 'is this the last working device' is wrong. When there is only one fully working device, it assumes that a non-faulty device is that device. However a spare which is rebuilding would be non-faulty but so not the only working device. So change the test from "!Faulty" to "In_sync". If ->degraded says there is only one fully working device and this device is in_sync, this must be the one. This bug has existed since we allowed read_balance to read from a recovering spare in v3.0 Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Cc: stable@vger.kernel.org (v3.0+) Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-23md: Skip cluster setup for dm-raidGoldwyn Rodrigues1-3/+12
There is a bug that the bitmap superblock isn't initialised properly for dm-raid, so a new field can have garbage in new fields. (dm-raid does initialisation in the kernel - md initialised the superblock in mdadm). This means that for dm-raid we cannot currently trust the new ->nodes field. So: - use __GFP_ZERO to initialise the superblock properly for all new arrays - initialise all fields in bitmap_info in bitmap_new_disk_sb - ignore ->nodes for dm arrays (yes, this is a hack) This bug exposes dm-raid to bug in the (still experimental) md-cluster code, so it is suitable for -stable. It does cause crashes. References: https://bugzilla.kernel.org/show_bug.cgi?id=100491 Cc: stable@vger.kernel.org (v4.1) Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22md: flush ->event_work before stopping array.NeilBrown1-0/+2
The 'event_work' worker used by dm-raid may still be running when the array is stopped. This can result in an oops. So flush the workqueue on which it is run after detaching and before destroying the device. Reported-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org (2.6.38+ please delay 2 weeks after -final release) Fixes: 9d09e663d550 ("dm: raid456 basic support")
2015-07-22md/raid10: always set reshape_safe when initializing reshape_position.NeilBrown1-1/+4
'reshape_position' tracks where in the reshape we have reached. 'reshape_safe' tracks where in the reshape we have safely recorded in the metadata. These are compared to determine when to update the metadata. So it is important that reshape_safe is initialised properly. Currently it isn't. When starting a reshape from the beginning it usually has the correct value by luck. But when reducing the number of devices in a RAID10, it has the wrong value and this leads to the metadata not being updated correctly. This can lead to corruption if the reshape is not allowed to complete. This patch is suitable for any -stable kernel which supports RAID10 reshape, which is 3.5 and later. Fixes: 3ea7daa5d7fd ("md/raid10: add reshape support") Cc: stable@vger.kernel.org (v3.5+ please wait for -final to be out for 2 weeks) Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22md/raid5: avoid races when changing cache size.NeilBrown2-7/+27
Cache size can grow or shrink due to various pressures at any time. So when we resize the cache as part of a 'grow' operation (i.e. change the size to allow more devices) we need to blocks that automatic growing/shrinking. So introduce a mutex. auto grow/shrink uses mutex_trylock() and just doesn't bother if there is a blockage. Resizing the whole cache holds the mutex to ensure that the correct number of new stripes is allocated. This bug can result in some stripes not being freed when an array is stopped. This leads to the kmem_cache not being freed and a subsequent array can try to use the same kmem_cache and get confused. Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.") Cc: stable@vger.kernel.org (4.1 - please delay until 2 weeks after release of 4.2) Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-18Merge tag 'dm-4.2-fixes-2' of ↵Linus Torvalds5-34/+68
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - revert a request-based DM core change that caused IO latency to increase and adversely impact both throughput and system load - fix for a use after free bug in DM core's device cleanup - a couple DM btree removal fixes (used by dm-thinp) - a DM thinp fix for order-5 allocation failure - a DM thinp fix to not degrade to read-only metadata mode when in out-of-data-space mode for longer than the 'no_space_timeout' - fix a long-standing oversight in both dm-thinp and dm-cache by now exporting 'needs_check' in status if it was set in metadata - fix an embarrassing dm-cache busy-loop that caused worker threads to eat cpu even if no IO was actively being issued to the cache device * tag 'dm-4.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: avoid calls to prealloc_free_structs() if possible dm cache: avoid preallocation if no work in writeback_some_dirty_blocks() dm cache: do not wake_worker() in free_migration() dm cache: display 'needs_check' in status if it is set dm thin: display 'needs_check' in status if it is set dm thin: stay in out-of-data-space mode once no_space_timeout expires dm: fix use after free crash due to incorrect cleanup sequence Revert "dm: only run the queue on completion if congested or no requests pending" dm btree: silence lockdep lock inversion in dm_btree_del() dm thin: allocate the cell_sort_array dynamically dm btree remove: fix bug in redistribute3
2015-07-17dm cache: avoid calls to prealloc_free_structs() if possibleMike Snitzer1-3/+12
If no work was performed then prealloc_data_structs() wasn't ever called so there isn't any need to call prealloc_free_structs(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-17dm cache: avoid preallocation if no work in writeback_some_dirty_blocks()Mike Snitzer1-9/+4
Refactor writeback_some_dirty_blocks() to avoid prealloc_data_structs() if the policy doesn't have any dirty blocks ready for writeback. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-17dm cache: do not wake_worker() in free_migration()Mike Snitzer1-1/+0
All methods that queue work call wake_worker() as you'd expect. E.g. cell_defer, defer_bio, quiesce_migration (which is called by writeback, promote, demote_then_promote, invalidate, discard, etc). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-16dm cache: display 'needs_check' in status if it is setMike Snitzer1-2/+7
There is currently no way to see that the needs_check flag has been set in the metadata. Display 'needs_check' in the cache status if it is set in the cache metadata. Also, update cache documentation. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-16dm thin: display 'needs_check' in status if it is setMike Snitzer1-2/+8
There is currently no way to see that the needs_check flag has been set in the metadata. Display 'needs_check' in the thin-pool status if it is set in the thinp metadata. Also, update thinp documentation. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-16dm thin: stay in out-of-data-space mode once no_space_timeout expiresMike Snitzer1-4/+17
This fixes an issue where running out of data space would cause the thin-pool's metadata to become read-only. There was no reason to make metadata read-only -- calling set_pool_mode() with PM_READ_ONLY was a misguided way to error all queued and future write IOs. We can accomplish the same by degrading from PM_OUT_OF_DATA_SPACE to PM_OUT_OF_DATA_SPACE with error_if_no_space enabled. Otherwise, the use of PM_READ_ONLY could cause a race where commit() was started before the PM_READ_ONLY transition but dm_pool_commit_metadata() would go on to fail because the block manager had transitioned to read-only. The return of -EPERM from dm_pool_commit_metadata(), due to attempting to commit while in read-only mode, caused the thin-pool to set 'needs_check' because a metadata_operation_failed(). This needless cascade of failures makes life for users more difficult than needed. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-13dm: fix use after free crash due to incorrect cleanup sequenceMikulas Patocka1-2/+2
Linux 4.2-rc1 Commit 0f20972f7bf6 ("dm: factor out a common cleanup_mapped_device()") moved a common cleanup code to a separate function. Unfortunately, that commit incorrectly changed the order of cleanup, so that it destroys the mapped_device's srcu structure 'io_barrier' before destroying its workqueue. The function that is executed on the workqueue (dm_wq_work) uses the srcu structure, thus it may use it after being freed. That results in a crash in the LVM test suite's mirror-vgreduce-removemissing.sh test. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 0f20972f7bf6 ("dm: factor out a common cleanup_mapped_device()") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-11bcache: don't embed 'return' statements in closure macrosJens Axboe4-6/+14
This is horribly confusing, it breaks the flow of the code without it being apparent in the caller. Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-07-08Revert "dm: only run the queue on completion if congested or no requests ↵Mike Snitzer1-6/+2
pending" This reverts commit 9a0e609e3fd8a95c96629b9fbde6b8c5b9a1456a. (Resolved a conflict during revert due to commit bfebd1cdb4 that came after) This revert is motivated by a couple failure reports on request-based DM multipath testbeds: 1) Netapp reported that their multipath fault injection test under heavy IO load can stall longer than 300 seconds. 2) IBM reported elevated lock contention in their testbed (likely due to increased back pressure due to IO not being dispatched as quickly): https://www.redhat.com/archives/dm-devel/2015-July/msg00057.html Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
2015-07-06dm btree: silence lockdep lock inversion in dm_btree_del()Joe Thornber1-1/+1
Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del() can be called via an ioctl and we don't want to recurse into the FS or block layer. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-07-06dm thin: allocate the cell_sort_array dynamicallyJoe Thornber1-1/+12
Given the pool's cell_sort_array holds 8192 pointers it triggers an order 5 allocation via kmalloc. This order 5 allocation is prone to failure as system memory gets more fragmented over time. Fix this by allocating the cell_sort_array using vmalloc. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-07-06dm btree remove: fix bug in redistribute3Dennis Yang1-3/+3
redistribute3() shares entries out across 3 nodes. Some entries were being moved the wrong way, breaking the ordering. This manifested as a BUG() in dm-btree-remove.c:shift() when entries were removed from the btree. For additional context see: https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-07-05Merge branch 'for-linus' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-07-01MAINTAINERS: BCACHE: Kent Overstreet has changed email addressJoe Perches1-1/+1
Kent's email address in MAINTAINERS seems to be invalid. This was his last sign-off address, so use that if appropriate. Fix the S: status entry while there. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-01bcache: use kvfree() in various placesPekka Enberg2-16/+4
Use kvfree() instead of open-coding it. Signed-off-by: Pekka Enberg <penberg@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-29Merge tag 'md/4.2' of git://neil.brown.name/mdLinus Torvalds4-101/+133
Pull md updates from Neil Brown: "A mixed bag - a few bug fixes - some performance improvement that decrease lock contention - some clean-up Nothing major" * tag 'md/4.2' of git://neil.brown.name/md: md: clear Blocked flag on failed devices when array is read-only. md: unlock mddev_lock on an error path. md: clear mddev->private when it has been freed. md: fix a build warning md/raid5: ignore released_stripes check md/raid5: per hash value and exclusive wait_for_stripe md/raid5: split wait_for_stripe and introduce wait_for_quiescent wait: introduce wait_event_exclusive_cmd md: convert to kstrto*() md/raid10: make sync_request_write() call bio_copy_data()
2015-06-26Merge tag 'dm-4.2-fixes' of ↵Linus Torvalds4-90/+166
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "Apologies for not pressing this request-based DM partial completion issue further, it was an oversight on my part. We'll have to get it fixed up properly and revisit for a future release. - Revert block and DM core changes the removed request-based DM's ability to handle partial request completions -- otherwise with the current SCSI LLDs these changes could lead to silent data corruption. - Fix two DM version bumps that were missing from the initial 4.2 DM pull request (enabled userspace lvm2 to know certain changes have been made)" * tag 'dm-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache policy smq: fix "default" version to be 1.4.0 dm: bump the ioctl version to 4.32.0 Revert "block, dm: don't copy bios for request clones" Revert "dm: do not allocate any mempools for blk-mq request-based DM"
2015-06-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-3/+1
Merge second patchbomb from Andrew Morton: - most of the rest of MM - lots of misc things - procfs updates - printk feature work - updates to get_maintainer, MAINTAINERS, checkpatch - lib/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits) exit,stats: /* obey this comment */ coredump: add __printf attribute to cn_*printf functions coredump: use from_kuid/kgid when formatting corename fs/reiserfs: remove unneeded cast NILFS2: support NFSv2 export fs/befs/btree.c: remove unneeded initializations fs/minix: remove unneeded cast init/do_mounts.c: add create_dev() failure log kasan: remove duplicate definition of the macro KASAN_FREE_PAGE fs/efs: femove unneeded cast checkpatch: emit "NOTE: <types>" message only once after multiple files checkpatch: emit an error when there's a diff in a changelog checkpatch: validate MODULE_LICENSE content checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr() checkpatch: fix processing of MEMSET issues checkpatch: suggest using ether_addr_equal*() checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files checkpatch: remove local from codespell path checkpatch: add --showfile to allow input via pipe to show filenames ...
2015-06-26dm cache policy smq: fix "default" version to be 1.4.0Mike Snitzer1-1/+1
Commit bccab6a0 ("dm cache: switch the "default" cache replacement policy from mq to smq") should've incremented the "default" policy's version number to 1.4.0 rather than reverting to version 1.0.0. Reported-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-26Revert "block, dm: don't copy bios for request clones"Mike Snitzer3-59/+142
This reverts commit 5f1b670d0bef508a5554d92525f5f6d00d640b38. Justification for revert as reported in this dm-devel post: https://www.redhat.com/archives/dm-devel/2015-June/msg00160.html this change should not be pushed to mainline yet. Firstly, Christoph has a newer version of the patch that fixes silent data corruption problem: https://www.redhat.com/archives/dm-devel/2015-May/msg00229.html And the new version still depends on LLDDs to always complete requests to the end when error happens, while block API doesn't enforce such a requirement. If the assumption is ever broken, the inconsistency between request and bio (e.g. rq->__sector and rq->bio) will cause silent data corruption: https://www.redhat.com/archives/dm-devel/2015-June/msg00022.html Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-26Revert "dm: do not allocate any mempools for blk-mq request-based DM"Mike Snitzer2-40/+33
This reverts commit cbc4e3c1350beb47beab8f34ad9be3d34a20c705. Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-26drivers/md/md.c: use strreplace()Rasmus Villemoes1-3/+1
There's no point in starting over when we meet a '/'. This also eliminates a stack variable and a little .text. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-26Merge tag 'dm-4.2-changes' of ↵Linus Torvalds29-714/+4103
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - DM core cleanups: * blk-mq request-based DM no longer uses any mempools now that partial completions are no longer handled as part of cloned requests - DM raid cleanups and support for MD raid0 - DM cache core advances and a new stochastic-multi-queue (smq) cache replacement policy * smq is the new default dm-cache policy - DM thinp cleanups and much more efficient large discard support - DM statistics support for request-based DM and nanosecond resolution timestamps - Fixes to DM stripe, DM log-writes, DM raid1 and DM crypt * tag 'dm-4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (39 commits) dm stats: add support for request-based DM devices dm stats: collect and report histogram of IO latencies dm stats: support precise timestamps dm stats: fix divide by zero if 'number_of_areas' arg is zero dm cache: switch the "default" cache replacement policy from mq to smq dm space map metadata: fix occasional leak of a metadata block on resize dm thin metadata: fix a race when entering fail mode dm thin: fail messages with EOPNOTSUPP when pool cannot handle messages dm thin: range discard support dm thin metadata: add dm_thin_remove_range() dm thin metadata: add dm_thin_find_mapped_range() dm btree: add dm_btree_remove_leaves() dm stats: Use kvfree() in dm_kvfree() dm cache: age and write back cache entries even without active IO dm cache: prefix all DMERR and DMINFO messages with cache device name dm cache: add fail io mode and needs_check flag dm cache: wake the worker thread every time we free a migration object dm cache: add stochastic-multi-queue (smq) policy dm cache: boost promotion of blocks that will be overwritten dm cache: defer whole cells ...
2015-06-26Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-blockLinus Torvalds6-4/+7
Pull cgroup writeback support from Jens Axboe: "This is the big pull request for adding cgroup writeback support. This code has been in development for a long time, and it has been simmering in for-next for a good chunk of this cycle too. This is one of those problems that has been talked about for at least half a decade, finally there's a solution and code to go with it. Also see last weeks writeup on LWN: http://lwn.net/Articles/648292/" * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits) writeback, blkio: add documentation for cgroup writeback support vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB writeback: do foreign inode detection iff cgroup writeback is enabled v9fs: fix error handling in v9fs_session_init() bdi: fix wrong error return value in cgwb_create() buffer: remove unusued 'ret' variable writeback: disassociate inodes from dying bdi_writebacks writeback: implement foreign cgroup inode bdi_writeback switching writeback: add lockdep annotation to inode_to_wb() writeback: use unlocked_inode_to_wb transaction in inode_congested() writeback: implement unlocked_inode_to_wb transaction and use it for stat updates writeback: implement [locked_]inode_to_wb_and_lock_list() writeback: implement foreign cgroup inode detection writeback: make writeback_control track the inode being written back writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb() mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use writeback: implement memcg writeback domain based throttling writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes writeback: implement memcg wb_domain writeback: update wb_over_bg_thresh() to use wb_domain aware operations ...
2015-06-26Merge branch 'for-4.2/core' of git://git.kernel.dk/linux-blockLinus Torvalds10-160/+65
Pull core block IO update from Jens Axboe: "Nothing really major in here, mostly a collection of smaller optimizations and cleanups, mixed with various fixes. In more detail, this contains: - Addition of policy specific data to blkcg for block cgroups. From Arianna Avanzini. - Various cleanups around command types from Christoph. - Cleanup of the suspend block I/O path from Christoph. - Plugging updates from Shaohua and Jeff Moyer, for blk-mq. - Eliminating atomic inc/dec of both remaining IO count and reference count in a bio. From me. - Fixes for SG gap and chunk size support for data-less (discards) IO, so we can merge these better. From me. - Small restructuring of blk-mq shared tag support, freeing drivers from iterating hardware queues. From Keith Busch. - A few cfq-iosched tweaks, from Tahsin Erdogan and me. Makes the IOPS mode the default for non-rotational storage" * 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits) cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL cfq-iosched: fix sysfs oops when attempting to read unconfigured weights cfq-iosched: move group scheduling functions under ifdef cfq-iosched: fix the setting of IOPS mode on SSDs blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file block, cgroup: implement policy-specific per-blkcg data block: Make CFQ default to IOPS mode on SSDs block: add blk_set_queue_dying() to blkdev.h blk-mq: Shared tag enhancements block: don't honor chunk sizes for data-less IO block: only honor SG gap prevention for merges that contain data block: fix returnvar.cocci warnings block, dm: don't copy bios for request clones block: remove management of bi_remaining when restoring original bi_end_io block: replace trylock with mutex_lock in blkdev_reread_part() block: export blkdev_reread_part() and __blkdev_reread_part() suspend: simplify block I/O handling block: collapse bio bit space block: remove unused BIO_RW_BLOCK and BIO_EOF flags block: remove BIO_EOPNOTSUPP ...
2015-06-25md: clear Blocked flag on failed devices when array is read-only.Neil Brown1-0/+9
The Blocked flag indicates that a device has failed but that this fact hasn't been recorded in the metadata yet. Writes to such devices cannot be allowed until the metadata has been updated. On a read-only array, the Blocked flag will never be cleared. This prevents the device being removed from the array. If the metadata is being handled by the kernel (i.e. !mddev->external), then we can be sure that if the array is switch to writable, then a metadata update will happen and will record the failure. So we don't need the flag set. If metadata is externally managed, it is upto the external manager to clear the 'blocked' flag. Reported-by: XiaoNi <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-25md: unlock mddev_lock on an error path.NeilBrown1-1/+3
This error path retuns while still holding the lock - bad. Fixes: 6791875e2e53 ("md: make reconfig_mutex optional for writes to md sysfs files.") Cc: stable@vger.kernel.org (v4.0+) Signed-off-by: NeilBrown <neilb@suse.com>