path: root/fs/btrfs/volumes.c
Age | Commit message | Author | Files | Lines
2016-04-28 | btrfs: use existing device constraints table btrfs_raid_array | David Sterba | 1 | -14/+9
We should avoid duplicating the device constraints, let's use the btrfs_raid_array in btrfs_check_raid_min_devices. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: introduce raid-type to error-code table, for minimum device constraint | David Sterba | 1 | -0/+15
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: pass number of devices to btrfs_check_raid_min_devices | David Sterba | 1 | -15/+20
Before this patch, btrfs_check_raid_min_devices would do an off-by-one check of the constraints and not the minimum check, as its name suggests. This is not a problem if the only caller is device remove, but would be confusing for others. Add an argument with the exact number and let the caller(s) decide if this needs any adjustments, like when device replace is running. Reviewed-by: Anand Jain <anand.jain@oracle.com> Tested-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
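The table-driven minimum-device check described in the three commits above can be sketched in userspace C. This is only an illustration, not the kernel code: the enum, the table name `sketch_raid_table`, the `devs_min` values, and the single `-EINVAL` return are assumptions made for the example (the kernel maps each raid type to its own error code).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical profile indices; the kernel's real enum differs. */
enum sketch_profile {
	SK_SINGLE, SK_DUP, SK_RAID0, SK_RAID1, SK_RAID10, SK_RAID5, SK_RAID6,
	SK_PROFILE_COUNT
};

struct sketch_raid_attr {
	int devs_min;	/* minimum number of devices for this profile */
};

/* One shared constraints table instead of duplicated per-profile checks.
 * The values are illustrative assumptions. */
static const struct sketch_raid_attr sketch_raid_table[SK_PROFILE_COUNT] = {
	[SK_SINGLE] = { 1 },
	[SK_DUP]    = { 1 },
	[SK_RAID0]  = { 2 },
	[SK_RAID1]  = { 2 },
	[SK_RAID10] = { 4 },
	[SK_RAID5]  = { 2 },
	[SK_RAID6]  = { 3 },
};

/*
 * Return 0 if num_devices satisfies every profile whose bit is set in
 * profiles_in_use, -EINVAL otherwise. The caller passes the exact device
 * count it wants checked and applies any adjustment itself.
 */
static int sketch_check_raid_min_devices(unsigned int profiles_in_use,
					 unsigned long num_devices)
{
	for (int i = 0; i < SK_PROFILE_COUNT; i++) {
		if (!(profiles_in_use & (1u << i)))
			continue;
		if (num_devices < (unsigned long)sketch_raid_table[i].devs_min)
			return -EINVAL;
	}
	return 0;
}
```

A device-remove caller would pass the count that remains after removal, for example `sketch_check_raid_min_devices(profiles, num_devices - 1)`, which is the kind of adjustment the commit message leaves to the caller.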
2016-04-28 | btrfs: rename __check_raid_min_devices | David Sterba | 1 | -2/+2
Underscores are for special functions, use the full prefix for better stacktrace recognition. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: optimize check for stale device | Anand Jain | 1 | -1/+2
Only check for a stale device when a device is added or changed. If there is no update to the device, there is no need to call btrfs_free_stale_device(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: introduce device delete by devid | Anand Jain | 1 | -2/+2
This introduces a new ioctl, BTRFS_IOC_RM_DEV_V2, which uses the enhanced struct btrfs_ioctl_vol_args_v2 to carry devid as a user argument. The patch does not delete the old ioctl interface, so the kernel remains backward compatible with userland progs.
Test case/script:
    echo "0 $(blockdev --getsz /dev/sdf) linear /dev/sdf 0" | dmsetup create bad_disk
    mkfs.btrfs -f -d raid1 -m raid1 /dev/sdd /dev/sde /dev/mapper/bad_disk
    mount /dev/sdd /btrfs
    dmsetup suspend bad_disk
    echo "0 $(blockdev --getsz /dev/sdf) error /dev/sdf 0" | dmsetup load bad_disk
    dmsetup resume bad_disk
    echo "bad disk failed. now deleting/replacing"
    btrfs dev del 3 /btrfs
    echo $?
    btrfs fi show /btrfs
    umount /btrfs
    btrfs-show-super /dev/sdd | egrep num_device
    dmsetup remove bad_disk
    wipefs -a /dev/sdf
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Martin <m_btrfs@ml1.co.uk>
[ adjust messages, s/disk/device/ ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: make use of btrfs_scratch_superblocks() in btrfs_rm_device() | Anand Jain | 1 | -64/+13
With the previous patches now the btrfs_scratch_superblocks() is ready to be used in btrfs_rm_device() so use it. Signed-off-by: Anand Jain <anand.jain@oracle.com> [ use GFP_KERNEL ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: enhance btrfs_find_device_by_user_input() to check device path | Anand Jain | 1 | -0/+3
Device replace and device delete follow the same steps up to some depth within the btrfs kernel code, but they don't share code. This enhancement will help replace and delete share code. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: make use of btrfs_find_device_by_user_input() | Anand Jain | 1 | -63/+37
btrfs_rm_device() has a section of code which can be replaced by btrfs_find_device_by_user_input(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: create helper btrfs_find_device_by_user_input() | Anand Jain | 1 | -0/+19
The patch renames btrfs_dev_replace_find_srcdev() to btrfs_find_device_by_user_input() and moves it to volumes.c, so that delete device can use it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: clean up and optimize __check_raid_min_device() | Anand Jain | 1 | -24/+19
__check_raid_min_device(), which was peeled out of btrfs_rm_device(), kept its original code to show the block move. This patch cleans up __check_raid_min_device(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: create helper function __check_raid_min_devices() | Anand Jain | 1 | -19/+32
Move the section of btrfs_rm_device() code that checks for the minimum number of devices into the function __check_raid_min_devices(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: create a helper function to read the disk super | Anand Jain | 1 | -35/+52
Part of the code from btrfs_scan_one_device() is moved to a new function, btrfs_read_disk_super(), so that the former looks cleaner. (In the process it also moves the code that ensures the label is null terminated.) This creates an easy opportunity to merge the various duplicate code paths that read the disk super. An earlier attempt to merge them showed that the duplicates exist because of some issues, but it was not clear what those issues were. So until we figure that out, it's better to keep them in separate functions. Signed-off-by: Anand Jain <anand.jain@oracle.com> [ use GFP_KERNEL, PAGE_CACHE_ removal related fixups ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | Btrfs: do not create empty block group if we have allocated data | Liu Bo | 1 | -1/+8
Now we force the creation of an empty block group to keep the data profile alive. However, in the example below, we eventually get an empty block group while we're trying to get more space for other types (metadata/system):
- Before,
    block group "A": size=2G, used=1.2G
    block group "B": size=2G, used=512M
- After "btrfs balance start -dusage=50 mount_point",
    block group "A": size=2G, used=(1.2+0.5)G
    block group "C": size=2G, used=0
Since there is no data in block group C, it won't be deleted automatically and we have to get the unused 2G until the next mount. Balance itself just moves data and doesn't remove data, so it's safe to not create such an empty block group if we already have data allocated in other block groups.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 | btrfs: rename btrfs_std_error to btrfs_handle_fs_error | Anand Jain | 1 | -7/+7
btrfs_std_error() handles errors and puts the FS into readonly mode (as of now). So it's a good idea to rename it to btrfs_handle_fs_error(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ edit changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-04 | mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros | Kirill A. Shutemov | 1 | -7/+7
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time ago with the promise that one day it would be possible to implement the page cache with bigger chunks than PAGE_SIZE. This promise never materialized, and it is unlikely that it will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to PAGE_SIZE. And it's a constant source of confusion on whether a PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much breakage to be doable.
Let's stop pretending that pages in page cache are special. They are not.
The changes are pretty straight-forward:
 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
 - page_cache_get() -> get_page();
 - page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using the script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation will also be addressed in a separate patch.
    virtual patch
    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E
    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E
    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT
    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE
    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK
    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)
    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)
    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-14 | btrfs: Fix misspellings in comments. | Adam Buchbinder | 1 | -2/+2
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-26 | Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 | David Sterba | 1 | -10/+14
2016-02-26 | Merge branch 'dev/rename-keys' into for-chris-4.6 | David Sterba | 1 | -7/+7
2016-02-23 | Btrfs: fix lockdep deadlock warning due to dev_replace | Liu Bo | 1 | -10/+14
Xfstests btrfs/011 complains about a deadlock warning:
[ 1226.649039] =========================================================
[ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
[ 1226.649039] 4.1.0+ #270 Not tainted
[ 1226.649039] ---------------------------------------------------------
[ 1226.652955] kswapd0/46 just changed the state of lock:
[ 1226.652955] (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[ 1226.652955] (&fs_info->dev_replace.lock){+.+.+.}
and interrupts could create inverse lock ordering between them.
[ 1226.652955] other info that might help us debug this:
[ 1226.652955] Chain exists of: &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
[ 1226.652955] Possible interrupt unsafe locking scenario:
[ 1226.652955]        CPU0                    CPU1
[ 1226.652955]        ----                    ----
[ 1226.652955]   lock(&fs_info->dev_replace.lock);
[ 1226.652955]                                local_irq_disable();
[ 1226.652955]                                lock(&delayed_node->mutex);
[ 1226.652955]                                lock(&found->groups_sem);
[ 1226.652955]   <Interrupt>
[ 1226.652955]     lock(&delayed_node->mutex);
[ 1226.652955] *** DEADLOCK ***
Commit 084b6e7c7607 ("btrfs: Fix a lockdep warning when running xfstest.") tried to fix a similar case that has exactly the same warning, but even with that fix we still run into this one.
The above lock chain comes from
    btrfs_commit_transaction
    ->btrfs_run_delayed_items
    ...
    ->__btrfs_update_delayed_inode
    ...
    ->__btrfs_cow_block
    ...
    ->find_free_extent
    ->cache_block_group
    ->load_free_space_cache
    ->btrfs_readpages
    ->submit_one_bio
    ...
    ->__btrfs_map_block
    ->btrfs_dev_replace_lock
However, under high memory pressure, tasks which hold dev_replace.lock can be interrupted by kswapd. kswapd is then meant to release memory occupied by the superblock, inodes and dentries, where we may call evict_inode, and it comes to
[ 1226.652955] [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955] [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
[ 1226.652955] [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700
delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and it leads to an ABBA deadlock.
To fix this, we can use the "blocking rwlock" used in the case of extent_buffer, but things are simpler here since we only need to turn the read side's spinlock into a blocking lock.
With this, btrfs/011 no longer produces warnings in dmesg.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11 | btrfs: switch dev stats item to the permanent item key | David Sterba | 1 | -4/+4
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11 | btrfs: switch balance item to the temporary item key | David Sterba | 1 | -3/+3
No visible change. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11 | btrfs: device add and remove: use GFP_KERNEL | David Sterba | 1 | -4/+5
We can safely use GFP_KERNEL in the functions called from the ioctl handlers. Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-20 | btrfs: Use direct way to determine raid56 write/recover mode | Zhao Lei | 1 | -1/+2
The old code used bbio->raid_map to determine whether we are in a raid56 write/recover operation, because we didn't have bbio->map_type. Now we have a direct way to test this condition, so get rid of the function-relative data and make the code more readable. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-01-20 | btrfs: Small cleanup for get index_srcdev loop | Zhao Lei | 1 | -22/+20
1: Adjust the condition in the loop to reduce indentation. 2: Move the btrfs_put_bbio() line so the calls are combined, which makes the logic cleaner. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-01-20 | btrfs: Enhance chunk validation check | Qu Wenruo | 1 | -1/+32
Enhance chunk validation:
1) Num_stripes
   We already have such a check, but only for the super block sys chunk array. Now check all on-disk chunks.
2) Chunk logical
   It should be aligned to the sector size. This behavior should be *DOUBLE CHECKED* for 64K sector size like PPC64 or AArch64. Maybe we can find some hidden bugs.
3) Chunk length
   Same as chunk logical, it should be aligned to the sector size.
4) Stripe length
   It should be a power of 2.
5) Chunk type
   Any bit outside of TYPE_MASK | PROFILE_MASK is invalid.
With all these stricter rules, several fuzzed images reported on the mailing list should no longer cause a kernel panic.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
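The five rules listed in this commit message can be expressed as a short userspace sketch. It is not the kernel implementation: `struct sketch_chunk`, the two mask values, the function name, and the `-EIO` return code are placeholders assumed for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an on-disk chunk item. */
struct sketch_chunk {
	uint64_t logical;	/* logical start of the chunk */
	uint64_t length;	/* chunk length in bytes */
	uint64_t stripe_len;	/* stripe length in bytes */
	uint64_t type;		/* block group type and profile bits */
	uint16_t num_stripes;
};

/* Placeholder masks; the real kernel values are not taken from the log. */
#define SKETCH_TYPE_MASK	0x007ULL
#define SKETCH_PROFILE_MASK	0x1f8ULL

static bool sketch_is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* align must be a power of two */
static bool sketch_is_aligned(uint64_t v, uint64_t align)
{
	return (v & (align - 1)) == 0;
}

/* Return 0 if the chunk passes all five checks, -EIO otherwise. */
static int sketch_validate_chunk(const struct sketch_chunk *c,
				 uint32_t sectorsize)
{
	assert(sketch_is_pow2(sectorsize));

	if (c->num_stripes == 0)				/* 1) num_stripes */
		return -EIO;
	if (!sketch_is_aligned(c->logical, sectorsize))		/* 2) logical */
		return -EIO;
	if (!sketch_is_aligned(c->length, sectorsize))		/* 3) length */
		return -EIO;
	if (!sketch_is_pow2(c->stripe_len))			/* 4) stripe length */
		return -EIO;
	if (c->type & ~(SKETCH_TYPE_MASK | SKETCH_PROFILE_MASK)) /* 5) type */
		return -EIO;
	return 0;
}
```

Rejecting a bad chunk with an error code rather than asserting is the point of the commit: a fuzzed image should fail the mount instead of panicking the kernel.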
2016-01-20 | Btrfs: fix typo in log message when starting a balance | Filipe Manana | 1 | -1/+1
The recent change titled "Btrfs: Check metadata redundancy on balance" (already in linux-next) left a typo in a message for users: metatdata -> metadata. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-01-20 | Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 | Chris Mason | 1 | -0/+1
2016-01-20 | Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 | Chris Mason | 1 | -13/+13
2016-01-19 | btrfs: remove duplicate const specifier | Colin Ian King | 1 | -1/+1
duplicate const is redundant so remove it Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-15 | btrfs: initialize the seq counter in struct btrfs_device | Sebastian Andrzej Siewior | 1 | -0/+1
I managed to trigger this:
| INFO: trying to register non-static key.
| the code is fine but needs lockdep annotation.
| turning off the locking correctness validator.
| CPU: 1 PID: 781 Comm: systemd-gpt-aut Not tainted 4.4.0-rt2+ #14
| Hardware name: ARM-Versatile Express
| [<80307cec>] (dump_stack)
| [<80070e98>] (__lock_acquire)
| [<8007184c>] (lock_acquire)
| [<80287800>] (btrfs_ioctl)
| [<8012a8d4>] (do_vfs_ioctl)
| [<8012ac14>] (SyS_ioctl)
so I think that btrfs_device_data_ordered_init() is not invoked behind a macro somewhere.
Fixes: 7cc8e58d53cd ("Btrfs: fix unprotected device's variants on 32bits machine")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-15 | btrfs: cleanup, stop casting for extent_map->lookup everywhere | Jeff Mahoney | 1 | -12/+12
Overloading extent_map->bdev to struct map_lookup * might have started out as a means to an end, but it's a pattern that's used all over the place now. Let's get rid of the casting and just add a union instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
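The cast-versus-union change can be illustrated with a small userspace sketch. All type and field names here are hypothetical stand-ins, not the kernel definitions; the point is only that a named union member lets callers read the chunk mapping without casting a pointer of another type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two kinds of object the field can hold. */
struct sketch_block_device { int id; };
struct sketch_map_lookup { int num_stripes; };

struct sketch_extent_map {
	unsigned long long start;
	unsigned long long len;
	/*
	 * One storage slot, two typed views. Before the cleanup the
	 * equivalent of "bdev" was cast to the lookup type at every use.
	 */
	union {
		struct sketch_block_device *bdev;
		struct sketch_map_lookup *map_lookup;
	};
};

/* Reads the chunk mapping through the typed member, no cast needed. */
static int sketch_chunk_num_stripes(const struct sketch_extent_map *em)
{
	assert(em->map_lookup != NULL);
	return em->map_lookup->num_stripes;
}
```

The union keeps the structure the same size as before, so the cleanup changes how the field is named at each use site, not how much memory each extent map takes.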
2016-01-11 | Merge branch 'for-chris-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5 | Chris Mason | 1 | -10/+10
Signed-off-by: Chris Mason <clm@fb.com>
2016-01-11 | Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 | Chris Mason | 1 | -9/+8
Signed-off-by: Chris Mason <clm@fb.com>
2016-01-11 | Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 | Chris Mason | 1 | -11/+21
2016-01-08 | Btrfs: fix fitrim discarding device area reserved for boot loader's use | Filipe Manana | 1 | -10/+10
As of the 4.3 kernel release, the fitrim ioctl can now discard any region of a disk that is not allocated to any chunk/block group, including the first megabyte which is used for our primary superblock and by the boot loader (grub for example).
Fix this by not allowing to trim/discard any region in the device starting with an offset not greater than min(alloc_start_mount_option, 1Mb), just as it was not possible before 4.3.
A reproducer test case for xfstests follows.
    seq=`basename $0`
    seqres=$RESULT_DIR/$seq
    echo "QA output created by $seq"
    tmp=/tmp/$$
    status=1	# failure is the default!
    trap "_cleanup; exit \$status" 0 1 2 3 15

    _cleanup()
    {
        cd /
        rm -f $tmp.*
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter

    # real QA test starts here
    _need_to_be_root
    _supported_fs btrfs
    _supported_os Linux
    _require_scratch

    rm -f $seqres.full

    _scratch_mkfs >>$seqres.full 2>&1

    # Write to the [0, 64Kb[ and [68Kb, 1Mb[ ranges of the device. These ranges are
    # reserved for a boot loader to use (GRUB for example) and btrfs should never
    # use them - neither for allocating metadata/data nor should trim/discard them.
    # The range [64Kb, 68Kb[ is used for the primary superblock of the filesystem.
    $XFS_IO_PROG -c "pwrite -S 0xfd 0 64K" $SCRATCH_DEV | _filter_xfs_io
    $XFS_IO_PROG -c "pwrite -S 0xfd 68K 956K" $SCRATCH_DEV | _filter_xfs_io

    # Now mount the filesystem and perform a fitrim against it.
    _scratch_mount
    _require_batched_discard $SCRATCH_MNT
    $FSTRIM_PROG $SCRATCH_MNT

    # Now unmount the filesystem and verify the content of the ranges was not
    # modified (no trim/discard happened on them).
    _scratch_unmount
    echo "Content of the ranges [0, 64Kb] and [68Kb, 1Mb[ after fitrim:"
    od -t x1 -N $((64 * 1024)) $SCRATCH_DEV
    od -t x1 -j $((68 * 1024)) -N $((956 * 1024)) $SCRATCH_DEV

    status=0
    exit
Reported-by: Vincent Petry <PVince81@yahoo.fr>
Reported-by: Andrei Borzenkov <arvidjaar@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109341
Fixes: 499f377f49f0 (btrfs: iterate over unused chunk space in FITRIM)
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
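The idea of keeping discards out of the reserved area at the start of the device can be sketched as a range-clipping helper. This is a userspace illustration under assumptions: `struct sketch_range`, `sketch_clip_trim_range()`, and passing the reserved boundary as a plain parameter are all hypothetical, and how the kernel derives that boundary from the mount option is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_1M (1024ULL * 1024ULL)

/* Hypothetical free-space range on a device: [start, start + len). */
struct sketch_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Clip a candidate discard range so it never begins below reserved_end.
 * Returns false when nothing is left to discard after clipping.
 */
static bool sketch_clip_trim_range(struct sketch_range *r,
				   uint64_t reserved_end)
{
	uint64_t end = r->start + r->len;

	assert(end >= r->start);	/* the range must not wrap around */

	if (end <= reserved_end)
		return false;		/* entirely inside the reserved area */
	if (r->start < reserved_end) {
		r->start = reserved_end;
		r->len = end - reserved_end;
	}
	return r->len > 0;
}
```

With a 1 MiB boundary, a free range covering the first 4 MiB of the device is reduced to the 3 MiB that follow the reserved area, and a range lying wholly inside the first megabyte is skipped.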
2016-01-07 | Btrfs: Check metadata redundancy on balance | Sam Tygier | 1 | -0/+7
When converting a filesystem via balance, check that the metadata mode is at least as redundant as the data mode. For example, give a warning when: -dconvert=raid1 -mconvert=single Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk> [ minor message reformatting ] Signed-off-by: David Sterba <dsterba@suse.com>
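One way to picture the redundancy comparison is to rank profiles by how many device failures they tolerate and warn when metadata ranks below data. This is a sketch under that assumption only: the enum, the tolerated-failure numbers, and both function names are hypothetical, and the kernel's own criterion may be computed differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical profile identifiers for the example. */
enum sketch_bal_profile {
	SKB_SINGLE, SKB_DUP, SKB_RAID0, SKB_RAID1, SKB_RAID10, SKB_RAID5, SKB_RAID6
};

/* Assumed ranking: number of whole-device failures a profile survives. */
static int sketch_tolerated_failures(enum sketch_bal_profile p)
{
	switch (p) {
	case SKB_RAID1:
	case SKB_RAID10:
	case SKB_RAID5:
		return 1;
	case SKB_RAID6:
		return 2;
	default:
		return 0;	/* single, dup, raid0 */
	}
}

/* True when the requested conversion should trigger the warning. */
static bool sketch_metadata_less_redundant(enum sketch_bal_profile meta,
					   enum sketch_bal_profile data)
{
	return sketch_tolerated_failures(meta) < sketch_tolerated_failures(data);
}
```

Under this ranking the example from the commit message, -dconvert=raid1 -mconvert=single, produces the warning, while the reverse combination does not.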
2016-01-07 | btrfs: cleanup, use enum values for btrfs_path reada | David Sterba | 1 | -3/+3
Replace the integers by enums for better readability. The value 2 does not have any meaning since a717531942f488209dded30f6bc648167bcefa72 "Btrfs: do less aggressive btree readahead" (2009-01-22). Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-07 | Btrfs: use linux/sizes.h to represent constants | Byongho Lee | 1 | -8/+8
We use many constants to represent size and offset values. To keep the code readable we write '256 * 1024 * 1024' instead of '268435456' to represent 256MB. However, we can make it far more readable with 'SZ_256M', which is defined in 'linux/sizes.h'. So this patch replaces expressions of the form 'xxx * 1024 * 1024' with a single 'SZ_xxxM' if 'xxx' is a power of 2, and with 'xxx * SZ_1M' if 'xxx' is not a power of 2. I haven't touched '4096' and '8192' because they are more intuitive than 'SZ_4K' and 'SZ_8K'. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
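The replacement rule can be shown with a tiny userspace sketch. The `SKETCH_` macros below are local stand-ins defined for the example so it compiles outside the kernel, and `sketch_size_in_bytes()` is a hypothetical helper, not something from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring the style of the kernel's size constants. */
#define SKETCH_SZ_1M	0x00100000ULL
#define SKETCH_SZ_256M	0x10000000ULL

/* Power-of-two size: a single named constant replaces 256 * 1024 * 1024. */
static const uint64_t sketch_pow2_size = SKETCH_SZ_256M;

/* Non-power-of-two size: multiply the 1M constant, as in 'xxx * SZ_1M'. */
static uint64_t sketch_size_in_bytes(uint64_t megabytes)
{
	return megabytes * SKETCH_SZ_1M;
}
```

Both spellings evaluate to the same numbers as the open-coded products; the change is purely about readability at the call site.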
2016-01-07 | btrfs: cleanup, remove stray return statements | David Sterba | 1 | -1/+0
Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-07 | btrfs: verbose error when we find an unexpected item in sys_array | David Sterba | 1 | -0/+3
Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-07 | btrfs: handle invalid num_stripes in sys_array | David Sterba | 1 | -0/+8
We can handle the special case of num_stripes == 0 directly inside btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to catch other unhandled cases where we fail to validate external data.
A crafted or corrupted image crashes at mount time:
BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0
BTRFS info (device loop0): disk space caching is enabled
BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25
Stack: 637af890 60062489 602aeb2e 604192ba 60387961 00000011 637af8a0 6038a835 637af9c0 6038776b 634ef32b 00000000
Call Trace:
 [<6001c86d>] show_stack+0xfe/0x15b
 [<6038a835>] dump_stack+0x2a/0x2c
 [<6038776b>] panic+0x13e/0x2b3
 [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff
 [<601cfbbe>] open_ctree+0x192d/0x27af
 [<6019c2c1>] btrfs_mount+0x8f5/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<6019bcb0>] btrfs_mount+0x2e4/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<600d710b>] do_mount+0xa35/0xbc9
 [<600d7557>] SyS_mount+0x95/0xc8
 [<6001e884>] handle_syscall+0x6b/0x8e
Reported-by: Jiri Slaby <jslaby@suse.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
CC: stable@vger.kernel.org # 3.19+
Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-07 | btrfs: Support convert to -d dup for btrfs-convert | Zhao Lei | 1 | -8/+0
Since we will add support for -d dup on non-mixed filesystems, the kernel needs to support converting to this raid type. This patch removes the limitation for the above case.
Tested by the following script (combination of dup conversion with fsck):
    export TEST_DEV='/dev/vdc'
    export TEST_DIR='/var/ltf/tester/mnt'

    do_dup_test()
    {
        local m_from="$1"
        local d_from="$2"
        local m_to="$3"
        local d_to="$4"

        echo "Convert from -m $m_from -d $d_from to -m $m_to -d $d_to"
        umount "$TEST_DIR" &>/dev/null
        ./mkfs.btrfs -f -m "$m_from" -d "$d_from" "$TEST_DEV" >/dev/null || return 1
        mount "$TEST_DEV" "$TEST_DIR" || return 1
        cp -a /sbin/* "$TEST_DIR"
        [[ "$m_from" != "$m_to" ]] && {
            ./btrfs balance start -f -mconvert="$m_to" "$TEST_DIR" || return 1
        }
        [[ "$d_from" != "$d_to" ]] && {
            local opt=()
            [[ "$d_to" == single ]] && opt+=("-f")
            ./btrfs balance start "${opt[@]}" -dconvert="$d_to" "$TEST_DIR" || return 1
        }
        umount "$TEST_DIR" || return 1
        ./btrfsck "$TEST_DEV" || return 1
        echo
        return 0
    }

    test_all()
    {
        for m_from in single dup; do
            for d_from in single dup; do
                for m_to in single dup; do
                    for d_to in single dup; do
                        do_dup_test "$m_from" "$d_from" "$m_to" "$d_to" || return 1
                    done
                done
            done
        done
    }

    test_all
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2015-12-24 | btrfs: fix warning on uninit variable in btrfs_finish_chunk_alloc | Chris Mason | 1 | -1/+1
map->num_stripes really can't be zero, but just in case. Signed-off-by: Chris Mason <clm@fb.com>
2015-12-24 | Merge branch 'for-chris-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5 | Chris Mason | 1 | -2/+15
2015-12-24 | Merge branch 'cleanup/misc-simplify' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 | Chris Mason | 1 | -3/+4
2015-12-19 | Merge branch 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds | 1 | -2/+1
Pull btrfs fixes from Chris Mason: "A couple of small fixes"
* 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: check prepare_uptodate_page() error code earlier
  Btrfs: check for empty bitmap list in setup_cluster_bitmaps
  btrfs: fix misleading warning when space cache failed to load
  Btrfs: fix transaction handle leak in balance
  Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list
2015-12-17 | Btrfs: fix race when finishing dev replace leading to transaction abort | Filipe Manana | 1 | -2/+15
During the final phase of a device replace operation, I ran into a transaction abort that resulted in the following trace: [23919.655368] WARNING: CPU: 10 PID: 30175 at fs/btrfs/extent-tree.c:9843 btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs]() [23919.664742] BTRFS: Transaction aborted (error -2) [23919.665749] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 parport psmouse acpi_cpufreq processor i2c_core evdev microcode pcspkr button serio_raw ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom virtio_scsi ata_generic ata_piix virtio_pci floppy virtio_ring libata e1000 virtio scsi_mod [last unloaded: btrfs] [23919.679442] CPU: 10 PID: 30175 Comm: fsstress Not tainted 4.3.0-rc5-btrfs-next-17+ #1 [23919.682392] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [23919.689151] 0000000000000000 ffff8804020cbb50 ffffffff812566f4 ffff8804020cbb98 [23919.692604] ffff8804020cbb88 ffffffff8104d0a6 ffffffffa03eea69 ffff88041b678a48 [23919.694230] ffff88042ac38000 ffff88041b678930 00000000fffffffe ffff8804020cbbf0 [23919.696716] Call Trace: [23919.698669] [<ffffffff812566f4>] dump_stack+0x4e/0x79 [23919.700597] [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8 [23919.701958] [<ffffffffa03eea69>] ? btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs] [23919.703612] [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50 [23919.705047] [<ffffffffa03eea69>] btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs] [23919.706967] [<ffffffffa0402097>] __btrfs_end_transaction+0x84/0x2dd [btrfs] [23919.708611] [<ffffffffa0402300>] btrfs_end_transaction+0x10/0x12 [btrfs] [23919.710099] [<ffffffffa03ef0b8>] btrfs_alloc_data_chunk_ondemand+0x121/0x28b [btrfs] [23919.711970] [<ffffffffa0413025>] btrfs_fallocate+0x7d3/0xc6d [btrfs] [23919.713602] [<ffffffff8108b78f>] ? 
lock_acquire+0x10d/0x194 [23919.714756] [<ffffffff81086dbc>] ? percpu_down_read+0x51/0x78 [23919.716155] [<ffffffff8116ef1d>] ? __sb_start_write+0x5f/0xb0 [23919.718918] [<ffffffff8116ef1d>] ? __sb_start_write+0x5f/0xb0 [23919.724170] [<ffffffff8116b579>] vfs_fallocate+0x170/0x1ff [23919.725482] [<ffffffff8117c1d7>] ioctl_preallocate+0x89/0x9b [23919.726790] [<ffffffff8117c5ef>] do_vfs_ioctl+0x406/0x4e6 [23919.728428] [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e [23919.729642] [<ffffffff8118574d>] ? __fget_light+0x4d/0x71 [23919.730782] [<ffffffff8117c726>] SyS_ioctl+0x57/0x79 [23919.731847] [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f [23919.733330] ---[ end trace 166ef301a335832a ]--- This is due to a race between device replace and chunk allocation, which the following diagram illustrates: CPU 1 CPU 2 btrfs_dev_replace_finishing() at this point dev_replace->tgtdev->devid == BTRFS_DEV_REPLACE_DEVID (0ULL) ... btrfs_start_transaction() btrfs_commit_transaction() btrfs_fallocate() btrfs_alloc_data_chunk_ondemand() btrfs_join_transaction() --> starts a new transaction do_chunk_alloc() lock fs_info->chunk_mutex btrfs_alloc_chunk() --> creates extent map for the new chunk with em->bdev->map->stripes[i]->dev->devid == X (X > 0) --> extent map is added to fs_info->mapping_tree --> initial phase of bg A allocation completes unlock fs_info->chunk_mutex lock fs_info->chunk_mutex btrfs_dev_replace_update_device_in_mapping_tree() --> iterates fs_info->mapping_tree and replaces the device in every extent map's map->stripes[] with dev_replace->tgtdev, which still has an id of 0ULL (BTRFS_DEV_REPLACE_DEVID) btrfs_end_transaction() btrfs_create_pending_block_groups() --> starts final phase of bg A creation (update device, extent, and chunk trees, etc) btrfs_finish_chunk_alloc() btrfs_update_device() --> attempts to update a device item with ID == 0ULL (BTRFS_DEV_REPLACE_DEVID) which is the current ID of bg A's em->bdev->map->stripes[i]->dev->devid --> doesn't find 
such item returns -ENOENT --> the device id should have been X and not 0ULL got -ENOENT from btrfs_finish_chunk_alloc() and aborts current transaction finishes setting up the target device, namely it sets tgtdev->devid to the value of srcdev->devid, which is X (and X > 0) frees the srcdev unlock fs_info->chunk_mutex So fix this by taking the device list mutex when processing the chunk's extent map stripes to update the device items. This avoids getting the wrong device id and use-after-free problems if the task finishing a chunk allocation grabs the replaced device, which is freed while the dev replace task is holding the device list mutex. This happened while running fstest btrfs/071. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2015-12-10 | Btrfs: fix transaction handle leak in balance | Filipe Manana | 1 | -2/+1
If we fail to allocate a new data chunk, we were jumping to the error path without releasing the transaction handle we got before. Fix this by always releasing it before doing the jump. Fixes: 2c9fe8355258 ("btrfs: Fix lost-data-profile caused by balance bg") Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-07 | btrfs: remove a trivial helper btrfs_set_buffer_uptodate | David Sterba | 1 | -2/+2
Signed-off-by: David Sterba <dsterba@suse.com>