path: root/fs/btrfs/volumes.h
Age | Commit message | Author | Files | Lines
2011-01-16 | btrfs: fix wrong free space information of btrfs | Miao Xie | 1 | -0/+3
When we store data with a RAID profile in btrfs on two or more disks of different sizes, df reports some free space in the filesystem, but the user cannot actually write any more data: df shows incorrect free space information for btrfs.

    # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
    # btrfs-show
    Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
        Total devices 2 FS bytes used 28.00KB
        devid 1 size 5.01GB used 2.03GB path /dev/sda9
        devid 2 size 10.00GB used 2.01GB path /dev/sda10
    # btrfs device scan /dev/sda9 /dev/sda10
    # mount /dev/sda9 /mnt
    # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
      (fill the filesystem)
    # sync
    # df -TH
    Filesystem  Type   Size  Used  Avail  Use%  Mounted on
    /dev/sda9   btrfs  17G   8.6G  5.4G   62%   /mnt
    # btrfs-show
    Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
        Total devices 2 FS bytes used 3.99GB
        devid 1 size 5.01GB used 5.01GB path /dev/sda9
        devid 2 size 10.00GB used 4.99GB path /dev/sda10

This happens because btrfs cannot allocate chunks once one disk of a pair has no free space left. The free space remaining on the other disk can then never be used and should be subtracted from the total, but btrfs does not subtract it, which is confusing for the user.

This patch fixes the problem by calculating only the free space that can actually be used to allocate chunks.

Implementation:
1. Get the free space of every device and align it to the stripe length.
2. Sort the devices by free space.
3. Check the free space of each device:
   3.1. If it is not zero, count the devices that have more free space than this one. If there are enough such devices to satisfy the minimum stripe count, the free space can be used and is added to the total free space. If there are too few, the free space cannot be used and the check ends.
   3.2. If the free space is zero, move on to the next device and go to 3.1.

This implementation is essentially a simulated ("fake") chunk allocation.

After applying this patch, df shows the correct space information:

    # df -TH
    Filesystem  Type   Size  Used  Avail  Use%  Mounted on
    /dev/sda9   btrfs  17G   8.6G  0      100%  /mnt

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
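The steps listed under "Implementation" above can be sketched as a small user-space C function. This is an illustration, not the kernel code: `calc_usable_free`, `cmp_desc` and `STRIPE_LEN` (assumed to be 64 KiB) are hypothetical names, and step 3.1 is read greedily here. Each usable device gives its space to itself and to the `min_stripes - 1` devices just above it in sorted order, and the result is counted in raw device bytes.

```c
#include <stdint.h>
#include <stdlib.h>

#define STRIPE_LEN (64 * 1024ULL)   /* assumed stripe length */

/* qsort comparator: larger free space first. */
static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x < y) - (x > y);
}

/*
 * Simulated chunk allocation: returns the raw device bytes that chunks
 * could still consume. free_bytes[] (one entry per device) is modified.
 */
static uint64_t calc_usable_free(uint64_t *free_bytes, int ndevs,
                                 int min_stripes)
{
    uint64_t total = 0;
    int i, j;

    /* Step 1: align each device's free space down to the stripe length. */
    for (i = 0; i < ndevs; i++)
        free_bytes[i] -= free_bytes[i] % STRIPE_LEN;

    /* Step 2: sort devices by free space, largest first. */
    qsort(free_bytes, (size_t)ndevs, sizeof(*free_bytes), cmp_desc);

    /*
     * Step 3: start at the device with the least space. Devices 0..i all
     * have at least free_bytes[i], so the space is usable while at least
     * min_stripes devices remain.
     */
    for (i = ndevs - 1; i >= 0 && i + 1 >= min_stripes; i--) {
        uint64_t sz = free_bytes[i];

        if (sz == 0)            /* 3.2: nothing here, try the next one */
            continue;

        /* 3.1: this device and the min_stripes - 1 devices above it each
         * give sz bytes to simulated chunks. */
        total += sz * (uint64_t)min_stripes;
        for (j = i + 1 - min_stripes; j <= i; j++)
            free_bytes[j] -= sz;
    }
    return total;
}
```

Fed the final state from the transcript (sda9 full, sda10 with space left) and `min_stripes = 2` for raid1, the loop finds a single device with space, stops, and returns 0, in line with the corrected `Avail 0` output.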
2011-01-16 | btrfs: make the chunk allocator utilize the devices better | Miao Xie | 1 | -0/+24
This patch changes how the allocator behaves when it cannot find enough free extents of the default size.

Implementation:
1. Look up a suitable free extent on each device and keep the search result. If no suitable free extent is found, keep the largest free extent instead.
2. If we get enough suitable free extents of the default size, chunk allocation succeeds.
3. If we cannot get enough free extents, but the number of extents of the default size is >= min_stripes, we just change the mapping information (reduce the number of stripes in the extent map), and chunk allocation succeeds.
4. If the number of extents of the default size is < min_stripes, sort the devices by the size of their largest free extent, in descending order.
5. Use the size of the largest free extent on the (num_stripes - 1)th device as the stripe size to allocate the device space.

This way, the chunk allocator can allocate chunks that are as large as possible when the devices do not have enough space, and make full use of the devices.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
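The decision in steps 2 to 5 can be sketched as a single function that, given the largest free extent found on each device, picks a stripe size and stripe count. The function and parameter names are hypothetical, "reduce the number of stripes" in step 3 is modeled as `min(have_default, num_stripes)`, and the "(num_stripes - 1)th device" is taken as a 0-based index into the descending order.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: larger extents first. */
static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x < y) - (x > y);
}

/*
 * max_free[i] is the largest free extent found on device i (step 1).
 * Returns the stripe size to use and stores the stripe count in
 * *out_stripes, or returns 0 if no chunk can be allocated.
 */
static uint64_t pick_stripe_size(uint64_t *max_free, int ndevs,
                                 uint64_t default_size, int num_stripes,
                                 int min_stripes, int *out_stripes)
{
    int i, have_default = 0, stripes;

    for (i = 0; i < ndevs; i++)
        if (max_free[i] >= default_size)
            have_default++;

    /* Steps 2 and 3: enough default-size extents, possibly fewer stripes. */
    if (have_default >= min_stripes) {
        *out_stripes = have_default < num_stripes ? have_default : num_stripes;
        return default_size;
    }

    if (ndevs < min_stripes)
        return 0;

    /* Step 4: sort devices by their largest free extent, descending. */
    qsort(max_free, (size_t)ndevs, sizeof(*max_free), cmp_desc);

    /* Step 5: the (num_stripes - 1)th device limits the stripe size. */
    stripes = ndevs < num_stripes ? ndevs : num_stripes;
    if (max_free[stripes - 1] == 0)
        return 0;
    *out_stripes = stripes;
    return max_free[stripes - 1];
}
```

With largest free extents of 300, 500 and 10000 units, a default size of 1000 and a raid1-style `num_stripes = min_stripes = 2`, only one device has a default-size extent, so the sketch falls through to step 5 and returns a 500-unit stripe on two devices instead of failing.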
2010-12-14 | Btrfs: account for missing devices in RAID allocation profiles | Chris Mason | 1 | -0/+2
When we mount in RAID degraded mode without adding a new device to replace the failed one, we can end up using the wrong RAID flags for allocations. This results in strange combinations of block groups (raid1 in a raid10 filesystem) and corruptions when we try to allocate blocks from single spindle chunks on drives that are actually missing. The first device has two small 4MB chunks in it that mkfs creates and these are usually unused in a raid1 or raid10 setup. But, in -o degraded, the allocator will fall back to these because the mask of desired raid groups isn't correct. The fix here is to count the missing devices as we build up the list of devices in the system. This count is used when picking the raid level to make sure we continue using the same levels that were in place before we lost a drive. Signed-off-by: Chris Mason <chris.mason@oracle.com>
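A sketch of why counting missing devices matters when picking the RAID level. The flag values, the struct and `reduce_alloc_profile` are hypothetical stand-ins (the kernel has its own `BTRFS_BLOCK_GROUP_*` flags). The thresholds used are the usual minimums for these profiles: one device rules out RAID0 and RAID1, and fewer than four rule out RAID10.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the block group profile flags. */
#define BG_RAID0  (1u << 0)
#define BG_RAID1  (1u << 1)
#define BG_RAID10 (1u << 2)

struct fs_devices_sketch {
    uint64_t rw_devices;       /* devices we can currently write to */
    uint64_t missing_devices;  /* devices absent under -o degraded */
};

/*
 * Drop profiles the device count cannot support. Adding the missing
 * devices keeps the profile the filesystem used before a drive was lost,
 * instead of silently falling back to single-spindle chunks.
 */
static uint32_t reduce_alloc_profile(const struct fs_devices_sketch *fs,
                                     uint32_t flags)
{
    uint64_t num_devices = fs->rw_devices + fs->missing_devices;

    if (num_devices == 1)
        flags &= ~(BG_RAID0 | BG_RAID1);
    if (num_devices < 4)
        flags &= ~BG_RAID10;
    return flags;
}
```

A raid1 filesystem mounted degraded with one working and one missing drive keeps `BG_RAID1`. Counting only the working drive would have stripped it.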
2009-09-22 | Btrfs: make balance code choose more wisely when relocating | Josef Bacik | 1 | -0/+3
Currently, we can panic the box if the first block group we try to move is of a type that has no space left to move its extents into. For example, if we fill the disk up with data and then try to balance, with no room to move the data and no room to allocate new chunks, we will panic. Change this by checking whether we have room to move the chunk, and if not, returning -ENOSPC and moving on to the next chunk. This makes sure we remove block groups that are movable, for example when we have a lot of empty metadata block groups, which in turn makes room to balance our data chunks as well.

Tested this with an fs that would normally panic on btrfs-vol -b, but no longer panics with this patch.

V1->V2:
- actually search for a free extent on the device to make sure we can allocate a chunk if need be.
- fix btrfs_shrink_device to make sure we actually try to relocate all the chunks, and if we can't, return -ENOSPC so that if we are doing a btrfs-vol -r we don't remove the device with data still on it.
- check to make sure the block group we are going to relocate isn't the last one in that particular space.
- fix a bug in btrfs_shrink_device where we would change the device's size and not restore it if the relocation failed.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
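The "check first, skip with -ENOSPC" behaviour can be sketched as follows. This is a simplified model under assumptions: `struct bg_sketch`, `can_relocate` and `balance_pass` are hypothetical. The device free-extent search from the V2 notes is reduced to a single `max_dev_hole` number, and the "last block group in the space" check is left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical summary of one block group and its surroundings. */
struct bg_sketch {
    uint64_t size;        /* chunk size of this block group */
    uint64_t used;        /* bytes that would have to be moved */
    uint64_t other_free;  /* free bytes in other block groups of this type */
};

/* Return 0 if relocating bg can succeed, -ENOSPC otherwise. */
static int can_relocate(const struct bg_sketch *bg, uint64_t max_dev_hole)
{
    if (bg->used == 0)
        return 0;                 /* empty: nothing to move */
    if (bg->other_free >= bg->used)
        return 0;                 /* fits in existing block groups */
    if (max_dev_hole >= bg->size)
        return 0;                 /* a new chunk can be allocated */
    return -ENOSPC;
}

/* Walk all block groups, skipping the ones that cannot be moved. */
static int balance_pass(const struct bg_sketch *bgs, int n,
                        uint64_t max_dev_hole, int *skipped)
{
    int i, moved = 0;

    *skipped = 0;
    for (i = 0; i < n; i++) {
        if (can_relocate(&bgs[i], max_dev_hole) == -ENOSPC) {
            (*skipped)++;         /* report -ENOSPC, move on */
            continue;
        }
        moved++;                  /* the relocation itself is omitted */
    }
    return moved;
}
```

The point of the design is that a full data block group no longer stops the whole pass: empty or relocatable groups are still processed, which can free the space later passes need.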
2009-06-10 | Btrfs: avoid races between super writeout and device list updates | Chris Mason | 1 | -1/+6
On multi-device filesystems, btrfs writes supers to all of the devices before considering a sync complete. There wasn't any additional locking between super writeout and the device list management code, because device management was done inside a transaction and super writeout only happened with no transaction writers running.

With the btrfs fsync log and other async transaction updates, this has been racy for some time. This adds a mutex to protect the device list. The existing volume mutex could not be reused due to transaction lock ordering requirements.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
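The locking pattern this describes is a dedicated lock that both super writeout and device add/remove take. The sketch below uses a POSIX mutex in place of a kernel mutex. `struct dev_list_sketch`, `add_device`, `remove_device` and `write_all_supers` are hypothetical names, and devices are plain integers.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical device list guarded by its own mutex. */
struct dev_list_sketch {
    pthread_mutex_t lock;
    int ndevs;
    int devs[MAX_DEVS];
};

static int add_device(struct dev_list_sketch *l, int dev)
{
    int ret = -1;

    pthread_mutex_lock(&l->lock);
    if (l->ndevs < MAX_DEVS) {
        l->devs[l->ndevs++] = dev;
        ret = 0;
    }
    pthread_mutex_unlock(&l->lock);
    return ret;
}

static int remove_device(struct dev_list_sketch *l, int dev)
{
    int i, ret = -1;

    pthread_mutex_lock(&l->lock);
    for (i = 0; i < l->ndevs; i++) {
        if (l->devs[i] == dev) {
            l->devs[i] = l->devs[--l->ndevs]; /* order does not matter */
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);
    return ret;
}

/* Super writeout holds the same mutex, so the list cannot change under it. */
static int write_all_supers(struct dev_list_sketch *l,
                            void (*write_super)(int dev))
{
    int n;

    pthread_mutex_lock(&l->lock);
    for (n = 0; n < l->ndevs; n++)
        if (write_super)
            write_super(l->devs[n]);
    pthread_mutex_unlock(&l->lock);
    return n;
}
```

Using a separate lock rather than an existing one mirrors the commit's reasoning: the new mutex can be taken without regard to where the volume mutex sits in the transaction lock ordering.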
2009-06-10 | Btrfs: autodetect SSD devices | Chris Mason | 1 | -0/+5
During mount, btrfs will check the queue nonrot flag for all the devices found in the FS. If they are all non-rotating, SSD mode is enabled by default. If the FS was mounted with -o nossd, the non-rotating flag is ignored. Signed-off-by: Chris Mason <chris.mason@oracle.com>
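The mount-time rule reduces to "every device is non-rotating and nossd was not given". In this sketch, `struct dev_sketch` and its `nonrot` field are hypothetical stand-ins for the per-device queue flag, and `want_ssd_mode` is an illustrative name.

```c
#include <stdbool.h>

/* Hypothetical per-device flag, standing in for the queue nonrot flag. */
struct dev_sketch {
    bool nonrot;
};

/* SSD mode only if every device is non-rotating and -o nossd is absent. */
static bool want_ssd_mode(const struct dev_sketch *devs, int n, bool nossd)
{
    int i;

    if (nossd || n == 0)
        return false;
    for (i = 0; i < n; i++)
        if (!devs[i].nonrot)
            return false;       /* one spinning disk disables autodetect */
    return true;
}
```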
2009-04-27 | Btrfs: When shrinking, only update disk size on success | Chris Ball | 1 | -0/+3
Previously, we updated a device's size prior to attempting a shrink operation. This patch moves the device resizing logic to only happen if the shrink completes successfully. In the process, it introduces a new field to btrfs_device -- disk_total_bytes -- to track the on-disk size. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
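One way to read the two size fields is this: the in-memory size is lowered while data is moved, so nothing new lands past the cut, and the on-disk size only changes after the relocation succeeds. The struct, `shrink_device` and the two stand-in relocation callbacks are hypothetical. Restoring the in-memory size on failure follows the later balance fix listed above rather than this commit alone.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device with the two size fields from the commit message. */
struct device_sketch {
    uint64_t total_bytes;       /* size the allocator currently honours */
    uint64_t disk_total_bytes;  /* size recorded on disk */
};

/* Stand-in relocation callbacks for trying the sketch out. */
static int relocate_ok(uint64_t new_size) { (void)new_size; return 0; }
static int relocate_enospc(uint64_t new_size) { (void)new_size; return -ENOSPC; }

static int shrink_device(struct device_sketch *dev, uint64_t new_size,
                         int (*relocate)(uint64_t new_size))
{
    uint64_t old_size = dev->total_bytes;
    int ret;

    if (new_size >= old_size)
        return -EINVAL;

    /* Keep new allocations below new_size while data is moved away. */
    dev->total_bytes = new_size;
    ret = relocate(new_size);
    if (ret) {
        /* Failed: restore the in-memory size; the on-disk size never changed. */
        dev->total_bytes = old_size;
        return ret;
    }
    /* Success: only now does the on-disk size shrink. */
    dev->disk_total_bytes = new_size;
    return 0;
}
```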
2009-04-20 | Btrfs: use WRITE_SYNC for synchronous writes | Chris Mason | 1 | -2/+11
Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for writes we plan on waiting on in the near future. This patch mirrors recent changes in other filesystems and the generic code to use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for other latency critical writes. Btrfs uses async worker threads for checksumming before the write is done, and then again to actually submit the bios. The bio submission code just runs a per-device list of bios that need to be sent down the pipe. This list is split into low priority and high priority lists so the WRITE_SYNC IO happens first. Signed-off-by: Chris Mason <chris.mason@oracle.com>
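The per-device split into high- and low-priority lists can be sketched as two FIFO queues, where the submitter always drains the sync list first. The type names, `queue_bio` and `next_bio` are hypothetical. Only the ordering rule is modeled, not the worker threads.

```c
#include <stddef.h>

/* Hypothetical bio with a link field for the per-device lists. */
struct bio_sketch {
    int id;
    struct bio_sketch *next;
};

struct bio_list_sketch {
    struct bio_sketch *head;
    struct bio_sketch *tail;
};

/* One device's pending IO, split by priority. */
struct device_queue {
    struct bio_list_sketch pending;       /* ordinary writes */
    struct bio_list_sketch pending_sync;  /* WRITE_SYNC writes */
};

static void queue_bio(struct device_queue *q, struct bio_sketch *bio, int sync)
{
    struct bio_list_sketch *l = sync ? &q->pending_sync : &q->pending;

    bio->next = NULL;
    if (l->tail)
        l->tail->next = bio;
    else
        l->head = bio;
    l->tail = bio;
}

/* The submitter drains the high-priority list before the normal one. */
static struct bio_sketch *next_bio(struct device_queue *q)
{
    struct bio_list_sketch *l =
        q->pending_sync.head ? &q->pending_sync : &q->pending;
    struct bio_sketch *bio = l->head;

    if (bio) {
        l->head = bio->next;
        if (!l->head)
            l->tail = NULL;
    }
    return bio;
}
```

A sync bio queued behind ordinary writes is still submitted first, while bios within each list keep their FIFO order.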
2009-04-03 | Btrfs: fix typos in comments | Wu Fengguang | 1 | -1/+1
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-12 | Btrfs: shared seed device | Yan Zheng | 1 | -2/+1
This patch makes it possible for a seed device to be shared by multiple mounted filesystems. The sharing is achieved by cloning the seed device's btrfs_fs_devices structure.

Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2008-12-09 | Btrfs: superblock duplication | Yan Zheng | 1 | -2/+4
This patch implements superblock duplication. Superblocks are stored at offsets 16K, 64M and 256G on every device. The space used by superblocks is reserved by the allocator, which uses a reverse mapping function to find the logical addresses that correspond to superblocks.

Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
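The three offsets in the commit message follow a simple pattern: 16 KiB shifted left by 12 bits per copy. The sketch uses that pattern together with an overlap test an allocator could use to keep these ranges reserved. `sb_offset`, `overlaps_sb` and the 4096-byte copy size are assumptions for illustration, and the sketch works on physical device offsets, skipping the reverse (physical-to-logical) mapping step. Note that the btrfs on-disk format documentation places the primary superblock at 64 KiB, with mirrors at 64 MiB and 256 GiB. The copy-0 value here therefore reproduces the commit text rather than the final format.

```c
#include <stdbool.h>
#include <stdint.h>

#define SB_COPIES 3
#define SB_SIZE   4096ULL   /* assumed size of one superblock copy */

/* 16 KiB << (12 * i): 16 KiB, 64 MiB and 256 GiB for copies 0, 1, 2. */
static uint64_t sb_offset(int copy)
{
    return (16ULL * 1024) << (12 * copy);
}

/*
 * Does the device range [start, start + len) overlap a superblock copy
 * that fits on a device of dev_size bytes? An allocator would avoid
 * handing out such ranges.
 */
static bool overlaps_sb(uint64_t start, uint64_t len, uint64_t dev_size)
{
    int i;

    for (i = 0; i < SB_COPIES; i++) {
        uint64_t off = sb_offset(i);

        if (off + SB_SIZE > dev_size)
            break;              /* this and later copies do not exist */
        if (start < off + SB_SIZE && off < start + len)
            return true;
    }
    return false;
}
```

On a 1 GiB device only the first two copies exist, so a range at 512 MiB is free to use, while a range starting exactly at 64 MiB collides with the second copy.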
2008-12-02 | Btrfs: correct fmode_t annotations | Christoph Hellwig | 1 | -2/+2
Make sure to propagate fmode_t properly and use the right constants for it. Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-11-20 | Btrfs: Fixes for 2.6.28-rc API changes | Chris Mason | 1 | -0/+3
* open/close_bdev_excl -> open/close_bdev_exclusive
* blkdev_issue_discard takes a GFP mask now
* Fix blkdev_issue_discard usage now that it is enabled

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-11-18 | Btrfs: Seed device support | Yan Zheng | 1 | -6/+14
A seed device is a special btrfs filesystem with the SEEDING super flag set; it can only be mounted read-only. Seed devices allow people to create a new btrfs on top of them. The new FS contains the same contents as the seed device, but it can be mounted read-write.

This patch does the following:

1) Split the code in btrfs_alloc_chunk into two parts. The first part makes the newly allocated chunk usable, but does not do any operation that modifies the chunk tree. The second part does the chunk tree modifications. This division is needed for the bootstrap step of adding storage to the seed device.

2) Update the device management code to handle seed devices. The basic idea is: for an FS grown from seed devices, its seed devices are put into a list. Seed devices are opened on demand at mount time. If any seed device is missing or has been changed, the btrfs kernel module refuses to mount the FS.

3) Make btrfs_find_block_group not return NULL when all block groups are read-only.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2008-09-25 | Btrfs: Fix the multi-bio code to save the original bio for completion | Chris Mason | 1 | -0/+1
The multi-bio code is responsible for duplicating blocks in raid1 and single spindle duplication. It has counters to make sure all of the locations for a given extent are properly written before IO completion is returned to the higher layers.

But it didn't always complete the same bio it was given; sometimes a clone was completed instead. This led to problems with the async work queues, because they saved a pointer to the bio in a struct hanging off bi_private.

The fix is to remember the original bio and only complete that one.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
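The fix can be sketched as a shared counter that remembers the caller's bio. The last clone to finish completes that original bio and never itself. `struct bio_sketch`, `struct multi_bio`, `multi_bio_init` and `clone_end_io` are hypothetical names. The sketch keeps the last error seen and does not model how many failed mirrors a real write may tolerate.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a bio: just records its completion. */
struct bio_sketch {
    int completed;
    int err;
};

/* Shared by every clone issued for one original bio. */
struct multi_bio {
    atomic_int stripes_pending;
    atomic_int error;
    struct bio_sketch *orig_bio;   /* the bio the caller submitted */
};

static void multi_bio_init(struct multi_bio *mb, struct bio_sketch *orig,
                           int nr_stripes)
{
    atomic_init(&mb->stripes_pending, nr_stripes);
    atomic_init(&mb->error, 0);
    mb->orig_bio = orig;
}

/*
 * Called once per finished clone. Only the last caller completes, and it
 * always completes the original bio, never one of the clones.
 */
static void clone_end_io(struct multi_bio *mb, int err)
{
    if (err)
        atomic_store(&mb->error, err);
    if (atomic_fetch_sub(&mb->stripes_pending, 1) == 1) {
        mb->orig_bio->err = atomic_load(&mb->error);
        mb->orig_bio->completed++;
    }
}
```

Because completion always targets `orig_bio`, anything the caller stored alongside its bio (for example, state reached through a private pointer) is handed back unchanged.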
2008-09-25 | Btrfs: Add async worker threads for pre and post IO checksumming | Chris Mason | 1 | -3/+7
Btrfs has been using workqueues to spread the checksumming load across other CPUs in the system. But, workqueues only schedule work on the same CPU that queued the work, giving them a limited benefit for systems with higher CPU counts. This code adds a generic facility to schedule work with pools of kthreads, and changes the bio submission code to queue bios up. The queueing is important to make sure large numbers of procs on the system don't turn streaming workloads into random workloads by sending IO down concurrently. The end result of all of this is much higher performance (and CPU usage) when doing checksumming on large machines. Two worker pools are created, one for writes and one for endio processing. The two could deadlock if we tried to service both from a single pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Fix btrfs_open_devices to deal with changes since the scan ioctls | Chris Mason | 1 | -2/+2
Devices can change after the scan ioctls are done, and btrfs_open_devices needs to be able to verify them as they are opened and used by the FS. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add mount -o degraded to allow mounts to continue with missing devices | Chris Mason | 1 | -0/+3
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add support for online device removal | Chris Mason | 1 | -0/+3
This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Deal with failed writes in mirrored configurations | Chris Mason | 1 | -1/+2
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add balance ioctl to restripe the chunks | Chris Mason | 1 | -0/+1
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add new ioctl to add devices | Chris Mason | 1 | -0/+1
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Make the resizer work based on shrinking and growing devices | Chris Mason | 1 | -0/+5
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add a special device list for chunk allocations | Chris Mason | 1 | -0/+7
This allows other code that needs to walk every device in the FS to do so without locking against allocations. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Make an unplug function that doesn't unplug every spindle | Chris Mason | 1 | -0/+2
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add chunk uuids and update multi-device back references | Chris Mason | 1 | -2/+4
Block headers now store the chunk tree uuid.
Chunk items record the device uuid for each stripe.
Device extent items record better back refs to the chunk tree.
Block groups record better back refs to the chunk tree.

The chunk tree format has also changed. The objectid of BTRFS_CHUNK_ITEM_KEY used to be the logical offset of the chunk. Now it is a chunk tree id, with the logical offset stored in the offset field of the key. This allows a single chunk tree to record multiple logical address spaces, raising the number of bytes indexed by a chunk tree from 2^64 to 2^128.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
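The new chunk key layout can be illustrated with a key that compares on (objectid, type, offset) in that order. `struct key_sketch`, `key_cmp`, `chunk_key`, the type value 228 and the tree ids 256/257 are illustrative assumptions. Because the tree id now occupies the objectid, the same logical offset can appear once per logical address space, which is where the 2^64 × 2^64 = 2^128 figure comes from.

```c
#include <stdint.h>

#define CHUNK_ITEM_KEY 228    /* illustrative type value */

/* Mirrors the (objectid, type, offset) triple of a btrfs key. */
struct key_sketch {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Keys order by objectid, then type, then offset. */
static int key_cmp(const struct key_sketch *a, const struct key_sketch *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* New layout: objectid = chunk tree id, offset = logical start. */
static struct key_sketch chunk_key(uint64_t tree_id, uint64_t logical)
{
    struct key_sketch k = { tree_id, CHUNK_ITEM_KEY, logical };
    return k;
}
```

All chunks of one address space sort together, ordered by logical offset, and two spaces with overlapping logical ranges never collide.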
2008-09-25 | Btrfs: Write out all super blocks on commit, and bring back proper barrier support | Chris Mason | 1 | -0/+3
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Retry metadata reads in the face of checksum failures | Chris Mason | 1 | -2/+4
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Change btrfs_map_block to return a structure with mappings for all stripes | Chris Mason | 1 | -3/+22
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add support for mirroring across drives | Chris Mason | 1 | -5/+6
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add support for device scanning and detection ioctls | Chris Mason | 1 | -0/+25
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Bring back mount -o ssd optimizations | Chris Mason | 1 | -0/+3
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Move device information into the super block so it can be scanned | Chris Mason | 1 | -12/+1
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Dynamic chunk and block group allocation | Chris Mason | 1 | -1/+1
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 | Btrfs: Add support for multiple devices per filesystem | Chris Mason | 1 | -0/+78
Signed-off-by: Chris Mason <chris.mason@oracle.com>