summaryrefslogtreecommitdiff
path: root/fs/btrfs/disk-io.c
AgeCommit message (Collapse)AuthorFilesLines
2013-06-14Btrfs: make the snap/subv deletion end more early when the fs is R/OMiao Xie1-13/+2
The snapshot/subvolume deletion might spend lots of time, it would make the remount task wait for a long time. This patch improve this problem, we will break the deletion if the fs is remounted to be R/O. It will make the users happy. Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: move the R/O check out of btrfs_clean_one_deleted_snapshot()Miao Xie1-0/+9
If the fs is remounted to be R/O, it is unnecessary to call btrfs_clean_one_deleted_snapshot(), so move the R/O check out of this function. And besides that, it can make the check logic in the caller more clear. Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: make the cleaner complete early when the fs is going to be umountedMiao Xie1-5/+7
Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: remove unnecessary ->s_umount in cleaner_kthread()Miao Xie1-12/+28
In order to avoid the R/O remount, we acquired ->s_umount lock during we deleted the dead snapshots and subvolumes. But it is unnecessary, because we have cleaner_mutex. We use cleaner_mutex to protect the process of the dead snapshots/subvolumes deletion. And when we remount the fs to be R/O, we also acquire this mutex to do cleanup after we change the status of the fs. That is this lock can serialize the above operations, the cleaner can be aware of the status of the fs, and if the cleaner is deleting the dead snapshots/subvolumes, the remount task will wait for it. So it is safe to remove ->s_umount in cleaner_kthread(). Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: cleanup, btrfs_read_fs_root_no_name() doesn't return NULLStefan Behrens1-2/+0
No need to check for NULL in send.c and disk-io.c. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: introduce qgroup_ulist to avoid frequently allocating/freeing ulistWang Shilong1-0/+1
When doing qgroup accounting, we call ulist_alloc()/ulist_free() every time when we want to walk qgroup tree. By introducing 'qgroup_ulist', we only need to call ulist_alloc()/ulist_free() once. This reduce some sys time to allocate memory, see the measurements below fsstress -p 4 -n 10000 -d $dir With this patch: real 0m50.153s user 0m0.081s sys 0m6.294s real 0m51.113s user 0m0.092s sys 0m6.220s real 0m52.610s user 0m0.096s sys 0m6.125s avg 6.213 ----------------------------------------------------- Without the patch: real 0m54.825s user 0m0.061s sys 0m10.665s real 1m6.401s user 0m0.089s sys 0m11.218s real 1m13.768s user 0m0.087s sys 0m10.665s avg 10.849 we can see the sys time reduce ~43%. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-14Btrfs: fix check on same raid type flag twiceHenrik Nordvik1-1/+1
Code checked for raid 5 flag in two else-if branches, so code would never be reached. Probably a copy-paste bug. Signed-off-by: Henrik Nordvik <henrikno@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-08Btrfs: stop all workers before cleaning up rootsJosef Bacik1-3/+3
Dave reported a panic because the extent_root->commit_root was NULL in the caching kthread. That is because we just unset it in free_root_pointers, which is not the correct thing to do, we have to either wait for the caching kthread to complete or hold the extent_commit_sem lock so we know the thread has exited. This patch makes the kthreads all stop first and then we do our cleanup. This should fix the race. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-08Btrfs: fix use-after-free bug during umountLiu Bo1-2/+2
Commit be283b2e674a09457d4563729015adb637ce7cc1 ( Btrfs: use helper to cleanup tree roots) introduced the following bug, BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] [...] Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox RIP: 0010:[<ffffffffa039368c>] [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730) [...] Call Trace: [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs] [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs] [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs] [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs] [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs] [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [<ffffffff81068d3d>] kthread+0x8d/0x95 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 RIP [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] We've free'ed commit_root before actually getting to free block groups where caching thread needs valid extent_root->commit_root. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-06-08Btrfs: don't delete fs_roots until after we cleanup the transactionJosef Bacik1-1/+1
We get a use after free if we had a transaction to cleanup since there could be delayed inodes which refer to their respective fs_root. Thanks Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-05-22mm: change invalidatepage prototype to accept lengthLukas Czerner1-1/+2
Currently there is no way to truncate partial page where the end truncate point is not at the end of the page. This is because it was not needed and the functionality was enough for file system truncate operation to work properly. However more file systems now support punch hole feature and it can benefit from mm supporting truncating page just up to the certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so it supports truncating partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept range to be invalidated and update all the instances for it. We also change the block_invalidatepage() in the same way and actually make a use of the new length argument implementing range invalidation. Actual file system implementations will follow except the file systems where the changes are really simple and should not change the behaviour in any way .Implementation for truncate_page_range() which will be able to accept page unaligned ranges will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com>
2013-05-18Merge branch 'for-chris' of ↵Chris Mason1-21/+29
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next
2013-05-18Btrfs: use a btrfs bioset instead of abusing bio internalsChris Mason1-1/+1
Btrfs has been pointer tagging bi_private and using bi_bdev to store the stripe index and mirror number of failed IOs. As bios bubble back up through the call chain, we use these to decide if and how to retry our IOs. They are also used to count IO failures on a per device basis. Recently a bio tracepoint was added lead to crashes because we were abusing bi_bdev. This commit adds a btrfs bioset, and creates explicit fields for the mirror number and stripe index. The plan is to extend this structure for all of the fields currently in struct btrfs_bio, which will mean one less kmalloc in our IO path. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Tejun Heo <tj@kernel.org>
2013-05-18Btrfs: make sure roots are assigned before freeing their nodesJosef Bacik1-18/+21
If we fail to load the chunk tree we'll call free_root_pointers, except we may not have assigned the roots for the dev_root/extent_root/csum_root yet, so we could NULL pointer deref at this point. Just add checks to make sure these roots are set to keep us from panicing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-18Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock contextMiao Xie1-0/+6
btrfs_invalidate_inodes() may sleep, so we should not invoke it in the spin lock context. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-18Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()Miao Xie1-1/+0
We have checked if ->node is NULL or not, so it is unnecessary to use BUG_ON() to check again. Remove it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-18Btrfs: don't null pointer deref on abortJosef Bacik1-1/+1
I'm sorry, theres no excuse for this sort of work. We need to use root->leafsize since eb may be NULL. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-18btrfs: annotate quota tree for lockdepDavid Sterba1-1/+1
Quota tree has been missing from lockdep annotations, though no warning has been seen in the wild. There's currently one entry that does not belong there, BTRFS_ORPHAN_OBJECTID. No such tree exists, it's probably a copy & paste mistake, the id is defined among tree ids. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-07Btrfs: allow superblock mismatch from older mkfsChris Mason1-0/+5
We've added new checks to make sure the super block crc is correct during mount. A fresh filesystem from an older mkfs won't have the crc set. This adds a warning when it finds a newly created filesystem but doesn't fail the mount. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-05-07btrfs: enhance superblock checksDavid Sterba1-15/+67
The superblock checksum is not verified upon mount. <awkward silence> Add that check and also reorder existing checks to a more logical order. Current mkfs.btrfs does not calculate the correct checksum of super_block and thus a freshly created filesytem will fail to mount when this patch is applied. First transaction commit calculates correct superblock checksum and saves it to disk. Reproducer: $ mfks.btrfs /dev/sda $ mount /dev/sda /mnt $ btrfs scrub start /mnt $ sleep 5 $ btrfs scrub status /mnt ... super:2 ... Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-05-06btrfs: remove unused gfp mask parameter from release_extent_buffer callchainDavid Sterba1-7/+1
It's unused since 0b32f4bbb423f02ac. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06btrfs: make static code static & remove dead codeEric Sandeen1-38/+5
Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: deal with errors in write_dev_supersJosef Bacik1-1/+11
If you try to mount -o loop a restored file system it will panic if the file ends up being smaller than the original disk. This is because we go to try and get a block for a super that may be past the EOF which makes __getblk return NULL for a buffer head when we aren't expecting it to. Fix this by dealing with this case and just jacking up the errors count. With this patch we no longer panic when mounting a restored file system loopback. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: rescan for qgroupsJan Schmidt1-0/+5
If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: separate sequence numbers for delayed ref tracking and tree mod logJan Schmidt1-1/+1
Sequence numbers for delayed refs have been introduced in the first version of the qgroup patch set. To solve the problem of find_all_roots on a busy file system, the tree mod log was introduced. The sequence numbers for that were simply shared between those two users. However, at one point in qgroup's quota accounting, there's a statement accessing the previous sequence number, that's still just doing (seq - 1) just as it would have to in the very first version. To satisfy that requirement, this patch makes the sequence number counter 64 bit and splits it into a major part (used for qgroup sequence number counting) and a minor part (incremented for each tree modification in the log). This enables us to go exactly one major step backwards, as required for qgroups, while still incrementing the sequence counter for tree mod log insertions to keep track of their order. Keeping them in a single variable means there's no need to change all the code dealing with comparisons of two sequence numbers. The sequence number is reset to 0 on commit (not new in this patch), which ensures we won't overflow the two 32 bit counters. Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs from the tree mod log code may happen. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: set UUID in root_item for created treesStefan Behrens1-0/+4
It is a rare exception that a new tree is created, like the qgroups tree. So far these new trees have an all-zero UUID in their root items. All trees that mkfs.btrfs has created get an UUID during the first mount when btrfs_read_root_item() rewrites the root_item to the v2 structure style. These UUID are never used so far, but anyway, since it is better to have it uniform for all trees, this commit adds some lines that generate and write an UUID for newly created trees. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: delete unused parameter to btrfs_read_root_item()Stefan Behrens1-1/+1
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: various abort cleanupsJosef Bacik1-4/+5
I have a broken file system that when it aborts leaves all sorts of accounting things wrong and gives you lots of WARN_ON()'s other than the abort. This is because we're not cleaning up various parts of the file system when we abort. The first chunks are specific to mount failures, we weren't cleaning up the block group cached inodes and we weren't cleaning up any transactions that had been aborted, which leaves a bunch of things laying around. The second half of this are related to the cleanup parts. First we don't need to release space for the dirty pages from the trans_block_rsv, that's all handled by the trans handles so this is just plain wrong. The other thing is we need to pin down extents that were set ->must_insert_reserved for delayed refs. This isn't so much for the pinning but more for the cleaning up the cache->reserved counter since we are no longer going to use those reserved bytes. With this patch I no longer see a bunch of WARN_ON()'s when I try to mount this broken file system, just the initial one from the abort. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: cleanup destroy_marked_extentsJosef Bacik1-31/+9
We can just look up the extent_buffers for the range and free stuff that way. This makes the cleanup a bit cleaner and we can make sure to evict the extent_buffers pretty quickly by marking them as stale. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: cleanup fs roots if we fail to mountJosef Bacik1-31/+31
We can run the tree logging recovery or the orphan cleanup on mount, so we'll end up looking up a random fs tree in the meantime. So we need to clean this up so we don't leave extent buffers hanging around on the cache. With this patch we no longer leak extent buffers on failure to mount. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: fix all callers of read_tree_blockJosef Bacik1-2/+17
We kept leaking extent buffers when mounting a broken file system and it turns out it's because not everybody uses read_tree_block properly. You need to check and make sure the extent_buffer is uptodate before you use it. This patch fixes everybody who calls read_tree_block directly to make sure they check that it is uptodate and free it and return an error if it is not. With this we no longer leak EB's when things go horribly wrong. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: add tree block level sanity checkJosef Bacik1-0/+6
With a users corrupted fs I was getting weird behavior and panics and it turns out it was because one of his tree blocks had a bogus header level. So add this to the sanity checks in the endio handler for tree blocks. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: don't call readahead hook until we have read the entire ebJosef Bacik1-3/+2
Martin Steigerwald reported a BUG_ON() where we were given a bogus bytenr to map. Turns out he is using > PAGESIZE leafsizes. The readahead stuff is called every time we do a completion, but we may not have finished reading in all the pages, so the bytenr we read off the node could be completely bogus. Fix this by only calling the readahead hook once all pages have been read in. Thanks, Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: don't force pages under writeback to finish when abortingJosef Bacik1-3/+2
Dave reported a BUG_ON() that happened in end_page_writeback() after an abort. This happened because we unconditionally call end_page_writeback() in the endio case, which is right. However when we abort the transaction we will call end_page_writeback() on any writeback pages we find, which is wrong. We need to lock the page and wait on page writeback to complete if it is. There is nothing unsafe about this since we are discarding the transaction anyway. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: use a lock to protect incompat/compat flag of the super blockMiao Xie1-0/+5
The following case will make the incompat/compat flag of the super block be recovered. Task1 |Task2 flags = btrfs_super_incompat_flags(); | |flags = btrfs_super_incompat_flags(); flags |= new_flag1; | |flags |= new_flag2; btrfs_set_super_incompat_flags(flags); | |btrfs_set_super_incompat_flags(flags); the new_flag1 is recovered. In order to avoid this problem, we introduce a lock named super_lock into the btrfs_fs_info structure. If we want to update incompat/compat flags of the super block, we must hold it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: introduce a mutex lock for btrfs quota operationsWang Shilong1-0/+1
The original code has one spin_lock 'qgroup_lock' to protect quota configurations in memory. If we want to add a BTRFS_QGROUP_INFO_KEY, it will be added to Btree firstly, and then update configurations in memory,however, a race condition may happen between these operations. For example: ->add_qgroup_info_item() ->add_qgroup_rb() For the above case, del_qgroup_info_item() may happen just before add_qgroup_rb(). What's worse, when we want to add a qgroup relation: ->add_qgroup_relation_item() ->add_qgroup_relations() We don't have any checks whether 'src' and 'dst' exist before add_qgroup_relation_item(), a race condition can also happen for the above case. To avoid race condition and have all the necessary checks, we introduce a mutex lock 'qgroup_ioctl_lock', and we make all the user change operations protected by the mutex lock. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: fix bad extent loggingJosef Bacik1-1/+1
A user sent me a btrfs-image of a file system that was panicing on mount during the log recovery. I had originally thought these problems were from a bug in the free space cache code, but that was just a symptom of the problem. The problem is if your application does something like this [prealloc][prealloc][prealloc] the internal extent maps will merge those all together into one extent map, even though on disk they are 3 separate extents. So if you go to write into one of these ranges the extent map will be right since we use the physical extent when doing the write, but when we log the extents they will use the wrong sizes for the remainder prealloc space. If this doesn't happen to trip up the free space cache (which it won't in a lot of cases) then you will get bogus entries in your extent tree which will screw stuff up later. The data and such will still work, but everything else is broken. This patch fixes this by not allowing extents that are on the modified list to be merged. This has the side effect that we are no longer adding everything to the modified list all the time, which means we now have to call btrfs_drop_extents every time we log an extent into the tree. So this allows me to drop all this speciality code I was using to get around calling btrfs_drop_extents. With this patch the testcase I've created no longer creates a bogus file system after replaying the log. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06btrfs: clean snapshots one by oneDavid Sterba1-6/+11
Each time pick one dead root from the list and let the caller know if it's needed to continue. This should improve responsiveness during umount and balance which at some point waits for cleaning all currently queued dead roots. A new dead root is added to the end of the list, so the snapshots disappear in the order of deletion. The snapshot cleaning work is now done only from the cleaner thread and the others wake it if needed. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: share stop worker codeLiu Bo1-32/+23
Share the exactly same code of stopping workers. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: add a incompatible format change for smaller metadata extent refsJosef Bacik1-0/+3
We currently store the first key of the tree block inside the reference for the tree block in the extent tree. This takes up quite a bit of space. Make a new key type for metadata which holds the level as the offset and completely removes storing the btrfs_tree_block_info inside the extent ref. This reduces the size from 51 bytes to 33 bytes per extent reference for each tree block. In practice this results in a 30-35% decrease in the size of our extent tree, which means we COW less and can keep more of the extent tree in memory which makes our heavy metadata operations go much faster. This is not an automatic format change, you must enable it at mkfs time or with btrfstune. This patch deals with having metadata stored as either the old format or the new format so it is easy to convert. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: use helper to cleanup tree rootsLiu Bo1-14/+1
free_root_pointers() has been introduced to cleanup all of tree roots, so just use it instead. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06Btrfs: cleanup unused arguments of btrfs_csum_dataLiu Bo1-3/+3
Argument 'root' is no more used in btrfs_csum_data(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-22Btrfs: fix memory leak in btrfs_create_tree()Tsutomu Itoh1-3/+9
We should free leaf and root before returning from the error handling code. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-22Btrfs: update to use fs_state bitLiu Bo1-1/+1
Now that we use bit operation to check fs_state, update btrfs_free_fs_root()'s checker, otherwise we get back to memory leak case. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-05Btrfs: fix wrong handle at error path of create_snapshot() when the commit failsMiao Xie1-9/+7
There are several bugs at error path of create_snapshot() when the transaction commitment failed. - access the freed transaction handler. At the end of the transaction commitment, the transaction handler was freed, so we should not access it after the transaction commitment. - we were not aware of the error which happened during the snapshot creation if we submitted a async transaction commitment. - pending snapshot access vs pending snapshot free. when something wrong happened after we submitted a async transaction commitment, the transaction committer would cleanup the pending snapshots and free them. But the snapshot creators were not aware of it, they would access the freed pending snapshots. This patch fixes the above problems by: - remove the dangerous code that accessed the freed handler - assign ->error if the error happens during the snapshot creation - the transaction committer doesn't free the pending snapshots, just assigns the error number and evicts them before we unblock the transaction. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01btrfs: try harder to allocate raid56 stripe cacheDavid Sterba1-1/+1
The stripe hash table is large, starting with allocation order 4 and can go as high as order 7 in case lock debugging is turned on and structure padding happens. Observed mount failure: mount: page allocation failure: order:7, mode:0x200050 Pid: 8234, comm: mount Tainted: G W 3.8.0-default+ #267 Call Trace: [<ffffffff81114353>] warn_alloc_failed+0xf3/0x140 [<ffffffff811171d2>] ? __alloc_pages_direct_compact+0x92/0x250 [<ffffffff81117ac3>] __alloc_pages_nodemask+0x733/0x9d0 [<ffffffff81152878>] ? cache_alloc_refill+0x3f8/0x840 [<ffffffff811528bc>] cache_alloc_refill+0x43c/0x840 [<ffffffff811302eb>] ? is_kernel_percpu_address+0x4b/0x90 [<ffffffffa00a00ac>] ? btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs] [<ffffffff811531d7>] kmem_cache_alloc_trace+0x247/0x270 [<ffffffffa00a00ac>] btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs] [<ffffffffa003133f>] open_ctree+0xb2f/0x1f90 [btrfs] [<ffffffff81397289>] ? string+0x49/0xe0 [<ffffffff813987b3>] ? vsnprintf+0x443/0x5d0 [<ffffffffa0007cb6>] btrfs_mount+0x526/0x600 [btrfs] [<ffffffff8115127c>] ? cache_alloc_debugcheck_after+0x4c/0x200 [<ffffffff81162b90>] mount_fs+0x20/0xe0 [<ffffffff8117db26>] vfs_kern_mount+0x76/0x120 [<ffffffff811801b6>] do_mount+0x386/0x980 [<ffffffff8112a5cb>] ? strndup_user+0x5b/0x80 [<ffffffff81180840>] sys_mount+0x90/0xe0 [<ffffffff81962e99>] system_call_fastpath+0x16/0x1b Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Btrfs: fix memory leak of log rootsLiu Bo1-0/+5
When we abort a transaction while fsyncing, we'll skip freeing log roots part of committing a transaction, which leads to memory leak. This adds a 'free log roots' in putting super when no more users hold references on log roots, so it's safe and clean. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Merge branch 'raid56-experimental' into for-linus-3.9Chris Mason1-9/+53
Signed-off-by: Chris Mason <chris.mason@fusionio.com> Conflicts: fs/btrfs/ctree.h fs/btrfs/extent-tree.c fs/btrfs/inode.c fs/btrfs/volumes.c
2013-02-20Merge branch 'master' of ↵Chris Mason1-79/+81
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9 Signed-off-by: Chris Mason <chris.mason@fusionio.com> Conflicts: fs/btrfs/disk-io.c
2013-02-20btrfs: define BTRFS_MAGIC as a u64 valueZach Brown1-4/+2
super.magic is an le64 but it's treated as an unterminated string when compared against BTRFS_MAGIC which is defined as a string. Instead define BTRFS_MAGIC as a normal hex value and use endian helpers to compare it to the super's magic. I tested this by mounting an fs made before the change and made sure that it didn't introduce sparse errors. This matches a similar cleanup that is pending in btrfs-progs. David Sterba pointed out that we should fix the kernel side as well :). Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>