summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)AuthorFilesLines
2014-01-28Merge branch 'for-linus' of ↵Linus Torvalds6-181/+52
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus assorted cleanups and fixes all over the place... There will be another pile later this week" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits) __dentry_path() fixes vfs: Remove second variable named error in __dentry_path vfs: Is mounted should be testing mnt_ns for NULL or error. Fix race when checking i_size on direct i/o read hfsplus: remove can_set_xattr nfsd: use get_acl and ->set_acl fs: remove generic_acl nfs: use generic posix ACL infrastructure for v3 Posix ACLs gfs2: use generic posix ACL infrastructure jfs: use generic posix ACL infrastructure xfs: use generic posix ACL infrastructure reiserfs: use generic posix ACL infrastructure ocfs2: use generic posix ACL infrastructure jffs2: use generic posix ACL infrastructure hfsplus: use generic posix ACL infrastructure f2fs: use generic posix ACL infrastructure ext2/3/4: use generic posix ACL infrastructure btrfs: use generic posix ACL infrastructure fs: make posix_acl_create more useful fs: make posix_acl_chmod more useful ...
2014-01-26btrfs: use generic posix ACL infrastructureChristoph Hellwig5-135/+28
Also don't bother to set up a .get_acl method for symlinks as we do not support access control (ACLs or even mode bits) for symlinks in Linux. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-26fs: make posix_acl_create more usefulChristoph Hellwig1-1/+1
Rename the current posix_acl_created to __posix_acl_create and add a fully featured helper to set up the ACLs on file creation that uses get_acl(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-26fs: make posix_acl_chmod more usefulChristoph Hellwig1-1/+1
Rename the current posix_acl_chmod to __posix_acl_chmod and add a fully featured ACL chmod helper that uses the ->set_acl inode operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25btrfs: sanitize BTRFS_IOC_FILE_EXTENT_SAMEAl Viro1-46/+24
* don't assume that ->dest_count won't change between copy_from_user() and memdup_user() * use fdget instead of fget * don't bother comparing superblocks when we'd already compared vfsmounts * get rid of excessive goto * use file_inode() instead of open-coding the sucker Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-23Merge tag 'xfs-for-linus-v3.14-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds1-2/+6
Pull xfs update from Ben Myers: "This is primarily bug fixes, many of which you already have. New stuff includes a series to decouple the in-memory and on-disk log format, helpers in the area of inode clusters, and i_version handling. We decided to try to use more topic branches this release, so there are some merge commits in there on account of that. I'm afraid I didn't do a good job of putting meaningful comments in the first couple of merges. Sorry about that. I think I have the hang of it now. For 3.14-rc1 there are fixes in the areas of remote attributes, discard, growfs, memory leaks in recovery, directory v2, quotas, the MAINTAINERS file, allocation alignment, extent list locking, and in xfs_bmapi_allocate. There are cleanups in xfs_setsize_buftarg, removing unused macros, quotas, setattr, and freeing of inode clusters. The in-memory and on-disk log format have been decoupled, a common helper to calculate the number of blocks in an inode cluster has been added, and handling of i_version has been pulled into the filesystems that use it. - cleanup in xfs_setsize_buftarg - removal of remaining unused flags for vop toss/flush/flushinval - fix for memory corruption in xfs_attrlist_by_handle - fix for out-of-date comment in xfs_trans_dqlockedjoin - fix for discard if range length is less than one block - fix for overrun of agfl buffer using growfs on v4 superblock filesystems - pull i_version handling out into the filesystems that use it - don't leak recovery items on error - fix for memory leak in xfs_dir2_node_removename - several cleanups for quotas - fix bad assertion in xfs_qm_vop_create_dqattach - cleanup for xfs_setattr_mode, and add xfs_setattr_time - fix quota assert in xfs_setattr_nonsize - fix an infinite loop when turning off group/project quota before user quota - fix for temporary buffer allocation failure in xfs_dir2_block_to_sf with large directory block sizes - fix Dave's email address in MAINTAINERS - cleanup calculation of freed inode cluster blocks - fix alignment of initial file allocations to match filesystem geometry - decouple in-memory and on-disk log format - introduce a common helper to calculate the number of filesystem blocks in an inode cluster - fixes for extent list locking - fix for off-by-one in xfs_attr3_rmt_verify - fix for missing destroy_work_on_stack in xfs_bmapi_allocate" * tag 'xfs-for-linus-v3.14-rc1' of git://oss.sgi.com/xfs/xfs: (51 commits) xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() xfs: fix off-by-one error in xfs_attr3_rmt_verify xfs: assert that we hold the ilock for extent map access xfs: use xfs_ilock_attr_map_shared in xfs_attr_list_int xfs: use xfs_ilock_attr_map_shared in xfs_attr_get xfs: use xfs_ilock_data_map_shared in xfs_qm_dqiterate xfs: use xfs_ilock_data_map_shared in xfs_qm_dqtobp xfs: take the ilock around xfs_bmapi_read in xfs_zero_remaining_bytes xfs: reinstate the ilock in xfs_readdir xfs: add xfs_ilock_attr_map_shared xfs: rename xfs_ilock_map_shared xfs: remove xfs_iunlock_map_shared xfs: no need to lock the inode in xfs_find_handle xfs: use xfs_icluster_size_fsb in xfs_imap xfs: use xfs_icluster_size_fsb in xfs_ifree_cluster xfs: use xfs_icluster_size_fsb in xfs_ialloc_inode_init xfs: use xfs_icluster_size_fsb in xfs_bulkstat xfs: introduce a common helper xfs_icluster_size_fsb xfs: get rid of XFS_IALLOC_BLOCKS macros xfs: get rid of XFS_INODE_CLUSTER_SIZE macros ...
2014-01-07treewide: fix comments and printk msgsMasanari Iida1-1/+1
This patch fixed several typo in printk from various part of kernel source. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-12-19treewide: Fix typos in printkMasanari Iida1-1/+1
Correct spelling typo in various part of kernel Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-12-13Merge branch 'for-linus' of ↵Linus Torvalds5-42/+73
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is a small collection of fixes. It was rebased this morning, but I was just fixing signed-off-by tags with the wrong email" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix access_ok() check in btrfs_ioctl_send() Btrfs: make sure we cleanup all reloc roots if error happens Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation Btrfs: fix an oops when doing balance relocation Btrfs: don't miss skinny extent items on delayed ref head contention btrfs: call mnt_drop_write after interrupted subvol deletion Btrfs: don't clear the default compression type
2013-12-12Btrfs: fix access_ok() check in btrfs_ioctl_send()Dan Carpenter1-2/+2
The closing parenthesis is in the wrong place. We want to check "sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of "sizeof(*arg->clone_sources * arg->clone_sources_count)". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> cc: stable@vger.kernel.org
2013-12-12Btrfs: make sure we cleanup all reloc roots if error happensWang Shilong1-0/+7
I hit an oops when merging reloc roots fails, the reason is that new reloc roots may be added and we should make sure we cleanup all reloc roots. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2013-12-12Btrfs: skip building backref tree for uuid and quota tree when doing balance ↵Wang Shilong1-1/+3
relocation Quota tree and UUID Tree is only cowed, they can not be snapshoted. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2013-12-12Btrfs: fix an oops when doing balance relocationWang Shilong1-23/+47
I hit an oops when inserting reloc root into @reloc_root_tree(it can be easily triggered when forcing cow for relocation root) [ 866.494539] [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs] [ 866.495321] [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs] [ 866.496109] [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs] [ 866.496908] [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs] [ 866.497703] [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs] This is because reloc root inserted into @reloc_root_tree is not within one transaction,reloc root may be cowed and root block bytenr will be reused then oops happens.We should update reloc root in @reloc_root_tree when cow reloc root node, fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2013-12-12Btrfs: don't miss skinny extent items on delayed ref head contentionFilipe David Borba Manana1-12/+10
Currently extent-tree.c:btrfs_lookup_extent_info() can miss the lookup of skinny extent items. This can happen when the execution flow is the following: * We do an extent tree lookup and fail to find a skinny extent item; * As a result, we attempt to see if a non-skinny extent item exists, either by looking at previous item in the leaf or by doing another full extent tree search; * We have a transaction and then we check for a matching delayed ref head in the transaction's delayed refs rbtree; * We find such delayed ref head and then we try to lock it with a call to mutex_trylock(); * The lock was contended so we jump to the label "again", which repeats the extent tree search but for a non-skinny extent item, because we set previously metadata variable to 0 and the search key to look for a non-skinny extent-item; * After the jump (and after releasing the transaction's delayed refs lock), a skinny extent item might have been added to the extent tree but we will miss it because metadata is set to 0 and the search key is set for a non-skinny extent-item. The fix here is to not reset metadata to 0 and to jump to the initial search key setup if the delayed ref head is contended, instead of jumping directly to the extent tree search label ("again"). This issue was found while investigating the issue reported at Bugzilla 64961. David Sterba suspected this function was missing extent items, and that this could be caused by the last change to this function, which was made in the following patch: [PATCH] Btrfs: optimize btrfs_lookup_extent_info() (commit 74be9510876a66ad9826613ac8a526d26f9e7f01) But in fact this issue already existed before, because after failing to find a skinny extent item, the code set the search key for a non-skinny extent item, and on contention of a matching delayed ref head it would not search the extent tree for a skinny extent item anymore. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2013-12-12btrfs: call mnt_drop_write after interrupted subvol deletionDavid Sterba1-1/+2
If btrfs_ioctl_snap_destroy blocks on the mutex and the process is killed, mnt_write count is unbalanced and leads to unmountable filesystem. CC: stable@vger.kernel.org Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2013-12-12Btrfs: don't clear the default compression typeMiao Xie1-3/+2
We met a oops caused by the wrong compression type: [ 556.512356] BUG: unable to handle kernel NULL pointer dereference at (null) [ 556.512370] IP: [<ffffffff811dbaa0>] __list_del_entry+0x1/0x98 [SNIP] [ 556.512490] [<ffffffff811dbb44>] ? list_del+0xd/0x2b [ 556.512539] [<ffffffffa05dd5ce>] find_workspace+0x97/0x175 [btrfs] [ 556.512546] [<ffffffff813c14b5>] ? _raw_spin_lock+0xe/0x10 [ 556.512576] [<ffffffffa05de276>] btrfs_compress_pages+0x2d/0xa2 [btrfs] [ 556.512601] [<ffffffffa05af060>] compress_file_range.constprop.54+0x1f2/0x4e8 [btrfs] [ 556.512627] [<ffffffffa05af388>] async_cow_start+0x32/0x4d [btrfs] [ 556.512655] [<ffffffffa05cc7a1>] worker_loop+0x144/0x4c3 [btrfs] [ 556.512661] [<ffffffff81059404>] ? finish_task_switch+0x80/0xb8 [ 556.512689] [<ffffffffa05cc65d>] ? btrfs_queue_worker+0x244/0x244 [btrfs] [ 556.512695] [<ffffffff8104fa4e>] kthread+0x8d/0x95 [ 556.512699] [<ffffffff81050000>] ? bit_waitqueue+0x34/0x7d [ 556.512704] [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65 [ 556.512709] [<ffffffff813c7eec>] ret_from_fork+0x7c/0xb0 [ 556.512713] [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65 Steps to reproduce: # mkfs.btrfs -f <dev> # mount -o nodatacow <dev> <mnt> # touch <mnt>/<file> # chattr =c <mnt>/<file> # dd if=/dev/zero of=<mnt>/<file> bs=1M count=10 It is because we cleared the default compression type when setting the nodatacow. In fact, we needn't do it because we have used COMPRESS flag to indicate if we need compressed the file data or not, needn't use the variant -- compress_type -- in btrfs_info to do the same thing, and just use it to hold the default compression type. Or we would get a wrong compress type for a file whose own compress flag is set but the compress flag of its filesystem is not set. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2013-12-06Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds4-59/+20
Pull block layer fixes from Jens Axboe: "A small collection of fixes for the current series. It contains: - A fix for a use-after-free of a request in blk-mq. From Ming Lei - A fix for a blk-mq bug that could attempt to dereference a NULL rq if allocation failed - Two xen-blkfront small fixes - Cleanup of submit_bio_wait() type uses in the kernel, unifying that. From Kent - A fix for 32-bit blkg_rwstat reading. I apologize for this one looking mangled in the shortlog, it's entirely my fault for missing an empty line between the description and body of the text" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: fix use-after-free of request blk-mq: fix dereference of rq->mq_ctx if allocation fails block: xen-blkfront: Fix possible NULL ptr dereference xen-blkfront: Silence pfn maybe-uninitialized warning block: submit_bio_wait() conversions Update of blkg_stat and blkg_rwstat may happen in bh context
2013-12-06fs: fix iversion handlingChristoph Hellwig1-2/+6
Currently notify_change directly updates i_version for size updates, which not only is counter to how all other fields are updated through struct iattr, but also breaks XFS, which need inode updates to happen under its own lock, and synchronized to the structure that gets written to the log. Remove the update in the common code, and it to btrfs and ext4, XFS already does a proper updaste internally and currently gets a double update with the existing code. IMHO this is 3.13 and -stable material and should go in through the XFS tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-11-25block: submit_bio_wait() conversionsKent Overstreet4-59/+20
It was being open coded in a few places. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Chris Mason <chris.mason@fusionio.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-22Merge branch 'for-linus' of ↵Linus Torvalds13-54/+53
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Almost all of these are bug fixes. Dave Sterba's documentation update is the big exception because he removed our promises to set any machine running Btrfs on fire" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Documentation: filesystems: update btrfs tools section Documentation: filesystems: add new btrfs mount options btrfs: update kconfig help text btrfs: fix bio_size_ok() for max_sectors > 0xffff btrfs: Use trace condition for get_extent tracepoint btrfs: fix typo in the log message Btrfs: fix list delete warning when removing ordered root from the list Btrfs: print bytenr instead of page pointer in check-int Btrfs: remove dead codes from ctree.h Btrfs: don't wait for ordered data outside desired range Btrfs: fix lockdep error in async commit Btrfs: avoid heavy operations in btrfs_commit_super Btrfs: fix __btrfs_start_workers retval Btrfs: disable online raid-repair on ro mounts Btrfs: do not inc uncorrectable_errors counter on ro scrubs Btrfs: only drop modified extents if we logged the whole inode Btrfs: make sure to copy everything if we rename Btrfs: don't BUG_ON() if we get an error walking backrefs
2013-11-21btrfs: update kconfig help textDavid Sterba1-5/+10
Reflect the current status. Portions of the text taken from the wiki pages. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21btrfs: fix bio_size_ok() for max_sectors > 0xffffAkinobu Mita1-1/+1
The data type of max_sectors in queue settings is unsigned int. But this value is stored to the local variable whose type is unsigned short in bio_size_ok(). This can cause unexpected result when max_sectors > 0xffff. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21btrfs: Use trace condition for get_extent tracepointSteven Rostedt1-2/+1
Doing an if statement to test some condition to know if we should trigger a tracepoint is pointless when tracing is disabled. This just adds overhead and wastes a branch prediction. This is why the TRACE_EVENT_CONDITION() was created. It places the check inside the jump label so that the branch does not happen unless tracing is enabled. That is, instead of doing: if (em) trace_btrfs_get_extent(root, em); Which is basically this: if (em) if (static_key(trace_btrfs_get_extent)) { Using a TRACE_EVENT_CONDITION() we can just do: trace_btrfs_get_extent(root, em); And the condition trace event will do: if (static_key(trace_btrfs_get_extent)) { if (em) { ... The static key is a non conditional jump (or nop) that is faster than having to check if em is NULL or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21btrfs: fix typo in the log messageAnand Jain1-1/+1
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: fix list delete warning when removing ordered root from the listMiao Xie1-0/+1
Commit b02441999efcc6152b87cd58e7970bb7843f76cf "Btrfs: don't wait for the completion of all the ordered extents" introduced a bug that broke the ordered root list: WARNING: CPU: 1 PID: 7119 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98() It is because we forgot to return the roots in the splice list to the ordered list of the fs. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: print bytenr instead of page pointer in check-intStefan Behrens1-8/+17
The page pointer information was useless. The bytenr is what you want when you search for submitted write bios. Additionally, a new bit in the print mask is added that allows to selectively enable the check-int submit_bio verbose mode. Before, the global verbose mode had to be enabled leading to many million useless lines in the kernel log. And a comment is added that explains that LOG_BUF_SHIFT needs to be set to a really high value. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: remove dead codes from ctree.hWang Shilong1-6/+0
These two functions are only stated but undefined. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: don't wait for ordered data outside desired rangeFilipe David Borba Manana1-1/+1
In btrfs_wait_ordered_range(), if we found an extent to the left of the start of our desired wait range and the last byte of that extent is 1 less than the desired range's start, we would would wait for the IO completion of that extent unnecessarily. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: fix lockdep error in async commitLiu Bo1-2/+2
Lockdep complains about btrfs's async commit: [ 2372.462171] [ BUG: bad unlock balance detected! ] [ 2372.462191] 3.12.0+ #32 Tainted: G W [ 2372.462209] ------------------------------------- [ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at: [ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462305] but there are no more locks to release! [ 2372.462324] [ 2372.462324] other info that might help us debug this: [ 2372.462349] no locks held by ceph-osd/14048. [ 2372.462367] [ 2372.462367] stack backtrace: [ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G W 3.12.0+ #32 [ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 2372.462455] ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320 [ 2372.462491] ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650 [ 2372.462526] ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000 [ 2372.462562] Call Trace: [ 2372.462584] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462619] [<ffffffff816f094a>] dump_stack+0x45/0x56 [ 2372.462642] [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100 [ 2372.462677] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462710] [<ffffffff810b01ee>] lock_release+0x18e/0x210 [ 2372.462742] [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs] [ 2372.462783] [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs] [ 2372.462822] [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs] [ 2372.462849] [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0 [ 2372.462873] [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0 [ 2372.462897] [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100 [ 2372.462922] [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0 [ 2372.462946] [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0 [ 2372.462969] [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0 [ 2372.462991] [<ffffffff817045a4>] tracesys+0xdd/0xe2 ==================================================== It's because that we don't do the right thing when checking if it's ok to tell lockdep that we're trying to release the rwsem. If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze rwsem, which makes lockdep complains a lot. Reported-by: Ma Jianpeng <majianpeng@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: avoid heavy operations in btrfs_commit_superLiu Bo1-20/+1
The 'git blame' history shows that, the old transaction commit code has to do twice to ensure roots are updated and we have to flush metadata and super block manually, however, right now all of these can be handled well inside the transaction commit code without extra efforts. And the error handling part remains same with the current code, -- 'return to caller once we get error'. This saves us a transaction commit and a flush of super block, which are both heavy operations according to ftrace output analysis. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: fix __btrfs_start_workers retvalIlya Dryomov1-0/+1
__btrfs_start_workers returns 0 in case it raced with btrfs_stop_workers and lost the race. This is wrong because worker in this case is not allowed to start and is in fact destroyed. Return -EINVAL instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: disable online raid-repair on ro mountsIlya Dryomov1-3/+8
This disables the "if needed, write the good copy back before the read is completed" part of the read sequence for read-only mounts. Cc: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: do not inc uncorrectable_errors counter on ro scrubsIlya Dryomov1-2/+4
Currently if we discover an error when scrubbing in ro mode we a) blindly increment the uncorrectable_errors counter, and b) spam the dmesg with the 'unable to fixup (regular) error at ...' message, even though a) we haven't tried to determine if the error is correctable or not, and b) we haven't tried to fixup anything. Fix this. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: only drop modified extents if we logged the whole inodeJosef Bacik1-1/+1
If we fsync, seek and write, rename and then fsync again we will lose the modified hole extent because the rename will drop all of the modified extents since we didn't do the fast search. We need to only drop the modified extents if we didn't do the fast search and we were logging the entire inode as we don't need them anymore, otherwise this is being premature. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: make sure to copy everything if we renameJosef Bacik1-1/+2
If we rename a file that is already in the log and we fsync again we will lose the new name. This is because we just log the inode update and not the new ref. To fix this we just need to check if we are logging the new name of the inode and copy all the metadata instead of just updating the inode itself. With this patch my testcase now passes. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21Btrfs: don't BUG_ON() if we get an error walking backrefsJosef Bacik1-1/+2
We can just return false for this so we stop doing the snapshot aware defrag stuff. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-16Merge branch 'for-linus' of ↵Linus Torvalds2-11/+11
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This pull fixes the empty_zero_page bug that Heiko reported, and includes one more cleanup from Al Viro" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: get rid of fdentry() btrfs: fix empty_zero_page misusage
2013-11-16Merge branch 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...
2013-11-15btrfs: get rid of fdentry()Al Viro2-9/+4
3 of 4 callers actually want file_inode()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-15btrfs: fix empty_zero_page misusageChris Mason1-2/+7
Heiko Carstens noticed that btrfs was using empty_zero_page incorrectly. He explained: The definition of empty_zero_page is architecture specific. It is (currently) either a character array, an unsigned long containing the address of the empty_zero_page, or even worse only the address of the struct page belonging to the empty_zero_page. This commit changes btrfs to use a for-loop instead. On x86 the resulting .ko is smaller, and we're no longer worrying about how each arch builds its zeros. Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: rename btrfs_start_all_delalloc_inodesMiao Xie7-9/+7
rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: don't wait for the completion of all the ordered extentsMiao Xie8-18/+31
It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: don't wait for all the async delalloc when shrinking delallocMiao Xie1-2/+12
It was very likely that there were lots of async delalloc pages in the filesystem, if we waited until all the pages were flushed, we would be blocked for a long time, and the performance would also drop down. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: fix the confusion between delalloc bytes and metadata bytesMiao Xie1-0/+6
In shrink_delalloc(), what we need reclaim is the metadata space, so flushing pages by to_reclaim is not reasonable, it is very likely that the pages we flush are not enough. And then we had to invoke the flush function for several times, at the worst, we need call flush_space for several times. It wasted time. We improve this problem by converting the metadata space size we need reserve to the delalloc bytes, By this way, we can flush the pages by a reasonable number. (Now we use a fixed number to do conversion, it is not flexible, maybe we can find a good way to improve it in the future.) Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: pick up the code for the item number calculation in flush_space()Miao Xie1-9/+16
This patch picked up the code that was used to calculate the number of the items for which we need reserve space, and we will use it in the next patch. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: wait for the ordered extent only when we wantMiao Xie1-1/+2
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()Miao Xie1-4/+3
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: avoid unnecessary scrub workers allocationWang Shilong1-13/+10
We only allocate scrub workers if we pass all the necessary checks, for example, there are no operation in progress. Besides, move mutex lock protection outside of scrub_workers_get() /scrub_workers_put(). Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12Btrfs: check file extent type before anything elseJosef Bacik1-5/+5
I hit this problem with my no holes patch and it made me realize what the problem was for bz 60834. If the first item in the leaf is an inline extent and we try to read anything starting from disk_bytenr onward we will read off the end of the leaf. So we need to check to see what it's type is, and if it's not REG we can just break out. This should fix this problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-12btrfs: Remove useless variable in write_ctree_super()Rashika1-4/+1
The function write_ctree_super() in disk-io.c uses variable ret to return the result of function write_all_supers(). Since, this variable serves no purpose, hence the patch removes it and returns the call of the called function. Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>