summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)AuthorFilesLines
2019-09-09btrfs: add space reservation tracepoint for reserved bytesJosef Bacik1-0/+2
I noticed when folding the trace_btrfs_space_reservation() tracepoint into the btrfs_space_info_update_* helpers that we didn't emit a tracepoint when doing btrfs_add_reserved_bytes(). I know this is because we were swapping bytes_may_use for bytes_reserved, so in my mind there was no reason to have the tracepoint there. But now there is because we always emit the unreserve for the bytes_may_use side, and this would have broken if compression was on anyway. Add a tracepoint to cover the bytes_reserved counter so the math still comes out right. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: roll tracepoint into btrfs_space_info_update helperJosef Bacik6-34/+7
We duplicate this tracepoint everywhere we call these helpers, so update the helper to have the tracepoint as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: do not allow reservations if we have pending ticketsJosef Bacik1-3/+7
If we already have tickets on the list we don't want to steal their reservations. This is a preparation patch for upcoming changes, technically this shouldn't happen today because of the way we add bytes to tickets before adding them to the space_info in most cases. This does not change the FIFO nature of reserve tickets, it simply allows us to enforce it in a different way. Previously it was enforced because any new space would be added to the first ticket on the list, which would result in new reservations getting a reserve ticket. This replaces that mechanism by simply checking to see if we have outstanding reserve tickets and skipping straight to adding a ticket for our reservation. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: stop clearing EXTENT_DIRTY in inode I/O treeOmar Sandoval6-47/+30
Since commit fee187d9d9dd ("Btrfs: do not set EXTENT_DIRTY along with EXTENT_DELALLOC"), we never set EXTENT_DIRTY in inode->io_tree, so we can simplify and stop trying to clear it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: treat RWF_{,D}SYNC writes as sync for CRCsOmar Sandoval1-1/+1
The VFS indicates a synchronous write to ->write_iter() via iocb->ki_flags. The IOCB_{,D}SYNC flags may be set based on the file (see iocb_flags()) or the RWF_* flags passed to a syscall like pwritev2() (see kiocb_set_rw_flags()). However, in btrfs_file_write_iter(), we're checking if a write is synchronous based only on the file; we use this to decide when to bump the sync_writers counter and thus do CRCs synchronously. Make sure we do this for all synchronous writes as determined by the VFS. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add const ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: use correct count in btrfs_file_write_iter()Omar Sandoval1-1/+2
generic_write_checks() may modify iov_iter_count(), so we must get the count after the call, not before. Using the wrong one has a couple of consequences: 1. We check a longer range in check_can_nocow() for nowait than we're actually writing. 2. We create extra hole extent maps in btrfs_cont_expand(). As far as I can tell, this is harmless, but I might be missing something. These issues are pretty minor, but let's fix it before something more important trips on it. Fixes: edf064e7c6fe ("btrfs: nowait aio support") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tie extent buffer and it's token togetherDavid Sterba5-26/+20
Further simplifaction of the get/set helpers is possible when the token is uniquely tied to an extent buffer. A condition and an assignment can be avoided. The initializations are moved closer to the first use when the extent buffer is valid. There's one exception in __push_leaf_left where the token is reused. Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: assume valid token for btrfs_set/get_token helpersDavid Sterba1-12/+12
Now that we can safely assume that the token is always a valid pointer, remove the branches that check that. Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: define separate btrfs_set/get_XX helpersDavid Sterba2-11/+55
There are helpers for all type widths defined via macro and optionally can use a token which is a cached pointer to avoid repeated mapping of the extent buffer. The token value is known at compile time, when it's valid it's always address of a local variable, otherwise it's NULL passed by the token-less helpers. This can be utilized to remove some branching as the helpers are used frequenlty. Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Make btrfs_find_name_in_ext_backref return struct btrfs_inode_extrefNikolay Borisov3-32/+22
btrfs_find_name_in_ext_backref returns either 0/1 depending on whether it found a backref for the given name. If it returns true then the actual inode_ref struct is returned in one of its parameters. That's pointless, instead refactor the function such that it returns either a pointer to the btrfs_inode_extref or NULL it it didn't find anything. This streamlines the function calling convention. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Make btrfs_find_name_in_backref return btrfs_inode_ref structNikolay Borisov3-22/+21
btrfs_find_name_in_backref returns either 0/1 depending on whether it found a backref for the given name. If it returns true then the actual inode_ref struct is returned in one of its parameters. That's pointless, instead refactor the function such that it returns either a pointer to the btrfs_inode_ref or NULL it it didn't find anything. This streamlines the function calling convention. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move dev_stats helpers to volumes.cDavid Sterba2-24/+23
The other dev stats functions are already there and the helpers are not used by anything else. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move struct io_ctl to free-space-cache.hDavid Sterba3-15/+15
The io_ctl structure is used for free space management, and used only by the v1 space cache code, but unfortunatlly the full definition is required by block-group.h so it can't be moved to free-space-cache.c without additional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move functions for tree compare to send.cDavid Sterba3-376/+374
Send is the only user of tree_compare, we can move it there along with the other helpers and definitions. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: rename and export read_node_slotDavid Sterba2-11/+14
Preparatory work for code that will be moved out of ctree and uses this function. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move private raid56 definitions from ctree.hDavid Sterba2-16/+16
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move math functions to misc.hDavid Sterba7-33/+21
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move cond_wake_up functions out of ctreeDavid Sterba12-22/+43
The file ctree.h serves as a header for everything and has become quite bloated. Split some helpers that are generic and create a new file that should be the catch-all for code that's not btrfs-specific. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: use proper error values on allocation failure in clone_fs_devicesAnand Jain1-2/+6
Fix the fake ENOMEM return error code to the actual error in clone_fs_devices(). Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: proper error handling when invalid device is found in find_next_devidAnand Jain1-1/+6
In a corrupted tree, if search for next devid finds the device with devid = -1, then report the error -EUCLEAN back to the parent function to fail gracefully. The tree checker will not catch this in case the devids are created using the following script: umount /btrfs dev1=/dev/sdb dev2=/dev/sdc mkfs.btrfs -fq -dsingle -msingle $dev1 mount $dev1 /btrfs _fail() { echo $1 exit 1 } while true; do btrfs dev add -f $dev2 /btrfs || _fail "add failed" btrfs dev del $dev1 /btrfs || _fail "del failed" dev_tmp=$dev1 dev1=$dev2 dev2=$dev_tmp done With output: BTRFS critical (device sdb): corrupt leaf: root=3 block=313739198464 slot=1 devid=1 invalid devid: has=507 expect=[0, 506] BTRFS error (device sdb): block=313739198464 write time tree block corruption detected BTRFS: error (device sdb) in btrfs_commit_transaction:2268: errno=-5 IO failure (Error while writing out transaction) BTRFS warning (device sdb): Skipping commit of aborted transaction. BTRFS: error (device sdb) in cleanup_transaction:1827: errno=-5 IO failure Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> [ add script and messages ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: fix allocation of free space cache v1 bitmap pagesChristophe Leroy3-7/+22
Various notifications of type "BUG kmalloc-4096 () : Redzone overwritten" have been observed recently in various parts of the kernel. After some time, it has been made a relation with the use of BTRFS filesystem and with SLUB_DEBUG turned on. [ 22.809700] BUG kmalloc-4096 (Tainted: G W ): Redzone overwritten [ 22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc [ 22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224 [ 22.811193] __slab_alloc.constprop.26+0x44/0x70 [ 22.811345] kmem_cache_alloc_trace+0xf0/0x2ec [ 22.811588] __load_free_space_cache+0x588/0x780 [btrfs] [ 22.811848] load_free_space_cache+0xf4/0x1b0 [btrfs] [ 22.812090] cache_block_group+0x1d0/0x3d0 [btrfs] [ 22.812321] find_free_extent+0x680/0x12a4 [btrfs] [ 22.812549] btrfs_reserve_extent+0xec/0x220 [btrfs] [ 22.812785] btrfs_alloc_tree_block+0x178/0x5f4 [btrfs] [ 22.813032] __btrfs_cow_block+0x150/0x5d4 [btrfs] [ 22.813262] btrfs_cow_block+0x194/0x298 [btrfs] [ 22.813484] commit_cowonly_roots+0x44/0x294 [btrfs] [ 22.813718] btrfs_commit_transaction+0x63c/0xc0c [btrfs] [ 22.813973] close_ctree+0xf8/0x2a4 [btrfs] [ 22.814107] generic_shutdown_super+0x80/0x110 [ 22.814250] kill_anon_super+0x18/0x30 [ 22.814437] btrfs_kill_super+0x18/0x90 [btrfs] [ 22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83 [ 22.814841] proc_cgroup_show+0xc0/0x248 [ 22.814967] proc_single_show+0x54/0x98 [ 22.815086] seq_read+0x278/0x45c [ 22.815190] __vfs_read+0x28/0x17c [ 22.815289] vfs_read+0xa8/0x14c [ 22.815381] ksys_read+0x50/0x94 [ 22.815475] ret_from_syscall+0x0/0x38 Commit 69d2480456d1 ("btrfs: use copy_page for copying pages instead of memcpy") changed the way bitmap blocks are copied. But allthough bitmaps have the size of a page, they were allocated with kzalloc(). Most of the time, kzalloc() allocates aligned blocks of memory, so copy_page() can be used. But when some debug options like SLAB_DEBUG are activated, kzalloc() may return unaligned pointer. On powerpc, memcpy(), copy_page() and other copying functions use 'dcbz' instruction which provides an entire zeroed cacheline to avoid memory read when the intention is to overwrite a full line. Functions like memcpy() are writen to care about partial cachelines at the start and end of the destination, but copy_page() assumes it gets pages. As pages are naturally cache aligned, copy_page() doesn't care about partial lines. This means that when copy_page() is called with a misaligned pointer, a few leading bytes are zeroed. To fix it, allocate bitmaps through kmem_cache instead of using kzalloc() The cache pool is created with PAGE_SIZE alignment constraint. Reported-by: Erhard F. <erhard_f@mailbox.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204371 Fixes: 69d2480456d1 ("btrfs: use copy_page for copying pages instead of memcpy") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: David Sterba <dsterba@suse.com> [ rename to btrfs_free_space_bitmap ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Detect unbalanced tree with empty leaf before crashing btree operationsQu Wenruo2-0/+16
[BUG] With crafted image, btrfs will panic at btree operations: kernel BUG at fs/btrfs/ctree.c:3894! invalid opcode: 0000 [#1] SMP PTI CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9 RIP: 0010:__push_leaf_left+0x6b6/0x6e0 RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000 RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000 RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240 R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0 FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0 Call Trace: ? _cond_resched+0x1a/0x50 push_leaf_left+0x179/0x190 btrfs_del_items+0x316/0x470 btrfs_del_csums+0x215/0x3a0 __btrfs_free_extent.isra.72+0x5a7/0xbe0 __btrfs_run_delayed_refs+0x539/0x1120 btrfs_run_delayed_refs+0xdb/0x1b0 btrfs_commit_transaction+0x52/0x950 ? start_transaction+0x94/0x450 transaction_kthread+0x163/0x190 kthread+0x105/0x140 ? btrfs_cleanup_transaction+0x560/0x560 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x35/0x40 Modules linked in: ---[ end trace c2425e6e89b5558f ]--- [CAUSE] The offending csum tree looks like this: checksum tree key (CSUM_TREE ROOT_ITEM 0) node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE ... key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17 key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<< key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17 ... leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8 range start 85975040 end 85983232 length 8192 ... leaf 29642752 items 0 free space 3995 generation 17 owner 0 ^ empty leaf invalid owner ^ leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368 range start 92274688 end 95723520 length 3448832 So we have a corrupted csum tree where one tree leaf is completely empty, causing unbalanced btree, thus leading to unexpected btree balance error. [FIX] For this particular case, we handle it in two directions to catch it: - Check if the tree block is empty through btrfs_verify_level_key() So that invalid tree blocks won't be read out through btrfs_search_slot() and its variants. - Check 0 tree owner in tree checker NO tree is using 0 as its tree owner, detect it and reject at tree block read time. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202821 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Deprecate BTRFS_SUBVOL_CREATE_ASYNC flagNikolay Borisov1-1/+14
Support for asynchronous snapshot creation was originally added in 72fd032e9424 ("Btrfs: add SNAP_CREATE_ASYNC ioctl") to cater for ceph's backend needs. However, since Ceph has deprecated support for btrfs there is no longer need for that support in btrfs. Additionally, this was never supported by btrfs-progs, the official userspace tools. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: improve error handling in run_delalloc_nocowNikolay Borisov1-3/+17
Correctly handle failure cases when adding an ordered extents in case of REGULAR or PREALLOC extents. Remove the BUG_ON. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: comment and minor simplifications in run_delalloc_nocowNikolay Borisov1-4/+3
Add a comment explaining why we keep the BUG also use the already read and cached value of extent ram bytes stored in 'ram_bytes'. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: streamline code in run_delalloc_nocow in case of inline extentsNikolay Borisov1-7/+5
The extent range check right after the "out_check" label is redundant, because the only way it can trigger is if we have an inline extent. In this case it makes more sense to actually move it in the branch explictly dealing with inlines extents. What's more, the nested 'if (nocow)' can never be true because for inline extents we always do COW and there is no chance 'nocow' can be true, just remove that check. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: simplify extent type checks in run_delalloc_nocowNikolay Borisov1-8/+8
There is no point in checking the type of the extent again just to set the 'type' variable, when this check has already been performed before. Instead, extend the original if branch with an 'else' clause. This allows to remove one local variable and make it obvious how the code flow differs for prealloc/regular extents. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: improve comments around nocow pathNikolay Borisov2-4/+50
run_delalloc_nocow contains numerous, somewhat subtle, checks when figuring out whether a particular extent should be CoW'ed or not. This patch explicitly states the assumptions those checks verify. As a result also document 2 of the more subtle checks in check_committed_ref as well. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: refactor variable scope in run_delalloc_nocowNikolay Borisov1-33/+28
Of the 22 (!!!) local variables declared in this function only 9 have function-wide context. Of the remaining 13, 12 are needed in the main while loop of the function and 1 is needed in a tiny if branch, only in case we have prealloc extent. This commit reduces the lifespan of every variable to its bare minimum. It also renames the 'nolock' boolean to freespace_inode to clearly indicate its purpose. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: only reserve metadata_size for inodesJosef Bacik2-5/+18
Historically we reserved worst case for every btree operation, and generally speaking we want to do that in cases where it could be the worst case. However for updating inodes we know the inode items are already in the tree, so it will only be an update operation and never an insert operation. This allows us to always reserve only the metadata_size amount for inode updates rather than the insert_metadata_size amount. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: rename the btrfs_calc_*_metadata_size helpersJosef Bacik13-30/+34
btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that it doesn't take into account any splitting at the levels, because truncate will never split nodes. However truncate _and_ changing will never split nodes, so rename btrfs_calc_trunc_metadata_size to btrfs_calc_metadata_size. Also btrfs_calc_trans_metadata_size is purely for inserting items, so rename this to btrfs_calc_insert_metadata_size. Making these clearer will help when I start using them differently in upcoming patches. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add EXTENT_DATA_REF checkQu Wenruo3-1/+50
EXTENT_DATA_REF is a little like DIR_ITEM which contains hash in its key->offset. This patch will check the following contents: - Key->objectid Basic alignment check. - Hash Hash of each extent_data_ref item must match key->offset. - Offset Basic alignment check. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add simple keyed refs checkQu Wenruo1-1/+39
For TREE_BLOCK_REF, SHARED_DATA_REF and SHARED_BLOCK_REF we need to check: | TREE_BLOCK_REF | SHARED_BLOCK_REF | SHARED_BLOCK_REF --------------+----------------+-----------------+------------------ key->objectid | Alignment | Alignment | Alignment key->offset | Any value | Alignment | Alignment item_size | 0 | 0 | sizeof(le32) (*) *: sizeof(struct btrfs_shared_data_ref) So introduce a check to check all these 3 key types together. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add EXTENT_ITEM and METADATA_ITEM checkQu Wenruo1-0/+248
This patch introduces the ability to check extent items. This check involves: - key->objectid check Basic alignment check. - key->type check Against btrfs_extent_item::type and SKINNY_METADATA feature. - key->offset alignment check for EXTENT_ITEM - key->offset check for METADATA_ITEM - item size check Both against minimal size and stepping check. - btrfs_extent_item check Checks its flags and generation. - btrfs_extent_inline_ref checks Against 4 types inline ref. Checks bytenr alignment and tree level. - btrfs_extent_item::refs check Check against total refs found in inline refs. This check would be the most complex single item check due to its nature of inlined items. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: fix error pointer check in __btrfs_map_block()Dan Carpenter1-1/+1
The btrfs_get_chunk_map() never returns NULL, it returns error pointers. Fixes: 89b798ad1b42 ("btrfs: Use btrfs_get_io_geometry appropriately") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: dev stat drop useless gotoAnand Jain1-5/+2
In the function btrfs_init_dev_stats() goto out is not needed, because the alloc has failed. So just return -ENOMEM. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: dev stats item key conversion per cpu type is not neededAnand Jain1-2/+0
%found_key is not used, drop it since it hasn't been used since the beginning in 733f4fbbc108 ("Btrfs: read device stats on mount, write modified ones during commit"). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Make reada_tree_block_flagged privateNikolay Borisov3-31/+29
This function is used only for the readahead machinery. It makes no sense to keep it external to reada.c file. Place it above its sole caller and make it static. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: compression: replace set_level callbacks by a common helperDavid Sterba5-33/+20
The set_level callbacks do not do anything special and can be replaced by a helper that uses the levels defined in the tables. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: define compression levels staticallyDavid Sterba4-0/+10
The maximum and default levels do not change and can be defined directly. The set_level callback was a temporary solution and will be removed. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09Btrfs: fix use-after-free when using the tree modification logFilipe Manana1-1/+3
At ctree.c:get_old_root(), we are accessing a root's header owner field after we have freed the respective extent buffer. This results in an use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC is set, results in a stack trace like the following: [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1 [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs] (...) [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286 [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000 [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246 [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000 [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000 [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8 [ 3876.799620] FS: 00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000 [ 3876.799643] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0 [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3876.799723] Call Trace: [ 3876.799735] ? do_raw_spin_unlock+0x49/0xc0 [ 3876.799749] ? _raw_spin_unlock+0x24/0x30 [ 3876.799779] resolve_indirect_refs+0x1eb/0xc80 [btrfs] [ 3876.799810] find_parent_nodes+0x38d/0x1180 [btrfs] [ 3876.799841] btrfs_check_shared+0x11a/0x1d0 [btrfs] [ 3876.799870] ? extent_fiemap+0x598/0x6e0 [btrfs] [ 3876.799895] extent_fiemap+0x598/0x6e0 [btrfs] [ 3876.799913] do_vfs_ioctl+0x45a/0x700 [ 3876.799926] ksys_ioctl+0x70/0x80 [ 3876.799938] ? trace_hardirqs_off_thunk+0x1a/0x20 [ 3876.799953] __x64_sys_ioctl+0x16/0x20 [ 3876.799965] do_syscall_64+0x62/0x220 [ 3876.799977] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 3876.799993] RIP: 0033:0x7f23e0013dd7 (...) [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7 [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003 [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80 [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80 (...) [ 3876.805107] ---[ end trace e53161e179ef04f9 ]--- Fix that by saving the root's header owner field into a local variable before freeing the root's extent buffer, and then use that local variable when needed. Fixes: 30b0463a9394d9 ("Btrfs: fix accessing the root pointer in tree mod log functions") CC: stable@vger.kernel.org # 3.10+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: replace: BTRFS_DEV_REPLACE_ITEM_STATE_x defines should goAnand Jain1-1/+1
The BTRFS_DEV_REPLACE_ITEM_STATE_x defines, as shown in [1], are unused in both kernel and btrfs-progs (except for one instance of BTRFS_DEV_REPLACE_ITEM_STATE_NEVER_STARTED in kernel). [1] btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED 2 btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED 3 btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED 4 Further these define-values are different form its counterpart BTRFS_IOCTL_DEV_REPLACE_STATE_x series as shown in [2]. [2] btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_SUSPENDED 2 btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_FINISHED 3 btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_CANCELED 4 So this patch deletes the BTRFS_DEV_REPLACE_ITEM_STATE_x altogether, and one instance of BTRFS_DEV_REPLACE_ITEM_STATE_NEVER_STARTED is replaced with BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED in the kernel. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: introduce an evict flushing stateJosef Bacik3-47/+62
We have this weird space flushing loop inside inode.c for evict where we'll do the normal LIMIT flush, and then commit the transaction and hope we get our space. This is super janky, and in fact there's really nothing stopping us from using FLUSH_ALL except that we run delayed iputs, which means we could deadlock. So introduce a new flush state for eviction that does the normal priority flushing with all of the states that are safe for eviction. The nice side-effect of this is that we'll try harder for evictions. Previously if (for example generic/269) you had a bunch of other operations happening on the fs you could race with those reservations when committing the transaction, and eventually miss getting a reservation for the evict. With this code we'll have our ticket in place through the transaction commit, so any pinned bytes will go to our pending evictions first. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: refactor priority_reclaim_metadata_spaceJosef Bacik1-6/+9
With the eviction flushing stuff we'll want to allow for different states, but still work basically the same way that priority_reclaim_metadata_space works currently. Refactor this to take the flushing states and size as an argument so we can use the same logic for limit flushing and eviction flushing. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: factor out the ticket flush handlingJosef Bacik1-22/+42
We're going to make this logic a little more complicated for evict, so factor the ticket flushing/waiting code out of __reserve_metadata_bytes. This has no functional change. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: unify error handling for ticket flushingJosef Bacik1-21/+11
Currently we handle the cleanup of errored out tickets in both the priority flush path and the normal flushing path. This is the same code in both places, so just refactor so we don't duplicate the cleanup work. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: add a flush step for delayed iputsJosef Bacik2-3/+5
Delayed iputs could very well free up enough space without needing to commit the transaction, so make this step it's own step. This will allow us to skip the step for evictions in a later patch. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: unexport the temporary exported functionsJosef Bacik2-18/+14
These were renamed and exported to facilitate logical migration of different code chunks into block-group.c. Now that all the users are in one file go ahead and rename them back, move the code around, and make them static. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: migrate the block group cleanup codeJosef Bacik4-130/+129
This can now be easily migrated as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ refresh on top of sysfs cleanups ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: migrate the alloc_profile helpersJosef Bacik4-125/+122
These feel more at home in block-group.c. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ refresh, adjust btrfs_get_alloc_profile exports ] Signed-off-by: David Sterba <dsterba@suse.com>