path: root/fs/btrfs/delayed-inode.c
11 days   btrfs: add unlikely annotations to branches leading to transaction abort   (David Sterba, 1 file changed, -3/+3)
The unlikely() annotation is a static prediction hint that the compiler may use to reorder code out of the hot path. We use it elsewhere (namely tree-checker.c) for error branches that almost never happen. Transaction abort is one such error; btrfs_abort_transaction() inlines code to check the state and print a warning, and this ought to be out of the hot path. The most common pattern is when transaction abort is called after checking a return value and the control flow leads to a quick return. In other cases it may not be necessary to add unlikely(), e.g. when the function returns anyway or the control flow is not changed noticeably. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
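For illustration, the pattern described above looks roughly like this (a sketch, not the exact hunks from the patch; the call site is hypothetical):

  ret = btrfs_insert_empty_item(trans, root, path, &key, sizeof(*item));
  if (unlikely(ret)) {
          /* Error branch predicted not-taken, so the abort code stays off the hot path. */
          btrfs_abort_transaction(trans, ret);
          return ret;
  }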
11 days   btrfs: use kmalloc_array() for open-coded arithmetic in kmalloc()   (Miquel Sabaté Solà, 1 file changed, -2/+2)
As pointed out in the documentation, calling 'kmalloc' with open-coded arithmetic can lead to unfortunate overflows and this particular way of using it has been deprecated. Instead, it's preferred to use 'kmalloc_array' in cases where it might apply so an overflow check is performed. Note this is an API cleanup and is not fixing any overflows because in all cases the multipliers are bounded small numbers derived from number of items in leaves/nodes. Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
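A sketch of this kind of conversion (variable names are illustrative, not the exact diff):

  /* Before: open-coded multiplication inside kmalloc() can overflow. */
  ins_data = kmalloc(nitems * sizeof(u32) +
                     nitems * sizeof(struct btrfs_key), GFP_NOFS);

  /* After: kmalloc_array() checks nitems * size for overflow. */
  ins_data = kmalloc_array(nitems,
                           sizeof(u32) + sizeof(struct btrfs_key),
                           GFP_NOFS);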
12 days   btrfs: print leaked references in kill_all_delayed_nodes()   (Leo Martins, 1 file changed, -0/+1)
We are seeing soft lockups in kill_all_delayed_nodes due to a delayed_node with a lingering reference count of 1. Printing at this point will reveal the guilty stack trace. If the delayed_node has no references there should be no output. Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
12 days   btrfs: implement ref_tracker for delayed_nodes   (Leo Martins, 1 file changed, -55/+120)
Add ref_tracker infrastructure for struct btrfs_delayed_node. It is a response to the largest btrfs related crash in our fleet. We're seeing soft lockups in btrfs_kill_all_delayed_nodes() that seem to be a result of delayed_nodes not being released properly.

A ref_tracker object is allocated on reference count increases and freed on reference count decreases. The ref_tracker object stores a stack trace of where it is allocated. The ref_tracker_dir object is embedded in btrfs_delayed_node and keeps track of all current and some old/freed ref_tracker objects. When a leak is detected we can print the stack traces for all ref_trackers that have not yet been freed.

Here is a common example of taking a reference to a delayed_node and freeing it with ref_tracker:

  struct btrfs_ref_tracker tracker;
  struct btrfs_delayed_node *node;

  node = btrfs_get_delayed_node(inode, &tracker);
  // use delayed_node...
  btrfs_release_delayed_node(node, &tracker);

There are two special cases where the delayed_node reference is "long lived", meaning that the thread that takes the reference and the thread that releases the reference are different. The 'inode_cache_tracker' tracks the delayed_node stored in btrfs_inode. The 'node_list_tracker' tracks the delayed_node stored in the btrfs_delayed_root node_list/prepare_list. These trackers are embedded in the btrfs_delayed_node.

btrfs_ref_tracker and btrfs_ref_tracker_dir are wrappers that either compile to the corresponding ref_tracker structs or empty structs depending on CONFIG_BTRFS_DEBUG. There are also btrfs wrappers for the ref_tracker API.

Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
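The compile-out behaviour mentioned in the last paragraph could look roughly like this (a simplified sketch; the real wrapper layout in the patch may differ):

  #ifdef CONFIG_BTRFS_DEBUG
  struct btrfs_ref_tracker {
          struct ref_tracker *tracker;    /* records the stack trace of the get */
  };
  #else
  struct btrfs_ref_tracker { };           /* empty, the wrapper calls become no-ops */
  #endif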
2025-09-15   btrfs: initialize inode::file_extent_tree after i_mode has been set   (austinchang, 1 file changed, -3/+0)
btrfs_init_file_extent_tree() uses S_ISREG() to determine if the file is a regular file. At the beginning of btrfs_read_locked_inode(), i_mode hasn't been read from the inode item yet, so the file_extent_tree won't be used at all on filesystems without NO_HOLES. Fix this by calling btrfs_init_file_extent_tree() after i_mode is initialized in btrfs_read_locked_inode(). Fixes: 3d7db6e8bd22e6 ("btrfs: don't allocate file extent tree for non regular files") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: austinchang <austinchang@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22   btrfs: make btrfs_readdir_delayed_dir_index() return a bool instead   (Filipe Manana, 1 file changed, -7/+7)
There's no need to return errors, all we do is return 1 or 0 depending on whether we should or should not stop iterating over delayed dir indexes. So change the function to return bool instead of an int. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22   btrfs: make btrfs_should_delete_dir_index() return a bool instead   (Filipe Manana, 1 file changed, -4/+3)
There's no need to return errors and we currently return 1 in case the index should be deleted and 0 otherwise, so change the return type from int to bool as this is a boolean function and it's more clear this way. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22   btrfs: add details to error messages at btrfs_delete_delayed_dir_index()   (Filipe Manana, 1 file changed, -4/+4)
Update the error messages with:

 1) Fix a typo in the first one, deltiona -> deletion;

 2) Remove the redundant part of the first message, the part following the comma, while keeping all the useful information: root, inode, index and error value;

 3) Update the second message to use more formal language (for example 'error' instead of 'err'), remove the redundant part about "deletion tree of delayed node..." and print the relevant information in the same format and order as the first message, without the ugly opening parenthesis that lacked a space separating it from the previous word.

This also makes the message similar in format to the one we have at btrfs_insert_delayed_dir_index().

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22   btrfs: make btrfs_delete_delayed_insertion_item() return a boolean   (Filipe Manana, 1 file changed, -6/+7)
There's no need to return an integer as all we need to do is return true or false to tell whether we deleted a delayed item or not. Also the logic is inverted since we return 1 (true) if we didn't delete and 0 (false) if we did, which is somewhat counter intuitive. Change the return type to a boolean and make it return true if we deleted and false otherwise. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22   btrfs: use rb_find() in __btrfs_lookup_delayed_item()   (Yangtao Li, 1 file changed, -21/+18)
Use the rb-tree helper so we don't open code the search code. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Pan Chuang <panchuang@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
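For reference, a sketch of how rb_find() replaces an open-coded walk (the comparator name is illustrative, not necessarily the one in the patch):

  static int delayed_item_index_cmp(const void *key, const struct rb_node *node)
  {
          const u64 *index = key;
          const struct btrfs_delayed_item *item =
                  rb_entry(node, struct btrfs_delayed_item, rb_node);

          if (*index < item->index)
                  return -1;
          if (*index > item->index)
                  return 1;
          return 0;
  }

  /* Instead of manually following ->rb_left / ->rb_right: */
  node = rb_find(&index, &delayed_node->ins_root.rb_root, delayed_item_index_cmp);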
2025-07-22   btrfs: unfold transaction abort at __btrfs_update_delayed_inode()   (Filipe Manana, 1 file changed, -11/+14)
We have a common error path where we abort the transaction, but that way, if we get a transaction abort stack trace, we don't know exactly which previous function call failed. Instead abort the transaction after any function call that returns an error, so that we can easily identify which function failed. Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
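The shape of the change, sketched (the function and labels are illustrative; the real hunks in __btrfs_update_delayed_inode() may differ):

  /* Before: a shared error label aborted the transaction far from the failure. */
  ret = btrfs_lookup_inode(trans, root, path, &key, mod);
  if (ret < 0)
          goto err_out;           /* abort happens at err_out, stack trace points there */

  /* After: abort right after the call that failed, so the stack trace points here. */
  ret = btrfs_lookup_inode(trans, root, path, &key, mod);
  if (ret < 0) {
          btrfs_abort_transaction(trans, ret);
          goto out;
  }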
2025-06-19   btrfs: fix delayed ref refcount leak in debug assertion   (Leo Martins, 1 file changed, -1/+4)
If the delayed_root is not empty we are increasing the number of references to a delayed_node without decreasing it, causing a leak. Fix by decrementing the delayed_node reference count. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ Remove the changelog from the commit message. ] Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15   btrfs: simplify extracting delayed node at btrfs_first_prepared_delayed_node()   (Filipe Manana, 1 file changed, -10/+7)
Instead of grabbing the next pointer from the list and then doing a list_entry() call, we can simply use list_first_entry(), removing the need for a list_head variable. Also there's no need to check if the list is empty before attempting to extract the first element: we can use list_first_entry_or_null(), removing the need for a special if statement and the 'out' label. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15   btrfs: simplify extracting delayed node at btrfs_first_delayed_node()   (Filipe Manana, 1 file changed, -9/+5)
Instead of grabbing the next pointer from the list and then doing a list_entry() call, we can simply use list_first_entry(), removing the need for a list_head variable. Also there's no need to check if the list is empty before attempting to extract the first element: we can use list_first_entry_or_null(), removing the need for a special if statement and the 'out' label. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
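The simplified helper ends up roughly like this (an approximation of the result, not the verbatim code):

  static struct btrfs_delayed_node *btrfs_first_delayed_node(
                                  struct btrfs_delayed_root *delayed_root)
  {
          struct btrfs_delayed_node *node;

          spin_lock(&delayed_root->lock);
          /* NULL when the list is empty, no explicit list_empty() check needed. */
          node = list_first_entry_or_null(&delayed_root->node_list,
                                          struct btrfs_delayed_node, n_list);
          if (node)
                  refcount_inc(&node->refs);
          spin_unlock(&delayed_root->lock);

          return node;
  }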
2025-05-15   btrfs: trivial conversion to return bool instead of int   (David Sterba, 1 file changed, -4/+4)
Old code uses int for a lot of bool return values; bool is recommended and used in new code. Convert the trivial cases that do simple 0/false and 1/true. Function comments are updated if needed. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15   btrfs: update and correct description of btrfs_get_or_create_delayed_node()   (Charles Han, 1 file changed, -1/+6)
The comment mistakenly says the function is returning PTR_ERR instead of ERR_PTR. Fix it and update it so it's more descriptive. Signed-off-by: Charles Han <hanchunchao@inspur.com> Reviewed-by: David Sterba <dsterba@suse.com> [ Enhance the function comment. ] Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15   btrfs: use rb_entry_safe() where possible to simplify code   (David Sterba, 1 file changed, -21/+6)
Simplify conditionally reading an rb_entry(), there's the rb_entry_safe() helper that checks the node pointer for NULL so we don't have to write it explicitly. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
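The kind of simplification this enables, sketched:

  /* Before: explicit NULL check on the rb_node. */
  next = rb_next(&item->rb_node);
  if (!next)
          return NULL;
  return rb_entry(next, struct btrfs_delayed_item, rb_node);

  /* After: rb_entry_safe() folds the NULL check into the entry lookup. */
  return rb_entry_safe(rb_next(&item->rb_node),
                       struct btrfs_delayed_item, rb_node);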
2025-03-18   btrfs: do trivial BTRFS_PATH_AUTO_FREE conversions   (David Sterba, 1 file changed, -2/+1)
The most trivial pattern for the auto freeing: the variable is declared with the macro and the final btrfs_free_path() is removed. There are almost no goto -> return conversions and there's no other function cleanup. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
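The pattern, roughly (a sketch assuming BTRFS_PATH_AUTO_FREE() wraps a scope-based cleanup attribute; see ctree.h for the real definition):

  /* Before: manual allocation and a mandatory btrfs_free_path() on every exit. */
  struct btrfs_path *path;

  path = btrfs_alloc_path();
  /* ... searches using the path ... */
  btrfs_free_path(path);
  return ret;

  /* After: the path is freed automatically when it goes out of scope. */
  BTRFS_PATH_AUTO_FREE(path);

  path = btrfs_alloc_path();
  /* ... searches using the path ... */
  return ret;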
2025-03-18   btrfs: pass struct btrfs_inode to btrfs_fill_inode()   (David Sterba, 1 file changed, -26/+26)
Pass a struct btrfs_inode to btrfs_fill_inode() as it's an internal interface, allowing us to remove some uses of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18   btrfs: pass struct btrfs_inode to fill_stack_inode_item()   (David Sterba, 1 file changed, -24/+22)
Pass a struct btrfs_inode to fill_stack_inode_item() as it's an internal interface, allowing us to remove some uses of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13   btrfs: drop unused parameter fs_info to btrfs_delete_delayed_insertion_item()   (David Sterba, 1 file changed, -3/+2)
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13   btrfs: delayed-inode: remove unnecessary call to btrfs_mark_buffer_dirty()   (Filipe Manana, 1 file changed, -1/+0)
The call to btrfs_mark_buffer_dirty() at __btrfs_update_delayed_inode() is not necessary as we have a path setup for writing with btrfs_search_slot() having a 'cow' argument set to 1. This just makes the code more verbose and confusing, adds a little extra overhead as well as increases the module's text size, so remove it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13   btrfs: update __btrfs_add_delayed_item() to use rb helper   (Roger L. Beckermeyer III, 1 file changed, -24/+19)
Update __btrfs_add_delayed_item() to use rb_find_add_cached(). Signed-off-by: Roger L. Beckermeyer III <beckerlee3@gmail.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
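A sketch of the rb helper usage (comparator and variable names are illustrative):

  static int delayed_item_cmp(const struct rb_node *new, const struct rb_node *exist)
  {
          const struct btrfs_delayed_item *new_item =
                  rb_entry(new, struct btrfs_delayed_item, rb_node);
          const struct btrfs_delayed_item *exist_item =
                  rb_entry(exist, struct btrfs_delayed_item, rb_node);

          if (new_item->index < exist_item->index)
                  return -1;
          if (new_item->index > exist_item->index)
                  return 1;
          return 0;
  }

  /* Inserts the new item, or returns the node that already has this index. */
  exist = rb_find_add_cached(&ins->rb_node, root, delayed_item_cmp);
  if (exist)
          return -EEXIST;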
2024-07-11   btrfs: pass a btrfs_inode to btrfs_readdir_get_delayed_items()   (David Sterba, 1 file changed, -4/+4)
Pass a struct btrfs_inode to btrfs_readdir_get_delayed_items() as it's an internal interface, allowing us to remove some uses of BTRFS_I. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11   btrfs: pass a btrfs_inode to btrfs_readdir_put_delayed_items()   (David Sterba, 1 file changed, -2/+2)
Pass a struct btrfs_inode to btrfs_readdir_put_delayed_items() as it's an internal interface, allowing us to remove some uses of BTRFS_I. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11   btrfs: constify pointer parameters where applicable   (David Sterba, 1 file changed, -3/+3)
We can add const to many parameters; this is for clarity and a minor addition to safety. There are some minor effects in the assembly code and the .ko size, measured on a release config. This patch does not cover all possible conversions. Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11   btrfs: unify index_cnt and csum_bytes from struct btrfs_inode   (Filipe Manana, 1 file changed, -1/+2)
The index_cnt field of struct btrfs_inode is used only for two purposes:

 1) To store the index for the next entry added to a directory;

 2) For the data relocation inode to track the logical start address of the block group currently being relocated.

For the relocation case we use index_cnt because it's not used for anything else in the relocation use case - we could have used other fields that are not used by relocation such as defrag_bytes, last_unlink_trans or last_reflink_trans for example (among others).

Since the csum_bytes field is not used for directories, do the following changes:

 1) Put index_cnt and csum_bytes in a union, and index_cnt is only initialized when the inode is a directory. The csum_bytes is only accessed in IO paths for regular files, so we're fine here;

 2) Use the defrag_bytes field for relocation, since the data relocation inode is never used for defrag purposes. And to make the naming better, alias it to reloc_block_group_start by using a union.

This reduces the size of struct btrfs_inode by 8 bytes in a release kernel, from 1056 bytes down to 1048 bytes.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
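The resulting layout in struct btrfs_inode, roughly (surrounding fields omitted, comments paraphrase the commit message):

  union {
          /* Directories only: index for the next added directory entry. */
          u64 index_cnt;
          /* Regular files only: bytes with outstanding checksum reservations. */
          u64 csum_bytes;
  };

  union {
          /* Used by defrag on regular files. */
          u64 defrag_bytes;
          /* Data relocation inode only: start of the block group being relocated. */
          u64 reloc_block_group_start;
  };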
2024-07-11   btrfs: remove inode_lock from struct btrfs_root and use xarray locks   (Filipe Manana, 1 file changed, -14/+12)
Currently we use the spinlock inode_lock from struct btrfs_root to serialize access to two different data structures:

 1) The delayed inodes xarray (struct btrfs_root::delayed_nodes);

 2) The inodes xarray (struct btrfs_root::inodes).

Instead of using our own lock, we can use the spinlock that is part of the xarray implementation, by using the xa_lock() and xa_unlock() APIs and using the xarray APIs with the double underscore prefix that don't take the xarray locks and assume the caller is using xa_lock() and xa_unlock().

So remove the spinlock inode_lock from struct btrfs_root and use the corresponding xarray locks. This brings 2 benefits:

 1) We reduce the size of struct btrfs_root, from 1336 bytes down to 1328 bytes on a 64 bits release kernel config;

 2) We reduce lock contention by not using anymore the same lock for changing two different and unrelated xarrays.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
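Sketch of the locking change on the delayed_nodes xarray (illustrative, not the exact diff):

  /* Before: the root's own spinlock guarded the xarray. */
  spin_lock(&root->inode_lock);
  xa_erase(&root->delayed_nodes, inode_id);
  spin_unlock(&root->inode_lock);

  /* After: use the lock embedded in the xarray plus the __xa_* variants. */
  xa_lock(&root->delayed_nodes);
  __xa_erase(&root->delayed_nodes, inode_id);
  xa_unlock(&root->delayed_nodes);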
2024-05-07   btrfs: change root->root_key.objectid to btrfs_root_id()   (Josef Bacik, 1 file changed, -1/+1)
A comment from Filipe on one of my previous cleanups brought my attention to a new helper we have for getting the root id of a root, which makes it easier to read in the code. The changes were made with the following Coccinelle semantic patch:

  // <smpl>
  @@
  expression E,E1;
  @@
  (
   E->root_key.objectid = E1
  |
  - E->root_key.objectid
  + btrfs_root_id(E)
  )
  // </smpl>

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor style fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-02   btrfs: record delayed inode root in transaction   (Boris Burkov, 1 file changed, -0/+3)
When running delayed inode updates, we do not record the inode's root in the transaction, but we do allocate PREALLOC and thus converted PERTRANS space for it. To be sure we free that PERTRANS meta rsv, we must ensure that we record the root in the transaction. Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-05   btrfs: remove SLAB_MEM_SPREAD flag use   (Chengming Zhou, 1 file changed, -1/+1)
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1, so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete to avoid confusion for users. Here we can just remove all its users, which has no functional change. [1] https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04   btrfs: use KMEM_CACHE() to create btrfs_delayed_node cache   (Kunwu Chan, 1 file changed, -5/+1)
Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify the creation of SLAB caches when the default values are used. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
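The kind of conversion this describes, roughly (flags abridged; the real call sites live in delayed-inode.c):

  /* Before: all arguments spelled out even though they are the defaults. */
  delayed_node_cache = kmem_cache_create("btrfs_delayed_node",
                                         sizeof(struct btrfs_delayed_node),
                                         0, 0, NULL);

  /* After: KMEM_CACHE() derives the name, size and alignment from the type. */
  delayed_node_cache = KMEM_CACHE(btrfs_delayed_node, 0);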
2024-03-04   btrfs: uninline btrfs_init_delayed_root()   (David Sterba, 1 file changed, -0/+11)
This is a simple initializer and not on any hot path, so it does not need to be static inline. Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04   btrfs: change BUG_ON to assertion when checking for delayed_node root   (David Sterba, 1 file changed, -1/+1)
The pointer to root is initialized in btrfs_init_delayed_node(), no need to check for it again. Change the BUG_ON to assertion. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04   btrfs: delayed-inode: drop pointless BUG_ON in __btrfs_remove_delayed_item()   (David Sterba, 1 file changed, -2/+0)
There's a BUG_ON checking for a valid pointer of fs_info::delayed_root but it is valid since init_mount_fs_info() and has the same lifetime as fs_info. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-16   btrfs: switch btrfs_root::delayed_nodes_tree to xarray from radix-tree   (David Sterba, 1 file changed, -29/+35)
The radix-tree has been superseded by the xarray (https://lwn.net/Articles/745073), this patch converts the btrfs_root::delayed_nodes, the APIs are used in a simple way.

First idea is to do xa_insert() but this would require GFP_ATOMIC allocation which we want to avoid if possible. The preload mechanism of radix-tree can be emulated within the xarray API:

 - xa_reserve() with GFP_NOFS outside of the lock, the reserved entry is inserted atomically at most once

 - xa_store() under a lock, in case something races in we can detect that and xa_load() returns a valid pointer

All uses of xa_load() must check for a valid pointer in case they manage to get between the xa_reserve() and xa_store(), this is handled in btrfs_get_delayed_node().

Otherwise the functionality is equivalent, xarray implements the radix-tree and there should be no performance difference.

The patch continues the efforts started in 253bf57555e451 ("btrfs: turn delayed_nodes_tree into an XArray") and fixes the problems with locking and GFP flags 088aea3b97e0ae ("Revert "btrfs: turn delayed_nodes_tree into an XArray"").

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
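The reserve-then-store pattern described above, as a rough sketch (error handling abridged; see btrfs_get_or_create_delayed_node() for the real flow):

  /* Outside the lock: reserve the slot so the later store does not need to allocate. */
  ret = xa_reserve(&root->delayed_nodes, ino, GFP_NOFS);
  if (ret)
          return ERR_PTR(ret);

  spin_lock(&root->inode_lock);
  existing = xa_load(&root->delayed_nodes, ino);
  if (existing) {
          /* Someone raced in between the reserve and the store; reuse their node. */
          refcount_inc(&existing->refs);
          node = existing;
  } else {
          xa_store(&root->delayed_nodes, ino, node, GFP_NOFS);
  }
  spin_unlock(&root->inode_lock);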
2023-12-15   btrfs: do not utilize goto to implement delayed inode ref deletion   (Qu Wenruo, 1 file changed, -21/+24)
[PROBLEM]
The function __btrfs_update_delayed_inode() is doing something not meeting the code standard of today:

  path->slots[0]++
  if (path->slots[0] >= btrfs_header_nritems(leaf))
          goto search;
  again:
  if (!is_the_target_inode_ref())
          goto out;
  ret = btrfs_delete_item();
  /* Some cleanup. */
  return ret;

  search:
  ret = search_for_the_last_inode_ref();
  goto again;

With the tag named "again", it's pretty common to think it's a loop, but the truth is, we only need to do the search once, to locate the last (also the first, since there should only be one INODE_REF or INODE_EXTREF now) ref of the inode.

[FIX]
Instead of the weird jumps, just do them in a stream-lined fashion. This removes those weird labels and adds extra comments on why we can do the different searches.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-30   Merge tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux   (Linus Torvalds, 1 file changed, -16/+11)
Pull btrfs updates from David Sterba:

 "New features:

   - raid-stripe-tree
     New tree for logical file extent mapping where the physical mapping may not match on multiple devices. This is now used in zoned mode to implement RAID0/RAID1* profiles, but can be used in non-zoned mode as well. The support for RAID56 is in development and will eventually fix the problems with the current implementation. This is a backward incompatible feature and has to be enabled at mkfs time.

   - simple quota accounting (squota)
     A simplified mode of qgroup that accounts all space on the initial extent owners (a subvolume), the snapshots are then cheap to create and delete. The deletion of snapshots in fully accounting qgroups is a known CPU/IO performance bottleneck.
     The squota is not suitable for the general use case but works well for containers where the original subvolume exists for the whole time. This is a backward incompatible feature as it needs extending some structures, but can be enabled on an existing filesystem.

   - temporary filesystem fsid (temp_fsid)
     The fsid identifies a filesystem and is hard coded in the structures, which disallows mounting the same fsid found on different devices. For a single device filesystem this is not strictly necessary, a new temporary fsid can be generated on mount e.g. after a device is cloned. This will be used by Steam Deck for root partition A/B testing, or can be used for VM root images.

  Other user visible changes:

   - filesystems with partially finished metadata_uuid conversion cannot be mounted anymore and the uuid fixup has to be done by btrfs-progs (btrfstune).

  Performance improvements:

   - reduce reservations for checksum deletions (with enabled free space tree by factor of 4), on a sample workload on file with many extents the deletion time decreased by 12%

   - make extent state merges more efficient during insertions, reduce rb-tree iterations (run time of critical functions reduced by 5%)

  Core changes:

   - the integrity check functionality has been removed, this was a debugging feature and removal does not affect other integrity checks like checksums or tree-checker

   - space reservation changes:
      - more efficient delayed ref reservations, this avoids building up too much work or overusing or exhausting the global block reserve in some situations
      - move delayed refs reservation to the transaction start time, this prevents some ENOSPC corner cases related to exhaustion of global reserve
      - improvements in reducing excessive reservations for block group items
      - adjust overcommit logic in near full situations, account for one more chunk to eventually allocate metadata chunk, this is mostly relevant for small filesystems (<10GiB)

   - single device filesystems are scanned but not registered (except seed devices), this allows temp_fsid to work

   - qgroup iterations do not need GFP_ATOMIC allocations anymore

   - cleanups, refactoring, reduced data structure size, function parameter simplifications, error handling fixes"

* tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (156 commits)
  btrfs: open code timespec64 in struct btrfs_inode
  btrfs: remove redundant log root tree index assignment during log sync
  btrfs: remove redundant initialization of variable dirty in btrfs_update_time()
  btrfs: sysfs: show temp_fsid feature
  btrfs: disable the device add feature for temp-fsid
  btrfs: disable the seed feature for temp-fsid
  btrfs: update comment for temp-fsid, fsid, and metadata_uuid
  btrfs: remove pointless empty log context list check when syncing log
  btrfs: update comment for struct btrfs_inode::lock
  btrfs: remove pointless barrier from btrfs_sync_file()
  btrfs: add and use helpers for reading and writing last_trans_committed
  btrfs: add and use helpers for reading and writing fs_info->generation
  btrfs: add and use helpers for reading and writing log_transid
  btrfs: add and use helpers for reading and writing last_log_commit
  btrfs: support cloned-device mount capability
  btrfs: add helper function find_fsid_by_disk
  btrfs: stop reserving excessive space for block group item insertions
  btrfs: stop reserving excessive space for block group item updates
  btrfs: reorder btrfs_inode to fill gaps
  btrfs: open code btrfs_ordered_inode_tree in btrfs_inode
  ...
2023-10-30   Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs   (Linus Torvalds, 1 file changed, -10/+10)
Pull vfs inode time accessor updates from Christian Brauner:

 "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessor functions makes this robust.

  It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it"

* tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
  fs: rename inode i_atime and i_mtime fields
  security: convert to new timestamp accessors
  selinux: convert to new timestamp accessors
  apparmor: convert to new timestamp accessors
  sunrpc: convert to new timestamp accessors
  mm: convert to new timestamp accessors
  bpf: convert to new timestamp accessors
  ipc: convert to new timestamp accessors
  linux: convert to new timestamp accessors
  zonefs: convert to new timestamp accessors
  xfs: convert to new timestamp accessors
  vboxsf: convert to new timestamp accessors
  ufs: convert to new timestamp accessors
  udf: convert to new timestamp accessors
  ubifs: convert to new timestamp accessors
  tracefs: convert to new timestamp accessors
  sysv: convert to new timestamp accessors
  squashfs: convert to new timestamp accessors
  server: convert to new timestamp accessors
  client: convert to new timestamp accessors
  ...
2023-10-18   btrfs: convert to new timestamp accessors   (Jeff Layton, 1 file changed, -10/+10)
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-21-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
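The flavour of the conversion (a generic example of the accessor style, not a hunk from this patch):

  /* Before: direct access to the ctime field. */
  inode->i_ctime = current_time(inode);

  /* After: go through the accessor helpers. */
  inode_set_ctime_current(inode);
  ts = inode_get_ctime(inode);    /* reads now also go through an accessor */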
2023-10-12   btrfs: open code timespec64 in struct btrfs_inode   (David Sterba, 1 file changed, -8/+4)
The type of timespec64::tv_nsec is 'unsigned long', while we have only u32 for on-disk and in-memory. This wastes a few bytes in btrfs_inode. Add separate members for sec and nsec with the corresponding type width. This creates a 4 byte hole in btrfs_inode which can be utilized in the future. Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12   btrfs: remove redundant root argument from btrfs_delayed_update_inode()   (Filipe Manana, 1 file changed, -1/+1)
The root argument for btrfs_delayed_update_inode() always matches the root of the given inode, so remove the root argument and get it from the inode argument. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12   btrfs: merge ordered work callbacks in btrfs_work into one   (David Sterba, 1 file changed, -2/+1)
There are two callbacks defined in btrfs_work but only two actually make use of them, otherwise there are NULLs. We can get rid of the freeing callback making it a special case of the normal work. This reduces the size of btrfs_work by 8 bytes, final layout:

  struct btrfs_work {
          btrfs_func_t               func;                 /*     0     8 */
          btrfs_ordered_func_t       ordered_func;         /*     8     8 */
          struct work_struct         normal_work;          /*    16    32 */
          struct list_head           ordered_list;         /*    48    16 */
          /* --- cacheline 1 boundary (64 bytes) --- */
          struct btrfs_workqueue *   wq;                   /*    64     8 */
          long unsigned int          flags;                /*    72     8 */

          /* size: 80, cachelines: 2, members: 6 */
          /* last cacheline: 16 bytes */
  };

This in turn reduces size of other structures (on a release config):

  - async_chunk                 160 -> 152
  - async_submit_bio            152 -> 144
  - btrfs_async_delayed_work    104 ->  96
  - btrfs_caching_control       176 -> 168
  - btrfs_delalloc_work         144 -> 136
  - btrfs_fs_info              3608 -> 3600
  - btrfs_ordered_extent        440 -> 424
  - btrfs_writepage_fixup       104 ->  96

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12   btrfs: abort transaction on generation mismatch when marking eb as dirty   (Filipe Manana, 1 file changed, -1/+1)
When marking an extent buffer as dirty, at btrfs_mark_buffer_dirty(), we check if its generation matches the running transaction and if not we just print a warning. Such mismatch is an indicator that something really went wrong and only printing a warning message (and stack trace) is not enough to prevent a corruption. Allowing a transaction to commit with such an extent buffer will trigger an error if we ever try to read it from disk due to a generation mismatch with its parent generation. So abort the current transaction with -EUCLEAN if we notice a generation mismatch. For this we need to pass a transaction handle to btrfs_mark_buffer_dirty() which is always available except in test code, in which case we can pass NULL since it operates on dummy extent buffers and all test roots have a single node/leaf (root node at level 0). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12   btrfs: reformat remaining kdoc style comments   (David Sterba, 1 file changed, -3/+3)
Function name in the comment does not bring much value to code not exposed as API and we don't stick to the kdoc format anymore. Update formatting of parameter descriptions. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12   btrfs: update comment for reservation of metadata space for delayed items   (Filipe Manana, 1 file changed, -1/+1)
The second comment at btrfs_delayed_item_reserve_metadata() refers to a field named "index_items_size" of a delayed inode, however that field does not exist - it existed in a previous patch version, but it was then split into the fields "curr_index_batch_size" and "index_item_leaves" in the final patch version that was picked. So update the comment. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-11   Merge tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux   (Linus Torvalds, 1 file changed, -1/+1)
Pull btrfs fixes from David Sterba:

 "A revert of a recent mount option parsing fix, since that fix breaks mounts with security options. The second patch is a flexible array annotation"

* tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()
  Revert "btrfs: reject unknown mount options early"
2023-10-11   btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()   (Gustavo A. R. Silva, 1 file changed, -1/+1)
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version, to calculate the size for the allocation of the whole flexible structure, including of course, the flexible-array member. This code was found with the help of Coccinelle, and audited and fixed manually. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
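The annotation and allocation, sketched (field layout abridged; the exact type of data_len is as declared in delayed-inode.h):

  struct btrfs_delayed_item {
          /* ... other members omitted ... */
          u16 data_len;                           /* length of the trailing data */
          char data[] __counted_by(data_len);
  };

  /* struct_size() computes sizeof(*item) + data_len with overflow checking. */
  item = kmalloc(struct_size(item, data, data_len), GFP_NOFS);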
2023-09-12   Merge tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux   (Linus Torvalds, 1 file changed, -33/+71)
Pull btrfs fixes from David Sterba:

 - several fixes for handling directory item (inserting, removing, iteration, error handling)

 - fix transaction commit stalls when auto relocation is running and blocks other tasks that want to commit

 - fix a build error when DEBUG is enabled

 - fix lockdep warning in inode number lookup ioctl

 - fix race when finishing block group creation

 - remove link to obsolete wiki in several files

* tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org
  btrfs: assert delayed node locked when removing delayed item
  btrfs: remove BUG() after failure to insert delayed dir index item
  btrfs: improve error message after failure to add delayed dir index item
  btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folio
  btrfs: check for BTRFS_FS_ERROR in pending ordered assert
  btrfs: fix lockdep splat and potential deadlock after failure running delayed items
  btrfs: do not block starts waiting on previous transaction commit
  btrfs: release path before inode lookup during the ino lookup ioctl
  btrfs: fix race between finishing block group creation and its item update
2023-09-08   btrfs: assert delayed node locked when removing delayed item   (Filipe Manana, 1 file changed, -4/+8)
When removing a delayed item, or releasing which will remove it as well, we will modify one of the delayed node's rbtrees and item counter if the delayed item is in one of the rbtrees. This requires having the delayed node's mutex locked, otherwise we will race with other tasks modifying the rbtrees and the counter. This is motivated by a previous version of another patch actually calling btrfs_release_delayed_item() after unlocking the delayed node's mutex and against a delayed item that is in a rbtree. So assert at __btrfs_remove_delayed_item() that the delayed node's mutex is locked. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
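One way such an assertion can be expressed (a sketch; the patch itself may use btrfs' own ASSERT() rather than lockdep):

  static void __btrfs_remove_delayed_item(struct btrfs_delayed_item *delayed_item)
  {
          struct btrfs_delayed_node *delayed_node = delayed_item->delayed_node;

          /* The rbtrees and the item counter are protected by the node's mutex. */
          lockdep_assert_held(&delayed_node->mutex);

          /* ... removal from the rbtree and counter update follow ... */
  }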