summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)AuthorFilesLines
2025-07-22btrfs: open code rcu_string_free() and remove itDavid Sterba2-8/+3
The helper is trivial and we can simply use kfree_rcu() if needed. In our case it's just one place where we rename a device in device_list_add() and the old name can still be used until the end of the RCU grace period. The other case is freeing a device and there nothing should reach the device, so we can use plain kfree(). Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: zoned: reserve data_reloc block group on mountJohannes Thumshirn3-0/+65
Create a block group dedicated for data relocation on mount of a zoned filesystem. If there is already more than one empty DATA block group on mount, this one is picked for the data relocation block group, instead of a newly created one. This is done to ensure, there is always space for performing garbage collection and the filesystem is not hitting ENOSPC under heavy overwrite workloads. CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: use btrfs_root_id() where not done yetDavid Sterba3-4/+4
A few more remaining cases where we can use the helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: use btrfs_is_data_reloc_root() where not done yetDavid Sterba2-2/+2
Two remaining cases where we can use the helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: use on-stack variable for block reserve in btrfs_replace_file_extents()David Sterba1-17/+12
We can avoid potential memory allocation failure in btrfs_replace_file_extents() as the block reserve lifetime is limited to the scope of the function. This requires +48 bytes on stack. Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: use on-stack variable for block reserve in btrfs_truncate()David Sterba1-12/+10
We can avoid potential memory allocation failure in btrfs_truncate() as the block reserve lifetime is limited to the scope of the function. This requires +48 bytes on stack. Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: use on-stack variable for block reserve in btrfs_evict_inode()David Sterba1-13/+12
We can avoid potential memory allocation failure in btrfs_evict_inode() as the block reserve lifetime is limited to the scope of the function. This requires +48 bytes on stack. Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: update comment for xarray fields in struct btrfs_rootSun YangKai1-8/+2
The inode_lock field of struct btrfs_root was removed in commit e2844cce75c9e61("btrfs: remove inode_lock from struct btrfs_root and use xarray locks") but the related comment haven't been updated. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: enable large data folio support under CONFIG_BTRFS_EXPERIMENTALQu Wenruo5-20/+21
With all the preparation patches already merged, it's pretty easy to enable large data folios: - Remove the ASSERT() on folio size in btrfs_end_repair_bio() - Add a helper to properly set the max folio order Currently due to several call sites that are fetching the bitmap content directly into an unsigned long, we can only support BITS_PER_LONG blocks for each bitmap. - Call the helper when reading/creating an inode The support has the following limitations: - No large folios for data reloc inode The relocation code still requires page sized folio. But it's not that hot nor common compared to regular buffered ios. Will be improved in the future. - Requires CONFIG_BTRFS_EXPERIMENTAL - Will require all folio related operations to check if it needs the extra btrfs_subpage structure Now any folio larger than block size will need btrfs_subpage structure handling. Unfortunately I do not have a physical machine for performance test, but if everything goes like XFS/EXT4, it should mostly bring single digits percentage performance improvement in the real world. Although I believe there are still quite some optimizations to be done, let's focus on testing the current large data folio support first. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: use refcount_t type for the extent buffer reference counterFilipe Manana10-42/+41
Instead of using a bare atomic, use the refcount_t type, which despite being a structure that contains only an atomic, has an API that checks for underflows and other hazards. This doesn't change the size of the extent_buffer structure. This removes the need to do things like this: WARN_ON(atomic_read(&eb->refs) == 0); if (atomic_dec_and_test(&eb->refs)) { (...) } And do just: if (refcount_dec_and_test(&eb->refs)) { (...) } Since refcount_dec_and_test() already triggers a warning when we decrement a ref count that has a value of 0 (or below zero). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: add comment for optimization in free_extent_buffer()Filipe Manana1-0/+1
There's this special atomic compare and exchange logic which serves to avoid locking the extent buffers refs_lock spinlock and therefore reduce lock contention, so add a comment to make it more obvious. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: reorganize logic at free_extent_buffer() for better readabilityFilipe Manana1-3/+6
It's hard to read the logic to break out of the while loop since it's a very long expression consisting of a logical or of two composite expressions, each one composed by a logical and. Further each one is also testing for the EXTENT_BUFFER_UNMAPPED bit, making it more verbose than necessary. So change from this: if ((!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags) && refs <= 3) || (test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags) && refs == 1)) break; To this: if (test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags)) { if (refs == 1) break; } else if (refs <= 3) { break; } At least on x86_64 using gcc 9.3.0, this doesn't change the object size. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: make btrfs_readdir_delayed_dir_index() return a bool insteadFilipe Manana3-11/+10
There's no need to return errors, all we do is return 1 or 0 depending on whether we should or should not stop iterating over delayed dir indexes. So change the function to return bool instead of an int. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: make btrfs_should_delete_dir_index() return a bool insteadFilipe Manana2-6/+4
There's no need to return errors and we currently return 1 in case the index should be deleted and 0 otherwise, so change the return type from int to bool as this is a boolean function and it's more clear this way. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: add details to error messages at btrfs_delete_delayed_dir_index()Filipe Manana1-4/+4
Update the error messages with: 1) Fix typo in the first one, deltiona -> deletion; 2) Remove redundant part of the first message, the part following the comma, and including all the useful information: root, inode, index and error value; 3) Update the second message to use more formal language (example 'error' instead of 'err'), , remove redundant part about "deletion tree of delayed node..." and print the relevant information in the same format and order as the first message, without the ugly opening parenthesis without a space separating from the previous word. This also makes the message similar in format to the one we have at btrfs_insert_delayed_dir_index(). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: make btrfs_delete_delayed_insertion_item() return a booleanFilipe Manana1-6/+7
There's no need to return an integer as all we need to do is return true or false to tell whether we deleted a delayed item or not. Also the logic is inverted since we return 1 (true) if we didn't delete and 0 (false) if we did, which is somewhat counter intuitive. Change the return type to a boolean and make it return true if we deleted and false otherwise. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: switch del_all argument of replay_dir_deletes() from int to boolFilipe Manana1-6/+5
The argument has boolean semantics, so change its type from int to bool, making it more clear. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: pass NULL index to btrfs_del_inode_ref() where not neededFilipe Manana2-5/+2
There are two callers of btrfs_del_inode_ref() that declare a local index variable and then pass a pointer for it to btrfs_del_inode_ref(), but then don't use that index at all. Since btrfs_del_inode_ref() accepts a NULL index pointer, pass NULL instead and stop declaring those useless index variables. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: allocate scratch eb earlier at btrfs_log_new_name()Filipe Manana1-4/+11
Instead of allocating the scratch eb after joining the log transaction, allocate it before so that we're not delaying log commits for longer than necessary, as allocating the scratch eb means allocating an extent_buffer structure, which comes from a dedicated kmem_cache, plus pages/folios to attach to the eb. Both of these allocations may take time when we're under memory pressure. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: allocate path earlier at btrfs_log_new_name()Filipe Manana1-7/+9
Instead of allocating the path after joining the log transaction, allocate it before so that we're not delaying log commits for the rare cases where the allocation takes a significant time (under memory pressure and all slabs are full, there's the need to allocate a new page, etc). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: allocate path earlier at btrfs_del_dir_entries_in_log()Filipe Manana1-9/+9
Instead of allocating the path after joining the log transaction, allocate it before so that we're not delaying log commits for the rare cases where the allocation takes a significant time (under memory pressure and all slabs are full, there's the need to allocate a new page, etc). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: assert we join log transaction at btrfs_del_dir_entries_in_log()Filipe Manana1-1/+2
We are supposed to be able to join a log transaction at that point, since we have determined that the inode was logged in the current transaction with the call to inode_logged(). So ASSERT() we joined a log transaction and also warn if we didn't in case assertions are disabled (the kernel config doesn't have CONFIG_BTRFS_ASSERT=y), so that the issue gets noticed and reported if it ever happens. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: use btrfs_del_item() at del_logged_dentry()Filipe Manana1-1/+1
There's no need to use btrfs_delete_one_dir_name() at del_logged_dentry() because we are processing a dir index key which can contain only a single name, unlike dir item keys which can encode multiple names in case of name hash collisions. We have explicitly looked up for a dir index key by calling btrfs_lookup_dir_index_item() and we don't log dir item keys anymore (since commit 339d03542484 ("btrfs: only copy dir index keys when logging a directory")). So simplify and use btrfs_del_item() directly instead of btrfs_delete_one_dir_name(). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: free path sooner at __btrfs_unlink_inode()Filipe Manana1-17/+14
After calling btrfs_delete_one_dir_name() there's no need for the path anymore so we can free it immediately after. After that point we do some btree operations that take time and in those call chains we end up allocating paths for these operations, so we're unnecessarily holding on to the path we allocated early at __btrfs_unlink_inode(). So free the path as soon as we don't need it and add a comment. This also allows to simplify the error path, removing two exit labels and returning directly when an error happens. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: assert we join log transaction at btrfs_del_inode_ref_in_log()Filipe Manana1-1/+2
We are supposed to be able to join a log transaction at that point, since we have determined that the inode was logged in the current transaction with the call to inode_logged(). So ASSERT() we joined a log transaction and also warn if we didn't in case assertions are disabled (the kernel config doesn't have CONFIG_BTRFS_ASSERT=y), so that the issue gets noticed and reported if it ever happens. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: open code fc_mount() to avoid releasing s_umount rw_sempahoreAl Viro1-17/+17
[CURRENT BEHAVIOR] Currently inside btrfs_get_tree_subvol(), we call fc_mount() to grab a tree, then re-lock s_umount inside btrfs_reconfigure_for_mount() to avoid race with remount. However fc_mount() itself is just doing two things: 1. Call vfs_get_tree() 2. Release s_umount then call vfs_create_mount() [ENHANCEMENT] Instead of calling fc_mount(), we can open-code it with vfs_get_tree() first. This provides a benefit that, since we have the full control of s_umount, we do not need to re-lock that rw_sempahore when calling btrfs_reconfigure_for_mount(), meaning less race between RO/RW remount. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Qu Wenruo <wqu@suse.com> [ Rework the subject and commit message, refactor the error handling ] Signed-off-by: Qu Wenruo <wqu@suse.com> Tested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in scrub_submit_extent_sector_read()David Sterba1-4/+4
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_create_common()David Sterba1-8/+8
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_wait_tree_log_extents()David Sterba1-5/+5
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_wait_extents()David Sterba1-5/+5
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in quota_override_store()David Sterba1-4/+4
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_fill_super()David Sterba1-12/+12
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in calc_pct_ratio()David Sterba1-3/+3
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_symlink()David Sterba1-15/+14
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_link()David Sterba1-13/+13
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_setattr()David Sterba1-11/+11
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_init_inode_security()David Sterba1-7/+7
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_alloc_from_bitmap()David Sterba1-3/+3
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_lock_extent_bits()David Sterba1-5/+5
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret in btrfs_try_lock_extent_bits()David Sterba1-4/+3
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in btrfs_truncate_inode_items()David Sterba1-6/+5
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in btrfs_add_link()David Sterba1-11/+10
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in btrfs_setsize()David Sterba1-4/+4
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in btrfs_search_old_slot()David Sterba1-5/+5
Unify naming of return value to the preferred way, move the variable to the closest scope. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in btrfs_search_slot()David Sterba1-19/+18
Unify naming of return value to the preferred way, move the variable to the closest scope. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in search_leaf()David Sterba1-9/+8
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in read_block_for_search()David Sterba1-7/+7
Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename err to ret2 in resolve_indirect_refs()David Sterba1-5/+5
Unify naming of return value to the preferred way, move the variable to the closest scope. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: rename btrfs_subpage structureQu Wenruo4-150/+156
With the incoming large data folios support, the structure name btrfs_subpage is no longer correct, as for we can have multiple blocks inside a large folio, and the block size is still page size. So to follow the schema of iomap, rename btrfs_subpage to btrfs_folio_state, along with involved enums. There are still exported functions with "btrfs_subpage_" prefix, and I believe for metadata the name "subpage" will stay forever as we will never allocate a folio larger than nodesize anyway. The full cleanup of the word "subpage" will happen in much smaller steps in the future. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: add comments on the extra btrfs specific subpage bitmapsQu Wenruo1-0/+14
Unlike the iomap_folio_state structure, the btrfs_subpage structure has a lot of extra sub-bitmaps, namely: - writeback sub-bitmap - locked sub-bitmap iomap_folio_state uses an atomic for writeback tracking, while it has no per-block locked tracking. This is because iomap always locks a single folio, and submits dirty blocks with that folio locked. But btrfs has async delalloc ranges (for compression), which are queued with their range locked, until the compression is done, then marks the involved range writeback and unlocked. This means a range can be unlocked and marked writeback at seemingly random timing, thus it needs the extra tracking. This needs a huge rework on the lifespan of async delalloc range before we can remove/simplify these two sub-bitmaps. - ordered sub-bitmap - checked sub-bitmap These are for COW-fixup, but as I mentioned in the past, the COW-fixup is not really needed anymore and these two flags are already marked deprecated, and will be removed in the near future after comprehensive tests. Add related comments to indicate we're actively trying to align the sub-bitmaps to the iomap ones. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>