summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)AuthorFilesLines
2023-10-12btrfs: reduce arguments of helpers space accounting root itemDavid Sterba1-12/+11
There are two helpers to increase used bytes of root items that add or subtract one node size, we don't need to pass the argument for that. Rename the function so it matches the root item member that gets changed. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: reduce parameters of btrfs_pin_extent_for_log_replayDavid Sterba3-10/+7
Both callers of btrfs_pin_extent_for_log_replay expand the parameters to extent buffer members. We can simply pass the extent buffer instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: reduce parameters of btrfs_pin_reserved_extentDavid Sterba3-7/+8
There is only one caller of btrfs_pin_reserved_extent that expands the parameters to extent buffer members. We can simply pass the extent buffer instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: drop __must_check annotationsDavid Sterba3-6/+4
Drop all __must_check annotations because they're used in random functions and not consistently. All errors should be handled. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: reformat remaining kdoc style commentsDavid Sterba18-51/+62
Function name in the comment does not bring much value to code not exposed as API and we don't stick to the kdoc format anymore. Update formatting of parameter descriptions. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move functions comments from qgroup.h to qgroup.cDavid Sterba2-76/+71
We keep the comments next to the implementation, there were some left to move. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: comment about fsid and metadata_uuid relationshipAnand Jain1-0/+9
Add a comment explaining the relationship between fsid and metadata_uuid in the on-disk superblock and the in-memory struct btrfs_fs_devices. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: remove unused helpers for ulist aux dataJiapeng Chong1-10/+0
These functions are defined in the qgroup.c file, but not called anymore since commit "btrfs: qgroup: use qgroup_iterator_nested to in qgroup_update_refcnt()" so we can delete them. fs/btrfs/qgroup.c:149:19: warning: unused function 'qgroup_to_aux'. fs/btrfs/qgroup.c:154:36: warning: unused function 'unode_aux_to_qgroup'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6566 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: prealloc btrfs_qgroup_list for __add_relation_rb()Qu Wenruo1-19/+59
Currently we go GFP_ATOMIC allocation for qgroup relation add, this includes the following 3 call sites: - btrfs_read_qgroup_config() This is not really needed, as at that time we're still in single thread mode, and no spin lock is held. - btrfs_add_qgroup_relation() This one is holding a spinlock, but we're ensured to add at most one relation, thus we can easily do a preallocation and use the preallocated memory to avoid GFP_ATOMIC. - btrfs_qgroup_inherit() This is a little more tricky, as we may have as many relationships as inherit::num_qgroups. Thus we have to properly allocate an array then preallocate all the memory. This patch would remove the GFP_ATOMIC allocation for above involved call sites, by doing preallocation before holding the spinlock, and let __add_relation_rb() to handle the freeing of the structure. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: pre-allocate btrfs_qgroup to reduce GFP_ATOMIC usageQu Wenruo1-26/+61
Qgroup is the heaviest user of GFP_ATOMIC, but one call site does not really need GFP_ATOMIC, that is add_qgroup_rb(). That function only searches the rbtree to find if we already have such entry. If not, then it would try to allocate memory for it. This means we can afford to pre-allocate such structure unconditionally, then free the memory if it's not needed. Considering this function is not a hot path, only utilized by the following functions: - btrfs_qgroup_inherit() For "btrfs subvolume snapshot -i" option. - btrfs_read_qgroup_config() At mount time, and we're ensured there would be no existing rb tree entry for each qgroup. - btrfs_create_qgroup() Thus we're completely safe to pre-allocate the extra memory for btrfs_qgroup structure, and reduce unnecessary GFP_ATOMIC usage. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator_nested to in qgroup_update_refcnt()Qu Wenruo2-42/+53
The ulist @qgroups is utilized to record all involved qgroups from both old and new roots inside btrfs_qgroup_account_extent(). Due to the fact that qgroup_update_refcnt() itself is already utilizing qgroup_iterator, here we have to introduce another list_head, btrfs_qgroup::nested_iterator, allowing nested iteration. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator to replace tmp ulist in ↵Qu Wenruo1-28/+11
qgroup_update_refcnt() For function qgroup_update_refcnt(), we use @tmp list to iterate all the involved qgroups of a subvolume. It's a perfect match for qgroup_iterator facility, as that @tmp ulist has a very limited lifespan (just inside the while() loop). By migrating to qgroup_iterator, we can get rid of the GFP_ATOMIC memory allocation and no error handling is needed. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator in __qgroup_excl_accounting()Qu Wenruo1-64/+17
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can get rid of the ulist and its GFP_ATOMIC memory allocation. Furthermore we can merge the code handling the initial and parent qgroups into one loop, and drop the @tmp ulist parameter for involved call sites. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator in qgroup_convert_meta()Qu Wenruo1-22/+10
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can get rid of the ulist and its GFP_ATOMIC memory allocation. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator in btrfs_qgroup_free_refroot()Qu Wenruo1-22/+7
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can get rid of the ulist and its GFP_ATOMIC memory allocation. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: iterate qgroups without memory allocation for qgroup_reserve()Qu Wenruo2-32/+38
Qgroup heavily relies on ulist to go through all the involved qgroups, but since we're using ulist inside fs_info->qgroup_lock spinlock, this means we're doing a lot of GFP_ATOMIC allocations. This patch reduces the GFP_ATOMIC usage for qgroup_reserve() by eliminating the memory allocation completely. This is done by moving the needed memory to btrfs_qgroup::iterator list_head, so that we can put all the involved qgroup into a on-stack list, thus eliminating the need to allocate memory while holding spinlock. The only cost is the slightly higher memory usage, but considering the reduce GFP_ATOMIC during a hot path, it should still be acceptable. Function qgroup_reserve() is the perfect start point for this conversion. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: remove extraneous includes from ctree.hJosef Bacik1-28/+0
We don't need any of these includes in the ctree.h header file for the header file itself, remove them to clean up ctree.h a little bit. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include linux/security.h in super.cJosef Bacik1-0/+1
We use some of the security related code in here, include it in super.c so we can remove the include from ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include trace header in where necessaryJosef Bacik4-0/+4
If we no longer include the tracepoints from ctree.h we fail to compile because we have the dependency in some of the header files and source files. Add the include where we have these dependencies to allow us to remove the include from ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: add btrfs_delayed_ref_head declaration to extent-tree.hJosef Bacik1-0/+1
extent-tree.h uses btrfs_delayed_ref_head in a function argument but doesn't pull it's declaration from anywhere, add it to the top of the header. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: add fscrypt related dependencies to respective headersJosef Bacik4-0/+6
These headers have struct fscrypt_str as function arguments, so add struct fscrypt_str to the theader, and include linux/fscrypt.h in btrfs_inode.h as it also needs the definition of struct fscrypt_name for the new inode args. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include linux/iomap.h in file.cJosef Bacik1-0/+1
We use the iomap code in file.c, include it so we have our dependencies. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include asm/unaligned.h in accessors.hJosef Bacik1-0/+1
We use the unaligned helpers directly in accessors.h, add the include here. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move btrfs_name_hash to dir-item.hJosef Bacik4-5/+9
This is related to the name hashing for dir items, move it into dir-item.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move btrfs_extref_hash into inode-item.hJosef Bacik2-9/+7
Ideally this would be un-inlined, but that is a cleanup for later. For now move this into inode-item.h, which is where the extref code lives. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: remove btrfs_crc32c wrapperJosef Bacik4-13/+8
This simply sends the same arguments into crc32c(), and is just used in a few places. Remove this wrapper and directly call crc32c() in these instances. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move btrfs_crc32c_final into free-space-cache.cJosef Bacik2-5/+5
This is the only place this helper is used, take it out of ctree.h and move it into free-space-cache.c. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: do not require EXTENT_NOWAIT for btrfs_redirty_list_add()Qu Wenruo1-1/+1
The flag EXTENT_NOWAIT is a special flag to notify extent-io-tree code that this operation should not sleep for the extent state preallocation. However for btrfs_redirty_list_add(), all callers are able to sleep: - clean_log_buffer() Just 2 lines before, we call btrfs_pin_reserved_extent(), which calls pin_down_extent(), and that function does not require EXTENT_NOWAIT. Thus we're safe to call it without EXTENT_NOWAIT. - btrfs_free_tree_block() This function have several call sites which trigger tree read, e.g. walk_up_proc(), thus we're safe to call it without EXTENT_NOWAIT. Thus there is no need to require EXTENT_NOWAIT flag. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: sipmlify uuid parameters of alloc_fs_devices()Anand Jain2-17/+18
Among all the callers, only the device_list_add() function uses the second argument of alloc_fs_devices(). It passes metadata_uuid when available, otherwise, it passes NULL. And in turn, alloc_fs_devices() is designed to copy either metadata_uuid or fsid into fs_devices::metadata_uuid. So remove the second argument in alloc_fs_devices(), and always copy the fsid. In the caller device_list_add() function, we will overwrite it with metadata_uuid when it is available. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: update comment for reservation of metadata space for delayed itemsFilipe Manana1-1/+1
The second comment at btrfs_delayed_item_reserve_metadata() refers to a field named "index_items_size" of a delayed inode, however that field does not exists - it existed in a previous patch version, but then it split into the fields "curr_index_batch_size" and "index_item_leaves" in the final patch version that was picked. So update the comment. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-11Merge tag 'for-6.6-rc5-tag' of ↵Linus Torvalds3-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A revert of recent mount option parsing fix, this breaks mounts with security options. The second patch is a flexible array annotation" * tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size() Revert "btrfs: reject unknown mount options early"
2023-10-11btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()Gustavo A. R. Silva2-2/+2
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version, to calculate the size for the allocation of the whole flexible structure, including of course, the flexible-array member. This code was found with the help of Coccinelle, and audited and fixed manually. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10Revert "btrfs: reject unknown mount options early"David Sterba1-4/+0
This reverts commit 5f521494cc73520ffac18ede0758883b9aedd018. The patch breaks mounts with security mount options like $ mount -o context=system_u:object_r:root_t:s0 /dev/sdX /mn mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdX, missing codepage or helper program, ... We cannot reject all unknown options in btrfs_parse_subvol_options() as intended, the security options can be present at this point and it's not possible to enumerate them in a future proof way. This means unknown mount options are silently accepted like before when the filesystem is mounted with either -o subvol=/path or as followup mounts of the same device. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-09btrfs: move btrfs_xattr_handlers to .rodataWedson Almeida Filho2-2/+2
This makes it harder for accidental or malicious changes to btrfs_xattr_handlers at runtime. Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: David Sterba <dsterba@suse.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Link: https://lore.kernel.org/r/20230930050033.41174-6-wedsonaf@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-06Merge tag 'for-6.6-rc4-tag' of ↵Linus Torvalds4-17/+47
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - reject unknown mount options - adjust transaction abort error message level - fix one more build warning with -Wmaybe-uninitialized - proper error handling in several COW-related cases * tag 'for-6.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: error out when reallocating block for defrag using a stale transaction btrfs: error when COWing block from a root that is being deleted btrfs: error out when COWing block using a stale transaction btrfs: always print transaction aborted messages with an error level btrfs: reject unknown mount options early btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c
2023-10-04fs: super: dynamically allocate the s_shrinkQi Zheng1-1/+1
In preparation for implementing lockless slab shrink, use new APIs to dynamically allocate the s_shrink, so that it can be freed asynchronously via RCU. Then it doesn't need to wait for RCU read-side critical section when releasing the struct super_block. Link: https://lkml.kernel.org/r/20230911094444.68966-39-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04btrfs: error out when reallocating block for defrag using a stale transactionFilipe Manana1-2/+16
At btrfs_realloc_node() we have these checks to verify we are not using a stale transaction (a past transaction with an unblocked state or higher), and the only thing we do is to trigger two WARN_ON(). This however is a critical problem, highly unexpected and if it happens it's most likely due to a bug, so we should error out and turn the fs into error state so that such issue is much more easily noticed if it's triggered. The problem is critical because in btrfs_realloc_node() we COW tree blocks, and using such stale transaction will lead to not persisting the extent buffers used for the COW operations, as allocating tree block adds the range of the respective extent buffers to the ->dirty_pages iotree of the transaction, and a stale transaction, in the unlocked state or higher, will not flush dirty extent buffers anymore, therefore resulting in not persisting the tree block and resource leaks (not cleaning the dirty_pages iotree for example). So do the following changes: 1) Return -EUCLEAN if we find a stale transaction; 2) Turn the fs into error state, with error -EUCLEAN, so that no transaction can be committed, and generate a stack trace; 3) Combine both conditions into a single if statement, as both are related and have the same error message; 4) Mark the check as unlikely, since this is not expected to ever happen. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: error when COWing block from a root that is being deletedFilipe Manana1-3/+7
At btrfs_cow_block() we check if the block being COWed belongs to a root that is being deleted and if so we log an error message. However this is an unexpected case and it indicates a bug somewhere, so we should return an error and abort the transaction. So change this in the following ways: 1) Abort the transaction with -EUCLEAN, so that if the issue ever happens it can easily be noticed; 2) Change the logged message level from error to critical, and change the message itself to print the block's logical address and the ID of the root; 3) Return -EUCLEAN to the caller; 4) As this is an unexpected scenario, that should never happen, mark the check as unlikely, allowing the compiler to potentially generate better code. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: error out when COWing block using a stale transactionFilipe Manana1-8/+16
At btrfs_cow_block() we have these checks to verify we are not using a stale transaction (a past transaction with an unblocked state or higher), and the only thing we do is to trigger a WARN with a message and a stack trace. This however is a critical problem, highly unexpected and if it happens it's most likely due to a bug, so we should error out and turn the fs into error state so that such issue is much more easily noticed if it's triggered. The problem is critical because using such stale transaction will lead to not persisting the extent buffer used for the COW operation, as allocating a tree block adds the range of the respective extent buffer to the ->dirty_pages iotree of the transaction, and a stale transaction, in the unlocked state or higher, will not flush dirty extent buffers anymore, therefore resulting in not persisting the tree block and resource leaks (not cleaning the dirty_pages iotree for example). So do the following changes: 1) Return -EUCLEAN if we find a stale transaction; 2) Turn the fs into error state, with error -EUCLEAN, so that no transaction can be committed, and generate a stack trace; 3) Combine both conditions into a single if statement, as both are related and have the same error message; 4) Mark the check as unlikely, since this is not expected to ever happen. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: always print transaction aborted messages with an error levelFilipe Manana1-2/+2
Commit b7af0635c87f ("btrfs: print transaction aborted messages with an error level") changed the log level of transaction aborted messages from a debug level to an error level, so that such messages are always visible even on production systems where the log level is normally above the debug level (and also on some syzbot reports). Later, commit fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to transaction.c") changed the log level back to debug level when the error number for a transaction abort should not have a stack trace printed. This happened for absolutely no reason. It's always useful to print transaction abort messages with an error level, regardless of whether the error number should cause a stack trace or not. So change back the log level to error level. Fixes: fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to transaction.c") CC: stable@vger.kernel.org # 6.5+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: reject unknown mount options earlyQu Wenruo1-0/+4
[BUG] The following script would allow invalid mount options to be specified (although such invalid options would just be ignored): # mkfs.btrfs -f $dev # mount $dev $mnt1 <<< Successful mount expected # mount $dev $mnt2 -o junk <<< Failed mount expected # echo $? 0 [CAUSE] For the 2nd mount, since the fs is already mounted, we won't go through open_ctree() thus no btrfs_parse_options(), but only through btrfs_parse_subvol_options(). However we do not treat unrecognized options from valid but irrelevant options, thus those invalid options would just be ignored by btrfs_parse_subvol_options(). [FIX] Add the handling for Opt_err to handle invalid options and error out, while still ignore other valid options inside btrfs_parse_subvol_options(). Reported-by: Anand Jain <anand.jain@oracle.com> CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.cJosef Bacik1-2/+2
Jens reported the following warnings from -Wmaybe-uninitialized recent Linus' branch. In file included from ./include/asm-generic/rwonce.h:26, from ./arch/arm64/include/asm/rwonce.h:71, from ./include/linux/compiler.h:246, from ./include/linux/export.h:5, from ./include/linux/linkage.h:7, from ./include/linux/kernel.h:17, from fs/btrfs/ioctl.c:6: In function ‘instrument_copy_from_user_before’, inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3, inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7, inlined from ‘btrfs_ioctl_space_info’ at fs/btrfs/ioctl.c:2999:6, inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4616:10: ./include/linux/kasan-checks.h:38:27: warning: ‘space_args’ may be used uninitialized [-Wmaybe-uninitialized] 38 | #define kasan_check_write __kasan_check_write ./include/linux/instrumented.h:129:9: note: in expansion of macro ‘kasan_check_write’ 129 | kasan_check_write(to, n); | ^~~~~~~~~~~~~~~~~ ./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’: ./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const volatile void *’ to ‘__kasan_check_write’ declared here 20 | bool __kasan_check_write(const volatile void *p, unsigned int size); | ^~~~~~~~~~~~~~~~~~~ fs/btrfs/ioctl.c:2981:39: note: ‘space_args’ declared here 2981 | struct btrfs_ioctl_space_args space_args; | ^~~~~~~~~~ In function ‘instrument_copy_from_user_before’, inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3, inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7, inlined from ‘_btrfs_ioctl_send’ at fs/btrfs/ioctl.c:4343:9, inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4658:10: ./include/linux/kasan-checks.h:38:27: warning: ‘args32’ may be used uninitialized [-Wmaybe-uninitialized] 38 | #define kasan_check_write __kasan_check_write ./include/linux/instrumented.h:129:9: note: in expansion of macro ‘kasan_check_write’ 129 | kasan_check_write(to, n); | ^~~~~~~~~~~~~~~~~ ./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’: ./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const volatile void *’ to ‘__kasan_check_write’ declared here 20 | bool __kasan_check_write(const volatile void *p, unsigned int size); | ^~~~~~~~~~~~~~~~~~~ fs/btrfs/ioctl.c:4341:49: note: ‘args32’ declared here 4341 | struct btrfs_ioctl_send_args_32 args32; | ^~~~~~ This was due to his config options and having KASAN turned on, which adds some extra checks around copy_from_user(), which then triggered the -Wmaybe-uninitialized checker for these cases. Fix the warnings by initializing the different structs we're copying into. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-26Merge tag 'for-6.6-rc3-tag' of ↵Linus Torvalds8-33/+63
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - delayed refs fixes: - fix race when refilling delayed refs block reserve - prevent transaction block reserve underflow when starting transaction - error message and value adjustments - fix build warnings with CONFIG_CC_OPTIMIZE_FOR_SIZE and -Wmaybe-uninitialized - fix for smatch report where uninitialized data from invalid extent buffer range could be returned to the caller - fix numeric overflow in statfs when calculating lower threshold for a full filesystem * tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize start_slot in btrfs_log_prealloc_extents btrfs: make sure to initialize start and len in find_free_dev_extent btrfs: reset destination buffer when read_extent_buffer() gets invalid range btrfs: properly report 0 avail for very full file systems btrfs: log message if extent item not found when running delayed extent op btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref() btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1 btrfs: prevent transaction block reserve underflow when starting transaction btrfs: fix race when refilling delayed refs block reserve
2023-09-21Merge tag 'v6.6-rc3.vfs.ctime.revert' of ↵Linus Torvalds2-7/+22
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull finegrained timestamp reverts from Christian Brauner: "Earlier this week we sent a few minor fixes for the multi-grained timestamp work in [1]. While we were polishing those up after Linus realized that there might be a nicer way to fix them we received a regression report in [2] that fine grained timestamps break gnulib tests and thus possibly other tools. The kernel will elide fine-grain timestamp updates when no one is actively querying for them to avoid performance impacts. So a sequence like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may result in timestamp f1 to be older than the final f2 timestamp even though f1 was last written too but the second write didn't update the timestamp. Such plotholes can lead to subtle bugs when programs compare timestamps. For example, the nap() function in [2] will estimate that it needs to wait one ns on a fine-grain timestamp enabled filesytem between subsequent calls to observe a timestamp change. But in general we don't update timestamps with more than one jiffie if we think that no one is actively querying for fine-grain timestamps to avoid performance impacts. While discussing various fixes the decision was to go back to the drawing board and ultimately to explore a solution that involves only exposing such fine-grained timestamps to nfs internally and never to userspace. As there are multiple solutions discussed the honest thing to do here is not to fix this up or disable it but to cleanly revert. The general infrastructure will probably come back but there is no reason to keep this code in mainline. The general changes to timestamp handling are valid and a good cleanup that will stay. The revert is fully bisectable" Link: https://lore.kernel.org/all/20230918-hirte-neuzugang-4c2324e7bae3@brauner [1] Link: https://lore.kernel.org/all/bf0524debb976627693e12ad23690094e4514303.camel@linuxfromscratch.org [2] * tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: Revert "fs: add infrastructure for multigrain timestamps" Revert "btrfs: convert to multigrain timestamps" Revert "ext4: switch to multigrain timestamps" Revert "xfs: switch to multigrain timestamps" Revert "tmpfs: add support for multigrain timestamps"
2023-09-21btrfs: initialize start_slot in btrfs_log_prealloc_extentsJosef Bacik1-1/+1
Jens reported a compiler warning when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’: fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used uninitialized [-Wmaybe-uninitialized] 4828 | ret = copy_items(trans, inode, dst_path, path, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 4829 | start_slot, ins_nr, 1, 0); | ~~~~~~~~~~~~~~~~~~~~~~~~~ fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here 4725 | int start_slot; | ^~~~~~~~~~ The compiler is incorrect, as we only use this code when ins_len > 0, and when ins_len > 0 we have start_slot properly initialized. However we generally find the -Wmaybe-uninitialized warnings valuable, so initialize start_slot to get rid of the warning. Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-21btrfs: make sure to initialize start and len in find_free_dev_extentJosef Bacik1-7/+6
Jens reported a compiler error when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this In function ‘gather_device_info’, inlined from ‘btrfs_create_chunk’ at fs/btrfs/volumes.c:5507:8: fs/btrfs/volumes.c:5245:48: warning: ‘dev_offset’ may be used uninitialized [-Wmaybe-uninitialized] 5245 | devices_info[ndevs].dev_offset = dev_offset; | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ fs/btrfs/volumes.c: In function ‘btrfs_create_chunk’: fs/btrfs/volumes.c:5196:13: note: ‘dev_offset’ was declared here 5196 | u64 dev_offset; This occurs because find_free_dev_extent is responsible for setting dev_offset, however if we get an -ENOMEM at the top of the function we'll return without setting the value. This isn't actually a problem because we will see the -ENOMEM in gather_device_info() and return and not use the uninitialized value, however we also just don't want the compiler warning so rework the code slightly in find_free_dev_extent() to make sure it's always setting *start and *len to avoid the compiler warning. Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20btrfs: reset destination buffer when read_extent_buffer() gets invalid rangeQu Wenruo1-1/+7
Commit f98b6215d7d1 ("btrfs: extent_io: do extra check for extent buffer read write functions") changed how we handle invalid extent buffer range for read_extent_buffer(). Previously if the range is invalid we just set the destination to zero, but after the patch we do nothing and error out. This can lead to smatch static checker errors like: fs/btrfs/print-tree.c:186 print_uuid_item() error: uninitialized symbol 'subvol_id'. fs/btrfs/tests/extent-io-tests.c:338 check_eb_bitmap() error: uninitialized symbol 'has'. fs/btrfs/tests/extent-io-tests.c:353 check_eb_bitmap() error: uninitialized symbol 'has'. fs/btrfs/uuid-tree.c:203 btrfs_uuid_tree_remove() error: uninitialized symbol 'read_subid'. fs/btrfs/uuid-tree.c:353 btrfs_uuid_tree_iterate() error: uninitialized symbol 'subid_le'. fs/btrfs/uuid-tree.c:72 btrfs_uuid_tree_lookup() error: uninitialized symbol 'data'. fs/btrfs/volumes.c:7415 btrfs_dev_stats_value() error: uninitialized symbol 'val'. Fix those warnings by reverting back to the old memset() behavior. By this we keep the static checker happy and would still make a lot of noise when such invalid ranges are passed in. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: f98b6215d7d1 ("btrfs: extent_io: do extra check for extent buffer read write functions") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20btrfs: properly report 0 avail for very full file systemsJosef Bacik1-1/+1
A user reported some issues with smaller file systems that get very full. While investigating this issue I noticed that df wasn't showing 100% full, despite having 0 chunk space and having < 1MiB of available metadata space. This turns out to be an overflow issue, we're doing: total_available_metadata_space - SZ_4M < global_block_rsv_size to determine if there's not enough space to make metadata allocations, which overflows if total_available_metadata_space is < 4M. Fix this by checking to see if our available space is greater than the 4M threshold. This makes df properly report 100% usage on the file system. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20btrfs: log message if extent item not found when running delayed extent opFilipe Manana1-1/+4
When running a delayed extent operation, if we don't find the extent item in the extent tree we just return -EIO without any logged message. This indicates some bug or possibly a memory or fs corruption, so the return value should not be -EIO but -EUCLEAN instead, and since it's not expected to ever happen, print an informative error message so that if it happens we have some idea of what went wrong, where to look at. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()Filipe Manana1-4/+3
At __btrfs_inc_extent_ref() we are doing a BUG_ON() if we are dealing with a tree block reference that has a reference count that is different from 1, but we have already dealt with this case at run_delayed_tree_ref(), making it useless. So remove the BUG_ON(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>