summaryrefslogtreecommitdiff
path: root/fs/btrfs/inode.c
AgeCommit message (Collapse)AuthorFilesLines
2024-11-19Merge tag 'for-6.13-tag' of ↵Linus Torvalds1-241/+254
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Changes outside of btrfs: add io_uring command flag to track a dying task (the rest will go via the block git tree). User visible changes: - wire encoded read (ioctl) to io_uring commands, this can be used on itself, in the future this will allow 'send' to be asynchronous. As a consequence, the encoded read ioctl can also work in non-blocking mode - new ioctl to wait for cleaned subvolumes, no need to use the generic and root-only SEARCH_TREE ioctl, will be used by "btrfs subvol sync" - recognize different paths/symlinks for the same devices and don't report them during rescanning, this can be observed with LVM or DM - seeding device use case change, the sprout device (the one capturing new writes) will not clear the read-only status of the super block; this prevents accumulating space from deleted snapshots Performance improvements: - reduce lock contention when traversing extent buffers - reduce extent tree lock contention when searching for inline backref - switch from rb-trees to xarray for delayed ref tracking, improvements due to better cache locality, branching factors and more compact data structures - enable extent map shrinker again (prevent memory exhaustion under some types of IO load), reworked to run in a single worker thread (there used to be problems causing long stalls under memory pressure) Core changes: - raid-stripe-tree feature updates: - make device replace and scrub work - implement partial deletion of stripe extents - new selftests - split the config option BTRFS_DEBUG and add EXPERIMENTAL for features that are experimental or with known problems so we don't misuse debugging config for that - subpage mode updates (sector < page): - update compression implementations - update writepage, writeback - continued folio API conversions: - buffered writes - make buffered write copy one page at a time, preparatory work for future integration with large folios, may cause performance drop - proper locking of root item regarding starting send - error handling improvements - code cleanups and refactoring: - dead code removal - unused parameter reduction - lockdep assertions" * tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (119 commits) btrfs: send: check for read-only send root under critical section btrfs: send: check for dead send root under critical section btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap() btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl() btrfs: fix a typo in btrfs_use_zone_append btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read() btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapshot() btrfs: remove hole from struct btrfs_delayed_node btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list btrfs: add new ioctl to wait for cleaned subvolumes btrfs: simplify range tracking in cow_file_range() btrfs: remove conditional path allocation in btrfs_read_locked_inode() btrfs: push cleanup into btrfs_read_locked_inode() io_uring/cmd: let cmds to know about dying task btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu() btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl) btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages() btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set btrfs: change btrfs_encoded_read() so that reading of extent is done by caller btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read() ...
2024-11-18Merge tag 'vfs-6.13.pagecache' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs pagecache updates from Christian Brauner: "Cleanup filesystem page flag usage: This continues the work to make the mappedtodisk/owner_2 flag available to filesystems which don't use buffer heads. Further patches remove uses of Private2. This brings us very close to being rid of it entirely" * tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: migrate: Remove references to Private2 ceph: Remove call to PagePrivate2() btrfs: Switch from using the private_2 flag to owner_2 mm: Remove PageMappedToDisk nilfs2: Convert nilfs_copy_buffer() to use folios fs: Move clearing of mappedtodisk to buffer.c
2024-11-18Merge tag 'vfs-6.13.misc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Fixup and improve NLM and kNFSD file lock callbacks Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock It's important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS Fix this by using an fop_flag which greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike - Add a sysctl to delete the dentry when a file is removed instead of making it a negative dentry Commit 681ce8623567 ("vfs: Delete the associated dentry when deleting a file") introduced an unconditional deletion of the associated dentry when a file is removed. However, this led to performance regressions in specific benchmarks, such as ilebench.sum_operations/s, prompting a revert in commit 4a4be1ad3a6e ("Revert "vfs: Delete the associated dentry when deleting a file""). This reintroduces the concept conditionally through a sysctl - Expand the statmount() system call: * Report the filesystem subtype in a new fs_subtype field to e.g., report fuse filesystem subtypes * Report the superblock source in a new sb_source field * Add a new way to return filesystem specific mount options in an option array that returns filesystem specific mount options separated by zero bytes and unescaped. This allows caller's to retrieve filesystem specific mount options and immediately pass them to e.g., fsconfig() without having to unescape or split them * Report security (LSM) specific mount options in a separate security option array. We don't lump them together with filesystem specific mount options as security mount options are generic and most users aren't interested in them The format is the same as for the filesystem specific mount option array - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command - Optimize acl_permission_check() to avoid costly {g,u}id ownership checks if possible - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict() - Add synchronous wakeup support for ep_poll_callback. Currently, epoll only uses wake_up() to wake up task. But sometimes there are epoll users which want to use the synchronous wakeup flag to give a hint to the scheduler, e.g., the Android binder driver. So add a wake_up_sync() define, and use wake_up_sync() when sync is true in ep_poll_callback() Fixes: - Fix kernel documentation for inode_insert5() and iget5_locked() - Annotate racy epoll check on file->f_ep - Make F_DUPFD_QUERY associative - Avoid filename buffer overrun in initramfs - Don't let statmount() return empty strings - Add a cond_resched() to dump_user_range() to avoid hogging the CPU - Don't query the device logical blocksize multiple times for hfsplus - Make filemap_read() check that the offset is positive or zero Cleanups: - Various typo fixes - Cleanup wbc_attach_fdatawrite_inode() - Add __releases annotation to wbc_attach_and_unlock_inode() - Add hugetlbfs tracepoints - Fix various vfs kernel doc parameters - Remove obsolete TODO comment from io_cancel() - Convert wbc_account_cgroup_owner() to take a folio - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add() - Reorder struct posix_acl to save 8 bytes - Annotate struct posix_acl with __counted_by() - Replace one-element array with flexible array member in freevxfs - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()" * tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) statmount: retrieve security mount options vfs: make evict() use smp_mb__after_spinlock instead of smp_mb statmount: add flag to retrieve unescaped options fs: add the ability for statmount() to report the sb_source writeback: wbc_attach_fdatawrite_inode out of line writeback: add a __releases annoation to wbc_attach_and_unlock_inode fs: add the ability for statmount() to report the fs_subtype fs: don't let statmount return empty strings fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel() hfsplus: don't query the device logical block size multiple times freevxfs: Replace one-element array with flexible array member fs: optimize acl_permission_check() initramfs: avoid filename buffer overrun fs/writeback: convert wbc_account_cgroup_owner to take a folio acl: Annotate struct posix_acl with __counted_by() acl: Realign struct posix_acl to save 8 bytes epoll: Add synchronous wakeup support for ep_poll_callback coredump: add cond_resched() to dump_user_range mm/page-writeback.c: Fix comment of wb_domain_writeout_add() mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL ...
2024-11-11btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read()Mark Harmstone1-2/+2
Change the control flow of btrfs_encoded_read() so that it doesn't call free_extent_map() when we know that this has already been done. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Mark Harmstone <maharmstone@fb.com> Suggested-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: simplify range tracking in cow_file_range()Haisu Wang1-18/+14
Simplify tracking of the range processed by using cur_alloc_size only to store the reserved part that may fail to the allocated extent. Remove the ram_size as well since it is always equal to cur_alloc_size in the context. Advance the start in normal path until extent allocation succeeds and keep the start unchanged in the error handling path. Passed the fstest generic/475 test for a hundred times with quota enabled. And a modified generic/475 test by removing the sleep time for a hundred times. About one tenth of the tests do enter the error handling path due to fail to reserve extent. Suggested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove conditional path allocation in btrfs_read_locked_inode()Leo Martins1-17/+29
Remove conditional path allocation from btrfs_read_locked_inode(). Add an ASSERT(path) to indicate it should never be called with a NULL path. Call btrfs_read_locked_inode() directly from btrfs_iget(). This causes code duplication between btrfs_iget() and btrfs_iget_path(), but I think this is justifiable as it removes the need for conditionally allocating the path inside of btrfs_read_locked_inode(). This makes the code easier to reason about and makes it clear who has the responsibility of allocating and freeing the path. Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: push cleanup into btrfs_read_locked_inode()Leo Martins1-48/+53
Move btrfs_add_inode_to_root() so it can be called from btrfs_read_locked_inode(), no changes were made to the function. Move cleanup code from btrfs_iget_path() to btrfs_read_locked_inode. This improves readability and improves a leaky abstraction. Previously btrfs_iget_path() had to handle a positive error case as a result of a call to btrfs_search_slot(), but it makes more sense to handle this closer to the source of the call. Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)Mark Harmstone1-10/+31
Add an io_uring command for encoded reads, using the same interface as the existing BTRFS_IOC_ENCODED_READ ioctl. btrfs_uring_encoded_read() is an io_uring version of btrfs_ioctl_encoded_read(), which validates the user input and calls btrfs_encoded_read() to read the appropriate metadata. If we determine that we need to read an extent from disk, we call btrfs_encoded_read_regular_fill_pages() through btrfs_uring_read_extent() to prepare the bio. The existing btrfs_encoded_read_regular_fill_pages() is changed so that if it is passed a valid uring_ctx, rather than waking up any waiting threads it calls btrfs_uring_read_extent_endio(). This in turn copies the read data back to userspace, and calls io_uring_cmd_done() to complete the io_uring command. Because we're potentially doing a non-blocking read, btrfs_uring_read_extent() doesn't clean up after itself if it returns -EIOCBQUEUED. Instead, it allocates a priv struct, populates the fields there that we will need to unlock the inode and free our allocations, and defers this to the btrfs_uring_read_finished() that gets called when the bio completes. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()Mark Harmstone1-11/+18
Change btrfs_encoded_read_regular_fill_pages() so that the priv struct is allocated rather than stored on the stack, in preparation for adding an asynchronous mode to the function. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is setMark Harmstone1-11/+44
Change btrfs_encoded_read() so that it returns -EAGAIN rather than sleeps if IOCB_NOWAIT is set in iocb->ki_flags. The conditions that require sleeping are: inode lock, writeback, extent lock, ordered range. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: change btrfs_encoded_read() so that reading of extent is done by callerMark Harmstone1-30/+27
Change the behaviour of btrfs_encoded_read() so that if it needs to read an extent from disk, it leaves the extent and inode locked and returns -EIOCBQUEUED. The caller is then responsible for doing the I/O via btrfs_encoded_read_regular() and unlocking the extent and inode. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read()Mark Harmstone1-4/+1
iocb->ki_pos isn't used after this function, so there's no point in changing its value. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: use filemap_get_folio() helperAnand Jain1-4/+3
When fgp_flags and gfp_flags are zero, use filemap_get_folio(A, B) instead of __filemap_get_folio(A, B, 0, 0)—no need for the extra arguments 0, 0. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: fix wrong sizeof in btrfs_do_encoded_write()Mark Harmstone1-1/+1
btrfs_do_encoded_write() was converted to use folios in 400b172b8cdc, but we're still allocating based on sizeof(struct page *) rather than sizeof(struct folio *). There's no functional change. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: drop unused parameter file_offset from ↵David Sterba1-3/+3
btrfs_encoded_read_regular_fill_pages() The file_offset parameter used to be passed to encoded read struct but was removed in commit b665affe93d8 ("btrfs: remove unused members from struct btrfs_encoded_read_private"). Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: drop unused parameter offset from __cow_file_range_inline()David Sterba1-3/+3
We don't need offset for inline extents, they always start from 0. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: drop unused parameter inode from read_inline_extent()David Sterba1-3/+2
We don't need the inode pointer to read inline extent, it's all accessible from the path pointer. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove btrfs_set_range_writeback()Qu Wenruo1-22/+0
The function btrfs_set_range_writeback() was originally a callback for metadata and data, to mark a range with writeback flag. Then it was converted into a common function call for both metadata and data. From the very beginning, the function had been only called on a full page, later converted to handle range inside a page. But it never needed to handle multiple pages, and since commit 8189197425e7 ("btrfs: refactor __extent_writepage_io() to do sector-by-sector submission") the function was only called on a sector-by-sector basis. This makes the function unnecessary, and can be converted to a simple btrfs_folio_set_writeback() call instead. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: handle empty list of NOCOW ordered extents with checksum listJohannes Thumshirn1-1/+6
Currently we BUG_ON() in btrfs_finish_one_ordered() if we are finishing an ordered extent that is flagged as NOCOW, but it's checksum list is not empty. This is clearly a logic error which we can recover from by aborting the transaction. For developer builds which enable CONFIG_BTRFS_ASSERT, also ASSERT() that the list is empty. Suggested-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: correct typos in multiple comments across various filesShen Lichuan1-1/+1
Fix some confusing spelling errors that were currently identified, the details are as follows: block-group.c: 2800: uncompressible ==> incompressible extent-tree.c: 3131: EXTEMT ==> EXTENT extent_io.c: 3124: utlizing ==> utilizing extent_map.c: 1323: ealier ==> earlier extent_map.c: 1325: possiblity ==> possibility fiemap.c: 189: emmitted ==> emitted fiemap.c: 197: emmitted ==> emitted fiemap.c: 203: emmitted ==> emitted transaction.h: 36: trasaction ==> transaction volumes.c: 5312: filesysmte ==> filesystem zoned.c: 1977: trasnsaction ==> transaction Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove code duplication in ordered extent finishingJohannes Thumshirn1-29/+19
Remove the duplicated transaction joining, block reserve setting and raid extent inserting in btrfs_finish_ordered_extent(). While at it, also abort the transaction in case inserting a RAID stripe-tree entry fails. Suggested-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: allow compression even if the range is not page alignedQu Wenruo1-34/+7
Previously for btrfs with sector size smaller than page size (subpage), we only allow compression if the range is fully page aligned. This is to work around the asynchronous submission of compressed range, which delayed the page unlock and writeback into a workqueue, furthermore asynchronous submission can lock multiple sector range across page boundary. Such asynchronous submission makes it very hard to co-operate with other regular writes. With the recent changes to the subpage folio unlock path, now asynchronous submission of compressed pages can co-operate with regular submission, so enable sector perfect compression if it's an experimental build. The ETA for moving this feature out of experimental is 6.15, and I hope all remaining corner cases can be exposed before that. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: make extent_range_clear_dirty_for_io() to handle sector size < page ↵Qu Wenruo1-1/+2
size cases For btrfs with sector size < page size (e.g. 4K sector size, 64K page size), and enable the sector perfect compression support, then the following dirty range can lead to problems: 0 32K 64K 96K 128K | |///////||//////| |/| 124K In above case, if we start writeback for that inode, the last dirty range [124K, 128K) will not be submitted and cause reserved space leakage: - Start writeback for page 0 We find the range [32K, 96K) is suitable for compression, and queue it into a workqueue to do the delayed compression and submission. - Compression happens for range [32K, 96K) Function extent_range_clear_dirty_for_io() is called, however it is only doing full page handling, not considering any the extra bitmaps for subpage cases. That function will clear page dirty for both page 0 and page 64K. - Writeback for the inode is done Because page 64K has its dirty flag cleared, it will not be considered as a writeback target. This means the range [124K, 128K) will not be submitted, and reserved space for it will be leaked. Fix this problem by using the subpage helper to clear the dirty flag. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-08Merge tag 'for-6.12-rc6-tag' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more one-liners that fix some user visible problems: - use correct range when clearing qgroup reservations after COW - properly reset freed delayed ref list head - fix ro/rw subvolume mounts to be backward compatible with old and new mount API" * tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix the length of reserved qgroup to free btrfs: reinitialize delayed ref list after deleting it from the list btrfs: fix per-subvolume RO/RW flags with new mount API
2024-11-07btrfs: fix the length of reserved qgroup to freeHaisu Wang1-1/+1
The dealloc flag may be cleared and the extent won't reach the disk in cow_file_range when errors path. The reserved qgroup space is freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range"). However, the length of untouched region to free needs to be adjusted with the correct remaining region size. Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") CC: stable@vger.kernel.org # 6.11+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-28fs/writeback: convert wbc_account_cgroup_owner to take a folioPankaj Raghav1-1/+1
Most of the callers of wbc_account_cgroup_owner() are converting a folio to page before calling the function. wbc_account_cgroup_owner() is converting the page back to a folio to call mem_cgroup_css_from_folio(). Convert wbc_account_cgroup_owner() to take a folio instead of a page, and convert all callers to pass a folio directly except f2fs. Convert the page to folio for all the callers from f2fs as they were the only callers calling wbc_account_cgroup_owner() with a page. As f2fs is already in the process of converting to folios, these call sites might also soon be calling wbc_account_cgroup_owner() with a folio directly in the future. No functional changes. Only compile tested. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com Acked-by: David Sterba <dsterba@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24Merge tag 'for-6.12-rc4-tag' of ↵Linus Torvalds1-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - mount option fixes: - fix handling of compression mount options on remount - reject rw remount in case there are options that don't work in read-write mode (like rescue options) - fix zone accounting of unusable space - fix in-memory corruption when merging extent maps - fix delalloc range locking for sector < page - use more convenient default value of drop subtree threshold, clean more subvolumes without the fallback to marking quotas inconsistent - fix smatch warning about incorrect value passed to ERR_PTR * tag 'for-6.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item() btrfs: reject ro->rw reconfiguration if there are hard ro requirements btrfs: fix read corruption due to race with extent map merging btrfs: fix the delalloc range locking if sector size < page size btrfs: qgroup: set a more sane default value for subtree drop threshold btrfs: clear force-compress on remount when compress mount option is given btrfs: zoned: fix zone unusable accounting for freed reserved extent
2024-10-22btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item()Yue Haibing1-5/+2
The ret may be zero in btrfs_search_dir_index_item() and should not passed to ERR_PTR(). Now btrfs_unlink_subvol() is the only caller to this, reconstructed it to check ERR_PTR(-ENOENT) while ret >= 0. This fixes smatch warnings: fs/btrfs/dir-item.c:353 btrfs_search_dir_index_item() warn: passing zero to 'ERR_PTR' Fixes: 9dcbe16fccbb ("btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-04Merge tag 'for-6.12-rc1-tag' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - in incremental send, fix invalid clone operation for file that got its size decreased - fix __counted_by() annotation of send path cache entries, we do not store the terminating NUL - fix a longstanding bug in relocation (and quite hard to hit by chance), drop back reference cache that can get out of sync after transaction commit - wait for fixup worker kthread before finishing umount - add missing raid-stripe-tree extent for NOCOW files, zoned mode cannot have NOCOW files but RST is meant to be a standalone feature - handle transaction start error during relocation, avoid potential NULL pointer dereference of relocation control structure (reported by syzbot) - disable module-wide rate limiting of debug level messages - minor fix to tracepoint definition (reported by checkpatch.pl) * tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: disable rate limiting when debug enabled btrfs: wait for fixup workers before stopping cleaner kthread during umount btrfs: fix a NULL pointer dereference when failed to start a new trasacntion btrfs: send: fix invalid clone operation for file that got its size decreased btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class btrfs: drop the backref cache during relocation if we commit btrfs: also add stripe entries for NOCOW writes btrfs: send: fix buffer overflow detection when copying path to cache entry
2024-10-04btrfs: Switch from using the private_2 flag to owner_2Matthew Wilcox (Oracle)1-4/+4
We are close to removing the private_2 flag, so switch btrfs to using owner_2 for its ordered flag. This is mostly used by buffer head filesystems, so btrfs can use it because it doesn't use buffer heads. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20241002040111.1023018-5-willy@infradead.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-03move asm/unaligned.h to linux/unaligned.hAl Viro1-1/+1
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-01btrfs: also add stripe entries for NOCOW writesJohannes Thumshirn1-0/+5
NOCOW writes do not generate stripe_extent entries in the RAID stripe tree, as the RAID stripe-tree feature initially was designed with a zoned filesystem in mind and on a zoned filesystem, we do not allow NOCOW writes. But the RAID stripe-tree feature is independent from the zoned feature, so we must also do NOCOW writes for RAID stripe-tree filesystems. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_decompress() to take a folioLi Zetao1-1/+1
The old page API is being gradually replaced and converted to use folio to improve code readability and avoid repeated conversion between page and folio. Based on the previous patch, the compression path can be directly used in folio without converting to page. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert try_release_extent_mapping() to take a folioLi Zetao1-1/+1
The old page API is being gradually replaced and converted to use folio to improve code readability and avoid repeated conversion between page and folio. And page_to_inode() can be replaced with folio_to_inode() now. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert clear_page_extent_mapped() to take a folioLi Zetao1-2/+2
The old page API is being gradually replaced and converted to use folio to improve code readability and avoid repeated conversion between page and folio. Now clear_page_extent_mapped() can deal with a folio directly, so change its name to clear_folio_extent_mapped(). Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: drop transaction parameter from btrfs_add_inode_defrag()David Sterba1-1/+1
There's only one caller inode_should_defrag() that passes NULL to btrfs_add_inode_defrag() so we can drop it an simplify the code. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: rename __extent_writepage() and drop double underscoresDavid Sterba1-1/+1
The function does not follow the pattern where the underscores would be justified, so rename it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: rename btrfs_submit_bio() to btrfs_submit_bbio()David Sterba1-2/+2
The function name is a bit misleading as it submits the btrfs_bio (bbio), rename it so we can use btrfs_submit_bio() when an actual bio is submitted. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: add comment about locking in cow_file_range_inline()Boris Burkov1-0/+14
Add a comment to document the complicated locked_page unlock logic in cow_file_range_inline. The specifically tricky part is that a caller just up the stack converts ret == 0 to ret == 1 and then another caller far up the callstack handles ret == 1 as a success, AND returns without cleanup in that case, both of which "feel" unnatural and led to the original bug. Try to document that somewhat specific callstack logic here to explain the weird un-setting of locked_folio on success. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert extent_range_clear_dirty_for_io() to use a folioJosef Bacik1-6/+6
Instead of getting a page and using that to clear dirty for io, use the folio helper and use the appropriate folio functions. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert insert_inline_extent() to use a folioJosef Bacik1-4/+7
We only use a page to copy in the data for the inline extent. Use a folio for this instead. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_set_range_writeback() to use a folioJosef Bacik1-6/+6
We already use a lot of functions here that use folios, update the function to use __filemap_get_folio instead of find_get_page and then use the folio directly. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert wait_subpage_spinlock() to only use a folioJosef Bacik1-6/+5
Currently this already uses a folio for most things, update it to take a folio and update all the page usage with the corresponding folio usage. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_get_extent() to take a folioJosef Bacik1-3/+3
We only pass this into read_inline_extent, change it to take a folio and update the callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert read_inline_extent() to use a folioJosef Bacik1-7/+7
Instead of using a page, use a folio instead, take a folio as an argument, and update the callers appropriately. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert uncompress_inline() to take a folioJosef Bacik1-4/+5
Update uncompress_inline to take a folio and update it's usage accordingly. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert struct btrfs_writepage_fixup to use a folioJosef Bacik1-4/+3
Now the fixup creator and consumer use folios, change this to use a folio as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_writepage_cow_fixup() to use folioJosef Bacik1-15/+16
Instead of a page, use a folio for btrfs_writepage_cow_fixup. We already have a folio at the only caller, and the fixup worker uses folios. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert btrfs_writepage_fixup_worker() to use a folioJosef Bacik1-26/+28
This function heavily messes with pages, instead update it to use a folio. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: convert submit_uncompressed_range() to take a folioJosef Bacik1-14/+12
This mostly uses folios already, update it to take a folio and update the rest of the function to use the folio instead of the page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>