kernel/linux.git/fs/btrfs/ordered-data.h, branch v7.1-rc5

btrfs: check type flags in alloc_ordered_extent()

2026-04-07T16:56:02+00:00

Unlike other flags used in btrfs, BTRFS_ORDERED_* macros are different as they cannot be directly used as flags. They are defined as bit values, thus they should be utilized with bit operations, not directly with logical operations. Unfortunately sometimes I forgot this and passed the incorrect flags to alloc_ordered_extent() and hit weird bugs. Enhance the type checks in alloc_ordered_extent(): - Make sure there is one and only one bit set for exclusive type flags There are four exclusive type flags, REGULAR, NOCOW, PREALLOC and COMPRESSED. So introduce a new macro, BTRFS_ORDERED_EXCLUSIVE_FLAGS, to cover above flags. Add an ASSERT() to check one and only one of those exclusive flags can be set for alloc_ordered_extent(). - Re-order the type bit numbers to the end of the enum This is make it much harder to get a valid false negative. E.g., with the old code BTRFS_ORDERED_REGULAR starts at zero, we can have the following flags passing the bit uniqueness check: * BTRFS_ORDERED_NOCOW Be treated as BTRFS_ORDERED_REGULAR (1 == 1UL << 0). * BTRFS_ORDERED_PREALLOC Be treated as BTRFS_ORDERED_NOCOW (2 == 1UL << 1). * BTRFS_ORDERED_DIRECT Be treated as BTRFS_ORDERED_PREALLOC (4 == 1UL << 2). Now all those types start at 8, passing any of those bit numbers as flags directly will not pass the ASSERT(). - Add a static assert to avoid overflow To make sure all BTRFS_ORDERED_* flags can fit into an unsigned long. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: remove folio parameter from ordered io related functions

2026-04-07T16:55:57+00:00

Both functions btrfs_finish_ordered_extent() and btrfs_mark_ordered_io_finished() are accepting an optional folio parameter. That @folio is passed into can_finish_ordered_extent(), which later will test and clear the ordered flag for the involved range. However I do not think there is any other call site that can clear ordered flags of an page cache folio and can affect can_finish_ordered_extent(). There are limited *_clear_ordered() callers out of can_finish_ordered_extent() function: - btrfs_migrate_folio() This is completely unrelated, it's just migrating the ordered flag to the new folio. - btrfs_cleanup_ordered_extents() We manually clean the ordered flags of all involved folios, then call btrfs_mark_ordered_io_finished() without a @folio parameter. So it doesn't need and didn't pass a @folio parameter in the first place. - btrfs_writepage_fixup_worker() This function is going to be removed soon, and we should not hit that function anymore. - btrfs_invalidate_folio() This is the real call site we need to bother with. If we already have a bio running, btrfs_finish_ordered_extent() in end_bbio_data_write() will be executed first, as btrfs_invalidate_folio() will wait for the writeback to finish. Thus if there is a running bio, it will not see the range has ordered flags, and just skip to the next range. If there is no bio running, meaning the ordered extent is created but the folio is not yet submitted. In that case btrfs_invalidate_folio() will manually clear the folio ordered range, but then manually finish the ordered extent with btrfs_dec_test_ordered_pending() without bothering the folio ordered flags. Meaning if the OE range with folio ordered flags will be finished manually without the need to call can_finish_ordered_extent(). This means all can_finish_ordered_extent() call sites should get a range that has folio ordered flag set, thus the old "return false" branch should never be triggered. Now we can: - Remove the @folio parameter from involved functions * btrfs_mark_ordered_io_finished() * btrfs_finish_ordered_extent() For call sites passing a @folio into those functions, let them manually clear the ordered flag of involved folios. - Move btrfs_finish_ordered_extent() out of the loop in end_bbio_data_write() We only need to call btrfs_finish_ordered_extent() once per bbio, not per folio. - Add an ASSERT() to make sure all folio ranges have ordered flags It's only for end_bbio_data_write(). And we already have enough safe nets to catch over-accounting of ordered extents. Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

btrfs: remove the btrfs_inode parameter from btrfs_remove_ordered_extent()

2026-04-07T16:55:57+00:00

We already have btrfs_ordered_extent::inode, thus there is no need to pass a btrfs_inode parameter to btrfs_remove_ordered_extent(). Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

btrfs: rename btrfs_ordered_extent::list to csum_list

2026-04-07T16:55:56+00:00

That list head records all pending checksums for that ordered extent. And unlike other lists, we just use the name "list", which can be very confusing for readers. Rename it to "csum_list" which follows the remaining lists, showing the purpose of the list. And since we're here, remove a comment inside btrfs_finish_ordered_zoned() where we have "ASSERT(!list_empty(&ordered->csum_list))" to make sure the OE has pending csums. That comment is only here to make sure we do not call list_first_entry() before checking BTRFS_ORDERED_PREALLOC. But since we already have that bit checked and even have a dedicated ASSERT(), there is no need for that comment anymore. Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

btrfs: introduce a read path dedicated extent lock helper

2025-03-18T19:35:48+00:00

Currently we're using btrfs_lock_and_flush_ordered_range() for both btrfs_read_folio() and btrfs_readahead(), but it has one critical problem for future subpage optimizations: - It will call btrfs_start_ordered_extent() to writeback the involved folios But remember we're calling btrfs_lock_and_flush_ordered_range() at read paths, meaning the folio is already locked by read path. If we really trigger writeback for those already locked folios, this will lead to a deadlock and writeback cannot get the folio lock. Such dead lock is prevented by the fact that btrfs always keeps a dirty folio also uptodate, by either dirtying all blocks of the folio, or by reading the whole folio before dirtying. To prepare for the incoming patch which allows btrfs to skip full folio read if the buffered write is block aligned, we have to start by solving the possible deadlock first. Instead of blindly calling btrfs_start_ordered_extent(), introduce a new helper, which is smarter in the following ways: - Only wait and flush the ordered extent if * The folio doesn't even have private bit set * Part of the blocks of the ordered extent are not uptodate This can happen by: * The folio writeback finished, then got invalidated. There are a lot of reasons that a folio can get invalidated, from memory pressure to direct IO (which invalidates all folios of the range). But OE not yet finished. We have to wait for the ordered extent, as the OE may contain to-be-inserted data checksum. Without waiting, our read can fail due to the missing checksum. But either way, the OE should not need any extra flush inside the locked folio range. - Skip the ordered extent completely if * All the blocks are dirty This happens when OE creation is caused by a folio writeback whose file offset is before our folio. E.g. 16K page size and 4K block size 0 8K 16K 24K 32K |//////////////||///////| | The writeback of folio 0 created an OE for range [0, 24K), but since folio 16K is not fully uptodate, a read is triggered for folio 16K. The writeback will never happen (we're holding the folio lock for read), nor will the OE finish. Thus we must skip the range. * All the blocks are uptodate This happens when the writeback finished, but OE not yet finished. Since the blocks are already uptodate, we can skip the OE range. The new helper lock_extents_for_read() will do a loop for the target range by: 1) Lock the full range 2) If there is no ordered extent in the remaining range, exit 3) If there is an ordered extent that we can skip Skip to the end of the OE, and continue checking We do not trigger writeback nor wait for the OE. 4) If there is an ordered extent that we cannot skip Unlock the whole extent range and start the ordered extent. And also update btrfs_start_ordered_extent() to add two more parameters: @nowriteback_start and @nowriteback_len, to prevent triggering flush for a certain range. This will allow us to handle the following case properly in the future: 16K page size, 4K btrfs block size: 0 4K 8K 12K 16K 20K 24K 28K 32K |/////////////////////////////||////////////////| | | |<-------------------- OE 2 ------------------->| |< OE 1 >| The folio has been written back before, thus we have an OE at [28K, 32K). Although the OE 1 finished its IO, the OE is not yet removed from IO tree. The folio got invalidated after writeback completed and before the ordered extent finished. And [16K, 24K) range is dirty and uptodate, caused by a block aligned buffered write (and future enhancements allowing btrfs to skip full folio read for such case). But writeback for folio 0 has began, thus it generated OE 2, covering range [0, 24K). Since the full folio 16K is not uptodate, if we want to read the folio, the existing btrfs_lock_and_flush_ordered_range() will dead lock, by: btrfs_read_folio() | Folio 16K is already locked |- btrfs_lock_and_flush_ordered_range() |- btrfs_start_ordered_extent() for range [16K, 24K) |- filemap_fdatawrite_range() for range [16K, 24K) |- extent_write_cache_pages() folio_lock() on folio 16K, deadlock. But now we will have the following sequence: btrfs_read_folio() | Folio 16K is already locked |- lock_extents_for_read() |- can_skip_ordered_extent() for range [16K, 24K) | Returned true, the range [16K, 24K) will be skipped. |- can_skip_ordered_extent() for range [28K, 32K) | Returned false. |- btrfs_start_ordered_extent() for range [28K, 32K) with [16K, 32K) as no writeback range No writeback for folio 16K will be triggered. And there will be no more possible deadlock on the same folio. Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

btrfs: update include and forward declarations in headers

2025-03-18T19:35:43+00:00

Pass over all header files and add missing forward declarations, includes or fix include types. Signed-off-by: David Sterba

btrfs: convert btrfs_mark_ordered_io_finished() to take a folio

2024-09-10T14:51:14+00:00

We only need a folio now, make it take a folio as an argument and update all of the callers. Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: convert btrfs_finish_ordered_extent() to take a folio

2024-09-10T14:51:14+00:00

The callers and callee's of this now all use folios, update it to take a folio as well. Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: switch btrfs_ordered_extent::inode to struct btrfs_inode

2024-07-11T13:33:28+00:00

The structure is internal so we should use struct btrfs_inode for that, allowing to remove some use of BTRFS_I. Reviewed-by: Boris Burkov Signed-off-by: David Sterba

btrfs: cleanup duplicated parameters related to create_io_em()

2024-07-11T13:33:21+00:00

Most parameters of create_io_em() can be replaced by the members with the same name inside btrfs_file_extent. Do a direct parameters cleanup here. Reviewed-by: Johannes Thumshirn Reviewed-by: Filipe Manana Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba