<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/btrfs/ordered-data.h, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-03-18T19:35:48+00:00</updated>
<entry>
<title>btrfs: introduce a read path dedicated extent lock helper</title>
<updated>2025-03-18T19:35:48+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2025-02-10T08:22:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d2da21a6e06c40b9fd380ada93f1b48279e48b16'/>
<id>urn:sha1:d2da21a6e06c40b9fd380ada93f1b48279e48b16</id>
<content type='text'>
Currently we're using btrfs_lock_and_flush_ordered_range() for both
btrfs_read_folio() and btrfs_readahead(), but it has one critical
problem for future subpage optimizations:

- It will call btrfs_start_ordered_extent() to writeback the involved
  folios

  But remember we're calling btrfs_lock_and_flush_ordered_range() at
  read paths, meaning the folio is already locked by read path.

  If we really trigger writeback for those already locked folios, this
  will lead to a deadlock and writeback cannot get the folio lock.

  Such dead lock is prevented by the fact that btrfs always keeps a
  dirty folio also uptodate, by either dirtying all blocks of the folio,
  or by reading the whole folio before dirtying.

To prepare for the incoming patch which allows btrfs to skip full folio
read if the buffered write is block aligned, we have to start by solving
the possible deadlock first.

Instead of blindly calling btrfs_start_ordered_extent(), introduce a
new helper, which is smarter in the following ways:

- Only wait and flush the ordered extent if
  * The folio doesn't even have private bit set
  * Part of the blocks of the ordered extent are not uptodate

  This can happen by:
  * The folio writeback finished, then got invalidated.
    There are a lot of reasons that a folio can get invalidated,
    from memory pressure to direct IO (which invalidates all folios
    of the range).
    But OE not yet finished.

  We have to wait for the ordered extent, as the OE may contain
  to-be-inserted data checksum.
  Without waiting, our read can fail due to the missing checksum.

  But either way, the OE should not need any extra flush inside the
  locked folio range.

- Skip the ordered extent completely if
  * All the blocks are dirty
    This happens when OE creation is caused by a folio writeback whose
    file offset is before our folio.

    E.g. 16K page size and 4K block size

    0      8K      16K      24K     32K
    |//////////////||///////|       |

    The writeback of folio 0 created an OE for range [0, 24K), but since
    folio 16K is not fully uptodate, a read is triggered for folio 16K.

    The writeback will never happen (we're holding the folio lock for
    read), nor will the OE finish.

    Thus we must skip the range.

  * All the blocks are uptodate
    This happens when the writeback finished, but OE not yet finished.

    Since the blocks are already uptodate, we can skip the OE range.

The new helper lock_extents_for_read() will do a loop for the target
range by:

1) Lock the full range

2) If there is no ordered extent in the remaining range, exit

3) If there is an ordered extent that we can skip
   Skip to the end of the OE, and continue checking
   We do not trigger writeback nor wait for the OE.

4) If there is an ordered extent that we cannot skip
   Unlock the whole extent range and start the ordered extent.

And also update btrfs_start_ordered_extent() to add two more parameters:
@nowriteback_start and @nowriteback_len, to prevent triggering flush for
a certain range.

This will allow us to handle the following case properly in the future:

  16K page size, 4K btrfs block size:

  0     4K      8K     12K      16K      20K      24K     28K      32K
  |/////////////////////////////||////////////////|       |        |
  |&lt;-------------------- OE 2 -------------------&gt;|       |&lt; OE 1 &gt;|

  The folio has been written back before, thus we have an OE at
  [28K, 32K).
  Although the OE 1 finished its IO, the OE is not yet removed from IO
  tree.
  The folio got invalidated after writeback completed and before the
  ordered extent finished.

  And [16K, 24K) range is dirty and uptodate, caused by a block aligned
  buffered write (and future enhancements allowing btrfs to skip full
  folio read for such case).
  But writeback for folio 0 has began, thus it generated OE 2, covering
  range [0, 24K).

  Since the full folio 16K is not uptodate, if we want to read the folio,
  the existing btrfs_lock_and_flush_ordered_range() will dead lock, by:

  btrfs_read_folio()
  | Folio 16K is already locked
  |- btrfs_lock_and_flush_ordered_range()
     |- btrfs_start_ordered_extent() for range [16K, 24K)
        |- filemap_fdatawrite_range() for range [16K, 24K)
           |- extent_write_cache_pages()
              folio_lock() on folio 16K, deadlock.

  But now we will have the following sequence:

  btrfs_read_folio()
  | Folio 16K is already locked
  |- lock_extents_for_read()
     |- can_skip_ordered_extent() for range [16K, 24K)
     |  Returned true, the range [16K, 24K) will be skipped.
     |- can_skip_ordered_extent() for range [28K, 32K)
     |  Returned false.
     |- btrfs_start_ordered_extent() for range [28K, 32K) with
        [16K, 32K) as no writeback range
        No writeback for folio 16K will be triggered.

  And there will be no more possible deadlock on the same folio.

Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: update include and forward declarations in headers</title>
<updated>2025-03-18T19:35:43+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2025-02-12T20:22:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6149c82bdaff63203e58a4910d9245468d281f39'/>
<id>urn:sha1:6149c82bdaff63203e58a4910d9245468d281f39</id>
<content type='text'>
Pass over all header files and add missing forward declarations,
includes or fix include types.

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: convert btrfs_mark_ordered_io_finished() to take a folio</title>
<updated>2024-09-10T14:51:14+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2024-07-24T19:57:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a79228011c75f9123ba2dbfd010cba27ea87b973'/>
<id>urn:sha1:a79228011c75f9123ba2dbfd010cba27ea87b973</id>
<content type='text'>
We only need a folio now, make it take a folio as an argument and update
all of the callers.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: convert btrfs_finish_ordered_extent() to take a folio</title>
<updated>2024-09-10T14:51:14+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2024-07-24T19:53:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aef665d69ad15afaebdc2c32b3e58fc526ba6c3d'/>
<id>urn:sha1:aef665d69ad15afaebdc2c32b3e58fc526ba6c3d</id>
<content type='text'>
The callers and callee's of this now all use folios, update it to take a
folio as well.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: switch btrfs_ordered_extent::inode to struct btrfs_inode</title>
<updated>2024-07-11T13:33:28+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2022-01-31T17:57:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a1f4e3d7bd3bfa01d8b4be9b55c2af180f733ff1'/>
<id>urn:sha1:a1f4e3d7bd3bfa01d8b4be9b55c2af180f733ff1</id>
<content type='text'>
The structure is internal so we should use struct btrfs_inode for that,
allowing to remove some use of BTRFS_I.

Reviewed-by: Boris Burkov &lt;boris@bur.io&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: cleanup duplicated parameters related to create_io_em()</title>
<updated>2024-07-11T13:33:21+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2024-05-03T03:32:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9fec848b3a3343a1c680e26155aec26ae4866849'/>
<id>urn:sha1:9fec848b3a3343a1c680e26155aec26ae4866849</id>
<content type='text'>
Most parameters of create_io_em() can be replaced by the members with
the same name inside btrfs_file_extent.

Do a direct parameters cleanup here.

Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: cleanup duplicated parameters related to btrfs_alloc_ordered_extent</title>
<updated>2024-07-11T13:33:21+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2024-05-03T03:19:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e9ea31fb5c1f90a006dccb6972664086447f4911'/>
<id>urn:sha1:e9ea31fb5c1f90a006dccb6972664086447f4911</id>
<content type='text'>
All parameters after @filepos of btrfs_alloc_ordered_extent() can be
replaced with btrfs_file_extent structure.

This patch does the cleanup, meanwhile some points to note:

- Move btrfs_file_extent structure to ordered-data.h
  The structure is needed by both btrfs_alloc_ordered_extent() and
  can_nocow_extent(), but since btrfs_inode.h includes
  ordered-data.h, so we need to move the structure to ordered-data.h.

- Move the special handling of NOCOW/PREALLOC into
  btrfs_alloc_ordered_extent()
  This is to allow btrfs_split_ordered_extent() to properly split them
  for DIO.
  For now just move the handling into btrfs_alloc_ordered_extent() to
  simplify the callers.

Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: simplify range parameters of btrfs_wait_ordered_roots()</title>
<updated>2024-07-11T13:33:19+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2024-05-14T14:48:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42317ab440c110618b511290284c6d6c10bcffc7'/>
<id>urn:sha1:42317ab440c110618b511290284c6d6c10bcffc7</id>
<content type='text'>
The range is specified only in two ways, we can simplify the case for
the whole filesystem range as a NULL block group parameter.

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()</title>
<updated>2024-07-11T13:33:18+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2024-05-18T17:14:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e641e323abb3ceff6477a3936c4f5173126b7890'/>
<id>urn:sha1:e641e323abb3ceff6477a3936c4f5173126b7890</id>
<content type='text'>
Instead of passing a (VFS) inode pointer argument, pass a btrfs_inode
instead, as this is generally what we do for internal APIs, making it
more consistent with most of the code base. This will later allow to
help to remove a lot of BTRFS_I() calls in btrfs_sync_file().

Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: make btrfs_finish_ordered_extent() return void</title>
<updated>2024-07-11T13:33:18+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2024-05-15T12:54:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c41881ae07c828ee5da8ed1d44204911fdae3bcf'/>
<id>urn:sha1:c41881ae07c828ee5da8ed1d44204911fdae3bcf</id>
<content type='text'>
Currently btrfs_finish_ordered_extent() returns a boolean indicating if
the ordered extent was added to the work queue for completion, but none
of its callers cares about it, so make it return void.

Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
</feed>
