<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/btrfs/relocation.c, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-10-13T20:29:03+00:00</updated>
<entry>
<title>btrfs: fix clearing of BTRFS_FS_RELOC_RUNNING if relocation already running</title>
<updated>2025-10-13T20:29:03+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2025-09-24T15:10:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7e5a5983edda664e8e4bb20af17b80f5135c655c'/>
<id>urn:sha1:7e5a5983edda664e8e4bb20af17b80f5135c655c</id>
<content type='text'>
When starting relocation, at reloc_chunk_start(), if we happen to find
the flag BTRFS_FS_RELOC_RUNNING is already set we return an error
(-EINPROGRESS) to the callers, however the callers call reloc_chunk_end()
which will clear the flag BTRFS_FS_RELOC_RUNNING, which is wrong since
relocation was started by another task and still running.

Finding the BTRFS_FS_RELOC_RUNNING flag already set is an unexpected
scenario, but still our current behaviour is not correct.

Fix this by never calling reloc_chunk_end() if reloc_chunk_start() has
returned an error, which is what logically makes sense, since the general
widespread pattern is to have end functions called only if the counterpart
start functions succeeded. This requires changing reloc_chunk_start() to
clear BTRFS_FS_RELOC_RUNNING if there's a pending cancel request.

Fixes: 907d2710d727 ("btrfs: add cancellable chunk relocation support")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov &lt;boris@bur.io&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: add unlikely annotations to branches leading to transaction abort</title>
<updated>2025-09-23T06:49:26+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2025-09-17T17:53:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a929904cf73b650f49cc60941e6e618240096fcb'/>
<id>urn:sha1:a929904cf73b650f49cc60941e6e618240096fcb</id>
<content type='text'>
The unlikely() annotation is a static prediction hint that compiler may
use to reorder code out of hot path. We use it elsewhere (namely
tree-checker.c) for error branches that almost never happen.

Transaction abort is one such error, the btrfs_abort_transaction()
inlines code to check the state and print a warning, this ought to be
out of the hot path.

The most common pattern is when transaction abort is called after
checking a return value and the control flow leads to a quick return.
In other cases it may not be necessary to add unlikely() e.g. when the
function returns anyway or the control flow is not changed noticeably.

Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: add unlikely annotations to branches leading to EIO</title>
<updated>2025-09-23T06:49:26+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2025-09-17T17:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc53bd2085c8fa7b199a9a8e10e634b62a6d3fa8'/>
<id>urn:sha1:cc53bd2085c8fa7b199a9a8e10e634b62a6d3fa8</id>
<content type='text'>
The unlikely() annotation is a static prediction hint that compiler may
use to reorder code out of hot path. We use it elsewhere (namely
tree-checker.c) for error branches that almost never happen, where
EIO is one of them.

Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: add unlikely annotations to branches leading to EUCLEAN</title>
<updated>2025-09-23T06:49:26+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2025-09-17T17:53:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9264d004a6c9788354b45553b9e4fe910e71b387'/>
<id>urn:sha1:9264d004a6c9788354b45553b9e4fe910e71b387</id>
<content type='text'>
The unlikely() annotation is a static prediction hint that compiler may
use to reorder code out of hot path. We use it elsewhere (namely
tree-checker.c) for error branches that almost never happen, where
EUCLEAN (a corruption) is one of them.

Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: more trivial BTRFS_PATH_AUTO_FREE conversions</title>
<updated>2025-09-23T06:49:26+00:00</updated>
<author>
<name>Sun YangKai</name>
<email>sunk67188@gmail.com</email>
</author>
<published>2025-09-22T11:30:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4ca6f24a52c4e94bd09f70cf132d0a38db7996b0'/>
<id>urn:sha1:4ca6f24a52c4e94bd09f70cf132d0a38db7996b0</id>
<content type='text'>
Trivial pattern for the auto freeing with goto -&gt; return conversions
if possible.

The following cases are considered trivial in this patch:

1. Cases where there are no operations between btrfs_free_path() and the
   function returns.
2. Cases where only simple cleanup operations (such as kfree(), kvfree(),
   clear_bit(), and fs_path_free()) are present between
   btrfs_free_path() and the function return.

Signed-off-by: Sun YangKai &lt;sunk67188@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: convert several int parameters to bool</title>
<updated>2025-09-22T08:54:32+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2025-08-13T10:41:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=67e78f983e6a1208c2f52adad4b06a388f4a15a5'/>
<id>urn:sha1:67e78f983e6a1208c2f52adad4b06a388f4a15a5</id>
<content type='text'>
We're almost done cleaning misused int/bool parameters. Convert a bunch
of them, found by manual grepping.  Note that btrfs_sync_fs() needs an
int as it's mandated by the struct super_operations prototype.

Reviewed-by: Boris Burkov &lt;boris@bur.io&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: do not allow relocation of partially dropped subvolumes</title>
<updated>2025-08-07T15:07:15+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2025-07-25T11:03:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4289b494ac553e74e86fed1c66b2bf9530bc1082'/>
<id>urn:sha1:4289b494ac553e74e86fed1c66b2bf9530bc1082</id>
<content type='text'>
[BUG]
There is an internal report that balance triggered transaction abort,
with the following call trace:

  item 85 key (594509824 169 0) itemoff 12599 itemsize 33
          extent refs 1 gen 197740 flags 2
          ref#0: tree block backref root 7
  item 86 key (594558976 169 0) itemoff 12566 itemsize 33
          extent refs 1 gen 197522 flags 2
          ref#0: tree block backref root 7
 ...
 BTRFS error (device loop0): extent item not found for insert, bytenr 594526208 num_bytes 16384 parent 449921024 root_objectid 934 owner 1 offset 0
 BTRFS error (device loop0): failed to run delayed ref for logical 594526208 num_bytes 16384 type 182 action 1 ref_mod 1: -117
 ------------[ cut here ]------------
 BTRFS: Transaction aborted (error -117)
 WARNING: CPU: 1 PID: 6963 at ../fs/btrfs/extent-tree.c:2168 btrfs_run_delayed_refs+0xfa/0x110 [btrfs]

And btrfs check doesn't report anything wrong related to the extent
tree.

[CAUSE]
The cause is a little complex, firstly the extent tree indeed doesn't
have the backref for 594526208.

The extent tree only have the following two backrefs around that bytenr
on-disk:

        item 65 key (594509824 METADATA_ITEM 0) itemoff 13880 itemsize 33
                refs 1 gen 197740 flags TREE_BLOCK
                tree block skinny level 0
                (176 0x7) tree block backref root CSUM_TREE
        item 66 key (594558976 METADATA_ITEM 0) itemoff 13847 itemsize 33
                refs 1 gen 197522 flags TREE_BLOCK
                tree block skinny level 0
                (176 0x7) tree block backref root CSUM_TREE

But the such missing backref item is not an corruption on disk, as the
offending delayed ref belongs to subvolume 934, and that subvolume is
being dropped:

        item 0 key (934 ROOT_ITEM 198229) itemoff 15844 itemsize 439
                generation 198229 root_dirid 256 bytenr 10741039104 byte_limit 0 bytes_used 345571328
                last_snapshot 198229 flags 0x1000000000001(RDONLY) refs 0
                drop_progress key (206324 EXTENT_DATA 2711650304) drop_level 2
                level 2 generation_v2 198229

And that offending tree block 594526208 is inside the dropped range of
that subvolume.  That explains why there is no backref item for that
bytenr and why btrfs check is not reporting anything wrong.

But this also shows another problem, as btrfs will do all the orphan
subvolume cleanup at a read-write mount.

So half-dropped subvolume should not exist after an RW mount, and
balance itself is also exclusive to subvolume cleanup, meaning we
shouldn't hit a subvolume half-dropped during relocation.

The root cause is, there is no orphan item for this subvolume.
In fact there are 5 subvolumes from around 2021 that have the same
problem.

It looks like the original report has some older kernels running, and
caused those zombie subvolumes.

Thankfully upstream commit 8d488a8c7ba2 ("btrfs: fix subvolume/snapshot
deletion not triggered on mount") has long fixed the bug.

[ENHANCEMENT]
For repairing such old fs, btrfs-progs will be enhanced.

Considering how delayed the problem will show up (at run delayed ref
time) and at that time we have to abort transaction already, it is too
late.

Instead here we reject any half-dropped subvolume for reloc tree at the
earliest time, preventing confusion and extra time wasted on debugging
similar bugs.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: enable large data folios for data reloc inode</title>
<updated>2025-07-21T23:13:03+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2025-07-16T07:25:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=041c39da53c25f94a4705466b7f980c979215767'/>
<id>urn:sha1:041c39da53c25f94a4705466b7f980c979215767</id>
<content type='text'>
For data reloc inodes, they are a special type of inodes that are not
exposed to user space, and are only utilized during data block groups
relocation.

They do not go under regular read-write operations, but have their file
extents manually created to have the same layout of a block group, then
its content is read from the original block group, and written back to
the new location which is in a new block group.

Previously all the handling was done in page units, and commit
c2832898126f ("btrfs: make relocate_one_page() handle subpage case")
changed the handling to subpage blocks.

On the other hand, data reloc inodes are a perfect match for large data
folios, as each relocation cluster represents one or more data extents
that are contiguous in their logical addresses.

This patch enables large folios for data reloc inodes by:

- Remove the special handling of data reloc inodes when setting folio
  order

- Change relocate_one_folio() to return the file offset of the next
  folio
  Originally it's designed to handle fixed page sized blocks, but with
  large folios, we can handle a large folio, thus we have to return the
  end of the current folio.

- Remove the warning on folio_order()

- Use folio_size() to replace fixed PAGE_SIZE usage

- Use file_offset as iterator inside relocate_file_extent_cluster

Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: reloc: unconditionally invalidate the page cache for each cluster</title>
<updated>2025-07-21T23:13:03+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2025-07-15T03:48:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4e346baee95f4688b36c9bd95664774c3d105c3d'/>
<id>urn:sha1:4e346baee95f4688b36c9bd95664774c3d105c3d</id>
<content type='text'>
Commit 9d9ea1e68a05 ("btrfs: subpage: fix relocation potentially
overwriting last page data") fixed a bug when relocating data block
groups for subpage cases.

However for the incoming large folios for data reloc inode, we can hit
the same situation where block size is the same as page size, but the
folio we got is still larger than a block.

In that case, the old subpage specific check is no longer reliable.

Here we have to enhance the handling by:

- Unconditionally invalidate the page cache for the current cluster
  We set the @flush to true so that any dirty folios are properly
  written back first.

  And this time instead of dropping the whole page cache, just drop the
  range covered by the current cluster.

  This will bring some minor performance drop, as for a large folio, the
  heading half will be read twice (read by previous cluster, then
  invalidated, then read again by the current cluster).

  However that is required to support large folios, and this gets rid of
  the kinda tricky manual uptodate flag clearing for each block.

- Remove the special handling of writing back the whole page cache
  filemap_invalidate_inode() handles the write back already, and since
  we're invalidating all pages in the range, we no longer need to
  manually clear the uptodate flags for involved blocks.

  Thus there is no need to manually write back the whole page cache.

Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: remove btrfs_clear_extent_bits()</title>
<updated>2025-07-21T22:09:22+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2025-07-11T08:46:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d6be378de06c8ae72c46d528f69cc53fa382b01c'/>
<id>urn:sha1:d6be378de06c8ae72c46d528f69cc53fa382b01c</id>
<content type='text'>
It's just a simple wrapper around btrfs_clear_extent_bit() that passes a
NULL for its last argument (a cached extent state record), plus there is
not counter part - we have a btrfs_set_extent_bit() but we do not have a
btrfs_set_extent_bits() (plural version). So just remove it and make all
callers use btrfs_clear_extent_bit() directly.

Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
</feed>
