<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/btrfs, branch v7.1-rc5</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1-rc5'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-23T23:54:48+00:00</updated>
<entry>
<title>Merge tag 'for-7.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2026-05-23T23:54:48+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-05-23T23:54:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=400544639d2a11a9c1e276a912a9dff8fe4107dc'/>
<id>urn:sha1:400544639d2a11a9c1e276a912a9dff8fe4107dc</id>
<content type='text'>
Pull btrfs fixes from David Sterba:
 "A batch of fixes to simple quotas:

   - add conditional rescheduling point not dependent on the lock during
     inode iterations to avoid delays with PREEMPT_NONE enabled

   - fix subvolume deletion so it does not break the squota invariants

   - properly handle enabling squota, tracking extents in the initial
     transaction

   - catch and warn about underflows, clamp to zero to avoid further
     problems

  And one fix to inode size handling:

   - fix handling of preallocated extents beyond i_size when not using
     the no-holes feature"

* tag 'for-7.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: swallow btrfs_record_squota_delta() ENOENT
  btrfs: clamp to avoid squota underflow
  btrfs: fix squota accounting during enable generation
  btrfs: check for subvolume before deleting squota qgroup
  btrfs: always drop root-&gt;inodes lock before cond_resched()
  btrfs: mark file extent range dirty after converting prealloc extents
</content>
</entry>
<entry>
<title>btrfs: swallow btrfs_record_squota_delta() ENOENT</title>
<updated>2026-05-16T01:08:40+00:00</updated>
<author>
<name>Boris Burkov</name>
<email>boris@bur.io</email>
</author>
<published>2026-05-12T16:55:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f13342e15deafb7538a7a8577ed5f4c33c56f64e'/>
<id>urn:sha1:f13342e15deafb7538a7a8577ed5f4c33c56f64e</id>
<content type='text'>
I thought that it was likely I could harden squota deletion to the point
that it was impossible to end up with an extent accounted to a qgroup
outliving its qgroup. Several recent bugs have made me re-consider that
position.

Ultimately, this is a tradeoff between short term stability and long
term strictness, but I think given that there could be another layer of
bugs behind the 2-3 I just fixed, I would feel much more confident in
people using squotas if the risk was "your values can get a bit out of
whack which you can fix by deleting stuff or
disabling/re-enabling/repairing" vs "it will abort your filesystem".

As the final nail in the coffin, the Meta production kernel was lacking
earlier fixes from me and Qu regarding subvol qgroup lifetime, so this
is what we have been testing at scale, so I think at least for now
upstream should have the same extra layer of protection.

Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Boris Burkov &lt;boris@bur.io&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: clamp to avoid squota underflow</title>
<updated>2026-05-16T01:07:20+00:00</updated>
<author>
<name>Boris Burkov</name>
<email>boris@bur.io</email>
</author>
<published>2026-05-11T20:06:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=99aacd195141ff77295c535388888f072ec89e82'/>
<id>urn:sha1:99aacd195141ff77295c535388888f072ec89e82</id>
<content type='text'>
Simple quota accounting can undercount metadata tree block allocations
in certain scenarios. When an undercounted subvolume is deleted and its
tree blocks freed, the free deltas decrement rfer/excl past zero,
wrapping the u64 to a value near U64_MAX.

Once wrapped, can_delete_squota_qgroup() sees non-zero rfer and refuses
to delete the qgroup. The qgroup becomes permanently orphaned in the
quota tree, since there is no subvolume left to generate frees that
would bring the counter back to zero.

While we ultimately want to fix any mis-accounting at the source, it is
also helpful and worthwhile to mitigate the damage by clamping rfer and
excl to zero on underflow rather than allowing the u64 to wrap. This at
least allows us to clean up the messed up qgroups on subvol deletion.

Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Boris Burkov &lt;boris@bur.io&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: fix squota accounting during enable generation</title>
<updated>2026-05-16T01:07:19+00:00</updated>
<author>
<name>Boris Burkov</name>
<email>boris@bur.io</email>
</author>
<published>2026-05-12T02:53:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d7c600554816b8ef70adffe078a0e360c055d82b'/>
<id>urn:sha1:d7c600554816b8ef70adffe078a0e360c055d82b</id>
<content type='text'>
The first transaction that enables squotas is special and a bit tricky.
We have to set BTRFS_FS_QUOTA_ENABLED after the transaction to avoid a
deadlock, so any delayed refs that run before we set the bit are not
squota accounted. For data this is fine, we don't get an owner_ref, so
there is no real harm, it's as if the extent predated squotas. However
for metadata, the tree block will have gen == enable_gen so when we free
it later, we will decrement the squota accounting, which can result in
an underflow. Before it is freed, btrfs check shows errors, as we have
mismatched usage between the node generations/owners and the squota
values.

There are two angles to this fix:

1. For extents that come in delayed_refs that run during the
   enable_gen transaction, we must actually set enable_gen to the *next*
   transaction. That is the first transaction that we can really
   properly account in any way.
2. For extents that come in between the end of our transaction handle
   and the time we set the BTRFS_FS_QUOTA_ENABLED bit, we need an
   additional bit, BTRFS_FS_SQUOTA_ENABLING which only affects recording
   squota deltas, so we do pick up those extents. Otherwise, we would
   miss them, even for enable_gen + 1.

Fixes: bd7c1ea3a302 ("btrfs: qgroup: check generation when recording simple quota delta")
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Boris Burkov &lt;boris@bur.io&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: check for subvolume before deleting squota qgroup</title>
<updated>2026-05-16T01:07:17+00:00</updated>
<author>
<name>Boris Burkov</name>
<email>boris@bur.io</email>
</author>
<published>2026-05-11T20:07:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1e92637722ae4bd417f7a37e8d1485dc23b93935'/>
<id>urn:sha1:1e92637722ae4bd417f7a37e8d1485dc23b93935</id>
<content type='text'>
The invariant that we want to maintain with subvolume qgroups is that
the qgroup can only be deleted if there is no root. With squotas, we
thought that it was sufficient to just check the usage, because we
assumed that deleting a subvolume will drive it's qgroups usage to 0,
and thus 0 usage implies no subvolume.

However, this is false, for two reasons:

- A subvol whose extents are all from before squotas was enabled.
- A subvol that was created in this transaction and for which we have
  not yet run any delayed refs.

In both cases, deleting the qgroup breaks the desired invariant and we
are left with a subvolume with no qgroup but squotas are enabled.

Fix this by unifying the deletion check logic between full qgroups and
squotas. Squotas do all the same checks *and* the additional usage == 0
check, which is the one extra rule peculiar to squotas.

Link: https://lore.kernel.org/linux-btrfs/adnBhWfJQ1n3hZC8@merlins.org/
Fixes: a8df35619948 ("btrfs: forbid deleting live subvol qgroup")
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Boris Burkov &lt;boris@bur.io&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: always drop root-&gt;inodes lock before cond_resched()</title>
<updated>2026-05-16T01:06:56+00:00</updated>
<author>
<name>Boris Burkov</name>
<email>boris@bur.io</email>
</author>
<published>2026-05-08T20:11:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=975e63c7a8074d83e195577b7f76dadc9a3d14b7'/>
<id>urn:sha1:975e63c7a8074d83e195577b7f76dadc9a3d14b7</id>
<content type='text'>
find_first_inode() and find_first_inode_to_shrink() lock root-&gt;inodes,
then loop over them, occasionally skipping some inodes. When they skip
an inode, they attempt to share the cpu/lock with cond_resched_lock().

However, that has a subtle problem associated with it.
cond_resched_lock() only drops the lock if it needs to actually call
schedule(). With CONFIG_PREEMPT_NONE, this means the full timeslice as
detected at ticks. With 8+ cpus and default tunables, this is 2.8ms. So
regardless of HZ, we will run for at least 2.8ms in this loop without
dropping the lock, assuming it finds no suitable inodes. If HZ is
small enough, it might be even worse as the tick granularity becomes
bigger than the timeslice.

The knock-on effect of this is that callers to
btrfs_del_inode_from_root() like kswapd trying to shrink the inode slab
or userspace threads calling evict() will spin on xa_lock(&amp;root-&gt;inodes)
for 2.8ms, so the extent map shrinker dominates the lock even though
ostensibly it is intending to share it. This produces memory pressure as
there is only one kswapd and it runs sequentially so it can get stuck in
the inode slab shrinking.

To fix it, simply replace cond_resched_lock() with an open coded variant
which unconditionally does unlock/lock around cond_resched. Sharing the
lock is decoupled from sharing the CPU, and all the users of the lock
now share it fairly.

I was able to reproduce this on test systems by producing a lot of empty
files (to make a big root-&gt;inodes xarray), then producing memory
pressure by reading large files larger than ram, triggering kswapd and
the extent_map shrinker. The lock contention is visible with perf or
lockstat. This patch also relieved a user-apparent bottleneck on a
production system from the original report.

Tested-by: Rik van Riel &lt;riel@surriel.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Boris Burkov &lt;boris@bur.io&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: mark file extent range dirty after converting prealloc extents</title>
<updated>2026-05-16T01:06:37+00:00</updated>
<author>
<name>Robbie Ko</name>
<email>robbieko@synology.com</email>
</author>
<published>2026-05-08T13:42:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=080ecbd05432970a8df1f70586b925a18b5ea6f4'/>
<id>urn:sha1:080ecbd05432970a8df1f70586b925a18b5ea6f4</id>
<content type='text'>
When writing into a preallocated extent, ordered extent completion calls
btrfs_mark_extent_written() to convert the file extent item from the
BTRFS_FILE_EXTENT_PREALLOC type to the BTRFS_FILE_EXTENT_REG type.

If the preallocated extent was created beyond i_size with fallocate
keep-size, and the inode is evicted and loaded again before the write,
the inode's file_extent_tree is initialized only up to i_size.

The beyond i_size prealloc extent is therefore not tracked there.

After a write into that extent extends i_size, btrfs_mark_extent_written()
updates the file extent item, but the corresponding range is not marked
dirty in the inode's file_extent_tree.

This can leave disk_i_size stale when the filesystem does not use the
no-holes feature, so after remount the file size can go back to the old
value.

The following reproducer triggers the problem:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f -O ^no-holes $DEV
  mount $DEV $MNT

  touch $MNT/file
  fallocate -n -l 2M $MNT/file

  umount $MNT
  mount $DEV $MNT

  dd if=/dev/zero of=$MNT/file bs=1M count=1 conv=notrunc
  ls -lh $MNT/file

  umount $MNT
  mount $DEV $MNT

  ls -lh $MNT/file
  umount $MNT

Running the reproducer gives the following result:

  $ ./test.sh
  (...)
  1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.000596024 s, 1.8 GB/s
  -rw-rw-r-- 1 root root 1.0M May  8 16:34 /mnt/sdi/file
  -rw-rw-r-- 1 root root 0 May  8 16:34 /mnt/sdi/file

Fix this by marking the written range dirty in the inode's
file_extent_tree after successfully converting the prealloc extent to a
regular extent.

Fixes: 9ddc959e802b ("btrfs: use the file extent tree infrastructure")
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Robbie Ko &lt;robbieko@synology.com&gt;
[ Minor change log updates ]
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-7.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2026-05-15T20:22:07+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-05-15T20:22:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a8b0b72255d09bb12ada5620cd6ced91adde5ac8'/>
<id>urn:sha1:a8b0b72255d09bb12ada5620cd6ced91adde5ac8</id>
<content type='text'>
Pull btrfs fixes from David Sterba:

 - fixup warning when allocating memory for readahead, __GFP_NOWARN was
   accidentally dropped when setting mapping constraints

 - in tracepoint of file sync, fix sleeping in atomic context when
   handling dentries

 - harden initial loading of block group on crafted/fuzzed images,
   iterate all chunk mapping entries unconditionally

 - fix freeing pages of submitted io after checking for errors

 - fix incorrect inode size after remount when using fallocate KEEP_SIZE
   mode (also requires disabled 'no-holes' feature)

* tag 'for-7.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix incorrect i_size after remount caused by KEEP_SIZE prealloc gap
  btrfs: only release the dirty pages io tree after successful writes
  btrfs: tracepoints: fix sleep while in atomic context in btrfs_sync_file()
  btrfs: always pass __GFP_NOWARN from add_ra_bio_pages()
  btrfs: fix check_chunk_block_group_mappings() to iterate all chunk maps
</content>
</entry>
<entry>
<title>btrfs: fix incorrect i_size after remount caused by KEEP_SIZE prealloc gap</title>
<updated>2026-05-07T22:32:08+00:00</updated>
<author>
<name>Robbie Ko</name>
<email>robbieko@synology.com</email>
</author>
<published>2026-05-01T02:41:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c562ba61fc5e11798720acc1b172862158f1fa0b'/>
<id>urn:sha1:c562ba61fc5e11798720acc1b172862158f1fa0b</id>
<content type='text'>
When fallocate() with FALLOC_FL_KEEP_SIZE preallocates an extent past the
current i_size, the file_extent_tree of the inode is updated to cover
that range. However, on the next mount, btrfs_read_locked_inode() only
re-populates file_extent_tree with [0, round_up(i_size, sectorsize)),
losing the marks that belonged to the KEEP_SIZE prealloc extent beyond
i_size.

Later, when a non-KEEP_SIZE fallocate() extends i_size into / past that
old prealloc extent, the reservation loop in btrfs_fallocate() skips
already-prealloc segments and does not call into the path that marks the
file_extent_tree, so a gap remains inside the file_extent_tree across
[old_aligned_i_size, start_of_new_alloc). Then __btrfs_prealloc_file_range()
calls btrfs_inode_safe_disk_i_size_write(), which uses
find_contiguous_extent_bit() starting at offset 0 to derive disk_i_size.
The walk stops at the gap, so disk_i_size ends up smaller than i_size and
gets persisted. After the next mount, the file shows the wrong (smaller)
size.

The following reproducer triggers the problem:

  $ cat test.sh
  MNT=/mnt/sdi
  DEV=/dev/sdi

  mkdir -p $MNT
  mkfs.btrfs -f -O ^no-holes $DEV
  mount $DEV $MNT

  touch $MNT/file1
  # KEEP_SIZE prealloc beyond i_size (i_size stays 0)
  fallocate -n -o 4M -l 4M $MNT/file1
  umount $MNT
  mount $DEV $MNT

  # non-KEEP_SIZE fallocate that overlaps the previous prealloc tail
  # and extends past it
  fallocate -o 7M -l 2M $MNT/file1
  ls -lh $MNT/file1
  umount $MNT
  mount $DEV $MNT
  ls -lh $MNT/file1
  umount $MNT

Running the reproducer gives the following result:

  $ ./test.sh
  (...)
  -rw-rw-r-- 1 root root 9.0M May  4 16:35 /mnt/sdi/file1
  -rw-rw-r-- 1 root root 7.0M May  4 16:35 /mnt/sdi/file1

The size before the second mount is correct (9M), but after the
remount it drops to 7M, i.e. the start of the gap inside file_extent_tree.

Fix this in __btrfs_prealloc_file_range() by marking the entire range
[round_down(old_i_size, sectorsize), round_up(new_i_size, sectorsize))
in file_extent_tree before updating i_size and calling
btrfs_inode_safe_disk_i_size_write(). This ensures the contiguous bit
search starting from 0 is not truncated by a stale gap left behind by a
previous KEEP_SIZE prealloc that was not restored on inode load.

The fix has no effect when the NO_HOLES feature is enabled because
btrfs_inode_safe_disk_i_size_write() and
btrfs_inode_set_file_extent_range()
both take the fast path that directly tracks disk_i_size without
consulting file_extent_tree.

Fixes: 9ddc959e802b ("btrfs: use the file extent tree infrastructure")
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Robbie Ko &lt;robbieko@synology.com&gt;
[ Minor updates to the change log ]
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: only release the dirty pages io tree after successful writes</title>
<updated>2026-05-07T22:31:47+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2026-04-30T01:07:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4066c55e109475a06d18a1f127c939d551211956'/>
<id>urn:sha1:4066c55e109475a06d18a1f127c939d551211956</id>
<content type='text'>
[WARNING]
With extra warning on dirty extent buffers at umount (aka, the next
patch in the series), test case generic/388 can trigger the following
warning about dirty extent buffers at unmount time:

  BTRFS critical (device dm-2 state E): emergency shutdown
  BTRFS error (device dm-2 state E): error while writing out transaction: -30
  BTRFS warning (device dm-2 state E): Skipping commit of aborted transaction.
  BTRFS error (device dm-2 state EA): Transaction 9 aborted (error -30)
  BTRFS: error (device dm-2 state EA) in cleanup_transaction:2068: errno=-30 Readonly filesystem
  BTRFS info (device dm-2 state EA): forced readonly
  BTRFS info (device dm-2 state EA): last unmount of filesystem 4fbf2e15-f941-49a0-bc7c-716315d2777c
  ------------[ cut here ]------------
  WARNING: disk-io.c:3311 at invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs], CPU#8: umount/914368
  CPU: 8 UID: 0 PID: 914368 Comm: umount Tainted: G           OE       7.1.0-rc1-custom+ #372 PREEMPT(full)  2de38db8d1deae71fde295430a0ff3ab98ccf596
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
  RIP: 0010:invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs]
  Call Trace:
   &lt;TASK&gt;
   close_ctree+0x52e/0x574 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
   generic_shutdown_super+0x89/0x1a0
   kill_anon_super+0x16/0x40
   btrfs_kill_super+0x16/0x20 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
   deactivate_locked_super+0x2d/0xb0
   cleanup_mnt+0xdc/0x140
   task_work_run+0x5a/0xa0
   exit_to_user_mode_loop+0x123/0x4b0
   do_syscall_64+0x243/0x7c0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   &lt;/TASK&gt;
  ---[ end trace 0000000000000000 ]---
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30539776 owner 9 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30621696 owner 257 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30638080 owner 258 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30654464 owner 7 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30703616 owner 2 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30720000 owner 10 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30736384 owner 4 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30752768 owner 11 gen 9 refs 2 flags 0x7

I'm using a stripped down version, which seems to trigger the warning
more reliably:

  _fsstress_pid=""
  workload()
  {
  	dmesg -C
  	mkfs.btrfs -f -K $dev &gt; /dev/null
  	echo 1 &gt; /sys/kernel/debug/clear_warn_once
  	mount $dev $mnt
  	$fsstress -w -n 1024 -p 4 -d $mnt &amp;
  	_fsstress_pid=$!
  	sleep 0
  	$godown $mnt
  	pkill --echo -PIPE fsstress &gt; /dev/null
  	wait $_fsstress_pid
  	unset _fsstress_pid
  	umount $mnt

  	if dmesg | grep -q "WARNING"; then
  		fail
  	fi
  }

  for (( i = 0; i &lt; $runtime; i++ )); do
  	echo "=== $i/$runtime ==="
  	workload
  done

[CAUSE]
Inside btrfs_write_and_wait_transaction(), we first try to write all
dirty ebs, then wait for them to finish.

After that we call btrfs_extent_io_tree_release() to free all
extent states from dirty_pages io tree.

However if we hit an error from btrfs_write_marked_extent(), then we
still call btrfs_extent_io_tree_release() to clear that dirty_pages io
tree, which may contain dirty records that we haven't yet submitted.

Furthermore, the later transaction cleanup path will utilize that
dirty_pages io tree to properly cleanup those dirty ebs, but since it's
already empty, no dirty ebs are properly cleaned up, thus will later
trigger the warnings inside invalidate_btree_folios().

[FIX]
Normally such dirty ebs won't cause problems, as when the iput() is
called on the btree inode, the dirty ebs will be forcibly written back,
and since the fs is already in an error status, such writeback will not
reach disk and finish immediately.

But it's still better to get rid of such dirty ebs, if we ended up with
dirty ebs but the fs is not in an error status, then such writeback at
iput() time will be too late, as all workers are already stopped but
writeback will utilize workers, which will lead to NULL pointer
dereferences.

Instead of unconditionally calling btrfs_extent_io_tree_release(), only
call it if btrfs_write_and_wait_transaction() finished successfully, so
that @dirty_pages extent io tree is kept untouched for transaction
cleanup.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
</feed>
