kernel/linux.git/fs/btrfs/ordered-data.c, branch v6.1.168

btrfs: fix qgroup reservation leak on failure to allocate ordered extent

2025-08-28T14:26:10+00:00

[ Upstream commit 1f2889f5594a2bc4c6a52634c4a51b93e785def5 ] If we fail to allocate an ordered extent for a COW write we end up leaking a qgroup data reservation since we called btrfs_qgroup_release_data() but we didn't call btrfs_qgroup_free_refroot() (which would happen when running the respective data delayed ref created by ordered extent completion or when finishing the ordered extent in case an error happened). So make sure we call btrfs_qgroup_free_refroot() if we fail to allocate an ordered extent for a COW write. Fixes: 7dbeaad0af7d ("btrfs: change timing for qgroup reserved space for ordered extents to fix reserved space leak") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana Signed-off-by: David Sterba [ adjust to code movements ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

btrfs: fix qgroup_free_reserved_data int overflow

2024-01-10T16:10:35+00:00

[ Upstream commit 9e65bfca24cf1d77e4a5c7a170db5867377b3fe7 ] The reserved data counter and input parameter is a u64, but we inadvertently accumulate it in an int. Overflowing that int results in freeing the wrong amount of data and breaking reserve accounting. Unfortunately, this overflow rot spreads from there, as the qgroup release/free functions rely on returning an int to take advantage of negative values for error codes. Therefore, the full fix is to return the "released" or "freed" amount by a u64 argument and to return 0 or negative error code via the return value. Most of the call sites simply ignore the return value, though some of them handle the error and count the returned bytes. Change all of them accordingly. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo Signed-off-by: Boris Burkov Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin

btrfs: free qgroup reserve when ORDERED_IOERR is set

2023-12-20T16:00:26+00:00

commit f63e1164b90b385cd832ff0fdfcfa76c3cc15436 upstream. An ordered extent completing is a critical moment in qgroup reserve handling, as the ownership of the reservation is handed off from the ordered extent to the delayed ref. In the happy path we release (unlock) but do not free (decrement counter) the reservation, and the delayed ref drives the free. However, on an error, we don't create a delayed ref, since there is no ref to add. Therefore, free on the error path. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo Signed-off-by: Boris Burkov Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman

btrfs: check for BTRFS_FS_ERROR in pending ordered assert

2023-09-23T09:11:11+00:00

commit 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab upstream. If we do fast tree logging we increment a counter on the current transaction for every ordered extent we need to wait for. This means we expect the transaction to still be there when we clear pending on the ordered extent. However if we happen to abort the transaction and clean it up, there could be no running transaction, and thus we'll trip the "ASSERT(trans)" check. This is obviously incorrect, and the code properly deals with the case that the transaction doesn't exist. Fix this ASSERT() to only fire if there's no trans and we don't have BTRFS_FS_ERROR() set on the file system. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman

btrfs: add btrfs_try_lock_ordered_range

2022-09-29T15:08:28+00:00

For IOCB_NOWAIT we're going to want to use try lock on the extent lock, and simply bail if there's an ordered extent in the range because the only choice there is to wait for the ordered extent to complete. Reviewed-by: Filipe Manana Signed-off-by: Josef Bacik Signed-off-by: Stefan Roesch Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: unify the lock/unlock extent variants

2022-09-26T10:28:05+00:00

We have two variants of lock/unlock extent, one set that takes a cached state, another that does not. This is slightly annoying, and generally speaking there are only a few places where we don't have a cached state. Simplify this by making lock_extent/unlock_extent the only variant and make it take a cached state, then convert all the callers appropriately. Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: add lockdep annotations for the ordered extents wait event

2022-09-26T10:27:53+00:00

This wait event is very similar to the pending ordered wait event in the sense that it occurs in a different context than the condition signaling for the event. The signaling occurs in btrfs_remove_ordered_extent() while the wait event is implemented in btrfs_start_ordered_extent() in fs/btrfs/ordered-data.c However, in this case a thread must not acquire the lockdep map for the ordered extents wait event when the ordered extent is related to a free space inode. That is because lockdep creates dependencies between locks acquired both in execution paths related to normal inodes and paths related to free space inodes, thus leading to false positives. Reviewed-by: Josef Bacik Signed-off-by: Ioannis Angelakopoulos Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: add lockdep annotations for pending_ordered wait event

2022-09-26T10:27:53+00:00

In contrast to the num_writers and num_extwriters wait events, the condition for the pending ordered wait event is signaled in a different context from the wait event itself. The condition signaling occurs in btrfs_remove_ordered_extent() in fs/btrfs/ordered-data.c while the wait event is implemented in btrfs_commit_transaction() in fs/btrfs/transaction.c Thus the thread signaling the condition has to acquire the lockdep map as a reader at the start of btrfs_remove_ordered_extent() and release it after it has signaled the condition. In this case some dependencies might be left out due to the placement of the annotation, but it is better than no annotation at all. Reviewed-by: Josef Bacik Signed-off-by: Ioannis Angelakopoulos Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: remove the finish_func argument to btrfs_mark_ordered_io_finished

2022-07-25T15:45:37+00:00

finish_func is always set to finish_ordered_fn, so remove it and also the now pointless and somewhat confusingly named __endio_write_update_ordered wrapper. Reviewed-by: Nikolay Borisov Signed-off-by: Christoph Hellwig Signed-off-by: David Sterba

btrfs: add tracepoints for ordered extents

2022-07-25T15:45:34+00:00

When debugging a reference counting issue with ordered extents, I've found we're lacking a lot of tracepoint coverage in the ordered extent code. Close these gaps by adding tracepoints after every refcount_inc() in the ordered extent code. Reviewed-by: Boris Burkov Reviewed-by: Qu Wenruo Reviewed-by: Anand Jain Signed-off-by: Johannes Thumshirn Reviewed-by: David Sterba Signed-off-by: David Sterba