ext4: prevent stale extent cache entries caused by concurrent I/O writeback

Currently, in the I/O writeback path, ext4_map_blocks() may attempt to cache additional unrelated extents in the extent status tree without holding the inode's i_rwsem and the mapping's invalidate_lock. This can leave stale extent status entries behind in certain scenarios, potentially causing data corruption.

For example, ext4_collapse_range() clears the extent cache and dirty pages before removing blocks and shifting extents, and it holds i_data_sem during these two operations. However, both ext4_ext_remove_space() and ext4_ext_shift_extents() may briefly release i_data_sem if journal credits are insufficient (ext4_datasem_ensure_credits()). If another writeback process writes dirty pages from other regions during this interval, it may cache extents that are about to be modified. Unless ext4_collapse_range() explicitly clears the extent cache again, these cached entries become stale and inconsistent with the actual extents.

    0 a  n       b         c  m
    | |  |       |         |  |
    [www][wwwwww][wwwwwwww]...[wwwww][wwww]...
         |                    |
         N                    M

Assume that block a is dirty. The collapse range operation is removing data from n to m and drops i_data_sem immediately after removing the extent from b to c. At the same time, a concurrent writeback begins to write back block a; it reloads the extent [n, b) into the extent status tree since it holds neither the i_rwsem nor the invalidate_lock. After the collapse range operation completes, the stale extent [n, b) is left behind; it maps logical block n to N, but the actual physical block of n should be M.

Similarly, both ext4_insert_range() and ext4_truncate() have the same problem. ext4_punch_hole() is unaffected because it re-adds a hole extent entry after removing space, since commit 9f1118223aa0 ("ext4: add a hole extent entry in cache after punch").

In most cases, during dirty page writeback, the block mapping information is likely to be found in the extent cache already, so there is little need to search the physical extents. Consequently, loading unrelated extents into the cache during writeback appears to bring little benefit. Therefore, fix this by adding EXT4_EX_NOCACHE in the writeback path to prevent caching of unrelated extents, eliminating this potential source of corruption.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250423085257.122685-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
author: Zhang Yi <yi.zhang@huawei.com> 2025-04-23 11:52:51 +0300
committer: Theodore Ts'o <tytso@mit.edu> 2025-05-14 17:42:12 +0300
commit: 402e38e6b71f5739119ca3107f375e112d63c7c5 (patch)
tree: ff4c4e88b5341b2db23cc9f4a8068a464244dd7f /fs/ext4/fast_commit.c
parent: 86b349ce0312a397a6961e457108556e44a3d211 (diff)
1 files changed, 2 insertions, 1 deletions
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index bfe5b3c40078..1392241de5e6 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -918,7 +918,8 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
 		map.m_lblk = cur_lblk_off;
 		map.m_len = new_blk_size - cur_lblk_off + 1;
 		ret = ext4_map_blocks(NULL, inode, &map,
-				      EXT4_GET_BLOCKS_IO_SUBMIT);
+				      EXT4_GET_BLOCKS_IO_SUBMIT |
+				      EXT4_EX_NOCACHE);
 		if (ret < 0)
 			return -ECANCELED;