xfs: recalculate free rt extents after log recovery

I've been observing periodic corruption reports from xfs_scrub involving the free rt extent counter (frextents) while running xfs/141. That test uses an error injection knob to induce a torn write to the log, and an arbitrary number of recovery mounts, frextents will count fewer free rt extents than can be found the rtbitmap. The root cause of the problem is a combination of the misuse of sb_frextents in the incore mount to reflect both incore reservations made by running transactions as well as the actual count of free rt extents on disk. The following sequence can reproduce the undercount: Thread 1 Thread 2 xfs_trans_alloc(rtextents=3) xfs_mod_frextents(-3) <blocks> xfs_attr_set() xfs_bmap_attr_addfork() xfs_add_attr2() xfs_log_sb() xfs_sb_to_disk() xfs_trans_commit() <log flushed to disk> <log goes down> Note that thread 1 subtracts 3 from sb_frextents even though it never commits to using that space. Thread 2 writes the undercounted value to the ondisk superblock and logs it to the xattr transaction, which is then flushed to disk. At next mount, log recovery will find the logged superblock and write that back into the filesystem. At the end of log recovery, we reread the superblock and install the recovered undercounted frextents value into the incore superblock. From that point on, we've effectively leaked thread 1's transaction reservation. The correct fix for this is to separate the incore reservation from the ondisk usage, but that's a matter for the next patch. Because the kernel has been logging superblocks with undercounted frextents for a very long time and we don't demand that sysadmins run xfs_repair after a crash, fix the undercount by recomputing frextents after log recovery. Gating this on log recovery is a reasonable balance (I think) between correcting the problem and slowing down every mount attempt. Note that xfs_repair will fix undercounted frextents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
author: Darrick J. Wong <djwong@kernel.org> 2022-04-11 23:49:42 +0300
committer: Dave Chinner <david@fromorbit.com> 2022-04-11 23:49:42 +0300
commit: 5a605fd6cb1da0ec9cb6e54c06bcf58f706d2f83 (patch)
tree: 6c8eb64e5290bb466b46aab89c6917c40e6c5245 /fs/xfs/xfs_rtalloc.c
parent: f34061f554feba68e12b7a73008c350d2a9afd0c (diff)
download: linux-5a605fd6cb1da0ec9cb6e54c06bcf58f706d2f83.tar.xz
1 files changed, 37 insertions, 0 deletions
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index b8c79ee791af..76f50e75f99c 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1284,6 +1284,43 @@ xfs_rtmount_init(
 	return 0;
 }
 
+static int
+xfs_rtalloc_count_frextent(
+	struct xfs_mount		*mp,
+	struct xfs_trans		*tp,
+	const struct xfs_rtalloc_rec	*rec,
+	void				*priv)
+{
+	uint64_t			*valp = priv;
+
+	*valp += rec->ar_extcount;
+	return 0;
+}
+
+/*
+ * Reinitialize the number of free realtime extents from the realtime bitmap.
+ * Callers must ensure that there is no other activity in the filesystem.
+ */
+int
+xfs_rtalloc_reinit_frextents(
+	struct xfs_mount	*mp)
+{
+	uint64_t		val = 0;
+	int			error;
+
+	xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL);
+	error = xfs_rtalloc_query_all(mp, NULL, xfs_rtalloc_count_frextent,
+			&val);
+	xfs_iunlock(mp->m_rbmip, XFS_ILOCK_EXCL);
+	if (error)
+		return error;
+
+	spin_lock(&mp->m_sb_lock);
+	mp->m_sb.sb_frextents = val;
+	spin_unlock(&mp->m_sb_lock);
+	return 0;
+}
+
 /*
  * Get the bitmap and summary inodes and the summary cache into the mount
  * structure at mount time.
author	Darrick J. Wong <djwong@kernel.org>	2022-04-11 23:49:42 +0300
committer	Dave Chinner <david@fromorbit.com>	2022-04-11 23:49:42 +0300
commit	5a605fd6cb1da0ec9cb6e54c06bcf58f706d2f83 (patch)
tree	6c8eb64e5290bb466b46aab89c6917c40e6c5245 /fs/xfs/xfs_rtalloc.c
parent	f34061f554feba68e12b7a73008c350d2a9afd0c (diff)
download	linux-5a605fd6cb1da0ec9cb6e54c06bcf58f706d2f83.tar.xz