jbd: Issue cache flush after checkpointing

When we reach cleanup_journal_tail(), there is no guarantee that checkpointed buffers are on stable storage. Especially if the buffers were written out by log_do_checkpoint(), they are likely to be only in the disk's caches. Thus when we update the journal superblock, effectively removing old transactions from the journal, this write of the superblock can reach stable storage before those checkpointed buffers, which can result in filesystem corruption after a crash.

A similar problem can happen if we replay the journal and wipe it before flushing the disk's caches.

Thus we must unconditionally issue a cache flush before we update the journal superblock in these cases. The fix is slightly complicated by the fact that we have to get the log tail before we issue the cache flush, but we can store it in the journal superblock only after the cache flush. Otherwise we risk races where a new tail is written before the appropriate cache flush has finished.

I managed to reproduce the corruption using a somewhat tweaked version of Chris Mason's barrier-test scheduler. This should also fix occasional reports of 'Bit already freed' filesystem errors, which are totally unreproducible, but inspection of several fs images I've gathered over time points to a problem like this.

CC: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
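To make the ordering concrete, here is a toy model in C. It is a sketch under stated assumptions, not the jbd code: the disk, its cache, cleanup_tail(), the racer callback and every other name below are hypothetical. The model assumes the worst case described above, where the disk persists the superblock before the cached checkpointed blocks and then loses power. It lets you compare three orderings: no flush, a flush followed by re-reading the tail, and the order the message describes (sample the tail, flush, then write the sampled tail).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical toy model, not kernel code: a disk with a volatile write
 * cache, and a journal whose tail may only move past transaction t once
 * block t (that transaction's checkpointed buffer) is on stable storage.
 */
#define NTRANS 4
#define SB NTRANS                /* index of the journal superblock block */

struct toy_disk {
	int cache[NTRANS + 1];   /* contents as seen through the cache */
	bool dirty[NTRANS + 1];  /* in the cache, not yet on stable storage */
	int stable[NTRANS + 1];  /* what survives a power failure */
};

static void disk_write(struct toy_disk *d, int blk, int val)
{
	d->cache[blk] = val;
	d->dirty[blk] = true;
}

/* Cache flush: every cached block reaches stable storage. */
static void disk_flush(struct toy_disk *d)
{
	for (int i = 0; i <= NTRANS; i++)
		if (d->dirty[i]) {
			d->stable[i] = d->cache[i];
			d->dirty[i] = false;
		}
}

/* Worst case: the disk persists the superblock first, then power fails
 * and everything else still in the cache is lost. */
static void disk_persist_sb_then_crash(struct toy_disk *d)
{
	d->stable[SB] = d->cache[SB];
	memset(d->dirty, 0, sizeof(d->dirty));
	memcpy(d->cache, d->stable, sizeof(d->cache));
}

struct toy_journal {
	struct toy_disk disk;
	int checkpointed;        /* transactions [0, checkpointed) written out */
};

/* Checkpoint the next transaction: its buffer goes to the cache only. */
static void checkpoint_one(struct toy_journal *j)
{
	disk_write(&j->disk, j->checkpointed, 1);
	j->checkpointed++;
}

/*
 * Toy version of updating the journal tail. 'racer' stands in for another
 * checkpoint that runs after the flush but before the superblock write.
 */
static void cleanup_tail(struct toy_journal *j, bool flush, bool sample_first,
			 void (*racer)(struct toy_journal *))
{
	int tail = j->checkpointed;      /* tail sampled before the flush */

	if (flush)
		disk_flush(&j->disk);
	if (racer)
		racer(j);
	if (!sample_first)
		tail = j->checkpointed;  /* racy: tail re-read after the flush */
	disk_write(&j->disk, SB, tail);
}

/* After the crash, every transaction the tail marks as gone must have its
 * checkpointed block on stable storage. */
static bool consistent_after_crash(struct toy_journal *j)
{
	disk_persist_sb_then_crash(&j->disk);
	for (int t = 0; t < j->disk.stable[SB]; t++)
		if (j->disk.stable[t] != 1)
			return false;
	return true;
}
```

In this model only the last ordering leaves every transaction that the superblock tail drops with its checkpointed block on stable storage. Re-reading the tail after the flush lets a checkpoint that raced with the flush slip past it, which is the race the message warns about.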
author: Jan Kara <jack@suse.cz> 2011-11-26 03:35:39 +0400
committer: Jan Kara <jack@suse.cz> 2012-01-11 16:36:57 +0400
commit: 353b67d8ced4dc53281c88150ad295e24bc4b4c5
tree: a339a47a9899d01108c6167ffbbefcea07f63912 /fs/jbd/recovery.c
parent: e4e11180dfa545233e5145919b75b7fac88638df
1 file changed, 4 insertions, 0 deletions
diff --git a/fs/jbd/recovery.c b/fs/jbd/recovery.c
index 5b43e96788e6..008bf062fd26 100644
--- a/fs/jbd/recovery.c
+++ b/fs/jbd/recovery.c
@@ -20,6 +20,7 @@
 #include <linux/fs.h>
 #include <linux/jbd.h>
 #include <linux/errno.h>
+#include <linux/blkdev.h>
 #endif
 
 /*
@@ -263,6 +264,9 @@ int journal_recover(journal_t *journal)
 	err2 = sync_blockdev(journal->j_fs_dev);
 	if (!err)
 		err = err2;
+	/* Flush disk caches to get replayed data on the permanent storage */
+	if (journal->j_flags & JFS_BARRIER)
+		blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
 
 	return err;
 }
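The hunk above is the recovery half of the fix. As a minimal sketch, assuming a flag-level model of the device (hypothetical, not kernel code), the following shows why the flush has to happen before the journal is wiped: toy_recover() stands in for journal_recover(), flush_cache() for blkdev_issue_flush(), and the barrier parameter for the JFS_BARRIER flag. As in the patch, no flush is issued when barriers are disabled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, not kernel code: replay journal blocks into the
 * filesystem device, optionally flush the disk cache, then wipe the journal.
 */
struct toy_dev {
	bool replay_cached;   /* replayed blocks accepted into the disk cache */
	bool replay_stable;   /* replayed blocks on permanent storage */
	bool journal_valid;   /* journal still holds the committed transactions */
};

/* Replay plus a sync_blockdev()-like wait: the writes have completed from
 * the host's point of view but may still live only in the disk cache. */
static void replay_and_sync(struct toy_dev *d)
{
	d->replay_cached = true;
}

/* Stand-in for a cache flush: cached blocks become permanent. */
static void flush_cache(struct toy_dev *d)
{
	if (d->replay_cached)
		d->replay_stable = true;
}

/* Recovery with the patch applied; 'barrier' models JFS_BARRIER. */
static void toy_recover(struct toy_dev *d, bool barrier)
{
	replay_and_sync(d);
	if (barrier)
		flush_cache(d);
}

/* Worst case: the journal wipe reaches stable storage, then power fails
 * and whatever was only in the disk cache is lost. */
static void wipe_then_crash(struct toy_dev *d)
{
	d->journal_valid = false;
	d->replay_cached = d->replay_stable;
}

/* Replayed updates survive if they are on disk or can be replayed again. */
static bool updates_survive(const struct toy_dev *d)
{
	return d->replay_stable || d->journal_valid;
}
```

Without the flush, a crash after the wipe can lose both the replayed blocks (still only in the disk cache) and the journal copy that could have replayed them again.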