summaryrefslogtreecommitdiff
path: root/fs/ocfs2/journal.c
AgeCommit message (Collapse)AuthorFilesLines
2020-01-23ocfs2: call journal flush to mark journal as empty after journal recovery ↵Kai Li1-0/+8
when mount [ Upstream commit 397eac17f86f404f5ba31d8c3e39ec3124b39fd3 ] If journal is dirty when mount, it will be replayed but jbd2 sb log tail cannot be updated to mark a new start because journal->j_flag has already been set with JBD2_ABORT first in journal_init_common. When a new transaction is committed, it will be recored in block 1 first(journal->j_tail is set to 1 in journal_reset). If emergency restart happens again before journal super block is updated unfortunately, the new recorded trans will not be replayed in the next mount. The following steps describe this procedure in detail. 1. mount and touch some files 2. these transactions are committed to journal area but not checkpointed 3. emergency restart 4. mount again and its journals are replayed 5. journal super block's first s_start is 1, but its s_seq is not updated 6. touch a new file and its trans is committed but not checkpointed 7. emergency restart again 8. mount and journal is dirty, but trans committed in 6 will not be replayed. This exception happens easily when this lun is used by only one node. If it is used by multi-nodes, other node will replay its journal and its journal super block will be updated after recovery like what this patch does. ocfs2_recover_node->ocfs2_replay_journal. The following jbd2 journal can be generated by touching a new file after journal is replayed, and seq 15 is the first valid commit, but first seq is 13 in journal super block. logdump: Block 0: Journal Superblock Seq: 0 Type: 4 (JBD2_SUPERBLOCK_V2) Blocksize: 4096 Total Blocks: 32768 First Block: 1 First Commit ID: 13 Start Log Blknum: 1 Error: 0 Feature Compat: 0 Feature Incompat: 2 block64 Feature RO compat: 0 Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36 FS Share Cnt: 1 Dynamic Superblk Blknum: 0 Per Txn Block Limit Journal: 0 Data: 0 Block 1: Journal Commit Block Seq: 14 Type: 2 (JBD2_COMMIT_BLOCK) Block 2: Journal Descriptor Seq: 15 Type: 1 (JBD2_DESCRIPTOR_BLOCK) No. Blocknum Flags 0. 587 none UUID: 00000000000000000000000000000000 1. 8257792 JBD2_FLAG_SAME_UUID 2. 619 JBD2_FLAG_SAME_UUID 3. 24772864 JBD2_FLAG_SAME_UUID 4. 8257802 JBD2_FLAG_SAME_UUID 5. 513 JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG ... Block 7: Inode Inode: 8257802 Mode: 0640 Generation: 57157641 (0x3682809) FS Generation: 2839773110 (0xa9437fb6) CRC32: 00000000 ECC: 0000 Type: Regular Attr: 0x0 Flags: Valid Dynamic Features: (0x1) InlineData User: 0 (root) Group: 0 (root) Size: 7 Links: 1 Clusters: 0 ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019 atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019 mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019 dtime: 0x0 -- Thu Jan 1 08:00:00 1970 ... Block 9: Journal Commit Block Seq: 15 Type: 2 (JBD2_COMMIT_BLOCK) The following is journal recovery log when recovering the upper jbd2 journal when mount again. syslog: ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it. fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0 fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1 fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2 fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13 Due to first commit seq 13 recorded in journal super is not consistent with the value recorded in block 1(seq is 14), journal recovery will be terminated before seq 15 even though it is an unbroken commit, inode 8257802 is a new file and it will be lost. Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com Signed-off-by: Kai Li <li.kai4@h3c.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Changwei Ge <gechangwei@live.cn> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05ocfs2: clear journal dirty flag after shutdown journalJunxiao Bi1-4/+2
[ Upstream commit d85400af790dba2aa294f0a77e712f166681f977 ] Dirty flag of the journal should be cleared at the last stage of umount, if do it before jbd2_journal_destroy(), then some metadata in uncommitted transaction could be lost due to io error, but as dirty flag of journal was already cleared, we can't find that until run a full fsck. This may cause system panic or other corruption. Link: http://lkml.kernel.org/r/20181121020023.3034-3-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@versity.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-05-30ocfs2: return error when we attempt to access a dirty bh in jbd2piaojun1-11/+12
[ Upstream commit d984187e3a1ad7d12447a7ab2c43ce3717a2b5b3 ] We should not reuse the dirty bh in jbd2 directly due to the following situation: 1. When removing extent rec, we will dirty the bhs of extent rec and truncate log at the same time, and hand them over to jbd2. 2. The bhs are submitted to jbd2 area successfully. 3. The write-back thread of device help flush the bhs to disk but encounter write error due to abnormal storage link. 4. After a while the storage link become normal. Truncate log flush worker triggered by the next space reclaiming found the dirty bh of truncate log and clear its 'BH_Write_EIO' and then set it uptodate in __ocfs2_journal_access(): ocfs2_truncate_log_worker ocfs2_flush_truncate_log __ocfs2_flush_truncate_log ocfs2_replay_truncate_records ocfs2_journal_access_di __ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata. 5. Then jbd2 will flush the bh of truncate log to disk, but the bh of extent rec is still in error state, and unfortunately nobody will take care of it. 6. At last the space of extent rec was not reduced, but truncate log flush worker have given it back to globalalloc. That will cause duplicate cluster problem which could be identified by fsck.ocfs2. Sadly we can hardly revert this but set fs read-only in case of ruining atomicity and consistency of space reclaim. Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access") Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-06ocfs2: add uuid to ocfs2 thread name for problem analysisJoseph Qi1-2/+2
A node can mount multiple ocfs2 volumes. And if thread names are same for each volume/domain, it will bring inconvenience when analyzing problems because we have to identify which volume/domain the messages belong to. Since thread name will be printed to messages, so add volume uuid or dlm name to thread name can benefit problem analysis. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06ocfs2: only take lock if dio entry when recover orphansJoseph Qi1-39/+47
We have no need to take inode mutex, rw and inode lock if it is not dio entry when recover orphans. Optimize it by adding a flag OCFS2_INODE_DIO_ORPHAN_ENTRY to ocfs2_inode_info to reduce contention. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06ocfs2: do not include dio entry in case of orphan scanJoseph Qi1-3/+12
dio entry will only do truncate in case of ORPHAN_NEED_TRUNCATE. So do not include it when doing normal orphan scan to reduce contention. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-05ocfs2: neaten do_error, ocfs2_error and ocfs2_abortJoe Perches1-1/+1
These uses sometimes do and sometimes don't have '\n' terminations. Make the uses consistently use '\n' terminations and remove the newline from the functions. Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-05ocfs2: fix race between crashed dio and rmJoseph Qi1-1/+3
There is a race case between crashed dio and rm, which will lead to OCFS2_VALID_FL not set read-only. N1 N2 ------------------------------------------------------------------------ dd with direct flag rm file crashed with an dio entry left in orphan dir clear OCFS2_VALID_FL in ocfs2_remove_inode recover N1 and read the corrupted inode, and set filesystem read-only So we skip the inode deletion this time and wait for dio entry recovered first. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-05ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_accessJoseph Qi1-1/+17
When storage network is unstable, it may trigger the BUG in __ocfs2_journal_access because of buffer not uptodate. We can retry the write in this case or return error instead of BUG. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Zhangguanghui <zhang.guanghui@h3c.com> Tested-by: Zhangguanghui <zhang.guanghui@h3c.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-05ocfs2: fix race between dio and recover orphanJoseph Qi1-4/+4
During direct io the inode will be added to orphan first and then deleted from orphan. There is a race window that the orphan entry will be deleted twice and thus trigger the BUG when validating OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan. ocfs2_direct_IO_write ... ocfs2_add_inode_to_orphan >>>>>>>> race window. 1) another node may rm the file and then down, this node take care of orphan recovery and clear flag OCFS2_DIO_ORPHANED_FL. 2) since rw lock is unlocked, it may race with another orphan recovery and append dio. ocfs2_del_inode_from_orphan So take inode mutex lock when recovering orphans and make rw unlock at the end of aio write in case of append dio. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25ocfs2: mark local functions as staticJoseph Qi1-5/+5
Some functions are only used locally, so mark them as static. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger()Xue jiufei1-3/+1
ocfs2_abort_trigger() use bh->b_assoc_map to get sb. But there's no function to set bh->b_assoc_map in ocfs2, it will trigger NULL pointer dereference while calling this function. We can get sb from bh->b_bdev->bd_super instead of b_assoc_map. [akpm@linux-foundation.org: update comment, per Joseph] Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25ocfs2: do not BUG if jbd2_journal_dirty_metadata failsJoseph Qi1-1/+14
jbd2_journal_dirty_metadata may fail. Currently it cannot take care of non zero return value and just BUG in ocfs2_journal_dirty. This patch is aborting the handle and journal instead of BUG. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25ocfs2: fix a tiny race when truncate dio orohaned entryJoseph Qi1-26/+21
Once dio crashed it will leave an entry in orphan dir. And orphan scan will take care of the clean up. There is a tiny race case that the same entry will be truncated twice and then trigger the BUG in ocfs2_del_inode_from_orphan. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17ocfs2: wait for orphan recovery first once append O_DIRECT write crashJoseph Qi1-0/+2
If one node has crashed with orphan entry leftover, another node which do append O_DIRECT write to the same file will override the i_dio_orphaned_slot. Then the old entry won't be cleaned forever. If this case happens, we let it wait for orphan recovery first. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17ocfs2: add orphan recovery types in ocfs2_recover_orphansJoseph Qi1-20/+88
Define two orphan recovery types, which indicates if need truncate file or not. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11ocfs2: remove unreachable code in __ocfs2_recovery_thread()Daeseok Youn1-1/+0
Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-01vfs: make first argument of dir_context.actor typedMiklos Szeredi1-3/+5
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-05ocfs2: limit printk when journal is abortedJoseph Qi1-2/+15
Once JBD2_ABORT is set, ocfs2_commit_cache will fail in ocfs2_commit_thread. Then it will get into a loop with mass logs. This will meaninglessly consume a larger number of resource and may lead to the system hanging. So limit printk in this case. [akpm@linux-foundation.org: document the msleep] Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-04ocfs2: remove OCFS2_INODE_SKIP_DELETE flagJan Kara1-6/+0
The flag was never set, delete it. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12ocfs2: use i_size_read() to access i_sizeJunxiao Bi1-4/+4
Though ocfs2 uses inode->i_mutex to protect i_size, there are both i_size_read/write() and direct accesses. Clean up all direct access to eliminate confusion. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12ocfs2: lighten up allocate transactionYounger Liu1-0/+35
The issue scenario is as following: When fallocating a very large disk space for a small file, __ocfs2_extend_allocation attempts to get a very large transaction. For some journal sizes, there may be not enough room for this transaction, and the fallocate will fail. The patch below extends & restarts the transaction as necessary while allocating space, and should work with even the smallest journal. This patch refers ext4 resize. Test: # mkfs.ocfs2 -b 4K -C 32K -T datafiles /dev/sdc ...(jounral size is 32M) # mount.ocfs2 /dev/sdc /mnt/ocfs2/ # touch /mnt/ocfs2/1.log # fallocate -o 0 -l 400G /mnt/ocfs2/1.log fallocate: /mnt/ocfs2/1.log: fallocate failed: Cannot allocate memory # tail -f /var/log/messages [ 7372.278591] JBD: fallocate wants too many credits (2051 > 2048) [ 7372.278597] (fallocate,6438,0):__ocfs2_extend_allocation:709 ERROR: status = -12 [ 7372.278603] (fallocate,6438,0):ocfs2_allocate_unwritten_extents:1504 ERROR: status = -12 [ 7372.278607] (fallocate,6438,0):__ocfs2_change_file_space:1955 ERROR: status = -12 ^C With this patch, the test works well. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-29[readdir] convert ocfs2Al Viro1-7/+7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22ocfs2: remove kfree() redundant null checksTim Gardner1-7/+3
smatch analysis indicates a number of redundant NULL checks before calling kfree(), eg: fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info: redundant null check on *tl_copy calling kfree() fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info: redundant null check on pages calling kfree() etc.... [akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()] Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31ocfs2: Convert to new freezing mechanismJan Kara1-1/+6
Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze protection. We also protect several ioctl entry points which were missing the protection. Finally, we add freeze protection to the journaling mechanism so that iput() of unlinked inode cannot modify a frozen filesystem. CC: Mark Fasheh <mfasheh@suse.com> CC: Joel Becker <jlbec@evilplan.org> CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-24ocfs2: Add comment about orphan scanningSunil Mushran1-0/+14
Add a comment that explains the reason as to why orphan scan scans all the slots. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2: Clean up messages in the fsSunil Mushran1-3/+6
Convert useful messages from ML_NOTICE to KERN_NOTICE to improve readability. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-05-13ocfs2: Skip mount recovery for hard-ro mountsSunil Mushran1-0/+3
Patch skips mount recovery for hard-ro mounts which otherwise leads to an oops. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-03-31Fix common misspellingsLucas De Marchi1-1/+1
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-02-24ocfs2: Remove masklog ML_JOURNAL.Tao Ma1-76/+54
Remove mlog(0) from fs/ocfs2/journal.c and the masklog JOURNAL. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-03-07ocfs2: Remove EXIT from masklog.Tao Ma1-19/+0
mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. This patch just try to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-21ocfs2: Remove ENTRY from masklog.Tao Ma1-31/+12
ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. So for mlog_entry_void, we just remove it. for mlog_entry(...), we replace it with mlog(0,...), and they will be replace by trace event later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2010-09-10ocfs2: Remove obsolete comments before ocfs2_start_trans.Tao Ma1-3/+0
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Remove unused old_id in ocfs2_commit_cache.Tao Ma1-2/+1
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Add some trace log for orphan scan.Tao Ma1-0/+3
Now orphan scan worker has no trace log, so it is very hard to tell whether it is finished or blocked. So add 2 mlog trace log so that we can tell whether the current orphan scan worker is blocked or not. It does help when I analyzed a orphan scan bug. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-08Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds1-2/+2
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Adding error check after calling ext4_mb_regular_allocator() ext4: Fix dirtying of journalled buffers in data=journal mode ext4: re-inline ext4_rec_len_(to|from)_disk functions jbd2: Remove t_handle_lock from start_this_handle() jbd2: Change j_state_lock to be a rwlock_t jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop ext4: Add mount options in superblock ext4: force block allocation on quota_off ext4: fix freeze deadlock under IO ext4: drop inode from orphan list if ext4_delete_inode() fails ext4: check to make make sure bd_dev is set before dereferencing it jbd2: Make barrier messages less scary ext4: don't print scary messages for allocation failures post-abort ext4: fix EFBIG edge case when writing to large non-extent file ext4: fix ext4_get_blocks references ext4: Always journal quota file modifications ext4: Fix potential memory leak in ext4_fill_super ext4: Don't error out the fs if the user tries to make a file too big ext4: allocate stripe-multiple IOs on stripe boundaries ext4: move aio completion after unwritten extent conversion ... Fix up conflicts in fs/ext4/inode.c as per Ted. Fix up xfs conflicts as per earlier xfs merge.
2010-08-04jbd2: Change j_state_lock to be a rwlock_tTheodore Ts'o1-2/+2
Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-16jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactionsJan Kara1-12/+12
OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-06-16ocfs2: Move orphan scan work to ocfs2_wq.Tao Ma1-3/+3
We used to let orphan scan work in the default work queue, but there is a corner case which will make the system deadlock. The scenario is like this: 1. set heartbeat threadshold to 200. this will allow us to have a great chance to have a orphan scan work before our quorum decision. 2. mount node 1. 3. after 1~2 minutes, mount node 2(in order to make the bug easier to reproduce, better add maxcpus=1 to kernel command line). 4. node 1 do orphan scan work. 5. node 2 do orphan scan work. 6. node 1 do orphan scan work. After this, node 1 hold the orphan scan lock while node 2 know node 1 is the master. 7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection). Now when node 2 begins orphan scan, the system queue is blocked. The root cause is that both orphan scan work and quorum decision work will use the system event work queue. orphan scan has a chance of blocking the event work queue(in dlm_wait_for_node_death) so that there is no chance for quorum decision work to proceed. This patch resolve it by moving orphan scan work to ocfs2_wq. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: Make ocfs2_extend_trans() really extend.Tao Ma1-6/+9
In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's blocks. But if jbd2_journal_extend() fails, it will only restart with the the new number of blocks. This tends to be awkward since in most cases we want additional reserved blocks. It makes our code harder to mantain since the caller can't be sure all the original blocks will not be accessed and dirtied again. There are 15 callers of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add h_buffer_credits before they call ocfs2_extend_trans(). This makes ocfs2_extend_trans() really extend atop the original block count. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: Make ocfs2_journal_dirty() void.Joel Becker1-8/+3
jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-26ocfs2/trivial: Remove trailing whitespacesSunil Mushran1-1/+1
Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-04tree-wide: fix assorted typos all over the placeAndré Goddard Rosa1-1/+1
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-23ocfs2: Add metaecc for ocfs2_refcount_block.Tao Ma1-0/+15
Add metaecc and journal trigger for ocfs2_refcount_block. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-05ocfs2: Pass struct ocfs2_caching_info to the journal functions.Joel Becker1-35/+30
The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-05ocfs2: Take the inode out of the metadata read/write paths.Joel Becker1-2/+2
We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-09ocfs2: Fixup orphan scan cleanup after failed mountJeff Mahoney1-1/+7
If the mount fails for any reason, ocfs2_dismount_volume calls ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init be called to setup the mutex and work queues, but that doesn't happen if the mount has failed and we oops accessing an uninitialized work queue. This patch splits the init and startup of the orphan scan, eliminating the oops. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-23ocfs2: Disable orphan scanning for local and hard-ro mountsSunil Mushran1-13/+17
Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-23ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()Sunil Mushran1-0/+1
We don't access the LVB in our ocfs2_*_lock_res_init() functions. Since the LVB can become invalid during some cluster recovery operations, the dlmglue must be able to handle an uninitialized LVB. For the orphan scan lock, we initialized an uninitialzed LVB with our scan sequence number plus one. This starts a normal orphan scan cycle. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-23ocfs2: Stop orphan scan as early as possible during umountSunil Mushran1-2/+12
Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>