summaryrefslogtreecommitdiff
path: root/fs/jbd2/transaction.c
AgeCommit message (Collapse)AuthorFilesLines
2019-11-30Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-183/+217
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge window saw the the following new featuers added to ext4: - Direct I/O via iomap (required the iomap-for-next branch from Darrick as a prereq). - Support for using dioread-nolock where the block size < page size. - Support for encryption for file systems where the block size < page size. - Rework of journal credits handling so a revoke-heavy workload will not cause the journal to run out of space. - Replace bit-spinlocks with spinlocks in jbd2 Also included were some bug fixes and cleanups, mostly to clean up corner cases from fuzzed file systems and error path handling" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (59 commits) ext4: work around deleting a file with i_nlink == 0 safely ext4: add more paranoia checking in ext4_expand_extra_isize handling jbd2: make jbd2_handle_buffer_credits() handle reserved handles ext4: fix a bug in ext4_wait_for_tail_page_commit ext4: bio_alloc with __GFP_DIRECT_RECLAIM never fails ext4: code cleanup for get_next_id ext4: fix leak of quota reservations ext4: remove unused variable warning in parse_options() ext4: Enable encryption for subpage-sized blocks fs/buffer.c: support fscrypt in block_read_full_page() ext4: Add error handling for io_end_vec struct allocation jbd2: Fine tune estimate of necessary descriptor blocks jbd2: Provide trace event for handle restarts ext4: Reserve revoke credits for freed blocks jbd2: Make credit checking more strict jbd2: Rename h_buffer_credits to h_total_credits jbd2: Reserve space for revoke descriptor blocks jbd2: Drop jbd2_space_needed() jbd2: Account descriptor blocks into t_outstanding_credits jbd2: Factor out common parts of stopping and restarting a handle ...
2019-11-06Merge branch 'jk/jbd2-revoke-overflow'Theodore Ts'o1-107/+149
2019-11-06jbd2: Fine tune estimate of necessary descriptor blocksJan Kara1-5/+16
Currently we reserve j_max_transaction_buffers / 32 for transaction descriptor blocks. Now that revoke descriptors are accounted for separately this estimate is unnecessarily high and we can actually compute much tighter estimate. In the common case of 32k journal blocks and 4k blocksize this actually reduces the amount of reserved descriptor blocks from 256 to ~25 which allows us to fit more real data into a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-25-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Provide trace event for handle restartsJan Kara1-1/+7
Provide trace event for handle restarts to ease debugging. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Make credit checking more strictJan Kara1-1/+1
Make checking of available credits in jbd2_journal_dirty_metadata() more strict. There should be always enough credits in the handle to write all potential revoke descriptors. Also we warn in case there are not enough credits since this is a bug in the filesystem. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-22-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Rename h_buffer_credits to h_total_creditsJan Kara1-15/+15
The credit counter now contains both buffer and revoke descriptor block credits. Rename to counter to h_total_credits to reflect that. No functional change. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Reserve space for revoke descriptor blocksJan Kara1-7/+47
Extend functions for starting, extending, and restarting transaction handles to take number of revoke records handle must be able to accommodate. These functions then make sure transaction has enough credits to be able to store resulting revoke descriptor blocks. Also revoke code tracks number of revoke records created by a handle to catch situation where some place didn't reserve enough space for revoke records. Similarly to standard transaction credits, space for unused reserved revoke records is released when the handle is stopped. On the ext4 side we currently take a simplistic approach of reserving space for 1024 revoke records for any transaction. This grows amount of credits reserved for each handle only by a few and is enough for any normal workload so that we don't hit warnings in jbd2. We will refine the logic in following commits. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Drop jbd2_space_needed()Jan Kara1-2/+3
The function is now just a trivial wrapper returning journal->j_max_transaction_buffers. Drop it. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Account descriptor blocks into t_outstanding_creditsJan Kara1-8/+12
Currently, journal descriptor blocks were not accounted in transaction->t_outstanding_credits and we were just leaving some slack space in the journal for them (in jbd2_log_space_left() and jbd2_space_needed()). This is making proper accounting (and reservation we want to add) of descriptor blocks difficult so switch to accounting descriptor blocks in transaction->t_outstanding_credits and just reserve the same amount of credits in t_outstanding credits for journal descriptor blocks when creating transaction. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Factor out common parts of stopping and restarting a handleJan Kara1-52/+46
jbd2__journal_restart() has quite some code that is common with jbd2_journal_stop(). Factor this functionality into stop_this_handle() helper and use it from both functions. Note that this also drops t_handle_lock protection from jbd2__journal_restart() as jbd2_journal_stop() does the same thing without it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-17-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Drop pointless wakeup from jbd2_journal_stop()Jan Kara1-4/+1
When we drop last handle from a transaction and journal->j_barrier_count > 0, jbd2_journal_stop() wakes up journal->j_wait_transaction_locked wait queue. This looks pointless - wait for outstanding handles always happens on journal->j_wait_updates waitqueue. journal->j_wait_transaction_locked is used to wait for transaction state changes and by start_this_handle() for waiting until journal->j_barrier_count drops to 0. The first case is clearly irrelevant here since only jbd2 thread changes transaction state. The second case looks related but jbd2_journal_unlock_updates() is responsible for the wakeup in this case. So just drop the wakeup. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-16-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Drop pointless check from jbd2_journal_stop()Jan Kara1-5/+2
If a transaction is larger than journal->j_max_transaction_buffers, that is a bug and not a trigger for transaction commit. Also the very next attempt to start new handle will start transaction commit anyway. So just remove the pointless check. Arguably, we could start transaction commit whenever the transaction size is *close* to journal->j_max_transaction_buffers. This has a potential to reduce latency of the next jbd2_journal_start() at the cost of somewhat smaller transactions. However for this to have any effect, it would mean that there isn't someone already waiting in jbd2_journal_start() which means metadata load for the fs is pretty light anyway so probably this optimization is not worth it. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-15-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06jbd2: Reorganize jbd2_journal_stop()Jan Kara1-24/+16
Move code in jbd2_journal_stop() around a bit. It removes some unnecessary code duplication and will make factoring out parts common with jbd2__journal_restart() easier. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-14-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-10-21jbd2: Make state lock a spinlockThomas Gleixner1-53/+47
Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held region because bit spinlocks disable preemption even on RT. A first attempt was to replace state lock with a spinlock placed in struct buffer_head and make the locking conditional on PREEMPT_RT and DEBUG_BIT_SPINLOCKS. Jan pointed out that there is a 4 byte hole in struct journal_head where a regular spinlock fits in and he would not object to convert the state lock to a spinlock unconditionally. Aside of solving the RT problem, this also gains lockdep coverage for the journal head state lock (bit-spinlocks are not covered by lockdep as it's hard to fit a lockdep map into a single bit). The trivial change would have been to convert the jbd_*lock_bh_state() inlines, but that comes with the downside that these functions take a buffer head pointer which needs to be converted to a journal head pointer which adds another level of indirection. As almost all functions which use this lock have a journal head pointer readily available, it makes more sense to remove the lock helper inlines and write out spin_*lock() at all call sites. Fixup all locking comments as well. Suggested-by: Jan Kara <jack@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Link: https://lore.kernel.org/r/20190809124233.13277-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-10-21jbd2: Don't call __bforget() unnecessarilyJan Kara1-5/+4
jbd2_journal_forget() jumps to 'not_jbd' branch which calls __bforget() in cases where the buffer is clean which is pointless. In case of failed assertion, it can be even argued that it is safer not to touch buffer's dirty bits. Also logically it makes more sense to just jump to 'drop' and that will make logic also simpler when we switch bh_state_lock to a spinlock. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-10-21jbd2: Drop unnecessary branch from jbd2_journal_forget()Jan Kara1-4/+0
We have cleared both dirty & jbddirty bits from the bh. So there's no difference between bforget() and brelse(). Thus there's no point jumping to no_jbd branch. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-10-21jbd2: Move dropping of jh reference out of un/re-filing functionsJan Kara1-8/+15
__jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop transaction's jh reference when they remove jh from a transaction. This will be however inconvenient once we move state lock into journal_head itself as we still need to unlock it and we'd need to grab jh reference just for that. Move dropping of jh reference out of these functions into the few callers. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-10-21jbd2: Simplify journal_unmap_buffer()Thomas Gleixner1-6/+2
journal_unmap_buffer() checks first whether the buffer head is a journal. If so it takes locks and then invokes jbd2_journal_grab_journal_head() followed by another check whether this is journal head buffer. The double checking is pointless. Replace the initial check with jbd2_journal_grab_journal_head() which alredy checks whether the buffer head is actually a journal. Allows also early access to the journal head pointer for the upcoming conversion of state lock to a regular spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: linux-ext4@vger.kernel.org Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-10-09locking/lockdep: Remove unused @nested argument from lock_release()Qian Cai1-2/+2
Since the following commit: b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release") @nested is no longer used in lock_release(), so remove it from all lock_release() calls and friends. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: alexander.levin@microsoft.com Cc: daniel@iogearbox.net Cc: davem@davemloft.net Cc: dri-devel@lists.freedesktop.org Cc: duyuyang@gmail.com Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: intel-gfx@lists.freedesktop.org Cc: jack@suse.com Cc: jlbec@evilplan.or Cc: joonas.lahtinen@linux.intel.com Cc: joseph.qi@linux.alibaba.com Cc: jslaby@suse.com Cc: juri.lelli@redhat.com Cc: maarten.lankhorst@linux.intel.com Cc: mark@fasheh.com Cc: mhocko@kernel.org Cc: mripard@kernel.org Cc: ocfs2-devel@oss.oracle.com Cc: rodrigo.vivi@intel.com Cc: sean@poorly.run Cc: st@kernel.org Cc: tj@kernel.org Cc: tytso@mit.edu Cc: vdavydov.dev@gmail.com Cc: vincent.guittot@linaro.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-25jbd2: remove jbd2_journal_inode_add_[write|wait]Joseph Qi1-12/+0
Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now, jbd2_journal_inode_add_[write|wait] are not used any more, remove them. Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Acked-by: Changwei Ge <chge@linux.alibaba.com> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-25jbd2: add missing tracepoint for reserved handleXiaoguang Wang1-0/+3
This issue was found when I use ebpf to trace every jbd2 handle's running info in dioread_nolock case. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-21jbd2: introduce jbd2_inode dirty range scopingRoss Zwisler1-21/+28
Currently both journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() operate on the entire address space of each of the inodes associated with a given journal entry. The consequence of this is that if we have an inode where we are constantly appending dirty pages we can end up waiting for an indefinite amount of time in journal_finish_inode_data_buffers() while we wait for all the pages under writeback to be written out. The easiest way to cause this type of workload is do just dd from /dev/zero to a file until it fills the entire filesystem. This can cause journal_finish_inode_data_buffers() to wait for the duration of the entire dd operation. We can improve this situation by scoping each of the inode dirty ranges associated with a given transaction. We do this via the jbd2_inode structure so that the scoping is contained within jbd2 and so that it follows the lifetime and locking rules for that structure. This allows us to limit the writeback & wait in journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() respectively to the dirty range for a given struct jdb2_inode, keeping us from waiting forever if the inode in question is still being appended to. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
2019-05-11jbd2: fix potential double freeChengguang Xu1-3/+5
When failing from creating cache jbd2_inode_cache, we will destroy the previously created cache jbd2_handle_cache twice. This patch fixes this by moving each cache initialization/destruction to its own separate, individual function. Signed-off-by: Chengguang Xu <cgxu519@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2019-03-01jbd2: jbd2_get_transaction does not need to return a valueLiu Song1-5/+3
In jbd2_get_transaction, a new transaction is initialized, and set to the j_running_transaction. No need for a return value, so remove it. Also, adjust some comments to match the actual operation of this function. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2019-02-21jbd2: fix compile warning when using JBUFFER_TRACEzhangyi (F)1-8/+8
The jh pointer may be used uninitialized in the two cases below and the compiler complain about it when enabling JBUFFER_TRACE macro, fix them. In file included from fs/jbd2/transaction.c:19:0: fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’: ./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized] #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0) ^ fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here struct journal_head *jh; ^ In file included from fs/jbd2/transaction.c:19:0: fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’: ./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized] #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0) ^ fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here struct journal_head *jh; ^ Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz>
2019-02-11jbd2: discard dirty data when forgetting an un-journalled bufferzhangyi (F)1-4/+38
We do not unmap and clear dirty flag when forgetting a buffer without journal or does not belongs to any transaction, so the invalid dirty data may still be written to the disk later. It's fine if the corresponding block is never used before the next mount, and it's also fine that we invoke clean_bdev_aliases() related functions to unmap the block device mapping when re-allocating such freed block as data block. But this logic is somewhat fragile and risky that may lead to data corruption if we forget to clean bdev aliases. So, It's better to discard dirty data during forget time. We have been already handled all the cases of forgetting journalled buffer, this patch deal with the remaining two cases. - buffer is not journalled yet, - buffer is journalled but doesn't belongs to any transaction. We invoke __bforget() instead of __brelese() when forgetting an un-journalled buffer in jbd2_journal_forget(). After this patch we can remove all clean_bdev_aliases() related calls in ext4. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2019-02-11jbd2: clear dirty flag when revoking a buffer from an older transactionzhangyi (F)1-5/+12
Now, we capture a data corruption problem on ext4 while we're truncating an extent index block. Imaging that if we are revoking a buffer which has been journaled by the committing transaction, the buffer's jbddirty flag will not be cleared in jbd2_journal_forget(), so the commit code will set the buffer dirty flag again after refile the buffer. fsx kjournald2 jbd2_journal_commit_transaction jbd2_journal_revoke commit phase 1~5... jbd2_journal_forget belongs to older transaction commit phase 6 jbddirty not clear __jbd2_journal_refile_buffer __jbd2_journal_unfile_buffer test_clear_buffer_jbddirty mark_buffer_dirty Finally, if the freed extent index block was allocated again as data block by some other files, it may corrupt the file data after writing cached pages later, such as during unmount time. (In general, clean_bdev_aliases() related helpers should be invoked after re-allocation to prevent the above corruption, but unfortunately we missed it when zeroout the head of extra extent blocks in ext4_ext_handle_unwritten_extents()). This patch mark buffer as freed and set j_next_transaction to the new transaction when it already belongs to the committing transaction in jbd2_journal_forget(), so that commit code knows it should clear dirty bits when it is done with the buffer. This problem can be reproduced by xfstests generic/455 easily with seeds (3246 3247 3248 3249). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
2018-12-04jbd2: clean up indentation issue, replace spaces with tabColin Ian King1-1/+1
There is a statement that is indented with spaces, replace it with a tab. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04jbd2: avoid long hold times of j_state_lock while committing a transactionJan Kara1-5/+38
We can hold j_state_lock for writing at the beginning of jbd2_journal_commit_transaction() for a rather long time (reportedly for 30 ms) due cleaning revoke bits of all revoked buffers under it. The handling of revoke tables as well as cleaning of t_reserved_list, and checkpoint lists does not need j_state_lock for anything. It is only needed to prevent new handles from joining the transaction. Generally T_LOCKED transaction state prevents new handles from joining the transaction - except for reserved handles which have to allowed to join while we wait for other handles to complete. To prevent reserved handles from joining the transaction while cleaning up lists, add new transaction state T_SWITCH and watch for it when starting reserved handles. With this we can just drop the lock for operations that don't need it. Reported-and-tested-by: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: "Theodore Y. Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-17jbd2: don't mark block as modified if the handle is out of creditsTheodore Ts'o1-1/+8
Do not set the b_modified flag in block's journal head should not until after we're sure that jbd2_journal_dirty_metadat() will not abort with an error due to there not being enough space reserved in the jbd2 handle. Otherwise, future attempts to modify the buffer may lead a large number of spurious errors and warnings. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2018-05-21jbd2: remove NULL check before calling kmem_cache_destroy()Wang Long1-4/+2
The kmem_cache_destroy() function already checks for null pointers, so we can remove the check at the call site. This patch also sets jbd2_handle_cache and jbd2_inode_cache to be NULL after freeing them in jbd2_journal_destroy_handle_cache(). Signed-off-by: Wang Long <wanglong19@meituan.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2018-04-18ext4: set h_journal if there is a failure starting a reserved handleTheodore Ts'o1-0/+1
If ext4 tries to start a reserved handle via jbd2_journal_start_reserved(), and the journal has been aborted, this can result in a NULL pointer dereference. This is because the fields h_journal and h_transaction in the handle structure share the same memory, via a union, so jbd2_journal_start_reserved() will clear h_journal before calling start_this_handle(). If this function fails due to an aborted handle, h_journal will still be NULL, and the call to jbd2_journal_free_reserved() will pass a NULL journal to sub_reserve_credits(). This can be reproduced by running "kvm-xfstests -c dioread_nolock generic/475". Cc: stable@kernel.org # 3.11 Fixes: 8f7d89f36829b ("jbd2: transaction reservation support") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>
2018-01-10jbd2: fix sphinx kernel-doc build warningsTobin C. Harding1-1/+4
Sphinx emits various (26) warnings when building make target 'htmldocs'. Currently struct definitions contain duplicate documentation, some as kernel-docs and some as standard c89 comments. We can reduce duplication while cleaning up the kernel docs. Move all kernel-docs to right above each struct member. Use the set of all existing comments (kernel-doc and c89). Add documentation for missing struct members and function arguments. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2017-12-18ext4: fix up remaining files with SPDX cleanupsTheodore Ts'o1-4/+1
A number of ext4 source files were skipped due because their copyright permission statements didn't match the expected text used by the automated conversion utilities. I've added SPDX tags for the rest. While looking at some of these files, I've noticed that we have quite a bit of variation on the licenses that were used --- in particular some of the Red Hat licenses on the jbd2 files use a GPL2+ license, and we have some files that have a LGPL-2.1 license (which was quite surprising). I've not attempted to do any license changes. Even if it is perfectly legal to relicense to GPL 2.0-only for consistency's sake, that should be done with ext4 developer community discussion. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-07-04Merge tag 'docs-4.13' of git://git.lwn.net/linuxLinus Torvalds1-21/+21
Pull documentation updates from Jonathan Corbet: "There has been a fair amount of activity in the docs tree this time around. Highlights include: - Conversion of a bunch of security documentation into RST - The conversion of the remaining DocBook templates by The Amazing Mauro Machine. We can now drop the entire DocBook build chain. - The usual collection of fixes and minor updates" * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits) scripts/kernel-doc: handle DECLARE_HASHTABLE Documentation: atomic_ops.txt is core-api/atomic_ops.rst Docs: clean up some DocBook loose ends Make the main documentation title less Geocities Docs: Use kernel-figure in vidioc-g-selection.rst Docs: fix table problems in ras.rst Docs: Fix breakage with Sphinx 1.5 and upper Docs: Include the Latex "ifthen" package doc/kokr/howto: Only send regression fixes after -rc1 docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters doc: Document suitability of IBM Verse for kernel development Doc: fix a markup error in coding-style.rst docs: driver-api: i2c: remove some outdated information Documentation: DMA API: fix a typo in a function name Docs: Insert missing space to separate link from text doc/ko_KR/memory-barriers: Update control-dependencies example Documentation, kbuild: fix typo "minimun" -> "minimum" docs: Fix some formatting issues in request-key.rst doc: ReSTify keys-trusted-encrypted.txt doc: ReSTify keys-request-key.txt ...
2017-05-22jbd2: preserve original nofs flag during journal restartTahsin Erdogan1-0/+6
When a transaction starts, start_this_handle() saves current PF_MEMALLOC_NOFS value so that it can be restored at journal stop time. Journal restart is a special case that calls start_this_handle() without stopping the transaction. start_this_handle() isn't aware that the original value is already stored so it overwrites it with current value. For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set at the end: jbd2_journal_start() jbd2__journal_restart() jbd2_journal_stop() Make jbd2__journal_restart() restore the original value before calling start_this_handle(). Fixes: 81378da64de6 ("jbd2: mark the transaction context with the scope GFP_NOFS context") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2017-05-16fs: jbd2: escape a string with special chars on a kernel-docMauro Carvalho Chehab1-2/+2
kernel-doc will try to interpret a foo() string, except if properly escaped. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16fs: jbd2: make jbd2_journal_start() kernel-doc parseableMauro Carvalho Chehab1-19/+19
kernel-doc script expects that a function documentation to be just before the function, otherwise it will be ignored. So, move the kernel-doc markup to the right place. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-04jbd2: mark the transaction context with the scope GFP_NOFS contextMichal Hocko1-0/+12
now that we have memalloc_nofs_{save,restore} api we can mark the whole transaction context as implicitly GFP_NOFS. All allocations will automatically inherit GFP_NOFS this way. This means that we do not have to mark any of those requests with GFP_NOFS and moreover all the ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded GFP_KERNEL allocations deep inside the vmalloc will be NOFS now. [akpm@linux-foundation.org: tweak comments] Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-05jbd2: don't leak modified metadata buffers on an aborted journalTheodore Ts'o1-1/+3
If the journal has been aborted, we shouldn't mark the underlying buffer head as dirty, since that will cause the metadata block to get modified. And if the journal has been aborted, we shouldn't allow this since it will almost certainly lead to a corrupted file system. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-10-13jbd2: fix incorrect unlock on j_list_lockTaesoo Kim1-1/+2
When 'jh->b_transaction == transaction' (asserted by below) J_ASSERT_JH(jh, (jh->b_transaction == transaction || ... 'journal->j_list_lock' will be incorrectly unlocked, since the the lock is aquired only at the end of if / else-if statements (missing the else case). Signed-off-by: Taesoo Kim <tsgatesv@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Fixes: 6e4862a5bb9d12be87e4ea5d9a60836ebed71d28 Cc: stable@vger.kernel.org # 3.14+
2016-09-22jbd2: fix lockdep annotation in add_transaction_credits()Jan Kara1-2/+4
Thomas has reported a lockdep splat hitting in add_transaction_credits(). The problem is that that function calls jbd2_might_wait_for_commit() while holding j_state_lock which is wrong (we do not really wait for transaction commit while holding that lock). Fix the problem by moving jbd2_might_wait_for_commit() into places where we are ready to wait for transaction commit and thus j_state_lock is unlocked. Cc: stable@vger.kernel.org Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-30jbd2: track more dependencies on transaction commitJan Kara1-0/+4
So far we were tracking only dependency on transaction commit due to starting a new handle (which may require commit to start a new transaction). Now add tracking also for other cases where we wait for transaction commit. This way lockdep can catch deadlocks e. g. because we call jbd2_journal_stop() for a synchronous handle with some locks held which rank below transaction start. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-30jbd2: move lockdep tracking to journal_sJan Kara1-8/+3
Currently lockdep map is tracked in each journal handle. To be able to expand lockdep support to cover also other cases where we depend on transaction commit and where handle is not available, move lockdep map into struct journal_s. Since this makes the lockdep map shared for all handles, we have to use rwsem_acquire_read() for acquisitions now. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-30jbd2: move lockdep instrumentation for jbd2 handlesJan Kara1-2/+2
The transaction the handle references is free to commit once we've decremented t_updates counter. Move the lockdep instrumentation to that place. Currently it was a bit later which did not really matter but subsequent improvements to lockdep instrumentation would cause false positives with it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-24Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-4/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Fix a number of bugs, most notably a potential stale data exposure after a crash and a potential BUG_ON crash if a file has the data journalling flag enabled while it has dirty delayed allocation blocks that haven't been written yet. Also fix a potential crash in the new project quota code and a maliciously corrupted file system. In addition, fix some DAX-specific bugs, including when there is a transient ENOSPC situation and races between writes via direct I/O and an mmap'ed segment that could lead to lost I/O. Finally the usual set of miscellaneous cleanups" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: pre-zero allocated blocks for DAX IO ext4: refactor direct IO code ext4: fix race in transient ENOSPC detection ext4: handle transient ENOSPC properly for DAX dax: call get_blocks() with create == 1 for write faults to unwritten extents ext4: remove unmeetable inconsisteny check from ext4_find_extent() jbd2: remove excess descriptions for handle_s ext4: remove unnecessary bio get/put ext4: silence UBSAN in ext4_mb_init() ext4: address UBSAN warning in mb_find_order_for_block() ext4: fix oops on corrupted filesystem ext4: fix check of dqget() return value in ext4_ioctl_setproject() ext4: clean up error handling when orphan list is corrupted ext4: fix hang when processing corrupted orphaned inode list ext4: remove trailing \n from ext4_warning/ext4_error calls ext4: fix races between changing inode journal mode and ext4_writepages ext4: handle unwritten or delalloc buffers before enabling data journaling ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart() ext4: do not ask jbd2 to write data for delalloc buffers jbd2: add support for avoiding data writes during transaction commits ...
2016-04-24jbd2: add support for avoiding data writes during transaction commitsJan Kara1-4/+18
Currently when filesystem needs to make sure data is on permanent storage before committing a transaction it adds inode to transaction's inode list. During transaction commit, jbd2 writes back all dirty buffers that have allocated underlying blocks and waits for the IO to finish. However when doing writeback for delayed allocated data, we allocate blocks and immediately submit the data. Thus asking jbd2 to write dirty pages just unnecessarily adds more work to jbd2 possibly writing back other redirtied blocks. Add support to jbd2 to allow filesystem to ask jbd2 to only wait for outstanding data writes before committing a transaction and thus avoid unnecessary writes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-18Merge branch 'master' into for-nextJiri Kosina1-28/+20
Sync with Linus' tree so that patches against newer codebase can be applied. Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18Doc: treewide : Fix typos in DocBook/filesystem.xmlMasanari Iida1-3/+3
This patch fix spelling typos found in DocBook/filesystem.xml. It is because the file was generated from comments in code, I have to fix the comments in codes, instead of xml file. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov1-2/+2
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>