summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-06-28exfat: avoid incorrectly releasing for root inodeChen Li1-1/+1
In d_make_root, when we fail to allocate dentry for root inode, we will iput root inode and returned value is NULL in this function. So we do not need to release this inode again at d_make_root's caller. Signed-off-by: Chen Li <chenli@uniontech.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2021-06-28orangefs: fix orangefs df output.Mike Marshall1-1/+1
Orangefs df output is whacky. Walt Ligon suggested this might fix it. It seems way more in line with reality now... Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2021-06-28orangefs: readahead adjustmentMike Marshall1-4/+3
Matthew Wilcox suggested that perhaps it could be "possible for rac->file to be NULL if the caller doesn't have a struct file"... Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2021-06-28gfs2: Fix error handling in init_statfsAndreas Gruenbacher1-0/+1
On an error path, init_statfs calls iput(pn) after pn has already been put. Fix that by setting pn to NULL after the initial iput. Fixes: 97fd734ba17e ("gfs2: lookup local statfs inodes prior to journal recovery") Cc: stable@vger.kernel.org # v5.10+ Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28gfs2: Fix underflow in gfs2_page_mkwriteAndreas Gruenbacher1-2/+2
On filesystems with a block size smaller than PAGE_SIZE and non-empty files smaller then PAGE_SIZE, gfs2_page_mkwrite could end up allocating excess blocks beyond the end of the file, similar to fallocate. This doesn't make sense; fix it. Reported-by: Bob Peterson <rpeterso@redhat.com> Fixes: 184b4e60853d ("gfs2: Fix end-of-file handling in gfs2_page_mkwrite") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28gfs2: Use list_move_tail instead of list_del/list_add_tailBaokun Li1-2/+1
Using list_move_tail() instead of list_del() + list_add_tail(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28gfs2: Fix do_gfs2_set_flags descriptionAndreas Gruenbacher1-1/+1
Commit 88b631cbfbeb ("gfs2: convert to fileattr") changed the argument list without updating the description. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28ksmbd: set MAY_* flags together with open flagsHyunchul Lee3-45/+38
set MAY_* flags together with open flags and remove ksmbd_vfs_inode_permission(). Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: factor out a ksmbd_vfs_lock_parent helperHyunchul Lee1-63/+62
Factor out a self-contained helper to get stable parent dentry. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: move fs/cifsd to fs/ksmbdNamjae Jeon61-9/+9
Move fs/cifsd to fs/ksmbd and rename the remaining cifsd name to ksmbd. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: use f_bsize in FS_SECTOR_SIZE_INFORMATIONNamjae Jeon3-47/+4
Use f_bsize in FS_SECTOR_SIZE_INFORMATION to avoid the access the block layer. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: remove unneeded NULL check in the list iteratorNamjae Jeon2-10/+7
Remove unneeded NULL check in the list iterator. And use list_for_each_entry_safe instead of list_for_each_safe. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: use f_bsize instead of q->limits.logical_block_sizeNamjae Jeon3-42/+7
Use f_bsize instead of q->limits.logical_block_size. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: change stream type macro to enumerationNamjae Jeon2-3/+8
Change stream type macro to enumeration and move it to vfs.h. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: opencode posix acl functions instead of wrappersNamjae Jeon5-48/+19
Add select FS_POSIX_ACL in Kconfig and then opencode posix acl functions instead of wrappers Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: factor out a ksmbd_validate_entry_in_use helper from __ksmbd_vfs_renameNamjae Jeon1-15/+26
Factor out a self-contained helper to find sub file/dir in use. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: opencode to avoid trivial wrappersNamjae Jeon3-39/+14
Opencode to avoid trivial wrappers that just make the code hard to follow. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28ksmbd: remove ksmbd_err/infoNamjae Jeon18-307/+294
Use the pr_fmt built into pr_*. and use pr_err/info after removing wrapper ksmbd_err/info. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-26NFS: Avoid duplicate resets of attribute cache timeoutsTrond Myklebust1-6/+1
We know that the attributes changed on the server if and only if the change attribute is different. Otherwise, we're just refreshing our cache with values that were already known to be stale. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-26NFSv4: Fix handling of non-atomic change attrbute updatesTrond Myklebust1-18/+15
If the change attribute update is declared to be non-atomic by the server, or our cached value does not match the server's value before the operation was performed, then we should declare the inode cache invalid. On the other hand, if the change to the directory raced with a lookup or getattr which already updated the change attribute, then optimise away the revalidation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-26NFS: Fix up inode attribute revalidation timeoutsTrond Myklebust1-34/+16
The inode is considered revalidated when we've checked the value of the change attribute against our cached value since that suffices to establish whether or not the other cached values are valid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-26xfs: don't wait on future iclogs when pushing the CILDave Chinner1-7/+28
The iclogbuf ring attached to the struct xlog is circular, hence the first and last iclogs in the ring can only be determined by comparing them against the log->l_iclog pointer. In xfs_cil_push_work(), we want to wait on previous iclogs that were issued so that we can flush them to stable storage with the commit record write, and it simply waits on the previous iclog in the ring. This, however, leads to CIL push hangs in generic/019 like so: task:kworker/u33:0 state:D stack:12680 pid: 7 ppid: 2 flags:0x00004000 Workqueue: xfs-cil/pmem1 xlog_cil_push_work Call Trace: __schedule+0x30b/0x9f0 schedule+0x68/0xe0 xlog_wait_on_iclog+0x121/0x190 ? wake_up_q+0xa0/0xa0 xlog_cil_push_work+0x994/0xa10 ? _raw_spin_lock+0x15/0x20 ? xfs_swap_extents+0x920/0x920 process_one_work+0x1ab/0x390 worker_thread+0x56/0x3d0 ? rescuer_thread+0x3c0/0x3c0 kthread+0x14d/0x170 ? __kthread_bind_mask+0x70/0x70 ret_from_fork+0x1f/0x30 With other threads blocking in either xlog_state_get_iclog_space() waiting for iclog space or xlog_grant_head_wait() waiting for log reservation space. The problem here is that the previous iclog on the ring might actually be a future iclog. That is, if log->l_iclog points at commit_iclog, commit_iclog is the first (oldest) iclog in the ring and there are no previous iclogs pending as they have all completed their IO and been activated again. IOWs, commit_iclog->ic_prev points to an iclog that will be written in the future, not one that has been written in the past. Hence, in this case, waiting on the ->ic_prev iclog is incorrect behaviour, and depending on the state of the future iclog, we can end up with a circular ABA wait cycle and we hang. The fix is made more complex by the fact that many iclogs states cannot be used to determine if the iclog is a past or future iclog. Hence we have to determine past iclogs by checking the LSN of the iclog rather than their state. A past ACTIVE iclog will have a LSN of zero, while a future ACTIVE iclog will have a LSN greater than the current iclog. We don't wait on either of these cases. Similarly, a future iclog that hasn't completed IO will have an LSN greater than the current iclog and so we don't wait on them. A past iclog that is still undergoing IO completion will have a LSN less than the current iclog and those are the only iclogs that we need to wait on. Hence we can use the iclog LSN to determine what iclogs we need to wait on here. Fixes: 5fd9256ce156 ("xfs: separate CIL commit record IO") Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-25smb3: prevent races updating CurrentMidSteve French2-1/+4
There was one place where we weren't locking CurrentMid, and although likely to be safe since even without the lock since it is during negotiate protocol, it is more consistent to lock it in this last remaining place, and avoids confusing Coverity warning. Addresses-Coverity: 1486665 ("Data race condition") Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-25xfs: Fix a CIL UAF by getting get rid of the iclog callback lockDave Chinner3-35/+18
The iclog callback chain has it's own lock. That was added way back in 2008 by myself to alleviate severe lock contention on the icloglock in commit 114d23aae512 ("[XFS] Per iclog callback chain lock"). This was long before delayed logging took the icloglock out of the hot transaction commit path and removed all contention on it. Hence the separate ic_callback_lock doesn't serve any scalability purpose anymore, and hasn't for close on a decade. Further, we only attach callbacks to iclogs in one place where we are already taking the icloglock soon after attaching the callbacks. We also have to drop the icloglock to run callbacks and grab it immediately afterwards again. So given that the icloglock is no longer hot, making it cover callbacks again doesn't really change the locking patterns very much at all. We also need to extend the icloglock to cover callback addition to fix a zero-day UAF in the CIL push code. This occurs when shutdown races with xlog_cil_push_work() and the shutdown runs the callbacks before the push releases the iclog. This results in the CIL context structure attached to the iclog being freed by the callback before the CIL push has finished referencing it, leading to UAF bugs. Hence, to avoid this UAF, we need the callback attachment to be atomic with post processing of the commit iclog and references to the structures being attached to the iclog. This requires holding the icloglock as that's the only way to serialise iclog state against a shutdown in progress. The result is we need to be using the icloglock to protect the callback list addition and removal and serialise them with shutdown. That makes the ic_callback_lock redundant and so it can be removed. Fixes: 71e330b59390 ("xfs: Introduce delayed logging core code") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-25xfs: remove callback dequeue loop from xlog_state_do_iclog_callbacksDave Chinner1-10/+6
If we are processing callbacks on an iclog, nothing can be concurrently adding callbacks to the loop. We only add callbacks to the iclog when they are in ACTIVE or WANT_SYNC state, and we explicitly do not add callbacks if the iclog is already in IOERROR state. The only way to have a dequeue racing with an enqueue is to be processing a shutdown without a direct reference to an iclog in ACTIVE or WANT_SYNC state. As the enqueue avoids this race condition, we only ever need a single dequeue operation in xlog_state_do_iclog_callbacks(). Hence we can remove the loop. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-25xfs: don't nest icloglock inside ic_callback_lockDave Chinner1-14/+4
It's completely unnecessary because callbacks are added to iclogs without holding the icloglock, hence no amount of ordering between the icloglock and ic_callback_lock will order the removal of callbacks from the iclog. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-25xfs: Initialize error in xfs_attr_remove_iterAllison Henderson1-1/+1
A recent bug report generated a warning that a code path in xfs_attr_remove_iter could potentially return error uninitialized in the case of XFS_DAS_RM_SHRINK state. Fix this by initializing error. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+1
Merge misc fixes from Andrew Morton: "24 patches, based on 4a09d388f2ab382f217a764e6a152b3f614246f6. Subsystems affected by this patch series: mm (thp, vmalloc, hugetlb, memory-failure, and pagealloc), nilfs2, kthread, MAINTAINERS, and mailmap" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits) mailmap: add Marek's other e-mail address and identity without diacritics MAINTAINERS: fix Marek's identity again mm/page_alloc: do bulk array bounds check after checking populated elements mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array mm/hwpoison: do not lock page again when me_huge_page() successfully recovers mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned mm/memory-failure: use a mutex to avoid memory_failure() races mm, futex: fix shared futex pgoff on shmem huge page kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() kthread_worker: split code for canceling the delayed work timer mm/vmalloc: unbreak kasan vmalloc support KVM: s390: prepare for hugepage vmalloc mm/vmalloc: add vmalloc_no_huge nilfs2: fix memory leak in nilfs_sysfs_delete_device_group mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes mm: page_vma_mapped_walk(): get vma_address_end() earlier mm: page_vma_mapped_walk(): use goto instead of while (1) mm: page_vma_mapped_walk(): add a level of indentation mm: page_vma_mapped_walk(): crossing page table boundary ...
2021-06-25Merge tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-clientLinus Torvalds4-17/+26
Pull ceph fixes from Ilya Dryomov: "Two regression fixes from the merge window: one in the auth code affecting old clusters and one in the filesystem for proper propagation of MDS request errors. Also included a locking fix for async creates, marked for stable" * tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client: libceph: set global_id as soon as we get an auth ticket libceph: don't pass result into ac->ops->handle_reply() ceph: fix error handling in ceph_atomic_open and ceph_lookup ceph: must hold snap_rwsem when filling inode for async create
2021-06-25Merge tag 'netfs-fixes-20210621' of ↵Linus Torvalds2-15/+45
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull netfs fixes from David Howells: "This contains patches to fix netfs_write_begin() and afs_write_end() in the following ways: (1) In netfs_write_begin(), extract the decision about whether to skip a page out to its own helper and have that clear around the region to be written, but not clear that region. This requires the filesystem to patch it up afterwards if the hole doesn't get completely filled. (2) Use offset_in_thp() in (1) rather than manually calculating the offset into the page. (3) Due to (1), afs_write_end() now needs to handle short data write into the page by generic_perform_write(). I've adopted an analogous approach to ceph of just returning 0 in this case and letting the caller go round again. It also adds a note that (in the future) the len parameter may extend beyond the page allocated. This is because the page allocation is deferred to write_begin() and that gets to decide what size of THP to allocate." Jeff Layton points out: "The netfs fix in particular fixes a data corruption bug in cephfs" * tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: netfs: fix test for whether we can skip read when writing beyond EOF afs: Fix afs_write_end() to handle short writes
2021-06-25nilfs2: fix memory leak in nilfs_sysfs_delete_device_groupPavel Skripkin1-0/+1
My local syzbot instance hit memory leak in nilfs2. The problem was in missing kobject_put() in nilfs_sysfs_delete_device_group(). kobject_del() does not call kobject_cleanup() for passed kobject and it leads to leaking duped kobject name if kobject_put() was not called. Fail log: BUG: memory leak unreferenced object 0xffff8880596171e0 (size 8): comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s) hex dump (first 8 bytes): 6c 6f 6f 70 30 00 00 00 loop0... backtrace: kstrdup+0x36/0x70 mm/util.c:60 kstrdup_const+0x53/0x80 mm/util.c:83 kvasprintf_const+0x108/0x190 lib/kasprintf.c:48 kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xc9/0x160 lib/kobject.c:473 nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999 init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637 Link: https://lkml.kernel.org/r/20210612140559.20022-1-paskripkin@gmail.com Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-25cifs: fix missing spinlock around update to ses->statusSteve French2-2/+6
In the other places where we update ses->status we protect the updates via GlobalMid_Lock. So to be consistent add the same locking around it in cifs_put_smb_ses where it was missing. Addresses-Coverity: 1268904 ("Data race condition") Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-24block: pass a gendisk to bdev_disk_changedChristoph Hellwig1-2/+2
bdev_disk_changed can only operate on whole devices. Make that clear by passing a gendisk instead of the struct block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210624123240.441814-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-24block: move bdev_disk_changedChristoph Hellwig1-53/+0
Move bdev_disk_changed to block/partitions/core.c, together with the rest of the partition scanning code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210624123240.441814-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-24fs: remove bdev_try_to_free_page callbackZhang Yi1-15/+0
After remove the unique user of sop->bdev_try_to_free_page() callback, we could remove the callback and the corresponding blkdev_releasepage() at all. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-9-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: remove bdev_try_to_free_page() callbackZhang Yi1-21/+0
After we introduce a jbd2 shrinker to release checkpointed buffer's journal head, we could free buffer without bdev_try_to_free_page() under memory pressure. So this patch remove the whole bdev_try_to_free_page() callback directly. It also remove many use-after-free issues relate to it together. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-8-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: simplify journal_clean_one_cp_list()Zhang Yi1-26/+4
Now that __try_to_free_cp_buf() remove checkpointed buffer or transaction when the buffer is not 'busy', which is only called by journal_clean_one_cp_list(). This patch simplify this function by remove __try_to_free_cp_buf() and invoke __cp_buffer_busy() directly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-7-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2,ext4: add a shrinker to release checkpointed buffersZhang Yi3-0/+242
Current metadata buffer release logic in bdev_try_to_free_page() have a lot of use-after-free issues when umount filesystem concurrently, and it is difficult to fix directly because ext4 is the only user of s_op->bdev_try_to_free_page callback and we may have to add more special refcount or lock that is only used by ext4 into the common vfs layer, which is unacceptable. One better solution is remove the bdev_try_to_free_page callback, but the real problem is we cannot easily release journal_head on the checkpointed buffer, so try_to_free_buffers() cannot release buffers and page under memory pressure, which is more likely to trigger out-of-memory. So we cannot remove the callback directly before we find another way to release journal_head. This patch introduce a shrinker to free journal_head on the checkpointed transaction. After the journal_head got freed, try_to_free_buffers() could free buffer properly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: remove redundant buffer io error checksZhang Yi1-11/+2
Now that __jbd2_journal_remove_checkpoint() can detect buffer io error and mark journal checkpoint error, then we abort the journal later before updating log tail to ensure the filesystem works consistently. So we could remove other redundant buffer io error checkes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: don't abort the journal when freeing buffersZhang Yi1-17/+0
Now that we can be sure the journal is aborted once a buffer has failed to be written back to disk, we can remove the journal abort logic in jbd2_journal_try_to_free_buffers() which was introduced in commit c044f3d8360d ("jbd2: abort journal if free a async write error metadata buffer"), because it may cost and propably is not safe. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: ensure abort the journal if detect IO error when writing original ↵Zhang Yi2-0/+26
buffer back Although we merged c044f3d8360 ("jbd2: abort journal if free a async write error metadata buffer"), there is a race between jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still fail to detect the buffer write io error flag which may lead to filesystem inconsistency. jbd2_journal_try_to_free_buffers() ext4_put_super() jbd2_journal_destroy() __jbd2_journal_remove_checkpoint() detect buffer write error jbd2_log_do_checkpoint() jbd2_cleanup_journal_tail() <--- lead to inconsistency jbd2_journal_abort() Fix this issue by introducing a new atomic flag which only have one JBD2_CHECKPOINT_IO_ERROR bit now, and set it in __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer which has write_io_error flag. Then jbd2_journal_destroy() will detect this mark and abort the journal to prevent updating log tail. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: remove the out label in __jbd2_journal_remove_checkpoint()Zhang Yi1-12/+12
The 'out' lable just return the 'ret' value and seems not required, so remove this label and switch to return appropriate value immediately. This patch also do some minor cleanup, no logical change. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: no need to verify new add extent blockyangerkun1-0/+1
ext4_ext_grow_indepth will add a new extent block which has init the expected content. We can mark this buffer as verified so to stop a useless check in __read_extent_tree_block. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210609075545.1442160-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: clean up misleading comments for jbd2_fc_release_bufsyangerkun1-8/+0
This comments was for jbd2_fc_wait_bufs, not for jbd2_fc_release_bufs. Remove this misleading comments. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210608141236.459441-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: add check to prevent attempting to resize an fs with sparse_super2Josh Triplett1-0/+4
The in-kernel ext4 resize code doesn't support filesystem with the sparse_super2 feature. It fails with errors like this and doesn't finish the resize: EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770) EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize EXT4-fs (loop0): resized filesystem to 2097152 To reproduce: mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M truncate -s 30G ext4.img mount ext4.img /mnt python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)' dmesg | tail e2fsck ext4.img The userspace resize2fs tool has a check for this case: it checks if the filesystem has sparse_super2 set and if the kernel provides /sys/fs/ext4/features/sparse_super2. However, the former check requires manually reading and parsing the filesystem superblock. Detect this case in ext4_resize_begin and error out early with a clear error message. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.1623093259.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: consolidate checks for resize of bigalloc into ext4_resize_beginJosh Triplett2-14/+5
Two different places checked for attempts to resize a filesystem with the bigalloc feature. Move the check into ext4_resize_begin, which both places already call. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/bee03303d999225ecb3bfa5be8576b2f4c6edbe6.1623093259.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()Ritesh Harjani3-34/+9
ext4_xattr_ibody_inline_set() & ext4_xattr_ibody_set() have the exact same definition. Hence remove ext4_xattr_ibody_inline_set() and all its call references. Convert the callers of it to call ext4_xattr_ibody_set() instead. [ Modified to preserve ext4_xattr_ibody_set() and remove ext4_xattr_ibody_inline_set() instead. -- TYT ] Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/fd566b799bbbbe9b668eb5eecde5b5e319e3694f.1622685482.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: fsmap: fix the block/inode bitmap commentRitesh Harjani1-2/+2
While debugging fstest ext4/027 failure, found below comment to be wrong and confusing. Hence fix it while we are at it. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/e79134132db7ea42f15747b5c669ee91cc1aacdf.1622432690.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: fix comment for s_hash_unsignedEric Biggers1-1/+1
Fix the comment for s_hash_unsigned to not be the opposite of what it actually is. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20210527235557.2377525-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24io_uring: Create define to modify a SQPOLL parameterOlivier Langlois1-2/+3
The magic number used to cap the number of entries extracted from an io_uring instance SQ before moving to the other instances is an interesting parameter to experiment with. A define has been created to make it easy to change its value from a single location. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b401640063e77ad3e9f921e09c9b3ac10a8bb923.1624473200.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>