path: root/fs
Age | Commit message | Author | Files | Lines
4 days | NFS: add barriers when testing for NFS_FSDATA_BLOCKED | NeilBrown | 1 | -16/+32
commit 99bc9f2eb3f79a2b4296d9bf43153e1d10ca50d3 upstream.

dentry->d_fsdata is set to NFS_FSDATA_BLOCKED while unlinking or renaming-over a file to ensure that no open succeeds while the NFS operation progresses on the server.

Setting dentry->d_fsdata to NFS_FSDATA_BLOCKED is done under ->d_lock after checking that the refcount is not elevated. As any attempt to open the file (through that name) goes through lookup_open(), which takes ->d_lock while incrementing the refcount, we can be sure that once the new value is set, __nfs_lookup_revalidate() *will* see the new value and will block.

We don't have any locking guarantee that when we set ->d_fsdata to NULL, the wait_var_event() in __nfs_lookup_revalidate() will notice. wait/wake primitives do NOT provide barriers to guarantee order. We must use smp_load_acquire() in wait_var_event() to ensure we look at an up-to-date value, and must use smp_store_release() before wake_up_var().

This patch adds those barrier functions and factors out block_revalidate() and unblock_revalidate() for clarity.

There is also a hypothetical bug: if memory allocation fails (which never happens in practice) we might leave ->d_fsdata locked. This patch adds the missing call to unblock_revalidate().

Reported-and-tested-by: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071501
Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
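A minimal sketch of the ordering pattern the message describes (not the upstream code itself; NFS_FSDATA_BLOCKED, the helper names and the wait/wake calls come from the commit text, everything else is an assumption):

```
#include <linux/dcache.h>
#include <linux/wait_bit.h>

#define NFS_FSDATA_BLOCKED ((void *)1)	/* sentinel value used by NFS, per the commit */

/* Publish the "unblocked" state: the release store must happen before
 * wake_up_var() so a woken waiter is guaranteed to observe NULL. */
static void unblock_revalidate(struct dentry *dentry)
{
	smp_store_release(&dentry->d_fsdata, NULL);
	wake_up_var(&dentry->d_fsdata);
}

/* Waiter side: re-check ->d_fsdata with an acquire load inside
 * wait_var_event(), pairing with the release store above. */
static void wait_until_unblocked(struct dentry *dentry)
{
	wait_var_event(&dentry->d_fsdata,
		       smp_load_acquire(&dentry->d_fsdata) != NFS_FSDATA_BLOCKED);
}
```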
4 days | NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT | NeilBrown | 1 | -1/+2
commit f16857e62bac60786104c020ad7c86e2163b2c5b upstream.

nfs_unlink() calls d_delete() twice if it receives ENOENT from the server - once in nfs_dentry_handle_enoent() from nfs_safe_remove() and once in nfs_dentry_remove_handle_error(). nfs_rmdir() also calls it twice - the nfs_dentry_handle_enoent() call is direct and inside a region locked with ->rmdir_sem.

It is safe to call d_delete() twice if the refcount > 1 as the dentry is simply unhashed. If the refcount is 1, the first call sets d_inode to NULL and the second call crashes.

This patch guards the d_delete() call from nfs_dentry_handle_enoent(), leaving the one under ->rmdir_sem in case that is important.

In mainline it would be safe to remove the d_delete() call. However, in older kernels to which this might be backported, that would change the behaviour of nfs_unlink(). nfs_unlink() used to unhash the dentry, which resulted in nfs_dentry_handle_enoent() not calling d_delete(). So in older kernels we need the d_delete() in nfs_dentry_remove_handle_error() when called from nfs_unlink() but not when called from nfs_rmdir().

To make the code work correctly for old and new kernels, and from both nfs_unlink() and nfs_rmdir(), we protect the d_delete() call with simple_positive(). This ensures it is never called in a circumstance where it could crash.

Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename")
Fixes: 9019fb391de0 ("NFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()")
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
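A hedged sketch of the guard described above; simple_positive() and d_delete() are named in the commit text, while the wrapper function is hypothetical:

```
#include <linux/dcache.h>
#include <linux/fs.h>

static void nfs_drop_removed_dentry(struct dentry *dentry)	/* hypothetical helper */
{
	/* d_delete() is only safe while the dentry still refers to an inode;
	 * a second call on an already-negative dentry with refcount 1 would
	 * crash, so only call it when the dentry is still positive. */
	if (simple_positive(dentry))
		d_delete(dentry);
}
```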
4 days | nfsd: provide locking for v4_end_grace | NeilBrown | 4 | -5/+44
[ Upstream commit 2857bd59feb63fcf40fe4baf55401baea6b4feb4 ]

Writing to v4_end_grace can race with server shutdown and result in memory being accessed after it was freed - reclaim_str_hashtbl in particular.

We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is held while client_tracking_op->init() is called, and that can wait for an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a deadlock.

nfsd4_end_grace() is also called by the laundromat work queue and this doesn't require locking, as server shutdown will stop the work and wait for it before freeing anything that nfsd4_end_grace() might access. However, we must be sure that writing to v4_end_grace doesn't restart the work item after shutdown has already waited for it. For this we add a new flag protected with nn->client_lock. It is set only while it is safe to make client tracking calls, and v4_end_grace only schedules work while the flag is set with the spinlock held.

So this patch adds a nfsd_net field "client_tracking_active" which is set as described. Another field, "grace_end_forced", is set when v4_end_grace is written. After this is set, and providing client_tracking_active is set, the laundromat is scheduled. This "grace_end_forced" field bypasses other checks for whether the grace period has finished.

This resolves a race which can result in use-after-free.

Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
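A rough sketch of the described locking scheme, assuming the usual nfsd laundromat plumbing (laundry_wq, nn->laundromat_work); the two flag names come from the commit text, the handler shape is an assumption:

```
/* Called when userspace writes to /proc/fs/nfsd/v4_end_grace (sketch). */
static void force_end_grace(struct nfsd_net *nn)
{
	spin_lock(&nn->client_lock);
	/* bypasses the normal "has grace ended?" checks */
	nn->grace_end_forced = true;
	/* only schedule while client tracking is still safe to use, so a
	 * write cannot restart the work after shutdown has flushed it */
	if (nn->client_tracking_active)
		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
	spin_unlock(&nn->client_lock);
}
```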
4 days | NFS: Fix up the automount fs_context to use the correct cred | Trond Myklebust | 1 | -0/+5
[ Upstream commit a2a8fc27dd668e7562b5326b5ed2f1604cb1e2e9 ] When automounting, the fs_context should be fixed up to use the cred from the parent filesystem, since the operation is just extending the namespace. Authorisation to enter that namespace will already have been provided by the preceding lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | NFSv4: ensure the open stateid seqid doesn't go backwards | Scott Mayhew | 2 | -2/+12
[ Upstream commit 2e47c3cc64b44b0b06cd68c2801db92ff143f2b2 ] We have observed an NFSv4 client receiving a LOCK reply with a status of NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an earlier seqid value in the stateid. As this was for a new lockowner, that would imply that nfs_set_open_stateid_locked() had updated the open stateid seqid with an earlier value. Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out of sequence, the task will sleep on the state->waitq for up to 5 seconds. If the task waits for the full 5 seconds, then after finishing the wait it'll update the open stateid seqid with whatever value the incoming seqid has. If there are multiple waiters in this scenario, then the last one to perform said update may not be the one with the highest seqid. Add a check to ensure that the seqid can only be incremented, and add a tracepoint to indicate when old seqids are skipped. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
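A small sketch of the monotonicity check the patch adds; the wrap-safe comparison is a common idiom and an assumption about the exact implementation:

```
#include <linux/types.h>

/* True if "new" is ahead of "cur" modulo 2^32 (seqids wrap). */
static bool seqid_is_newer(u32 new, u32 cur)
{
	return (s32)(new - cur) > 0;
}

/* Only move the open stateid seqid forward: a waiter that finishes its
 * 5-second timeout holding a stale seqid must not push the value back. */
static void update_open_seqid(u32 *state_seqid, u32 incoming)
{
	if (seqid_is_newer(incoming, *state_seqid))
		*state_seqid = incoming;
	/* else: skip the old seqid (the patch adds a tracepoint for this) */
}
```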
4 days | ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all() | Ye Bin | 3 | -25/+13
[ Upstream commit 5701875f9609b000d91351eaa6bfd97fe2f157f4 ]

There's an issue as follows:

BUG: KASAN: use-after-free in ext4_xattr_inode_dec_ref_all+0x6ff/0x790
Read of size 4 at addr ffff88807b003000 by task syz-executor.0/15172
CPU: 3 PID: 15172 Comm: syz-executor.0
Call Trace:
 __dump_stack lib/dump_stack.c:82 [inline]
 dump_stack+0xbe/0xfd lib/dump_stack.c:123
 print_address_description.constprop.0+0x1e/0x280 mm/kasan/report.c:400
 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560
 kasan_report+0x3a/0x50 mm/kasan/report.c:585
 ext4_xattr_inode_dec_ref_all+0x6ff/0x790 fs/ext4/xattr.c:1137
 ext4_xattr_delete_inode+0x4c7/0xda0 fs/ext4/xattr.c:2896
 ext4_evict_inode+0xb3b/0x1670 fs/ext4/inode.c:323
 evict+0x39f/0x880 fs/inode.c:622
 iput_final fs/inode.c:1746 [inline]
 iput fs/inode.c:1772 [inline]
 iput+0x525/0x6c0 fs/inode.c:1758
 ext4_orphan_cleanup fs/ext4/super.c:3298 [inline]
 ext4_fill_super+0x8c57/0xba40 fs/ext4/super.c:5300
 mount_bdev+0x355/0x410 fs/super.c:1446
 legacy_get_tree+0xfe/0x220 fs/fs_context.c:611
 vfs_get_tree+0x8d/0x2f0 fs/super.c:1576
 do_new_mount fs/namespace.c:2983 [inline]
 path_mount+0x119a/0x1ad0 fs/namespace.c:3316
 do_mount+0xfc/0x110 fs/namespace.c:3329
 __do_sys_mount fs/namespace.c:3540 [inline]
 __se_sys_mount+0x219/0x2e0 fs/namespace.c:3514
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x67/0xd1

Memory state around the buggy address:
 ffff88807b002f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88807b002f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88807b003000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff88807b003080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88807b003100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

The above issue happens because ext4_xattr_delete_inode() doesn't check whether the xattr is valid when the xattr is stored in the inode. To solve the above issue, call xattr_check_inode() to check whether the in-inode xattr is valid. In fact, we can do the verification directly in ext4_iget_extra_inode(), so that there is no divergent verification.

Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250208063141.1539283-3-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: David Nyström <david.nystrom@est.tech>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | ext4: introduce ITAIL helper | Ye Bin | 2 | -5/+8
[ Upstream commit 69f3a3039b0d0003de008659cafd5a1eaaa0a7a4 ] Introduce ITAIL helper to get the bound of xattr in inode. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250208063141.1539283-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: David Nyström <david.nystrom@est.tech> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | btrfs: do not clean up repair bio if submit fails | Josef Bacik | 1 | -8/+7
[ Upstream commit 8cbc3001a3264d998d6b6db3e23f935c158abd4d ] The submit helper will always run bio_endio() on the bio if it fails to submit, so cleaning up the bio just leads to a variety of use-after-free and NULL pointer dereference bugs because we race with the endio function that is cleaning up the bio. Instead just return BLK_STS_OK as the repair function has to continue to process the rest of the pages, and the endio for the repair bio will do the appropriate cleanup for the page that it was given. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> [Minor context change fixed.] Signed-off-by: Bin Lan <bin.lan.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Keerthana: Backported the patch to v5.10.y ] Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | ovl: Use "buf" flexible array for memcpy() destination | Kees Cook | 2 | -2/+2
commit cf8aa9bf97cadf85745506c6a3e244b22c268d63 upstream. The "buf" flexible array needs to be the memcpy() destination to avoid false positive run-time warning from the recent FORTIFY_SOURCE hardening: memcpy: detected field-spanning write (size 93) of single field "&fh->fb" at fs/overlayfs/export.c:799 (size 21) Reported-by: syzbot+9d14351a171d0d1c7955@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000763a6c05e95a5985@google.com/ Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Shivani: Modified to apply on 5.10.y] Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
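A generic sketch of the pattern, assuming a layout similar to the overlayfs file handle mentioned in the warning; the struct and field names below are illustrative stand-ins, not the real overlayfs definitions:

```
#include <linux/string.h>
#include <linux/types.h>

struct fb_like {		/* stand-in for the nested fixed-size header */
	u8 version;
	u8 magic;
	u8 len;
	u8 flags;
	u8 fid[];
} __packed;

struct fh_like {
	u8 padding[3];
	union {
		struct fb_like fb;
		u8 buf[];	/* flexible array used as the memcpy() destination */
	};
} __packed;

static void copy_fh(struct fh_like *fh, const void *src, size_t len)
{
	/* Writing through "buf" covers fb plus its trailing fid[] bytes, so
	 * FORTIFY_SOURCE sees the whole allocation instead of warning about
	 * a field-spanning write into the small &fh->fb field. */
	memcpy(fh->buf, src, len);
}
```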
4 days | NFSD: NFSv4 file creation neglects setting ACL | Chuck Lever | 1 | -1/+1
[ Upstream commit 913f7cf77bf14c13cfea70e89bcb6d0b22239562 ] An NFSv4 client that sets an ACL with a named principal during file creation retrieves the ACL afterwards, and finds that it is only a default ACL (based on the mode bits) and not the ACL that was requested during file creation. This violates RFC 8881 section 6.4.1.3: "the ACL attribute is set as given". The issue occurs in nfsd_create_setattr(). On 6.1.y, the check to determine whether nfsd_setattr() should be called is simply "iap->ia_valid", which only accounts for iattr changes. When only an ACL is present (and no iattr fields are set), nfsd_setattr() is skipped and the POSIX ACL is never applied to the inode. Subsequently, when the client retrieves the ACL, the server finds no POSIX ACL on the inode and returns one generated from the file's mode bits rather than returning the originally-specified ACL. Reported-by: Aurelien Couderc <aurelien.couderc2002@gmail.com> Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: stable@vger.kernel.org [ cel: Adjust nfsd_create_setattr() instead of nfsd_attrs_valid() ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | lockd: fix vfs_test_lock() calls | NeilBrown | 4 | -18/+25
[ Upstream commit a49a2a1baa0c553c3548a1c414b6a3c005a8deba ] Usage of vfs_test_lock() is somewhat confused. Documentation suggests it is given a "lock" but this is not the case. It is given a struct file_lock which contains some details of the sort of lock it should be looking for. In particular passing a "file_lock" containing fl_lmops or fl_ops is meaningless and possibly confusing. This is particularly problematic in lockd. nlmsvc_testlock() receives an initialised "file_lock" from xdr-decode, including manager ops and an owner. It then mistakenly passes this to vfs_test_lock() which might replace the owner and the ops. This can lead to confusion when freeing the lock. The primary role of the 'struct file_lock' passed to vfs_test_lock() is to report a conflicting lock that was found, so it makes more sense for nlmsvc_testlock() to pass "conflock", which it uses for returning the conflicting lock. With this change, freeing of the lock is not confused and code in __nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified. Documentation for vfs_test_lock() is improved to reflect its real purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the future. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com Signed-off-by: NeilBrown <neil@brown.name> Fixes: 20fa19027286 ("nfs: add export operations") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ adapted c.flc_* field accesses to direct fl_* fields ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap | Chuck Lever | 1 | -0/+5
[ Upstream commit 27d17641cacfedd816789b75d342430f6b912bd2 ]

From RFC 8881, Section 5.8.1.14 (Attribute 75: suppattr_exclcreat):

> The bit vector that would set all REQUIRED and RECOMMENDED
> attributes that are supported by the EXCLUSIVE4_1 method of file
> creation via the OPEN operation. The scope of this attribute
> applies to all objects with a matching fsid.

There's nothing in RFC 8881 that states that suppattr_exclcreat is or is not allowed to contain bits for attributes that are clear in the reported supported_attrs bitmask. But it doesn't make sense for an NFS server to indicate that it /doesn't/ implement an attribute, but then also indicate that clients /are/ allowed to set that attribute using OPEN(create) with EXCLUSIVE4_1.

Ensure that the SECURITY_LABEL and ACL bits are not set in the suppattr_exclcreat bitmask when they are also not set in the supported_attrs bitmask.

Fixes: 8c18f2052e75 ("nfsd41: SUPPATTR_EXCLCREAT attribute")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | jbd2: fix the inconsistency between checksum and data in memory for journal sb | Ye Bin | 1 | -0/+14
[ Upstream commit 6abfe107894af7e8ce3a2e120c619d81ee764ad5 ]

Copying the file system while it is mounted as read-only results in a mount failure:

[~]# mkfs.ext4 -F /dev/sdc
[~]# mount /dev/sdc -o ro /mnt/test
[~]# dd if=/dev/sdc of=/dev/sda bs=1M
[~]# mount /dev/sda /mnt/test1
[ 1094.849826] JBD2: journal checksum error
[ 1094.850927] EXT4-fs (sda): Could not load journal inode
mount: mount /dev/sda on /mnt/test1 failed: Bad message

The process described above is just an abstracted way I came up with to reproduce the issue. In the actual scenario, the file system was mounted read-only and then copied while it was still mounted. It was found that the mount operation failed. The user intended to verify the data or use it as a backup, and this action was performed during a version upgrade.

The above issue may happen as follows:

ext4_fill_super
  set_journal_csum_feature_set(sb)
    if (ext4_has_metadata_csum(sb))
      incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3;
    if (test_opt(sb, JOURNAL_CHECKSUM))
      jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat);
        lock_buffer(journal->j_sb_buffer);
        sb->s_feature_incompat |= cpu_to_be32(incompat);
        // The data in the journal sb was modified, but the checksum was not
        // updated, so the data remaining in memory has a mismatch between
        // the data and the checksum.
        unlock_buffer(journal->j_sb_buffer);

In this case, the journal sb copied over is in a state where the checksum and data are inconsistent, so mounting fails. To solve the above issue, update the checksum in memory after modifying the journal sb.

Fixes: 4fd5ea43bc11 ("jbd2: checksum journal superblock")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251103010123.3753631-1-yebin@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
[ Changed jbd2_superblock_csum(sb) to jbd2_superblock_csum(journal, sb) ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
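A minimal sketch of the fix described above; the wrapper function is illustrative, and jbd2_superblock_csum(journal, sb) follows the backported signature noted at the end of the message:

```
/* Sketch: set journal feature bits and immediately refresh the in-memory
 * checksum, so a raw copy of the device never sees a stale s_checksum. */
static void set_features_and_fix_csum(journal_t *journal, unsigned long compat,
				      unsigned long ro, unsigned long incompat)
{
	journal_superblock_t *sb = journal->j_superblock;

	lock_buffer(journal->j_sb_buffer);
	sb->s_feature_compat    |= cpu_to_be32(compat);
	sb->s_feature_ro_compat |= cpu_to_be32(ro);
	sb->s_feature_incompat  |= cpu_to_be32(incompat);
	/* Recompute the checksum while the buffer is still locked, keeping
	 * the in-memory journal sb data and checksum consistent. */
	sb->s_checksum = jbd2_superblock_csum(journal, sb);
	unlock_buffer(journal->j_sb_buffer);
}
```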
4 days | f2fs: fix to avoid updating zero-sized extent in extent cache | Chao Yu | 1 | -1/+2
[ Upstream commit 7c37c79510329cd951a4dedf3f7bf7e2b18dccec ] As syzbot reported: F2FS-fs (loop0): __update_extent_tree_range: extent len is zero, type: 0, extent [0, 0, 0], age [0, 0] ------------[ cut here ]------------ kernel BUG at fs/f2fs/extent_cache.c:678! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:__update_extent_tree_range+0x13bc/0x1500 fs/f2fs/extent_cache.c:678 Call Trace: <TASK> f2fs_update_read_extent_cache_range+0x192/0x3e0 fs/f2fs/extent_cache.c:1085 f2fs_do_zero_range fs/f2fs/file.c:1657 [inline] f2fs_zero_range+0x10c1/0x1580 fs/f2fs/file.c:1737 f2fs_fallocate+0x583/0x990 fs/f2fs/file.c:2030 vfs_fallocate+0x669/0x7e0 fs/open.c:342 ioctl_preallocate fs/ioctl.c:289 [inline] file_ioctl+0x611/0x780 fs/ioctl.c:-1 do_vfs_ioctl+0xb33/0x1430 fs/ioctl.c:576 __do_sys_ioctl fs/ioctl.c:595 [inline] __se_sys_ioctl+0x82/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f07bc58eec9 In error path of f2fs_zero_range(), it may add a zero-sized extent into extent cache, it should be avoided. Fixes: 6e9619499f53 ("f2fs: support in batch fzero in dnode page") Cc: stable@kernel.org Reported-by: syzbot+24124df3170c3638b35f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68e5d698.050a0220.256323.0032.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | f2fs: fix to propagate error from f2fs_enable_checkpoint() | Chao Yu | 1 | -9/+15
[ Upstream commit be112e7449a6e1b54aa9feac618825d154b3a5c7 ]

Propagate the error from f2fs_enable_checkpoint() in order to let userspace detect such errors rather than suffering a silent failure.

Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ adapted error handling to use restore_gc ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes() | Chao Yu | 1 | -3/+6
[ Upstream commit 68d05693f8c031257a0822464366e1c2a239a512 ]

mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync # avoid CP_UMOUNT_FLAG in last f2fs_checkpoint.ckpt_flags
touch /mnt/f2fs/bar
f2fs_io fsync /mnt/f2fs/bar
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
blockdev --setro /dev/vdd
mount /dev/vdd /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.

For the case where we create and fsync a new inode before a sudden power-cut, without the norecovery or disable_roll_forward mount option, the following mount will succeed w/o recovering the last fsynced inode.

The problem here is that we only check the inode_list after find_fsync_dnodes() in f2fs_recover_fsync_data() to find out whether there is recoverable data in the image. But there is a missed case: if the last fsynced inode does not exist in the last checkpoint, we will fail to get its inode because the NAT entry of its inode node does not exist in the last checkpoint, so the inode won't be linked into inode_list.

Let's detect such a case in dryrun mode to fix this issue. After this change, mount will fail as expected below:

mount: /mnt/f2fs: cannot mount /dev/vdd read-only.
       dmesg(1) may have more information after failed mount system call.

dmesg:
F2FS-fs (vdd): Need to recover fsync data, but write access unavailable, please try mount w/ disable_roll_forward or norecovery

Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ folio => page ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | f2fs: use global inline_xattr_slab instead of per-sb slab cache | Chao Yu | 4 | -36/+24
[ Upstream commit 1f27ef42bb0b7c0740c5616ec577ec188b8a1d05 ]

As Hong Yun reported on the mailing list:

loop7: detected capacity change from 0 to 131072
------------[ cut here ]------------
kmem_cache of name 'f2fs_xattr_entry-7:7' already exists
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 kmem_cache_sanity_check mm/slab_common.c:109 [inline]
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 __kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
CPU: 0 UID: 0 PID: 24426 Comm: syz.7.1370 Not tainted 6.17.0-rc4 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:kmem_cache_sanity_check mm/slab_common.c:109 [inline]
RIP: 0010:__kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
Call Trace:
 __kmem_cache_create include/linux/slab.h:353 [inline]
 f2fs_kmem_cache_create fs/f2fs/f2fs.h:2943 [inline]
 f2fs_init_xattr_caches+0xa5/0xe0 fs/f2fs/xattr.c:843
 f2fs_fill_super+0x1645/0x2620 fs/f2fs/super.c:4918
 get_tree_bdev_flags+0x1fb/0x260 fs/super.c:1692
 vfs_get_tree+0x43/0x140 fs/super.c:1815
 do_new_mount+0x201/0x550 fs/namespace.c:3808
 do_mount fs/namespace.c:4136 [inline]
 __do_sys_mount fs/namespace.c:4347 [inline]
 __se_sys_mount+0x298/0x2f0 fs/namespace.c:4324
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x8e/0x3a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug can be reproduced w/ below scripts:
- mount /dev/vdb /mnt1
- mount /dev/vdc /mnt2
- umount /mnt1
- mount /dev/vdb /mnt1

The reason is that we created two slab caches, named f2fs_xattr_entry-7:3 and f2fs_xattr_entry-7:7, which have the same slab size. In that case the slab system will only create one slab cache core structure, with the slab name "f2fs_xattr_entry-7:3", and the two slab caches share the same structure and cache address.

So, if we destroy the f2fs_xattr_entry-7:3 cache via its cache address, it will only decrease the reference count of the slab cache rather than release it entirely, since one more user still references the cache. Then, if we try to create a slab cache named "f2fs_xattr_entry-7:3" again, the slab system will find that a cache with the same name already exists and trigger the warning.

Let's change to use a global inline_xattr_slab instead of a per-sb slab cache to fix this.

Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups")
Cc: stable@kernel.org
Reported-by: Hong Yun <yhong@link.cuhk.edu.hk>
Tested-by: Hong Yun <yhong@link.cuhk.edu.hk>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ No f2fs_kmem_cache_alloc() ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | xfs: fix a memory leak in xfs_buf_item_init() | Haoxiang Li | 1 | -0/+1
[ Upstream commit fc40459de82543b565ebc839dca8f7987f16f62e ] xfs_buf_item_get_format() may allocate memory for bip->bli_formats, free the memory in the error path. Fixes: c3d5f0c2fb85 ("xfs: complain if anyone tries to create a too-large buffer log item") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | btrfs: don't rewrite ret from inode_permission | Josef Bacik | 1 | -3/+1
[ Upstream commit 0185c2292c600993199bc6b1f342ad47a9e8c678 ] In our user safe ino resolve ioctl we'll just turn any ret into -EACCES from inode_permission(). This is redundant, and could potentially be wrong if we had an ENOMEM in the security layer or some such other error, so simply return the actual return value. Note: The patch was taken from v5 of fscrypt patchset (https://lore.kernel.org/linux-btrfs/cover.1706116485.git.josef@toxicpanda.com/) which was handled over time by various people: Omar Sandoval, Sweet Tea Dorminy, Josef Bacik. Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Daniel Vacek <neelx@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note ] Signed-off-by: David Sterba <dsterba@suse.com> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | ext4: fix string copying in parse_apply_sb_mount_options() | Fedor Pchelkin | 1 | -3/+4
[ Upstream commit ee5a977b4e771cc181f39d504426dbd31ed701cc ] strscpy_pad() can't be used to copy a non-NUL-term string into a NUL-term string of possibly bigger size. Commit 0efc5990bca5 ("string.h: Introduce memtostr() and memtostr_pad()") provides additional information in that regard. So if this happens, the following warning is observed: strnlen: detected buffer overflow: 65 byte read of buffer size 64 WARNING: CPU: 0 PID: 28655 at lib/string_helpers.c:1032 __fortify_report+0x96/0xc0 lib/string_helpers.c:1032 Modules linked in: CPU: 0 UID: 0 PID: 28655 Comm: syz-executor.3 Not tainted 6.12.54-syzkaller-00144-g5f0270f1ba00 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:__fortify_report+0x96/0xc0 lib/string_helpers.c:1032 Call Trace: <TASK> __fortify_panic+0x1f/0x30 lib/string_helpers.c:1039 strnlen include/linux/fortify-string.h:235 [inline] sized_strscpy include/linux/fortify-string.h:309 [inline] parse_apply_sb_mount_options fs/ext4/super.c:2504 [inline] __ext4_fill_super fs/ext4/super.c:5261 [inline] ext4_fill_super+0x3c35/0xad00 fs/ext4/super.c:5706 get_tree_bdev_flags+0x387/0x620 fs/super.c:1636 vfs_get_tree+0x93/0x380 fs/super.c:1814 do_new_mount fs/namespace.c:3553 [inline] path_mount+0x6ae/0x1f70 fs/namespace.c:3880 do_mount fs/namespace.c:3893 [inline] __do_sys_mount fs/namespace.c:4103 [inline] __se_sys_mount fs/namespace.c:4080 [inline] __x64_sys_mount+0x280/0x300 fs/namespace.c:4080 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x64/0x140 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e Since userspace is expected to provide s_mount_opts field to be at most 63 characters long with the ending byte being NUL-term, use a 64-byte buffer which matches the size of s_mount_opts, so that strscpy_pad() does its job properly. Return with error if the user still managed to provide a non-NUL-term string here. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 8ecb790ea8c3 ("ext4: avoid potential buffer over-read in parse_apply_sb_mount_options()") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251101160430.222297-1-pchelkin@ispras.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu> [ goto failed_mount instead of return ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
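A short sketch of the approach described above, assuming `es` points to the on-disk ext4 superblock and `failed_mount` is the error label mentioned in the backport note:

```
char tmp_opts[64];	/* same size as es->s_mount_opts, per the commit */

/* With matching sizes, fortified strnlen() never reads past the 64-byte
 * source, and strscpy_pad() returns a negative value (-E2BIG) if no NUL
 * terminator was found within those 64 bytes. */
if (strscpy_pad(tmp_opts, es->s_mount_opts, sizeof(tmp_opts)) < 0)
	goto failed_mount;	/* reject a non-NUL-terminated value */
```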
4 days | nfsd: Drop the client reference in client_states_open() | Haoxiang Li | 1 | -1/+3
commit 1f941b2c23fd34c6f3b76d36f9d0a2528fa92b8f upstream. In error path, call drop_client() to drop the reference obtained by get_nfsdfs_clp(). Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | nfsd: Mark variable __maybe_unused to avoid W=1 build break | Andy Shevchenko | 1 | -1/+1
commit ebae102897e760e9e6bc625f701dd666b2163bd1 upstream.

Clang is not happy about a set but (in some cases) unused variable:

fs/nfsd/export.c:1027:17: error: variable 'inode' set but not used [-Werror,-Wunused-but-set-variable]

since it's used as a parameter to dprintk(), which might be configured to be a no-op. To avoid uglifying the code with the specific ifdeffery, just mark the variable __maybe_unused.

The commit [1], which introduced this behaviour, is quite old and hence the Fixes tag points to the first of the Git era.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0431923fb7a1 [1]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | fsnotify: do not generate ACCESS/MODIFY events on child for special files | Amir Goldstein | 1 | -1/+8
commit 635bc4def026a24e071436f4f356ea08c0eed6ff upstream. inotify/fanotify do not allow users with no read access to a file to subscribe to events (e.g. IN_ACCESS/IN_MODIFY), but they do allow the same user to subscribe for watching events on children when the user has access to the parent directory (e.g. /dev). Users with no read access to a file but with read access to its parent directory can still stat the file and see if it was accessed/modified via atime/mtime change. The same is not true for special files (e.g. /dev/null). Users will not generally observe atime/mtime changes when other users read/write to special files, only when someone sets atime/mtime via utimensat(). Align fsnotify events with this stat behavior and do not generate ACCESS/MODIFY events to parent watchers on read/write of special files. The events are still generated to parent watchers on utimensat(). This closes some side-channels that could be possibly used for information exfiltration [1]. [1] https://snee.la/pdf/pubs/file-notification-attacks.pdf Reported-by: Sudheendra Raghav Neela <sneela@tugraz.at> CC: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | ocfs2: fix kernel BUG in ocfs2_find_victim_chain | Prithvi Tambewagh | 1 | -0/+10
commit 039bef30e320827bac8990c9f29d2a68cd8adb5f upstream.

syzbot reported a kernel BUG in ocfs2_find_victim_chain() because the `cl_next_free_rec` field of the allocation chain list (next free slot in the chain list) is 0, triggering the BUG_ON(!cl->cl_next_free_rec) condition in ocfs2_find_victim_chain() and panicking the kernel.

To fix this, an if condition is introduced in ocfs2_claim_suballoc_bits(), just before calling ocfs2_find_victim_chain(); its code block is executed when either of the following conditions is true:

1. `cl_next_free_rec` is equal to 0, indicating that there are no free chains in the allocation chain list
2. `cl_next_free_rec` is greater than `cl_count` (the total number of chains in the allocation chain list)

Either of them being true indicates that there are no chains left for usage. This is addressed using ocfs2_error(), which prints an error log for debugging purposes, rather than panicking the kernel.

Link: https://lkml.kernel.org/r/20251201130711.143900-1-activprithvi@gmail.com
Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com>
Reported-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d38c6e1655c1420a72
Tested-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
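A hedged sketch of the added validation, assuming cl_next_free_rec and cl_count are little-endian on-disk fields and that `sb` and the `bail` label come from the surrounding function; the error message text is illustrative:

```
/* Bail out gracefully instead of hitting BUG_ON() inside
 * ocfs2_find_victim_chain() when the chain list is unusable. */
if (!le16_to_cpu(cl->cl_next_free_rec) ||
    le16_to_cpu(cl->cl_next_free_rec) > le16_to_cpu(cl->cl_count)) {
	status = ocfs2_error(sb,
			     "chain list has no usable chains (next_free_rec %u, count %u)\n",
			     le16_to_cpu(cl->cl_next_free_rec),
			     le16_to_cpu(cl->cl_count));
	goto bail;
}
```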
4 days | f2fs: fix return value of f2fs_recover_fsync_data() | Chao Yu | 1 | -5/+9
commit 01fba45deaddcce0d0b01c411435d1acf6feab7b upstream.

With the below scripts, it will trigger a panic in f2fs:

mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync
echo 111 >> /mnt/f2fs/foo
f2fs_io fsync /mnt/f2fs/foo
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
mount -o ro,norecovery /dev/vdd /mnt/f2fs
or
mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs

F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0
F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f
F2FS-fs (vdd): Stopped filesystem due to reason: 0
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1
Filesystem f2fs get_tree() didn't set fc->root, returned 1
------------[ cut here ]------------
kernel BUG at fs/super.c:1761!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:vfs_get_tree.cold+0x18/0x1a
Call Trace:
 <TASK>
 fc_mount+0x13/0xa0
 path_mount+0x34e/0xc50
 __x64_sys_mount+0x121/0x150
 do_syscall_64+0x84/0x800
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa6cc126cfe

The root cause is that we missed handling the error number returned from f2fs_recover_fsync_data() when mounting an image w/ the ro,norecovery or ro,disable_roll_forward mount option, resulting in a positive error number being returned to vfs_get_tree(). Fix it.

Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | f2fs: invalidate dentry cache on failed whiteout creation | Deepanshu Kartikey | 1 | -2/+4
commit d33f89b34aa313f50f9a512d58dd288999f246b0 upstream. F2FS can mount filesystems with corrupted directory depth values that get runtime-clamped to MAX_DIR_HASH_DEPTH. When RENAME_WHITEOUT operations are performed on such directories, f2fs_rename performs directory modifications (updating target entry and deleting source entry) before attempting to add the whiteout entry via f2fs_add_link. If f2fs_add_link fails due to the corrupted directory structure, the function returns an error to VFS, but the partial directory modifications have already been committed to disk. VFS assumes the entire rename operation failed and does not update the dentry cache, leaving stale mappings. In the error path, VFS does not call d_move() to update the dentry cache. This results in new_dentry still pointing to the old inode (new_inode) which has already had its i_nlink decremented to zero. The stale cache causes subsequent operations to incorrectly reference the freed inode. This causes subsequent operations to use cached dentry information that no longer matches the on-disk state. When a second rename targets the same entry, VFS attempts to decrement i_nlink on the stale inode, which may already have i_nlink=0, triggering a WARNING in drop_nlink(). Example sequence: 1. First rename (RENAME_WHITEOUT): file2 → file1 - f2fs updates file1 entry on disk (points to inode 8) - f2fs deletes file2 entry on disk - f2fs_add_link(whiteout) fails (corrupted directory) - Returns error to VFS - VFS does not call d_move() due to error - VFS cache still has: file1 → inode 7 (stale!) - inode 7 has i_nlink=0 (already decremented) 2. Second rename: file3 → file1 - VFS uses stale cache: file1 → inode 7 - Tries to drop_nlink on inode 7 (i_nlink already 0) - WARNING in drop_nlink() Fix this by explicitly invalidating old_dentry and new_dentry when f2fs_add_link fails during whiteout creation. This forces VFS to refresh from disk on subsequent operations, ensuring cache consistency even when the rename partially succeeds. Reproducer: 1. Mount F2FS image with corrupted i_current_depth 2. renameat2(file2, file1, RENAME_WHITEOUT) 3. renameat2(file3, file1, 0) 4. System triggers WARNING in drop_nlink() Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT") Reported-by: syzbot+632cf32276a9a564188d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=632cf32276a9a564188d Suggested-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/all/20251022233349.102728-1-kartikey406@gmail.com/ [v1] Cc: stable@vger.kernel.org Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
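A minimal sketch of the error-path handling described above; the f2fs_add_link() call site and the `put_out_dir` label are illustrative simplifications of the rename error path:

```
/* Whiteout creation failed after the directory was already modified on
 * disk: drop both cached dentries so later lookups go back to disk
 * instead of reusing stale mappings to a freed inode. */
err = f2fs_add_link(old_dentry, whiteout_inode);	/* illustrative call */
if (err) {
	d_invalidate(old_dentry);
	d_invalidate(new_dentry);
	goto put_out_dir;
}
```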
4 days | NFSD: use correct reservation type in nfsd4_scsi_fence_client | Dai Ngo | 1 | -1/+2
commit 6f52063db9aabdaabea929b1e998af98c2e8d917 upstream. The reservation type argument for the pr_preempt call should match the one used in nfsd4_block_get_device_info_scsi. Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support") Cc: stable@vger.kernel.org Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | jbd2: use a weaker annotation in journal handling | Byungchul Park | 1 | -1/+1
commit 40a71b53d5a6d4ea17e4d54b99b2ac03a7f5e783 upstream.

jbd2 journal handling code doesn't want jbd2_might_wait_for_commit() to be placed between start_this_handle() and stop_this_handle(). So it marks the region with rwsem_acquire_read() and rwsem_release().

However, the annotation is too strong for that purpose. We don't have to use more than a trylock annotation for that. rwsem_acquire_read() implies:

1. might become a waiter on contention of the lock.
2. enter the critical section of the lock.

All we need here is 2, not 1. So the trylock version of the annotation is sufficient for that purpose. Now that dept partially relies on lockdep annotations, dept interprets rwsem_acquire_read() as a potential wait and might report a deadlock by the wait. Replace it with the trylock version of the annotation.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Message-ID: <20251024073940.1063-1-byungchul@sk.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 days | ext4: fix incorrect group number assertion in mb_check_buddy | Yongjian Sun | 1 | -0/+2
commit 3f7a79d05c692c7cfec70bf104b1b3c3d0ce6247 upstream. When the MB_CHECK_ASSERT macro is enabled, an assertion failure can occur in __mb_check_buddy when checking preallocated blocks (pa) in a block group: Assertion failure in mb_free_blocks() : "groupnr == e4b->bd_group" This happens when a pa at the very end of a block group (e.g., pa_pstart=32765, pa_len=3 in a group of 32768 blocks) becomes exhausted - its pa_pstart is advanced by pa_len to 32768, which lies in the next block group. If this exhausted pa (with pa_len == 0) is still in the bb_prealloc_list during the buddy check, the assertion incorrectly flags it as belonging to the wrong group. A possible sequence is as follows: ext4_mb_new_blocks ext4_mb_release_context pa->pa_pstart += EXT4_C2B(sbi, ac->ac_b_ex.fe_len) pa->pa_len -= ac->ac_b_ex.fe_len __mb_check_buddy for each pa in group ext4_get_group_no_and_offset MB_CHECK_ASSERT(groupnr == e4b->bd_group) To fix this, we modify the check to skip block group validation for exhausted preallocations (where pa_len == 0). Such entries are in a transitional state and will be removed from the list soon, so they should not trigger an assertion. This change prevents the false positive while maintaining the integrity of the checks for active allocations. Fixes: c9de560ded61f ("ext4: Add multi block allocator for ext4") Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251106060614.631382-2-sunyongjian@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
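A short sketch of the debug-check change, assuming the MB_CHECK_ASSERT walk over bb_prealloc_list looks roughly like this:

```
list_for_each_entry(pa, &grp->bb_prealloc_list, pa_group_list) {
	ext4_group_t groupnr;
	ext4_grpblk_t start;

	/* An exhausted pa (pa_len == 0) may have had pa_pstart advanced
	 * past the end of this group; it is transitional and about to be
	 * removed from the list, so skip the group assertion for it. */
	if (pa->pa_len == 0)
		continue;
	ext4_get_group_no_and_offset(sb, pa->pa_pstart, &groupnr, &start);
	MB_CHECK_ASSERT(groupnr == e4b->bd_group);
}
```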
4 days | ext4: xattr: fix null pointer deref in ext4_raw_inode() | Karina Yankevich | 1 | -1/+5
commit b97cb7d6a051aa6ebd57906df0e26e9e36c26d14 upstream. If ext4_get_inode_loc() fails (e.g. if it returns -EFSCORRUPTED), iloc.bh will remain set to NULL. Since ext4_xattr_inode_dec_ref_all() lacks error checking, this will lead to a null pointer dereference in ext4_raw_inode(), called right after ext4_get_inode_loc(). Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: c8e008b60492 ("ext4: ignore xattrs past end") Cc: stable@kernel.org Signed-off-by: Karina Yankevich <k.yankevich@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Baokun Li <libaokun1@huawei.com> Message-ID: <20251022093253.3546296-1-k.yankevich@omp.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
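A minimal sketch of the missing error check; the `cleanup` label and variable names are assumptions:

```
struct ext4_iloc iloc;
int error;

error = ext4_get_inode_loc(inode, &iloc);
if (error) {
	/* iloc.bh is still NULL here; calling ext4_raw_inode(&iloc)
	 * would dereference it, so propagate the error instead. */
	goto cleanup;
}
raw_inode = ext4_raw_inode(&iloc);
```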
4 days | exfat: fix remount failure in different process environments | Yuezhang Mo | 1 | -4/+15
[ Upstream commit 51fc7b4ce10ccab8ea5e4876bcdc42cf5202a0ef ]

The kernel test robot reported that the exFAT remount operation failed. The reason for the failure was that the process's umask differs between mount and remount, causing fs_fmask and fs_dmask to change. Potentially, both gid and uid may also be changed.

Therefore, when initializing the fs_context for remount, inherit these mount options from the options used during mount.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511251637.81670f5c-lkp@intel.com
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | btrfs: scrub: always update btrfs_scrub_progress::last_physical | Qu Wenruo | 1 | -0/+5
[ Upstream commit 54df8b80cc63aa0f22c4590cad11542731ed43ff ]

[BUG]
When a scrub fails immediately without any byte scrubbed, the returned btrfs_scrub_progress::last_physical will always be 0, even if there is a non-zero @start passed into btrfs_scrub_dev() for resume cases. This will reset the progress and make a later scrub resume start from the beginning.

[CAUSE]
The function btrfs_scrub_dev() accepts a @progress parameter to copy its updated progress to the caller. There are cases where we either don't touch progress::last_physical at all or copy 0 into last_physical:

- last_physical not updated at all
  If some error happened before scrubbing any super block or chunk, we will not copy the progress, leaving @last_physical untouched. E.g. we failed to allocate @sctx, we are scrubbing a missing device, there is already a running scrub, and so on. All those cases won't touch @progress at all, leaving last_physical untouched, so it will be left as 0 in most cases.

- Error out before scrubbing any bytes
  In those cases we allocated @sctx, and sctx->stat.last_physical is all zero (initialized by kvzalloc()). Unfortunately some critical errors happened during scrub_enumerate_chunks() or scrub_supers() before any stripe was really scrubbed. In that case, although we will copy sctx->stat back to @progress, since no byte was really scrubbed, last_physical will be overwritten to 0.

[FIX]
Make sure the parameter @progress always has its @last_physical member updated to the @start parameter inside btrfs_scrub_dev().

At the very beginning of the function, set @progress->last_physical to @start, so that even if we error out without doing any progress copying, last_physical is still at @start.

Then, after @sctx is allocated, set sctx->stat.last_physical to @start. This makes sure that even if we didn't get any byte scrubbed, @last_physical is not left as zero at the progress copying stage.

This should resolve the resume progress reset problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | hfsplus: fix volume corruption issue for generic/073 | Viacheslav Dubeyko | 1 | -1/+6
[ Upstream commit 24e17a29cf7537f0947f26a50f85319abd723c6c ]

The xfstests' test-case generic/073 leaves the HFS+ volume in a corrupted state:

sudo ./check generic/073
FSTYP         -- hfsplus
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC Wed Oct 1 15:02:44 PDT 2025
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/073
_check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/073.full for details)
Ran: generic/073
Failures: generic/073
Failed 1 of 1 tests

sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
   The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
   Invalid directory item count
   (It should be 1 instead of 0)
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
   Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
                  CBTStat = 0x0000 CatStat = 0x00004000
** Repairing volume.
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
   The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.

The test is doing these steps in the final phase:

mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

So, we move the file bar from testdir_1 into the testdir_2 folder. It means that the HFS+ logic decrements the number of entries in testdir_1 and increments the number of entries in testdir_2. Finally, we fsync only testdir_1 and foo but not testdir_2. This is the reason why fsck.hfsplus detects the volume corruption afterwards.

This patch fixes the issue by adding the hfsplus_cat_write_inode() call for old_dir and new_dir in hfsplus_rename() after the successful ending of hfsplus_rename_cat(). This method modifies the in-core inode objects for old_dir and new_dir but it doesn't save these modifications in the Catalog File's entries. It was expected that hfsplus_write_inode() would save these modifications afterwards. However, because generic/073 fsyncs only testdir_1 and foo, the testdir_2 modification hasn't been saved into its Catalog File entry and it was flushed without this modification, which fsck.hfsplus then detected. Now, hfsplus_rename() stores all modified entries in the Catalog File, and the correct state of the Catalog File will be flushed during the hfsplus_file_fsync() call. Finally, it makes fsck.hfsplus happy.

sudo ./check generic/073
FSTYP         -- hfsplus
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc3+ #93 SMP PREEMPT_DYNAMIC Wed Nov 12 14:37:49 PST 2025
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/073 32s ... 32s
Ran: generic/073
Passed all 1 tests

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20251112232522.814038-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | hfsplus: Verify inode mode when loading from disk | Tetsuo Handa | 1 | -4/+28
[ Upstream commit 005d4b0d33f6b4a23d382b7930f7a96b95b01f39 ] syzbot is reporting that S_IFMT bits of inode->i_mode can become bogus when the S_IFMT bits of the 16bits "mode" field loaded from disk are corrupted. According to [1], the permissions field was treated as reserved in Mac OS 8 and 9. According to [2], the reserved field was explicitly initialized with 0, and that field must remain 0 as long as reserved. Therefore, when the "mode" field is not 0 (i.e. no longer reserved), the file must be S_IFDIR if dir == 1, and the file must be one of S_IFREG/S_IFLNK/S_IFCHR/ S_IFBLK/S_IFIFO/S_IFSOCK if dir == 0. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Link: https://developer.apple.com/library/archive/technotes/tn/tn1150.html#HFSPlusPermissions [1] Link: https://developer.apple.com/library/archive/technotes/tn/tn1150.html#ReservedAndPadFields [2] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/04ded9f9-73fb-496c-bfa5-89c4f5d1d7bb@I-love.SAKURA.ne.jp Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
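A hedged sketch of the validation rule spelled out above (reject S_IFMT bits that contradict the catalog record's file/folder type); the helper shape and return convention are assumptions:

```
#include <linux/stat.h>
#include <linux/types.h>

/* "dir" is whether the catalog record describes a folder. */
static bool hfsplus_mode_is_sane(umode_t mode, bool dir)
{
	if (!mode)		/* reserved/zeroed permissions field: nothing to check */
		return true;
	if (dir)
		return S_ISDIR(mode);
	return S_ISREG(mode) || S_ISLNK(mode) || S_ISCHR(mode) ||
	       S_ISBLK(mode) || S_ISFIFO(mode) || S_ISSOCK(mode);
}
```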
4 days | hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create | Yang Chenzhi | 1 | -0/+1
[ Upstream commit 152af114287851583cf7e0abc10129941f19466a ]

When sync() and link() are called concurrently, both threads may enter hfs_bnode_find() without finding the node in the hash table and proceed to create it.

Thread A:
hfsplus_write_inode()
 -> hfsplus_write_system_inode()
  -> hfs_btree_write()
   -> hfs_bnode_find(tree, 0)
    -> __hfs_bnode_create(tree, 0)

Thread B:
hfsplus_create_cat()
 -> hfs_brec_insert()
  -> hfs_bnode_split()
   -> hfs_bmap_alloc()
    -> hfs_bnode_find(tree, 0)
     -> __hfs_bnode_create(tree, 0)

In this case, thread A creates the bnode, sets refcnt=1, and hashes it. Thread B also tries to create the same bnode, notices it has already been inserted, drops its own instance, and uses the hashed one without getting the node.

```
	node2 = hfs_bnode_findhash(tree, cnid);
	if (!node2) {				<- Thread A
		hash = hfs_bnode_hash(cnid);
		node->next_hash = tree->node_hash[hash];
		tree->node_hash[hash] = node;
		tree->node_hash_cnt++;
	} else {				<- Thread B
		spin_unlock(&tree->hash_lock);
		kfree(node);
		wait_event(node2->lock_wq,
			   !test_bit(HFS_BNODE_NEW, &node2->flags));
		return node2;
	}
```

However, hfs_bnode_find() requires each call to take a reference. Here both threads end up setting refcnt=1. When they later put the node, this triggers:

BUG_ON(!atomic_read(&node->refcnt))

In this scenario, Thread B in fact finds the node in the hash table rather than creating a new one, and thus must take a reference. Fix this by calling hfs_bnode_get() when reusing a bnode newly created by another thread to ensure the refcount is updated correctly.

A similar bug was fixed in HFS long ago in commit a9dc087fd3c4 ("fix missing hfs_bnode_get() in __hfs_bnode_create") but the same issue remained in HFS+ until now.

Reported-by: syzbot+005d2a9ecd9fbf525f6a@syzkaller.appspotmail.com
Signed-off-by: Yang Chenzhi <yang.chenzhi@vivo.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250829093912.611853-1-yang.chenzhi@vivo.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 days | hfsplus: fix volume corruption issue for generic/070 | Viacheslav Dubeyko | 1 | -2/+1
[ Upstream commit ed490f36f439b877393c12a2113601e4145a5a56 ] The xfstests' test-case generic/070 leaves HFS+ volume in corrupted state: sudo ./check generic/070 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC Wed Oct 1 15:02:44 PDT 2025 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/070 _check_generic_filesystem: filesystem on /dev/loop50 is inconsistent (see xfstests-dev/results//generic/070.full for details) Ran: generic/070 Failures: generic/070 Failed 1 of 1 tests sudo fsck.hfsplus -d /dev/loop50 ** /dev/loop50 Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K. Executing fsck_hfs (version 540.1-Linux). ** Checking non-journaled HFS Plus Volume. The volume name is test ** Checking extents overflow file. Unused node is not erased (node = 1) ** Checking catalog file. ** Checking multi-linked files. ** Checking catalog hierarchy. ** Checking extended attributes file. ** Checking volume bitmap. ** Checking volume information. Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0004 CBTStat = 0x0000 CatStat = 0x00000000 ** Repairing volume. ** Rechecking volume. ** Checking non-journaled HFS Plus Volume. The volume name is test ** Checking extents overflow file. ** Checking catalog file. ** Checking multi-linked files. ** Checking catalog hierarchy. ** Checking extended attributes file. ** Checking volume bitmap. ** Checking volume information. ** The volume test was repaired successfully. It is possible to see that fsck.hfsplus detected not erased and unused node for the case of extents overflow file. The HFS+ logic has special method that defines if the node should be erased: bool hfs_bnode_need_zeroout(struct hfs_btree *tree) { struct super_block *sb = tree->inode->i_sb; struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb); const u32 volume_attr = be32_to_cpu(sbi->s_vhdr->attributes); return tree->cnid == HFSPLUS_CAT_CNID && volume_attr & HFSPLUS_VOL_UNUSED_NODE_FIX; } However, it is possible to see that this method works only for the case of catalog file. But debugging of the issue has shown that HFSPLUS_VOL_UNUSED_NODE_FIX attribute has been requested for the extents overflow file too: catalog file kernel: hfsplus: node 4, num_recs 0, flags 0x10 kernel: hfsplus: tree->cnid 4, volume_attr 0x80000800 extents overflow file kernel: hfsplus: node 1, num_recs 0, flags 0x10 kernel: hfsplus: tree->cnid 3, volume_attr 0x80000800 This patch modifies the hfs_bnode_need_zeroout() by checking only volume_attr but not the b-tree ID because node zeroing can be requested for all HFS+ b-tree types. sudo ./check generic/070 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc3+ #79 SMP PREEMPT_DYNAMIC Fri Oct 31 16:07:42 PDT 2025 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/070 33s ... 34s Ran: generic/070 Passed all 1 tests Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20251101001229.247432-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysbtrfs: fix memory leak of fs_devices in degraded seed device pathDeepanshu Kartikey1-0/+1
[ Upstream commit b57f2ddd28737db6ff0e9da8467f0ab9d707e997 ] In open_seed_devices(), when find_fsid() fails and we're in DEGRADED mode, a new fs_devices is allocated via alloc_fs_devices() but is never added to the seed_list before returning. This contrasts with the normal path where fs_devices is properly added via list_add(). If any error occurs later in read_one_dev() or btrfs_read_chunk_tree(), the cleanup code iterates seed_list to free seed devices, but this orphaned fs_devices is never found and never freed, causing a memory leak. Any devices allocated via add_missing_dev() and attached to this fs_devices are also leaked. Fix this by adding the newly allocated fs_devices to seed_list in the degraded path, consistent with the normal path. Fixes: 5f37583569442 ("Btrfs: move the missing device to its own fs device list") Reported-by: syzbot+eadd98df8bceb15d7fed@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=eadd98df8bceb15d7fed Tested-by: syzbot+eadd98df8bceb15d7fed@syzkaller.appspotmail.com Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
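A heavily abbreviated sketch of the degraded branch in open_seed_devices() with the added list_add() (helper signatures such as alloc_fs_devices() vary between kernel versions, and several assignments on the real path are omitted here):

```
	fs_devices = find_fsid(fsid, NULL);
	if (!fs_devices) {
		if (!btrfs_test_opt(fs_info, DEGRADED))
			return ERR_PTR(-ENOENT);

		/* Degraded mount: fabricate an fs_devices for the missing seed. */
		fs_devices = alloc_fs_devices(fsid, NULL);
		if (IS_ERR(fs_devices))
			return fs_devices;

		/* New: link it exactly like the normal path does, so the
		 * cleanup code that walks seed_list can find and free it
		 * (and any add_missing_dev() devices attached to it). */
		list_add(&fs_devices->seed_list, &fs_info->fs_devices->seed_list);
		return fs_devices;
	}
```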
4 daysNFS: Fix missing unlock in nfs_unlink()Sun Ke1-1/+3
commit 2067231a9e2cbbcae0a4aca6ac36ff2dd6a7b701 upstream. Add the missing unlock before goto. Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 daysocfs2: fix memory leak in ocfs2_merge_rec_left()Dmitry Antipov1-1/+0
[ Upstream commit 2214ec4bf89d0fd27717322d3983a2f3b469c7f3 ] In 'ocfs2_merge_rec_left()', do not reset 'left_path' to NULL after the move, thus allowing 'ocfs2_free_path()' to free it before returning. Link: https://lkml.kernel.org/r/20251205065159.392749-1-dmantipov@yandex.ru Fixes: 677b975282e4 ("ocfs2: Add support for cross extent block") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+cfc7cab3bb6eaa7c4de2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cfc7cab3bb6eaa7c4de2 Reviewed-by: Heming Zhao <heming.zhao@suse.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
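Illustratively, a sketch of the function's exit path (the label name and surrounding context are assumed, the patch itself only removes the "left_path = NULL;" assignment):

```
	/* ... records have been moved off the left path ... */

	/*
	 * Do NOT do "left_path = NULL;" here: ocfs2_free_path() is a no-op
	 * on a NULL pointer, so clearing the pointer after the move leaked
	 * the ocfs2_path allocation on every merge.
	 */
out:
	ocfs2_free_path(left_path);
	return ret;
```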
4 daysfs/nls: Fix inconsistency between utf8_to_utf32() and utf32_to_utf8()Armin Wolf1-4/+7
[ Upstream commit c36f9d7b2869a003a2f7d6ff2c6bac9e62fd7d68 ] After commit 25524b619029 ("fs/nls: Fix utf16 to utf8 conversion"), the return values of utf8_to_utf32() and utf32_to_utf8() are inconsistent when encountering an error: utf8_to_utf32() returns -1, while utf32_to_utf8() returns errno codes. Fix this inconsistency by modifying utf8_to_utf32() to return errno codes as well. Fixes: 25524b619029 ("fs/nls: Fix utf16 to utf8 conversion") Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20251129111535.8984-1-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
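With both helpers reporting errno-style codes, callers can treat the two directions uniformly. A hypothetical round-trip helper might look like this (copy_one_char() is purely illustrative and not a kernel API):

```
#include <linux/nls.h>
#include <linux/errno.h>

/* Decode one UTF-8 character and re-encode it, propagating errno codes
 * from either conversion direction unchanged. */
static int copy_one_char(const u8 *in, int inlen, u8 *out, int outlen)
{
	unicode_t uc;
	int ret;

	ret = utf8_to_utf32(in, inlen, &uc);	/* now returns a negative errno on error */
	if (ret < 0)
		return ret;

	ret = utf32_to_utf8(uc, out, outlen);	/* already returns errno codes */
	return ret;				/* bytes written, or a negative errno */
}
```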
4 daysNFS: Automounted filesystems should inherit ro,noexec,nodev,sync flagsTrond Myklebust2-4/+6
[ Upstream commit 8675c69816e4276b979ff475ee5fac4688f80125 ] When a filesystem is being automounted, it needs to preserve the user-set superblock mount options, such as the "ro" flag. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/ Fixes: f2aedb713c28 ("NFS: Add fs_context support.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
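One plausible shape of the change, shown only as a sketch: the submount's fs_context inherits the user-visible flags from the parent superblock (the exact mask and the place where this happens may differ in the real patch):

```
	/* When building the fs_context for an automounted submount,
	 * carry over the user-set per-superblock mount flags. */
	fc->sb_flags |= dentry->d_sb->s_flags &
			(SB_RDONLY | SB_NOEXEC | SB_NODEV | SB_SYNCHRONOUS);
```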
4 daysfs_context: drop the unused lsm_flags memberOndrej Mosnacek1-3/+0
[ Upstream commit 4e04143c869c5b6d499fbd5083caa860d5c942c3 ] This isn't ever used by VFS now, and it couldn't even work. Any FS that uses the SECURITY_LSM_NATIVE_LABELS flag needs to also process the value returned back from the LSM, so it needs to do its security_sb_set_mnt_opts() call on its own anyway. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Stable-dep-of: 8675c69816e4 ("NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags") Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysRevert "nfs: ignore SB_RDONLY when mounting nfs"Trond Myklebust1-1/+1
[ Upstream commit d4a26d34f1946142f9d32e540490e4926ae9a46b ] This reverts commit 52cb7f8f177878b4f22397b9c4d2c8f743766be3. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos<alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 52cb7f8f1778 ("nfs: ignore SB_RDONLY when mounting nfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysRevert "nfs: clear SB_RDONLY before getting superblock"Trond Myklebust1-9/+0
[ Upstream commit d216b698d44e33417ad4cc796cb04ccddbb8c0ee ] This reverts commit 8cd9b785943c57a136536250da80ba1eb6f8eb18. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos<alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 8cd9b785943c ("nfs: clear SB_RDONLY before getting superblock") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysRevert "nfs: ignore SB_RDONLY when remounting nfs"Trond Myklebust1-10/+0
[ Upstream commit 400fa37afbb11a601c204b72af0f0e5bc2db695c ] This reverts commit 80c4de6ab44c14e910117a02f2f8241ffc6ec54a. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos<alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 80c4de6ab44c ("nfs: ignore SB_RDONLY when remounting nfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
4 daysNFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalidJonathan Curley1-0/+1
[ Upstream commit e0f8058f2cb56de0b7572f51cd563ca5debce746 ] Fixes a crash when the layout is NULL during this call stack: write_inode -> nfs4_write_inode -> pnfs_layoutcommit_inode. pnfs_set_layoutcommit() relies on the lseg refcount to keep the layout around. We need to clear NFS_INO_LAYOUTCOMMIT, otherwise we might attempt to reference a NULL layout. Fixes: fe1cf9469d7bc ("pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid") Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
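A sketch of the one-line addition described above, assuming the flag is cleared while the layout header 'lo' is being invalidated (the exact spot inside pnfs_mark_layout_stateid_invalid() may differ):

```
	/* Nothing is left to layoutcommit once every lseg is being torn
	 * down; clear the flag so nfs4_write_inode() ->
	 * pnfs_layoutcommit_inode() does not chase a layout that is
	 * about to disappear. */
	clear_bit(NFS_INO_LAYOUTCOMMIT, &NFS_I(lo->plh_inode)->flags);
```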
4 daysfs/nls: Fix utf16 to utf8 conversionArmin Wolf1-4/+12
[ Upstream commit 25524b6190295577e4918c689644451365e6466d ] Currently the function responsible for converting between utf16 and utf8 strings will ignore any characters that cannot be converted. This, however, also includes multi-byte characters that do not fit into the provided string buffer. That can cause problems if such a multi-byte character is followed by a single-byte character: the multi-byte character might be ignored because the provided string buffer is too small, but the single-byte character might fit and is thus still copied into the resulting string. Fix this by no longer filling the provided string buffer once a character does not fit. In order to be able to do this, extend utf32_to_utf8() to return useful errno codes instead of -1. Fixes: 74675a58507e ("NLS: update handling of Unicode") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20251111131125.3379-2-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
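A simplified sketch of the intended loop behaviour in utf16s_to_utf8s() (surrogate-pair handling is omitted, get_utf16() is the file's local helper, and the exact error codes are left out of the sketch):

```
	while (inlen > 0 && maxout > 0) {
		u = get_utf16(*pwcs, endian);
		if (!u)
			break;
		pwcs++;
		inlen--;

		size = utf32_to_utf8(u, op, maxout);
		if (size < 0)
			break;	/* character is invalid or no longer fits: stop
				 * filling, so a later single-byte character can
				 * never be appended after a dropped multi-byte one */
		op += size;
		maxout -= size;
	}
	return op - s;
```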
4 daysNFS: Avoid changing nlink when file removes and attribute updates raceTrond Myklebust1-6/+13
[ Upstream commit bd4928ec799b31c492eb63f9f4a0c1e0bb4bb3f7 ] If a file removal races with another operation that updates its attributes, then skip the change to nlink, and just mark the attributes as being stale. Reported-by: Aiden Lambert <alambert48@gatech.edu> Fixes: 59a707b0d42e ("NFS: Ensure we revalidate the inode correctly after remove or rename") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
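Conceptually, and only as a hedged sketch (the function name and the predicate below are illustrative, not the actual diff), the attribute update path declines to touch nlink when a removal is in flight and forces revalidation instead:

```
/*
 * Conceptual sketch: "removal_in_progress" stands for whatever condition
 * the real patch uses to detect a racing unlink, and nfs_refresh_nlink()
 * is a made-up name for the spot where nlink would normally be applied.
 */
static void nfs_refresh_nlink(struct inode *inode,
			      const struct nfs_fattr *fattr,
			      bool removal_in_progress)
{
	if (removal_in_progress)
		/* Don't trust the nlink in this reply; revalidate later. */
		nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
	else
		set_nlink(inode, fattr->nlink);
}
```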
4 daysNFS: don't unhash dentry during unlink/renameNeilBrown1-18/+54
[ Upstream commit 3c59366c207e4c6c6569524af606baf017a55c61 ] NFS unlink() (and rename over existing target) must determine if the file is open, and must perform a "silly rename" instead of an unlink (or before rename) if it is. Otherwise the client might hold a file open which has been removed on the server. Consequently, if it determines that the file isn't open, it must block any subsequent opens until the unlink/rename has been completed on the server. This is currently achieved by unhashing the dentry. This forces any open attempt to the slow-path for lookup which will block on i_rwsem on the directory until the unlink/rename completes. A future patch will change the VFS to only get a shared lock on i_rwsem for unlink, so this will no longer work. Instead we introduce an explicit interlock. A special value is stored in dentry->d_fsdata while the unlink/rename is running and ->d_revalidate blocks while that value is present. When ->d_revalidate unblocks, the dentry will be invalid. This closes the race without requiring exclusion on i_rwsem. d_fsdata is already used in two different ways. 1/ an IS_ROOT directory dentry might have a "devname" stored in d_fsdata. Such a dentry doesn't have a name and so cannot be the target of unlink or rename. For safety we check if an old devname is still stored, and remove it if it is. 2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct nfs_unlinkdata' stored in d_fsdata. While this is set, may_delete() will fail, so an unlink or rename will never proceed on such a dentry. Neither of these can be in effect when a dentry is the target of unlink or rename. So we can expect d_fsdata to be NULL, and store a special value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate that any lookup will be blocked. The d_count() is incremented under ->d_lock when a lookup finds the dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under the same lock to avoid any races. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: bd4928ec799b ("NFS: Avoid changing nlink when file removes and attribute updates race") Signed-off-by: Sasha Levin <sashal@kernel.org>
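A condensed sketch of the interlock described above (both halves are shortened; the exact d_count() threshold, error handling, and the restart of a blocked lookup are omitted):

```
	#define NFS_FSDATA_BLOCKED ((void *)1)

	/* unlink/rename side: mark the dentry while the server op runs */
	spin_lock(&dentry->d_lock);
	if (d_count(dentry) <= 2)		/* refcount is low: not open elsewhere */
		dentry->d_fsdata = NFS_FSDATA_BLOCKED;
	spin_unlock(&dentry->d_lock);

	/* ... issue the UNLINK/RENAME to the server, then unblock: */
	dentry->d_fsdata = NULL;
	wake_up_var(&dentry->d_fsdata);

	/* ->d_revalidate side: wait until the operation has completed */
	if (dentry->d_fsdata == NFS_FSDATA_BLOCKED) {
		wait_var_event(&dentry->d_fsdata,
			       dentry->d_fsdata != NFS_FSDATA_BLOCKED);
		return 0;	/* dentry is stale now; force a fresh lookup */
	}
```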
4 daysNFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()Trond Myklebust1-3/+15
[ Upstream commit 9019fb391de02cbff422090768b73afe9f6174df ] After the success of an operation such as rmdir() or unlink(), we expect to add the dentry back to the dcache as an ordinary negative dentry. However in NFS, unless it is labelled with the appropriate verifier for the parent directory state, then nfs_lookup_revalidate will end up discarding that dentry and forcing a new lookup. The fix is to ensure that we relabel the dentry appropriately on success. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: bd4928ec799b ("NFS: Avoid changing nlink when file removes and attribute updates race") Signed-off-by: Sasha Levin <sashal@kernel.org>
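As a sketch, assuming the re-labelling happens on the success path of nfs_unlink()/nfs_rmdir() using the existing verifier helpers (variable names are taken from the usual context of those functions):

```
	if (error == 0)
		/* Keep the resulting negative dentry trusted: stamp it with
		 * the parent directory's change attribute so a later
		 * nfs_lookup_revalidate() does not discard it and force a
		 * fresh lookup. */
		nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
```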