summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2025-05-12fuse: fix race between concurrent setattrs from multiple nodesGuang Yuan Wu1-0/+11
When mounting a user-space filesystem on multiple clients, after concurrent ->setattr() calls from different node, stale inode attributes may be cached in some node. This is caused by fuse_setattr() racing with fuse_reverse_inval_inode(). When filesystem server receives setattr request, the client node with valid iattr cached will be required to update the fuse_inode's attr_version and invalidate the cache by fuse_reverse_inval_inode(), and at the next call to ->getattr() they will be fetched from user space. The race scenario is: 1. client-1 sends setattr (iattr-1) request to server 2. client-1 receives the reply from server 3. before client-1 updates iattr-1 to the cached attributes by fuse_change_attributes_common(), server receives another setattr (iattr-2) request from client-2 4. server requests client-1 to update the inode attr_version and invalidate the cached iattr, and iattr-1 becomes staled 5. client-2 receives the reply from server, and caches iattr-2 6. continue with step 2, client-1 invokes fuse_change_attributes_common(), and caches iattr-1 The issue has been observed from concurrent of chmod, chown, or truncate, which all invoke ->setattr() call. The solution is to use fuse_inode's attr_version to check whether the attributes have been modified during the setattr request's lifetime. If so, mark the attributes as invalid in the function fuse_change_attributes_common(). Signed-off-by: Guang Yuan Wu <gwu@ddn.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-05-12crypto: lib/chacha - add strongly-typed state zeroizationEric Biggers1-2/+2
Now that the ChaCha state matrix is strongly-typed, add a helper function chacha_zeroize_state() which zeroizes it. Then convert all applicable callers to use it instead of direct memzero_explicit. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12crypto: lib/chacha - strongly type the ChaCha stateEric Biggers1-9/+9
The ChaCha state matrix is 16 32-bit words. Currently it is represented in the code as a raw u32 array, or even just a pointer to u32. This weak typing is error-prone. Instead, introduce struct chacha_state: struct chacha_state { u32 x[16]; }; Convert all ChaCha and HChaCha functions to use struct chacha_state. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12nilfs2: do not propagate ENOENT error from nilfs_btree_propagate()Ryusuke Konishi1-1/+3
In preparation for writing logs, in nilfs_btree_propagate(), which makes parent and ancestor node blocks dirty starting from a modified data block or b-tree node block, if the starting block does not belong to the b-tree, i.e. is isolated, nilfs_btree_do_lookup() called within the function fails with -ENOENT. In this case, even though -ENOENT is an internal code, it is propagated to the log writer via nilfs_bmap_propagate() and may be erroneously returned to system calls such as fsync(). Fix this issue by changing the error code to -EINVAL in this case, and having the bmap layer detect metadata corruption and convert the error code appropriately. Link: https://lkml.kernel.org/r/20250428173808.6452-3-konishi.ryusuke@gmail.com Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12nilfs2: add pointer check for nilfs_direct_propagate()Wentao Liang1-0/+3
Patch series "nilfs2: improve sanity checks in dirty state propagation". This fixes one missed check for block mapping anomalies and one improper return of an error code during a preparation step for log writing, thereby improving checking for filesystem corruption on writeback. This patch (of 2): In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr() need to be checked to ensure it is not an invalid pointer. If the pointer value obtained by nilfs_direct_get_ptr() is NILFS_BMAP_INVALID_PTR, means that the metadata (in this case, i_bmap in the nilfs_inode_info struct) that should point to the data block at the buffer head of the argument is corrupted and the data block is orphaned, meaning that the file system has lost consistency. Add a value check and return -EINVAL when it is an invalid pointer. Link: https://lkml.kernel.org/r/20250428173808.6452-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250428173808.6452-2-konishi.ryusuke@gmail.com Fixes: 36a580eb489f ("nilfs2: direct block mapping") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12sort.h: hoist cmp_int() into generic header fileFedor Pchelkin3-6/+2
Deduplicate the same functionality implemented in several places by moving the cmp_int() helper macro into linux/sort.h. The macro performs a three-way comparison of the arguments mostly useful in different sorting strategies and algorithms. Link: https://lkml.kernel.org/r/20250427201451.900730-1-pchelkin@ispras.ru Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Suggested-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Acked-by: Coly Li <colyli@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Coly Li <colyli@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12ocfs2: remove unnecessary NULL check before unregister_sysctl_table()Chen Ni1-2/+1
unregister_sysctl_table() checks for NULL pointers internally. Remove unneeded NULL check here. Link: https://lkml.kernel.org/r/20250422073051.1334310-1-nichen@iscas.ac.cn Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12ocfs2: fix possible memory leak in ocfs2_finish_quota_recoveryMurad Masimov1-1/+1
If ocfs2_finish_quota_recovery() exits due to an error before passing all rc_list elements to ocfs2_recover_local_quota_file() then it can lead to a memory leak as rc_list may still contain elements that have to be freed. Release all memory allocated by ocfs2_add_recovery_chunk() using ocfs2_free_quota_recovery() instead of kfree(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Link: https://lkml.kernel.org/r/20250402065628.706359-2-m.masimov@mt-integration.ru Fixes: 2205363dce74 ("ocfs2: Implement quota recovery") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12ocfs2: simplify return statement in ocfs2_filecheck_attr_store()Thorsten Blum1-1/+1
Don't negate 'ret' and simplify the return statement. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12ocfs2: o2net_idle_timer: Rename del_timer_sync in commentWangYuli1-1/+1
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()") switched del_timer_sync to timer_delete_sync, but did not modify the comment for o2net_idle_timer(). Now fix it. Link: https://lkml.kernel.org/r/BDDB1E4E2876C36C+20250411102610.165946-1-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12Squashfs: check return result of sb_min_blocksizePhillip Lougher1-0/+5
Syzkaller reports an "UBSAN: shift-out-of-bounds in squashfs_bio_read" bug. Syzkaller forks multiple processes which after mounting the Squashfs filesystem, issues an ioctl("/dev/loop0", LOOP_SET_BLOCK_SIZE, 0x8000). Now if this ioctl occurs at the same time another process is in the process of mounting a Squashfs filesystem on /dev/loop0, the failure occurs. When this happens the following code in squashfs_fill_super() fails. ---- msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE); msblk->devblksize_log2 = ffz(~msblk->devblksize); ---- sb_min_blocksize() returns 0, which means msblk->devblksize is set to 0. As a result, ffz(~msblk->devblksize) returns 64, and msblk->devblksize_log2 is set to 64. This subsequently causes the UBSAN: shift-out-of-bounds in fs/squashfs/block.c:195:36 shift exponent 64 is too large for 64-bit type 'u64' (aka 'unsigned long long') This commit adds a check for a 0 return by sb_min_blocksize(). Link: https://lkml.kernel.org/r/20250409024747.876480-1-phillip@squashfs.org.uk Fixes: 0aa666190509 ("Squashfs: super block operations") Reported-by: syzbot+65761fc25a137b9c8c6e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67f0dd7a.050a0220.0a13.0230.GAE@google.com/ Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12proc: fix the issue of proc_mem_open returning NULLPenglei Jiang3-11/+17
proc_mem_open() can return an errno, NULL, or mm_struct*. If it fails to acquire mm, it returns NULL, but the caller does not check for the case when the return value is NULL. The following conditions lead to failure in acquiring mm: - The task is a kernel thread (PF_KTHREAD) - The task is exiting (PF_EXITING) Changes: - Add documentation comments for the return value of proc_mem_open(). - Add checks in the caller to return -ESRCH when proc_mem_open() returns NULL. Link: https://lkml.kernel.org/r/20250404063357.78891-1-superman.xpt@gmail.com Reported-by: syzbot+f9238a0a31f9b5603fef@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000f52642060d4e3750@google.com Signed-off-by: Penglei Jiang <superman.xpt@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Adrian Ratiu <adrian.ratiu@collabora.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Felix Moessbauer <felix.moessbauer@siemens.com> Cc: Jeff layton <jlayton@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12fs/proc/page: refactor to reduce code duplicationLiu Ye1-107/+54
kpageflags_read() and kpagecgroup_read() are quite similar to kpagecount_read(). Refactor common code into a helper function to reduce code duplication. Link: https://lkml.kernel.org/r/20250318063226.223284-1-liuyerd@163.com Signed-off-by: Liu Ye <liuye@kylinos.cn> Acked-by: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Svetly Todorov <svetly.todorov@memverge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regionsAndrei Vagin1-7/+10
Patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions", v2. Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose information about guard regions. This allows userspace tools, such as CRIU, to detect and handle guard regions. Currently, CRIU utilizes PAGEMAP_SCAN as a more efficient alternative to parsing /proc/pid/pagemap. Without this change, guard regions are incorrectly reported as swap-anon regions, leading CRIU to attempt dumping them and subsequently failing. The series includes updates to the documentation and selftests to reflect the new functionality. This patch (of 3): Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose information about guard regions. This allows userspace tools, such as CRIU, to detect and handle guard regions. Link: https://lkml.kernel.org/r/20250324065328.107678-1-avagin@google.com Link: https://lkml.kernel.org/r/20250324065328.107678-2-avagin@google.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm: add folio_mk_pmd()Matthew Wilcox (Oracle)1-2/+1
Removes five conversions from folio to page. Also removes both callers of mk_pmd() that aren't part of mk_huge_pmd(), getting us a step closer to removing the confusion between mk_pmd(), mk_huge_pmd() and pmd_mkhuge(). Link: https://lkml.kernel.org/r/20250402181709.2386022-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12nfsd: remove legacy dprintks from GETATTR and STATFS codepathsJeff Layton2-10/+0
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: remove legacy READDIR dprintksJeff Layton2-9/+0
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: remove dprintks for v2/3 RENAME eventsJeff Layton2-14/+0
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: remove REMOVE/RMDIR dprintksJeff Layton2-15/+0
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: remove old LINK dprintksJeff Layton2-14/+0
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: remove old v2/3 SYMLINK dprintksJeff Layton2-9/+0
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: remove old v2/3 create path dprintksJeff Layton2-20/+0
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add tracepoint for getattr and statfs eventsJeff Layton5-0/+38
There isn't a common helper for getattrs, so add these into the protocol-specific helpers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add tracepoint to nfsd_readdirJeff Layton4-0/+33
Observe the start of NFS READDIR operations. The NFS READDIR's count argument can be interesting when tuning a client's readdir behavior. However, the count argument is not passed to nfsd_readdir(). To properly capture the count argument, this tracepoint must appear in each proc function before the nfsd_readdir() call. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add tracepoint to nfsd_renameJeff Layton2-0/+33
Observe the start of RENAME operations for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add tracepoints for unlink eventsJeff Layton2-0/+26
Observe the start of UNLINK, REMOVE, and RMDIR operations for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add tracepoint to nfsd_link()Jeff Layton2-0/+29
Observe the start of NFS LINK operations. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add tracepoint to nfsd_symlinkJeff Layton2-0/+29
Observe the start of SYMLINK operations for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add nfsd_vfs_create tracepointsJeff Layton3-0/+32
Observe the start of file and directory creation for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add a tracepoint to nfsd_lookup_dentryJeff Layton2-1/+24
Replace the dprintk in nfsd_lookup_dentry() with a trace point. nfsd_lookup_dentry() is called frequently enough that enabling this dprintk call site would result in log floods and performance issues. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add a tracepoint for nfsd_setattrJeff Layton2-0/+41
Turn Sargun's internal kprobe based implementation of this into a normal static tracepoint. Also, remove the dprintk's that got added recently with the fix for zero-length ACLs. Cc: Sargun Dillon <sargun@sargun.me> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Add a Call equivalent to the NFSD_TRACE_PROC_RES macrosChuck Lever1-0/+17
Introduce tracing helpers that can be used before the procedure status code is known. These macros are similar to the SVC_RQST_ENDPOINT helpers, but they can be modified to include NFS-specific fields if that is needed later. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Use sockaddr instead of a generic arrayChuck Lever1-14/+15
Record and emit presentation addresses using tracing helpers designed for the task. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Implement FATTR4_CLONE_BLKSIZE attributeChuck Lever1-1/+18
RFC 7862 states that if an NFS server implements a CLONE operation, it MUST also implement FATTR4_CLONE_BLKSIZE. NFSD implements CLONE, but does not implement FATTR4_CLONE_BLKSIZE. Note that in Section 12.2, RFC 7862 claims that FATTR4_CLONE_BLKSIZE is RECOMMENDED, not REQUIRED. Likely this is because a minor version is not permitted to add a REQUIRED attribute. Confusing. We assume this attribute reports a block size as a count of bytes, as RFC 7862 does not specify a unit. Reported-by: Roland Mainz <roland.mainz@nrubsig.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org # v6.7+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: use SHA-256 library API instead of crypto_shash APIEric Biggers2-49/+14
This user of SHA-256 does not support any other algorithm, so the crypto_shash abstraction provides no value. Just use the SHA-256 library API instead, which is much simpler and easier to use. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: Initialize ssc before laundromat_work to prevent NULL dereferenceLi Lingfeng1-3/+3
In nfs4_state_start_net(), laundromat_work may access nfsd_ssc through nfs4_laundromat -> nfsd4_ssc_expire_umount. If nfsd_ssc isn't initialized, this can cause NULL pointer dereference. Normally the delayed start of laundromat_work allows sufficient time for nfsd_ssc initialization to complete. However, when the kernel waits too long for userspace responses (e.g. in nfs4_state_start_net -> nfsd4_end_grace -> nfsd4_record_grace_done -> nfsd4_cld_grace_done -> cld_pipe_upcall -> __cld_pipe_upcall -> wait_for_completion path), the delayed work may start before nfsd_ssc initialization finishes. Fix this by moving nfsd_ssc initialization before starting laundromat_work. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: add commit start/done tracepoints around nfsd_commit()Jeff Layton2-0/+5
Very useful for gauging how long the vfs_fsync_range() takes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: nfsd4_spo_must_allow() must check this is a v4 compound requestNeilBrown1-1/+2
If the request being processed is not a v4 compound request, then examining the cstate can have undefined results. This patch adds a check that the rpc procedure being executed (rq_procinfo) is the NFSPROC4_COMPOUND procedure. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: fix access checking for NLM under XPRTSEC policiesOlga Kornievskaia1-1/+2
When an export policy with xprtsec policy is set with "tls" and/or "mtls", but an NFS client is doing a v3 xprtsec=tls mount, then NLM locking calls fail with an error because there is currently no support for NLM with TLS. Until such support is added, allow NLM calls under TLS-secured policy. Fixes: 4cc9b9f2bf4d ("nfsd: refine and rename NFSD_MAY_LOCK") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12nfsd: remove redundant WARN_ON_ONCE in nfsd4_writeGuoqing Jiang1-1/+0
It can be removed since svc_fill_write_vector already has the same WARN_ON_ONCE. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Add experimental setting to disable the use of splice readChuck Lever3-0/+35
NFSD currently has two separate code paths for handling read requests. One uses page splicing; the other is a traditional read based on an iov iterator. Because most Linux file systems support splice read, the latter does not get nearly the same test experience as splice reads. To force the use of vectored reads for testing and benchmarking, introduce the ability to disable splice reads for all NFS READ operations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Add /sys/kernel/debug/nfsdChuck Lever4-0/+31
Create a small sandbox under /sys/kernel/debug for experimental NFS server feature settings. There is no API/ABI compatibility guarantee for these settings. The only documentation for such settings, if any documentation exists, is in the kernel source code. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: fix race between nfsd registration and exports_procManinder Singh1-9/+8
As of now nfsd calls create_proc_exports_entry() at start of init_nfsd and cleanup by remove_proc_entry() at last of exit_nfsd. Which causes kernel OOPs if there is race between below 2 operations: (i) exportfs -r (ii) mount -t nfsd none /proc/fs/nfsd for 5.4 kernel ARM64: CPU 1: el1_irq+0xbc/0x180 arch_counter_get_cntvct+0x14/0x18 running_clock+0xc/0x18 preempt_count_add+0x88/0x110 prep_new_page+0xb0/0x220 get_page_from_freelist+0x2d8/0x1778 __alloc_pages_nodemask+0x15c/0xef0 __vmalloc_node_range+0x28c/0x478 __vmalloc_node_flags_caller+0x8c/0xb0 kvmalloc_node+0x88/0xe0 nfsd_init_net+0x6c/0x108 [nfsd] ops_init+0x44/0x170 register_pernet_operations+0x114/0x270 register_pernet_subsys+0x34/0x50 init_nfsd+0xa8/0x718 [nfsd] do_one_initcall+0x54/0x2e0 CPU 2 : Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 PC is at : exports_net_open+0x50/0x68 [nfsd] Call trace: exports_net_open+0x50/0x68 [nfsd] exports_proc_open+0x2c/0x38 [nfsd] proc_reg_open+0xb8/0x198 do_dentry_open+0x1c4/0x418 vfs_open+0x38/0x48 path_openat+0x28c/0xf18 do_filp_open+0x70/0xe8 do_sys_open+0x154/0x248 Sometimes it crashes at exports_net_open() and sometimes cache_seq_next_rcu(). and same is happening on latest 6.14 kernel as well: [ 0.000000] Linux version 6.14.0-rc5-next-20250304-dirty ... [ 285.455918] Unable to handle kernel paging request at virtual address 00001f4800001f48 ... [ 285.464902] pc : cache_seq_next_rcu+0x78/0xa4 ... [ 285.469695] Call trace: [ 285.470083] cache_seq_next_rcu+0x78/0xa4 (P) [ 285.470488] seq_read+0xe0/0x11c [ 285.470675] proc_reg_read+0x9c/0xf0 [ 285.470874] vfs_read+0xc4/0x2fc [ 285.471057] ksys_read+0x6c/0xf4 [ 285.471231] __arm64_sys_read+0x1c/0x28 [ 285.471428] invoke_syscall+0x44/0x100 [ 285.471633] el0_svc_common.constprop.0+0x40/0xe0 [ 285.471870] do_el0_svc_compat+0x1c/0x34 [ 285.472073] el0_svc_compat+0x2c/0x80 [ 285.472265] el0t_32_sync_handler+0x90/0x140 [ 285.472473] el0t_32_sync+0x19c/0x1a0 [ 285.472887] Code: f9400885 93407c23 937d7c27 11000421 (f86378a3) [ 285.473422] ---[ end trace 0000000000000000 ]--- It reproduced simply with below script: while [ 1 ] do /exportfs -r done & while [ 1 ] do insmod /nfsd.ko mount -t nfsd none /proc/fs/nfsd umount /proc/fs/nfsd rmmod nfsd done & So exporting interfaces to user space shall be done at last and cleanup at first place. With change there is no Kernel OOPs. Co-developed-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: unregister filesystem in case genl_register_family() failsManinder Singh1-1/+3
With rpc_status netlink support, unregister of register_filesystem() was missed in case of genl_register_family() fails. Correcting it by making new label. Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support") Cc: stable@vger.kernel.org Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Record each NFSv4 call's session slot indexChuck Lever2-0/+13
Help the client resolve the race between the reply to an asynchronous COPY reply and the associated CB_OFFLOAD callback by planting the session, slot, and sequence number of the COPY in the CB_SEQUENCE contained in the CB_OFFLOAD COMPOUND. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Implement CB_SEQUENCE referring call listsChuck Lever2-17/+22
The slot index number of the current COMPOUND has, until now, not been needed outside of nfsd4_sequence(). But to record the tuple that represents a referring call, the slot number will be needed when processing subsequent operations in the COMPOUND. Refactor the code that allocates a new struct nfsd4_slot to ensure that the new sl_index field is always correctly initialized. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Implement CB_SEQUENCE referring call listsChuck Lever3-6/+153
We have yet to implement a mechanism in NFSD for resolving races between a server's reply and a related callback operation. For example, a CB_OFFLOAD callback can race with the matching COPY response. The client will not recognize the copy state ID in the CB_OFFLOAD callback until the COPY response arrives. Trond adds: > It is also needed for the same kind of race with delegation > recalls, layout recalls, CB_NOTIFY_DEVICEID and would also be > helpful (although not as strongly required) for CB_NOTIFY_LOCK. RFC 8881 Section 20.9.3 describes referring call lists this way: > The csa_referring_call_lists array is the list of COMPOUND > requests, identified by session ID, slot ID, and sequence ID. > These are requests that the client previously sent to the server. > These previous requests created state that some operation(s) in > the same CB_COMPOUND as the csa_referring_call_lists are > identifying. A session ID is included because leased state is tied > to a client ID, and a client ID can have multiple sessions. See > Section 2.10.6.3. Introduce the XDR infrastructure for populating the csa_referring_call_lists argument of CB_SEQUENCE. Subsequent patches will put the referring call list to use. Note that cb_sequence_enc_sz estimates that only zero or one rcl is included in each CB_SEQUENCE, but the new infrastructure can manage any number of referring calls. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: Shorten CB_OFFLOAD response to NFS4ERR_DELAYChuck Lever1-1/+1
Try not to prolong the wait for completion of a COPY or COPY_NOTIFY operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-12NFSD: OFFLOAD_CANCEL should mark an async COPY as completedChuck Lever1-1/+4
Update the status of an async COPY operation when it has been stopped. OFFLOAD_STATUS needs to indicate that the COPY is no longer running. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of ↵Linus Torvalds10-45/+136
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates" * tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: fix folio_pte_batch() on XEN PV nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs() mm/hugetlb: copy the CMA flag when demoting mm, swap: fix false warning for large allocation with !THP_SWAP selftests/mm: fix a build failure on powerpc selftests/mm: fix build break when compiling pkey_util.c mm: vmalloc: support more granular vrealloc() sizing tools/testing/selftests: fix guard region test tmpfs assumption ocfs2: stop quota recovery before disabling quotas ocfs2: implement handshaking with ocfs2 recovery thread ocfs2: switch osb->disable_recovery to enum mailmap: map Uwe's BayLibre addresses to a single one MAINTAINERS: add mm THP section mm/userfaultfd: fix uninitialized output field for -EAGAIN race selftests/mm: compaction_test: support platform with huge mount of memory MAINTAINERS: add core mm section ocfs2: fix panic in failed foilio allocation mm/huge_memory: fix dereferencing invalid pmd migration entry MAINTAINERS: add reverse mapping section x86: disable image size check for test builds ...