summaryrefslogtreecommitdiff
path: root/fs/nfsd
AgeCommit message (Collapse)AuthorFilesLines
2026-03-25nfsd: fix heap overflow in NFSv4.0 LOCK replay cacheJeff Layton2-7/+19
[ Upstream commit 5133b61aaf437e5f25b1b396b14242a6bb0508e2 ] The NFSv4.0 replay cache uses a fixed 112-byte inline buffer (rp_ibuf[NFSD4_REPLAY_ISIZE]) to store encoded operation responses. This size was calculated based on OPEN responses and does not account for LOCK denied responses, which include the conflicting lock owner as a variable-length field up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a LOCK operation is denied due to a conflict with an existing lock that has a large owner, nfsd4_encode_operation() copies the full encoded response into the undersized replay buffer via read_bytes_from_xdr_buf() with no bounds check. This results in a slab-out-of-bounds write of up to 944 bytes past the end of the buffer, corrupting adjacent heap memory. This can be triggered remotely by an unauthenticated attacker with two cooperating NFSv4.0 clients: one sets a lock with a large owner string, then the other requests a conflicting lock to provoke the denial. We could fix this by increasing NFSD4_REPLAY_ISIZE to allow for a full opaque, but that would increase the size of every stateowner, when most lockowners are not that large. Instead, fix this by checking the encoded response length against NFSD4_REPLAY_ISIZE before copying into the replay buffer. If the response is too large, set rp_buflen to 0 to skip caching the replay payload. The status is still cached, and the client already received the correct response on the original request. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@kernel.org Reported-by: Nicholas Carlini <npc@anthropic.com> Tested-by: Nicholas Carlini <npc@anthropic.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ replaced `op_status_offset + XDR_UNIT` with existing `post_err_offset` variable ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fdChuck Lever1-2/+12
commit e7fcf179b82d3a3730fd8615da01b087cc654d0b upstream. The /proc/fs/nfs/exports proc entry is created at module init and persists for the module's lifetime. exports_proc_open() captures the caller's current network namespace and stores its svc_export_cache in seq->private, but takes no reference on the namespace. If the namespace is subsequently torn down (e.g. container destruction after the opener does setns() to a different namespace), nfsd_net_exit() calls nfsd_export_shutdown() which frees the cache. Subsequent reads on the still-open fd dereference the freed cache_detail, walking a freed hash table. Hold a reference on the struct net for the lifetime of the open file descriptor. This prevents nfsd_net_exit() from running -- and thus prevents nfsd_export_shutdown() from freeing the cache -- while any exports fd is open. cache_detail already stores its net pointer (cd->net, set by cache_create_net()), so exports_release() can retrieve it without additional per-file storage. Reported-by: Misbah Anjum N <misanjum@linux.ibm.com> Closes: https://lore.kernel.org/linux-nfs/dcd371d3a95815a84ba7de52cef447b8@linux.ibm.com/ Fixes: 96d851c4d28d ("nfsd: use proper net while reading "exports" file") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-04nfsd: fix return error code for nfsd_map_name_to_[ug]idAnthony Iliopoulos1-0/+4
[ Upstream commit 404d779466646bf1461f2090ff137e99acaecf42 ] idmap lookups can time out while the cache is waiting for a userspace upcall reply. In that case cache_check() returns -ETIMEDOUT to callers. The nfsd_map_name_to_[ug]id functions currently proceed with attempting to map the id to a kuid despite a potentially temporary failure to perform the idmap lookup. This results in the code returning the error NFSERR_BADOWNER which can cause client operations to return to userspace with failure. Fix this by returning the failure status before attempting kuid mapping. This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients can retry the operation instead of aborting it. Fixes: 65e10f6d0ab0 ("nfsd: Convert idmap to use kuids and kgids") Cc: stable@vger.kernel.org Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04nfsd: never defer requests during idmap lookupAnthony Iliopoulos3-8/+58
[ Upstream commit f9c206cdc4266caad6a9a7f46341420a10f03ccb ] During v4 request compound arg decoding, some ops (e.g. SETATTR) can trigger idmap lookup upcalls. When those upcall responses get delayed beyond the allowed time limit, cache_check() will mark the request for deferral and cause it to be dropped. This prevents nfs4svc_encode_compoundres from being executed, and thus the session slot flag NFSD4_SLOT_INUSE never gets cleared. Subsequent client requests will fail with NFSERR_JUKEBOX, given that the slot will be marked as in-use, making the SEQUENCE op fail. Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred. Fixes: 2f425878b6a7 ("nfsd: don't use the deferral service, return NFS4ERR_DELAY") Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-16nfsd: don't ignore the return code of svc_proc_register()Jeff Layton3-4/+11
commit 930b64ca0c511521f0abdd1d57ce52b2a6e3476b upstream. Currently, nfsd_proc_stat_init() ignores the return value of svc_proc_register(). If the procfile creation fails, then the kernel will WARN when it tries to remove the entry later. Fix nfsd_proc_stat_init() to return the same type of pointer as svc_proc_register(), and fix up nfsd_net_init() to check that and fail the nfsd_net construction if it occurs. svc_proc_register() can fail if the dentry can't be allocated, or if an identical dentry already exists. The second case is pretty unlikely in the nfsd_net construction codepath, so if this happens, return -ENOMEM. Reported-by: syzbot+e34ad04f27991521104c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-nfs/67a47501.050a0220.19061f.05f9.GAE@google.com/ Cc: stable@vger.kernel.org # v6.9 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ Update the cleanup path to use nfsd_stat_counters_destroy. This ensures the teardown logic is correctly paired with nfsd_stat_counters_init, as required by the current NFSD implementation.] Signed-off-by: Jianqiang kang <jianqkang@sina.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-30NFSD: fix race between nfsd registration and exports_procManinder Singh1-9/+8
[ Upstream commit f7fb730cac9aafda8b9813b55d04e28a9664d17c ] As of now nfsd calls create_proc_exports_entry() at start of init_nfsd and cleanup by remove_proc_entry() at last of exit_nfsd. Which causes kernel OOPs if there is race between below 2 operations: (i) exportfs -r (ii) mount -t nfsd none /proc/fs/nfsd for 5.4 kernel ARM64: CPU 1: el1_irq+0xbc/0x180 arch_counter_get_cntvct+0x14/0x18 running_clock+0xc/0x18 preempt_count_add+0x88/0x110 prep_new_page+0xb0/0x220 get_page_from_freelist+0x2d8/0x1778 __alloc_pages_nodemask+0x15c/0xef0 __vmalloc_node_range+0x28c/0x478 __vmalloc_node_flags_caller+0x8c/0xb0 kvmalloc_node+0x88/0xe0 nfsd_init_net+0x6c/0x108 [nfsd] ops_init+0x44/0x170 register_pernet_operations+0x114/0x270 register_pernet_subsys+0x34/0x50 init_nfsd+0xa8/0x718 [nfsd] do_one_initcall+0x54/0x2e0 CPU 2 : Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 PC is at : exports_net_open+0x50/0x68 [nfsd] Call trace: exports_net_open+0x50/0x68 [nfsd] exports_proc_open+0x2c/0x38 [nfsd] proc_reg_open+0xb8/0x198 do_dentry_open+0x1c4/0x418 vfs_open+0x38/0x48 path_openat+0x28c/0xf18 do_filp_open+0x70/0xe8 do_sys_open+0x154/0x248 Sometimes it crashes at exports_net_open() and sometimes cache_seq_next_rcu(). and same is happening on latest 6.14 kernel as well: [ 0.000000] Linux version 6.14.0-rc5-next-20250304-dirty ... [ 285.455918] Unable to handle kernel paging request at virtual address 00001f4800001f48 ... [ 285.464902] pc : cache_seq_next_rcu+0x78/0xa4 ... [ 285.469695] Call trace: [ 285.470083] cache_seq_next_rcu+0x78/0xa4 (P) [ 285.470488] seq_read+0xe0/0x11c [ 285.470675] proc_reg_read+0x9c/0xf0 [ 285.470874] vfs_read+0xc4/0x2fc [ 285.471057] ksys_read+0x6c/0xf4 [ 285.471231] __arm64_sys_read+0x1c/0x28 [ 285.471428] invoke_syscall+0x44/0x100 [ 285.471633] el0_svc_common.constprop.0+0x40/0xe0 [ 285.471870] do_el0_svc_compat+0x1c/0x34 [ 285.472073] el0_svc_compat+0x2c/0x80 [ 285.472265] el0t_32_sync_handler+0x90/0x140 [ 285.472473] el0t_32_sync+0x19c/0x1a0 [ 285.472887] Code: f9400885 93407c23 937d7c27 11000421 (f86378a3) [ 285.473422] ---[ end trace 0000000000000000 ]--- It reproduced simply with below script: while [ 1 ] do /exportfs -r done & while [ 1 ] do insmod /nfsd.ko mount -t nfsd none /proc/fs/nfsd umount /proc/fs/nfsd rmmod nfsd done & So exporting interfaces to user space shall be done at last and cleanup at first place. With change there is no Kernel OOPs. Co-developed-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ The context change is due to the commit bd9d6a3efa97 ("NFSD: add rpc_status netlink support") and the proper adoption is done. ] Signed-off-by: Rahul Sharma <black.hawk@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17NFSD: Remove NFSERR_EAGAINChuck Lever2-2/+1
[ Upstream commit c6c209ceb87f64a6ceebe61761951dcbbf4a0baa ] I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881. None of these RFCs have an NFS status code that match the numeric value "11". Based on the meaning of the EAGAIN errno, I presume the use of this status in NFSD means NFS4ERR_DELAY. So replace the one usage of nfserr_eagain, and remove it from NFSD's NFS status conversion tables. As far as I can tell, NFSERR_EAGAIN has existed since the pre-git era, but was not actually used by any code until commit f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed."), at which time it become possible for NFSD to return a status code of 11 (which is not valid NFS protocol). Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17nfs_common: factor out nfs_errtbl and nfs_stat_to_errnoMike Snitzer1-0/+1
[ Upstream commit 4806ded4c14c5e8fdc6ce885d83221a78c06a428 ] Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and fs/nfs/nfs3xdr.c Will also be used by fs/nfsd/localio.c Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Stable-dep-of: c6c209ceb87f ("NFSD: Remove NFSERR_EAGAIN") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17NFSD: NFSv4 file creation neglects setting ACLChuck Lever1-1/+2
commit 913f7cf77bf14c13cfea70e89bcb6d0b22239562 upstream. An NFSv4 client that sets an ACL with a named principal during file creation retrieves the ACL afterwards, and finds that it is only a default ACL (based on the mode bits) and not the ACL that was requested during file creation. This violates RFC 8881 section 6.4.1.3: "the ACL attribute is set as given". The issue occurs in nfsd_create_setattr(), which calls nfsd_attrs_valid() to determine whether to call nfsd_setattr(). However, nfsd_attrs_valid() checks only for iattr changes and security labels, but not POSIX ACLs. When only an ACL is present, the function returns false, nfsd_setattr() is skipped, and the POSIX ACL is never applied to the inode. Subsequently, when the client retrieves the ACL, the server finds no POSIX ACL on the inode and returns one generated from the file's mode bits rather than returning the originally-specified ACL. Reported-by: Aurélien Couderc <aurelien.couderc2002@gmail.com> Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17nfsd: set security label during create operationsStephen Smalley2-1/+9
commit 442d27ff09a218b61020ab56387dbc508ad6bfa6 upstream. When security labeling is enabled, the client can pass a file security label as part of a create operation for the new file, similar to mode and other attributes. At present, the security label is received by nfsd and passed down to nfsd_create_setattr(), but nfsd_setattr() is never called and therefore the label is never set on the new file. This bug may have been introduced on or around commit d6a97d3f589a ("NFSD: add security label to struct nfsd_attrs"). Looking at nfsd_setattr() I am uncertain as to whether the same issue presents for file ACLs and therefore requires a similar fix for those. An alternative approach would be to introduce a new LSM hook to set the "create SID" of the current task prior to the actual file creation, which would atomically label the new inode at creation time. This would be better for SELinux and a similar approach has been used previously (see security_dentry_create_files_as) but perhaps not usable by other LSMs. Reproducer: 1. Install a Linux distro with SELinux - Fedora is easiest 2. git clone https://github.com/SELinuxProject/selinux-testsuite 3. Install the requisite dependencies per selinux-testsuite/README.md 4. Run something like the following script: MOUNT=$HOME/selinux-testsuite sudo systemctl start nfs-server sudo exportfs -o rw,no_root_squash,security_label localhost:$MOUNT sudo mkdir -p /mnt/selinux-testsuite sudo mount -t nfs -o vers=4.2 localhost:$MOUNT /mnt/selinux-testsuite pushd /mnt/selinux-testsuite/ sudo make -C policy load pushd tests/filesystem sudo runcon -t test_filesystem_t ./create_file -f trans_test_file \ -e test_filesystem_filetranscon_t -v sudo rm -f trans_test_file popd sudo make -C policy unload popd sudo umount /mnt/selinux-testsuite sudo exportfs -u localhost:$MOUNT sudo rmdir /mnt/selinux-testsuite sudo systemctl stop nfs-server Expected output: <eliding noise from commands run prior to or after the test itself> Process context: unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023 Created file: trans_test_file File context: unconfined_u:object_r:test_filesystem_filetranscon_t:s0 File context is correct Actual output: <eliding noise from commands run prior to or after the test itself> Process context: unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023 Created file: trans_test_file File context: system_u:object_r:test_file_t:s0 File context error, expected: test_filesystem_filetranscon_t got: test_file_t Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Stable-dep-of: 913f7cf77bf1 ("NFSD: NFSv4 file creation neglects setting ACL") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()Trond Myklebust8-21/+25
commit 24d92de9186ebc340687caf7356e1070773e67bc upstream. The main point of the guarded SETATTR is to prevent races with other WRITE and SETATTR calls. That requires that the check of the guard time against the inode ctime be done after taking the inode lock. Furthermore, we need to take into account the 32-bit nature of timestamps in NFSv3, and the possibility that files may change at a faster rate than once a second. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Stable-dep-of: 442d27ff09a2 ("nfsd: set security label during create operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17nfsd: convert to new timestamp accessorsJeff Layton5-9/+10
commit 11fec9b9fb04fd1b3330a3b91ab9dcfa81ad5ad3 upstream. Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-50-jlayton@kernel.org Stable-dep-of: 24d92de9186e ("nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()") Signed-off-by: Christian Brauner <brauner@kernel.org> [ cel: d68886bae76a has already been applied ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17nfsd: provide locking for v4_end_graceNeilBrown4-5/+44
commit 2857bd59feb63fcf40fe4baf55401baea6b4feb4 upstream. Writing to v4_end_grace can race with server shutdown and result in memory being accessed after it was freed - reclaim_str_hashtbl in particularly. We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is held while client_tracking_op->init() is called and that can wait for an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a deadlock. nfsd4_end_grace() is also called by the landromat work queue and this doesn't require locking as server shutdown will stop the work and wait for it before freeing anything that nfsd4_end_grace() might access. However, we must be sure that writing to v4_end_grace doesn't restart the work item after shutdown has already waited for it. For this we add a new flag protected with nn->client_lock. It is set only while it is safe to make client tracking calls, and v4_end_grace only schedules work while the flag is set with the spinlock held. So this patch adds a nfsd_net field "client_tracking_active" which is set as described. Another field "grace_end_forced", is set when v4_end_grace is written. After this is set, and providing client_tracking_active is set, the laundromat is scheduled. This "grace_end_forced" field bypasses other checks for whether the grace period has finished. This resolves a race which can result in use-after-free. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neil@brown.name> Tested-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17NFSD: Fix permission check for read access to executable-only filesScott Mayhew1-2/+2
commit e901c7fce59e72d9f3c92733c379849c4034ac50 upstream. Commit abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET") added NFSD_MAY_OWNER_OVERRIDE to the access flags passed from nfsd4_layoutget() to fh_verify(). This causes LAYOUTGET to fail for executable-only files, and causes xfstests generic/126 to fail on pNFS SCSI. To allow read access to executable-only files, what we really want is: 1. The "permissions" portion of the access flags (the lower 6 bits) must be exactly NFSD_MAY_READ 2. The "hints" portion of the access flags (the upper 26 bits) can contain any combination of NFSD_MAY_OWNER_OVERRIDE and NFSD_MAY_READ_IF_EXEC Fixes: abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET") Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-11NFSD: Clear SECLABEL in the suppattr_exclcreat bitmapChuck Lever1-0/+5
[ Upstream commit 27d17641cacfedd816789b75d342430f6b912bd2 ] >>From RFC 8881: 5.8.1.14. Attribute 75: suppattr_exclcreat > The bit vector that would set all REQUIRED and RECOMMENDED > attributes that are supported by the EXCLUSIVE4_1 method of file > creation via the OPEN operation. The scope of this attribute > applies to all objects with a matching fsid. There's nothing in RFC 8881 that states that suppattr_exclcreat is or is not allowed to contain bits for attributes that are clear in the reported supported_attrs bitmask. But it doesn't make sense for an NFS server to indicate that it /doesn't/ implement an attribute, but then also indicate that clients /are/ allowed to set that attribute using OPEN(create) with EXCLUSIVE4_1. Ensure that the SECURITY_LABEL and ACL bits are not set in the suppattr_exclcreat bitmask when they are also not set in the supported_attrs bitmask. Fixes: 8c18f2052e75 ("nfsd41: SUPPATTR_EXCLCREAT attribute") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-11nfsd: Drop the client reference in client_states_open()Haoxiang Li1-1/+3
commit 1f941b2c23fd34c6f3b76d36f9d0a2528fa92b8f upstream. In error path, call drop_client() to drop the reference obtained by get_nfsdfs_clp(). Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-11nfsd: Mark variable __maybe_unused to avoid W=1 build breakAndy Shevchenko1-1/+1
commit ebae102897e760e9e6bc625f701dd666b2163bd1 upstream. Clang is not happy about set but (in some cases) unused variable: fs/nfsd/export.c:1027:17: error: variable 'inode' set but not used [-Werror,-Wunused-but-set-variable] since it's used as a parameter to dprintk() which might be configured a no-op. To avoid uglifying code with the specific ifdeffery just mark the variable __maybe_unused. The commit [1], which introduced this behaviour, is quite old and hence the Fixes tag points to the first of the Git era. Link: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0431923fb7a1 [1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-11NFSD: use correct reservation type in nfsd4_scsi_fence_clientDai Ngo1-1/+2
commit 6f52063db9aabdaabea929b1e998af98c2e8d917 upstream. The reservation type argument for the pr_preempt call should match the one used in nfsd4_block_get_device_info_scsi. Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support") Cc: stable@vger.kernel.org Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-11NFSD/blocklayout: Fix minlength check in proc_layoutgetSergey Bashirov1-1/+3
[ Upstream commit 3524b021b0ec620a76c89aee78e9d4b4130fb711 ] The extent returned by the file system may have a smaller offset than the segment offset requested by the client. In this case, the minimum segment length must be checked against the requested range. Otherwise, the client may not be able to continue the read/write operation. Fixes: 8650b8a05850 ("nfsd: pNFS block layout driver") Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07nfsd: Replace clamp_t in nfsd4_get_drc_mem()NeilBrown1-2/+4
A recent change to clamp_t() in 6.1.y caused fs/nfsd/nfs4state.c to fail to compile with gcc-9. The code in nfsd4_get_drc_mem() was written with the assumption that when "max < min", clamp(val, min, max) would return max. This assumption is not documented as an API promise and the change caused a compile failure if it could be statically determined that "max < min". The relevant code was no longer present upstream when commit 1519fbc8832b ("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()") landed there, so there is no upstream change to nfsd4_get_drc_mem() to backport. There is no clear case that the existing code in nfsd4_get_drc_mem() is functioning incorrectly. The goal of this patch is to permit the clean application of commit 1519fbc8832b ("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()"), and any commits that depend on it, to LTS kernels without affecting the ability to compile those kernels. This is done by open-coding the __clamp() macro sans the built-in type checking. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220745#c0 Signed-off-by: NeilBrown <neil@brown.name> Stable-dep-of: 1519fbc8832b ("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed_by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-24NFSD: free copynotify stateid in nfs4_free_ol_stateid()Olga Kornievskaia1-1/+2
commit 4aa17144d5abc3c756883e3a010246f0dba8b468 upstream. Typically copynotify stateid is freed either when parent's stateid is being close/freed or in nfsd4_laundromat if the stateid hasn't been used in a lease period. However, in case when the server got an OPEN (which created a parent stateid), followed by a COPY_NOTIFY using that stateid, followed by a client reboot. New client instance while doing CREATE_SESSION would force expire previous state of this client. It leads to the open state being freed thru release_openowner-> nfs4_free_ol_stateid() and it finds that it still has copynotify stateid associated with it. We currently print a warning and is triggerred WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd] This patch, instead, frees the associated copynotify stateid here. If the parent stateid is freed (without freeing the copynotify stateids associated with it), it leads to the list corruption when laundromat ends up freeing the copynotify state later. [ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink [ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G B W 6.17.0-rc7+ #22 PREEMPT(voluntary) [ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN [ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024 [ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd] [ 1626.859304] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.861182] sp : ffff8000881d7a40 [ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200 [ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20 [ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8 [ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000 [ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065 [ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3 [ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000 [ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001 [ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000 [ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d [ 1626.868167] Call trace: [ 1626.868382] __list_del_entry_valid_or_report+0x148/0x200 (P) [ 1626.868876] _free_cpntf_state_locked+0xd0/0x268 [nfsd] [ 1626.869368] nfs4_laundromat+0x6f8/0x1058 [nfsd] [ 1626.869813] laundromat_main+0x24/0x60 [nfsd] [ 1626.870231] process_one_work+0x584/0x1050 [ 1626.870595] worker_thread+0x4c4/0xc60 [ 1626.870893] kthread+0x2f8/0x398 [ 1626.871146] ret_from_fork+0x10/0x20 [ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000) [ 1626.871892] SMP: stopping secondary CPUs Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-24NFSD: Fix crash in nfsd4_read_release()Chuck Lever1-3/+4
commit abb1f08a2121dd270193746e43b2a9373db9ad84 upstream. When tracing is enabled, the trace_nfsd_read_done trace point crashes during the pynfs read.testNoFh test. Fixes: 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns an error") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23nfsd: decouple the xprtsec policy check from check_nfsd_access()Scott Mayhew3-9/+65
[ Upstream commit e4f574ca9c6dfa66695bb054ff5df43ecea873ec ] This is a backport of e4f574ca9c6d specifically for the 6.6-stable kernel. It differs from the upstream version mainly in that it's working around the absence of some 6.12-era commits: - 1459ad57673b nfsd: Move error code mapping to per-version proc code. - 0a183f24a7ae NFSD: Handle @rqstp == NULL in check_nfsd_access() - 5e66d2d92a1c nfsd: factor out __fh_verify to allow NULL rqstp to be passed A while back I had reported that an NFSv3 client could successfully mount using '-o xprtsec=none' an export that had been exported with 'xprtsec=tls:mtls'. By "successfully" I mean that the mount command would succeed and the mount would show up in /proc/mount. Attempting to do anything futher with the mount would be met with NFS3ERR_ACCES. Transport Layer Security isn't an RPC security flavor or pseudo-flavor, so we shouldn't be conflating them when determining whether the access checks can be bypassed. Split check_nfsd_access() into two helpers, and have fh_verify() call the helpers directly since fh_verify() has logic that allows one or both of the checks to be skipped. All other sites will continue to call check_nfsd_access(). Link: https://lore.kernel.org/linux-nfs/ZjO3Qwf_G87yNXb2@aion/ Fixes: 9280c5774314 ("NFSD: Handle new xprtsec= export option") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23NFSD: Define a proc_layoutcommit for the FlexFiles layout typeChuck Lever1-0/+8
[ Upstream commit 4b47a8601b71ad98833b447d465592d847b4dc77 ] Avoid a crash if a pNFS client should happen to send a LAYOUTCOMMIT operation on a FlexFiles layout. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/152f99b2-ba35-4dec-93a9-4690e625dccd@oracle.com/T/#t Cc: Thomas Haynes <loghyr@hammerspace.com> Cc: stable@vger.kernel.org Fixes: 9b9960a0ca47 ("nfsd: Add a super simple flex file server") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ removed struct svc_rqst parameter from nfsd4_ff_proc_layoutcommit ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23NFSD: Fix last write offset handling in layoutcommitSergey Bashirov2-18/+17
[ Upstream commit d68886bae76a4b9b3484d23e5b7df086f940fa38 ] The data type of loca_last_write_offset is newoffset4 and is switched on a boolean value, no_newoffset, that indicates if a previous write occurred or not. If no_newoffset is FALSE, an offset is not given. This means that client does not try to update the file size. Thus, server should not try to calculate new file size and check if it fits into the segment range. See RFC 8881, section 12.5.4.2. Sometimes the current incorrect logic may cause clients to hang when trying to sync an inode. If layoutcommit fails, the client marks the inode as dirty again. Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations") Cc: stable@vger.kernel.org Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ removed rqstp parameter from proc_layoutcommit ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23NFSD: Minor cleanup in layoutcommit processingSergey Bashirov1-9/+3
[ Upstream commit 274365a51d88658fb51cca637ba579034e90a799 ] Remove dprintk in nfsd4_layoutcommit. These are not needed in day to day usage, and the information is also available in Wireshark when capturing NFS traffic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: d68886bae76a ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23NFSD: Rework encoding and decoding of nfsd4_deviceidSergey Bashirov5-22/+39
[ Upstream commit 832738e4b325b742940761e10487403f9aad13e8 ] Compilers may optimize the layout of C structures, so we should not rely on sizeof struct and memcpy to encode and decode XDR structures. The byte order of the fields should also be taken into account. This patch adds the correct functions to handle the deviceid4 structure and removes the pad field, which is currently not used by NFSD, from the runtime state. The server's byte order is preserved because the deviceid4 blob on the wire is only used as a cookie by the client. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: d68886bae76a ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19nfsd: nfserr_jukebox in nlm_fopen should lead to a retryOlga Kornievskaia1-0/+15
commit a082e4b4d08a4a0e656d90c2c05da85f23e6d0c9 upstream. When v3 NLM request finds a conflicting delegation, it triggers a delegation recall and nfsd_open fails with EAGAIN. nfsd_open then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead of returning nlm_failed for when there is a conflicting delegation, drop this NLM request so that the client retries. Once delegation is recalled and if a local lock is claimed, a retry would lead to nfsd returning a nlm_lck_blocked error or a successful nlm lock. Fixes: d343fce148a4 ("[PATCH] knfsd: Allow lockd to drop replies as appropriate") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()Thorsten Blum1-1/+1
commit ab1c282c010c4f327bd7addc3c0035fd8e3c1721 upstream. Commit 5304877936c0 ("NFSD: Fix strncpy() fortify warning") replaced strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy() already guaranteed NUL-termination of the destination buffer and subtracting one byte potentially truncated the source string. The incorrect size was then carried over in commit 72f78ae00a8e ("NFSD: move from strlcpy with unused retval to strscpy") when switching from strlcpy() to strscpy(). Fix this off-by-one error by using the full size of the destination buffer again. Cc: stable@vger.kernel.org Fixes: 5304877936c0 ("NFSD: Fix strncpy() fortify warning") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19NFSD: nfsd_unlink() clobbers non-zero status returned from fh_fill_pre_attrs()Chuck Lever1-3/+1
[ Upstream commit d7d8e3169b56e7696559a2427c922c0d55debcec ] If fh_fill_pre_attrs() returns a non-zero status, the error flow takes it through out_unlock, which then overwrites the returned status code with err = nfserrno(host_err); Fixes: a332018a91c4 ("nfsd: handle failure to collect pre/post-op attrs more sanely") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ Slightly different error mapping ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19nfsd: Fix a regression in nfsd_setattr()Trond Myklebust2-2/+11
[ Upstream commit 6412e44c40aaf8f1d7320b2099c5bdd6cb9126ac ] Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") broke the NFSv3 pre/post op attributes behaviour when doing a SETATTR rpc call by stripping out the calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Message-ID: <20240216012451.22725-1-trondmy@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: d7d8e3169b56 ("NFSD: nfsd_unlink() clobbers non-zero status returned from fh_fill_pre_attrs()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28NFSD: detect mismatch of file handle and delegation stateid in OPEN opDai Ngo1-0/+14
commit 9c65001c57164033ad08b654c8b5ae35512ddf4a upstream. When the client sends an OPEN with claim type CLAIM_DELEG_CUR_FH or CLAIM_DELEGATION_CUR, the delegation stateid and the file handle must belong to the same file, otherwise return NFS4ERR_INVAL. Note that RFC8881, section 8.2.4, mandates the server to return NFS4ERR_BAD_STATEID if the selected table entry does not match the current filehandle. However returning NFS4ERR_BAD_STATEID in the OPEN causes the client to retry the operation and therefor get the client into a loop. To avoid this situation we return NFS4ERR_INVAL instead. Reported-by: Petro Pavlov <petro.pavlov@vastdata.com> Fixes: c44c5eeb2c02 ("[PATCH] nfsd4: add open state code for CLAIM_DELEGATE_CUR") Cc: stable@vger.kernel.org Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28nfsd: handle get_client_locked() failure in nfsd4_setclientid_confirm()Jeff Layton1-5/+15
commit 908e4ead7f757504d8b345452730636e298cbf68 upstream. Lei Lu recently reported that nfsd4_setclientid_confirm() did not check the return value from get_client_locked(). a SETCLIENTID_CONFIRM could race with a confirmed client expiring and fail to get a reference. That could later lead to a UAF. Fix this by getting a reference early in the case where there is an extant confirmed client. If that fails then treat it as if there were no confirmed client found at all. In the case where the unconfirmed client is expiring, just fail and return the result from get_client_locked(). Reported-by: lei lu <llfamsec@gmail.com> Closes: https://lore.kernel.org/linux-nfs/CAEBF3_b=UvqzNKdnfD_52L05Mqrqui9vZ2eFamgAbV0WG+FNWQ@mail.gmail.com/ Fixes: d20c11d86d8f ("nfsd: Protect session creation and client confirm using client_lock") Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27nfsd: Initialize ssc before laundromat_work to prevent NULL dereferenceLi Lingfeng1-3/+3
commit b31da62889e6d610114d81dc7a6edbcaa503fcf8 upstream. In nfs4_state_start_net(), laundromat_work may access nfsd_ssc through nfs4_laundromat -> nfsd4_ssc_expire_umount. If nfsd_ssc isn't initialized, this can cause NULL pointer dereference. Normally the delayed start of laundromat_work allows sufficient time for nfsd_ssc initialization to complete. However, when the kernel waits too long for userspace responses (e.g. in nfs4_state_start_net -> nfsd4_end_grace -> nfsd4_record_grace_done -> nfsd4_cld_grace_done -> cld_pipe_upcall -> __cld_pipe_upcall -> wait_for_completion path), the delayed work may start before nfsd_ssc initialization finishes. Fix this by moving nfsd_ssc initialization before starting laundromat_work. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27nfsd: nfsd4_spo_must_allow() must check this is a v4 compound requestNeilBrown1-1/+2
commit 1244f0b2c3cecd3f349a877006e67c9492b41807 upstream. If the request being processed is not a v4 compound request, then examining the cstate can have undefined results. This patch adds a check that the rpc procedure being executed (rq_procinfo) is the NFSPROC4_COMPOUND procedure. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25nfsd: decrease sc_count directly if fail to queue dl_recallLi Lingfeng1-1/+1
[ Upstream commit a1d14d931bf700c1025db8c46d6731aa5cf440f9 ] A deadlock warning occurred when invoking nfs4_put_stid following a failed dl_recall queue operation: T1 T2 nfs4_laundromat nfs4_get_client_reaplist nfs4_anylock_blockers __break_lease spin_lock // ctx->flc_lock spin_lock // clp->cl_lock nfs4_lockowner_has_blockers locks_owner_has_blockers spin_lock // flctx->flc_lock nfsd_break_deleg_cb nfsd_break_one_deleg nfs4_put_stid refcount_dec_and_lock spin_lock // clp->cl_lock When a file is opened, an nfs4_delegation is allocated with sc_count initialized to 1, and the file_lease holds a reference to the delegation. The file_lease is then associated with the file through kernel_setlease. The disassociation is performed in nfsd4_delegreturn via the following call chain: nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg --> nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease The corresponding sc_count reference will be released after this disassociation. Since nfsd_break_one_deleg executes while holding the flc_lock, the disassociation process becomes blocked when attempting to acquire flc_lock in generic_delete_lease. This means: 1) sc_count in nfsd_break_one_deleg will not be decremented to 0; 2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to acquire cl_lock; 3) Consequently, no deadlock condition is created. Given that sc_count in nfsd_break_one_deleg remains non-zero, we can safely perform refcount_dec on sc_count directly. This approach effectively avoids triggering deadlock warnings. Fixes: 230ca758453c ("nfsd: put dl_stid if fail to queue dl_recall") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25nfs: add missing selections of CONFIG_CRC32Eric Biggers2-7/+1
[ Upstream commit cd35b6cb46649750b7dbd0df0e2d767415d8917b ] nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available only when CONFIG_CRC32 is enabled. But the only NFS kconfig option that selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and did not actually guard the use of crc32_le() even on the client. The code worked around this bug by only actually calling crc32_le() when CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases. This avoided randconfig build errors, and in real kernels the fallback code was unlikely to be reached since CONFIG_CRC32 is 'default y'. But, this really needs to just be done properly, especially now that I'm planning to update CONFIG_CRC32 to not be 'default y'. Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select CONFIG_CRC32. Then remove the fallback code that becomes unnecessary, as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG. Fixes: 1264a2f053a3 ("NFS: refactor code for calculating the crc32 hash of a filehandle") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10NFSD: Skip sending CB_RECALL_ANY when the backchannel isn't upChuck Lever1-7/+12
commit 8a388c1fabeb6606e16467b23242416c0dbeffad upstream. NFSD sends CB_RECALL_ANY to clients when the server is low on memory or that client has a large number of delegations outstanding. We've seen cases where NFSD attempts to send CB_RECALL_ANY requests to disconnected clients, and gets confused. These calls never go anywhere if a backchannel transport to the target client isn't available. Before the server can send any backchannel operation, the client has to connect first and then do a BIND_CONN_TO_SESSION. This patch doesn't address the root cause of the confusion, but there's no need to queue up these optional operations if they can't go anywhere. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10nfsd: put dl_stid if fail to queue dl_recallLi Lingfeng1-1/+11
commit 230ca758453c63bd38e4d9f4a21db698f7abada8 upstream. Before calling nfsd4_run_cb to queue dl_recall to the callback_wq, we increment the reference count of dl_stid. We expect that after the corresponding work_struct is processed, the reference count of dl_stid will be decremented through the callback function nfsd4_cb_recall_release. However, if the call to nfsd4_run_cb fails, the incremented reference count of dl_stid will not be decremented correspondingly, leading to the following nfs4_stid leak: unreferenced object 0xffff88812067b578 (size 344): comm "nfsd", pid 2761, jiffies 4295044002 (age 5541.241s) hex dump (first 32 bytes): 01 00 00 00 6b 6b 6b 6b b8 02 c0 e2 81 88 ff ff ....kkkk........ 00 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 ad 4e ad de .kkkkkkk.....N.. backtrace: kmem_cache_alloc+0x4b9/0x700 nfsd4_process_open1+0x34/0x300 nfsd4_open+0x2d1/0x9d0 nfsd4_proc_compound+0x7a2/0xe30 nfsd_dispatch+0x241/0x3e0 svc_process_common+0x5d3/0xcc0 svc_process+0x2a3/0x320 nfsd+0x180/0x2e0 kthread+0x199/0x1d0 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1b/0x30 unreferenced object 0xffff8881499f4d28 (size 368): comm "nfsd", pid 2761, jiffies 4295044005 (age 5541.239s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 30 4d 9f 49 81 88 ff ff ........0M.I.... 30 4d 9f 49 81 88 ff ff 20 00 00 00 01 00 00 00 0M.I.... ....... backtrace: kmem_cache_alloc+0x4b9/0x700 nfs4_alloc_stid+0x29/0x210 alloc_init_deleg+0x92/0x2e0 nfs4_set_delegation+0x284/0xc00 nfs4_open_delegation+0x216/0x3f0 nfsd4_process_open2+0x2b3/0xee0 nfsd4_open+0x770/0x9d0 nfsd4_proc_compound+0x7a2/0xe30 nfsd_dispatch+0x241/0x3e0 svc_process_common+0x5d3/0xcc0 svc_process+0x2a3/0x320 nfsd+0x180/0x2e0 kthread+0x199/0x1d0 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1b/0x30 Fix it by checking the result of nfsd4_run_cb and call nfs4_put_stid if fail to queue dl_recall. Cc: stable@vger.kernel.org Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21NFSD: fix hang in nfsd4_shutdown_callbackDai Ngo1-2/+5
commit 036ac2778f7b28885814c6fbc07e156ad1624d03 upstream. If nfs4_client is in courtesy state then there is no point to send the callback. This causes nfsd4_shutdown_callback to hang since cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP notifies NFSD that the connection was dropped. This patch modifies nfsd4_run_cb_work to skip the RPC call if nfs4_client is in courtesy state. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Fixes: 66af25799940 ("NFSD: add courteous server support for thread with only delegation") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21nfsd: clear acl_access/acl_default after releasing themLi Lingfeng2-0/+4
commit 7faf14a7b0366f153284db0ad3347c457ea70136 upstream. If getting acl_default fails, acl_access and acl_default will be released simultaneously. However, acl_access will still retain a pointer pointing to the released posix_acl, which will trigger a WARNING in nfs3svc_release_getacl like this: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 26 PID: 3199 at lib/refcount.c:28 refcount_warn_saturate+0xb5/0x170 Modules linked in: CPU: 26 UID: 0 PID: 3199 Comm: nfsd Not tainted 6.12.0-rc6-00079-g04ae226af01f-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb5/0x170 Code: cc cc 0f b6 1d b3 20 a5 03 80 fb 01 0f 87 65 48 d8 00 83 e3 01 75 e4 48 c7 c7 c0 3b 9b 85 c6 05 97 20 a5 03 01 e8 fb 3e 30 ff <0f> 0b eb cd 0f b6 1d 8a3 RSP: 0018:ffffc90008637cd8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83904fde RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88871ed36380 RBP: ffff888158beeb40 R08: 0000000000000001 R09: fffff520010c6f56 R10: ffffc90008637ab7 R11: 0000000000000001 R12: 0000000000000001 R13: ffff888140e77400 R14: ffff888140e77408 R15: ffffffff858b42c0 FS: 0000000000000000(0000) GS:ffff88871ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562384d32158 CR3: 000000055cc6a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? refcount_warn_saturate+0xb5/0x170 ? __warn+0xa5/0x140 ? refcount_warn_saturate+0xb5/0x170 ? report_bug+0x1b1/0x1e0 ? handle_bug+0x53/0xa0 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? tick_nohz_tick_stopped+0x1e/0x40 ? refcount_warn_saturate+0xb5/0x170 ? refcount_warn_saturate+0xb5/0x170 nfs3svc_release_getacl+0xc9/0xe0 svc_process_common+0x5db/0xb60 ? __pfx_svc_process_common+0x10/0x10 ? __rcu_read_unlock+0x69/0xa0 ? __pfx_nfsd_dispatch+0x10/0x10 ? svc_xprt_received+0xa1/0x120 ? xdr_init_decode+0x11d/0x190 svc_process+0x2a7/0x330 svc_handle_xprt+0x69d/0x940 svc_recv+0x180/0x2d0 nfsd+0x168/0x200 ? __pfx_nfsd+0x10/0x10 kthread+0x1a2/0x1e0 ? kthread+0xf4/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Kernel panic - not syncing: kernel: panic_on_warn set ... Clear acl_access/acl_default after posix_acl_release is called to prevent UAF from being triggered. Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241107014705.2509463-1-lilingfeng@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Rick Macklem <rmacklem@uoguelph.ca> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08NFSD: Reset cb_seq_status after NFS4ERR_DELAYChuck Lever1-0/+1
commit 961b4b5e86bf56a2e4b567f81682defa5cba957e upstream. I noticed that once an NFSv4.1 callback operation gets a NFS4ERR_DELAY status on CB_SEQUENCE and then the connection is lost, the callback client loops, resending it indefinitely. The switch arm in nfsd4_cb_sequence_done() that handles NFS4ERR_DELAY uses rpc_restart_call() to rearm the RPC state machine for the retransmit, but that path does not call the rpc_prepare_call callback again. Thus cb_seq_status is set to -10008 by the first NFS4ERR_DELAY result, but is never set back to 1 for the retransmits. nfsd4_cb_sequence_done() thinks it's getting nothing but a long series of CB_SEQUENCE NFS4ERR_DELAY replies. Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-23nfsd: add list_head nf_gc to struct nfsd_fileYouzhong Yang2-8/+11
commit 8e6e2ffa6569a205f1805cbaeca143b556581da6 upstream. nfsd_file_put() in one thread can race with another thread doing garbage collection (running nfsd_file_gc() -> list_lru_walk() -> nfsd_file_lru_cb()): * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add(). * nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED bit set) * garbage collector kicks in, nfsd_file_lru_cb() clears REFERENCED bit and returns LRU_ROTATE. * garbage collector kicks in again, nfsd_file_lru_cb() now decrements nf->nf_ref to 0, runs nfsd_file_unhash(), removes it from the LRU and adds to the dispose list [list_lru_isolate_move(lru, &nf->nf_lru, head)] * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The 'nf' has been added to the 'dispose' list by nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf) simply treats it as part of the LRU and removes it, which leads to its removal from the 'dispose' list. * At this moment, 'nf' is unhashed with its nf_ref being 0, and not on the LRU. nfsd_file_put() continues its execution [if (refcount_dec_and_test(&nf->nf_ref))], as nf->nf_ref is already 0, nf->nf_ref is set to REFCOUNT_SATURATED, and the 'nf' gets no chance of being freed. nfsd_file_put() can also race with nfsd_file_cond_queue(): * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add(). * nfsd_file_lru_add() sets REFERENCED bit and returns true. * Some userland application runs 'exportfs -f' or something like that, which triggers __nfsd_file_cache_purge() -> nfsd_file_cond_queue(). * In nfsd_file_cond_queue(), it runs [if (!nfsd_file_unhash(nf))], unhash is done successfully. * nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now nf->nf_ref goes to 2. * nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))], it succeeds. * nfsd_file_cond_queue() runs [if (refcount_sub_and_test(decrement, &nf->nf_ref))] (with "decrement" being 2), so the nf->nf_ref goes to 0, the 'nf' is added to the dispose list [list_add(&nf->nf_lru, dispose)] * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))], although the 'nf' is not in the LRU, but it is linked in the 'dispose' list, nfsd_file_lru_remove() simply treats it as part of the LRU and removes it. This leads to its removal from the 'dispose' list! * Now nf->ref is 0, unhashed. nfsd_file_put() continues its execution and set nf->nf_ref to REFCOUNT_SATURATED. As shown in the above analysis, using nf_lru for both the LRU list and dispose list can cause the leaks. This patch adds a new list_head nf_gc in struct nfsd_file, and uses it for the dispose list. This does not fix the nfsd_file leaking issue completely. Signed-off-by: Youzhong Yang <youzhong@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-02nfsd: restore callback functionality for NFSv4.0NeilBrown1-3/+1
[ Upstream commit 7917f01a286ce01e9c085e24468421f596ee1a0c ] A recent patch inadvertently broke callbacks for NFSv4.0. In the 4.0 case we do not expect a session to be found but still need to call setup_callback_client() which will not try to dereference it. This patch moves the check for failure to find a session into the 4.1+ branch of setup_callback_client() Fixes: 1e02c641c3a4 ("NFSD: Prevent NULL dereference in nfsd4_process_cb_update()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"Yang Erkun2-27/+8
[ Upstream commit 69d803c40edeaf94089fbc8751c9b746cdc35044 ] This reverts commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d. Before this commit, svc_export_put or expkey_put will call path_put with sync mode. After this commit, path_put will be called with async mode. And this can lead the unexpected results show as follow. mkfs.xfs -f /dev/sda echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports exportfs -ra service nfs-server start mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1 mount /dev/sda /mnt/sda touch /mnt1/sda/file exportfs -r umount /mnt/sda # failed unexcepted The touch will finally call nfsd_cross_mnt, add refcount to mount, and then add cache_head. Before this commit, exportfs -r will call cache_flush to cleanup all cache_head, and path_put in svc_export_put/expkey_put will be finished with sync mode. So, the latter umount will always success. However, after this commit, path_put will be called with async mode, the latter umount may failed, and if we add some delay, umount will success too. Personally I think this bug and should be fixed. We first revert before bugfix patch, and then fix the original bug with a different way. Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with rcu_work") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occurYang Erkun1-0/+19
commit 98100e88dd8865999dc6379a3356cd799795fe7b upstream. The action force umount(umount -f) will attempt to kill all rpc_task even umount operation may ultimately fail if some files remain open. Consequently, if an action attempts to open a file, it can potentially send two rpc_task to nfs server. NFS CLIENT thread1 thread2 open("file") ... nfs4_do_open _nfs4_do_open _nfs4_open_and_get_state _nfs4_proc_open nfs4_run_open_task /* rpc_task1 */ rpc_run_task rpc_wait_for_completion_task umount -f nfs_umount_begin rpc_killall_tasks rpc_signal_task rpc_task1 been wakeup and return -512 _nfs4_do_open // while loop ... nfs4_run_open_task /* rpc_task2 */ rpc_run_task rpc_wait_for_completion_task While processing an open request, nfsd will first attempt to find or allocate an nfs4_openowner. If it finds an nfs4_openowner that is not marked as NFS4_OO_CONFIRMED, this nfs4_openowner will released. Since two rpc_task can attempt to open the same file simultaneously from the client to server, and because two instances of nfsd can run concurrently, this situation can lead to lots of memory leak. Additionally, when we echo 0 to /proc/fs/nfsd/threads, warning will be triggered. NFS SERVER nfsd1 nfsd2 echo 0 > /proc/fs/nfsd/threads nfsd4_open nfsd4_process_open1 find_or_alloc_open_stateowner // alloc oo1, stateid1 nfsd4_open nfsd4_process_open1 find_or_alloc_open_stateowner // find oo1, without NFS4_OO_CONFIRMED release_openowner unhash_openowner_locked list_del_init(&oo->oo_perclient) // cannot find this oo // from client, LEAK!!! alloc_stateowner // alloc oo2 nfsd4_process_open2 init_open_stateid // associate oo1 // with stateid1, stateid1 LEAK!!! nfs4_get_vfs_file // alloc nfsd_file1 and nfsd_file_mark1 // all LEAK!!! nfsd4_process_open2 ... write_threads ... nfsd_destroy_serv nfsd_shutdown_net nfs4_state_shutdown_net nfs4_state_destroy_net destroy_client __destroy_client // won't find oo1!!! nfsd_shutdown_generic nfsd_file_cache_shutdown kmem_cache_destroy for nfsd_file_slab and nfsd_file_mark_slab // bark since nfsd_file1 // and nfsd_file_mark1 // still alive ======================================================================= BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on __kmem_cache_shutdown() ----------------------------------------------------------------------- Slab 0xffd4000004438a80 objects=34 used=1 fp=0xff11000110e2ad28 flags=0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff) CPU: 4 UID: 0 PID: 757 Comm: sh Not tainted 6.12.0-rc6+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 slab_err+0xb0/0xf0 __kmem_cache_shutdown+0x15c/0x310 kmem_cache_destroy+0x66/0x160 nfsd_file_cache_shutdown+0xac/0x210 [nfsd] nfsd_destroy_serv+0x251/0x2a0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1ae/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Disabling lock debugging due to kernel taint Object 0xff11000110e2ac38 @offset=3128 Allocated in nfsd_file_do_acquire+0x20f/0xa30 [nfsd] age=1635 cpu=3 pid=800 nfsd_file_do_acquire+0x20f/0xa30 [nfsd] nfsd_file_acquire_opened+0x5f/0x90 [nfsd] nfs4_get_vfs_file+0x4c9/0x570 [nfsd] nfsd4_process_open2+0x713/0x1070 [nfsd] nfsd4_open+0x74b/0x8b0 [nfsd] nfsd4_proc_compound+0x70b/0xc20 [nfsd] nfsd_dispatch+0x1b4/0x3a0 [nfsd] svc_process_common+0x5b8/0xc50 [sunrpc] svc_process+0x2ab/0x3b0 [sunrpc] svc_handle_xprt+0x681/0xa20 [sunrpc] nfsd+0x183/0x220 [nfsd] kthread+0x199/0x1e0 ret_from_fork+0x31/0x60 ret_from_fork_asm+0x1a/0x30 Add nfs4_openowner_unhashed to help found unhashed nfs4_openowner, and break nfsd4_open process to fix this problem. Cc: stable@vger.kernel.org # v5.4+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09nfsd: make sure exp active before svc_export_showYang Erkun1-1/+4
commit be8f982c369c965faffa198b46060f8853e0f1f0 upstream. The function `e_show` was called with protection from RCU. This only ensures that `exp` will not be freed. Therefore, the reference count for `exp` can drop to zero, which will trigger a refcount use-after-free warning when `exp_get` is called. To resolve this issue, use `cache_get_rcu` to ensure that `exp` remains active. ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 819 at lib/refcount.c:25 refcount_warn_saturate+0xb1/0x120 CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb1/0x120 ... Call Trace: <TASK> e_show+0x20b/0x230 [nfsd] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: bf18f163e89c ("NFSD: Using exp_get for export getting") Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09NFSD: Prevent a potential integer overflowChuck Lever1-7/+7
commit 7f33b92e5b18e904a481e6e208486da43e4dc841 upstream. If the tag length is >= U32_MAX - 3 then the "length + 4" addition can result in an integer overflow. Address this by splitting the decoding into several steps so that decode_cb_compound4res() does not have to perform arithmetic on the unsafe length value. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09NFSD: Fix nfsd4_shutdown_copy()Chuck Lever1-2/+5
[ Upstream commit 62a8642ba00aa8ceb0a02ade942f5ec52e877c95 ] nfsd4_shutdown_copy() is just this: while ((copy = nfsd4_get_copy(clp)) != NULL) nfsd4_stop_copy(copy); nfsd4_get_copy() bumps @copy's reference count, preventing nfsd4_stop_copy() from releasing @copy. A while loop like this usually works by removing the first element of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy() alters the async_copies list. Best I can tell, then, is that nfsd4_shutdown_copy() continues to loop until other threads manage to remove all the items from this list. The spinning loop blocks shutdown until these items are gone. Possibly the reason we haven't seen this issue in the field is because client_has_state() prevents __destroy_client() from calling nfsd4_shutdown_copy() if there are any items on this list. In a subsequent patch I plan to remove that restriction. Fixes: e0639dc5805a ("NFSD introduce async copy feature") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09nfsd: release svc_expkey/svc_export with rcu_workYang Erkun2-8/+27
[ Upstream commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d ] The last reference for `cache_head` can be reduced to zero in `c_show` and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently, `svc_export_put` and `expkey_put` will be invoked, leading to two issues: 1. The `svc_export_put` will directly free ex_uuid. However, `e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can trigger a use-after-free issue, shown below. ================================================================== BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd] Read of size 1 at addr ff11000010fdc120 by task cat/870 CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x2c/0x3a0 print_report+0xb9/0x280 kasan_report+0xae/0xe0 svc_export_show+0x362/0x430 [nfsd] c_show+0x161/0x390 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 830: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc_node_track_caller_noprof+0x1bc/0x400 kmemdup_noprof+0x22/0x50 svc_export_parse+0x8a9/0xb80 [nfsd] cache_do_downcall+0x71/0xa0 [sunrpc] cache_write_procfs+0x8e/0xd0 [sunrpc] proc_reg_write+0xe1/0x140 vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 868: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kfree+0xf3/0x3e0 svc_export_put+0x87/0xb0 [nfsd] cache_purge+0x17f/0x1f0 [sunrpc] nfsd_destroy_serv+0x226/0x2d0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e 2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`. However, `svc_export_put`/`expkey_put` will call path_put, which subsequently triggers a sleeping operation due to the following `dput`. ============================= WARNING: suspicious RCU usage 5.10.0-dirty #141 Not tainted ----------------------------- ... Call Trace: dump_stack+0x9a/0xd0 ___might_sleep+0x231/0x240 dput+0x39/0x600 path_put+0x1b/0x30 svc_export_put+0x17/0x80 e_show+0x1c9/0x200 seq_read_iter+0x63f/0x7c0 seq_read+0x226/0x2d0 vfs_read+0x113/0x2c0 ksys_read+0xc9/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Fix these issues by using `rcu_work` to help release `svc_expkey`/`svc_export`. This approach allows for an asynchronous context to invoke `path_put` and also facilitates the freeing of `uuid/exp/key` after an RCU grace period. Fixes: 9ceddd9da134 ("knfsd: Allow lockless lookups of the exports") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>