summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)AuthorFilesLines
2023-10-06NFSv4: Fix a state manager thread deadlock regressionTrond Myklebust2-13/+29
commit 956fd46f97d238032cb5fa4771cdaccc6e760f9a upstream. Commit 4dc73c679114 reintroduces the deadlock that was fixed by commit aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock") because it prevents the setup of new threads to handle reboot recovery, while the older recovery thread is stuck returning delegations. Fixes: 4dc73c679114 ("NFSv4: keep state manager thread active if swap is enabled") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-06NFSv4.1: fix zero value filehandle in post open getattrOlga Kornievskaia1-1/+5
[ Upstream commit 4506f23e117161a20104c8fa04f33e1ca63c26af ] Currently, if the OPEN compound experiencing an error and needs to get the file attributes separately, it will send a stand alone GETATTR but it would use the filehandle from the results of the OPEN compound. In case of the CLAIM_FH OPEN, nfs_openres's fh is zero value. That generate a GETATTR that's sent with a zero value filehandle, and results in the server returning an error. Instead, for the CLAIM_FH OPEN, take the filehandle that was used in the PUTFH of the OPEN compound. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFSv4.1: fix pnfs MDS=DS session trunkingOlga Kornievskaia1-1/+5
[ Upstream commit 806a3bc421a115fbb287c1efce63a48c54ee804b ] Currently, when GETDEVICEINFO returns multiple locations where each is a different IP but the server's identity is same as MDS, then nfs4_set_ds_client() finds the existing nfs_client structure which has the MDS's max_connect value (and if it's 1), then the 1st IP on the DS's list will get dropped due to MDS trunking rules. Other IPs would be added as they fall under the pnfs trunking rules. For the list of IPs the 1st goes thru calling nfs4_set_ds_client() which will eventually call nfs4_add_trunk() and call into rpc_clnt_test_and_add_xprt() which has the check for MDS trunking. The other IPs (after the 1st one), would call rpc_clnt_add_xprt() which doesn't go thru that check. nfs4_add_trunk() is called when MDS trunking is happening and it needs to enforce the usage of max_connect mount option of the 1st mount. However, this shouldn't be applied to pnfs flow. Instead, this patch proposed to treat MDS=DS as DS trunking and make sure that MDS's max_connect limit does not apply to the 1st IP returned in the GETDEVICEINFO list. It does so by marking the newly created client with a new flag NFS_CS_PNFS which then used to pass max_connect value to use into the rpc_clnt_test_and_add_xprt() instead of the existing rpc client's max_connect value set by the MDS connection. For example, mount was done without max_connect value set so MDS's rpc client has cl_max_connect=1. Upon calling into rpc_clnt_test_and_add_xprt() and using rpc client's value, the caller passes in max_connect value which is previously been set in the pnfs path (as a part of handling GETDEVICEINFO list of IPs) in nfs4_set_ds_client(). However, when NFS_CS_PNFS flag is not set and we know we are doing MDS trunking, comparing a new IP of the same server, we then set the max_connect value to the existing MDS's value and pass that into rpc_clnt_test_and_add_xprt(). Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS serverOlga Kornievskaia2-0/+7
[ Upstream commit 51d674a5e4889f1c8e223ac131cf218e1631e423 ] After receiving the location(s) of the DS server(s) in the GETDEVINCEINFO, create the request for the clientid to such server and indicate that the client is connecting to a DS. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Stable-dep-of: 806a3bc421a1 ("NFSv4.1: fix pnfs MDS=DS session trunking") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFS/pNFS: Report EINVAL errors from connect() to the serverTrond Myklebust1-0/+1
[ Upstream commit dd7d7ee3ba2a70d12d02defb478790cf57d5b87b ] With IPv6, connect() can occasionally return EINVAL if a route is unavailable. If this happens during I/O to a data server, we want to report it using LAYOUTERROR as an inability to connect. Fixes: dd52128afdde ("NFSv4.1/pnfs Ensure flexfiles reports all connection related errors") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFS: More fixes for nfs_direct_write_reschedule_io()Trond Myklebust1-6/+11
[ Upstream commit b11243f720ee5f9376861099019c8542969b6318 ] Ensure that all requests are put back onto the commit list so that they can be rescheduled. Fixes: 4daaeba93822 ("NFS: Fix nfs_direct_write_reschedule_io()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFS: Use the correct commit info in nfs_join_page_group()Trond Myklebust2-14/+17
[ Upstream commit b193a78ddb5ee7dba074d3f28dc050069ba083c0 ] Ensure that nfs_clear_request_commit() updates the correct counters when it removes them from the commit list. Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFS: More O_DIRECT accounting fixes for error pathsTrond Myklebust1-16/+31
[ Upstream commit 8982f7aff39fb526aba4441fff2525fcedd5e1a3 ] If we hit a fatal error when retransmitting, we do need to record the removal of the request from the count of written bytes. Fixes: 031d73ed768a ("NFS: Fix O_DIRECT accounting of number of bytes read/written") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFS: Fix O_DIRECT locking issuesTrond Myklebust1-4/+4
[ Upstream commit 7c6339322ce0c6128acbe36aacc1eeb986dd7bf1 ] The dreq fields are protected by the dreq->lock. Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06NFS: Fix error handling for O_DIRECT write schedulingTrond Myklebust1-16/+46
[ Upstream commit 954998b60caa8f2a3bf3abe490de6f08d283687a ] If we fail to schedule a request for transmission, there are 2 possibilities: 1) Either we hit a fatal error, and we just want to drop the remaining requests on the floor. 2) We were asked to try again, in which case we should allow the outstanding RPC calls to complete, so that we can recoalesce requests and try again. Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_infoFedor Pchelkin1-1/+1
commit 96562c45af5c31b89a197af28f79bfa838fb8391 upstream. It is an almost improbable error case but when page allocating loop in nfs4_get_device_info() fails then we should only free the already allocated pages, as __free_page() can't deal with NULL arguments. Found by Linux Verification Center (linuxtesting.org). Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-19NFS: Fix a potential data corruptionTrond Myklebust1-1/+19
commit 88975a55969e11f26fe3846bf4fbf8e7dc8cbbd4 upstream. We must ensure that the subrequests are joined back into the head before we can retransmit a request. If the head was not on the commit lists, because the server wrote it synchronously, we still need to add it back to the retransmission list. Add a call that mirrors the effect of nfs_cancel_remove_inode() for O_DIRECT. Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13pNFS: Fix assignment of xprtdata.credAnna Schumaker1-1/+1
[ Upstream commit c4a123d2e8c4dc91d581ee7d05c0cd51a0273fab ] The comma at the end of the line was leftover from an earlier refactor of the _nfs4_pnfs_v3_ds_connect() function. This is technically valid C, so the compilers didn't catch it, but if I'm understanding how it works correctly it assigns the return value of rpc_clnt_add_xprtr() to xprtdata.cred. Reported-by: Olga Kornievskaia <kolga@netapp.com> Fixes: a12f996d3413 ("NFSv4/pNFS: Use connections to a DS that are all of the same protocol family") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13NFSv4.2: fix handling of COPY ERR_OFFLOAD_NO_REQOlga Kornievskaia1-2/+3
[ Upstream commit 5690eed941ab7e33c3c3d6b850100cabf740f075 ] If the client sent a synchronous copy and the server replied with ERR_OFFLOAD_NO_REQ indicating that it wants an asynchronous copy instead, the client should retry with asynchronous copy. Fixes: 539f57b3e0fd ("NFS handle COPY ERR_OFFLOAD_NO_REQS") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13NFS: Guard against READDIR loop when entry names exceed MAXNAMELENBenjamin Coddington2-2/+2
[ Upstream commit f67b55b6588bcf9316a1e6e8d529100a5aa3ebe6 ] Commit 64cfca85bacd asserts the only valid return values for nfs2/3_decode_dirent should not include -ENAMETOOLONG, but for a server that sends a filename3 which exceeds MAXNAMELEN in a READDIR response the client's behavior will be to endlessly retry the operation. We could map -ENAMETOOLONG into -EBADCOOKIE, but that would produce truncated listings without any error. The client should return an error for this case to clearly assert that the server implementation must be corrected. Fixes: 64cfca85bacd ("NFS: Return valid errors from nfs2/3_decode_dirent()") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13nfs/blocklayout: Use the passed in gfp flagsDan Carpenter1-2/+2
[ Upstream commit 08b45fcb2d4675f6182fe0edc0d8b1fe604051fa ] This allocation should use the passed in GFP_ flags instead of GFP_KERNEL. One places where this matters is in filelayout_pg_init_write() which uses GFP_NOFS as the allocation flags. Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13NFSv4.2: Rework scratch handling for READ_PLUS (again)Anna Schumaker5-13/+14
[ Upstream commit 303a78052091c81e9003915c521fdca1c7e117af ] I found that the read code might send multiple requests using the same nfs_pgio_header, but nfs4_proc_read_setup() is only called once. This is how we ended up occasionally double-freeing the scratch buffer, but also means we set a NULL pointer but non-zero length to the xdr scratch buffer. This results in an oops the first time decoding needs to copy something to scratch, which frequently happens when decoding READ_PLUS hole segments. I fix this by moving scratch handling into the pageio read code. I provide a function to allocate scratch space for decoding read replies, and free the scratch buffer when the nfs_pgio_header is freed. Fixes: fbd2a05f29a9 (NFSv4.2: Rework scratch handling for READ_PLUS) Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13NFSv4.2: Fix READ_PLUS size calculationsAnna Schumaker1-3/+9
[ Upstream commit 8d18f6c5bb864d97a730f471c56cdecf313efe64 ] I bump the decode_read_plus_maxsz to account for hole segments, but I need to subtract out this increase when calling rpc_prepare_reply_pages() so the common case of single data segment replies can be directly placed into the xdr pages without needing to be shifted around. Reported-by: Chuck Lever <chuck.lever@oracle.com> Fixes: d3b00a802c845 ("NFS: Replace the READ_PLUS decoding code") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13NFSv4.2: Fix READ_PLUS smatch warningsAnna Schumaker1-2/+1
[ Upstream commit bb05a617f06b7a882e19c4f475b8e37f14d9ceac ] Smatch reports: fs/nfs/nfs42xdr.c:1131 decode_read_plus() warn: missing error code? 'status' Which Dan suggests to fix by doing a hardcoded "return 0" from the "if (segments == 0)" check. Additionally, smatch reports that the "status = -EIO" assignment is not used. This patch addresses both these issues. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202305222209.6l5VM2lL-lkp@intel.com/ Fixes: d3b00a802c845 ("NFS: Replace the READ_PLUS decoding code") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-22Merge tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds4-18/+31
Pull NFS client fixes from Trond Myklebust: - fix a use after free in nfs_direct_join_group() (Cc: stable) - fix sysfs server name memory leak - fix lock recovery hang in NFSv4.0 - fix page free in the error path for nfs42_proc_getxattr() and __nfs4_get_acl_uncached() - SUNRPC/rdma: fix receive buffer dma-mapping after a server disconnect * tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: xprtrdma: Remap Receive buffers after a reconnect NFSv4: fix out path in __nfs4_get_acl_uncached NFSv4.2: fix error handling in nfs42_proc_getxattr NFS: Fix sysfs server name memory leak NFS: Fix a use after free in nfs_direct_join_group() NFSv4: Fix dropped lock for racing OPEN and delegation return
2023-08-19NFSv4: fix out path in __nfs4_get_acl_uncachedFedor Pchelkin1-3/+2
Another highly rare error case when a page allocating loop (inside __nfs4_get_acl_uncached, this time) is not properly unwound on error. Since pages array is allocated being uninitialized, need to free only lower array indices. NULL checks were useful before commit 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts") when the array had been initialized to zero on stack. Found by Linux Verification Center (linuxtesting.org). Fixes: 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-08-19NFSv4.2: fix error handling in nfs42_proc_getxattrFedor Pchelkin1-3/+2
There is a slight issue with error handling code inside nfs42_proc_getxattr(). If page allocating loop fails then we free the failing page array element which is NULL but __free_page() can't deal with NULL args. Found by Linux Verification Center (linuxtesting.org). Fixes: a1f26739ccdc ("NFSv4.2: improve page handling for GETXATTR") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-08-19NFS: Fix sysfs server name memory leakBenjamin Coddington1-1/+3
Free the formatted server index string after it has been duplicated by kobject_rename(). Fixes: 1c7251187dc0 ("NFS: add superblock sysfs entries") Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-08-17NFS: Fix a use after free in nfs_direct_join_group()Trond Myklebust1-10/+16
Be more careful when tearing down the subrequests of an O_DIRECT write as part of a retransmission. Reported-by: Chris Mason <clm@fb.com> Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-07-02NFSv4: Fix dropped lock for racing OPEN and delegation returnBenjamin Coddington1-1/+8
Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return") attempted to solve this problem by using nfs4's generic async error handling, but introduced a regression where v4.0 lock recovery would hang. The additional complexity introduced by overloading that error handling is not necessary for this case. This patch expects that commit to be reverted. The problem as originally explained in the above commit is: There's a small window where a LOCK sent during a delegation return can race with another OPEN on client, but the open stateid has not yet been updated. In this case, the client doesn't handle the OLD_STATEID error from the server and will lose this lock, emitting: "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024". Fix this by using the old_stateid refresh helpers if the server replies with OLD_STATEID. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-07-02Merge tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds14-393/+767
Pull NFS client updates from Trond Myklebust: "Stable fixes and other bugfixes: - nfs: don't report STATX_BTIME in ->getattr - Revert 'NFSv4: Retry LOCK on OLD_STATEID during delegation return' since it breaks NFSv4 state recovery. - NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION - Fix the NFSv4.2 xattr cache shrinker_id - Force a ctime update after a NFSv4.2 SETXATTR call Features and cleanups: - NFS and RPC over TLS client code from Chuck Lever - Support for use of abstract unix socket addresses with the rpcbind daemon - Sysfs API to allow shutdown of the kernel RPC client and prevent umount() hangs if the server is known to be permanently down - XDR cleanups from Anna" * tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (33 commits) Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" NFS: Don't cleanup sysfs superblock entry if uninitialized nfs: don't report STATX_BTIME in ->getattr NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION NFSv4.2: fix wrong shrinker_id NFSv4: Clean up some shutdown loops NFS: Cancel all existing RPC tasks when shutdown NFS: add sysfs shutdown knob NFS: add a sysfs link to the acl rpc_client NFS: add a sysfs link to the lockd rpc_client NFS: Add sysfs links to sunrpc clients for nfs_clients NFS: add superblock sysfs entries NFS: Make all of /sys/fs/nfs network-namespace unique NFS: Open-code the nfs_kset kset_create_and_add() NFS: rename nfs_client_kobj to nfs_net_kobj NFS: rename nfs_client_kset to nfs_kset NFS: Add an "xprtsec=" NFS mount option NFS: Have struct nfs_client carry a TLS policy field SUNRPC: Add a TCP-with-TLS RPC transport class SUNRPC: Capture CMSG metadata on client-side receive ...
2023-06-29Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"Benjamin Coddington1-4/+2
Olga Kornievskaia reports that this patch breaks NFSv4.0 state recovery. It also introduces additional complexity in the error paths for cases not related to the original problem. Let's revert it for now, and address the original problem in another manner. This reverts commit f5ea16137a3fa2858620dc9084466491c128535f. Fixes: f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return") Reported-by: Kornievskaia, Olga <Olga.Kornievskaia@netapp.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-29NFS: Don't cleanup sysfs superblock entry if uninitializedBenjamin Coddington1-2/+4
Its possible to end up in nfs_free_server() before the server's superblock sysfs entry has been initialized, in which case calling kobject_put() will emit a WARNING. Check if the kobject has been initialized before cleaning it up. Fixes: 1c7251187dc0 ("NFS: add superblock sysfs entries") Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds1-5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-28Merge tag 'hardening-v6.5-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "There are three areas of note: A bunch of strlcpy()->strscpy() conversions ended up living in my tree since they were either Acked by maintainers for me to carry, or got ignored for multiple weeks (and were trivial changes). The compiler option '-fstrict-flex-arrays=3' has been enabled globally, and has been in -next for the entire devel cycle. This changes compiler diagnostics (though mainly just -Warray-bounds which is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_ coverage. In other words, there are no new restrictions, just potentially new warnings. Any new FORTIFY warnings we've seen have been fixed (usually in their respective subsystem trees). For more details, see commit df8fc4e934c12b. The under-development compiler attribute __counted_by has been added so that we can start annotating flexible array members with their associated structure member that tracks the count of flexible array elements at run-time. It is possible (likely?) that the exact syntax of the attribute will change before it is finalized, but GCC and Clang are working together to sort it out. Any changes can be made to the macro while we continue to add annotations. As an example of that last case, I have a treewide commit waiting with such annotations found via Coccinelle: https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b Also see commit dd06e72e68bcb4 for more details. Summary: - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko) - Convert strreplace() to return string start (Andy Shevchenko) - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook) - Add missing function prototypes seen with W=1 (Arnd Bergmann) - Fix strscpy() kerndoc typo (Arne Welzel) - Replace strlcpy() with strscpy() across many subsystems which were either Acked by respective maintainers or were trivial changes that went ignored for multiple weeks (Azeem Shaikh) - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers) - Add KUnit tests for strcat()-family - Enable KUnit tests of FORTIFY wrappers under UML - Add more complete FORTIFY protections for strlcat() - Add missed disabling of FORTIFY for all arch purgatories. - Enable -fstrict-flex-arrays=3 globally - Tightening UBSAN_BOUNDS when using GCC - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays - Improve use of const variables in FORTIFY - Add requested struct_size_t() helper for types not pointers - Add __counted_by macro for annotating flexible array size members" * tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits) netfilter: ipset: Replace strlcpy with strscpy uml: Replace strlcpy with strscpy um: Use HOST_DIR for mrproper kallsyms: Replace all non-returning strlcpy with strscpy sh: Replace all non-returning strlcpy with strscpy of/flattree: Replace all non-returning strlcpy with strscpy sparc64: Replace all non-returning strlcpy with strscpy Hexagon: Replace all non-returning strlcpy with strscpy kobject: Use return value of strreplace() lib/string_helpers: Change returned value of the strreplace() jbd2: Avoid printing outside the boundary of the buffer checkpatch: Check for 0-length and 1-element arrays riscv/purgatory: Do not use fortified string functions s390/purgatory: Do not use fortified string functions x86/purgatory: Do not use fortified string functions acpi: Replace struct acpi_table_slit 1-element array with flex-array clocksource: Replace all non-returning strlcpy with strscpy string: use __builtin_memcpy() in strlcpy/strlcat staging: most: Replace all non-returning strlcpy with strscpy drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy ...
2023-06-26Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds1-4/+6
Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner) - bcache updates via Coly: - Fix a race at init time (Mingzhe Zou) - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye) - use page pinning in the block layer for dio (David) - convert old block dio code to page pinning (David, Christoph) - cleanups for pktcdvd (Andy) - cleanups for rnbd (Guoqing) - use the unchecked __bio_add_page() for the initial single page additions (Johannes) - fix overflows in the Amiga partition handling code (Michael) - improve mq-deadline zoned device support (Bart) - keep passthrough requests out of the IO schedulers (Christoph, Ming) - improve support for flush requests, making them less special to deal with (Christoph) - add bdev holder ops and shutdown methods (Christoph) - fix the name_to_dev_t() situation and use cases (Christoph) - decouple the block open flags from fmode_t (Christoph) - ublk updates and cleanups, including adding user copy support (Ming) - BFQ sanity checking (Bart) - convert brd from radix to xarray (Pankaj) - constify various structures (Thomas, Ivan) - more fine grained persistent reservation ioctl capability checks (Jingbo) - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan, Jordy, Li, Min, Yu, Zhong, Waiman) * tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits) scsi/sg: don't grab scsi host module reference ext4: Fix warning in blkdev_put() block: don't return -EINVAL for not found names in devt_from_devname cdrom: Fix spectre-v1 gadget block: Improve kernel-doc headers blk-mq: don't insert passthrough request into sw queue bsg: make bsg_class a static const structure ublk: make ublk_chr_class a static const structure aoe: make aoe_class a static const structure block/rnbd: make all 'class' structures const block: fix the exclusive open mask in disk_scan_partitions block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: fix signed int overflow in Amiga partition support block: add capacity validation in bdev_add_partition() block: fine-granular CAP_SYS_ADMIN for Persistent Reservation block: disallow Persistent Reservation on partitions reiserfs: fix blkdev_put() warning from release_journal_dev() block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() block: document the holder argument to blkdev_get_by_path ...
2023-06-26Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds3-2/+25
Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-06-20nfs: don't report STATX_BTIME in ->getattrJeff Layton1-1/+1
NFS doesn't properly support reporting the btime in getattr (yet), but 61a968b4f05e mistakenly added it to the request_mask. This causes statx for STATX_BTIME to report a zeroed out btime instead of properly clearing the flag. Cc: stable@vger.kernel.org # v6.3+ Fixes: 61a968b4f05e ("nfs: report the inode version in getattr if requested") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2214134 Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSIONOlga Kornievskaia1-0/+1
When the client received NFS4ERR_BADSESSION, it schedules recovery and start the state manager thread which in turn freezes the session table and does not allow for any new requests to use the no-longer valid session. However, it is possible that before the state manager thread runs, a new operation would use the released slot that received BADSESSION and was therefore not updated its sequence number. Such re-use of the slot can lead the application errors. Fixes: 5c441544f045 ("NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFSv4.2: fix wrong shrinker_idQi Zheng1-35/+44
Currently, the list_lru::shrinker_id corresponding to the nfs4_xattr shrinkers is wrong: >>> prog["nfs4_xattr_cache_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_entry_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_large_entry_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_cache_shrinker"].id (int)18 >>> prog["nfs4_xattr_entry_shrinker"].id (int)19 >>> prog["nfs4_xattr_large_entry_shrinker"].id (int)20 This is not what we expect, which will cause these shrinkers not to be found in shrink_slab_memcg(). We should assign shrinker::id before calling list_lru_init_memcg(), so that the corresponding list_lru::shrinker_id will be assigned the correct value like below: >>> prog["nfs4_xattr_cache_lru"].shrinker_id (int)16 >>> prog["nfs4_xattr_entry_lru"].shrinker_id (int)17 >>> prog["nfs4_xattr_large_entry_lru"].shrinker_id (int)18 >>> prog["nfs4_xattr_cache_shrinker"].id (int)16 >>> prog["nfs4_xattr_entry_shrinker"].id (int)17 >>> prog["nfs4_xattr_large_entry_shrinker"].id (int)18 So just do it. Fixes: 95ad37f90c33 ("NFSv4.2: add client side xattr caching.") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFSv4: Clean up some shutdown loopsBenjamin Coddington2-1/+4
If a SEQUENCE call receives -EIO for a shutdown client, it will retry the RPC call. Instead of doing that for a shutdown client, just bail out. Likewise, if the state manager decides to perform recovery for a shutdown client, it will continuously retry. As above, just bail out. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Cancel all existing RPC tasks when shutdownBenjamin Coddington1-4/+15
Walk existing RPC tasks and cancel them with -EIO when the client is shutdown. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: add sysfs shutdown knobBenjamin Coddington1-1/+53
Within each nfs_server sysfs tree, add an entry named "shutdown". Writing 1 to this file will set the cl_shutdown bit on the rpc_clnt structs associated with that mount. If cl_shutdown is set, the task scheduler immediately returns -EIO for new tasks. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: add a sysfs link to the acl rpc_clientBenjamin Coddington1-0/+4
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: add a sysfs link to the lockd rpc_clientBenjamin Coddington1-0/+1
After lockd is started, add a symlink for lockd's rpc_client under NFS' superblock sysfs. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Add sysfs links to sunrpc clients for nfs_clientsBenjamin Coddington4-0/+28
For the general and state management nfs_client under each mount, create symlinks to their respective rpc_client sysfs entries. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: add superblock sysfs entriesBenjamin Coddington5-1/+88
Create a sysfs directory for each mount that corresponds to the mount's nfs_server struct. As the mount is being constructed, use the name "server-n", but rename it to the "MAJOR:MINOR" of the mount after assigning a device_id. The rename approach allows us to populate the mount's directory with links to the various rpc_client objects during the mount's construction. The naming convention (MAJOR:MINOR) can be used to reference a particular NFS mount's sysfs tree. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Make all of /sys/fs/nfs network-namespace uniqueBenjamin Coddington2-37/+33
Expand the NFS network-namespaced sysfs from /sys/fs/nfs/net down one level into /sys/fs/nfs by moving the "net" kobject onto struct nfs_netns_client and setting it up during network namespace init. This prepares the way for superblock kobjects within /sys/fs/nfs that will only be visible to matching network namespaces. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Open-code the nfs_kset kset_create_and_add()Benjamin Coddington1-2/+32
In preparation to make objects below /sys/fs/nfs namespace aware, we need to define our own kobj_type for the nfs kset so that we can add the .child_ns_type member in a following patch. No functional change here, only the unrolling of kset_create_and_add(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: rename nfs_client_kobj to nfs_net_kobjBenjamin Coddington2-6/+6
Match the variable names to the sysfs structure. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: rename nfs_client_kset to nfs_ksetBenjamin Coddington1-8/+8
Be brief and match the subsystem name. There's no need to distinguish this kset variable from the server. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Add an "xprtsec=" NFS mount optionChuck Lever6-15/+102
After some discussion, we decided that controlling transport layer security policy should be separate from the setting for the user authentication flavor. To accomplish this, add a new NFS mount option to select a transport layer security policy for RPC operations associated with the mount point. xprtsec=none - Transport layer security is forced off. xprtsec=tls - Establish an encryption-only TLS session. If the initial handshake fails, the mount fails. If TLS is not available on a reconnect, drop the connection and try again. xprtsec=mtls - Both sides authenticate and an encrypted session is created. If the initial handshake fails, the mount fails. If TLS is not available on a reconnect, drop the connection and try again. To support client peer authentication (mtls), the handshake daemon will have configurable default authentication material (certificate or pre-shared key). In the future, mount options can be added that can provide this material on a per-mount basis. Updates to mount.nfs (to support xprtsec=auto) and nfs(5) will be sent under separate cover. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Have struct nfs_client carry a TLS policy fieldChuck Lever4-5/+25
The new field is used to match struct nfs_clients that have the same TLS policy setting. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Improvements for fs_context-related tracepointsChuck Lever1-0/+5
Add some missing observability to the fs_context tracepoints added by commit 33ce83ef0bb0 ("NFS: Replace fs_context-related dprintk() call sites with tracepoints"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFSv4.2: SETXATTR should update ctimeAnna Schumaker2-7/+29
Otherwise, `stat` will report a stale value to users. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>