starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2023-08-30	nfsd: Simplify code around svc_exit_thread() call in nfsd()	NeilBrown	1	-23/+0
	Previously a thread could exit asynchronously (due to a signal) so some care was needed to hold nfsd_mutex over the last svc_put() call. Now a thread can only exit when svc_set_num_threads() is called, and this is always called under nfsd_mutex. So no care is needed. Not only is the mutex held when a thread exits now, but the svc refcount is elevated, so the svc_put() in svc_exit_thread() will never be a final put, so the mutex isn't even needed at this point in the code. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	nfsd: don't allow nfsd threads to be signalled.	NeilBrown	3	-23/+3
	The original implementation of nfsd used signals to stop threads during shutdown. In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads internally it if was asked to run "0" threads. After this user-space transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to threads was no longer an important part of the API. In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the use of signals for stopping threads, using kthread_stop() instead. This patch makes the "obvious" next step and removes the ability to signal nfsd threads - or any svc threads. nfsd stops allowing signals and we don't check for their delivery any more. This will allow for some simplification in later patches. A change worth noting is in nfsd4_ssc_setup_dul(). There was previously a signal_pending() check which would only succeed when the thread was being shut down. It should really have tested kthread_should_stop() as well. Now it just does the latter, not the former. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	lockd: remove SIGKILL handling	NeilBrown	1	-24/+0
	lockd allows SIGKILL and responds by dropping all locks and restarting the grace period. This functionality has been present since 2.1.32 when lockd was added to Linux. This functionality is undocumented and most likely added as a useful debug aid. When there is a need to drop locks, the better approach is to use /proc/fs/nfsd/unlock_*. This patch removes SIGKILL handling as part of preparation for removing all signal handling from sunrpc service threads. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	fs: lockd: avoid possible wrong NULL parameter	Su Hui	1	-0/+3
	clang's static analysis warning: fs/lockd/mon.c: line 293, column 2: Null pointer passed as 2nd argument to memory copy function. Assuming 'hostname' is NULL and calling 'nsm_create_handle()', this will pass NULL as 2nd argument to memory copy function 'memcpy()'. So return NULL if 'hostname' is invalid. Fixes: 77a3ef33e2de ("NSM: More clean up of nsm_get_handle()") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	exportfs: remove kernel-doc warnings in exportfs	Zhu Wang	1	-0/+1
	Remove kernel-doc warning in exportfs: fs/exportfs/expfs.c:395: warning: Function parameter or member 'parent' not described in 'exportfs_encode_inode_fh' Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	nfsd: inherit required unset default acls from effective set	Jeff Layton	1	-5/+29
	A well-formed NFSv4 ACL will always contain OWNER@/GROUP@/EVERYONE@ ACEs, but there is no requirement for inheritable entries for those entities. POSIX ACLs must always have owner/group/other entries, even for a default ACL. nfsd builds the default ACL from inheritable ACEs, but the current code just leaves any unspecified ACEs zeroed out. The result is that adding a default user or group ACE to an inode can leave it with unwanted deny entries. For instance, a newly created directory with no acl will look something like this: # NFSv4 translation by server A::OWNER@:rwaDxtTcCy A::GROUP@:rxtcy A::EVERYONE@:rxtcy # POSIX ACL of underlying file user::rwx group::r-x other::r-x ...if I then add new v4 ACE: nfs4_setfacl -a A:fd:1000:rwx /mnt/local/test ...I end up with a result like this today: user::rwx user:1000:rwx group::r-x mask::rwx other::r-x default:user::--- default:user:1000:rwx default:group::--- default:mask::rwx default:other::--- A::OWNER@:rwaDxtTcCy A::1000:rwaDxtcy A::GROUP@:rxtcy A::EVERYONE@:rxtcy D:fdi:OWNER@:rwaDx A:fdi:OWNER@:tTcCy A:fdi:1000:rwaDxtcy A:fdi:GROUP@:tcy A:fdi:EVERYONE@:tcy ...which is not at all expected. Adding a single inheritable allow ACE should not result in everyone else losing access. The setfacl command solves a silimar issue by copying owner/group/other entries from the effective ACL when none of them are set: "If a Default ACL entry is created, and the Default ACL contains no owner, owning group, or others entry, a copy of the ACL owner, owning group, or others entry is added to the Default ACL. Having nfsd do the same provides a more sane result (with no deny ACEs in the resulting set): user::rwx user:1000:rwx group::r-x mask::rwx other::r-x default:user::rwx default:user:1000:rwx default:group::r-x default:mask::rwx default:other::r-x A::OWNER@:rwaDxtTcCy A::1000:rwaDxtcy A::GROUP@:rxtcy A::EVERYONE@:rxtcy A:fdi:OWNER@:rwaDxtTcCy A:fdi:1000:rwaDxtcy A:fdi:GROUP@:rxtcy A:fdi:EVERYONE@:rxtcy Reported-by: Ondrej Valousek <ondrej.valousek@diasemi.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2136452 Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	lockd: nlm_blocked list race fixes	Alexander Aring	1	-1/+12
	This patch fixes races when lockd accesses the global nlm_blocked list. It was mostly safe to access the list because everything was accessed from the lockd kernel thread context but there exist cases like nlmsvc_grant_deferred() that could manipulate the nlm_blocked list and it can be called from any context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	nfsd: set missing after_change as before_change + 1	Jeff Layton	1	-1/+1
	In the event that we can't fetch post_op_attr attributes, we still need to set a value for the after_change. The operation has already happened, so we're not able to return an error at that point, but we do want to ensure that the client knows that its cache should be invalidated. If we weren't able to fetch post-op attrs, then just set the after_change to before_change + 1. The atomic flag should already be clear in this case. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	nfsd: remove unsafe BUG_ON from set_change_info	Jeff Layton	2	-11/+32
	At one time, nfsd would scrape inode information directly out of struct inode in order to populate the change_info4. At that time, the BUG_ON in set_change_info made some sense, since having it unset meant a coding error. More recently, it calls vfs_getattr to get this information, which can fail. If that fails, fh_pre_saved can end up not being set. While this situation is unfortunate, we don't need to crash the box. Move set_change_info to nfs4proc.c since all of the callers are there. Revise the condition for setting "atomic" to also check for fh_pre_saved. Drop the BUG_ON and just have it zero out both change_attr4s when this occurs. Reported-by: Boyang Xue <bxue@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2223560 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	nfsd: handle failure to collect pre/post-op attrs more sanely	Jeff Layton	5	-37/+65
	Collecting pre_op_attrs can fail, in which case it's probably best to fail the whole operation. Change fh_fill_pre_attrs and fh_fill_both_attrs to return __be32, and have the callers check the return code and abort the operation if it's not nfs_ok. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	nfsd: add a MODULE_DESCRIPTION	Jeff Layton	1	-0/+1
	I got this today from modpost: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfsd/nfsd.o Add a module description. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Rename struct svc_cacherep	Chuck Lever	4	-28/+28
	The svc_ prefix is identified with the SunRPC layer. Although the duplicate reply cache caches RPC replies, it is only for the NFS protocol. Rename the struct to better reflect its purpose. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Remove svc_rqst::rq_cacherep	Chuck Lever	3	-11/+16
	Over time I'd like to see NFS-specific fields moved out of struct svc_rqst, which is an RPC layer object. These fields are layering violations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Refactor the duplicate reply cache shrinker	Chuck Lever	1	-43/+39
	Avoid holding the bucket lock while freeing cache entries. This change also caps the number of entries that are freed when the shrinker calls to reduce the shrinker's impact on the cache's effectiveness. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Replace nfsd_prune_bucket()	Chuck Lever	2	-15/+85
	Enable nfsd_prune_bucket() to drop the bucket lock while calling kfree(). Use the same pattern that Jeff recently introduced in the NFSD filecache. A few percpu operations are moved outside the lock since they temporarily disable local IRQs which is expensive and does not need to be done while the lock is held. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Rename nfsd_reply_cache_alloc()	Chuck Lever	1	-3/+3
	For readability, rename to match the other helpers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Refactor nfsd_reply_cache_free_locked()	Chuck Lever	1	-7/+20
	To reduce contention on the bucket locks, we must avoid calling kfree() while each bucket lock is held. Start by refactoring nfsd_reply_cache_free_locked() into a helper that removes an entry from the bucket (and must therefore run under the lock) and a second helper that frees the entry (which does not need to hold the lock). For readability, rename the helpers nfsd_cacherep_<verb>. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Enable write delegation support	Dai Ngo	2	-20/+78
	This patch grants write delegations for OPEN with NFS4_SHARE_ACCESS_WRITE if there is no conflict with other OPENs. Write delegation conflicts with another OPEN, REMOVE, RENAME and SETATTR are handled the same as read delegation using notify_change, try_break_deleg. The NFSv4.0 protocol does not enable a server to determine that a conflicting GETATTR originated from the client holding the delegation versus coming from some other client. With NFSv4.1 and later, the SEQUENCE operation that begins each COMPOUND contains a client ID, so delegation recall can be safely squelched in this case. With NFSv4.0, however, the server must recall or send a CB_GETATTR (per RFC 7530 Section 16.7.5) even when the GETATTR originates from the client holding that delegation. An NFSv4.0 client can trigger a pathological situation if it always sends a DELEGRETURN preceded by a conflicting GETATTR in the same COMPOUND. COMPOUND execution will always stop at the GETATTR and the DELEGRETURN will never get executed. The server eventually revokes the delegation, which can result in loss of open or lock state. Tracepoint added to track whether read or write delegation is granted. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: Report zero space limit for write delegations	Chuck Lever	1	-3/+6
	Replace the -1 (no limit) with a zero (no reserved space). This prevents certain non-determinant client behavior, such as silly-renaming a file when the only open reference is a write delegation. Such a rename can leave unexpected .nfs files in a directory that is otherwise supposed to be empty. Note that other server implementations that support write delegation also set this field to zero. Suggested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	NFSD: handle GETATTR conflict with write delegation	Dai Ngo	5	-0/+82
	If the GETATTR request on a file that has write delegation in effect and the request attributes include the change info and size attribute then the write delegation is recalled. If the delegation is returned within 30ms then the GETATTR is serviced as normal otherwise the NFS4ERR_DELAY error is returned for the GETATTR. Add counter for write delegation recall due to conflict GETATTR. This is used to evaluate the need to implement CB_GETATTR to adoid recalling the delegation with conflit GETATTR. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	locks: allow support for write delegation	Dai Ngo	1	-7/+0
	Remove the check for F_WRLCK in generic_add_lease to allow file_lock to be used for write delegation. First consumer is NFSD. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-30	Merge tag 'mm-stable-2023-08-28-18-26' of ↵	Linus Torvalds	36	-306/+306
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...
2023-08-29	Merge tag 'flex-array-transformations-6.6-rc1' of ↵	Linus Torvalds	4	-10/+11
	git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull flexible-array updates from Gustavo A. R. Silva. * tag 'flex-array-transformations-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: fs: omfs: Use flexible-array member in struct omfs_extent sparc: openpromio: Address -Warray-bounds warning reiserfs: Replace one-element array with flexible-array member
2023-08-29	Merge tag 'v6.6-vfs.super.fixes' of ↵	Linus Torvalds	1	-20/+31
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock fixes from Christian Brauner: "Two follow-up fixes for the super work this cycle: - Move a misplaced lockep assertion before we potentially free the object containing the lock. - Ensure that filesystems which match superblocks in sget{_fc}() based on sb->s_fs_info are guaranteed to see a valid sb->s_fs_info as long as a superblock still appears on the filesystem type's superblock list. What we want as a proper solution for next cycle is to split sb->free_sb() out of sb->kill_sb() so that we can simply call kill_super_notify() after sb->kill_sb() but before sb->free_sb(). Currently, this is lumped together in sb->kill_sb()" * tag 'v6.6-vfs.super.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: super: ensure valid info super: move lockdep assert
2023-08-29	ksmbd: add missing calling smb2_set_err_rsp() on error	Namjae Jeon	1	-0/+1
	If some error happen on smb2_sess_setup(), Need to call smb2_set_err_rsp() to set error response. This patch add missing calling smb2_set_err_rsp() on error. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: replace one-element array with flex-array member in struct smb2_ea_info	Namjae Jeon	2	-2/+2
	UBSAN complains about out-of-bounds array indexes on 1-element arrays in struct smb2_ea_info. UBSAN: array-index-out-of-bounds in fs/smb/server/smb2pdu.c:4335:15 index 1 is out of range for type 'char [1]' CPU: 1 PID: 354 Comm: kworker/1:4 Not tainted 6.5.0-rc4 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020 Workqueue: ksmbd-io handle_ksmbd_work [ksmbd] Call Trace: <TASK> __dump_stack linux/lib/dump_stack.c:88 dump_stack_lvl+0x48/0x70 linux/lib/dump_stack.c:106 dump_stack+0x10/0x20 linux/lib/dump_stack.c:113 ubsan_epilogue linux/lib/ubsan.c:217 __ubsan_handle_out_of_bounds+0xc6/0x110 linux/lib/ubsan.c:348 smb2_get_ea linux/fs/smb/server/smb2pdu.c:4335 smb2_get_info_file linux/fs/smb/server/smb2pdu.c:4900 smb2_query_info+0x63ae/0x6b20 linux/fs/smb/server/smb2pdu.c:5275 __process_request linux/fs/smb/server/server.c:145 __handle_ksmbd_work linux/fs/smb/server/server.c:213 handle_ksmbd_work+0x348/0x10b0 linux/fs/smb/server/server.c:266 process_one_work+0x85a/0x1500 linux/kernel/workqueue.c:2597 worker_thread+0xf3/0x13a0 linux/kernel/workqueue.c:2748 kthread+0x2b7/0x390 linux/kernel/kthread.c:389 ret_from_fork+0x44/0x90 linux/arch/x86/kernel/process.c:145 ret_from_fork_asm+0x1b/0x30 linux/arch/x86/entry/entry_64.S:304 </TASK> Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: fix slub overflow in ksmbd_decode_ntlmssp_auth_blob()	Namjae Jeon	1	-0/+3
	If authblob->SessionKey.Length is bigger than session key size(CIFS_KEY_SIZE), slub overflow can happen in key exchange codes. cifs_arc4_crypt copy to session key array from SessionKey from client. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21940 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: fix wrong DataOffset validation of create context	Namjae Jeon	1	-1/+1
	If ->DataOffset of create context is 0, DataBuffer size is not correctly validated. This patch change wrong validation code and consider tag length in request. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21824 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: Fix one kernel-doc comment	Yang Li	1	-1/+0
	Fix one kernel-doc comment to silence the warning: fs/smb/server/smb2pdu.c:4160: warning: Excess function parameter 'infoclass_size' description in 'buffer_check_err' Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: reduce descriptor size if remaining bytes is less than request size	Namjae Jeon	1	-7/+18
	Create 3 kinds of files to reproduce this problem. dd if=/dev/urandom of=127k.bin bs=1024 count=127 dd if=/dev/urandom of=128k.bin bs=1024 count=128 dd if=/dev/urandom of=129k.bin bs=1024 count=129 When copying files from ksmbd share to windows or cifs.ko, The following error message happen from windows client. "The file '129k.bin' is too large for the destination filesystem." We can see the error logs from ksmbd debug prints [48394.611537] ksmbd: RDMA r/w request 0x0: token 0x669d, length 0x20000 [48394.612054] ksmbd: smb_direct: RDMA write, len 0x20000, needed credits 0x1 [48394.612572] ksmbd: filename 129k.bin, offset 131072, len 131072 [48394.614189] ksmbd: nbytes 1024, offset 132096 mincount 0 [48394.614585] ksmbd: Failed to process 8 [-22] And we can reproduce it with cifs.ko, e.g. dd if=129k.bin of=/dev/null bs=128KB count=2 This problem is that ksmbd rdma return error if remaining bytes is less than Length of Buffer Descriptor V1 Structure. smb_direct_rdma_xmit() ... if (desc_buf_len == 0 \|\| total_length > buf_len \|\| total_length > t->max_rdma_rw_size) return -EINVAL; This patch reduce descriptor size with remaining bytes and remove the check for total_length and buf_len. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: fix `force create mode' and `force directory mode'	Atte Heikkilä	1	-18/+11
	`force create mode' and `force directory mode' should be bitwise ORed with the perms after `create mask' and `directory mask' have been applied, respectively. Signed-off-by: Atte Heikkilä <atteh.mailbox@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: fix wrong interim response on compound	Namjae Jeon	4	-26/+26
	If smb2_lock or smb2_open request is compound, ksmbd could send wrong interim response to client. ksmbd allocate new interim buffer instead of using resonse buffer to support compound request. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: add support for read compound	Namjae Jeon	12	-370/+380
	MacOS sends a compound request including read to the server (e.g. open-read-close). So far, ksmbd has not handled read as a compound request. For compatibility between ksmbd and an OS that supports SMB, This patch provides compound support for read requests. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	ksmbd: switch to use kmemdup_nul() helper	Yang Yingliang	1	-3/+1
	Use kmemdup_nul() helper instead of open-coding to simplify the code. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29	jfs: validate max amount of blocks before allocation.	Alexei Filippov	1	-0/+5
	The lack of checking bmp->db_max_freebud in extBalloc() can lead to shift out of bounds, so this patch prevents undefined behavior, because bmp->db_max_freebud == -1 only if there is no free space. Signed-off-by: Aleksei Filippov <halip0503@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+5f088f29593e6b4c8db8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=01abadbd6ae6a08b1f1987aa61554c6b3ac19ff2
2023-08-29	jfs: remove redundant initialization to pointer ip	Colin Ian King	1	-1/+1
	The pointer ip is being initialized with a value that is never read, it is being re-assigned later on. The assignment is redundant and can be removed. Cleans up clang scan warning: fs/jfs/namei.c:886:16: warning: Value stored to 'ip' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-08-29	fuse: conditionally fill kstat in fuse_do_statx()	Bernd Schubert	1	-5/+8
	The code path fuse_update_attributes fuse_update_get_attr fuse_do_statx has the risk to use a NULL pointer for struct kstat *stat, although current callers of fuse_update_attributes() only set request_mask to values that will trigger the call of fuse_do_getattr(), which already handles the NULL pointer. Future updates might miss that fuse_do_statx() does not handle it it is safer to add a condition already right now. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Fixes: d3045530bdd2 ("fuse: implement statx") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-29	super: ensure valid info	Christian Brauner	1	-19/+30
	For keyed filesystems that recycle superblocks based on s_fs_info or information contained therein s_fs_info must be kept as long as the superblock is on the filesystem type super list. This isn't guaranteed as s_fs_info will be freed latest in sb->kill_sb(). The fix is simply to perform notification and list removal in kill_anon_super(). Any filesystem needs to free s_fs_info after they call the kill_*() helpers. If they don't they risk use-after-free right now so fixing it here is guaranteed that s_fs_info remain valid. For block backed filesystems notifying in pass sb->kill_sb() in deactivate_locked_super() remains unproblematic and is required because multiple other block devices can be shut down after kill_block_super() has been called from a filesystem's sb->kill_sb() handler. For example, ext4 and xfs close additional devices. Block based filesystems don't depend on s_fs_info (btrfs does use s_fs_info but also uses kill_anon_super() and not kill_block_super().). Sorry for that braino. Goal should be to unify this behavior during this cycle obviously. But let's please do a simple bugfix now. Fixes: 2c18a63b760a ("super: wait until we passed kill super") Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com Message-Id: <20230828-vfs-super-fixes-v1-2-b37a4a04a88f@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-29	super: move lockdep assert	Christian Brauner	1	-1/+1
	Fix braino and move the lockdep assertion after put_super() otherwise we risk a use-after-free. Fixes: 2c18a63b760a ("super: wait until we passed kill super") Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230828-vfs-super-fixes-v1-1-b37a4a04a88f@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-28	Merge tag 'pstore-v6.6-rc1' of ↵	Linus Torvalds	5	-346/+137
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - Greatly simplify compression support (Ard Biesheuvel) - Avoid crashes for corrupted offsets when prz size is 0 (Enlin Mu) - Expand range of usable record sizes (Yuxiao Zhang) - Fix kernel-doc warning (Matthew Wilcox) * tag 'pstore-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Fix kernel-doc warning pstore: Support record sizes larger than kmalloc() limit pstore/ram: Check start of empty przs during init pstore: Replace crypto API compression with zlib_deflate library calls pstore: Remove worst-case compression size logic
2023-08-28	Merge tag 'for-6.6-tag' of ↵	Linus Torvalds	44	-1617/+2259
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "No new features, the bulk of the changes are fixes, refactoring and cleanups. The notable fix is the scrub performance restoration after rewrite in 6.4, though still only partial. Fixes: - scrub performance drop due to rewrite in 6.4 partially restored: - do IO grouping by blg_plug/blk_unplug again - avoid unnecessary tree searches when processing stripes, in extent and checksum trees - the drop is noticeable on fast PCIe devices, -66% and restored to -33% of the original - backports to 6.4 planned - handle more corner cases of transaction commit during orphan cleanup or delayed ref processing - use correct fsid/metadata_uuid when validating super block - copy directory permissions and time when creating a stub subvolume Core: - debugging feature integrity checker deprecated, to be removed in 6.7 - in zoned mode, zones are activated just before the write, making error handling easier, now the overcommit mechanism can be enabled again which improves performance by avoiding more frequent flushing - v0 extent handling completely removed, deprecated long time ago - error handling improvements - tests: - extent buffer bitmap tests - pinned extent splitting tests - cleanups and refactoring: - compression writeback - extent buffer bitmap - space flushing, ENOSPC handling" * tag 'for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (110 commits) btrfs: zoned: skip splitting and logical rewriting on pre-alloc write btrfs: tests: test invalid splitting when skipping pinned drop extent_map btrfs: tests: add a test for btrfs_add_extent_mapping btrfs: tests: add extent_map tests for dropping with odd layouts btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker() btrfs: scrub: don't go ordered workqueue for dev-replace btrfs: scrub: fix grouping of read IO btrfs: scrub: avoid unnecessary csum tree search preparing stripes btrfs: scrub: avoid unnecessary extent tree search preparing stripes btrfs: copy dir permission and time when creating a stub subvolume btrfs: remove pointless empty list check when reading delayed dir indexes btrfs: drop redundant check to use fs_devices::metadata_uuid btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super btrfs: use the correct superblock to compare fsid in btrfs_validate_super btrfs: simplify memcpy either of metadata_uuid or fsid btrfs: add a helper to read the superblock metadata_uuid btrfs: remove v0 extent handling btrfs: output extra debug info if we failed to find an inline backref btrfs: move the !zoned assert into run_delalloc_cow btrfs: consolidate the error handling in run_delalloc_nocow ...
2023-08-28	Merge tag 'affs-for-6.6-tag' of ↵	Linus Torvalds	2	-15/+19
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull affs updates from David Sterba: "Two minor updates for AFFS: - reimplement writepage() address space callback on top of migrate_folio() - fix a build warning, local parameters 'toupper' collide with the standard ctype.h name" * tag 'affs-for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: rename local toupper() to fn() to avoid confusion affs: remove writepage implementation
2023-08-28	Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux	Linus Torvalds	6	-103/+79
	Pull fsverity updates from Eric Biggers: "Several cleanups for fs/verity/, including two commits that make the builtin signature support more cleanly separated from the base feature" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: skip PKCS#7 parser when keyring is empty fsverity: move sysctl registration out of signature.c fsverity: simplify handling of errors during initcall fsverity: explicitly check that there is no algorithm 0
2023-08-28	Merge tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds	9	-200/+446
	Pull iomap updates from Darrick Wong: "We've got some big changes for this release -- I'm very happy to be landing willy's work to enable large folios for the page cache for general read and write IOs when the fs can make contiguous space allocations, and Ritesh's work to track sub-folio dirty state to eliminate the write amplification problems inherent in using large folios. As a bonus, io_uring can now process write completions in the caller's context instead of bouncing through a workqueue, which should reduce io latency dramatically. IOWs, XFS should see a nice performance bump for both IO paths. Summary: - Make large writes to the page cache fill sparse parts of the cache with large folios, then use large memcpy calls for the large folio. - Track the per-block dirty state of each large folio so that a buffered write to a single byte on a large folio does not result in a (potentially) multi-megabyte writeback IO. - Allow some directio completions to be performed in the initiating task's context instead of punting through a workqueue. This will reduce latency for some io_uring requests" * tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits) iomap: support IOCB_DIO_CALLER_COMP io_uring/rw: add write support for IOCB_DIO_CALLER_COMP fs: add IOCB flags related to passing back dio completions iomap: add IOMAP_DIO_INLINE_COMP iomap: only set iocb->private for polled bio iomap: treat a write through cache the same as FUA iomap: use an unsigned type for IOMAP_DIO_* defines iomap: cleanup up iomap_dio_bio_end_io() iomap: Add per-block dirty state tracking to improve performance iomap: Allocate ifs in ->write_begin() early iomap: Refactor iomap_write_delalloc_punch() function out iomap: Use iomap_punch_t typedef iomap: Fix possible overflow condition in iomap_write_delalloc_scan iomap: Add some uptodate state handling helpers for ifs state bitmap iomap: Drop ifs argument from iomap_set_range_uptodate() iomap: Rename iomap_page to iomap_folio_state and others iomap: Copy larger chunks from userspace iomap: Create large folios in the buffered write path filemap: Allow __filemap_get_folio to allocate large folios filemap: Add fgf_t typedef ...
2023-08-28	Merge tag 'erofs-for-6.6-rc1' of ↵	Linus Torvalds	11	-190/+459
	git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, a xattr bloom filter feature is introduced to speed up negative xattr lookups, which was originally suggested by Alexander for Composefs use cases. Additionally, the DEFLATE algorithm is now supported, which can be used together with hardware accelerators for our cloud workloads. Each supported compression algorithm can be selected on a per-file basis for specific access patterns too. There are also some random fixes and cleanups as usual: - Support xattr bloom filter to optimize negative xattr lookups - Support DEFLATE compression algorithm as an alternative - Fix a regression that ztailpacking pclusters don't release properly - Avoid warning dedupe and fragments features anymore - Some folio conversions and cleanups" * tag 'erofs-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: release ztailpacking pclusters properly erofs: don't warn dedupe and fragments features anymore erofs: adapt folios for z_erofs_read_folio() erofs: adapt folios for z_erofs_readahead() erofs: get rid of fe->backmost for cache decompression erofs: drop z_erofs_page_mark_eio() erofs: tidy up z_erofs_do_read_page() erofs: move preparation logic into z_erofs_pcluster_begin() erofs: avoid obsolete {collector,collection} terms erofs: simplify z_erofs_read_fragment() erofs: remove redundant erofs_fs_type declaration in super.c erofs: add necessary kmem_cache_create flags for erofs inode cache erofs: clean up redundant comment and adjust code alignment erofs: refine warning messages for zdata I/Os erofs: boost negative xattr lookup with bloom filter erofs: update on-disk format for xattr name filter erofs: DEFLATE compression support
2023-08-28	Merge tag 'filelock-v6.6' of ↵	Linus Torvalds	1	-5/+22
	git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: - new functionality for F_OFD_GETLK: requesting a type of F_UNLCK will find info about whatever lock happens to be first in the given range, regardless of type. - an OFD lock selftest - bugfix involving a UAF in a tracepoint - comment typo fix * tag 'filelock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock fs/locks: Fix typo selftests: add OFD lock tests fs/locks: F_UNLCK extension for F_OFD_GETLK
2023-08-28	Merge tag 'v6.6-fs.proc.uapi' of ↵	Linus Torvalds	2	-1/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull procfs fixes from Christian Brauner: "Mode changes to files under /proc/<pid>/ aren't supported ever since commit 6d76fa58b050 ("Don't allow chmod() on the /proc/<pid>/ files"). Due to an oversight in commit 1b3044e39a89 ("procfs: fix pthread cross-thread naming if !PR_DUMPABLE") in switching from REG to NOD, mode changes on /proc/thread-self/comm were accidently allowed. Similar, mode changes for all files beneath /proc/<pid>/net/ are blocked but mode changes on /proc/<pid>/net itself were accidently allowed. Both issues come down to not using the generic proc_setattr() helper which blocks all mode changes. This is rectified with this pull request. This also removes a strange nolibc test that abused /proc/<pid>/net for testing mode changes. Using procfs for this test never made a lot of sense given procfs has special semantics for almost everything anway. Both changes are minor user-visible changes. It is however very unlikely that mode changes on proc/<pid>/net and /proc/thread-self/comm are something that userspace relies on" * tag 'v6.6-fs.proc.uapi' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: procfs: block chmod on /proc/thread-self/comm proc: use generic setattr() for /proc/$PID/net selftests/nolibc: drop test chmod_net
2023-08-28	Merge tag 'v6.6-vfs.autofs' of ↵	Linus Torvalds	1	-2/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull autofs fixes from Christian Brauner: "This fixes a memory leak in autofs reported by syzkaller and a missing conversion from uninterruptible to interruptible wake up when autofs is in catatonic mode" * tag 'v6.6-vfs.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: autofs: use wake_up() instead of wake_up_interruptible(() autofs: fix memory leak of waitqueues in autofs_catatonic_mode
2023-08-28	Merge tag 'v6.6-vfs.fchmodat2' of ↵	Linus Torvalds	1	-4/+19
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fchmodat2 system call from Christian Brauner: "This adds the fchmodat2() system call. It is a revised version of the fchmodat() system call, adding a missing flag argument. Support for both AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH are included. Adding this system call revision has been a longstanding request but so far has always fallen through the cracks. While the kernel implementation of fchmodat() does not have a flag argument the libc provided POSIX-compliant fchmodat(3) version does. Both glibc and musl have to implement a workaround in order to support AT_SYMLINK_NOFOLLOW (see [1] and [2]). The workaround is brittle because it relies not just on O_PATH and O_NOFOLLOW semantics and procfs magic links but also on our rather inconsistent symlink semantics. This gives userspace a proper fchmodat2() system call that libcs can use to properly implement fchmodat(3) and allows them to get rid of their hacks. In this case it will immediately benefit them as the current workaround is already defunct because of aformentioned inconsistencies. In addition to AT_SYMLINK_NOFOLLOW, give userspace the ability to use AT_EMPTY_PATH with fchmodat2(). This is already possible with fchownat() so there's no reason to not also support it for fchmodat2(). The implementation is simple and comes with selftests. Implementation of the system call and wiring up the system call are done as separate patches even though they could arguably be one patch. But in case there are merge conflicts from other system call additions it can be beneficial to have separate patches" Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35 [1] Link: https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28 [2] * tag 'v6.6-vfs.fchmodat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: fchmodat2: remove duplicate unneeded defines fchmodat2: add support for AT_EMPTY_PATH selftests: Add fchmodat2 selftest arch: Register fchmodat2, usually as syscall 452 fs: Add fchmodat2() Non-functional cleanup of a "__user * filename"
2023-08-28	Merge tag 'v6.6-vfs.super' of ↵	Linus Torvalds	25	-520/+935
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...