summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2025-04-10smb: client: Fix netns refcount imbalance causing leaks and use-after-freeWang Zhaolong1-8/+8
[ Upstream commit 4e7f1644f2ac6d01dc584f6301c3b1d5aac4eaef ] Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") attempted to fix a netns use-after-free issue by manually adjusting reference counts via sk->sk_net_refcnt and sock_inuse_add(). However, a later commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") pointed out that the approach of manually setting sk->sk_net_refcnt in the first commit was technically incorrect, as sk->sk_net_refcnt should only be set for user sockets. It led to issues like TCP timers not being cleared properly on close. The second commit moved to a model of just holding an extra netns reference for server->ssocket using get_net(), and dropping it when the server is torn down. But there remain some gaps in the get_net()/put_net() balancing added by these commits. The incomplete reference handling in these fixes results in two issues: 1. Netns refcount leaks[1] The problem process is as follows: ``` mount.cifs cifsd cifs_do_mount cifs_mount cifs_mount_get_session cifs_get_tcp_session get_net() /* First get net. */ ip_connect generic_ip_connect /* Try port 445 */ get_net() ->connect() /* Failed */ put_net() generic_ip_connect /* Try port 139 */ get_net() /* Missing matching put_net() for this get_net().*/ cifs_get_smb_ses cifs_negotiate_protocol smb2_negotiate SMB2_negotiate cifs_send_recv wait_for_response cifs_demultiplex_thread cifs_read_from_socket cifs_readv_from_socket cifs_reconnect cifs_abort_connection sock_release(); server->ssocket = NULL; /* Missing put_net() here. */ generic_ip_connect get_net() ->connect() /* Failed */ put_net() sock_release(); server->ssocket = NULL; free_rsp_buf ... clean_demultiplex_info /* It's only called once here. */ put_net() ``` When cifs_reconnect() is triggered, the server->ssocket is released without a corresponding put_net() for the reference acquired in generic_ip_connect() before. it ends up calling generic_ip_connect() again to retry get_net(). After that, server->ssocket is set to NULL in the error path of generic_ip_connect(), and the net count cannot be released in the final clean_demultiplex_info() function. 2. Potential use-after-free The current refcounting scheme can lead to a potential use-after-free issue in the following scenario: ``` cifs_do_mount cifs_mount cifs_mount_get_session cifs_get_tcp_session get_net() /* First get net */ ip_connect generic_ip_connect get_net() bind_socket kernel_bind /* failed */ put_net() /* after out_err_crypto_release label */ put_net() /* after out_err label */ put_net() ``` In the exception handling process where binding the socket fails, the get_net() and put_net() calls are unbalanced, which may cause the server->net reference count to drop to zero and be prematurely released. To address both issues, this patch ties the netns reference counting to the server->ssocket and server lifecycles. The extra reference is now acquired when the server or socket is created, and released when the socket is destroyed or the server is torn down. [1]: https://bugzilla.kernel.org/show_bug.cgi?id=219792 Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10NFS: Shut down the nfs_client only after all the superblocksTrond Myklebust1-1/+21
[ Upstream commit 2d3e998a0bc7fe26a724f87a8ce217848040520e ] The nfs_client manages state for all the superblocks in the "cl_superblocks" list, so it must not be shut down until all of them are gone. Fixes: 7d3e26a054c8 ("NFS: Cancel all existing RPC tasks when shutdown") Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10fs/procfs: fix the comment above proc_pid_wchan()Bart Van Assche1-1/+1
[ Upstream commit 6287fbad1cd91f0c25cdc3a580499060828a8f30 ] proc_pid_wchan() used to report kernel addresses to user space but that is no longer the case today. Bring the comment above proc_pid_wchan() in sync with the implementation. Link: https://lkml.kernel.org/r/20250319210222.1518771-1-bvanassche@acm.org Fixes: b2f73922d119 ("fs/proc, core/debug: Don't expose absolute kernel addresses via wchan") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Kees Cook <kees@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10um: hostfs: avoid issues on inode number reuse by hostBenjamin Berg3-27/+41
[ Upstream commit 0bc754d1e31f40f4a343b692096d9e092ccc0370 ] Some file systems (e.g. ext4) may reuse inode numbers once the inode is not in use anymore. Usually hostfs will keep an FD open for each inode, but this is not always the case. In the case of sockets, this cannot even be done properly. As such, the following sequence of events was possible: * application creates and deletes a socket * hostfs creates/deletes the socket on the host * inode is still in the hostfs cache * hostfs creates a new file * ext4 on the outside reuses the inode number * hostfs finds the socket inode for the newly created file * application receives -ENXIO when opening the file As mentioned, this can only happen if the deleted file is a special file that is never opened on the host (i.e. no .open fop). As such, to prevent issues, it is sufficient to check that the inode has the expected type. That said, also add a check for the inode birth time, just to be on the safe side. Fixes: 74ce793bcbde ("hostfs: Fix ephemeral inodes") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Mickaël Salaün <mic@digikod.net> Tested-by: Mickaël Salaün <mic@digikod.net> Link: https://patch.msgid.link/20250214092822.1241575-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10fuse: fix dax truncate/punch_hole fault pathAlistair Popple3-4/+3
[ Upstream commit 7851bf649d423edd7286b292739f2eefded3d35c ] Patch series "fs/dax: Fix ZONE_DEVICE page reference counts", v9. Device and FS DAX pages have always maintained their own page reference counts without following the normal rules for page reference counting. In particular pages are considered free when the refcount hits one rather than zero and refcounts are not added when mapping the page. Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary mechanism for allowing GUP to hold references on the page (see get_dev_pagemap). However there doesn't seem to be any reason why FS DAX pages need their own reference counting scheme. By treating the refcounts on these pages the same way as normal pages we can remove a lot of special checks. In particular pXd_trans_huge() becomes the same as pXd_leaf(), although I haven't made that change here. It also frees up a valuable SW define PTE bit on architectures that have devmap PTE bits defined. It also almost certainly allows further clean-up of the devmap managed functions, but I have left that as a future improvment. It also enables support for compound ZONE_DEVICE pages which is one of my primary motivators for doing this work. This patch (of 20): FS DAX requires file systems to call into the DAX layout prior to unlinking inodes to ensure there is no ongoing DMA or other remote access to the direct mapped page. The fuse file system implements fuse_dax_break_layouts() to do this which includes a comment indicating that passing dmap_end == 0 leads to unmapping of the whole file. However this is not true - passing dmap_end == 0 will not unmap anything before dmap_start, and further more dax_layout_busy_page_range() will not scan any of the range to see if there maybe ongoing DMA access to the range. Fix this by passing -1 for dmap_end to fuse_dax_break_layouts() which will invalidate the entire file range to dax_layout_busy_page_range(). Link: https://lkml.kernel.org/r/cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@nvidia.com Link: https://lkml.kernel.org/r/f09a34b6c40032022e4ddee6fadb7cc676f08867.1740713401.git-series.apopple@nvidia.com Fixes: 6ae330cad6ef ("virtiofs: serialize truncate/punch_hole and dax fault path") Signed-off-by: Alistair Popple <apopple@nvidia.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Asahi Lina <lina@asahilina.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10NFS: fix open_owner_id_maxsz and related fields.NeilBrown1-9/+9
[ Upstream commit 43502f6e8d1e767d6736ea0676cc784025cf6eeb ] A recent change increased the size of an NFSv4 open owner, but didn't increase the corresponding max_sz defines. This is not know to have caused failure, but should be fixed. This patch also fixes some relates _maxsz fields that are wrong. Note that the XXX_owner_id_maxsz values now are only the size of the id and do NOT include the len field that will always preceed the id in xdr encoding. I think this is clearer. Reported-by: David Disseldorp <ddiss@suse.com> Fixes: d98f72272500 ("nfs: simplify and guarantee owner uniqueness.") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10NFSv4: Avoid unnecessary scans of filesystems for delayed delegationsTrond Myklebust1-6/+12
[ Upstream commit e767b59e29b8327d25edde65efc743f479f30d0a ] The amount of looping through the list of delegations is occasionally leading to soft lockups. If the state manager was asked to manage the delayed return of delegations, then only scan those filesystems containing delegations that were marked as being delayed. Fixes: be20037725d1 ("NFSv4: Fix delegation return in cases where we have to retry") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10NFSv4: Avoid unnecessary scans of filesystems for expired delegationsTrond Myklebust1-0/+7
[ Upstream commit f163aa81a799e2d46d7f8f0b42a0e7770eaa0d06 ] The amount of looping through the list of delegations is occasionally leading to soft lockups. If the state manager was asked to reap the expired delegations, it should scan only those filesystems that hold delegations that need to be reaped. Fixes: 7f156ef0bf45 ("NFSv4: Clean up nfs_delegation_reap_expired()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10NFSv4: Avoid unnecessary scans of filesystems for returning delegationsTrond Myklebust1-0/+5
[ Upstream commit 35a566a24e58f1b5f89737edf60b77de58719ed0 ] The amount of looping through the list of delegations is occasionally leading to soft lockups. If the state manager was asked to return delegations asynchronously, it should only scan those filesystems that hold delegations that need to be returned. Fixes: af3b61bf6131 ("NFSv4: Clean up nfs_client_return_marked_delegations()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10NFSv4: Don't trigger uneccessary scans for return-on-close delegationsTrond Myklebust1-15/+18
[ Upstream commit 47acca884f714f41d95dc654f802845544554784 ] The amount of looping through the list of delegations is occasionally leading to soft lockups. Avoid at least some loops by not requiring the NFSv4 state manager to scan for delegations that are marked for return-on-close. Instead, either mark them for immediate return (if possible) or else leave it up to nfs4_inode_return_delegation_on_close() to return them once the file is closed by the application. Fixes: b757144fd77c ("NFSv4: Be less aggressive about returning delegations for open files") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10ocfs2: validate l_tree_depth to avoid out-of-bounds accessVasiliy Kovalev1-0/+8
[ Upstream commit a406aff8c05115119127c962cbbbbd202e1973ef ] The l_tree_depth field is 16-bit (__le16), but the actual maximum depth is limited to OCFS2_MAX_PATH_DEPTH. Add a check to prevent out-of-bounds access if l_tree_depth has an invalid value, which may occur when reading from a corrupted mounted disk [1]. Link: https://lkml.kernel.org/r/20250214084908.736528-1-kovalev@altlinux.org Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Reported-by: syzbot+66c146268dc88f4341fd@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=66c146268dc88f4341fd [1] Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Kurt Hackel <kurt.hackel@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Vasiliy Kovalev <kovalev@altlinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10fs/ntfs3: Prevent integer overflow in hdr_first_de()Dan Carpenter1-1/+1
[ Upstream commit 6bb81b94f7a9cba6bde9a905cef52a65317a8b04 ] The "de_off" and "used" variables come from the disk so they both need to check. The problem is that on 32bit systems if they're both greater than UINT_MAX - 16 then the check does work as intended because of an integer overflow. Fixes: 60ce8dfde035 ("fs/ntfs3: Fix wrong if in hdr_first_de") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10fs/ntfs3: Fix a couple integer overflows on 32bit systemsDan Carpenter1-2/+2
[ Upstream commit 5ad414f4df2294b28836b5b7b69787659d6aa708 ] On 32bit systems the "off + sizeof(struct NTFS_DE)" addition can have an integer wrapping issue. Fix it by using size_add(). Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10isofs: fix KMSAN uninit-value bug in do_isofs_readdir()Qasim Ijaz1-1/+2
[ Upstream commit 81a82e8f33880793029cd6f8a766fb13b737e6a7 ] In do_isofs_readdir() when assigning the variable "struct iso_directory_record *de" the b_data field of the buffer_head is accessed and an offset is added to it, the size of b_data is 2048 and the offset size is 2047, meaning "de = (struct iso_directory_record *) (bh->b_data + offset);" yields the final byte of the 2048 sized b_data block. The first byte of the directory record (de_len) is then read and found to be 31, meaning the directory record size is 31 bytes long. The directory record is defined by the structure: struct iso_directory_record { __u8 length; // 1 byte __u8 ext_attr_length; // 1 byte __u8 extent[8]; // 8 bytes __u8 size[8]; // 8 bytes __u8 date[7]; // 7 bytes __u8 flags; // 1 byte __u8 file_unit_size; // 1 byte __u8 interleave; // 1 byte __u8 volume_sequence_number[4]; // 4 bytes __u8 name_len; // 1 byte char name[]; // variable size } __attribute__((packed)); The fixed portion of this structure occupies 33 bytes. Therefore, a valid directory record must be at least 33 bytes long (even without considering the variable-length name field). Since de_len is only 31, it is insufficient to contain the complete fixed header. The code later hits the following sanity check that compares de_len against the sum of de->name_len and sizeof(struct iso_directory_record): if (de_len < de->name_len[0] + sizeof(struct iso_directory_record)) { ... } Since the fixed portion of the structure is 33 bytes (up to and including name_len member), a valid record should have de_len of at least 33 bytes; here, however, de_len is too short, and the field de->name_len (located at offset 32) is accessed even though it lies beyond the available 31 bytes. This access on the corrupted isofs data triggers a KASAN uninitialized memory warning. The fix would be to first verify that de_len is at least sizeof(struct iso_directory_record) before accessing any fields like de->name_len. Reported-by: syzbot <syzbot+812641c6c3d7586a1613@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+812641c6c3d7586a1613@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=812641c6c3d7586a1613 Fixes: 2deb1acc653c ("isofs: fix access to unallocated memory when reading corrupted filesystem") Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250211195900.42406-1-qasdev00@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10fs/ntfs3: Update inode->i_mapping->a_ops on compression stateKonstantin Komarov3-5/+26
[ Upstream commit b432163ebd15a0fb74051949cb61456d6c55ccbd ] Update inode->i_mapping->a_ops when the compression state changes to ensure correct address space operations. Clear ATTR_FLAG_SPARSED/FILE_ATTRIBUTE_SPARSE_FILE when enabling compression to prevent flag conflicts. v2: Additionally, ensure that all dirty pages are flushed and concurrent access to the page cache is blocked. Fixes: 6b39bfaeec44 ("fs/ntfs3: Add support for the compression attribute") Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-07bcachefs: bch2_ioctl_subvolume_destroy() fixesKent Overstreet1-2/+4
[ Upstream commit 707549600c4a012ed71c0204a7992a679880bf33 ] bch2_evict_subvolume_inodes() was getting stuck - due to incorrectly pruning the dcache. Also, fix missing permissions checks. Reported-by: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-07nfsd: fix legacy client tracking initializationScott Mayhew1-1/+0
commit de71d4e211eddb670b285a0ea477a299601ce1ca upstream. Get rid of the nfsd4_legacy_tracking_ops->init() call in check_for_legacy_methods(). That will be handled in the caller (nfsd4_client_tracking_init()). Otherwise, we'll wind up calling nfsd4_legacy_tracking_ops->init() twice, and the second time we'll trigger the BUG_ON() in nfsd4_init_recdir(). Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking") Reported-by: Jur van der Burg <jur@avtware.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219580 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-29ksmbd: fix incorrect validation for num_aces field of smb_aclNamjae Jeon1-1/+4
commit 1b8b67f3c5e5169535e26efedd3e422172e2db64 upstream. parse_dcal() validate num_aces to allocate posix_ace_state_array. if (num_aces > ULONG_MAX / sizeof(struct smb_ace *)) It is an incorrect validation that we can create an array of size ULONG_MAX. smb_acl has ->size field to calculate actual number of aces in request buffer size. Use this to check invalid num_aces. Reported-by: Igor Leite Ladessa <igor-ladessa@hotmail.com> Tested-by: Igor Leite Ladessa <igor-ladessa@hotmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-29proc: fix UAF in proc_get_inode()Ye Bin3-4/+26
commit 654b33ada4ab5e926cd9c570196fefa7bec7c1df upstream. Fix race between rmmod and /proc/XXX's inode instantiation. The bug is that pde->proc_ops don't belong to /proc, it belongs to a module, therefore dereferencing it after /proc entry has been registered is a bug unless use_pde/unuse_pde() pair has been used. use_pde/unuse_pde can be avoided (2 atomic ops!) because pde->proc_ops never changes so information necessary for inode instantiation can be saved _before_ proc_register() in PDE itself and used later, avoiding pde->proc_ops->... dereference. rmmod lookup sys_delete_module proc_lookup_de pde_get(de); proc_get_inode(dir->i_sb, de); mod->exit() proc_remove remove_proc_subtree proc_entry_rundown(de); free_module(mod); if (S_ISREG(inode->i_mode)) if (de->proc_ops->proc_read_iter) --> As module is already freed, will trigger UAF BUG: unable to handle page fault for address: fffffbfff80a702b PGD 817fc4067 P4D 817fc4067 PUD 817fc0067 PMD 102ef4067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 26 UID: 0 PID: 2667 Comm: ls Tainted: G Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:proc_get_inode+0x302/0x6e0 RSP: 0018:ffff88811c837998 EFLAGS: 00010a06 RAX: dffffc0000000000 RBX: ffffffffc0538140 RCX: 0000000000000007 RDX: 1ffffffff80a702b RSI: 0000000000000001 RDI: ffffffffc0538158 RBP: ffff8881299a6000 R08: 0000000067bbe1e5 R09: 1ffff11023906f20 R10: ffffffffb560ca07 R11: ffffffffb2b43a58 R12: ffff888105bb78f0 R13: ffff888100518048 R14: ffff8881299a6004 R15: 0000000000000001 FS: 00007f95b9686840(0000) GS:ffff8883af100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffbfff80a702b CR3: 0000000117dd2000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> proc_lookup_de+0x11f/0x2e0 __lookup_slow+0x188/0x350 walk_component+0x2ab/0x4f0 path_lookupat+0x120/0x660 filename_lookup+0x1ce/0x560 vfs_statx+0xac/0x150 __do_sys_newstat+0x96/0x110 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e [adobriyan@gmail.com: don't do 2 atomic ops on the common path] Link: https://lkml.kernel.org/r/3d25ded0-1739-447e-812b-e34da7990dcf@p183 Fixes: 778f3dd5a13c ("Fix procfs compat_ioctl regression") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-29netfs: Call `invalidate_cache` only if implementedMax Kellermann1-1/+2
commit 344b7ef248f420ed4ba3a3539cb0a0fc18df9a6c upstream. Many filesystems such as NFS and Ceph do not implement the `invalidate_cache` method. On those filesystems, if writing to the cache (`NETFS_WRITE_TO_CACHE`) fails for some reason, the kernel crashes like this: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD 0 P4D 0 Oops: Oops: 0010 [#1] SMP PTI CPU: 9 UID: 0 PID: 3380 Comm: kworker/u193:11 Not tainted 6.13.3-cm4all1-hp #437 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_write_collection_worker RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffff9b86e2ca7dc0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 7fffffffffffffff RDX: 0000000000000001 RSI: ffff89259d576a18 RDI: ffff89259d576900 RBP: ffff89259d5769b0 R08: ffff9b86e2ca7d28 R09: 0000000000000002 R10: ffff89258ceaca80 R11: 0000000000000001 R12: 0000000000000020 R13: ffff893d158b9338 R14: ffff89259d576900 R15: ffff89259d5769b0 FS: 0000000000000000(0000) GS:ffff893c9fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 000000054442e003 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x60 ? page_fault_oops+0x15c/0x460 ? try_to_wake_up+0x2d2/0x530 ? exc_page_fault+0x5e/0x100 ? asm_exc_page_fault+0x22/0x30 netfs_write_collection_worker+0xe9f/0x12b0 ? xs_poll_check_readable+0x3f/0x80 ? xs_stream_data_receive_workfn+0x8d/0x110 process_one_work+0x134/0x2d0 worker_thread+0x299/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xba/0xe0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 0000000000000000 This patch adds the missing `NULL` check. Fixes: 0e0f2dfe880f ("netfs: Dispatch write requests to process a writeback slice") Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250314164201.1993231-3-dhowells@redhat.com Acked-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-29libfs: Fix duplicate directory entry in offset_dir_lookupYongjian Sun1-1/+1
[ Upstream commit f70681e9e6066ab7b102e6b46a336a8ed67812ae ] There is an issue in the kernel: In tmpfs, when using the "ls" command to list the contents of a directory with a large number of files, glibc performs the getdents call in multiple rounds. If a concurrent unlink occurs between these getdents calls, it may lead to duplicate directory entries in the ls output. One possible reproduction scenario is as follows: Create 1026 files and execute ls and rm concurrently: for i in {1..1026}; do echo "This is file $i" > /tmp/dir/file$i done ls /tmp/dir rm /tmp/dir/file4 ->getdents(file1026-file5) ->unlink(file4) ->getdents(file5,file3,file2,file1) It is expected that the second getdents call to return file3 through file1, but instead it returns an extra file5. The root cause of this problem is in the offset_dir_lookup function. It uses mas_find to determine the starting position for the current getdents call. Since mas_find locates the first position that is greater than or equal to mas->index, when file4 is deleted, it ends up returning file5. It can be fixed by replacing mas_find with mas_find_rev, which finds the first position that is less than or equal to mas->index. Fixes: b9b588f22a0c ("libfs: Use d_children list to iterate simple_offset directories") Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> Link: https://lore.kernel.org/r/20250320034417.555810-1-sunyongjian@huaweicloud.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22fs/netfs/read_collect: add to next->prev_donatedMax Kellermann1-1/+1
If multiple subrequests donate data to the same "next" request (depending on the subrequest completion order), each of them would overwrite the `prev_donated` field, causing data corruption and a BUG() crash ("Can't donate prior to front"). Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Closes: https://lore.kernel.org/netfs/CAKPOu+_4mUwYgQtRTbXCmi+-k3PGvLysnPadkmHOyB7Gz0iSMA@mail.gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22smb: client: Fix match_session bug preventing session reuseHenrique Carvalho1-4/+12
[ Upstream commit 605b249ea96770ac4fac4b8510a99e0f8442be5e ] Fix a bug in match_session() that can causes the session to not be reused in some cases. Reproduction steps: mount.cifs //server/share /mnt/a -o credentials=creds mount.cifs //server/share /mnt/b -o credentials=creds,sec=ntlmssp cat /proc/fs/cifs/DebugData | grep SessionId | wc -l mount.cifs //server/share /mnt/b -o credentials=creds,sec=ntlmssp mount.cifs //server/share /mnt/a -o credentials=creds cat /proc/fs/cifs/DebugData | grep SessionId | wc -l Cc: stable@vger.kernel.org Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22smb3: add support for IAKerbSteve French5-3/+12
[ Upstream commit eea5119fa5979c350af5783a8148eacdd4219715 ] There are now more servers which advertise support for IAKerb (passthrough Kerberos authentication via proxy). IAKerb is a public extension industry standard Kerberos protocol that allows a client without line-of-sight to a Domain Controller to authenticate. There can be cases where we would fail to mount if the server only advertises the OID for IAKerb in SPNEGO/GSSAPI. Add code to allow us to still upcall to userspace in these cases to obtain the Kerberos ticket. Signed-off-by: Steve French <stfrench@microsoft.com> Stable-dep-of: 605b249ea967 ("smb: client: Fix match_session bug preventing session reuse") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22cifs: Fix integer overflow while processing closetimeo mount optionMurad Masimov1-2/+2
[ Upstream commit d5a30fddfe2f2e540f6c43b59cf701809995faef ] User-provided mount parameter closetimeo of type u32 is intended to have an upper limit, but before it is validated, the value is converted from seconds to jiffies which can lead to an integer overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 5efdd9122eff ("smb3: allow deferred close timeout to be configurable") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22cifs: Fix integer overflow while processing actimeo mount optionMurad Masimov1-1/+1
[ Upstream commit 64f690ee22c99e16084e0e45181b2a1eed2fa149 ] User-provided mount parameter actimeo of type u32 is intended to have an upper limit, but before it is validated, the value is converted from seconds to jiffies which can lead to an integer overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 6d20e8406f09 ("cifs: add attribute cache timeout (actimeo) tunable") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22cifs: Fix integer overflow while processing acdirmax mount optionMurad Masimov1-2/+2
[ Upstream commit 5b29891f91dfb8758baf1e2217bef4b16b2b165b ] User-provided mount parameter acdirmax of type u32 is intended to have an upper limit, but before it is validated, the value is converted from seconds to jiffies which can lead to an integer overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 4c9f948142a5 ("cifs: Add new mount parameter "acdirmax" to allow caching directory metadata") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22cifs: Fix integer overflow while processing acregmax mount optionMurad Masimov1-2/+2
[ Upstream commit 7489161b1852390b4413d57f2457cd40b34da6cc ] User-provided mount parameter acregmax of type u32 is intended to have an upper limit, but before it is validated, the value is converted from seconds to jiffies which can lead to an integer overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 5780464614f6 ("cifs: Add new parameter "acregmax" for distinct file and directory metadata timeout") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22smb: client: fix regression with guest optionPaulo Alcantara1-0/+4
commit fc99045effa81fdf509c2a97cbb7e6e8f2fd4443 upstream. When mounting a CIFS share with 'guest' mount option, mount.cifs(8) will set empty password= and password2= options. Currently we only handle empty strings from user= and password= options, so the mount will fail with cifs: Bad value for 'password2' Fix this by handling empty string from password2= option as well. Link: https://bbs.archlinux.org/viewtopic.php?id=303927 Reported-by: Adam Williamson <awilliam@redhat.com> Closes: https://lore.kernel.org/r/83c00b5fea81c07f6897a5dd3ef50fd3b290f56c.camel@redhat.com Fixes: 35f834265e0d ("smb3: fix broken reconnect when password changing on the server by allowing password rotation") Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22ksmbd: prevent connection release during oplock break notificationNamjae Jeon4-12/+30
commit 3aa660c059240e0c795217182cf7df32909dd917 upstream. ksmbd_work could be freed when after connection release. Increment r_count of ksmbd_conn to indicate that requests are not finished yet and to not release the connection. Cc: stable@vger.kernel.org Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22ksmbd: fix use-after-free in ksmbd_free_work_structNamjae Jeon4-27/+15
commit bb39ed47065455604729404729d9116868638d31 upstream. ->interim_entry of ksmbd_work could be deleted after oplock is freed. We don't need to manage it with linked list. The interim request could be immediately sent whenever a oplock break wait is needed. Cc: stable@vger.kernel.org Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from ↵Pali Rohár1-3/+2
parse_reparse_point() [ Upstream commit cad3fc0a4c8cef07b07ceddc137f582267577250 ] This would help to track and detect by caller if the reparse point type was processed or not. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22cifs: Validate content of WSL reparse point buffersPali Rohár1-0/+5
[ Upstream commit 1f48660667efb97c3cf70485c7e1977af718b48b ] WSL socket, fifo, char and block devices have empty reparse buffer. Validate the length of the reparse buffer. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Stable-dep-of: cad3fc0a4c8c ("cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22fuse: don't truncate cached, mutated symlinkMiklos Szeredi2-6/+20
[ Upstream commit b4c173dfbb6c78568578ff18f9e8822d7bd0e31b ] Fuse allows the value of a symlink to change and this property is exploited by some filesystems (e.g. CVMFS). It has been observed, that sometimes after changing the symlink contents, the value is truncated to the old size. This is caused by fuse_getattr() racing with fuse_reverse_inval_inode(). fuse_reverse_inval_inode() updates the fuse_inode's attr_version, which results in fuse_change_attributes() exiting before updating the cached attributes This is okay, as the cached attributes remain invalid and the next call to fuse_change_attributes() will likely update the inode with the correct values. The reason this causes problems is that cached symlinks will be returned through page_get_link(), which truncates the symlink to inode->i_size. This is correct for filesystems that don't mutate symlinks, but in this case it causes bad behavior. The solution is to just remove this truncation. This can cause a regression in a filesystem that relies on supplying a symlink larger than the file size, but this is unlikely. If that happens we'd need to make this behavior conditional. Reported-by: Laura Promberger <laura.promberger@cern.ch> Tested-by: Sam Lewis <samclewis@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250220100258.793363-1-mszeredi@redhat.com Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22cifs: Treat unhandled directory name surrogate reparse points as mount ↵Pali Rohár2-0/+16
directory nodes [ Upstream commit b587fd128660d48cd2122f870f720ff8e2b4abb3 ] If the reparse point was not handled (indicated by the -EOPNOTSUPP from ops->parse_reparse_point() call) but reparse tag is of type name surrogate directory type, then treat is as a new mount point. Name surrogate reparse point represents another named entity in the system. From SMB client point of view, this another entity is resolved on the SMB server, and server serves its content automatically. Therefore from Linux client point of view, this name surrogate reparse point of directory type crosses mount point. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22btrfs: fix two misuses of folio_shift()Matthew Wilcox (Oracle)1-6/+5
[ Upstream commit 01af106a076352182b2916b143fc50272600bd81 ] It is meaningless to shift a byte count by folio_shift(). The folio index is in units of PAGE_SIZE, not folio_size(). We can use folio_contains() to make this work for arbitrary-order folios, so remove the assertion that the folios are of order 0. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22vboxsf: fix building with GCC 15Brahmajit Das1-1/+2
[ Upstream commit 4e7487245abcbc5a1a1aea54e4d3b33c53804bda ] Building with GCC 15 results in build error fs/vboxsf/super.c:24:54: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization] 24 | static const unsigned char VBSF_MOUNT_SIGNATURE[4] = "\000\377\376\375"; | ^~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Due to GCC having enabled -Werror=unterminated-string-initialization[0] by default. Separately initializing each array element of VBSF_MOUNT_SIGNATURE to ensure NUL termination, thus satisfying GCC 15 and fixing the build error. [0]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unterminated-string-initialization Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com> Link: https://lore.kernel.org/r/20250121162648.1408743-1-brahmajit.xyz@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22smb: client: fix noisy when tree connecting to DFS interlink targetsPaulo Alcantara1-1/+1
[ Upstream commit 773dc23ff81838b6f74d7fabba5a441cc6a93982 ] When the client attempts to tree connect to a domain-based DFS namespace from a DFS interlink target, the server will return STATUS_BAD_NETWORK_NAME and the following will appear on dmesg: CIFS: VFS: BAD_NETWORK_NAME: \\dom\dfs Since a DFS share might contain several DFS interlinks and they expire after 10 minutes, the above message might end up being flooded on dmesg when mounting or accessing them. Print this only once per share. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22btrfs: avoid starting new transaction when cleaning qgroup during subvolume dropFilipe Manana1-5/+1
[ Upstream commit fdef89ce6fada462aef9cb90a140c93c8c209f0f ] At btrfs_qgroup_cleanup_dropped_subvolume() all we want to commit the current transaction in order to have all the qgroup rfer/excl numbers up to date. However we are using btrfs_start_transaction(), which joins the current transaction if there is one that is not yet committing, but also starts a new one if there is none or if the current one is already committing (its state is >= TRANS_STATE_COMMIT_START). This later case results in unnecessary IO, wasting time and a pointless rotation of the backup roots in the super block. So instead of using btrfs_start_transaction() followed by a btrfs_commit_transaction(), use btrfs_commit_current_transaction() which achieves our purpose and avoids starting and committing new transactions. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-13fs/netfs/read_collect: fix crash due to uninitialized `prev` variableMax Kellermann1-10/+11
[ Just like the other two netfs patches I sent yesterday, this one doesn't apply to v6.14 as it was obsoleted by commit e2d46f2ec332 ("netfs: Change the read result collector to only use one work item"). ] When checking whether the edges of adjacent subrequests touch, the `prev` variable is deferenced, but it might not have been initialized. This causes crashes like this one: BUG: unable to handle page fault for address: 0000000181343843 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000001c66db0067 P4D 8000001c66db0067 PUD 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 1 UID: 33333 PID: 24424 Comm: php-cgi8.2 Kdump: loaded Not tainted 6.13.2-cm4all0-hp+ #427 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 11/23/2021 RIP: 0010:netfs_consume_read_data.isra.0+0x5ef/0xb00 Code: fe ff ff 48 8b 83 88 00 00 00 48 8b 4c 24 30 4c 8b 43 78 48 85 c0 48 8d 51 70 75 20 48 8b 73 30 48 39 d6 74 17 48 8b 7c 24 40 <48> 8b 4f 78 48 03 4f 68 48 39 4b 68 0f 84 ab 02 00 00 49 29 c0 48 RSP: 0000:ffffc90037adbd00 EFLAGS: 00010283 RAX: 0000000000000000 RBX: ffff88811bda0600 RCX: ffff888620e7b980 RDX: ffff888620e7b9f0 RSI: ffff88811bda0428 RDI: 00000001813437cb RBP: 0000000000000000 R08: 0000000000004000 R09: 0000000000000000 R10: ffffffff82e070c0 R11: 0000000007ffffff R12: 0000000000004000 R13: ffff888620e7bb68 R14: 0000000000008000 R15: ffff888620e7bb68 FS: 00007ff2e0e7ddc0(0000) GS:ffff88981f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000181343843 CR3: 0000001bc10ba006 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x60 ? page_fault_oops+0x15c/0x450 ? search_extable+0x22/0x30 ? netfs_consume_read_data.isra.0+0x5ef/0xb00 ? search_module_extables+0xe/0x40 ? exc_page_fault+0x5e/0x100 ? asm_exc_page_fault+0x22/0x30 ? netfs_consume_read_data.isra.0+0x5ef/0xb00 ? intel_iommu_unmap_pages+0xaa/0x190 ? __pfx_cachefiles_read_complete+0x10/0x10 netfs_read_subreq_terminated+0x24f/0x390 cachefiles_read_complete+0x48/0xf0 iomap_dio_bio_end_io+0x125/0x160 blk_update_request+0xea/0x3e0 scsi_end_request+0x27/0x190 scsi_io_completion+0x43/0x6c0 blk_complete_reqs+0x40/0x50 handle_softirqs+0xd1/0x280 irq_exit_rcu+0x91/0xb0 common_interrupt+0x3b/0xa0 asm_common_interrupt+0x22/0x40 RIP: 0033:0x55fe8470d2ab Code: 00 00 3c 7e 74 3b 3c b6 0f 84 dd 03 00 00 3c 1e 74 2f 83 c1 01 48 83 c2 38 48 83 c7 30 44 39 d1 74 3e 48 63 42 08 85 c0 79 a3 <49> 8b 46 48 8b 04 38 f6 c4 04 75 0b 0f b6 42 30 83 e0 0c 3c 04 75 RSP: 002b:00007ffca5ef2720 EFLAGS: 00000216 RAX: 0000000000000023 RBX: 0000000000000008 RCX: 000000000000001b RDX: 00007ff2e0cdb6f8 RSI: 0000000000000006 RDI: 0000000000000510 RBP: 00007ffca5ef27a0 R08: 00007ffca5ef2720 R09: 0000000000000001 R10: 000000000000001e R11: 00007ff2e0c10d08 R12: 0000000000000001 R13: 0000000000000120 R14: 00007ff2e0cb1ed0 R15: 00000000000000b0 </TASK> Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13fs/netfs/read_pgpriv2: skip folio queues without `marks3`Max Kellermann1-2/+3
[ Note this patch doesn't apply to v6.14 as it was obsoleted by commit e2d46f2ec332 ("netfs: Change the read result collector to only use one work item"). ] At the beginning of the function, folio queues with marks3==0 are skipped, but after that, the `marks3` field is ignored. If one such queue is found, `slot` is set to 64 (because `__ffs(0)==64`), leading to a buffer overflow in the folioq_folio() call. The resulting crash may look like this: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 11 UID: 0 PID: 2909 Comm: kworker/u262:1 Not tainted 6.13.1-cm4all2-vm #415 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Workqueue: events_unbound netfs_read_termination_worker RIP: 0010:netfs_pgpriv2_write_to_the_cache+0x15a/0x3f0 Code: 48 85 c0 48 89 44 24 08 0f 84 24 01 00 00 48 8b 80 40 01 00 00 48 8b 7c 24 08 f3 48 0f bc c0 89 44 24 18 89 c0 48 8b 74 c7 08 <48> 8b 06 48 c7 04 24 00 10 00 00 a8 40 74 10 0f b6 4e 40 b8 00 10 RSP: 0018:ffffbbc440effe18 EFLAGS: 00010203 RAX: 0000000000000040 RBX: ffff96f8fc034000 RCX: 0000000000000000 RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffff96f8fc036400 RBP: 0000000000001000 R08: ffff96f9132bb400 R09: 0000000000001000 R10: ffff96f8c1263c80 R11: 0000000000000003 R12: 0000000000001000 R13: ffff96f8fb75ade8 R14: fffffaaf5ca90000 R15: ffff96f8fb75ad00 FS: 0000000000000000(0000) GS:ffff9703cf0c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010c9ca003 CR4: 00000000001706b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x1f/0x60 ? page_fault_oops+0x158/0x450 ? search_extable+0x22/0x30 ? netfs_pgpriv2_write_to_the_cache+0x15a/0x3f0 ? search_module_extables+0xe/0x40 ? exc_page_fault+0x62/0x120 ? asm_exc_page_fault+0x22/0x30 ? netfs_pgpriv2_write_to_the_cache+0x15a/0x3f0 ? netfs_pgpriv2_write_to_the_cache+0xf6/0x3f0 netfs_read_termination_worker+0x1f/0x60 process_one_work+0x138/0x2d0 worker_thread+0x2a5/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0xba/0xe0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13exfat: short-circuit zero-byte writes in exfat_file_write_iterEric Sandeen1-1/+1
[ Upstream commit fda94a9919fd632033979ad7765a99ae3cab9289 ] When generic_write_checks() returns zero, it means that iov_iter_count() is zero, and there is no work to do. Simply return success like all other filesystems do, rather than proceeding down the write path, which today yields an -EFAULT in generic_perform_write() via the (fault_in_iov_iter_readable(i, bytes) == bytes) check when bytes == 0. Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Reported-by: Noah <kernel-org-10@maxgrass.eu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-13exfat: fix soft lockup in exfat_clear_bitmapNamjae Jeon3-7/+16
[ Upstream commit 9da33619e0ca53627641bc97d1b93ec741299111 ] bitmap clear loop will take long time in __exfat_free_cluster() if data size of file/dir enty is invalid. If cluster bit in bitmap is already clear, stop clearing bitmap go to out of loop. Fixes: 31023864e67a ("exfat: add fat entry operations") Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-13exfat: fix just enough dentries but allocate a new cluster to dirYuezhang Mo1-1/+1
[ Upstream commit 6697f819a10b238ccf01998c3f203d65d8374696 ] This commit fixes the condition for allocating cluster to parent directory to avoid allocating new cluster to parent directory when there are just enough empty directory entries at the end of the parent directory. Fixes: af02c72d0b62 ("exfat: convert exfat_find_empty_entry() to use dentry cache") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-13coredump: Only sort VMAs when core_sort_vma sysctl is setKees Cook1-2/+13
[ Upstream commit 39ec9eaaa165d297d008d1fa385748430bd18e4d ] The sorting of VMAs by size in commit 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores") breaks elfutils[1]. Instead, sort based on the setting of the new sysctl, core_sort_vma, which defaults to 0, no sorting. Reported-by: Michael Stapelberg <michael@stapelberg.ch> Closes: https://lore.kernel.org/all/20250218085407.61126-1-michael@stapelberg.de/ [1] Fixes: 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores") Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-13NFS: fix nfs_release_folio() to not deadlock via kcompactd writebackMike Snitzer1-1/+2
commit ce6d9c1c2b5cc785016faa11b48b6cd317eb367e upstream. Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for it so nfs_release_folio() can skip calling nfs_wb_folio() from kcompactd. Otherwise NFS can deadlock waiting for kcompactd enduced writeback which recurses back to NFS (which triggers writeback to NFSD via NFS loopback mount on the same host, NFSD blocks waiting for XFS's call to __filemap_get_folio): 6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds. {--- [58] "kcompactd0" [<0>] folio_wait_bit+0xe8/0x200 [<0>] folio_wait_writeback+0x2b/0x80 [<0>] nfs_wb_folio+0x80/0x1b0 [nfs] [<0>] nfs_release_folio+0x68/0x130 [nfs] [<0>] split_huge_page_to_list_to_order+0x362/0x840 [<0>] migrate_pages_batch+0x43d/0xb90 [<0>] migrate_pages_sync+0x9a/0x240 [<0>] migrate_pages+0x93c/0x9f0 [<0>] compact_zone+0x8e2/0x1030 [<0>] compact_node+0xdb/0x120 [<0>] kcompactd+0x121/0x2e0 [<0>] kthread+0xcf/0x100 [<0>] ret_from_fork+0x31/0x40 [<0>] ret_from_fork_asm+0x1a/0x30 ---} [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/20250225022002.26141-1-snitzer@kernel.org Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13btrfs: fix a leaked chunk map issue in read_one_chunk()Haoxiang Li1-0/+1
commit 35d99c68af40a8ca175babc5a89ef7e2226fb3ca upstream. Add btrfs_free_chunk_map() to free the memory allocated by btrfs_alloc_chunk_map() if btrfs_add_chunk_map() fails. Fixes: 7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps") CC: stable@vger.kernel.org Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13ksmbd: fix bug on trap in smb2_lockNamjae Jeon1-1/+1
commit e26e2d2e15daf1ab33e0135caf2304a0cfa2744b upstream. If lock count is greater than 1, flags could be old value. It should be checked with flags of smb_lock, not flags. It will cause bug-on trap from locks_free_lock in error handling routine. Cc: stable@vger.kernel.org Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13ksmbd: fix use-after-free in smb2_lockNamjae Jeon1-3/+3
commit 84d2d1641b71dec326e8736a749b7ee76a9599fc upstream. If smb_lock->zero_len has value, ->llist of smb_lock is not delete and flock is old one. It will cause use-after-free on error handling routine. Cc: stable@vger.kernel.org Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13ksmbd: fix out-of-bounds in parse_sec_desc()Namjae Jeon1-0/+16
commit d6e13e19063db24f94b690159d0633aaf72a0f03 upstream. If osidoffset, gsidoffset and dacloffset could be greater than smb_ntsd struct size. If it is smaller, It could cause slab-out-of-bounds. And when validating sid, It need to check it included subauth array size. Cc: stable@vger.kernel.org Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>