summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-12-14epoll: annotate racy checkChristian Brauner1-2/+4
[ Upstream commit 6474353a5e3d0b2cf610153cea0c61f576a36d0a ] Epoll relies on a racy fastpath check during __fput() in eventpoll_release() to avoid the hit of pointlessly acquiring a semaphore. Annotate that race by using WRITE_ONCE() and READ_ONCE(). Link: https://lore.kernel.org/r/66edfb3c.050a0220.3195df.001a.GAE@google.com Link: https://lore.kernel.org/r/20240925-fungieren-anbauen-79b334b00542@brauner Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+3b6b32dc50537a49bb4a@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14ocfs2: update seq_file index in ocfs2_dlm_seq_nextWengang Wang1-0/+1
commit 914eec5e980171bc128e7e24f7a22aa1d803570e upstream. The following INFO level message was seen: seq_file: buggy .next function ocfs2_dlm_seq_next [ocfs2] did not update position index Fix: Update *pos (so m->index) to make seq_read_iter happy though the index its self makes no sense to ocfs2_dlm_seq_next. Link: https://lkml.kernel.org/r/20241119174500.9198-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14smb3.1.1: fix posix mounts to older serversSteve French3-5/+11
commit ddca5023091588eb303e3c0097d95c325992d05f upstream. Some servers which implement the SMB3.1.1 POSIX extensions did not set the file type in the mode in the infolevel 100 response. With the recent changes for checking the file type via the mode field, this can cause the root directory to be reported incorrectly and mounts (e.g. to ksmbd) to fail. Fixes: 6a832bc8bbb2 ("fs/smb/client: Implement new SMB3 POSIX type") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Cc: Ralph Boehme <slow@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse pointsRalph Boehme1-1/+17
commit 8cb0bc5436351de8a11eef13b7367d64cc0d6c68 upstream. Spares an extra revalidation request Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Ralph Boehme <slow@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14fs/smb/client: Implement new SMB3 POSIX typeRalph Boehme4-60/+149
commit 6a832bc8bbb22350f7ffe6ecb2d36f261bb96023 upstream. Fixes special files against current Samba. On the Samba server: insgesamt 20 131958 brw-r--r-- 1 root root 0, 0 15. Nov 12:04 blockdev 131965 crw-r--r-- 1 root root 1, 1 15. Nov 12:04 chardev 131966 prw-r--r-- 1 samba samba 0 15. Nov 12:05 fifo 131953 -rw-rwxrw-+ 2 samba samba 4 18. Nov 11:37 file 131953 -rw-rwxrw-+ 2 samba samba 4 18. Nov 11:37 hardlink 131957 lrwxrwxrwx 1 samba samba 4 15. Nov 12:03 symlink -> file 131954 -rwxrwxr-x+ 1 samba samba 0 18. Nov 15:28 symlinkoversmb Before: ls: cannot access '/mnt/smb3unix/posix/blockdev': No data available ls: cannot access '/mnt/smb3unix/posix/chardev': No data available ls: cannot access '/mnt/smb3unix/posix/symlinkoversmb': No data available ls: cannot access '/mnt/smb3unix/posix/fifo': No data available ls: cannot access '/mnt/smb3unix/posix/symlink': No data available total 16 ? -????????? ? ? ? ? ? blockdev ? -????????? ? ? ? ? ? chardev ? -????????? ? ? ? ? ? fifo 131953 -rw-rwxrw- 2 root samba 4 Nov 18 11:37 file 131953 -rw-rwxrw- 2 root samba 4 Nov 18 11:37 hardlink ? -????????? ? ? ? ? ? symlink ? -????????? ? ? ? ? ? symlinkoversmb After: insgesamt 21 131958 brw-r--r-- 1 root root 0, 0 15. Nov 12:04 blockdev 131965 crw-r--r-- 1 root root 1, 1 15. Nov 12:04 chardev 131966 prw-r--r-- 1 root samba 0 15. Nov 12:05 fifo 131953 -rw-rwxrw- 2 root samba 4 18. Nov 11:37 file 131953 -rw-rwxrw- 2 root samba 4 18. Nov 11:37 hardlink 131957 lrwxrwxrwx 1 root samba 4 15. Nov 12:03 symlink -> file 131954 lrwxrwxr-x 1 root samba 23 18. Nov 15:28 symlinkoversmb -> mnt/smb3unix/posix/file Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Ralph Boehme <slow@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIXRalph Boehme1-1/+2
commit ca4b2c4607433033e9c4f4659f809af4261d8992 upstream. Avoid extra roundtrip Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Ralph Boehme <slow@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()Ryusuke Konishi1-1/+1
commit 985ebec4ab0a28bb5910c3b1481a40fbf7f9e61d upstream. Syzbot reported that when searching for records in a directory where the inode's i_size is corrupted and has a large value, memory access outside the folio/page range may occur, or a use-after-free bug may be detected if KASAN is enabled. This is because nilfs_last_byte(), which is called by nilfs_find_entry() and others to calculate the number of valid bytes of directory data in a page from i_size and the page index, loses the upper 32 bits of the 64-bit size information due to an inappropriate type of local variable to which the i_size value is assigned. This caused a large byte offset value due to underflow in the end address calculation in the calling nilfs_find_entry(), resulting in memory access that exceeds the folio/page size. Fix this issue by changing the type of the local variable causing the bit loss from "unsigned int" to "u64". The return value of nilfs_last_byte() is also of type "unsigned int", but it is truncated so as not to exceed PAGE_SIZE and no bit loss occurs, so no change is required. Link: https://lkml.kernel.org/r/20241119172403.9292-1-konishi.ryusuke@gmail.com Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624 Tested-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14ksmbd: fix Out-of-Bounds Write in ksmbd_vfs_stream_writeJordy Zomer1-0/+2
commit 313dab082289e460391c82d855430ec8a28ddf81 upstream. An offset from client could be a negative value, It could allows to write data outside the bounds of the allocated buffer. Note that this issue is coming when setting 'vfs objects = streams_xattr parameter' in ksmbd.conf. Cc: stable@vger.kernel.org # v5.15+ Reported-by: Jordy Zomer <jordyzomer@google.com> Signed-off-by: Jordy Zomer <jordyzomer@google.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14ksmbd: fix Out-of-Bounds Read in ksmbd_vfs_stream_readJordy Zomer1-0/+4
commit fc342cf86e2dc4d2edb0fc2ff5e28b6c7845adb9 upstream. An offset from client could be a negative value, It could lead to an out-of-bounds read from the stream_buf. Note that this issue is coming when setting 'vfs objects = streams_xattr parameter' in ksmbd.conf. Cc: stable@vger.kernel.org # v5.15+ Reported-by: Jordy Zomer <jordyzomer@google.com> Signed-off-by: Jordy Zomer <jordyzomer@google.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14smb: client: fix potential race in cifs_put_tcon()Paulo Alcantara1-3/+1
[ Upstream commit c32b624fa4f7ca5a2ff217a0b1b2f1352bb4ec11 ] dfs_cache_refresh() delayed worker could race with cifs_put_tcon(), so make sure to call list_replace_init() on @tcon->dfs_ses_list after kworker is cancelled or finished. Fixes: 4f42a8b54b5c ("smb: client: fix DFS interlink failover") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14ocfs2: free inode when ocfs2_get_init_inode() failsTetsuo Handa1-1/+3
[ Upstream commit 965b5dd1894f4525f38c1b5f99b0106a07dbb5db ] syzbot is reporting busy inodes after unmount, for commit 9c89fe0af826 ("ocfs2: Handle error from dquot_initialize()") forgot to call iput() when new_inode() succeeded and dquot_initialize() failed. Link: https://lkml.kernel.org/r/e68c0224-b7c6-4784-b4fa-a9fc8c675525@I-love.SAKURA.ne.jp Fixes: 9c89fe0af826 ("ocfs2: Handle error from dquot_initialize()") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+0af00f6a2cba2058b5db@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0af00f6a2cba2058b5db Tested-by: syzbot+0af00f6a2cba2058b5db@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: fix to requery extent which cross boundary of inquiryChao Yu1-5/+15
[ Upstream commit 6787a82245857271133b63ae7f72f1dc9f29e985 ] dd if=/dev/zero of=file bs=4k count=5 xfs_io file -c "fiemap -v 2 16384" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..31]: 139272..139303 32 0x1000 1: [32..39]: 139304..139311 8 0x1001 xfs_io file -c "fiemap -v 0 16384" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..31]: 139272..139303 32 0x1000 xfs_io file -c "fiemap -v 0 16385" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..39]: 139272..139311 40 0x1001 There are two problems: - continuous extent is split to two - FIEMAP_EXTENT_LAST is missing in last extent The root cause is: if upper boundary of inquiry crosses extent, f2fs_map_blocks() will truncate length of returned extent to F2FS_BYTES_TO_BLK(len), and also, it will stop to query latter extent or hole to make sure current extent is last or not. In order to fix this issue, once we found an extent locates in the end of inquiry range by f2fs_map_blocks(), we need to expand inquiry range to requiry. Cc: stable@vger.kernel.org Fixes: 7f63eb77af7b ("f2fs: report unwritten area in f2fs_fiemap") Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: fix to adjust appropriate length for fiemapZhiguo Niu1-3/+3
[ Upstream commit 77569f785c8624fa4189795fb52e635a973672e5 ] If user give a file size as "length" parameter for fiemap operations, but if this size is non-block size aligned, it will show 2 segments fiemap results even this whole file is contiguous on disk, such as the following results: ./f2fs_io fiemap 0 19034 ylog/analyzer.py Fiemap: offset = 0 len = 19034 logical addr. physical addr. length flags 0 0000000000000000 0000000020baa000 0000000000004000 00001000 1 0000000000004000 0000000020bae000 0000000000001000 00001001 after this patch: ./f2fs_io fiemap 0 19034 ylog/analyzer.py Fiemap: offset = 0 len = 19034 logical addr. physical addr. length flags 0 0000000000000000 00000000315f3000 0000000000005000 00001001 Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: 6787a8224585 ("f2fs: fix to requery extent which cross boundary of inquiry") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}Chao Yu1-39/+29
[ Upstream commit 7461f37094180200cb2f98e60ef99a0cea97beec ] f2fs doesn't support different blksize in one instance, so bytes_to_blks() and blks_to_bytes() are equal to F2FS_BYTES_TO_BLK and F2FS_BLK_TO_BYTES, let's use F2FS_BYTES_TO_BLK/F2FS_BLK_TO_BYTES instead for cleanup. Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: 6787a8224585 ("f2fs: fix to requery extent which cross boundary of inquiry") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occurYang Erkun1-0/+19
commit 98100e88dd8865999dc6379a3356cd799795fe7b upstream. The action force umount(umount -f) will attempt to kill all rpc_task even umount operation may ultimately fail if some files remain open. Consequently, if an action attempts to open a file, it can potentially send two rpc_task to nfs server. NFS CLIENT thread1 thread2 open("file") ... nfs4_do_open _nfs4_do_open _nfs4_open_and_get_state _nfs4_proc_open nfs4_run_open_task /* rpc_task1 */ rpc_run_task rpc_wait_for_completion_task umount -f nfs_umount_begin rpc_killall_tasks rpc_signal_task rpc_task1 been wakeup and return -512 _nfs4_do_open // while loop ... nfs4_run_open_task /* rpc_task2 */ rpc_run_task rpc_wait_for_completion_task While processing an open request, nfsd will first attempt to find or allocate an nfs4_openowner. If it finds an nfs4_openowner that is not marked as NFS4_OO_CONFIRMED, this nfs4_openowner will released. Since two rpc_task can attempt to open the same file simultaneously from the client to server, and because two instances of nfsd can run concurrently, this situation can lead to lots of memory leak. Additionally, when we echo 0 to /proc/fs/nfsd/threads, warning will be triggered. NFS SERVER nfsd1 nfsd2 echo 0 > /proc/fs/nfsd/threads nfsd4_open nfsd4_process_open1 find_or_alloc_open_stateowner // alloc oo1, stateid1 nfsd4_open nfsd4_process_open1 find_or_alloc_open_stateowner // find oo1, without NFS4_OO_CONFIRMED release_openowner unhash_openowner_locked list_del_init(&oo->oo_perclient) // cannot find this oo // from client, LEAK!!! alloc_stateowner // alloc oo2 nfsd4_process_open2 init_open_stateid // associate oo1 // with stateid1, stateid1 LEAK!!! nfs4_get_vfs_file // alloc nfsd_file1 and nfsd_file_mark1 // all LEAK!!! nfsd4_process_open2 ... write_threads ... nfsd_destroy_serv nfsd_shutdown_net nfs4_state_shutdown_net nfs4_state_destroy_net destroy_client __destroy_client // won't find oo1!!! nfsd_shutdown_generic nfsd_file_cache_shutdown kmem_cache_destroy for nfsd_file_slab and nfsd_file_mark_slab // bark since nfsd_file1 // and nfsd_file_mark1 // still alive ======================================================================= BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on __kmem_cache_shutdown() ----------------------------------------------------------------------- Slab 0xffd4000004438a80 objects=34 used=1 fp=0xff11000110e2ad28 flags=0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff) CPU: 4 UID: 0 PID: 757 Comm: sh Not tainted 6.12.0-rc6+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 slab_err+0xb0/0xf0 __kmem_cache_shutdown+0x15c/0x310 kmem_cache_destroy+0x66/0x160 nfsd_file_cache_shutdown+0xac/0x210 [nfsd] nfsd_destroy_serv+0x251/0x2a0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1ae/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Disabling lock debugging due to kernel taint Object 0xff11000110e2ac38 @offset=3128 Allocated in nfsd_file_do_acquire+0x20f/0xa30 [nfsd] age=1635 cpu=3 pid=800 nfsd_file_do_acquire+0x20f/0xa30 [nfsd] nfsd_file_acquire_opened+0x5f/0x90 [nfsd] nfs4_get_vfs_file+0x4c9/0x570 [nfsd] nfsd4_process_open2+0x713/0x1070 [nfsd] nfsd4_open+0x74b/0x8b0 [nfsd] nfsd4_proc_compound+0x70b/0xc20 [nfsd] nfsd_dispatch+0x1b4/0x3a0 [nfsd] svc_process_common+0x5b8/0xc50 [sunrpc] svc_process+0x2ab/0x3b0 [sunrpc] svc_handle_xprt+0x681/0xa20 [sunrpc] nfsd+0x183/0x220 [nfsd] kthread+0x199/0x1e0 ret_from_fork+0x31/0x60 ret_from_fork_asm+0x1a/0x30 Add nfs4_openowner_unhashed to help found unhashed nfs4_openowner, and break nfsd4_open process to fix this problem. Cc: stable@vger.kernel.org # v5.4+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09nfsd: make sure exp active before svc_export_showYang Erkun1-1/+4
commit be8f982c369c965faffa198b46060f8853e0f1f0 upstream. The function `e_show` was called with protection from RCU. This only ensures that `exp` will not be freed. Therefore, the reference count for `exp` can drop to zero, which will trigger a refcount use-after-free warning when `exp_get` is called. To resolve this issue, use `cache_get_rcu` to ensure that `exp` remains active. ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 819 at lib/refcount.c:25 refcount_warn_saturate+0xb1/0x120 CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb1/0x120 ... Call Trace: <TASK> e_show+0x20b/0x230 [nfsd] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: bf18f163e89c ("NFSD: Using exp_get for export getting") Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09ovl: properly handle large files in ovl_security_fileattrOleksandr Tymoshenko1-1/+6
commit 3b6b99ef15ea37635604992ede9ebcccef38a239 upstream. dentry_open in ovl_security_fileattr fails for any file larger than 2GB if open method of the underlying filesystem calls generic_file_open (e.g. fusefs). The issue can be reproduce using the following script: (passthrough_ll is an example app from libfuse). $ D=/opt/test/mnt $ mkdir -p ${D}/{source,base,top/uppr,top/work,ovlfs} $ dd if=/dev/zero of=${D}/source/zero.bin bs=1G count=2 $ passthrough_ll -o source=${D}/source ${D}/base $ mount -t overlay overlay \ -olowerdir=${D}/base,upperdir=${D}/top/uppr,workdir=${D}/top/work \ ${D}/ovlfs $ chmod 0777 ${D}/mnt/ovlfs/zero.bin Running this script results in "Value too large for defined data type" error message from chmod. Signed-off-by: Oleksandr Tymoshenko <ovt@google.com> Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09fs/proc/kcore.c: Clear ret value in read_kcore_iter after successful ↵Jiri Olsa1-0/+1
iov_iter_zero commit 088f294609d8f8816dc316681aef2eb61982e0da upstream. If iov_iter_zero succeeds after failed copy_from_kernel_nofault, we need to reset the ret value to zero otherwise it will be returned as final return value of read_kcore_iter. This fixes objdump -d dump over /proc/kcore for me. Cc: stable@vger.kernel.org Cc: Alexander Gordeev <agordeev@linux.ibm.com> Fixes: 3d5854d75e31 ("fs/proc/kcore.c: allow translation of physical memory addresses") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20241121231118.3212000-1-jolsa@kernel.org Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09ceph: fix cred leak in ceph_mds_check_access()Max Kellermann1-0/+3
commit c5cf420303256dcd6ff175643e9e9558543c2047 upstream. get_current_cred() increments the reference counter, but the put_cred() call was missing. Cc: stable@vger.kernel.org Fixes: 596afb0b8933 ("ceph: add ceph_mds_check_access() helper") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09ceph: pass cred pointer to ceph_mds_auth_match()Max Kellermann1-2/+2
commit 23426309a4064b25a961e1c72961d8bfc7c8c990 upstream. This eliminates a redundant get_current_cred() call, because ceph_mds_check_access() has already obtained this pointer. As a side effect, this also fixes a reference leak in ceph_mds_auth_match(): by omitting the get_current_cred() call, no additional cred reference is taken. Cc: stable@vger.kernel.org Fixes: 596afb0b8933 ("ceph: add ceph_mds_check_access() helper") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09ceph: extract entity name from device idPatrick Donnelly1-1/+9
commit 955710afcb3bb63e21e186451ed5eba85fa14d0b upstream. Previously, the "name" in the new device syntax "<name>@<fsid>.<fsname>" was ignored because (presumably) tests were done using mount.ceph which also passed the entity name using "-o name=foo". If mounting is done without the mount.ceph helper, the new device id syntax fails to set the name properly. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/68516 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09f2fs: fix to drop all discards after creating snapshot on lvm deviceChao Yu2-7/+21
commit bc8aeb04fd80cb8cfae3058445c84410fd0beb5e upstream. Piergiorgio reported a bug in bugzilla as below: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330 RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs] Call Trace: __issue_discard_cmd+0x1ca/0x350 [f2fs] issue_discard_thread+0x191/0x480 [f2fs] kthread+0xcf/0x100 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1a/0x30 w/ below testcase, it can reproduce this bug quickly: - pvcreate /dev/vdb - vgcreate myvg1 /dev/vdb - lvcreate -L 1024m -n mylv1 myvg1 - mount /dev/myvg1/mylv1 /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20 - sync - rm /mnt/f2fs/file - sync - lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1 - umount /mnt/f2fs The root cause is: it will update discard_max_bytes of mounted lvm device to zero after creating snapshot on this lvm device, then, __submit_discard_cmd() will pass parameter @nr_sects w/ zero value to __blkdev_issue_discard(), it returns a NULL bio pointer, result in panic. This patch changes as below for fixing: 1. Let's drop all remained discards in f2fs_unfreeze() if snapshot of lvm device is created. 2. Checking discard_max_bytes before submitting discard during __submit_discard_cmd(). Cc: stable@vger.kernel.org Fixes: 35ec7d574884 ("f2fs: split discard command in prior to block layer") Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09ovl: Filter invalid inodes with missing lookup functionVasiliy Kovalev1-0/+3
commit c8b359dddb418c60df1a69beea01d1b3322bfe83 upstream. Add a check to the ovl_dentry_weird() function to prevent the processing of directory inodes that lack the lookup function. This is important because such inodes can cause errors in overlayfs when passed to the lowerstack. Reported-by: syzbot+a8c9d476508bd14a90e5@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a8c9d476508bd14a90e5 Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-unionfs/CAJfpegvx-oS9XGuwpJx=Xe28_jzWx5eRo1y900_ZzWY+=gGzUg@mail.gmail.com/ Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Cc: <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09btrfs: ref-verify: fix use-after-free after invalid ref actionFilipe Manana1-0/+1
[ Upstream commit 7c4e39f9d2af4abaf82ca0e315d1fd340456620f ] At btrfs_ref_tree_mod() after we successfully inserted the new ref entry (local variable 'ref') into the respective block entry's rbtree (local variable 'be'), if we find an unexpected action of BTRFS_DROP_DELAYED_REF, we error out and free the ref entry without removing it from the block entry's rbtree. Then in the error path of btrfs_ref_tree_mod() we call btrfs_free_ref_cache(), which iterates over all block entries and then calls free_block_entry() for each one, and there we will trigger a use-after-free when we are called against the block entry to which we added the freed ref entry to its rbtree, since the rbtree still points to the block entry, as we didn't remove it from the rbtree before freeing it in the error path at btrfs_ref_tree_mod(). Fix this by removing the new ref entry from the rbtree before freeing it. Syzbot report this with the following stack traces: BTRFS error (device loop0 state EA): Ref action 2, root 5, ref_root 0, parent 8564736, owner 0, offset 0, num_refs 18446744073709551615 __btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523 update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512 btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594 btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754 btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116 btrfs_insert_empty_items+0x9c/0x1a0 fs/btrfs/ctree.c:4314 btrfs_insert_empty_item fs/btrfs/ctree.h:669 [inline] btrfs_insert_orphan_item+0x1f1/0x320 fs/btrfs/orphan.c:23 btrfs_orphan_add+0x6d/0x1a0 fs/btrfs/inode.c:3482 btrfs_unlink+0x267/0x350 fs/btrfs/inode.c:4293 vfs_unlink+0x365/0x650 fs/namei.c:4469 do_unlinkat+0x4ae/0x830 fs/namei.c:4533 __do_sys_unlinkat fs/namei.c:4576 [inline] __se_sys_unlinkat fs/namei.c:4569 [inline] __x64_sys_unlinkat+0xcc/0xf0 fs/namei.c:4569 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f BTRFS error (device loop0 state EA): Ref action 1, root 5, ref_root 5, parent 0, owner 260, offset 0, num_refs 1 __btrfs_mod_ref+0x76b/0xac0 fs/btrfs/extent-tree.c:2521 update_ref_for_cow+0x96a/0x11f0 btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594 btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754 btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116 btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411 __btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030 btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline] __btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137 __btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171 btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313 prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586 relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538 BTRFS error (device loop0 state EA): Ref action 2, root 5, ref_root 0, parent 8564736, owner 0, offset 0, num_refs 18446744073709551615 __btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523 update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512 btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594 btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754 btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116 btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411 __btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030 btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline] __btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137 __btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171 btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313 prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586 relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538 ================================================================== BUG: KASAN: slab-use-after-free in rb_first+0x69/0x70 lib/rbtree.c:473 Read of size 8 at addr ffff888042d1af38 by task syz.0.0/5329 CPU: 0 UID: 0 PID: 5329 Comm: syz.0.0 Not tainted 6.12.0-rc7-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 rb_first+0x69/0x70 lib/rbtree.c:473 free_block_entry+0x78/0x230 fs/btrfs/ref-verify.c:248 btrfs_free_ref_cache+0xa3/0x100 fs/btrfs/ref-verify.c:917 btrfs_ref_tree_mod+0x139f/0x15e0 fs/btrfs/ref-verify.c:898 btrfs_free_extent+0x33c/0x380 fs/btrfs/extent-tree.c:3544 __btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523 update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512 btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594 btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754 btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116 btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411 __btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030 btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline] __btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137 __btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171 btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313 prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586 relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f996df7e719 RSP: 002b:00007f996ede7038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f996e135f80 RCX: 00007f996df7e719 RDX: 0000000020000180 RSI: 00000000c4009420 RDI: 0000000000000004 RBP: 00007f996dff139e R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f996e135f80 R15: 00007fff79f32e68 </TASK> Allocated by task 5329: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:257 [inline] __kmalloc_cache_noprof+0x19c/0x2c0 mm/slub.c:4295 kmalloc_noprof include/linux/slab.h:878 [inline] kzalloc_noprof include/linux/slab.h:1014 [inline] btrfs_ref_tree_mod+0x264/0x15e0 fs/btrfs/ref-verify.c:701 btrfs_free_extent+0x33c/0x380 fs/btrfs/extent-tree.c:3544 __btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523 update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512 btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594 btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754 btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116 btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411 __btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030 btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline] __btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137 __btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171 btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313 prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586 relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 5329: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:2342 [inline] slab_free mm/slub.c:4579 [inline] kfree+0x1a0/0x440 mm/slub.c:4727 btrfs_ref_tree_mod+0x136c/0x15e0 btrfs_free_extent+0x33c/0x380 fs/btrfs/extent-tree.c:3544 __btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523 update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512 btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594 btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754 btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116 btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411 __btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030 btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline] __btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137 __btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171 btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313 prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586 relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The buggy address belongs to the object at ffff888042d1af00 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 56 bytes inside of freed 64-byte region [ffff888042d1af00, ffff888042d1af40) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x42d1a anon flags: 0x4fff00000000000(node=1|zone=1|lastcpupid=0x7ff) page_type: f5(slab) raw: 04fff00000000000 ffff88801ac418c0 0000000000000000 dead000000000001 raw: 0000000000000000 0000000000200020 00000001f5000000 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 5055, tgid 5055 (dhcpcd-run-hook), ts 40377240074, free_ts 40376848335 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1541 prep_new_page mm/page_alloc.c:1549 [inline] get_page_from_freelist+0x3649/0x3790 mm/page_alloc.c:3459 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4735 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 alloc_slab_page+0x6a/0x140 mm/slub.c:2412 allocate_slab+0x5a/0x2f0 mm/slub.c:2578 new_slab mm/slub.c:2631 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3818 __slab_alloc+0x58/0xa0 mm/slub.c:3908 __slab_alloc_node mm/slub.c:3961 [inline] slab_alloc_node mm/slub.c:4122 [inline] __do_kmalloc_node mm/slub.c:4263 [inline] __kmalloc_noprof+0x25a/0x400 mm/slub.c:4276 kmalloc_noprof include/linux/slab.h:882 [inline] kzalloc_noprof include/linux/slab.h:1014 [inline] tomoyo_encode2 security/tomoyo/realpath.c:45 [inline] tomoyo_encode+0x26f/0x540 security/tomoyo/realpath.c:80 tomoyo_realpath_from_path+0x59e/0x5e0 security/tomoyo/realpath.c:283 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_check_open_permission+0x255/0x500 security/tomoyo/file.c:771 security_file_open+0x777/0x990 security/security.c:3109 do_dentry_open+0x369/0x1460 fs/open.c:945 vfs_open+0x3e/0x330 fs/open.c:1088 do_open fs/namei.c:3774 [inline] path_openat+0x2c84/0x3590 fs/namei.c:3933 page last free pid 5055 tgid 5055 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1112 [inline] free_unref_page+0xcfb/0xf20 mm/page_alloc.c:2642 free_pipe_info+0x300/0x390 fs/pipe.c:860 put_pipe_info fs/pipe.c:719 [inline] pipe_release+0x245/0x320 fs/pipe.c:742 __fput+0x23f/0x880 fs/file_table.c:431 __do_sys_close fs/open.c:1567 [inline] __se_sys_close fs/open.c:1552 [inline] __x64_sys_close+0x7f/0x110 fs/open.c:1552 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Memory state around the buggy address: ffff888042d1ae00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888042d1ae80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc >ffff888042d1af00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff888042d1af80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc ffff888042d1b000: 00 00 00 00 00 fc fc 00 00 00 00 00 fc fc 00 00 Reported-by: syzbot+7325f164162e200000c1@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/673723eb.050a0220.1324f8.00a8.GAE@google.com/T/#u Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09btrfs: add a sanity check for btrfs root in btrfs_search_slot()Lizhi Xu1-1/+5
[ Upstream commit 3ed51857a50f530ac7a1482e069dfbd1298558d4 ] Syzbot reports a null-ptr-deref in btrfs_search_slot(). The reproducer is using rescue=ibadroots, and the extent tree root is corrupted thus the extent tree is NULL. When scrub tries to search the extent tree to gather the needed extent info, btrfs_search_slot() doesn't check if the target root is NULL or not, resulting the null-ptr-deref. Add sanity check for btrfs root before using it in btrfs_search_slot(). Reported-by: syzbot+3030e17bd57a73d39bd7@syzkaller.appspotmail.com Fixes: 42437a6386ff ("btrfs: introduce mount option rescue=ignorebadroots") Link: https://syzkaller.appspot.com/bug?extid=3030e17bd57a73d39bd7 CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: syzbot+3030e17bd57a73d39bd7@syzkaller.appspotmail.com Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09btrfs: don't loop for nowait writes when checking for cross referencesFilipe Manana1-1/+1
[ Upstream commit ed67f2a913a4f0fc505db29805c41dd07d3cb356 ] When checking for delayed refs when verifying if there are cross references for a data extent, we stop if the path has nowait set and we can't try lock the delayed ref head's mutex, returning -EAGAIN with the goal of making a write fallback to a blocking context. However we ignore the -EAGAIN at btrfs_cross_ref_exist() when check_delayed_ref() returns it, and keep looping instead of immediately returning the -EAGAIN to the caller. Fix this by not looping if we get -EAGAIN and we have a nowait path. Fixes: 26ce91144631 ("btrfs: make can_nocow_extent nowait compatible") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09btrfs: fix use-after-free in btrfs_encoded_read_endio()Johannes Thumshirn1-1/+1
[ Upstream commit 05b36b04d74a517d6675bf2f90829ff1ac7e28dc ] Shinichiro reported the following use-after free that sometimes is happening in our CI system when running fstests' btrfs/284 on a TCMU runner device: BUG: KASAN: slab-use-after-free in lock_release+0x708/0x780 Read of size 8 at addr ffff888106a83f18 by task kworker/u80:6/219 CPU: 8 UID: 0 PID: 219 Comm: kworker/u80:6 Not tainted 6.12.0-rc6-kts+ #15 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020 Workqueue: btrfs-endio btrfs_end_bio_work [btrfs] Call Trace: <TASK> dump_stack_lvl+0x6e/0xa0 ? lock_release+0x708/0x780 print_report+0x174/0x505 ? lock_release+0x708/0x780 ? __virt_addr_valid+0x224/0x410 ? lock_release+0x708/0x780 kasan_report+0xda/0x1b0 ? lock_release+0x708/0x780 ? __wake_up+0x44/0x60 lock_release+0x708/0x780 ? __pfx_lock_release+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? lock_is_held_type+0x9a/0x110 _raw_spin_unlock_irqrestore+0x1f/0x60 __wake_up+0x44/0x60 btrfs_encoded_read_endio+0x14b/0x190 [btrfs] btrfs_check_read_bio+0x8d9/0x1360 [btrfs] ? lock_release+0x1b0/0x780 ? trace_lock_acquire+0x12f/0x1a0 ? __pfx_btrfs_check_read_bio+0x10/0x10 [btrfs] ? process_one_work+0x7e3/0x1460 ? lock_acquire+0x31/0xc0 ? process_one_work+0x7e3/0x1460 process_one_work+0x85c/0x1460 ? __pfx_process_one_work+0x10/0x10 ? assign_work+0x16c/0x240 worker_thread+0x5e6/0xfc0 ? __pfx_worker_thread+0x10/0x10 kthread+0x2c3/0x3a0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 3661: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 btrfs_encoded_read_regular_fill_pages+0x16c/0x6d0 [btrfs] send_extent_data+0xf0f/0x24a0 [btrfs] process_extent+0x48a/0x1830 [btrfs] changed_cb+0x178b/0x2ea0 [btrfs] btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] _btrfs_ioctl_send+0x117/0x330 [btrfs] btrfs_ioctl+0x184a/0x60a0 [btrfs] __x64_sys_ioctl+0x12e/0x1a0 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 3661: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x4f/0x70 kfree+0x143/0x490 btrfs_encoded_read_regular_fill_pages+0x531/0x6d0 [btrfs] send_extent_data+0xf0f/0x24a0 [btrfs] process_extent+0x48a/0x1830 [btrfs] changed_cb+0x178b/0x2ea0 [btrfs] btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] _btrfs_ioctl_send+0x117/0x330 [btrfs] btrfs_ioctl+0x184a/0x60a0 [btrfs] __x64_sys_ioctl+0x12e/0x1a0 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e The buggy address belongs to the object at ffff888106a83f00 which belongs to the cache kmalloc-rnd-07-96 of size 96 The buggy address is located 24 bytes inside of freed 96-byte region [ffff888106a83f00, ffff888106a83f60) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888106a83800 pfn:0x106a83 flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) page_type: f5(slab) raw: 0017ffffc0000000 ffff888100053680 ffffea0004917200 0000000000000004 raw: ffff888106a83800 0000000080200019 00000001f5000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888106a83e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff888106a83e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc >ffff888106a83f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff888106a83f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff888106a84000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Further analyzing the trace and the crash dump's vmcore file shows that the wake_up() call in btrfs_encoded_read_endio() is calling wake_up() on the wait_queue that is in the private data passed to the end_io handler. Commit 4ff47df40447 ("btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()") moved 'struct btrfs_encoded_read_private' off the stack. Before that commit one can see a corruption of the private data when analyzing the vmcore after a crash: *(struct btrfs_encoded_read_private *)0xffff88815626eec8 = { .wait = (wait_queue_head_t){ .lock = (spinlock_t){ .rlock = (struct raw_spinlock){ .raw_lock = (arch_spinlock_t){ .val = (atomic_t){ .counter = (int)-2005885696, }, .locked = (u8)0, .pending = (u8)157, .locked_pending = (u16)40192, .tail = (u16)34928, }, .magic = (unsigned int)536325682, .owner_cpu = (unsigned int)29, .owner = (void *)__SCT__tp_func_btrfs_transaction_commit+0x0 = 0x0, .dep_map = (struct lockdep_map){ .key = (struct lock_class_key *)0xffff8881575a3b6c, .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, .name = (const char *)0xffff88815626f100 = "", .wait_type_outer = (u8)37, .wait_type_inner = (u8)178, .lock_type = (u8)154, }, }, .__padding = (u8 [24]){ 0, 157, 112, 136, 50, 174, 247, 31, 29 }, .dep_map = (struct lockdep_map){ .key = (struct lock_class_key *)0xffff8881575a3b6c, .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, .name = (const char *)0xffff88815626f100 = "", .wait_type_outer = (u8)37, .wait_type_inner = (u8)178, .lock_type = (u8)154, }, }, .head = (struct list_head){ .next = (struct list_head *)0x112cca, .prev = (struct list_head *)0x47, }, }, .pending = (atomic_t){ .counter = (int)-1491499288, }, .status = (blk_status_t)130, } Here we can see several indicators of in-memory data corruption, e.g. the large negative atomic values of ->pending or ->wait->lock->rlock->raw_lock->val, as well as the bogus spinlock magic 0x1ff7ae32 (decimal 536325682 above) instead of 0xdead4ead or the bogus pointer values for ->wait->head. To fix this, change atomic_dec_return() to atomic_dec_and_test() to fix the corruption, as atomic_dec_return() is defined as two instructions on x86_64, whereas atomic_dec_and_test() is defined as a single atomic operation. This can lead to a situation where counter value is already decremented but the if statement in btrfs_encoded_read_endio() is not completely processed, i.e. the 0 test has not completed. If another thread continues executing btrfs_encoded_read_regular_fill_pages() the atomic_dec_return() there can see an already updated ->pending counter and continues by freeing the private data. Continuing in the endio handler the test for 0 succeeds and the wait_queue is woken up, resulting in a use-after-free. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Suggested-by: Damien Le Moal <Damien.LeMoal@wdc.com> Fixes: 1881fba89bd5 ("btrfs: add BTRFS_IOC_ENCODED_READ ioctl") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()Mark Harmstone1-11/+18
[ Upstream commit 68d3b27e05c7ca5545e88465f5e2be6eda0e11df ] Change btrfs_encoded_read_regular_fill_pages() so that the priv struct is allocated rather than stored on the stack, in preparation for adding an asynchronous mode to the function. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 05b36b04d74a ("btrfs: fix use-after-free in btrfs_encoded_read_endio()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09btrfs: change btrfs_encoded_read() so that reading of extent is done by callerMark Harmstone3-32/+66
[ Upstream commit 26efd44796c6dd7a64f039a0dda6d558eac97a3e ] Change the behaviour of btrfs_encoded_read() so that if it needs to read an extent from disk, it leaves the extent and inode locked and returns -EIOCBQUEUED. The caller is then responsible for doing the I/O via btrfs_encoded_read_regular() and unlocking the extent and inode. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 05b36b04d74a ("btrfs: fix use-after-free in btrfs_encoded_read_endio()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09btrfs: drop unused parameter file_offset from ↵David Sterba3-6/+5
btrfs_encoded_read_regular_fill_pages() [ Upstream commit 590168edbe6317ca9f4066215fb099f43ffe745c ] The file_offset parameter used to be passed to encoded read struct but was removed in commit b665affe93d8 ("btrfs: remove unused members from struct btrfs_encoded_read_private"). Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 05b36b04d74a ("btrfs: fix use-after-free in btrfs_encoded_read_endio()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09quota: flush quota_release_work upon quota writebackOjaswin Mujoo1-0/+2
[ Upstream commit ac6f420291b3fee1113f21d612fa88b628afab5b ] One of the paths quota writeback is called from is: freeze_super() sync_filesystem() ext4_sync_fs() dquot_writeback_dquots() Since we currently don't always flush the quota_release_work queue in this path, we can end up with the following race: 1. dquot are added to releasing_dquots list during regular operations. 2. FS Freeze starts, however, this does not flush the quota_release_work queue. 3. Freeze completes. 4. Kernel eventually tries to flush the workqueue while FS is frozen which hits a WARN_ON since transaction gets started during frozen state: ext4_journal_check_start+0x28/0x110 [ext4] (unreliable) __ext4_journal_start_sb+0x64/0x1c0 [ext4] ext4_release_dquot+0x90/0x1d0 [ext4] quota_release_workfn+0x43c/0x4d0 Which is the following line: WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE); Which ultimately results in generic/390 failing due to dmesg noise. This was detected on powerpc machine 15 cores. To avoid this, make sure to flush the workqueue during dquot_writeback_dquots() so we dont have any pending workitems after freeze. Reported-by: Disha Goel <disgoel@linux.ibm.com> CC: stable@vger.kernel.org Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241121123855.645335-2-ojaswin@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09xfs: remove unknown compat feature check in superblock write validationLong Li1-7/+0
[ Upstream commit 652f03db897ba24f9c4b269e254ccc6cc01ff1b7 ] Compat features are new features that older kernels can safely ignore, allowing read-write mounts without issues. The current sb write validation implementation returns -EFSCORRUPTED for unknown compat features, preventing filesystem write operations and contradicting the feature's definition. Additionally, if the mounted image is unclean, the log recovery may need to write to the superblock. Returning an error for unknown compat features during sb write validation can cause mount failures. Although XFS currently does not use compat feature flags, this issue affects current kernels' ability to mount images that may use compat feature flags in the future. Since superblock read validation already warns about unknown compat features, it's unnecessary to repeat this warning during write validation. Therefore, the relevant code in write validation is being removed. Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05nfs/blocklayout: Limit repeat device registration on failureBenjamin Coddington1-1/+14
[ Upstream commit 614733f9441ed53bb442d4734112ec1e24bd6da7 ] Every pNFS SCSI IO wants to do LAYOUTGET, then within the layout find the device which can drive GETDEVINFO, then finally may need to prep the device with a reservation. This slow work makes a mess of IO latencies if one of the later steps is going to fail for awhile. If we're unable to register a SCSI device, ensure we mark the device as unavailable so that it will timeout and be re-added via GETDEVINFO. This avoids repeated doomed attempts to register a device in the IO path. Add some clarifying comments as well. Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05nfs/blocklayout: Don't attempt unregister for invalid block deviceBenjamin Coddington1-4/+2
[ Upstream commit 3a4ce14d9a6b868e0787e4582420b721c04ee41e ] Since commit d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") an unmount of a pNFS SCSI layout-enabled NFS may dereference a NULL block_device in: bl_unregister_scsi+0x16/0xe0 [blocklayoutdriver] bl_free_device+0x70/0x80 [blocklayoutdriver] bl_free_deviceid_node+0x12/0x30 [blocklayoutdriver] nfs4_put_deviceid_node+0x60/0xc0 [nfsv4] nfs4_deviceid_purge_client+0x132/0x190 [nfsv4] unset_pnfs_layoutdriver+0x59/0x60 [nfsv4] nfs4_destroy_server+0x36/0x70 [nfsv4] nfs_free_server+0x23/0xe0 [nfs] deactivate_locked_super+0x30/0xb0 cleanup_mnt+0xba/0x150 task_work_run+0x59/0x90 syscall_exit_to_user_mode+0x217/0x220 do_syscall_64+0x8e/0x160 This happens because even though we were able to create the nfs4_deviceid_node, the lookup for the device was unable to attach the block device to the pnfs_block_dev. If we never found a block device to register, we can avoid this case with the PNFS_BDEV_REGISTERED flag. Move the deref behind the test for the flag. Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05nfs: ignore SB_RDONLY when mounting nfsLi Lingfeng1-1/+1
[ Upstream commit 52cb7f8f177878b4f22397b9c4d2c8f743766be3 ] When exporting only one file system with fsid=0 on the server side, the client alternately uses the ro/rw mount options to perform the mount operation, and a new vfsmount is generated each time. It can be reproduced as follows: [root@localhost ~]# mount /dev/sda /mnt2 [root@localhost ~]# echo "/mnt2 *(rw,no_root_squash,fsid=0)" >/etc/exports [root@localhost ~]# systemctl restart nfs-server [root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa [root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa [root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa [root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa [root@localhost ~]# mount | grep nfs4 127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,... 127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,... 127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,... 127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,... [root@localhost ~]# We expected that after mounting with the ro option, using the rw option to mount again would return EBUSY, but the actual situation was not the case. As shown above, when mounting for the first time, a superblock with the ro flag will be generated, and at the same time, in do_new_mount_fc --> do_add_mount, it detects that the superblock corresponding to the current target directory is inconsistent with the currently generated one (path->mnt->mnt_sb != newmnt->mnt.mnt_sb), and a new vfsmount will be generated. When mounting with the rw option for the second time, since no matching superblock can be found in the fs_supers list, a new superblock with the rw flag will be generated again. The superblock in use (ro) is different from the newly generated superblock (rw), and a new vfsmount will be generated again. When mounting with the ro option for the third time, the superblock (ro) is found in fs_supers, the superblock in use (rw) is different from the found superblock (ro), and a new vfsmount will be generated again. We can switch between ro/rw through remount, and only one superblock needs to be generated, thus avoiding the problem of repeated generation of vfsmount caused by switching superblocks. Furthermore, This can also resolve the issue described in the link. Fixes: 275a5d24bf56 ("NFS: Error when mounting the same filesystem with different options") Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05cifs: unlock on error in smb3_reconfigure()Dan Carpenter1-1/+3
[ Upstream commit cda88d2fef7aa7de80b5697e8009fcbbb436f42d ] Unlock before returning if smb3_sync_session_ctx_passwords() fails. Fixes: 7e654ab7da03 ("cifs: during remount, make sure passwords are in sync") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05cifs: during remount, make sure passwords are in syncShyam Prasad N2-9/+75
[ Upstream commit 0f0e357902957fba28ed31bde0d6921c6bd1485d ] This fixes scenarios where remount can overwrite the only currently working password, breaking reconnect. We recently introduced a password2 field in both ses and ctx structs. This was done so as to allow the client to rotate passwords for a mount without any downtime. However, when the client transparently handles password rotation, it can swap the values of the two password fields in the ses struct, but not in smb3_fs_context struct that hangs off cifs_sb. This can lead to a situation where a remount unintentionally overwrites a working password in the ses struct. In order to fix this, we first get the passwords in ctx struct in-sync with ses struct, before replacing them with what the passwords that could be passed as a part of remount. Also, in order to avoid race condition between smb2_reconnect and smb3_reconfigure, we make sure to lock session_mutex before changing password and password2 fields of the ses structure. Fixes: 35f834265e0d ("smb3: fix broken reconnect when password changing on the server by allowing password rotation") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05smb: Initialize cfid->tcon before performing network opsPaul Aurich1-1/+1
[ Upstream commit c353ee4fb119a2582d0e011f66a76a38f5cf984d ] Avoid leaking a tcon ref when a lease break races with opening the cached directory. Processing the leak break might take a reference to the tcon in cached_dir_lease_break() and then fail to release the ref in cached_dir_offload_close, since cfid->tcon is still NULL. Fixes: ebe98f1447bb ("cifs: enable caching of directories for which a lease is held") Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05cifs: Fix parsing reparse point with native symlink in SMB1 non-UNICODE sessionPali Rohár1-2/+1
[ Upstream commit f4ca4f5a36eac9b4da378a0f28cbbe38534a0901 ] SMB1 NT_TRANSACT_IOCTL/FSCTL_GET_REPARSE_POINT even in non-UNICODE mode returns reparse buffer in UNICODE/UTF-16 format. This is because FSCTL_GET_REPARSE_POINT is NT-based IOCTL which does not distinguish between 8-bit non-UNICODE and 16-bit UNICODE modes and its path buffers are always encoded in UTF-16. This change fixes reading of native symlinks in SMB1 when UNICODE session is not active. Fixes: ed3e0a149b58 ("smb: client: implement ->query_reparse_point() for SMB1") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05cifs: Fix parsing native symlinks relative to the exportPali Rohár9-28/+108
[ Upstream commit 723f4ef90452aa629f3d923e92e0449d69362b1d ] SMB symlink which has SYMLINK_FLAG_RELATIVE set is relative (as opposite of the absolute) and it can be relative either to the current directory (where is the symlink stored) or relative to the top level export path. To what it is relative depends on the first character of the symlink target path. If the first character is path separator then symlink is relative to the export, otherwise to the current directory. Linux (and generally POSIX systems) supports only symlink paths relative to the current directory where is symlink stored. Currently if Linux SMB client reads relative SMB symlink with first character as path separator (slash), it let as is. Which means that Linux interpret it as absolute symlink pointing from the root (/). But this location is different than the top level directory of SMB export (unless SMB export was mounted to the root) and thefore SMB symlinks relative to the export are interpreted wrongly by Linux SMB client. Fix this problem. As Linux does not have equivalent of the path relative to the top of the mount point, convert such symlink target path relative to the current directory. Do this by prepending "../" pattern N times before the SMB target path, where N is the number of path separators found in SMB symlink path. So for example, if SMB share is mounted to Linux path /mnt/share/, symlink is stored in file /mnt/share/test/folder1/symlink (so SMB symlink path is test\folder1\symlink) and SMB symlink target points to \test\folder2\file, then convert symlink target path to Linux path ../../test/folder2/file. Deduplicate code for parsing SMB symlinks in native form from functions smb2_parse_symlink_response() and parse_reparse_native_symlink() into new function smb2_parse_native_symlink() and pass into this new function a new full_path parameter from callers, which specify SMB full path where is symlink stored. This change fixes resolving of the native Windows symlinks relative to the top level directory of the SMB share. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Stable-dep-of: f4ca4f5a36ea ("cifs: Fix parsing reparse point with native symlink in SMB1 non-UNICODE session") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05smb: client: disable directory caching when dir_cache_timeout is zeroHenrique Carvalho1-1/+1
[ Upstream commit ceaf1451990e3ea7fb50aebb5a149f57945f6e9f ] Setting dir_cache_timeout to zero should disable the caching of directory contents. Currently, even when dir_cache_timeout is zero, some caching related functions are still invoked, which is unintended behavior. Fix the issue by setting tcon->nohandlecache to true when dir_cache_timeout is zero, ensuring that directory handle caching is properly disabled. Fixes: 238b351d0935 ("smb3: allow controlling length of time directory entries are cached with dir leases") Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05Revert "nfs: don't reuse partially completed requests in ↵Trond Myklebust1-20/+29
nfs_lock_and_join_requests" [ Upstream commit 66f9dac9077c9c063552e465212abeb8f97d28a7 ] This reverts commit b571cfcb9dcac187c6d967987792d37cb0688610. This patch appears to assume that if one request is complete, then the others will complete too before unlocking. That is not a valid assumption, since other requests could hit a non-fatal error or a short write that would cause them not to complete. Reported-by: Igor Raits <igor@gooddata.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219508 Fixes: b571cfcb9dca ("nfs: don't reuse partially completed requests in nfs_lock_and_join_requests") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05hostfs: Fix the NULL vs IS_ERR() bug for __filemap_get_folio()ZhangPeng1-2/+2
[ Upstream commit bed2cc482600296fe04edbc38005ba2851449c10 ] The __filemap_get_folio() function returns error pointers. It never returns NULL. So use IS_ERR() to check it. Fixes: 1da86618bdce ("fs: Convert aops->write_begin to take a folio") Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05jffs2: fix use of uninitialized variableQingfang Deng1-4/+3
[ Upstream commit 3ba44ee966bc3c41dd8a944f963466c8fcc60dc8 ] When building the kernel with -Wmaybe-uninitialized, the compiler reports this warning: In function 'jffs2_mark_erased_block', inlined from 'jffs2_erase_pending_blocks' at fs/jffs2/erase.c:116:4: fs/jffs2/erase.c:474:9: warning: 'bad_offset' may be used uninitialized [-Wmaybe-uninitialized] 474 | jffs2_erase_failed(c, jeb, bad_offset); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/jffs2/erase.c: In function 'jffs2_erase_pending_blocks': fs/jffs2/erase.c:402:18: note: 'bad_offset' was declared here 402 | uint32_t bad_offset; | ^~~~~~~~~~ When mtd->point() is used, jffs2_erase_pending_blocks can return -EIO without initializing bad_offset, which is later used at the filebad label in jffs2_mark_erased_block. Fix it by initializing this variable. Fixes: 8a0f572397ca ("[JFFS2] Return values of jffs2_block_check_erase error paths") Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05ubifs: authentication: Fix use-after-free in ubifs_tnc_end_commitWaqar Hameed1-0/+2
[ Upstream commit 4617fb8fc15effe8eda4dd898d4e33eb537a7140 ] After an insertion in TNC, the tree might split and cause a node to change its `znode->parent`. A further deletion of other nodes in the tree (which also could free the nodes), the aforementioned node's `znode->cparent` could still point to a freed node. This `znode->cparent` may not be updated when getting nodes to commit in `ubifs_tnc_start_commit()`. This could then trigger a use-after-free when accessing the `znode->cparent` in `write_index()` in `ubifs_tnc_end_commit()`. This can be triggered by running rm -f /etc/test-file.bin dd if=/dev/urandom of=/etc/test-file.bin bs=1M count=60 conv=fsync in a loop, and with `CONFIG_UBIFS_FS_AUTHENTICATION`. KASAN then reports: BUG: KASAN: use-after-free in ubifs_tnc_end_commit+0xa5c/0x1950 Write of size 32 at addr ffffff800a3af86c by task ubifs_bgt0_20/153 Call trace: dump_backtrace+0x0/0x340 show_stack+0x18/0x24 dump_stack_lvl+0x9c/0xbc print_address_description.constprop.0+0x74/0x2b0 kasan_report+0x1d8/0x1f0 kasan_check_range+0xf8/0x1a0 memcpy+0x84/0xf4 ubifs_tnc_end_commit+0xa5c/0x1950 do_commit+0x4e0/0x1340 ubifs_bg_thread+0x234/0x2e0 kthread+0x36c/0x410 ret_from_fork+0x10/0x20 Allocated by task 401: kasan_save_stack+0x38/0x70 __kasan_kmalloc+0x8c/0xd0 __kmalloc+0x34c/0x5bc tnc_insert+0x140/0x16a4 ubifs_tnc_add+0x370/0x52c ubifs_jnl_write_data+0x5d8/0x870 do_writepage+0x36c/0x510 ubifs_writepage+0x190/0x4dc __writepage+0x58/0x154 write_cache_pages+0x394/0x830 do_writepages+0x1f0/0x5b0 filemap_fdatawrite_wbc+0x170/0x25c file_write_and_wait_range+0x140/0x190 ubifs_fsync+0xe8/0x290 vfs_fsync_range+0xc0/0x1e4 do_fsync+0x40/0x90 __arm64_sys_fsync+0x34/0x50 invoke_syscall.constprop.0+0xa8/0x260 do_el0_svc+0xc8/0x1f0 el0_svc+0x34/0x70 el0t_64_sync_handler+0x108/0x114 el0t_64_sync+0x1a4/0x1a8 Freed by task 403: kasan_save_stack+0x38/0x70 kasan_set_track+0x28/0x40 kasan_set_free_info+0x28/0x4c __kasan_slab_free+0xd4/0x13c kfree+0xc4/0x3a0 tnc_delete+0x3f4/0xe40 ubifs_tnc_remove_range+0x368/0x73c ubifs_tnc_remove_ino+0x29c/0x2e0 ubifs_jnl_delete_inode+0x150/0x260 ubifs_evict_inode+0x1d4/0x2e4 evict+0x1c8/0x450 iput+0x2a0/0x3c4 do_unlinkat+0x2cc/0x490 __arm64_sys_unlinkat+0x90/0x100 invoke_syscall.constprop.0+0xa8/0x260 do_el0_svc+0xc8/0x1f0 el0_svc+0x34/0x70 el0t_64_sync_handler+0x108/0x114 el0t_64_sync+0x1a4/0x1a8 The offending `memcpy()` in `ubifs_copy_hash()` has a use-after-free when a node becomes root in TNC but still has a `cparent` to an already freed node. More specifically, consider the following TNC: zroot / / zp1 / / zn Inserting a new node `zn_new` with a key smaller then `zn` will trigger a split in `tnc_insert()` if `zp1` is full: zroot / \ / \ zp1 zp2 / \ / \ zn_new zn `zn->parent` has now been moved to `zp2`, *but* `zn->cparent` still points to `zp1`. Now, consider a removal of all the nodes _except_ `zn`. Just when `tnc_delete()` is about to delete `zroot` and `zp2`: zroot \ \ zp2 \ \ zn `zroot` and `zp2` get freed and the tree collapses: zn `zn` now becomes the new `zroot`. `get_znodes_to_commit()` will now only find `zn`, the new `zroot`, and `write_index()` will check its `znode->cparent` that wrongly points to the already freed `zp1`. `ubifs_copy_hash()` thus gets wrongly called with `znode->cparent->zbranch[znode->iip].hash` that triggers the use-after-free! Fix this by explicitly setting `znode->cparent` to `NULL` in `get_znodes_to_commit()` for the root node. The search for the dirty nodes is bottom-up in the tree. Thus, when `find_next_dirty(znode)` returns NULL, the current `znode` _is_ the root node. Add an assert for this. Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes") Tested-by: Waqar Hameed <waqar.hameed@axis.com> Co-developed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Waqar Hameed <waqar.hameed@axis.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05ubifs: Correct the total block count by deducting journal reservationZhihao Cheng1-3/+3
[ Upstream commit 84a2bee9c49769310efa19601157ef50a1df1267 ] Since commit e874dcde1cbf ("ubifs: Reserve one leb for each journal head while doing budget"), available space is calulated by deducting reservation for all journal heads. However, the total block count ( which is only used by statfs) is not updated yet, which will cause the wrong displaying for used space(total - available). Fix it by deducting reservation for all journal heads from total block count. Fixes: e874dcde1cbf ("ubifs: Reserve one leb for each journal head while doing budget") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05nfs/localio: must clear res.replen in nfs_local_read_doneNeilBrown1-0/+6
[ Upstream commit 650703bc4ed3edf841e851c99ab8e7ba9e5262a3 ] Otherwise memory corruption can occur due to NFSv3 LOCALIO reads leaving garbage in res.replen: - nfs3_read_done() copies that into server->read_hdrsize; from there nfs3_proc_read_setup() copies it to args.replen in new requests. - nfs3_xdr_enc_read3args() passes that to rpc_prepare_reply_pages() which includes it in hdrsize for xdr_init_pages, so that rq_rcv_buf contains a ridiculous len. - This is copied to rq_private_buf and xs_read_stream_request() eventually passes the kvec to sock_recvmsg() which receives incoming data into entirely the wrong place. This is easily reproduced with NFSv3 LOCALIO that is servicing reads when it is made to pivot back to using normal RPC. This switch back to using normal NFSv3 with RPC can occur for a few reasons but this issue was exposed with a test that stops and then restarts the NFSv3 server while LOCALIO is performing heavy read IO. Fixes: 70ba381e1a43 ("nfs: add LOCALIO support") Reported-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05NFSv4.0: Fix a use-after-free problem in the asynchronous open()Trond Myklebust1-3/+5
[ Upstream commit 2fdb05dc0931250574f0cb0ebeb5ed8e20f4a889 ] Yang Erkun reports that when two threads are opening files at the same time, and are forced to abort before a reply is seen, then the call to nfs_release_seqid() in nfs4_opendata_free() can result in a use-after-free of the pointer to the defunct rpc task of the other thread. The fix is to ensure that if the RPC call is aborted before the call to nfs_wait_on_sequence() is complete, then we must call nfs_release_seqid() in nfs4_open_release() before the rpc_task is freed. Reported-by: Yang Erkun <yangerkun@huawei.com> Fixes: 24ac23ab88df ("NFSv4: Convert open() into an asynchronous RPC call") Reviewed-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05NFSD: Prevent a potential integer overflowChuck Lever1-7/+7
commit 7f33b92e5b18e904a481e6e208486da43e4dc841 upstream. If the tag length is >= U32_MAX - 3 then the "length + 4" addition can result in an integer overflow. Address this by splitting the decoding into several steps so that decode_cb_compound4res() does not have to perform arithmetic on the unsafe length value. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05f2fs: fix to do sanity check on node blkaddr in truncate_node()Chao Yu1-0/+10
commit 6babe00ccd34fc65b78ef8b99754e32b4385f23d upstream. syzbot reports a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:2534! RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534 Call Trace: truncate_node+0x1ae/0x8c0 fs/f2fs/node.c:909 f2fs_remove_inode_page+0x5c2/0x870 fs/f2fs/node.c:1288 f2fs_evict_inode+0x879/0x15c0 fs/f2fs/inode.c:856 evict+0x4e8/0x9b0 fs/inode.c:723 f2fs_handle_failed_inode+0x271/0x2e0 fs/f2fs/inode.c:986 f2fs_create+0x357/0x530 fs/f2fs/namei.c:394 lookup_open fs/namei.c:3595 [inline] open_last_lookups fs/namei.c:3694 [inline] path_openat+0x1c03/0x3590 fs/namei.c:3930 do_filp_open+0x235/0x490 fs/namei.c:3960 do_sys_openat2+0x13e/0x1d0 fs/open.c:1415 do_sys_open fs/open.c:1430 [inline] __do_sys_openat fs/open.c:1446 [inline] __se_sys_openat fs/open.c:1441 [inline] __x64_sys_openat+0x247/0x2a0 fs/open.c:1441 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534 The root cause is: on a fuzzed image, blkaddr in nat entry may be corrupted, then it will cause system panic when using it in f2fs_invalidate_blocks(), to avoid this, let's add sanity check on nat blkaddr in truncate_node(). Reported-by: syzbot+33379ce4ac76acf7d0c7@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000009a6cd706224ca720@google.com/ Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>