summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
6 daysnetfs: Fix potential uninitialised var in netfs_extract_user_iter()David Howells1-1/+1
commit 7e3d8db899d54af39fafb2eb3392b0cdae9973b5 upstream. In netfs_extract_user_iter(), if it's given a zero-length iterator, it will fall through the loop without setting ret, and so the error handling behaviour will be undefined, depending on whether ret happens to be negative. The value of ret then propagates back up the callstack. Fix this by presetting ret to 0. Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-9-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysf2fs: fix false alarm of lockdep on cp_global_sem lockChao Yu2-0/+14
[ Upstream commit 6a5e3de9c2bb0b691d16789a5d19e9276a09b308 ] lockdep reported a potential deadlock: a) TCMU device removal context: - call del_gendisk() to get q->q_usage_counter - call start_flush_work() to get work_completion of wb->dwork b) f2fs writeback context: - in wb_workfn(), which holds work_completion of wb->dwork - call f2fs_balance_fs() to get sbi->gc_lock c) f2fs vfs_write context: - call f2fs_gc() to get sbi->gc_lock - call f2fs_write_checkpoint() to get sbi->cp_global_sem d) f2fs mount context: - call recover_fsync_data() to get sbi->cp_global_sem - call f2fs_check_and_fix_write_pointer() to call blkdev_report_zones() that goes down to blk_mq_alloc_request and get q->q_usage_counter Original callstack is in Closes tag. However, I think this is a false alarm due to before mount returns successfully (context d), we can not access file therein via vfs_write (context c). Let's introduce per-sb cp_global_sem_key, and assign the key for cp_global_sem, so that lockdep can recognize cp_global_sem from different super block correctly. A lot of work are done by Shin'ichiro Kawasaki, thanks a lot for the work. Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones") Cc: stable@kernel.org Reported-and-tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20260218125237.3340441-1-shinichiro.kawasaki@wdc.com Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [ adapted context to use `init_f2fs_rwsem()` instead of the not-yet-backported `init_f2fs_rwsem_trace()` macro ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysf2fs: fix incorrect file address mapping when inline inode is unwrittenYongpeng Yang1-4/+9
[ Upstream commit 68a0178981a0f493295afa29f8880246e561494c ] When `fileinfo->fi_flags` does not have the `FIEMAP_FLAG_SYNC` bit set and inline data has not been persisted yet, the physical address of the extent is calculated incorrectly for unwritten inline inodes. root@vm:/mnt/f2fs# dd if=/dev/zero of=data.3k bs=3k count=1 root@vm:/mnt/f2fs# f2fs_io fiemap 0 100 data.3k Fiemap: offset = 0 len = 100 logical addr. physical addr. length flags 0 0000000000000000 00000ffffffff16c 0000000000000c00 00000301 This patch fixes the issue by checking if the inode's address is valid. If the inline inode is unwritten, set the physical address to 0 and mark the extent with `FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_DELALLOC` flags. Cc: stable@kernel.org Fixes: 67f8cf3cee6f ("f2fs: support fiemap for inline_data") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [ renamed `ifolio` to `ipage` in `inline_data_addr()` and `F2FS_INODE()` calls ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysbtrfs: do not mark inode incompressible after inline attempt failsQu Wenruo1-0/+6
[ Upstream commit 2e0e3716c7b6f8d71df2fbe709b922e54700f71b ] [BUG] The following sequence will set the file with nocompress flag: # mkfs.btrfs -f $dev # mount $dev $mnt -o max_inline=4,compress # xfs_io -f -c "pwrite 0 2k" -c sync $mnt/foobar The inode will have NOCOMPRESS flag, even if the content itself (all 0xcd) can still be compressed very well: item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160 generation 9 transid 10 size 2097152 nbytes 1052672 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 257 flags 0x8(NOCOMPRESS) Please note that, this behavior is there even before commit 59615e2c1f63 ("btrfs: reject single block sized compression early"). [CAUSE] At compress_file_range(), after btrfs_compress_folios() call, we try making an inlined extent by calling cow_file_range_inline(). But cow_file_range_inline() calls can_cow_file_range_inline() which has more accurate checks on if the range can be inlined. One of the user configurable conditions is the "max_inline=" mount option. If that value is set low (like the example, 4 bytes, which cannot store any header), or the compressed content is just slightly larger than 2K (the default value, meaning a 50% compression ratio), cow_file_range_inline() will return 1 immediately. And since we're here only to try inline the compressed data, the range is no larger than a single fs block. Thus compression is never going to make it a win, we fall back to marking the inode incompressible unavoidably. [FIX] Just add an extra check after inline attempt, so that if the inline attempt failed, do not set the nocompress flag. As there is no way to remove that flag, and the default 50% compression ratio is way too strict for the whole inode. CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayssmb: client: Use FullSessionKey for AES-256 encryption key derivationPiyush Sachdeva2-8/+26
[ Upstream commit 5be7a0cef3229fb3b63a07c0d289daf752545424 ] When Kerberos authentication is used with AES-256 encryption (AES-256-CCM or AES-256-GCM), the SMB3 encryption and decryption keys must be derived using the full session key (Session.FullSessionKey) rather than just the first 16 bytes (Session.SessionKey). Per MS-SMB2 section 3.2.5.3.1, when Connection.Dialect is "3.1.1" and Connection.CipherId is AES-256-CCM or AES-256-GCM, Session.FullSessionKey must be set to the full cryptographic key from the GSS authentication context. The encryption and decryption key derivation (SMBC2SCipherKey, SMBS2CCipherKey) must use this FullSessionKey as the KDF input. The signing key derivation continues to use Session.SessionKey (first 16 bytes) in all cases. Previously, generate_key() hardcoded SMB2_NTLMV2_SESSKEY_SIZE (16) as the HMAC-SHA256 key input length for all derivations. When Kerberos with AES-256 provides a 32-byte session key, the KDF for encryption/decryption was using only the first 16 bytes, producing keys that did not match the server's, causing mount failures with sec=krb5 and require_gcm_256=1. Add a full_key_size parameter to generate_key() and pass the appropriate size from generate_smb3signingkey(): - Signing: always SMB2_NTLMV2_SESSKEY_SIZE (16 bytes) - Encryption/Decryption: ses->auth_key.len when AES-256, otherwise 16 Also fix cifs_dump_full_key() to report the actual session key length for AES-256 instead of hardcoded CIFS_SESS_KEY_SIZE, so that userspace tools like Wireshark receive the correct key for decryption. Cc: <stable@vger.kernel.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com> Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> [ adapted upstream's void/hmac_sha256_init_usingrawkey-based generate_key() to 6.12's int-return crypto_shash_* form while threading full_key_size through all callers. ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysbtrfs: fix missing last_unlink_trans update when removing a directoryFilipe Manana1-0/+2
[ Upstream commit 999757231c49376cd1a37308d2c8c4c9932571e1 ] When removing a directory we are not updating its last_unlink_trans field, which can result in incorrect fsync behaviour in case some one fsyncs the directory after it was removed because it's holding a file descriptor on it. Example scenario: mkdir /mnt/dir1 mkdir /mnt/dir1/dir2 mkdir /mnt/dir3 sync -f /mnt # Do some change to the directory and fsync it. chmod 700 /mnt/dir1 xfs_io -c fsync /mnt/dir1 # Move dir2 out of dir1 so that dir1 becomes empty. mv /mnt/dir1/dir2 /mnt/dir3/ open fd on /mnt/dir1 call rmdir(2) on path "/mnt/dir1" fsync fd <trigger power failure> When attempting to mount the filesystem, the log replay will fail with an -EIO error and dmesg/syslog has the following: [445771.626482] BTRFS info (device dm-0): first mount of filesystem 0368bbea-6c5e-44b5-b409-09abe496e650 [445771.626486] BTRFS info (device dm-0): using crc32c checksum algorithm [445771.627912] BTRFS info (device dm-0): start tree-log replay [445771.628335] page: refcount:2 mapcount:0 mapping:0000000061443ddc index:0x1d00 pfn:0x7072a5 [445771.629453] memcg:ffff89f400351b00 [445771.629892] aops:btree_aops [btrfs] ino:1 [445771.630737] flags: 0x17fffc00000402a(uptodate|lru|private|writeback|node=0|zone=2|lastcpupid=0x1ffff) [445771.632359] raw: 017fffc00000402a fffff47284d950c8 fffff472907b7c08 ffff89f458e412b8 [445771.633713] raw: 0000000000001d00 ffff89f6c51d1a90 00000002ffffffff ffff89f400351b00 [445771.635029] page dumped because: eb page dump [445771.635825] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=10 ino=258, invalid nlink: has 2 expect no more than 1 for dir [445771.638088] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14878 owner 5 [445771.638091] BTRFS info (device dm-0): refs 4 lock_owner 0 current 3581087 [445771.638094] item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 [445771.638097] inode generation 3 transid 9 size 16 nbytes 16384 [445771.638098] block group 0 mode 40755 links 1 uid 0 gid 0 [445771.638100] rdev 0 sequence 2 flags 0x0 [445771.638102] atime 1775744884.0 [445771.660056] ctime 1775744885.645502983 [445771.660058] mtime 1775744885.645502983 [445771.660060] otime 1775744884.0 [445771.660062] item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 [445771.660064] index 0 name_len 2 [445771.660066] item 2 key (256 DIR_ITEM 1843588421) itemoff 16077 itemsize 34 [445771.660068] location key (259 1 0) type 2 [445771.660070] transid 9 data_len 0 name_len 4 [445771.660075] item 3 key (256 DIR_ITEM 2363071922) itemoff 16043 itemsize 34 [445771.660076] location key (257 1 0) type 2 [445771.660077] transid 9 data_len 0 name_len 4 [445771.660078] item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34 [445771.660079] location key (257 1 0) type 2 [445771.660080] transid 9 data_len 0 name_len 4 [445771.660081] item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34 [445771.660082] location key (259 1 0) type 2 [445771.660083] transid 9 data_len 0 name_len 4 [445771.660084] item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160 [445771.660086] inode generation 9 transid 9 size 8 nbytes 0 [445771.660087] block group 0 mode 40777 links 1 uid 0 gid 0 [445771.660088] rdev 0 sequence 2 flags 0x0 [445771.660089] atime 1775744885.641174097 [445771.660090] ctime 1775744885.645502983 [445771.660091] mtime 1775744885.645502983 [445771.660105] otime 1775744885.641174097 [445771.660106] item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14 [445771.660107] index 2 name_len 4 [445771.660108] item 8 key (257 DIR_ITEM 2676584006) itemoff 15767 itemsize 34 [445771.660109] location key (258 1 0) type 2 [445771.660110] transid 9 data_len 0 name_len 4 [445771.660111] item 9 key (257 DIR_INDEX 2) itemoff 15733 itemsize 34 [445771.660112] location key (258 1 0) type 2 [445771.660113] transid 9 data_len 0 name_len 4 [445771.660114] item 10 key (258 INODE_ITEM 0) itemoff 15573 itemsize 160 [445771.660115] inode generation 9 transid 10 size 0 nbytes 0 [445771.660116] block group 0 mode 40755 links 2 uid 0 gid 0 [445771.660117] rdev 0 sequence 0 flags 0x0 [445771.660118] atime 1775744885.645502983 [445771.660119] ctime 1775744885.645502983 [445771.660120] mtime 1775744885.645502983 [445771.660121] otime 1775744885.645502983 [445771.660122] item 11 key (258 INODE_REF 257) itemoff 15559 itemsize 14 [445771.660123] index 2 name_len 4 [445771.660124] item 12 key (258 INODE_REF 259) itemoff 15545 itemsize 14 [445771.660125] index 2 name_len 4 [445771.660126] item 13 key (259 INODE_ITEM 0) itemoff 15385 itemsize 160 [445771.660127] inode generation 9 transid 10 size 8 nbytes 0 [445771.660128] block group 0 mode 40755 links 1 uid 0 gid 0 [445771.660129] rdev 0 sequence 1 flags 0x0 [445771.660130] atime 1775744885.645502983 [445771.660130] ctime 1775744885.645502983 [445771.660131] mtime 1775744885.645502983 [445771.660132] otime 1775744885.645502983 [445771.660133] item 14 key (259 INODE_REF 256) itemoff 15371 itemsize 14 [445771.660134] index 3 name_len 4 [445771.660135] item 15 key (259 DIR_ITEM 2676584006) itemoff 15337 itemsize 34 [445771.660136] location key (258 1 0) type 2 [445771.660137] transid 10 data_len 0 name_len 4 [445771.660138] item 16 key (259 DIR_INDEX 2) itemoff 15303 itemsize 34 [445771.660139] location key (258 1 0) type 2 [445771.660140] transid 10 data_len 0 name_len 4 [445771.660144] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected [445771.661650] ------------[ cut here ]------------ [445771.662358] WARNING: fs/btrfs/disk-io.c:326 at btree_csum_one_bio+0x217/0x230 [btrfs], CPU#8: mount/3581087 [445771.663588] Modules linked in: btrfs f2fs xfs (...) [445771.671229] CPU: 8 UID: 0 PID: 3581087 Comm: mount Tainted: G W 7.0.0-rc6-btrfs-next-230+ #2 PREEMPT(full) [445771.672575] Tainted: [W]=WARN [445771.672987] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [445771.674460] RIP: 0010:btree_csum_one_bio+0x217/0x230 [btrfs] [445771.675222] Code: 89 44 24 (...) [445771.677364] RSP: 0018:ffffd23882247660 EFLAGS: 00010246 [445771.678029] RAX: 0000000000000000 RBX: ffff89f6c51d1a90 RCX: 0000000000000000 [445771.678975] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff89f406020000 [445771.679983] RBP: ffff89f821204000 R08: 0000000000000000 R09: 00000000ffefffff [445771.680905] R10: ffffd23882247448 R11: 0000000000000003 R12: ffffd23882247668 [445771.681978] R13: ffff89f458e40fc0 R14: ffff89f737f4f500 R15: ffff89f737f4f500 [445771.682912] FS: 00007f0447a98840(0000) GS:ffff89fb9771d000(0000) knlGS:0000000000000000 [445771.684393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [445771.685230] CR2: 00007f0447bf1330 CR3: 000000017cb02002 CR4: 0000000000370ef0 [445771.686273] Call Trace: [445771.686646] <TASK> [445771.686969] btrfs_submit_bbio+0x83f/0x860 [btrfs] [445771.687750] ? write_one_eb+0x28f/0x340 [btrfs] [445771.688428] btree_writepages+0x2e3/0x550 [btrfs] [445771.689180] ? kmem_cache_alloc_noprof+0x12a/0x490 [445771.689963] ? alloc_extent_state+0x19/0x120 [btrfs] [445771.690801] ? kmem_cache_free+0x135/0x380 [445771.691328] ? preempt_count_add+0x69/0xa0 [445771.691831] ? set_extent_bit+0x252/0x8e0 [btrfs] [445771.692468] ? xas_load+0x9/0xc0 [445771.692873] ? xas_find+0x14d/0x1a0 [445771.693304] do_writepages+0xc6/0x160 [445771.693756] filemap_writeback+0xb8/0xe0 [445771.694274] btrfs_write_marked_extents+0x61/0x170 [btrfs] [445771.694999] btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs] [445771.695818] btrfs_commit_transaction+0x5c8/0xd10 [btrfs] [445771.696530] ? kmem_cache_free+0x135/0x380 [445771.697120] ? release_extent_buffer+0x34/0x160 [btrfs] [445771.697786] btrfs_recover_log_trees+0x7be/0x7e0 [btrfs] [445771.698525] ? __pfx_replay_one_buffer+0x10/0x10 [btrfs] [445771.699206] open_ctree+0x11e5/0x1810 [btrfs] [445771.699776] btrfs_get_tree.cold+0xb/0x162 [btrfs] [445771.700463] ? fscontext_read+0x165/0x180 [445771.701146] ? rw_verify_area+0x50/0x180 [445771.701866] vfs_get_tree+0x25/0xd0 [445771.702491] vfs_cmd_create+0x59/0xe0 [445771.703125] __do_sys_fsconfig+0x303/0x610 [445771.703603] do_syscall_64+0xe9/0xf20 [445771.703974] entry_SYSCALL_64_after_hwframe+0x76/0x7e [445771.704700] RIP: 0033:0x7f0447cbd4aa [445771.705108] Code: 73 01 c3 (...) [445771.707263] RSP: 002b:00007ffc4e528318 EFLAGS: 00000246 ORIG_RAX: 00000000000001af [445771.708107] RAX: ffffffffffffffda RBX: 00005561585d8c20 RCX: 00007f0447cbd4aa [445771.708931] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 [445771.709744] RBP: 00005561585d9120 R08: 0000000000000000 R09: 0000000000000000 [445771.710674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [445771.711477] R13: 00007f0447e4f580 R14: 00007f0447e5126c R15: 00007f0447e36a23 [445771.712277] </TASK> [445771.712541] ---[ end trace 0000000000000000 ]--- [445771.713382] BTRFS error (device dm-0): error while writing out transaction: -5 [445771.714679] BTRFS warning (device dm-0): Skipping commit of aborted transaction. [445771.715562] BTRFS error (device dm-0 state A): Transaction aborted (error -5) [445771.716459] BTRFS: error (device dm-0 state A) in cleanup_transaction:2068: errno=-5 IO failure [445771.717936] BTRFS error (device dm-0 state EA): failed to recover log trees with error: -5 [445771.719681] BTRFS error (device dm-0 state EA): open_ctree failed: -5 The problem is that such a fsync should have result in a fallback to a transaction commit, but that did not happen because through the btrfs_rmdir() we never update the directory's last_unlink_trans field. Any inode that had a link removed must have its last_unlink_trans updated to the ID of transaction used for the operation, otherwise fsync and log replay will not work correctly. btrfs_rmdir() calls btrfs_unlink_inode() and through that call chain we never call btrfs_record_unlink_dir() in order to update last_unlink_trans. However btrfs_unlink(), which is used for unlinking regular files, calls btrfs_record_unlink_dir() and then calls btrfs_unlink_inode(). So fix this by moving the call to btrfs_record_unlink_dir() from btrfs_unlink() to btrfs_unlink_inode(). A test case for fstests will follow soon. Reported-by: Slava0135 <slava.kovalevskiy.2014@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAAJYhww5ov62Hm+n+tmhcL-e_4cBobg+OWogKjOJxVUXivC=MQ@mail.gmail.com/ CC: stable@vger.kernel.org Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysbtrfs: use btrfs inodes in btrfs_rmdir() to avoid so much usage of BTRFS_I()Filipe Manana1-15/+16
[ Upstream commit 98060e1611177ddc842601a58258876ab435fdbf ] Almost everywhere we want to use a btrfs inode and therefore we have a lot of calls to BTRFS_I(), making the code more verbose. Instead use btrfs inode local variables to avoid so much use of BTRFS_I(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 999757231c49 ("btrfs: fix missing last_unlink_trans update when removing a directory") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysbtrfs: use inode already stored in local variable at btrfs_rmdir()Filipe Manana1-2/+1
[ Upstream commit 9f82a4ed34d870b5719f9b95f7da4f74d3325a6f ] There's no need to call d_inode(dentry) when calling btrfs_unlink_inode() since we have already stored that in a local inode variable. So just use the local variable to make the code less verbose. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 999757231c49 ("btrfs: fix missing last_unlink_trans update when removing a directory") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayseventfs: Use list_add_tail_rcu() for SRCU-protected children listDavid Carlier1-1/+1
[ Upstream commit f67950b2887fa10df50c4317a1fe98a65bc6875b ] Commit d2603279c7d6 ("eventfs: Use list_del_rcu() for SRCU protected list variable") converted the removal side to pair with the list_for_each_entry_srcu() walker in eventfs_iterate(). The insertion in eventfs_create_dir() was left as a plain list_add_tail(), which on weakly-ordered architectures can expose a new entry to the SRCU reader before its list pointers and fields are observable. Use list_add_tail_rcu() so the publication pairs with the existing list_del_rcu() and list_for_each_entry_srcu(). Fixes: 43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260418152251.199343-1-devnexen@gmail.com Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [ adapted scoped_guard(mutex, &eventfs_mutex) block to explicit mutex_lock()/mutex_unlock() pair ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysnetfs: fix error handling in netfs_extract_user_iter()Paulo Alcantara1-3/+10
commit 0aad5704c6b4d14007d4eab15883e8524e4310f4 upstream. In netfs_extract_user_iter(), if iov_iter_extract_pages() failed to extract user pages, bail out on -ENOMEM, otherwise return the error code only if @npages == 0, allowing short DIO reads and writes to be issued. This fixes mmapstress02 from LTP tests against CIFS. Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-10-dhowells@redhat.com Cc: netfs@lists.linux.dev Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysceph: fix BUG_ON in __ceph_build_xattrs_blob() due to stale blob sizeViacheslav Dubeyko1-0/+16
commit 0c22d9511cbde746622f8e4c11aaa63fe76d45f9 upstream. The generic/642 test-case can reproduce the kernel crash: [40243.605254] ------------[ cut here ]------------ [40243.605956] kernel BUG at fs/ceph/xattr.c:918! [40243.607142] Oops: invalid opcode: 0000 [#1] SMP PTI [40243.608067] CPU: 7 UID: 0 PID: 498762 Comm: kworker/7:1 Not tainted 7.0.0-rc7+ #3 PREEMPT(full) [40243.609700] Hardware name: QEMU Ubuntu 25.10 PC v2 (i440FX + PIIX, + 10.1 machine, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [40243.611820] Workqueue: ceph-msgr ceph_con_workfn [40243.612715] RIP: 0010:__ceph_build_xattrs_blob+0x1b8/0x1e0 [40243.613731] Code: 0f 84 82 fe ff ff e9 cf 8e 56 ff 48 8d 65 e8 31 c0 5b 41 5c 41 5d 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b 4c 8b 62 08 41 8b 85 24 07 00 00 49 83 c4 04 41 89 44 24 fc [40243.616888] RSP: 0018:ffffcc80c4d4b688 EFLAGS: 00010287 [40243.617773] RAX: 0000000000010026 RBX: 0000000000000001 RCX: 0000000000000000 [40243.618928] RDX: ffff8a773798dee0 RSI: 0000000000000000 RDI: 0000000000000000 [40243.620158] RBP: ffffcc80c4d4b6a0 R08: 0000000000000000 R09: 0000000000000000 [40243.621573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a75f3b58000 [40243.622907] R13: ffff8a75f3b58000 R14: 0000000000000080 R15: 000000000000bffd [40243.624054] FS: 0000000000000000(0000) GS:ffff8a787d1b4000(0000) knlGS:0000000000000000 [40243.625331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [40243.626269] CR2: 000072f390b623c0 CR3: 000000011c02a003 CR4: 0000000000372ef0 [40243.627408] Call Trace: [40243.627839] <TASK> [40243.628188] __prep_cap+0x3fd/0x4a0 [40243.628789] ? do_raw_spin_unlock+0x4e/0xe0 [40243.629474] ceph_check_caps+0x46a/0xc80 [40243.630094] ? __lock_acquire+0x4a2/0x2650 [40243.630773] ? find_held_lock+0x31/0x90 [40243.631347] ? handle_cap_grant+0x79f/0x1060 [40243.632068] ? lock_release+0xd9/0x300 [40243.632696] ? __mutex_unlock_slowpath+0x3e/0x340 [40243.633429] ? lock_release+0xd9/0x300 [40243.634052] handle_cap_grant+0xcf6/0x1060 [40243.634745] ceph_handle_caps+0x122b/0x2110 [40243.635415] mds_dispatch+0x5bd/0x2160 [40243.636034] ? ceph_con_process_message+0x65/0x190 [40243.636828] ? lock_release+0xd9/0x300 [40243.637431] ceph_con_process_message+0x7a/0x190 [40243.638184] ? kfree+0x311/0x4f0 [40243.638749] ? kfree+0x311/0x4f0 [40243.639268] process_message+0x16/0x1a0 [40243.639915] ? sg_free_table+0x39/0x90 [40243.640572] ceph_con_v2_try_read+0xf58/0x2120 [40243.641255] ? lock_acquire+0xc8/0x300 [40243.641863] ceph_con_workfn+0x151/0x820 [40243.642493] process_one_work+0x22f/0x630 [40243.643093] ? process_one_work+0x254/0x630 [40243.643770] worker_thread+0x1e2/0x400 [40243.644332] ? __pfx_worker_thread+0x10/0x10 [40243.645020] kthread+0x109/0x140 [40243.645560] ? __pfx_kthread+0x10/0x10 [40243.646125] ret_from_fork+0x3f8/0x480 [40243.646752] ? __pfx_kthread+0x10/0x10 [40243.647316] ? __pfx_kthread+0x10/0x10 [40243.647919] ret_from_fork_asm+0x1a/0x30 [40243.648556] </TASK> [40243.648902] Modules linked in: overlay hctr2 libpolyval chacha libchacha adiantum libnh libpoly1305 essiv intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit kvm_intel kvm irqbypass joydev ghash_clmulni_intel aesni_intel rapl input_leds mac_hid psmouse vga16fb serio_raw vgastate floppy i2c_piix4 pata_acpi bochs qemu_fw_cfg i2c_smbus sch_fq_codel rbd dm_crypt msr parport_pc ppdev lp parport efi_pstore [40243.654766] ---[ end trace 0000000000000000 ]--- Commit d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size") moved the required_blob_size computation to before the __build_xattrs() call, introducing a race. __build_xattrs() releases and reacquires i_ceph_lock during execution. In that window, handle_cap_grant() may update i_xattrs.blob with a newer MDS-provided blob and bump i_xattrs.version. When __build_xattrs() detects that index_version < version, it destroys and rebuilds the entire xattr rb-tree from the new blob, potentially increasing count, names_size, and vals_size. The prealloc_blob size check that follows still uses the stale required_blob_size computed before the rebuild, so it passes even when prealloc_blob is too small for the now-larger tree. After __set_xattr() adds one more xattr on top, __ceph_build_xattrs_blob() is called from the cap flush path and hits: BUG_ON(need > ci->i_xattrs.prealloc_blob->alloc_len); Fix this by recomputing required_blob_size after __build_xattrs() returns, using the current tree state. Also re-validate against m_max_xattr_size to fall back to the sync path if the rebuilt tree now exceeds the MDS limit. Cc: stable@vger.kernel.org Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size") Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysceph: fix a buffer leak in __ceph_setxattr()Viacheslav Dubeyko1-0/+1
commit 5d3cc36b4e77a27ce7b686b7c59c7072bcb3fa8e upstream. The old_blob in __ceph_setxattr() can store ci->i_xattrs.prealloc_blob value during the retry. However, it is never called the ceph_buffer_put() for the old_blob object. This patch fixes the issue of the buffer leak. Cc: stable@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayssmb/client: fix possible infinite loop and oob read in symlink_data()Ye Bin1-0/+3
commit 7d9a7f1f96cd617ee9e75bb22217c709038e26b8 upstream. On 32-bit architectures, the infinite loop is as follows: len = p->ErrorDataLength == 0xfffffff8 u8 *next = p->ErrorContextData + len next == p On 32-bit architectures, the out-of-bounds read is as follows: len = p->ErrorDataLength == 0xfffffff0 u8 *next = p->ErrorContextData + len next == (u8 *)p - 8 Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+") Cc: stable@vger.kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysntfs: ->d_compare() must not blockAl Viro5-24/+20
[ Upstream commit ca2a04e84af79596e5cd9cfe697d5122ec39c8ce ] ... so don't use __getname() there. Switch it (and ntfs_d_hash(), while we are at it) to kmalloc(PATH_MAX, GFP_NOWAIT). Yes, ntfs_d_hash() almost certainly can do with smaller allocations, but let ntfs folks deal with that - keep the allocation size as-is for now. Stop abusing names_cachep in ntfs, period - various uses of that thing in there have nothing to do with pathnames; just use k[mz]alloc() and be done with that. For now let's keep sizes as-in, but AFAICS none of the users actually want PATH_MAX. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Li hongliang <1468888505@139.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayssmb: client: fix OOB reads parsing symlink error responseGreg Kroah-Hartman1-8/+12
[ Upstream commit 3df690bba28edec865cf7190be10708ad0ddd67e ] When a CREATE returns STATUS_STOPPED_ON_SYMLINK, smb2_check_message() returns success without any length validation, leaving the symlink parsers as the only defense against an untrusted server. symlink_data() walks SMB 3.1.1 error contexts with the loop test "p < end", but reads p->ErrorId at offset 4 and p->ErrorDataLength at offset 0. When the server-controlled ErrorDataLength advances p to within 1-7 bytes of end, the next iteration will read past it. When the matching context is found, sym->SymLinkErrorTag is read at offset 4 from p->ErrorContextData with no check that the symlink header itself fits. smb2_parse_symlink_response() then bounds-checks the substitute name using SMB2_SYMLINK_STRUCT_SIZE as the offset of PathBuffer from iov_base. That value is computed as sizeof(smb2_err_rsp) + sizeof(smb2_symlink_err_rsp), which is correct only when ErrorContextCount == 0. With at least one error context the symlink data sits 8 bytes deeper, and each skipped non-matching context shifts it further by 8 + ALIGN(ErrorDataLength, 8). The check is too short, allowing the substitute name read to run past iov_len. The out-of-bound heap bytes are UTF-16-decoded into the symlink target and returned to userspace via readlink(2). Fix this all up by making the loops test require the full context header to fit, rejecting sym if its header runs past end, and bound the substitute name against the actual position of sym->PathBuffer rather than a fixed offset. Because sub_offs and sub_len are 16bits, the pointer math will not overflow here with the new greater-than. Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: Bharath SM <bharathsm@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable <stable@kernel.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Alva Lan <alvalan9@foxmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayssmb: client: correctly handle ErrorContextData as a flexible arrayLiang Jie2-3/+3
[ Upstream commit 215b7f9ecb8d7c14d56febdcdd246f3579c32aba ] The `smb2_symlink_err_rsp` structure was previously defined with `ErrorContextData` as a single `__u8` byte. However, the `ErrorContextData` field is intended to be a variable-length array based on `ErrorDataLength`. This mismatch leads to incorrect pointer arithmetic and potential memory access issues when processing error contexts. Updates the `ErrorContextData` field to be a flexible array (`__u8 ErrorContextData[]`). Additionally, it modifies the corresponding casts in the `symlink_data()` function to properly handle the flexible array, ensuring correct memory calculations and data handling. These changes improve the robustness of SMB2 symlink error processing. Signed-off-by: Liang Jie <liangjie@lixiang.com> Suggested-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Alva Lan <alvalan9@foxmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysbtrfs: fix double-decrement of bytes_may_use in submit_one_async_extent()Mark Harmstone1-1/+1
[ Upstream commit 82323b1a7088b7a5c3e528a5d634bff447fa286f ] submit_one_async_extent() calls btrfs_reserve_extent(), which decrements bytes_may_use. If the call btrfs_create_io_em() fails, we jump to out_free_reserve, which calls extent_clear_unlock_delalloc(). Because we're specifying EXTENT_DO_ACCOUNTING, i.e. EXTENT_CLEAR_META_RESV | EXTENT_CLEAR_DATA_RESV, this decreases bytes_may_use again. This can lead to problems later on, as an initial write can fail only for the writeback to silently ENOSPC. Fix this by replacing EXTENT_DO_ACCOUNTING with EXTENT_CLEAR_META_RESV. This parallels a4fe134fc1d8eb ("btrfs: fix a double release on reserved extents in cow_one_range()"), which is the same fix in cow_one_range(). Fixes: 151a41bc46df ("Btrfs: fix what bits we clear when erroring out from delalloc") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfsnotify: fix inode reference leak in fsnotify_recalc_mask()Amir Goldstein1-3/+36
[ Upstream commit 4aca914ac152f5d055ddcb36704d1e539ac08977 ] fsnotify_recalc_mask() fails to handle the return value of __fsnotify_recalc_mask(), which may return an inode pointer that needs to be released via fsnotify_drop_object() when the connector's HAS_IREF flag transitions from set to cleared. This manifests as a hung task with the following call trace: INFO: task umount:1234 blocked for more than 120 seconds. Call Trace: __schedule schedule fsnotify_sb_delete generic_shutdown_super kill_anon_super cleanup_mnt task_work_run do_exit do_group_exit The race window that triggers the iref leak: Thread A (adding mark) Thread B (removing mark) ────────────────────── ──────────────────────── fsnotify_add_mark_locked(): fsnotify_add_mark_list(): spin_lock(conn->lock) add mark_B(evictable) to list spin_unlock(conn->lock) return /* ---- gap: no lock held ---- */ fsnotify_detach_mark(mark_A): spin_lock(mark_A->lock) clear ATTACHED flag on mark_A spin_unlock(mark_A->lock) fsnotify_put_mark(mark_A) fsnotify_recalc_mask(): spin_lock(conn->lock) __fsnotify_recalc_mask(): /* mark_A skipped: ATTACHED cleared */ /* only mark_B(evictable) remains */ want_iref = false has_iref = true /* not yet cleared */ -> HAS_IREF transitions true -> false -> returns inode pointer spin_unlock(conn->lock) /* BUG: return value discarded! * iput() and fsnotify_put_sb_watched_objects() * are never called */ Fix this by deferring the transition true -> false of HAS_IREF flag from fsnotify_recalc_mask() (Thread A) to fsnotify_put_mark() (thread B). Fixes: c3638b5b1374 ("fsnotify: allow adding an inode mark without pinning inode") Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/CAOQ4uxiPsbHb0o5voUKyPFMvBsDkG914FYDcs4C5UpBMNm0Vcg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfs/adfs: validate nzones in adfs_validate_bblk()Bae Yeonju1-0/+3
[ Upstream commit dd9d3e16c2d5fa166e13dce07413be51f42c8f5d ] Reject ADFS disc records with a zero zone count during boot block validation, before the disc record is used. When nzones is 0, adfs_read_map() passes it to kmalloc_array(0, ...) which returns ZERO_SIZE_PTR, and adfs_map_layout() then writes to dm[-1], causing an out-of-bounds write before the allocated buffer. adfs_validate_dr0() already rejects nzones != 1 for old-format images. Add the equivalent check to adfs_validate_bblk() for new-format images so that a crafted image with nzones == 0 is rejected at probe time. Found by syzkaller. Fixes: f6f14a0d71b0 ("fs/adfs: map: move map-specific sb initialisation to map.c") Signed-off-by: Bae Yeonju <iwasbaeyz@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: scope conn->binding slowpath to bound sessions onlyHyunwoo Kim1-1/+6
[ Upstream commit b0da97c034b6107d14e537e212d4ce8b22109a58 ] When the binding SESSION_SETUP sets conn->binding = true, the flag stays set after the call so that the global session lookup in ksmbd_session_lookup_all() can find the session, which was not added to conn->sessions. Because the flag is connection-wide, the global lookup path will also resolve any other session by id if asked. Tighten the global lookup so that the returned session must have this connection registered in its channel xarray (sess->ksmbd_chann_list). The channel entry is installed by the existing binding_session path in ntlm_authenticate()/krb5_authenticate() when a SESSION_SETUP completes successfully, so this condition is a strict equivalent of "this connection has been accepted as a channel of this session". Connections that have not bound to a given session cannot reach it via the global table. The existing conn->binding gate for entering the slowpath is preserved so that non-binding connections keep the fast-path-only behavior, and the session->state check is unchanged. Fixes: f5a544e3bab7 ("ksmbd: add support for SMB3 multichannel") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: fix durable fd leak on ClientGUID mismatch in durable v2 openDaeMyung Kang1-0/+2
[ Upstream commit 804054d19886ac6628883d82410f6ee42a818664 ] ksmbd_lookup_fd_cguid() returns a ksmbd_file with its refcount incremented via ksmbd_fp_get(). parse_durable_handle_context() in the DURABLE_REQ_V2 case properly releases this reference on every path inside the ClientGUID-match branch, either by calling ksmbd_put_durable_fd() or by transferring ownership to dh_info->fp for a successful reconnect. However, when an entry exists in the global file table with the same CreateGuid but a different ClientGUID, the code simply falls through to the new-open path without dropping the reference obtained from ksmbd_lookup_fd_cguid(). Per MS-SMB2 section 3.3.5.9.10 ("Handling the SMB2_CREATE_DURABLE_HANDLE_REQUEST_V2 Create Context"), the server MUST locate an Open whose Open.CreateGuid matches the request's CreateGuid AND whose Open.ClientGuid matches the ClientGuid of the connection that received the request. If no such Open is found, the server MUST continue with the normal open execution phase. A CreateGuid hit with a ClientGUID mismatch is therefore the "Open not found" case: proceeding with a new open is correct, but the reference obtained purely as a side effect of the lookup must not be leaked. Repeated requests that hit this mismatch pin global_ft entries, prevent __ksmbd_close_fd() from ever running for the corresponding files, and defeat the durable scavenger, leading to long-lived resource leaks. Release the reference in the mismatch path and clear dh_info->fp so subsequent logic does not mistake a non-matching lookup result for a reconnect target. Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: destroy async_ida in ksmbd_conn_free()DaeMyung Kang1-0/+9
[ Upstream commit b32c8db48212a34998c36d0bbc05b29d5c407ef5 ] When per-connection async_ida was converted from a dynamically allocated ksmbd_ida to an embedded struct ida, ksmbd_ida_free() was removed from the connection teardown path but no matching ida_destroy() was added. The connection is therefore freed with the IDA's backing xarray still intact. The kernel IDA API expects ida_init() and ida_destroy() to be paired over an object's lifetime, so add the missing cleanup before the connection is freed. No leak has been observed in testing; this is a pairing fix to match the IDA lifetime rules, not a response to a reproduced regression. Fixes: d40012a83f87 ("cifsd: declare ida statically") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: destroy tree_conn_ida in ksmbd_session_destroy()DaeMyung Kang1-2/+3
[ Upstream commit c049ee14eb4343b69b6f7755563f961f5e153423 ] When per-session tree_conn_ida was converted from a dynamically allocated ksmbd_ida to an embedded struct ida, ksmbd_ida_free() was removed from ksmbd_session_destroy() but no matching ida_destroy() was added. The session is therefore freed with the IDA's backing xarray still intact. The kernel IDA API expects ida_init() and ida_destroy() to be paired over an object's lifetime, so add the missing cleanup before the enclosing session is freed. Also move ida_init() to right after the session is allocated so that it is always paired with the destroy call even on the early error paths of __session_create() (ksmbd_init_file_table() or __init_smb2_session() failures), both of which jump to the error label and invoke ksmbd_session_destroy() on a partially initialised session. No leak has been observed in testing; this is a pairing fix to match the IDA lifetime rules, not a response to a reproduced regression. Fixes: d40012a83f87 ("cifsd: declare ida statically") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayserofs: unify lcn as u64 for 32-bit platformsGao Xiang1-10/+9
[ Upstream commit 2d8c7edcb661812249469f4a5b62e9339118846f ] As sashiko reported [1], `lcn` was typed as `unsigned long` (or `unsigned int` sometimes), which is only 32 bits wide on 32-bit platforms, which causes `(lcn << lclusterbits)` to be truncated at 4 GiB. In order to consolidate the logic, just use `u64` consistently around the codebase. [1] https://sashiko.dev/r/20260420034612.1899973-1-hsiangkao%40linux.alibaba.com Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayserofs: avoid infinite loops due to corrupted subpage compact indexesGao Xiang1-14/+18
[ Upstream commit e13d315ae077bb7c3c6027cc292401bc0f4ec683 ] Robert reported an infinite loop observed by two crafted images. The root cause is that `clusterofs` can be larger than `lclustersize` for !NONHEAD `lclusters` in corrupted subpage compact indexes, e.g.: blocksize = lclustersize = 512 lcn = 6 clusterofs = 515 Move the corresponding check for full compress indexes to `z_erofs_load_lcluster_from_disk()` to also cover subpage compact compress indexes. It also fixes the position of `m->type >= Z_EROFS_LCLUSTER_TYPE_MAX` check, since it should be placed right after `z_erofs_load_{compact,full}_lcluster()`. Fixes: 8d2517aaeea3 ("erofs: fix up compacted indexes for block size < 4096") Fixes: 1a5223c182fd ("erofs: do sanity check on m->type in z_erofs_load_compact_lcluster()") Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/r/35167.1760645886@localhost Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Stable-dep-of: 2d8c7edcb661 ("erofs: unify lcn as u64 for 32-bit platforms") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayserofs: do sanity check on m->type in z_erofs_load_compact_lcluster()Chao Yu1-62/+41
[ Upstream commit 1a5223c182fdb3bb3c0ca85cec101c740f685ab6 ] All below functions will do sanity check on m->type, let's move sanity check to z_erofs_load_compact_lcluster() for cleanup. - z_erofs_map_blocks_fo - z_erofs_get_extent_compressedlen - z_erofs_get_extent_decompressedlen - z_erofs_extent_lookback Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250708110928.3110375-1-chao@kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Stable-dep-of: 2d8c7edcb661 ("erofs: unify lcn as u64 for 32-bit platforms") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayserofs: add encoded extent on-disk definitionGao Xiang3-67/+58
[ Upstream commit efb2aef569b35b415c232c4e9fdecd0e540e1f60 ] Previously, EROFS provided both (non-)compact compressed indexes to keep necessary hints for each logical block, enabling O(1) random indexing. This approach was originally designed for small compression units (e.g., 4KiB), where compressed data is strictly block-aligned via fixed-sized output compression. However, EROFS now supports big pclusters up to 1MiB and many users use large configurations to minimize image sizes. For such configurations, the total number of extents decreases significantly (e.g., only 1,024 extents for a 1GiB file using 1MiB pclusters), then runtime metadata overhead becomes negligible compared to data I/O and decoding costs. Additionally, some popular compression algorithm (mainly Zstd) still lacks native fixed-sized output compression support (although it's planned by their authors). Instead of just waiting for compressor improvements, let's adopt byte-oriented extents, allowing these compressors to retain their current methods. For example, it speeds up Zstd compression a lot: Processor: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz * 96 Dataset: enwik9 Build time Size Type Command Line 3m52.339s 266653696 FO -C524288 -zzstd,22 3m48.549s 266174464 FO -E48bit -C524288 -zzstd,22 0m12.821s 272134144 FI -E48bit -C1048576 --max-extent-bytes=1048576 -zzstd,22 0m14.528s 248987648 FO -C1048576 -zlzma,9 0m14.605s 248504320 FO -E48bit -C1048576 -zlzma,9 Encoded extents are structured as an array of `struct z_erofs_extent`, sorted by logical address in ascending order: __le32 plen // encoded length, algorithm id and flags __le32 pstart_lo // physical offset LSB __le32 pstart_hi // physical offset MSB __le32 lstart_lo // logical offset __le32 lstart_hi // logical offset MSB .. Note that prefixed reduced records can be used to minimize metadata for specific cases (e.g. lstart less than 32 bits, then 32 to 16 bytes). If the logical lengths of all encoded extents are the same, 4-byte (plen) and 8-byte (plen, pstart_lo) records can be used. Or, 16-byte (plen .. lstart_lo) and 32-byte full records have to be used instead. If 16-byte and 32-byte records are used, the total number of extents is kept in `struct z_erofs_map_header`, and binary search can be applied on them. Note that `eytzinger order` is not considerd because data sequential access is important. If 4-byte records are used, 8-byte start physical offset is between `struct z_erofs_map_header` and the `plen` array. In addition, 64-bit physical offsets can be applied with new encoded extent format to match full 48-bit block addressing. Remove redundant comments around `struct z_erofs_lcluster_index` too. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20250310095459.2620647-8-hsiangkao@linux.alibaba.com Stable-dep-of: 2d8c7edcb661 ("erofs: unify lcn as u64 for 32-bit platforms") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysf2fs: protect extension_list reading with sb_lock in f2fs_sbi_show()Yongpeng Yang1-2/+5
[ Upstream commit 5909bedbed38c558bee7cb6758ceedf9bc3a9194 ] In f2fs_sbi_show(), the extension_list, extension_count and hot_ext_count are read without holding sbi->sb_lock. If a concurrent sysfs store modifies the extension list via f2fs_update_extension_list(), the show path may read inconsistent count and array contents, potentially leading to out-of-bounds access or displaying stale data. Fix this by holding sb_lock around the entire extension list read and format operation. Fixes: b6a06cbbb5f7 ("f2fs: support hot file extension") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfs/ntfs3: terminate the cached volume label after UTF-8 conversionPengpeng Hou1-1/+6
[ Upstream commit a6cd43fe9b083fa23fe1595666d5738856cb261a ] ntfs_fill_super() loads the on-disk volume label with utf16s_to_utf8s() and stores the result in sbi->volume.label. The converted label is later exposed through ntfs3_label_show() using %s, but utf16s_to_utf8s() only returns the number of bytes written and does not add a trailing NUL. If the converted label fills the entire fixed buffer, ntfs3_label_show() can read past the end of sbi->volume.label while looking for a terminator. Terminate the cached label explicitly after a successful conversion and clamp the exact-full case to the last byte of the buffer. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysnfs/blocklayout: Fix compilation error (`make W=1`) in bl_write_pagelist()Andy Shevchenko1-3/+1
[ Upstream commit f83c8dda456ce4863f346aa26d88efa276eda35d ] Clang compiler is not happy about set but unused variable (when dprintk() is no-op): .../blocklayout/blocklayout.c:384:9: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] Remove a leftover from the previous cleanup. Fixes: 3a6fd1f004fc ("pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist") Acked-by: Anna Schumaker <anna.schumkaer@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysext4: fix possible null-ptr-deref in mbt_kunit_exit()Ye Bin1-1/+5
[ Upstream commit 22f53f08d9eb837ce69b1a07641d414aac8d045f ] There's issue as follows: # test_new_blocks_simple: failed to initialize: -12 KASAN: null-ptr-deref in range [0x0000000000000638-0x000000000000063f] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST RIP: 0010:mbt_kunit_exit+0x5e/0x3e0 [ext4_test] Call Trace: <TASK> kunit_try_run_case_cleanup+0xbc/0x100 [kunit] kunit_generic_run_threadfn_adapter+0x89/0x100 [kunit] kthread+0x408/0x540 ret_from_fork+0xa76/0xdf0 ret_from_fork_asm+0x1a/0x30 If mbt_kunit_init() init testcase failed will lead to null-ptr-deref. So add test if 'sb' is inited success in mbt_kunit_exit(). Fixes: 7c9fa399a369 ("ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20260330133035.287842-6-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysocfs2: validate group add input before cachingZhengYuan Huang1-5/+7
[ Upstream commit 70b672833f4025341c11b22c7f83778a5cd611bc ] [BUG] OCFS2_IOC_GROUP_ADD can trigger a BUG_ON in ocfs2_set_new_buffer_uptodate(): kernel BUG at fs/ocfs2/uptodate.c:509! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:ocfs2_set_new_buffer_uptodate+0x194/0x1e0 fs/ocfs2/uptodate.c:509 Code: ffffe88f 42b9fe4c 89e64889 dfe8b4df Call Trace: ocfs2_group_add+0x3f1/0x1510 fs/ocfs2/resize.c:507 ocfs2_ioctl+0x309/0x6e0 fs/ocfs2/ioctl.c:887 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 x64_sys_call+0x1144/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7bbfb55a966d [CAUSE] ocfs2_group_add() calls ocfs2_set_new_buffer_uptodate() on a user-controlled group block before ocfs2_verify_group_and_input() validates that block number. That helper is only valid for newly allocated metadata and asserts that the block is not already present in the chosen metadata cache. The code also uses INODE_CACHE(inode) even though the group descriptor belongs to main_bm_inode and later journal accesses use that cache context instead. [FIX] Validate the on-disk group descriptor before caching it, then add it to the metadata cache tracked by INODE_CACHE(main_bm_inode). Keep the validation failure path separate from the later cleanup path so we only remove the buffer from that cache after it has actually been inserted. This keeps the group buffer lifetime consistent across validation, journaling, and cleanup. Link: https://lkml.kernel.org/r/20260410020209.3786348-1-gality369@gmail.com Fixes: 7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysocfs2: validate bg_bits during freefrag scanZhengYuan Huang1-1/+17
[ Upstream commit 8f687eeed3da3012152b0f9473f578869de0cd7b ] [BUG] A crafted filesystem can trigger an out-of-bounds bitmap walk when OCFS2_IOC_INFO is issued with OCFS2_INFO_FL_NON_COHERENT. BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline] BUG: KASAN: use-after-free in _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] BUG: KASAN: use-after-free in test_bit_le include/asm-generic/bitops/le.h:21 [inline] BUG: KASAN: use-after-free in ocfs2_info_freefrag_scan_chain fs/ocfs2/ioctl.c:495 [inline] BUG: KASAN: use-after-free in ocfs2_info_freefrag_scan_bitmap fs/ocfs2/ioctl.c:588 [inline] BUG: KASAN: use-after-free in ocfs2_info_handle_freefrag fs/ocfs2/ioctl.c:662 [inline] BUG: KASAN: use-after-free in ocfs2_info_handle_request+0x1c66/0x3370 fs/ocfs2/ioctl.c:754 Read of size 8 at addr ffff888031bce000 by task syz.0.636/1435 Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xbe/0x130 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xd1/0x650 mm/kasan/report.c:482 kasan_report+0xfb/0x140 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:186 [inline] kasan_check_range+0x11c/0x200 mm/kasan/generic.c:200 __kasan_check_read+0x11/0x20 mm/kasan/shadow.c:31 instrument_atomic_read include/linux/instrumented.h:68 [inline] _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] test_bit_le include/asm-generic/bitops/le.h:21 [inline] ocfs2_info_freefrag_scan_chain fs/ocfs2/ioctl.c:495 [inline] ocfs2_info_freefrag_scan_bitmap fs/ocfs2/ioctl.c:588 [inline] ocfs2_info_handle_freefrag fs/ocfs2/ioctl.c:662 [inline] ocfs2_info_handle_request+0x1c66/0x3370 fs/ocfs2/ioctl.c:754 ocfs2_info_handle+0x18d/0x2a0 fs/ocfs2/ioctl.c:828 ocfs2_ioctl+0x632/0x6e0 fs/ocfs2/ioctl.c:913 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 ... [CAUSE] ocfs2_info_freefrag_scan_chain() uses on-disk bg_bits directly as the bitmap scan limit. The coherent path reads group descriptors through ocfs2_read_group_descriptor(), which validates the descriptor before use. The non-coherent path uses ocfs2_read_blocks_sync() instead and skips that validation, so an impossible bg_bits value can drive the bitmap walk past the end of the block. [FIX] Compute the bitmap capacity from the filesystem format with ocfs2_group_bitmap_size(), report descriptors whose bg_bits exceeds that limit, and clamp the scan to the computed capacity. This keeps the freefrag report going while avoiding reads beyond the buffer. Link: https://lkml.kernel.org/r/20260410034220.3825769-1-gality369@gmail.com Fixes: d24a10b9f8ed ("Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysocfs2: fix listxattr handling when the buffer is fullZhengYuan Huang1-2/+2
[ Upstream commit d12f558e6200b3f47dbef9331ed6d115d2410e59 ] [BUG] If an OCFS2 inode has both inline and block-based xattrs, listxattr() can return a size larger than the caller's buffer when the inline names consume that buffer exactly. kernel BUG at mm/usercopy.c:102! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:usercopy_abort+0xb7/0xd0 mm/usercopy.c:102 Call Trace: __check_heap_object+0xe3/0x120 mm/slub.c:8243 check_heap_object mm/usercopy.c:196 [inline] __check_object_size mm/usercopy.c:250 [inline] __check_object_size+0x5c5/0x780 mm/usercopy.c:215 check_object_size include/linux/ucopysize.h:22 [inline] check_copy_size include/linux/ucopysize.h:59 [inline] copy_to_user include/linux/uaccess.h:219 [inline] listxattr+0xb0/0x170 fs/xattr.c:926 filename_listxattr fs/xattr.c:958 [inline] path_listxattrat+0x137/0x320 fs/xattr.c:988 __do_sys_listxattr fs/xattr.c:1001 [inline] __se_sys_listxattr fs/xattr.c:998 [inline] __x64_sys_listxattr+0x7f/0xd0 fs/xattr.c:998 ... [CAUSE] Commit 936b8834366e ("ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().") replaced the old per-handler list accounting with ocfs2_xattr_list_entry(), but it kept using size == 0 to detect probe mode. That assumption stops being true once ocfs2_listxattr() finishes the inline-xattr pass. If the inline names fill the caller buffer exactly, the block-xattr pass runs with a non-NULL buffer and a remaining size of zero. ocfs2_xattr_list_entry() then skips the bounds check, keeps counting block names, and returns a positive size larger than the supplied buffer. [FIX] Detect probe mode by testing whether the destination buffer pointer is NULL instead of whether the remaining size is zero. That restores the pre-refactor behavior and matches the OCFS2 getxattr helpers. Once the remaining buffer reaches zero while more names are left, the block-xattr pass now returns -ERANGE instead of reporting a size larger than the allocated list buffer. Link: https://lkml.kernel.org/r/20260410040339.3837162-1-gality369@gmail.com Fixes: 936b8834366e ("ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysocfs2/dlm: fix off-by-one in dlm_match_regions() region comparisonJunrui Luo1-1/+1
[ Upstream commit 01b61e8dda9b0fdb0d4cda43de25f4e390554d7b ] The local-vs-remote region comparison loop uses '<=' instead of '<', causing it to read one entry past the valid range of qr_regions. The other loops in the same function correctly use '<'. Fix the loop condition to use '<' for consistency and correctness. Link: https://lkml.kernel.org/r/SYBPR01MB78813DA26B50EC5E01F00566AF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Fixes: ea2034416b54 ("ocfs2/dlm: Add message DLM_QUERY_REGION") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysocfs2/dlm: validate qr_numregions in dlm_match_regions()Junrui Luo1-0/+8
[ Upstream commit 7ab3fbb01bc6d79091bc375e5235d360cd9b78be ] Patch series "ocfs2/dlm: fix two bugs in dlm_match_regions()". In dlm_match_regions(), the qr_numregions field from a DLM_QUERY_REGION network message is used to drive loops over the qr_regions buffer without sufficient validation. This series fixes two issues: - Patch 1 adds a bounds check to reject messages where qr_numregions exceeds O2NM_MAX_REGIONS. The o2net layer only validates message byte length; it does not constrain field values, so a crafted message can set qr_numregions up to 255 and trigger out-of-bounds reads past the 1024-byte qr_regions buffer. - Patch 2 fixes an off-by-one in the local-vs-remote comparison loop, which uses '<=' instead of '<', reading one entry past the valid range even when qr_numregions is within bounds. This patch (of 2): The qr_numregions field from a DLM_QUERY_REGION network message is used directly as loop bounds in dlm_match_regions() without checking against O2NM_MAX_REGIONS. Since qr_regions is sized for at most O2NM_MAX_REGIONS (32) entries, a crafted message with qr_numregions > 32 causes out-of-bounds reads past the qr_regions buffer. Add a bounds check for qr_numregions before entering the loops. Link: https://lkml.kernel.org/r/SYBPR01MB7881A334D02ACEE5E0645801AF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Link: https://lkml.kernel.org/r/SYBPR01MB788166F524AD04E262E174BEAF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Fixes: ea2034416b54 ("ocfs2/dlm: Add message DLM_QUERY_REGION") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: fix use-after-free from async crypto on Qualcomm crypto engineJoshua Klinesmith1-5/+6
[ Upstream commit 3e298897f41c61450c2e7a4f457e8b2485eb35b3 ] ksmbd_crypt_message() sets a NULL completion callback on AEAD requests and does not handle the -EINPROGRESS return code from async hardware crypto engines like the Qualcomm Crypto Engine (QCE). When QCE returns -EINPROGRESS, ksmbd treats it as an error and immediately frees the request while the hardware DMA operation is still in flight. The DMA completion callback then dereferences freed memory, causing a NULL pointer crash: pc : qce_skcipher_done+0x24/0x174 lr : vchan_complete+0x230/0x27c ... el1h_64_irq+0x68/0x6c ksmbd_free_work_struct+0x20/0x118 [ksmbd] ksmbd_exit_file_cache+0x694/0xa4c [ksmbd] Use the standard crypto_wait_req() pattern with crypto_req_done() as the completion callback, matching the approach used by the SMB client in fs/smb/client/smb2ops.c. This properly handles both synchronous engines (immediate return) and async engines (-EINPROGRESS followed by callback notification). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Link: https://github.com/openwrt/openwrt/issues/21822 Signed-off-by: Joshua Klinesmith <joshuaklinesmith@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysgfs2: prevent NULL pointer dereference during unmountAndreas Gruenbacher1-2/+3
[ Upstream commit 74b4dbb946060a3233604d91859a9abd3708141d ] When flushing out outstanding glock work during an unmount, gfs2_log_flush() can be called when sdp->sd_jdesc has already been deallocated and sdp->sd_jdesc is NULL. Commit 35264909e9d1 ("gfs2: Fix NULL pointer dereference in gfs2_log_flush") added a check for that to gfs2_log_flush() itself, but it missed the sdp->sd_jdesc dereference in gfs2_log_release(). Fix that. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202604071139.HNJiCaAi-lkp@intel.com/ Fixes: 35264909e9d1 ("gfs2: Fix NULL pointer dereference in gfs2_log_flush") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysgfs2: add some missing log lockingAndreas Gruenbacher1-8/+20
[ Upstream commit fe2c8d051150b90b3ccb85f89e3b1d636cb88ec8 ] Function gfs2_logd() calls the log flushing functions gfs2_ail1_start(), gfs2_ail1_wait(), and gfs2_ail1_empty() without holding sdp->sd_log_flush_lock, but these functions require exclusion against concurrent transactions. To fix that, add a non-locking __gfs2_log_flush() function. Then, in gfs2_logd(), take sdp->sd_log_flush_lock before calling the above mentioned log flushing functions and __gfs2_log_flush(). Fixes: 5e4c7632aae1c ("gfs2: Issue revokes more intelligently") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysquota: Fix race of dquot_scan_active() with quota deactivationJan Kara1-8/+30
[ Upstream commit e93ab401da4b2e2c1b8ef2424de2f238d51c8b2d ] dquot_scan_active() can race with quota deactivation in quota_release_workfn() like: CPU0 (quota_release_workfn) CPU1 (dquot_scan_active) ============================== ============================== spin_lock(&dq_list_lock); list_replace_init( &releasing_dquots, &rls_head); /* dquot X on rls_head, dq_count == 0, DQ_ACTIVE_B still set */ spin_unlock(&dq_list_lock); synchronize_srcu(&dquot_srcu); spin_lock(&dq_list_lock); list_for_each_entry(dquot, &inuse_list, dq_inuse) { /* finds dquot X */ dquot_active(X) -> true atomic_inc(&X->dq_count); } spin_unlock(&dq_list_lock); spin_lock(&dq_list_lock); dquot = list_first_entry(&rls_head); WARN_ON_ONCE(atomic_read(&dquot->dq_count)); The problem is not only a cosmetic one as under memory pressure the caller of dquot_scan_active() can end up working on freed dquot. Fix the problem by making sure the dquot is removed from releasing list when we acquire a reference to it. Fixes: 869b6ea1609f ("quota: Fix slow quotaoff") Reported-by: Sam Sun <samsun1006219@gmail.com> Link: https://lore.kernel.org/all/CAEkJfYPTt3uP1vAYnQ5V2ZWn5O9PLhhGi5HbOcAzyP9vbXyjeg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfanotify: call fanotify_events_supported() before path_permission() and ↵Ondrej Mosnacek1-15/+10
security_path_notify() [ Upstream commit 66052a768d4726a31e939b5ac902f2b0b452c8d5 ] The latter trigger LSM (e.g. SELinux) checks, which will log a denial when permission is denied, so it's better to do them after validity checks to avoid logging a denial when the operation would fail anyway. Fixes: 0b3b094ac9a7 ("fanotify: Disallow permission events for proc filesystem") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20260216150625.793013-3-omosnace@redhat.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfdget(), trivial conversionsAl Viro12-137/+74
[ Upstream commit 6348be02eead77bdd1562154ed6b3296ad3b3750 ] fdget() is the first thing done in scope, all matching fdput() are immediately followed by leaving the scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Stable-dep-of: 66052a768d47 ("fanotify: call fanotify_events_supported() before path_permission() and security_path_notify()") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysgfs2: Call unlock_new_inode before d_instantiateAndreas Gruenbacher1-2/+1
[ Upstream commit 2ff7cf7e0640ff071ebc5c7e3dc2df024a7c91e6 ] As Neil Brown describes in detail in the link referenced below, new inodes must be unlocked before they can be instantiated. An even better fix is to use d_instantiate_new(), which combines d_instantiate() and unlock_new_inode(). Fixes: 3d36e57ff768 ("gfs2: gfs2_create_inode rework") Reported-by: syzbot+0ea5108a1f5fb4fcc2d8@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-fsdevel/177153754005.8396.8777398743501764194@noble.neil.brown.name/ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysdebugfs: fix placement of EXPORT_SYMBOL_GPL for debugfs_create_str()Gui-Dong Han1-1/+1
[ Upstream commit 4afc929c0f74c4f22b055a82b371d50586da58ca ] The EXPORT_SYMBOL_GPL() for debugfs_create_str was placed incorrectly away from the function definition. Move it immediately below the debugfs_create_str() function where it belongs. Fixes: d60b59b96795 ("debugfs: Export debugfs_create_str symbol") Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20260323085930.88894-3-hanguidong02@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysdebugfs: check for NULL pointer in debugfs_create_str()Gui-Dong Han1-1/+4
[ Upstream commit 31de83980d3764d784f79ff1bc93c42b324f4013 ] Passing a NULL pointer to debugfs_create_str() leads to a NULL pointer dereference when the debugfs file is read. Following upstream discussions, forbid the creation of debugfs string files with NULL pointers. Add a WARN_ON() to expose offending callers and return early. Fixes: 9af0440ec86e ("debugfs: Implement debugfs_create_str()") Reported-by: yangshiguang <yangshiguang@xiaomi.com> Closes: https://lore.kernel.org/lkml/2025122221-gag-malt-75ba@gregkh/ Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20260323085930.88894-2-hanguidong02@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysbtrfs: fix deadlock between reflink and transaction commit when using ↵Filipe Manana1-0/+45
flushoncommit [ Upstream commit b48c980b6a7e409050bb3067165db31cc6205e3e ] When using the flushoncommit mount option, we can have a deadlock between a transaction commit and a reflink operation that copied an inline extent to an offset beyond the current i_size of the destination node. The deadlock happens like this: 1) Task A clones an inline extent from inode X to an offset of inode Y that is beyond Y's current i_size. This means we copied the inline extent's data to a folio of inode Y that is beyond its EOF, using a call to copy_inline_to_page(); 2) Task B starts a transaction commit and calls btrfs_start_delalloc_flush() to flush delalloc; 3) The delalloc flushing sees the new dirty folio of inode Y and when it attempts to flush it, it ends up at extent_writepage() and sees that the offset of the folio is beyond the i_size of inode Y, so it attempts to invalidate the folio by calling folio_invalidate(), which ends up at btrfs' folio invalidate callback - btrfs_invalidate_folio(). There it tries to lock the folio's range in inode Y's extent io tree, but it blocks since it's currently locked by task A - during a reflink we lock the inodes and the source and destination ranges after flushing all delalloc and waiting for ordered extent completion - after that we don't expect to have dirty folios in the ranges, the exception is if we have to copy an inline extent's data (because the destination offset is not zero); 4) Task A then attempts to start a transaction to update the inode item, and then it's blocked since the current transaction is in the TRANS_STATE_COMMIT_START state. Therefore task A has to wait for the current transaction to become unblocked (its state >= TRANS_STATE_UNBLOCKED). So task A is waiting for the transaction commit done by task B, and the later waiting on the extent lock of inode Y that is currently held by task A. Syzbot recently reported this with the following stack traces: INFO: task kworker/u8:7:1053 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u8:7 state:D stack:23520 pid:1053 tgid:1053 ppid:2 task_flags:0x4208060 flags:0x00080000 Workqueue: writeback wb_workfn (flush-btrfs-46) Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wait_extent_bit fs/btrfs/extent-io-tree.c:811 [inline] btrfs_lock_extent_bits+0x59c/0x700 fs/btrfs/extent-io-tree.c:1914 btrfs_lock_extent fs/btrfs/extent-io-tree.h:152 [inline] btrfs_invalidate_folio+0x43d/0xc40 fs/btrfs/inode.c:7704 extent_writepage fs/btrfs/extent_io.c:1852 [inline] extent_write_cache_pages fs/btrfs/extent_io.c:2580 [inline] btrfs_writepages+0x12ff/0x2440 fs/btrfs/extent_io.c:2713 do_writepages+0x32e/0x550 mm/page-writeback.c:2554 __writeback_single_inode+0x133/0x11a0 fs/fs-writeback.c:1750 writeback_sb_inodes+0x995/0x19d0 fs/fs-writeback.c:2042 wb_writeback+0x456/0xb70 fs/fs-writeback.c:2227 wb_do_writeback fs/fs-writeback.c:2374 [inline] wb_workfn+0x41a/0xf60 fs/fs-writeback.c:2414 process_one_work kernel/workqueue.c:3276 [inline] process_scheduled_works+0xb6e/0x18c0 kernel/workqueue.c:3359 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3440 kthread+0x388/0x470 kernel/kthread.c:436 ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> INFO: task syz.4.64:6910 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz.4.64 state:D stack:22752 pid:6910 tgid:6905 ppid:5944 task_flags:0x400140 flags:0x00080002 Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wait_current_trans+0x39f/0x590 fs/btrfs/transaction.c:535 start_transaction+0x6a7/0x1650 fs/btrfs/transaction.c:705 clone_copy_inline_extent fs/btrfs/reflink.c:299 [inline] btrfs_clone+0x128a/0x24d0 fs/btrfs/reflink.c:529 btrfs_clone_files+0x271/0x3f0 fs/btrfs/reflink.c:750 btrfs_remap_file_range+0x76b/0x1320 fs/btrfs/reflink.c:903 vfs_copy_file_range+0xda7/0x1390 fs/read_write.c:1600 __do_sys_copy_file_range fs/read_write.c:1683 [inline] __se_sys_copy_file_range+0x2fb/0x480 fs/read_write.c:1650 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f73afc799 RSP: 002b:00007f5f7315e028 EFLAGS: 00000246 ORIG_RAX: 0000000000000146 RAX: ffffffffffffffda RBX: 00007f5f73d75fa0 RCX: 00007f5f73afc799 RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000005 RBP: 00007f5f73b92c99 R08: 0000000000000863 R09: 0000000000000000 R10: 00002000000000c0 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5f73d76038 R14: 00007f5f73d75fa0 R15: 00007fff138a5068 </TASK> INFO: task syz.4.64:6975 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz.4.64 state:D stack:24736 pid:6975 tgid:6905 ppid:5944 task_flags:0x400040 flags:0x00080002 Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wb_wait_for_completion+0x3e8/0x790 fs/fs-writeback.c:227 __writeback_inodes_sb_nr+0x24c/0x2d0 fs/fs-writeback.c:2838 try_to_writeback_inodes_sb+0x9a/0xc0 fs/fs-writeback.c:2886 btrfs_start_delalloc_flush fs/btrfs/transaction.c:2175 [inline] btrfs_commit_transaction+0x82e/0x31a0 fs/btrfs/transaction.c:2364 btrfs_ioctl+0xca7/0xd00 fs/btrfs/ioctl.c:5206 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xff/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f73afc799 RSP: 002b:00007f5f7313d028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f5f73d76090 RCX: 00007f5f73afc799 RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000004 RBP: 00007f5f73b92c99 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5f73d76128 R14: 00007f5f73d76090 R15: 00007fff138a5068 </TASK> Fix this by updating the i_size of the destination inode of a reflink operation after we copy an inline extent's data to an offset beyond the i_size and before attempting to start a transaction to update the inode's item. Reported-by: syzbot+63056bf627663701bbbf@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/69bba3fe.050a0220.227207.002f.GAE@google.com/ Fixes: 05a5a7621ce6 ("Btrfs: implement full reflink support for inline extents") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysbtrfs: pass struct btrfs_inode to clone_copy_inline_extent()David Sterba1-14/+14
[ Upstream commit 65a66afd1ee5b2770fde296663baa0f79af56bc7 ] Pass a struct btrfs_inode to clone_copy_inline_extent() as it's an internal interface, allowing to remove some use of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: b48c980b6a7e ("btrfs: fix deadlock between reflink and transaction commit when using flushoncommit") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayspstore/ram: fix resource leak when ioremap() failsCole Leavitt1-0/+4
[ Upstream commit 2ddb69f686ef7a621645e97fc7329c50edf5d0e5 ] In persistent_ram_iomap(), ioremap() or ioremap_wc() may return NULL on failure. Currently, if this happens, the function returns NULL without releasing the memory region acquired by request_mem_region(). This leads to a resource leak where the memory region remains reserved but unusable. Additionally, the caller persistent_ram_buffer_map() handles NULL correctly by returning -ENOMEM, but without this check, a NULL return combined with request_mem_region() succeeding leaves resources in an inconsistent state. This is the ioremap() counterpart to commit 05363abc7625 ("pstore: ram_core: fix incorrect success return when vmap() fails") which fixed a similar issue in the vmap() path. Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory") Signed-off-by: Cole Leavitt <cole@unwrap.rs> Link: https://patch.msgid.link/20260225235406.11790-1-cole@unwrap.rs Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysnilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()Deepanshu Kartikey1-0/+6
[ Upstream commit be3e5d10643d3be1cbac9d9939f220a99253f980 ] nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead blocks by comparing it with the current block number bd_blocknr. If they differ, the block is considered dead and skipped. However, bd_oblocknr should never be 0 since block 0 typically stores the primary superblock and is never a valid GC target block. A corrupted ioctl request with bd_oblocknr set to 0 causes the comparison to incorrectly match when the lookup returns -ENOENT and sets bd_blocknr to 0, bypassing the dead block check and calling nilfs_bmap_mark() on a non-existent block. This causes nilfs_btree_do_lookup() to return -ENOENT, triggering the WARN_ON(ret == -ENOENT). Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at the beginning of each iteration. [ryusuke: slightly modified the commit message and comments for accuracy] Fixes: 7942b919f732 ("nilfs2: ioctl operations") Reported-by: syzbot+98a040252119df0506f8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=98a040252119df0506f8 Suggested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Reported-by: syzbot+466a45fcfb0562f5b9a0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=466a45fcfb0562f5b9a0 Cc: Junjie Cao <junjie.cao@linux.dev> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfs/mbcache: cancel shrink work before destroying the cacheHyungJung Joo1-0/+1
[ Upstream commit d227786ab1119669df4dc333a61510c52047cce4 ] mb_cache_destroy() calls shrinker_free() and then frees all cache entries and the cache itself, but it does not cancel the pending c_shrink_work work item first. If mb_cache_entry_create() schedules c_shrink_work via schedule_work() and the work item is still pending or running when mb_cache_destroy() runs, mb_cache_shrink_worker() will access the cache after its memory has been freed, causing a use-after-free. This is only reachable by a privileged user (root or CAP_SYS_ADMIN) who can trigger the last put of a mounted ext2/ext4/ocfs2 filesystem. Cancel the work item with cancel_work_sync() before calling shrinker_free(), ensuring the worker has finished and will not be rescheduled before the cache is torn down. Fixes: c2f3140fe2ec ("mbcache2: limit cache size") Signed-off-by: Hyungjung Joo <jhj140711@gmail.com> Link: https://patch.msgid.link/20260317054556.1821600-1-jhj140711@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>