summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
6 daysnetfs: Fix potential uninitialised var in netfs_extract_user_iter()David Howells1-1/+1
commit 7e3d8db899d54af39fafb2eb3392b0cdae9973b5 upstream. In netfs_extract_user_iter(), if it's given a zero-length iterator, it will fall through the loop without setting ret, and so the error handling behaviour will be undefined, depending on whether ret happens to be negative. The value of ret then propagates back up the callstack. Fix this by presetting ret to 0. Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-9-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daystracefs: Removed unused 'ret' variable in eventfs_iterate()Steven Rostedt1-2/+0
commit 43cec30c44764c4b1401fdeb48bfd18c3fc7eff8 upstream. Moving to guard() usage removed the need of using the 'ret' variable but it wasn't removed. As it was set to zero, the compiler in use didn't warn (although some compilers do). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260414110344.75c0663f@robin Fixes: 4d9b262031f ("eventfs: Simplify code using guard()s") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202604100111.AAlbQKmK-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayssmb: client: Use FullSessionKey for AES-256 encryption key derivationPiyush Sachdeva2-8/+26
[ Upstream commit 5be7a0cef3229fb3b63a07c0d289daf752545424 ] When Kerberos authentication is used with AES-256 encryption (AES-256-CCM or AES-256-GCM), the SMB3 encryption and decryption keys must be derived using the full session key (Session.FullSessionKey) rather than just the first 16 bytes (Session.SessionKey). Per MS-SMB2 section 3.2.5.3.1, when Connection.Dialect is "3.1.1" and Connection.CipherId is AES-256-CCM or AES-256-GCM, Session.FullSessionKey must be set to the full cryptographic key from the GSS authentication context. The encryption and decryption key derivation (SMBC2SCipherKey, SMBS2CCipherKey) must use this FullSessionKey as the KDF input. The signing key derivation continues to use Session.SessionKey (first 16 bytes) in all cases. Previously, generate_key() hardcoded SMB2_NTLMV2_SESSKEY_SIZE (16) as the HMAC-SHA256 key input length for all derivations. When Kerberos with AES-256 provides a 32-byte session key, the KDF for encryption/decryption was using only the first 16 bytes, producing keys that did not match the server's, causing mount failures with sec=krb5 and require_gcm_256=1. Add a full_key_size parameter to generate_key() and pass the appropriate size from generate_smb3signingkey(): - Signing: always SMB2_NTLMV2_SESSKEY_SIZE (16 bytes) - Encryption/Decryption: ses->auth_key.len when AES-256, otherwise 16 Also fix cifs_dump_full_key() to report the actual session key length for AES-256 instead of hardcoded CIFS_SESS_KEY_SIZE, so that userspace tools like Wireshark receive the correct key for decryption. Cc: <stable@vger.kernel.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com> Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayseventfs: Use list_add_tail_rcu() for SRCU-protected children listDavid Carlier1-1/+1
[ Upstream commit f67950b2887fa10df50c4317a1fe98a65bc6875b ] Commit d2603279c7d6 ("eventfs: Use list_del_rcu() for SRCU protected list variable") converted the removal side to pair with the list_for_each_entry_srcu() walker in eventfs_iterate(). The insertion in eventfs_create_dir() was left as a plain list_add_tail(), which on weakly-ordered architectures can expose a new entry to the SRCU reader before its list pointers and fields are observable. Use list_add_tail_rcu() so the publication pairs with the existing list_del_rcu() and list_for_each_entry_srcu(). Fixes: 43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260418152251.199343-1-devnexen@gmail.com Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayseventfs: Simplify code using guard()sSteven Rostedt1-60/+36
[ Upstream commit 4d9b262031ffef203243e53577a90ae6e1090e67 ] Use guard(mutex), scoped_guard(mutex) and guard(src) to simplify the code and remove a lot of the jumps to "out:" labels. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250604151625.250d13e1@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Stable-dep-of: f67950b2887f ("eventfs: Use list_add_tail_rcu() for SRCU-protected children list") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysnfsd: update mtime/ctime on COPY in presence of delegated attributesOlga Kornievskaia2-1/+11
commit 4183cf383b6faec17a0882b84cd2d901dba62b16 upstream. When delegated attributes are given on open, the file is opened with NOCMTIME and modifying operations do not update mtime/ctime as to not get out-of-sync with the client's delegated view. However, for COPY operation, the server should update its view of mtime/ctime and reflect that in any GETATTR queries. Fixes: e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysnfsd: update mtime/ctime on CLONE in presense of delegated attributesOlga Kornievskaia3-15/+33
commit 2863bac7f49c4acd80a048ce52506a2b9c8db015 upstream. When delegated attributes are given on open, the file is opened with NOCMTIME and modifying operations do not update mtime/ctime as to not get out-of-sync with the client's delegated view. However, for CLONE operation, the server should update its view of mtime/ctime and reflect that in any GETATTR queries. Fixes: e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysnfsd: fix file change detection in CB_GETATTRScott Mayhew1-5/+8
commit 304d81a2fbf2b454def4debcb38ea173911b72cd upstream. RFC 8881, section 10.4.3 doesn't say anything about caching the file size in the delegation record, nor does it say anything about comparing a cached file size with the size reported by the client in the CB_GETATTR reply for the purpose of determining if the client holds modified data for the file. What section 10.4.3 of RFC 8881 does say is that the server should compare the *current* file size with the size reported by the client holding the delegation in the CB_GETATTR reply, and if they differ to treat it as a modification regardless of the change attribute retrieved via the CB_GETATTR. Doing otherwise would cause the server to believe the client holding the delegation has a modified version of the file, even if the client flushed the modifications to the server prior to the CB_GETATTR. This would have the added side effect of subsequent CB_GETATTRs causing updates to the mtime, ctime, and change attribute even if the client holding the delegation makes no further updates to the file. Modify nfsd4_deleg_getattr_conflict() to obtain the current file size via i_size_read(). Retain the ncf_cur_fsize field, since it's a convenient way to return the file size back to nfsd4_encode_fattr4(), but don't use it for the purpose of detecting file changes. Remove the unnecessary initialization of ncf_cur_fsize in nfs4_open_delegation(). Also, if we recall the delegation (because the client didn't respond to the CB_GETATTR), then skip the logic that checks the nfs4_cb_fattr fields. Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Cc: stable@vger.kernel.org Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysnfsd: fix GET_DIR_DELEGATION when VFS leases are disabledOlga Kornievskaia1-4/+0
commit b0bf14546bcefa4ea49f5efcd7db2a99f0cabde9 upstream. When leases are disabled on the server, running xfstest generic/309 leads to an error because GET_DIR_DELEGATION returns EINVAL. nfsd_get_dir_deleg() can fail in several ways: like memory allocation and unable to get a lease because either leases are disable or it's already held. Currently only the condition "already held" is translated to returning directory-delegation-is-unavailable error. However, other failure conditions are likely temporary and thus should result in the same kind of error. Fixes: 8b99f6a8c116 ("nfsd: wire up GET_DIR_DELEGATION handling") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysnetfs: fix error handling in netfs_extract_user_iter()Paulo Alcantara1-3/+10
commit 0aad5704c6b4d14007d4eab15883e8524e4310f4 upstream. In netfs_extract_user_iter(), if iov_iter_extract_pages() failed to extract user pages, bail out on -ENOMEM, otherwise return the error code only if @npages == 0, allowing short DIO reads and writes to be issued. This fixes mmapstress02 from LTP tests against CIFS. Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-10-dhowells@redhat.com Cc: netfs@lists.linux.dev Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysxfs: fix memory leak on error in xfs_alloc_zone_info()Wilfred Mallawa1-1/+1
commit 592975da8c3ca87b043077e6eafa37665eae7936 upstream. Currently, the 0th index of the zi_used_bucket_bitmap array is not freed on error due to the pre-decrement then evaluate semantic of the while loop used in xfs_alloc_zone_info(). Fix it by allowing for the i == 0 case to be covered. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Cc: stable@vger.kernel.org # v6.15 Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysceph: put folios not suitable for writebackHristo Venev1-0/+2
commit 544576f0f05c4a759806acddfaaeb686f14fb4b0 upstream. The batch holds references to the folios (see `filemap_get_folios`, `folio_batch_release`), so we need to `folio_put` the folios we remove. Tested on v6.18. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/74156 Signed-off-by: Hristo Venev <hristo@venev.name> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysceph: fix BUG_ON in __ceph_build_xattrs_blob() due to stale blob sizeViacheslav Dubeyko1-0/+16
commit 0c22d9511cbde746622f8e4c11aaa63fe76d45f9 upstream. The generic/642 test-case can reproduce the kernel crash: [40243.605254] ------------[ cut here ]------------ [40243.605956] kernel BUG at fs/ceph/xattr.c:918! [40243.607142] Oops: invalid opcode: 0000 [#1] SMP PTI [40243.608067] CPU: 7 UID: 0 PID: 498762 Comm: kworker/7:1 Not tainted 7.0.0-rc7+ #3 PREEMPT(full) [40243.609700] Hardware name: QEMU Ubuntu 25.10 PC v2 (i440FX + PIIX, + 10.1 machine, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [40243.611820] Workqueue: ceph-msgr ceph_con_workfn [40243.612715] RIP: 0010:__ceph_build_xattrs_blob+0x1b8/0x1e0 [40243.613731] Code: 0f 84 82 fe ff ff e9 cf 8e 56 ff 48 8d 65 e8 31 c0 5b 41 5c 41 5d 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b 4c 8b 62 08 41 8b 85 24 07 00 00 49 83 c4 04 41 89 44 24 fc [40243.616888] RSP: 0018:ffffcc80c4d4b688 EFLAGS: 00010287 [40243.617773] RAX: 0000000000010026 RBX: 0000000000000001 RCX: 0000000000000000 [40243.618928] RDX: ffff8a773798dee0 RSI: 0000000000000000 RDI: 0000000000000000 [40243.620158] RBP: ffffcc80c4d4b6a0 R08: 0000000000000000 R09: 0000000000000000 [40243.621573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a75f3b58000 [40243.622907] R13: ffff8a75f3b58000 R14: 0000000000000080 R15: 000000000000bffd [40243.624054] FS: 0000000000000000(0000) GS:ffff8a787d1b4000(0000) knlGS:0000000000000000 [40243.625331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [40243.626269] CR2: 000072f390b623c0 CR3: 000000011c02a003 CR4: 0000000000372ef0 [40243.627408] Call Trace: [40243.627839] <TASK> [40243.628188] __prep_cap+0x3fd/0x4a0 [40243.628789] ? do_raw_spin_unlock+0x4e/0xe0 [40243.629474] ceph_check_caps+0x46a/0xc80 [40243.630094] ? __lock_acquire+0x4a2/0x2650 [40243.630773] ? find_held_lock+0x31/0x90 [40243.631347] ? handle_cap_grant+0x79f/0x1060 [40243.632068] ? lock_release+0xd9/0x300 [40243.632696] ? __mutex_unlock_slowpath+0x3e/0x340 [40243.633429] ? lock_release+0xd9/0x300 [40243.634052] handle_cap_grant+0xcf6/0x1060 [40243.634745] ceph_handle_caps+0x122b/0x2110 [40243.635415] mds_dispatch+0x5bd/0x2160 [40243.636034] ? ceph_con_process_message+0x65/0x190 [40243.636828] ? lock_release+0xd9/0x300 [40243.637431] ceph_con_process_message+0x7a/0x190 [40243.638184] ? kfree+0x311/0x4f0 [40243.638749] ? kfree+0x311/0x4f0 [40243.639268] process_message+0x16/0x1a0 [40243.639915] ? sg_free_table+0x39/0x90 [40243.640572] ceph_con_v2_try_read+0xf58/0x2120 [40243.641255] ? lock_acquire+0xc8/0x300 [40243.641863] ceph_con_workfn+0x151/0x820 [40243.642493] process_one_work+0x22f/0x630 [40243.643093] ? process_one_work+0x254/0x630 [40243.643770] worker_thread+0x1e2/0x400 [40243.644332] ? __pfx_worker_thread+0x10/0x10 [40243.645020] kthread+0x109/0x140 [40243.645560] ? __pfx_kthread+0x10/0x10 [40243.646125] ret_from_fork+0x3f8/0x480 [40243.646752] ? __pfx_kthread+0x10/0x10 [40243.647316] ? __pfx_kthread+0x10/0x10 [40243.647919] ret_from_fork_asm+0x1a/0x30 [40243.648556] </TASK> [40243.648902] Modules linked in: overlay hctr2 libpolyval chacha libchacha adiantum libnh libpoly1305 essiv intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit kvm_intel kvm irqbypass joydev ghash_clmulni_intel aesni_intel rapl input_leds mac_hid psmouse vga16fb serio_raw vgastate floppy i2c_piix4 pata_acpi bochs qemu_fw_cfg i2c_smbus sch_fq_codel rbd dm_crypt msr parport_pc ppdev lp parport efi_pstore [40243.654766] ---[ end trace 0000000000000000 ]--- Commit d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size") moved the required_blob_size computation to before the __build_xattrs() call, introducing a race. __build_xattrs() releases and reacquires i_ceph_lock during execution. In that window, handle_cap_grant() may update i_xattrs.blob with a newer MDS-provided blob and bump i_xattrs.version. When __build_xattrs() detects that index_version < version, it destroys and rebuilds the entire xattr rb-tree from the new blob, potentially increasing count, names_size, and vals_size. The prealloc_blob size check that follows still uses the stale required_blob_size computed before the rebuild, so it passes even when prealloc_blob is too small for the now-larger tree. After __set_xattr() adds one more xattr on top, __ceph_build_xattrs_blob() is called from the cap flush path and hits: BUG_ON(need > ci->i_xattrs.prealloc_blob->alloc_len); Fix this by recomputing required_blob_size after __build_xattrs() returns, using the current tree state. Also re-validate against m_max_xattr_size to fall back to the sync path if the rebuilt tree now exceeds the MDS limit. Cc: stable@vger.kernel.org Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size") Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysceph: fix a buffer leak in __ceph_setxattr()Viacheslav Dubeyko1-0/+1
commit 5d3cc36b4e77a27ce7b686b7c59c7072bcb3fa8e upstream. The old_blob in __ceph_setxattr() can store ci->i_xattrs.prealloc_blob value during the retry. However, it is never called the ceph_buffer_put() for the old_blob object. This patch fixes the issue of the buffer leak. Cc: stable@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysbtrfs: only release the dirty pages io tree after successful writesQu Wenruo2-5/+5
commit 4066c55e109475a06d18a1f127c939d551211956 upstream. [WARNING] With extra warning on dirty extent buffers at umount (aka, the next patch in the series), test case generic/388 can trigger the following warning about dirty extent buffers at unmount time: BTRFS critical (device dm-2 state E): emergency shutdown BTRFS error (device dm-2 state E): error while writing out transaction: -30 BTRFS warning (device dm-2 state E): Skipping commit of aborted transaction. BTRFS error (device dm-2 state EA): Transaction 9 aborted (error -30) BTRFS: error (device dm-2 state EA) in cleanup_transaction:2068: errno=-30 Readonly filesystem BTRFS info (device dm-2 state EA): forced readonly BTRFS info (device dm-2 state EA): last unmount of filesystem 4fbf2e15-f941-49a0-bc7c-716315d2777c ------------[ cut here ]------------ WARNING: disk-io.c:3311 at invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs], CPU#8: umount/914368 CPU: 8 UID: 0 PID: 914368 Comm: umount Tainted: G OE 7.1.0-rc1-custom+ #372 PREEMPT(full) 2de38db8d1deae71fde295430a0ff3ab98ccf596 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs] Call Trace: <TASK> close_ctree+0x52e/0x574 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd] generic_shutdown_super+0x89/0x1a0 kill_anon_super+0x16/0x40 btrfs_kill_super+0x16/0x20 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd] deactivate_locked_super+0x2d/0xb0 cleanup_mnt+0xdc/0x140 task_work_run+0x5a/0xa0 exit_to_user_mode_loop+0x123/0x4b0 do_syscall_64+0x243/0x7c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> ---[ end trace 0000000000000000 ]--- BTRFS warning (device dm-2 state EA): unable to release extent buffer 30539776 owner 9 gen 9 refs 2 flags 0x7 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30621696 owner 257 gen 9 refs 2 flags 0x7 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30638080 owner 258 gen 9 refs 2 flags 0x7 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30654464 owner 7 gen 9 refs 2 flags 0x7 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30703616 owner 2 gen 9 refs 2 flags 0x7 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30720000 owner 10 gen 9 refs 2 flags 0x7 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30736384 owner 4 gen 9 refs 2 flags 0x7 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30752768 owner 11 gen 9 refs 2 flags 0x7 I'm using a stripped down version, which seems to trigger the warning more reliably: _fsstress_pid="" workload() { dmesg -C mkfs.btrfs -f -K $dev > /dev/null echo 1 > /sys/kernel/debug/clear_warn_once mount $dev $mnt $fsstress -w -n 1024 -p 4 -d $mnt & _fsstress_pid=$! sleep 0 $godown $mnt pkill --echo -PIPE fsstress > /dev/null wait $_fsstress_pid unset _fsstress_pid umount $mnt if dmesg | grep -q "WARNING"; then fail fi } for (( i = 0; i < $runtime; i++ )); do echo "=== $i/$runtime ===" workload done [CAUSE] Inside btrfs_write_and_wait_transaction(), we first try to write all dirty ebs, then wait for them to finish. After that we call btrfs_extent_io_tree_release() to free all extent states from dirty_pages io tree. However if we hit an error from btrfs_write_marked_extent(), then we still call btrfs_extent_io_tree_release() to clear that dirty_pages io tree, which may contain dirty records that we haven't yet submitted. Furthermore, the later transaction cleanup path will utilize that dirty_pages io tree to properly cleanup those dirty ebs, but since it's already empty, no dirty ebs are properly cleaned up, thus will later trigger the warnings inside invalidate_btree_folios(). [FIX] Normally such dirty ebs won't cause problems, as when the iput() is called on the btree inode, the dirty ebs will be forcibly written back, and since the fs is already in an error status, such writeback will not reach disk and finish immediately. But it's still better to get rid of such dirty ebs, if we ended up with dirty ebs but the fs is not in an error status, then such writeback at iput() time will be too late, as all workers are already stopped but writeback will utilize workers, which will lead to NULL pointer dereferences. Instead of unconditionally calling btrfs_extent_io_tree_release(), only call it if btrfs_write_and_wait_transaction() finished successfully, so that @dirty_pages extent io tree is kept untouched for transaction cleanup. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayssmb/client: fix possible infinite loop and oob read in symlink_data()Ye Bin1-0/+3
commit 7d9a7f1f96cd617ee9e75bb22217c709038e26b8 upstream. On 32-bit architectures, the infinite loop is as follows: len = p->ErrorDataLength == 0xfffffff8 u8 *next = p->ErrorContextData + len next == p On 32-bit architectures, the out-of-bounds read is as follows: len = p->ErrorDataLength == 0xfffffff0 u8 *next = p->ErrorContextData + len next == (u8 *)p - 8 Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+") Cc: stable@vger.kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysbtrfs: fix double-decrement of bytes_may_use in submit_one_async_extent()Mark Harmstone1-1/+1
[ Upstream commit 82323b1a7088b7a5c3e528a5d634bff447fa286f ] submit_one_async_extent() calls btrfs_reserve_extent(), which decrements bytes_may_use. If the call btrfs_create_io_em() fails, we jump to out_free_reserve, which calls extent_clear_unlock_delalloc(). Because we're specifying EXTENT_DO_ACCOUNTING, i.e. EXTENT_CLEAR_META_RESV | EXTENT_CLEAR_DATA_RESV, this decreases bytes_may_use again. This can lead to problems later on, as an initial write can fail only for the writeback to silently ENOSPC. Fix this by replacing EXTENT_DO_ACCOUNTING with EXTENT_CLEAR_META_RESV. This parallels a4fe134fc1d8eb ("btrfs: fix a double release on reserved extents in cow_one_range()"), which is the same fix in cow_one_range(). Fixes: 151a41bc46df ("Btrfs: fix what bits we clear when erroring out from delalloc") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysbtrfs: don't clobber errors in add_remap_tree_entries()Mark Harmstone1-1/+1
[ Upstream commit 44366af74061793ee5ceef455a4f0e465892d0de ] In add_remap_tree_entries(), we only process a certain number of entries at a time, meaning we may need to loop. But because we weren't checking the return value of btrfs_insert_empty_items() within the loop, this meant that if the last iteration of the loop succeeded but a previous iteration failed, we were erroneously returning 0. Fix this by breaking the loop early if btrfs_insert_empty_items() fails. Fixes: b56f35560b82 ("btrfs: handle setting up relocation of block group with remap-tree") Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysbtrfs: fix bytes_may_use leak in do_remap_reloc_trans()Mark Harmstone1-0/+6
[ Upstream commit 9b8824533d75fb199a3fb0f6147ffcca64b5caf8 ] If the call to btrfs_reserve_extent() in do_remap_reloc_trans() returns a smaller extent than we asked for, currently we're not undoing the bytes_may_use change that we made. Fix this by calling btrfs_space_info_update_bytes_may_use() again for the difference. Fixes: fd6594b1446c ("btrfs: replace identity remaps with actual remaps when doing relocations") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysbtrfs: fix bytes_may_use leak in move_existing_remap()Mark Harmstone1-0/+6
[ Upstream commit 68a135013bf73dfd6a277f76fc4e088b0f3dfa79 ] If the call to btrfs_reserve_extent() in move_existing_remap() returns a smaller extent than we asked for, currently we're not undoing the bytes_may_use change that we made. Fix this by calling btrfs_space_info_update_bytes_may_use() again for the difference. Fixes: bbea42dfb91f ("btrfs: move existing remaps before relocating block group") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfsnotify: fix inode reference leak in fsnotify_recalc_mask()Amir Goldstein1-3/+36
[ Upstream commit 4aca914ac152f5d055ddcb36704d1e539ac08977 ] fsnotify_recalc_mask() fails to handle the return value of __fsnotify_recalc_mask(), which may return an inode pointer that needs to be released via fsnotify_drop_object() when the connector's HAS_IREF flag transitions from set to cleared. This manifests as a hung task with the following call trace: INFO: task umount:1234 blocked for more than 120 seconds. Call Trace: __schedule schedule fsnotify_sb_delete generic_shutdown_super kill_anon_super cleanup_mnt task_work_run do_exit do_group_exit The race window that triggers the iref leak: Thread A (adding mark) Thread B (removing mark) ────────────────────── ──────────────────────── fsnotify_add_mark_locked(): fsnotify_add_mark_list(): spin_lock(conn->lock) add mark_B(evictable) to list spin_unlock(conn->lock) return /* ---- gap: no lock held ---- */ fsnotify_detach_mark(mark_A): spin_lock(mark_A->lock) clear ATTACHED flag on mark_A spin_unlock(mark_A->lock) fsnotify_put_mark(mark_A) fsnotify_recalc_mask(): spin_lock(conn->lock) __fsnotify_recalc_mask(): /* mark_A skipped: ATTACHED cleared */ /* only mark_B(evictable) remains */ want_iref = false has_iref = true /* not yet cleared */ -> HAS_IREF transitions true -> false -> returns inode pointer spin_unlock(conn->lock) /* BUG: return value discarded! * iput() and fsnotify_put_sb_watched_objects() * are never called */ Fix this by deferring the transition true -> false of HAS_IREF flag from fsnotify_recalc_mask() (Thread A) to fsnotify_put_mark() (thread B). Fixes: c3638b5b1374 ("fsnotify: allow adding an inode mark without pinning inode") Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/CAOQ4uxiPsbHb0o5voUKyPFMvBsDkG914FYDcs4C5UpBMNm0Vcg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfs/adfs: validate nzones in adfs_validate_bblk()Bae Yeonju1-0/+3
[ Upstream commit dd9d3e16c2d5fa166e13dce07413be51f42c8f5d ] Reject ADFS disc records with a zero zone count during boot block validation, before the disc record is used. When nzones is 0, adfs_read_map() passes it to kmalloc_array(0, ...) which returns ZERO_SIZE_PTR, and adfs_map_layout() then writes to dm[-1], causing an out-of-bounds write before the allocated buffer. adfs_validate_dr0() already rejects nzones != 1 for old-format images. Add the equivalent check to adfs_validate_bblk() for new-format images so that a crafted image with nzones == 0 is rejected at probe time. Found by syzkaller. Fixes: f6f14a0d71b0 ("fs/adfs: map: move map-specific sb initialisation to map.c") Signed-off-by: Bae Yeonju <iwasbaeyz@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayseventpoll: fix ep_remove struct eventpoll / struct file UAFChristian Brauner1-6/+10
[ Upstream commit a6dc643c69311677c574a0f17a3f4d66a5f3744b ] ep_remove() (via ep_remove_file()) cleared file->f_ep under file->f_lock but then kept using @file inside the critical section (is_file_epoll(), hlist_del_rcu() through the head, spin_unlock). A concurrent __fput() taking the eventpoll_release() fastpath in that window observed the transient NULL, skipped eventpoll_release_file() and ran to f_op->release / file_free(). For the epoll-watches-epoll case, f_op->release is ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(), which kfree()s the watched struct eventpoll. Its embedded ->refs hlist_head is exactly where epi->fllink.pprev points, so the subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed kmalloc-192 memory. In addition, struct file is SLAB_TYPESAFE_BY_RCU, so the slot backing @file could be recycled by alloc_empty_file() -- reinitializing f_lock and f_ep -- while ep_remove() is still nominally inside that lock. The upshot is an attacker-controllable kmem_cache_free() against the wrong slab cache. Pin @file via epi_fget() at the top of ep_remove() and gate the critical section on the pin succeeding. With the pin held @file cannot reach refcount zero, which holds __fput() off and transitively keeps the watched struct eventpoll alive across the hlist_del_rcu() and the f_lock use, closing both UAFs. If the pin fails @file has already reached refcount zero and its __fput() is in flight. Because we bailed before clearing f_ep, that path takes the eventpoll_release() slow path into eventpoll_release_file() and blocks on ep->mtx until the waiter side's ep_clear_and_put() drops it. The bailed epi's share of ep->refcount stays intact, so the trailing ep_refcount_dec_and_test() in ep_clear_and_put() cannot free the eventpoll out from under eventpoll_release_file(); the orphaned epi is then cleaned up there. A successful pin also proves we are not racing eventpoll_release_file() on this epi, so drop the now-redundant re-check of epi->dying under f_lock. The cheap lockless READ_ONCE(epi->dying) fast-path bailout stays. Fixes: 58c9b016e128 ("epoll: use refcount to reduce ep_mutex contention") Reported-by: Jaeyoung Chung <jjy600901@snu.ac.kr> Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-6-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayseventpoll: move epi_fget() upChristian Brauner1-28/+28
[ Upstream commit 86e87059e6d1fd5115a31949726450ed03c1073b ] We'll need it when removing files so move it up. No functional change. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-5-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayseventpoll: drop vestigial __ prefix from ep_remove_{file,epi}()Christian Brauner1-6/+6
[ Upstream commit 0feaf644f7180c4a91b6b405a881afbfd958f1cf ] With __ep_remove() gone, the double-underscore on __ep_remove_file() and __ep_remove_epi() no longer contrasts with a __-less parent and just reads as noise. Rename both to ep_remove_file() and ep_remove_epi(). No functional change. Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayseventpoll: kill __ep_remove()Christian Brauner1-37/+30
[ Upstream commit e9e5cd40d7c403e19f21d0f7b8b8ba3a76b58330 ] Remove the boolean conditional in __ep_remove() and restructure the code so the check for racing with eventpoll_release_file() are only done in the ep_remove_safe() path where they belong. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-3-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayseventpoll: split __ep_remove()Christian Brauner1-4/+23
[ Upstream commit 0f7bdfd413000985de09fc39eb9efa1e091a3ce0 ] Split __ep_remove() to delineate file removal from epoll item removal. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-2-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayseventpoll: use hlist_is_singular_node() in __ep_remove()Christian Brauner1-1/+1
[ Upstream commit 3d9fd0abc94d8cd430cc7cd7d37ce5e5aae2cd2b ] Replace the open-coded "epi is the only entry in file->f_ep" check with hlist_is_singular_node(). Same semantics, and the helper avoids the head-cacheline access in the common false case. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-1-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: scope conn->binding slowpath to bound sessions onlyHyunwoo Kim1-1/+6
[ Upstream commit b0da97c034b6107d14e537e212d4ce8b22109a58 ] When the binding SESSION_SETUP sets conn->binding = true, the flag stays set after the call so that the global session lookup in ksmbd_session_lookup_all() can find the session, which was not added to conn->sessions. Because the flag is connection-wide, the global lookup path will also resolve any other session by id if asked. Tighten the global lookup so that the returned session must have this connection registered in its channel xarray (sess->ksmbd_chann_list). The channel entry is installed by the existing binding_session path in ntlm_authenticate()/krb5_authenticate() when a SESSION_SETUP completes successfully, so this condition is a strict equivalent of "this connection has been accepted as a channel of this session". Connections that have not bound to a given session cannot reach it via the global table. The existing conn->binding gate for entering the slowpath is preserved so that non-binding connections keep the fast-path-only behavior, and the session->state check is unchanged. Fixes: f5a544e3bab7 ("ksmbd: add support for SMB3 multichannel") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: fix durable fd leak on ClientGUID mismatch in durable v2 openDaeMyung Kang1-0/+2
[ Upstream commit 804054d19886ac6628883d82410f6ee42a818664 ] ksmbd_lookup_fd_cguid() returns a ksmbd_file with its refcount incremented via ksmbd_fp_get(). parse_durable_handle_context() in the DURABLE_REQ_V2 case properly releases this reference on every path inside the ClientGUID-match branch, either by calling ksmbd_put_durable_fd() or by transferring ownership to dh_info->fp for a successful reconnect. However, when an entry exists in the global file table with the same CreateGuid but a different ClientGUID, the code simply falls through to the new-open path without dropping the reference obtained from ksmbd_lookup_fd_cguid(). Per MS-SMB2 section 3.3.5.9.10 ("Handling the SMB2_CREATE_DURABLE_HANDLE_REQUEST_V2 Create Context"), the server MUST locate an Open whose Open.CreateGuid matches the request's CreateGuid AND whose Open.ClientGuid matches the ClientGuid of the connection that received the request. If no such Open is found, the server MUST continue with the normal open execution phase. A CreateGuid hit with a ClientGUID mismatch is therefore the "Open not found" case: proceeding with a new open is correct, but the reference obtained purely as a side effect of the lookup must not be leaked. Repeated requests that hit this mismatch pin global_ft entries, prevent __ksmbd_close_fd() from ever running for the corresponding files, and defeat the durable scavenger, leading to long-lived resource leaks. Release the reference in the mismatch path and clear dh_info->fp so subsequent logic does not mistake a non-matching lookup result for a reconnect target. Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: destroy async_ida in ksmbd_conn_free()DaeMyung Kang1-0/+9
[ Upstream commit b32c8db48212a34998c36d0bbc05b29d5c407ef5 ] When per-connection async_ida was converted from a dynamically allocated ksmbd_ida to an embedded struct ida, ksmbd_ida_free() was removed from the connection teardown path but no matching ida_destroy() was added. The connection is therefore freed with the IDA's backing xarray still intact. The kernel IDA API expects ida_init() and ida_destroy() to be paired over an object's lifetime, so add the missing cleanup before the connection is freed. No leak has been observed in testing; this is a pairing fix to match the IDA lifetime rules, not a response to a reproduced regression. Fixes: d40012a83f87 ("cifsd: declare ida statically") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: destroy tree_conn_ida in ksmbd_session_destroy()DaeMyung Kang1-2/+3
[ Upstream commit c049ee14eb4343b69b6f7755563f961f5e153423 ] When per-session tree_conn_ida was converted from a dynamically allocated ksmbd_ida to an embedded struct ida, ksmbd_ida_free() was removed from ksmbd_session_destroy() but no matching ida_destroy() was added. The session is therefore freed with the IDA's backing xarray still intact. The kernel IDA API expects ida_init() and ida_destroy() to be paired over an object's lifetime, so add the missing cleanup before the enclosing session is freed. Also move ida_init() to right after the session is allocated so that it is always paired with the destroy call even on the early error paths of __session_create() (ksmbd_init_file_table() or __init_smb2_session() failures), both of which jump to the error label and invoke ksmbd_session_destroy() on a partially initialised session. No leak has been observed in testing; this is a pairing fix to match the IDA lifetime rules, not a response to a reproduced regression. Fixes: d40012a83f87 ("cifsd: declare ida statically") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysksmbd: fix use-after-free in smb2_open during durable reconnectAkif1-8/+5
[ Upstream commit 1baff47b81f94f9231c91236aa511420d0e266b9 ] In smb2_open, the call to ksmbd_put_durable_fd(fp) drops the reference to the durable file descriptor early during the durable reconnect process. If an error occurs subsequently (eg, ksmbd_iov_pin_rsp fails) or a scavenger accesses the file, it leads to a use-after-free when accessing fp properties (eg fp->create_time). Move the single put to the end of the function below err_out2 so fp stays valid until smb2_open returns. Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2") Signed-off-by: Akif <akif.sait111@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayserofs: unify lcn as u64 for 32-bit platformsGao Xiang1-10/+9
[ Upstream commit 2d8c7edcb661812249469f4a5b62e9339118846f ] As sashiko reported [1], `lcn` was typed as `unsigned long` (or `unsigned int` sometimes), which is only 32 bits wide on 32-bit platforms, which causes `(lcn << lclusterbits)` to be truncated at 4 GiB. In order to consolidate the logic, just use `u64` consistently around the codebase. [1] https://sashiko.dev/r/20260420034612.1899973-1-hsiangkao%40linux.alibaba.com Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysf2fs: protect extension_list reading with sb_lock in f2fs_sbi_show()Yongpeng Yang1-2/+5
[ Upstream commit 5909bedbed38c558bee7cb6758ceedf9bc3a9194 ] In f2fs_sbi_show(), the extension_list, extension_count and hot_ext_count are read without holding sbi->sb_lock. If a concurrent sysfs store modifies the extension list via f2fs_update_extension_list(), the show path may read inconsistent count and array contents, potentially leading to out-of-bounds access or displaying stale data. Fix this by holding sb_lock around the entire extension list read and format operation. Fixes: b6a06cbbb5f7 ("f2fs: support hot file extension") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysf2fs: allow empty mount string for Opt_usr|grp|projjquotaJaegeuk Kim1-12/+15
[ Upstream commit 2a3db1e02ce08c14af04da70bb99e8a0a31eb9e8 ] The fsparam_string_empty() gives an error when mounting without string, since its type is set to fsparam_flag in VFS. So, let's allow the flag as well. This addresses xfstests/f2fs/015 and f2fs/021. Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysf2fs: fix to preserve previous reserve_{blocks,node} value when remountZhiguo Niu1-0/+2
[ Upstream commit 01968164d94762db2f703647c5acfa28613844f1 ] The following steps will change previous value of reserve_{blocks,node}, this dones not match the original intention. 1.mount -t f2fs -o reserve_root=8192 imgfile test_mount/ F2FS-fs (loop56): Mounted with checkpoint version = 1b69f8c7 mount info: /dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=8192,reserve_node=0,resuid=0,resgid=0,xxx) 2.mount -t f2fs -o remount,reserve_root=4096 /data/test_mount F2FS-fs (loop56): Preserve previous reserve_root=8192 check mount info: reserve_root change to 4096 /dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=4096,reserve_node=0,resuid=0,resgid=0,xxx) Prior to commit d18535132523 ("f2fs: separate the options parsing and options checking"), the value of reserve_{blocks,node} was only set during the first mount, along with the corresponding mount option F2FS_MOUNT_RESERVE_{ROOT,NODE} . If the mount option F2FS_MOUNT_RESERVE_{ROOT,NODE} was found to have been set during the mount/remount, the previously value of reserve_{blocks,node} would also be preserved, as shown in the code below. if (test_opt(sbi, RESERVE_ROOT)) { f2fs_info(sbi, "Preserve previous reserve_root=%u", F2FS_OPTION(sbi).root_reserved_blocks); } else { F2FS_OPTION(sbi).root_reserved_blocks = arg; set_opt(sbi, RESERVE_ROOT); } But commit d18535132523 ("f2fs: separate the options parsing and options checking") only preserved the previous mount option; it did not preserve the previous value of reserve_{blocks,node}. Since value of reserve_{blocks,node} value is assigned or not depends on ctx->spec_mask, ctx->spec_mask should be alos handled in f2fs_check_opt_consistency. This patch will clear the corresponding ctx->spec_mask bits in f2fs_check_opt_consistency to preserve the previously values of reserve_{blocks,node} if it already have a value. Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysf2fs: fix data loss caused by incorrect use of nat_entry flagYongpeng Yang1-0/+3
[ Upstream commit 238e14eb7226f883b72caccd2d37bf5707df066b ] Data loss can occur when fsync is performed on a newly created file (before any checkpoint has been written) concurrently with a checkpoint operation. The scenario is as follows: create & write & fsync 'file A' write checkpoint - f2fs_do_sync_file // inline inode - f2fs_write_inode // inode folio is dirty - f2fs_write_checkpoint - f2fs_flush_merged_writes - f2fs_sync_node_pages - f2fs_flush_nat_entries - f2fs_fsync_node_pages // no dirty node - f2fs_need_inode_block_update // return false SPO and lost 'file A' f2fs_flush_nat_entries() sets the IS_CHECKPOINTED and HAS_LAST_FSYNC flags for the nat_entry, but this does not mean that the checkpoint has actually completed successfully. However, f2fs_need_inode_block_update() checks these flags and incorrectly assumes that the checkpoint has finished. The root cause is that the semantics of IS_CHECKPOINTED and HAS_LAST_FSYNC are only guaranteed after the checkpoint write fully completes. This patch modifies f2fs_need_inode_block_update() to acquire the sbi->node_write lock before reading the nat_entry flags, ensuring that once IS_CHECKPOINTED and HAS_LAST_FSYNC are observed to be set, the checkpoint operation has already completed. Fixes: e05df3b115e7 ("f2fs: add node operations") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysf2fs: avoid reading already updated pages during GCJianan Huang1-1/+4
[ Upstream commit 570e2ccc7cb35fe720106964e65060602d3d2ac4 ] We found the following issue during fuzz testing: page: refcount:3 mapcount:0 mapping:00000000b6e89c65 index:0x18b2dc pfn:0x161ba9 memcg:f8ffff800e269c00 aops:f2fs_meta_aops ino:2 flags: 0x52880000000080a9(locked|waiters|uptodate|lru|private|zone=1|kasantag=0x4a) raw: 52880000000080a9 fffffffec6e17588 fffffffec0ccc088 a7ffff8067063618 raw: 000000000018b2dc 0000000000000009 00000003ffffffff f8ffff800e269c00 page dumped because: VM_BUG_ON_FOLIO(folio_test_uptodate(folio)) page_owner tracks the page as allocated post_alloc_hook+0x58c/0x5ec prep_new_page+0x34/0x284 get_page_from_freelist+0x2dcc/0x2e8c __alloc_pages_noprof+0x280/0x76c __folio_alloc_noprof+0x18/0xac __filemap_get_folio+0x6bc/0xdc4 pagecache_get_page+0x3c/0x104 do_garbage_collect+0x5c78/0x77a4 f2fs_gc+0xd74/0x25f0 gc_thread_func+0xb28/0x2930 kthread+0x464/0x5d8 ret_from_fork+0x10/0x20 ------------[ cut here ]------------ kernel BUG at mm/filemap.c:1563! folio_end_read+0x140/0x168 f2fs_finish_read_bio+0x5c4/0xb80 f2fs_read_end_io+0x64c/0x708 bio_endio+0x85c/0x8c0 blk_update_request+0x690/0x127c scsi_end_request+0x9c/0xb8c scsi_io_completion+0xf0/0x250 scsi_finish_command+0x430/0x45c scsi_complete+0x178/0x6d4 blk_mq_complete_request+0xcc/0x104 scsi_done_internal+0x214/0x454 scsi_done+0x24/0x34 which is similar to the problem reported by syzbot: https://syzkaller.appspot.com/bug?extid=3686758660f980b402dc This case is consistent with the description in commit 9bf1a3f ("f2fs: avoid GC causing encrypted file corrupted"): Page 1 is moved from blkaddr A to blkaddr B by move_data_block, and after being written it is marked as uptodate. Then, Page 1 is moved from blkaddr B to blkaddr C, VM_BUG_ON_FOLIO was triggered in the endio initiated by ra_data_block. There is no need to read Page 1 again from blkaddr B, since it has already been updated. Therefore, avoid initiating I/O in this case. Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC") Signed-off-by: Jianan Huang <huangjianan@xiaomi.com> Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfs/ntfs3: terminate the cached volume label after UTF-8 conversionPengpeng Hou1-1/+6
[ Upstream commit a6cd43fe9b083fa23fe1595666d5738856cb261a ] ntfs_fill_super() loads the on-disk volume label with utf16s_to_utf8s() and stores the result in sbi->volume.label. The converted label is later exposed through ntfs3_label_show() using %s, but utf16s_to_utf8s() only returns the number of bytes written and does not add a trailing NUL. If the converted label fills the entire fixed buffer, ntfs3_label_show() can read past the end of sbi->volume.label while looking for a terminator. Terminate the cached label explicitly after a successful conversion and clamp the exact-full case to the last byte of the buffer. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysnfsd: use dynamic allocation for oversized NFSv4.0 replay cacheChuck Lever3-12/+39
[ Upstream commit 116b6b7acdd82605ed530232cd7509d1b5282f5c ] Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE to prevent a heap overflow, but set rp_buflen to zero when the encoded response exceeded the inline buffer. A retransmitted LOCK reaching the replay path then produced only a status code with no operation body, resulting in a malformed XDR response. When the encoded response exceeds the 112-byte inline rp_ibuf, a buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen remains zero, preserving the behavior from the capped-copy fix. The buffer is freed when the stateowner is released or when a subsequent operation's response fits in the inline buffer. Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysNFSD: fix nfs4_file access extra count in nfsd4_add_rdaccess_to_wrdelegDai Ngo1-2/+2
[ Upstream commit b48f44f36e6607b2f818560f19deb86b4a9c717b ] In nfsd4_add_rdaccess_to_wrdeleg, if fp->fi_fds[O_RDONLY] is already set by another thread, __nfs4_file_get_access should not be called to increment the nfs4_file access count since that was already done by the thread that added READ access to the file. The extra fi_access count in nfs4_file can prevent the corresponding nfsd_file from being freed. When stopping nfs-server service, these extra access counts trigger a BUG in kmem_cache_destroy() that shows nfsd_file object remaining on __kmem_cache_shutdown. This problem can be reproduced by running the Git project's test suite over NFS. Fixes: 8072e34e1387 ("nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayssunrpc: Fix compilation error (`make W=1`) when dprintk() is no-opAndy Shevchenko1-0/+5
[ Upstream commit 6f57293abb8d087de830dd3f02e66d94b3e59973 ] Clang compiler is not happy about set but unused variables: .../flexfilelayout/flexfilelayoutdev.c:56:9: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] .../flexfilelayout/flexfilelayout.c:1505:6: error: variable 'err' set but not used [-Werror,-Wunused-but-set-variable] .../nfs4proc.c:9244:12: error: variable 'ptr' set but not used [-Werror,-Wunused-but-set-variable] Fix these by forwarding parameters of dprintk() to no_printk(). The positive side-effect is a format-string checker enabled even for the cases when dprintk() is no-op. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Fixes: fc931582c260 ("nfs41: create_session operation") Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 dayssunrpc: Kill RPC_IFDEBUG()Andy Shevchenko1-3/+6
[ Upstream commit adcc59114ccd402259c089b0fea24da5e4974563 ] RPC_IFDEBUG() is used in only two places. In one the user of the definition is guarded by ifdeffery, in the second one it's implied due to dprintk() usage. Kill the macro and move the ifdeffery to the regular condition with the variable defined inside, while in the second case add the same conditional and move the respective code there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: 6f57293abb8d ("sunrpc: Fix compilation error (`make W=1`) when dprintk() is no-op") Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysnfs/blocklayout: Fix compilation error (`make W=1`) in bl_write_pagelist()Andy Shevchenko1-3/+1
[ Upstream commit f83c8dda456ce4863f346aa26d88efa276eda35d ] Clang compiler is not happy about set but unused variable (when dprintk() is no-op): .../blocklayout/blocklayout.c:384:9: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] Remove a leftover from the previous cleanup. Fixes: 3a6fd1f004fc ("pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist") Acked-by: Anna Schumaker <anna.schumkaer@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfs/ntfs3: fix missing run load for vcn0 in attr_data_get_block_locked()Deepanshu Kartikey1-0/+15
[ Upstream commit d7ea8495fd307b58f8867acd81a1b40075b1d3ba ] When a compressed or sparse attribute has its clusters frame-aligned, vcn is rounded down to the frame start using cmask, which can result in vcn != vcn0. In this case, vcn and vcn0 may reside in different attribute segments. The code already handles the case where vcn is in a different segment by loading its runs before allocation. However, it fails to load runs for vcn0 when vcn0 resides in a different segment than vcn. This causes run_lookup_entry() to return SPARSE_LCN for vcn0 since its segment was never loaded into the in-memory run list, triggering the WARN_ON(1). Fix this by adding a missing check for vcn0 after the existing vcn segment check. If vcn0 falls outside the current segment range [svcn, evcn1), find and load the attribute segment containing vcn0 before performing the run lookup. The following scenario triggers the bug: attr_data_get_block_locked() vcn = vcn0 & cmask <- vcn != vcn0 after frame alignment load runs for vcn segment <- vcn0 segment not loaded! attr_allocate_clusters() <- allocation succeeds run_lookup_entry(vcn0) <- vcn0 not in run -> SPARSE_LCN WARN_ON(1) <- bug fires here! Reported-by: syzbot+c1e9aedbd913fadad617@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c1e9aedbd913fadad617 Fixes: c380b52f6c57 ("fs/ntfs3: Change new sparse cluster processing") Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysfs/ntfs3: prevent uninitialized lcn caused by zero lenEdward Adam Davis1-5/+5
[ Upstream commit e98266e823a1fa06fe6499df61aeaac2fd6f7a49 ] syzbot reported a uninit-value in ntfs_iomap_begin [1]. Since runs was not touched yet, run_lookup_entry() immediately fails and returns false, which makes the value of "*len" 0. Simultaneously, the new value and err value are also 0, causing the logic in attr_data_get_block_locked() to jump directly to ok, ultimately resulting in *lcn being triggered before it is set [1]. In ntfs_iomap_begin(), the check for a 0 value in clen is moved forward to before updating lcn to avoid this [1]. [1] BUG: KMSAN: uninit-value in ntfs_iomap_begin+0x8c0/0x1460 fs/ntfs3/inode.c:825 ntfs_iomap_begin+0x8c0/0x1460 fs/ntfs3/inode.c:825 iomap_iter+0x9b7/0x1540 fs/iomap/iter.c:110 Local variable lcn created at: ntfs_iomap_begin+0x15d/0x1460 fs/ntfs3/inode.c:786 Fixes: 10d7c95af043 ("fs/ntfs3: add delayed-allocation (delalloc) support") Reported-by: syzbot+7be88937363ac7ab7bb0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7be88937363ac7ab7bb0 Tested-by: syzbot+7be88937363ac7ab7bb0@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysext4: fix possible null-ptr-deref in mbt_kunit_exit()Ye Bin1-1/+5
[ Upstream commit 22f53f08d9eb837ce69b1a07641d414aac8d045f ] There's issue as follows: # test_new_blocks_simple: failed to initialize: -12 KASAN: null-ptr-deref in range [0x0000000000000638-0x000000000000063f] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST RIP: 0010:mbt_kunit_exit+0x5e/0x3e0 [ext4_test] Call Trace: <TASK> kunit_try_run_case_cleanup+0xbc/0x100 [kunit] kunit_generic_run_threadfn_adapter+0x89/0x100 [kunit] kthread+0x408/0x540 ret_from_fork+0xa76/0xdf0 ret_from_fork_asm+0x1a/0x30 If mbt_kunit_init() init testcase failed will lead to null-ptr-deref. So add test if 'sb' is inited success in mbt_kunit_exit(). Fixes: 7c9fa399a369 ("ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20260330133035.287842-6-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysext4: fix possible null-ptr-deref in extents_kunit_exit()Ye Bin1-2/+5
[ Upstream commit ca78c31af467ffe94b15f6a2e4e1cc1c164db19b ] There's issue as follows: KASAN: null-ptr-deref in range [0x00000000000002c0-0x00000000000002c7] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST RIP: 0010:extents_kunit_exit+0x2e/0xc0 [ext4_test] Call Trace: <TASK> kunit_try_run_case_cleanup+0xbc/0x100 [kunit] kunit_generic_run_threadfn_adapter+0x89/0x100 [kunit] kthread+0x408/0x540 ret_from_fork+0xa76/0xdf0 ret_from_fork_asm+0x1a/0x30 Above issue happens as extents_kunit_init() init testcase failed. So test if testcase is inited success. Fixes: cb1e0c1d1fad ("ext4: kunit tests for extent splitting and conversion") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20260330133035.287842-5-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysext4: fix the error handling process in extents_kunit_init).Ye Bin1-15/+35
[ Upstream commit 17f73c95d47325000ee68492be3ad76ae09f6f19 ] The error processing in extents_kunit_init() is improper, causing resource leakage. Reconstruct the error handling process to prevent potential resource leaks Fixes: cb1e0c1d1fad ("ext4: kunit tests for extent splitting and conversion") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20260330133035.287842-4-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>