path: root/fs
Age | Commit message | Author | Files | Lines
5 days | Merge tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs | Linus Torvalds | 3 | -24/+37

Pull erofs fixes from Gao Xiang:
 - Align FSDAX enablement among multiple devices
 - Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing CRYPTO{,_DEFLATE}=y even if EROFS=m
 - Fix atomic context detection to properly launch kworkers on demand
 - Fix block count statistics for 48-bit addressing support

* tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix block count report when 48-bit layout is on
  erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC
  erofs: Do not select tristate symbols from bool symbols
  erofs: Fallback to normal access if DAX is not supported on extra device
5 days | Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds | 1 | -6/+18

Pull misc fixes from Andrew Morton:
 "12 hotfixes. 5 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 10 of these fixes are for MM"

* tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  proc: proc_maps_open allow proc_mem_open to return NULL
  mm/mremap: avoid expensive folio lookup on mremap folio pte batch
  userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
  mm: pass page directly instead of using folio_page
  selftests/proc: fix string literal warning in proc-maps-race.c
  fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
  mm/smaps: fix race between smaps_hugetlb_range and migration
  mm: fix the race between collapse and PT_RECLAIM under per-vma lock
  mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
  MAINTAINERS: add Masami as a reviewer of hung task detector
  mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
  kasan/test: fix protection against compiler elision
6 days | Merge tag 'for-6.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux | Linus Torvalds | 6 | -23/+39

Pull btrfs fixes from David Sterba:
 - fix bug in qgroups reporting incorrect usage for higher level qgroups
 - in zoned mode, do not select metadata group as finish target
 - convert xarray lock to RCU when trying to release extent buffer to avoid a deadlock
 - do not allow relocation on partially dropped subvolumes, which is normally not possible but has been reported on old filesystems
 - in tree-log, report errors on missing block group when unaccounting log tree extent buffers
 - with large folios, fix range length when processing ordered extents

* tag 'for-6.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix iteration bug in __qgroup_excl_accounting()
  btrfs: zoned: do not select metadata BG as finish target
  btrfs: do not allow relocation of partially dropped subvolumes
  btrfs: error on missing block group when unaccounting log tree extent buffers
  btrfs: fix wrong length parameter for btrfs_cleanup_ordered_extents()
  btrfs: make btrfs_cleanup_ordered_extents() support large folios
  btrfs: fix subpage deadlock in try_release_subpage_extent_buffer()
7 days | proc: proc_maps_open allow proc_mem_open to return NULL | Jialin Wang | 1 | -2/+2

The commit 65c66047259f ("proc: fix the issue of proc_mem_open returning NULL") caused proc_maps_open() to return -ESRCH when proc_mem_open() returns NULL. This breaks legitimate /proc/<pid>/maps access for kernel threads, since kernel threads have a NULL mm_struct.

The regression causes perf to fail and exit when profiling a kernel thread:

  # perf record -v -g -p $(pgrep kswapd0)
  ...
  couldn't open /proc/65/task/65/maps

This patch partially reverts the commit to fix it.

Link: https://lkml.kernel.org/r/20250807165455.73656-1-wjl.linux@gmail.com
Fixes: 65c66047259f ("proc: fix the issue of proc_mem_open returning NULL")
Signed-off-by: Jialin Wang <wjl.linux@gmail.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 days | Merge tag 'nfsd-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux | Linus Torvalds | 2 | -3/+12

Pull nfsd fixes from Chuck Lever:
 - A correctness fix for delegated timestamps
 - Address an NFSD shutdown hang when LOCALIO is in use
 - Prevent a remotely exploitable crasher when TLS is in use

* tag 'nfsd-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  sunrpc: fix handling of server side tls alerts
  nfsd: avoid ref leak in nfsd_open_local_fh()
  nfsd: don't set the ctime on delegated atime updates
8 days | erofs: fix block count report when 48-bit layout is on | Gao Xiang | 1 | -2/+2

Fix incorrect shift order when combining the 48-bit block count.

Fixes: 2e1473d5195f ("erofs: implement 48-bit block addressing for unencoded inodes")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250807082019.3093539-1-hsiangkao@linux.alibaba.com
8 days | erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC | Junli Liu | 1 | -2/+11

Since EROFS handles decompression in non-atomic contexts due to uncontrollable decompression latencies and vmap() usage, it tries to detect atomic contexts and only kicks off a kworker on demand in order to reduce unnecessary scheduling overhead.

However, the current approach is insufficient and can lead to sleeping function calls in invalid contexts, causing kernel warnings and potential system instability. See the stacktrace [1] and previous discussion [2].

The current implementation only checks rcu_read_lock_any_held(), which behaves inconsistently across different kernel configurations:
 - When CONFIG_DEBUG_LOCK_ALLOC is enabled: correctly detects RCU critical sections by checking rcu_lock_map
 - When CONFIG_DEBUG_LOCK_ALLOC is disabled: compiles to "!preemptible()", which only checks preempt_count and misses RCU critical sections

This patch introduces z_erofs_in_atomic() to provide comprehensive atomic context detection:
 1. Check RCU preemption depth when CONFIG_PREEMPTION is enabled, as RCU critical sections may not affect preempt_count but still require atomic handling
 2. Always use async processing when CONFIG_PREEMPT_COUNT is disabled, as preemption state cannot be reliably determined
 3. Fall back to standard preemptible() check for remaining cases

The function replaces the previous complex condition check and ensures that z_erofs always uses (kthread_)work in atomic contexts to minimize scheduling overhead and prevent sleeping in invalid contexts.
[1] Problem stacktrace
[ 61.266692] BUG: sleeping function called from invalid context at kernel/locking/rtmutex_api.c:510
[ 61.266702] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 107, name: irq/54-ufshcd
[ 61.266704] preempt_count: 0, expected: 0
[ 61.266705] RCU nest depth: 2, expected: 0
[ 61.266710] CPU: 0 UID: 0 PID: 107 Comm: irq/54-ufshcd Tainted: G W O 6.12.17 #1
[ 61.266714] Tainted: [W]=WARN, [O]=OOT_MODULE
[ 61.266715] Hardware name: schumacher (DT)
[ 61.266717] Call trace:
[ 61.266718]  dump_backtrace+0x9c/0x100
[ 61.266727]  show_stack+0x20/0x38
[ 61.266728]  dump_stack_lvl+0x78/0x90
[ 61.266734]  dump_stack+0x18/0x28
[ 61.266736]  __might_resched+0x11c/0x180
[ 61.266743]  __might_sleep+0x64/0xc8
[ 61.266745]  mutex_lock+0x2c/0xc0
[ 61.266748]  z_erofs_decompress_queue+0xe8/0x978
[ 61.266753]  z_erofs_decompress_kickoff+0xa8/0x190
[ 61.266756]  z_erofs_endio+0x168/0x288
[ 61.266758]  bio_endio+0x160/0x218
[ 61.266762]  blk_update_request+0x244/0x458
[ 61.266766]  scsi_end_request+0x38/0x278
[ 61.266770]  scsi_io_completion+0x4c/0x600
[ 61.266772]  scsi_finish_command+0xc8/0xe8
[ 61.266775]  scsi_complete+0x88/0x148
[ 61.266777]  blk_mq_complete_request+0x3c/0x58
[ 61.266780]  scsi_done_internal+0xcc/0x158
[ 61.266782]  scsi_done+0x1c/0x30
[ 61.266783]  ufshcd_compl_one_cqe+0x12c/0x438
[ 61.266786]  __ufshcd_transfer_req_compl+0x2c/0x78
[ 61.266788]  ufshcd_poll+0xf4/0x210
[ 61.266789]  ufshcd_transfer_req_compl+0x50/0x88
[ 61.266791]  ufshcd_intr+0x21c/0x7c8
[ 61.266792]  irq_forced_thread_fn+0x44/0xd8
[ 61.266796]  irq_thread+0x1a4/0x358
[ 61.266799]  kthread+0x12c/0x138
[ 61.266802]  ret_from_fork+0x10/0x20

[2] https://lore.kernel.org/r/58b661d0-0ebb-4b45-a10d-c5927fb791cd@paulmck-laptop

Signed-off-by: Junli Liu <liujunli@lixiang.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250805011957.911186-1-liujunli@lixiang.com
[ Gao Xiang: Use the original trace in v1. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
8 days | erofs: Do not select tristate symbols from bool symbols | Geert Uytterhoeven | 1 | -10/+10

The EROFS filesystem has many configurable options, controlled through boolean Kconfig symbols. When enabled, these options may need to enable additional library functionality elsewhere. Currently this is done by selecting the symbol for the additional functionality. However, if EROFS_FS itself is modular, and the target symbol is a tristate symbol, the additional functionality is always forced built-in.

Selecting tristate symbols from a tristate symbol does keep modular transitivity. Hence fix this by moving selects of tristate symbols to the main EROFS_FS symbol.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/da1b899e511145dd43fd2d398f64b2e03c6a39e7.1753879351.git.geert+renesas@glider.be
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
8 days | erofs: Fallback to normal access if DAX is not supported on extra device | Yuezhang Mo | 1 | -10/+14

If using multiple devices, we should check whether the extra device supports DAX, instead of checking the primary device, when deciding whether to use DAX to access a file.

If an extra device does not support DAX, we should fall back to normal access; otherwise the data on that device will be inaccessible.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Friendy Su <friendy.su@sony.com>
Reviewed-by: Jacky Cao <jacky.cao@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20250804082030.3667257-2-Yuezhang.Mo@sony.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
10 days | Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 28 | -400/+784

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - don't inherit NFS filesystem capabilities when crossing from one filesystem to another

  Bugfixes:
   - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers
   - NFS improve bounds checking in nfs_fh_to_dentry()
   - NFS Fix allocation errors when writing to a NFS file backed loopback device
   - NFSv4: More listxattr fixes
   - SUNRPC: fix client handling of TLS alerts
   - pNFS block/scsi layout fix for an uninitialised pointer dereference
   - pNFS block/scsi layout fixes for the extent encoding, stripe mapping, and disk offset overflows
   - pNFS layoutcommit work around for RPC size limitations
   - pNFS/flexfiles avoid looping when handling fatal errors after layoutget
   - localio: fix various race conditions

  Features and cleanups:
   - Add NFSv4 support for retrieving the btime
   - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC
   - NFS: Support using a kernel keyring to store TLS certificates
   - NFSv4: Speed up delegation lookup using a hash table
   - Assorted cleanups to remove unused variables and struct fields
   - Assorted new tracepoints to improve debugging"

* tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
  NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
  NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
  NFS/localio: nfs_close_local_fh() fix check for file closed
  NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
  NFS: Fix the setting of capabilities when automounting a new filesystem
  sunrpc: fix client side handling of tls alerts
  nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
  NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
  NFSv4.2: another fix for listxattr
  NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
  SUNRPC: Silence warnings about parameters not being described
  NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()
  NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()
  NFS: use a hash table for delegation lookup
  NFS: track active delegations per-server
  NFS: move the delegation_watermark module parameter
  NFS: cleanup nfs_inode_reclaim_delegation
  NFS: cleanup error handling in nfs4_server_common_setup
  pNFS/flexfiles: don't attempt pnfs on fatal DS errors
  NFS: drop __exit from nfs_exit_keyring
  ...
10 days | Merge tag 'v6.17rc-part2-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds | 20 | -1031/+1066

Pull more smb client updates from Steve French:
 "Non-smbdirect:
   - Fix null ptr deref caused by delay in global spinlock initialization
   - Two fixes for native symlink creation with SMB3.1.1 POSIX Extensions
   - Fix for socket special file creation with SMB3.1.1 POSIX Extensions
   - Reduce lock contention by splitting out mid_counter_lock
   - move SMB1 transport code to separate file to reduce module size when support for legacy servers is disabled
   - Two cleanup patches: rename mid_lock to make it clearer what it protects and one to convert mid flags to bool to make them clearer

  Smbdirect/RDMA restructuring and fixes:
   - Fix for error handling in send done
   - Remove unneeded empty packet queue
   - Fix put_receive_buffer error path
   - Two fixes to recv_done error paths
   - Remove unused variable
   - Improve response and recvmsg type handling
   - Fix handling of incoming message type
   - Two cleanup fixes for better handling smbdirect recv io
   - Two cleanup fixes for socket spinlock
   - Two patches that add socket reassembly struct
   - Remove unused connection_status enum
   - Use flag in common header for SMBDIRECT_RECV_IO_MAX_SGE
   - Two cleanup patches to introduce and use smbdirect send io
   - Two cleanup patches to introduce and use smbdirect send_io struct
   - Fix to return error if rdma connect takes longer than 5 seconds
   - Error logging improvements
   - Fix redundant call to init_waitqueue_head
   - Remove unneeded wait queue"

* tag 'v6.17rc-part2-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (33 commits)
  smb: client: only use a single wait_queue to monitor smbdirect connection status
  smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in _smbd_get_connection
  smb: client: improve logging in smbd_conn_upcall()
  smb: client: return an error if rdma_connect does not return within 5 seconds
  smb: client: make use of smbdirect_socket.{send,recv}_io.mem.{cache,pool}
  smb: smbdirect: add smbdirect_socket.{send,recv}_io.mem.{cache,pool}
  smb: client: make use of struct smbdirect_send_io
  smb: smbdirect: introduce struct smbdirect_send_io
  smb: client: make use of SMBDIRECT_RECV_IO_MAX_SGE
  smb: smbdirect: add SMBDIRECT_RECV_IO_MAX_SGE
  smb: client: remove unused enum smbd_connection_status
  smb: client: make use of smbdirect_socket.recv_io.reassembly.*
  smb: smbdirect: introduce smbdirect_socket.recv_io.reassembly.*
  smb: client: make use of smb: smbdirect_socket.recv_io.free.{list,lock}
  smb: smbdirect: introduce smbdirect_socket.recv_io.free.{list,lock}
  smb: client: make use of struct smbdirect_recv_io
  smb: smbdirect: introduce struct smbdirect_recv_io
  smb: client: make use of smbdirect_socket->recv_io.expected
  smb: smbdirect: introduce smbdirect_socket.recv_io.expected
  smb: client: remove unused smbd_connection->fragment_reassembly_remaining
  ...
10 days | Merge tag 'v6.17rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd | Linus Torvalds | 4 | -63/+54

Pull smb server fixes from Steve French:
 - Fix limiting repeated connections from same IP
 - Fix for extracting shortname when name begins with a dot
 - Four smbdirect fixes:
    - three fixes to the receive path: potential unmap bug, potential resource leaks and stale connections, and also potential use after free race
    - cleanup to remove unneeded queue

* tag 'v6.17rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  smb: server: Fix extension string in ksmbd_extract_shortname()
  ksmbd: limit repeated connections from clients with the same IP
  smb: server: let recv_done() avoid touching data_transfer after cleanup/move
  smb: server: let recv_done() consistently call put_recvmsg/smb_direct_disconnect_rdma_connection
  smb: server: make sure we call ib_dma_unmap_single() only if we called ib_dma_map_single already
  smb: server: remove separate empty_recvmsg_queue
11 days | smb: server: Fix extension string in ksmbd_extract_shortname() | Thorsten Blum | 1 | -1/+1

In ksmbd_extract_shortname(), strscpy() is incorrectly called with the length of the source string (excluding the NUL terminator) rather than the size of the destination buffer. This results in "__" being copied to 'extension' rather than "___" (two underscores instead of three).

Use the destination buffer size instead to ensure that the string "___" (three underscores) is copied correctly.

Cc: stable@vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 days | ksmbd: limit repeated connections from clients with the same IP | Namjae Jeon | 2 | -0/+18

Repeated connections from clients with the same IP address may exhaust the maximum number of connections and prevent other normal clients from connecting. This patch limits repeated connections from clients with the same IP.

Reported-by: tianshuo han <hantianshuo233@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 days | smb: client: only use a single wait_queue to monitor smbdirect connection status | Stefan Metzmacher | 2 | -11/+9

There's no need for separate conn_wait and disconn_wait queues. This will simplify the move to common code; the server code already uses a single wait_queue for this.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 days | smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in _smbd_get_connection | Stefan Metzmacher | 1 | -1/+0

It is already called long before we may hit this cleanup code path.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 days | smb: client: improve logging in smbd_conn_upcall() | Stefan Metzmacher | 1 | -4/+10

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 days | smb: client: return an error if rdma_connect does not return within 5 seconds | Stefan Metzmacher | 1 | -2/+4

This matches the timeout for tcp connections.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 days | btrfs: fix iteration bug in __qgroup_excl_accounting() | Boris Burkov | 1 | -2/+1

__qgroup_excl_accounting() uses the qgroup iterator machinery to update the account of one qgroup's usage for all its parent hierarchy, when we either add or remove a relation and have only exclusive usage. However, there is a small bug there: we loop with an extra iteration temporary qgroup called `cur` but never actually refer to that in the body of the loop. As a result, we redundantly account the same usage to the first qgroup in the list.

This can be reproduced in the following way:

  mkfs.btrfs -f -O squota <dev>
  mount <dev> <mnt>
  btrfs subvol create <mnt>/sv
  dd if=/dev/zero of=<mnt>/sv/f bs=1M count=1
  sync
  btrfs qgroup create 1/100 <mnt>
  btrfs qgroup create 2/200 <mnt>
  btrfs qgroup assign 1/100 2/200 <mnt>
  btrfs qgroup assign 0/256 1/100 <mnt>
  btrfs qgroup show <mnt>

and the broken result is (note the 2MiB on 1/100 and 0MiB on 2/100):

  Qgroupid Referenced Exclusive Path
  -------- ---------- --------- ----
  0/5        16.00KiB  16.00KiB <toplevel>
  0/256       1.02MiB   1.02MiB sv

  Qgroupid Referenced Exclusive Path
  -------- ---------- --------- ----
  0/5        16.00KiB  16.00KiB <toplevel>
  0/256       1.02MiB   1.02MiB sv
  1/100       2.03MiB   2.03MiB 2/100<1 member qgroup>
  2/100         0.00B     0.00B <0 member qgroups>

With this fix, which simply re-uses `qgroup` as the iteration variable, we see the expected result:

  Qgroupid Referenced Exclusive Path
  -------- ---------- --------- ----
  0/5        16.00KiB  16.00KiB <toplevel>
  0/256       1.02MiB   1.02MiB sv

  Qgroupid Referenced Exclusive Path
  -------- ---------- --------- ----
  0/5        16.00KiB  16.00KiB <toplevel>
  0/256       1.02MiB   1.02MiB sv
  1/100       1.02MiB   1.02MiB 2/100<1 member qgroup>
  2/100       1.02MiB   1.02MiB <0 member qgroups>

The existing fstests did not exercise two layer inheritance so this bug was missed. I intend to add that testing there, as well.

Fixes: a0bdc04b0732 ("btrfs: qgroup: use qgroup_iterator in __qgroup_excl_accounting()")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
11 days | btrfs: zoned: do not select metadata BG as finish target | Naohiro Aota | 1 | -1/+1

We call btrfs_zone_finish_one_bg() to zone finish one block group and make room to activate another block group. Currently, we can choose a metadata block group as a target. But, as we reserve an active metadata block group, we no longer want to select a metadata block group. So, skip it in the loop.

CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
11 days | btrfs: do not allow relocation of partially dropped subvolumes | Qu Wenruo | 1 | -0/+19

[BUG]
There is an internal report that balance triggered a transaction abort, with the following call trace:

  item 85 key (594509824 169 0) itemoff 12599 itemsize 33
          extent refs 1 gen 197740 flags 2
          ref#0: tree block backref root 7
  item 86 key (594558976 169 0) itemoff 12566 itemsize 33
          extent refs 1 gen 197522 flags 2
          ref#0: tree block backref root 7
  ...
  BTRFS error (device loop0): extent item not found for insert, bytenr 594526208 num_bytes 16384 parent 449921024 root_objectid 934 owner 1 offset 0
  BTRFS error (device loop0): failed to run delayed ref for logical 594526208 num_bytes 16384 type 182 action 1 ref_mod 1: -117
  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -117)
  WARNING: CPU: 1 PID: 6963 at ../fs/btrfs/extent-tree.c:2168 btrfs_run_delayed_refs+0xfa/0x110 [btrfs]

And btrfs check doesn't report anything wrong related to the extent tree.

[CAUSE]
The cause is a little complex. Firstly, the extent tree indeed doesn't have the backref for 594526208.

The extent tree only has the following two backrefs around that bytenr on-disk:

  item 65 key (594509824 METADATA_ITEM 0) itemoff 13880 itemsize 33
          refs 1 gen 197740 flags TREE_BLOCK
          tree block skinny level 0
          (176 0x7) tree block backref root CSUM_TREE
  item 66 key (594558976 METADATA_ITEM 0) itemoff 13847 itemsize 33
          refs 1 gen 197522 flags TREE_BLOCK
          tree block skinny level 0
          (176 0x7) tree block backref root CSUM_TREE

But such a missing backref item is not a corruption on disk, as the offending delayed ref belongs to subvolume 934, and that subvolume is being dropped:

  item 0 key (934 ROOT_ITEM 198229) itemoff 15844 itemsize 439
          generation 198229 root_dirid 256 bytenr 10741039104 byte_limit 0 bytes_used 345571328
          last_snapshot 198229 flags 0x1000000000001(RDONLY) refs 0
          drop_progress key (206324 EXTENT_DATA 2711650304) drop_level 2
          level 2 generation_v2 198229

And that offending tree block 594526208 is inside the dropped range of that subvolume.
That explains why there is no backref item for that bytenr and why btrfs check is not reporting anything wrong.

But this also shows another problem, as btrfs will do all the orphan subvolume cleanup at a read-write mount. So a half-dropped subvolume should not exist after an RW mount, and balance itself is also exclusive to subvolume cleanup, meaning we shouldn't hit a half-dropped subvolume during relocation.

The root cause is that there is no orphan item for this subvolume. In fact there are 5 subvolumes from around 2021 that have the same problem. It looks like the filesystem in the original report had been used with some older kernels, which caused those zombie subvolumes.

Thankfully upstream commit 8d488a8c7ba2 ("btrfs: fix subvolume/snapshot deletion not triggered on mount") has long fixed the bug.

[ENHANCEMENT]
For repairing such old fs, btrfs-progs will be enhanced.

Considering how late the problem shows up (at run delayed ref time), and that by then we already have to abort the transaction, it is too late. Instead, here we reject any half-dropped subvolume for reloc tree at the earliest time, preventing confusion and extra time wasted on debugging similar bugs.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
11 days | btrfs: error on missing block group when unaccounting log tree extent buffers | Filipe Manana | 1 | -12/+7

Currently we only log an error message if we can't find the block group for a log tree extent buffer when unaccounting it (while freeing a log tree). A missing block group means something is seriously wrong and we end up leaking space from the metadata space info. So return -ENOENT in case we don't find the block group.

CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
11 days | btrfs: fix wrong length parameter for btrfs_cleanup_ordered_extents() | Qu Wenruo | 1 | -1/+1

Inside nocow_one_range(), if the checksum cloning for the data reloc inode fails, we call btrfs_cleanup_ordered_extents() to clean up the just-allocated ordered extents.

But unlike extent_clear_unlock_delalloc(), btrfs_cleanup_ordered_extents() requires a length, not an inclusive end bytenr. This can be problematic, as @end is normally way larger than @len.

This means btrfs_cleanup_ordered_extents() can be called on folios outside the correct range, and if an out-of-range folio is under writeback, we can incorrectly clear the ordered flag of the folio and trigger the DEBUG_WARN() inside btrfs_writepage_cow_fixup().

Fix this by passing the correct length instead.

Fixes: 94f6c5c17e52 ("btrfs: move ordered extent cleanup to where they are allocated")
CC: stable@vger.kernel.org # 6.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
11 days | btrfs: make btrfs_cleanup_ordered_extents() support large folios | Qu Wenruo | 1 | -2/+4

When hitting a large folio, btrfs_cleanup_ordered_extents() will get the same large folio multiple times, clearing the same range again and again. Thankfully this does not cause anything wrong, just inefficiency.

This is caused by the fact that we're iterating folios using the old page index, and thus can hit the same large folio again and again.

Enhance it by increasing @index to the index of the folio end, and only increase @index by 1 if we failed to grab a folio.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
11 days | btrfs: fix subpage deadlock in try_release_subpage_extent_buffer() | Leo Martins | 1 | -5/+6

There is a potential deadlock that can happen in try_release_subpage_extent_buffer() because the irq-safe xarray spin lock fs_info->buffer_tree is being acquired before the irq-unsafe eb->refs_lock.

This leads to the potential race:

  // T1 (random eb->refs user)                  // T2 (release folio)

  spin_lock(&eb->refs_lock);
  // interrupt
  end_bbio_meta_write()
    btrfs_meta_folio_clear_writeback()
                                                btree_release_folio()
                                                  folio_test_writeback() //false
                                                  try_release_extent_buffer()
                                                    try_release_subpage_extent_buffer()
                                                      xa_lock_irq(&fs_info->buffer_tree)
                                                      spin_lock(&eb->refs_lock); // blocked; held by T1
    buffer_tree_clear_mark()
      xas_lock_irqsave() // blocked; held by T2

I believe that the spin lock can safely be replaced by an rcu_read_lock. The xa_for_each loop does not need the spin lock as it's already internally protected by the rcu_read_lock. The extent buffer is also protected by the rcu_read_lock so it won't be freed before we take the eb->refs_lock and check the ref count.

The rcu_read_lock is taken and released every iteration, just like the spin lock, which means we're not protected against concurrent insertions into the xarray. This is fine because we rely on folio->private to detect if there are any ebs remaining in the folio.

There is already some precedent for this with find_extent_buffer_nolock, which loads an extent buffer from the xarray with only rcu_read_lock.
lockdep warning:

  =====================================================
  WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  6.16.0-0_fbk701_debug_rc0_123_g4c06e63b9203 #1 Tainted: G E N
  -----------------------------------------------------
  kswapd0/66 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
  ffff000011ffd600 (&eb->refs_lock){+.+.}-{3:3}, at: try_release_extent_buffer+0x18c/0x560

  and this task is already holding:
  ffff0000c1d91b88 (&buffer_xa_class){-.-.}-{3:3}, at: try_release_extent_buffer+0x13c/0x560
  which would create a new lock dependency:
   (&buffer_xa_class){-.-.}-{3:3} -> (&eb->refs_lock){+.+.}-{3:3}

  but this new dependency connects a HARDIRQ-irq-safe lock:
   (&buffer_xa_class){-.-.}-{3:3}

  ... which became HARDIRQ-irq-safe at:
    lock_acquire+0x178/0x358
    _raw_spin_lock_irqsave+0x60/0x88
    buffer_tree_clear_mark+0xc4/0x160
    end_bbio_meta_write+0x238/0x398
    btrfs_bio_end_io+0x1f8/0x330
    btrfs_orig_write_end_io+0x1c4/0x2c0
    bio_endio+0x63c/0x678
    blk_update_request+0x1c4/0xa00
    blk_mq_end_request+0x54/0x88
    virtblk_request_done+0x124/0x1d0
    blk_mq_complete_request+0x84/0xa0
    virtblk_done+0x130/0x238
    vring_interrupt+0x130/0x288
    __handle_irq_event_percpu+0x1e8/0x708
    handle_irq_event+0x98/0x1b0
    handle_fasteoi_irq+0x264/0x7c0
    generic_handle_domain_irq+0xa4/0x108
    gic_handle_irq+0x7c/0x1a0
    do_interrupt_handler+0xe4/0x148
    el1_interrupt+0x30/0x50
    el1h_64_irq_handler+0x14/0x20
    el1h_64_irq+0x6c/0x70
    _raw_spin_unlock_irq+0x38/0x70
    __run_timer_base+0xdc/0x5e0
    run_timer_softirq+0xa0/0x138
    handle_softirqs.llvm.13542289750107964195+0x32c/0xbd0
    ____do_softirq.llvm.17674514681856217165+0x18/0x28
    call_on_irq_stack+0x24/0x30
    __irq_exit_rcu+0x164/0x430
    irq_exit_rcu+0x18/0x88
    el1_interrupt+0x34/0x50
    el1h_64_irq_handler+0x14/0x20
    el1h_64_irq+0x6c/0x70
    arch_local_irq_enable+0x4/0x8
    do_idle+0x1a0/0x3b8
    cpu_startup_entry+0x60/0x80
    rest_init+0x204/0x228
    start_kernel+0x394/0x3f0
    __primary_switched+0x8c/0x8958

  to a HARDIRQ-irq-unsafe lock:
   (&eb->refs_lock){+.+.}-{3:3}

  ... which became HARDIRQ-irq-unsafe at:
  ...
    lock_acquire+0x178/0x358
    _raw_spin_lock+0x4c/0x68
    free_extent_buffer_stale+0x2c/0x170
    btrfs_read_sys_array+0x1b0/0x338
    open_ctree+0xeb0/0x1df8
    btrfs_get_tree+0xb60/0x1110
    vfs_get_tree+0x8c/0x250
    fc_mount+0x20/0x98
    btrfs_get_tree+0x4a4/0x1110
    vfs_get_tree+0x8c/0x250
    do_new_mount+0x1e0/0x6c0
    path_mount+0x4ec/0xa58
    __arm64_sys_mount+0x370/0x490
    invoke_syscall+0x6c/0x208
    el0_svc_common+0x14c/0x1b8
    do_el0_svc+0x4c/0x60
    el0_svc+0x4c/0x160
    el0t_64_sync_handler+0x70/0x100
    el0t_64_sync+0x168/0x170

  other info that might help us debug this:

   Possible interrupt unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&eb->refs_lock);
                                 local_irq_disable();
                                 lock(&buffer_xa_class);
                                 lock(&eb->refs_lock);
    <Interrupt>
      lock(&buffer_xa_class);

   *** DEADLOCK ***

  2 locks held by kswapd0/66:
   #0: ffff800085506e40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe8/0xe50
   #1: ffff0000c1d91b88 (&buffer_xa_class){-.-.}-{3:3}, at: try_release_extent_buffer+0x13c/0x560

Link: https://www.kernel.org/doc/Documentation/locking/lockdep-design.rst#:~:text=Multi%2Dlock%20dependency%20rules%3A
Fixes: 19d7f65f032f ("btrfs: convert the buffer_radix to an xarray")
CC: stable@vger.kernel.org # 6.16+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
12 days | smb: client: make use of smbdirect_socket.{send,recv}_io.mem.{cache,pool} | Stefan Metzmacher | 2 files | -40/+34
This will allow common helper functions to be created later.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: smbdirect: add smbdirect_socket.{send,recv}_io.mem.{cache,pool} | Stefan Metzmacher | 1 file | -0/+23
This will be the common location for memory caches and pools.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: make use of struct smbdirect_send_io | Stefan Metzmacher | 2 files | -38/+23
The server will also use this soon, so that we can split out common helper functions in future.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: smbdirect: introduce struct smbdirect_send_io | Stefan Metzmacher | 1 file | -0/+24
This will be used in client and server soon in order to replace smbd_request/smb_direct_sendmsg.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: make use of SMBDIRECT_RECV_IO_MAX_SGE | Stefan Metzmacher | 2 files | -5/+2
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: smbdirect: add SMBDIRECT_RECV_IO_MAX_SGE | Stefan Metzmacher | 1 file | -0/+7
This will allow the client and server specific defines to be replaced.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: remove unused enum smbd_connection_status | Stefan Metzmacher | 1 file | -10/+0
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: make use of smbdirect_socket.recv_io.reassembly.* | Stefan Metzmacher | 3 files | -61/+46
This will be used by the server too and will allow us to create common helper functions.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: smbdirect: introduce smbdirect_socket.recv_io.reassembly.* | Stefan Metzmacher | 1 file | -0/+26
This will be used in common between client and server soon.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: make use of smb: smbdirect_socket.recv_io.free.{list,lock} | Stefan Metzmacher | 2 files | -15/+13
This will be used by the server too in order to have common helper functions in future.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: smbdirect: introduce smbdirect_socket.recv_io.free.{list,lock} | Stefan Metzmacher | 1 file | -0/+9
This will allow the list of free smbdirect_recv_io messages, including the spinlock, to be in common between client and server in order to split out common helper functions in future.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: make use of struct smbdirect_recv_io | Stefan Metzmacher | 2 files | -54/+41
This is the shared structure that will be used in the server too and will allow us to move helper functions into common code soon.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: smbdirect: introduce struct smbdirect_recv_io | Stefan Metzmacher | 1 file | -0/+15
This will be used in client and server soon in order to replace smbd_response/smb_direct_recvmsg.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: make use of smbdirect_socket->recv_io.expected | Stefan Metzmacher | 2 files | -15/+14
The expected incoming message type can be per connection.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: smbdirect: introduce smbdirect_socket.recv_io.expected | Stefan Metzmacher | 1 file | -0/+14
The expected message type can be global, as it never changes after the negotiation process. This will replace smbd_response->type and smb_direct_recvmsg->type in future.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
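A minimal sketch of keeping the expected message type in per-connection state instead of in every receive buffer. All names here (`enum expect_sketch`, `struct conn_sketch`, `negotiation_done()`, `accept_message()`) are hypothetical illustrations; the real `smbdirect_socket.recv_io.expected` field and its enum values may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the real smbdirect enum may differ. */
enum expect_sketch {
	EXPECT_NEGOTIATE_REP,   /* client is waiting for the negotiate response */
	EXPECT_DATA_TRANSFER,   /* every message after negotiation */
};

/* Hypothetical connection: one expected type per connection, not per buffer. */
struct conn_sketch {
	enum expect_sketch expected;
};

/* Called once negotiation succeeds; the expected type then stays fixed. */
static void negotiation_done(struct conn_sketch *c)
{
	c->expected = EXPECT_DATA_TRANSFER;
}

/*
 * A receive handler can check the incoming message kind against the
 * connection state. Returns true if the message kind is acceptable now.
 */
static bool accept_message(const struct conn_sketch *c, enum expect_sketch got)
{
	return got == c->expected;
}
```

Because the value only changes once, at the end of negotiation, a single field on the connection is enough, and each receive buffer no longer needs its own type field.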
12 days | smb: client: remove unused smbd_connection->fragment_reassembly_remaining | Stefan Metzmacher | 2 files | -3/+0
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: let recv_done() avoid touching data_transfer after cleanup/move | Stefan Metzmacher | 1 file | -14/+11
Calling enqueue_reassembly() and wake_up_interruptible(&info->wait_reassembly_queue) or put_receive_buffer() means the response/data_transfer pointer might get re-used by another thread, which means these should be the last operations before returning.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
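The ordering rule in this commit can be sketched in userspace C, under stated assumptions: `struct recv_buf`, `struct queue`, `enqueue_for_reader()` and `recv_done_sketch()` are hypothetical stand-ins, not the smbdirect code, and the sketch is single-threaded. Every value the completion handler still needs is copied into a local first; handing the buffer to its next owner is the final operation on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive buffer; not the real smbdirect structure. */
struct recv_buf {
	uint32_t data_length;
	struct recv_buf *next;
};

/* Hypothetical queue that a reader thread would consume and recycle from. */
struct queue {
	struct recv_buf *head;
	uint64_t total_bytes;
};

static void enqueue_for_reader(struct queue *q, struct recv_buf *buf)
{
	buf->next = q->head;
	q->head = buf;
	/* From here on, another thread may reuse *buf. */
}

/* Returns the payload length, which is read before the buffer is handed off. */
static uint32_t recv_done_sketch(struct queue *q, struct recv_buf *buf)
{
	/* 1. Copy every field still needed while this function owns buf. */
	uint32_t len = buf->data_length;

	/* 2. Bookkeeping uses the local copy, not buf. */
	q->total_bytes += len;

	/* 3. Hand the buffer over last; buf is not dereferenced after this. */
	enqueue_for_reader(q, buf);
	return len;
}
```

In the kernel the wake-up of the waiting reader belongs in that same final step, since it is another point after which the buffer may be consumed and recycled.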
12 days | smb: client: let recv_done() cleanup before notifying the callers. | Stefan Metzmacher | 1 file | -6/+8
We should call put_receive_buffer() before waking up the callers. For the internal error case of response->type being unexpected, we now also call smbd_disconnect_rdma_connection() instead of not waking up the callers at all. Note that the SMBD_TRANSFER_DATA case still has problems, which will be addressed in the next commit in order to make it easier to review this one.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: make sure we call ib_dma_unmap_single() only if we called ib_dma_map_single already | Stefan Metzmacher | 1 file | -2/+9
In failure cases, ib_dma_map_single() might not have been called yet, or ib_dma_unmap_single() might already have been called. We should make sure put_receive_buffer() only calls ib_dma_unmap_single() if needed.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
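One way to make such a release path safe to call whether or not a mapping exists is sketched below, under assumptions: the hypothetical `fake_map()`/`fake_unmap()` stand in for `ib_dma_map_single()`/`ib_dma_unmap_single()`, and a zero `length` in the hypothetical `struct sge_sketch` is used as the "not mapped" marker. The commit message does not say which marker the actual patch uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scatter/gather entry; length == 0 means "not mapped". */
struct sge_sketch {
	uint64_t addr;
	uint32_t length;
};

static int unmap_calls; /* counts fake unmaps so a caller can check them */

/* Hypothetical stand-in for ib_dma_map_single(): record the mapping. */
static void fake_map(struct sge_sketch *sge, uint64_t addr, uint32_t len)
{
	sge->addr = addr;
	sge->length = len;
}

/* Hypothetical stand-in for ib_dma_unmap_single(). */
static void fake_unmap(struct sge_sketch *sge)
{
	(void)sge;
	unmap_calls++;
}

/*
 * Release path: unmap only if a mapping exists, then clear the marker so
 * a second call (or a call before mapping ever happened) does nothing.
 */
static void put_buffer_sketch(struct sge_sketch *sge)
{
	if (sge->length) {
		fake_unmap(sge);
		sge->addr = 0;
		sge->length = 0;
	}
}
```

Clearing the marker right after unmapping makes the release idempotent, which covers both failure cases named in the commit: a buffer that was never mapped, and one that was already unmapped.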
12 days | smb: client: remove separate empty_packet_queue | Stefan Metzmacher | 3 files | -65/+7
There's no need to maintain two lists; we can just have a single list of receive buffers that are free to use. The separate list only added unneeded complexity and resulted in ib_dma_unmap_single() not being called from recv_done() for empty keepalive packets.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: client: let send_done() cleanup before calling smbd_disconnect_rdma_connection() | Stefan Metzmacher | 1 file | -6/+8
We should call ib_dma_unmap_single() and mempool_free() before calling smbd_disconnect_rdma_connection(). smbd_disconnect_rdma_connection() needs to be the last function called, as all other state might already be gone after it returns.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | cifs: Fix null-ptr-deref by static initializing global lock | Yunseong Kim | 1 file | -4/+2
A kernel panic can be triggered by reading /proc/fs/cifs/debug_dirs. The crash is a null-ptr-deref inside spin_lock(), caused by the use of the uninitialized global spinlock cifs_tcp_ses_lock.

init_cifs()
└── cifs_proc_init()
    └── // User can access /proc/fs/cifs/debug_dirs here
        └── cifs_debug_dirs_proc_show()
            └── spin_lock(&cifs_tcp_ses_lock); // Uninitialized!

KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000000] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] SMP
Modules linked in:
CPU: 3 UID: 0 PID: 16435 Comm: stress-ng-procf Not tainted 6.16.0-10385-g79f14b5d84c6 #37 PREEMPT
Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu1 06/11/2025
pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : do_raw_spin_lock+0x84/0x2cc
lr : _raw_spin_lock+0x24/0x34
sp : ffff8000966477e0
x29: ffff800096647860 x28: ffff800096647b88 x27: ffff0001c0c22070
x26: ffff0003eb2b60c8 x25: ffff0001c0c22018 x24: dfff800000000000
x23: ffff0000f624e000 x22: ffff0003eb2b6020 x21: ffff0000f624e768
x20: 0000000000000004 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: ffff8000804b9600 x15: ffff700012cc8f04
x14: 1ffff00012cc8f04 x13: 0000000000000004 x12: ffffffffffffffff
x11: 1ffff00012cc8f00 x10: ffff80008d9af0d2 x9 : f3f3f304f1f1f1f1
x8 : 0000000000000000 x7 : 7365733c203e6469 x6 : 20656572743c2023
x5 : ffff0000e0ce0044 x4 : ffff80008a4deb6e x3 : ffff8000804b9718
x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 do_raw_spin_lock+0x84/0x2cc (P)
 _raw_spin_lock+0x24/0x34
 cifs_debug_dirs_proc_show+0x1ac/0x4c0
 seq_read_iter+0x3b0/0xc28
 proc_reg_read_iter+0x178/0x2a8
 vfs_read+0x5f8/0x88c
 ksys_read+0x120/0x210
 __arm64_sys_read+0x7c/0x90
 invoke_syscall+0x98/0x2b8
 el0_svc_common+0x130/0x23c
 do_el0_svc+0x48/0x58
 el0_svc+0x40/0x140
 el0t_64_sync_handler+0x84/0x12c
 el0t_64_sync+0x1ac/0x1b0
Code: aa0003f3 f9000feb f2fe7e69 f8386969 (38f86908)
---[ end trace 0000000000000000 ]---

The root cause is an initialization order problem. The lock is declared as a global variable and intended to be initialized during module startup. However, the procfs entry that uses this lock can be accessed by userspace before the spin_lock_init() call has run. This creates a race window where reading the proc file will attempt to use the lock before it is initialized, leading to the crash.

For a global lock with a static lifetime, the correct and robust approach is to use compile-time initialization.

Fixes: 844e5c0eb176 ("smb3 client: add way to show directory leases for improved debugging")
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
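In the kernel, the compile-time form for a global spinlock is `DEFINE_SPINLOCK(name)`; the commit message does not show the exact diff, so the userspace analogue below is only an illustration of the idea. It uses POSIX `PTHREAD_MUTEX_INITIALIZER`, which makes the lock valid before any code runs, so a reader that races ahead of setup never sees an uninitialized lock. `ses_lock`, `ses_count`, `debug_show()` and `init_sessions()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Compile-time initialization: the mutex is usable from program start,
 * with no init call that could run too late.
 * (Kernel analogue: DEFINE_SPINLOCK(name) instead of spin_lock_init().)
 */
static pthread_mutex_t ses_lock = PTHREAD_MUTEX_INITIALIZER;
static int ses_count; /* hypothetical data protected by ses_lock */

/* Hypothetical reader, like a proc "show" handler that may run at any time. */
static int debug_show(void)
{
	int n;

	pthread_mutex_lock(&ses_lock);
	n = ses_count;
	pthread_mutex_unlock(&ses_lock);
	return n;
}

/* Hypothetical setup path; it no longer has to initialize the lock itself. */
static void init_sessions(void)
{
	pthread_mutex_lock(&ses_lock);
	ses_count = 1;
	pthread_mutex_unlock(&ses_lock);
}
```

Calling `debug_show()` before `init_sessions()` is safe here because the lock's validity no longer depends on the order in which initialization code runs.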
12 days | smb: server: let recv_done() avoid touching data_transfer after cleanup/move | Stefan Metzmacher | 1 file | -5/+7
Calling enqueue_reassembly() and wake_up_interruptible(&t->wait_reassembly_queue) or put_receive_buffer() means the recvmsg/data_transfer pointer might get re-used by another thread, which means these should be the last operations before returning.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: server: let recv_done() consistently call put_recvmsg/smb_direct_disconnect_rdma_connection | Stefan Metzmacher | 1 file | -5/+13
We should call put_recvmsg() before smb_direct_disconnect_rdma_connection() in order to call it before waking up the callers. In all error cases we should call smb_direct_disconnect_rdma_connection() in order to avoid stale connections.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
12 days | smb: server: make sure we call ib_dma_unmap_single() only if we called ib_dma_map_single already | Stefan Metzmacher | 1 file | -2/+9
In failure cases, ib_dma_map_single() might not have been called yet, or ib_dma_unmap_single() might already have been called. We should make sure put_recvmsg() only calls ib_dma_unmap_single() if needed.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>