summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2025-06-16bcachefs: Delay calculation of trans->journal_u64sAlan Huang1-4/+7
When there is commit error that need split btree leaf, fsck might change the value of trans->journal_entries.u64s, when retry commit, the value of trans->journal_u64s would be incorrect, which will lead to trans->journal_res.u64s underflow, and then out of bounds write will occur: [ 464.496970][T11969] Call trace: [ 464.496973][T11969] show_stack+0x3c/0x88 (C) [ 464.496995][T11969] dump_stack_lvl+0xf8/0x178 [ 464.497014][T11969] dump_stack+0x20/0x30 [ 464.497031][T11969] __bch2_trans_log_str+0x344/0x350 [ 464.497048][T11969] bch2_trans_log_str+0x3c/0x60 [ 464.497065][T11969] __bch2_fsck_err+0x11bc/0x1390 [ 464.497083][T11969] bch2_check_discard_freespace_key+0xad4/0x10d0 [ 464.497100][T11969] bch2_bucket_alloc_freelist+0x99c/0x1130 [ 464.497117][T11969] bch2_bucket_alloc_trans+0x79c/0xcb8 [ 464.497133][T11969] bch2_bucket_alloc_set_trans+0x378/0xc20 [ 464.497151][T11969] __open_bucket_add_buckets+0x7fc/0x1c00 [ 464.497168][T11969] open_bucket_add_buckets+0x184/0x3a8 [ 464.497185][T11969] bch2_alloc_sectors_start_trans+0xa04/0x1da0 [ 464.497203][T11969] bch2_btree_reserve_get+0x6e0/0xef0 [ 464.497220][T11969] bch2_btree_update_start+0x1618/0x2600 [ 464.497239][T11969] bch2_btree_split_leaf+0xcc/0x730 [ 464.497258][T11969] bch2_trans_commit_error+0x22c/0xc30 [ 464.497276][T11969] __bch2_trans_commit+0x207c/0x4e30 [ 464.497292][T11969] bch2_journal_replay+0x9e0/0x1420 [ 464.497305][T11969] __bch2_run_recovery_passes+0x458/0xf98 [ 464.497318][T11969] bch2_run_recovery_passes+0x280/0x478 [ 464.497331][T11969] bch2_fs_recovery+0x24f0/0x3a28 [ 464.497344][T11969] bch2_fs_start+0xb80/0x1248 [ 464.497358][T11969] bch2_fs_get_tree+0xe94/0x1708 [ 464.497377][T11969] vfs_get_tree+0x84/0x2d0 Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: Add missing EBUG_ONAlan Huang1-0/+2
Just like the EBUG_ON in bch2_journal_add_entry(). Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: Fix alloc_req use after freeAlan Huang2-14/+35
Now the alloc_req is allocated from the bump allocator, if there is reallocation, the memory of alloc_req would be frees, fix by delaying the reallocation to transaction restart, it has to restart anyway. Reported-by: syzbot+2887a13a5c387e616a68@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: Don't allocate new memory when mempool is exhaustedAlan Huang1-25/+4
Allocating new memory when mempool is exhausted is too complicated, just return ENOMEM is fine. memcpy is not needed, since there might be pointers point to the old memory, that's the bug. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: btree iter tracepointsKent Overstreet2-5/+82
We've been seeing some livelock-ish behavior in the index update part of the main write path, and while we've got low level btree path tracepoints, we've been lacking high level btree iterator tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: trace_extent_trim_atomicKent Overstreet2-1/+17
Add a tracepoint for when we insert only part of an extent, due to too many overwrites. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-14Merge tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds4-22/+33
Pull smb client fixes from Steve French: - SMB3.1.1 POSIX extensions fix for char remapping - Fix for repeated directory listings when directory leases enabled - deferred close handle reuse fix * tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: improve directory cache reuse for readdir operations smb: client: fix perf regression with deferred closes smb: client: disable path remapping with POSIX extensions
2025-06-13NFSD: Avoid corruption of a referring call listChuck Lever1-0/+1
The new code neglects to remove a freshly-allocated RCL from the callback's referring call list when no matching referring call is found. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202505171002.cE46sdj5-lkp@intel.com/ Fixes: 4f3c8d8c9e10 ("NFSD: Implement CB_SEQUENCE referring call lists") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-06-12smb: improve directory cache reuse for readdir operationsBharath SM2-17/+19
Currently, cached directory contents were not reused across subsequent 'ls' operations because the cache validity check relied on comparing the ctx pointer, which changes with each readdir invocation. As a result, the cached dir entries was not marked as valid and the cache was not utilized for subsequent 'ls' operations. This change uses the file pointer, which remains consistent across all readdir calls for a given directory instance, to associate and validate the cache. As a result, cached directory contents can now be correctly reused, improving performance for repeated directory listings. Performance gains with local windows SMB server: Without the patch and default actimeo=1: 1000 directory enumeration operations on dir with 10k files took 135.0s With this patch and actimeo=0: 1000 directory enumeration operations on dir with 10k files took just 5.1s Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-12smb: client: fix perf regression with deferred closesPaulo Alcantara1-3/+6
Customer reported that one of their applications started failing to open files with STATUS_INSUFFICIENT_RESOURCES due to NetApp server hitting the maximum number of opens to same file that it would allow for a single client connection. It turned out the client was failing to reuse open handles with deferred closes because matching ->f_flags directly without masking off O_CREAT|O_EXCL|O_TRUNC bits first broke the comparision and then client ended up with thousands of deferred closes to same file. Those bits are already satisfied on the original open, so no need to check them against existing open handles. Reproducer: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <fcntl.h> #include <pthread.h> #define NR_THREADS 4 #define NR_ITERATIONS 2500 #define TEST_FILE "/mnt/1/test/dir/foo" static char buf[64]; static void *worker(void *arg) { int i, j; int fd; for (i = 0; i < NR_ITERATIONS; i++) { fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_APPEND, 0666); for (j = 0; j < 16; j++) write(fd, buf, sizeof(buf)); close(fd); } } int main(int argc, char *argv[]) { pthread_t t[NR_THREADS]; int fd; int i; fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0666); close(fd); memset(buf, 'a', sizeof(buf)); for (i = 0; i < NR_THREADS; i++) pthread_create(&t[i], NULL, worker, NULL); for (i = 0; i < NR_THREADS; i++) pthread_join(t[i], NULL); return 0; } Before patch: $ mount.cifs //srv/share /mnt/1 -o ... $ mkdir -p /mnt/1/test/dir $ gcc repro.c && ./a.out ... number of opens: 1391 After patch: $ mount.cifs //srv/share /mnt/1 -o ... $ mkdir -p /mnt/1/test/dir $ gcc repro.c && ./a.out ... number of opens: 1 Cc: linux-cifs@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: Jay Shin <jaeshin@redhat.com> Cc: Pierguido Lambri <plambri@redhat.com> Fixes: b8ea3b1ff544 ("smb: enable reuse of deferred file handles for write operations") Acked-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-12fs: unlock the superblock during iterate_supers_typeDarrick J. Wong1-1/+3
This function takes super_lock in shared mode, so it should release the same lock. Cc: stable@vger.kernel.org # v6.16-rc1 Fixes: af7551cf13cf7f ("super: remove pointless s_root checks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Link: https://lore.kernel.org/20250611164044.GF6138@frogsfrogsfrogs Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-12ovl: fix debug print in case of mkdir errorAmir Goldstein1-3/+5
We want to print the name in case of mkdir failure and now we will get a cryptic (efault) as name. Fixes: c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250612072245.2825938-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-12bcachefs: Don't trace should_be_locked unless changingKent Overstreet1-2/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Ensure that snapshot creation propagates has_case_insensitiveKent Overstreet1-0/+10
We normally can't create a new directory with the case-insensitive option already set - except when we're creating a snapshot. And if casefolding is enabled filesystem wide, we should still set it even though not strictly required, for consistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Print devices we're mounting on multi device filesystemsKent Overstreet1-6/+14
Previously, we only ever logged the filesystem UUID. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Don't trust sb->nr_devices in members_to_text()Kent Overstreet1-4/+30
We have to be able to print superblock sections even if they fail to validate (for debugging), so we have to calculate the number of entries from the field size. Reported-by: syzbot+5138f00559ffb3cb3610@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Fix version checks in validate_bset()Kent Overstreet1-5/+11
It seems btree node scan picked up a partially overwritten btree node, and corrected the "bset version older than sb version_min" error - resulting in an invalid superblock with a bad version_min field. Don't run this check at all when we're in btree node scan, and when we do run it, do something saner if the bset version is totally crazy. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: ioctl: avoid stack overflow warningArnd Bergmann1-2/+2
Multiple ioctl handlers individually use a lot of stack space, and clang chooses to inline them into the bch2_fs_ioctl() function, blowing through the warning limit: fs/bcachefs/chardev.c:655:6: error: stack frame size (1032) exceeds limit (1024) in 'bch2_fs_ioctl' [-Werror,-Wframe-larger-than] 655 | long bch2_fs_ioctl(struct bch_fs *c, unsigned cmd, void __user *arg) By marking the largest two of them as noinline_for_stack, no indidual code path ends up using this much, which avoids the warning and reduces the possible total stack usage in the ioctl handler. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Don't pass trans to fsck_err() in gc_accounting_doneKent Overstreet1-1/+3
fsck_err() can return a transaction restart if passed a transaction object - this has always been true when it has to drop locks to prompt for user input, but we're seeing this more now that we're logging the error being corrected in the journal. gc_accounting_done() doesn't call fsck_err() from an actual commit loop, and it doesn't need to be holding btree locks when it calls fsck_err(), so the easy fix here for the unhandled transaction restart is to just not pass it the transaction object. We'll miss out on the fancy new logging, but that's ok. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Fix leak in bch2_fs_recovery() error pathKent Overstreet1-4/+4
Fix a small leak of the superblock 'clean' section. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Fix rcu_pending for PREEMPT_RTKent Overstreet1-11/+11
PREEMPT_RT redefines how standard spinlocks work, so local_irq_save() + spin_lock() is no longer equivalent to spin_lock_irqsave(). Fortunately, we don't strictly need to do it that way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Fix downgrade_table_extra()Kent Overstreet1-1/+4
Fix a UAF: we were calling darray_make_room() and retaining a pointer to the old buffer. And fix an UBSAN warning: struct bch_sb_field_downgrade_entry uses __counted_by, so set dst->nr_errors before assigning to the array entry. Reported-by: syzbot+14c52d86ddbd89bea13e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Don't put rhashtable on stackKent Overstreet1-9/+13
Object debugging generally needs special provisions for putting said objects on the stack, which rhashtable does not have. Reported-by: syzbot+bcc38a9556d0324c2ec2@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Make sure opts.read_only gets propagated back to VFSKent Overstreet2-1/+11
If we think we're read-only but the VFS doesn't, fun will ensue. And now that we know we have to be able to do this safely, just make nochanges imply ro. Reported-by: syzbot+a7d6ceaba099cc21dee4@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Fix possible console lock involved deadlockAlan Huang6-20/+8
Link: https://lore.kernel.org/all/6822ab02.050a0220.f2294.00cb.GAE@google.com/T/ Reported-by: syzbot+2c3ef91c9523c3d1a25c@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: mark more errors autofixKent Overstreet1-4/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Don't persistently run scan_for_btree_nodesKent Overstreet2-3/+13
bch2_btree_lost_data() gets called on btree node read error, but the error might be transient. btree_node_scan is expensive, and there's no need to run it persistently (marking it in the superblock as required to run) - check_topology will run it if required, via bch2_get_scanned_nodes(). Running it non-persistently is fine, to avoid check_topology having to rewind recovery to run it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Read error message now prints if self healingKent Overstreet2-2/+10
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Only run 'increase_depth' for keys from btree node csanKent Overstreet1-1/+12
bch2_btree_increase_depth() was originally for disaster recovery, to get some data back from the journal when a btree root was bad. We don't need it for that purpose anymore; on bad btree root we'll launch btree node scan and reconstruct all the interior nodes. If there's a key in the journal for a depth that doesn't exists, and it's not from check_topology/btree node scan, we should just ignore it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Mark need_discard_freespace_key_bad autofixKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Update /dev/disk/by-uuid on device addKent Overstreet1-0/+16
Invalidate pagecache after we write the new superblock and send a uevent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Add more flags to btree nodes for rewrite reasonKent Overstreet4-6/+48
It seems excessive forced btree node rewrites can cause interior btree updates to become wedged during recovery, before we're using the write buffer for backpointer updates. Add more flags so we can determine where these are coming from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Add range being updated to btree_update_to_text()Kent Overstreet2-2/+31
We had a deadlock during recovery where interior btree updates became wedged and all open_buckets were consumed; start adding more introspection. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Log fsck errors in the journalKent Overstreet1-0/+3
Log the specific error being corrected in the journal when we're repairing, this helps greatly with 'bcachefs list_journal' analysis. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-12bcachefs: Add missing restart handling to check_topology()Kent Overstreet1-35/+60
The next patch will add logging of the specific error being corrected in repair paths to the journal; this means __bch2_fsck_err() can return transaction restarts in places that previously weren't expecting them. check_topology() is old code that doesn't use btree iterators for btree node locking - it'll have to be rewritten in the future to work online. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11VFS: change try_lookup_noperm() to skip revalidationNeilBrown1-4/+13
The recent change from using d_hash_and_lookup() to using try_lookup_noperm() inadvertently introduce a d_revalidate() call when the lookup was successful. Steven French reports that this resulted in worse than halving of performance in some cases. Prior to the offending patch the only caller of try_lookup_noperm() was autofs which does not need the d_revalidate(). So it is safe to remove the d_revalidate() call providing we stop using try_lookup_noperm() to implement lookup_noperm(). The "try_" in the name is strongly suggestive that the caller isn't expecting much effort, so it seems reasonable to avoid the effort of d_revalidate(). Fixes: 06c567403ae5 ("Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS") Reported-by: Steve French <smfrench@gmail.com> Link: https://lore.kernel.org/all/CAH2r5mu5SfBrdc2CFHwzft8=n9koPMk+Jzwpy-oUMx-wCRCesQ@mail.gmail.com/ Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/174951744454.608730.18354002683881684261@noble.neil.brown.name Tested-by: Steve French <stfrench@microsoft.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-10f2fs: Fix __write_node_folio() conversionMatthew Wilcox (Oracle)1-1/+0
This conversion moved the folio_unlock() to inside __write_node_folio(), but missed one caller so we had a double-unlock on this path. Cc: Christoph Hellwig <hch@lst.de> Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Reported-by: syzbot+c0dc46208750f063d0e0@syzkaller.appspotmail.com Fixes: 80f31d2a7e5f (f2fs: return bool from __write_node_folio) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-09smb: client: disable path remapping with POSIX extensionsPhilipp Kerling1-2/+8
If SMB 3.1.1 POSIX Extensions are available and negotiated, the client should be able to use all characters and not remap anything. Currently, the user has to explicitly request this behavior by specifying the "nomapposix" mount option. Link: https://lore.kernel.org/4195bb677b33d680e77549890a4f4dd3b474ceaf.camel@rx2.rx-server.de Signed-off-by: Philipp Kerling <pkerling@casix.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-08Merge tag 'timers-cleanups-2025-06-08' of ↵Linus Torvalds5-5/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanup from Thomas Gleixner: "The delayed from_timer() API cleanup: The renaming to the timer_*() namespace was delayed due massive conflicts against Linux-next. Now that everything is upstream finish the conversion" * tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename from_timer() to timer_container_of()
2025-06-08Merge tag 'x86-urgent-2025-06-08' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - Cure IO bitmap inconsistencies A failed fork cleans up all resources of the newly created thread via exit_thread(). exit_thread() invokes io_bitmap_exit() which does the IO bitmap cleanups, which unfortunately assume that the cleanup is related to the current task, which is obviously bogus. Make it work correctly - A lockdep fix in the resctrl code removed the clearing of the command buffer in two places, which keeps stale error messages around. Bring them back. - Remove unused trace events" * tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex x86/iopl: Cure TIF_IO_BITMAP inconsistencies x86/fpu: Remove unused trace events
2025-06-08Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-42/+71
Pull mount fixes from Al Viro: "Various mount-related bugfixes: - split the do_move_mount() checks in subtree-of-our-ns and entire-anon cases and adapt detached mount propagation selftest for mount_setattr - allow clone_private_mount() for a path on real rootfs - fix a race in call of has_locked_children() - fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP - make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right userns - avoid false negatives in path_overmount() - don't leak MNT_LOCKED from parent to child in finish_automount() - do_change_type(): refuse to operate on unmounted/not ours mounts" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_change_type(): refuse to operate on unmounted/not ours mounts clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns selftests/mount_setattr: adapt detached mount propagation test do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases fs: allow clone_private_mount() for a path on real rootfs fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) finish_automount(): don't leak MNT_LOCKED from parent to child path_overmount(): avoid false negatives fs/fhandle.c: fix a race in call of has_locked_children()
2025-06-08Merge tag '6.16-rc-part2-smb3-client-fixes' of ↵Linus Torvalds12-286/+421
git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - multichannel/reconnect fixes - move smbdirect (smb over RDMA) defines to fs/smb/common so they will be able to be used in the future more broadly, and a documentation update explaining setting up smbdirect mounts - update email address for Paulo * tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal version number MAINTAINERS, mailmap: Update Paulo Alcantara's email address cifs: add documentation for smbdirect setup cifs: do not disable interface polling on failure cifs: serialize other channels when query server interfaces is pending cifs: deal with the channel loading lag while picking channels smb: client: make use of common smbdirect_socket_parameters smb: smbdirect: introduce smbdirect_socket_parameters smb: client: make use of common smbdirect_socket smb: smbdirect: add smbdirect_socket.h smb: client: make use of common smbdirect.h smb: smbdirect: add smbdirect.h with public structures smb: client: make use of common smbdirect_pdu.h smb: smbdirect: add smbdirect_pdu.h with protocol definitions
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar5-5/+6
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-07Merge tag 'ubifs-for-linus-6.16-rc1' of ↵Linus Torvalds4-4/+13
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull JFFS2 and UBIFS fixes from Richard Weinberger: "JFFS2: - Correctly check return code of jffs2_prealloc_raw_node_refs() UBIFS: - Spelling fixes" * tag 'ubifs-for-linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: jffs2: check jffs2_prealloc_raw_node_refs() result in few other places jffs2: check that raw node were preallocated before writing summary ubifs: Fix grammar in error message
2025-06-07do_change_type(): refuse to operate on unmounted/not ours mountsAl Viro1-0/+4
Ensure that propagation settings can only be changed for mounts located in the caller's mount namespace. This change aligns permission checking with the rest of mount(2). Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: 07b20889e305 ("beginning of the shared-subtree proper") Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-07clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right usernsAl Viro1-0/+3
What we want is to verify there is that clone won't expose something hidden by a mount we wouldn't be able to undo. "Wouldn't be able to undo" may be a result of MNT_LOCKED on a child, but it may also come from lacking admin rights in the userns of the namespace mount belongs to. clone_private_mnt() checks the former, but not the latter. There's a number of rather confusing CAP_SYS_ADMIN checks in various userns during the mount, especially with the new mount API; they serve different purposes and in case of clone_private_mnt() they usually, but not always end up covering the missing check mentioned above. Reviewed-by: Christian Brauner <brauner@kernel.org> Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com> Fixes: 427215d85e8d ("ovl: prevent private clone if bind mount is not allowed") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-07do_move_mount(): split the checks in subtree-of-our-ns and entire-anon casesAl Viro1-21/+25
... and fix the breakage in anon-to-anon case. There are two cases acceptable for do_move_mount() and mixing checks for those is making things hard to follow. One case is move of a subtree in caller's namespace. * source and destination must be in caller's namespace * source must be detachable from parent Another is moving the entire anon namespace elsewhere * source must be the root of anon namespace * target must either in caller's namespace or in a suitable anon namespace (see may_use_mount() for details). * target must not be in the same namespace as source. It's really easier to follow if tests are *not* mixed together... Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: 3b5260d12b1f ("Don't propagate mounts into detached trees") Reported-by: Allison Karlitskaya <lis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-07fs: allow clone_private_mount() for a path on real rootfsKONDO KAZUMA(近藤 和真)1-10/+11
Mounting overlayfs with a directory on real rootfs (initramfs) as upperdir has failed with following message since commit db04662e2f4f ("fs: allow detached mounts in clone_private_mount()"). [ 4.080134] overlayfs: failed to clone upperpath Overlayfs mount uses clone_private_mount() to create internal mount for the underlying layers. The commit made clone_private_mount() reject real rootfs because it does not have a parent mount and is in the initial mount namespace, that is not an anonymous mount namespace. This issue can be fixed by modifying the permission check of clone_private_mount() following [1]. Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()") Link: https://lore.kernel.org/all/20250514190252.GQ2023217@ZenIV/ [1] Link: https://lore.kernel.org/all/20250506194849.GT2023217@ZenIV/ Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-07fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)Al Viro1-1/+1
9ffb14ef61ba "move_mount: allow to add a mount into an existing group" breaks assertions on ->mnt_share/->mnt_slave. For once, the data structures in question are actually documented. Documentation/filesystem/sharedsubtree.rst: All vfsmounts in a peer group have the same ->mnt_master. If it is non-NULL, they form a contiguous (ordered) segment of slave list. do_set_group() puts a mount into the same place in propagation graph as the old one. As the result, if old mount gets events from somewhere and is not a pure event sink, new one needs to be placed next to the old one in the slave list the old one's on. If it is a pure event sink, we only need to make sure the new one doesn't end up in the middle of some peer group. "move_mount: allow to add a mount into an existing group" ends up putting the new one in the beginning of list; that's definitely not going to be in the middle of anything, so that's fine for case when old is not marked shared. In case when old one _is_ marked shared (i.e. is not a pure event sink), that breaks the assumptions of propagation graph iterators. Put the new mount next to the old one on the list - that does the right thing in "old is marked shared" case and is just as correct as the current behaviour if old is not marked shared (kudos to Pavel for pointing that out - my original suggested fix changed behaviour in the "nor marked" case, which complicated things for no good reason). Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: 9ffb14ef61ba ("move_mount: allow to add a mount into an existing group") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-07path_overmount(): avoid false negativesAl Viro1-6/+13
Holding namespace_sem is enough to make sure that result remains valid. It is *not* enough to avoid false negatives from __lookup_mnt(). Mounts can be unhashed outside of namespace_sem (stuck children getting detached on final mntput() of lazy-umounted mount) and having an unrelated mount removed from the hash chain while we traverse it may end up with false negative from __lookup_mnt(). We need to sample and recheck the seqlock component of mount_lock... Bug predates the introduction of path_overmount() - it had come from the code in finish_automount() that got abstracted into that helper. Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: 26df6034fdb2 ("fix automount/automount race properly") Fixes: 6ac392815628 ("fs: allow to mount beneath top mount") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>