summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2025-04-28f2fs: Use a folio in f2fs_quota_read()Matthew Wilcox (Oracle)1-20/+16
Support arbitrary size folios and remove a few hidden calls to compound_head(). Also remove an unnecessary test of the uptodaate flag; if mapping_read_folio_gfp() cannot bring the folio uptodate, it will return an error. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in move_data_block()Matthew Wilcox (Oracle)1-16/+17
Fetch a folio from the pagecache instead of a page and operate on it throughout. Removes eight calls to compound_head() and an access to page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Remove access to page->mapping in f2fs_is_cp_guaranteed()Matthew Wilcox (Oracle)1-1/+1
page->mapping will be removed soon, so call page_folio() on the page that was passed in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in add_ipu_page()Matthew Wilcox (Oracle)1-2/+3
Convert the incoming page to a folio at the start, then use the folio later in the function. Moves a call to page_folio() earlier, and removes an access to page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use bio_for_each_folio_all() in __has_merged_page()Matthew Wilcox (Oracle)3-14/+15
Iterate over each folio rather than each page. Convert f2fs_compress_control_page() to f2fs_compress_control_folio() since this is the only caller. Removes a reference to page->mapping which is going away soon as well as calls to fscrypt_is_bounce_page() and fscrypt_pagecache_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use F2FS_P_SB() in f2fs_is_compressed_page()Matthew Wilcox (Oracle)1-1/+1
Remove a reference to page->mapping which is going away soon. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Introduce fio_inode()Matthew Wilcox (Oracle)3-7/+12
This helper returns the inode associated with the f2fs_io_info. That's a relatively common thing to want, mildly awkward to get and provides one place to change if we decide to record it directly, or change fio->page to fio->folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [Jaegeuk Kim: fix wrong fio_inode conversion] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in f2fs_write_raw_pages()Matthew Wilcox (Oracle)1-10/+12
Convert each page in rpages to a folio before operating on it. Replaces eight calls to compound_head() with one and removes a reference to page->mapping which is going away soon. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in f2fs_compress_free_page()Matthew Wilcox (Oracle)1-3/+6
Convert the incoming page to a folio and operate on it. Removes a reference to page->mapping which is going away soon. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28xfs: stop using set_blocksizeDarrick J. Wong1-4/+11
XFS has its own buffer cache for metadata that uses submit_bio, which means that it no longer uses the block device pagecache for anything. Create a more lightweight helper that runs the blocksize checks and flushes dirty data and use that instead. No more truncating the pagecache because XFS does not use it or care about it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-28fs/ntfs3: Add missing direct_IO in ntfs_aops_cmprLizhi Xu1-0/+5
The ntfs3 can use the page cache directly, so its address_space_operations need direct_IO. Exit ntfs_direct_IO() if it is a compressed file. Fixes: b432163ebd15 ("fs/ntfs3: Update inode->i_mapping->a_ops on compression state") Reported-by: syzbot+e36cc3297bd3afd25e19@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e36cc3297bd3afd25e19 Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-04-28fs/ntfs3: handle hdr_first_de() return valueAndrey Vatoropin1-0/+8
The hdr_first_de() function returns a pointer to a struct NTFS_DE. This pointer may be NULL. To handle the NULL error effectively, it is important to implement an error handler. This will help manage potential errors consistently. Additionally, error handling for the return value already exists at other points where this function is called. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-04-28fs/ntfs3: Drop redundant NULL checkAndrey Vatoropin1-1/+1
Static analysis shows that pointer "mi" cannot be NULL, since it is pre-initialized above. A potential failure when mi equals NULL is processed. Remove the extra NULL check. It is meaningless and harms the readability of the code, since before that the pointer is unconditionally dereferenced. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-04-28omfs: convert to new mount APIPavel Reichl1-72/+104
Convert the OMFS filesystem to the new mount API. Signed-off-by: Pavel Reichl <preichl@redhat.com> Link: https://lore.kernel.org/20250423220001.1535071-1-preichl@redhat.com Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-28fs/xattr.c: fix simple_xattr_list to always include security.* xattrsStephen Smalley1-0/+24
The vfs has long had a fallback to obtain the security.* xattrs from the LSM when the filesystem does not implement its own listxattr, but shmem/tmpfs and kernfs later gained their own xattr handlers to support other xattrs. Unfortunately, as a side effect, tmpfs and kernfs-based filesystems like sysfs no longer return the synthetic security.* xattr names via listxattr unless they are explicitly set by userspace or initially set upon inode creation after policy load. coreutils has recently switched from unconditionally invoking getxattr for security.* for ls -Z via libselinux to only doing so if listxattr returns the xattr name, breaking ls -Z of such inodes. Before: $ getfattr -m.* /run/initramfs <no output> $ getfattr -m.* /sys/kernel/fscaps <no output> $ setfattr -n user.foo /run/initramfs $ getfattr -m.* /run/initramfs user.foo After: $ getfattr -m.* /run/initramfs security.selinux $ getfattr -m.* /sys/kernel/fscaps security.selinux $ setfattr -n user.foo /run/initramfs $ getfattr -m.* /run/initramfs security.selinux user.foo Link: https://lore.kernel.org/selinux/CAFqZXNtF8wDyQajPCdGn=iOawX4y77ph0EcfcqcUUj+T87FKyA@mail.gmail.com/ Link: https://lore.kernel.org/selinux/20250423175728.3185-2-stephen.smalley.work@gmail.com/ Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Link: https://lore.kernel.org/20250424152822.2719-1-stephen.smalley.work@gmail.com Fixes: b09e0fa4b4ea66266058ee ("tmpfs: implement generic xattr support") Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-28Merge 6.15-rc4 into driver-core-nextGreg Kroah-Hartman140-1284/+1617
We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-28nfs: nfs3acl: drop useless assignment in nfs3_get_acl()Sergey Shtylyov1-1/+1
In nfs3_get_acl(), the local variable status is assigned the result of nfs_refresh_inode() inside the *switch* statement, but that value gets overwritten in the next *if* statement's true branch and is completely ignored if that branch isn't taken... Found by Linux Verification Center (linuxtesting.org) with the Svace static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/c32dced7-a4fa-43c0-aafe-ef6c819c2f91@omp.ru Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-28nfs: direct: drop useless initializer in nfs_direct_write_completion()Sergey Shtylyov1-1/+1
In nfs_direct_write_completion(), the local variable req isn't used outside the *while* loop and is assigned to right at the start of that loop's body, so its initializer appears useless -- drop it; then move the declaration to the loop body (which happens to have a pointless empty line anyway)... Found by Linux Verification Center (linuxtesting.org) with the Svace static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/416219f5-7983-484b-b5a7-5fb7da9561f7@omp.ru Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-28nfs: move the nfs4_data_server_cache into struct nfs_netJeff Layton3-15/+29
Since struct nfs4_pnfs_ds should not be shared between net namespaces, move from a global list of objects to a per-netns list and spinlock. Tested-by: Sargun Dillon <sargun@sargun.me> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/20250410-nfs-ds-netns-v2-2-f80b7979ba80@kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-28nfs: don't share pNFS DS connections between net namespacesJeff Layton4-11/+14
Currently, different NFS clients can share the same DS connections, even when they are in different net namespaces. If a containerized client creates a DS connection, another container can find and use it. When the first client exits, the connection will close which can lead to stalls in other clients. Add a net namespace pointer to struct nfs4_pnfs_ds, and compare those value to the caller's netns in _data_server_lookup_locked() when searching for a nfs4_pnfs_ds to match. Reported-by: Omar Sandoval <osandov@osandov.com> Reported-by: Sargun Dillon <sargun@sargun.me> Closes: https://lore.kernel.org/linux-nfs/Z_ArpQC_vREh_hEA@telecaster/ Tested-by: Sargun Dillon <sargun@sargun.me> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/20250410-nfs-ds-netns-v2-1-f80b7979ba80@kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-28nfs: handle failure of nfs_get_lock_context in unlock pathLi Lingfeng1-1/+8
When memory is insufficient, the allocation of nfs_lock_context in nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM) as valid and proceed to execute rpc_run_task(), this will trigger a NULL pointer dereference in nfs4_locku_prepare. For example: BUG: kernel NULL pointer dereference, address: 000000000000000c PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty #60 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 Workqueue: rpciod rpc_async_schedule RIP: 0010:nfs4_locku_prepare+0x35/0xc2 Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3 RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246 RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40 RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38 R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030 R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30 FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0 Call Trace: <TASK> __rpc_execute+0xbc/0x480 rpc_async_schedule+0x2f/0x40 process_one_work+0x232/0x5d0 worker_thread+0x1da/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x10d/0x240 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 000000000000000c ---[ end trace 0000000000000000 ]--- Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and return NULL to terminate subsequent rpc_run_task, preventing NULL pointer dereference. Fixes: f30cb757f680 ("NFS: Always wait for I/O completion before unlock") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20250417072508.3850532-1-lilingfeng3@huawei.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-28pNFS/flexfiles: Record the RPC errors in the I/O tracepointsTrond Myklebust2-15/+25
When debugging I/O issues, we want to see not just the NFS level errors, but also the RPC level problems, so record both in the tracepoints. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-28NFSv4/pnfs: Layoutreturn on close must handle fatal networking errorsTrond Myklebust1-0/+12
If we have a fatal ENETDOWN or ENETUNREACH error, then the layoutreturn on close code should also handle that as fatal, and free the layouts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2025-04-28NFSv4: Handle fatal ENETDOWN and ENETUNREACH errorsTrond Myklebust1-0/+9
Ensure that the NFSv4 error handling code recognises the RPC_TASK_NETUNREACH_FATAL flag, and handles the ENETDOWN and ENETUNREACH errors accordingly. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2025-04-26pidfs: get rid of __pidfd_prepare()Christian Brauner1-9/+13
Fold it into pidfd_prepare() and rename PIDFD_CLONE to PIDFD_STALE to indicate that the passed pid might not have task linkage and no explicit check for that should be performed. Link: https://lore.kernel.org/20250425-work-pidfs-net-v2-3-450a19461e75@kernel.org Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: David Rheinsberg <david@readahead.eu> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-26pidfs: register pid in pidfsChristian Brauner1-0/+59
Add simple helpers that allow a struct pid to be pinned via a pidfs dentry/inode. If no pidfs dentry exists a new one will be allocated for it. A reference is taken by pidfs on @pid. The reference must be released via pidfs_put_pid(). This will allow AF_UNIX sockets to allocate a dentry for the peer credentials pid at the time they are recorded where we know the task is still alive. When the task gets reaped its exit status is guaranteed to be recorded and a pidfd can be handed out for the reaped task. Link: https://lore.kernel.org/20250425-work-pidfs-net-v2-1-450a19461e75@kernel.org Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: David Rheinsberg <david@readahead.eu> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-26ksmbd: fix use-after-free in session logoffSean Heelan1-4/+0
The sess->user object can currently be in use by another thread, for example if another connection has sent a session setup request to bind to the session being free'd. The handler for that connection could be in the smb2_sess_setup function which makes use of sess->user. Cc: stable@vger.kernel.org Signed-off-by: Sean Heelan <seanheelan@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-26ksmbd: fix use-after-free in kerberos authenticationSean Heelan2-6/+13
Setting sess->user = NULL was introduced to fix the dangling pointer created by ksmbd_free_user. However, it is possible another thread could be operating on the session and make use of sess->user after it has been passed to ksmbd_free_user but before sess->user is set to NULL. Cc: stable@vger.kernel.org Signed-off-by: Sean Heelan <seanheelan@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-26Merge tag 'vfs-6.15-rc4.fixes' of ↵Linus Torvalds9-65/+108
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - For some reason we went from zero to three maintainers for HFS/HFS+ in a matter of days. The lesson to learn from this might just be that we need to threaten code removal more often!? - Fix a regression introduced by enabling large folios for lage logical block sizes. This has caused issues for noref migration with large folios due to sleeping while in an atomic context. New sleeping variants of pagecache lookup helpers are introduced. These helpers take the folio lock instead of the mapping's private spinlock. The problematic users are converted to the sleeping variants and serialize against noref migration. Atomic users will bail on seeing the new BH_Migrate flag. This also shrinks the critical region of the mapping's private lock and the new blocking callers reduce contention on the spinlock for bdev mappings. - Fix two bugs in do_move_mount() when with MOVE_MOUNT_BENEATH. The first bug is using a mountpoint that is located on a mount we're not holding a reference to. The second bug is putting the mountpoint after we've called namespace_unlock() as it's no longer guaranteed that it does stay a mountpoint. - Remove a pointless call to vfs_getattr_nosec() in the devtmpfs code just to query i_mode instead of simply querying the inode directly. This also avoids lifetime issues for the dm code by an earlier bugfix this cycle that moved bdev_statx() handling into vfs_getattr_nosec(). - Fix AT_FDCWD handling with getname_maybe_null() in the xattr code. - Fix a performance regression for files when multiple callers issue a close when it's not the last reference. - Remove a duplicate noinline annotation from pipe_clear_nowait(). * tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2) MAINTAINERS: hfs/hfsplus: add myself as maintainer splice: remove duplicate noinline from pipe_clear_nowait devtmpfs: don't use vfs_getattr_nosec to query i_mode fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount() fs: fall back to file_ref_put() for non-last reference mm/migrate: fix sleep in atomic for large folios and buffer heads fs/ext4: use sleeping version of sb_find_get_block() fs/jbd2: use sleeping version of __find_get_block() fs/ocfs2: use sleeping version of __find_get_block() fs/buffer: use sleeping version of __find_get_block() fs/buffer: introduce sleeping flavors for pagecache lookups MAINTAINERS: add HFS/HFS+ maintainers fs/buffer: split locking for pagecache lookups
2025-04-26Merge tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds1-1/+1
Pull ceph fixes from Ilya Dryomov: "A small CephFS encryption-related fix and a dead code cleanup" * tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client: ceph: Fix incorrect flush end position calculation ceph: Remove osd_client deadcode
2025-04-25Merge tag 'xfs-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-2/+8
Pull xfs fixes from Carlos Maiolino: "This contains a fix for a build failure on some 32-bit architectures and a warning generating docs" * tag 'xfs-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove duplicate Zoned Filesystems sections in admin-guide XFS: fix zoned gc threshold math for 32-bit arches
2025-04-25Merge tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefsLinus Torvalds32-585/+735
Pull bcachefs fixes from Kent Overstreet: - Case insensitive directories now work - Ciemap now correctly reports on unwritten pagecache data - bcachefs tools 1.25.1 was incorrectly picking unaligned bucket sizes; fix journal and write path bugs this uncovered And assorted smaller fixes... * tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefs: (24 commits) bcachefs: Rework fiemap transaction restart handling bcachefs: add fiemap delalloc extent detection bcachefs: refactor fiemap processing into extent helper and struct bcachefs: track current fiemap offset in start variable bcachefs: drop duplicate fiemap sync flag bcachefs: Fix btree_iter_peek_prev() at end of inode bcachefs: Make btree_iter_peek_prev() assert more precise bcachefs: Unit test fixes bcachefs: Print mount opts earlier bcachefs: unlink: casefold d_invalidate bcachefs: Fix casefold lookups bcachefs: Casefold is now a regular opts.h option bcachefs: Implement fileattr_(get|set) bcachefs: Allocator now copes with unaligned buckets bcachefs: Start copygc, rebalance threads earlier bcachefs: Refactor bch2_run_recovery_passes() bcachefs: bch2_copygc_wakeup() bcachefs: Fix ref leak in write_super() bcachefs: Change __journal_entry_close() assert to ERO bcachefs: Ensure journal space is block size aligned ...
2025-04-25kernfs: switch global kernfs_rename_lock to per-fs lockJinliang Zheng2-10/+19
The kernfs implementation has big lock granularity(kernfs_rename_lock) so every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the lock. This patch switches the global kernfs_rename_lock to per-fs lock, which put the rwlock into kernfs_root. Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250415153659.14950-3-alexjlzheng@tencent.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25kernfs: switch global kernfs_idr_lock to per-fs lockJinliang Zheng2-7/+8
The kernfs implementation has big lock granularity(kernfs_idr_lock) so every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the lock. This patch switches the global kernfs_idr_lock to per-fs lock, which put the spinlock into kernfs_root. Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250415153659.14950-2-alexjlzheng@tencent.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2)Jan Kara1-2/+2
Currently, setxattrat(2) and getxattrat(2) are wrongly handling the calls of the from setxattrat(AF_FDCWD, NULL, AT_EMPTY_PATH, ...) and fail with -EBADF error instead of operating on CWD. Fix it. Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/20250424132246.16822-2-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-25splice: remove duplicate noinline from pipe_clear_nowaitT.J. Mercier1-1/+1
pipe_clear_nowait has two noinline macros, but we only need one. I checked the whole tree, and this is the only occurrence: $ grep -r "noinline .* noinline" fs/splice.c:static noinline void noinline pipe_clear_nowait(struct file *file) $ Fixes: 0f99fc513ddd ("splice: clear FMODE_NOWAIT on file if splice/vmsplice is used") Signed-off-by: "T.J. Mercier" <tjmercier@google.com> Link: https://lore.kernel.org/20250423180025.2627670-1-tjmercier@google.com Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-25bcachefs: Rework fiemap transaction restart handlingKent Overstreet1-113/+112
Restart handling in the previous patch was incorrect, so: move btree operations into a separate helper, and run it with a lockrestart_do(). Additionally, clarify whether pagecache or the btree takes precedence. Right now, the btree takes precedence: this is incorrect, but it's needed to pass fstests. Add a giant comment explaining why. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25bcachefs: add fiemap delalloc extent detectionBrian Foster1-7/+112
bcachefs currently populates fiemap data from the extents btree. This works correctly when the fiemap sync flag is provided, but if not, it skips all delalloc extents that have not yet been flushed. This is because delalloc extents from buffered writes are first stored as reservation in the pagecache, and only become resident in the extents btree after writeback completes. Update the fiemap implementation to process holes between extents by scanning pagecache for data, via seek data/hole. If a valid data range is found over a hole in the extent btree, fake up an extent key and flag the extent as delalloc for reporting to userspace. Note that this does not necessarily change behavior for the case where there is dirty pagecache over already written extents, where when in COW mode, writeback will allocate new blocks for the underlying ranges. The existing behavior is consistent with btrfs and it is recommended to use the sync flag for the most up to date extent state from fiemap. Signed-off-by: Brian Foster <bfoster@redhat.com>
2025-04-25bcachefs: refactor fiemap processing into extent helper and structBrian Foster1-34/+53
The bulk of the loop in bch2_fiemap() involves processing the current extent key from the iter, including following indirections and trimming the extent size and such. This patch makes a few changes to reduce the size of the loop and facilitate future changes to support delalloc extents. Define a new bch_fiemap_extent structure to wrap the bkey buffer that holds the extent key to report to userspace along with associated fiemap flags. Update bch2_fill_extent() to take the bch_fiemap_extent as a param instead of the individual fields. Finally, lift the bulk of the extent processing into a bch2_fiemap_extent() helper that takes the current key and formats the bch_fiemap_extent appropriately for the fill function. No functional changes intended by this patch. Signed-off-by: Brian Foster <bfoster@redhat.com>
2025-04-25bcachefs: track current fiemap offset in start variableBrian Foster1-2/+2
Signed-off-by: Brian Foster <bfoster@redhat.com>
2025-04-25bcachefs: drop duplicate fiemap sync flagBrian Foster1-1/+1
FIEMAP_FLAG_SYNC handling was deliberately moved into core code in commit 45dd052e67ad ("fs: handle FIEMAP_FLAG_SYNC in fiemap_prep"), released in kernel v5.8. Update bcachefs accordingly. Signed-off-by: Brian Foster <bfoster@redhat.com>
2025-04-25bcachefs: Fix btree_iter_peek_prev() at end of inodeKent Overstreet1-1/+4
At the end of the inode, on an extents iterator, peek_slot() has to advance to the next position to avoid returning a 0 size extent, which is not allowed. Changing iter->pos confuses peek_prev(), but we don't need to call peek_slot() in this case. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25bcachefs: Make btree_iter_peek_prev() assert more preciseKent Overstreet1-1/+1
The issue this assert is guarding against is that in BTREE_ITER_filter_snapshots mode we only want to be iterating within a single inode number - if we iterate into another inode number with keys for a different snapshot tree, we'll loop arbitrarily long before finding a key we can return. This comes up in the unit tests, where we're using inode 0 for our test keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25bcachefs: Unit test fixesKent Overstreet1-0/+4
The peek_end() tests expect an empty btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25bcachefs: Print mount opts earlierKent Overstreet1-39/+37
If we aren't mounting with the correct degraded option, it's helpful to know that before we fail to mount degraded. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25bcachefs: unlink: casefold d_invalidateKent Overstreet1-0/+5
casefolding results in additional aliases on lookup for the non-casefolded names - these need invalidating on unlink. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25bcachefs: Fix casefold lookupsKent Overstreet3-17/+25
Add casefolding to bch2_lookup_trans: During the delay between when casefolding was written and when it was merged, the main filesystem lookup path grew self healing - which meant it was no longer using bch2_dirent_lookup_trans(), where casefolding on lookups happens. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25bcachefs: Casefold is now a regular opts.h optionKent Overstreet7-22/+43
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-25gfs2: Fix usage of bio->bi_status in gfs2_end_log_writeAndrew Price1-2/+4
bio->bi_status is an index into the blk_errors array, not an errno. Its __bitwise tag is cast away here, resulting in a sparse warning: fs/gfs2/lops.c:207:22: warning: cast from restricted blk_status_t We could either add __force to the cast and continue logging bi_status in the error message, or we could look up the errno in the array and log that. As sdp->sd_log_error is used as an errno in all other cases, look up the errno here for consistency. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-25gfs2: deallocate inodes in gfs2_create_inodeAndreas Gruenbacher2-13/+20
When creating and destroying inodes, we are relying on the inode hash table to make sure that for a given inode number, only a single inode will exist. We then link that inode to its inode and iopen glock and let those glocks point back at the inode. However, when iget_failed() is called, the inode is removed from the inode hash table before gfs_evict_inode() is called, and uniqueness is no longer guaranteed. Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work around that problem by detaching the inode glock from the inode before calling iget_failed(), but that broke the inode deallocation code in gfs_evict_inode(). To fix that, deallocate partially created inodes in gfs2_create_inode() instead of relying on gfs_evict_inode() for doing that. This means that gfs2_evict_inode() and its helper functions will no longer see partially created inodes, and so some simplifications are possible there. Fixes: 9ffa18884cce ("gfs2: gl_object races fix") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>