Age | Commit message (Collapse) | Author | Files | Lines |
|
Support arbitrary size folios and remove a few hidden calls to
compound_head(). Also remove an unnecessary test of the uptodaate flag;
if mapping_read_folio_gfp() cannot bring the folio uptodate, it will
return an error.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fetch a folio from the pagecache instead of a page and operate on it
throughout. Removes eight calls to compound_head() and an access to
page->mapping.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
page->mapping will be removed soon, so call page_folio() on the
page that was passed in.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert the incoming page to a folio at the start, then use the
folio later in the function. Moves a call to page_folio() earlier,
and removes an access to page->mapping.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Iterate over each folio rather than each page. Convert
f2fs_compress_control_page() to f2fs_compress_control_folio() since
this is the only caller. Removes a reference to page->mapping which
is going away soon as well as calls to fscrypt_is_bounce_page() and
fscrypt_pagecache_page().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove a reference to page->mapping which is going away soon.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This helper returns the inode associated with the f2fs_io_info. That's a
relatively common thing to want, mildly awkward to get and provides one
place to change if we decide to record it directly, or change fio->page
to fio->folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[Jaegeuk Kim: fix wrong fio_inode conversion]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert each page in rpages to a folio before operating on it. Replaces
eight calls to compound_head() with one and removes a reference to
page->mapping which is going away soon.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert the incoming page to a folio and operate on it. Removes a
reference to page->mapping which is going away soon.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
XFS has its own buffer cache for metadata that uses submit_bio, which
means that it no longer uses the block device pagecache for anything.
Create a more lightweight helper that runs the blocksize checks and
flushes dirty data and use that instead. No more truncating the
pagecache because XFS does not use it or care about it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The ntfs3 can use the page cache directly, so its address_space_operations
need direct_IO. Exit ntfs_direct_IO() if it is a compressed file.
Fixes: b432163ebd15 ("fs/ntfs3: Update inode->i_mapping->a_ops on compression state")
Reported-by: syzbot+e36cc3297bd3afd25e19@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e36cc3297bd3afd25e19
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
The hdr_first_de() function returns a pointer to a struct NTFS_DE. This
pointer may be NULL. To handle the NULL error effectively, it is important
to implement an error handler. This will help manage potential errors
consistently.
Additionally, error handling for the return value already exists at other
points where this function is called.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
Static analysis shows that pointer "mi" cannot be NULL, since it is
pre-initialized above. A potential failure when mi equals NULL is
processed.
Remove the extra NULL check. It is meaningless and harms the readability
of the code, since before that the pointer is unconditionally
dereferenced.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
Convert the OMFS filesystem to the new mount API.
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Link: https://lore.kernel.org/20250423220001.1535071-1-preichl@redhat.com
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The vfs has long had a fallback to obtain the security.* xattrs from the
LSM when the filesystem does not implement its own listxattr, but
shmem/tmpfs and kernfs later gained their own xattr handlers to support
other xattrs. Unfortunately, as a side effect, tmpfs and kernfs-based
filesystems like sysfs no longer return the synthetic security.* xattr
names via listxattr unless they are explicitly set by userspace or
initially set upon inode creation after policy load. coreutils has
recently switched from unconditionally invoking getxattr for security.*
for ls -Z via libselinux to only doing so if listxattr returns the xattr
name, breaking ls -Z of such inodes.
Before:
$ getfattr -m.* /run/initramfs
<no output>
$ getfattr -m.* /sys/kernel/fscaps
<no output>
$ setfattr -n user.foo /run/initramfs
$ getfattr -m.* /run/initramfs
user.foo
After:
$ getfattr -m.* /run/initramfs
security.selinux
$ getfattr -m.* /sys/kernel/fscaps
security.selinux
$ setfattr -n user.foo /run/initramfs
$ getfattr -m.* /run/initramfs
security.selinux
user.foo
Link: https://lore.kernel.org/selinux/CAFqZXNtF8wDyQajPCdGn=iOawX4y77ph0EcfcqcUUj+T87FKyA@mail.gmail.com/
Link: https://lore.kernel.org/selinux/20250423175728.3185-2-stephen.smalley.work@gmail.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Link: https://lore.kernel.org/20250424152822.2719-1-stephen.smalley.work@gmail.com
Fixes: b09e0fa4b4ea66266058ee ("tmpfs: implement generic xattr support")
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We need the driver core fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In nfs3_get_acl(), the local variable status is assigned the result of
nfs_refresh_inode() inside the *switch* statement, but that value gets
overwritten in the next *if* statement's true branch and is completely
ignored if that branch isn't taken...
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/c32dced7-a4fa-43c0-aafe-ef6c819c2f91@omp.ru
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
In nfs_direct_write_completion(), the local variable req isn't used outside
the *while* loop and is assigned to right at the start of that loop's body,
so its initializer appears useless -- drop it; then move the declaration to
the loop body (which happens to have a pointless empty line anyway)...
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/416219f5-7983-484b-b5a7-5fb7da9561f7@omp.ru
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Since struct nfs4_pnfs_ds should not be shared between net namespaces,
move from a global list of objects to a per-netns list and spinlock.
Tested-by: Sargun Dillon <sargun@sargun.me>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/20250410-nfs-ds-netns-v2-2-f80b7979ba80@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Currently, different NFS clients can share the same DS connections, even
when they are in different net namespaces. If a containerized client
creates a DS connection, another container can find and use it. When the
first client exits, the connection will close which can lead to stalls
in other clients.
Add a net namespace pointer to struct nfs4_pnfs_ds, and compare those
value to the caller's netns in _data_server_lookup_locked() when
searching for a nfs4_pnfs_ds to match.
Reported-by: Omar Sandoval <osandov@osandov.com>
Reported-by: Sargun Dillon <sargun@sargun.me>
Closes: https://lore.kernel.org/linux-nfs/Z_ArpQC_vREh_hEA@telecaster/
Tested-by: Sargun Dillon <sargun@sargun.me>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/20250410-nfs-ds-netns-v2-1-f80b7979ba80@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When memory is insufficient, the allocation of nfs_lock_context in
nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat
an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM)
as valid and proceed to execute rpc_run_task(), this will trigger a NULL
pointer dereference in nfs4_locku_prepare. For example:
BUG: kernel NULL pointer dereference, address: 000000000000000c
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty #60
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40
Workqueue: rpciod rpc_async_schedule
RIP: 0010:nfs4_locku_prepare+0x35/0xc2
Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3
RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246
RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40
RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38
R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030
R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30
FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0
Call Trace:
<TASK>
__rpc_execute+0xbc/0x480
rpc_async_schedule+0x2f/0x40
process_one_work+0x232/0x5d0
worker_thread+0x1da/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x10d/0x240
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in:
CR2: 000000000000000c
---[ end trace 0000000000000000 ]---
Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and
return NULL to terminate subsequent rpc_run_task, preventing NULL pointer
dereference.
Fixes: f30cb757f680 ("NFS: Always wait for I/O completion before unlock")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250417072508.3850532-1-lilingfeng3@huawei.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When debugging I/O issues, we want to see not just the NFS level errors,
but also the RPC level problems, so record both in the tracepoints.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If we have a fatal ENETDOWN or ENETUNREACH error, then the layoutreturn
on close code should also handle that as fatal, and free the layouts.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
Ensure that the NFSv4 error handling code recognises the
RPC_TASK_NETUNREACH_FATAL flag, and handles the ENETDOWN and ENETUNREACH
errors accordingly.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
Fold it into pidfd_prepare() and rename PIDFD_CLONE to PIDFD_STALE to
indicate that the passed pid might not have task linkage and no explicit
check for that should be performed.
Link: https://lore.kernel.org/20250425-work-pidfs-net-v2-3-450a19461e75@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: David Rheinsberg <david@readahead.eu>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add simple helpers that allow a struct pid to be pinned via a pidfs
dentry/inode. If no pidfs dentry exists a new one will be allocated for
it. A reference is taken by pidfs on @pid. The reference must be
released via pidfs_put_pid().
This will allow AF_UNIX sockets to allocate a dentry for the peer
credentials pid at the time they are recorded where we know the task is
still alive. When the task gets reaped its exit status is guaranteed to
be recorded and a pidfd can be handed out for the reaped task.
Link: https://lore.kernel.org/20250425-work-pidfs-net-v2-1-450a19461e75@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: David Rheinsberg <david@readahead.eu>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The sess->user object can currently be in use by another thread, for
example if another connection has sent a session setup request to
bind to the session being free'd. The handler for that connection could
be in the smb2_sess_setup function which makes use of sess->user.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Heelan <seanheelan@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Setting sess->user = NULL was introduced to fix the dangling pointer
created by ksmbd_free_user. However, it is possible another thread could
be operating on the session and make use of sess->user after it has been
passed to ksmbd_free_user but before sess->user is set to NULL.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Heelan <seanheelan@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- For some reason we went from zero to three maintainers for HFS/HFS+
in a matter of days. The lesson to learn from this might just be that
we need to threaten code removal more often!?
- Fix a regression introduced by enabling large folios for lage logical
block sizes. This has caused issues for noref migration with large
folios due to sleeping while in an atomic context.
New sleeping variants of pagecache lookup helpers are introduced.
These helpers take the folio lock instead of the mapping's private
spinlock. The problematic users are converted to the sleeping
variants and serialize against noref migration. Atomic users will
bail on seeing the new BH_Migrate flag.
This also shrinks the critical region of the mapping's private lock
and the new blocking callers reduce contention on the spinlock for
bdev mappings.
- Fix two bugs in do_move_mount() when with MOVE_MOUNT_BENEATH. The
first bug is using a mountpoint that is located on a mount we're not
holding a reference to. The second bug is putting the mountpoint
after we've called namespace_unlock() as it's no longer guaranteed
that it does stay a mountpoint.
- Remove a pointless call to vfs_getattr_nosec() in the devtmpfs code
just to query i_mode instead of simply querying the inode directly.
This also avoids lifetime issues for the dm code by an earlier bugfix
this cycle that moved bdev_statx() handling into vfs_getattr_nosec().
- Fix AT_FDCWD handling with getname_maybe_null() in the xattr code.
- Fix a performance regression for files when multiple callers issue a
close when it's not the last reference.
- Remove a duplicate noinline annotation from pipe_clear_nowait().
* tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2)
MAINTAINERS: hfs/hfsplus: add myself as maintainer
splice: remove duplicate noinline from pipe_clear_nowait
devtmpfs: don't use vfs_getattr_nosec to query i_mode
fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount()
fs: fall back to file_ref_put() for non-last reference
mm/migrate: fix sleep in atomic for large folios and buffer heads
fs/ext4: use sleeping version of sb_find_get_block()
fs/jbd2: use sleeping version of __find_get_block()
fs/ocfs2: use sleeping version of __find_get_block()
fs/buffer: use sleeping version of __find_get_block()
fs/buffer: introduce sleeping flavors for pagecache lookups
MAINTAINERS: add HFS/HFS+ maintainers
fs/buffer: split locking for pagecache lookups
|
|
Pull ceph fixes from Ilya Dryomov:
"A small CephFS encryption-related fix and a dead code cleanup"
* tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client:
ceph: Fix incorrect flush end position calculation
ceph: Remove osd_client deadcode
|
|
Pull xfs fixes from Carlos Maiolino:
"This contains a fix for a build failure on some 32-bit architectures
and a warning generating docs"
* tag 'xfs-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove duplicate Zoned Filesystems sections in admin-guide
XFS: fix zoned gc threshold math for 32-bit arches
|
|
Pull bcachefs fixes from Kent Overstreet:
- Case insensitive directories now work
- Ciemap now correctly reports on unwritten pagecache data
- bcachefs tools 1.25.1 was incorrectly picking unaligned bucket sizes;
fix journal and write path bugs this uncovered
And assorted smaller fixes...
* tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefs: (24 commits)
bcachefs: Rework fiemap transaction restart handling
bcachefs: add fiemap delalloc extent detection
bcachefs: refactor fiemap processing into extent helper and struct
bcachefs: track current fiemap offset in start variable
bcachefs: drop duplicate fiemap sync flag
bcachefs: Fix btree_iter_peek_prev() at end of inode
bcachefs: Make btree_iter_peek_prev() assert more precise
bcachefs: Unit test fixes
bcachefs: Print mount opts earlier
bcachefs: unlink: casefold d_invalidate
bcachefs: Fix casefold lookups
bcachefs: Casefold is now a regular opts.h option
bcachefs: Implement fileattr_(get|set)
bcachefs: Allocator now copes with unaligned buckets
bcachefs: Start copygc, rebalance threads earlier
bcachefs: Refactor bch2_run_recovery_passes()
bcachefs: bch2_copygc_wakeup()
bcachefs: Fix ref leak in write_super()
bcachefs: Change __journal_entry_close() assert to ERO
bcachefs: Ensure journal space is block size aligned
...
|
|
The kernfs implementation has big lock granularity(kernfs_rename_lock) so
every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the lock.
This patch switches the global kernfs_rename_lock to per-fs lock, which
put the rwlock into kernfs_root.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250415153659.14950-3-alexjlzheng@tencent.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The kernfs implementation has big lock granularity(kernfs_idr_lock) so
every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the lock.
This patch switches the global kernfs_idr_lock to per-fs lock, which
put the spinlock into kernfs_root.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250415153659.14950-2-alexjlzheng@tencent.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, setxattrat(2) and getxattrat(2) are wrongly handling the
calls of the from setxattrat(AF_FDCWD, NULL, AT_EMPTY_PATH, ...) and
fail with -EBADF error instead of operating on CWD. Fix it.
Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/20250424132246.16822-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
pipe_clear_nowait has two noinline macros, but we only need one.
I checked the whole tree, and this is the only occurrence:
$ grep -r "noinline .* noinline"
fs/splice.c:static noinline void noinline pipe_clear_nowait(struct file *file)
$
Fixes: 0f99fc513ddd ("splice: clear FMODE_NOWAIT on file if splice/vmsplice is used")
Signed-off-by: "T.J. Mercier" <tjmercier@google.com>
Link: https://lore.kernel.org/20250423180025.2627670-1-tjmercier@google.com
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Restart handling in the previous patch was incorrect, so: move btree
operations into a separate helper, and run it with a lockrestart_do().
Additionally, clarify whether pagecache or the btree takes precedence.
Right now, the btree takes precedence: this is incorrect, but it's
needed to pass fstests. Add a giant comment explaining why.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bcachefs currently populates fiemap data from the extents btree.
This works correctly when the fiemap sync flag is provided, but if
not, it skips all delalloc extents that have not yet been flushed.
This is because delalloc extents from buffered writes are first
stored as reservation in the pagecache, and only become resident in
the extents btree after writeback completes.
Update the fiemap implementation to process holes between extents by
scanning pagecache for data, via seek data/hole. If a valid data
range is found over a hole in the extent btree, fake up an extent
key and flag the extent as delalloc for reporting to userspace.
Note that this does not necessarily change behavior for the case
where there is dirty pagecache over already written extents, where
when in COW mode, writeback will allocate new blocks for the
underlying ranges. The existing behavior is consistent with btrfs
and it is recommended to use the sync flag for the most up to date
extent state from fiemap.
Signed-off-by: Brian Foster <bfoster@redhat.com>
|
|
The bulk of the loop in bch2_fiemap() involves processing the
current extent key from the iter, including following indirections
and trimming the extent size and such. This patch makes a few
changes to reduce the size of the loop and facilitate future changes
to support delalloc extents.
Define a new bch_fiemap_extent structure to wrap the bkey buffer
that holds the extent key to report to userspace along with
associated fiemap flags. Update bch2_fill_extent() to take the
bch_fiemap_extent as a param instead of the individual fields.
Finally, lift the bulk of the extent processing into a
bch2_fiemap_extent() helper that takes the current key and formats
the bch_fiemap_extent appropriately for the fill function.
No functional changes intended by this patch.
Signed-off-by: Brian Foster <bfoster@redhat.com>
|
|
Signed-off-by: Brian Foster <bfoster@redhat.com>
|
|
FIEMAP_FLAG_SYNC handling was deliberately moved into core code in
commit 45dd052e67ad ("fs: handle FIEMAP_FLAG_SYNC in fiemap_prep"),
released in kernel v5.8. Update bcachefs accordingly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
|
|
At the end of the inode, on an extents iterator, peek_slot() has to
advance to the next position to avoid returning a 0 size extent, which
is not allowed.
Changing iter->pos confuses peek_prev(), but we don't need to call
peek_slot() in this case.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The issue this assert is guarding against is that in
BTREE_ITER_filter_snapshots mode we only want to be iterating within a
single inode number - if we iterate into another inode number with keys
for a different snapshot tree, we'll loop arbitrarily long before
finding a key we can return.
This comes up in the unit tests, where we're using inode 0 for our test
keys.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The peek_end() tests expect an empty btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we aren't mounting with the correct degraded option, it's helpful to
know that before we fail to mount degraded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
casefolding results in additional aliases on lookup for the
non-casefolded names - these need invalidating on unlink.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add casefolding to bch2_lookup_trans:
During the delay between when casefolding was written and when it was
merged, the main filesystem lookup path grew self healing - which meant
it was no longer using bch2_dirent_lookup_trans(), where casefolding on
lookups happens.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bio->bi_status is an index into the blk_errors array, not an errno. Its
__bitwise tag is cast away here, resulting in a sparse warning:
fs/gfs2/lops.c:207:22: warning: cast from restricted blk_status_t
We could either add __force to the cast and continue logging bi_status
in the error message, or we could look up the errno in the array and log
that. As sdp->sd_log_error is used as an errno in all other cases, look
up the errno here for consistency.
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When creating and destroying inodes, we are relying on the inode hash
table to make sure that for a given inode number, only a single inode
will exist. We then link that inode to its inode and iopen glock and
let those glocks point back at the inode. However, when iget_failed()
is called, the inode is removed from the inode hash table before
gfs_evict_inode() is called, and uniqueness is no longer guaranteed.
Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work
around that problem by detaching the inode glock from the inode before
calling iget_failed(), but that broke the inode deallocation code in
gfs_evict_inode().
To fix that, deallocate partially created inodes in gfs2_create_inode()
instead of relying on gfs_evict_inode() for doing that.
This means that gfs2_evict_inode() and its helper functions will no
longer see partially created inodes, and so some simplifications are
possible there.
Fixes: 9ffa18884cce ("gfs2: gl_object races fix")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|