Age | Commit message (Collapse) | Author | Files | Lines |
|
If we break out of the loop because an extent buffer doesn't have the bit
EXTENT_BUFFER_TREE_REF set, we end up unlocking the xarray twice, once
before we tested for the bit and break out of the loop, and once again
after the loop.
Fix this by testing the bit and exiting before unlocking the xarray.
The time spent testing the bit is negligible and it's not worth trying
to do that outside the critical section delimited by the xarray lock due
to the code complexity required to avoid it (like using a local boolean
variable to track whether the xarray is locked or not). The xarray unlock
only needs to be done before calling release_extent_buffer(), as that
needs to lock the xarray (through xa_cmpxchg_irq()) and does a more
significant amount of work.
Fixes: 19d7f65f032f ("btrfs: convert the buffer_radix to an xarray")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/linux-btrfs/aDRNDU0GM1_D4Xnw@stanley.mountain/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Each superblock contains a copy of the device item for that device. In a
transaction which drops a chunk but doesn't create any new ones, we were
correctly updating the device item in the chunk tree but not copying
over the new bytes_used value to the superblock.
This can be seen by doing the following:
# dd if=/dev/zero of=test bs=4096 count=2621440
# mkfs.btrfs test
# mount test /root/temp
# cd /root/temp
# for i in {00..10}; do dd if=/dev/zero of=$i bs=4096 count=32768; done
# sync
# rm *
# sync
# btrfs balance start -dusage=0 .
# sync
# cd
# umount /root/temp
# btrfs check test
For btrfs-check to detect this, you will also need my patch at
https://github.com/kdave/btrfs-progs/pull/991.
Change btrfs_remove_dev_extents() so that it adds the devices to the
fs_info->post_commit_list if they're not there already. This causes
btrfs_commit_device_sizes() to be called, which updates the bytes_used
value in the superblock.
Fixes: bbbf7243d62d ("btrfs: combine device update operations during transaction commit")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We have a race between a rename and directory inode logging that if it
happens and we crash/power fail before the rename completes, the next time
the filesystem is mounted, the log replay code will end up deleting the
file that was being renamed.
This is best explained following a step by step analysis of an interleaving
of steps that lead into this situation.
Consider the initial conditions:
1) We are at transaction N;
2) We have directories A and B created in a past transaction (< N);
3) We have inode X corresponding to a file that has 2 hardlinks, one in
directory A and the other in directory B, so we'll name them as
"A/foo_link1" and "B/foo_link2". Both hard links were persisted in a
past transaction (< N);
4) We have inode Y corresponding to a file that as a single hard link and
is located in directory A, we'll name it as "A/bar". This file was also
persisted in a past transaction (< N).
The steps leading to a file loss are the following and for all of them we
are under transaction N:
1) Link "A/foo_link1" is removed, so inode's X last_unlink_trans field
is updated to N, through btrfs_unlink() -> btrfs_record_unlink_dir();
2) Task A starts a rename for inode Y, with the goal of renaming from
"A/bar" to "A/baz", so we enter btrfs_rename();
3) Task A inserts the new BTRFS_INODE_REF_KEY for inode Y by calling
btrfs_insert_inode_ref();
4) Because the rename happens in the same directory, we don't set the
last_unlink_trans field of directoty A's inode to the current
transaction id, that is, we don't cal btrfs_record_unlink_dir();
5) Task A then removes the entries from directory A (BTRFS_DIR_ITEM_KEY
and BTRFS_DIR_INDEX_KEY items) when calling __btrfs_unlink_inode()
(actually the dir index item is added as a delayed item, but the
effect is the same);
6) Now before task A adds the new entry "A/baz" to directory A by
calling btrfs_add_link(), another task, task B is logging inode X;
7) Task B starts a fsync of inode X and after logging inode X, at
btrfs_log_inode_parent() it calls btrfs_log_all_parents(), since
inode X has a last_unlink_trans value of N, set at in step 1;
8) At btrfs_log_all_parents() we search for all parent directories of
inode X using the commit root, so we find directories A and B and log
them. Bu when logging direct A, we don't have a dir index item for
inode Y anymore, neither the old name "A/bar" nor for the new name
"A/baz" since the rename has deleted the old name but has not yet
inserted the new name - task A hasn't called yet btrfs_add_link() to
do that.
Note that logging directory A doesn't fallback to a transaction
commit because its last_unlink_trans has a lower value than the
current transaction's id (see step 4);
9) Task B finishes logging directories A and B and gets back to
btrfs_sync_file() where it calls btrfs_sync_log() to persist the log
tree;
10) Task B successfully persisted the log tree, btrfs_sync_log() completed
with success, and a power failure happened.
We have a log tree without any directory entry for inode Y, so the
log replay code deletes the entry for inode Y, name "A/bar", from the
subvolume tree since it doesn't exist in the log tree and the log
tree is authorative for its index (we logged a BTRFS_DIR_LOG_INDEX_KEY
item that covers the index range for the dentry that corresponds to
"A/bar").
Since there's no other hard link for inode Y and the log replay code
deletes the name "A/bar", the file is lost.
The issue wouldn't happen if task B synced the log only after task A
called btrfs_log_new_name(), which would update the log with the new name
for inode Y ("A/bar").
Fix this by pinning the log root during renames before removing the old
directory entry, and unpinning after btrfs_log_new_name() is called.
Fixes: 259c4b96d78d ("btrfs: stop doing unnecessary log updates during a rename")
CC: stable@vger.kernel.org # 5.18+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Add a "scrub: " prefix to all messages logged by scrub so that it's
easy to filter them from dmesg for analysis.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Add a warning for leaked delayed_nodes when putting a root. We currently
do this for inodes, but not delayed_nodes.
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ Remove the changelog from the commit message. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If the delayed_root is not empty we are increasing the number of
references to a delayed_node without decreasing it, causing a leak. Fix
by decrementing the delayed_node reference count.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ Remove the changelog from the commit message. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
To help debugging include the root number in the error message, and since
this is a critical error that implies a metadata inconsistency and results
in a transaction abort change the log message level from "info" to
"critical", which is a much better fit.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Crafted encoded extents could record out-of-range `lstart`, which should
not happen in normal cases.
It caused an iomap_iter_done() complaint [1] reported by syzbot.
[1] https://lore.kernel.org/r/684cb499.a00a0220.c6bd7.0010.GAE@google.com
Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata")
Reported-and-tested-by: syzbot+d8f000c609f05f52d9b5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8f000c609f05f52d9b5
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250619032839.2642193-1-hsiangkao@linux.alibaba.com
|
|
Pull smb server fixes from Steve French:
- Fix alternate data streams bug
- Important fix for null pointer deref with Kerberos authentication
- Fix oops in smbdirect (RDMA) in free_transport
* tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: handle set/get info file for streamed file
ksmbd: fix null pointer dereference in destroy_previous_session
ksmbd: add free_transport ops in ksmbd connection
|
|
fstest reports a f2fs bug:
generic/363 42s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/363.out.bad)
--- tests/generic/363.out 2025-01-12 21:57:40.271440542 +0800
+++ /share/git/fstests/results//generic/363.out.bad 2025-05-19 19:55:58.000000000 +0800
@@ -1,2 +1,78 @@
QA output created by 363
fsx -q -S 0 -e 1 -N 100000
+READ BAD DATA: offset = 0xd6fb, size = 0xf044, fname = /mnt/f2fs/junk
+OFFSET GOOD BAD RANGE
+0x1540d 0x0000 0x2a25 0x0
+operation# (mod 256) for the bad data may be 37
+0x1540e 0x0000 0x2527 0x1
...
(Run 'diff -u /share/git/fstests/tests/generic/363.out /share/git/fstests/results//generic/363.out.bad' to see the entire diff)
Ran: generic/363
Failures: generic/363
Failed 1 of 1 tests
The root cause is user can update post-eof page via mmap [1], however, f2fs
missed to zero post-eof page in below operations, so, once it expands i_size,
then it will include dummy data locates previous post-eof page, so during
below operations, we need to zero post-eof page.
Operations which can include dummy data after previous i_size after expanding
i_size:
- write
- mapwrite [1]
- truncate
- fallocate
* preallocate
* zero_range
* insert_range
* collapse_range
- clone_range (doesn’t support in f2fs)
- copy_range (doesn’t support in f2fs)
[1] https://man7.org/linux/man-pages/man2/mmap.2.html 'BUG section'
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 8bd25b61c5a5 ("smb: client: set correct d_type for reparse DFS/DFSR
and mount point") deduplicated assignment of fattr->cf_dtype member from
all places to end of the function cifs_reparse_point_to_fattr(). The only
one missing place which was not deduplicated is wsl_to_fattr(). Fix it.
Fixes: 8bd25b61c5a5 ("smb: client: set correct d_type for reparse DFS/DFSR and mount point")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
hostname when adding channels
When mounting a share with kerberos authentication with multichannel
support, share mounts correctly, but fails to create secondary
channels. This occurs because the hostname is not populated when
adding the channels. The hostname is necessary for the userspace
cifs.upcall program to retrieve the required credentials and pass
it back to kernel, without hostname secondary channels fails
establish.
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reported-by: xfuren <xfuren@gmail.com>
Link: https://bugzilla.samba.org/show_bug.cgi?id=15824
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Previously, file operations on a file-backed mount used the current
process' credentials to access the backing FD. Attempting to do so on
Android lead to SELinux denials, as ACL rules on the backing file (e.g.
/system/apex/foo.apex) is restricted to a small set of process.
Arguably, this error is redundant and leaking implementation details, as
access to files on a mount is already ACL'ed by path.
Instead, override to use the opener's cred when accessing the backing
file. This makes the behavior similar to a loop-backed mount, which
uses kworker cred when accessing the backing file and does not cause
SELinux denials.
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20250612-b4-erofs-impersonate-v1-1-8ea7d6f65171@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The normal fsck code doesn't call key_visible_in_snapshot() with an
empty list of snapshot IDs seen (the current snapshot ID will always be
on the list), but str_hash_repair_key() ->
bch2_get_snapshot_overwrites() can, and that's totally fine as long as
we check for it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The bug only appears when:
- windows 11 copies a file that has an alternate data stream
- streams_xattr is enabled on the share configuration.
Microsoft Edge adds a ZoneIdentifier data stream containing the URL
for files it downloads.
Another way to create a test file:
- open cmd.exe
- echo "hello from default data stream" > hello.txt
- echo "hello again from ads" > hello.txt:ads.txt
If you open the file using notepad, we'll see the first message.
If you run "notepad hello.txt:ads.txt" in cmd.exe, we should see
the second message.
dir /s /r should least all streams for the file.
The truncation happens because the windows 11 client sends
a SetInfo/EndOfFile message on the ADS, but it is instead applied
on the main file, because we don't check fp->stream.
When receiving set/get info file for a stream file, Change to process
requests using stream position and size.
Truncate is unnecessary for stream files, so we skip
set_file_allocation_info and set_end_of_file_info operations.
Reported-by: Marios Makassikis <mmakassikis@freebox.fr>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If client set ->PreviousSessionId on kerberos session setup stage,
NULL pointer dereference error will happen. Since sess->user is not
set yet, It can pass the user argument as NULL to destroy_previous_session.
sess->user will be set in ksmbd_krb5_authenticate(). So this patch move
calling destroy_previous_session() after ksmbd_krb5_authenticate().
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27391
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
free_transport function for tcp connection can be called from smbdirect.
It will cause kernel oops. This patch add free_transport ops in ksmbd
connection, and add each free_transports for tcp and smbdirect.
Fixes: 21a4e47578d4 ("ksmbd: fix use-after-free in __smb2_lease_break_noti()")
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
check_directory_structure runs after check_dirents, so it expects that
it won't see any inodes with missing backpointers - normally.
But online fsck can't run check_dirents yet, or the user might only be
running a specific pass, so we need to be careful that this isn't an
error. If an inode is unreachable, that's handled by a separate pass.
Also, add a new 'bch2_inode_has_backpointer()' helper, since we were
doing this inconsistently.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree node scrub was sometimes failing to rewrite nodes with errors;
bch2_btree_node_rewrite() can return a transaction restart and we
weren't checking - the lockrestart_do() needs to wrap the entire
operation.
And there's a better helper it should've been using,
bch2_btree_node_rewrite_key(), which makes all this more convenient.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can only pass negative error codes to bch2_err_str(); if it's a
positive integer it's not an error and we trip an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
A path exists in a particular snapshot: we should do the pathwalk in the
snapshot ID of the inode we started from, _not_ change snapshot ID as we
walk inodes and dirents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can easily go from inode number -> path now, which makes for more
useful log messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Log the inode's new path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we find a directory connectivity problem, we should do the repair
in the oldest snapshot that has the issue - so that we don't end up
duplicating work or making a real mess of things.
Oldest snapshot IDs have the highest integer value, so - just walk
inodes in reverse order.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch_subvolume.fs_path_parent needs to be updated as well, it should
match inode.bi_parent_subvol.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The dirent will be in a different snapshot if the inode is a subvolume
root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The bch2_subvolume_get_snapshot() call needs to happen before the dirent
lookup - the dirent is in the parent subvolume.
Also, check for loops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
kthread creation checks for pending signals, which is _very_ annoying if
we have to do a long recovery and don't go rw until we've done
significant work.
Check if we'll be going rw and pre-allocate kthreads/workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Print out more info when we find a key (extent, dirent, xattr) for a
missing inode - was there a good inode in an older snapshot, full(ish)
list of keys for that missing inode, so we can make better decisions on
how to repair.
If it looks like it should've been deleted, autofix it. If we ever hit
the non-autofix cases, we'll want to write more repair code (possibly
reconstituting the inode).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
After cd3cdb1ef706 ("Single err message for btree node reads"),
all errors caused __btree_err to return -BCH_ERR_fsck_fix no matter what
the actual error type was if the recovery pass was scanning for btree
nodes. This lead to the code continuing despite things like bad node
formats when they earlier would have caused a jump to fsck_err, because
btree_err only jumps when the return from __btree_err does not match
fsck_fix. Ultimately this lead to undefined behavior by attempting to
unpack a key based on an invalid format.
Make only errors of type -BCH_ERR_btree_node_read_err_fixable cause
__btree_err to return -BCH_ERR_fsck_fix when scanning for btree nodes.
Reported-by: syzbot+cfd994b9cdf00446fd54@syzkaller.appspotmail.com
Fixes: cd3cdb1ef706 ("bcachefs: Single err message for btree node reads")
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree_interior_update_pool has not been initialized before the
filesystem becomes read-write, thus mempool_alloc in bch2_btree_update_start
will trigger pool->alloc NULL pointer dereference in mempool_alloc_noprof
Reported-by: syzbot+2f3859bd28f20fa682e6@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In syzbot's crash, the bset's u64s is larger than the btree node.
Reported-by: syzbot+bfaeaa8e26281970158d@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Dead code cleanup.
Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a mount option for rewinding the journal, bringing the entire
filesystem to where it was at a previous point in time.
This is for extreme disaster recovery scenarios - it's not intended as
an undelete operation.
The option takes a journal sequence number; the desired sequence number
can be determined with 'bcachefs list_journal'
Caveats:
- The 'journal_transaction_names' option must have been enabled (it's on
by default). The option controls emitting of extra debug info in the
journal, so we can see what individual transactions were doing;
It also enables journalling of keys being overwritten, which is what
we rely on here.
- A full fsck run will be automatically triggered since alloc info will
be inconsistent. Only leaf node updates to non-alloc btrees are
rewound, since rewinding interior btree updates isn't possible or
desirable.
- We can't do anything about data that was deleted and overwritten.
Lots of metadata updates after the point in time we're rewinding to
shouldn't cause a problem, since we segragate data and metadata
allocations (this is in order to make repair by btree node scan
practical on larger filesystems; there's a small 64-bit per device
bitmap in the superblock of device ranges with btree nodes, and we try
to keep this small).
However, having discards enabled will cause problems, since buckets
are discarded as soon as they become empty (this is why we don't
implement fstrim: we don't need it).
Hopefully, this feature will be a one-off thing that's never used
again: this was implemented for recovering from the "vfs i_nlink 0 ->
subvol deletion" bug, and that bug was unusually disastrous and
additional safeguards have since been implemented.
But if it does turn out that we need this more in the future, I'll
have to implement an option so that empty buckets aren't discarded
immediately - lagging by perhaps 1% of device capacity.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
subsystem
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of a SNC NUMA node determines
which domains are summed for the "top level" L3-scoped events.
rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning
zero values for "top-level events" without any error indication.
Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
valid CPUs for reading RMID counter via the MSR interface.
Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
Fixes: 328ea68874642 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files")
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix a regression in overlayfs caused by reworking the lookup_one*()
set of helpers
- Make sure that the name of the dentry is printed in overlayfs'
mkdir() helper
- Add missing iocb values to TRACE_IOCB_STRINGS define
- Unlock the superblock during iterate_supers_type(). This was an
accidental internal api change
- Drop a misleading assert in file_seek_cur_needs_f_lock() helper
- Never refuse to return PIDFD_GET_INGO when parent pid is zero
That can trivially happen in container scenarios where the parent
process might be located in an ancestor pid namespace
- Don't revalidate in try_lookup_noperm() as that causes regression for
filesystems such as cifs
- Fix simple_xattr_list() and reset the err variable after
security_inode_listsecurity() got called so as not to confuse
userspace about the length of the xattr
* tag 'vfs-6.16-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: drop assert in file_seek_cur_needs_f_lock
fs: unlock the superblock during iterate_supers_type
ovl: fix debug print in case of mkdir error
VFS: change try_lookup_noperm() to skip revalidation
fs: add missing values to TRACE_IOCB_STRINGS
fs/xattr.c: fix simple_xattr_list()
ovl: fix regression caused by lookup helpers API changes
pidfs: never refuse ppid == 0 in PIDFD_GET_INFO
|
|
The assert in function file_seek_cur_needs_f_lock() can be triggered very
easily because there are many users of vfs_llseek() (such as overlayfs)
that do their custom locking around llseek instead of relying on
fdget_pos(). Just drop the overzealous assertion.
Fixes: da06e3c51794 ("fs: don't needlessly acquire f_lock")
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Luis Henriques <luis@igalia.com>
Link: https://lore.kernel.org/20250613101111.17716-1-luis@igalia.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix the case where we're deleting in a different snapshot and need to
emit a whiteout - that requires a regular BTREE_ITER_filter_snapshots
iterator.
Also, only delete the part of the extent that extents past i_size.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
the inode btree uses the offset field for the inum, not the inode field.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When the inode was a whiteout, we were inserting a new whiteout at the
wrong (old) snapshot.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Check against version_incompat_allowed, not version_incompat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for journal rewind, where the seq we're replaying from may be
different than the last journal entry's last_seq.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we weren't checking the result of the skiplist walk, just
the is_ancestor bitmap.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We need to start searching from search_key - _not_ path->pos, which will
point to the key we found in the btree
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
this code is rarely invoked, so - we had a few bugs left from basing it
off of bch2_journal_keys_peek_max()...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|