| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit c6c209ceb87f64a6ceebe61761951dcbbf4a0baa ]
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".
Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.
As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b393389
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4806ded4c14c5e8fdc6ce885d83221a78c06a428 ]
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
fs/nfs/nfs3xdr.c
Will also be used by fs/nfsd/localio.c
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Stable-dep-of: c6c209ceb87f ("NFSD: Remove NFSERR_EAGAIN")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a1237c203f1757480dc2f3b930608ee00072d3cc ]
This was reported by the KUnit tests in the later patches.
See MS-ERREF 2.3.1 STATUS_NO_DATA_DETECTED. Keep it consistent with the
value in the documentation.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b2b50fca34da5ec231008edba798ddf92986bd7f ]
This was reported by the KUnit tests in the later patches.
See MS-ERREF 2.3.1 STATUS_DEVICE_DOOR_OPEN. Keep it consistent with the
value in the documentation.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9f99caa8950a76f560a90074e3a4b93cfa8b3d84 ]
This was reported by the KUnit tests in the later patches.
See MS-ERREF 2.3.1 STATUS_UNABLE_TO_FREE_VM. Keep it consistent with the
value in the documentation.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a2a8fc27dd668e7562b5326b5ed2f1604cb1e2e9 ]
When automounting, the fs_context should be fixed up to use the cred
from the parent filesystem, since the operation is just extending the
namespace. Authorisation to enter that namespace will already have been
provided by the preceding lookup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2e47c3cc64b44b0b06cd68c2801db92ff143f2b2 ]
We have observed an NFSv4 client receiving a LOCK reply with a status of
NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an
earlier seqid value in the stateid. As this was for a new lockowner,
that would imply that nfs_set_open_stateid_locked() had updated the open
stateid seqid with an earlier value.
Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out
of sequence, the task will sleep on the state->waitq for up to 5
seconds. If the task waits for the full 5 seconds, then after finishing
the wait it'll update the open stateid seqid with whatever value the
incoming seqid has. If there are multiple waiters in this scenario,
then the last one to perform said update may not be the one with the
highest seqid.
Add a check to ensure that the seqid can only be incremented, and add a
tracepoint to indicate when old seqids are skipped.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 913f7cf77bf14c13cfea70e89bcb6d0b22239562 upstream.
An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".
The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.
Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL.
Reported-by: Aurélien Couderc <aurelien.couderc2002@gmail.com>
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 442d27ff09a218b61020ab56387dbc508ad6bfa6 upstream.
When security labeling is enabled, the client can pass a file security
label as part of a create operation for the new file, similar to mode
and other attributes. At present, the security label is received by nfsd
and passed down to nfsd_create_setattr(), but nfsd_setattr() is never
called and therefore the label is never set on the new file. This bug
may have been introduced on or around commit d6a97d3f589a ("NFSD:
add security label to struct nfsd_attrs"). Looking at nfsd_setattr()
I am uncertain as to whether the same issue presents for
file ACLs and therefore requires a similar fix for those.
An alternative approach would be to introduce a new LSM hook to set the
"create SID" of the current task prior to the actual file creation, which
would atomically label the new inode at creation time. This would be better
for SELinux and a similar approach has been used previously
(see security_dentry_create_files_as) but perhaps not usable by other LSMs.
Reproducer:
1. Install a Linux distro with SELinux - Fedora is easiest
2. git clone https://github.com/SELinuxProject/selinux-testsuite
3. Install the requisite dependencies per selinux-testsuite/README.md
4. Run something like the following script:
MOUNT=$HOME/selinux-testsuite
sudo systemctl start nfs-server
sudo exportfs -o rw,no_root_squash,security_label localhost:$MOUNT
sudo mkdir -p /mnt/selinux-testsuite
sudo mount -t nfs -o vers=4.2 localhost:$MOUNT /mnt/selinux-testsuite
pushd /mnt/selinux-testsuite/
sudo make -C policy load
pushd tests/filesystem
sudo runcon -t test_filesystem_t ./create_file -f trans_test_file \
-e test_filesystem_filetranscon_t -v
sudo rm -f trans_test_file
popd
sudo make -C policy unload
popd
sudo umount /mnt/selinux-testsuite
sudo exportfs -u localhost:$MOUNT
sudo rmdir /mnt/selinux-testsuite
sudo systemctl stop nfs-server
Expected output:
<eliding noise from commands run prior to or after the test itself>
Process context:
unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023
Created file: trans_test_file
File context: unconfined_u:object_r:test_filesystem_filetranscon_t:s0
File context is correct
Actual output:
<eliding noise from commands run prior to or after the test itself>
Process context:
unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023
Created file: trans_test_file
File context: system_u:object_r:test_file_t:s0
File context error, expected:
test_filesystem_filetranscon_t
got:
test_file_t
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Stable-dep-of: 913f7cf77bf1 ("NFSD: NFSv4 file creation neglects setting ACL")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 24d92de9186ebc340687caf7356e1070773e67bc upstream.
The main point of the guarded SETATTR is to prevent races with other
WRITE and SETATTR calls. That requires that the check of the guard time
against the inode ctime be done after taking the inode lock.
Furthermore, we need to take into account the 32-bit nature of
timestamps in NFSv3, and the possibility that files may change at a
faster rate than once a second.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Stable-dep-of: 442d27ff09a2 ("nfsd: set security label during create operations")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 11fec9b9fb04fd1b3330a3b91ab9dcfa81ad5ad3 upstream.
Convert to using the new inode timestamp accessor functions.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-50-jlayton@kernel.org
Stable-dep-of: 24d92de9186e ("nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
[ cel: d68886bae76a has already been applied ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7ba0b6461bc4edb3005ea6e00cdae189bcf908a5 upstream.
After rename exchanging (either with the rename exchange operation or
regular renames in multiple non-atomic steps) two inodes and at least
one of them is a directory, we can end up with a log tree that contains
only of the inodes and after a power failure that can result in an attempt
to delete the other inode when it should not because it was not deleted
before the power failure. In some case that delete attempt fails when
the target inode is a directory that contains a subvolume inside it, since
the log replay code is not prepared to deal with directory entries that
point to root items (only inode items).
1) We have directories "dir1" (inode A) and "dir2" (inode B) under the
same parent directory;
2) We have a file (inode C) under directory "dir1" (inode A);
3) We have a subvolume inside directory "dir2" (inode B);
4) All these inodes were persisted in a past transaction and we are
currently at transaction N;
5) We rename the file (inode C), so at btrfs_log_new_name() we update
inode C's last_unlink_trans to N;
6) We get a rename exchange for "dir1" (inode A) and "dir2" (inode B),
so after the exchange "dir1" is inode B and "dir2" is inode A.
During the rename exchange we call btrfs_log_new_name() for inodes
A and B, but because they are directories, we don't update their
last_unlink_trans to N;
7) An fsync against the file (inode C) is done, and because its inode
has a last_unlink_trans with a value of N we log its parent directory
(inode A) (through btrfs_log_all_parents(), called from
btrfs_log_inode_parent()).
8) So we end up with inode B not logged, which now has the old name
of inode A. At copy_inode_items_to_log(), when logging inode A, we
did not check if we had any conflicting inode to log because inode
A has a generation lower than the current transaction (created in
a past transaction);
9) After a power failure, when replaying the log tree, since we find that
inode A has a new name that conflicts with the name of inode B in the
fs tree, we attempt to delete inode B... this is wrong since that
directory was never deleted before the power failure, and because there
is a subvolume inside that directory, attempting to delete it will fail
since replay_dir_deletes() and btrfs_unlink_inode() are not prepared
to deal with dir items that point to roots instead of inodes.
When that happens the mount fails and we get a stack trace like the
following:
[87.2314] BTRFS info (device dm-0): start tree-log replay
[87.2318] BTRFS critical (device dm-0): failed to delete reference to subvol, root 5 inode 256 parent 259
[87.2332] ------------[ cut here ]------------
[87.2338] BTRFS: Transaction aborted (error -2)
[87.2346] WARNING: CPU: 1 PID: 638968 at fs/btrfs/inode.c:4345 __btrfs_unlink_inode+0x416/0x440 [btrfs]
[87.2368] Modules linked in: btrfs loop dm_thin_pool (...)
[87.2470] CPU: 1 UID: 0 PID: 638968 Comm: mount Tainted: G W 6.18.0-rc7-btrfs-next-218+ #2 PREEMPT(full)
[87.2489] Tainted: [W]=WARN
[87.2494] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[87.2514] RIP: 0010:__btrfs_unlink_inode+0x416/0x440 [btrfs]
[87.2538] Code: c0 89 04 24 (...)
[87.2568] RSP: 0018:ffffc0e741f4b9b8 EFLAGS: 00010286
[87.2574] RAX: 0000000000000000 RBX: ffff9d3ec8a6cf60 RCX: 0000000000000000
[87.2582] RDX: 0000000000000002 RSI: ffffffff84ab45a1 RDI: 00000000ffffffff
[87.2591] RBP: ffff9d3ec8a6ef20 R08: 0000000000000000 R09: ffffc0e741f4b840
[87.2599] R10: ffff9d45dc1fffa8 R11: 0000000000000003 R12: ffff9d3ee26d77e0
[87.2608] R13: ffffc0e741f4ba98 R14: ffff9d4458040800 R15: ffff9d44b6b7ca10
[87.2618] FS: 00007f7b9603a840(0000) GS:ffff9d4658982000(0000) knlGS:0000000000000000
[87.2629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[87.2637] CR2: 00007ffc9ec33b98 CR3: 000000011273e003 CR4: 0000000000370ef0
[87.2648] Call Trace:
[87.2651] <TASK>
[87.2654] btrfs_unlink_inode+0x15/0x40 [btrfs]
[87.2661] unlink_inode_for_log_replay+0x27/0xf0 [btrfs]
[87.2669] check_item_in_log+0x1ea/0x2c0 [btrfs]
[87.2676] replay_dir_deletes+0x16b/0x380 [btrfs]
[87.2684] fixup_inode_link_count+0x34b/0x370 [btrfs]
[87.2696] fixup_inode_link_counts+0x41/0x160 [btrfs]
[87.2703] btrfs_recover_log_trees+0x1ff/0x7c0 [btrfs]
[87.2711] ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
[87.2719] open_ctree+0x10bb/0x15f0 [btrfs]
[87.2726] btrfs_get_tree.cold+0xb/0x16c [btrfs]
[87.2734] ? fscontext_read+0x15c/0x180
[87.2740] ? rw_verify_area+0x50/0x180
[87.2746] vfs_get_tree+0x25/0xd0
[87.2750] vfs_cmd_create+0x59/0xe0
[87.2755] __do_sys_fsconfig+0x4f6/0x6b0
[87.2760] do_syscall_64+0x50/0x1220
[87.2764] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[87.2770] RIP: 0033:0x7f7b9625f4aa
[87.2775] Code: 73 01 c3 48 (...)
[87.2803] RSP: 002b:00007ffc9ec35b08 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
[87.2817] RAX: ffffffffffffffda RBX: 0000558bfa91ac20 RCX: 00007f7b9625f4aa
[87.2829] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
[87.2842] RBP: 0000558bfa91b120 R08: 0000000000000000 R09: 0000000000000000
[87.2854] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[87.2864] R13: 00007f7b963f1580 R14: 00007f7b963f326c R15: 00007f7b963d8a23
[87.2877] </TASK>
[87.2882] ---[ end trace 0000000000000000 ]---
[87.2891] BTRFS: error (device dm-0 state A) in __btrfs_unlink_inode:4345: errno=-2 No such entry
[87.2904] BTRFS: error (device dm-0 state EAO) in do_abort_log_replay:191: errno=-2 No such entry
[87.2915] BTRFS critical (device dm-0 state EAO): log tree (for root 5) leaf currently being processed (slot 7 key (258 12 257)):
[87.2929] BTRFS info (device dm-0 state EAO): leaf 30736384 gen 10 total ptrs 7 free space 15712 owner 18446744073709551610
[87.2929] BTRFS info (device dm-0 state EAO): refs 3 lock_owner 0 current 638968
[87.2929] item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160
[87.2929] inode generation 9 transid 10 size 0 nbytes 0
[87.2929] block group 0 mode 40755 links 1 uid 0 gid 0
[87.2929] rdev 0 sequence 7 flags 0x0
[87.2929] atime 1765464494.678070921
[87.2929] ctime 1765464494.686606513
[87.2929] mtime 1765464494.686606513
[87.2929] otime 1765464494.678070921
[87.2929] item 1 key (257 INODE_REF 256) itemoff 16109 itemsize 14
[87.2929] index 4 name_len 4
[87.2929] item 2 key (257 DIR_LOG_INDEX 2) itemoff 16101 itemsize 8
[87.2929] dir log end 2
[87.2929] item 3 key (257 DIR_LOG_INDEX 3) itemoff 16093 itemsize 8
[87.2929] dir log end 18446744073709551615
[87.2930] item 4 key (257 DIR_INDEX 3) itemoff 16060 itemsize 33
[87.2930] location key (258 1 0) type 1
[87.2930] transid 10 data_len 0 name_len 3
[87.2930] item 5 key (258 INODE_ITEM 0) itemoff 15900 itemsize 160
[87.2930] inode generation 9 transid 10 size 0 nbytes 0
[87.2930] block group 0 mode 100644 links 1 uid 0 gid 0
[87.2930] rdev 0 sequence 2 flags 0x0
[87.2930] atime 1765464494.678456467
[87.2930] ctime 1765464494.686606513
[87.2930] mtime 1765464494.678456467
[87.2930] otime 1765464494.678456467
[87.2930] item 6 key (258 INODE_REF 257) itemoff 15887 itemsize 13
[87.2930] index 3 name_len 3
[87.2930] BTRFS critical (device dm-0 state EAO): log replay failed in unlink_inode_for_log_replay:1045 for root 5, stage 3, with error -2: failed to unlink inode 256 parent dir 259 name subvol root 5
[87.2963] BTRFS: error (device dm-0 state EAO) in btrfs_recover_log_trees:7743: errno=-2 No such entry
[87.2981] BTRFS: error (device dm-0 state EAO) in btrfs_replay_log:2083: errno=-2 No such entry (Failed to recover log tr
So fix this by changing copy_inode_items_to_log() to always detect if
there are conflicting inodes for the ref/extref of the inode being logged
even if the inode was created in a past transaction.
A test case for fstests will follow soon.
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2857bd59feb63fcf40fe4baf55401baea6b4feb4 upstream.
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e901c7fce59e72d9f3c92733c379849c4034ac50 upstream.
Commit abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET")
added NFSD_MAY_OWNER_OVERRIDE to the access flags passed from
nfsd4_layoutget() to fh_verify(). This causes LAYOUTGET to fail for
executable-only files, and causes xfstests generic/126 to fail on
pNFS SCSI.
To allow read access to executable-only files, what we really want is:
1. The "permissions" portion of the access flags (the lower 6 bits)
must be exactly NFSD_MAY_READ
2. The "hints" portion of the access flags (the upper 26 bits) can
contain any combination of NFSD_MAY_OWNER_OVERRIDE and
NFSD_MAY_READ_IF_EXEC
Fixes: abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET")
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a2187431c395cdfbf144e3536f25468c64fc7cfa upstream.
Commit 985b67cd8639 ("ext4: filesystems without casefold feature cannot
be mounted with siphash") properly rejects volumes where
s_def_hash_version is set to DX_HASH_SIPHASH, but the check and the
error message should not look into casefold setup - a filesystem should
never have DX_HASH_SIPHASH as the default hash. Fix it and, since we
are there, move the check to ext4_hash_info_init.
Fixes:985b67cd8639 ("ext4: filesystems without casefold feature cannot
be mounted with siphash")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/87jzg1en6j.fsf_-_@mailhost.krisman.be
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 985b67cd86392310d9e9326de941c22fc9340eec upstream.
When mounting the ext4 filesystem, if the default hash version is set to
DX_HASH_SIPHASH but the casefold feature is not set, exit the mounting.
Reported-by: syzbot+340581ba9dceb7e06fb3@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Link: https://patch.msgid.link/20240605012335.44086-1-lizhi.xu@windriver.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[cascardo: small conflict fixup]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 68d05693f8c031257a0822464366e1c2a239a512 ]
mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync # avoid CP_UMOUNT_FLAG in last f2fs_checkpoint.ckpt_flags
touch /mnt/f2fs/bar
f2fs_io fsync /mnt/f2fs/bar
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
blockdev --setro /dev/vdd
mount /dev/vdd /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
For the case if we create and fsync a new inode before sudden power-cut,
without norecovery or disable_roll_forward mount option, the following
mount will succeed w/o recovering last fsynced inode.
The problem here is that we only check inode_list list after
find_fsync_dnodes() in f2fs_recover_fsync_data() to find out whether
there is recoverable data in the iamge, but there is a missed case, if
last fsynced inode is not existing in last checkpoint, then, we will
fail to get its inode due to nat of inode node is not existing in last
checkpoint, so the inode won't be linked in inode_list.
Let's detect such case in dyrun mode to fix this issue.
After this change, mount will fail as expected below:
mount: /mnt/f2fs: cannot mount /dev/vdd read-only.
dmesg(1) may have more information after failed mount system call.
demsg:
F2FS-fs (vdd): Need to recover fsync data, but write access unavailable, please try mount w/ disable_roll_forward or norecovery
Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ folio => page ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1f27ef42bb0b7c0740c5616ec577ec188b8a1d05 ]
As Hong Yun reported in mailing list:
loop7: detected capacity change from 0 to 131072
------------[ cut here ]------------
kmem_cache of name 'f2fs_xattr_entry-7:7' already exists
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 kmem_cache_sanity_check mm/slab_common.c:109 [inline]
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 __kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
CPU: 0 UID: 0 PID: 24426 Comm: syz.7.1370 Not tainted 6.17.0-rc4 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:kmem_cache_sanity_check mm/slab_common.c:109 [inline]
RIP: 0010:__kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
Call Trace:
__kmem_cache_create include/linux/slab.h:353 [inline]
f2fs_kmem_cache_create fs/f2fs/f2fs.h:2943 [inline]
f2fs_init_xattr_caches+0xa5/0xe0 fs/f2fs/xattr.c:843
f2fs_fill_super+0x1645/0x2620 fs/f2fs/super.c:4918
get_tree_bdev_flags+0x1fb/0x260 fs/super.c:1692
vfs_get_tree+0x43/0x140 fs/super.c:1815
do_new_mount+0x201/0x550 fs/namespace.c:3808
do_mount fs/namespace.c:4136 [inline]
__do_sys_mount fs/namespace.c:4347 [inline]
__se_sys_mount+0x298/0x2f0 fs/namespace.c:4324
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x8e/0x3a0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug can be reproduced w/ below scripts:
- mount /dev/vdb /mnt1
- mount /dev/vdc /mnt2
- umount /mnt1
- mounnt /dev/vdb /mnt1
The reason is if we created two slab caches, named f2fs_xattr_entry-7:3
and f2fs_xattr_entry-7:7, and they have the same slab size. Actually,
slab system will only create one slab cache core structure which has
slab name of "f2fs_xattr_entry-7:3", and two slab caches share the same
structure and cache address.
So, if we destroy f2fs_xattr_entry-7:3 cache w/ cache address, it will
decrease reference count of slab cache, rather than release slab cache
entirely, since there is one more user has referenced the cache.
Then, if we try to create slab cache w/ name "f2fs_xattr_entry-7:3" again,
slab system will find that there is existed cache which has the same name
and trigger the warning.
Let's changes to use global inline_xattr_slab instead of per-sb slab cache
for fixing.
Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups")
Cc: stable@kernel.org
Reported-by: Hong Yun <yhong@link.cuhk.edu.hk>
Tested-by: Hong Yun <yhong@link.cuhk.edu.hk>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ folio => page ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit be112e7449a6e1b54aa9feac618825d154b3a5c7 ]
In order to let userspace detect such error rather than suffering
silent failure.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ Adjust context, no rollback ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 10b591e7fb7cdc8c1e53e9c000dc0ef7069aaa76 ]
Bai, Shuangpeng <sjb7183@psu.edu> reported a bug as below:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 11441 Comm: syz.0.46 Not tainted 6.17.0 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:f2fs_all_cluster_page_ready+0x106/0x550 fs/f2fs/compress.c:857
Call Trace:
<TASK>
f2fs_write_cache_pages fs/f2fs/data.c:3078 [inline]
__f2fs_write_data_pages fs/f2fs/data.c:3290 [inline]
f2fs_write_data_pages+0x1c19/0x3600 fs/f2fs/data.c:3317
do_writepages+0x38e/0x640 mm/page-writeback.c:2634
filemap_fdatawrite_wbc mm/filemap.c:386 [inline]
__filemap_fdatawrite_range mm/filemap.c:419 [inline]
file_write_and_wait_range+0x2ba/0x3e0 mm/filemap.c:794
f2fs_do_sync_file+0x6e6/0x1b00 fs/f2fs/file.c:294
generic_write_sync include/linux/fs.h:3043 [inline]
f2fs_file_write_iter+0x76e/0x2700 fs/f2fs/file.c:5259
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x7e9/0xe00 fs/read_write.c:686
ksys_write+0x19d/0x2d0 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf7/0x470 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The bug was triggered w/ below race condition:
fsync setattr ioctl
- f2fs_do_sync_file
- file_write_and_wait_range
- f2fs_write_cache_pages
: inode is non-compressed
: cc.cluster_size =
F2FS_I(inode)->i_cluster_size = 0
- tag_pages_for_writeback
- f2fs_setattr
- truncate_setsize
- f2fs_truncate
- f2fs_fileattr_set
- f2fs_setflags_common
- set_compress_context
: F2FS_I(inode)->i_cluster_size = 4
: set_inode_flag(inode, FI_COMPRESSED_FILE)
- f2fs_compressed_file
: return true
- f2fs_all_cluster_page_ready
: "pgidx % cc->cluster_size" trigger dividing 0 issue
Let's change as below to fix this issue:
- introduce a new atomic type variable .writeback in structure f2fs_inode_info
to track the number of threads which calling f2fs_write_cache_pages().
- use .i_sem lock to protect .writeback update.
- check .writeback before update compression context in f2fs_setflags_common()
to avoid race w/ ->writepages.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Cc: stable@kernel.org
Reported-by: Bai, Shuangpeng <sjb7183@psu.edu>
Tested-by: Bai, Shuangpeng <sjb7183@psu.edu>
Closes: https://lore.kernel.org/lkml/44D8F7B3-68AD-425F-9915-65D27591F93F@psu.edu
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 078cad8212ce4f4ebbafcc0936475b8215e1ca2a ]
Let's drop the inode from the donation list when there is no other
open file.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 10b591e7fb7c ("f2fs: fix to avoid updating compression context during writeback")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ef0c333cad8d1940f132a7ce15f15920216a3bd5 ]
This patch records POSIX_FADV_NOREUSE ranges for users to reclaim the caches
instantly off from LRU.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 10b591e7fb7c ("f2fs: fix to avoid updating compression context during writeback")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 968c4f72b23c0c8f1e94e942eab89b8c5a3022e7 ]
After commit 3db1de0e582c ("f2fs: change the current atomic write way"),
we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[]
array to i_pin_failure for cleanup.
Meanwhile, let's define i_current_depth and i_gc_failures as union
variable due to they won't be valid at the same time.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 10b591e7fb7c ("f2fs: fix to avoid updating compression context during writeback")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5701875f9609b000d91351eaa6bfd97fe2f157f4 ]
There's issue as follows:
BUG: KASAN: use-after-free in ext4_xattr_inode_dec_ref_all+0x6ff/0x790
Read of size 4 at addr ffff88807b003000 by task syz-executor.0/15172
CPU: 3 PID: 15172 Comm: syz-executor.0
Call Trace:
__dump_stack lib/dump_stack.c:82 [inline]
dump_stack+0xbe/0xfd lib/dump_stack.c:123
print_address_description.constprop.0+0x1e/0x280 mm/kasan/report.c:400
__kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560
kasan_report+0x3a/0x50 mm/kasan/report.c:585
ext4_xattr_inode_dec_ref_all+0x6ff/0x790 fs/ext4/xattr.c:1137
ext4_xattr_delete_inode+0x4c7/0xda0 fs/ext4/xattr.c:2896
ext4_evict_inode+0xb3b/0x1670 fs/ext4/inode.c:323
evict+0x39f/0x880 fs/inode.c:622
iput_final fs/inode.c:1746 [inline]
iput fs/inode.c:1772 [inline]
iput+0x525/0x6c0 fs/inode.c:1758
ext4_orphan_cleanup fs/ext4/super.c:3298 [inline]
ext4_fill_super+0x8c57/0xba40 fs/ext4/super.c:5300
mount_bdev+0x355/0x410 fs/super.c:1446
legacy_get_tree+0xfe/0x220 fs/fs_context.c:611
vfs_get_tree+0x8d/0x2f0 fs/super.c:1576
do_new_mount fs/namespace.c:2983 [inline]
path_mount+0x119a/0x1ad0 fs/namespace.c:3316
do_mount+0xfc/0x110 fs/namespace.c:3329
__do_sys_mount fs/namespace.c:3540 [inline]
__se_sys_mount+0x219/0x2e0 fs/namespace.c:3514
do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Memory state around the buggy address:
ffff88807b002f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88807b002f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88807b003000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88807b003080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88807b003100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Above issue happens as ext4_xattr_delete_inode() isn't check xattr
is valid if xattr is in inode.
To solve above issue call xattr_check_inode() check if xattr if valid
in inode. In fact, we can directly verify in ext4_iget_extra_inode(),
so that there is no divergent verification.
Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250208063141.1539283-3-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: David Nyström <david.nystrom@est.tech>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 69f3a3039b0d0003de008659cafd5a1eaaa0a7a4 ]
Introduce ITAIL helper to get the bound of xattr in inode.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250208063141.1539283-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: David Nyström <david.nystrom@est.tech>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a49a2a1baa0c553c3548a1c414b6a3c005a8deba ]
Usage of vfs_test_lock() is somewhat confused. Documentation suggests
it is given a "lock" but this is not the case. It is given a struct
file_lock which contains some details of the sort of lock it should be
looking for.
In particular passing a "file_lock" containing fl_lmops or fl_ops is
meaningless and possibly confusing.
This is particularly problematic in lockd. nlmsvc_testlock() receives
an initialised "file_lock" from xdr-decode, including manager ops and an
owner. It then mistakenly passes this to vfs_test_lock() which might
replace the owner and the ops. This can lead to confusion when freeing
the lock.
The primary role of the 'struct file_lock' passed to vfs_test_lock() is
to report a conflicting lock that was found, so it makes more sense for
nlmsvc_testlock() to pass "conflock", which it uses for returning the
conflicting lock.
With this change, freeing of the lock is not confused and code in
__nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified.
Documentation for vfs_test_lock() is improved to reflect its real
purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the
future.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com
Signed-off-by: NeilBrown <neil@brown.name>
Fixes: 20fa19027286 ("nfs: add export operations")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ adapted fl->c.flc_* field references to flat fl->fl_* naming convention ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bd5603eaae0aabf527bfb3ce1bb07e979ce5bd50 ]
Commit e26ee4efbc79 ("fuse: allocate ff->release_args only if release is
needed") skips allocating ff->release_args if the server does not
implement open. However in doing so, fuse_prepare_release() now skips
grabbing the reference on the inode, which makes it possible for an
inode to be evicted from the dcache while there are inflight readahead
requests. This causes a deadlock if the server triggers reclaim while
servicing the readahead request and reclaim attempts to evict the inode
of the file being read ahead. Since the folio is locked during
readahead, when reclaim evicts the fuse inode and fuse_evict_inode()
attempts to remove all folios associated with the inode from the page
cache (truncate_inode_pages_range()), reclaim will block forever waiting
for the lock since readahead cannot relinquish the lock because it is
itself blocked in reclaim:
>>> stack_trace(1504735)
folio_wait_bit_common (mm/filemap.c:1308:4)
folio_lock (./include/linux/pagemap.h:1052:3)
truncate_inode_pages_range (mm/truncate.c:336:10)
fuse_evict_inode (fs/fuse/inode.c:161:2)
evict (fs/inode.c:704:3)
dentry_unlink_inode (fs/dcache.c:412:3)
__dentry_kill (fs/dcache.c:615:3)
shrink_kill (fs/dcache.c:1060:12)
shrink_dentry_list (fs/dcache.c:1087:3)
prune_dcache_sb (fs/dcache.c:1168:2)
super_cache_scan (fs/super.c:221:10)
do_shrink_slab (mm/shrinker.c:435:9)
shrink_slab (mm/shrinker.c:626:10)
shrink_node (mm/vmscan.c:5951:2)
shrink_zones (mm/vmscan.c:6195:3)
do_try_to_free_pages (mm/vmscan.c:6257:3)
do_swap_page (mm/memory.c:4136:11)
handle_pte_fault (mm/memory.c:5562:10)
handle_mm_fault (mm/memory.c:5870:9)
do_user_addr_fault (arch/x86/mm/fault.c:1338:10)
handle_page_fault (arch/x86/mm/fault.c:1481:3)
exc_page_fault (arch/x86/mm/fault.c:1539:2)
asm_exc_page_fault+0x22/0x27
Fix this deadlock by allocating ff->release_args and grabbing the
reference on the inode when preparing the file for release even if the
server does not implement open. The inode reference will be dropped when
the last reference on the fuse file is dropped (see fuse_file_put() ->
fuse_release_end()).
Fixes: e26ee4efbc79 ("fuse: allocate ff->release_args only if release is needed")
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reported-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 27d17641cacfedd816789b75d342430f6b912bd2 ]
>>From RFC 8881:
5.8.1.14. Attribute 75: suppattr_exclcreat
> The bit vector that would set all REQUIRED and RECOMMENDED
> attributes that are supported by the EXCLUSIVE4_1 method of file
> creation via the OPEN operation. The scope of this attribute
> applies to all objects with a matching fsid.
There's nothing in RFC 8881 that states that suppattr_exclcreat is
or is not allowed to contain bits for attributes that are clear in
the reported supported_attrs bitmask. But it doesn't make sense for
an NFS server to indicate that it /doesn't/ implement an attribute,
but then also indicate that clients /are/ allowed to set that
attribute using OPEN(create) with EXCLUSIVE4_1.
Ensure that the SECURITY_LABEL and ACL bits are not set in the
suppattr_exclcreat bitmask when they are also not set in the
supported_attrs bitmask.
Fixes: 8c18f2052e75 ("nfsd41: SUPPATTR_EXCLCREAT attribute")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ca8b201f28547e28343a6f00a6e91fa8c09572fe ]
As Jiaming Zhang and syzbot reported, there is potential deadlock in
f2fs as below:
Chain exists of:
&sbi->cp_rwsem --> fs_reclaim --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
rlock(&sbi->cp_rwsem);
*** DEADLOCK ***
3 locks held by kswapd0/73:
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:7015 [inline]
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x951/0x2800 mm/vmscan.c:7389
#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_trylock_shared fs/super.c:562 [inline]
#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_cache_scan+0x91/0x4b0 fs/super.c:197
#2: ffff888011840610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x8d9/0x1b60 fs/f2fs/inode.c:890
stack backtrace:
CPU: 0 UID: 0 PID: 73 Comm: kswapd0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
__lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_read+0x46/0x2e0 kernel/locking/rwsem.c:1537
f2fs_down_read fs/f2fs/f2fs.h:2278 [inline]
f2fs_lock_op fs/f2fs/f2fs.h:2357 [inline]
f2fs_do_truncate_blocks+0x21c/0x10c0 fs/f2fs/file.c:791
f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:867
f2fs_truncate+0x489/0x7c0 fs/f2fs/file.c:925
f2fs_evict_inode+0x9f2/0x1b60 fs/f2fs/inode.c:897
evict+0x504/0x9c0 fs/inode.c:810
f2fs_evict_inode+0x1dc/0x1b60 fs/f2fs/inode.c:853
evict+0x504/0x9c0 fs/inode.c:810
dispose_list fs/inode.c:852 [inline]
prune_icache_sb+0x21b/0x2c0 fs/inode.c:1000
super_cache_scan+0x39b/0x4b0 fs/super.c:224
do_shrink_slab+0x6ef/0x1110 mm/shrinker.c:437
shrink_slab_memcg mm/shrinker.c:550 [inline]
shrink_slab+0x7ef/0x10d0 mm/shrinker.c:628
shrink_one+0x28a/0x7c0 mm/vmscan.c:4955
shrink_many mm/vmscan.c:5016 [inline]
lru_gen_shrink_node mm/vmscan.c:5094 [inline]
shrink_node+0x315d/0x3780 mm/vmscan.c:6081
kswapd_shrink_node mm/vmscan.c:6941 [inline]
balance_pgdat mm/vmscan.c:7124 [inline]
kswapd+0x147c/0x2800 mm/vmscan.c:7389
kthread+0x70e/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
The root cause is deadlock among four locks as below:
kswapd
- fs_reclaim --- Lock A
- shrink_one
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- iput
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- f2fs_truncate
- f2fs_truncate_blocks
- f2fs_do_truncate_blocks
- f2fs_lock_op --- Lock C
ioctl
- f2fs_ioc_commit_atomic_write
- f2fs_lock_op --- Lock C
- __f2fs_commit_atomic_write
- __replace_atomic_write_block
- f2fs_get_dnode_of_data
- __get_node_folio
- f2fs_check_nid_range
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
open
- do_open
- do_truncate
- security_inode_need_killpriv
- f2fs_getxattr
- lookup_all_xattrs
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
- f2fs_commit_super
- read_mapping_folio
- filemap_alloc_folio_noprof
- prepare_alloc_pages
- fs_reclaim_acquire --- Lock A
In order to avoid such deadlock, we need to avoid grabbing sb_lock in
f2fs_handle_error(), so, let's use asynchronous method instead:
- remove f2fs_handle_error() implementation
- rename f2fs_handle_error_async() to f2fs_handle_error()
- spread f2fs_handle_error()
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Cc: stable@kernel.org
Reported-by: syzbot+14b90e1156b9f6fc1266@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68eae49b.050a0220.ac43.0001.GAE@google.com
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Closes: https://lore.kernel.org/lkml/CANypQFa-Gy9sD-N35o3PC+FystOWkNuN8pv6S75HLT0ga-Tzgw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0b8eb814e05885cde53c1d56ee012a029b8413e6 ]
Use f2fs_err_ratelimited() to instead f2fs_err() in
f2fs_record_stop_reason() and f2fs_record_errors() to
avoid redundant logs.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: ca8b201f2854 ("f2fs: fix to avoid potential deadlock")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0185c2292c600993199bc6b1f342ad47a9e8c678 ]
In our user safe ino resolve ioctl we'll just turn any ret into -EACCES
from inode_permission(). This is redundant, and could potentially be
wrong if we had an ENOMEM in the security layer or some such other
error, so simply return the actual return value.
Note: The patch was taken from v5 of fscrypt patchset
(https://lore.kernel.org/linux-btrfs/cover.1706116485.git.josef@toxicpanda.com/)
which was handled over time by various people: Omar Sandoval, Sweet Tea
Dorminy, Josef Bacik.
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4cfc7d5a4a01d2133b278cdbb1371fba1b419174 ]
After commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"),
the freeze error handling is broken because gfs2_do_thaw()
overwrites the 'error' variable, causing incorrect processing
of the original freeze error.
Fix this by calling gfs2_do_thaw() when gfs2_lock_fs_check_clean()
fails but ignoring its return value to preserve the original
freeze error for proper reporting.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic")
Cc: stable@vger.kernel.org # v6.5+
Signed-off-by: Alexey Velichayshiy <a.velichayshiy@ispras.ru>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[ gfs2_do_thaw() only takes one param ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6abfe107894af7e8ce3a2e120c619d81ee764ad5 ]
Copying the file system while it is mounted as read-only results in
a mount failure:
[~]# mkfs.ext4 -F /dev/sdc
[~]# mount /dev/sdc -o ro /mnt/test
[~]# dd if=/dev/sdc of=/dev/sda bs=1M
[~]# mount /dev/sda /mnt/test1
[ 1094.849826] JBD2: journal checksum error
[ 1094.850927] EXT4-fs (sda): Could not load journal inode
mount: mount /dev/sda on /mnt/test1 failed: Bad message
The process described above is just an abstracted way I came up with to
reproduce the issue. In the actual scenario, the file system was mounted
read-only and then copied while it was still mounted. It was found that
the mount operation failed. The user intended to verify the data or use
it as a backup, and this action was performed during a version upgrade.
Above issue may happen as follows:
ext4_fill_super
set_journal_csum_feature_set(sb)
if (ext4_has_metadata_csum(sb))
incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3;
if (test_opt(sb, JOURNAL_CHECKSUM)
jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat);
lock_buffer(journal->j_sb_buffer);
sb->s_feature_incompat |= cpu_to_be32(incompat);
//The data in the journal sb was modified, but the checksum was not
updated, so the data remaining in memory has a mismatch between the
data and the checksum.
unlock_buffer(journal->j_sb_buffer);
In this case, the journal sb copied over is in a state where the checksum
and data are inconsistent, so mounting fails.
To solve the above issue, update the checksum in memory after modifying
the journal sb.
Fixes: 4fd5ea43bc11 ("jbd2: checksum journal superblock")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251103010123.3753631-1-yebin@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
[ jbd2_superblock_csum() also takes a journal param ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ee5a977b4e771cc181f39d504426dbd31ed701cc ]
strscpy_pad() can't be used to copy a non-NUL-term string into a NUL-term
string of possibly bigger size. Commit 0efc5990bca5 ("string.h: Introduce
memtostr() and memtostr_pad()") provides additional information in that
regard. So if this happens, the following warning is observed:
strnlen: detected buffer overflow: 65 byte read of buffer size 64
WARNING: CPU: 0 PID: 28655 at lib/string_helpers.c:1032 __fortify_report+0x96/0xc0 lib/string_helpers.c:1032
Modules linked in:
CPU: 0 UID: 0 PID: 28655 Comm: syz-executor.3 Not tainted 6.12.54-syzkaller-00144-g5f0270f1ba00 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:__fortify_report+0x96/0xc0 lib/string_helpers.c:1032
Call Trace:
<TASK>
__fortify_panic+0x1f/0x30 lib/string_helpers.c:1039
strnlen include/linux/fortify-string.h:235 [inline]
sized_strscpy include/linux/fortify-string.h:309 [inline]
parse_apply_sb_mount_options fs/ext4/super.c:2504 [inline]
__ext4_fill_super fs/ext4/super.c:5261 [inline]
ext4_fill_super+0x3c35/0xad00 fs/ext4/super.c:5706
get_tree_bdev_flags+0x387/0x620 fs/super.c:1636
vfs_get_tree+0x93/0x380 fs/super.c:1814
do_new_mount fs/namespace.c:3553 [inline]
path_mount+0x6ae/0x1f70 fs/namespace.c:3880
do_mount fs/namespace.c:3893 [inline]
__do_sys_mount fs/namespace.c:4103 [inline]
__se_sys_mount fs/namespace.c:4080 [inline]
__x64_sys_mount+0x280/0x300 fs/namespace.c:4080
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x64/0x140 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Since userspace is expected to provide s_mount_opts field to be at most 63
characters long with the ending byte being NUL-term, use a 64-byte buffer
which matches the size of s_mount_opts, so that strscpy_pad() does its job
properly. Return with error if the user still managed to provide a
non-NUL-term string here.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 8ecb790ea8c3 ("ext4: avoid potential buffer over-read in parse_apply_sb_mount_options()")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251101160430.222297-1-pchelkin@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[ adapted 2-argument strscpy_pad() call to 3-argument form with explicit sizeof() ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1f941b2c23fd34c6f3b76d36f9d0a2528fa92b8f upstream.
In error path, call drop_client() to drop the reference
obtained by get_nfsdfs_clp().
Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 68f6bd128e75a032432eda9d16676ed2969a1096 upstream.
When reading a compressed file, we may read several pages in addition to
the one requested. The current code will overwrite pages in the page
cache with the data from disc which can definitely result in changes
that have been made being lost.
For example if we have four consecutie pages ABCD in the file compressed
into a single extent, on first access, we'll bring in ABCD. Then we
write to page B. Memory pressure results in the eviction of ACD.
When we attempt to write to page C, we will overwrite the data in page
B with the data currently on disk.
I haven't investigated the decompression code to check whether it's
OK to overwrite a clean page or whether it might be possible to see
corrupt data. Out of an abundance of caution, decline to overwrite
uptodate pages, not just dirty pages.
Fixes: 4342306f0f0d (fs/ntfs3: Add file operations and implementation)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0c56693b06a68476ba113db6347e7897475f9e4c ]
In get_file_all_info(), if vfs_getattr() fails, the function returns
immediately without freeing the allocated filename, leading to a memory
leak.
Fix this by freeing the filename before returning in this error case.
Fixes: 5614c8c487f6a ("ksmbd: replace generic_fillattr with vfs_getattr")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
transaction
commit 266273eaf4d99475f1ae57f687b3e42bc71ec6f0 upstream.
We can't log a conflicting inode if it's a directory and it was moved
from one parent directory to another parent directory in the current
transaction, as this can result an attempt to have a directory with
two hard links during log replay, one for the old parent directory and
another for the new parent directory.
The following scenario triggers that issue:
1) We have directories "dir1" and "dir2" created in a past transaction.
Directory "dir1" has inode A as its parent directory;
2) We move "dir1" to some other directory;
3) We create a file with the name "dir1" in directory inode A;
4) We fsync the new file. This results in logging the inode of the new file
and the inode for the directory "dir1" that was previously moved in the
current transaction. So the log tree has the INODE_REF item for the
new location of "dir1";
5) We move the new file to some other directory. This results in updating
the log tree to included the new INODE_REF for the new location of the
file and removes the INODE_REF for the old location. This happens
during the rename when we call btrfs_log_new_name();
6) We fsync the file, and that persists the log tree changes done in the
previous step (btrfs_log_new_name() only updates the log tree in
memory);
7) We have a power failure;
8) Next time the fs is mounted, log replay happens and when processing
the inode for directory "dir1" we find a new INODE_REF and add that
link, but we don't remove the old link of the inode since we have
not logged the old parent directory of the directory inode "dir1".
As a result after log replay finishes when we trigger writeback of the
subvolume tree's extent buffers, the tree check will detect that we have
a directory a hard link count of 2 and we get a mount failure.
The errors and stack traces reported in dmesg/syslog are like this:
[ 3845.729764] BTRFS info (device dm-0): start tree-log replay
[ 3845.730304] page: refcount:3 mapcount:0 mapping:000000005c8a3027 index:0x1d00 pfn:0x11510c
[ 3845.731236] memcg:ffff9264c02f4e00
[ 3845.731751] aops:btree_aops [btrfs] ino:1
[ 3845.732300] flags: 0x17fffc00000400a(uptodate|private|writeback|node=0|zone=2|lastcpupid=0x1ffff)
[ 3845.733346] raw: 017fffc00000400a 0000000000000000 dead000000000122 ffff9264d978aea8
[ 3845.734265] raw: 0000000000001d00 ffff92650e6d4738 00000003ffffffff ffff9264c02f4e00
[ 3845.735305] page dumped because: eb page dump
[ 3845.735981] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=6 ino=257, invalid nlink: has 2 expect no more than 1 for dir
[ 3845.737786] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14881 owner 5
[ 3845.737789] BTRFS info (device dm-0): refs 4 lock_owner 0 current 30701
[ 3845.737792] item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
[ 3845.737794] inode generation 3 transid 9 size 16 nbytes 16384
[ 3845.737795] block group 0 mode 40755 links 1 uid 0 gid 0
[ 3845.737797] rdev 0 sequence 2 flags 0x0
[ 3845.737798] atime 1764259517.0
[ 3845.737800] ctime 1764259517.572889464
[ 3845.737801] mtime 1764259517.572889464
[ 3845.737802] otime 1764259517.0
[ 3845.737803] item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
[ 3845.737805] index 0 name_len 2
[ 3845.737807] item 2 key (256 DIR_ITEM 2363071922) itemoff 16077 itemsize 34
[ 3845.737808] location key (257 1 0) type 2
[ 3845.737810] transid 9 data_len 0 name_len 4
[ 3845.737811] item 3 key (256 DIR_ITEM 2676584006) itemoff 16043 itemsize 34
[ 3845.737813] location key (258 1 0) type 2
[ 3845.737814] transid 9 data_len 0 name_len 4
[ 3845.737815] item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34
[ 3845.737816] location key (257 1 0) type 2
[ 3845.737818] transid 9 data_len 0 name_len 4
[ 3845.737819] item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34
[ 3845.737820] location key (258 1 0) type 2
[ 3845.737821] transid 9 data_len 0 name_len 4
[ 3845.737822] item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160
[ 3845.737824] inode generation 9 transid 10 size 6 nbytes 0
[ 3845.737825] block group 0 mode 40755 links 2 uid 0 gid 0
[ 3845.737826] rdev 0 sequence 1 flags 0x0
[ 3845.737827] atime 1764259517.572889464
[ 3845.737828] ctime 1764259517.572889464
[ 3845.737830] mtime 1764259517.572889464
[ 3845.737831] otime 1764259517.572889464
[ 3845.737832] item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14
[ 3845.737833] index 2 name_len 4
[ 3845.737834] item 8 key (257 INODE_REF 258) itemoff 15787 itemsize 14
[ 3845.737836] index 2 name_len 4
[ 3845.737837] item 9 key (257 DIR_ITEM 2507850652) itemoff 15754 itemsize 33
[ 3845.737838] location key (259 1 0) type 1
[ 3845.737839] transid 10 data_len 0 name_len 3
[ 3845.737840] item 10 key (257 DIR_INDEX 2) itemoff 15721 itemsize 33
[ 3845.737842] location key (259 1 0) type 1
[ 3845.737843] transid 10 data_len 0 name_len 3
[ 3845.737844] item 11 key (258 INODE_ITEM 0) itemoff 15561 itemsize 160
[ 3845.737846] inode generation 9 transid 10 size 8 nbytes 0
[ 3845.737847] block group 0 mode 40755 links 1 uid 0 gid 0
[ 3845.737848] rdev 0 sequence 1 flags 0x0
[ 3845.737849] atime 1764259517.572889464
[ 3845.737850] ctime 1764259517.572889464
[ 3845.737851] mtime 1764259517.572889464
[ 3845.737852] otime 1764259517.572889464
[ 3845.737853] item 12 key (258 INODE_REF 256) itemoff 15547 itemsize 14
[ 3845.737855] index 3 name_len 4
[ 3845.737856] item 13 key (258 DIR_ITEM 1843588421) itemoff 15513 itemsize 34
[ 3845.737857] location key (257 1 0) type 2
[ 3845.737858] transid 10 data_len 0 name_len 4
[ 3845.737860] item 14 key (258 DIR_INDEX 2) itemoff 15479 itemsize 34
[ 3845.737861] location key (257 1 0) type 2
[ 3845.737862] transid 10 data_len 0 name_len 4
[ 3845.737863] item 15 key (259 INODE_ITEM 0) itemoff 15319 itemsize 160
[ 3845.737865] inode generation 10 transid 10 size 0 nbytes 0
[ 3845.737866] block group 0 mode 100600 links 1 uid 0 gid 0
[ 3845.737867] rdev 0 sequence 2 flags 0x0
[ 3845.737868] atime 1764259517.580874966
[ 3845.737869] ctime 1764259517.586121869
[ 3845.737870] mtime 1764259517.580874966
[ 3845.737872] otime 1764259517.580874966
[ 3845.737873] item 16 key (259 INODE_REF 257) itemoff 15306 itemsize 13
[ 3845.737874] index 2 name_len 3
[ 3845.737875] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected
[ 3845.739448] ------------[ cut here ]------------
[ 3845.740092] WARNING: CPU: 5 PID: 30701 at fs/btrfs/disk-io.c:335 btree_csum_one_bio+0x25a/0x270 [btrfs]
[ 3845.741439] Modules linked in: btrfs dm_flakey crc32c_cryptoapi (...)
[ 3845.750626] CPU: 5 UID: 0 PID: 30701 Comm: mount Tainted: G W 6.18.0-rc6-btrfs-next-218+ #1 PREEMPT(full)
[ 3845.752414] Tainted: [W]=WARN
[ 3845.752828] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 3845.754499] RIP: 0010:btree_csum_one_bio+0x25a/0x270 [btrfs]
[ 3845.755460] Code: 31 f6 48 89 (...)
[ 3845.758685] RSP: 0018:ffffa8d9c5677678 EFLAGS: 00010246
[ 3845.759450] RAX: 0000000000000000 RBX: ffff92650e6d4738 RCX: 0000000000000000
[ 3845.760309] RDX: 0000000000000000 RSI: ffffffff9aab45b9 RDI: ffff9264c4748000
[ 3845.761239] RBP: ffff9264d4324000 R08: 0000000000000000 R09: ffffa8d9c5677468
[ 3845.762607] R10: ffff926bdc1fffa8 R11: 0000000000000003 R12: ffffa8d9c5677680
[ 3845.764099] R13: 0000000000004000 R14: ffff9264dd624000 R15: ffff9264d978aba8
[ 3845.765094] FS: 00007f751fa5a840(0000) GS:ffff926c42a82000(0000) knlGS:0000000000000000
[ 3845.766226] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3845.766970] CR2: 0000558df1815380 CR3: 000000010ed88003 CR4: 0000000000370ef0
[ 3845.768009] Call Trace:
[ 3845.768392] <TASK>
[ 3845.768714] btrfs_submit_bbio+0x6ee/0x7f0 [btrfs]
[ 3845.769640] ? write_one_eb+0x28e/0x340 [btrfs]
[ 3845.770588] btree_write_cache_pages+0x2f0/0x550 [btrfs]
[ 3845.771286] ? alloc_extent_state+0x19/0x100 [btrfs]
[ 3845.771967] ? merge_next_state+0x1a/0x90 [btrfs]
[ 3845.772586] ? set_extent_bit+0x233/0x8b0 [btrfs]
[ 3845.773198] ? xas_load+0x9/0xc0
[ 3845.773589] ? xas_find+0x14d/0x1a0
[ 3845.773969] do_writepages+0xc6/0x160
[ 3845.774367] filemap_fdatawrite_wbc+0x48/0x60
[ 3845.775003] __filemap_fdatawrite_range+0x5b/0x80
[ 3845.775902] btrfs_write_marked_extents+0x61/0x170 [btrfs]
[ 3845.776707] btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs]
[ 3845.777379] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 3845.777923] btrfs_commit_transaction+0x5ea/0xd20 [btrfs]
[ 3845.778551] ? _raw_spin_unlock+0x15/0x30
[ 3845.778986] ? release_extent_buffer+0x34/0x160 [btrfs]
[ 3845.779659] btrfs_recover_log_trees+0x7a3/0x7c0 [btrfs]
[ 3845.780416] ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
[ 3845.781499] open_ctree+0x10bb/0x15f0 [btrfs]
[ 3845.782194] btrfs_get_tree.cold+0xb/0x16c [btrfs]
[ 3845.782764] ? fscontext_read+0x15c/0x180
[ 3845.783202] ? rw_verify_area+0x50/0x180
[ 3845.783667] vfs_get_tree+0x25/0xd0
[ 3845.784047] vfs_cmd_create+0x59/0xe0
[ 3845.784458] __do_sys_fsconfig+0x4f6/0x6b0
[ 3845.784914] do_syscall_64+0x50/0x1220
[ 3845.785340] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 3845.785980] RIP: 0033:0x7f751fc7f4aa
[ 3845.786759] Code: 73 01 c3 48 (...)
[ 3845.789951] RSP: 002b:00007ffcdba45dc8 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
[ 3845.791402] RAX: ffffffffffffffda RBX: 000055ccc8291c20 RCX: 00007f751fc7f4aa
[ 3845.792688] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
[ 3845.794308] RBP: 000055ccc8292120 R08: 0000000000000000 R09: 0000000000000000
[ 3845.795829] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 3845.797183] R13: 00007f751fe11580 R14: 00007f751fe1326c R15: 00007f751fdf8a23
[ 3845.798633] </TASK>
[ 3845.799067] ---[ end trace 0000000000000000 ]---
[ 3845.800215] BTRFS: error (device dm-0) in btrfs_commit_transaction:2553: errno=-5 IO failure (Error while writing out transaction)
[ 3845.801860] BTRFS warning (device dm-0 state E): Skipping commit of aborted transaction.
[ 3845.802815] BTRFS error (device dm-0 state EA): Transaction aborted (error -5)
[ 3845.803728] BTRFS: error (device dm-0 state EA) in cleanup_transaction:2036: errno=-5 IO failure
[ 3845.805374] BTRFS: error (device dm-0 state EA) in btrfs_replay_log:2083: errno=-5 IO failure (Failed to recover log tree)
[ 3845.807919] BTRFS error (device dm-0 state EA): open_ctree failed: -5
Fix this by never logging a conflicting inode that is a directory and was
moved in the current transaction (its last_unlink_trans equals the current
transaction) and instead fallback to a transaction commit.
A test case for fstests will follow soon.
Reported-by: Vyacheslav Kovalevsky <slva.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/7bbc9419-5c56-450a-b5a0-efeae7457113@gmail.com/
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ebae102897e760e9e6bc625f701dd666b2163bd1 upstream.
Clang is not happy about set but (in some cases) unused variable:
fs/nfsd/export.c:1027:17: error: variable 'inode' set but not used [-Werror,-Wunused-but-set-variable]
since it's used as a parameter to dprintk() which might be configured
a no-op. To avoid uglifying code with the specific ifdeffery just mark
the variable __maybe_unused.
The commit [1], which introduced this behaviour, is quite old and hence
the Fixes tag points to the first of the Git era.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0431923fb7a1 [1]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 635bc4def026a24e071436f4f356ea08c0eed6ff upstream.
inotify/fanotify do not allow users with no read access to a file to
subscribe to events (e.g. IN_ACCESS/IN_MODIFY), but they do allow the
same user to subscribe for watching events on children when the user
has access to the parent directory (e.g. /dev).
Users with no read access to a file but with read access to its parent
directory can still stat the file and see if it was accessed/modified
via atime/mtime change.
The same is not true for special files (e.g. /dev/null). Users will not
generally observe atime/mtime changes when other users read/write to
special files, only when someone sets atime/mtime via utimensat().
Align fsnotify events with this stat behavior and do not generate
ACCESS/MODIFY events to parent watchers on read/write of special files.
The events are still generated to parent watchers on utimensat(). This
closes some side-channels that could be possibly used for information
exfiltration [1].
[1] https://snee.la/pdf/pubs/file-notification-attacks.pdf
Reported-by: Sudheendra Raghav Neela <sneela@tugraz.at>
CC: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc40459de82543b565ebc839dca8f7987f16f62e upstream.
xfs_buf_item_get_format() may allocate memory for bip->bli_formats,
free the memory in the error path.
Fixes: c3d5f0c2fb85 ("xfs: complain if anyone tries to create a too-large buffer log item")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 039bef30e320827bac8990c9f29d2a68cd8adb5f upstream.
syzbot reported a kernel BUG in ocfs2_find_victim_chain() because the
`cl_next_free_rec` field of the allocation chain list (next free slot in
the chain list) is 0, triggring the BUG_ON(!cl->cl_next_free_rec)
condition in ocfs2_find_victim_chain() and panicking the kernel.
To fix this, an if condition is introduced in ocfs2_claim_suballoc_bits(),
just before calling ocfs2_find_victim_chain(), the code block in it being
executed when either of the following conditions is true:
1. `cl_next_free_rec` is equal to 0, indicating that there are no free
chains in the allocation chain list
2. `cl_next_free_rec` is greater than `cl_count` (the total number of
chains in the allocation chain list)
Either of them being true is indicative of the fact that there are no
chains left for usage.
This is addressed using ocfs2_error(), which prints
the error log for debugging purposes, rather than panicking the kernel.
Link: https://lkml.kernel.org/r/20251201130711.143900-1-activprithvi@gmail.com
Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com>
Reported-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d38c6e1655c1420a72
Tested-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 01fba45deaddcce0d0b01c411435d1acf6feab7b upstream.
With below scripts, it will trigger panic in f2fs:
mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync
echo 111 >> /mnt/f2fs/foo
f2fs_io fsync /mnt/f2fs/foo
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
mount -o ro,norecovery /dev/vdd /mnt/f2fs
or
mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0
F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f
F2FS-fs (vdd): Stopped filesystem due to reason: 0
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1
Filesystem f2fs get_tree() didn't set fc->root, returned 1
------------[ cut here ]------------
kernel BUG at fs/super.c:1761!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:vfs_get_tree.cold+0x18/0x1a
Call Trace:
<TASK>
fc_mount+0x13/0xa0
path_mount+0x34e/0xc50
__x64_sys_mount+0x121/0x150
do_syscall_64+0x84/0x800
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa6cc126cfe
The root cause is we missed to handle error number returned from
f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or
ro,disable_roll_forward mount option, result in returning a positive
error number to vfs_get_tree(), fix it.
Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 27bf6a637b7613fc85fa6af468b7d612d78cd5c0 upstream.
The age extent cache uses last_blocks (derived from
allocated_data_blocks) to determine data age. However, there's a
conflict between the deletion
marker (last_blocks=0) and legitimate last_blocks=0 cases when
allocated_data_blocks overflows to 0 after reaching ULLONG_MAX.
In this case, valid extents are incorrectly skipped due to the
"if (!tei->last_blocks)" check in __update_extent_tree_range().
This patch fixes the issue by:
1. Reserving ULLONG_MAX as an invalid/deletion marker
2. Limiting allocated_data_blocks to range [0, ULLONG_MAX-1]
3. Using F2FS_EXTENT_AGE_INVALID for deletion scenarios
4. Adjusting overflow age calculation from ULLONG_MAX to (ULLONG_MAX-1)
Reproducer (using a patched kernel with allocated_data_blocks
initialized to ULLONG_MAX - 3 for quick testing):
Step 1: Mount and check initial state
# dd if=/dev/zero of=/tmp/test.img bs=1M count=100
# mkfs.f2fs -f /tmp/test.img
# mkdir -p /mnt/f2fs_test
# mount -t f2fs -o loop,age_extent_cache /tmp/test.img /mnt/f2fs_test
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551612 # ULLONG_MAX - 3
Inner Struct Count: tree: 1(0), node: 0
Step 2: Create files and write data to trigger overflow
# touch /mnt/f2fs_test/{1,2,3,4}.txt; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551613 # ULLONG_MAX - 2
Inner Struct Count: tree: 5(0), node: 1
# dd if=/dev/urandom of=/mnt/f2fs_test/1.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551614 # ULLONG_MAX - 1
Inner Struct Count: tree: 5(0), node: 2
# dd if=/dev/urandom of=/mnt/f2fs_test/2.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551615 # ULLONG_MAX
Inner Struct Count: tree: 5(0), node: 3
# dd if=/dev/urandom of=/mnt/f2fs_test/3.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 0 # Counter overflowed!
Inner Struct Count: tree: 5(0), node: 4
Step 3: Trigger the bug - next write should create node but gets skipped
# dd if=/dev/urandom of=/mnt/f2fs_test/4.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 1
Inner Struct Count: tree: 5(0), node: 4
Expected: node: 5 (new extent node for 4.txt)
Actual: node: 4 (extent insertion was incorrectly skipped due to
last_blocks = allocated_data_blocks = 0 in __get_new_block_age)
After this fix, the extent node is correctly inserted and node count
becomes 5 as expected.
Fixes: 71644dff4811 ("f2fs: add block_age-based extent cache")
Cc: stable@kernel.org
Signed-off-by: Xiaole He <hexiaole1994@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d33f89b34aa313f50f9a512d58dd288999f246b0 upstream.
F2FS can mount filesystems with corrupted directory depth values that
get runtime-clamped to MAX_DIR_HASH_DEPTH. When RENAME_WHITEOUT
operations are performed on such directories, f2fs_rename performs
directory modifications (updating target entry and deleting source
entry) before attempting to add the whiteout entry via f2fs_add_link.
If f2fs_add_link fails due to the corrupted directory structure, the
function returns an error to VFS, but the partial directory
modifications have already been committed to disk. VFS assumes the
entire rename operation failed and does not update the dentry cache,
leaving stale mappings.
In the error path, VFS does not call d_move() to update the dentry
cache. This results in new_dentry still pointing to the old inode
(new_inode) which has already had its i_nlink decremented to zero.
The stale cache causes subsequent operations to incorrectly reference
the freed inode.
This causes subsequent operations to use cached dentry information that
no longer matches the on-disk state. When a second rename targets the
same entry, VFS attempts to decrement i_nlink on the stale inode, which
may already have i_nlink=0, triggering a WARNING in drop_nlink().
Example sequence:
1. First rename (RENAME_WHITEOUT): file2 → file1
- f2fs updates file1 entry on disk (points to inode 8)
- f2fs deletes file2 entry on disk
- f2fs_add_link(whiteout) fails (corrupted directory)
- Returns error to VFS
- VFS does not call d_move() due to error
- VFS cache still has: file1 → inode 7 (stale!)
- inode 7 has i_nlink=0 (already decremented)
2. Second rename: file3 → file1
- VFS uses stale cache: file1 → inode 7
- Tries to drop_nlink on inode 7 (i_nlink already 0)
- WARNING in drop_nlink()
Fix this by explicitly invalidating old_dentry and new_dentry when
f2fs_add_link fails during whiteout creation. This forces VFS to
refresh from disk on subsequent operations, ensuring cache consistency
even when the rename partially succeeds.
Reproducer:
1. Mount F2FS image with corrupted i_current_depth
2. renameat2(file2, file1, RENAME_WHITEOUT)
3. renameat2(file3, file1, 0)
4. System triggers WARNING in drop_nlink()
Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT")
Reported-by: syzbot+632cf32276a9a564188d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=632cf32276a9a564188d
Suggested-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/all/20251022233349.102728-1-kartikey406@gmail.com/ [v1]
Cc: stable@vger.kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7c37c79510329cd951a4dedf3f7bf7e2b18dccec upstream.
As syzbot reported:
F2FS-fs (loop0): __update_extent_tree_range: extent len is zero, type: 0, extent [0, 0, 0], age [0, 0]
------------[ cut here ]------------
kernel BUG at fs/f2fs/extent_cache.c:678!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__update_extent_tree_range+0x13bc/0x1500 fs/f2fs/extent_cache.c:678
Call Trace:
<TASK>
f2fs_update_read_extent_cache_range+0x192/0x3e0 fs/f2fs/extent_cache.c:1085
f2fs_do_zero_range fs/f2fs/file.c:1657 [inline]
f2fs_zero_range+0x10c1/0x1580 fs/f2fs/file.c:1737
f2fs_fallocate+0x583/0x990 fs/f2fs/file.c:2030
vfs_fallocate+0x669/0x7e0 fs/open.c:342
ioctl_preallocate fs/ioctl.c:289 [inline]
file_ioctl+0x611/0x780 fs/ioctl.c:-1
do_vfs_ioctl+0xb33/0x1430 fs/ioctl.c:576
__do_sys_ioctl fs/ioctl.c:595 [inline]
__se_sys_ioctl+0x82/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f07bc58eec9
In error path of f2fs_zero_range(), it may add a zero-sized extent
into extent cache, it should be avoided.
Fixes: 6e9619499f53 ("f2fs: support in batch fzero in dnode page")
Cc: stable@kernel.org
Reported-by: syzbot+24124df3170c3638b35f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68e5d698.050a0220.256323.0032.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 297baa4aa263ff8f5b3d246ee16a660d76aa82c4 upstream.
Xfstests generic/335, generic/336 sometimes crash with the following message:
F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1
------------[ cut here ]------------
kernel BUG at fs/f2fs/super.c:1939!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G W 6.17.0-rc5-xfstests-g9dd1835ecda5 #1 PREEMPT(none)
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_put_super+0x3b3/0x3c0
Call Trace:
<TASK>
generic_shutdown_super+0x7e/0x190
kill_block_super+0x1a/0x40
kill_f2fs_super+0x9d/0x190
deactivate_locked_super+0x30/0xb0
cleanup_mnt+0xba/0x150
task_work_run+0x5c/0xa0
exit_to_user_mode_loop+0xb7/0xc0
do_syscall_64+0x1ae/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
---[ end trace 0000000000000000 ]---
It appears that sometimes it is possible that f2fs_put_super() is called before
all node page reads are completed.
Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem.
Cc: stable@kernel.org
Fixes: 20872584b8c0b ("f2fs: fix to drop all dirty meta/node pages during umount()")
Signed-off-by: Jan Prusakowski <jprusakowski@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6f52063db9aabdaabea929b1e998af98c2e8d917 upstream.
The reservation type argument for the pr_preempt call should match the
one used in nfsd4_block_get_device_info_scsi.
Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Cc: stable@vger.kernel.org
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 40a71b53d5a6d4ea17e4d54b99b2ac03a7f5e783 upstream.
jbd2 journal handling code doesn't want jbd2_might_wait_for_commit()
to be placed between start_this_handle() and stop_this_handle(). So it
marks the region with rwsem_acquire_read() and rwsem_release().
However, the annotation is too strong for that purpose. We don't have
to use more than try lock annotation for that.
rwsem_acquire_read() implies:
1. might be a waiter on contention of the lock.
2. enter to the critical section of the lock.
All we need in here is to act 2, not 1. So trylock version of
annotation is sufficient for that purpose. Now that dept partially
relies on lockdep annotaions, dept interpets rwsem_acquire_read() as a
potential wait and might report a deadlock by the wait.
Replace it with trylock version of annotation.
Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Message-ID: <20251024073940.1063-1-byungchul@sk.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 524c3853831cf4f7e1db579e487c757c3065165c upstream.
syzbot is reporting possibility of deadlock due to sharing lock_class_key
for jbd2_handle across ext4 and ocfs2. But this is a false positive, for
one disk partition can't have two filesystems at the same time.
Reported-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6e493c165d26d6fcbf72
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <987110fc-5470-457a-a218-d286a09dd82f@I-love.SAKURA.ne.jp>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|