Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull smb client fix from Steve French:
"SMB3 client multichannel fix"
* tag 'v6.14-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: pick channels for individual subrequests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix stale page cache after race between readahead and direct IO write
- fix hole expansion when writing at an offset beyond EOF, the range
will not be zeroed
- use proper way to calculate offsets in folio ranges
* tag 'for-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix hole expansion when writing at an offset beyond EOF
btrfs: fix stale page cache after race between readahead and direct IO write
btrfs: fix two misuses of folio_shift()
|
|
Pull bcachefs fixes from Kent Overstreet:
"Just small stuff.
As a general announcement, on disk format is now frozen in my master
branch - future on disk format changes will be optional, not required.
- More fixes for going read-only: the previous fix was insufficient,
but with more work on ordering journal reclaim flushing (and a
btree node accounting fix so we don't split until we have to) the
tiering_replication test now consistently goes read-only in less
than a second.
- fix for fsck when we have reflink pointers to missing indirect
extents
- some transaction restart handling fixes from Alan; the "Pass
_orig_restart_count to trans_was_restarted" likely fixes some rare
undefined behaviour heisenbugs"
* tag 'bcachefs-2025-02-12' of git://evilpiepirate.org/bcachefs:
bcachefs: Reuse transaction
bcachefs: Pass _orig_restart_count to trans_was_restarted
bcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTS
bcachefs: Fix want_new_bset() so we write until the end of the btree node
bcachefs: Split out journal pins by btree level
bcachefs: Fix use after free
bcachefs: Fix marking reflink pointers to missing indirect extents
|
|
bch2_nocow_write_convert_unwritten is already in transaction context:
00191 ========= TEST generic/648
00242 kernel BUG at fs/bcachefs/btree_iter.c:3332!
00242 Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
00242 Modules linked in:
00242 CPU: 4 UID: 0 PID: 2593 Comm: fsstress Not tainted 6.13.0-rc3-ktest-g345af8f855b7 #14403
00242 Hardware name: linux,dummy-virt (DT)
00242 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00242 pc : __bch2_trans_get+0x120/0x410
00242 lr : __bch2_trans_get+0xcc/0x410
00242 sp : ffffff80d89af600
00242 x29: ffffff80d89af600 x28: ffffff80ddb23000 x27: 00000000fffff705
00242 x26: ffffff80ddb23028 x25: ffffff80d8903fe0 x24: ffffff80ebb30168
00242 x23: ffffff80c8aeb500 x22: 000000000000005d x21: ffffff80d8904078
00242 x20: ffffff80d8900000 x19: ffffff80da9e8000 x18: 0000000000000000
00242 x17: 64747568735f6c61 x16: 6e72756f6a20726f x15: 0000000000000028
00242 x14: 0000000000000004 x13: 000000000000f787 x12: ffffffc081bbcdc8
00242 x11: 0000000000000000 x10: 0000000000000003 x9 : ffffffc08094efbc
00242 x8 : 000000001092c111 x7 : 000000000000000c x6 : ffffffc083c31fc4
00242 x5 : ffffffc083c31f28 x4 : ffffff80c8aeb500 x3 : ffffff80ebb30000
00242 x2 : 0000000000000001 x1 : 0000000000000a21 x0 : 000000000000028e
00242 Call trace:
00242 __bch2_trans_get+0x120/0x410 (P)
00242 bch2_inum_offset_err_msg+0x48/0xb0
00242 bch2_nocow_write_convert_unwritten+0x3d0/0x530
00242 bch2_nocow_write+0xeb0/0x1000
00242 __bch2_write+0x330/0x4e8
00242 bch2_write+0x1f0/0x530
00242 bch2_direct_write+0x530/0xc00
00242 bch2_write_iter+0x160/0xbe0
00242 vfs_write+0x1cc/0x360
00242 ksys_write+0x5c/0xf0
00242 __arm64_sys_write+0x20/0x30
00242 invoke_syscall.constprop.0+0x54/0xe8
00242 do_el0_svc+0x44/0xc0
00242 el0_svc+0x34/0xa0
00242 el0t_64_sync_handler+0x104/0x130
00242 el0t_64_sync+0x154/0x158
00242 Code: 6b01001f 54ffff01 79408460 3617fec0 (d4210000)
00242 ---[ end trace 0000000000000000 ]---
00242 Kernel panic - not syncing: Oops - BUG: Fatal exception
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
_orig_restart_count is unused now, according to the logic, trans_was_restarted
should be using _orig_restart_count.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Incorrectly handled transaction restarts can be a source of heisenbugs;
add a mode where we randomly inject them to shake them out.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The netfs library could break down a read request into
multiple subrequests. When multichannel is used, there is
potential to improve performance when each of these
subrequests pick a different channel.
Today we call cifs_pick_channel when the main read request
is initialized in cifs_init_request. This change moves this to
cifs_prepare_read, which is the right place to pick channel since
it gets called for each subrequest.
Interestingly cifs_prepare_write already does channel selection
for individual subreq, but looks like it was missed for read.
This is especially important when multichannel is used with
increased rasize.
In my test setup, with rasize set to 8MB, a sequential read
of large file was taking 11.5s without this change. With the
change, it completed in 9s. The difference is even more signigicant
with bigger rasize.
Cc: <stable@vger.kernel.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
At btrfs_write_check() if our file's i_size is not sector size aligned and
we have a write that starts at an offset larger than the i_size that falls
within the same page of the i_size, then we end up not zeroing the file
range [i_size, write_offset).
The code is this:
start_pos = round_down(pos, fs_info->sectorsize);
oldsize = i_size_read(inode);
if (start_pos > oldsize) {
/* Expand hole size to cover write data, preventing empty gap */
loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
if (ret)
return ret;
}
So if our file's i_size is 90269 bytes and a write at offset 90365 bytes
comes in, we get 'start_pos' set to 90112 bytes, which is less than the
i_size and therefore we don't zero out the range [90269, 90365) by
calling btrfs_cont_expand().
This is an old bug introduced in commit 9036c10208e1 ("Btrfs: update hole
handling v2"), from 2008, and the buggy code got moved around over the
years.
Fix this by discarding 'start_pos' and comparing against the write offset
('pos') without any alignment.
This bug was recently exposed by test case generic/363 which tests this
scenario by polluting ranges beyond EOF with an mmap write and than verify
that after a file increases we get zeroes for the range which is supposed
to be a hole and not what we wrote with the previous mmaped write.
We're only seeing this exposed now because generic/363 used to run only
on xfs until last Sunday's fstests update.
The test was failing like this:
$ ./check generic/363
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb 3 12:28:46 WET 2025
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
generic/363 0s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad)
--- tests/generic/363.out 2025-02-05 15:31:14.013646509 +0000
+++ /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad 2025-02-05 17:25:33.112630781 +0000
@@ -1 +1,46 @@
QA output created by 363
+READ BAD DATA: offset = 0xdcad, size = 0xd921, fname = /home/fdmanana/btrfs-tests/dev/junk
+OFFSET GOOD BAD RANGE
+0x1609d 0x0000 0x3104 0x0
+operation# (mod 256) for the bad data may be 4
+0x1609e 0x0000 0x0472 0x1
+operation# (mod 256) for the bad data may be 4
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/363.out /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad' to see the entire diff)
Ran: generic/363
Failures: generic/363
Failed 1 of 1 tests
Fixes: 9036c10208e1 ("Btrfs: update hole handling v2")
CC: stable@vger.kernel.org
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
After commit ac325fc2aad5 ("btrfs: do not hold the extent lock for entire
read") we can now trigger a race between a task doing a direct IO write
and readahead. When this race is triggered it results in tasks getting
stale data when they attempt do a buffered read (including the task that
did the direct IO write).
This race can be sporadically triggered with test case generic/418, failing
like this:
$ ./check generic/418
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb 3 12:28:46 WET 2025
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
generic/418 14s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad)
--- tests/generic/418.out 2020-06-10 19:29:03.850519863 +0100
+++ /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad 2025-02-03 15:42:36.974609476 +0000
@@ -1,2 +1,5 @@
QA output created by 418
+cmpbuf: offset 0: Expected: 0x1, got 0x0
+[6:0] FAIL - comparison failed, offset 24576
+diotest -wp -b 4096 -n 8 -i 4 failed at loop 3
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/418.out /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad' to see the entire diff)
Ran: generic/418
Failures: generic/418
Failed 1 of 1 tests
The race happens like this:
1) A file has a prealloc extent for the range [16K, 28K);
2) Task A starts a direct IO write against file range [24K, 28K).
At the start of the direct IO write it invalidates the page cache at
__iomap_dio_rw() with kiocb_invalidate_pages() for the 4K page at file
offset 24K;
3) Task A enters btrfs_dio_iomap_begin() and locks the extent range
[24K, 28K);
4) Task B starts a readahead for file range [16K, 28K), entering
btrfs_readahead().
First it attempts to read the page at offset 16K by entering
btrfs_do_readpage(), where it calls get_extent_map(), locks the range
[16K, 20K) and gets the extent map for the range [16K, 28K), caching
it into the 'em_cached' variable declared in the local stack of
btrfs_readahead(), and then unlocks the range [16K, 20K).
Since the extent map has the prealloc flag, at btrfs_do_readpage() we
zero out the page's content and don't submit any bio to read the page
from the extent.
Then it attempts to read the page at offset 20K entering
btrfs_do_readpage() where we reuse the previously cached extent map
(decided by get_extent_map()) since it spans the page's range and
it's still in the inode's extent map tree.
Just like for the previous page, we zero out the page's content since
the extent map has the prealloc flag set.
Then it attempts to read the page at offset 24K entering
btrfs_do_readpage() where we reuse the previously cached extent map
(decided by get_extent_map()) since it spans the page's range and
it's still in the inode's extent map tree.
Just like for the previous pages, we zero out the page's content since
the extent map has the prealloc flag set. Note that we didn't lock the
extent range [24K, 28K), so we didn't synchronize with the ongoing
direct IO write being performed by task A;
5) Task A enters btrfs_create_dio_extent() and creates an ordered extent
for the range [24K, 28K), with the flags BTRFS_ORDERED_DIRECT and
BTRFS_ORDERED_PREALLOC set;
6) Task A unlocks the range [24K, 28K) at btrfs_dio_iomap_begin();
7) The ordered extent enters btrfs_finish_one_ordered() and locks the
range [24K, 28K);
8) Task A enters fs/iomap/direct-io.c:iomap_dio_complete() and it tries
to invalidate the page at offset 24K by calling
kiocb_invalidate_post_direct_write(), resulting in a call chain that
ends up at btrfs_release_folio().
The btrfs_release_folio() call ends up returning false because the range
for the page at file offset 24K is currently locked by the task doing
the ordered extent completion in the previous step (7), so we have:
btrfs_release_folio() ->
__btrfs_release_folio() ->
try_release_extent_mapping() ->
try_release_extent_state()
This last function checking that the range is locked and returning false
and propagating it up to btrfs_release_folio().
So this results in a failure to invalidate the page and
kiocb_invalidate_post_direct_write() triggers this message logged in
dmesg:
Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!
After this we leave the page cache with stale data for the file range
[24K, 28K), filled with zeroes instead of the data written by direct IO
write (all bytes with a 0x01 value), so any task attempting to read with
buffered IO, including the task that did the direct IO write, will get
all bytes in the range with a 0x00 value instead of the written data.
Fix this by locking the range, with btrfs_lock_and_flush_ordered_range(),
at the two callers of btrfs_do_readpage() instead of doing it at
get_extent_map(), just like we did before commit ac325fc2aad5 ("btrfs: do
not hold the extent lock for entire read"), and unlocking the range after
all the calls to btrfs_do_readpage(). This way we never reuse a cached
extent map without flushing any pending ordered extents from a concurrent
direct IO write.
Fixes: ac325fc2aad5 ("btrfs: do not hold the extent lock for entire read")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
want_new_bset() returns the address of a new bset to initialize if we
wish to do so in a btree node - either because the previous one is too
big, or because it's been written.
The case for 'previous bset was written' was wrong: it's only supposed
to check for if we have space in the node for one more block, but
because it subtracted the header from the space available it would never
initialize a new bset if we were down to the last block in a node.
Fixing this results in fewer btree node splits/compactions, which fixes
a bug with flushing the journal to go read-only sometimes not
terminating or taking excessively long.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This lets us flush the journal to go read-only more effectively.
Flushing the journal and going read-only requires halting mutually
recursive processes, which strictly speaking are not guaranteed to
terminate.
Flushing btree node journal pins will kick off a btree node write, and
btree node writes on completion must do another btree update to the
parent node to update the 'sectors_written' field for that node's key.
If the parent node is full and requires a split or compaction, that's
going to generate a whole bunch of additional btree updates - alloc
info, LRU btree, and more - which then have to be flushed, and the cycle
repeats.
This process will terminate much more effectively if we tweak journal
reclaim to flush btree updates leaf to root: i.e., don't flush updates
for a given btree node (kicking off a write, and consuming space within
that node up to the next block boundary) if there might still be
unflushed updates in child nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
acc->k.data should be used with the lock hold:
00221 ========= TEST generic/187
00221 run fstests generic/187 at 2025-02-09 21:08:10
00221 spectre-v4 mitigation disabled by command-line option
00222 bcachefs (vdc): starting version 1.20: directory_size opts=errors=ro
00222 bcachefs (vdc): initializing new filesystem
00222 bcachefs (vdc): going read-write
00222 bcachefs (vdc): marking superblocks
00222 bcachefs (vdc): initializing freespace
00222 bcachefs (vdc): done initializing freespace
00222 bcachefs (vdc): reading snapshots table
00222 bcachefs (vdc): reading snapshots done
00222 bcachefs (vdc): done starting filesystem
00222 bcachefs (vdc): shutting down
00222 bcachefs (vdc): going read-only
00222 bcachefs (vdc): finished waiting for writes to stop
00223 bcachefs (vdc): flushing journal and stopping allocators, journal seq 6
00223 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 8
00223 bcachefs (vdc): clean shutdown complete, journal seq 9
00223 bcachefs (vdc): marking filesystem clean
00223 bcachefs (vdc): shutdown complete
00223 bcachefs (vdc): starting version 1.20: directory_size opts=errors=ro
00223 bcachefs (vdc): initializing new filesystem
00223 bcachefs (vdc): going read-write
00223 bcachefs (vdc): marking superblocks
00223 bcachefs (vdc): initializing freespace
00223 bcachefs (vdc): done initializing freespace
00223 bcachefs (vdc): reading snapshots table
00223 bcachefs (vdc): reading snapshots done
00223 bcachefs (vdc): done starting filesystem
00244 hrtimer: interrupt took 123350440 ns
00264 bcachefs (vdc): shutting down
00264 bcachefs (vdc): going read-only
00264 bcachefs (vdc): finished waiting for writes to stop
00264 bcachefs (vdc): flushing journal and stopping allocators, journal seq 97
00265 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 101
00265 bcachefs (vdc): clean shutdown complete, journal seq 102
00265 bcachefs (vdc): marking filesystem clean
00265 bcachefs (vdc): shutdown complete
00265 bcachefs (vdc): starting version 1.20: directory_size opts=errors=ro
00265 bcachefs (vdc): recovering from clean shutdown, journal seq 102
00265 bcachefs (vdc): accounting_read...
00265 ==================================================================
00265 done
00265 BUG: KASAN: slab-use-after-free in bch2_fs_to_text+0x12b4/0x1728
00265 bcachefs (vdc): alloc_read... done
00265 bcachefs (vdc): stripes_read... done
00265 Read of size 4 at addr ffffff80c57eac00 by task cat/7531
00265 bcachefs (vdc): snapshots_read... done
00265
00265 CPU: 6 UID: 0 PID: 7531 Comm: cat Not tainted 6.13.0-rc3-ktest-g16fc6fa3819d #14103
00265 Hardware name: linux,dummy-virt (DT)
00265 Call trace:
00265 show_stack+0x1c/0x30 (C)
00265 dump_stack_lvl+0x6c/0x80
00265 print_report+0xf8/0x5d8
00265 kasan_report+0x90/0xd0
00265 __asan_report_load4_noabort+0x1c/0x28
00265 bch2_fs_to_text+0x12b4/0x1728
00265 bch2_fs_show+0x94/0x188
00265 sysfs_kf_seq_show+0x1a4/0x348
00265 kernfs_seq_show+0x12c/0x198
00265 seq_read_iter+0x27c/0xfd0
00265 kernfs_fop_read_iter+0x390/0x4f8
00265 vfs_read+0x480/0x7f0
00265 ksys_read+0xe0/0x1e8
00265 __arm64_sys_read+0x70/0xa8
00265 invoke_syscall.constprop.0+0x74/0x1e8
00265 do_el0_svc+0xc8/0x1c8
00265 el0_svc+0x20/0x60
00265 el0t_64_sync_handler+0x104/0x130
00265 el0t_64_sync+0x154/0x158
00265
00265 Allocated by task 7510:
00265 kasan_save_stack+0x28/0x50
00265 kasan_save_track+0x1c/0x38
00265 kasan_save_alloc_info+0x3c/0x50
00265 __kasan_kmalloc+0xac/0xb0
00265 __kmalloc_node_noprof+0x168/0x348
00265 __kvmalloc_node_noprof+0x20/0x140
00265 __bch2_darray_resize_noprof+0x90/0x1b0
00265 __bch2_accounting_mem_insert+0x76c/0xb08
00265 bch2_accounting_mem_insert+0x224/0x3b8
00265 bch2_accounting_mem_mod_locked+0x480/0xc58
00265 bch2_accounting_read+0xa94/0x3eb8
00265 bch2_run_recovery_pass+0x80/0x178
00265 bch2_run_recovery_passes+0x340/0x698
00265 bch2_fs_recovery+0x1c98/0x2bd8
00265 bch2_fs_start+0x240/0x490
00265 bch2_fs_get_tree+0xe1c/0x1458
00265 vfs_get_tree+0x7c/0x250
00265 path_mount+0xe24/0x1648
00265 __arm64_sys_mount+0x240/0x438
00265 invoke_syscall.constprop.0+0x74/0x1e8
00265 do_el0_svc+0xc8/0x1c8
00265 el0_svc+0x20/0x60
00265 el0t_64_sync_handler+0x104/0x130
00265 el0t_64_sync+0x154/0x158
00265
00265 Freed by task 7510:
00265 kasan_save_stack+0x28/0x50
00265 kasan_save_track+0x1c/0x38
00265 kasan_save_free_info+0x48/0x88
00265 __kasan_slab_free+0x48/0x60
00265 kfree+0x188/0x408
00265 kvfree+0x3c/0x50
00265 __bch2_darray_resize_noprof+0xe0/0x1b0
00265 __bch2_accounting_mem_insert+0x76c/0xb08
00265 bch2_accounting_mem_insert+0x224/0x3b8
00265 bch2_accounting_mem_mod_locked+0x480/0xc58
00265 bch2_accounting_read+0xa94/0x3eb8
00265 bch2_run_recovery_pass+0x80/0x178
00265 bch2_run_recovery_passes+0x340/0x698
00265 bch2_fs_recovery+0x1c98/0x2bd8
00265 bch2_fs_start+0x240/0x490
00265 bch2_fs_get_tree+0xe1c/0x1458
00265 vfs_get_tree+0x7c/0x250
00265 path_mount+0xe24/0x1648
00265 bcachefs (vdc): going read-write
00265 __arm64_sys_mount+0x240/0x438
00265 invoke_syscall.constprop.0+0x74/0x1e8
00265 do_el0_svc+0xc8/0x1c8
00265 el0_svc+0x20/0x60
00265 el0t_64_sync_handler+0x104/0x130
00265 el0t_64_sync+0x154/0x158
00265
00265 The buggy address belongs to the object at ffffff80c57eac00
00265 which belongs to the cache kmalloc-128 of size 128
00265 The buggy address is located 0 bytes inside of
00265 freed 128-byte region [ffffff80c57eac00, ffffff80c57eac80)
00265
00265 The buggy address belongs to the physical page:
00265 page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1057ea
00265 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
00265 flags: 0x8000000000000040(head|zone=2)
00265 page_type: f5(slab)
00265 raw: 8000000000000040 ffffff80c0002800 dead000000000100 dead000000000122
00265 raw: 0000000000000000 0000000000200020 00000001f5000000 ffffff80c57a6400
00265 head: 8000000000000040 ffffff80c0002800 dead000000000100 dead000000000122
00265 head: 0000000000000000 0000000000200020 00000001f5000000 ffffff80c57a6400
00265 head: 8000000000000001 fffffffec315fa81 ffffffffffffffff 0000000000000000
00265 head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000
00265 page dumped because: kasan: bad access detected
00265
00265 Memory state around the buggy address:
00265 ffffff80c57eab00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00265 ffffff80c57eab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
00265 >ffffff80c57eac00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
00265 ^
00265 ffffff80c57eac80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
00265 ffffff80c57ead00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
00265 ==================================================================
00265 Kernel panic - not syncing: kasan.fault=panic set ...
00265 CPU: 6 UID: 0 PID: 7531 Comm: cat Not tainted 6.13.0-rc3-ktest-g16fc6fa3819d #14103
00265 Hardware name: linux,dummy-virt (DT)
00265 Call trace:
00265 show_stack+0x1c/0x30 (C)
00265 dump_stack_lvl+0x30/0x80
00265 dump_stack+0x18/0x20
00265 panic+0x4d4/0x518
00265 start_report.constprop.0+0x0/0x90
00265 kasan_report+0xa0/0xd0
00265 __asan_report_load4_noabort+0x1c/0x28
00265 bch2_fs_to_text+0x12b4/0x1728
00265 bch2_fs_show+0x94/0x188
00265 sysfs_kf_seq_show+0x1a4/0x348
00265 kernfs_seq_show+0x12c/0x198
00265 seq_read_iter+0x27c/0xfd0
00265 kernfs_fop_read_iter+0x390/0x4f8
00265 vfs_read+0x480/0x7f0
00265 ksys_read+0xe0/0x1e8
00265 __arm64_sys_read+0x70/0xa8
00265 invoke_syscall.constprop.0+0x74/0x1e8
00265 do_el0_svc+0xc8/0x1c8
00265 el0_svc+0x20/0x60
00265 el0t_64_sync_handler+0x104/0x130
00265 el0t_64_sync+0x154/0x158
00265 SMP: stopping secondary CPUs
00265 Kernel Offset: disabled
00265 CPU features: 0x000,00000070,00000010,8240500b
00265 Memory Limit: none
00265 ---[ end Kernel panic - not syncing: kasan.fault=panic set ... ]---
00270 ========= FAILED TIMEOUT generic.187 in 1200s
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Fixes for new bugs:
- A fix for CB_GETATTR reply decoding was not quite correct
- Fix the NFSD connection limiting logic
- Fix a bug in the new session table resizing logic
Bugs that pre-date v6.14:
- Support for courteous clients (5.19) introduced a shutdown hang
- Fix a crash in the filecache laundrette (6.9)
- Fix a zero-day crash in NFSD's NFSv3 ACL implementation"
* tag 'nfsd-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix CB_GETATTR status fix
NFSD: fix hang in nfsd4_shutdown_callback
nfsd: fix __fh_verify for localio
nfsd: fix uninitialised slot info when a request is retried
nfsd: validate the nfsd_serv pointer before calling svc_wake_up
nfsd: clear acl_access/acl_default after releasing them
|
|
Jeff says:
Now that I look, 1b3e26a5ccbf is wrong. The patch on the ml was correct, but
the one that got committed is different. It should be:
status = decode_cb_op_status(xdr, OP_CB_GETATTR, &cb->cb_status);
if (unlikely(status || cb->cb_status))
If "status" is non-zero, decoding failed (usu. BADXDR), but we also want to
bail out and not decode the rest of the call if the decoded cb_status is
non-zero. That's not happening here, cb_seq_status has already been checked and
is non-zero, so this ends up trying to decode the rest of the CB_GETATTR reply
when it doesn't exist.
Reported-by: Jeff Layton <jlayton@kernel.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219737
Fixes: 1b3e26a5ccbf ("NFSD: fix decoding in nfs4_xdr_dec_cb_getattr")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If nfs4_client is in courtesy state then there is no point to send
the callback. This causes nfsd4_shutdown_callback to hang since
cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
notifies NFSD that the connection was dropped.
This patch modifies nfsd4_run_cb_work to skip the RPC call if
nfs4_client is in courtesy state.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Fixes: 66af25799940 ("NFSD: add courteous server support for thread with only delegation")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
__fh_verify() added a call to svc_xprt_set_valid() to help do connection
management but during LOCALIO path rqstp argument is NULL, leading to
NULL pointer dereferencing and a crash.
Fixes: eccbbc7c00a5 ("nfsd: don't use sv_nrthreads in connection limiting calculations.")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
A recent patch moved the assignment of seq->maxslots from before the
test for a resent request (which ends with a goto) to after, resulting
in it not being run in that case. This results in the server returning
bogus "high slot id" and "target high slot id" values.
The assignments to ->maxslots and ->target_maxslots need to be *after*
the out: label so that the correct values are returned in replies to
requests that are served from cache.
Fixes: 60aa6564317d ("nfsd: allocate new session-based DRC slots on demand.")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull smb client fixes from Steve French:
- Three DFS fixes: DFS mount fix, fix for noisy log msg and one to
remove some unused code
- SMB3 Lease fix
* tag 'v6.14rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: change lease epoch type from unsigned int to __u16
smb: client: get rid of kstrdup() in get_ses_refpath()
smb: client: fix noisy when tree connecting to DFS interlink targets
smb: client: don't trust DFSREF_STORAGE_SERVER bit
|
|
It is meaningless to shift a byte count by folio_shift(). The folio index
is in units of PAGE_SIZE, not folio_size(). We can use folio_contains()
to make this work for arbitrary-order folios, so remove the assertion
that the folios are of order 0.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
reflink pointers to missing indirect extents aren't deleted, they just
have an error bit set - in case the indirect extent somehow reappears.
fsck/mark and sweep thus needs to ignore these errors.
Also, they can be marked AUTOFIX now.
Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix fsnotify FMODE_NONOTIFY* handling.
This also disables fsnotify on all pseudo files by default apart from
very select exceptions. This carries a regression risk so we need to
watch out and adapt accordingly. However, it is overall a significant
improvement over the current status quo where every rando file can
get fsnotify enabled.
- Cleanup and simplify lockref_init() after recent lockref changes.
- Fix vboxfs build with gcc-15.
- Add an assert into inode_set_cached_link() to catch corrupt links.
- Allow users to also use an empty string check to detect whether a
given mount option string was empty or not.
- Fix how security options were appended to statmount()'s ->mnt_opt
field.
- Fix statmount() selftests to always check the returned mask.
- Fix uninitialized value in vfs_statx_path().
- Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading
and preserve extensibility.
* tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: sanity check the length passed to inode_set_cached_link()
pidfs: improve ioctl handling
fsnotify: disable pre-content and permission events by default
selftests: always check mask returned by statmount(2)
fsnotify: disable notification by default for all pseudo files
fs: fix adding security options to statmount.mnt_opt
fsnotify: use accessor to set FMODE_NONOTIFY_*
lockref: remove count argument of lockref_init
gfs2: switch to lockref_init(..., 1)
gfs2: use lockref_init for gl_lockref
statmount: let unset strings be empty
vboxsf: fix building with GCC 15
fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
|
|
Pull bcachefs fixes from Kent Overstreet:
"Nothing major, things continue to be fairly quiet over here.
- add a SubmittingPatches to clarify that patches submitted for
bcachefs do, in fact, need to be tested
- discard path now correctly issues journal flushes when needed, this
fixes performance issues when the filesystem is nearly full and
we're bottlenecked on copygc
- fix a bug that could cause the pending rebalance work accounting to
be off when devices are being onlined/offlined; users should report
if they are still seeing this
- and a few more trivial ones"
* tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefs:
bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalance
bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()
bcachefs: Fix discard path journal flushing
bcachefs: fix deadlock in journal_entry_open()
bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()
bcachefs docs: SubmittingPatches.rst
|
|
Pidfs supports extensible and non-extensible ioctls. The extensible
ioctls need to check for the ioctl number itself not just the ioctl
command otherwise both backward- and forward compatibility are broken.
The pidfs ioctl handler also needs to look at the type of the ioctl
command to guard against cases where "[...] a daemon receives some
random file descriptor from a (potentially less privileged) client and
expects the FD to be of some specific type, it might call ioctl() on
this FD with some type-specific command and expect the call to fail if
the FD is of the wrong type; but due to the missing type check, the
kernel instead performs some action that userspace didn't expect."
(cf. [1]]
Link: https://lore.kernel.org/r/20250204-work-pidfs-ioctl-v1-1-04987d239575@kernel.org
Link: https://lore.kernel.org/r/CAG48ez2K9A5GwtgqO31u9ZL292we8ZwAA=TJwwEv7wRuJ3j4Lw@mail.gmail.com [1]
Fixes: 8ce352818820 ("pidfs: check for valid ioctl commands")
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org # v6.13; please backport with 8ce352818820 ("pidfs: check for valid ioctl commands")
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
After introducing pre-content events, we had a regression related to
disabling huge faults on files that should never have pre-content events
enabled.
This happened because the default f_mode of allocated files (0) does
not disable pre-content events.
Pre-content events are disabled in file_set_fsnotify_mode_by_watchers()
but internal files may not get to call this helper.
Initialize f_mode to disable permission and pre-content events for all
files and if needed they will be enabled for the callers of
file_set_fsnotify_mode_by_watchers().
Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-4-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Most pseudo files are not applicable for fsnotify events at all,
let alone to the new pre-content events.
Disable notifications to all files allocated with alloc_file_pseudo()
and enable legacy inotify events for the specific cases of pipe and
socket, which have known users of inotify events.
Pre-content events are also kept disabled for sockets and pipes.
Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The FMODE_NONOTIFY_* bits are a 2-bits mode. Open coding manipulation
of those bits is risky. Use an accessor file_set_fsnotify_mode() to
set the mode.
Rename file_set_fsnotify_mode() => file_set_fsnotify_mode_from_watchers()
to make way for the simple accessor name.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Prepending security options was made conditional on sb->s_op->show_options,
but security options are independent of sb options.
Fixes: 056d33137bf9 ("fs: prepend statmount.mnt_opts string with security_sb_mnt_opts()")
Fixes: f9af549d1fd3 ("fs: export mount options via statmount()")
Cc: stable@vger.kernel.org # v6.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250129151253.33241-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
All users of lockref_init() now initialize the count to 1, so hardcode
that and remove the count argument.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-4-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In qd_alloc(), initialize the lockref count to 1 to cover the common
case. Compensate for that in gfs2_quota_init() by adjusting the count
back down to 0; this only occurs when mounting the filesystem rw.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-3-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Move the initialization of gl_lockref from gfs2_init_glock_once() to
gfs2_glock_get(). This allows to use lockref_init() there.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-2-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Just like it's normal for unset values to be zero, unset strings should be
empty instead of containing random values.
It seems to be a typical mistake that the mask returned by statmount is not
checked, which can result in various bugs.
With this fix, these bugs are prevented, since it is highly likely that
userspace would just want to turn the missing mask case into an empty
string anyway (most of the recently found cases are of this type).
Link: https://lore.kernel.org/all/CAJfpegsVCPfCn2DpM8iiYSS5DpMsLB8QBUCHecoj6s0Vxf4jzg@mail.gmail.com/
Fixes: 68385d77c05b ("statmount: simplify string option retrieval")
Fixes: 46eae99ef733 ("add statmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250130121500.113446-1-mszeredi@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Building with GCC 15 results in build error
fs/vboxsf/super.c:24:54: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization]
24 | static const unsigned char VBSF_MOUNT_SIGNATURE[4] = "\000\377\376\375";
| ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Due to GCC having enabled -Werror=unterminated-string-initialization[0]
by default. Separately initializing each array element of
VBSF_MOUNT_SIGNATURE to ensure NUL termination, thus satisfying GCC 15
and fixing the build error.
[0]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unterminated-string-initialization
Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com>
Link: https://lore.kernel.org/r/20250121162648.1408743-1-brahmajit.xyz@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Clang static checker(scan-build) warning:
fs/stat.c:287:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
287 | stat->result_mask |= STATX_MNT_ID_UNIQUE;
| ~~~~~~~~~~~~~~~~~ ^
fs/stat.c:290:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
290 | stat->result_mask |= STATX_MNT_ID;
When vfs_getattr() failed because of security_inode_getattr(), 'stat' is
uninitialized. In this case, there is a harmless garbage problem in
vfs_statx_path(). It's better to return error directly when
vfs_getattr() failed, avoiding garbage value and more clearly.
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20250119025946.1168957-1-suhui@nfschina.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
bch_extent_rebalance
Previously, bch2_bkey_sectors_need_rebalance() called
bch2_target_accepts_data(), checking whether the target is writable.
However, this means that adding or removing devices from a target would
change the value of bch2_bkey_sectors_need_rebalance() for an existing
extent; this needs to be invariant so that the extent trigger can
correctly maintain rebalance_work accounting.
Instead, check target_accepts_data() in io_opts_to_rebalance_opts(),
before creating the bch_extent_rebalance entry.
This fixes (one?) cause of rebalance_work accounting being off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Spotted by sparse.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The discard path is supposed to issue journal flushes when there's too
many buckets empty buckets that need a journal commit before they can be
written to again, but at some point this code seems to have been lost.
Bring it back with a new optimization to make sure we don't issue too
many journal flushes: the journal now tracks the sequence number of the
most recent flush in progress, which the discard path uses when deciding
which buckets need a journal flush.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In the previous commit b3d82c2f2761, code was added to prevent journal sequence
overflow. Among them, the code added to journal_entry_open() uses the
bch2_fs_fatal_err_on() function to handle errors.
However, __journal_res_get() , which calls journal_entry_open() , calls
journal_entry_open() while holding journal->lock , but bch2_fs_fatal_err_on()
internally tries to acquire journal->lock , which results in a deadlock.
So we need to add a locked helper to handle fatal errors even when the
journal->lock is held.
Fixes: b3d82c2f2761 ("bcachefs: Guard against journal seq overflow")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
For some unknown reason, checks on struct bkey_s_c_snapshot and struct
bkey_s_c_snapshot_tree pointers are missing.
Therefore, I think it would be appropriate to fix the incorrect pointer checking
through this patch.
Fixes: 4bd06f07bcb5 ("bcachefs: Fixes for snapshot_tree.master_subvol")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
MS-SMB2 section 2.2.13.2.10 specifies that 'epoch' should be a 16-bit
unsigned integer used to track lease state changes. Change the data
type of all instances of 'epoch' from unsigned int to __u16. This
simplifies the epoch change comparisons and makes the code more
compliant with the protocol spec.
Cc: stable@vger.kernel.org
Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
After commit 36008fe6e3dc ("smb: client: don't try following DFS links
in cifs_tree_connect()"), TCP_Server_Info::leaf_fullpath will no
longer be changed, so there is no need to kstrdup() it.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When the client attempts to tree connect to a domain-based DFS
namespace from a DFS interlink target, the server will return
STATUS_BAD_NETWORK_NAME and the following will appear on dmesg:
CIFS: VFS: BAD_NETWORK_NAME: \\dom\dfs
Since a DFS share might contain several DFS interlinks and they expire
after 10 minutes, the above message might end up being flooded on
dmesg when mounting or accessing them.
Print this only once per share.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Some servers don't respect the DFSREF_STORAGE_SERVER bit, so
unconditionally tree connect to DFS link target and then decide
whether or not continue chasing DFS referrals for DFS interlinks.
Otherwise the client would fail to mount such shares.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- add lockdep annotation for relocation root to fix a splat warning
while merging roots
- fix assertion failure when splitting ordered extent after transaction
abort
- don't print 'qgroup inconsistent' message when rescan process updates
qgroup data sooner than the subvolume deletion process
- fix use-after-free (accessing the error number) when attempting to
join an aborted transaction
- avoid starting new transaction if not necessary when cleaning qgroup
during subvolume drop
* tag 'for-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: avoid starting new transaction when cleaning qgroup during subvolume drop
btrfs: fix use-after-free when attempting to join an aborted transaction
btrfs: do not output error message if a qgroup has been already cleaned up
btrfs: fix assertion failure when splitting ordered extent after transaction abort
btrfs: fix lockdep splat while merging a relocation root
|
|
Pull d_revalidate fix from Al Viro:
"Fix a braino in d_revalidate series: check ->d_op for NULL"
* tag 'pull-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix braino in "9p: fix ->rename_sem exclusion"
|
|
->d_op can bloody well be NULL
Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 30d61efe118c "9p: fix ->rename_sem exclusion"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull xfs bug fixes from Carlos Maiolino:
"A few fixes for XFS, but the most notable one is:
- xfs: remove xfs_buf_cache.bc_lock
which has been hit by different persons including syzbot"
* tag 'xfs-fixes-6.14-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove xfs_buf_cache.bc_lock
xfs: Add error handling for xfs_reflink_cancel_cow_range
xfs: Propagate errors from xfs_reflink_cancel_cow_range in xfs_dax_write_iomap_end
xfs: don't call remap_verify_area with sb write protection held
xfs: remove an out of data comment in _xfs_buf_alloc
xfs: fix the entry condition of exact EOF block allocation optimization
|
|
nfsd_file_dispose_list_delayed can be called from the filecache
laundrette, which is shut down after the nfsd threads are shut down and
the nfsd_serv pointer is cleared. If nn->nfsd_serv is NULL then there
are no threads to wake.
Ensure that the nn->nfsd_serv pointer is non-NULL before calling
svc_wake_up in nfsd_file_dispose_list_delayed. This is safe since the
svc_serv is not freed until after the filecache laundrette is cancelled.
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Closes: https://bugs.debian.org/1093734
Fixes: ffb402596147 ("nfsd: Don't leave work of closing files to a work queue")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If getting acl_default fails, acl_access and acl_default will be released
simultaneously. However, acl_access will still retain a pointer pointing
to the released posix_acl, which will trigger a WARNING in
nfs3svc_release_getacl like this:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 26 PID: 3199 at lib/refcount.c:28
refcount_warn_saturate+0xb5/0x170
Modules linked in:
CPU: 26 UID: 0 PID: 3199 Comm: nfsd Not tainted
6.12.0-rc6-00079-g04ae226af01f-dirty #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb5/0x170
Code: cc cc 0f b6 1d b3 20 a5 03 80 fb 01 0f 87 65 48 d8 00 83 e3 01 75
e4 48 c7 c7 c0 3b 9b 85 c6 05 97 20 a5 03 01 e8 fb 3e 30 ff <0f> 0b eb
cd 0f b6 1d 8a3
RSP: 0018:ffffc90008637cd8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83904fde
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88871ed36380
RBP: ffff888158beeb40 R08: 0000000000000001 R09: fffff520010c6f56
R10: ffffc90008637ab7 R11: 0000000000000001 R12: 0000000000000001
R13: ffff888140e77400 R14: ffff888140e77408 R15: ffffffff858b42c0
FS: 0000000000000000(0000) GS:ffff88871ed00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562384d32158 CR3: 000000055cc6a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? refcount_warn_saturate+0xb5/0x170
? __warn+0xa5/0x140
? refcount_warn_saturate+0xb5/0x170
? report_bug+0x1b1/0x1e0
? handle_bug+0x53/0xa0
? exc_invalid_op+0x17/0x40
? asm_exc_invalid_op+0x1a/0x20
? tick_nohz_tick_stopped+0x1e/0x40
? refcount_warn_saturate+0xb5/0x170
? refcount_warn_saturate+0xb5/0x170
nfs3svc_release_getacl+0xc9/0xe0
svc_process_common+0x5db/0xb60
? __pfx_svc_process_common+0x10/0x10
? __rcu_read_unlock+0x69/0xa0
? __pfx_nfsd_dispatch+0x10/0x10
? svc_xprt_received+0xa1/0x120
? xdr_init_decode+0x11d/0x190
svc_process+0x2a7/0x330
svc_handle_xprt+0x69d/0x940
svc_recv+0x180/0x2d0
nfsd+0x168/0x200
? __pfx_nfsd+0x10/0x10
kthread+0x1a2/0x1e0
? kthread+0xf4/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel panic - not syncing: kernel: panic_on_warn set ...
Clear acl_access/acl_default after posix_acl_release is called to prevent
UAF from being triggered.
Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241107014705.2509463-1-lilingfeng@huaweicloud.com/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull misc vfs cleanups from Al Viro:
"Two unrelated patches - one is a removal of long-obsolete include in
overlayfs (it used to need fs/internal.h, but the extern it wanted has
been moved back to include/linux/namei.h) and another introduces
convenience helper constructing struct qstr by a NUL-terminated
string"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add a string-to-qstr constructor
fs/overlayfs/namei.c: get rid of include ../internal.h
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- various updates for special file handling: symlink handling,
support for creating sockets, cleanups, new mount options (e.g. to
allow disabling using reparse points for them, and to allow
overriding the way symlinks are saved), and fixes to error paths
- fix for kerberos mounts (allow IAKerb)
- SMB1 fix for stat and for setting SACL (auditing)
- fix an incorrect error code mapping
- cleanups"
* tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: (21 commits)
cifs: Fix parsing native symlinks directory/file type
cifs: update internal version number
cifs: Add support for creating WSL-style symlinks
smb3: add support for IAKerb
cifs: Fix struct FILE_ALL_INFO
cifs: Add support for creating NFS-style symlinks
cifs: Add support for creating native Windows sockets
cifs: Add mount option -o reparse=none
cifs: Add mount option -o symlink= for choosing symlink create type
cifs: Fix creating and resolving absolute NT-style symlinks
cifs: Simplify reparse point check in cifs_query_path_info() function
cifs: Remove symlink member from cifs_open_info_data union
cifs: Update description about ACL permissions
cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move to common/smb2pdu.h
cifs: Remove struct reparse_posix_data from struct cifs_open_info_data
cifs: Remove unicode parameter from parse_reparse_point() function
cifs: Fix getting and setting SACLs over SMB1
cifs: Remove intermediate object of failed create SFU call
cifs: Validate EAs for WSL reparse points
cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM
...
|