Age | Commit message (Collapse) | Author | Files | Lines |
|
commit c8b359dddb418c60df1a69beea01d1b3322bfe83 upstream.
Add a check to the ovl_dentry_weird() function to prevent the
processing of directory inodes that lack the lookup function.
This is important because such inodes can cause errors in overlayfs
when passed to the lowerstack.
Reported-by: syzbot+a8c9d476508bd14a90e5@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=a8c9d476508bd14a90e5
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/linux-unionfs/CAJfpegvx-oS9XGuwpJx=Xe28_jzWx5eRo1y900_ZzWY+=gGzUg@mail.gmail.com/
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7c4e39f9d2af4abaf82ca0e315d1fd340456620f ]
At btrfs_ref_tree_mod() after we successfully inserted the new ref entry
(local variable 'ref') into the respective block entry's rbtree (local
variable 'be'), if we find an unexpected action of BTRFS_DROP_DELAYED_REF,
we error out and free the ref entry without removing it from the block
entry's rbtree. Then in the error path of btrfs_ref_tree_mod() we call
btrfs_free_ref_cache(), which iterates over all block entries and then
calls free_block_entry() for each one, and there we will trigger a
use-after-free when we are called against the block entry to which we
added the freed ref entry to its rbtree, since the rbtree still points
to the block entry, as we didn't remove it from the rbtree before freeing
it in the error path at btrfs_ref_tree_mod(). Fix this by removing the
new ref entry from the rbtree before freeing it.
Syzbot report this with the following stack traces:
BTRFS error (device loop0 state EA): Ref action 2, root 5, ref_root 0, parent 8564736, owner 0, offset 0, num_refs 18446744073709551615
__btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523
update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512
btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
btrfs_insert_empty_items+0x9c/0x1a0 fs/btrfs/ctree.c:4314
btrfs_insert_empty_item fs/btrfs/ctree.h:669 [inline]
btrfs_insert_orphan_item+0x1f1/0x320 fs/btrfs/orphan.c:23
btrfs_orphan_add+0x6d/0x1a0 fs/btrfs/inode.c:3482
btrfs_unlink+0x267/0x350 fs/btrfs/inode.c:4293
vfs_unlink+0x365/0x650 fs/namei.c:4469
do_unlinkat+0x4ae/0x830 fs/namei.c:4533
__do_sys_unlinkat fs/namei.c:4576 [inline]
__se_sys_unlinkat fs/namei.c:4569 [inline]
__x64_sys_unlinkat+0xcc/0xf0 fs/namei.c:4569
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
BTRFS error (device loop0 state EA): Ref action 1, root 5, ref_root 5, parent 0, owner 260, offset 0, num_refs 1
__btrfs_mod_ref+0x76b/0xac0 fs/btrfs/extent-tree.c:2521
update_ref_for_cow+0x96a/0x11f0
btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411
__btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030
btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline]
__btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137
__btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171
btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313
prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586
relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538
BTRFS error (device loop0 state EA): Ref action 2, root 5, ref_root 0, parent 8564736, owner 0, offset 0, num_refs 18446744073709551615
__btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523
update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512
btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411
__btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030
btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline]
__btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137
__btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171
btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313
prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586
relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538
==================================================================
BUG: KASAN: slab-use-after-free in rb_first+0x69/0x70 lib/rbtree.c:473
Read of size 8 at addr ffff888042d1af38 by task syz.0.0/5329
CPU: 0 UID: 0 PID: 5329 Comm: syz.0.0 Not tainted 6.12.0-rc7-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
rb_first+0x69/0x70 lib/rbtree.c:473
free_block_entry+0x78/0x230 fs/btrfs/ref-verify.c:248
btrfs_free_ref_cache+0xa3/0x100 fs/btrfs/ref-verify.c:917
btrfs_ref_tree_mod+0x139f/0x15e0 fs/btrfs/ref-verify.c:898
btrfs_free_extent+0x33c/0x380 fs/btrfs/extent-tree.c:3544
__btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523
update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512
btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411
__btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030
btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline]
__btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137
__btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171
btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313
prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586
relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f996df7e719
RSP: 002b:00007f996ede7038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f996e135f80 RCX: 00007f996df7e719
RDX: 0000000020000180 RSI: 00000000c4009420 RDI: 0000000000000004
RBP: 00007f996dff139e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f996e135f80 R15: 00007fff79f32e68
</TASK>
Allocated by task 5329:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:257 [inline]
__kmalloc_cache_noprof+0x19c/0x2c0 mm/slub.c:4295
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
btrfs_ref_tree_mod+0x264/0x15e0 fs/btrfs/ref-verify.c:701
btrfs_free_extent+0x33c/0x380 fs/btrfs/extent-tree.c:3544
__btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523
update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512
btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411
__btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030
btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline]
__btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137
__btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171
btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313
prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586
relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5329:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:2342 [inline]
slab_free mm/slub.c:4579 [inline]
kfree+0x1a0/0x440 mm/slub.c:4727
btrfs_ref_tree_mod+0x136c/0x15e0
btrfs_free_extent+0x33c/0x380 fs/btrfs/extent-tree.c:3544
__btrfs_mod_ref+0x7dd/0xac0 fs/btrfs/extent-tree.c:2523
update_ref_for_cow+0x9cd/0x11f0 fs/btrfs/ctree.c:512
btrfs_force_cow_block+0x9f6/0x1da0 fs/btrfs/ctree.c:594
btrfs_cow_block+0x35e/0xa40 fs/btrfs/ctree.c:754
btrfs_search_slot+0xbdd/0x30d0 fs/btrfs/ctree.c:2116
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:411
__btrfs_update_delayed_inode+0x1e7/0xb90 fs/btrfs/delayed-inode.c:1030
btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1114 [inline]
__btrfs_commit_inode_delayed_items+0x2318/0x24a0 fs/btrfs/delayed-inode.c:1137
__btrfs_run_delayed_items+0x213/0x490 fs/btrfs/delayed-inode.c:1171
btrfs_commit_transaction+0x8a8/0x3740 fs/btrfs/transaction.c:2313
prepare_to_relocate+0x3c4/0x4c0 fs/btrfs/relocation.c:3586
relocate_block_group+0x16c/0xd40 fs/btrfs/relocation.c:3611
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4081
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3377
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4161
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4538
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ffff888042d1af00
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 56 bytes inside of
freed 64-byte region [ffff888042d1af00, ffff888042d1af40)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x42d1a
anon flags: 0x4fff00000000000(node=1|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 04fff00000000000 ffff88801ac418c0 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000200020 00000001f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 5055, tgid 5055 (dhcpcd-run-hook), ts 40377240074, free_ts 40376848335
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1541
prep_new_page mm/page_alloc.c:1549 [inline]
get_page_from_freelist+0x3649/0x3790 mm/page_alloc.c:3459
__alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4735
alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
alloc_slab_page+0x6a/0x140 mm/slub.c:2412
allocate_slab+0x5a/0x2f0 mm/slub.c:2578
new_slab mm/slub.c:2631 [inline]
___slab_alloc+0xcd1/0x14b0 mm/slub.c:3818
__slab_alloc+0x58/0xa0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
__do_kmalloc_node mm/slub.c:4263 [inline]
__kmalloc_noprof+0x25a/0x400 mm/slub.c:4276
kmalloc_noprof include/linux/slab.h:882 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
tomoyo_encode2 security/tomoyo/realpath.c:45 [inline]
tomoyo_encode+0x26f/0x540 security/tomoyo/realpath.c:80
tomoyo_realpath_from_path+0x59e/0x5e0 security/tomoyo/realpath.c:283
tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
tomoyo_check_open_permission+0x255/0x500 security/tomoyo/file.c:771
security_file_open+0x777/0x990 security/security.c:3109
do_dentry_open+0x369/0x1460 fs/open.c:945
vfs_open+0x3e/0x330 fs/open.c:1088
do_open fs/namei.c:3774 [inline]
path_openat+0x2c84/0x3590 fs/namei.c:3933
page last free pid 5055 tgid 5055 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1112 [inline]
free_unref_page+0xcfb/0xf20 mm/page_alloc.c:2642
free_pipe_info+0x300/0x390 fs/pipe.c:860
put_pipe_info fs/pipe.c:719 [inline]
pipe_release+0x245/0x320 fs/pipe.c:742
__fput+0x23f/0x880 fs/file_table.c:431
__do_sys_close fs/open.c:1567 [inline]
__se_sys_close fs/open.c:1552 [inline]
__x64_sys_close+0x7f/0x110 fs/open.c:1552
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
ffff888042d1ae00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888042d1ae80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>ffff888042d1af00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff888042d1af80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
ffff888042d1b000: 00 00 00 00 00 fc fc 00 00 00 00 00 fc fc 00 00
Reported-by: syzbot+7325f164162e200000c1@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/673723eb.050a0220.1324f8.00a8.GAE@google.com/T/#u
Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3ed51857a50f530ac7a1482e069dfbd1298558d4 ]
Syzbot reports a null-ptr-deref in btrfs_search_slot().
The reproducer is using rescue=ibadroots, and the extent tree root is
corrupted thus the extent tree is NULL.
When scrub tries to search the extent tree to gather the needed extent
info, btrfs_search_slot() doesn't check if the target root is NULL or
not, resulting the null-ptr-deref.
Add sanity check for btrfs root before using it in btrfs_search_slot().
Reported-by: syzbot+3030e17bd57a73d39bd7@syzkaller.appspotmail.com
Fixes: 42437a6386ff ("btrfs: introduce mount option rescue=ignorebadroots")
Link: https://syzkaller.appspot.com/bug?extid=3030e17bd57a73d39bd7
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: syzbot+3030e17bd57a73d39bd7@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a4c853af0c511d7e0f7cb306bbc8a4f1dbdb64ca ]
Add annotations to functions that might sleep due to allocations or IO
and could be called from various contexts. In case of btrfs_search_slot
it's not obvious why it would sleep:
btrfs_search_slot
setup_nodes_for_search
reada_for_balance
btrfs_readahead_node_child
btrfs_readahead_tree_block
btrfs_find_create_tree_block
alloc_extent_buffer
kmem_cache_zalloc
/* allocate memory non-atomically, might sleep */
kmem_cache_alloc(GFP_NOFS|__GFP_NOFAIL|__GFP_ZERO)
read_extent_buffer_pages
submit_extent_page
/* disk IO, might sleep */
submit_one_bio
Other examples where the sleeping could happen is in 3 places might
sleep in update_qgroup_limit_item(), as shown below:
update_qgroup_limit_item
btrfs_alloc_path
/* allocate memory non-atomically, might sleep */
kmem_cache_zalloc(btrfs_path_cachep, GFP_NOFS)
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 3ed51857a50f ("btrfs: add a sanity check for btrfs root in btrfs_search_slot()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ed67f2a913a4f0fc505db29805c41dd07d3cb356 ]
When checking for delayed refs when verifying if there are cross
references for a data extent, we stop if the path has nowait set and we
can't try lock the delayed ref head's mutex, returning -EAGAIN with the
goal of making a write fallback to a blocking context. However we ignore
the -EAGAIN at btrfs_cross_ref_exist() when check_delayed_ref() returns
it, and keep looping instead of immediately returning the -EAGAIN to the
caller.
Fix this by not looping if we get -EAGAIN and we have a nowait path.
Fixes: 26ce91144631 ("btrfs: make can_nocow_extent nowait compatible")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ac6f420291b3fee1113f21d612fa88b628afab5b ]
One of the paths quota writeback is called from is:
freeze_super()
sync_filesystem()
ext4_sync_fs()
dquot_writeback_dquots()
Since we currently don't always flush the quota_release_work queue in
this path, we can end up with the following race:
1. dquot are added to releasing_dquots list during regular operations.
2. FS Freeze starts, however, this does not flush the quota_release_work queue.
3. Freeze completes.
4. Kernel eventually tries to flush the workqueue while FS is frozen which
hits a WARN_ON since transaction gets started during frozen state:
ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
__ext4_journal_start_sb+0x64/0x1c0 [ext4]
ext4_release_dquot+0x90/0x1d0 [ext4]
quota_release_workfn+0x43c/0x4d0
Which is the following line:
WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
Which ultimately results in generic/390 failing due to dmesg
noise. This was detected on powerpc machine 15 cores.
To avoid this, make sure to flush the workqueue during
dquot_writeback_dquots() so we dont have any pending workitems after
freeze.
Reported-by: Disha Goel <disgoel@linux.ibm.com>
CC: stable@vger.kernel.org
Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241121123855.645335-2-ojaswin@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 652f03db897ba24f9c4b269e254ccc6cc01ff1b7 ]
Compat features are new features that older kernels can safely ignore,
allowing read-write mounts without issues. The current sb write validation
implementation returns -EFSCORRUPTED for unknown compat features,
preventing filesystem write operations and contradicting the feature's
definition.
Additionally, if the mounted image is unclean, the log recovery may need
to write to the superblock. Returning an error for unknown compat features
during sb write validation can cause mount failures.
Although XFS currently does not use compat feature flags, this issue
affects current kernels' ability to mount images that may use compat
feature flags in the future.
Since superblock read validation already warns about unknown compat
features, it's unnecessary to repeat this warning during write validation.
Therefore, the relevant code in write validation is being removed.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 52cb7f8f177878b4f22397b9c4d2c8f743766be3 ]
When exporting only one file system with fsid=0 on the server side, the
client alternately uses the ro/rw mount options to perform the mount
operation, and a new vfsmount is generated each time.
It can be reproduced as follows:
[root@localhost ~]# mount /dev/sda /mnt2
[root@localhost ~]# echo "/mnt2 *(rw,no_root_squash,fsid=0)" >/etc/exports
[root@localhost ~]# systemctl restart nfs-server
[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount | grep nfs4
127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,...
[root@localhost ~]#
We expected that after mounting with the ro option, using the rw option to
mount again would return EBUSY, but the actual situation was not the case.
As shown above, when mounting for the first time, a superblock with the ro
flag will be generated, and at the same time, in do_new_mount_fc -->
do_add_mount, it detects that the superblock corresponding to the current
target directory is inconsistent with the currently generated one
(path->mnt->mnt_sb != newmnt->mnt.mnt_sb), and a new vfsmount will be
generated.
When mounting with the rw option for the second time, since no matching
superblock can be found in the fs_supers list, a new superblock with the
rw flag will be generated again. The superblock in use (ro) is different
from the newly generated superblock (rw), and a new vfsmount will be
generated again.
When mounting with the ro option for the third time, the superblock (ro)
is found in fs_supers, the superblock in use (rw) is different from the
found superblock (ro), and a new vfsmount will be generated again.
We can switch between ro/rw through remount, and only one superblock needs
to be generated, thus avoiding the problem of repeated generation of
vfsmount caused by switching superblocks.
Furthermore, This can also resolve the issue described in the link.
Fixes: 275a5d24bf56 ("NFS: Error when mounting the same filesystem with different options")
Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3ba44ee966bc3c41dd8a944f963466c8fcc60dc8 ]
When building the kernel with -Wmaybe-uninitialized, the compiler
reports this warning:
In function 'jffs2_mark_erased_block',
inlined from 'jffs2_erase_pending_blocks' at fs/jffs2/erase.c:116:4:
fs/jffs2/erase.c:474:9: warning: 'bad_offset' may be used uninitialized [-Wmaybe-uninitialized]
474 | jffs2_erase_failed(c, jeb, bad_offset);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/jffs2/erase.c: In function 'jffs2_erase_pending_blocks':
fs/jffs2/erase.c:402:18: note: 'bad_offset' was declared here
402 | uint32_t bad_offset;
| ^~~~~~~~~~
When mtd->point() is used, jffs2_erase_pending_blocks can return -EIO
without initializing bad_offset, which is later used at the filebad
label in jffs2_mark_erased_block.
Fix it by initializing this variable.
Fixes: 8a0f572397ca ("[JFFS2] Return values of jffs2_block_check_erase error paths")
Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4617fb8fc15effe8eda4dd898d4e33eb537a7140 ]
After an insertion in TNC, the tree might split and cause a node to
change its `znode->parent`. A further deletion of other nodes in the
tree (which also could free the nodes), the aforementioned node's
`znode->cparent` could still point to a freed node. This
`znode->cparent` may not be updated when getting nodes to commit in
`ubifs_tnc_start_commit()`. This could then trigger a use-after-free
when accessing the `znode->cparent` in `write_index()` in
`ubifs_tnc_end_commit()`.
This can be triggered by running
rm -f /etc/test-file.bin
dd if=/dev/urandom of=/etc/test-file.bin bs=1M count=60 conv=fsync
in a loop, and with `CONFIG_UBIFS_FS_AUTHENTICATION`. KASAN then
reports:
BUG: KASAN: use-after-free in ubifs_tnc_end_commit+0xa5c/0x1950
Write of size 32 at addr ffffff800a3af86c by task ubifs_bgt0_20/153
Call trace:
dump_backtrace+0x0/0x340
show_stack+0x18/0x24
dump_stack_lvl+0x9c/0xbc
print_address_description.constprop.0+0x74/0x2b0
kasan_report+0x1d8/0x1f0
kasan_check_range+0xf8/0x1a0
memcpy+0x84/0xf4
ubifs_tnc_end_commit+0xa5c/0x1950
do_commit+0x4e0/0x1340
ubifs_bg_thread+0x234/0x2e0
kthread+0x36c/0x410
ret_from_fork+0x10/0x20
Allocated by task 401:
kasan_save_stack+0x38/0x70
__kasan_kmalloc+0x8c/0xd0
__kmalloc+0x34c/0x5bc
tnc_insert+0x140/0x16a4
ubifs_tnc_add+0x370/0x52c
ubifs_jnl_write_data+0x5d8/0x870
do_writepage+0x36c/0x510
ubifs_writepage+0x190/0x4dc
__writepage+0x58/0x154
write_cache_pages+0x394/0x830
do_writepages+0x1f0/0x5b0
filemap_fdatawrite_wbc+0x170/0x25c
file_write_and_wait_range+0x140/0x190
ubifs_fsync+0xe8/0x290
vfs_fsync_range+0xc0/0x1e4
do_fsync+0x40/0x90
__arm64_sys_fsync+0x34/0x50
invoke_syscall.constprop.0+0xa8/0x260
do_el0_svc+0xc8/0x1f0
el0_svc+0x34/0x70
el0t_64_sync_handler+0x108/0x114
el0t_64_sync+0x1a4/0x1a8
Freed by task 403:
kasan_save_stack+0x38/0x70
kasan_set_track+0x28/0x40
kasan_set_free_info+0x28/0x4c
__kasan_slab_free+0xd4/0x13c
kfree+0xc4/0x3a0
tnc_delete+0x3f4/0xe40
ubifs_tnc_remove_range+0x368/0x73c
ubifs_tnc_remove_ino+0x29c/0x2e0
ubifs_jnl_delete_inode+0x150/0x260
ubifs_evict_inode+0x1d4/0x2e4
evict+0x1c8/0x450
iput+0x2a0/0x3c4
do_unlinkat+0x2cc/0x490
__arm64_sys_unlinkat+0x90/0x100
invoke_syscall.constprop.0+0xa8/0x260
do_el0_svc+0xc8/0x1f0
el0_svc+0x34/0x70
el0t_64_sync_handler+0x108/0x114
el0t_64_sync+0x1a4/0x1a8
The offending `memcpy()` in `ubifs_copy_hash()` has a use-after-free
when a node becomes root in TNC but still has a `cparent` to an already
freed node. More specifically, consider the following TNC:
zroot
/
/
zp1
/
/
zn
Inserting a new node `zn_new` with a key smaller then `zn` will trigger
a split in `tnc_insert()` if `zp1` is full:
zroot
/ \
/ \
zp1 zp2
/ \
/ \
zn_new zn
`zn->parent` has now been moved to `zp2`, *but* `zn->cparent` still
points to `zp1`.
Now, consider a removal of all the nodes _except_ `zn`. Just when
`tnc_delete()` is about to delete `zroot` and `zp2`:
zroot
\
\
zp2
\
\
zn
`zroot` and `zp2` get freed and the tree collapses:
zn
`zn` now becomes the new `zroot`.
`get_znodes_to_commit()` will now only find `zn`, the new `zroot`, and
`write_index()` will check its `znode->cparent` that wrongly points to
the already freed `zp1`. `ubifs_copy_hash()` thus gets wrongly called
with `znode->cparent->zbranch[znode->iip].hash` that triggers the
use-after-free!
Fix this by explicitly setting `znode->cparent` to `NULL` in
`get_znodes_to_commit()` for the root node. The search for the dirty
nodes is bottom-up in the tree. Thus, when `find_next_dirty(znode)`
returns NULL, the current `znode` _is_ the root node. Add an assert for
this.
Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes")
Tested-by: Waqar Hameed <waqar.hameed@axis.com>
Co-developed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Waqar Hameed <waqar.hameed@axis.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 84a2bee9c49769310efa19601157ef50a1df1267 ]
Since commit e874dcde1cbf ("ubifs: Reserve one leb for each journal
head while doing budget"), available space is calulated by deducting
reservation for all journal heads. However, the total block count (
which is only used by statfs) is not updated yet, which will cause
the wrong displaying for used space(total - available).
Fix it by deducting reservation for all journal heads from total
block count.
Fixes: e874dcde1cbf ("ubifs: Reserve one leb for each journal head while doing budget")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2fdb05dc0931250574f0cb0ebeb5ed8e20f4a889 ]
Yang Erkun reports that when two threads are opening files at the same
time, and are forced to abort before a reply is seen, then the call to
nfs_release_seqid() in nfs4_opendata_free() can result in a
use-after-free of the pointer to the defunct rpc task of the other
thread.
The fix is to ensure that if the RPC call is aborted before the call to
nfs_wait_on_sequence() is complete, then we must call nfs_release_seqid()
in nfs4_open_release() before the rpc_task is freed.
Reported-by: Yang Erkun <yangerkun@huawei.com>
Fixes: 24ac23ab88df ("NFSv4: Convert open() into an asynchronous RPC call")
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7f33b92e5b18e904a481e6e208486da43e4dc841 upstream.
If the tag length is >= U32_MAX - 3 then the "length + 4" addition
can result in an integer overflow. Address this by splitting the
decoding into several steps so that decode_cb_compound4res() does
not have to perform arithmetic on the unsafe length value.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9ed9d83a51a9636d367c796252409e7b2f4de4d4 upstream.
This client was only requesting READ caching, not READ and HANDLE caching
in the LeaseState on the open requests we send for directories. To
delay closing a handle (e.g. for caching directory contents) we should
be requesting HANDLE as well as READ (as we already do for deferred
close of files). See MS-SMB2 3.3.1.4 e.g.
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 02dffe9ab092fc4c8800aee68cb7eafd37a980c4 upstream.
There is no check if stream size and start_clu are invalid.
If start_clu is EOF cluster and stream size is 4096, It will
cause uninit value access. because ei->hint_femp.eidx could
be 128(if cluster size is 4K) and wrong hint will allocate
next cluster. and this cluster will be same with the cluster
that is allocated by exfat_extend_valid_size(). The previous
patch will check invalid start_clu, but for clarity, initialize
hint_femp.eidx to zero.
Cc: stable@vger.kernel.org
Reported-by: syzbot+01218003be74b5e1213a@syzkaller.appspotmail.com
Tested-by: syzbot+01218003be74b5e1213a@syzkaller.appspotmail.com
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit aa52c54da40d9eee3ba87c05cdcb0cd07c04fa13 upstream.
We got a report that adding a fanotify filsystem watch prevents tail -f
from receiving events.
Reproducer:
1. Create 3 windows / login sessions. Become root in each session.
2. Choose a mounted filesystem that is pretty quiet; I picked /boot.
3. In the first window, run: fsnotifywait -S -m /boot
4. In the second window, run: echo data >> /boot/foo
5. In the third window, run: tail -f /boot/foo
6. Go back to the second window and run: echo more data >> /boot/foo
7. Observe that the tail command doesn't show the new data.
8. In the first window, hit control-C to interrupt fsnotifywait.
9. In the second window, run: echo still more data >> /boot/foo
10. Observe that the tail command in the third window has now printed
the missing data.
When stracing tail, we observed that when fanotify filesystem mark is
set, tail does get the inotify event, but the event is receieved with
the filename:
read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0foo\0\0\0\0\0\0\0\0\0\0\0\0\0",
50) = 32
This is unexpected, because tail is watching the file itself and not its
parent and is inconsistent with the inotify event received by tail when
fanotify filesystem mark is not set:
read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 50) = 16
The inteference between different fsnotify groups was caused by the fact
that the mark on the sb requires the filename, so the filename is passed
to fsnotify(). Later on, fsnotify_handle_event() tries to take care of
not passing the filename to groups (such as inotify) that are interested
in the filename only when the parent is watching.
But the logic was incorrect for the case that no group is watching the
parent, some groups are watching the sb and some watching the inode.
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: 7372e79c9eb9 ("fanotify: fix logic of reporting name info with watched parent")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d9f9d96136cba8fedd647d2c024342ce090133c2 upstream.
Commit 7c55b78818cf ("jfs: xattr: fix buffer overflow for invalid xattr")
also addresses this issue but it only fixes it for positive values, while
ea_size is an integer type and can take negative values, e.g. in case of
a corrupted filesystem. This still breaks validation and would overflow
because of implicit conversion from int to size_t in print_hex_dump().
Fix this issue by clamping the ea_size value instead.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org
Signed-off-by: Artem Sadovnikov <ancowi69@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4a622e4d477bb12ad5ed4abbc7ad1365de1fa347 upstream.
The original implementation ext4's FS_IOC_GETFSMAP handling only
worked when the range of queried blocks included at least one free
(unallocated) block range. This is because how the metadata blocks
were emitted was as a side effect of ext4_mballoc_query_range()
calling ext4_getfsmap_datadev_helper(), and that function was only
called when a free block range was identified. As a result, this
caused generic/365 to fail.
Fix this by creating a new function ext4_getfsmap_meta_helper() which
gets called so that blocks before the first free block range in a
block group can get properly reported.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 902cc179c931a033cd7f4242353aa2733bf8524c upstream.
find_group_other() and find_group_orlov() read *_lo, *_hi with
ext4_free_inodes_count without additional locking. This can cause
data-race warning, but since the lock is held for most writes and free
inodes value is generally not a problem even if it is incorrect, it is
more appropriate to use READ_ONCE()/WRITE_ONCE() than to add locking.
==================================================================
BUG: KCSAN: data-race in ext4_free_inodes_count / ext4_free_inodes_set
write to 0xffff88810404300e of 2 bytes by task 6254 on cpu 1:
ext4_free_inodes_set+0x1f/0x80 fs/ext4/super.c:405
__ext4_new_inode+0x15ca/0x2200 fs/ext4/ialloc.c:1216
ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
vfs_symlink+0xca/0x1d0 fs/namei.c:4615
do_symlinkat+0xe3/0x340 fs/namei.c:4641
__do_sys_symlinkat fs/namei.c:4657 [inline]
__se_sys_symlinkat fs/namei.c:4654 [inline]
__x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
read to 0xffff88810404300e of 2 bytes by task 6257 on cpu 0:
ext4_free_inodes_count+0x1c/0x80 fs/ext4/super.c:349
find_group_other fs/ext4/ialloc.c:594 [inline]
__ext4_new_inode+0x6ec/0x2200 fs/ext4/ialloc.c:1017
ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
vfs_symlink+0xca/0x1d0 fs/namei.c:4615
do_symlinkat+0xe3/0x340 fs/namei.c:4641
__do_sys_symlinkat fs/namei.c:4657 [inline]
__se_sys_symlinkat fs/namei.c:4654 [inline]
__x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Cc: stable@vger.kernel.org
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://patch.msgid.link/20241003125337.47283-1-aha310510@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 74e97958121aa1f5854da6effba70143f051b0cd upstream.
Create subvolume, create snapshot and delete subvolume all use
btrfs_subvolume_reserve_metadata() to reserve metadata for the changes
done to the parent subvolume's fs tree, which cannot be mediated in the
normal way via start_transaction. When quota groups (squota or qgroups)
are enabled, this reserves qgroup metadata of type PREALLOC. Once the
operation is associated to a transaction, we convert PREALLOC to
PERTRANS, which gets cleared in bulk at the end of the transaction.
However, the error paths of these three operations were not implementing
this lifecycle correctly. They unconditionally converted the PREALLOC to
PERTRANS in a generic cleanup step regardless of errors or whether the
operation was fully associated to a transaction or not. This resulted in
error paths occasionally converting this rsv to PERTRANS without calling
record_root_in_trans successfully, which meant that unless that root got
recorded in the transaction by some other thread, the end of the
transaction would not free that root's PERTRANS, leaking it. Ultimately,
this resulted in hitting a WARN in CONFIG_BTRFS_DEBUG builds at unmount
for the leaked reservation.
The fix is to ensure that every qgroup PREALLOC reservation observes the
following properties:
1. any failure before record_root_in_trans is called successfully
results in freeing the PREALLOC reservation.
2. after record_root_in_trans, we convert to PERTRANS, and now the
transaction owns freeing the reservation.
This patch enforces those properties on the three operations. Without
it, generic/269 with squotas enabled at mkfs time would fail in ~5-10
runs on my system. With this patch, it ran successfully 1000 times in a
row.
Fixes: e85fde5162bf ("btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
[Xiangyu: BP to fix CVE-2024-35956, due to 6.1 btrfs_subvolume_release_metadata()
defined in ctree.h, modified the header file name from root-tree.h to ctree.h]
Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fb63435b7c7dc112b1ae1baea5486e0a6e27b196 upstream.
There is a lack of verification of the space occupied by fixed members
of xlog_op_header in the xlog_recover_process_data.
We can create a crafted image to trigger an out of bounds read by
following these steps:
1) Mount an image of xfs, and do some file operations to leave records
2) Before umounting, copy the image for subsequent steps to simulate
abnormal exit. Because umount will ensure that tail_blk and
head_blk are the same, which will result in the inability to enter
xlog_recover_process_data
3) Write a tool to parse and modify the copied image in step 2
4) Make the end of the xlog_op_header entries only 1 byte away from
xlog_rec_header->h_size
5) xlog_rec_header->h_num_logops++
6) Modify xlog_rec_header->h_crc
Fix:
Add a check to make sure there is sufficient space to access fixed members
of xlog_op_header.
Signed-off-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 556bdf27c2dd5c74a9caacbe524b943a6cd42d99 upstream.
Added bounds checking to make sure that every attr don't stray beyond
valid memory region.
Signed-off-by: lei lu <llfamsec@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 652cfeb43d6b9aba5c7c4902bed7a7340df131fb upstream.
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 62a8642ba00aa8ceb0a02ade942f5ec52e877c95 ]
nfsd4_shutdown_copy() is just this:
while ((copy = nfsd4_get_copy(clp)) != NULL)
nfsd4_stop_copy(copy);
nfsd4_get_copy() bumps @copy's reference count, preventing
nfsd4_stop_copy() from releasing @copy.
A while loop like this usually works by removing the first element
of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy()
alters the async_copies list.
Best I can tell, then, is that nfsd4_shutdown_copy() continues to
loop until other threads manage to remove all the items from this
list. The spinning loop blocks shutdown until these items are gone.
Possibly the reason we haven't seen this issue in the field is
because client_has_state() prevents __destroy_client() from calling
nfsd4_shutdown_copy() if there are any items on this list. In a
subsequent patch I plan to remove that restriction.
Fixes: e0639dc5805a ("NFSD introduce async copy feature")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f64ea4af43161bb86ffc77e6aeb5bcf5c3229df0 ]
It's only current caller already length-checks the string, but let's
be safe.
Fixes: 0964a3d3f1aa ("[PATCH] knfsd: nfsd4 reboot dirname fix")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1e02c641c3a43c88cecc08402000418e15578d38 ]
@ses is initialized to NULL. If __nfsd4_find_backchannel() finds no
available backchannel session, setup_callback_client() will try to
dereference @ses and segfault.
Fixes: dcbeaa68dbbd ("nfsd4: allow backchannel recovery")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 26e6f59d0bbaac76fa3413462d780bd2b5f9f653 ]
Jinsu Lee reported a performance regression issue, after commit
5c8764f8679e ("f2fs: fix to force buffered IO on inline_data
inode"), we forced direct write to use buffered IO on inline_data
inode, it will cause performace regression due to memory copy
and data flush.
It's fine to not force direct write to use buffered IO, as it
can convert inline inode before committing direct write IO.
Fixes: 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data inode")
Reported-by: Jinsu Lee <jinsu1.lee@samsung.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/af03dd2c-e361-4f80-b2fd-39440766cf6e@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
GC_URGENT_MID
[ Upstream commit 296b8cb34e65fa93382cf919be5a056f719c9a26 ]
If gc_mode is set to GC_URGENT_LOW or GC_URGENT_MID, cost benefit GC
approach should be used, but if ATGC is enabled at the same time,
Age-threshold approach will be selected, which can only do amount of
GC and it is much less than the numbers of CB approach.
some traces:
f2fs_gc-254:48-396 [007] ..... 2311600.684028: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1053, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
f2fs_gc-254:48-396 [007] ..... 2311600.684527: f2fs_get_victim: dev = (254,48), type = No TYPE, policy = (Background GC, LFS-mode, Age-threshold), victim = 10, cost = 4294364975, ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 44898
f2fs_gc-254:48-396 [007] ..... 2311600.714835: f2fs_gc_end: dev = (254,48), ret = 0, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
f2fs_gc-254:48-396 [007] ..... 2311600.714843: f2fs_background_gc: dev = (254,48), wait_ms = 50, prefree = 0, free = 44898
f2fs_gc-254:48-396 [007] ..... 2311600.771785: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:
f2fs_gc-254:48-396 [007] ..... 2311600.772275: f2fs_gc_end: dev = (254,48), ret = -61, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
Fixes: 0e5e81114de1 ("f2fs: add GC_URGENT_LOW mode in gc_urgent")
Fixes: d98af5f45520 ("f2fs: introduce gc_urgent_mid mode")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 43563069e1c1df417d2eed6eca8a22fc6b04691d ]
In the __f2fs_init_atgc_curseg->get_atssr_segment calling,
curseg->segno is NULL_SEGNO, indicating that there is no summary
block that needs to be written.
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5bcd655fffaec24e849bda1207446f5cc821713e ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 43563069e1c1 ("f2fs: check curseg->inited before write_sum_page in change_curseg")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8442d94b8ac8d5d8300725a9ffa9def526b71170 ]
allocate_segment_by_default has just two callers, which use very
different code pathes inside it based on the force paramter. Just
open code the logic in the two callers using a new helper to decided
if a new segment should be allocated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 43563069e1c1 ("f2fs: check curseg->inited before write_sum_page in change_curseg")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1c8a8ec0a0e9a1176022a35c4daf04fe1594d270 ]
There is only single instance of these ops, so remove the indirection
and call allocate_segment_by_default directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 43563069e1c1 ("f2fs: check curseg->inited before write_sum_page in change_curseg")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c3af1f13476ec23fd99c98d060a89be28c1e8871 ]
This f2fs_bug_on was introduced by commit 2c1905042c8c ("f2fs: check
segment type in __f2fs_replace_block") when there were only 6 curseg types.
After commit d0b9e42ab615 ("f2fs: introduce inmem curseg") was introduced,
the condition should be changed to checking curseg->seg_type.
Fixes: d0b9e42ab615 ("f2fs: introduce inmem curseg")
Signed-off-by: LongPing Wei <weilongping@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1acd73edbbfef2c3c5b43cba4006a7797eca7050 ]
It will trigger system panic w/ testcase in [1]:
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2752!
RIP: 0010:new_curseg+0xc81/0x2110
Call Trace:
f2fs_allocate_data_block+0x1c91/0x4540
do_write_page+0x163/0xdf0
f2fs_outplace_write_data+0x1aa/0x340
f2fs_do_write_data_page+0x797/0x2280
f2fs_write_single_data_page+0x16cd/0x2190
f2fs_write_cache_pages+0x994/0x1c80
f2fs_write_data_pages+0x9cc/0xea0
do_writepages+0x194/0x7a0
filemap_fdatawrite_wbc+0x12b/0x1a0
__filemap_fdatawrite_range+0xbb/0xf0
file_write_and_wait_range+0xa1/0x110
f2fs_do_sync_file+0x26f/0x1c50
f2fs_sync_file+0x12b/0x1d0
vfs_fsync_range+0xfa/0x230
do_fsync+0x3d/0x80
__x64_sys_fsync+0x37/0x50
x64_sys_call+0x1e88/0x20d0
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The root cause is if checkpoint_disabling and lfs_mode are both on,
it will trigger OPU for all overwritten data, it may cost more free
segment than expected, so f2fs must account those data correctly to
calculate cosumed free segments later, and return ENOSPC earlier to
avoid run out of free segment during block allocation.
[1] https://lore.kernel.org/fstests/20241015025106.3203676-1-chao@kernel.org/
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
release_compress_blocks and reserve_compress_blocks
[ Upstream commit 26413ce18e85de3dda2cd3d72c3c3e8ab8f4f996 ]
After release a file and subsequently reserve it, the FSCK flag is set
when the file is deleted, as shown in the following backtrace:
F2FS-fs (dm-48): Inconsistent i_blocks, ino:401231, iblocks:1448, sectors:1472
fs_rec_info_write_type+0x58/0x274
f2fs_rec_info_write+0x1c/0x2c
set_sbi_flag+0x74/0x98
dec_valid_block_count+0x150/0x190
f2fs_truncate_data_blocks_range+0x2d4/0x3cc
f2fs_do_truncate_blocks+0x2fc/0x5f0
f2fs_truncate_blocks+0x68/0x100
f2fs_truncate+0x80/0x128
f2fs_evict_inode+0x1a4/0x794
evict+0xd4/0x280
iput+0x238/0x284
do_unlinkat+0x1ac/0x298
__arm64_sys_unlinkat+0x48/0x68
invoke_syscall+0x58/0x11c
For clusters of the following type, i_blocks are decremented by 1 and
i_compr_blocks are incremented by 7 in release_compress_blocks, while
updates to i_blocks and i_compr_blocks are skipped in reserve_compress_blocks.
raw node:
D D D D D D D D
after compress:
C D D D D D D D
after reserve:
C D D D D D D D
Let's update i_blocks and i_compr_blocks properly in reserve_compress_blocks.
Fixes: eb8fbaa53374 ("f2fs: compress: fix to check unreleased compressed cluster")
Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 128630e1dbec8074c7707aad107299169047e68f ]
Update this log message since cached fids may represent things other
than the root of a mount.
Fixes: e4029e072673 ("cifs: find and use the dentry for cached non-root directories also")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit adc77b19f62d7e80f98400b2fca9d700d2afdd6f ]
Syzbot has reported the following KMSAN splat:
BUG: KMSAN: uninit-value in ocfs2_file_read_iter+0x9a4/0xf80
ocfs2_file_read_iter+0x9a4/0xf80
__io_read+0x8d4/0x20f0
io_read+0x3e/0xf0
io_issue_sqe+0x42b/0x22c0
io_wq_submit_work+0xaf9/0xdc0
io_worker_handle_work+0xd13/0x2110
io_wq_worker+0x447/0x1410
ret_from_fork+0x6f/0x90
ret_from_fork_asm+0x1a/0x30
Uninit was created at:
__alloc_pages_noprof+0x9a7/0xe00
alloc_pages_mpol_noprof+0x299/0x990
alloc_pages_noprof+0x1bf/0x1e0
allocate_slab+0x33a/0x1250
___slab_alloc+0x12ef/0x35e0
kmem_cache_alloc_bulk_noprof+0x486/0x1330
__io_alloc_req_refill+0x84/0x560
io_submit_sqes+0x172f/0x2f30
__se_sys_io_uring_enter+0x406/0x41c0
__x64_sys_io_uring_enter+0x11f/0x1a0
x64_sys_call+0x2b54/0x3ba0
do_syscall_64+0xcd/0x1e0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Since an instance of 'struct kiocb' may be passed from the block layer
with 'private' field uninitialized, introduce 'ocfs2_iocb_init_rw_locked()'
and use it from where 'ocfs2_dio_end_io()' might take care, i.e. in
'ocfs2_file_read_iter()' and 'ocfs2_file_write_iter()'.
Link: https://lkml.kernel.org/r/20241029091736.1501946-1-dmantipov@yandex.ru
Fixes: 7cdfc3a1c397 ("ocfs2: Remember rw lock level during direct io")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reported-by: syzbot+a73e253cca4f0230a5a5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a73e253cca4f0230a5a5
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 82e33f249f1126cf3c5f39a31b850d485ac33bc3 ]
Coccinelle complains about the nested reuse of the pointer `iter' with
different pointer type:
./fs/proc/kcore.c:515:26-30: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:534:23-27: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:550:40-44: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:568:27-31: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:581:28-32: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:599:27-31: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:607:38-42: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:614:26-30: ERROR: invalid reference to the index variable of the iterator on line 499
Replacing `struct kcore_list *iter' with `struct kcore_list *tmp' doesn't change the
scope and the functionality is the same and coccinelle seems happy.
NOTE: There was an issue with using `struct kcore_list *pos' as the nested iterator.
The build did not work!
[akpm@linux-foundation.org: s/tmp/pos/]
Link: https://lkml.kernel.org/r/20241029054651.86356-2-mtodorovac69@gmail.com
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Link: https://lkml.kernel.org/r/20220331223700.902556-1-jakobkoschel@gmail.com
Fixes: 04d168c6d42d ("fs/proc/kcore.c: remove check of list iterator against head past the loop body")
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Brian Johannesmeyer" <bjohannesmeyer@gmail.com>
Cc: Cristiano Giuffrida <c.giuffrida@vu.nl>
Cc: "Bos, H.J." <h.j.bos@vu.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Yan Zhen <yanzhen@vivo.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 156bb2c569cd869583c593d27a5bd69e7b2a4264 ]
utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:
kernel BUG at kernel/module/main.c:786!
RIP: 0010:__symbol_put+0x93/0xb0
Call Trace:
<TASK>
? __die_body.cold+0x19/0x27
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x65/0x80
? __symbol_put+0x93/0xb0
? exc_invalid_op+0x51/0x70
? __symbol_put+0x93/0xb0
? asm_exc_invalid_op+0x1a/0x20
? __pfx_cmp_name+0x10/0x10
? __symbol_put+0x93/0xb0
? __symbol_put+0x62/0xb0
utf8_load+0xf8/0x150
That happens because symbol_put() expects the unique string that
identify the symbol, instead of a pointer to the loaded symbol. Fix that
by using such string.
Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1c82587cb57687de3f18ab4b98a8850c789bedcf ]
Devices block sizes may change. One of these cases is a loop device by
using ioctl LOOP_SET_BLOCK_SIZE.
While this may cause other issues like IO being rejected, in the case of
hfsplus, it will allocate a block by using that size and potentially write
out-of-bounds when hfsplus_read_wrapper calls hfsplus_submit_bio and the
latter function reads a different io_size.
Using a new min_io_size initally set to sb_min_blocksize works for the
purposes of the original fix, since it will be set to the max between
HFSPLUS_SECTOR_SIZE and the first seen logical block size. We still use the
max between HFSPLUS_SECTOR_SIZE and min_io_size in case the latter is not
initialized.
Tested by mounting an hfsplus filesystem with loop block sizes 512, 1024
and 4096.
The produced KASAN report before the fix looks like this:
[ 419.944641] ==================================================================
[ 419.945655] BUG: KASAN: slab-use-after-free in hfsplus_read_wrapper+0x659/0xa0a
[ 419.946703] Read of size 2 at addr ffff88800721fc00 by task repro/10678
[ 419.947612]
[ 419.947846] CPU: 0 UID: 0 PID: 10678 Comm: repro Not tainted 6.12.0-rc5-00008-gdf56e0f2f3ca #84
[ 419.949007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 419.950035] Call Trace:
[ 419.950384] <TASK>
[ 419.950676] dump_stack_lvl+0x57/0x78
[ 419.951212] ? hfsplus_read_wrapper+0x659/0xa0a
[ 419.951830] print_report+0x14c/0x49e
[ 419.952361] ? __virt_addr_valid+0x267/0x278
[ 419.952979] ? kmem_cache_debug_flags+0xc/0x1d
[ 419.953561] ? hfsplus_read_wrapper+0x659/0xa0a
[ 419.954231] kasan_report+0x89/0xb0
[ 419.954748] ? hfsplus_read_wrapper+0x659/0xa0a
[ 419.955367] hfsplus_read_wrapper+0x659/0xa0a
[ 419.955948] ? __pfx_hfsplus_read_wrapper+0x10/0x10
[ 419.956618] ? do_raw_spin_unlock+0x59/0x1a9
[ 419.957214] ? _raw_spin_unlock+0x1a/0x2e
[ 419.957772] hfsplus_fill_super+0x348/0x1590
[ 419.958355] ? hlock_class+0x4c/0x109
[ 419.958867] ? __pfx_hfsplus_fill_super+0x10/0x10
[ 419.959499] ? __pfx_string+0x10/0x10
[ 419.960006] ? lock_acquire+0x3e2/0x454
[ 419.960532] ? bdev_name.constprop.0+0xce/0x243
[ 419.961129] ? __pfx_bdev_name.constprop.0+0x10/0x10
[ 419.961799] ? pointer+0x3f0/0x62f
[ 419.962277] ? __pfx_pointer+0x10/0x10
[ 419.962761] ? vsnprintf+0x6c4/0xfba
[ 419.963178] ? __pfx_vsnprintf+0x10/0x10
[ 419.963621] ? setup_bdev_super+0x376/0x3b3
[ 419.964029] ? snprintf+0x9d/0xd2
[ 419.964344] ? __pfx_snprintf+0x10/0x10
[ 419.964675] ? lock_acquired+0x45c/0x5e9
[ 419.965016] ? set_blocksize+0x139/0x1c1
[ 419.965381] ? sb_set_blocksize+0x6d/0xae
[ 419.965742] ? __pfx_hfsplus_fill_super+0x10/0x10
[ 419.966179] mount_bdev+0x12f/0x1bf
[ 419.966512] ? __pfx_mount_bdev+0x10/0x10
[ 419.966886] ? vfs_parse_fs_string+0xce/0x111
[ 419.967293] ? __pfx_vfs_parse_fs_string+0x10/0x10
[ 419.967702] ? __pfx_hfsplus_mount+0x10/0x10
[ 419.968073] legacy_get_tree+0x104/0x178
[ 419.968414] vfs_get_tree+0x86/0x296
[ 419.968751] path_mount+0xba3/0xd0b
[ 419.969157] ? __pfx_path_mount+0x10/0x10
[ 419.969594] ? kmem_cache_free+0x1e2/0x260
[ 419.970311] do_mount+0x99/0xe0
[ 419.970630] ? __pfx_do_mount+0x10/0x10
[ 419.971008] __do_sys_mount+0x199/0x1c9
[ 419.971397] do_syscall_64+0xd0/0x135
[ 419.971761] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 419.972233] RIP: 0033:0x7c3cb812972e
[ 419.972564] Code: 48 8b 0d f5 46 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c2 46 0d 00 f7 d8 64 89 01 48
[ 419.974371] RSP: 002b:00007ffe30632548 EFLAGS: 00000286 ORIG_RAX: 00000000000000a5
[ 419.975048] RAX: ffffffffffffffda RBX: 00007ffe306328d8 RCX: 00007c3cb812972e
[ 419.975701] RDX: 0000000020000000 RSI: 0000000020000c80 RDI: 00007ffe306325d0
[ 419.976363] RBP: 00007ffe30632720 R08: 00007ffe30632610 R09: 0000000000000000
[ 419.977034] R10: 0000000000200008 R11: 0000000000000286 R12: 0000000000000000
[ 419.977713] R13: 00007ffe306328e8 R14: 00005a0eb298bc68 R15: 00007c3cb8356000
[ 419.978375] </TASK>
[ 419.978589]
Fixes: 6596528e391a ("hfsplus: ensure bio requests are not smaller than the hardware sectors")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://lore.kernel.org/r/20241107114109.839253-1-cascardo@igalia.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 22f9400a6f3560629478e0a64247b8fcc811a24d ]
In fscache_create_volume(), there is a missing memory barrier between the
bit-clearing operation and the wake-up operation. This may cause a
situation where, after a wake-up, the bit-clearing operation hasn't been
detected yet, leading to an indefinite wait. The triggering process is as
follows:
[cookie1] [cookie2] [volume_work]
fscache_perform_lookup
fscache_create_volume
fscache_perform_lookup
fscache_create_volume
fscache_create_volume_work
cachefiles_acquire_volume
clear_and_wake_up_bit
test_and_set_bit
test_and_set_bit
goto maybe_wait
goto no_wait
In the above process, cookie1 and cookie2 has the same volume. When cookie1
enters the -no_wait- process, it will clear the bit and wake up the waiting
process. If a barrier is missing, it may cause cookie2 to remain in the
-wait- process indefinitely.
In commit 3288666c7256 ("fscache: Use clear_and_wake_up_bit() in
fscache_create_volume_work()"), barriers were added to similar operations
in fscache_create_volume_work(), but fscache_create_volume() was missed.
By combining the clear and wake operations into clear_and_wake_up_bit() to
fix this issue.
Fixes: bfa22da3ed65 ("fscache: Provide and use cache methods to lookup/create/free a volume")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://lore.kernel.org/r/20241107110649.3980193-6-wozizhi@huawei.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 56f4856b425a30e1d8b3e41e6cde8bfba90ba5f8 ]
In the erofs on-demand loading scenario, read and write operations are
usually delivered through "off" and "len" contained in read req in user
mode. Naturally, pwrite is used to specify a specific offset to complete
write operations.
However, if the write(not pwrite) syscall is called multiple times in the
read-ahead scenario, we need to manually update ki_pos after each write
operation to update file->f_pos.
This step is currently missing from the cachefiles_ondemand_fd_write_iter
function, added to address this issue.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://lore.kernel.org/r/20241107110649.3980193-3-wozizhi@huawei.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 76486b104168ae59703190566e372badf433314b ]
When we remount filesystem with 'abort' mount option while changing
other mount options as well (as is LTP test doing), we can return error
from the system call after commit d3476f3dad4a ("ext4: don't set
SB_RDONLY after filesystem errors") because the application of mount
option changes detects shutdown filesystem and refuses to do anything.
The behavior of application of other mount options in presence of
'abort' mount option is currently rather arbitary as some mount option
changes are handled before 'abort' and some after it.
Move aborting of the filesystem to the end of remount handling so all
requested changes are properly applied before the filesystem is shutdown
to have a reasonably consistent behavior.
Fixes: d3476f3dad4a ("ext4: don't set SB_RDONLY after filesystem errors")
Reported-by: Jan Stancek <jstancek@redhat.com>
Link: https://lore.kernel.org/all/Zvp6L+oFnfASaoHl@t14s
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Jan Stancek <jstancek@redhat.com>
Link: https://patch.msgid.link/20241004221556.19222-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 22b8d707b07e6e06f50fe1d9ca8756e1f894eb0d ]
'abort' mount option is the only mount option that has special handling
and sets a bit in sbi->s_mount_flags. There is not strong reason for
that so just simplify the code and make 'abort' set a bit in
sbi->s_mount_opt2 as any other mount option. This simplifies the code
and will allow us to drop EXT4_MF_FS_ABORTED completely in the following
patch.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230616165109.21695-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 76486b104168 ("ext4: avoid remount errors with 'abort' mount option")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7f7b850689ac06a62befe26e1fd1806799e7f152 ]
It's observed that a crash occurs during hot-remove a memory device,
in which user is accessing the hugetlb. See calltrace as following:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790
Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s
mirror dm_region_hash dm_log dm_mod
CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:do_user_addr_fault+0x2a0/0x790
Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41
RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046
RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658
R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000
FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __warn+0x8d/0x190
? do_user_addr_fault+0x2a0/0x790
? report_bug+0x1c3/0x1d0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? do_user_addr_fault+0x2a0/0x790
? exc_page_fault+0x31/0x200
exc_page_fault+0x68/0x200
<...snip...>
BUG: unable to handle page fault for address: 0000000000001000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
---[ end trace 0000000000000000 ]---
BUG: unable to handle page fault for address: 0000000000001000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ #492
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dentry_name+0x1f4/0x440
<...snip...>
? dentry_name+0x2fa/0x440
vsnprintf+0x1f3/0x4f0
vprintk_store+0x23a/0x540
vprintk_emit+0x6d/0x330
_printk+0x58/0x80
dump_mapping+0x10b/0x1a0
? __pfx_free_object_rcu+0x10/0x10
__dump_page+0x26b/0x3e0
? vprintk_emit+0xe0/0x330
? _printk+0x58/0x80
? dump_page+0x17/0x50
dump_page+0x17/0x50
do_migrate_range+0x2f7/0x7f0
? do_migrate_range+0x42/0x7f0
? offline_pages+0x2f4/0x8c0
offline_pages+0x60a/0x8c0
memory_subsys_offline+0x9f/0x1c0
? lockdep_hardirqs_on+0x77/0x100
? _raw_spin_unlock_irqrestore+0x38/0x60
device_offline+0xe3/0x110
state_store+0x6e/0xc0
kernfs_fop_write_iter+0x143/0x200
vfs_write+0x39f/0x560
ksys_write+0x65/0xf0
do_syscall_64+0x62/0x130
Previously, some sanity check have been done in dump_mapping() before
the print facility parsing '%pd' though, it's still possible to run into
an invalid dentry.d_name.name.
Since dump_mapping() only needs to dump the filename only, retrieve it
by itself in a safer way to prevent an unnecessary crash.
Note that either retrieving the filename with '%pd' or
strncpy_from_kernel_nofault(), the filename could be unreliable.
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/20240826055503.1522320-1-lizhijian@fujitsu.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[Xiangyu: Bp to fix CVE: CVE-2024-49934, modified strscpy step due to 6.1/6.6 need pass
the max len to strscpy]
Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit e2a8910af01653c1c268984855629d71fb81f404 upstream.
ReparseDataLength is sum of the InodeType size and DataBuffer size.
So to get DataBuffer size it is needed to subtract InodeType's size from
ReparseDataLength.
Function cifs_strndup_from_utf16() is currentlly accessing buf->DataBuffer
at position after the end of the buffer because it does not subtract
InodeType size from the length. Fix this problem and correctly subtract
variable len.
Member InodeType is present only when reparse buffer is large enough. Check
for ReparseDataLength before accessing InodeType to prevent another invalid
memory access.
Major and minor rdev values are present also only when reparse buffer is
large enough. Check for reparse buffer size before calling reparse_mkdev().
Fixes: d5ecebc4900d ("smb3: Allow query of symlinks stored as reparse points")
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
[use variable name symlink_buf, the other buf->InodeType accesses are
not used in current version so skip]
Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 84b9749a3a704dcc824a88aa8267247c801d51e4 ]
seq_printf is costy, on a system with n CPUs, reading /proc/softirqs
would yield 10*n decimal values, and the extra cost parsing format string
grows linearly with number of cpus. Replace seq_printf with
seq_put_decimal_ull_width have significant performance improvement.
On an 8CPUs system, reading /proc/softirqs show ~40% performance
gain with this patch.
Signed-off-by: David Wang <00107082@163.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6630036b7c228f57c7893ee0403e92c2db2cd21d ]
If an iget fails due to not being able to retrieve information
from the server then the inode structure is only partially
initialized. When the inode gets evicted, references to
uninitialized structures (like fscache cookies) were being
made.
This patch checks for a bad_inode before doing anything other
than clearing the inode from the cache. Since the inode is
bad, it shouldn't have any state associated with it that needs
to be written back (and there really isn't a way to complete
those anyways).
Reported-by: syzbot+eb83fe1cce5833cd66a0@syzkaller.appspotmail.com
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
(cherry picked from commit 1b4cb6e91f19b81217ad98142ee53a1ab25893fd)
[Xiangyu: CVE-2024-36923 Minor conflict resolution due to missing 4eb31178 ]
Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c6cd2e8d2d9aa7ee35b1fa6a668e32a22a9753da upstream.
I found potencial out-of-bounds when buffer offset fields of a few requests
is invalid. This patch set the minimum value of buffer offset field to
->Buffer offset to validate buffer length.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Vamsi Krishna Brahmajosyula <vamsi-krishna.brahmajosyula@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a80a486d72e20bd12c335bcd38b6e6f19356b0aa upstream.
If ->NameOffset of smb2_create_req is smaller than Buffer offset of
smb2_create_req, slab-out-of-bounds read can happen from smb2_open.
This patch set the minimum value of the name offset to the buffer offset
to validate name length of smb2_create_req().
Cc: stable@vger.kernel.org
Reported-by: Xuanzhe Yu <yuxuanzhe@outlook.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: c6cd2e8d2d9a ("ksmbd: fix potencial out-of-bounds when buffer offset is invalid")
Signed-off-by: Vamsi Krishna Brahmajosyula <vamsi-krishna.brahmajosyula@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|