| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 68d05693f8c031257a0822464366e1c2a239a512 ]
mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync # avoid CP_UMOUNT_FLAG in last f2fs_checkpoint.ckpt_flags
touch /mnt/f2fs/bar
f2fs_io fsync /mnt/f2fs/bar
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
blockdev --setro /dev/vdd
mount /dev/vdd /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
For the case if we create and fsync a new inode before sudden power-cut,
without norecovery or disable_roll_forward mount option, the following
mount will succeed w/o recovering last fsynced inode.
The problem here is that we only check inode_list list after
find_fsync_dnodes() in f2fs_recover_fsync_data() to find out whether
there is recoverable data in the iamge, but there is a missed case, if
last fsynced inode is not existing in last checkpoint, then, we will
fail to get its inode due to nat of inode node is not existing in last
checkpoint, so the inode won't be linked in inode_list.
Let's detect such case in dyrun mode to fix this issue.
After this change, mount will fail as expected below:
mount: /mnt/f2fs: cannot mount /dev/vdd read-only.
dmesg(1) may have more information after failed mount system call.
demsg:
F2FS-fs (vdd): Need to recover fsync data, but write access unavailable, please try mount w/ disable_roll_forward or norecovery
Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ folio => page ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1f27ef42bb0b7c0740c5616ec577ec188b8a1d05 ]
As Hong Yun reported in mailing list:
loop7: detected capacity change from 0 to 131072
------------[ cut here ]------------
kmem_cache of name 'f2fs_xattr_entry-7:7' already exists
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 kmem_cache_sanity_check mm/slab_common.c:109 [inline]
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 __kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
CPU: 0 UID: 0 PID: 24426 Comm: syz.7.1370 Not tainted 6.17.0-rc4 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:kmem_cache_sanity_check mm/slab_common.c:109 [inline]
RIP: 0010:__kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
Call Trace:
__kmem_cache_create include/linux/slab.h:353 [inline]
f2fs_kmem_cache_create fs/f2fs/f2fs.h:2943 [inline]
f2fs_init_xattr_caches+0xa5/0xe0 fs/f2fs/xattr.c:843
f2fs_fill_super+0x1645/0x2620 fs/f2fs/super.c:4918
get_tree_bdev_flags+0x1fb/0x260 fs/super.c:1692
vfs_get_tree+0x43/0x140 fs/super.c:1815
do_new_mount+0x201/0x550 fs/namespace.c:3808
do_mount fs/namespace.c:4136 [inline]
__do_sys_mount fs/namespace.c:4347 [inline]
__se_sys_mount+0x298/0x2f0 fs/namespace.c:4324
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x8e/0x3a0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug can be reproduced w/ below scripts:
- mount /dev/vdb /mnt1
- mount /dev/vdc /mnt2
- umount /mnt1
- mounnt /dev/vdb /mnt1
The reason is if we created two slab caches, named f2fs_xattr_entry-7:3
and f2fs_xattr_entry-7:7, and they have the same slab size. Actually,
slab system will only create one slab cache core structure which has
slab name of "f2fs_xattr_entry-7:3", and two slab caches share the same
structure and cache address.
So, if we destroy f2fs_xattr_entry-7:3 cache w/ cache address, it will
decrease reference count of slab cache, rather than release slab cache
entirely, since there is one more user has referenced the cache.
Then, if we try to create slab cache w/ name "f2fs_xattr_entry-7:3" again,
slab system will find that there is existed cache which has the same name
and trigger the warning.
Let's changes to use global inline_xattr_slab instead of per-sb slab cache
for fixing.
Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups")
Cc: stable@kernel.org
Reported-by: Hong Yun <yhong@link.cuhk.edu.hk>
Tested-by: Hong Yun <yhong@link.cuhk.edu.hk>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ folio => page ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit be112e7449a6e1b54aa9feac618825d154b3a5c7 ]
In order to let userspace detect such error rather than suffering
silent failure.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ Adjust context, no rollback ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 10b591e7fb7cdc8c1e53e9c000dc0ef7069aaa76 ]
Bai, Shuangpeng <sjb7183@psu.edu> reported a bug as below:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 11441 Comm: syz.0.46 Not tainted 6.17.0 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:f2fs_all_cluster_page_ready+0x106/0x550 fs/f2fs/compress.c:857
Call Trace:
<TASK>
f2fs_write_cache_pages fs/f2fs/data.c:3078 [inline]
__f2fs_write_data_pages fs/f2fs/data.c:3290 [inline]
f2fs_write_data_pages+0x1c19/0x3600 fs/f2fs/data.c:3317
do_writepages+0x38e/0x640 mm/page-writeback.c:2634
filemap_fdatawrite_wbc mm/filemap.c:386 [inline]
__filemap_fdatawrite_range mm/filemap.c:419 [inline]
file_write_and_wait_range+0x2ba/0x3e0 mm/filemap.c:794
f2fs_do_sync_file+0x6e6/0x1b00 fs/f2fs/file.c:294
generic_write_sync include/linux/fs.h:3043 [inline]
f2fs_file_write_iter+0x76e/0x2700 fs/f2fs/file.c:5259
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x7e9/0xe00 fs/read_write.c:686
ksys_write+0x19d/0x2d0 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf7/0x470 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The bug was triggered w/ below race condition:
fsync setattr ioctl
- f2fs_do_sync_file
- file_write_and_wait_range
- f2fs_write_cache_pages
: inode is non-compressed
: cc.cluster_size =
F2FS_I(inode)->i_cluster_size = 0
- tag_pages_for_writeback
- f2fs_setattr
- truncate_setsize
- f2fs_truncate
- f2fs_fileattr_set
- f2fs_setflags_common
- set_compress_context
: F2FS_I(inode)->i_cluster_size = 4
: set_inode_flag(inode, FI_COMPRESSED_FILE)
- f2fs_compressed_file
: return true
- f2fs_all_cluster_page_ready
: "pgidx % cc->cluster_size" trigger dividing 0 issue
Let's change as below to fix this issue:
- introduce a new atomic type variable .writeback in structure f2fs_inode_info
to track the number of threads which calling f2fs_write_cache_pages().
- use .i_sem lock to protect .writeback update.
- check .writeback before update compression context in f2fs_setflags_common()
to avoid race w/ ->writepages.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Cc: stable@kernel.org
Reported-by: Bai, Shuangpeng <sjb7183@psu.edu>
Tested-by: Bai, Shuangpeng <sjb7183@psu.edu>
Closes: https://lore.kernel.org/lkml/44D8F7B3-68AD-425F-9915-65D27591F93F@psu.edu
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 078cad8212ce4f4ebbafcc0936475b8215e1ca2a ]
Let's drop the inode from the donation list when there is no other
open file.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 10b591e7fb7c ("f2fs: fix to avoid updating compression context during writeback")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ef0c333cad8d1940f132a7ce15f15920216a3bd5 ]
This patch records POSIX_FADV_NOREUSE ranges for users to reclaim the caches
instantly off from LRU.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 10b591e7fb7c ("f2fs: fix to avoid updating compression context during writeback")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 968c4f72b23c0c8f1e94e942eab89b8c5a3022e7 ]
After commit 3db1de0e582c ("f2fs: change the current atomic write way"),
we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[]
array to i_pin_failure for cleanup.
Meanwhile, let's define i_current_depth and i_gc_failures as union
variable due to they won't be valid at the same time.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 10b591e7fb7c ("f2fs: fix to avoid updating compression context during writeback")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ca8b201f28547e28343a6f00a6e91fa8c09572fe ]
As Jiaming Zhang and syzbot reported, there is potential deadlock in
f2fs as below:
Chain exists of:
&sbi->cp_rwsem --> fs_reclaim --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
rlock(&sbi->cp_rwsem);
*** DEADLOCK ***
3 locks held by kswapd0/73:
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:7015 [inline]
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x951/0x2800 mm/vmscan.c:7389
#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_trylock_shared fs/super.c:562 [inline]
#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_cache_scan+0x91/0x4b0 fs/super.c:197
#2: ffff888011840610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x8d9/0x1b60 fs/f2fs/inode.c:890
stack backtrace:
CPU: 0 UID: 0 PID: 73 Comm: kswapd0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
__lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_read+0x46/0x2e0 kernel/locking/rwsem.c:1537
f2fs_down_read fs/f2fs/f2fs.h:2278 [inline]
f2fs_lock_op fs/f2fs/f2fs.h:2357 [inline]
f2fs_do_truncate_blocks+0x21c/0x10c0 fs/f2fs/file.c:791
f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:867
f2fs_truncate+0x489/0x7c0 fs/f2fs/file.c:925
f2fs_evict_inode+0x9f2/0x1b60 fs/f2fs/inode.c:897
evict+0x504/0x9c0 fs/inode.c:810
f2fs_evict_inode+0x1dc/0x1b60 fs/f2fs/inode.c:853
evict+0x504/0x9c0 fs/inode.c:810
dispose_list fs/inode.c:852 [inline]
prune_icache_sb+0x21b/0x2c0 fs/inode.c:1000
super_cache_scan+0x39b/0x4b0 fs/super.c:224
do_shrink_slab+0x6ef/0x1110 mm/shrinker.c:437
shrink_slab_memcg mm/shrinker.c:550 [inline]
shrink_slab+0x7ef/0x10d0 mm/shrinker.c:628
shrink_one+0x28a/0x7c0 mm/vmscan.c:4955
shrink_many mm/vmscan.c:5016 [inline]
lru_gen_shrink_node mm/vmscan.c:5094 [inline]
shrink_node+0x315d/0x3780 mm/vmscan.c:6081
kswapd_shrink_node mm/vmscan.c:6941 [inline]
balance_pgdat mm/vmscan.c:7124 [inline]
kswapd+0x147c/0x2800 mm/vmscan.c:7389
kthread+0x70e/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
The root cause is deadlock among four locks as below:
kswapd
- fs_reclaim --- Lock A
- shrink_one
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- iput
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- f2fs_truncate
- f2fs_truncate_blocks
- f2fs_do_truncate_blocks
- f2fs_lock_op --- Lock C
ioctl
- f2fs_ioc_commit_atomic_write
- f2fs_lock_op --- Lock C
- __f2fs_commit_atomic_write
- __replace_atomic_write_block
- f2fs_get_dnode_of_data
- __get_node_folio
- f2fs_check_nid_range
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
open
- do_open
- do_truncate
- security_inode_need_killpriv
- f2fs_getxattr
- lookup_all_xattrs
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
- f2fs_commit_super
- read_mapping_folio
- filemap_alloc_folio_noprof
- prepare_alloc_pages
- fs_reclaim_acquire --- Lock A
In order to avoid such deadlock, we need to avoid grabbing sb_lock in
f2fs_handle_error(), so, let's use asynchronous method instead:
- remove f2fs_handle_error() implementation
- rename f2fs_handle_error_async() to f2fs_handle_error()
- spread f2fs_handle_error()
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Cc: stable@kernel.org
Reported-by: syzbot+14b90e1156b9f6fc1266@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68eae49b.050a0220.ac43.0001.GAE@google.com
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Closes: https://lore.kernel.org/lkml/CANypQFa-Gy9sD-N35o3PC+FystOWkNuN8pv6S75HLT0ga-Tzgw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0b8eb814e05885cde53c1d56ee012a029b8413e6 ]
Use f2fs_err_ratelimited() to instead f2fs_err() in
f2fs_record_stop_reason() and f2fs_record_errors() to
avoid redundant logs.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: ca8b201f2854 ("f2fs: fix to avoid potential deadlock")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 01fba45deaddcce0d0b01c411435d1acf6feab7b upstream.
With below scripts, it will trigger panic in f2fs:
mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync
echo 111 >> /mnt/f2fs/foo
f2fs_io fsync /mnt/f2fs/foo
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
mount -o ro,norecovery /dev/vdd /mnt/f2fs
or
mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0
F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f
F2FS-fs (vdd): Stopped filesystem due to reason: 0
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1
Filesystem f2fs get_tree() didn't set fc->root, returned 1
------------[ cut here ]------------
kernel BUG at fs/super.c:1761!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:vfs_get_tree.cold+0x18/0x1a
Call Trace:
<TASK>
fc_mount+0x13/0xa0
path_mount+0x34e/0xc50
__x64_sys_mount+0x121/0x150
do_syscall_64+0x84/0x800
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa6cc126cfe
The root cause is we missed to handle error number returned from
f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or
ro,disable_roll_forward mount option, result in returning a positive
error number to vfs_get_tree(), fix it.
Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 27bf6a637b7613fc85fa6af468b7d612d78cd5c0 upstream.
The age extent cache uses last_blocks (derived from
allocated_data_blocks) to determine data age. However, there's a
conflict between the deletion
marker (last_blocks=0) and legitimate last_blocks=0 cases when
allocated_data_blocks overflows to 0 after reaching ULLONG_MAX.
In this case, valid extents are incorrectly skipped due to the
"if (!tei->last_blocks)" check in __update_extent_tree_range().
This patch fixes the issue by:
1. Reserving ULLONG_MAX as an invalid/deletion marker
2. Limiting allocated_data_blocks to range [0, ULLONG_MAX-1]
3. Using F2FS_EXTENT_AGE_INVALID for deletion scenarios
4. Adjusting overflow age calculation from ULLONG_MAX to (ULLONG_MAX-1)
Reproducer (using a patched kernel with allocated_data_blocks
initialized to ULLONG_MAX - 3 for quick testing):
Step 1: Mount and check initial state
# dd if=/dev/zero of=/tmp/test.img bs=1M count=100
# mkfs.f2fs -f /tmp/test.img
# mkdir -p /mnt/f2fs_test
# mount -t f2fs -o loop,age_extent_cache /tmp/test.img /mnt/f2fs_test
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551612 # ULLONG_MAX - 3
Inner Struct Count: tree: 1(0), node: 0
Step 2: Create files and write data to trigger overflow
# touch /mnt/f2fs_test/{1,2,3,4}.txt; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551613 # ULLONG_MAX - 2
Inner Struct Count: tree: 5(0), node: 1
# dd if=/dev/urandom of=/mnt/f2fs_test/1.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551614 # ULLONG_MAX - 1
Inner Struct Count: tree: 5(0), node: 2
# dd if=/dev/urandom of=/mnt/f2fs_test/2.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551615 # ULLONG_MAX
Inner Struct Count: tree: 5(0), node: 3
# dd if=/dev/urandom of=/mnt/f2fs_test/3.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 0 # Counter overflowed!
Inner Struct Count: tree: 5(0), node: 4
Step 3: Trigger the bug - next write should create node but gets skipped
# dd if=/dev/urandom of=/mnt/f2fs_test/4.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 1
Inner Struct Count: tree: 5(0), node: 4
Expected: node: 5 (new extent node for 4.txt)
Actual: node: 4 (extent insertion was incorrectly skipped due to
last_blocks = allocated_data_blocks = 0 in __get_new_block_age)
After this fix, the extent node is correctly inserted and node count
becomes 5 as expected.
Fixes: 71644dff4811 ("f2fs: add block_age-based extent cache")
Cc: stable@kernel.org
Signed-off-by: Xiaole He <hexiaole1994@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d33f89b34aa313f50f9a512d58dd288999f246b0 upstream.
F2FS can mount filesystems with corrupted directory depth values that
get runtime-clamped to MAX_DIR_HASH_DEPTH. When RENAME_WHITEOUT
operations are performed on such directories, f2fs_rename performs
directory modifications (updating target entry and deleting source
entry) before attempting to add the whiteout entry via f2fs_add_link.
If f2fs_add_link fails due to the corrupted directory structure, the
function returns an error to VFS, but the partial directory
modifications have already been committed to disk. VFS assumes the
entire rename operation failed and does not update the dentry cache,
leaving stale mappings.
In the error path, VFS does not call d_move() to update the dentry
cache. This results in new_dentry still pointing to the old inode
(new_inode) which has already had its i_nlink decremented to zero.
The stale cache causes subsequent operations to incorrectly reference
the freed inode.
This causes subsequent operations to use cached dentry information that
no longer matches the on-disk state. When a second rename targets the
same entry, VFS attempts to decrement i_nlink on the stale inode, which
may already have i_nlink=0, triggering a WARNING in drop_nlink().
Example sequence:
1. First rename (RENAME_WHITEOUT): file2 → file1
- f2fs updates file1 entry on disk (points to inode 8)
- f2fs deletes file2 entry on disk
- f2fs_add_link(whiteout) fails (corrupted directory)
- Returns error to VFS
- VFS does not call d_move() due to error
- VFS cache still has: file1 → inode 7 (stale!)
- inode 7 has i_nlink=0 (already decremented)
2. Second rename: file3 → file1
- VFS uses stale cache: file1 → inode 7
- Tries to drop_nlink on inode 7 (i_nlink already 0)
- WARNING in drop_nlink()
Fix this by explicitly invalidating old_dentry and new_dentry when
f2fs_add_link fails during whiteout creation. This forces VFS to
refresh from disk on subsequent operations, ensuring cache consistency
even when the rename partially succeeds.
Reproducer:
1. Mount F2FS image with corrupted i_current_depth
2. renameat2(file2, file1, RENAME_WHITEOUT)
3. renameat2(file3, file1, 0)
4. System triggers WARNING in drop_nlink()
Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT")
Reported-by: syzbot+632cf32276a9a564188d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=632cf32276a9a564188d
Suggested-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/all/20251022233349.102728-1-kartikey406@gmail.com/ [v1]
Cc: stable@vger.kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7c37c79510329cd951a4dedf3f7bf7e2b18dccec upstream.
As syzbot reported:
F2FS-fs (loop0): __update_extent_tree_range: extent len is zero, type: 0, extent [0, 0, 0], age [0, 0]
------------[ cut here ]------------
kernel BUG at fs/f2fs/extent_cache.c:678!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__update_extent_tree_range+0x13bc/0x1500 fs/f2fs/extent_cache.c:678
Call Trace:
<TASK>
f2fs_update_read_extent_cache_range+0x192/0x3e0 fs/f2fs/extent_cache.c:1085
f2fs_do_zero_range fs/f2fs/file.c:1657 [inline]
f2fs_zero_range+0x10c1/0x1580 fs/f2fs/file.c:1737
f2fs_fallocate+0x583/0x990 fs/f2fs/file.c:2030
vfs_fallocate+0x669/0x7e0 fs/open.c:342
ioctl_preallocate fs/ioctl.c:289 [inline]
file_ioctl+0x611/0x780 fs/ioctl.c:-1
do_vfs_ioctl+0xb33/0x1430 fs/ioctl.c:576
__do_sys_ioctl fs/ioctl.c:595 [inline]
__se_sys_ioctl+0x82/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f07bc58eec9
In error path of f2fs_zero_range(), it may add a zero-sized extent
into extent cache, it should be avoided.
Fixes: 6e9619499f53 ("f2fs: support in batch fzero in dnode page")
Cc: stable@kernel.org
Reported-by: syzbot+24124df3170c3638b35f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68e5d698.050a0220.256323.0032.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 297baa4aa263ff8f5b3d246ee16a660d76aa82c4 upstream.
Xfstests generic/335, generic/336 sometimes crash with the following message:
F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1
------------[ cut here ]------------
kernel BUG at fs/f2fs/super.c:1939!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G W 6.17.0-rc5-xfstests-g9dd1835ecda5 #1 PREEMPT(none)
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_put_super+0x3b3/0x3c0
Call Trace:
<TASK>
generic_shutdown_super+0x7e/0x190
kill_block_super+0x1a/0x40
kill_f2fs_super+0x9d/0x190
deactivate_locked_super+0x30/0xb0
cleanup_mnt+0xba/0x150
task_work_run+0x5c/0xa0
exit_to_user_mode_loop+0xb7/0xc0
do_syscall_64+0x1ae/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
---[ end trace 0000000000000000 ]---
It appears that sometimes it is possible that f2fs_put_super() is called before
all node page reads are completed.
Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem.
Cc: stable@kernel.org
Fixes: 20872584b8c0b ("f2fs: fix to drop all dirty meta/node pages during umount()")
Signed-off-by: Jan Prusakowski <jprusakowski@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 39868685c2a94a70762bc6d77dc81d781d05bff5 ]
The decompress_io_ctx may be released asynchronously after
I/O completion. If this file is deleted immediately after read,
and the kworker of processing post_read_wq has not been executed yet
due to high workloads, It is possible that the inode(f2fs_inode_info)
is evicted and freed before it is used f2fs_free_dic.
The UAF case as below:
Thread A Thread B
- f2fs_decompress_end_io
- f2fs_put_dic
- queue_work
add free_dic work to post_read_wq
- do_unlink
- iput
- evict
- call_rcu
This file is deleted after read.
Thread C kworker to process post_read_wq
- rcu_do_batch
- f2fs_free_inode
- kmem_cache_free
inode is freed by rcu
- process_scheduled_works
- f2fs_late_free_dic
- f2fs_free_dic
- f2fs_release_decomp_mem
read (dic->inode)->i_compress_algorithm
This patch store compress_algorithm and sbi in dic to avoid inode UAF.
In addition, the previous solution is deprecated in [1] may cause system hang.
[1] https://lore.kernel.org/all/c36ab955-c8db-4a8b-a9d0-f07b5f426c3f@kernel.org
Cc: Daeho Jeong <daehojeong@google.com>
Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Baocong Liu <baocong.liu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ In Linux 6.6.y, the f2fs_vmalloc() function parameters are not
related to the f2fs_sb_info structure, the code changes for
f2fs_vmalloc() have not been backported. ]
Signed-off-by: Bin Lan <lanbincn@139.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8e2a9b656474d67c55010f2c003ea2cf889a19ff ]
No logic changes, just cleanup and prepare for fixing the UAF issue
in f2fs_free_dic.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Baocong Liu <baocong.liu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bin Lan <lanbincn@139.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0fe1c6bec54ea68ed8c987b3890f2296364e77bb ]
Should cast type of folio->index from pgoff_t to loff_t to avoid overflow
while left shift operation.
Fixes: 3265d3db1f16 ("f2fs: support partial truncation on compressed inode")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ Modification: Using rpages[i]->index instead of folio->index due to
it was changed since commit:1cda5bc0b2fe ("f2fs: Use a folio in
f2fs_truncate_partial_cluster()") on 6.14 ]
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 23361bd54966b437e1ed3eb1a704572f4b279e58 ]
When we get wrong extent info data, and look up extent_node in rb tree,
it will cause infinite loop (CONFIG_F2FS_CHECK_FS=n). Avoiding this by
return NULL and print some kernel messages in that case.
Signed-off-by: wangzijie <wangzijie1@honor.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 9d5c4f5c7a2c7677e1b3942772122b032c265aae upstream.
Assuming the disk layout as below,
disk0: 0 --- 0x00035abfff
disk1: 0x00035ac000 --- 0x00037abfff
disk2: 0x00037ac000 --- 0x00037ebfff
and we want to read data from offset=13568 having len=128 across the block
devices, we can illustrate the block addresses like below.
0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 -------
| ^ ^ ^
| fofs 0 13568 13568+128
| ------------------------------------------------------
| LBA 0x37e8aa9 0x37ebfa9 0x37ec029
--- map 0x3caa9 0x3ffa9
In this example, we should give the relative map of the target block device
ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by
0x37ebfff + 1 - 0x37ebfa9.
In the below equation, however, map->m_pblk was supposed to be the original
address instead of the one from the target block address.
- map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
Cc: stable@vger.kernel.org
Fixes: 71f2c8206202 ("f2fs: multidevice: support direct IO")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8175c864391753b210f3dcfae1aeed686a226ebb ]
Script to reproduce:
f2fs_io write 1 0 1881 rand dsync testfile
f2fs_io fallocate 0 7708672 4096 testfile
f2fs_io write 1 1881 1 rand buffered testfile
fsync testfile
umount
mount
f2fs_io precache_extents testfile
When the data layout is something like this:
dnode1: dnode2:
[0] A [0] NEW_ADDR
[1] A+1 [1] 0x0
...
[1016] A+1016
[1017] B (B!=A+1017) [1017] 0x0
During precache_extents, we map the last block(valid blkaddr) in dnode1:
map->m_flags |= F2FS_MAP_MAPPED;
map->m_pblk = blkaddr(valid blkaddr);
map->m_len = 1;
then we goto next_dnode, meet the first block in dnode2(hole), goto sync_out:
map->m_flags & F2FS_MAP_MAPPED == true, and we make zero-sized extent:
map->m_len = 1
ofs = start_pgofs - map->m_lblk = 1882 - 1881 = 1
ei.fofs = start_pgofs = 1882
ei.len = map->m_len - ofs = 1 - 1 = 0
Rebased on patch[1], this patch can cover these cases to avoid zero-sized extent:
A,B,C is valid blkaddr
case1:
dnode1: dnode2:
[0] A [0] NEW_ADDR
[1] A+1 [1] 0x0
... ....
[1016] A+1016
[1017] B (B!=A+1017) [1017] 0x0
case2:
dnode1: dnode2:
[0] A [0] C (C!=B+1)
[1] A+1 [1] C+1
... ....
[1016] A+1016
[1017] B (B!=A+1017) [1017] 0x0
case3:
dnode1: dnode2:
[0] A [0] C (C!=B+2)
[1] A+1 [1] C+1
... ....
[1015] A+1015
[1016] B (B!=A+1016)
[1017] B+1 [1017] 0x0
[1] https://lore.kernel.org/linux-f2fs-devel/20250912081250.44383-1-chao@kernel.org/
Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS")
Signed-off-by: wangzijie <wangzijie1@honor.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c2f7c32b254006ad48f8e4efb2e7e7bf71739f17 ]
f2fs_zero_post_eof_page() may cuase more overhead due to invalidate_lock
and page lookup, change as below to mitigate its overhead:
- check new_size before grabbing invalidate_lock
- lookup and invalidate pages only in range of [old_size, new_size]
Fixes: ba8dac350faf ("f2fs: fix to zero post-eof page")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9251a9e6e871cb03c4714a18efa8f5d4a8818450 ]
syzbot reports a bug as below:
loop0: detected capacity change from 0 to 40427
F2FS-fs (loop0): Wrong SSA boundary, start(3584) end(4096) blocks(3072)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): f2fs_convert_inline_folio: corrupted inline inode ino=3, i_addr[0]:0x1601, run fsck to fix.
------------[ cut here ]------------
kernel BUG at fs/inode.c:753!
RIP: 0010:clear_inode+0x169/0x190 fs/inode.c:753
Call Trace:
<TASK>
evict+0x504/0x9c0 fs/inode.c:810
f2fs_fill_super+0x5612/0x6fa0 fs/f2fs/super.c:5047
get_tree_bdev_flags+0x40e/0x4d0 fs/super.c:1692
vfs_get_tree+0x8f/0x2b0 fs/super.c:1815
do_new_mount+0x2a2/0x9e0 fs/namespace.c:3808
do_mount fs/namespace.c:4136 [inline]
__do_sys_mount fs/namespace.c:4347 [inline]
__se_sys_mount+0x317/0x410 fs/namespace.c:4324
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
During f2fs_evict_inode(), clear_inode() detects that we missed to truncate
all page cache before destorying inode, that is because in below path, we
will create page #0 in cache, but missed to drop it in error path, let's fix
it.
- evict
- f2fs_evict_inode
- f2fs_truncate
- f2fs_convert_inline_inode
- f2fs_grab_cache_folio
: create page #0 in cache
- f2fs_convert_inline_folio
: sanity check failed, return -EFSCORRUPTED
- clear_inode detects that inode->i_data.nrpages is not zero
Fixes: 92dffd01790a ("f2fs: convert inline_data when i_size becomes large")
Reported-by: syzbot+90266696fe5daacebd35@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68c09802.050a0220.3c6139.000e.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 869833f54e8306326b85ca3ed08979b7ad412a4a ]
Script to reproduce:
mkfs.f2fs -O extra_attr,compression /dev/vdb -f
mount /dev/vdb /mnt/f2fs -o mode=lfs,noextent_cache
cd /mnt/f2fs
f2fs_io write 1 0 1024 rand dsync testfile
xfs_io testfile -c "fsync"
f2fs_io write 1 0 512 rand dsync testfile
xfs_io testfile -c "fsync"
cd /
umount /mnt/f2fs
mount /dev/vdb /mnt/f2fs
f2fs_io precache_extents /mnt/f2fs/testfile
umount /mnt/f2fs
Tracepoint output:
f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 0, len = 512, blkaddr = 1055744, c_len = 0
f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 513, len = 351, blkaddr = 17921, c_len = 0
f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 864, len = 160, blkaddr = 18272, c_len = 0
During precache_extents, there is off-by-one issue, we should update
map->m_next_extent to pgofs rather than pgofs + 1, if last blkaddr is
valid and not contiguous to previous extent.
Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e75ce117905d2830976a289e718470f3230fa30a ]
If reserve_root mount option is not assigned, __allow_reserved_blocks()
will return false, it's not correct, fix it.
Fixes: 7e65be49ed94 ("f2fs: add reserved blocks for root user")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 77de19b6867f2740cdcb6c9c7e50d522b47847a4 upstream.
As Jiaming Zhang reported:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x1c1/0x2a0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x17e/0x800 mm/kasan/report.c:480
kasan_report+0x147/0x180 mm/kasan/report.c:593
data_blkaddr fs/f2fs/f2fs.h:3053 [inline]
f2fs_data_blkaddr fs/f2fs/f2fs.h:3058 [inline]
f2fs_get_dnode_of_data+0x1a09/0x1c40 fs/f2fs/node.c:855
f2fs_reserve_block+0x53/0x310 fs/f2fs/data.c:1195
prepare_write_begin fs/f2fs/data.c:3395 [inline]
f2fs_write_begin+0xf39/0x2190 fs/f2fs/data.c:3594
generic_perform_write+0x2c7/0x910 mm/filemap.c:4112
f2fs_buffered_write_iter fs/f2fs/file.c:4988 [inline]
f2fs_file_write_iter+0x1ec8/0x2410 fs/f2fs/file.c:5216
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x546/0xa90 fs/read_write.c:686
ksys_write+0x149/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf3/0x3d0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The root cause is in the corrupted image, there is a dnode has the same
node id w/ its inode, so during f2fs_get_dnode_of_data(), it tries to
access block address in dnode at offset 934, however it parses the dnode
as inode node, so that get_dnode_addr() returns 360, then it tries to
access page address from 360 + 934 * 4 = 4096 w/ 4 bytes.
To fix this issue, let's add sanity check for node id of all direct nodes
during f2fs_get_dnode_of_data().
Cc: stable@kernel.org
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Closes: https://groups.google.com/g/syzkaller/c/-ZnaaOOfO3M
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e23ab8028de0d92df5921a570f5212c0370db3b5 ]
Let's return errors caught by the generic checks. This fixes generic/494 where
it expects to see EBUSY by setattr_prepare instead of EINVAL by f2fs for active
swapfile.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1005a3ca28e90c7a64fa43023f866b960a60f791 ]
w/ "mode=lfs" mount option, generic/299 will cause system panic as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2835!
Call Trace:
<TASK>
f2fs_allocate_data_block+0x6f4/0xc50
f2fs_map_blocks+0x970/0x1550
f2fs_iomap_begin+0xb2/0x1e0
iomap_iter+0x1d6/0x430
__iomap_dio_rw+0x208/0x9a0
f2fs_file_write_iter+0x6b3/0xfa0
aio_write+0x15d/0x2e0
io_submit_one+0x55e/0xab0
__x64_sys_io_submit+0xa5/0x230
do_syscall_64+0x84/0x2f0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0010:new_curseg+0x70f/0x720
The root cause of we run out-of-space is: in f2fs_map_blocks(), f2fs may
trigger foreground gc only if it allocates any physical block, it will be
a little bit later when there is multiple threads writing data w/
aio/dio/bufio method in parallel, since we always use OPU in lfs mode, so
f2fs_map_blocks() does block allocations aggressively.
In order to fix this issue, let's give a chance to trigger foreground
gc in prior to block allocation in f2fs_map_blocks().
Fixes: 36abef4e796d ("f2fs: introduce mode=lfs mount option")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e194e140ab7de2ce2782e64b9e086a43ca6ff4f2 ]
In lfs mode, dirty data needs OPU, we'd better calculate lower_p and
upper_p w/ them during has_not_enough_free_secs(), otherwise we may
encounter out-of-space issue due to we missed to reclaim enough
free section w/ foreground gc.
Fixes: 36abef4e796d ("f2fs: introduce mode=lfs mount option")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6840faddb65683b4e7bd8196f177b038a1e19faf ]
Commit 1acd73edbbfe ("f2fs: fix to account dirty data in __get_secs_required()")
missed to calculate upper_p w/ data_secs, fix it.
Fixes: 1acd73edbbfe ("f2fs: fix to account dirty data in __get_secs_required()")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 08a7efc5b02a0620ae16aa9584060e980a69cb55 ]
When testing F2FS with xfstests using UFS backed virtual disks the
kernel complains sometimes that f2fs_release_decomp_mem() calls
vm_unmap_ram() from an invalid context. Example trace from
f2fs/007 test:
f2fs/007 5s ... [12:59:38][ 8.902525] run fstests f2fs/007
[ 11.468026] BUG: sleeping function called from invalid context at mm/vmalloc.c:2978
[ 11.471849] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 68, name: irq/22-ufshcd
[ 11.475357] preempt_count: 1, expected: 0
[ 11.476970] RCU nest depth: 0, expected: 0
[ 11.478531] CPU: 0 UID: 0 PID: 68 Comm: irq/22-ufshcd Tainted: G W 6.16.0-rc5-xfstests-ufs-g40f92e79b0aa #9 PREEMPT(none)
[ 11.478535] Tainted: [W]=WARN
[ 11.478536] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 11.478537] Call Trace:
[ 11.478543] <TASK>
[ 11.478545] dump_stack_lvl+0x4e/0x70
[ 11.478554] __might_resched.cold+0xaf/0xbe
[ 11.478557] vm_unmap_ram+0x21/0xb0
[ 11.478560] f2fs_release_decomp_mem+0x59/0x80
[ 11.478563] f2fs_free_dic+0x18/0x1a0
[ 11.478565] f2fs_finish_read_bio+0xd7/0x290
[ 11.478570] blk_update_request+0xec/0x3b0
[ 11.478574] ? sbitmap_queue_clear+0x3b/0x60
[ 11.478576] scsi_end_request+0x27/0x1a0
[ 11.478582] scsi_io_completion+0x40/0x300
[ 11.478583] ufshcd_mcq_poll_cqe_lock+0xa3/0xe0
[ 11.478588] ufshcd_sl_intr+0x194/0x1f0
[ 11.478592] ufshcd_threaded_intr+0x68/0xb0
[ 11.478594] ? __pfx_irq_thread_fn+0x10/0x10
[ 11.478599] irq_thread_fn+0x20/0x60
[ 11.478602] ? __pfx_irq_thread_fn+0x10/0x10
[ 11.478603] irq_thread+0xb9/0x180
[ 11.478605] ? __pfx_irq_thread_dtor+0x10/0x10
[ 11.478607] ? __pfx_irq_thread+0x10/0x10
[ 11.478609] kthread+0x10a/0x230
[ 11.478614] ? __pfx_kthread+0x10/0x10
[ 11.478615] ret_from_fork+0x7e/0xd0
[ 11.478619] ? __pfx_kthread+0x10/0x10
[ 11.478621] ret_from_fork_asm+0x1a/0x30
[ 11.478623] </TASK>
This patch modifies in_task() check inside f2fs_read_end_io() to also
check if interrupts are disabled. This ensures that pages are unmapped
asynchronously in an interrupt handler.
Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq")
Signed-off-by: Jan Prusakowski <jprusakowski@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5661998536af52848cc4d52a377e90368196edea ]
- touch /mnt/f2fs/012345678901234567890123456789012345678901234567890123
- truncate -s $((1024*1024*1024)) \
/mnt/f2fs/012345678901234567890123456789012345678901234567890123
- touch /mnt/f2fs/file
- truncate -s $((1024*1024*1024)) /mnt/f2fs/file
- mkfs.f2fs /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \
-c /mnt/f2fs/file
- mount /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \
/mnt/f2fs/loop
[16937.192225] F2FS-fs (loop0): Mount Device [ 0]: /mnt/f2fs/012345678901234567890123456789012345678901234567890123\xff\x01, 511, 0 - 3ffff
[16937.192268] F2FS-fs (loop0): Failed to find devices
If device path length equals to MAX_PATH_LEN, sbi->devs.path[] may
not end up w/ null character due to path array is fully filled, So
accidently, fields locate after path[] may be treated as part of
device path, result in parsing wrong device path.
struct f2fs_dev_info {
...
char path[MAX_PATH_LEN];
...
};
Let's add one byte space for sbi->devs.path[] to store null
character of device path string.
Fixes: 3c62be17d4f5 ("f2fs: support multiple devices")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a509a55f8eecc8970b3980c6f06886bbff0e2f68 ]
As syzbot [1] reported as below:
R10: 0000000000000100 R11: 0000000000000206 R12: 00007ffe17473450
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
</TASK>
---[ end trace 0000000000000000 ]---
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff88812d962278 by task syz-executor/564
CPU: 1 PID: 564 Comm: syz-executor Tainted: G W 6.1.129-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
<TASK>
__dump_stack+0x21/0x24 lib/dump_stack.c:88
dump_stack_lvl+0xee/0x158 lib/dump_stack.c:106
print_address_description+0x71/0x210 mm/kasan/report.c:316
print_report+0x4a/0x60 mm/kasan/report.c:427
kasan_report+0x122/0x150 mm/kasan/report.c:531
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:351
__list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
__list_del_entry include/linux/list.h:134 [inline]
list_del_init include/linux/list.h:206 [inline]
f2fs_inode_synced+0xf7/0x2e0 fs/f2fs/super.c:1531
f2fs_update_inode+0x74/0x1c40 fs/f2fs/inode.c:585
f2fs_update_inode_page+0x137/0x170 fs/f2fs/inode.c:703
f2fs_write_inode+0x4ec/0x770 fs/f2fs/inode.c:731
write_inode fs/fs-writeback.c:1460 [inline]
__writeback_single_inode+0x4a0/0xab0 fs/fs-writeback.c:1677
writeback_single_inode+0x221/0x8b0 fs/fs-writeback.c:1733
sync_inode_metadata+0xb6/0x110 fs/fs-writeback.c:2789
f2fs_sync_inode_meta+0x16d/0x2a0 fs/f2fs/checkpoint.c:1159
block_operations fs/f2fs/checkpoint.c:1269 [inline]
f2fs_write_checkpoint+0xca3/0x2100 fs/f2fs/checkpoint.c:1658
kill_f2fs_super+0x231/0x390 fs/f2fs/super.c:4668
deactivate_locked_super+0x98/0x100 fs/super.c:332
deactivate_super+0xaf/0xe0 fs/super.c:363
cleanup_mnt+0x45f/0x4e0 fs/namespace.c:1186
__cleanup_mnt+0x19/0x20 fs/namespace.c:1193
task_work_run+0x1c6/0x230 kernel/task_work.c:203
exit_task_work include/linux/task_work.h:39 [inline]
do_exit+0x9fb/0x2410 kernel/exit.c:871
do_group_exit+0x210/0x2d0 kernel/exit.c:1021
__do_sys_exit_group kernel/exit.c:1032 [inline]
__se_sys_exit_group kernel/exit.c:1030 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1030
x64_sys_call+0x7b4/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:232
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
RIP: 0033:0x7f28b1b8e169
Code: Unable to access opcode bytes at 0x7f28b1b8e13f.
RSP: 002b:00007ffe174710a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f28b1c10879 RCX: 00007f28b1b8e169
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000002 R08: 00007ffe1746ee47 R09: 00007ffe17472360
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffe17472360
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
</TASK>
Allocated by task 569:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_alloc_info+0x25/0x30 mm/kasan/generic.c:505
__kasan_slab_alloc+0x72/0x80 mm/kasan/common.c:328
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook+0x4f/0x2c0 mm/slab.h:737
slab_alloc_node mm/slub.c:3398 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc_lru+0x104/0x220 mm/slub.c:3429
alloc_inode_sb include/linux/fs.h:3245 [inline]
f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x186/0x880 fs/inode.c:1373
f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
f2fs_lookup+0x366/0xab0 fs/f2fs/namei.c:487
__lookup_slow+0x2a3/0x3d0 fs/namei.c:1690
lookup_slow+0x57/0x70 fs/namei.c:1707
walk_component+0x2e6/0x410 fs/namei.c:1998
lookup_last fs/namei.c:2455 [inline]
path_lookupat+0x180/0x490 fs/namei.c:2479
filename_lookup+0x1f0/0x500 fs/namei.c:2508
vfs_statx+0x10b/0x660 fs/stat.c:229
vfs_fstatat fs/stat.c:267 [inline]
vfs_lstat include/linux/fs.h:3424 [inline]
__do_sys_newlstat fs/stat.c:423 [inline]
__se_sys_newlstat+0xd5/0x350 fs/stat.c:417
__x64_sys_newlstat+0x5b/0x70 fs/stat.c:417
x64_sys_call+0x393/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:7
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
Freed by task 13:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_free_info+0x31/0x50 mm/kasan/generic.c:516
____kasan_slab_free+0x132/0x180 mm/kasan/common.c:236
__kasan_slab_free+0x11/0x20 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:177 [inline]
slab_free_hook mm/slub.c:1724 [inline]
slab_free_freelist_hook+0xc2/0x190 mm/slub.c:1750
slab_free mm/slub.c:3661 [inline]
kmem_cache_free+0x12d/0x2a0 mm/slub.c:3683
f2fs_free_inode+0x24/0x30 fs/f2fs/super.c:1562
i_callback+0x4c/0x70 fs/inode.c:250
rcu_do_batch+0x503/0xb80 kernel/rcu/tree.c:2297
rcu_core+0x5a2/0xe70 kernel/rcu/tree.c:2557
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2574
handle_softirqs+0x178/0x500 kernel/softirq.c:578
run_ksoftirqd+0x28/0x30 kernel/softirq.c:945
smpboot_thread_fn+0x45a/0x8c0 kernel/smpboot.c:164
kthread+0x270/0x310 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Last potentially related work creation:
kasan_save_stack+0x3a/0x60 mm/kasan/common.c:45
__kasan_record_aux_stack+0xb6/0xc0 mm/kasan/generic.c:486
kasan_record_aux_stack_noalloc+0xb/0x10 mm/kasan/generic.c:496
call_rcu+0xd4/0xf70 kernel/rcu/tree.c:2845
destroy_inode fs/inode.c:316 [inline]
evict+0x7da/0x870 fs/inode.c:720
iput_final fs/inode.c:1834 [inline]
iput+0x62b/0x830 fs/inode.c:1860
do_unlinkat+0x356/0x540 fs/namei.c:4397
__do_sys_unlink fs/namei.c:4438 [inline]
__se_sys_unlink fs/namei.c:4436 [inline]
__x64_sys_unlink+0x49/0x50 fs/namei.c:4436
x64_sys_call+0x958/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:88
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
The buggy address belongs to the object at ffff88812d961f20
which belongs to the cache f2fs_inode_cache of size 1200
The buggy address is located 856 bytes inside of
1200-byte region [ffff88812d961f20, ffff88812d9623d0)
The buggy address belongs to the physical page:
page:ffffea0004b65800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12d960
head:ffffea0004b65800 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 0000000000000000 dead000000000122 ffff88810a94c500
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Reclaimable, gfp_mask 0x1d2050(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_RECLAIMABLE), pid 569, tgid 568 (syz.2.16), ts 55943246141, free_ts 0
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1d0/0x1f0 mm/page_alloc.c:2532
prep_new_page mm/page_alloc.c:2539 [inline]
get_page_from_freelist+0x2e63/0x2ef0 mm/page_alloc.c:4328
__alloc_pages+0x235/0x4b0 mm/page_alloc.c:5605
alloc_slab_page include/linux/gfp.h:-1 [inline]
allocate_slab mm/slub.c:1939 [inline]
new_slab+0xec/0x4b0 mm/slub.c:1992
___slab_alloc+0x6f6/0xb50 mm/slub.c:3180
__slab_alloc+0x5e/0xa0 mm/slub.c:3279
slab_alloc_node mm/slub.c:3364 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc_lru+0x13f/0x220 mm/slub.c:3429
alloc_inode_sb include/linux/fs.h:3245 [inline]
f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x186/0x880 fs/inode.c:1373
f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
f2fs_fill_super+0x3ad7/0x6bb0 fs/f2fs/super.c:4293
mount_bdev+0x2ae/0x3e0 fs/super.c:1443
f2fs_mount+0x34/0x40 fs/f2fs/super.c:4642
legacy_get_tree+0xea/0x190 fs/fs_context.c:632
vfs_get_tree+0x89/0x260 fs/super.c:1573
do_new_mount+0x25a/0xa20 fs/namespace.c:3056
page_owner free stack trace missing
Memory state around the buggy address:
ffff88812d962100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88812d962180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88812d962200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88812d962280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88812d962300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
[1] https://syzkaller.appspot.com/x/report.txt?x=13448368580000
This bug can be reproduced w/ the reproducer [2], once we enable
CONFIG_F2FS_CHECK_FS config, the reproducer will trigger panic as below,
so the direct reason of this bug is the same as the one below patch [3]
fixed.
kernel BUG at fs/f2fs/inode.c:857!
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20
Call Trace:
<TASK>
evict+0x32a/0x7a0
do_unlinkat+0x37b/0x5b0
__x64_sys_unlink+0xad/0x100
do_syscall_64+0x5a/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20
[2] https://syzkaller.appspot.com/x/repro.c?x=17495ccc580000
[3] https://lore.kernel.org/linux-f2fs-devel/20250702120321.1080759-1-chao@kernel.org
Tracepoints before panic:
f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file1
f2fs_unlink_exit: dev = (7,0), ino = 7, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 7, pino = 3, i_mode = 0x81ed, i_size = 10, i_nlink = 0, i_blocks = 0, i_advise = 0x0
f2fs_truncate_node: dev = (7,0), ino = 7, nid = 8, block_address = 0x3c05
f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file3
f2fs_unlink_exit: dev = (7,0), ino = 8, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 9000, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 0, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate_blocks_enter: dev = (7,0), ino = 8, i_size = 0, i_blocks = 24, start file offset = 0
f2fs_truncate_blocks_exit: dev = (7,0), ino = 8, ret = -2
The root cause is: in the fuzzed image, dnode #8 belongs to inode #7,
after inode #7 eviction, dnode #8 was dropped.
However there is dirent that has ino #8, so, once we unlink file3, in
f2fs_evict_inode(), both f2fs_truncate() and f2fs_update_inode_page()
will fail due to we can not load node #8, result in we missed to call
f2fs_inode_synced() to clear inode dirty status.
Let's fix this by calling f2fs_inode_synced() in error path of
f2fs_evict_inode().
PS: As I verified, the reproducer [2] can trigger this bug in v6.1.129,
but it failed in v6.16-rc4, this is because the testcase will stop due to
other corruption has been detected by f2fs:
F2FS-fs (loop0): inconsistent node block, node_type:2, nid:8, node_footer[nid:8,ino:8,ofs:0,cpver:5013063228981249506,blkaddr:15366]
F2FS-fs (loop0): f2fs_lookup: inode (ino=9) has zero i_nlink
Fixes: 0f18b462b2e5 ("f2fs: flush inode metadata when checkpoint is doing")
Closes: https://syzkaller.appspot.com/x/report.txt?x=13448368580000
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7c30d79930132466f5be7d0b57add14d1a016bda ]
syzbot reported an UAF issue as below: [1] [2]
[1] https://syzkaller.appspot.com/text?tag=CrashReport&x=16594c60580000
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff888100567dc8 by task kworker/u4:0/8
CPU: 1 PID: 8 Comm: kworker/u4:0 Tainted: G W 6.1.129-syzkaller-00017-g642656a36791 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Workqueue: writeback wb_workfn (flush-7:0)
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x151/0x1b7 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:316 [inline]
print_report+0x158/0x4e0 mm/kasan/report.c:427
kasan_report+0x13c/0x170 mm/kasan/report.c:531
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:351
__list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
__list_del_entry include/linux/list.h:134 [inline]
list_del_init include/linux/list.h:206 [inline]
f2fs_inode_synced+0x100/0x2e0 fs/f2fs/super.c:1553
f2fs_update_inode+0x72/0x1c40 fs/f2fs/inode.c:588
f2fs_update_inode_page+0x135/0x170 fs/f2fs/inode.c:706
f2fs_write_inode+0x416/0x790 fs/f2fs/inode.c:734
write_inode fs/fs-writeback.c:1460 [inline]
__writeback_single_inode+0x4cf/0xb80 fs/fs-writeback.c:1677
writeback_sb_inodes+0xb32/0x1910 fs/fs-writeback.c:1903
__writeback_inodes_wb+0x118/0x3f0 fs/fs-writeback.c:1974
wb_writeback+0x3da/0xa00 fs/fs-writeback.c:2081
wb_check_background_flush fs/fs-writeback.c:2151 [inline]
wb_do_writeback fs/fs-writeback.c:2239 [inline]
wb_workfn+0xbba/0x1030 fs/fs-writeback.c:2266
process_one_work+0x73d/0xcb0 kernel/workqueue.c:2299
worker_thread+0xa60/0x1260 kernel/workqueue.c:2446
kthread+0x26d/0x300 kernel/kthread.c:386
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Allocated by task 298:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_alloc_info+0x1f/0x30 mm/kasan/generic.c:505
__kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:333
kasan_slab_alloc include/linux/kasan.h:202 [inline]
slab_post_alloc_hook+0x53/0x2c0 mm/slab.h:768
slab_alloc_node mm/slub.c:3421 [inline]
slab_alloc mm/slub.c:3431 [inline]
__kmem_cache_alloc_lru mm/slub.c:3438 [inline]
kmem_cache_alloc_lru+0x102/0x270 mm/slub.c:3454
alloc_inode_sb include/linux/fs.h:3255 [inline]
f2fs_alloc_inode+0x2d/0x350 fs/f2fs/super.c:1437
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x18c/0x7e0 fs/inode.c:1373
f2fs_iget+0x55/0x4ca0 fs/f2fs/inode.c:486
f2fs_lookup+0x3c1/0xb50 fs/f2fs/namei.c:484
__lookup_slow+0x2b9/0x3e0 fs/namei.c:1689
lookup_slow+0x5a/0x80 fs/namei.c:1706
walk_component+0x2e7/0x410 fs/namei.c:1997
lookup_last fs/namei.c:2454 [inline]
path_lookupat+0x16d/0x450 fs/namei.c:2478
filename_lookup+0x251/0x600 fs/namei.c:2507
vfs_statx+0x107/0x4b0 fs/stat.c:229
vfs_fstatat fs/stat.c:267 [inline]
vfs_lstat include/linux/fs.h:3434 [inline]
__do_sys_newlstat fs/stat.c:423 [inline]
__se_sys_newlstat+0xda/0x7c0 fs/stat.c:417
__x64_sys_newlstat+0x5b/0x70 fs/stat.c:417
x64_sys_call+0x52/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:7
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x3b/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
Freed by task 0:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:516
____kasan_slab_free+0x131/0x180 mm/kasan/common.c:241
__kasan_slab_free+0x11/0x20 mm/kasan/common.c:249
kasan_slab_free include/linux/kasan.h:178 [inline]
slab_free_hook mm/slub.c:1745 [inline]
slab_free_freelist_hook mm/slub.c:1771 [inline]
slab_free mm/slub.c:3686 [inline]
kmem_cache_free+0x291/0x560 mm/slub.c:3711
f2fs_free_inode+0x24/0x30 fs/f2fs/super.c:1584
i_callback+0x4b/0x70 fs/inode.c:250
rcu_do_batch+0x552/0xbe0 kernel/rcu/tree.c:2297
rcu_core+0x502/0xf40 kernel/rcu/tree.c:2557
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2574
handle_softirqs+0x1db/0x650 kernel/softirq.c:624
__do_softirq kernel/softirq.c:662 [inline]
invoke_softirq kernel/softirq.c:479 [inline]
__irq_exit_rcu+0x52/0xf0 kernel/softirq.c:711
irq_exit_rcu+0x9/0x10 kernel/softirq.c:723
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1118 [inline]
sysvec_apic_timer_interrupt+0xa9/0xc0 arch/x86/kernel/apic/apic.c:1118
asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:691
Last potentially related work creation:
kasan_save_stack+0x3b/0x60 mm/kasan/common.c:45
__kasan_record_aux_stack+0xb4/0xc0 mm/kasan/generic.c:486
kasan_record_aux_stack_noalloc+0xb/0x10 mm/kasan/generic.c:496
__call_rcu_common kernel/rcu/tree.c:2807 [inline]
call_rcu+0xdc/0x10f0 kernel/rcu/tree.c:2926
destroy_inode fs/inode.c:316 [inline]
evict+0x87d/0x930 fs/inode.c:720
iput_final fs/inode.c:1834 [inline]
iput+0x616/0x690 fs/inode.c:1860
do_unlinkat+0x4e1/0x920 fs/namei.c:4396
__do_sys_unlink fs/namei.c:4437 [inline]
__se_sys_unlink fs/namei.c:4435 [inline]
__x64_sys_unlink+0x49/0x50 fs/namei.c:4435
x64_sys_call+0x289/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:88
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x3b/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
The buggy address belongs to the object at ffff888100567a10
which belongs to the cache f2fs_inode_cache of size 1360
The buggy address is located 952 bytes inside of
1360-byte region [ffff888100567a10, ffff888100567f60)
The buggy address belongs to the physical page:
page:ffffea0004015800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100560
head:ffffea0004015800 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 0000000000000000 dead000000000122 ffff8881002c4d80
raw: 0000000000000000 0000000080160016 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Reclaimable, gfp_mask 0xd2050(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE), pid 298, tgid 298 (syz-executor330), ts 26489303743, free_ts 0
set_page_owner include/linux/page_owner.h:33 [inline]
post_alloc_hook+0x213/0x220 mm/page_alloc.c:2637
prep_new_page+0x1b/0x110 mm/page_alloc.c:2644
get_page_from_freelist+0x3a98/0x3b10 mm/page_alloc.c:4539
__alloc_pages+0x234/0x610 mm/page_alloc.c:5837
alloc_slab_page+0x6c/0xf0 include/linux/gfp.h:-1
allocate_slab mm/slub.c:1962 [inline]
new_slab+0x90/0x3e0 mm/slub.c:2015
___slab_alloc+0x6f9/0xb80 mm/slub.c:3203
__slab_alloc+0x5d/0xa0 mm/slub.c:3302
slab_alloc_node mm/slub.c:3387 [inline]
slab_alloc mm/slub.c:3431 [inline]
__kmem_cache_alloc_lru mm/slub.c:3438 [inline]
kmem_cache_alloc_lru+0x149/0x270 mm/slub.c:3454
alloc_inode_sb include/linux/fs.h:3255 [inline]
f2fs_alloc_inode+0x2d/0x350 fs/f2fs/super.c:1437
alloc_inode fs/inode.c:261 [inline]
iget_locked+0x18c/0x7e0 fs/inode.c:1373
f2fs_iget+0x55/0x4ca0 fs/f2fs/inode.c:486
f2fs_fill_super+0x5360/0x6dc0 fs/f2fs/super.c:4488
mount_bdev+0x282/0x3b0 fs/super.c:1445
f2fs_mount+0x34/0x40 fs/f2fs/super.c:4743
legacy_get_tree+0xf1/0x190 fs/fs_context.c:632
page_owner free stack trace missing
Memory state around the buggy address:
ffff888100567c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888100567d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888100567d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888100567e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888100567e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
[2] https://syzkaller.appspot.com/text?tag=CrashLog&x=13654c60580000
[ 24.675720][ T28] audit: type=1400 audit(1745327318.732:72): avc: denied { write } for pid=298 comm="syz-executor399" name="/" dev="loop0" ino=3 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.705426][ T296] ------------[ cut here ]------------
[ 24.706608][ T28] audit: type=1400 audit(1745327318.732:73): avc: denied { remove_name } for pid=298 comm="syz-executor399" name="file0" dev="loop0" ino=4 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.711550][ T296] WARNING: CPU: 0 PID: 296 at fs/f2fs/inode.c:847 f2fs_evict_inode+0x1262/0x1540
[ 24.734141][ T28] audit: type=1400 audit(1745327318.732:74): avc: denied { rename } for pid=298 comm="syz-executor399" name="file0" dev="loop0" ino=4 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.742969][ T296] Modules linked in:
[ 24.765201][ T28] audit: type=1400 audit(1745327318.732:75): avc: denied { add_name } for pid=298 comm="syz-executor399" name="bus" scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[ 24.768847][ T296] CPU: 0 PID: 296 Comm: syz-executor399 Not tainted 6.1.129-syzkaller-00017-g642656a36791 #0
[ 24.799506][ T296] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
[ 24.809401][ T296] RIP: 0010:f2fs_evict_inode+0x1262/0x1540
[ 24.815018][ T296] Code: 34 70 4a ff eb 0d e8 2d 70 4a ff 4d 89 e5 4c 8b 64 24 18 48 8b 5c 24 28 4c 89 e7 e8 78 38 03 00 e9 84 fc ff ff e8 0e 70 4a ff <0f> 0b 4c 89 f7 be 08 00 00 00 e8 7f 21 92 ff f0 41 80 0e 04 e9 61
[ 24.834584][ T296] RSP: 0018:ffffc90000db7a40 EFLAGS: 00010293
[ 24.840465][ T296] RAX: ffffffff822aca42 RBX: 0000000000000002 RCX: ffff888110948000
[ 24.848291][ T296] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[ 24.856064][ T296] RBP: ffffc90000db7bb0 R08: ffffffff822ac6a8 R09: ffffed10200b005d
[ 24.864073][ T296] R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888100580000
[ 24.871812][ T296] R13: dffffc0000000000 R14: ffff88810fef4078 R15: 1ffff920001b6f5c
The root cause is w/ a fuzzed image, f2fs may missed to clear FI_DIRTY_INODE
flag for target inode, after f2fs_evict_inode(), the inode is still linked in
sbi->inode_list[DIRTY_META] global list, once it triggers checkpoint,
f2fs_sync_inode_meta() may access the released inode.
In f2fs_evict_inode(), let's always call f2fs_inode_synced() to clear
FI_DIRTY_INODE flag and drop inode from global dirty list to avoid this
UAF issue.
Fixes: 0f18b462b2e5 ("f2fs: flush inode metadata when checkpoint is doing")
Closes: https://syzkaller.appspot.com/bug?extid=849174b2efaf0d8be6ba
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 154467f4ad033473e5c903a03e7b9bca7df9a0fa ]
KMSAN reported a use of uninitialized value in `__is_extent_mergeable()`
and `__is_back_mergeable()` via the read extent tree path.
The root cause is that `get_read_extent_info()` only initializes three
fields (`fofs`, `blk`, `len`) of `struct extent_info`, leaving the
remaining fields uninitialized. This leads to undefined behavior
when those fields are accessed later, especially during
extent merging.
Fix it by zero-initializing the `extent_info` struct before population.
Reported-by: syzbot+b8c1d60e95df65e827d4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b8c1d60e95df65e827d4
Fixes: 94afd6d6e525 ("f2fs: extent cache: support unaligned extent")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Abinash Singh <abinashsinghlalotra@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit eb70d5a6c932d9d23f4bb3e7b83782c21ac4b064 upstream.
syzbot reports a f2fs bug as below:
BUG: KASAN: slab-use-after-free in f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49
Read of size 8 at addr ffff88807bb22680 by task syz-executor184/5058
CPU: 0 PID: 5058 Comm: syz-executor184 Not tainted 6.7.0-syzkaller-09928-g052d534373b7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x163/0x540 mm/kasan/report.c:488
kasan_report+0x142/0x170 mm/kasan/report.c:601
f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49
__do_fault+0x131/0x450 mm/memory.c:4376
do_shared_fault mm/memory.c:4798 [inline]
do_fault mm/memory.c:4872 [inline]
do_pte_missing mm/memory.c:3745 [inline]
handle_pte_fault mm/memory.c:5144 [inline]
__handle_mm_fault+0x23b7/0x72b0 mm/memory.c:5285
handle_mm_fault+0x27e/0x770 mm/memory.c:5450
do_user_addr_fault arch/x86/mm/fault.c:1364 [inline]
handle_page_fault arch/x86/mm/fault.c:1507 [inline]
exc_page_fault+0x456/0x870 arch/x86/mm/fault.c:1563
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
The root cause is: in f2fs_filemap_fault(), vmf->vma may be not alive after
filemap_fault(), so it may cause use-after-free issue when accessing
vmf->vma->vm_flags in trace_f2fs_filemap_fault(). So it needs to keep vm_flags
in separated temporary variable for tracepoint use.
Fixes: 87f3afd366f7 ("f2fs: add tracepoint for f2fs_vm_page_mkwrite()")
Reported-and-tested-by: syzbot+763afad57075d3f862f2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000e8222b060f00db3b@google.com
Cc: Ed Tsai <Ed.Tsai@mediatek.com>
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ba8dac350faf16afc129ce6303ca4feaf083ccb1 ]
fstest reports a f2fs bug:
#generic/363 42s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/363.out.bad)
# --- tests/generic/363.out 2025-01-12 21:57:40.271440542 +0800
# +++ /share/git/fstests/results//generic/363.out.bad 2025-05-19 19:55:58.000000000 +0800
# @@ -1,2 +1,78 @@
# QA output created by 363
# fsx -q -S 0 -e 1 -N 100000
# +READ BAD DATA: offset = 0xd6fb, size = 0xf044, fname = /mnt/f2fs/junk
# +OFFSET GOOD BAD RANGE
# +0x1540d 0x0000 0x2a25 0x0
# +operation# (mod 256) for the bad data may be 37
# +0x1540e 0x0000 0x2527 0x1
# ...
# (Run 'diff -u /share/git/fstests/tests/generic/363.out /share/git/fstests/results//generic/363.out.bad' to see the entire diff)
Ran: generic/363
Failures: generic/363
Failed 1 of 1 tests
The root cause is user can update post-eof page via mmap [1], however, f2fs
missed to zero post-eof page in below operations, so, once it expands i_size,
then it will include dummy data locates previous post-eof page, so during
below operations, we need to zero post-eof page.
Operations which can include dummy data after previous i_size after expanding
i_size:
- write
- mapwrite [1]
- truncate
- fallocate
* preallocate
* zero_range
* insert_range
* collapse_range
- clone_range (doesn’t support in f2fs)
- copy_range (doesn’t support in f2fs)
[1] https://man7.org/linux/man-pages/man2/mmap.2.html 'BUG section'
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aec5755951b74e3bbb5ddee39ac142a788547854 ]
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: ba8dac350faf ("f2fs: fix to zero post-eof page")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3fdd89b452c2ea5e2195d6e315bef122769584c9 ]
In a case writing without fallocate(), we can't guarantee it's allocated
in the conventional area for zoned stroage. To make it consistent across
storage devices, we disallow it regardless of storage device types.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: ba8dac350faf ("f2fs: fix to zero post-eof page")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 87f3afd366f7c668be0269efda8a89741a3ea6b3 ]
This patch adds to support tracepoint for f2fs_vm_page_mkwrite(),
meanwhile it prints more details for trace_f2fs_filemap_fault().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: ba8dac350faf ("f2fs: fix to zero post-eof page")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a9201960623287927bf5776de3f70fb2fbde7e02 ]
This fixes an analogus bug that was fixed in modern filesystems:
a) xfs in commit 4b8d867ca6e2 ("xfs: don't over-report free space or
inodes in statvfs")
b) ext4 in commit f87d3af74193 ("ext4: don't over-report free space
or inodes in statvfs")
where statfs can report misleading / incorrect information where
project quota is enabled, and the free space is less than the
remaining quota.
This commit will resolve a test failure in generic/762 which tests
for this bug.
generic/762 - output mismatch (see /share/git/fstests/results//generic/762.out.bad)
# --- tests/generic/762.out 2025-04-15 10:21:53.371067071 +0800
# +++ /share/git/fstests/results//generic/762.out.bad 2025-05-13 16:13:37.000000000 +0800
# @@ -6,8 +6,10 @@
# root blocks2 is in range
# dir blocks2 is in range
# root bavail2 is in range
# -dir bavail2 is in range
# +dir bavail2 has value of 1539066
# +dir bavail2 is NOT in range 304734.87 .. 310891.13
# root blocks3 is in range
# ...
# (Run 'diff -u /share/git/fstests/tests/generic/762.out /share/git/fstests/results//generic/762.out.bad' to see the entire diff)
HINT: You _MAY_ be missing kernel fix:
XXXXXXXXXXXXXX xfs: don't over-report free space or inodes in statvfs
Cc: stable@kernel.org
Fixes: ddc34e328d06 ("f2fs: introduce f2fs_statfs_project")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit db03c20c0850dc8d2bcabfa54b9438f7d666c863 ]
1. After we start atomic write in a database file, before committing
all data, we'd better not set inode w/ vfs dirty status to avoid
redundant updates, instead, we only set inode w/ atomic dirty status.
2. After we commit all data, before committing metadata, we need to
clear atomic dirty status, and set vfs dirty status to allow vfs flush
dirty inode.
Cc: Daeho Jeong <daehojeong@google.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 70dd07c888451503c3e93b6821e10d1ea1ec9930 ]
.init_{,de}compress_ctx uses kvmalloc() to alloc memory, it will try
to allocate physically continuous page first, it may cause more memory
allocation pressure, let's use vmalloc instead to mitigate it.
[Test]
cd /data/local/tmp
touch file
f2fs_io setflags compression file
f2fs_io getflags file
for i in $(seq 1 10); do sync; echo 3 > /proc/sys/vm/drop_caches;\
time f2fs_io write 512 0 4096 zero osync file; truncate -s 0 file;\
done
[Result]
Before After Delta
21.243 21.694 -2.12%
For compression, we recommend to use ioctl to compress file data in
background for workaround.
For decompression, only zstd will be affected.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 5db0d252c64e91ba1929c70112352e85dc5751e7 upstream.
w/ below testcase, resize will generate a corrupted image which
contains inconsistent metadata, so when mounting such image, it
will trigger kernel panic:
touch img
truncate -s $((512*1024*1024*1024)) img
mkfs.f2fs -f img $((256*1024*1024))
resize.f2fs -s -i img -t $((1024*1024*1024))
mount img /mnt/f2fs
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.h:863!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 11 UID: 0 PID: 3922 Comm: mount Not tainted 6.15.0-rc1+ #191 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_ra_meta_pages+0x47c/0x490
Call Trace:
f2fs_build_segment_manager+0x11c3/0x2600
f2fs_fill_super+0xe97/0x2840
mount_bdev+0xf4/0x140
legacy_get_tree+0x2b/0x50
vfs_get_tree+0x29/0xd0
path_mount+0x487/0xaf0
__x64_sys_mount+0x116/0x150
do_syscall_64+0x82/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fdbfde1bcfe
The reaseon is:
sit_i->bitmap_size is 192, so size of sit bitmap is 192*8=1536, at maximum
there are 1536 sit blocks, however MAIN_SEGS is 261893, so that sit_blk_cnt
is 4762, build_sit_entries() -> current_sit_addr() tries to access
out-of-boundary in sit_bitmap at offset from [1536, 4762), once sit_bitmap
and sit_bitmap_mirror is not the same, it will trigger f2fs_bug_on().
Let's add sanity check in f2fs_sanity_check_ckpt() to avoid panic.
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 42cb74a92adaf88061039601ddf7c874f58b554e upstream.
WARNING: CPU: 1 PID: 9426 at fs/inode.c:417 drop_nlink+0xac/0xd0
home/cc/linux/fs/inode.c:417
Modules linked in:
CPU: 1 UID: 0 PID: 9426 Comm: syz-executor568 Not tainted
6.14.0-12627-g94d471a4f428 #2 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:drop_nlink+0xac/0xd0 home/cc/linux/fs/inode.c:417
Code: 48 8b 5d 28 be 08 00 00 00 48 8d bb 70 07 00 00 e8 f9 67 e6 ff
f0 48 ff 83 70 07 00 00 5b 5d e9 9a 12 82 ff e8 95 12 82 ff 90
<0f> 0b 90 c7 45 48 ff ff ff ff 5b 5d e9 83 12 82 ff e8 fe 5f e6
ff
RSP: 0018:ffffc900026b7c28 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8239710f
RDX: ffff888041345a00 RSI: ffffffff8239717b RDI: 0000000000000005
RBP: ffff888054509ad0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff9ab36f08 R12: ffff88804bb40000
R13: ffff8880545091e0 R14: 0000000000008000 R15: ffff8880545091e0
FS: 000055555d0c5880(0000) GS:ffff8880eb3e3000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f915c55b178 CR3: 0000000050d20000 CR4: 0000000000352ef0
Call Trace:
<task>
f2fs_i_links_write home/cc/linux/fs/f2fs/f2fs.h:3194 [inline]
f2fs_drop_nlink+0xd1/0x3c0 home/cc/linux/fs/f2fs/dir.c:845
f2fs_delete_entry+0x542/0x1450 home/cc/linux/fs/f2fs/dir.c:909
f2fs_unlink+0x45c/0x890 home/cc/linux/fs/f2fs/namei.c:581
vfs_unlink+0x2fb/0x9b0 home/cc/linux/fs/namei.c:4544
do_unlinkat+0x4c5/0x6a0 home/cc/linux/fs/namei.c:4608
__do_sys_unlink home/cc/linux/fs/namei.c:4654 [inline]
__se_sys_unlink home/cc/linux/fs/namei.c:4652 [inline]
__x64_sys_unlink+0xc5/0x110 home/cc/linux/fs/namei.c:4652
do_syscall_x64 home/cc/linux/arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xc7/0x250 home/cc/linux/arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb3d092324b
Code: 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66
2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 57 00 00 00 0f 05
<48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01
48
RSP: 002b:00007ffdc232d938 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb3d092324b
RDX: 00007ffdc232d960 RSI: 00007ffdc232d960 RDI: 00007ffdc232d9f0
RBP: 00007ffdc232d9f0 R08: 0000000000000001 R09: 00007ffdc232d7c0
R10: 00000000fffffffd R11: 0000000000000206 R12: 00007ffdc232eaf0
R13: 000055555d0cebb0 R14: 00007ffdc232d958 R15: 0000000000000001
</task>
Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 061cf3a84bde038708eb0f1d065b31b7c2456533 upstream.
syzbot reported a f2fs bug as below:
INFO: task syz-executor140:5308 blocked for more than 143 seconds.
Not tainted 6.14.0-rc7-syzkaller-00069-g81e4f8d68c66 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor140 state:D stack:24016 pid:5308 tgid:5308 ppid:5306 task_flags:0x400140 flags:0x00000006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5378 [inline]
__schedule+0x190e/0x4c90 kernel/sched/core.c:6765
__schedule_loop kernel/sched/core.c:6842 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6857
io_schedule+0x8d/0x110 kernel/sched/core.c:7690
folio_wait_bit_common+0x839/0xee0 mm/filemap.c:1317
__folio_lock mm/filemap.c:1664 [inline]
folio_lock include/linux/pagemap.h:1163 [inline]
__filemap_get_folio+0x147/0xb40 mm/filemap.c:1917
pagecache_get_page+0x2c/0x130 mm/folio-compat.c:87
find_get_page_flags include/linux/pagemap.h:842 [inline]
f2fs_grab_cache_page+0x2b/0x320 fs/f2fs/f2fs.h:2776
__get_node_page+0x131/0x11b0 fs/f2fs/node.c:1463
read_xattr_block+0xfb/0x190 fs/f2fs/xattr.c:306
lookup_all_xattrs fs/f2fs/xattr.c:355 [inline]
f2fs_getxattr+0x676/0xf70 fs/f2fs/xattr.c:533
__f2fs_get_acl+0x52/0x870 fs/f2fs/acl.c:179
f2fs_acl_create fs/f2fs/acl.c:375 [inline]
f2fs_init_acl+0xd7/0x9b0 fs/f2fs/acl.c:418
f2fs_init_inode_metadata+0xa0f/0x1050 fs/f2fs/dir.c:539
f2fs_add_inline_entry+0x448/0x860 fs/f2fs/inline.c:666
f2fs_add_dentry+0xba/0x1e0 fs/f2fs/dir.c:765
f2fs_do_add_link+0x28c/0x3a0 fs/f2fs/dir.c:808
f2fs_add_link fs/f2fs/f2fs.h:3616 [inline]
f2fs_mknod+0x2e8/0x5b0 fs/f2fs/namei.c:766
vfs_mknod+0x36d/0x3b0 fs/namei.c:4191
unix_bind_bsd net/unix/af_unix.c:1286 [inline]
unix_bind+0x563/0xe30 net/unix/af_unix.c:1379
__sys_bind_socket net/socket.c:1817 [inline]
__sys_bind+0x1e4/0x290 net/socket.c:1848
__do_sys_bind net/socket.c:1853 [inline]
__se_sys_bind net/socket.c:1851 [inline]
__x64_sys_bind+0x7a/0x90 net/socket.c:1851
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Let's dump and check metadata of corrupted inode, it shows its xattr_nid
is the same to its i_ino.
dump.f2fs -i 3 chaseyu.img.raw
i_xattr_nid [0x 3 : 3]
So that, during mknod in the corrupted directory, it tries to get and
lock inode page twice, result in deadlock.
- f2fs_mknod
- f2fs_add_inline_entry
- f2fs_get_inode_page --- lock dir's inode page
- f2fs_init_acl
- f2fs_acl_create(dir,..)
- __f2fs_get_acl
- f2fs_getxattr
- lookup_all_xattrs
- __get_node_page --- try to lock dir's inode page
In order to fix this, let's add sanity check on ino and xnid.
Cc: stable@vger.kernel.org
Reported-by: syzbot+cc448dcdc7ae0b4e4ffa@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/67e06150.050a0220.21942d.0005.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9883494c45a13dc88d27dde4f988c04823b42a2f ]
Should be "old_dir" here.
Fixes: 5c57132eaf52 ("f2fs: support project quota")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a6c397a31f58a1d577c2c8d04b624e9baa31951c ]
no logic changes.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aa1be8dd64163eca4dde7fd2557eb19927a06a47 ]
Jan Prusakowski reported a f2fs bug as below:
f2fs/007 will hang kernel during testing w/ below configs:
kernel 6.12.18 (from pixel-kernel/android16-6.12)
export MKFS_OPTIONS="-O encrypt -O extra_attr -O project_quota -O quota"
export F2FS_MOUNT_OPTIONS="test_dummy_encryption,discard,fsync_mode=nobarrier,reserve_root=32768,checkpoint_merge,atgc"
cat /proc/<umount_proc_id>/stack
f2fs_wait_on_all_pages+0xa3/0x130
do_checkpoint+0x40c/0x5d0
f2fs_write_checkpoint+0x258/0x550
kill_f2fs_super+0x14f/0x190
deactivate_locked_super+0x30/0xb0
cleanup_mnt+0xba/0x150
task_work_run+0x59/0xa0
syscall_exit_to_user_mode+0x12d/0x130
do_syscall_64+0x57/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
cat /sys/kernel/debug/f2fs/status
- IO_W (CP: -256, Data: 256, Flush: ( 0 0 1), Discard: ( 0 0)) cmd: 0 undiscard: 0
CP IOs reference count becomes negative.
The root cause is:
After 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block
migration"), we will tag page w/ gcing flag for raw page of cluster
during its migration.
However, if the inode is both encrypted and compressed, during
ioc_decompress(), it will tag page w/ gcing flag, and it increase
F2FS_WB_DATA reference count:
- f2fs_write_multi_page
- f2fs_write_raw_page
- f2fs_write_single_page
- do_write_page
- f2fs_submit_page_write
- WB_DATA_TYPE(bio_page, fio->compressed_page)
: bio_page is encrypted, so mapping is NULL, and fio->compressed_page
is NULL, it returns F2FS_WB_DATA
- inc_page_count(.., F2FS_WB_DATA)
Then, during end_io(), it decrease F2FS_WB_CP_DATA reference count:
- f2fs_write_end_io
- f2fs_compress_write_end_io
- fscrypt_pagecache_folio
: get raw page from encrypted page
- WB_DATA_TYPE(&folio->page, false)
: raw page has gcing flag, it returns F2FS_WB_CP_DATA
- dec_page_count(.., F2FS_WB_CP_DATA)
In order to fix this issue, we need to detect gcing flag in raw page
in f2fs_is_cp_guaranteed().
Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration")
Reported-by: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0c708e35cf26449ca317fcbfc274704660b6d269 ]
Just cleanup, no logic changes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 05872a167c2cab80ef186ef23cc34a6776a1a30c ]
syzbot reported a f2fs bug as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/f2fs.h:2521!
RIP: 0010:dec_valid_block_count+0x3b2/0x3c0 fs/f2fs/f2fs.h:2521
Call Trace:
f2fs_truncate_data_blocks_range+0xc8c/0x11a0 fs/f2fs/file.c:695
truncate_dnode+0x417/0x740 fs/f2fs/node.c:973
truncate_nodes+0x3ec/0xf50 fs/f2fs/node.c:1014
f2fs_truncate_inode_blocks+0x8e3/0x1370 fs/f2fs/node.c:1197
f2fs_do_truncate_blocks+0x840/0x12b0 fs/f2fs/file.c:810
f2fs_truncate_blocks+0x10d/0x300 fs/f2fs/file.c:838
f2fs_truncate+0x417/0x720 fs/f2fs/file.c:888
f2fs_setattr+0xc4f/0x12f0 fs/f2fs/file.c:1112
notify_change+0xbca/0xe90 fs/attr.c:552
do_truncate+0x222/0x310 fs/open.c:65
handle_truncate fs/namei.c:3466 [inline]
do_open fs/namei.c:3849 [inline]
path_openat+0x2e4f/0x35d0 fs/namei.c:4004
do_filp_open+0x284/0x4e0 fs/namei.c:4031
do_sys_openat2+0x12b/0x1d0 fs/open.c:1429
do_sys_open fs/open.c:1444 [inline]
__do_sys_creat fs/open.c:1522 [inline]
__se_sys_creat fs/open.c:1516 [inline]
__x64_sys_creat+0x124/0x170 fs/open.c:1516
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/syscall_64.c:94
The reason is: in fuzzed image, sbi->total_valid_block_count is
inconsistent w/ mapped blocks indexed by inode, so, we should
not trigger panic for such case, instead, let's print log and
set fsck flag.
Fixes: 39a53e0ce0df ("f2fs: add superblock and major in-memory structure")
Reported-by: syzbot+8b376a77b2f364097fbe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/67f3c0b2.050a0220.396535.0547.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|