Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 30dac24e14b52e1787572d1d4e06eeabe8a63630 ]
Most of the callers of wbc_account_cgroup_owner() are converting a folio
to page before calling the function. wbc_account_cgroup_owner() is
converting the page back to a folio to call mem_cgroup_css_from_folio().
Convert wbc_account_cgroup_owner() to take a folio instead of a page,
and convert all callers to pass a folio directly except f2fs.
Convert the page to folio for all the callers from f2fs as they were the
only callers calling wbc_account_cgroup_owner() with a page. As f2fs is
already in the process of converting to folios, these call sites might
also soon be calling wbc_account_cgroup_owner() with a folio directly in
the future.
No functional changes. Only compile tested.
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Stable-dep-of: 51d20d1dacbe ("iomap: fix zero padding data issue in concurrent append writes")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 009a8241a8e5a14ea2dd0b8db42dbf283527dd44 ]
Quoted:
"at this time, there are still 1086911 extent nodes in this zombie
extent tree that need to be cleaned up.
crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500
node_cnt = {
counter = 1086911
},
"
As reported by Xiuhong, there will be a huge number of extent nodes
in extent tree, it may potentially cause:
- slab memory fragments
- extreme long time shrink on extent tree
- low mapping efficiency
Let's add a sysfs node to limit max read extent count for each inode,
by default, value of this threshold is 10240, it can be updated
according to user's requirement.
Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3fc5d5a182f6a1f8bd4dc775feb54c369dd2c343 ]
We use rwlock to protect core structure data of extent tree during
its shrink, however, if there is a huge number of extent nodes in
extent tree, during shrink of extent tree, it may hold rwlock for
a very long time, which may trigger kernel hang issue.
This patch fixes to shrink read extent node in batches, so that,
critical region of the rwlock can be shrunk to avoid its extreme
long time hold.
Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 81520c684ca67aea6a589461a3caebb9b11dcc90 ]
If fs corruption occurs in f2fs_new_node_page(), let's print
more information about corrupted metadata into kernel log.
Meanwhile, it updates to record ERROR_INCONSISTENT_NAT instead
of ERROR_INVALID_BLKADDR if blkaddr in nat entry is not
NULL_ADDR which means nat bitmap and nat entry is inconsistent.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d5c367ef8287fb4d235c46a2f8c8d68715f3a0ca ]
creating a large files during checkpoint disable until it runs out of
space and then delete it, then remount to enable checkpoint again, and
then unmount the filesystem triggers the f2fs_bug_on as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:896!
CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:f2fs_evict_inode+0x58c/0x610
Call Trace:
__die_body+0x15/0x60
die+0x33/0x50
do_trap+0x10a/0x120
f2fs_evict_inode+0x58c/0x610
do_error_trap+0x60/0x80
f2fs_evict_inode+0x58c/0x610
exc_invalid_op+0x53/0x60
f2fs_evict_inode+0x58c/0x610
asm_exc_invalid_op+0x16/0x20
f2fs_evict_inode+0x58c/0x610
evict+0x101/0x260
dispose_list+0x30/0x50
evict_inodes+0x140/0x190
generic_shutdown_super+0x2f/0x150
kill_block_super+0x11/0x40
kill_f2fs_super+0x7d/0x140
deactivate_locked_super+0x2a/0x70
cleanup_mnt+0xb3/0x140
task_work_run+0x61/0x90
The root cause is: creating large files during disable checkpoint
period results in not enough free segments, so when writing back root
inode will failed in f2fs_enable_checkpoint. When umount the file
system after enabling checkpoint, the root inode is dirty in
f2fs_evict_inode function, which triggers BUG_ON. The steps to
reproduce are as follows:
dd if=/dev/zero of=f2fs.img bs=1M count=55
mount f2fs.img f2fs_dir -o checkpoint=disable:10%
dd if=/dev/zero of=big bs=1M count=50
sync
rm big
mount -o remount,checkpoint=enable f2fs_dir
umount f2fs_dir
Let's redirty inode when there is not free segments during checkpoint
is disable.
Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6787a82245857271133b63ae7f72f1dc9f29e985 ]
dd if=/dev/zero of=file bs=4k count=5
xfs_io file -c "fiemap -v 2 16384"
file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..31]: 139272..139303 32 0x1000
1: [32..39]: 139304..139311 8 0x1001
xfs_io file -c "fiemap -v 0 16384"
file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..31]: 139272..139303 32 0x1000
xfs_io file -c "fiemap -v 0 16385"
file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..39]: 139272..139311 40 0x1001
There are two problems:
- continuous extent is split to two
- FIEMAP_EXTENT_LAST is missing in last extent
The root cause is: if upper boundary of inquiry crosses extent,
f2fs_map_blocks() will truncate length of returned extent to
F2FS_BYTES_TO_BLK(len), and also, it will stop to query latter
extent or hole to make sure current extent is last or not.
In order to fix this issue, once we found an extent locates
in the end of inquiry range by f2fs_map_blocks(), we need to
expand inquiry range to requiry.
Cc: stable@vger.kernel.org
Fixes: 7f63eb77af7b ("f2fs: report unwritten area in f2fs_fiemap")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 77569f785c8624fa4189795fb52e635a973672e5 ]
If user give a file size as "length" parameter for fiemap
operations, but if this size is non-block size aligned,
it will show 2 segments fiemap results even this whole file
is contiguous on disk, such as the following results:
./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
logical addr. physical addr. length flags
0 0000000000000000 0000000020baa000 0000000000004000 00001000
1 0000000000004000 0000000020bae000 0000000000001000 00001001
after this patch:
./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
logical addr. physical addr. length flags
0 0000000000000000 00000000315f3000 0000000000005000 00001001
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 6787a8224585 ("f2fs: fix to requery extent which cross boundary of inquiry")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7461f37094180200cb2f98e60ef99a0cea97beec ]
f2fs doesn't support different blksize in one instance, so
bytes_to_blks() and blks_to_bytes() are equal to F2FS_BYTES_TO_BLK
and F2FS_BLK_TO_BYTES, let's use F2FS_BYTES_TO_BLK/F2FS_BLK_TO_BYTES
instead for cleanup.
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 6787a8224585 ("f2fs: fix to requery extent which cross boundary of inquiry")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit bc8aeb04fd80cb8cfae3058445c84410fd0beb5e upstream.
Piergiorgio reported a bug in bugzilla as below:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
__issue_discard_cmd+0x1ca/0x350 [f2fs]
issue_discard_thread+0x191/0x480 [f2fs]
kthread+0xcf/0x100
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1a/0x30
w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs
The root cause is: it will update discard_max_bytes of mounted lvm
device to zero after creating snapshot on this lvm device, then,
__submit_discard_cmd() will pass parameter @nr_sects w/ zero value
to __blkdev_issue_discard(), it returns a NULL bio pointer, result
in panic.
This patch changes as below for fixing:
1. Let's drop all remained discards in f2fs_unfreeze() if snapshot
of lvm device is created.
2. Checking discard_max_bytes before submitting discard during
__submit_discard_cmd().
Cc: stable@vger.kernel.org
Fixes: 35ec7d574884 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6babe00ccd34fc65b78ef8b99754e32b4385f23d upstream.
syzbot reports a f2fs bug as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2534!
RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534
Call Trace:
truncate_node+0x1ae/0x8c0 fs/f2fs/node.c:909
f2fs_remove_inode_page+0x5c2/0x870 fs/f2fs/node.c:1288
f2fs_evict_inode+0x879/0x15c0 fs/f2fs/inode.c:856
evict+0x4e8/0x9b0 fs/inode.c:723
f2fs_handle_failed_inode+0x271/0x2e0 fs/f2fs/inode.c:986
f2fs_create+0x357/0x530 fs/f2fs/namei.c:394
lookup_open fs/namei.c:3595 [inline]
open_last_lookups fs/namei.c:3694 [inline]
path_openat+0x1c03/0x3590 fs/namei.c:3930
do_filp_open+0x235/0x490 fs/namei.c:3960
do_sys_openat2+0x13e/0x1d0 fs/open.c:1415
do_sys_open fs/open.c:1430 [inline]
__do_sys_openat fs/open.c:1446 [inline]
__se_sys_openat fs/open.c:1441 [inline]
__x64_sys_openat+0x247/0x2a0 fs/open.c:1441
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534
The root cause is: on a fuzzed image, blkaddr in nat entry may be
corrupted, then it will cause system panic when using it in
f2fs_invalidate_blocks(), to avoid this, let's add sanity check on
nat blkaddr in truncate_node().
Reported-by: syzbot+33379ce4ac76acf7d0c7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000009a6cd706224ca720@google.com/
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit acff9409dd40beaca2bd982678d222e2740ad84b upstream.
This reverts commit 54f43a10fa257ad4af02a1d157fefef6ebcfa7dc.
The above commit broke the lazytime mount, given
mount("/dev/vdb", "/mnt/test", "f2fs", 0, "lazytime");
CC: stable@vger.kernel.org # 6.11+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a7a7c1d423a6351a6541e95c797da5358e5ad1ea upstream.
After enable 16K page size, an infinite loop may occur in
fiemap (fm_length=UINT64_MAX) on a file, such as the 16KB
scratch.img during the remount operation in Android.
The condition for whether fiemap continues to map is to check
whether the number of bytes corresponding to the next map.m_lblk
exceeds blks_to_bytes(inode,max_inode_blocks(inode)) if there are HOLE.
The latter does not take into account the maximum size of a file with 16KB
page size, so the loop cannot be jumped out.
The following is the fail trace:
When f2fs_map_blocks reaches map.m_lblk=3936, it needs to go to the
first direct node block, so the map is 3936 + 4090 = 8026,
The next map is the second direct node block, that is,
8026 + 4090 = 12116,
The next map is the first indirect node block, that is,
12116 + 4090 * 4090 = 16740216,
The next map is the second indirect node block, that is,
16740216 + 4090 * 4090 = 33468316,
The next map is the first double indirect node block, that is,
33468316 + 4090 * 4090 * 4090 = 68451397316
Since map.m_lblk represents the address of a block, which is 32
bits, truncation will occur, that is, 68451397316 becomes
4026887876, and the number of bytes corresponding to the block
number does not exceed blks_to_bytes(inode,max_inode_blocks(inode)),
so the loop will not be jumped out.
The next time, it will be considered that it should still be a
double indirect node block, that is,
4026887876 + 4090 * 4090 * 4090 = 72444816876, which will be
truncated to 3725340140, and the loop will not be jumped out.
156.374871: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 0, start blkaddr = 0x8e00, len = 0x200, flags = 2,seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.374916: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 512, start blkaddr = 0x0, len = 0x0, flags = 0 , seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.374920: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 513, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
......
156.385747: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3935, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385752: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3936, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385755: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 8026, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385758: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 12116, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385761: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 16740216, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385764: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 33468316, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385767: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385770: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385772: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385775: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
Commit a6a010f5def5 ("f2fs: Restrict max filesize for 16K f2fs")
has set the maximum allowed file size to (U32_MAX + 1) * F2FS_BLKSIZE,
so max_file_blocks should be used here to limit it, that is,
maxbytes defined above. And the max_inode_blocks function is not
called by other functions except here, so cleanup it.
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3273d8ad947dea925a65a78ca29e5351c960c801 ]
It missed to cast variable to unsigned long long type before
bit shift, which will cause overflow, fix it.
Fixes: f7ef9b83b583 ("f2fs: introduce macros to convert bytes and blocks in f2fs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 26e6f59d0bbaac76fa3413462d780bd2b5f9f653 ]
Jinsu Lee reported a performance regression issue, after commit
5c8764f8679e ("f2fs: fix to force buffered IO on inline_data
inode"), we forced direct write to use buffered IO on inline_data
inode, it will cause performace regression due to memory copy
and data flush.
It's fine to not force direct write to use buffered IO, as it
can convert inline inode before committing direct write IO.
Fixes: 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data inode")
Reported-by: Jinsu Lee <jinsu1.lee@samsung.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/af03dd2c-e361-4f80-b2fd-39440766cf6e@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5dd00ebda337b9295e7027691fa70540da369ff2 ]
f2fs_map_blocks() supports to map continuous holes or preallocated
address, we should avoid setting F2FS_MAP_MAPPED for these cases
only, otherwise, it may fail f2fs_iomap_begin(), and make direct
write fallbacking to use buffered IO and flush, result in performance
regression.
Fixes: 9f0f6bf42714 ("f2fs: support to map continuous holes or preallocated address")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409122103.e45aa13b-oliver.sang@intel.com
Cc: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7b0033dbc48340a1c1c3f12448ba17d6587ca092 ]
In my test case, concurrent calls to f2fs shutdown report the following
stack trace:
Oops: general protection fault, probably for non-canonical address 0xc6cfff63bb5513fc: 0000 [#1] PREEMPT SMP PTI
CPU: 0 UID: 0 PID: 678 Comm: f2fs_rep_shutdo Not tainted 6.12.0-rc5-next-20241029-g6fb2fa9805c5-dirty #85
Call Trace:
<TASK>
? show_regs+0x8b/0xa0
? __die_body+0x26/0xa0
? die_addr+0x54/0x90
? exc_general_protection+0x24b/0x5c0
? asm_exc_general_protection+0x26/0x30
? kthread_stop+0x46/0x390
f2fs_stop_gc_thread+0x6c/0x110
f2fs_do_shutdown+0x309/0x3a0
f2fs_ioc_shutdown+0x150/0x1c0
__f2fs_ioctl+0xffd/0x2ac0
f2fs_ioctl+0x76/0xe0
vfs_ioctl+0x23/0x60
__x64_sys_ioctl+0xce/0xf0
x64_sys_call+0x2b1b/0x4540
do_syscall_64+0xa7/0x240
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The root cause is a race condition in f2fs_stop_gc_thread() called from
different f2fs shutdown paths:
[CPU0] [CPU1]
---------------------- -----------------------
f2fs_stop_gc_thread f2fs_stop_gc_thread
gc_th = sbi->gc_thread
gc_th = sbi->gc_thread
kfree(gc_th)
sbi->gc_thread = NULL
< gc_th != NULL >
kthread_stop(gc_th->f2fs_gc_task) //UAF
The commit c7f114d864ac ("f2fs: fix to avoid use-after-free in
f2fs_stop_gc_thread()") attempted to fix this issue by using a read
semaphore to prevent races between shutdown and remount threads, but
it fails to prevent all race conditions.
Fix it by converting to write lock of s_umount in f2fs_do_shutdown().
Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
GC_URGENT_MID
[ Upstream commit 296b8cb34e65fa93382cf919be5a056f719c9a26 ]
If gc_mode is set to GC_URGENT_LOW or GC_URGENT_MID, cost benefit GC
approach should be used, but if ATGC is enabled at the same time,
Age-threshold approach will be selected, which can only do amount of
GC and it is much less than the numbers of CB approach.
some traces:
f2fs_gc-254:48-396 [007] ..... 2311600.684028: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1053, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
f2fs_gc-254:48-396 [007] ..... 2311600.684527: f2fs_get_victim: dev = (254,48), type = No TYPE, policy = (Background GC, LFS-mode, Age-threshold), victim = 10, cost = 4294364975, ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 44898
f2fs_gc-254:48-396 [007] ..... 2311600.714835: f2fs_gc_end: dev = (254,48), ret = 0, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
f2fs_gc-254:48-396 [007] ..... 2311600.714843: f2fs_background_gc: dev = (254,48), wait_ms = 50, prefree = 0, free = 44898
f2fs_gc-254:48-396 [007] ..... 2311600.771785: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:
f2fs_gc-254:48-396 [007] ..... 2311600.772275: f2fs_gc_end: dev = (254,48), ret = -61, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
Fixes: 0e5e81114de1 ("f2fs: add GC_URGENT_LOW mode in gc_urgent")
Fixes: d98af5f45520 ("f2fs: introduce gc_urgent_mid mode")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f10a890308a7cd8794e21f646f09827c6cb4bf5d ]
syzbot reports deadlock issue of f2fs as below:
======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc3-syzkaller-00087-gc964ced77262 #0 Not tainted
------------------------------------------------------
kswapd0/79 is trying to acquire lock:
ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2199 [inline]
ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068
but task is already holding lock:
ffff88804bd92610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){.+.+}-{0:0}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
__sb_start_write include/linux/fs.h:1716 [inline]
sb_start_intwrite+0x4d/0x1c0 include/linux/fs.h:1899
f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842
evict+0x4e8/0x9b0 fs/inode.c:725
f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807
evict+0x4e8/0x9b0 fs/inode.c:725
dispose_list fs/inode.c:774 [inline]
prune_icache_sb+0x239/0x2f0 fs/inode.c:963
super_cache_scan+0x38c/0x4b0 fs/super.c:223
do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
shrink_one+0x43b/0x850 mm/vmscan.c:4818
shrink_many mm/vmscan.c:4879 [inline]
lru_gen_shrink_node mm/vmscan.c:4957 [inline]
shrink_node+0x3799/0x3de0 mm/vmscan.c:5937
kswapd_shrink_node mm/vmscan.c:6765 [inline]
balance_pgdat mm/vmscan.c:6957 [inline]
kswapd+0x1ca3/0x3700 mm/vmscan.c:7226
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
-> #1 (fs_reclaim){+.+.}-{0:0}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
__fs_reclaim_acquire mm/page_alloc.c:3834 [inline]
fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3848
might_alloc include/linux/sched/mm.h:318 [inline]
prepare_alloc_pages+0x147/0x5b0 mm/page_alloc.c:4493
__alloc_pages_noprof+0x16f/0x710 mm/page_alloc.c:4722
alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
alloc_pages_noprof mm/mempolicy.c:2345 [inline]
folio_alloc_noprof+0x128/0x180 mm/mempolicy.c:2352
filemap_alloc_folio_noprof+0xdf/0x500 mm/filemap.c:1010
do_read_cache_folio+0x2eb/0x850 mm/filemap.c:3787
read_mapping_folio include/linux/pagemap.h:1011 [inline]
f2fs_commit_super+0x3c0/0x7d0 fs/f2fs/super.c:4032
f2fs_record_stop_reason+0x13b/0x1d0 fs/f2fs/super.c:4079
f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174
f2fs_write_inode+0x35f/0x4d0 fs/f2fs/inode.c:785
write_inode fs/fs-writeback.c:1503 [inline]
__writeback_single_inode+0x711/0x10d0 fs/fs-writeback.c:1723
writeback_single_inode+0x1f3/0x660 fs/fs-writeback.c:1779
sync_inode_metadata+0xc4/0x120 fs/fs-writeback.c:2849
f2fs_release_file+0xa8/0x100 fs/f2fs/file.c:1941
__fput+0x23f/0x880 fs/file_table.c:431
task_work_run+0x24f/0x310 kernel/task_work.c:228
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218
do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&sbi->sb_lock){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
down_write+0x99/0x220 kernel/locking/rwsem.c:1577
f2fs_down_write fs/f2fs/f2fs.h:2199 [inline]
f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068
f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174
f2fs_evict_inode+0xa61/0x15c0 fs/f2fs/inode.c:883
evict+0x4e8/0x9b0 fs/inode.c:725
f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807
evict+0x4e8/0x9b0 fs/inode.c:725
dispose_list fs/inode.c:774 [inline]
prune_icache_sb+0x239/0x2f0 fs/inode.c:963
super_cache_scan+0x38c/0x4b0 fs/super.c:223
do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
shrink_one+0x43b/0x850 mm/vmscan.c:4818
shrink_many mm/vmscan.c:4879 [inline]
lru_gen_shrink_node mm/vmscan.c:4957 [inline]
shrink_node+0x3799/0x3de0 mm/vmscan.c:5937
kswapd_shrink_node mm/vmscan.c:6765 [inline]
balance_pgdat mm/vmscan.c:6957 [inline]
kswapd+0x1ca3/0x3700 mm/vmscan.c:7226
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
other info that might help us debug this:
Chain exists of:
&sbi->sb_lock --> fs_reclaim --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
lock(&sbi->sb_lock);
Root cause is there will be potential deadlock in between
below tasks:
Thread A Kswapd
- f2fs_ioc_commit_atomic_write
- mnt_want_write_file -- down_read lock A
- balance_pgdat
- __fs_reclaim_acquire -- lock B
- shrink_node
- prune_icache_sb
- dispose_list
- f2fs_evict_inode
- sb_start_intwrite -- down_read lock A
- f2fs_do_sync_file
- f2fs_write_inode
- f2fs_handle_critical_error
- f2fs_record_stop_reason
- f2fs_commit_super
- read_mapping_folio
- filemap_alloc_folio_noprof
- fs_reclaim_acquire -- lock B
Both threads try to acquire read lock of lock A, then its upcoming write
lock grabber will trigger deadlock.
Let's always create an asynchronous task in f2fs_handle_critical_error()
rather than calling f2fs_record_stop_reason() synchronously to avoid
this potential deadlock issue.
Fixes: b62e71be2110 ("f2fs: support errors=remount-ro|continue|panic mountoption")
Reported-by: syzbot+be4a9983e95a5e25c8d3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6704d667.050a0220.1e4d62.0081.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0c3a38a4b442893f8baca72e44a2a27d52d6cc75 ]
Fix the following compilation warning:
fs/f2fs/data.c:2391:10: warning: variable ‘index’ set but not used
[-Wunused-but-set-variable]
2391 | pgoff_t index;
Only define and set the variable index when the CONFIG_F2FS_FS_COMPRESSION
is enabled.
Fixes: db92e6c729d8 ("f2fs: convert f2fs_mpage_readpages() to use folio")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 43563069e1c1df417d2eed6eca8a22fc6b04691d ]
In the __f2fs_init_atgc_curseg->get_atssr_segment calling,
curseg->segno is NULL_SEGNO, indicating that there is no summary
block that needs to be written.
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c3af1f13476ec23fd99c98d060a89be28c1e8871 ]
This f2fs_bug_on was introduced by commit 2c1905042c8c ("f2fs: check
segment type in __f2fs_replace_block") when there were only 6 curseg types.
After commit d0b9e42ab615 ("f2fs: introduce inmem curseg") was introduced,
the condition should be changed to checking curseg->seg_type.
Fixes: d0b9e42ab615 ("f2fs: introduce inmem curseg")
Signed-off-by: LongPing Wei <weilongping@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1acd73edbbfef2c3c5b43cba4006a7797eca7050 ]
It will trigger system panic w/ testcase in [1]:
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2752!
RIP: 0010:new_curseg+0xc81/0x2110
Call Trace:
f2fs_allocate_data_block+0x1c91/0x4540
do_write_page+0x163/0xdf0
f2fs_outplace_write_data+0x1aa/0x340
f2fs_do_write_data_page+0x797/0x2280
f2fs_write_single_data_page+0x16cd/0x2190
f2fs_write_cache_pages+0x994/0x1c80
f2fs_write_data_pages+0x9cc/0xea0
do_writepages+0x194/0x7a0
filemap_fdatawrite_wbc+0x12b/0x1a0
__filemap_fdatawrite_range+0xbb/0xf0
file_write_and_wait_range+0xa1/0x110
f2fs_do_sync_file+0x26f/0x1c50
f2fs_sync_file+0x12b/0x1d0
vfs_fsync_range+0xfa/0x230
do_fsync+0x3d/0x80
__x64_sys_fsync+0x37/0x50
x64_sys_call+0x1e88/0x20d0
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The root cause is if checkpoint_disabling and lfs_mode are both on,
it will trigger OPU for all overwritten data, it may cost more free
segment than expected, so f2fs must account those data correctly to
calculate cosumed free segments later, and return ENOSPC earlier to
avoid run out of free segment during block allocation.
[1] https://lore.kernel.org/fstests/20241015025106.3203676-1-chao@kernel.org/
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b7d0a97b28083084ebdd8e5c6bccd12e6ec18faa ]
There's issue as follows when concurrently installing the f2fs.ko
module and mounting the f2fs file system:
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
RIP: 0010:__bio_alloc+0x2fb/0x6c0 [f2fs]
Call Trace:
<TASK>
f2fs_submit_page_bio+0x126/0x8b0 [f2fs]
__get_meta_page+0x1d4/0x920 [f2fs]
get_checkpoint_version.constprop.0+0x2b/0x3c0 [f2fs]
validate_checkpoint+0xac/0x290 [f2fs]
f2fs_get_valid_checkpoint+0x207/0x950 [f2fs]
f2fs_fill_super+0x1007/0x39b0 [f2fs]
mount_bdev+0x183/0x250
legacy_get_tree+0xf4/0x1e0
vfs_get_tree+0x88/0x340
do_new_mount+0x283/0x5e0
path_mount+0x2b2/0x15b0
__x64_sys_mount+0x1fe/0x270
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Above issue happens as the biset of the f2fs file system is not
initialized before register "f2fs_fs_type".
To address above issue just register "f2fs_fs_type" at the last in
init_f2fs_fs(). Ensure that all f2fs file system resources are
initialized.
Fixes: f543805fcd60 ("f2fs: introduce private bioset")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
release_compress_blocks and reserve_compress_blocks
[ Upstream commit 26413ce18e85de3dda2cd3d72c3c3e8ab8f4f996 ]
After release a file and subsequently reserve it, the FSCK flag is set
when the file is deleted, as shown in the following backtrace:
F2FS-fs (dm-48): Inconsistent i_blocks, ino:401231, iblocks:1448, sectors:1472
fs_rec_info_write_type+0x58/0x274
f2fs_rec_info_write+0x1c/0x2c
set_sbi_flag+0x74/0x98
dec_valid_block_count+0x150/0x190
f2fs_truncate_data_blocks_range+0x2d4/0x3cc
f2fs_do_truncate_blocks+0x2fc/0x5f0
f2fs_truncate_blocks+0x68/0x100
f2fs_truncate+0x80/0x128
f2fs_evict_inode+0x1a4/0x794
evict+0xd4/0x280
iput+0x238/0x284
do_unlinkat+0x1ac/0x298
__arm64_sys_unlinkat+0x48/0x68
invoke_syscall+0x58/0x11c
For clusters of the following type, i_blocks are decremented by 1 and
i_compr_blocks are incremented by 7 in release_compress_blocks, while
updates to i_blocks and i_compr_blocks are skipped in reserve_compress_blocks.
raw node:
D D D D D D D D
after compress:
C D D D D D D D
after reserve:
C D D D D D D D
Let's update i_blocks and i_compr_blocks properly in reserve_compress_blocks.
Fixes: eb8fbaa53374 ("f2fs: compress: fix to check unreleased compressed cluster")
Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
This fixes a regression which prevents parallel DIO reads.
Fixes: 0cac51185e65 ("f2fs: fix to avoid racing in between read and OPU dio write")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"The main changes include converting major IO paths to use folio, and
adding various knobs to control GC more flexibly for Zoned devices.
In addition, there are several patches to address corner cases of
atomic file operations and better support for file pinning on zoned
device.
Enhancement:
- add knobs to tune foreground/background GCs for Zoned devices
- convert IO paths to use folio
- reduce expensive checkpoint trigger frequency
- allow F2FS_IPU_NOCACHE for pinned file
- forcibly migrate to secure space for zoned device file pinning
- get rid of buffer_head use
- add write priority option based on zone UFS
- get rid of online repair on corrupted directory
Bug fixes:
- fix to don't panic system for no free segment fault injection
- fix to don't set SB_RDONLY in f2fs_handle_critical_error()
- avoid unused block when dio write in LFS mode
- compress: don't redirty sparse cluster during {,de}compress
- check discard support for conventional zones
- atomic: prevent atomic file from being dirtied before commit
- atomic: fix to check atomic_file in f2fs ioctl interfaces
- atomic: fix to forbid dio in atomic_file
- atomic: fix to truncate pagecache before on-disk metadata truncation
- atomic: create COW inode from parent dentry
- atomic: fix to avoid racing w/ GC
- atomic: require FMODE_WRITE for atomic write ioctls
- fix to wait page writeback before setting gcing flag
- fix to avoid racing in between read and OPU dio write, dio completion
- fix several potential integer overflows in file offsets and dir_block_index
- fix to avoid use-after-free in f2fs_stop_gc_thread()
As usual, there are several code clean-ups and refactorings"
* tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
f2fs: allow F2FS_IPU_NOCACHE for pinned file
f2fs: forcibly migrate to secure space for zoned device file pinning
f2fs: remove unused parameters
f2fs: fix to don't panic system for no free segment fault injection
f2fs: fix to don't set SB_RDONLY in f2fs_handle_critical_error()
f2fs: add valid block ratio not to do excessive GC for one time GC
f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent
f2fs: do FG_GC when GC boosting is required for zoned devices
f2fs: increase BG GC migration window granularity when boosted for zoned devices
f2fs: add reserved_segments sysfs node
f2fs: introduce migration_window_granularity
f2fs: make BG GC more aggressive for zoned devices
f2fs: avoid unused block when dio write in LFS mode
f2fs: fix to check atomic_file in f2fs ioctl interfaces
f2fs: get rid of online repaire on corrupted directory
f2fs: prevent atomic file from being dirtied before commit
f2fs: get rid of page->index
f2fs: convert read_node_page() to use folio
f2fs: convert __write_node_page() to use folio
f2fs: convert f2fs_write_data_page() to use folio
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor
helpers"
* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add struct fd constructors, get rid of __to_fd()
struct fd: representation change
introduce fd_file(), convert all accessors to it.
|
|
This patch allows f2fs to submit bios of in-place writes on pinned file.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We need to migrate data blocks even though it is full to secure space
for zoned device file pinning.
Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove unused parameter segno from f2fs_usable_segs_in_sec.
Signed-off-by: liuderong <liuderong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs: fix to don't panic system for no free segment fault injection
syzbot reports a f2fs bug as below:
F2FS-fs (loop0): inject no free segment in get_new_segment of __allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3167
F2FS-fs (loop0): Stopped filesystem due to reason: 7
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2748!
CPU: 0 UID: 0 PID: 5109 Comm: syz-executor304 Not tainted 6.11.0-rc6-syzkaller-00363-g89f5e14d05b4 #0
RIP: 0010:get_new_segment fs/f2fs/segment.c:2748 [inline]
RIP: 0010:new_curseg+0x1f61/0x1f70 fs/f2fs/segment.c:2836
Call Trace:
__allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3167
f2fs_allocate_new_section fs/f2fs/segment.c:3181 [inline]
f2fs_allocate_pinning_section+0xfa/0x4e0 fs/f2fs/segment.c:3195
f2fs_expand_inode_data+0x5d6/0xbb0 fs/f2fs/file.c:1799
f2fs_fallocate+0x448/0x960 fs/f2fs/file.c:1903
vfs_fallocate+0x553/0x6c0 fs/open.c:334
do_vfs_ioctl+0x2592/0x2e50 fs/ioctl.c:886
__do_sys_ioctl fs/ioctl.c:905 [inline]
__se_sys_ioctl+0x81/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0010:get_new_segment fs/f2fs/segment.c:2748 [inline]
RIP: 0010:new_curseg+0x1f61/0x1f70 fs/f2fs/segment.c:2836
The root cause is when we inject no free segment fault into f2fs,
we should not panic system, fix it.
Fixes: 8b10d3653735 ("f2fs: introduce FAULT_NO_SEGMENT")
Reported-by: syzbot+341e5f32ebafbb46b81c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f0ee5b0621ab694b@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
syzbot reports a f2fs bug as below:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 58 at kernel/rcu/sync.c:177 rcu_sync_dtor+0xcd/0x180 kernel/rcu/sync.c:177
CPU: 1 UID: 0 PID: 58 Comm: kworker/1:2 Not tainted 6.10.0-syzkaller-12562-g1722389b0d86 #0
Workqueue: events destroy_super_work
RIP: 0010:rcu_sync_dtor+0xcd/0x180 kernel/rcu/sync.c:177
Call Trace:
percpu_free_rwsem+0x41/0x80 kernel/locking/percpu-rwsem.c:42
destroy_super_work+0xec/0x130 fs/super.c:282
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
As Christian Brauner pointed out [1]: the root cause is f2fs sets
SB_RDONLY flag in internal function, rather than setting the flag
covered w/ sb->s_umount semaphore via remount procedure, then below
race condition causes this bug:
- freeze_super()
- sb_wait_write(sb, SB_FREEZE_WRITE)
- sb_wait_write(sb, SB_FREEZE_PAGEFAULT)
- sb_wait_write(sb, SB_FREEZE_FS)
- f2fs_handle_critical_error
- sb->s_flags |= SB_RDONLY
- thaw_super
- thaw_super_locked
- sb_rdonly() is true, so it skips
sb_freeze_unlock(sb, SB_FREEZE_FS)
- deactivate_locked_super
Since f2fs has almost the same logic as ext4 [2] when handling critical
error in filesystem if it mounts w/ errors=remount-ro option:
- set CP_ERROR_FLAG flag which indicates filesystem is stopped
- record errors to superblock
- set SB_RDONLY falg
Once we set CP_ERROR_FLAG flag, all writable interfaces can detect the
flag and stop any further updates on filesystem. So, it is safe to not
set SB_RDONLY flag, let's remove the logic and keep in line w/ ext4 [3].
[1] https://lore.kernel.org/all/20240729-himbeeren-funknetz-96e62f9c7aee@brauner
[2] https://lore.kernel.org/all/20240729132721.hxih6ehigadqf7wx@quack3
[3] https://lore.kernel.org/linux-ext4/20240805201241.27286-1-jack@suse.cz
Fixes: b62e71be2110 ("f2fs: support errors=remount-ro|continue|panic mountoption")
Reported-by: syzbot+20d7e439f76bbbd863a7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000b90a8e061e21d12f@google.com/
Cc: Jan Kara <jack@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We need to introduce a valid block ratio threshold not to trigger
excessive GC for zoned deivces. The initial value of it is 95%. So, F2FS
will stop the thread from intiating GC for sections having valid blocks
exceeding the ratio.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Added control knobs for gc_no_zoned_gc_percent and
gc_boost_zoned_gc_percent.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Under low free section count, we need to use FG_GC instead of BG_GC to
recover free sections.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Need bigger BG GC migration window granularity when free section is
running low.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For the fine tuning of GC behavior, add reserved_segments sysfs node.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We can control the scanning window granularity for GC migration. For
more frequent scanning and GC on zoned devices, we need a fine grained
control knob for it.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Since we don't have any GC on device side for zoned devices, need more
aggressive BG GC. So, tune the parameters for that.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch addresses the problem that when using LFS mode, unused blocks
may occur in f2fs_map_blocks() during block allocation for dio writes.
If a new section is allocated during block allocation, it will not be
included in the map struct by map_is_mergeable() if the LBA of the
allocated block is not contiguous. However, the block already allocated
in this process will remain unused due to the LFS mode.
This patch avoids the possibility of unused blocks by escaping
f2fs_map_blocks() when allocating the last block in a section.
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Some f2fs ioctl interfaces like f2fs_ioc_set_pin_file(),
f2fs_move_file_range(), and f2fs_defragment_range() missed to
check atomic_write status, which may cause potential race issue,
fix it.
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
syzbot reports a f2fs bug as below:
kernel BUG at fs/f2fs/inode.c:896!
RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896
Call Trace:
evict+0x532/0x950 fs/inode.c:704
dispose_list fs/inode.c:747 [inline]
evict_inodes+0x5f9/0x690 fs/inode.c:797
generic_shutdown_super+0x9d/0x2d0 fs/super.c:627
kill_block_super+0x44/0x90 fs/super.c:1696
kill_f2fs_super+0x344/0x690 fs/f2fs/super.c:4898
deactivate_locked_super+0xc4/0x130 fs/super.c:473
cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373
task_work_run+0x24f/0x310 kernel/task_work.c:228
ptrace_notify+0x2d2/0x380 kernel/signal.c:2402
ptrace_report_syscall include/linux/ptrace.h:415 [inline]
ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline]
syscall_exit_work+0xc6/0x190 kernel/entry/common.c:173
syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
syscall_exit_to_user_mode+0x279/0x370 kernel/entry/common.c:218
do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896
Online repaire on corrupted directory in f2fs_lookup() can generate
dirty data/meta while racing w/ readonly remount, it may leave dirty
inode after filesystem becomes readonly, however, checkpoint() will
skips flushing dirty inode in a state of readonly mode, result in
above panic.
Let's get rid of online repaire in f2fs_lookup(), and leave the work
to fsck.f2fs.
Fixes: 510022a85839 ("f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries")
Reported-by: syzbot+ebea2790904673d7c618@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000a7b20f061ff2d56a@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Keep atomic file clean while updating and make it dirtied during commit
in order to avoid unnecessary and excessive inode updates in the previous
fix.
Fixes: 4bf78322346f ("f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|