summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)AuthorFilesLines
2024-09-07f2fs: convert f2fs_write_single_data_page() to use folioChao Yu3-16/+18
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-07f2fs: convert f2fs_write_inline_data() to use folioChao Yu4-8/+8
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-07f2fs: convert f2fs_clear_page_cache_dirty_tag() to use folioChao Yu5-6/+5
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-07f2fs: convert f2fs_vm_page_mkwrite() to use folioChao Yu1-16/+16
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-07f2fs: convert f2fs_compress_ctx_add_page() to use folioChao Yu3-10/+10
onvert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: Use sysfs_emit_at() to simplify codeChristophe JAILLET1-24/+21
This file already uses sysfs_emit(). So be consistent and also use sysfs_emit_at(). This slightly simplifies the code and makes it more readable. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: atomic: fix to forbid dio in atomic_fileChao Yu1-12/+24
atomic write can only be used via buffered IO, let's fail direct IO on atomic_file and return -EOPNOTSUPP. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: compress: don't redirty sparse cluster during {,de}compressYeongjin Gil3-26/+61
In f2fs_do_write_data_page, when the data block is NULL_ADDR, it skips writepage considering that it has been already truncated. This results in an infinite loop as the PAGECACHE_TAG_TOWRITE tag is not cleared during the writeback process for a compressed file including NULL_ADDR in compress_mode=user. This is the reproduction process: 1. dd if=/dev/zero bs=4096 count=1024 seek=1024 of=testfile 2. f2fs_io compress testfile 3. dd if=/dev/zero bs=4096 count=1 conv=notrunc of=testfile 4. f2fs_io decompress testfile To prevent the problem, let's check whether the cluster is fully allocated before redirty its pages. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Tested-by: Jaewook Kim <jw5454.kim@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: check discard support for conventional zonesShin'ichiro Kawasaki1-0/+7
As the helper function f2fs_bdev_support_discard() shows, f2fs checks if the target block devices support discard by calling bdev_max_discard_sectors() and bdev_is_zoned(). This check works well for most cases, but it does not work for conventional zones on zoned block devices. F2fs assumes that zoned block devices support discard, and calls __submit_discard_cmd(). When __submit_discard_cmd() is called for sequential write required zones, it works fine since __submit_discard_cmd() issues zone reset commands instead of discard commands. However, when __submit_discard_cmd() is called for conventional zones, __blkdev_issue_discard() is called even when the devices do not support discard. The inappropriate __blkdev_issue_discard() call was not a problem before the commit 30f1e7241422 ("block: move discard checks into the ioctl handler") because __blkdev_issue_discard() checked if the target devices support discard or not. If not, it returned EOPNOTSUPP. After the commit, __blkdev_issue_discard() no longer checks it. It always returns zero and sets NULL to the given bio pointer. This NULL pointer triggers f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the commands below at the umount step, where /dev/nullb0 is a zoned null_blk with 5GB total size, 128MB zone size and 10 conventional zones. $ mkfs.f2fs -f -m /dev/nullb0 $ mount /dev/nullb0 /mnt $ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done $ umount /mnt To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call. When discard is requested for conventional zones, check if the device supports discard or not. If not, return EOPNOTSUPP. Fixes: 30f1e7241422 ("block: move discard checks into the ioctl handler") Cc: stable@vger.kernel.org Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()Chao Yu3-4/+11
syzbot reports a f2fs bug as below: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_report+0xe8/0x550 mm/kasan/report.c:491 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline] __refcount_add include/linux/refcount.h:184 [inline] __refcount_inc include/linux/refcount.h:241 [inline] refcount_inc include/linux/refcount.h:258 [inline] get_task_struct include/linux/sched/task.h:118 [inline] kthread_stop+0xca/0x630 kernel/kthread.c:704 f2fs_stop_gc_thread+0x65/0xb0 fs/f2fs/gc.c:210 f2fs_do_shutdown+0x192/0x540 fs/f2fs/file.c:2283 f2fs_ioc_shutdown fs/f2fs/file.c:2325 [inline] __f2fs_ioctl+0x443a/0xbe60 fs/f2fs/file.c:4325 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is below race condition, it may cause use-after-free issue in sbi->gc_th pointer. - remount - f2fs_remount - f2fs_stop_gc_thread - kfree(gc_th) - f2fs_ioc_shutdown - f2fs_do_shutdown - f2fs_stop_gc_thread - kthread_stop(gc_th->f2fs_gc_task) : sbi->gc_thread = NULL; We will call f2fs_do_shutdown() in two paths: - for f2fs_ioc_shutdown() path, we should grab sb->s_umount semaphore for fixing. - for f2fs_shutdown() path, it's safe since caller has already grabbed sb->s_umount semaphore. Reported-by: syzbot+1a8e2b31f2ac9bd3d148@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000005c7ccb061e032b9b@google.com Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: atomic: fix to truncate pagecache before on-disk metadata truncationChao Yu1-0/+4
We should always truncate pagecache while truncating on-disk data. Fixes: a46bebd502fe ("f2fs: synchronize atomic write aborts") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: fix to wait page writeback before setting gcing flagChao Yu1-0/+4
Soft IRQ Thread - f2fs_write_end_io - f2fs_defragment_range - set_page_private_gcing - type = WB_DATA_TYPE(page, false); : assign type w/ F2FS_WB_CP_DATA due to page_private_gcing() is true - dec_page_count() w/ wrong type - end_page_writeback() Value of F2FS_WB_CP_DATA reference count may become negative under above race condition, the root cause is we missed to wait page writeback before setting gcing page private flag, let's fix it. Fixes: 2d1fe8a86bf5 ("f2fs: fix to tag gcing flag on page during file defragment") Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: Create COW inode from parent dentry for atomic writeYeongjin Gil1-9/+3
The i_pino in f2fs_inode_info has the previous parent's i_ino when inode was renamed, which may cause f2fs_ioc_start_atomic_write to fail. If file_wrong_pino is true and i_nlink is 1, then to find a valid pino, we should refer to the dentry from inode. To resolve this issue, let's get parent inode using parent dentry directly. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: Require FMODE_WRITE for atomic write ioctlsJann Horn1-0/+9
The F2FS ioctls for starting and committing atomic writes check for inode_owner_or_capable(), but this does not give LSMs like SELinux or Landlock an opportunity to deny the write access - if the caller's FSUID matches the inode's UID, inode_owner_or_capable() immediately returns true. There are scenarios where LSMs want to deny a process the ability to write particular files, even files that the FSUID of the process owns; but this can currently partially be bypassed using atomic write ioctls in two ways: - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can truncate an inode to size 0 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert changes another process concurrently made to a file Fix it by requiring FMODE_WRITE for these operations, just like for F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these ioctls when intending to write into the file, that seems unlikely to break anything. Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITSZhiguo Niu5-8/+8
Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) for cleanup Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15f2fs: fix to use per-inode maxbytes and cleanupZhiguo Niu3-9/+7
This is a supplement to commit 6d1451bf7f84 ("f2fs: fix to use per-inode maxbytes") for some missed cases, also cleanup redundant code in f2fs_llseek. Cc: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15f2fs: use f2fs_get_node_page when write inline dataZijie Wang1-12/+11
We just need inode page when write inline data, use f2fs_get_node_page() to get it instead of using dnode_of_data, which can eliminate unnecessary struct use. Signed-off-by: Zijie Wang <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15f2fs: sysfs: support atgc_enabledliujinbao11-0/+8
When we add "atgc" to the fstab table, ATGC is not immediately enabled. There is a 7-day time threshold, and we can use "atgc_enabled" to show whether ATGC is enabled. Signed-off-by: liujinbao1 <liujinbao1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15Revert "f2fs: use flush command instead of FUA for zoned device"Wenjie Cheng2-3/+2
This reverts commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1. Commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1 ("f2fs: use flush command instead of FUA for zoned device") used additional flush command to keep write order. Since Commit dd291d77cc90eb6a86e9860ba8e6e38eebd57d12 ("block: Introduce zone write plugging") has enabled the block layer to handle this order issue, there is no need to use flush command. Signed-off-by: Wenjie Cheng <cwjhust@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15f2fs: get rid of buffer_head useChao Yu5-41/+66
Convert to use folio and related functionality. Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15f2fs: fix to avoid racing in between read and OPU dio writeChao Yu1-0/+4
If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread A Thread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete In LFS mode, if there is inflight dio, let's wait for its completion, this policy won't cover all race cases, however it is a tradeoff which avoids abusing lock around IO paths. Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15f2fs: fix to wait dio completionChao Yu1-0/+13
It should wait all existing dio write IOs before block removal, otherwise, previous direct write IO may overwrite data in the block which may be reused by other inode. Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15f2fs: reduce expensive checkpoint trigger frequencyChao Yu3-2/+17
We may trigger high frequent checkpoint for below case: 1. mkdir /mnt/dir1; set dir1 encrypted 2. touch /mnt/file1; fsync /mnt/file1 3. mkdir /mnt/dir2; set dir2 encrypted 4. touch /mnt/file2; fsync /mnt/file2 ... Although, newly created dir and file are not related, due to commit bbf156f7afa7 ("f2fs: fix lost xattrs of directories"), we will trigger checkpoint whenever fsync() comes after a new encrypted dir created. In order to avoid such performance regression issue, let's record an entry including directory's ino in global cache whenever we update directory's xattr data, and then triggerring checkpoint() only if xattr metadata of target file's parent was updated. This patch updates to cover below no encryption case as well: 1) parent is checkpointed 2) set_xattr(dir) w/ new xnid 3) create(file) 4) fsync(file) Fixes: bbf156f7afa7 ("f2fs: fix lost xattrs of directories") Reported-by: wangzijie <wangzijie1@honor.com> Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reported-by: Yunlei He <heyunlei@hihonor.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-13introduce fd_file(), convert all accessors to it.Al Viro1-3/+3
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-08-07fs: Convert aops->write_begin to take a folioMatthew Wilcox (Oracle)3-11/+13
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fs: Convert aops->write_end to take a folioMatthew Wilcox (Oracle)3-4/+3
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07f2fs: Convert f2fs_write_begin() to use a folioMatthew Wilcox (Oracle)1-31/+35
Fetch a folio from the page cache instead of a page and use it throughout. We still have to convert back to a page for calling internal f2fs functions, but hopefully they will be converted soon. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07f2fs: Convert f2fs_write_end() to use a folioMatthew Wilcox (Oracle)1-6/+8
Convert the passed page to a folio and operate on that. Replaces five calls to compound_head() with one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-05f2fs: atomic: fix to avoid racing w/ GCChao Yu2-2/+16
Case #1: SQLite App GC Thread Kworker Shrinker - f2fs_ioc_start_atomic_write - f2fs_ioc_commit_atomic_write - f2fs_commit_atomic_write - filemap_write_and_wait_range : write atomic_file's data to cow_inode echo 3 > drop_caches to drop atomic_file's cache. - f2fs_gc - gc_data_segment - move_data_page - set_page_dirty - writepages - f2fs_do_write_data_page : overwrite atomic_file's data to cow_inode - f2fs_down_write(&fi->i_gc_rwsem[WRITE]) - __f2fs_commit_atomic_write - f2fs_up_write(&fi->i_gc_rwsem[WRITE]) Case #2: SQLite App GC Thread Kworker - f2fs_ioc_start_atomic_write - __writeback_single_inode - do_writepages - f2fs_write_cache_pages - f2fs_write_single_data_page - f2fs_do_write_data_page : write atomic_file's data to cow_inode - f2fs_gc - gc_data_segment - move_data_page - set_page_dirty - writepages - f2fs_do_write_data_page : overwrite atomic_file's data to cow_inode - f2fs_ioc_commit_atomic_write In above cases racing in between atomic_write and GC, previous data in atomic_file may be overwrited to cow_file, result in data corruption. This patch introduces PAGE_PRIVATE_ATOMIC_WRITE bit flag in page.private, and use it to indicate that there is last dirty data in atomic file, and the data should be writebacked into cow_file, if the flag is not tagged in page, we should never write data across files. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05f2fs: fix macro definition stat_inc_cp_countJulian Sun1-1/+1
The macro stat_inc_cp_count accepts a parameter si, but it was not used, rather the variable sbi was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05f2fs: fix macro definition on_f2fs_build_free_nidsJulian Sun1-1/+1
The macro on_f2fs_build_free_nids accepts a parameter nmi, but it was not used, rather the variable nm_i was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05f2fs: add write priority option based on zone UFSLiao Yuanhong4-1/+44
Currently, we are using a mix of traditional UFS and zone UFS to support some functionalities that cannot be achieved on zone UFS alone. However, there are some issues with this approach. There exists a significant performance difference between traditional UFS and zone UFS. Under normal usage, we prioritize writes to zone UFS. However, in critical conditions (such as when the entire UFS is almost full), we cannot determine whether data will be written to traditional UFS or zone UFS. This can lead to significant performance fluctuations, which is not conducive to development and testing. To address this, we have added an option zlu_io_enable under sys with the following three modes: 1) zlu_io_enable == 0:Normal mode, prioritize writing to zone UFS; 2) zlu_io_enable == 1:Zone UFS only mode, only allow writing to zone UFS; 3) zlu_io_enable == 2:Traditional UFS priority mode, prioritize writing to traditional UFS. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05f2fs: avoid potential int overflow in sanity_check_area_boundary()Nikita Zhandarovich1-2/+2
While calculating the end addresses of main area and segment 0, u32 may be not enough to hold the result without the danger of int overflow. Just in case, play it safe and cast one of the operands to a wider type (u64). Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: fd694733d523 ("f2fs: cover large section in sanity check of super") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05f2fs: fix several potential integer overflows in file offsetsNikita Zhandarovich2-3/+3
When dealing with large extents and calculating file offsets by summing up according extent offsets and lengths of unsigned int type, one may encounter possible integer overflow if the values are big enough. Prevent this from happening by expanding one of the addends to (pgoff_t) type. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: d323d005ac4a ("f2fs: support file defragment") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05f2fs: prevent possible int overflow in dir_block_index()Nikita Zhandarovich1-1/+2
The result of multiplication between values derived from functions dir_buckets() and bucket_blocks() *could* technically reach 2^30 * 2^2 = 2^32. While unlikely to happen, it is prudent to ensure that it will not lead to integer overflow. Thus, use mul_u32_u32() as it's more appropriate to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 3843154598a0 ("f2fs: introduce large directory support") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05f2fs: clean up data_blkaddr() and get_dnode_addr()Chao Yu1-29/+17
Introudce a new help get_dnode_base() to wrap common code from get_dnode_addr() and data_blkaddr() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-24Merge tag 'f2fs-for-6.11-rc1' of ↵Linus Torvalds15-222/+326
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "A pretty small update including mostly minor bug fixes in zoned storage along with the large section support. Enhancements: - add support for FS_IOC_GETFSSYSFSPATH - enable atgc dynamically if conditions are met - use new ioprio Macro to get ckpt thread ioprio level - remove unreachable lazytime mount option parsing Bug fixes: - fix null reference error when checking end of zone - fix start segno of large section - fix to cover read extent cache access with lock - don't dirty inode for readonly filesystem - allocate a new section if curseg is not the first seg in its zone - only fragment segment in the same section - truncate preallocated blocks in f2fs_file_open() - fix to avoid use SSR allocate when do defragment - fix to force buffered IO on inline_data inode And some minor code clean-ups and sanity checks" * tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (26 commits) f2fs: clean up addrs_per_{inode,block}() f2fs: clean up F2FS_I() f2fs: use meta inode for GC of COW file f2fs: use meta inode for GC of atomic file f2fs: only fragment segment in the same section f2fs: fix to update user block counts in block_operations() f2fs: remove unreachable lazytime mount option parsing f2fs: fix null reference error when checking end of zone f2fs: fix start segno of large section f2fs: remove redundant sanity check in sanity_check_inode() f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid f2fs: fix to use mnt_{want,drop}_write_file replace file_{start,end}_wrtie f2fs: clean up set REQ_RAHEAD given rac f2fs: enable atgc dynamically if conditions are met f2fs: fix to truncate preallocated blocks in f2fs_file_open() f2fs: fix to cover read extent cache access with lock f2fs: fix return value of f2fs_convert_inline_inode() f2fs: use new ioprio Macro to get ckpt thread ioprio level f2fs: fix to don't dirty inode for readonly filesystem f2fs: fix to avoid use SSR allocate when do defragment ...
2024-07-15Merge tag 'vfs-6.11.casefold' of ↵Linus Torvalds5-93/+55
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs casefolding updates from Christian Brauner: "This contains some work to simplify the handling of casefolded names: - Simplify the handling of casefolded names in f2fs and ext4 by keeping the names as a qstr to avoiding unnecessary conversions - Introduce a new generic_ci_match() libfs case-insensitive lookup helper and use it in both f2fs and ext4 allowing to remove the filesystem specific implementations - Remove a bunch of ifdefs by making the unicode build checks part of the code flow" * tag 'vfs-6.11.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: f2fs: Move CONFIG_UNICODE defguards into the code flow ext4: Move CONFIG_UNICODE defguards into the code flow f2fs: Reuse generic_ci_match for ci comparisons ext4: Reuse generic_ci_match for ci comparisons libfs: Introduce case-insensitive string comparison helper f2fs: Simplify the handling of cached casefolded names ext4: Simplify the handling of cached casefolded names
2024-07-11f2fs: clean up addrs_per_{inode,block}()Chao Yu1-13/+7
Introduce a new help addrs_per_page() to wrap common code from addrs_per_inode() and addrs_per_block() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-11f2fs: clean up F2FS_I()Chao Yu5-93/+88
Use temporary variable instead of F2FS_I() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-11f2fs: use meta inode for GC of COW fileSunmin Jeong6-7/+23
In case of the COW file, new updates and GC writes are already separated to page caches of the atomic file and COW file. As some cases that use the meta inode for GC, there are some race issues between a foreground thread and GC thread. To handle them, we need to take care when to invalidate and wait writeback of GC pages in COW files as the case of using the meta inode. Also, a pointer from the COW inode to the original inode is required to check the state of original pages. For the former, we can solve the problem by using the meta inode for GC of COW files. Then let's get a page from the original inode in move_data_block when GCing the COW file to avoid race condition. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-11f2fs: use meta inode for GC of atomic fileSunmin Jeong4-9/+14
The page cache of the atomic file keeps new data pages which will be stored in the COW file. It can also keep old data pages when GCing the atomic file. In this case, new data can be overwritten by old data if a GC thread sets the old data page as dirty after new data page was evicted. Also, since all writes to the atomic file are redirected to COW inodes, GC for the atomic file is not working well as below. f2fs_gc(gc_type=FG_GC) - select A as a victim segment do_garbage_collect - iget atomic file's inode for block B move_data_page f2fs_do_write_data_page - use dn of cow inode - set fio->old_blkaddr from cow inode - seg_freed is 0 since block B is still valid - goto gc_more and A is selected as victim again To solve the problem, let's separate GC writes and updates in the atomic file by using the meta inode for GC writes. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-11f2fs: only fragment segment in the same sectionSheng Yong1-3/+11
When new_curseg() is allocating a new segment, if mode=fragment:xxx is switched on in large section scenario, __get_next_segno() will select the next segno randomly in the range of [0, maxsegno] in order to fragment segments. If the candidate segno is free, get_new_segment() will use it directly as the new segment. However, if the section of the candidate is not empty, and some other segments have already been used, and have a different type (e.g NODE) with the candidate (e.g DATA), GC will complain inconsistent segment type later. This could be reproduced by the following steps: dd if=/dev/zero of=test.img bs=1M count=10240 mkfs.f2fs -s 128 test.img mount -t f2fs test.img /mnt -o mode=fragment:block echo 1 > /sys/fs/f2fs/loop0/max_fragment_chunk echo 512 > /sys/fs/f2fs/loop0/max_fragment_hole dd if=/dev/zero of=/mnt/testfile bs=4K count=100 umount /mnt F2FS-fs (loop0): Inconsistent segment (4377) type [0, 1] in SSA and SIT In order to allow simulating segment fragmentation in large section scenario, this patch reduces the candidate range: * if curseg is the last segment in the section, return curseg->segno to make get_new_segment() itself find the next free segment. * if curseg is in the middle of the section, select candicate randomly in the range of [curseg + 1, last_seg_in_the_same_section] to keep type consistent. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Sheng Yong <shengyong@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-11f2fs: fix to update user block counts in block_operations()Chao Yu1-5/+5
Commit 59c9081bc86e ("f2fs: allow write page cache when writting cp") allows write() to write data to page cache during checkpoint, so block count fields like .total_valid_block_count, .alloc_valid_block_count and .rf_node_block_count may encounter race condition as below: CP Thread A - write_checkpoint - block_operations - f2fs_down_write(&sbi->node_change) - __prepare_cp_block : ckpt->valid_block_count = .total_valid_block_count - f2fs_up_write(&sbi->node_change) - write - f2fs_preallocate_blocks - f2fs_map_blocks(,F2FS_GET_BLOCK_PRE_AIO) - f2fs_map_lock - f2fs_down_read(&sbi->node_change) - f2fs_reserve_new_blocks - inc_valid_block_count : percpu_counter_add(&sbi->alloc_valid_block_count, count) : sbi->total_valid_block_count += count - f2fs_up_read(&sbi->node_change) - do_checkpoint : sbi->last_valid_block_count = sbi->total_valid_block_count : percpu_counter_set(&sbi->alloc_valid_block_count, 0) : percpu_counter_set(&sbi->rf_node_block_count, 0) - fsync - need_do_checkpoint - f2fs_space_for_roll_forward : alloc_valid_block_count was reset to zero, so, it may missed last data during checkpoint Let's change to update .total_valid_block_count, .alloc_valid_block_count and .rf_node_block_count in block_operations(), then their access can be protected by .node_change and .cp_rwsem lock, so that it can avoid above race condition. Fixes: 59c9081bc86e ("f2fs: allow write page cache when writting cp") Cc: Yunlei He <heyunlei@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-11f2fs: remove unreachable lazytime mount option parsingEric Sandeen1-10/+0
The lazytime/nolazytime options are now handled in the VFS, and are never seen in filesystem parsers, so remove handling of these options from f2fs. Note: when lazytime support was added in 6d94c74ab85f it made lazytime the default in default_options() - as a result, lazytime cannot be disabled (because Opt_nolazytime is never seen in f2fs parsing). If lazytime is desired to be configurable, and default off is OK, default_options() could be updated to stop setting it by default and allow mount option control. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-11f2fs: fix null reference error when checking end of zoneDaejun Park1-1/+3
This patch fixes a potentially null pointer being accessed by is_end_zone_blkaddr() that checks the last block of a zone when f2fs is mounted as a single device. Fixes: e067dc3c6b9c ("f2fs: maintain six open zones for zoned devices") Signed-off-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-09f2fs: fix start segno of large sectionSheng Yong1-1/+2
get_ckpt_valid_blocks() checks valid ckpt blocks in current section. It counts all vblocks from the first to the last segment in the large section. However, START_SEGNO() is used to get the first segno in an SIT block. This patch fixes that to get the correct start segno. Fixes: 61461fc921b7 ("f2fs: fix to avoid touching checkpointed data in get_victim()") Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-06-27vfs: rename parent_ino to d_parent_ino and make it use RCUMateusz Guzik1-1/+1
The routine is used by procfs through dir_emit_dots. The combined RCU and lock fallback implementation is too big for an inline. Given that the routine takes a dentry argument fs/dcache.c seems like the place to put it in. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240627161152.802567-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-25f2fs: Use in_group_or_capable() helperYouling Tang2-5/+2
Use the in_group_or_capable() helper function to simplify the code. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Link: https://lore.kernel.org/r/20240620032335.147136-2-youling.tang@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-24f2fs: remove redundant sanity check in sanity_check_inode()Chao Yu1-4/+0
Commit f240d3aaf5a1 ("f2fs: do more sanity check on inode") missed to remove redundant sanity check on flexible_inline_xattr flag, fix it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>