summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)AuthorFilesLines
2025-08-05Merge tag 'f2fs-for-6.17-rc1' of ↵Linus Torvalds20-1526/+1987
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "Three main updates: folio conversion by Matthew, switch to a new mount API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS with finer granularity by Daeho. There are also patches to address bugs and issues in the existing features such as GCs, file pinning, write-while-dio-read, contingous block allocation, and memory access violations. Enhancements: - switch to new mount API and folio conversion - add sysfs nodes to controle F2FS GCs for ZUFS - improve performance on the nat entry cache - drop inode from the donation list when the last file is closed - avoid splitting bio when reading multiple pages Bug fixes: - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode - make sure zoned device GC to use FG_GC in shortage of free section - fix to calculate dirty data during has_not_enough_free_secs() - fix to update upper_p in __get_secs_required() correctly - wait for inflight dio completion, excluding pinned files read using dio - don't break allocation when crossing contiguous sections - vm_unmap_ram() may be called from an invalid context - fix to avoid out-of-boundary access in dnode page - fix to avoid panic in f2fs_evict_inode - fix to avoid UAF in f2fs_sync_inode_meta() - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page() - fix UAF of f2fs_inode_info in f2fs_free_dic - fix to avoid invalid wait context issue - fix bio memleak when committing super block - handle nat.blkaddr corruption in f2fs_get_node_info() In addition, there are also clean-ups and minor bug fixes" * tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits) f2fs: drop inode from the donation list when the last file is closed f2fs: add gc_boost_gc_greedy sysfs node f2fs: add gc_boost_gc_multiple sysfs node f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode f2fs: fix to calculate dirty data during has_not_enough_free_secs() f2fs: fix to update upper_p in __get_secs_required() correctly f2fs: directly add newly allocated pre-dirty nat entry to dirty set list f2fs: avoid redundant clean nat entry move in lru list f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio f2fs: ignore valid ratio when free section count is low f2fs: don't break allocation when crossing contiguous sections f2fs: remove unnecessary tracepoint enabled check f2fs: merge the two conditions to avoid code duplication f2fs: vm_unmap_ram() may be called from an invalid context f2fs: fix to avoid out-of-boundary access in dnode page f2fs: switch to the new mount api f2fs: introduce fs_context_operation structure f2fs: separate the options parsing and options checking f2fs: Add f2fs_fs_context to record the mount options f2fs: Allow sbi to be NULL in f2fs_printk ...
2025-07-30f2fs: drop inode from the donation list when the last file is closedJaegeuk Kim4-2/+11
Let's drop the inode from the donation list when there is no other open file. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-29f2fs: add gc_boost_gc_greedy sysfs nodeDaeho Jeong3-1/+12
Add this to control GC algorithm for boost GC. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-29f2fs: add gc_boost_gc_multiple sysfs nodeDaeho Jeong3-1/+12
Add a sysfs knob to set a multiplier for the background GC migration window when F2FS Garbage Collection is boosted. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-29Merge tag 'vfs-6.17-rc1.fileattr' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
2025-07-28Merge tag 'vfs-6.17-rc1.mmap_prepare' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28Merge tag 'vfs-6.17-rc1.misc' of ↵Linus Torvalds1-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc VFS updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add ext4 IOCB_DONTCACHE support This refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Cleanups: - don't duplicate vfs_open() in kernel_file_open() - proc_fd_getattr(): don't bother with S_ISDIR() check - fs/ecryptfs: replace snprintf with sysfs_emit in show function - vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() - filelock: add new locks_wake_up_waiter() helper - fs: Remove three arguments from block_write_end() - VFS: change old_dir and new_dir in struct renamedata to dentrys - netfs: Remove unused declaration netfs_queue_write_request() Fixes: - eventpoll: Fix semi-unbounded recursion - eventpoll: fix sphinx documentation build warning - fs/read_write: Fix spelling typo - fs: annotate data race between poll_schedule_timeout() and pollwake() - fs/pipe: set FMODE_NOWAIT in create_pipe_files() - docs/vfs: update references to i_mutex to i_rwsem - fs/buffer: remove comment about hard sectorsize - fs/buffer: remove the min and max limit checks in __getblk_slow() - fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable - fs_context: fix parameter name in infofc() macro - fs: Prevent file descriptor table allocations exceeding INT_MAX" * tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) netfs: Remove unused declaration netfs_queue_write_request() eventpoll: fix sphinx documentation build warning ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create eventpoll: Fix semi-unbounded recursion vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable fs/buffer: remove the min and max limit checks in __getblk_slow() fs: Prevent file descriptor table allocations exceeding INT_MAX fs: Remove three arguments from block_write_end() fs/ecryptfs: replace snprintf with sysfs_emit in show function fs: annotate suspected data race between poll_schedule_timeout() and pollwake() docs/vfs: update references to i_mutex to i_rwsem fs/buffer: remove comment about hard sectorsize fs_context: fix parameter name in infofc() macro VFS: change old_dir and new_dir in struct renamedata to dentrys proc_fd_getattr(): don't bother with S_ISDIR() check ...
2025-07-28f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs modeChao Yu1-1/+4
w/ "mode=lfs" mount option, generic/299 will cause system panic as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:2835! Call Trace: <TASK> f2fs_allocate_data_block+0x6f4/0xc50 f2fs_map_blocks+0x970/0x1550 f2fs_iomap_begin+0xb2/0x1e0 iomap_iter+0x1d6/0x430 __iomap_dio_rw+0x208/0x9a0 f2fs_file_write_iter+0x6b3/0xfa0 aio_write+0x15d/0x2e0 io_submit_one+0x55e/0xab0 __x64_sys_io_submit+0xa5/0x230 do_syscall_64+0x84/0x2f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0010:new_curseg+0x70f/0x720 The root cause of we run out-of-space is: in f2fs_map_blocks(), f2fs may trigger foreground gc only if it allocates any physical block, it will be a little bit later when there is multiple threads writing data w/ aio/dio/bufio method in parallel, since we always use OPU in lfs mode, so f2fs_map_blocks() does block allocations aggressively. In order to fix this issue, let's give a chance to trigger foreground gc in prior to block allocation in f2fs_map_blocks(). Fixes: 36abef4e796d ("f2fs: introduce mode=lfs mount option") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28f2fs: fix to calculate dirty data during has_not_enough_free_secs()Chao Yu1-2/+1
In lfs mode, dirty data needs OPU, we'd better calculate lower_p and upper_p w/ them during has_not_enough_free_secs(), otherwise we may encounter out-of-space issue due to we missed to reclaim enough free section w/ foreground gc. Fixes: 36abef4e796d ("f2fs: introduce mode=lfs mount option") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28f2fs: fix to update upper_p in __get_secs_required() correctlyChao Yu1-1/+1
Commit 1acd73edbbfe ("f2fs: fix to account dirty data in __get_secs_required()") missed to calculate upper_p w/ data_secs, fix it. Fixes: 1acd73edbbfe ("f2fs: fix to account dirty data in __get_secs_required()") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28f2fs: directly add newly allocated pre-dirty nat entry to dirty set listwangzijie1-8/+21
When we need to alloc nat entry and set it dirty, we can directly add it to dirty set list(or initialize its list_head for new_ne) instead of adding it to clean list and make a move. Introduce init_dirty flag to do it. Signed-off-by: wangzijie <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28f2fs: avoid redundant clean nat entry move in lru listwangzijie1-12/+15
__lookup_nat_cache follows LRU manner to move clean nat entry, when nat entries are going to be dirty, no need to move them to tail of lru list. Introduce a parameter 'for_dirty' to avoid it. Signed-off-by: wangzijie <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24f2fs: zone: wait for inflight dio completion, excluding pinned files read ↵yohan.joung1-2/+6
using dio read for the pinfile using Direct I/O do not wait for dio write. Signed-off-by: yohan.joung <yohan.joung@sk.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24f2fs: ignore valid ratio when free section count is lowDaeho Jeong1-6/+12
Otherwise F2FS will not do GC in background in low free section. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24f2fs: don't break allocation when crossing contiguous sectionsChao Yu2-10/+19
Commit 0638a3197c19 ("f2fs: avoid unused block when dio write in LFS mode") has fixed unused block issue for dio write in lfs mode. However, f2fs_map_blocks() may break and return smaller extent when last allocated block locates in the end of section, even allocator can allocate contiguous blocks across sections. Actually, for the case that allocator returns a block address which is not contiguous w/ current extent, we can record the block address in iomap->private, in the next round, skip reallocating for the last allocated block, then we can fix unused block issue, meanwhile, also, we can allocates contiguous physical blocks as much as possible for dio write in lfs mode. Testcase: - mkfs.f2fs -f /dev/vdb - mount -o mode=lfs /dev/vdb /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=3; sync; - dd if=/dev/zero of=/mnt/f2fs/dio bs=2M count=1 oflag=direct; - umount /mnt/f2fs Before: f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 0, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0 f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 256, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0 f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 512, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0 f2fs_map_blocks: dev = (253,16), ino = 5, file offset = 0, start blkaddr = 0x4700, len = 0x100, flags = 3, seg_type = 1, may_create = 1, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 5, file offset = 256, start blkaddr = 0x4800, len = 0x100, flags = 3, seg_type = 1, may_create = 1, multidevice = 0, flag = 3, err = 0 After: f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 0, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0 f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 256, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0 f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 512, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0 f2fs_map_blocks: dev = (253,16), ino = 5, file offset = 0, start blkaddr = 0x4700, len = 0x200, flags = 3, seg_type = 1, may_create = 1, multidevice = 0, flag = 3, err = 0 Cc: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24f2fs: remove unnecessary tracepoint enabled checkSheng Yong1-6/+3
There is no extra work before trace_f2fs_[dataread|datawrite]_end(), so there is no need to check trace_<tracepoint>_enabled(). Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24f2fs: merge the two conditions to avoid code duplicationmason.zhang1-6/+1
No functional changes. Signed-off-by: mason.zhang <masonzhang.linuxer@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24f2fs: vm_unmap_ram() may be called from an invalid contextJan Prusakowski1-1/+1
When testing F2FS with xfstests using UFS backed virtual disks the kernel complains sometimes that f2fs_release_decomp_mem() calls vm_unmap_ram() from an invalid context. Example trace from f2fs/007 test: f2fs/007 5s ... [12:59:38][ 8.902525] run fstests f2fs/007 [ 11.468026] BUG: sleeping function called from invalid context at mm/vmalloc.c:2978 [ 11.471849] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 68, name: irq/22-ufshcd [ 11.475357] preempt_count: 1, expected: 0 [ 11.476970] RCU nest depth: 0, expected: 0 [ 11.478531] CPU: 0 UID: 0 PID: 68 Comm: irq/22-ufshcd Tainted: G W 6.16.0-rc5-xfstests-ufs-g40f92e79b0aa #9 PREEMPT(none) [ 11.478535] Tainted: [W]=WARN [ 11.478536] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 11.478537] Call Trace: [ 11.478543] <TASK> [ 11.478545] dump_stack_lvl+0x4e/0x70 [ 11.478554] __might_resched.cold+0xaf/0xbe [ 11.478557] vm_unmap_ram+0x21/0xb0 [ 11.478560] f2fs_release_decomp_mem+0x59/0x80 [ 11.478563] f2fs_free_dic+0x18/0x1a0 [ 11.478565] f2fs_finish_read_bio+0xd7/0x290 [ 11.478570] blk_update_request+0xec/0x3b0 [ 11.478574] ? sbitmap_queue_clear+0x3b/0x60 [ 11.478576] scsi_end_request+0x27/0x1a0 [ 11.478582] scsi_io_completion+0x40/0x300 [ 11.478583] ufshcd_mcq_poll_cqe_lock+0xa3/0xe0 [ 11.478588] ufshcd_sl_intr+0x194/0x1f0 [ 11.478592] ufshcd_threaded_intr+0x68/0xb0 [ 11.478594] ? __pfx_irq_thread_fn+0x10/0x10 [ 11.478599] irq_thread_fn+0x20/0x60 [ 11.478602] ? __pfx_irq_thread_fn+0x10/0x10 [ 11.478603] irq_thread+0xb9/0x180 [ 11.478605] ? __pfx_irq_thread_dtor+0x10/0x10 [ 11.478607] ? __pfx_irq_thread+0x10/0x10 [ 11.478609] kthread+0x10a/0x230 [ 11.478614] ? __pfx_kthread+0x10/0x10 [ 11.478615] ret_from_fork+0x7e/0xd0 [ 11.478619] ? __pfx_kthread+0x10/0x10 [ 11.478621] ret_from_fork_asm+0x1a/0x30 [ 11.478623] </TASK> This patch modifies in_task() check inside f2fs_read_end_io() to also check if interrupts are disabled. This ensures that pages are unmapped asynchronously in an interrupt handler. Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq") Signed-off-by: Jan Prusakowski <jprusakowski@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: fix to avoid out-of-boundary access in dnode pageChao Yu1-0/+10
As Jiaming Zhang reported: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x1c1/0x2a0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x17e/0x800 mm/kasan/report.c:480 kasan_report+0x147/0x180 mm/kasan/report.c:593 data_blkaddr fs/f2fs/f2fs.h:3053 [inline] f2fs_data_blkaddr fs/f2fs/f2fs.h:3058 [inline] f2fs_get_dnode_of_data+0x1a09/0x1c40 fs/f2fs/node.c:855 f2fs_reserve_block+0x53/0x310 fs/f2fs/data.c:1195 prepare_write_begin fs/f2fs/data.c:3395 [inline] f2fs_write_begin+0xf39/0x2190 fs/f2fs/data.c:3594 generic_perform_write+0x2c7/0x910 mm/filemap.c:4112 f2fs_buffered_write_iter fs/f2fs/file.c:4988 [inline] f2fs_file_write_iter+0x1ec8/0x2410 fs/f2fs/file.c:5216 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x546/0xa90 fs/read_write.c:686 ksys_write+0x149/0x250 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf3/0x3d0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is in the corrupted image, there is a dnode has the same node id w/ its inode, so during f2fs_get_dnode_of_data(), it tries to access block address in dnode at offset 934, however it parses the dnode as inode node, so that get_dnode_addr() returns 360, then it tries to access page address from 360 + 934 * 4 = 4096 w/ 4 bytes. To fix this issue, let's add sanity check for node id of all direct nodes during f2fs_get_dnode_of_data(). Cc: stable@kernel.org Reported-by: Jiaming Zhang <r772577952@gmail.com> Closes: https://groups.google.com/g/syzkaller/c/-ZnaaOOfO3M Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: switch to the new mount apiHongbo Li1-95/+70
The new mount api will execute .parse_param, .init_fs_context, .get_tree and will call .remount if remount happened. So we add the necessary functions for the fs_context_operations. If .init_fs_context is added, the old .mount should remove. See Documentation/filesystems/mount_api.rst for more information. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> [sandeen: forward port] Signed-off-by: Eric Sandeen <sandeen@redhat.com> [hongbo: context modified] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: introduce fs_context_operation structureHongbo Li1-2/+6
The handle_mount_opt() helper is used to parse mount parameters, and so we can rename this function to f2fs_parse_param() and set it as .param_param in fs_context_operations. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> [sandeen: forward port] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: separate the options parsing and options checkingHongbo Li1-193/+545
The new mount api separates option parsing and super block setup into two distinct steps and so we need to separate the options parsing out of the parse_options(). In order to achieve this, here we handle the mount options with three steps: - Firstly, we move sb/sbi out of handle_mount_opt. As the former patch introduced f2fs_fs_context, so we record the changed mount options in this context. In handle_mount_opt, sb/sbi is null, so we should move all relative code out of handle_mount_opt (thus, some check case which use sb/sbi should move out). - Secondly, we introduce the some check helpers to keep the option consistent. During filling superblock period, sb/sbi are ready. So we check the f2fs_fs_context which holds the mount options base on sb/sbi. - Thirdly, we apply the new mount options to sb/sbi. After checking the f2fs_fs_context, all changed on mount options are valid. So we can apply them to sb/sbi directly. After do these, option parsing and super block setting have been decoupled. Also it should have retained the original execution flow. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> [sandeen: forward port, minor fixes and updates] Signed-off-by: Eric Sandeen <sandeen@redhat.com> [hongbo: minor fixes] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Add f2fs_fs_context to record the mount optionsHongbo Li1-175/+235
At the parsing phase of mouont in the new mount api, options value will be recorded with the context, and then it will be used in fill_super and other helpers. Note that, this is a temporary status, we want remove the sb and sbi usages in handle_mount_opt. So here the f2fs_fs_context only records the mount options, it will be copied in sb/sbi in later process. (At this point in the series, mount options are temporarily not set during mount.) Signed-off-by: Hongbo Li <lihongbo22@huawei.com> [sandeen: forward port, minor fixes and updates] Signed-off-by: Eric Sandeen <sandeen@redhat.com> [hongbo: minor cleanup] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Allow sbi to be NULL in f2fs_printkHongbo Li1-41/+49
At the parsing phase of the new mount api, sbi will not be available. So here allows sbi to be NULL in f2fs log helpers and use that in handle_mount_opt(). Signed-off-by: Hongbo Li <lihongbo22@huawei.com> [sandeen: forward port] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: move the option parser into handle_mount_optHongbo Li1-655/+413
In handle_mount_opt, we use fs_parameter to parse each option. However we're still using the old API to get the options string. Using fsparams parse_options allows us to remove many of the Opt_ enums, so remove them. The checkpoint disable cap (or percent) involves rather complex parsing; we retain the old match_table mechanism for this, which handles it well. There are some changes about parsing options: 1. For `active_logs`, `inline_xattr_size` and `fault_injection`, we use s32 type according the internal structure to record the option's value. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> [sandeen: forward port, minor fixes and updates] Signed-off-by: Eric Sandeen <sandeen@redhat.com> [hongbo: minor cleanup] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Add fs parameter specifications for mount optionsHongbo Li1-0/+122
Use an array of `fs_parameter_spec` called f2fs_param_specs to hold the mount option specifications for the new mount api. Add constant_table structures for several options to facilitate parsing. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> [sandeen: forward port, minor fixes and updates, more fsparam_enum] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: fix to avoid out-of-boundary access in devs.pathChao Yu1-1/+1
- touch /mnt/f2fs/012345678901234567890123456789012345678901234567890123 - truncate -s $((1024*1024*1024)) \ /mnt/f2fs/012345678901234567890123456789012345678901234567890123 - touch /mnt/f2fs/file - truncate -s $((1024*1024*1024)) /mnt/f2fs/file - mkfs.f2fs /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \ -c /mnt/f2fs/file - mount /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \ /mnt/f2fs/loop [16937.192225] F2FS-fs (loop0): Mount Device [ 0]: /mnt/f2fs/012345678901234567890123456789012345678901234567890123\xff\x01, 511, 0 - 3ffff [16937.192268] F2FS-fs (loop0): Failed to find devices If device path length equals to MAX_PATH_LEN, sbi->devs.path[] may not end up w/ null character due to path array is fully filled, So accidently, fields locate after path[] may be treated as part of device path, result in parsing wrong device path. struct f2fs_dev_info { ... char path[MAX_PATH_LEN]; ... }; Let's add one byte space for sbi->devs.path[] to store null character of device path string. Fixes: 3c62be17d4f5 ("f2fs: support multiple devices") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Remove F2FS_P_SB()Matthew Wilcox (Oracle)1-5/+0
All callers have been converted to F2FS_F_SB() so delete this wrapper. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to __has_merged_page()Matthew Wilcox (Oracle)1-6/+6
All three callers have a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_submit_merged_write_cond()Matthew Wilcox (Oracle)3-6/+6
Most callers pass NULL, and the one that passes a page already has a folio. Also convert __submit_merged_write_cond() to take a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Remove use of page from f2fs_write_single_data_page()Matthew Wilcox (Oracle)1-3/+2
Both remaining uses of page now have a folio equivalent. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Remove clear_page_private_all()Matthew Wilcox (Oracle)3-21/+3
All callers can simply call folio_detach_private(). This was the only way that clear_page_private_data() could be called, so remove that too. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Use F2FS_F_SB() in f2fs_read_end_io()Matthew Wilcox (Oracle)1-1/+1
Get the folio from the bio instead of the page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Use a folio in f2fs_encrypted_get_link()Matthew Wilcox (Oracle)1-6/+6
Use a folio instead of a page when dealing with the page cache. Removes a hidden call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_cache_compressed_page()Matthew Wilcox (Oracle)2-8/+7
The only caller already has a folio so pass it in. f2fs_cache_compressed_page() is not used outside compress.c so make it static. This requires a forward declaration (or would require rearranging this file, but I've chosen not to do that for readability of the diff). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to F2FS_NODE()Matthew Wilcox (Oracle)4-27/+27
All callers now have a folio so pass it in Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass the nat_blk to __update_nat_bits()Matthew Wilcox (Oracle)1-3/+2
The page argument is only used to look up the address of the nat_blk. Since the caller already has it, pass it in instead. Also mark it const as the nat_blk isn't modified by this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Convert get_next_nat_page() to get_next_nat_folio()Matthew Wilcox (Oracle)1-10/+10
Return a folio from this function and convert its one caller. Removes a call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_is_compressed_page()Matthew Wilcox (Oracle)3-15/+13
All callers now have a folio so pass it in. Also remove the test for the private flag; it is redundant with checking folio->private for being NULL. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Use a folio iterator in f2fs_verify_bio()Matthew Wilcox (Oracle)1-6/+5
Change from bio_for_each_segment_all() to bio_for_each_folio_all() to iterate over each folio instead of each page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_end_read_compressed_page()Matthew Wilcox (Oracle)3-8/+7
Both callers now have a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Use a folio iterator in f2fs_handle_step_decompress()Matthew Wilcox (Oracle)1-6/+5
Change from bio_for_each_segment_all() to bio_for_each_folio_all() to iterate over each folio instead of each page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to WB_DATA_TYPE() and f2fs_is_cp_guaranteed()Matthew Wilcox (Oracle)3-10/+9
All callers now have a folio so pass it in. Removes a call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Use a bio in f2fs_submit_page_write()Matthew Wilcox (Oracle)1-8/+8
Convert bio_page to bio_folio and use it throughout. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Use a folio in f2fs_merge_page_bio()Matthew Wilcox (Oracle)1-6/+6
We have two folios to deal with here; one carries the metadata and the other points to the data. They may be the same, but if it's compressed, the data_folio will differ from the metadata folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_compress_write_end_io()Matthew Wilcox (Oracle)3-5/+5
The only caller has a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Convert get_page_private_data() to folio_get_f2fs_data()Matthew Wilcox (Oracle)2-3/+3
The only caller already has a folio so convert this function to be folio based. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Convert set_page_private_data() to folio_set_f2fs_data()Matthew Wilcox (Oracle)2-6/+8
The only caller has a folio, so pass it in and operate on it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Use a folio in f2fs_is_cp_guaranteed()Matthew Wilcox (Oracle)2-6/+7
Convert the passed page to a folio and use it throughout. Removes a use of fscrypt_is_bounce_page(), which we're trying to remove. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Add folio counterparts to page_private_flags functionsMatthew Wilcox (Oracle)8-22/+50
Name these new functions folio_test_f2fs_*(), folio_set_f2fs_*() and folio_clear_f2fs_*(). Convert all callers which currently have a folio and cast back to a page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>