summaryrefslogtreecommitdiff
path: root/fs/f2fs/segment.c
AgeCommit message (Collapse)AuthorFilesLines
2019-11-28f2fs: fix to spread clear_cold_data()Chao Yu1-1/+3
[ Upstream commit 2baf07818549c8bb8d7b3437e889b86eab56d38e ] We need to drop PG_checked flag on page as well when we clear PG_uptodate flag, in order to avoid treating the page as GCing one later. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05f2fs: fix to do sanity check on segment bitmap of LFS cursegChao Yu1-0/+39
[ Upstream commit c854f4d681365498f53ba07843a16423625aa7e9 ] As Jungyeon Reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203233 - Reproduces gcc poc_13.c ./run.sh f2fs - Kernel messages F2FS-fs (sdb): Bitmap was wrongly set, blk:4608 kernel BUG at fs/f2fs/segment.c:2133! RIP: 0010:update_sit_entry+0x35d/0x3e0 Call Trace: f2fs_allocate_data_block+0x16c/0x5a0 do_write_page+0x57/0x100 f2fs_do_write_node_page+0x33/0xa0 __write_node_page+0x270/0x4e0 f2fs_sync_node_pages+0x5df/0x670 f2fs_write_checkpoint+0x364/0x13a0 f2fs_sync_fs+0xa3/0x130 f2fs_do_sync_file+0x1a6/0x810 do_fsync+0x33/0x60 __x64_sys_fsync+0xb/0x10 do_syscall_64+0x43/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The testcase fails because that, in fuzzed image, current segment was allocated with LFS type, its .next_blkoff should point to an unused block address, but actually, its bitmap shows it's not. So during allocation, f2fs crash when setting bitmap. Introducing sanity_check_curseg() to check such inconsistence of current in-used segment. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05Revert "f2fs: avoid out-of-range memory access"Chao Yu1-5/+0
[ Upstream commit a37d0862d17411edb67677a580a6f505ec2225f6 ] As Pavel Machek reported: "We normally use -EUCLEAN to signal filesystem corruption. Plus, it is good idea to report it to the syslog and mark filesystem as "needing fsck" if filesystem can do that." Still we need improve the original patch with: - use unlikely keyword - add message print - return EUCLEAN However, after rethink this patch, I don't think we should add such condition check here as below reasons: - We have already checked the field in f2fs_sanity_check_ckpt(), - If there is fs corrupt or security vulnerability, there is nothing to guarantee the field is integrated after the check, unless we do the check before each of its use, however no filesystem does that. - We only have similar check for bitmap, which was added due to there is bitmap corruption happened on f2fs' runtime in product. - There are so many key fields in SB/CP/NAT did have such check after f2fs_sanity_check_{sb,cp,..}. So I propose to revert this unneeded check. This reverts commit 56f3ce675103e3fb9e631cfb4131fc768bc23e9a. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-04f2fs: avoid out-of-range memory accessOcean Chen1-0/+5
[ Upstream commit 56f3ce675103e3fb9e631cfb4131fc768bc23e9a ] blkoff_off might over 512 due to fs corrupt or security vulnerability. That should be checked before being using. Use ENTRIES_IN_SUM to protect invalid value in cur_data_blkoff. Signed-off-by: Ocean Chen <oceanchen@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-08f2fs: introduce and spread verify_blkaddrChao Yu1-2/+2
commit e1da7872f6eda977bd812346bf588c35e4495a1e upstream. This patch introduces verify_blkaddr to check meta/data block address with valid range to detect bug earlier. In addition, once we encounter an invalid blkaddr, notice user to run fsck to fix, and let the kernel panic. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [bwh: Backported to 4.9: - I skipped an earlier renaming of is_valid_meta_blkaddr() to f2fs_is_valid_meta_blkaddr() - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08f2fs: clean up with is_valid_blkaddr()Chao Yu1-2/+2
commit 7b525dd01365c6764018e374d391c92466be1b7a upstream. - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability. - introduce is_valid_blkaddr() for cleanup. No logic change in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08f2fs: sanity check on sit entryJaegeuk Kim1-0/+9
commit b2ca374f33bd33fd822eb871876e4888cf79dc97 upstream. syzbot hit the following crash on upstream commit 87ef12027b9b1dd0e0b12cf311fbcb19f9d92539 (Wed Apr 18 19:48:17 2018 +0000) Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744 syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value BUG: unable to handle kernel paging request at ffffed006b2a50c0 PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0 Oops: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline] RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: 0018:ffff8801b102e5b0 EFLAGS: 00010a06 RAX: 1ffff1006b2a50c0 RBX: 0000000000000004 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801ac74243e RBP: ffff8801b102f410 R08: ffff8801acbd46c0 R09: fffffbfff14d9af8 R10: fffffbfff14d9af8 R11: ffff8801acbd46c0 R12: ffff8801ac742a80 R13: ffff8801d9519100 R14: dffffc0000000000 R15: ffff880359528600 FS: 0000000001e04880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffed006b2a50c0 CR3: 00000001ac6ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803 mount_bdev+0x30c/0x3e0 fs/super.c:1165 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020 mount_fs+0xae/0x328 fs/super.c:1268 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037 vfs_kern_mount fs/namespace.c:1027 [inline] do_new_mount fs/namespace.c:2517 [inline] do_mount+0x564/0x3070 fs/namespace.c:2847 ksys_mount+0x12d/0x140 fs/namespace.c:3063 __do_sys_mount fs/namespace.c:3077 [inline] __se_sys_mount fs/namespace.c:3074 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x443d6a RSP: 002b:00007ffd312813c8 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443d6a RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd312813d0 RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004 R13: 0000000000402c60 R14: 0000000000000000 R15: 0000000000000000 RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: ffff8801b102e5b0 RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: ffff8801b102e5b0 CR2: ffffed006b2a50c0 ---[ end trace a2034989e196ff17 ]--- Reported-and-tested-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08f2fs: return error during fill_superJaegeuk Kim1-4/+12
commit c39a1b348c4fe172729eff77c533dabc3c7cdaa7 upstream. Let's avoid BUG_ON during fill_super, when on-disk was totall corrupted. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08f2fs: fix a panic caused by NULL flush_cmd_controlYunlei He1-1/+4
commit d4fdf8ba0e5808ba9ad6b44337783bd9935e0982 upstream. Mount fs with option noflush_merge, boot failed for illegal address fcc in function f2fs_issue_flush: if (!test_opt(sbi, FLUSH_MERGE)) { ret = submit_flush_wait(sbi); atomic_inc(&fcc->issued_flush); -> Here, fcc illegal return ret; } Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03f2fs: fix to wait page writeback during revoking atomic writeChao Yu1-0/+2
[ Upstream commit e5e5732d8120654159254c16834bc8663d8be124 ] After revoking atomic write, related LBA can be reused by others, so we need to wait page writeback before reusing the LBA, in order to avoid interference between old atomic written in-flight IO and new IO. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03f2fs: fix to don't trigger writeback during recoveryChao Yu1-0/+3
[ Upstream commit 64c74a7ab505ea40d1b3e5d02735ecab08ae1b14 ] - f2fs_fill_super - recover_fsync_data - recover_data - del_fsync_inode - iput - iput_final - write_inode_now - f2fs_write_inode - f2fs_balance_fs - f2fs_balance_fs_bg - sync_dirty_inodes With data_flush mount option, during recovery, in order to avoid entering above writeback flow, let's detect recovery status and do skip in f2fs_balance_fs_bg. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21f2fs: do SSR for data when there is enough free spaceYunlong Song1-1/+1
[ Upstream commit 035e97adab26c1121cedaeb9bd04cf48a8e8cf51 ] In allocate_segment_by_default(), need_SSR() already detected it's time to do SSR. So, let's try to find victims for data segments more aggressively in time. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-28f2fs: sanity check size of nat and sit cacheJin Qian1-0/+7
commit 21d3f8e1c3b7996ce239ab6fa82e9f7a8c47d84d upstream. Make sure number of entires doesn't exceed max journal size. Signed-off-by: Jin Qian <jinqian@android.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12f2fs: avoid to issue redundant discard commandsJaegeuk Kim1-0/+2
commit 8b107f5b97772c7c0c218302e9a4d15b4edf50b4 upstream. If segs_per_sec is over 1 like under SMR, previously f2fs issues discard commands redundantly on the same section, since we didn't move end position for the previous discard command. E.g., start end | | prefree_bitmap = [01111100111100] And, after issue discard for this section, end start | | prefree_bitmap = [01111100111100] Select this section again by searching from (end + 1), start end | | prefree_bitmap = [01111100111100] Fixes: 36abef4e796d38 ("f2fs: introduce mode=lfs mount option") Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-01f2fs: support checkpoint error injectionChao Yu1-0/+5
This patch adds to support checkpoint error injection in f2fs for testing fatal error tolerance, it will be useful that it can simulate abnormal power off by f2fs itself instead of calling godown ioctl by running apps. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-01f2fs: remove redundant value definitionYunlei He1-4/+3
This patch remove redundant value definition in build_sit_entries Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-01f2fs: introduce cp_lock to protect updating of ckpt_flagsChao Yu1-2/+2
This patch introduces spinlock to protect updating process of ckpt_flags field in struct f2fs_checkpoint, it avoids incorrectly updating in race condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: use crc and cp version to determine roll-forward recoveryJaegeuk Kim1-22/+0
Previously, we used cp_version only to detect recoverable dnodes. In order to avoid same garbage cp_version, we needed to truncate the next dnode during checkpoint, resulting in additional discard or data write. If we can distinguish this by using crc in addition to cp_version, we can remove this overhead. There is backward compatibility concern where it changes node_footer layout. So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to detect new layout. New layout will be activated only when this flag is set. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-22f2fs: preallocate blocks for encrypted fileYunlei He1-3/+1
This patch allow preallocates data blocks for buffered aio writes in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix to avoid BUG_ON] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-12f2fs: check free_sections for defragmentationJaegeuk Kim1-3/+3
Fix wrong condition check for defragmentation of a file. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-12f2fs: forbid to do fstrim if fs has some errorYunlei He1-0/+6
This patch skip fstrim if sbi set SBI_NEED_FSCK flag Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-08f2fs: fix minor typoChao Yu1-2/+2
Correct typo from 'destory' to 'destroy'. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-08f2fs: support async discardChao Yu1-2/+85
Like most filesystems, f2fs will issue discard command synchronously, so when user trigger fstrim through ioctl, multiple discard commands will be issued serially with sync mode, which makes poor performance. In this patch we try to support async discard, so that all discard commands can be issued and be waited for endio in batch to improve performance. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-08f2fs: schedule in between two continous batch discardsChao Yu1-0/+2
In batch discard approach of fstrim will grab/release gc_mutex lock repeatly, it makes contention of the lock becoming more intensive. So after one batch discards were issued in checkpoint and the lock was released, it's better to do schedule() to increase opportunity of grabbing gc_mutex lock for other competitors. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-30f2fs: check return value of write_checkpoint during fstrimChao Yu1-0/+2
During fstrim, if one of multiple write_checkpoint failed, break off and return error number to caller. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-30f2fs: avoid unneeded loop in build_sit_entriesChao Yu1-16/+32
When building each sit entry in cache, firstly, we will load it from sit page, and then check all entries in sit journal, if there is one updated entry in journal, cover cached entry with the journaled one. Actually, most of check operation is unneeded since we only need to update cached entries with journaled entries in batch, so changing the flow as below for more efficient: 1. load all sit entries into cache from sit pages; 2. update sit entries with journal. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-24f2fs: do not use discard_map for hard disksJaegeuk Kim1-9/+19
We don't need to keep discard_map, if disk does not support discard command. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-27Merge tag 'for-f2fs-4.8' of ↵Linus Torvalds1-7/+52
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The major change in this version is mitigating cpu overheads on write paths by replacing redundant inode page updates with mark_inode_dirty calls. And we tried to reduce lock contentions as well to improve filesystem scalability. Other feature is setting F2FS automatically when detecting host-managed SMR. Enhancements: - ioctl to move a range of data between files - inject orphan inode errors - avoid flush commands congestion - support lazytime Bug fixes: - return proper results for some dentry operations - fix deadlock in add_link failure - disable extent_cache for fcollapse/finsert" * tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: clean up coding style and redundancy f2fs: get victim segment again after new cp f2fs: handle error case with f2fs_bug_on f2fs: avoid data race when deciding checkpoin in f2fs_sync_file f2fs: support an ioctl to move a range of data blocks f2fs: fix to report error number of f2fs_find_entry f2fs: avoid memory allocation failure due to a long length f2fs: reset default idle interval value f2fs: use blk_plug in all the possible paths f2fs: fix to avoid data update racing between GC and DIO f2fs: add maximum prefree segments f2fs: disable extent_cache for fcollapse/finsert inodes f2fs: refactor __exchange_data_block for speed up f2fs: fix ERR_PTR returned by bio f2fs: avoid mark_inode_dirty f2fs: move i_size_write in f2fs_write_end f2fs: fix to avoid redundant discard during fstrim f2fs: avoid mismatching block range for discard f2fs: fix incorrect f_bfree calculation in ->statfs f2fs: use percpu_rw_semaphore ...
2016-07-16f2fs: use blk_plug in all the possible pathsJaegeuk Kim1-1/+6
This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging), and adds blk_plug in write paths additionally. The main reason is that blk_start_plug can be used to wake up from low-power mode before submitting further bios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-16f2fs: add maximum prefree segmentsJaegeuk Kim1-0/+3
In 1TB storage, we need to admit 22841 prefree segments, which can consume too much segments. This patch sets 8GB in max. prefree segments in that case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08f2fs: fix to avoid redundant discard during fstrimChao Yu1-2/+3
With below test steps, f2fs will issue redundant discard when doing fstrim, the reason is that we issue discards for both prefree segments and consecutive freed region user wants to trim, part regions they covered are overlapped, here, we change to do not to issue any discards for prefree segments in trimmed range. 1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs 2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/ 3. dd if=/dev/zero of=/mnt/f2fs/a bs=2M count=1 4. dd if=/dev/zero of=/mnt/f2fs/b bs=1M count=1 5. sync 6. rm /mnt/f2fs/a /mnt/f2fs/b 7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/ Before: <...>-5428 [001] ...1 9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200 <...>-5428 [001] ...1 9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300 After: <...>-6764 [000] ...1 9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08f2fs: avoid mismatching block range for discardYunlei He1-0/+4
This patch skip discard block range smaller than trim_minlen, and can not be merged by neighbour Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-06f2fs: produce more nids and reduce readahead natsJaegeuk Kim1-1/+3
The readahead nat pages are more likely to be reclaimed quickly, so it'd better to gather more free nids in advance. And, let's keep some free nids as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-06f2fs: detect host-managed SMR by feature flagJaegeuk Kim1-1/+2
If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs by default. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-13f2fs: introduce mode=lfs mount optionJaegeuk Kim1-1/+19
This mount option is to enable original log-structured filesystem forcefully. So, there should be no random writes for main area. Especially, this supports host-managed SMR device. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-08f2fs: drop any block pluggingJaegeuk Kim1-6/+1
In f2fs, we don't need to keep block plugging for NODE and DATA writes, since we already merged bios as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-08f2fs: avoid reverse IO order for NODE and DATAJaegeuk Kim1-0/+6
There is a data race between allocate_data_block() and f2fs_sbumit_page_mbio(), which incur unnecessary reversed bio submission. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-07f2fs: use bio op accessorsMike Christie1-5/+7
Separate the op from the rq_flag_bits and have f2fs set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block/fs/drivers: remove rw argument from submit_bioMike Christie1-2/+4
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07f2fs: control not to exceed # of cached nat entriesJaegeuk Kim1-0/+5
This is to avoid cache entry management overhead including radix tree. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-03f2fs: detect congestion of flush command issuesJaegeuk Kim1-1/+6
If flush commands do not incur any congestion, we don't need to throw that to dispatching queue which causes unnecessary latency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-03f2fs: use inode pointer for {set, clear}_inode_flagJaegeuk Kim1-1/+1
This patch refactors to use inode pointer for set_inode_flag and clear_inode_flag. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-03f2fs: fix to clear page private flagChao Yu1-0/+1
Commit 28bc106b2346 ("f2fs: support revoking atomic written pages") forgot to clear page private flag correctly, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: don't invalidate atomic page if successfulJaegeuk Kim1-2/+3
If we committed atomic write successfully, we don't need to invalidate pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: unset atomic/volatile flag in f2fs_release_fileJaegeuk Kim1-0/+2
The atomic/volatile operation should be done in pair of start and commit ioctl. For example, if a killed process remains open-ended atomic operation, we should drop its flag as well as its atomic data. Otherwise, if sqlite initiates another operation which doesn't require atomic writes, it will lose every data, since f2fs still treats with them as atomic writes; nobody will trigger its commit. Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov1-8/+8
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-26f2fs: introduce f2fs_update_data_blkaddr for cleanupChao Yu1-4/+2
Add a new help f2fs_update_data_blkaddr to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26f2fs crypto: fix incorrect positioning for GCing encrypted data pageChao Yu1-2/+1
For now, flow of GCing an encrypted data page: 1) try to grab meta page in meta inode's mapping with index of old block address of that data page 2) load data of ciphertext into meta page 3) allocate new block address 4) write the meta page into new block address 5) update block address pointer in direct node page. Other reader/writer will use f2fs_wait_on_encrypted_page_writeback to check and wait on GCed encrypted data cached in meta page writebacked in order to avoid inconsistence among data page cache, meta page cache and data on-disk when updating. However, we will use new block address updated in step 5) as an index to lookup meta page in inner bio buffer. That would be wrong, and we will never find the GCing meta page, since we use the old block address as index of that page in step 1). This patch fixes the issue by adjust the order of step 1) and step 3), and in step 1) grab page with index generated in step 3). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-23f2fs: trace old block address for CoWed pageChao Yu1-4/+6
This patch enables to trace old block address of CoWed page for better debugging. f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-23f2fs: split journal cache from curseg cacheChao Yu1-24/+61
In curseg cache, f2fs caches two different parts: - datas of current summay block, i.e. summary entries, footer info. - journal info, i.e. sparse nat/sit entries or io stat info. With this approach, 1) it may cause higher lock contention when we access or update both of the parts of cache since we use the same mutex lock curseg_mutex to protect the cache. 2) current summary block with last journal info will be writebacked into device as a normal summary block when flushing, however, we treat journal info as valid one only in current summary, so most normal summary blocks contain junk journal data, it wastes remaining space of summary block. So, in order to fix above issues, we split curseg cache into two parts: a) current summary block, protected by original mutex lock curseg_mutex b) journal cache, protected by newly introduced r/w semaphore journal_rwsem When loading curseg cache during ->mount, we store summary info and journal info into different caches; When doing checkpoint, we combine datas of two cache into current summary block for persisting. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>