summaryrefslogtreecommitdiff
path: root/fs/f2fs/segment.c
AgeCommit message (Collapse)AuthorFilesLines
2018-05-31f2fs: correct return value of f2fs_trim_fsChao Yu1-2/+2
Correct return value in two cases: - return EINVAL if end boundary is out-of-range. - return EIO if fs needs off-line check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: don't use GFP_ZERO for page cachesChao Yu1-0/+3
Related to https://lkml.org/lkml/2018/4/8/661 Sometimes, we need to write meta data to new allocated block address, then we will allocate a zeroed page in inner inode's address space, and fill partial data in it, and leave other place with zero value which means some fields are initial status. There are two inner inodes (meta inode and node inode) setting __GFP_ZERO, I have just checked them, for both of them, we can avoid using __GFP_ZERO, and do initialization by ourselves to avoid unneeded/redundant zeroing from mm. Cc: <stable@vger.kernel.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: issue all big range discards in umount processYunlei He1-0/+1
This patch modify max_requests to UINT_MAX, to issue all big range discards in umount. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31f2fs: run fstrim asynchronously if runtime discard is onJaegeuk Kim1-2/+11
We don't need to wait for whole bunch of discard candidates in fstrim, since runtime discard will issue them in idle time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30f2fs: turn down IO priority of discard from backgroundChao Yu1-0/+1
In order to avoid interfering normal r/w IO, let's turn down IO priority of discard issued from background. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30f2fs: don't split checkpoint in fstrimChao Yu1-27/+12
Now, we issue discard asynchronously in separated thread instead of in checkpoint, after that, we won't encounter long latency in checkpoint due to huge number of synchronous discard command handling, so, we don't need to split checkpoint to do trim in batch, merge it and obsolete related sysfs entry. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30f2fs: issue discard commands proactively in high fs utilizationJaegeuk Kim1-32/+39
In the high utilization like over 80%, we don't expect huge # of large discard commands, but do many small pending discards which affects FTL GCs a lot. Let's issue them in that case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29f2fs: let fstrim issue discard commands in lower priorityJaegeuk Kim1-64/+75
The fstrim gathers huge number of large discard commands, and tries to issue without IO awareness, which results in long user-perceive IO latencies on READ, WRITE, and FLUSH in UFS. We've observed some of commands take several seconds due to long discard latency. This patch limits the maximum size to 2MB per candidate, and check IO congestion when issuing them to disk. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-03f2fs: clear PageError on writepageJaegeuk Kim1-0/+1
This patch clears PageError in some pages tagged by read path, but when we write the pages with valid contents, writepage should clear the bit likewise ext4. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-28f2fs: Add a segment type check in inplace writeYunlei He1-0/+5
This patch add a segment type check in IPU, in case of something wrong with blkadd in dnode. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: wrap all options with f2fs_sb_info.mount_optChao Yu1-4/+4
This patch merges miscellaneous mount options into struct f2fs_mount_info, After this patch, once we add new mount option, we don't need to worry about recovery of it in remount_fs(), since we will recover the f2fs_sb_info.mount_opt including all options. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: Don't overwrite all types of node to keep node chainYunlei He1-1/+1
Currently, we enable node SSR by default, and mixed different types of node segment to do SSR more intensively. Although reuse warm node is not allowed, warm node chain will be destroyed by errors introduced by other types node chain. So we'd better forbid reusing all types of node to keep warm node chain. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13f2fs: support hot file extensionChao Yu1-1/+2
This patch supports to recognize hot file extension in f2fs, so that we can allocate proper hot segment location for its data, which can lead to better hot/cold seperation in filesystem. In addition, we changes a bit on query/add/del operation method for extension_list sysfs entry as below: - Query: cat /sys/fs/f2fs/<disk>/extension_list - Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list - Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list - Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list - Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list - [h] means add/del hot file extension - [c] means add/del cold file extension Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13f2fs: issue discard aggressively in the gc_urgent modeJaegeuk Kim1-6/+5
This patch avoids to skip discard commands when user sets gc_urgent mode. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13f2fs: add mount option for segment allocation policyJaegeuk Kim1-0/+5
This patch adds an mount option, "alloc_mode=%s" having two options, "default" and "reuse". In "alloc_mode=reuse" case, f2fs starts to allocate segments from 0'th segment all the time to reassign segments. It'd be useful for small-sized eMMC parts. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13f2fs: clean up f2fs_sb_has_xxx functionsSheng Yong1-2/+2
This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of different f2fs_sb_has_xxx functions. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13f2fs: support passing down write hints to block layer with F2FS policyHyunchul Lee1-9/+48
Add 'whint_mode=fs-based' mount option. In this mode, F2FS passes down write hints with its policy. * whint_mode=fs-based. F2FS passes down hints with its policy. User F2FS Block ---- ---- ----- META WRITE_LIFE_MEDIUM; HOT_NODE WRITE_LIFE_NOT_SET WARM_NODE " COLD_NODE WRITE_LIFE_NONE ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME extension list " " -- buffered io WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG WRITE_LIFE_NONE " " WRITE_LIFE_MEDIUM " " WRITE_LIFE_LONG " " -- direct io WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET WRITE_LIFE_NONE " WRITE_LIFE_NONE WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM WRITE_LIFE_LONG " WRITE_LIFE_LONG Many thanks to Chao Yu and Jaegeuk Kim for comments to implement this patch. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13f2fs: support passing down write hints given by users to block layerHyunchul Lee1-0/+59
Add the 'whint_mode' mount option that controls which write hints are passed down to block layer. There are "off" and "user-based" mode. The default mode is "off". 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET. 2) whint_mode=user-based. F2FS tries to pass down hints given by users. User F2FS Block ---- ---- ----- META WRITE_LIFE_NOT_SET HOT_NODE " WARM_NODE " COLD_NODE " ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME extension list " " -- buffered io WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET WRITE_LIFE_NONE " " WRITE_LIFE_MEDIUM " " WRITE_LIFE_LONG " " -- direct io WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET WRITE_LIFE_NONE " WRITE_LIFE_NONE WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM WRITE_LIFE_LONG " WRITE_LIFE_LONG Many thanks to Chao Yu and Jaegeuk Kim for comments to implement this patch. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: avoid build warning] [Chao Yu: fix to restore whint_mode in ->remount_fs] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13f2fs: fix heap mode to reset it backYunlong Song1-1/+2
Commit 7a20b8a61eff81bdb7097a578752a74860e9d142 ("f2fs: allocate node and hot data in the beginning of partition") introduces another mount option, heap, to reset it back. But it does not do anything for heap mode, so fix it. Cc: stable@vger.kernel.org Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25f2fs: rebuild sit page from sit info in memYunlei He1-14/+5
This patch rebuild sit page from sit info in mem instead of issue a read io. I test this method and the result is as below: Pre: mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit Some sit flush consume more than 50ms. Now: mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit Most sit flush consume no more than 1ms. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25f2fs: stop issuing discard if fs is readonlyChao Yu1-0/+2
If filesystem is readonly, stop to issue discard in daemon. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25f2fs: clean up duplicated assignment in init_discard_policyChao Yu1-8/+3
Remove duplicated codes of assignment for .max_requests and .io_aware_gran. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: stop gc/discard thread after fs shutdownChao Yu1-0/+5
Once filesystem shuts down, daemons like gc/discard thread should be aware of it, and do exit, in addtion, drop all cached pending discard commands and turn off real-time discard mode. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-19f2fs: handle newly created page when revoking inmem pagesDaeho Jeong1-1/+5
When committing inmem pages is successful, we revoke already committed blocks in __revoke_inmem_pages() and finally replace the committed ones with the old blocks using f2fs_replace_block(). However, if the committed block was newly created one, the address of the old block is NEW_ADDR and __f2fs_replace_block() cannot handle NEW_ADDR as new_blkaddr properly and a kernel panic occurrs. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Tested-by: Shu Tan <shu.tan@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-17f2fs: avoid high cpu usage in discard threadChao Yu1-0/+3
We take very long time to finish generic/476, this is because we will check consistence of all discard entries in global rb tree while traversing all different granularity pending lists, even when the list is empty, in order to avoid that unneeded overhead, we have to skip the check when coming up an empty list. generic/476 time consumption: cost Before patch & w/o consistence check 57s Before patch & w/ consistence check 1426s After patch 78s Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-17f2fs: make local functions staticWei Yongjun1-2/+2
Fixes the following sparse warnings: fs/f2fs/segment.c:887:6: warning: symbol '__check_sit_bitmap' was not declared. Should it be static? fs/f2fs/segment.c:1327:6: warning: symbol 'f2fs_wait_discard_bio' was not declared. Should it be static? fs/f2fs/super.c:1661:5: warning: symbol 'f2fs_get_projid' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-17f2fs: check segment type in __f2fs_replace_blockYunlong Song1-0/+1
In some case, the node blocks has wrong blkaddr whose segment type is NODE, e.g., recover inode has missing xattr flag and the blkaddr is in the xattr range. Since fsck.f2fs does not check the recovery nodes, this will cause __f2fs_replace_block change the curseg of node and do the update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a result, when recovery process write checkpoint and sync nodes, the next_blkoff of curseg is used in the segment bit map, then it will cause f2fs_bug_on. So let's check segment type in __f2fs_replace_block. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03f2fs: return error during fill_superJaegeuk Kim1-4/+12
Let's avoid BUG_ON during fill_super, when on-disk was totall corrupted. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03f2fs: no need return value in restore summary processYunlei He1-11/+3
No need return value in restore summary process Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03f2fs: spread f2fs_k{m,z}allocChao Yu1-1/+1
Use f2fs_k{m,z}alloc as much as possible to increase fault injection points. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03f2fs: inject fault to kvmallocChao Yu1-7/+9
This patch supports to inject fault into kvmalloc/kvzalloc. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03f2fs: inject fault to kzallocChao Yu1-14/+16
This patch introduces f2fs_kzalloc based on f2fs_kmalloc in order to support error injection for kzalloc(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-09f2fs: apply write hints to select the type of segments for buffered writeHyunchul Lee1-1/+13
Write hints helps F2FS to determine which type of segments would be selected for buffered write. This patch implements the mapping from write hints to segment types as shown below. hints segment type ----- ------------ WRITE_LIFE_SHORT CURSEG_HOT_DATA WRITE_LIFE_EXTREME CURSEG_COLD_DATA others CURSEG_WARM_DATA the F2FS poliy for hot/cold seperation has precedence over this hints. And hints are not applied in in-place update. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-06f2fs: fix summary info corruptionChao Yu1-1/+27
Sometimes, after running generic/270 of fstest, fsck reports summary info and actual position of block address in direct node becoming inconsistent. The root cause is race in between __f2fs_replace_block and change_curseg as below: Thread A Thread B - __clone_blkaddrs - f2fs_replace_block - __f2fs_replace_block - segnoA = GET_SEGNO(sbi, blkaddrA); - type = se->type:=CURSEG_HOT_DATA - if (!IS_CURSEG(sbi, segnoA)) type = CURSEG_WARM_DATA - allocate_data_block - allocate_segment - get_ssr_segment - change_curseg(segnoA, CURSEG_HOT_DATA) - change_curseg(segnoA, CURSEG_WARM_DATA) - reset_curseg - __set_sit_entry_type - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA So finally, hot curseg locates in segnoA, but type of segnoA becomes CURSEG_WARM_DATA. Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false), as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA, then change its summary cache and writeback it to summary block. But segnoA is used by hot type curseg too, once it moves or persist, it will cover summary block content with inner old summary cache, result in inconsistent status. This patch tries to fix this issue by introduce global curseg lock to avoid race in between __f2fs_replace_block and change_curseg. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-06f2fs: remove dead code in update_meta_pageChao Yu1-5/+1
After commit a468f0ef516f ("f2fs: use crc and cp version to determine roll-forward recovery"), last caller of update_meta_page passing @src with NULL is gone, so remove related dead code there. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-06f2fs: use rw_semaphore to protect SIT cacheChao Yu1-15/+19
There are some cases user didn't update SIT cache under this lock, so let's use rw_semaphore instead of mutex to enhance concurrently accessing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-06Revert "f2fs: handle dirty segments inside refresh_sit_entry"Yunlong Song1-13/+14
This reverts commit 5e443818fa0b2a2845561ee25bec181424fb2889 The commit should be reverted because call sequence of below two parts of code must be kept: a. update sit information, it needs to be updated before segment allocation since latter allocation may trigger SSR, and SSR allocation needs latest valid block information of all segments. b. update segment status, it needs to be updated after segment allocation since we can skip updating current opened segment status. Fixes: 5e443818fa0b ("f2fs: handle dirty segments inside refresh_sit_entry") Suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: remove refresh_sit_entry function] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-06f2fs: export SSR allocation thresholdChao Yu1-1/+2
This patch exports min_ssr_segments threshold in sysfs to let user control triggering SSR allocation flexibly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-06f2fs: give correct trimmed blocks in fstrimChao Yu1-8/+19
We have supported to issue discard in specified range during fstrim, it needs to return caller with successfully trimmed bytes in that range instead of bytes of invalid blocks which are scanned in checkpoint. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-06f2fs: support bio allocation error injectionChao Yu1-1/+1
This patch adds to support bio allocation error injection to simulate out-of-memory test scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: remove several redundant assignmentsColin Ian King1-5/+1
There are several assignments to variables that are redundant as the values are never read when the variables are updated later and so the redundant statements can be safely removed. Cleans up clang warnings: fs/f2fs/segment.c:923:19: warning: Value stored to 'p' during its initialization is never read fs/f2fs/segment.c:2060:2: warning: Value stored to 'hint' is never read fs/f2fs/segment.c:2353:2: warning: Value stored to 'start_block' is never read fs/f2fs/segment.c:2354:2: warning: Value stored to 'end_block' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: limit # of inmemory pagesJaegeuk Kim1-0/+38
If some abnormal users try lots of atomic write operations, f2fs is able to produce pinned pages in the main memory which affects system performance. This patch limits that as 20% over total memory size, and if f2fs reaches to the limit, it will drop all the inmemory pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: give up CP_TRIMMED_FLAG if it drops discardsChao Yu1-3/+10
In ->umount, once we drop remained discard entries, we should not set CP_TRIMMED_FLAG with another checkpoint. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: trace f2fs_remove_discardChao Yu1-0/+2
This patch adds tracepoint to trace f2fs_remove_discard. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: reduce cmd_lock coverage in __issue_discard_cmdChao Yu1-8/+10
__submit_discard_cmd may lead long latency due to exhaustion of I/O request resource in block layer, so issuing all discard under cmd_lock may lead to hangtask, in order to avoid that, let's reduce it's coverage. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: split discard policyChao Yu1-76/+73
There are many different scenarios such as fstrim, umount, urgent or background where we will issue discards, actually, they need use different policy in aspect of io aware, discard granularity, delay interval and so on. But now they just share one common discard policy, so there will be race when changing policy in between these scenarios, the interference of changing discard policy will be very serious. This patch changes to split discard policy for different scenarios. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: wrap discard policyChao Yu1-9/+29
This patch wraps scattered optional parameters into discard policy as below, later, with it we expect that we can adjust these parameters with proper strategy in different scenario. struct discard_policy { unsigned int min_interval; /* used for candidates exist */ unsigned int max_interval; /* used for candidates not exist */ unsigned int max_requests; /* # of discards issued per round */ unsigned int io_aware_gran; /* minimum granularity discard not be aware of I/O */ bool io_aware; /* issue discard in idle time */ bool sync; /* submit discard with REQ_SYNC flag */ }; This patch doesn't change any logic of codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: support issuing/waiting discard in rangeChao Yu1-21/+106
Fstrim intends to trim invalid blocks of filesystem only with specified range and granularity, but actually, it will issue all previous cached discard commands which may be out-of-range and be with unmatched granularity, it's unneeded. In order to fix above issues, this patch introduces new helps to support to issue and wait discard in range and adds a new fstrim_list for tracking in-flight discard from ->fstrim. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-10f2fs: fix to flush multiple device in checkpointChao Yu1-0/+29
If f2fs manages multiple devices, in checkpoint, we need to issue flush in those devices which contain dirty data/node in their cache before we write checkpoint region, otherwise, filesystem metadata could be corrupted if hitting SPO after checkpoint. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-10f2fs: enhance multiple device flushChao Yu1-11/+35
When multiple device feature is enabled, during ->fsync we will issue flush in all devices to make sure node/data of the file being persisted into storage. But some flushes of device could be unneeded as file's data may be not writebacked into those devices. So this patch adds and manage bitmap per inode in global cache to indicate which device is dirty and it needs to issue flush during ->fsync, hence, we could improve performance of fsync in scenario of multiple device. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>