Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch has no functional change.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.
Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them accoording to
each device speed.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We can allow dio reads for LFS mode, while doing buffered writes for dio writes.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now we don't need to be too much careful about storage alignment for dio, since
its speed becomes quite fast and we'd better avoid any misalignment first.
Revert: 38aa0889b250 (f2fs: align direct_io'ed data to section)
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid a HDD centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
gcc is unsure about the use of last_ofs_in_node, which might happen
without a prior initialization:
fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’:
fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {
As pointed out by Chao Yu, the code is actually correct as 'prealloc'
is only set if the last_ofs_in_node has been set, the two always
get updated together.
This initializes last_ofs_in_node to dn.ofs_in_node for each
new dnode at the start of the 'next_block' loop, which at that
point is a correct initialization as well. I assume that compilers
that correctly track the contents of the variables and do not
warn about the condition also figure out that they can eliminate
the extra assignment here.
Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We don't need to allocate bio partially in order to maximize sequential writes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs can support fallocating blocks beyond file size without changing the
size, but ->fiemap of f2fs was restricted and can't detect these extents
fallocated past EOF, now relieve the restriction.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_map_blocks, let f2fs_balance_fs detects node page modification
with dn.node_changed to avoid miss some corner cases.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If there is no dirty pages in inode, we should give a chance to detach
the inode from global dirty list, otherwise it needs to call another
unnecessary .writepages for detaching.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Avoid re-use of page index as tweak for AES-XTS when multiple parts of
same page are encrypted. This will happen on multiple (partial) calls of
fscrypt_encrypt_page on same page.
page->index is only valid for writeback pages.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Not all filesystems work on full pages, thus we should allow them to
hand partial pages to fscrypt for en/decryption.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Add wbc_to_write_flags(), which returns the write modifier flags to use,
based on a struct writeback_control. No functional changes in this
patch, but it prepares us for factoring other wbc fields for write type.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The mapping_set_error() helper sets the correct AS_ flag for the mapping
so there is no reason to open code it. Use the helper directly.
[akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
While we call ->writepages, there are two cases:
a. we didn't writeout any dirty pages, since they are writebacked by other
thread concurrently.
b. we writeout dirty pages, and have already submitted bio to block layer.
In these cases, we don't need to do additional bio flushing unnecessarily,
it may split bio in cache into smaller one.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Previously, we only support global fault injection configuration, so that
when we configure type/rate of fault injection through sysfs, mount
option, it will influence all f2fs partition which is being used.
It is not make sence, since it will be not convenient if developer want
to test separated partitions with different fault injection rate/type
simultaneously, also it's not possible to enable fault injection in one
partition and disable fault injection in other one.
>From now on, we move global configuration of fault injection in module
into per-superblock, hence injection testing can be more flexible.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch improves the migration of dirty pages and allows migrating atomic
written pages that F2FS uses in Page Cache. Instead of the fallback releasing
page path, it provides better performance for memory compaction, CMA and other
users of memory page migrating. For dirty pages, there is no need to write back
first when migrating. For an atomic written page before committing, we can
migrate the page and update the related 'inmem_pages' list at the same time.
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix some coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch allow preallocates data blocks for buffered aio writes
in encrypted file.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix to avoid BUG_ON]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to support IO error injection for testing IO error
tolerance of f2fs.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Previously, f2fs_write_begin sets PageUptodate all the time. But, when user
tries to update the entire page (i.e., len == PAGE_SIZE), we need to consider
that the page is able to be copied partially afterwards. In such the case,
we will lose the remaing region in the page.
This patch fixes this by setting PageUptodate in f2fs_write_end as given copied
result. In the short copy case, it returns zero to let generic_perform_write
retry copying user data again.
As a result, f2fs_write_end() works:
PageUptodate len copied return retry
1. no 4096 4096 4096 false -> return 4096
2. no 4096 1024 0 true -> goto #1 case
3. yes 2048 2048 2048 false -> return 2048
4. yes 2048 1024 1024 false -> return 1024
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fix wrong condition check for defragmentation of a file.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We don't need to make zeros beyond i_size, since we already wrote that through
NEW_ADDR case.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In write_begin(), we skip checking dnode block for preallocating block
when whole block needs to be updated since we preallocated its block in
f2fs_preallocate_blocks, for partial updated block, we will still try
to lock its node and do preallocation in write_begin(), so in
f2fs_preallocate_blocks we should not preallocate its block.
But previously, the calculation of preallocating block number is
incorrect, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fixes the following sparse warning:
fs/f2fs/data.c:969:12: warning:
symbol 'f2fs_grab_bio' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If we preallocate blocks with f2fs_reserve_blocks in f2fs_map_blocks, we
should call f2fs_balance_fs for checking and reclaiming space, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This reverts commit a2ee0a300344a6da76186129b078113354fe13d2.
When testing with generic/032 of xfstest suit, failure message will be
reported as below:
generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
--- tests/generic/032.out 2015-01-11 16:52:27.643681072 +0800
+++ results/generic/032.out.bad 2016-08-06 13:44:43.861330500 +0800
@@ -1,5 +1,5 @@
QA output created by 032
-100 iterations
-0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
-*
-0100000
+1: [768..775]: unwritten
+Unwritten extents found!
...
(Run 'diff -u tests/generic/032.out results/generic/032.out.bad' to see the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests
In write_end(), we should update i_size of inode before unlock page,
otherwise, we will lose newly updated data in following race condition.
Thread A Thread B
- write_end
- unlock page
- writepages
- lock_page
- writepage
if page is out-of-range of file size,
we will skip writting the page.
- update i_size
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Merge 4fc29c1aa375 included this extra line, but it's not needed (or
useful) since we'll bio_set_op_attrs() right after to properly set
the op and flags for the bio.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"The major change in this version is mitigating cpu overheads on write
paths by replacing redundant inode page updates with mark_inode_dirty
calls. And we tried to reduce lock contentions as well to improve
filesystem scalability. Other feature is setting F2FS automatically
when detecting host-managed SMR.
Enhancements:
- ioctl to move a range of data between files
- inject orphan inode errors
- avoid flush commands congestion
- support lazytime
Bug fixes:
- return proper results for some dentry operations
- fix deadlock in add_link failure
- disable extent_cache for fcollapse/finsert"
* tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: clean up coding style and redundancy
f2fs: get victim segment again after new cp
f2fs: handle error case with f2fs_bug_on
f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
f2fs: support an ioctl to move a range of data blocks
f2fs: fix to report error number of f2fs_find_entry
f2fs: avoid memory allocation failure due to a long length
f2fs: reset default idle interval value
f2fs: use blk_plug in all the possible paths
f2fs: fix to avoid data update racing between GC and DIO
f2fs: add maximum prefree segments
f2fs: disable extent_cache for fcollapse/finsert inodes
f2fs: refactor __exchange_data_block for speed up
f2fs: fix ERR_PTR returned by bio
f2fs: avoid mark_inode_dirty
f2fs: move i_size_write in f2fs_write_end
f2fs: fix to avoid redundant discard during fstrim
f2fs: avoid mismatching block range for discard
f2fs: fix incorrect f_bfree calculation in ->statfs
f2fs: use percpu_rw_semaphore
...
|
|
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2
- most(?) of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in <linux/compaction.h>
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...
|
|
Vladimir has noticed that we might declare memcg oom even during
readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
restriction) while __do_page_cache_readahead uses
page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
OOMs. This gfp mask discrepancy is really unfortunate and easily
fixable. Drop page_cache_alloc_readahead() which only has one user and
outsource the gfp_mask logic into readahead_gfp_mask and propagate this
mask from __do_page_cache_readahead down to read_pages.
This alone would have only very limited impact as most filesystems are
implementing ->readpages and the common implementation mpage_readpages
does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
use readahead_gfp_mask instead as this function is called only during
readahead as well. The same applies to read_cache_pages.
ext4 has its own ext4_mpage_readpages but the path which has pages !=
NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
doing a very similar pattern to mpage_readpages so the same can be
applied to them as well.
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@suse.com: restrict gfp mask in mpage_alloc]
Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Chris Mason <clm@fb.com>
Cc: Steve French <sfrench@samba.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch includes minor clean-ups.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging),
and adds blk_plug in write paths additionally.
The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Datas in file can be operated by GC and DIO simultaneously, so we will
face race case as below:
For write case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- dio_bio_submit
update user data to old block address
For read case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_balance_fs
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- write_checkpoint
- do_checkpoint
- clear_prefree_segments
- f2fs_issue_discard
discard old block adress
- dio_bio_submit
update user buffer from obsolete block address
In order to fix this, for one file, we should let DIO and GC getting exclusion
against with each other.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This is to fix wrong error pointer handling flow reported by Dan.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We don't need to do i_size_write under page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
SetPageUptodate() issues memory barrier, resulting in performance degrdation.
Let's avoid that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In procedure of synchonized read, after sending out the read request, reader
will try to lock the page for waiting device to finish the read jobs and
unlock the page, but meanwhile, truncater will race with reader, so after
reader get lock of the page, it should check page's mapping to detect
whether someone has truncated the page in advance, then reader has the
chance to do the retry if truncation was done, otherwise read can be failed
due to previous condition check.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For encrypted inode, if user overwrites data of the inode, f2fs will read
encrypted data into page cache, and then do the decryption.
However reader can race with overwriter, and it will see encrypted data
which has not been decrypted by overwriter yet. Fix it by moving decrypting
work to background and keep page non-uptodated until data is decrypted.
Thread A Thread B
- f2fs_file_write_iter
- __generic_file_write_iter
- generic_perform_write
- f2fs_write_begin
- f2fs_submit_page_bio
- generic_file_read_iter
- do_generic_file_read
- lock_page_killable
- unlock_page
- copy_page_to_iter
hit the encrypted data in updated page
- lock_page
- fscrypt_decrypt_page
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The f2fs_map_blocks is very related to the performance, so let's avoid any
latency to read ahead node pages.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs
by default.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This mount option is to enable original log-structured filesystem forcefully.
So, there should be no random writes for main area.
Especially, this supports host-managed SMR device.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs, we don't need to keep block plugging for NODE and DATA writes, since
we already merged bios as much as possible.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If EIO occurred, we need to set all the mapping to avoid any further IOs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Fixed up fs/ext4/crypto.c
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Previously, f2fs_write_data_pages() calls __f2fs_writepage() which calls
f2fs_write_data_page().
If f2fs_write_data_page() returns AOP_WRITEPAGE_ACTIVATE, __f2fs_writepage()
calls mapping_set_error(). But, this should not happen at every time, since
sometimes f2fs_write_data_page() tries to skip writing pages without error.
For example, volatile_write() gives EIO all the time, as Shuoran Liu pointed
out.
Reported-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If there is no cold page, we don't need to do a loop to flush dirty
data pages.
On /dev/pmem0,
1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
Before : 1.1 GB/s
After : 1.2 GB/s
2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
Before : 2.2 GB/s
After : 2.3 GB/s
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For data pages, let's try to flush as much as possible in background.
On /dev/pmem0,
1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
Before : 800 MB/s
After : 1.1 GB/s
2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
Before : 1.3 GB/s
After : 2.2 GB/s
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|