Age | Commit message (Collapse) | Author | Files | Lines |
|
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Sharing f2fs_ci_compare() between comparing cached dentries
(f2fs_d_compare()) and comparing on-disk dentries (f2fs_match_name())
doesn't work as well as intended, as these actions fundamentally differ
in several ways (e.g. whether the task may sleep, whether the directory
is stable, whether the casefolded name was precomputed, whether the
dentry will need to be decrypted once we allow casefold+encrypt, etc.)
Just make f2fs_d_compare() implement what it needs directly, and rework
f2fs_ci_compare() to be specialized for f2fs_match_name().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We need to call fscrypt_free_filename() to free the memory allocated by
fscrypt_setup_filename().
Fixes: b06af2aff28b ("f2fs: convert inline_dir early before starting rename")
Cc: <stable@vger.kernel.org> # v5.6+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
LZO-RLE extension (run length encoding) was introduced to improve
performance of LZO algorithm in scenario of data contains many zeros,
zram has changed to use this extended algorithm by default, this
patch adds to support this algorithm extension, to enable this
extension, it needs to enable F2FS_FS_LZO and F2FS_FS_LZORLE config,
and specifies "compress_algorithm=lzo-rle" mountoption.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If compression feature is on, in scenario of no enough free memory,
page refault ratio is higher than before, the root cause is:
- {,de}compression flow needs to allocate intermediate pages to store
compressed data in cluster, so during their allocation, vm may reclaim
mmaped pages.
- if above reclaimed pages belong to compressed cluster, during its
refault, it may cause more intermediate pages allocation, result in
reclaiming more mmaped pages.
So this patch introduces a mempool for intermediate page allocation,
in order to avoid high refault ratio, by default, number of
preallocated page in pool is 512, user can change the number by
assigning 'num_compress_pages' parameter during module initialization.
Ma Feng found warnings in the original patch and fixed like below.
Fix the following sparse warning:
fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared.
Should it be static?
fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not
declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
to support bmap() on compressed inode: if queried block locates in
non-compressed cluster, return its physical block address.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Map normal/compressed cluster of compressed inode correctly, and give
the right fiemap flag FIEMAP_EXTENT_ENCODED on mapped compressed extent.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Supports to truncate compressed/normal cluster partially on compressed
inode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
due to f2fs_post_read_required() has did that.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove the pointless string length checks. Just use strcmp().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch corrects the SPDX License Identifier style in
header files related to F2FS File System support.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used).
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
<linux/cryptohash.h> sounds very generic and important, like it's the
header to include if you're doing cryptographic hashing in the kernel.
But actually it only includes the library implementation of the SHA-1
compression function (not even the full SHA-1). This should basically
never be used anymore; SHA-1 is no longer considered secure, and there
are much better ways to do cryptographic hashing in the kernel.
Most files that include this header don't actually need it. So in
preparation for removing it, remove all these unneeded includes of it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
f2fs_quota_sync() uses f2fs_lock_op() before flushing dirty pages, but
f2fs_write_data_page() returns EAGAIN.
Likewise dentry blocks, we can just bypass getting the lock, since quota
blocks are also maintained by checkpoint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Adds to support accounting read IOs from userspace/kernel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When a discard_cmd needs to be split due to dpolicy->max_requests, then
for the remaining length it will be either merged into another cmd or a
new discard_cmd will be created. In this case, there is double
accounting of dcc->undiscard_blks for the remaining len, due to which
it shows incorrect value in stats.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_ra_meta_pages(), if f2fs_submit_page_bio() failed, we need to
unlock page, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In case a discard_cmd is split into several bios, the dc->error
must not be overwritten once an error is reported by a bio. Also,
move it under dc->lock.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
F2FS already has a default timeout of 5 secs for discards that
can be issued during umount, but it can take more than the 5 sec
timeout if the underlying UFS device queue is already full and there
are no more available free tags to be used. Fix this by submitting a
small batch of discard requests so that it won't cause the device
queue to be full at any time and thus doesn't incur its wait time
in the umount context.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Added a tracepoint to see iostat of f2fs. Default period of that
is 3 second. This tracepoint can be used to be monitoring
I/O statistics periodically.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch introduces a way to attach REQ_META/FUA explicitly
to all the data writes given temperature.
-> attach REQ_FUA to Hot Data writes
-> attach REQ_FUA to Hot|Warm Data writes
-> attach REQ_FUA to Hot|Warm|Cold Data writes
-> attach REQ_FUA to Hot|Warm|Cold Data writes as well as
REQ_META to Hot Data writes
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've mainly focused on fixing bugs and addressing
issues in recently introduced compression support.
Enhancement:
- add zstd support, and set LZ4 by default
- add ioctl() to show # of compressed blocks
- show mount time in debugfs
- replace rwsem with spinlock
- avoid lock contention in DIO reads
Some major bug fixes wrt compression:
- compressed block count
- memory access and leak
- remove obsolete fields
- flag controls
Other bug fixes and clean ups:
- fix overflow when handling .flags in inode_info
- fix SPO issue during resize FS flow
- fix compression with fsverity enabled
- potential deadlock when writing compressed pages
- show missing mount options"
* tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
f2fs: keep inline_data when compression conversion
f2fs: fix to disable compression on directory
f2fs: add missing CONFIG_F2FS_FS_COMPRESSION
f2fs: switch discard_policy.timeout to bool type
f2fs: fix to verify tpage before releasing in f2fs_free_dic()
f2fs: show compression in statx
f2fs: clean up dic->tpages assignment
f2fs: compress: support zstd compress algorithm
f2fs: compress: add .{init,destroy}_decompress_ctx callback
f2fs: compress: fix to call missing destroy_compress_ctx()
f2fs: change default compression algorithm
f2fs: clean up {cic,dic}.ref handling
f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()
f2fs: xattr.h: Make stub helpers inline
f2fs: fix to avoid double unlock
f2fs: fix potential .flags overflow on 32bit architecture
f2fs: fix NULL pointer dereference in f2fs_verity_work()
f2fs: fix to clear PG_error if fsverity failed
f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()
f2fs: don't trigger data flush in foreground operation
...
|
|
We can keep compressed inode's data inline before inline conversion.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It needs to call f2fs_disable_compressed_file() to disable
compression on directory.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Compression sysfs node should not be shown if f2fs module disables
compression feature.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
While checking discard timeout, we use specified type
UMOUNT_DISCARD_TIMEOUT, so just replace doplicy.timeout with
it, and switch doplicy.timeout to bool type.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In below error path, tpages[i] could be NULL, fix to check it before
releasing it.
- f2fs_read_multi_pages
- f2fs_alloc_dic
- f2fs_free_dic
Fixes: 61fbae2b2b12 ("f2fs: fix to avoid NULL pointer dereference")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
fstest reports below message when compression is on:
generic/424 1s ... - output mismatch
--- tests/generic/424.out
+++ results/generic/424.out.bad
@@ -1,2 +1,26 @@
QA output created by 424
+[!] Attribute compressed should be set
+Failed
+stat_test failed
+[!] Attribute compressed should be set
+Failed
+stat_test failed
We missed to set STATX_ATTR_COMPRESSED on compressed inode in getattr(),
fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just cleanup, no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add zstd compress algorithm support, use "compress_algorithm=zstd"
mountoption to enable it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Pull fscrypt updates from Eric Biggers:
"Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves a file's
encryption nonce.
This makes it easier to write automated tests which verify that
fscrypt is doing the encryption correctly"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
ubifs: wire up FS_IOC_GET_ENCRYPTION_NONCE
f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE
ext4: wire up FS_IOC_GET_ENCRYPTION_NONCE
fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl
|
|
Add below two callback interfaces in struct f2fs_compress_ops:
int (*init_decompress_ctx)(struct decompress_io_ctx *dic);
void (*destroy_decompress_ctx)(struct decompress_io_ctx *dic);
Which will be used by zstd compress algorithm later.
In addition, this patch adds callback function pointer check, so that
specified algorithm can avoid defining unneeded functions.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Otherwise, it will cause memory leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use LZ4 as default compression algorithm, as compared to LZO, it shows
almost the same compression ratio and much better decompression speed.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
{cic,dic}.ref should be initialized to number of compressed pages,
let's initialize it directly rather than doing w/
f2fs_set_compressed_page().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Multipage read flow should consider fsverity, so it needs to use
f2fs_readpage_limit() instead of i_size_read() to check EOF condition.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fix gcc warnings:
In file included from fs/f2fs/dir.c:15:0:
fs/f2fs/xattr.h:157:13: warning: 'f2fs_destroy_xattr_caches' defined but not used [-Wunused-function]
static void f2fs_destroy_xattr_caches(struct f2fs_sb_info *sbi) { }
^~~~~~~~~~~~~~~~~~~~~~~~~
fs/f2fs/xattr.h:156:12: warning: 'f2fs_init_xattr_caches' defined but not used [-Wunused-function]
static int f2fs_init_xattr_caches(struct f2fs_sb_info *sbi) { return 0; }
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
On image that has verity and compression feature, if compressed pages
and non-compressed pages are mixed in one bio, we may double unlock
non-compressed page in below flow:
- f2fs_post_read_work
- f2fs_decompress_work
- f2fs_decompress_bio
- __read_end_io
- unlock_page
- fsverity_enqueue_verify_work
- f2fs_verity_work
- f2fs_verify_bio
- unlock_page
So it should skip handling non-compressed page in f2fs_decompress_work()
if verity is on.
Besides, add missing dec_page_count() in f2fs_verify_bio().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_inode_info.flags is unsigned long variable, it has 32 bits
in 32bit architecture, since we introduced FI_MMAP_FILE flag
when we support data compression, we may access memory cross
the border of .flags field, corrupting .i_sem field, result in
below deadlock.
To fix this issue, let's expand .flags as an array to grab enough
space to store new flags.
Call Trace:
__schedule+0x8d0/0x13fc
? mark_held_locks+0xac/0x100
schedule+0xcc/0x260
rwsem_down_write_slowpath+0x3ab/0x65d
down_write+0xc7/0xe0
f2fs_drop_nlink+0x3d/0x600 [f2fs]
f2fs_delete_inline_entry+0x300/0x440 [f2fs]
f2fs_delete_entry+0x3a1/0x7f0 [f2fs]
f2fs_unlink+0x500/0x790 [f2fs]
vfs_unlink+0x211/0x490
do_unlinkat+0x483/0x520
sys_unlink+0x4a/0x70
do_fast_syscall_32+0x12b/0x683
entry_SYSENTER_32+0xaa/0x102
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Tested-by: Ondrej Jirman <megous@megous.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If both compression and fsverity feature is on, generic/572 will
report below NULL pointer dereference bug.
BUG: kernel NULL pointer dereference, address: 0000000000000018
RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs]
#PF: supervisor read access in kernel mode
Workqueue: fsverity_read_queue f2fs_verity_work [f2fs]
RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs]
Call Trace:
process_one_work+0x16c/0x3f0
worker_thread+0x4c/0x440
? rescuer_thread+0x350/0x350
kthread+0xf8/0x130
? kthread_unpark+0x70/0x70
ret_from_fork+0x35/0x40
There are two issue in f2fs_verity_work():
- it needs to traverse and verify all pages in bio.
- if pages in bio belong to non-compressed cluster, accessing
decompress IO context stored in page private will cause NULL
pointer dereference.
Fix them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_decompress_end_io(), we should clear PG_error flag before page
unlock, otherwise reread will fail due to the flag as described in
commit fb7d70db305a ("f2fs: clear PageError on the read path").
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_tmpfile(), parent inode's encryption info is only used when
inheriting encryption context to its child inode, however, we have
already called fscrypt_get_encryption_info() in fscrypt_inherit_context()
to get the encryption info, so just removing unneeded one in
f2fs_tmpfile().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Data flush can generate heavy IO and cause long latency during
flush, so it's not appropriate to trigger it in foreground
operation.
And also, we may face below potential deadlock during data flush:
- f2fs_write_multi_pages
- f2fs_write_raw_pages
- f2fs_write_single_data_page
- f2fs_balance_fs
- f2fs_balance_fs_bg
- f2fs_sync_dirty_inodes
- filemap_fdatawrite -- stuck on flush same cluster
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:f2fs_write_begin+0x823/0xb90 [f2fs]
Call Trace:
f2fs_quota_write+0x139/0x1d0 [f2fs]
write_blk+0x36/0x80 [quota_tree]
get_free_dqblk+0x42/0xa0 [quota_tree]
do_insert_tree+0x235/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
qtree_write_dquot+0x70/0x190 [quota_tree]
v2_write_dquot+0x43/0x90 [quota_v2]
dquot_acquire+0x77/0x100
f2fs_dquot_acquire+0x2f/0x60 [f2fs]
dqget+0x310/0x450
dquot_transfer+0x7e/0x120
f2fs_setattr+0x11a/0x4a0 [f2fs]
notify_change+0x349/0x480
chown_common+0x168/0x1c0
do_fchownat+0xbc/0xf0
__x64_sys_fchownat+0x20/0x30
do_syscall_64+0x5f/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Passing fsdata parameter to .write_{begin,end} in f2fs_quota_write(),
so that if quota file is compressed one, we can avoid above NULL
pointer dereference when updating quota content.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Merge below two conditions into f2fs_may_encrypt() for cleanup
- IS_ENCRYPTED()
- DUMMY_ENCRYPTION_ENABLED()
Check IS_ENCRYPTED(inode) condition in f2fs_init_inode_metadata()
is enough since we have already set encrypt flag in f2fs_new_inode().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We should always check F2FS_I(inode)->cp_task condition in prior to other
conditions in __should_serialize_io() to avoid deadloop described in
commit 040d2bb318d1 ("f2fs: fix to avoid deadloop if data_flush is on"),
however we break this rule when we support compression, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In order to shrink page lock coverage.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
generic/232 reports below deadlock:
fsstress D 0 96980 96969 0x00084000
Call Trace:
schedule+0x4a/0xb0
io_schedule+0x12/0x40
__lock_page+0x127/0x1d0
pagecache_get_page+0x1d8/0x250
prepare_compress_overwrite+0xe0/0x490 [f2fs]
f2fs_prepare_compress_overwrite+0x5d/0x80 [f2fs]
f2fs_write_begin+0x833/0xb90 [f2fs]
f2fs_quota_write+0x145/0x1e0 [f2fs]
write_blk+0x36/0x80 [quota_tree]
do_insert_tree+0x2ac/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
qtree_write_dquot+0x70/0x190 [quota_tree]
v2_write_dquot+0x43/0x90 [quota_v2]
dquot_acquire+0x77/0x100
f2fs_dquot_acquire+0x2f/0x60 [f2fs]
dqget+0x310/0x450
dquot_transfer+0xb2/0x120
f2fs_setattr+0x11a/0x4a0 [f2fs]
notify_change+0x349/0x480
chown_common+0x168/0x1c0
do_fchownat+0xbc/0xf0
__x64_sys_lchown+0x21/0x30
do_syscall_64+0x5f/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
task PC stack pid father
kworker/u256:0 D 0 103444 2 0x80084000
Workqueue: writeback wb_workfn (flush-251:1)
Call Trace:
schedule+0x4a/0xb0
schedule_timeout+0x15e/0x2f0
io_schedule_timeout+0x19/0x40
congestion_wait+0x7e/0x120
f2fs_write_multi_pages+0x12a/0x840 [f2fs]
f2fs_write_cache_pages+0x48f/0x790 [f2fs]
f2fs_write_data_pages+0x2db/0x330 [f2fs]
do_writepages+0x1a/0x60
__writeback_single_inode+0x3d/0x340
writeback_sb_inodes+0x225/0x4a0
wb_writeback+0xf7/0x320
wb_workfn+0xba/0x470
process_one_work+0x16c/0x3f0
worker_thread+0x4c/0x440
kthread+0xf8/0x130
ret_from_fork+0x35/0x40
fsstress D 0 5277 5266 0x00084000
Call Trace:
schedule+0x4a/0xb0
rwsem_down_write_slowpath+0x29d/0x540
block_operations+0x105/0x360 [f2fs]
f2fs_write_checkpoint+0x101/0x1010 [f2fs]
f2fs_sync_fs+0xa8/0x130 [f2fs]
f2fs_do_sync_file+0x1ad/0x890 [f2fs]
do_fsync+0x38/0x60
__x64_sys_fdatasync+0x13/0x20
do_syscall_64+0x5f/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The root cause is there is potential deadlock between quota data
update and writeback.
Kworker Thread B Thread C
- f2fs_write_cache_pages
- lock whole cluster --- A
- f2fs_write_multi_pages
- f2fs_write_raw_pages
- f2fs_write_single_data_page
- f2fs_do_write_data_page
- f2fs_setattr
- f2fs_lock_op --- B
- f2fs_write_checkpoint
- block_operations
- f2fs_lock_all --- B
- dquot_transfer
- f2fs_quota_write
- f2fs_prepare_compress_overwrite
- pagecache_get_page --- A
- f2fs_trylock_op failed --- B
- congestion_wait
- goto rewrite
To fix this issue, during quota file writeback, just redirty all pages
left in cluster rather holding pages' lock in cluster and looping retrying
lock cp_rwsem.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This lock can be a contention with multi 4k random read IO with single inode.
example) fio --output=test --name=test --numjobs=60 --filename=/media/samsung960pro/file_test --rw=randread --bs=4k
--direct=1 --time_based --runtime=7 --ioengine=libaio --iodepth=256 --group_reporting --size=10G
With this commit, it remove that possible lock contention.
Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
- f2fs_iget
- do_read_inode
- set_inode_flag(, FI_COMPRESSED_FILE)
- __mark_inode_dirty_flag(, true)
It's unnecessary, so let's just mark compressed inode dirty while
compressed inode conversion.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
These macros are just used by a few files. Move them out of genhd.h,
which is included everywhere into a new standalone header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|