summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2018-01-24ovl: update documentation of inodes index featureAmir Goldstein1-6/+3
Document that inode index feature solves breaking hard links on copy up. Simplify Kconfig backward compatibility disclaimer. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: generalize ovl_verify_origin() and helpersAmir Goldstein4-30/+38
Remove the "origin" language from the functions that handle set, get and verify of "origin" xattr and pass the xattr name as an argument. The same helpers are going to be used for NFS export to get, get and verify the "upper" xattr for directory index entries. ovl_verify_origin() is now a helper used only to verify non upper file handle stored in "origin" xattr of upper inode. The upper root dir file handle is still stored in "origin" xattr on the index dir for backward compatibility. This is going to be changed by the patch that adds directory index entries support. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: simplify arguments to ovl_check_origin_fh()Amir Goldstein4-30/+24
Pass the fs instance with lower_layers array instead of the dentry lowerstack array to ovl_check_origin_fh(), because the dentry members of lowerstack play no role in this helper. This change simplifies the argument list of ovl_check_origin(), ovl_cleanup_index() and ovl_verify_index(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: factor out ovl_check_origin_fh()Amir Goldstein1-50/+92
Re-factor ovl_check_origin() and ovl_get_origin(), so origin fh xattr is read from upper inode only once during lookup with multiple lower layers and only once when verifying index entry origin. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: store layer index in ovl_layerAmir Goldstein3-16/+4
Store the fs root layer index inside ovl_layer struct, so we can get the root fs layer index from merge dir lower layer instead of find it with ovl_find_layer() helper. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: force r/o mount when index dir creation failsAmir Goldstein1-3/+9
When work dir creation fails, a warning is emitted and overlay is mounted r/o. Trying to remount r/w will fail with no work dir. When index dir creation fails, the same warning is emitted and overlay is mounted r/o, but trying to remount r/w will succeed. This may cause unintentional corruption of filesystem consistency. Adjust the behavior of index dir creation failure to that of work dir creation failure and do not allow to remount r/w. User needs to state an explicitly intention to work without an index by mounting with option 'index=off' to allow r/w mount with no index dir. When mounting with option 'index=on' and no 'upperdir', index is implicitly disabled, so do not warn about no file handle support. The issue was introduced with inodes index feature in v4.13, but this patch will not apply cleanly before ovl_fill_super() re-factoring in v4.15. Fixes: 02bcd1577400 ("ovl: introduce the inodes index dir feature") Cc: <stable@vger.kernel.org> #v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: disable index when no xattr supportAmir Goldstein1-1/+2
Overlayfs falls back to index=off if lower/upper fs does not support file handles. Do the same if upper fs does not support xattr. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: fix inconsistent d_ino for legacy merge dirAmir Goldstein3-2/+37
For a merge dir that was copied up before v4.12 or that was hand crafted offline (e.g. mkdir {upper/lower}/dir), upper dir does not contain the 'trusted.overlay.origin' xattr. In that case, stat(2) on the merge dir returns the lower dir st_ino, but getdents(2) returns the upper dir d_ino. After this change, on merge dir lookup, missing origin xattr on upper dir will be fixed and 'impure' xattr will be fixed on parent of the legacy merge dir. Suggested-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-6/+7
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in 'net'. The esp{4,6}_offload.c conflicts were overlapping changes. The 'out' label is removed so we just return ERR_PTR(-EINVAL) directly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23GFS2: Log the reason for log flushes in every log headerBob Peterson10-26/+45
This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23GFS2: Introduce new gfs2_log_header_v2Bob Peterson14-57/+104
This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23sysfs: remove DEBUG definesGreg Kroah-Hartman2-3/+0
It isn't needed at all in these files, dynamic debug is the best way to enable this type of thing, if you really want it. As it is, these defines were not doing anything at all. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23sysfs: use SPDX identifiersGreg Kroah-Hartman6-13/+6
Move the license "mark" of the sysfs files to be in SPDX form, instead of the custom text that it currently is in. This is in a quest to get rid of the 700+ different ways we say "GPLv2" in the kernel tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23nfsd: auth: Fix gid sorting when rootsquash enabledBen Hutchings1-3/+3
Commit bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility group_info allocators") appears to break nfsd rootsquash in a pretty major way. It adds a call to groups_sort() inside the loop that copies/squashes gids, which means the valid gids are sorted along with the following garbage. The net result is that the highest numbered valid gids are replaced with any lower-valued garbage gids, possibly including 0. We should sort only once, after filling in all the gids. Fixes: bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-23f2fs: allow to recover node blocks given updated checkpointJaegeuk Kim2-0/+5
If fsck.f2fs changes crc, we have no way to recover some inode blocks by roll- forward recovery. Let's relax the condition to recover them. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: recover some i_inline flagsJaegeuk Kim1-2/+19
This fixes lost i_inline flags during roll-forward. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: correct removexattr behavior for null valued extended attributeDaeho Jeong1-1/+1
__vfs_removexattr() transfers "NULL" value to the setxattr handler of the f2fs filesystem in order to remove the extended attribute. But, __f2fs_setxattr() just ignores the removal request when the value of the extended attribute is already NULL. We have to remove the extended attribute itself even if the value of that is already NULL. We can reporduce this bug with the below: 1. touch file 2. setfattr -n "user.foo" file 3. setfattr -x "user.foo" file 4. getfattr -d file > user.foo Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Tested-by: Hobin Woo <hobin.woo@samsung.com> Tested-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: drop page cache after fs shutdownChao Yu3-17/+21
Don't remain dirtied page cache in f2fs after shutdown, it can mitigate memory pressure of whole system, in order to keep other modules working properly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: stop gc/discard thread after fs shutdownChao Yu3-0/+13
Once filesystem shuts down, daemons like gc/discard thread should be aware of it, and do exit, in addtion, drop all cached pending discard commands and turn off real-time discard mode. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: hanlde error case in f2fs_ioc_shutdownChao Yu1-2/+8
This patch makes f2fs_ioc_shutdown handling error case correctly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: split need_inplace_updateChao Yu4-50/+75
This patch splits need_inplace_update to two functions: a. should_update_inplace() includes all conditions that we must use IPU. b. should_update_outplace() includes all conditions that we must use OPU. So that, in f2fs_ioc_set_pin_file() and f2fs_defragment_range(), we can use corresponding function to check whether we can trigger OPU/IPU or not. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: fix to update last_disk_size correctlyChao Yu1-4/+8
This patch fixes to update last_disk_size only when writing out page successfully. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanupChao Yu2-5/+4
Use get_inline_xattr_addrs directly instead of F2FS_INLINE_XATTR_ADDRS. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: clean up error path of fill_superChao Yu1-4/+6
This patch cleans up error path of fille_super to avoid unneeded release step. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: avoid hungtask when GC encrypted block if io_bits is setSheng Yong1-1/+6
When io_bits is set, GCing encrypted block may hit the following hungtask. Since io_bits requires aligned block address, f2fs_submit_page_write may return -EAGAIN if new_blkaddr does not satisify io_bits alignment. As a result, the encrypted page will never be writtenback. This patch makes move_data_block aware the EAGAIN error and cancel the writeback. [ 246.751371] INFO: task kworker/u4:4:797 blocked for more than 90 seconds. [ 246.752423] Not tainted 4.15.0-rc4+ #11 [ 246.754176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.755336] kworker/u4:4 D25448 797 2 0x80000000 [ 246.755597] Workqueue: writeback wb_workfn (flush-7:0) [ 246.755616] Call Trace: [ 246.755695] ? __schedule+0x322/0xa90 [ 246.755761] ? blk_init_request_from_bio+0x120/0x120 [ 246.755773] ? pci_mmcfg_check_reserved+0xb0/0xb0 [ 246.755801] ? __radix_tree_create+0x19e/0x200 [ 246.755813] ? delete_node+0x136/0x370 [ 246.755838] schedule+0x43/0xc0 [ 246.755904] io_schedule+0x17/0x40 [ 246.755939] wait_on_page_bit_common+0x17b/0x240 [ 246.755950] ? wake_page_function+0xa0/0xa0 [ 246.755961] ? add_to_page_cache_lru+0x160/0x160 [ 246.755972] ? page_cache_tree_insert+0x170/0x170 [ 246.755983] ? __lru_cache_add+0x96/0xb0 [ 246.756086] __filemap_fdatawait_range+0x14f/0x1c0 [ 246.756097] ? wait_on_page_bit_common+0x240/0x240 [ 246.756120] ? __wake_up_locked_key_bookmark+0x20/0x20 [ 246.756167] ? wait_on_all_pages_writeback+0xc9/0x100 [ 246.756179] ? __remove_ino_entry+0x120/0x120 [ 246.756192] ? wait_woken+0x100/0x100 [ 246.756204] filemap_fdatawait_range+0x9/0x20 [ 246.756216] write_checkpoint+0x18a1/0x1f00 [ 246.756254] ? blk_get_request+0x10/0x10 [ 246.756265] ? cpumask_next_and+0x43/0x60 [ 246.756279] ? f2fs_sync_inode_meta+0x160/0x160 [ 246.756289] ? remove_element.isra.4+0xa0/0xa0 [ 246.756300] ? __put_compound_page+0x40/0x40 [ 246.756310] ? f2fs_sync_fs+0xec/0x1c0 [ 246.756320] ? f2fs_sync_fs+0x120/0x1c0 [ 246.756329] f2fs_sync_fs+0x120/0x1c0 [ 246.756357] ? trace_event_raw_event_f2fs__page+0x260/0x260 [ 246.756393] ? ata_build_rw_tf+0x173/0x410 [ 246.756397] f2fs_balance_fs_bg+0x198/0x390 [ 246.756405] ? drop_inmem_page+0x230/0x230 [ 246.756415] ? ahci_qc_prep+0x1bb/0x2e0 [ 246.756418] ? ahci_qc_issue+0x1df/0x290 [ 246.756422] ? __accumulate_pelt_segments+0x42/0xd0 [ 246.756426] ? f2fs_write_node_pages+0xd1/0x380 [ 246.756429] f2fs_write_node_pages+0xd1/0x380 [ 246.756437] ? sync_node_pages+0x8f0/0x8f0 [ 246.756440] ? update_curr+0x53/0x220 [ 246.756444] ? __accumulate_pelt_segments+0xa2/0xd0 [ 246.756448] ? __update_load_avg_se.isra.39+0x349/0x360 [ 246.756452] ? do_writepages+0x2a/0xa0 [ 246.756456] do_writepages+0x2a/0xa0 [ 246.756460] __writeback_single_inode+0x70/0x490 [ 246.756463] ? check_preempt_wakeup+0x199/0x310 [ 246.756467] writeback_sb_inodes+0x2a2/0x660 [ 246.756471] ? is_empty_dir_inode+0x40/0x40 [ 246.756474] ? __writeback_single_inode+0x490/0x490 [ 246.756477] ? string+0xbf/0xf0 [ 246.756480] ? down_read_trylock+0x35/0x60 [ 246.756484] __writeback_inodes_wb+0x9f/0xf0 [ 246.756488] wb_writeback+0x41d/0x4b0 [ 246.756492] ? writeback_inodes_wb.constprop.55+0x150/0x150 [ 246.756498] ? set_worker_desc+0xf7/0x130 [ 246.756502] ? current_is_workqueue_rescuer+0x60/0x60 [ 246.756511] ? _find_next_bit+0x2c/0xa0 [ 246.756514] ? wb_workfn+0x400/0x5d0 [ 246.756518] wb_workfn+0x400/0x5d0 [ 246.756521] ? finish_task_switch+0xdf/0x2a0 [ 246.756525] ? inode_wait_for_writeback+0x30/0x30 [ 246.756529] process_one_work+0x3a7/0x6f0 [ 246.756533] worker_thread+0x82/0x750 [ 246.756537] kthread+0x16f/0x1c0 [ 246.756541] ? trace_event_raw_event_workqueue_work+0x110/0x110 [ 246.756544] ? kthread_create_worker_on_cpu+0xb0/0xb0 [ 246.756548] ret_from_fork+0x1f/0x30 Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: allow quota to use reserved blocksJaegeuk Kim1-3/+8
This patch allows quota to use reserved blocks all the time. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: fix to drop all inmem pages correctlyChao Yu1-2/+3
In commit 57864ae5ce3a ("f2fs: limit # of inmemory pages"), we have limited memory footprint of all inmem pages with 20% of total memory, otherwise, if we exceed the threshold, we will try to drop all inmem pages to avoid excessive memory pressure resulting in performance regression. But in some unrelated error paths, we will also drop all inmem pages, which should be wrong, fix it in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: speed up defragment on sparse fileChao Yu2-6/+11
We have supported to get next page offset with valid mapping crossing hole in f2fs_map_blocks, utilizing it to speed up defragment on sparse file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: support F2FS_IOC_PRECACHE_EXTENTSChao Yu3-2/+85
This patch introduces a new ioctl F2FS_IOC_PRECACHE_EXTENTS to precache extent info like ext4, in order to gain better performance during triggering AIO by eliminating synchronous waiting of mapping info. Referred commit: 7869a4a6c5ca ("ext4: add support for extent pre-caching") In addition, with newly added extent precache abilitiy, this patch add to support FIEMAP_FLAG_CACHE in ->fiemap. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23f2fs: add an ioctl to disable GC for specific fileJaegeuk Kim6-1/+128
This patch gives a flag to disable GC on given file, which would be useful, when user wants to keep its block map. It also conducts in-place-update for dontmove file. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-23orangefs: initialize op on loop restart in orangefs_devreq_readMartin Brandenburg1-1/+2
In orangefs_devreq_read, there is a loop which picks an op off the list of pending ops. If the loop fails to find an op, there is nothing to read, and it returns EAGAIN. If the op has been given up on, the loop is restarted via a goto. The bug is that the variable which the found op is written to is not reinitialized, so if there are no more eligible ops on the list, the code runs again on the already handled op. This is triggered by interrupting a process while the op is being copied to the client-core. It's a fairly small window, but it's there. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-23orangefs: use list_for_each_entry_safe in purge_waiting_opsMartin Brandenburg1-2/+2
set_op_state_purged can delete the op. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-22btrfs: set the total_devices in device_list_add()Anand Jain1-4/+2
There is no other parent for device_list_add() except for btrfs_scan_one_device(), which would set btrfs_fs_devices::total_devices if device_list_add is successful and this can be done with in device_list_add() itself. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: move pr_info into device_list_addAnand Jain1-16/+10
Commit 60999ca4b403 ("btrfs: make device scan less noisy") adds return value 1 to device_list_add(), so that parent function can call pr_info only when new device is added. Move the pr_info() part into device_list_add() so that this function can be kept simple. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: make btrfs_free_stale_devices() to match the pathAnand Jain1-12/+16
The btrfs_free_stale_devices() is updated to match for the given device path and delete it. (It searches for only unmounted list of devices.) Also drop the comment about different path being used for the same device, since now we will have cli to clean any device that's not a concern any more. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: rename btrfs_free_stale_devices() arg to skip_devAnand Jain1-4/+4
No functional changes. Rename btrfs_free_stale_devices() arg to skip_dev, so that it reflects what that arg for. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: make btrfs_free_stale_devices() argument optionalAnand Jain1-9/+5
This updates btrfs_free_stale_devices() helper function to delete all unmouted devices, when arg is NULL. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: make btrfs_free_stale_device() to iterate all stalesAnand Jain1-13/+11
Let the list iterator iterate further and find other stale devices and delete it. This is in preparation to add support for user land request-able stale devices cleanup. Also rename btrfs_free_stale_device() to btrfs_free_stale_devices(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: no need to check for btrfs_fs_devices::seedingAnand Jain1-2/+0
There is no need to check for btrfs_fs_devices::seeding when we have checked for btrfs_fs_devices::opened, because we can't sprout without its seed FS being opened. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22sysfs: turn WARN() into pr_warn()Greg Kroah-Hartman1-2/+3
It's not good to crash the machine if panic_on_warn() is set just because someone made a stupid mistake of trying to create a sysfs file with the same name of an existing one. This makes the automated testing tools a lot harder to find the real bugs in the kernel. So just print a warning out and dump the stack to get the attention of the developer that they did something foolish. Then keep on trucking, as this should not be a fatal error at all. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-22btrfs: Use IS_ALIGNED in btrfs_truncate_block instead of opencoding itNikolay Borisov1-2/+2
No functional changes, just makes the code more readable Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: noinline merge_extent_mappingLiu Bo1-4/+4
In order to debug subtle bugs around merge_extent_mapping(), perf probe can be used to check the arguments, but sometimes merge_extent_mapping() got inlined by compiler and couldn't be probed. This is adding noinline attribute to merge_extent_mapping(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mappingLiu Bo1-1/+8
This is a subtle case, so in order to understand the problem, it'd be good to know the content of existing and em when any error occurs. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: extent map selftest: dio write vs dio readLiu Bo1-0/+89
This test case simulates the racy situation of dio write vs dio read, and see if btrfs_get_extent() would return -EEXIST. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: extent map selftest: buffered write vs dio readLiu Bo1-0/+74
This test case simulates the racy situation of buffered write vs dio read, and see if btrfs_get_extent() would return -EEXIST. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: add extent map selftestsLiu Bo4-1/+208
We've observed that btrfs_get_extent() and merge_extent_mapping() could return -EEXIST in several cases, and they are caused by some racy condition, e.g dio read vs dio write, which makes the problem very tricky to reproduce. This adds extent map selftests in order to simulate those racy situations. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> [ minor string adjustments ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: move extent map specific code to extent_map.cLiu Bo4-109/+127
These helpers are extent map specific, move them to extent_map.c. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: add helper for em merge logicLiu Bo2-34/+48
This is a prepare work for the following extent map selftest, which runs tests against em merge logic. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: fix unexpected EEXIST from btrfs_get_extentLiu Bo1-14/+3
This fixes a corner case that is caused by a race of dio write vs dio read/write. Here is how the race could happen. Suppose that no extent map has been loaded into memory yet. There is a file extent [0, 32K), two jobs are running concurrently against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio read from [0, 4K) or [4K, 8K). t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K). ------------------------------------------------------ t1 t2 btrfs_get_blocks_direct() btrfs_get_blocks_direct() -> btrfs_get_extent() -> btrfs_get_extent() -> lookup_extent_mapping() -> add_extent_mapping() -> lookup_extent_mapping() # load [0, 32K) -> btrfs_new_extent_direct() -> btrfs_drop_extent_cache() # split [0, 32K) and # drop [8K, 32K) -> add_extent_mapping() # add [8K, 32K) -> add_extent_mapping() # handle -EEXIST when adding # [0, 32K) ------------------------------------------------------ About how t2(dio read/write) runs into -EEXIST: a) add_extent_mapping() gets -EEXIST for adding em [0, 32k), b) search_extent_mapping() then returns [0, 8k) as the existing em, even though start == existing->start, em is [0, 32k) so that extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k, c) then it goes thru merge_extent_mapping() which tries to add a [8k, 8k) (with a length 0) and returns -EEXIST as [8k, 32k) is already in tree, d) so btrfs_get_extent() ends up returning -EEXIST to dio read/write, which is confusing applications. Here I conclude all the possible situations, 1) start < existing->start +-----------+em+-----------+ +--prev---+ | +-------------+ | | | | | | | +---------+ + +---+existing++ ++ + | + start 2) start == existing->start +------------em------------+ | +-------------+ | | | | | + +----existing-+ + | | + start 3) start > existing->start && start < (existing->start + existing->len) +------------em------------+ | +-------------+ | | | | | + +----existing-+ + | | + start 4) start >= (existing->start + existing->len) +-----------+em+-----------+ | +-------------+ | +--next---+ | | | | | | + +---+existing++ + +---------+ + | + start As we can see, it turns out that if start is within existing em (front inclusive), then the existing em should be returned as is, otherwise, we try our best to merge candidate em with sibling ems to form a larger em (in order to reduce the total number of em). Reported-by: David Vallender <david.vallender@landmark.co.uk> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22Btrfs: fix incorrect block_len in merge_extent_mappingLiu Bo1-1/+1
%block_len could be checked on deciding if two em are mergeable. merge_extent_mapping() has only added the front pad if the front part of em gets truncated, but it's possible that the end part gets truncated. For both compressed extent and inline extent, em->block_len is not adjusted accordingly, and for regular extent, em->block_len always equals to em->len, hence this sets em->block_len with em->len. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>