path: root/fs
Age | Commit message | Author | Files | Lines
2021-04-09 | ext4: improve cr 0 / cr 1 group scanning | Harshad Shirwadkar | 5 | -15/+452
Instead of traversing through groups linearly, scan groups in specific orders at cr 0 and cr 1. At cr 0, we want to find groups that have the largest free order >= the order of the request. So, with this patch, we maintain lists for each possible order and insert each group into a list based on the largest free order in its buddy bitmap. During cr 0 allocation, we traverse these lists in increasing order of largest free order. This allows us to find a group with the best available cr 0 match in constant time. If nothing can be found, we fall back to cr 1 immediately.

At cr 1, the story is slightly different. We want to traverse in order of increasing average fragment size. For cr 1, we maintain an rb-tree of groupinfos sorted by average fragment size. Instead of traversing linearly, at cr 1 we traverse in order of increasing average fragment size, starting at the most optimal group. This brings cr 1 search complexity down to O(log(number of groups)).

For cr >= 2, we just perform the linear search as before. Also, in case of lock contention, we intermittently fall back to linear search even in the cr 0 and cr 1 cases. This allows the allocation path to make progress even under high contention.

There is an opportunity to optimize cr 2 too, because at cr 2 we only consider groups whose bb_free counter (number of free blocks) is greater than the request extent size. That is left as future work.

All the changes introduced in this patch are protected under a new mount option, "mb_optimize_scan".

With this patch set, the following experiment was performed: a highly fragmented 65 TB disk was created, with no contiguous 2M regions. The following command was run three times in a row:

time dd if=/dev/urandom of=file bs=2M count=10

Here are the results with and without the cr 0/1 optimizations introduced in this patch:

|---------+------------------------------+---------------------------|
|         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
|---------+------------------------------+---------------------------|
| 1st run | 5m1.871s                     | 2m47.642s                 |
| 2nd run | 2m28.390s                    | 0m0.611s                  |
| 3rd run | 2m26.530s                    | 0m1.255s                  |
|---------+------------------------------+---------------------------|

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210401172129.189766-6-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09 | ext4: add MB_NUM_ORDERS macro | Harshad Shirwadkar | 2 | -9/+15
A few arrays in mballoc.c use the total number of valid orders as their size. Currently, this value is written out as "sb->s_blocksize_bits + 2", which makes the code harder to read. So, instead, add a new macro, MB_NUM_ORDERS(sb), to make the code more readable.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-5-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09 | ext4: add mballoc stats proc file | Harshad Shirwadkar | 3 | -2/+80
Add new stats for measuring the performance of mballoc. This patch is forked from Artem Blagodarenko's work that can be found here: https://github.com/lustre/lustre-release/blob/master/ldiskfs/kernel_patches/patches/rhel8/ext4-simple-blockalloc.patch

This patch reorganizes the stats by cr level. This is what the output looks like:

mballoc:
    reqs: 0
    success: 0
    groups_scanned: 0
    cr0_stats:
        hits: 0
        groups_considered: 0
        useless_loops: 0
        bad_suggestions: 0
    cr1_stats:
        hits: 0
        groups_considered: 0
        useless_loops: 0
        bad_suggestions: 0
    cr2_stats:
        hits: 0
        groups_considered: 0
        useless_loops: 0
    cr3_stats:
        hits: 0
        groups_considered: 0
        useless_loops: 0
    extents_scanned: 0
    goal_hits: 0
    2^n_hits: 0
    breaks: 0
    lost: 0
    buddies_generated: 0/40
    buddies_time_used: 0
    preallocated: 0
    discarded: 0

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-4-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09 | ext4: add ability to return parsed options from parse_options | Harshad Shirwadkar | 1 | -21/+30
Before this patch, the function parse_options() returned the journal_devnum and journal_ioprio variables to the caller. This patch generalizes that interface so that parse_options() can return any parsed option back to the caller. In this patch series, it is used to capture the value of the "mb_optimize_scan=%u" mount option.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-3-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09 | ext4: drop s_mb_bal_lock and convert protected fields to atomic | Harshad Shirwadkar | 2 | -11/+7
s_mb_buddies_generated is used later in this patch series to determine whether the cr 0 and cr 1 optimizations should be performed. Currently, s_mb_buddies_generated is protected by a spin_lock. In the allocation path, it is better not to depend on the lock and instead read the value atomically. To do that, we drop s_bal_lock altogether and convert the only two fields it protected, s_mb_buddies_generated and s_mb_generation_time, to atomic types.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-2-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09 | ext4: fix check to prevent false positive report of incorrect used inodes | Zhang Yi | 1 | -16/+32
Commit 50122847007 ("ext4: fix check to prevent initializing reserved inodes") checks block group zero to prevent initializing reserved inodes. But in some special cases, the reserved inodes may not all belong to group zero; some may live in the second group if the file system is formatted as below:

mkfs.ext4 -b 4096 -g 8192 -N 1024 -I 4096 /dev/sda

In that case, the check ends up triggering a false positive report of a corrupted file system. This patch fixes it by skipping the reserved inode check when no free inode blocks will be zeroed.

Cc: stable@kernel.org
Fixes: 50122847007 ("ext4: fix check to prevent initializing reserved inodes")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09 | debugfs: Make debugfs_allow RO after init | Kees Cook | 1 | -1/+1
Since debugfs_allow is only set at boot time during __init, make it read-only after being set.

Fixes: a24c6f7bc923 ("debugfs: Add access restriction option")
Cc: Peter Enderborg <peter.enderborg@sony.com>
Reviewed-by: Peter Enderborg <peter.enderborg@sony.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210405213959.3079432-1-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-09 | Merge tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds | 5 | -8/+22
Pull cifs fixes from Steve French:

 "Three cifs/smb3 fixes, two for stable: a reconnect fix and a fix for display of devnames with special characters"

* tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: escape spaces in share names
  fs: cifs: Remove unnecessary struct declaration
  cifs: On cifs_reconnect, resolve the hostname again.
2021-04-09 | treewide: Change list_sort to use const pointers | Sami Tolvanen | 18 | -30/+38
list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-08 | io-wq: cancel unbounded works on io-wq destroy | Pavel Begunkov | 1 | -0/+4
WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470
RIP: 0010:io_ring_exit_work+0xe6/0x470
Call Trace:
 process_one_work+0x206/0x400
 worker_thread+0x4a/0x3d0
 kthread+0x129/0x170
 ret_from_fork+0x22/0x30

INFO: task lfs-openat:2359 blocked for more than 245 seconds.
task:lfs-openat state:D stack: 0 pid: 2359 ppid: 1 flags:0x00000004
Call Trace:
 ...
 wait_for_completion+0x8b/0xf0
 io_wq_destroy_manager+0x24/0x60
 io_wq_put_and_exit+0x18/0x30
 io_uring_clean_tctx+0x76/0xa0
 __io_uring_files_cancel+0x1b9/0x2e0
 do_exit+0xc0/0xb40
 ...

Even after io-wq destroy has been issued, io-wq worker threads continue executing all remaining work items as usual, and may hang waiting for I/O that will never complete (aka unbounded).

[<0>] pipe_read+0x306/0x450
[<0>] io_iter_do_read+0x1e/0x40
[<0>] io_read+0xd5/0x330
[<0>] io_issue_sqe+0xd21/0x18a0
[<0>] io_wq_submit_work+0x6c/0x140
[<0>] io_worker_handle_work+0x17d/0x400
[<0>] io_wqe_worker+0x2c0/0x330
[<0>] ret_from_fork+0x22/0x30

Cancel all unbounded I/O instead of executing it. This changes user-visible behaviour, but that's inevitable as io-wq is not per task.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08 | io_uring: fix rw req completion | Pavel Begunkov | 1 | -0/+13
WARNING: at fs/io_uring.c:8578 io_ring_exit_work.cold+0x0/0x18

As reissuing is now signalled back via REQ_F_REISSUE and kiocb_done() internally uses __io_complete_rw(), it may stop after setting the flag, leaving a dangling request.

There are tricky edge cases, e.g. reading beyond the file boundary, so the easiest way is to hand-code reissue in kiocb_done(), as __io_complete_rw() was doing for us before.

Fixes: 230d50d448ac ("io_uring: move reissue into regular IO path")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f602250d292f8a84cca9a01d747744d1e797be26.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08 | block: refactor blk_drop_partitions | Christoph Hellwig | 1 | -3/+5
Move the busy check and disk-wide sync into the only caller, so that the remainder can be shared with del_gendisk. Also pass the gendisk instead of the bdev as that is all that is needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406062303.811835-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08 | Merge tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux | Linus Torvalds | 1 | -4/+17
Pull close_range() fix from Christian Brauner:

 "Syzbot reported a bug in close_range. Debugging this showed we didn't recalculate the current maximum fd number for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC after we unshared the file descriptors table.

  As a result, max_fd could exceed the current fdtable maximum, causing us to set excessive bits. As a concrete example, let's say the user requested everything from fd 4 to ~0UL to be closed and their current fdtable size is 256 with their highest open fd being 4. With CLOSE_RANGE_UNSHARE the caller will end up with a new fdtable which has room for 64 file descriptors, since that is the lowest fdtable size we accept. But now max_fd will still point to 255 and needs to be adjusted. Fix this by retrieving the correct maximum fd value in __range_cloexec().

  I've carried this fix for a little while, but since there was no linux-next release over Easter I waited until now.

  With this change close_range() can be further simplified, but imho we are in no hurry to do that, so I'll defer this to the 5.13 merge window"

* tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  file: fix close_range() for unshare+cloexec
2021-04-08 | Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 1 | -4/+4
Pull umount fix from Al Viro:

 "Brown paperbag time: dumb braino in the series that went into 5.7 broke the 'don't step into ->d_weak_revalidate() when umount(2) looks the victim up' behaviour.

  Spotted only now - saw

	if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
		err = handle_lookup_down(nd);
		nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
	}

  and went "why do we clear that flag here - nothing below that point is going to check it anyway" / "wait a minute, what is it doing *after* complete_walk() (which is where we check that flag and call ->d_weak_revalidate())" / "how could that possibly _not_ break?", followed by reproducing the breakage and verifying that the obvious fix of that braino does, indeed, fix it.

  The reproducer is (assuming that $DIR exists and is exported r/w to localhost)

	mkdir $DIR/a
	mkdir /tmp/foo
	mount --bind /tmp/foo /tmp/foo
	mkdir /tmp/foo/a
	mkdir /tmp/foo/b
	mount -t nfs4 localhost:$DIR/a /tmp/foo/a
	mount -t nfs4 localhost:$DIR /tmp/foo/b
	rmdir /tmp/foo/b/a
	umount /tmp/foo/b
	umount /tmp/foo/a
	umount -l /tmp/foo # will get everything under /tmp/foo, no matter what

  Correct behaviour is successful umount; broken kernels (5.7-rc1 and later) get

	umount.nfs4: /tmp/foo/a: Stale file handle

  Note that bind mount is there to be able to recover - on broken kernels we'd get stuck with impossible-to-umount filesystem if not for that.

  FWIW, that braino had been posted for review back then, at least twice. Unfortunately, the call of complete_walk() was outside of diff context, so the bogosity hadn't been immediately obvious from the patch alone ;-/"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
2021-04-08 | gfs2: Make gfs2_setattr_simple static | Andreas Gruenbacher | 2 | -2/+1
This function is only used in inode.c.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-04-08 | gfs2: Add new sysfs file for gfs2 status | Bob Peterson | 1 | -0/+67
This patch adds a new file, /sys/fs/gfs2/*/status, which reports the status of the file system. Catting this file dumps the current status of the file system according to various superblock variables. For example:

Journal Checked: 1
Journal Live: 1
Journal ID: 0
Spectator: 0
Withdrawn: 0
No barriers: 0
No recovery: 0
Demote: 0
No Journal ID: 1
Mounted RO: 0
RO Recovery: 0
Skip DLM Unlock: 0
Force AIL Flush: 0
FS Frozen: 0
Withdrawing: 0
Withdraw In Prog: 0
Remote Withdraw: 0
Withdraw Recovery: 0
sd_log_error: 0
sd_log_flush_lock: 0
sd_log_num_revoke: 0
sd_log_in_flight: 0
sd_log_blks_needed: 0
sd_log_blks_free: 32768
sd_log_flush_head: 0
sd_log_flush_tail: 5384
sd_log_blks_reserved: 0
sd_log_revokes_available: 503

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-04-08 | io_uring: clear F_REISSUE right after getting it | Pavel Begunkov | 1 | -2/+4
There are lots of ways an r/w request may continue on its path after getting REQ_F_REISSUE. It's not necessarily io-wq; it can be, e.g., apoll, submitted via io_async_task_func() -> __io_req_task_submit().

Clear the flag right after getting it, so the next attempt is well prepared regardless of how the request will be executed.

Fixes: 230d50d448ac ("io_uring: move reissue into regular IO path")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/11dcead939343f4e27cab0074d34afcab771bfa4.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08 | cifs: escape spaces in share names | Maciek Borzecki | 1 | -1/+2
Commit 653a5efb849a ("cifs: update super_operations to show_devname") introduced the display of devname for cifs mounts. However, when mounting a share whose name contains whitespace, the share name is displayed verbatim in mountinfo, whose fields are whitespace-separated. Make sure that all whitespace is escaped.

Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
CC: <stable@vger.kernel.org> # 5.11+
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-08 | fs: cifs: Remove unnecessary struct declaration | Wan Jiabing | 1 | -2/+0
struct cifs_readdata is declared twice: one declaration is at line 208, and the struct is also defined further below. The declaration here is not needed, so remove the duplicate.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-08 | cifs: On cifs_reconnect, resolve the hostname again. | Shyam Prasad N | 3 | -5/+20
On cifs_reconnect, make sure that DNS resolution happens again. A stale resolution could be what caused the connection to go dead in the first place.

This also contains the fix for a build issue identified by the Intel kernel test robot.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: <stable@vger.kernel.org> # 5.11+
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-08 | xfs: move the check for post-EOF mappings into xfs_can_free_eofblocks | Darrick J. Wong | 2 | -82/+99
Fix the weird split of responsibilities between xfs_can_free_eofblocks and xfs_free_eofblocks by moving the chunk of code that looks for any actual post-EOF space mappings from the second function into the first. This clears the way for deferred inode inactivation to be able to decide if an inode needs inactivation work before committing the released inode to the inactivation code paths (vs. marking it for reclaim).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-08 | xfs: move the xfs_can_free_eofblocks call under the IOLOCK | Darrick J. Wong | 1 | -8/+7
In xfs_inode_free_eofblocks, move the xfs_can_free_eofblocks call further down in the function to the point where we have taken the IOLOCK. This is preparation for the next patch, where we will need that lock (or equivalent) so that we can check if there are any post-eof blocks to clean out.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-08 | xfs: precalculate default inode attribute offset | Dave Chinner | 4 | -12/+28
The default attr fork offset is based on the inode size, so it is a fixed geometry parameter of the inode. Move it to the xfs_ino_geometry structure and stop calculating it on every call to xfs_default_attroffset().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-08 | xfs: default attr fork size does not handle device inodes | Dave Chinner | 1 | -2/+7
Device inodes have a non-default data fork size of 8 bytes, as checked/enforced by xfs_repair. xfs_default_attroffset() doesn't handle this, so let's do a minor refactor so that it does.

Fixes: e6a688c33238 ("xfs: initialise attr fork on inode create")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-08 | xfs: inode fork allocation depends on XFS_IFEXTENT flag | Dave Chinner | 1 | -0/+1
Due to confusion about when the XFS_IFEXTENT flag needs to be set, the changes in e6a688c33238 ("xfs: initialise attr fork on inode create") failed to set the flag when initialising the empty attribute fork at inode creation. Set this flag the same way xfs_bmap_add_attrfork() does after attr fork allocation.

Fixes: e6a688c33238 ("xfs: initialise attr fork on inode create")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-08 | xfs: eager inode attr fork init needs attr feature awareness | Dave Chinner | 1 | -1/+1
The pitfalls of regression testing on a machine without realising that selinux was disabled. Only set the attr fork during inode allocation if the attr feature bits are already set on the superblock.

Fixes: e6a688c33238 ("xfs: initialise attr fork on inode create")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-08 | xfs: scrub: Disable check for unoptimized data fork bmbt node | Chandan Babu R | 1 | -2/+28
xchk_btree_check_minrecs() checks if the contents of the immediate child of a bmbt root block can fit within the root block. This check could fail on inodes with an attr fork, since xfs_bmap_add_attrfork_btree() used to demote the current root node of the data fork as the child of a newly allocated root node if it found that the size of "struct xfs_btree_block", along with the space required for records, exceeded the space available in the data fork.

xfs_bmap_add_attrfork_btree() should have used "struct xfs_bmdr_block" instead of "struct xfs_btree_block" for the above space requirement calculation. This commit disables the check for unoptimized (in terms of disk space usage) data fork bmbt trees, since there could be filesystems in use that already have such a layout.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-08 | xfs: Use struct xfs_bmdr_block instead of struct xfs_btree_block to calculate root node size | Chandan Babu R | 1 | -1/+3
The in-core data fork of an inode stores the bmap btree root node as 'struct xfs_btree_block'. However, the on-disk version of the inode stores the bmap btree root node as a 'struct xfs_bmdr_block'.

xfs_bmap_add_attrfork_btree() checks if the btree root node fits inside the data fork of the inode. However, it incorrectly uses 'struct xfs_btree_block' to compute the size of the bmap btree root node. Since the size of 'struct xfs_btree_block' is larger than that of 'struct xfs_bmdr_block', xfs_bmap_add_attrfork_btree() could end up unnecessarily demoting the current root node as the child of a newly allocated root node.

This commit optimizes space usage by modifying xfs_bmap_add_attrfork_btree() to use 'struct xfs_bmdr_block' to check if the bmap btree root node fits inside the data fork of the inode.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: deprecate BMV_IF_NO_DMAPI_READ flag | Anthony Iliopoulos | 2 | -3/+1
Use of the flag has had no effect since kernel commit 288699fecaff ("xfs: drop dmapi hooks"), which removed all dmapi related code, so deprecate it.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: merge _xfs_dic2xflags into xfs_ip2xflags | Christoph Hellwig | 1 | -32/+22
Merge _xfs_dic2xflags into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_crtime field to struct xfs_inode | Christoph Hellwig | 9 | -29/+11
Move the crtime field from struct xfs_icdinode into struct xfs_inode and remove the now entirely unused struct xfs_icdinode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_flags2 field to struct xfs_inode | Christoph Hellwig | 13 | -52/+49
In preparation for removing the historic icinode struct, move the flags2 field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_flags field to struct xfs_inode | Christoph Hellwig | 17 | -47/+46
In preparation for removing the historic icinode struct, move the flags field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_forkoff field to struct xfs_inode | Christoph Hellwig | 9 | -39/+39
In preparation for removing the historic icinode struct, move the forkoff field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: use a union for i_cowextsize and i_flushiter | Christoph Hellwig | 4 | -11/+20
The i_cowextsize field is only used for v3 inodes, and the i_flushiter field is only used for v1/v2 inodes. Use a union to pack the inode a little better after adding a few missing guards around their usage.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: use XFS_B_TO_FSB in xfs_ioctl_setattr | Christoph Hellwig | 1 | -2/+2
Clean up xfs_ioctl_setattr a bit by using XFS_B_TO_FSB.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: cleanup xfs_fill_fsxattr | Christoph Hellwig | 1 | -2/+4
Add a local xfs_mount variable, and use the XFS_FSB_TO_B helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_flushiter field to struct xfs_inode | Christoph Hellwig | 6 | -15/+14
In preparation for removing the historic icinode struct, move the flushiter field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_cowextsize field to struct xfs_inode | Christoph Hellwig | 9 | -15/+13
In preparation for removing the historic icinode struct, move the cowextsize field into the containing xfs_inode structure. Also switch to using xfs_extlen_t instead of a uint32_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_extsize field to struct xfs_inode | Christoph Hellwig | 8 | -16/+16
In preparation for removing the historic icinode struct, move the extsize field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_nblocks field to struct xfs_inode | Christoph Hellwig | 16 | -34/+33
In preparation for removing the historic icinode struct, move the nblocks field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_size field to struct xfs_inode | Christoph Hellwig | 29 | -98/+98
In preparation for removing the historic icinode struct, move the on-disk size field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: move the di_projid field to struct xfs_inode | Christoph Hellwig | 13 | -23/+23
In preparation for removing the historic icinode struct, move the projid field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: don't clear the "dinode core" in xfs_inode_alloc | Christoph Hellwig | 1 | -1/+2
The xfs_icdinode structure just contains a random mix of inode fields, which are all read from the on-disk inode and mostly not looked at before reading the inode or initializing a new inode cluster. The only exceptions are the forkoff and blocks fields, which are used in sanity checks for freshly allocated inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: remove the di_dmevmask and di_dmstate fields from struct xfs_icdinode | Christoph Hellwig | 6 | -17/+35
The legacy DMAPI fields were never set by upstream Linux XFS, and there is no way to read them through the kernel APIs. So instead of bloating the in-core inode for them, just copy them from the on-disk inode into the log when logging the inode. The only caveat is that we need to make sure to zero the fields for newly read or deleted inodes, which is solved using a new flag in the inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: remove the unused xfs_icdinode_has_bigtime helper | Christoph Hellwig | 1 | -5/+0
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: handle crtime more carefully in xfs_bulkstat_one_int | Christoph Hellwig | 1 | -2/+2
The crtime only exists for v5 inodes, so only copy it for those.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: consistently initialize di_flags2 | Christoph Hellwig | 2 | -1/+1
Make sure di_flags2 is always initialized. We currently get this implicitly by clearing the dinode core on allocating the in-core inode, but that is about to go away.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: split xfs_imap_to_bp | Christoph Hellwig | 7 | -36/+17
Split looking up the dinode from xfs_imap_to_bp, which can be significantly simplified as a result.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08 | xfs: scrub: Remove incorrect check executed on block format directories | Chandan Babu R | 1 | -9/+0
A directory with one directory block, which in turn consists of two or more fs blocks, is incorrectly flagged as corrupt by scrub, since it assumes that "Block" format directories have a single data fork extent spanning the file offset range [0, Dir block size - 1].

This commit fixes the bug by removing the incorrect check.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>