summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)AuthorFilesLines
2019-10-21xfs: refactor xfs_file_iomap_begin_delayChristoph Hellwig1-24/+24
Rejuggle the return path to prepare for filling out a source iomap. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: pass two imaps to xfs_reflink_allocate_cowChristoph Hellwig3-21/+21
xfs_reflink_allocate_cow consumes the source data fork imap, and potentially returns the COW fork imap. Split the arguments in two to clear up the calling conventions and to prepare for returning a source iomap from ->iomap_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: remove xfs_reflink_dirty_extentsChristoph Hellwig1-98/+5
Now that xfs_file_unshare is not completely dumb we can just call it directly without iterating the extent and reflink btrees ourselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: also call xfs_file_iomap_end_delalloc for zeroing operationsChristoph Hellwig1-1/+2
There is no reason not to punch out stale delalloc blocks for zeroing operations, as they otherwise behave exactly like normal writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: fix inode fork extent count overflowDave Chinner5-20/+24
[commit message is verbose for discussion purposes - will trim it down later. Some questions about implementation details at the end.] Zorro Lang recently ran a new test to stress single inode extent counts now that they are no longer limited by memory allocation. The test was simply: # xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file # ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file This test uncovered a problem where the hole punching operation appeared to finish with no error, but apparently only created 268M extents instead of the 10 billion it was supposed to. Further, trying to punch out extents that should have been present resulted in success, but no change in the extent count. It looked like a silent failure. While running the test and observing the behaviour in real time, I observed the extent coutn growing at ~2M extents/minute, and saw this after about an hour: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \ > sleep 60 ; \ > xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 127657993 fsxattr.nextents = 129683339 # And a few minutes later this: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 4177861124 # Ah, what? Where did that 4 billion extra extents suddenly come from? Stop the workload, unmount, mount: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 166044375 # And it's back at the expected number. i.e. the extent count is correct on disk, but it's screwed up in memory. I loaded up the extent list, and immediately: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 4192576215 # It's bad again. So, where does that number come from? xfs_fill_fsxattr(): if (ip->i_df.if_flags & XFS_IFEXTENTS) fa->fsx_nextents = xfs_iext_count(&ip->i_df); else fa->fsx_nextents = ip->i_d.di_nextents; And that's the behaviour I just saw in a nutshell. The on disk count is correct, but once the tree is loaded into memory, it goes whacky. Clearly there's something wrong with xfs_iext_count(): inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp) { return ifp->if_bytes / sizeof(struct xfs_iext_rec); } Simple enough, but 134M extents is 2**27, and that's right about where things went wrong. A struct xfs_iext_rec is 16 bytes in size, which means 2**27 * 2**4 = 2**31 and we're right on target for an integer overflow. And, sure enough: struct xfs_ifork { int if_bytes; /* bytes in if_u1 */ .... Once we get 2**27 extents in a file, we overflow if_bytes and the in-core extent count goes wrong. And when we reach 2**28 extents, if_bytes wraps back to zero and things really start to go wrong there. This is where the silent failure comes from - only the first 2**28 extents can be looked up directly due to the overflow, all the extents above this index wrap back to somewhere in the first 2**28 extents. Hence with a regular pattern, trying to punch a hole in the range that didn't have holes mapped to a hole in the first 2**28 extents and so "succeeded" without changing anything. Hence "silent failure"... Fix this by converting if_bytes to a int64_t and converting all the index variables and size calculations to use int64_t types to avoid overflows in future. Signed integers are still used to enable easy detection of extent count underflows. This enables scalability of extent counts to the limits of the on-disk format - MAXEXTNUM (2**31) extents. Current testing is at over 500M extents and still going: fsxattr.nextents = 517310478 Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: remove the XLOG_STATE_DO_CALLBACK stateChristoph Hellwig2-66/+11
XLOG_STATE_DO_CALLBACK is only entered through XLOG_STATE_DONE_SYNC and just used in a single debug check. Remove the flag and thus simplify the calling conventions for xlog_state_do_callback and xlog_state_iodone_process_iclog. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: turn ic_state into an enumChristoph Hellwig3-94/+88
ic_state really is a set of different states, even if the values are encoded as non-conflicting bits and we sometimes use logical and operations to check for them. Switch all comparisms to check for exact values (and use switch statements in a few places to make it more clear) and turn the values into an implicitly enumerated enum type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: remove the unused XLOG_STATE_ALL and XLOG_STATE_UNUSED flagsChristoph Hellwig1-2/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: remove dead ifdef XFSERRORDEBUG codeChristoph Hellwig1-13/+0
XFSERRORDEBUG is never set and the code isn't all that useful, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: call xlog_state_release_iclog with l_icloglock heldChristoph Hellwig1-98/+90
All but one caller of xlog_state_release_iclog hold l_icloglock and need to drop and reacquire it to call xlog_state_release_iclog. Switch the xlog_state_release_iclog calling conventions to expect the lock to be held, and open code the logic (using a shared helper) in the only remaining caller that does not have the lock (and where not holding it is a nice performance optimization). Also move the refactored code to require the least amount of forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor whitespace cleanup] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: move the locking from xlog_state_finish_copy to the callersChristoph Hellwig1-9/+7
This will allow optimizing various locking cycles in the following patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: remove the unused ic_io_size field from xlog_in_coreChristoph Hellwig2-7/+2
ic_io_size is only used inside xlog_write_iclog, where we can just use the count parameter intead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: pass the correct flag to xlog_write_iclogChristoph Hellwig1-1/+1
xlog_write_iclog expects a bool for the second argument. While any non-0 value happens to work fine this makes all calls consistent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: optimize near mode bnobt scans with concurrent cntbt lookupsBrian Foster2-12/+144
The near mode fallback algorithm consists of a left/right scan of the bnobt. This algorithm has very poor breakdown characteristics under worst case free space fragmentation conditions. If a suitable extent is far enough from the locality hint, each allocation may scan most or all of the bnobt before it completes. This causes pathological behavior and extremely high allocation latencies. While locality is important to near mode allocations, it is not so important as to incur pathological allocation latency to provide the asolute best available locality for every allocation. If the allocation is large enough or far enough away, there is a point of diminishing returns. As such, we can bound the overall operation by including an iterative cntbt lookup in the broader search. The cntbt lookup is optimized to immediately find the extent with best locality for the given size on each iteration. Since the cntbt is indexed by extent size, the lookup repeats with a variably aggressive increasing search key size until it runs off the edge of the tree. This approach provides a natural balance between the two algorithms for various situations. For example, the bnobt scan is able to satisfy smaller allocations such as for inode chunks or btree blocks more quickly where the cntbt search may have to search through a large set of extent sizes when the search key starts off small relative to the largest extent in the tree. On the other hand, the cntbt search more deterministically covers the set of suitable extents for larger data extent allocation requests that the bnobt scan may have to search the entire tree to locate. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: factor out tree fixup logic into helperBrian Foster2-10/+33
Lift the btree fixup path into a helper function. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor near mode alloc bnobt scan into separate functionBrian Foster1-54/+74
In preparation to enhance the near mode allocation bnobt scan algorithm, lift it into a separate function. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor and reuse best extent scanning logicBrian Foster1-55/+55
The bnobt "find best" helper implements a simple btree walker function. This general pattern, or a subset thereof, is reused in various parts of a near mode allocation operation. For example, the bnobt left/right scans are each iterative btree walks along with the cntbt lastblock scan. Rework this function into a generic btree walker, add a couple parameters to control termination behavior from various contexts and reuse it where applicable. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor allocation tree fixup codeBrian Foster1-16/+2
Both algorithms duplicate the same btree allocation code. Eliminate the duplication and reuse the fallback algorithm codepath. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: reuse best extent tracking logic for bnobt scanBrian Foster2-201/+79
The near mode bnobt scan searches left and right in the bnobt looking for the closest free extent to the allocation hint that satisfies minlen. Once such an extent is found, the left/right search terminates, we search one more time in the opposite direction and finish the allocation with the best overall extent. The left/right and find best searches are currently controlled via a combination of cursor state and local variables. Clean up this code and prepare for further improvements to the near mode fallback algorithm by reusing the allocation cursor best extent tracking mechanism. Update the tracking logic to deactivate bnobt cursors when out of allocation range and replace open-coded extent checks to calls to the common helper. In doing so, rename some misnamed local variables in the top-level near mode allocation function. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor cntbt lastblock scan best extent logic into helperBrian Foster2-28/+113
The cntbt lastblock scan checks the size, alignment, locality, etc. of each free extent in the block and compares it with the current best candidate. This logic will be reused by the upcoming optimized cntbt algorithm, so refactor it into a separate helper. Note that acur->diff is now initialized to -1 (unsigned) instead of 0 to support the more granular comparison logic in the new helper. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: track best extent from cntbt lastblock scan in alloc cursorBrian Foster1-30/+33
If the size lookup lands in the last block of the by-size btree, the near mode algorithm scans the entire block for the extent with best available locality. In preparation for similar best available extent tracking across both btrees, extend the allocation cursor with best extent data and lift the associated state from the cntbt last block scan code. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: track allocation busy state in allocation cursorBrian Foster1-11/+14
Extend the allocation cursor to track extent busy state for an allocation attempt. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: introduce allocation cursor data structureBrian Foster1-155/+163
Introduce a new allocation cursor data structure to encapsulate the various states and structures used to perform an extent allocation. This structure will eventually be used to track overall allocation state across different search algorithms on both free space btrees. To start, include the three btree cursors (one for the cntbt and two for the bnobt left/right search) used by the near mode allocation algorithm and refactor the cursor setup and teardown code into helpers. This slightly changes cursor memory allocation patterns, but otherwise makes no functional changes to the allocation algorithm. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix sparse complaints] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: track active state of allocation btree cursorsBrian Foster3-3/+25
The upcoming allocation algorithm update searches multiple allocation btree cursors concurrently. As such, it requires an active state to track when a particular cursor should continue searching. While active state will be modified based on higher level logic, we can define base functionality based on the result of allocation btree lookups. Define an active flag in the private area of the btree cursor. Update it based on the result of lookups in the existing allocation btree helpers. Finally, provide a new helper to query the current state. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: ignore extent size hints for always COW inodesChristoph Hellwig1-0/+6
There is no point in applying extent size hints for always COW inodes, as we would just have to COW any extra allocation beyond the data actually written. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: include QUOTA, FATAL ASSERT build options in XFS_BUILD_OPTIONSyu kuai1-0/+10
In commit d03a2f1b9fa8 ("xfs: include WARN, REPAIR build options in XFS_BUILD_OPTIONS"), Eric pointed out that the XFS_BUILD_OPTIONS string, shown at module init time and in modinfo output, does not currently include all available build options. So, he added in CONFIG_XFS_WARN and CONFIG_XFS_REPAIR. However, this is not enough, add in CONFIG_XFS_QUOTA and CONFIG_XFS_ASSERT_FATAL. Signed-off-by: yu kuai <yukuai3@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: use a srcmap for a read-modify-write I/OGoldwyn Rodrigues1-3/+6
The srcmap is used to identify where the read is to be performed from. It is passed to ->iomap_begin, which can fill it in if we need to read data for partially written blocks from a different location than the write target. The srcmap is only supported for buffered writes so far. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as srcmap if not set, adjust length down to srcmap end as well] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2019-10-21iomap: ignore non-shared or non-data blocks in xfs_file_dirtyChristoph Hellwig1-1/+1
xfs_file_dirty is used to unshare reflink blocks. Rename the function to xfs_file_unshare to better document that purpose, and skip iomaps that are not shared and don't need zeroing. This will allow to simplify the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: lift the xfs writeback code to iomapChristoph Hellwig4-647/+59
Take the xfs writeback code and move it to fs/iomap. A new structure with three methods is added as the abstraction from the generic writeback code to the file system. These methods are used to map blocks, submit an ioend, and cancel a page that encountered an error before it was added to an ioend. Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it does] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-10-21iomap: lift common tracing code from xfs to iomapChristoph Hellwig2-50/+3
Lift the xfs code for tracing address space operations to the iomap layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: remove the fork fields in the writepage_ctx and ioendChristoph Hellwig2-27/+46
In preparation for moving the writeback code to iomap.c, replace the XFS-specific COW fork concept with the iomap IOMAP_F_SHARED flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: turn io_append_trans into an io_private void pointerChristoph Hellwig2-14/+14
In preparation for moving the ioend structure to common code we need to get rid of the xfs-specific xfs_trans type. Just make it a file system private void pointer instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor the ioend merging codeChristoph Hellwig1-31/+42
Introduce two nicely abstracted helper, which can be moved to the iomap code later. Also use list_first_entry_or_null to simplify the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: use a struct iomap in xfs_writepage_ctxChristoph Hellwig4-51/+50
In preparation for moving the XFS writeback code to fs/iomap.c, switch it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: set IOMAP_F_NEW more carefullyChristoph Hellwig1-3/+6
Don't set IOMAP_F_NEW if we COW over an existing allocated range, as these aren't strictly new allocations. This is required to be able to use IOMAP_F_NEW to zero newly allocated blocks, which is required for the iomap code to fully support file systems that don't do delayed allocations or use unwritten extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: initialize iomap->flags in xfs_bmbt_to_iomapChristoph Hellwig3-14/+20
Currently we don't overwrite the flags field in the iomap in xfs_bmbt_to_iomap. This works fine with 0-initialized iomaps on stack, but is harmful once we want to be able to reuse an iomap in the writeback code. Replace the shared parameter with a set of initial flags an thus ensures the flags field is always reinitialized. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-17iomap: iomap that extends beyond EOF should be marked dirtyDave Chinner1-0/+7
When doing a direct IO that spans the current EOF, and there are written blocks beyond EOF that extend beyond the current write, the only metadata update that needs to be done is a file size extension. However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that there is IO completion metadata updates required, and hence we may fail to correctly sync file size extensions made in IO completion when O_DSYNC writes are being used and the hardware supports FUA. Hence when setting IOMAP_F_DIRTY, we need to also take into account whether the iomap spans the current EOF. If it does, then we need to mark it dirty so that IO completion will call generic_write_sync() to flush the inode size update to stable storage correctly. Fixes: 3460cac1ca76 ("iomap: Use FUA for pure data O_DSYNC DIO writes") Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: removed the ext4 part; they'll handle it separately] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15xfs: change the seconds fields in xfs_bulkstat to signedDarrick J. Wong1-4/+4
64-bit time is a signed quantity in the kernel, so the bulkstat structure should reflect that. Note that the structure size stays the same and that we have not yet published userspace headers for this new ioctl so there are no users to break. Fixes: 7035f9724f84 ("xfs: introduce new v5 bulkstat structure") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-10-15xfs: Use iomap_dio_rw to wait for unaligned direct IOJan Kara1-8/+4
Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding the wait. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15iomap: Allow forcing of waiting for running DIO in iomap_dio_rw()Jan Kara1-2/+3
Filesystems do not support doing IO as asynchronous in some cases. For example in case of unaligned writes or in case file size needs to be extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO in such cases, add argument to iomap_dio_rw() which makes the function wait for IO completion. This also results in executing iomap_dio_complete() inline in iomap_dio_rw() providing its return value to the caller as for ordinary sync IO. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-09xfs: move local to extent inode logging into bmap helperBrian Foster4-7/+8
The callers of xfs_bmap_local_to_extents_empty() log the inode external to the function, yet this function is where the on-disk format value is updated. Push the inode logging down into the function itself to help prevent future mistakes. Note that internal bmap callers track the inode logging flags independently and thus may log the inode core twice due to this change. This is harmless, so leave this code around for consistency with the other attr fork conversion functions. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-09xfs: remove broken error handling on failed attr sf to leaf changeBrian Foster1-17/+2
xfs_attr_shortform_to_leaf() attempts to put the shortform fork back together after a failed attempt to convert from shortform to leaf format. While this code reallocates and copies back the shortform attr fork data, it never resets the inode format field back to local format. Further, now that the inode is properly logged after the initial switch from local format, any error that triggers the recovery code will eventually abort the transaction and shutdown the fs. Therefore, remove the broken and unnecessary error handling code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-09xfs: log the inode on directory sf to block format changeBrian Foster2-0/+2
When a directory changes from shortform (sf) to block format, the sf format is copied to a temporary buffer, the inode format is modified and the updated format filled with the dentries from the temporary buffer. If the inode format is modified and attempt to grow the inode fails (due to I/O error, for example), it is possible to return an error while leaving the directory in an inconsistent state and with an otherwise clean transaction. This results in corruption of the associated directory and leads to xfs_dabuf_map() errors as subsequent lookups cannot accurately determine the format of the directory. This problem is reproduced occasionally by generic/475. The fundamental problem is that xfs_dir2_sf_to_block() changes the on-disk inode format without logging the inode. The inode is eventually logged by the bmapi layer in the common case, but error checking introduces the possibility of failing the high level request before this happens. Update both of the dir2 and attr callers of xfs_bmap_local_to_extents_empty() to log the inode core as consistent with the bmap local to extent format change codepath. This ensures that any subsequent errors after the format has changed cause the transaction to abort. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-07xfs: assure zeroed memory buffers for certain kmem allocationsBill O'Donnell3-3/+13
Guarantee zeroed memory buffers for cases where potential memory leak to disk can occur. In these cases, kmem_alloc is used and doesn't zero the buffer, opening the possibility of information leakage to disk. Use existing infrastucture (xfs_buf_allocate_memory) to obtain the already zeroed buffer from kernel memory. This solution avoids the performance issue that would occur if a wholesale change to replace kmem_alloc with kmem_zalloc was done. Signed-off-by: Bill O'Donnell <billodo@redhat.com> [darrick: fix bitwise complaint about kmflag_mask] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-07xfs: removed unused error variable from xchk_refcountbt_recAliasgar Surti1-2/+1
Removed unused error variable. Instead of using error variable, returned the value directly as it wasn't updated. Signed-off-by: Aliasgar Surti <aliasgar.surti500@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-07xfs: remove unused flags arg from xfs_get_aghdr_buf()Eric Sandeen1-3/+2
The flags arg is always passed as zero, so remove it. (xfs_buf_get_uncached takes flags to support XBF_NO_IOACCT for the sb, but that should never be relevant for xfs_get_aghdr_buf) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-07xfs: Fix tail rounding in xfs_alloc_file_space()Max Reitz1-1/+3
To ensure that all blocks touched by the range [offset, offset + count) are allocated, we need to calculate the block count from the difference of the range end (rounded up) and the range start (rounded down). Before this patch, we just round up the byte count, which may lead to unaligned ranges not being fully allocated: $ touch test_file $ block_size=$(stat -fc '%S' test_file) $ fallocate -o $((block_size / 2)) -l $block_size test_file $ xfs_bmap test_file test_file: 0: [0..7]: 1396264..1396271 1: [8..15]: hole There should not be a hole there. Instead, the first two blocks should be fully allocated. With this patch applied, the result is something like this: $ touch test_file $ block_size=$(stat -fc '%S' test_file) $ fallocate -o $((block_size / 2)) -l $block_size test_file $ xfs_bmap test_file test_file: 0: [0..15]: 11024..11039 Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-26Merge tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-21/+17
Pull xfs fixes from Darrick Wong: "There are a couple of bug fixes and some small code cleanups that came in recently: - Minor code cleanups - Fix a superblock logging error - Ensure that collapse range converts the data fork to extents format when necessary - Revert the ALLOC_USERDATA cleanup because it caused subtle behavior regressions" * tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: avoid unused to_mp() function warning xfs: log proper length of superblock xfs: revert 1baa2800e62d ("xfs: remove the unused XFS_ALLOC_USERDATA flag") xfs: removed unneeded variable xfs: convert inode to extent format after extent merge due to shift
2019-09-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+2
Merge more updates from Andrew Morton: - almost all of the rest of -mm - various other subsystems Subsystems affected by this patch series: memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork, cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise, cleanups, pagemap * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (77 commits) arch/sparc/include/asm/pgtable_64.h: fix build mm: treewide: clarify pgtable_page_{ctor,dtor}() naming ntfs: remove (un)?likely() from IS_ERR() conditions IB/hfi1: remove unlikely() from IS_ERR*() condition xfs: remove unlikely() from WARN_ON() condition wimax/i2400m: remove unlikely() from WARN*() condition fs: remove unlikely() from WARN_ON() condition xen/events: remove unlikely() from WARN() condition checkpatch: check for nested (un)?likely() calls hexagon: drop empty and unused free_initrd_mem mm: factor out common parts between MADV_COLD and MADV_PAGEOUT mm: introduce MADV_PAGEOUT mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM mm: introduce MADV_COLD mm: untag user pointers in mmap/munmap/mremap/brk vfio/type1: untag user pointers in vaddr_get_pfn tee/shm: untag user pointers in tee_shm_register media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get drm/radeon: untag user pointers in radeon_gem_userptr_ioctl drm/amdgpu: untag user pointers ...
2019-09-26xfs: remove unlikely() from WARN_ON() conditionDenis Efremov1-2/+2
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-7-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>