summaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)AuthorFilesLines
2021-08-19xfs: convert bp->b_bn references to xfs_buf_daddr()Dave Chinner13-37/+37
Stop directly referencing b_bn in code outside the buffer cache, as b_bn is supposed to be used only as an internal cache index. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: introduce xfs_buf_daddr()Dave Chinner10-16/+16
Introduce a helper function xfs_buf_daddr() to extract the disk address of the buffer from the struct xfs_buf. This will replace direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as the XFS_BUF_ADDR() macro. This patch introduces the helper function and replaces all uses of XFS_BUF_ADDR() as this is just a simple sed replacement. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: kill xfs_sb_version_has_v3inode()Dave Chinner4-19/+15
All callers to xfs_dinode_good_version() and XFS_DINODE_SIZE() in both the kernel and userspace have a xfs_mount structure available which means they can use mount features checks instead looking directly are the superblock. Convert these functions to take a mount and use a xfs_has_v3inodes() check and move it out of the libxfs/xfs_format.h file as it really doesn't have anything to do with the definition of the on-disk format. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: introduce xfs_sb_is_v5 helperDave Chinner2-30/+31
Rather than open coding XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 checks everywhere, add a simple wrapper to encapsulate this and make the code easier to read. This allows us to remove the xfs_sb_version_has_v3inode() wrapper which is only used in xfs_format.h now and is just a version number check. There are a couple of places where we should be checking the mount feature bits rather than the superblock version (e.g. remount), so those are converted to use xfs_has_crc(mp) instead. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: remove unused xfs_sb_version_has wrappersDave Chinner1-152/+3
The vast majority of these wrappers are now unused. Remove them leaving just the small subset of wrappers that are used to either add feature bits or make the mount features field setup code simpler. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: convert xfs_sb_version_has checks to use mount featuresDave Chinner24-77/+75
This is a conversion of the remaining xfs_sb_version_has..(sbp) checks to use xfs_has_..(mp) feature checks. This was largely done with a vim replacement macro that did: :0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR> A couple of other variants were also used, and the rest touched up by hand. $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1127533 311352 484 1439369 15f689 (TOTALS) after 1125360 311352 484 1437196 15ee0c (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: open code sb verifier feature checksDave Chinner3-65/+81
The superblock verifiers are one of the last places that use the sb version functions to do feature checks. This are all quite simple uses, and there aren't many of them so open code them all. Also, move the good version number check into xfs_sb.c instead of it being an inline function in xfs_format.h Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: convert xfs_fs_geometry to use mount feature checksDave Chinner2-23/+25
Reporting filesystem features to userspace is currently superblock based. Now we have a general mount-based feature infrastructure, switch to using the xfs_mount rather than the superblock directly. This reduces the size of the function by over 300 bytes. $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1127855 311352 484 1439691 15f7cb (TOTALS) after 1127535 311352 484 1439371 15f68b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdownDave Chinner5-15/+15
Remove the shouty macro and instead use the inline function that matches other state/feature check wrapper naming. This conversion was done with sed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: convert remaining mount flags to state flagsDave Chinner2-2/+2
The remaining mount flags kept in m_flags are actually runtime state flags. These change dynamically, so they really should be updated atomically so we don't potentially lose an update due to racing modifications. Convert these remaining flags to be stored in m_opstate and use atomic bitops to set and clear the flags. This also adds a couple of simple wrappers for common state checks - read only and shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: convert mount flags to featuresDave Chinner4-28/+31
Replace m_flags feature checks with xfs_has_<feature>() calls and rework the setup code to set flags in m_features. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: replace xfs_sb_version checks with feature flag checksDave Chinner32-126/+126
Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is ia lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filenam before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: reflect sb features in xfs_mountDave Chinner3-1/+68
Currently on-disk feature checks require decoding the superblock fileds and so can be non-trivial. We have almost 400 hundred individual feature checks in the XFS code, so this is a significant amount of code. To reduce runtime check overhead, pre-process all the version flags into a features field in the xfs_mount at mount time so we can convert all the feature checks to a simple flag check. There is also a need to convert the dynamic feature flags to update the m_features field. This is required for attr, attr2 and quota features. New xfs_mount based wrappers are added for this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: rework attr2 feature and mount optionsDave Chinner1-7/+0
The attr2 feature is somewhat unique in that it has both a superblock feature bit to enable it and mount options to enable and disable it. Back when it was first introduced in 2005, attr2 was disabled unless either the attr2 superblock feature bit was set, or the attr2 mount option was set. If the superblock feature bit was not set but the mount option was set, then when the first attr2 format inode fork was created, it would set the superblock feature bit. This is as it should be - the superblock feature bit indicated the presence of the attr2 on disk format. The noattr2 mount option, however, did not affect the superblock feature bit. If noattr2 was specified, the on-disk superblock feature bit was ignored and the code always just created attr1 format inode forks. If neither of the attr2 or noattr2 mounts option were specified, then the behaviour was determined by the superblock feature bit. This was all pretty sane. Fast foward 3 years, and we are dealing with fallout from the botched sb_features2 addition and having to deal with feature mismatches between the sb_features2 and sb_bad_features2 fields. The attr2 feature bit was one of these flags. The reconciliation was done well after mount option parsing and, unfortunately, the feature reconciliation had a bug where it ignored the noattr2 mount option. For reasons lost to the mists of time, it was decided that resolving this issue in commit 7c12f296500e ("[XFS] Fix up noattr2 so that it will properly update the versionnum and features2 fields.") required noattr2 to clear the superblock attr2 feature bit. This greatly complicated the attr2 behaviour and broke rules about feature bits needing to be set when those specific features are present in the filesystem. By complicated, I mean that it introduced problems due to feature bit interactions with log recovery. All of the superblock feature bit checks are done prior to log recovery, but if we crash after removing a feature bit, then on the next mount we see the feature bit in the unrecovered superblock, only to have it go away after the log has been replayed. This means our mount time feature processing could be all wrong. Hence you can mount with noattr2, crash shortly afterwards, and mount again without attr2 or noattr2 and still have attr2 enabled because the second mount sees attr2 still enabled in the superblock before recovery runs and removes the feature bit. It's just a mess. Further, this is all legacy code as the v5 format requires attr2 to be enabled at all times and it cannot be disabled. i.e. the noattr2 mount option returns an error when used on v5 format filesystems. To straighten this all out, this patch reverts the attr2/noattr2 mount option behaviour back to the original behaviour. There is no reason for disabling attr2 these days, so we will only do this when the noattr2 mount option is set. This will not remove the superblock feature bit. The superblock bit will provide the default behaviour and only track whether attr2 is present on disk or not. The attr2 mount option will enable the creation of attr2 format inode forks, and if the superblock feature bit is not set it will be added when the first attr2 inode fork is created. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: rename xfs_has_attr()Dave Chinner2-5/+3
xfs_has_attr() is poorly named. It has global scope as it is defined in a header file, but it has no namespace scope that tells us what it is checking has attributes. It's not even clear what "has_attr" means, because what it is actually doing is an attribute fork lookup to see if the attribute exists. Upcoming patches use this "xfs_has_<foo>" namespace for global filesystem features, which conflicts with this function. Rename xfs_has_attr() to xfs_attr_lookup() and make it a static function, freeing up the "xfs_has_" namespace for global scope usage. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: sb verifier doesn't handle uncached sb bufferDave Chinner1-1/+1
The verifier checks explicitly for bp->b_bn == XFS_SB_DADDR to match the primary superblock buffer, but the primary superblock is an uncached buffer and so bp->b_bn is always -1ULL. Hence this never matches and the CRC error reporting is wholly dependent on the mount superblock already being populated so CRC feature checks pass and allow CRC errors to be reported. Fix this so that the primary superblock CRC error reporting is not dependent on already having read the superblock into memory. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: resolve fork names in trace outputDarrick J. Wong1-0/+5
Emit whichfork values as text strings in the ftrace output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2021-08-19xfs: constify btree function parameters that are not modifiedDarrick J. Wong2-44/+47
Constify the rest of the btree functions that take structure and union pointers and are not supposed to modify them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make the start pointer passed to btree update_lastrec functions constDarrick J. Wong2-9/+9
This btree function is called when updating a record in the rightmost block of a btree so that we can update the AGF's longest free extent length field. Neither parameter is supposed to be updated, so mark them both const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make the start pointer passed to btree alloc_block functions constDarrick J. Wong7-34/+34
The @start pointer passed to each per-AG btree type's ->alloc_block function isn't supposed to be modified, since it's a hint about the location of the btree block being split that is to be fed to the allocator, so mark the parameter const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make the pointer passed to btree set_root functions constDarrick J. Wong6-19/+19
The pointer passed to each per-AG btree type's ->set_root function isn't supposed to be modified (that function sets an external pointer to the root block) so mark them const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: mark the record passed into xchk_btree functions as constDarrick J. Wong2-2/+3
xchk_btree calls a user-supplied function to validate each btree record that it finds. Those functions are not supposed to change the record data, so mark the parameter const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make the keys and records passed to btree inorder functions constDarrick J. Wong6-40/+40
The inorder functions are simple predicates, which means that they don't modify the parameters. Mark them all const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: mark the record passed into btree init_key functions as constDarrick J. Wong7-33/+33
These functions initialize a key from a record, but they aren't supposed to modify the record. Mark it const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make the record pointer passed to query_range functions constDarrick J. Wong10-28/+29
The query_range functions are supposed to call a caller-supplied function on each record found in the dataset. These functions don't own the memory storing the record, so don't let them change the record. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make the key parameters to all btree query range functions constDarrick J. Wong6-14/+16
Range query functions are not supposed to modify the query keys that are being passed in, so mark them all const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make the key parameters to all btree key comparison functions constDarrick J. Wong6-50/+50
The btree key comparison functions are not allowed to change the keys that are passed in, so mark them const. We'll need this for the next patch, which adds const to the btree range query functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-19xfs: make xfs_rtalloc_query_range input parameters constDarrick J. Wong1-7/+7
In commit 8ad560d2565e, we changed xfs_rtalloc_query_range to constrain the range of bits in the realtime bitmap file that would actually be searched. In commit a3a374bf1889, we changed the range again (incorrectly), leading to the fix in commit d88850bd5516, which finally corrected the range check code. Unfortunately, the author never noticed that the function modifies its input parameters, which is a totaly no-no since none of the other range query functions change their input parameters. So, fix this function yet again to stash the upper end of the query range (i.e. the high key) in a local variable and hope this is the last time I have to fix my own function. While we're at it, mark the key inputs const so nobody makes this mistake again. :( Fixes: 8ad560d2565e ("xfs: strengthen rtalloc query range checks") Not-fixed-by: a3a374bf1889 ("xfs: fix off-by-one error in xfs_rtalloc_query_range") Not-fixed-by: d88850bd5516 ("xfs: fix high key handling in the rt allocator's query_range function") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-11xfs: Rename __xfs_attr_rmtval_removeAllison Henderson3-5/+5
Now that xfs_attr_rmtval_remove is gone, rename __xfs_attr_rmtval_remove to xfs_attr_rmtval_remove Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-10xfs: add attr state machine tracepointsAllison Henderson2-2/+30
This is a quick patch to add a new xfs_attr_*_return tracepoints. We use these to track when ever a new state is set or -EAGAIN is returned Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-10xfs: refactor xfs_iget calls from log intent recoveryDarrick J. Wong1-0/+2
Hoist the code from xfs_bui_item_recover that igets an inode and marks it as being part of log intent recovery. The next patch will want a common function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-10xfs: allow setting and clearing of log incompat feature flagsDarrick J. Wong1-0/+15
Log incompat feature flags in the superblock exist for one purpose: to protect the contents of a dirty log from replay on a kernel that isn't prepared to handle those dirty contents. This means that they can be cleared if (a) we know the log is clean and (b) we know that there aren't any other threads in the system that might be setting or relying upon a log incompat flag. Therefore, clear the log incompat flags when we've finished recovering the log, when we're unmounting cleanly, remounting read-only, or freezing; and provide a function so that subsequent patches can start using this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-10xfs: replace kmem_alloc_large() with kvmalloc()Dave Chinner1-1/+1
There is no reason for this wrapper existing anymore. All the places that use KM_NOFS allocation are within transaction contexts and hence covered by memalloc_nofs_save/restore contexts. Hence we don't need any special handling of vmalloc for large IOs anymore and so special casing this code isn't necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-09xfs: fix silly whitespace problems with kernel libxfsDarrick J. Wong4-4/+4
Fix a few whitespace errors such as spaces at the end of the line, etc. This gets us back to something more closely resembling parity. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-06xfs: remove the active vs running quota differentiationChristoph Hellwig1-26/+4
These only made a difference when quotaoff supported disabling quota accounting on a mounted file system, so we can switch everyone to use a single set of flags and helpers now. Note that the *QUOTA_ON naming for the helpers is kept as it was the much more commonly used one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-06xfs: remove support for disabling quota accounting on a mounted file systemChristoph Hellwig2-32/+0
Disabling quota accounting is hairy, racy code with all kinds of pitfalls. And it has a very strange mind set, as quota accounting (unlike enforcement) really is a propery of the on-disk format. There is no good use case for supporting this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-01Merge tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-1/+10
Pull xfs fixes from Darrick Wong: "This contains a bunch of bug fixes in XFS. Dave and I have been busy the last couple of weeks to find and fix as many log recovery bugs as we can find; here are the results so far. Go fstests -g recoveryloop! ;) - Fix a number of coordination bugs relating to cache flushes for metadata writeback, cache flushes for multi-buffer log writes, and FUA writes for single-buffer log writes - Fix a bug with incorrect replay of attr3 blocks - Fix unnecessary stalls when flushing logs to disk - Fix spoofing problems when recovering realtime bitmap blocks" * tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: prevent spoofing of rtbitmap blocks when recovering buffers xfs: limit iclog tail updates xfs: need to see iclog flags in tracing xfs: Enforce attr3 buffer recovery order xfs: logging the on disk inode LSN can make it go backwards xfs: avoid unnecessary waits in xfs_log_force_lsn() xfs: log forces imply data device cache flushes xfs: factor out forced iclog flushes xfs: fix ordering violation between cache flushes and tail updates xfs: fold __xlog_state_release_iclog into xlog_state_release_iclog xfs: external logs need to flush data device xfs: flush data dev on external log write
2021-07-29xfs: logging the on disk inode LSN can make it go backwardsDave Chinner1-1/+10
When we log an inode, we format the "log inode" core and set an LSN in that inode core. We do that via xfs_inode_item_format_core(), which calls: xfs_inode_to_log_dinode(ip, dic, ip->i_itemp->ili_item.li_lsn); to format the log inode. It writes the LSN from the inode item into the log inode, and if recovery decides the inode item needs to be replayed, it recovers the log inode LSN field and writes it into the on disk inode LSN field. Now this might seem like a reasonable thing to do, but it is wrong on multiple levels. Firstly, if the item is not yet in the AIL, item->li_lsn is zero. i.e. the first time the inode it is logged and formatted, the LSN we write into the log inode will be zero. If we only log it once, recovery will run and can write this zero LSN into the inode. This means that the next time the inode is logged and log recovery runs, it will *always* replay changes to the inode regardless of whether the inode is newer on disk than the version in the log and that violates the entire purpose of recording the LSN in the inode at writeback time (i.e. to stop it going backwards in time on disk during recovery). Secondly, if we commit the CIL to the journal so the inode item moves to the AIL, and then relog the inode, the LSN that gets stamped into the log inode will be the LSN of the inode's current location in the AIL, not it's age on disk. And it's not the LSN that will be associated with the current change. That means when log recovery replays this inode item, the LSN that ends up on disk is the LSN for the previous changes in the log, not the current changes being replayed. IOWs, after recovery the LSN on disk is not in sync with the LSN of the modifications that were replayed into the inode. This, again, violates the recovery ordering semantics that on-disk writeback LSNs provide. Hence the inode LSN in the log dinode is -always- invalid. Thirdly, recovery actually has the LSN of the log transaction it is replaying right at hand - it uses it to determine if it should replay the inode by comparing it to the on-disk inode's LSN. But it doesn't use that LSN to stamp the LSN into the inode which will be written back when the transaction is fully replayed. It uses the one in the log dinode, which we know is always going to be incorrect. Looking back at the change history, the inode logging was broken by commit 93f958f9c41f ("xfs: cull unnecessary icdinode fields") way back in 2016 by a stupid idiot who thought he knew how this code worked. i.e. me. That commit replaced an in memory di_lsn field that was updated only at inode writeback time from the inode item.li_lsn value - and hence always contained the same LSN that appeared in the on-disk inode - with a read of the inode item LSN at inode format time. CLearly these are not the same thing. Before 93f958f9c41f, the log recovery behaviour was irrelevant, because the LSN in the log inode always matched the on-disk LSN at the time the inode was logged, hence recovery of the transaction would never make the on-disk LSN in the inode go backwards or get out of sync. A symptom of the problem is this, caught from a failure of generic/482. Before log recovery, the inode has been allocated but never used: xfs_db> inode 393388 xfs_db> p core.magic = 0x494e core.mode = 0 .... v3.crc = 0x99126961 (correct) v3.change_count = 0 v3.lsn = 0 v3.flags2 = 0 v3.cowextsize = 0 v3.crtime.sec = Thu Jan 1 10:00:00 1970 v3.crtime.nsec = 0 After log recovery: xfs_db> p core.magic = 0x494e core.mode = 020444 .... v3.crc = 0x23e68f23 (correct) v3.change_count = 2 v3.lsn = 0 v3.flags2 = 0 v3.cowextsize = 0 v3.crtime.sec = Thu Jul 22 17:03:03 2021 v3.crtime.nsec = 751000000 ... You can see that the LSN of the on-disk inode is 0, even though it clearly has been written to disk. I point out this inode, because the generic/482 failure occurred because several adjacent inodes in this specific inode cluster were not replayed correctly and still appeared to be zero on disk when all the other metadata (inobt, finobt, directories, etc) indicated they should be allocated and written back. The fix for this is two-fold. The first is that we need to either revert the LSN changes in 93f958f9c41f or stop logging the inode LSN altogether. If we do the former, log recovery does not need to change but we add 8 bytes of memory per inode to store what is largely a write-only inode field. If we do the latter, log recovery needs to stamp the on-disk inode in the same manner that inode writeback does. I prefer the latter, because we shouldn't really be trying to log and replay changes to the on disk LSN as the on-disk value is the canonical source of the on-disk version of the inode. It also matches the way we recover buffer items - we create a buf_log_item that carries the current recovery transaction LSN that gets stamped into the buffer by the write verifier when it gets written back when the transaction is fully recovered. However, this might break log recovery on older kernels even more, so I'm going to simply ignore the logged value in recovery and stamp the on-disk inode with the LSN of the transaction being recovered that will trigger writeback on transaction recovery completion. This will ensure that the on-disk inode LSN always reflects the LSN of the last change that was written to disk, regardless of whether it comes from log recovery or runtime writeback. Fixes: 93f958f9c41f ("xfs: cull unnecessary icdinode fields") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-18Merge tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-18/+86
Pull xfs fixes from Darrick Wong: "A few fixes for issues in the new online shrink code, additional corrections for my recent bug-hunt w.r.t. extent size hints on realtime, and improved input checking of the GROWFSRT ioctl. IOW, the usual 'I somehow got bored during the merge window and resumed auditing the farther reaches of xfs': - Fix shrink eligibility checking when sparse inode clusters enabled - Reset '..' directory entries when unlinking directories to prevent verifier errors if fs is shrinked later - Don't report unusable extent size hints to FSGETXATTR - Don't warn when extent size hints are unusable because the sysadmin configured them that way - Fix insufficient parameter validation in GROWFSRT ioctl - Fix integer overflow when adding rt volumes to filesystem" * tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: detect misaligned rtinherit directory extent size hints xfs: fix an integer overflow error in xfs_growfs_rt xfs: improve FSGROWFSRT precondition checking xfs: don't expose misaligned extszinherit hints to userspace xfs: correct the narrative around misaligned rtinherit/extszinherit dirs xfs: reset child dir '..' entry when unlinking child xfs: check for sparse inode clusters that cross new EOAG when shrinking
2021-07-15xfs: correct the narrative around misaligned rtinherit/extszinherit dirsDarrick J. Wong2-18/+20
While auditing the realtime growfs code, I realized that the GROWFSRT ioctl (and by extension xfs_growfs) has always allowed sysadmins to change the realtime extent size when adding a realtime section to the filesystem. Since we also have always allowed sysadmins to set RTINHERIT and EXTSZINHERIT on directories even if there is no realtime device, this invalidates the premise laid out in the comments added in commit 603f000b15f2. In other words, this is not a case of inadequate metadata validation. This is a case of nearly forgotten (and apparently untested) but supported functionality. Update the comments to reflect what we've learned, and remove the log message about correcting the misalignment. Fixes: 603f000b15f2 ("xfs: validate extsz hints against rt extent size when rtinherit is set") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-07-15xfs: check for sparse inode clusters that cross new EOAG when shrinkingDarrick J. Wong3-0/+66
While running xfs/168, I noticed occasional write verifier shutdowns involving inodes at the very end of the filesystem. Existing inode btree validation code checks that all inode clusters are fully contained within the filesystem. However, due to inadequate checking in the fs shrink code, it's possible that there could be a sparse inode cluster at the end of the filesystem where the upper inodes of the cluster are marked as holes and the corresponding blocks are free. In this case, the last blocks in the AG are listed in the bnobt. This enables the shrink to proceed but results in a filesystem that trips the inode verifiers. Fix this by disallowing the shrink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-07-12xfs: Fix multiple fall-through warnings for ClangGustavo A. R. Silva1-8/+8
In preparation to enable -Wimplicit-fallthrough for Clang, fix the following warnings by replacing /* fallthrough */ comments, and its variants, with the new pseudo-keyword macro fallthrough: fs/xfs/libxfs/xfs_attr.c:487:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:500:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:532:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:594:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:607:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:1410:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:1445:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:1473:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Notice that Clang doesn't recognize /* fallthrough */ comments as implicit fall-through markings, so in order to globally enable -Wimplicit-fallthrough for Clang, these comments need to be replaced with fallthrough; in the whole codebase. Link: https://github.com/KSPP/linux/issues/115 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-07-03Merge tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds37-1305/+2117
Pull xfs updates from Darrick Wong: "Most of the work this cycle has been on refactoring various parts of the codebase. The biggest non-cleanup changes are (1) reducing the number of cache flushes sent when writing the log; (2) a substantial number of log recovery fixes; and (3) I started accepting pull requests from contributors if the commits in their branches match what's been sent to the list. For a week or so I /had/ staged a major cleanup of the logging code from Dave Chinner, but it exposed so many lurking bugs in other parts of the logging and log recovery code that I decided to defer that patchset until we can address those latent bugs. Larger cleanups this time include walking the incore inode cache (me) and rework of the extended attribute code (Allison) to prepare it for adding logged xattr updates (and directory tree parent pointers) in future releases. Summary: - Refactor the buffer cache to use bulk page allocation - Convert agnumber-based AG iteration to walk per-AG structures - Clean up some unit conversions and other code warts - Reduce spinlock contention in the directio fastpath - Collapse all the inode cache walks into a single function - Remove indirect function calls from the inode cache walk code - Dramatically reduce the number of cache flushes sent when writing log buffers - Preserve inode sickness reports for longer - Rename xfs_eofblocks since it controls inode cache walks - Refactor the extended attribute code to prepare it for the addition of log intent items to make xattrs fully transactional - A few fixes to earlier large patchsets - Log recovery fixes so that we don't accidentally mark the log clean when log intent recovery fails - Fix some latent SOB errors - Clean up shutdown messages that get logged to dmesg - Fix a regression in the online shrink code - Fix a UAF in the buffer logging code if the fs goes offline - Fix uninitialized error variables - Fix a UAF in the CIL when commited log item callbacks race with a shutdown - Fix a bug where the CIL could hang trying to push part of the log ring buffer that hasn't been filled yet" * tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (102 commits) xfs: don't wait on future iclogs when pushing the CIL xfs: Fix a CIL UAF by getting get rid of the iclog callback lock xfs: remove callback dequeue loop from xlog_state_do_iclog_callbacks xfs: don't nest icloglock inside ic_callback_lock xfs: Initialize error in xfs_attr_remove_iter xfs: fix endianness issue in xfs_ag_shrink_space xfs: remove dead stale buf unpin handling code xfs: hold buffer across unpin and potential shutdown processing xfs: force the log offline when log intent item recovery fails xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes xfs: shorten the shutdown messages to a single line xfs: print name of function causing fs shutdown instead of hex pointer xfs: fix type mismatches in the inode reclaim functions xfs: separate primary inode selection criteria in xfs_iget_cache_hit xfs: refactor the inode recycling code xfs: add iclog state trace events xfs: xfs_log_force_lsn isn't passed a LSN xfs: Fix CIL throttle hang when CIL space used going backwards xfs: journal IO cache flush reductions xfs: remove need_start_rec parameter from xlog_write() ...
2021-06-29Merge tag 'fallthrough-fixes-clang-5.14-rc1' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo Silva: "Fix many fall-through warnings when building with Clang 12.0.0 and '-Wimplicit-fallthrough' so that we at some point will be able to enable that warning by default" * tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (26 commits) rxrpc: Fix fall-through warnings for Clang drm/nouveau/clk: Fix fall-through warnings for Clang drm/nouveau/therm: Fix fall-through warnings for Clang drm/nouveau: Fix fall-through warnings for Clang xfs: Fix fall-through warnings for Clang xfrm: Fix fall-through warnings for Clang tipc: Fix fall-through warnings for Clang sctp: Fix fall-through warnings for Clang rds: Fix fall-through warnings for Clang net/packet: Fix fall-through warnings for Clang net: netrom: Fix fall-through warnings for Clang ide: Fix fall-through warnings for Clang hwmon: (max6621) Fix fall-through warnings for Clang hwmon: (corsair-cpro) Fix fall-through warnings for Clang firewire: core: Fix fall-through warnings for Clang braille_console: Fix fall-through warnings for Clang ipv4: Fix fall-through warnings for Clang qlcnic: Fix fall-through warnings for Clang bnxt_en: Fix fall-through warnings for Clang netxen_nic: Fix fall-through warnings for Clang ...
2021-06-25xfs: Initialize error in xfs_attr_remove_iterAllison Henderson1-1/+1
A recent bug report generated a warning that a code path in xfs_attr_remove_iter could potentially return error uninitialized in the case of XFS_DAS_RM_SHRINK state. Fix this by initializing error. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-22xfs: fix endianness issue in xfs_ag_shrink_spaceDarrick J. Wong1-3/+4
The AGI buffer is in big-endian format, so we must convert the endianness to CPU format to do any comparisons. Fixes: 46141dc891f7 ("xfs: introduce xfs_ag_shrink_space()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-06-21xfs: xfs_log_force_lsn isn't passed a LSNDave Chinner1-0/+1
In doing an investigation into AIL push stalls, I was looking at the log force code to see if an async CIL push could be done instead. This lead me to xfs_log_force_lsn() and looking at how it works. xfs_log_force_lsn() is only called from inode synchronisation contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn value as the LSN to sync the log to. This gets passed to xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the journal, and then used by xfs_log_force_lsn() to flush the iclogs to the journal. The problem is that ip->i_itemp->ili_last_lsn does not store a log sequence number. What it stores is passed to it from the ->iop_committing method, which is called by xfs_log_commit_cil(). The value this passes to the iop_committing method is the CIL context sequence number that the item was committed to. As it turns out, xlog_cil_force_lsn() converts the sequence to an actual commit LSN for the related context and returns that to xfs_log_force_lsn(). xfs_log_force_lsn() overwrites it's "lsn" variable that contained a sequence with an actual LSN and then uses that to sync the iclogs. This caused me some confusion for a while, even though I originally wrote all this code a decade ago. ->iop_committing is only used by a couple of log item types, and only inode items use the sequence number it is passed. Let's clean up the API, CIL structures and inode log item to call it a sequence number, and make it clear that the high level code is using CIL sequence numbers and not on-disk LSNs for integrity synchronisation purposes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-18xfs: log stripe roundoff is a property of the logDave Chinner1-3/+0
We don't need to look at the xfs_mount and superblock every time we need to do an iclog roundoff calculation. The property is fixed for the life of the log, so store the roundoff in the log at mount time and use that everywhere. On a debug build: $ size fs/xfs/xfs_log.o.* text data bss dec hex filename 27360 560 8 27928 6d18 fs/xfs/xfs_log.o.orig 27219 560 8 27787 6c8b fs/xfs/xfs_log.o.patched Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-06-18xfs: perag may be null in xfs_imap()Dave Chinner1-1/+2
Dan Carpenter's static checker reported: The patch 7b13c5155182: "xfs: use perag for ialloc btree cursors" from Jun 2, 2021, leads to the following Smatch complaint: fs/xfs/libxfs/xfs_ialloc.c:2403 xfs_imap() error: we previously assumed 'pag' could be null (see line 2294) And it's right. Fix it. Fixes: 7b13c5155182 ("xfs: use perag for ialloc btree cursors") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-06-18Merge tag 'xfs-delay-ready-attrs-v20.1' of ↵Darrick J. Wong6-463/+1031
https://github.com/allisonhenderson/xfs_work into xfs-5.14-merge4 xfs: Delay Ready Attributes Hi all, This set is a subset of a larger series for Dealyed Attributes. Which is a subset of a yet larger series for parent pointers. Delayed attributes allow attribute operations (set and remove) to be logged and committed in the same way that other delayed operations do. This allows more complex operations (like parent pointers) to be broken up into multiple smaller transactions. To do this, the existing attr operations must be modified to operate as a delayed operation. This means that they cannot roll, commit, or finish transactions. Instead, they return -EAGAIN to allow the calling function to handle the transaction. In this series, we focus on only the delayed attribute portion. We will introduce parent pointers in a later set. The set as a whole is a bit much to digest at once, so I usually send out the smaller sub series to reduce reviewer burn out. But the entire extended series is visible through the included github links. Updates since v19: Added Darricks fix for the remote block accounting as well as some minor nits about the default assert in xfs_attr_set_iter. Spent quite a bit of time testing this cycle to weed out any more unexpected bugs. No new test failures were observed with the addition of this set. xfs: Fix default ASSERT in xfs_attr_set_iter Replaced the assert with ASSERT(0); xfs: Add delay ready attr remove routines Added Darricks fix for remote block accounting This series can be viewed on github here: https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v20 As well as the extended delayed attribute and parent pointer series: https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v20_extended And the test cases: https://github.com/allisonhenderson/xfs_work/tree/pptr_xfstestsv3 In order to run the test cases, you will need have the corresponding xfsprogs changes as well. Which can be found here: https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v20 https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v20_extended To run the xfs attributes tests run: check -g attr To run as delayed attributes run: export MOUNT_OPTIONS="-o delattr" check -g attr To run parent pointer tests: check -g parent I've also made the corresponding updates to the user space side as well, and ported anything they need to seat correctly. Questions, comment and feedback appreciated! Thanks all! Allison * tag 'xfs-delay-ready-attrs-v20.1' of https://github.com/allisonhenderson/xfs_work: xfs: Make attr name schemes consistent xfs: Fix default ASSERT in xfs_attr_set_iter xfs: Clean up xfs_attr_node_addname_clear_incomplete xfs: Remove xfs_attr_rmtval_set xfs: Add delay ready attr set routines xfs: Add delay ready attr remove routines xfs: Hoist node transaction handling xfs: Hoist xfs_attr_leaf_addname xfs: Hoist xfs_attr_node_addname xfs: Add helper xfs_attr_node_addname_find_attr xfs: Separate xfs_attr_node_addname and xfs_attr_node_addname_clear_incomplete xfs: Refactor xfs_attr_set_shortform xfs: Add xfs_attr_node_remove_name xfs: Reverse apply 72b97ea40d