<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/xfs/xfs_rtalloc.c, branch linux-6.0.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-04-21T06:46:17+00:00</updated>
<entry>
<title>Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next</title>
<updated>2022-04-21T06:46:17+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>david@fromorbit.com</email>
</author>
<published>2022-04-21T06:46:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a44a027a8b2a20fec30e0e9c99b0eb41c03e7420'/>
<id>urn:sha1:a44a027a8b2a20fec30e0e9c99b0eb41c03e7420</id>
<content type='text'>
xfs: Large extent counters

The commit xfs: fix inode fork extent count overflow
(3f8a4f1d876d3e3e49e50b0396eaffcc4ba71b08) mentions that 10 billion
data fork extents should be possible to create. However the
corresponding on-disk field has a signed 32-bit type. Hence this
patchset extends the per-inode data fork extent counter to 64 bits
(out of which 48 bits are used to store the extent count).

Also, XFS has an attribute fork extent counter which is 16 bits
wide. A workload that,
1. Creates 1 million 255-byte sized xattrs,
2. Deletes 50% of these xattrs in an alternating manner,
3. Tries to insert 400,000 new 255-byte sized xattrs
   causes the xattr extent counter to overflow.

Dave tells me that there are instances where a single file has more
than 100 million hardlinks. With parent pointers being stored in
xattrs, we will overflow the signed 16-bits wide attribute extent
counter when large number of hardlinks are created. Hence this
patchset extends the on-disk field to 32-bits.

The following changes are made to accomplish this,
1. A 64-bit inode field is carved out of existing di_pad and
   di_flushiter fields to hold the 64-bit data fork extent counter.
2. The existing 32-bit inode data fork extent counter will be used to
   hold the attribute fork extent counter.
3. A new incompat superblock flag to prevent older kernels from mounting
   the filesystem.

Signed-off-by: Chandan Babu R &lt;chandan.babu@oracle.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;
</content>
</entry>
<entry>
<title>xfs: Conditionally upgrade existing inodes to use large extent counters</title>
<updated>2022-04-13T07:02:44+00:00</updated>
<author>
<name>Chandan Babu R</name>
<email>chandan.babu@oracle.com</email>
</author>
<published>2022-03-09T07:49:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4f86bb4b66c999ad9ddcfd49fec93992eeba2715'/>
<id>urn:sha1:4f86bb4b66c999ad9ddcfd49fec93992eeba2715</id>
<content type='text'>
This commit enables upgrading existing inodes to use large extent counters
provided that underlying filesystem's superblock has large extent counter
feature enabled.

Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Chandan Babu R &lt;chandan.babu@oracle.com&gt;
</content>
</entry>
<entry>
<title>xfs: use a separate frextents counter for rt extent reservations</title>
<updated>2022-04-11T20:49:42+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-04-11T20:49:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2229276c5283264b8c2241c1ed972bbb136cab22'/>
<id>urn:sha1:2229276c5283264b8c2241c1ed972bbb136cab22</id>
<content type='text'>
As mentioned in the previous commit, the kernel misuses sb_frextents in
the incore mount to reflect both incore reservations made by running
transactions as well as the actual count of free rt extents on disk.
This results in the superblock being written to the log with an
underestimate of the number of rt extents that are marked free in the
rtbitmap.

Teaching XFS to recompute frextents after log recovery avoids
operational problems in the current mount, but it doesn't solve the
problem of us writing undercounted frextents which are then recovered by
an older kernel that doesn't have that fix.

Create an incore percpu counter to mirror the ondisk frextents.  This
new counter will track transaction reservations and the only time we
will touch the incore super counter (i.e the one that gets logged) is
when those transactions commit updates to the rt bitmap.  This is in
contrast to the lazysbcount counters (e.g. fdblocks), where we know that
log recovery will always fix any incorrect counter that we log.
As a bonus, we only take m_sb_lock at transaction commit time.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>xfs: recalculate free rt extents after log recovery</title>
<updated>2022-04-11T20:49:42+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-04-11T20:49:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5a605fd6cb1da0ec9cb6e54c06bcf58f706d2f83'/>
<id>urn:sha1:5a605fd6cb1da0ec9cb6e54c06bcf58f706d2f83</id>
<content type='text'>
I've been observing periodic corruption reports from xfs_scrub involving
the free rt extent counter (frextents) while running xfs/141.  That test
uses an error injection knob to induce a torn write to the log, and an
arbitrary number of recovery mounts, frextents will count fewer free rt
extents than can be found the rtbitmap.

The root cause of the problem is a combination of the misuse of
sb_frextents in the incore mount to reflect both incore reservations
made by running transactions as well as the actual count of free rt
extents on disk.  The following sequence can reproduce the undercount:

Thread 1			Thread 2
xfs_trans_alloc(rtextents=3)
xfs_mod_frextents(-3)
&lt;blocks&gt;
				xfs_attr_set()
				xfs_bmap_attr_addfork()
				xfs_add_attr2()
				xfs_log_sb()
				xfs_sb_to_disk()
				xfs_trans_commit()
&lt;log flushed to disk&gt;
&lt;log goes down&gt;

Note that thread 1 subtracts 3 from sb_frextents even though it never
commits to using that space.  Thread 2 writes the undercounted value to
the ondisk superblock and logs it to the xattr transaction, which is
then flushed to disk.  At next mount, log recovery will find the logged
superblock and write that back into the filesystem.  At the end of log
recovery, we reread the superblock and install the recovered
undercounted frextents value into the incore superblock.  From that
point on, we've effectively leaked thread 1's transaction reservation.

The correct fix for this is to separate the incore reservation from the
ondisk usage, but that's a matter for the next patch.  Because the
kernel has been logging superblocks with undercounted frextents for a
very long time and we don't demand that sysadmins run xfs_repair after a
crash, fix the undercount by recomputing frextents after log recovery.

Gating this on log recovery is a reasonable balance (I think) between
correcting the problem and slowing down every mount attempt.  Note that
xfs_repair will fix undercounted frextents.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>xfs: replace xfs_sb_version checks with feature flag checks</title>
<updated>2021-08-19T17:07:12+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2021-08-19T01:46:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=38c26bfd90e1999650d5ef40f90d721f05916643'/>
<id>urn:sha1:38c26bfd90e1999650d5ef40f90d721f05916643</id>
<content type='text'>
Convert the xfs_sb_version_hasfoo() to checks against
mp-&gt;m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
	sed -i -e 's/xfs_sb_version_has\(.*\)(&amp;\(.*\)-&gt;m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.

The result is ia lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
	text	   data	    bss	    dec	    hex	filenam
before	1130866  311352     484 1442702  16038e (TOTALS)
after	1127727  311352     484 1439563  15f74b (TOTALS)

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>xfs: reflect sb features in xfs_mount</title>
<updated>2021-08-19T17:07:12+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2021-08-19T01:46:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a1d86e8dec8c1325d301c9d5594bb794bc428fc3'/>
<id>urn:sha1:a1d86e8dec8c1325d301c9d5594bb794bc428fc3</id>
<content type='text'>
Currently on-disk feature checks require decoding the superblock
fileds and so can be non-trivial. We have almost 400 hundred
individual feature checks in the XFS code, so this is a significant
amount of code. To reduce runtime check overhead, pre-process all
the version flags into a features field in the xfs_mount at mount
time so we can convert all the feature checks to a simple flag
check.

There is also a need to convert the dynamic feature flags to update
the m_features field. This is required for attr, attr2 and quota
features. New xfs_mount based wrappers are added for this.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>xfs: fix an integer overflow error in xfs_growfs_rt</title>
<updated>2021-07-15T16:58:42+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2021-07-12T19:58:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0925fecc557471b6f6a488c3590a275151210572'/>
<id>urn:sha1:0925fecc557471b6f6a488c3590a275151210572</id>
<content type='text'>
During a realtime grow operation, we run a single transaction for each
rt bitmap block added to the filesystem.  This means that each step has
to be careful to increase sb_rblocks appropriately.

Fix the integer overflow error in this calculation that can happen when
the extent size is very large.  Found by running growfs to add a rt
volume to a filesystem formatted with a 1g rt extent size.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>xfs: improve FSGROWFSRT precondition checking</title>
<updated>2021-07-15T16:58:42+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2021-07-12T19:58:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0e2af9296f4f9c4c815ced2beb21093af7c38644'/>
<id>urn:sha1:0e2af9296f4f9c4c815ced2beb21093af7c38644</id>
<content type='text'>
Improve the checking at the start of a realtime grow operation so that
we avoid accidentally set a new extent size that is too large and avoid
adding an rt volume to a filesystem with rmap or reflink because we
don't support rt rmap or reflink yet.

While we're at it, separate the checks so that we're only testing one
aspect at a time.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>xfs: move the di_flags field to struct xfs_inode</title>
<updated>2021-04-07T21:37:05+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-03-29T18:11:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db07349da2f564742c0f23528691991e641e315e'/>
<id>urn:sha1:db07349da2f564742c0f23528691991e641e315e</id>
<content type='text'>
In preparation of removing the historic icinode struct, move the flags
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>xfs: move the di_size field to struct xfs_inode</title>
<updated>2021-04-07T21:37:03+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-03-29T18:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=13d2c10b05d8e67cb9b4c2d1d4a09a906148a72e'/>
<id>urn:sha1:13d2c10b05d8e67cb9b4c2d1d4a09a906148a72e</id>
<content type='text'>
In preparation of removing the historic icinode struct, move the on-disk
size field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
</feed>
