summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-01-23xfs: speed up parent pointer operations when possibleDarrick J. Wong3-10/+109
After a recent fsmark benchmarking run, I observed that the overhead of parent pointers on file creation and deletion can be a bit high. On a machine with 20 CPUs, 128G of memory, and an NVME SSD capable of pushing 750000iops, I see the following results: $ mkfs.xfs -f -l logdev=/dev/nvme1n1,size=1g /dev/nvme0n1 -n parent=0 meta-data=/dev/nvme0n1 isize=512 agcount=40, agsize=9767586 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 = exchange=0 metadir=0 data = bsize=4096 blocks=390703440, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=0 log =/dev/nvme1n1 bsize=4096 blocks=262144, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 = rgcount=0 rgsize=0 extents = zoned=0 start=0 reserved=0 So we created 40 AGs, one per CPU. Now we create 40 directories and run fsmark: $ time fs_mark -D 10000 -S 0 -n 100000 -s 0 -L 8 -d ... # Version 3.3, 40 thread(s) starting at Wed Dec 10 14:22:07 2025 # Sync method: NO SYNC: Test does not issue sync() or fsync() calls. # Directories: Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory. # File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name) # Files info: size 0 bytes, written with an IO size of 16384 bytes per write # App overhead is time in microseconds spent in the test not doing file writing related system calls. parent=0 parent=1 ================== ================== real 0m57.573s real 1m2.934s user 3m53.578s user 3m53.508s sys 19m44.440s sys 25m14.810s $ time rm -rf ... parent=0 parent=1 ================== ================== real 0m59.649s real 1m12.505s user 0m41.196s user 0m47.489s sys 13m9.566s sys 20m33.844s Parent pointers increase the system time by 28% overhead to create 32 million files that are totally empty. Removing them incurs a system time increase of 56%. Wall time increases by 9% and 22%. For most filesystems, each file tends to have a single owner and not that many xattrs. If the xattr structure is shortform, then all xattr changes are logged with the inode and do not require the the xattr intent mechanism to persist the parent pointer. Therefore, we can speed up parent pointer operations by calling the shortform xattr functions directly if the child's xattr is in short format. Now the overhead looks like: $ time fs_mark -D 10000 -S 0 -n 100000 -s 0 -L 8 -d ... parent=0 parent=1 ================== ================== real 0m58.030s real 1m0.983s user 3m54.141s user 3m53.758s sys 19m57.003s sys 21m30.605s $ time rm -rf ... parent=0 parent=1 ================== ================== real 0m58.911s real 1m4.420s user 0m41.329s user 0m45.169s sys 13m27.857s sys 15m58.564s Now parent pointers only increase the system time by 8% for creation and 19% for deletion. Wall time increases by 5% and 9% now. Close the performance gap by creating helpers for the attr set, remove, and replace operations that will try to make direct shortform updates, and fall back to the attr intent machinery if that doesn't work. This works for regular xattrs and for parent pointers. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-23xfs: reduce xfs_attr_try_sf_addname parametersDarrick J. Wong1-7/+4
The dp parameter to this function is an alias of args->dp, so remove it for clarity before we go adding new callers. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-23xfs: fix remote xattr valuelblk checkDarrick J. Wong1-1/+4
In debugging other problems with generic/753, it turns out that it's possible for the system go to down in the middle of a remote xattr set operation such that the leaf block entry is marked incomplete and valueblk is set to zero. Make this no longer a failure. Cc: <stable@vger.kernel.org> # v4.15 Fixes: 13791d3b833428 ("xfs: scrub extended attribute leaf space") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-23xfs: fix the xattr scrub to detect freemap/entries array collisionsDarrick J. Wong1-27/+27
In the previous patches, we observed that it's possible for there to be freemap entries with zero size but a nonzero base. This isn't an inconsistency per se, but older kernels can get confused by this and corrupt the block, leading to corruption. If we see this, flag the xattr structure for optimization so that it gets rebuilt. Cc: <stable@vger.kernel.org> # v4.15 Fixes: 13791d3b833428 ("xfs: scrub extended attribute leaf space") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-23xfs: strengthen attr leaf block freemap checkingDarrick J. Wong1-0/+55
Check for erroneous overlapping freemap regions and collisions between freemap regions and the xattr leaf entry array. Note that we must explicitly zero out the extra freemaps in xfs_attr3_leaf_compact so that the in-memory buffer has a correctly initialized freemap array to satisfy the new verification code, even if subsequent code changes the contents before unlocking the buffer. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-23xfs: refactor attr3 leaf table size computationDarrick J. Wong2-29/+30
Replace all the open-coded callsites with a single static inline helper. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-23xfs: fix freemap adjustments when adding xattrs to leaf blocksDarrick J. Wong1-8/+28
xfs/592 and xfs/794 both trip this assertion in the leaf block freemap adjustment code after ~20 minutes of running on my test VMs: ASSERT(ichdr->firstused >= ichdr->count * sizeof(xfs_attr_leaf_entry_t) + xfs_attr3_leaf_hdr_size(leaf)); Upon enabling quite a lot more debugging code, I narrowed this down to fsstress trying to set a local extended attribute with namelen=3 and valuelen=71. This results in an entry size of 80 bytes. At the start of xfs_attr3_leaf_add_work, the freemap looks like this: i 0 base 448 size 0 rhs 448 count 46 i 1 base 388 size 132 rhs 448 count 46 i 2 base 2120 size 4 rhs 448 count 46 firstused = 520 where "rhs" is the first byte past the end of the leaf entry array. This is inconsistent -- the entries array ends at byte 448, but freemap[1] says there's free space starting at byte 388! By the end of the function, the freemap is in worse shape: i 0 base 456 size 0 rhs 456 count 47 i 1 base 388 size 52 rhs 456 count 47 i 2 base 2120 size 4 rhs 456 count 47 firstused = 440 Important note: 388 is not aligned with the entries array element size of 8 bytes. Based on the incorrect freemap, the name area starts at byte 440, which is below the end of the entries array! That's why the assertion triggers and the filesystem shuts down. How did we end up here? First, recall from the previous patch that the freemap array in an xattr leaf block is not intended to be a comprehensive map of all free space in the leaf block. In other words, it's perfectly legal to have a leaf block with: * 376 bytes in use by the entries array * freemap[0] has [base = 376, size = 8] * freemap[1] has [base = 388, size = 1500] * the space between 376 and 388 is free, but the freemap stopped tracking that some time ago If we add one xattr, the entries array grows to 384 bytes, and freemap[0] becomes [base = 384, size = 0]. So far, so good. But if we add a second xattr, the entries array grows to 392 bytes, and freemap[0] gets pushed up to [base = 392, size = 0]. This is bad, because freemap[1] hasn't been updated, and now the entries array and the free space claim the same space. The fix here is to adjust all freemap entries so that none of them collide with the entries array. Note that this fix relies on commit 2a2b5932db6758 ("xfs: fix attr leaf header freemap.size underflow") and the previous patch that resets zero length freemap entries to have base = 0. Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: 1da177e4c3f415 ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-23xfs: delete attr leaf freemap entries when emptyDarrick J. Wong1-0/+13
Back in commit 2a2b5932db6758 ("xfs: fix attr leaf header freemap.size underflow"), Brian Foster observed that it's possible for a small freemap at the end of the end of the xattr entries array to experience a size underflow when subtracting the space consumed by an expansion of the entries array. There are only three freemap entries, which means that it is not a complete index of all free space in the leaf block. This code can leave behind a zero-length freemap entry with a nonzero base. Subsequent setxattr operations can increase the base up to the point that it overlaps with another freemap entry. This isn't in and of itself a problem because the code in _leaf_add that finds free space ignores any freemap entry with zero size. However, there's another bug in the freemap update code in _leaf_add, which is that it fails to update a freemap entry that begins midway through the xattr entry that was just appended to the array. That can result in the freemap containing two entries with the same base but different sizes (0 for the "pushed-up" entry, nonzero for the entry that's actually tracking free space). A subsequent _leaf_add can then allocate xattr namevalue entries on top of the entries array, leading to data loss. But fixing that is for later. For now, eliminate the possibility of confusion by zeroing out the base of any freemap entry that has zero size. Because the freemap is not intended to be a complete index of free space, a subsequent failure to find any free space for a new xattr will trigger block compaction, which regenerates the freemap. It looks like this bug has been in the codebase for quite a long time. Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: 1da177e4c3f415 ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-21xfs: fix incorrect context handling in xfs_trans_rollWenwu Hou2-11/+6
The memalloc_nofs_save() and memalloc_nofs_restore() calls are incorrectly paired in xfs_trans_roll. Call path: xfs_trans_alloc() __xfs_trans_alloc() // tp->t_pflags = memalloc_nofs_save(); xfs_trans_set_context() ... xfs_defer_trans_roll() xfs_trans_roll() xfs_trans_dup() // old_tp->t_pflags = 0; xfs_trans_switch_context() __xfs_trans_commit() xfs_trans_free() // memalloc_nofs_restore(tp->t_pflags); xfs_trans_clear_context() The code passes 0 to memalloc_nofs_restore() when committing the original transaction, but memalloc_nofs_restore() should always receive the flags returned from the paired memalloc_nofs_save() call. Before commit 3f6d5e6a468d ("mm: introduce memalloc_flags_{save,restore}"), calling memalloc_nofs_restore(0) would unset the PF_MEMALLOC_NOFS flag, which could cause memory allocation deadlocks[1]. Fortunately, after that commit, memalloc_nofs_restore(0) does nothing, so this issue is currently harmless. Fixes: 756b1c343333 ("xfs: use current->journal_info for detecting transaction recursion") Link: https://lore.kernel.org/linux-xfs/20251104131857.1587584-1-leo.lilong@huawei.com [1] Signed-off-by: Wenwu Hou <hwenwur@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: always allocate the free zone with the lowest indexHans Holmberg2-31/+17
Zones in the beginning of the address space are typically mapped to higer bandwidth tracks on HDDs than those at the end of the address space. So, in stead of allocating zones "round robin" across the whole address space, always allocate the zone with the lowest index. This increases average write bandwidth for overwrite workloads when less than the full capacity is being used. At ~50% utilization this improves bandwidth for a random file overwrite benchmark with 128MiB files and 256MiB zone capacity by 30%. Running the same benchmark with small 2-8 MiB files at 67% capacity shows no significant difference in performance. Due to heavy fragmentation the whole zone range is in use, greatly limiting the number of free zones with high bw. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: promote metadata directories and large block supportDarrick J. Wong3-14/+0
Large block support was merged upstream in 6.12 (Dec 2024) and metadata directories was merged in 6.13 (Jan 2025). We've not received any serious complaints about the ondisk formats of these two features in the past year, so let's remove the experimental warnings. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: use blkdev_get_zone_info to simplify zone reportingChristoph Hellwig1-64/+50
Unwind the callback based programming model by querying the cached zone information using blkdev_get_zone_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: check that used blocks are smaller than the write pointerChristoph Hellwig1-0/+7
Any used block must have been written, this reject used blocks > write pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: split and refactor zone validationChristoph Hellwig3-114/+68
Currently xfs_zone_validate mixes validating the software zone state in the XFS realtime group with validating the hardware state reported in struct blk_zone and deriving the write pointer from that. Move all code that works on the realtime group to xfs_init_zone, and only keep the hardware state validation in xfs_zone_validate. This makes the code more clear, and allows for better reuse in userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: pass the write pointer to xfs_init_zoneChristoph Hellwig1-29/+37
Move the two methods to query the write pointer out of xfs_init_zone into the callers, so that xfs_init_zone doesn't have to bother with the blk_zone structure and instead operates purely at the XFS realtime group level. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: add a xfs_rtgroup_raw_size helperChristoph Hellwig1-0/+15
Add a helper to figure the on-disk size of a group, accounting for the XFS_SB_FEAT_INCOMPAT_ZONE_GAPS feature if needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: add missing forward declaration in xfs_zones.hDamien Le Moal1-0/+1
Add the missing forward declaration for struct blk_zone in xfs_zones.h. This avoids headaches with the order of header file inclusion to avoid compilation errors. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: remove xfs_attr_leaf_hasnameChristoph Hellwig1-51/+24
The calling convention of xfs_attr_leaf_hasname() is problematic, because it returns a NULL buffer when xfs_attr3_leaf_read fails, a valid buffer when xfs_attr3_leaf_lookup_int returns -ENOATTR or -EEXIST, and a non-NULL buffer pointer for an already released buffer when xfs_attr3_leaf_lookup_int fails with other error values. Fix this by simply open coding xfs_attr_leaf_hasname in the callers, so that the buffer release code is done by each caller of xfs_attr3_leaf_read. Cc: stable@vger.kernel.org # v5.19+ Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Reported-by: Mark Tinguely <mark.tinguely@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: mark data structures corrupt on EIO and ENODATADarrick J. Wong3-0/+8
I learned a few things this year: first, blk_status_to_errno can return ENODATA for critical media errors; and second, the scrub code doesn't mark data structures as corrupt on ENODATA or EIO. Currently, scrub failing to capture these errors isn't all that impactful -- the checking code will exit to userspace with EIO/ENODATA, and xfs_scrub will log a complaint and exit with nonzero status. Most people treat fsck tools failing as a sign that the fs is corrupt, but online fsck should mark the metadata bad and keep moving. Cc: stable@vger.kernel.org # v4.15 Fixes: 4700d22980d459 ("xfs: create helpers to record and deal with scrub problems") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: rework zone GC buffer managementChristoph Hellwig1-47/+59
The double buffering where just one scratch area is used at a time does not efficiently use the available memory. It was originally implemented when GC I/O could happen out of order, but that was removed before upstream submission to avoid fragmentation. Now that all GC I/Os are processed in order, just use a number of buffers as a simple ring buffer. For a synthetic benchmark that fills 256MiB HDD zones and punches out holes to free half the space this leads to a decrease of GC time by a little more than 25%. Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and benchmarking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: use bio_reuse in the zone GC codeChristoph Hellwig1-6/+1
Replace our somewhat fragile code to reuse the bio, which caused a regression in the past with the block layer bio_reuse helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21block: add a bio_reuse helperChristoph Hellwig2-0/+35
Add a helper to allow an existing bio to be resubmitted without having to re-add the payload. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: directly include xfs_platform.hChristoph Hellwig194-204/+193
The xfs.h header conflicts with the public xfs.h in xfsprogs, leading to a spurious difference in all shared libxfs files that have to include libxfs_priv.h in userspace. Directly include xfs_platform.h so that we can add a header of the same name to xfsprogs and remove this major annoyance for the shared code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: move the remaining content from xfs.h to xfs_platform.hChristoph Hellwig2-17/+16
Move the global defines from xfs.h to xfs_platform.h to prepare for removing xfs.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: include global headers first in xfs_platform.hChristoph Hellwig1-14/+10
Ensure we have all kernel headers included by the time we do our own thing, just like the rest of the tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: rename xfs_linux.h to xfs_platform.hChristoph Hellwig2-4/+4
Rename xfs_linux.h to prepare for including including it directly from source files including those shared with xfsprogs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: factor out a xlog_write_space_advance helperChristoph Hellwig1-12/+20
Add a new xlog_write_space_advance that returns the current place in the iclog that data is written to, and advances the various counters by the amount taken from xlog_write_iovec, and also use it xlog_write_partial, which open codes the counter adjustments, but misses the asserts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: improve the iclog space assert in xlog_write_iovecChristoph Hellwig1-1/+1
We need enough space for the length we copy into the iclog, not just some space, so tighten up the check a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: add a xlog_write_space_left helperChristoph Hellwig1-9/+13
Various places check how much space is left in the current iclog, add a helper for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: improve the calling convention for the xlog_write helpersChristoph Hellwig1-110/+77
The xlog_write chain passes around the same seven variables that are often passed by reference. Add a xlog_write_data structure to contain them to improve code generation and readability. This change increases the generated code size by about 140 bytes for my x86_64 build, which is hopefully worth the much easier to follow code: $ size fs/xfs/xfs_log.o* text data bss dec hex filename 29300 1730 176 31206 79e6 fs/xfs/xfs_log.o 29160 1730 176 31066 795a fs/xfs/xfs_log.o.old Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: regularize iclog space accounting in xlog_write_partialChristoph Hellwig1-3/+4
When xlog_write_partial splits a log region over multiple iclogs, it has to include the continuation ophder in the length requested for the new iclog. Currently is simply adds that to the request, which makes the accounting of the used space below look slightly different from the other users of iclog space that decrement it. To prepare for more code sharing, add the ophdr size to the len variable that tracks the number of bytes still are left in this xlog_write operation before the calling xlog_write_get_more_iclog_space, and then decrement it later when consuming that space. This changes the value of len when xlog_write_get_more_iclog_space returns an error, but as nothing looks at len in that case the difference doesn't matter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: move struct xfs_log_vec to xfs_log_priv.hChristoph Hellwig2-12/+12
The log_vec is a private type for the log/CIL code and should not be exposed to anything else. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: move struct xfs_log_iovec to xfs_log_priv.hChristoph Hellwig2-7/+6
This structure is now only used by the core logging and CIL code. Also remove the unused typedef. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: improve the ->iop_format interfaceChristoph Hellwig14-194/+180
Export a higher level interface to format log items. The xlog_format_buf structure is hidden inside xfs_log_cil.c and only accessed using two helpers (and a wrapper build on top), hiding details of log iovecs from the log items. This also allows simply using an index into lv_iovecp instead of keeping a cursor vec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: set lv_bytes in xlog_write_one_vecChristoph Hellwig1-2/+3
lv_bytes is mostly just use by the CIL code, but has crept into the low-level log writing code to decide on a full or partial iclog write. Ensure it is valid even for the special log writes that don't go through the CIL by initializing it in xlog_write_one_vec. Note that even without this fix, the checkpoint commits would never trigger a partial iclog write, as they have no payload beyond the opheader. The unmount record on the other hand could in theory trigger a an overflow of the iclog, but given that is has never been seen in the wild this has probably been masked by the small size of it and the fact that the unmount process does multiple log forces before writing the unmount record and we thus usually operate on an empty or almost empty iclog. Fixes: 110dc24ad2ae ("xfs: log vector rounding leaks log space") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-21xfs: add a xlog_write_one_vec helperChristoph Hellwig3-24/+24
Add a wrapper for xlog_write for the two callers who need to build a log_vec and add it to a single-entry chain instead of duplicating the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-19Linux 6.19-rc6v6.19-rc6Linus Torvalds1-1/+1
2026-01-19Merge tag 'landlock-6.19-rc6' of ↵Linus Torvalds14-269/+170
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fixes from Mickaël Salaün: "This fixes TCP handling, tests, documentation, non-audit elided code, and minor cosmetic changes" * tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Clarify documentation for the IOCTL access right selftests/landlock: Properly close a file descriptor landlock: Improve the comment for domain_is_scoped selftests/landlock: Use scoped_base_variants.h for ptrace_test selftests/landlock: Fix missing semicolon selftests/landlock: Fix typo in fs_test landlock: Optimize stack usage when !CONFIG_AUDIT landlock: Fix spelling landlock: Clean up hook_ptrace_access_check() landlock: Improve erratum documentation landlock: Remove useless include landlock: Fix wrong type usage selftests/landlock: NULL-terminate unix pathname addresses selftests/landlock: Remove invalid unix socket bind() selftests/landlock: Add missing connect(minimal AF_UNSPEC) test selftests/landlock: Fix TCP bind(AF_UNSPEC) test case landlock: Fix TCP handling of short AF_UNSPEC addresses landlock: Fix formatting
2026-01-19Merge tag 'cgroup-for-6.19-rc5-fixes-2' of ↵Linus Torvalds4-16/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Add Chen Ridong as cpuset reviewer - Add SPDX license identifiers to cgroup files that were missing them * tag 'cgroup-for-6.19-rc5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: kernel: cgroup: Add LGPL-2.1 SPDX license ID to legacy_freezer.c kernel: cgroup: Add SPDX-License-Identifier lines MAINTAINERS: Add Chen Ridong as cpuset reviewer
2026-01-19Merge tag 'ext4_for_linus-6.19-rc6' of ↵Linus Torvalds3-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: - Fix an inconsistency in structure size on 32-bit platforms caused by padding differences for the new EXT4_IOC_[GS]ET_TUNE_SB_PARAM ioctls - Fix a buffer leak on the error path when dropping the refcount an xattr value stored in an inode - Fix missing locking on the error path for the file defragmentation ioctl leading to a BUG * tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref ext4: add missing down_write_data_sem in mext_move_extent(). ext4: fix ext4_tune_sb_params padding
2026-01-19Merge tag 'dmaengine-fix-6.19' of ↵Linus Torvalds20-73/+168
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "A bunch of driver fixes for: - dma mask fix for mmp pdma driver - Xilinx regmap max register, uninitialized addr_width fix - device leak fix for bunch of drivers in the subsystem - stm32 dmamux, TI crossbar driver fixes for device & of node leak and route allocation cleanup - Tegra use afer free fix - Memory leak fix in Qualcomm gpi and omap-dma driver - compatible fix for apple driver" * tag 'dmaengine-fix-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (25 commits) dmaengine: apple-admac: Add "apple,t8103-admac" compatible dmaengine: omap-dma: fix dma_pool resource leak in error paths dmaengine: qcom: gpi: Fix memory leak in gpi_peripheral_config() dmaengine: sh: rz-dmac: Fix rz_dmac_terminate_all() dmaengine: xilinx_dma: Fix uninitialized addr_width when "xlnx,addrwidth" property is missing dmaengine: tegra-adma: Fix use-after-free dmaengine: fsl-edma: Fix clk leak on alloc_chan_resources failure dmaengine: mmp_pdma: Fix race condition in mmp_pdma_residue() dmaengine: ti: k3-udma: fix device leak on udma lookup dmaengine: ti: dma-crossbar: clean up dra7x route allocation error paths dmaengine: ti: dma-crossbar: fix device leak on am335x route allocation dmaengine: ti: dma-crossbar: fix device leak on dra7x route allocation dmaengine: stm32: dmamux: clean up route allocation error labels dmaengine: stm32: dmamux: fix OF node leak on route allocation failure dmaengine: stm32: dmamux: fix device leak on route allocation dmaengine: sh: rz-dmac: fix device leak on probe failure dmaengine: lpc32xx-dmamux: fix device leak on route allocation dmaengine: lpc18xx-dmamux: fix device leak on route allocation dmaengine: idxd: fix device leaks on compat bind and unbind dmaengine: dw: dmamux: fix OF node leak on route allocation failure ...
2026-01-19Merge tag 'phy-fixes-6.19' of ↵Linus Torvalds11-50/+33
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy fixes from Vinod Koul: "A bunch of driver fixes: - Freescale typec orientation switch fix, clearing register fix, assertion of phy reset during power on - Qualcomm pcs register clear before using - stm one off fix - TI runtimepm error handling, regmap leak fixes - Rockchip gadget mode disconnection and disruption fixes - Tegra register level fix - Broadcom pointer cast warning fix" * tag 'phy-fixes-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: freescale: imx8m-pcie: assert phy reset during power on phy: rockchip: inno-usb2: Fix a double free bug in rockchip_usb2phy_probe() phy: broadcom: ns-usb3: Fix Wvoid-pointer-to-enum-cast warning (again) phy: tegra: xusb: Explicitly configure HS_DISCON_LEVEL to 0x7 phy: rockchip: inno-usb2: fix communication disruption in gadget mode phy: rockchip: inno-usb2: fix disconnection in gadget mode phy: ti: gmii-sel: fix regmap leak on probe failure phy: sparx5-serdes: make it selectable for ARCH_LAN969X phy: ti: da8xx-usb: Handle devm_pm_runtime_enable() errors phy: stm32-usphyc: Fix off by one in probe() phy: qcom-qusb2: Fix NULL pointer dereference on early suspend phy: fsl-imx8mq-usb: Clear the PCS_TX_SWING_FULL field before using it dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Update pcie phy bindings for qcs8300 phy: fsl-imx8mq-usb: fix typec orientation switch when built as module
2026-01-18Merge tag 'soundwire-6.19-fixes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fix from Vinod Koul: - Single off-by-one fix for allocating slave id * tag 'soundwire-6.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: soundwire: bus: fix off-by-one when allocating slave IDs
2026-01-18Merge tag 'usb-6.19-rc6' of ↵Linus Torvalds22-67/+166
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes and new device ids for 6.19-rc6 Included in here are: - new usb-serial device ids - dwc3-apple driver fixes to get things working properly on that hardware platform - ohci/uhci platfrom driver module soft-deps with ehci to remove a runtime warning that sometimes shows up on some platforms. - quirk for broken devices that can not handle reading the BOS descriptor from them without going crazy. - usb-serial driver fixes - xhci driver fixes - usb gadget driver fixes All of these except for the last xhci fix has been in linux-next for a while. The xhci fix has been reported by others to solve the issue for them, so should be ok" * tag 'usb-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: xhci: sideband: don't dereference freed ring when removing sideband endpoint usb: gadget: uvc: retry vb2_reqbufs() with vb_vmalloc_memops if use_sg fail usb: gadget: uvc: return error from uvcg_queue_init() usb: gadget: uvc: fix interval_duration calculation usb: gadget: uvc: fix req_payload_size calculation usb: dwc3: apple: Ignore USB role switches to the active role usb: host: xhci-tegra: Use platform_get_irq_optional() for wake IRQs USB: OHCI/UHCI: Add soft dependencies on ehci_platform usb: dwc3: apple: Set USB2 PHY mode before dwc3 init USB: serial: f81232: fix incomplete serial port generation USB: serial: ftdi_sio: add support for PICAXE AXE027 cable USB: serial: option: add Telit LE910 MBIM composition usb: core: add USB_QUIRK_NO_BOS for devices that hang on BOS descriptor dt-bindings: usb: qcom,dwc3: Correct MSM8994 interrupts dt-bindings: usb: qcom,dwc3: Correct IPQ5018 interrupts tcpm: allow looking for role_sw device in the main node usb: dwc3: Check for USB4 IP_NAME
2026-01-18Merge tag 'i2c-for-6.19-rc6' of ↵Linus Torvalds4-12/+62
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - riic, imx-lpi2c: suspend/resume fixes - qcom-geni: DMA handling fix - iproc: correct DT binding description * tag 'i2c-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx-lpi2c: change to PIO mode in system-wide suspend/resume progress i2c: qcom-geni: make sure I2C hub controllers can't use SE DMA i2c: riic: Move suspend handling to NOIRQ phase dt-bindings: i2c: brcm,iproc-i2c: Allow 2 reg entries for brcm,iproc-nic-i2c
2026-01-18Merge tag 'edac_urgent_for_v6.19_rc6' of ↵Linus Torvalds2-8/+12
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: "Make sure the memory-mapped memory controller registers BAR gets unmapped when the driver memory allocation fails Fix that in both x38 and i3200 EDAC drivers as former has copied the bug from the latter, it looks like" * tag 'edac_urgent_for_v6.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/x38: Fix a resource leak in x38_probe1() EDAC/i3200: Fix a resource leak in i3200_probe1()
2026-01-18Merge tag 'x86-urgent-2026-01-18' of ↵Linus Torvalds3-4/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix resctrl initialization on Hygon CPUs - Fix resctrl memory bandwidth counters on Hygon CPUs - Fix x86 self-tests build bug * tag 'x86-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86: Add selftests include path for kselftest.h after centralization x86/resctrl: Fix memory bandwidth counter width for Hygon x86/resctrl: Add missing resctrl initialization for Hygon
2026-01-18Merge tag 'timers-urgent-2026-01-18' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix the update_needs_ipi() check in the hrtimer code that may result in incorrect skipping of hrtimer IPIs" * tag 'timers-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix softirq base check in update_needs_ipi()
2026-01-18Merge tag 'sched-urgent-2026-01-18' of ↵Linus Torvalds6-56/+59
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc deadline scheduler fixes, mainly for a new category of bugs that were discovered and fixed recently: - Fix a race condition in the DL server - Fix a DL server bug which can result in incorrectly going idle when there's work available - Fix DL server bug which triggers a WARN() due to broken get_prio_dl() logic and subsequent misbehavior - Fix double update_rq_clock() calls - Fix setscheduler() assumption about static priorities - Make sure balancing callbacks are always called - Plus a handful of preparatory commits for the fixes" * tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Use ENQUEUE_MOVE to allow priority change sched: Deadline has dynamic priority sched: Audit MOVE vs balance_callbacks sched: Fold rq-pin swizzle into __balance_callbacks() sched/deadline: Avoid double update_rq_clock() sched/deadline: Ensure get_prio_dl() is up-to-date sched/deadline: Fix server stopping with runnable tasks sched: Provide idle_rq() helper sched/deadline: Fix potential race in dl_add_task_root_domain() sched/deadline: Remove unnecessary comment in dl_add_task_root_domain()
2026-01-18Merge tag 'objtool-urgent-2026-01-18' of ↵Linus Torvalds2-12/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "Fix two objtool build failures that trigger in uncommon build environments" * tag 'objtool-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: fix build failure due to missing libopcodes check objtool: fix compilation failure with the x32 toolchain