|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA
xfs: reload entire iunlink lists
This is the second part of correcting XFS to reload the incore unlinked
inode list from the ondisk contents. Whereas part one tackled failures
from regular filesystem calls, this part takes on the problem of needing
to reload the entire incore unlinked inode list on account of somebody
loading an inode that's in the /middle/ of an unlinked list. This
happens during quotacheck, bulkstat, or even opening a file by handle.
In this case we don't know the length of the list that we're reloading,
so we don't want to create a new unbounded memory load while holding
resources locked. Instead, we'll target UNTRUSTED iget calls to reload
the entire bucket.
Note that this changes the definition of the incore unlinked inode list
slightly -- i_prev_unlinked == 0 now means "not on the incore list".
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'fix-iunlink-list-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: make inode unlinked bucket recovery work with quotacheck
xfs: reload entire unlinked bucket lists
xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA
xfs: reload the last iunlink item
It turns out that there are some serious bugs in how xfs handles the
unlinked inode lists. Way back before 4.14, there was a bug where a ro
mount of a dirty filesystem would recover the log but neglect to purge
the unlinked list. This leads to cleanly unmounted filesystems with
unlinked inodes. Starting around 5.15, we also converted the codebase
to maintain a doubly-linked incore unlinked list. However, we never
provided the ability to load the incore list from disk. If someone
tries to allocate an O_TMPFILE file on a clean fs with a pre-existing
unlinked list or even deletes a file, the code will fail and the fs
shuts down.
This first part of the correction effort adds the ability to load the
first inode in the bucket when unlinking a file; and to load the next
inode in the list when inactivating (freeing) an inode.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'fix-iunlink-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: load uncached unlinked inodes into memory on demand
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA
xfs: fix EFI recovery livelocks
This series fixes a customer-reported transaction reservation bug
introduced ten years ago that could result in livelocks during log
recovery. Log intent item recovery single-steps each step of a deferred
op chain, which means that each step only needs to allocate one
transaction's worth of space in the log, not an entire chain all at
once. This single-stepping is critical to unpinning the log tail since
there's nobody else to do it for us.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'fix-efi-recovery-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: reserve less log space when recovering log intent items
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA
xfs: fix ro mounting with unknown rocompat features
Dave pointed out some failures in xfs/270 when he upgraded Debian
unstable and util-linux started using the new mount apis. Upon further
inquiry I noticed that XFS is quite a hot mess when it encounters a
filesystem with unrecognized rocompat bits set in the superblock.
Whereas we used to allow readonly mounts under these conditions, a
change to the sb write verifier several years ago resulted in the
filesystem going down immediately because the post-mount log cleaning
writes the superblock, which trips the sb write verifier on the
unrecognized rocompat bit. I made the observation that the ROCOMPAT
features RMAPBT and REFLINK both protect new log intent item types,
which means that we actually cannot support recovering the log if we
don't recognize all the rocompat bits.
Therefore -- fix inode inactivation to work when we're recovering the
log, disallow recovery when there's unrecognized rocompat bits, and
don't clean the log if doing so would trip the rocompat checks.
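The three rules above can be read as a small decision table. What
follows is only a sketch of that policy as this message describes it,
not the kernel code: the function name, the Decision enum, and the
choice of which bits count as "known" are all made up for illustration.

```python
from enum import Enum

# Illustrative ro-compat bit values; which bits are "known" is an assumption.
RO_COMPAT_FINOBT = 1 << 0
RO_COMPAT_RMAPBT = 1 << 1
RO_COMPAT_REFLINK = 1 << 2
KNOWN_RO_COMPAT = RO_COMPAT_FINOBT | RO_COMPAT_RMAPBT


class Decision(Enum):
    MOUNT = "mount normally"
    MOUNT_SKIP_LOG_CLEAN = "mount read-only, leave the log alone"
    REFUSE = "refuse the mount"


def mount_decision(sb_ro_compat, readonly, log_dirty, known=KNOWN_RO_COMPAT):
    """Hypothetical policy helper modelling the rules in the text."""
    unknown = sb_ro_compat & ~known
    if not unknown:
        return Decision.MOUNT
    if not readonly:
        # Unknown rocompat bits never permit a read-write mount.
        return Decision.REFUSE
    if log_dirty:
        # Recovery might need intent item types we do not understand.
        return Decision.REFUSE
    # Clean log, read-only: mount, but do not write the superblock.
    return Decision.MOUNT_SKIP_LOG_CLEAN
```

For example, a read-only mount of a clean filesystem that carries one
unrecognized bit is allowed but skips log cleaning, while the same
filesystem with a dirty log is refused.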
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'fix-ro-mounts-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: fix log recovery when unknown rocompat bits are set
xfs: allow inode inactivation during a ro mount log recovery
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA
xfs: fix cpu hotplug mess
Ritesh and Eric separately reported crashes in XFS's hook function for
CPU hot remove if the remove event races with a filesystem being
mounted. I also noticed via generic/650 that once in a while the log
will shut down over an apparent overrun of a transaction reservation;
this turned out to be due to CIL percpu list aggregation failing to pick
up the percpu list items from a dying CPU.
Either way, the solution here is to eliminate the need for a CPU dying
hook by using a private cpumask to track which CPUs have added to their
percpu lists directly, and iterating with that mask. This fixes the log
problems and (I think) solves a theoretical UAF bug in the inodegc code
too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'fix-percpu-lists-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: remove CPU hotplug infrastructure
xfs: remove the all-mounts list
xfs: use per-mount cpumask to track nonempty percpu inodegc lists
xfs: fix per-cpu CIL structure aggregation racing with dying cpus
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA
xfs: fix fsmap cursor handling
This patchset addresses an integer overflow bug that Dave Chinner found
in how fsmap handles figuring out where in the record set we left off
when userspace calls back after the first call filled up all the
designated record space.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'fix-fsmap-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: fix an agbno overflow in __xfs_getfsmap_datadev
|
|
Teach quotacheck to reload the unlinked inode lists when walking the
inode table. This requires extra state handling, since it's possible
that a reloaded inode will get inactivated before quotacheck tries to
scan it; in this case, we need to ensure that the reloaded inode does
not have dquots attached when it is freed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
shrikanth hegde reports that filesystems fail shortly after mount with
the following failure:
WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs]
This of course is the WARN_ON_ONCE in xfs_iunlink_lookup:
ip = radix_tree_lookup(&pag->pag_ici_root, agino);
if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... }
From diagnostic data collected by the bug reporters, it would appear
that we cleanly mounted a filesystem that contained unlinked inodes.
Unlinked inodes are only processed as a final step of log recovery,
which means that clean mounts do not process the unlinked list at all.
Prior to the introduction of the incore unlinked lists, this wasn't a
problem because the unlink code would (very expensively) traverse the
entire ondisk metadata iunlink chain to keep things up to date.
However, the incore unlinked list code complains when it realizes that
it is out of sync with the ondisk metadata and shuts down the fs, which
is bad.
Ritesh proposed to solve this problem by unconditionally parsing the
unlinked lists at mount time, but this imposes a mount time cost for
every filesystem to catch something that should be very infrequent.
Instead, let's target the places where we can encounter a next_unlinked
pointer that refers to an inode that is not in cache, and load it into
cache.
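As a rough userspace model of the difference (the names below are
invented for the sketch and are not the kernel functions), compare a
lookup that treats a cache miss as corruption with one that loads the
missing inode on demand:

```python
class ShutdownError(Exception):
    """Stands in for the filesystem shutting down."""


def lookup_strict(cache, agino):
    # Old behaviour in this model: a miss means incore and ondisk disagree.
    inode = cache.get(agino)
    if inode is None:
        raise ShutdownError(f"inode {agino} not in cache")
    return inode


def lookup_or_load(cache, disk, agino):
    # Targeted fix in this model: pull the inode in from disk on a miss.
    inode = cache.get(agino)
    if inode is None:
        if agino not in disk:
            raise ShutdownError(f"inode {agino} not on disk either")
        inode = dict(disk[agino])  # copy the "ondisk" record into cache
        cache[agino] = inode
    return inode
```

Here cache and disk are plain dictionaries keyed by inode number. Only
the inode that is actually needed gets loaded, so mounts that never
touch the unlinked list pay nothing.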
Note: This patch does not address the problem of iget loading an inode
from the middle of the iunlink list and needing to set i_prev_unlinked
correctly.
Reported-by: shrikanth hegde <sshegde@linux.vnet.ibm.com>
Triaged-by: Ritesh Harjani <ritesh.list@gmail.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Wengang Wang reports that a customer's system was running a number of
truncate operations on a filesystem with a very small log. Contention
on the reserve heads led to other threads stalling on smaller updates
(e.g. mtime updates) long enough to result in the node being rebooted
on account of the lack of responsiveness. The node failed to recover
because log recovery of an EFI became stuck waiting for a grant of
reserve space. From Wengang's report:
"For the file deletion, log bytes are reserved basing on
xfs_mount->tr_itruncate which is:
tr_logres = 175488,
tr_logcount = 2,
tr_logflags = XFS_TRANS_PERM_LOG_RES,
"You see it's a permanent log reservation with two log operations (two
transactions in rolling mode). After calculation (xlog_calc_unit_res()
adds space for various log headers), the final log space needed per
transaction changes from 175488 to 180208 bytes. So the total log
space needed is 360416 bytes (180208 * 2). [That quantity] of log space
(360416 bytes) needs to be reserved for both run time inode removing
(xfs_inactive_truncate()) and EFI recover (xfs_efi_item_recover())."
In other words, runtime pre-reserves 360K of space in anticipation of
running a chain of two transactions in which each transaction gets a
180K reservation.
Now that we've allocated the transaction, we delete the bmap mapping,
log an EFI to free the space, and roll the transaction as part of
finishing the deferops chain. Rolling creates a new xfs_trans which
shares its ticket with the old transaction. Next, xfs_trans_roll calls
__xfs_trans_commit with regrant == true, which calls xlog_cil_commit
with the same regrant parameter.
xlog_cil_commit calls xfs_log_ticket_regrant, which decrements t_cnt and
subtracts t_curr_res from the reservation and write heads.
If the filesystem is fresh and the first transaction only used (say)
20K, then t_curr_res will be 160K, and we give that much reservation
back to the reservation head. Or if the file is really fragmented and
the first transaction actually uses 170K, then t_curr_res will be 10K,
and that's what we give back to the reservation.
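The arithmetic in the preceding paragraphs can be checked with a toy
ticket model. This is a sketch under simplifying assumptions: the
Ticket class and the reserve/roll helpers are hypothetical names, and
only the bookkeeping described in the text is modelled (the unit
reservation, the count, and what a roll hands back).

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    unit_res: int   # bytes reserved per transaction in the chain
    cnt: int        # transactions still pre-reserved
    curr_res: int   # unused bytes left in the current transaction


def reserve(unit_res, logcount):
    """Return a new ticket and the total bytes taken from the log."""
    return Ticket(unit_res, logcount, unit_res), unit_res * logcount


def roll(ticket, used):
    """Commit with regrant: hand back whatever this step did not use."""
    ticket.curr_res -= used
    giveback = ticket.curr_res
    if ticket.cnt > 0:
        ticket.cnt -= 1
    ticket.curr_res = ticket.unit_res  # next step starts with a full unit
    return giveback


ticket, reserved = reserve(180208, 2)
print(reserved)             # total pre-reserved at runtime
print(roll(ticket, 20000))  # given back when the first step used ~20K
```

With the figures from the report, the total is 180208 * 2 = 360416
bytes, and a first step that uses 20000 bytes returns 160208 bytes,
leaving one unit of 180208 bytes for the second step.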
Having done that, we're now headed into the second transaction with an
EFI and 180K of reservation. Other threads apparently consumed all the
reservation for smaller transactions, such as timestamp updates.
Now let's say the first transaction gets written to disk and we crash
without ever completing the second transaction. Now we remount the fs,
log recovery finds the unfinished EFI, and calls xfs_efi_recover to
finish the EFI. However, xfs_efi_recover starts a new tr_itruncate
transaction, which asks for 360K log reservation. This is a lot more
than the 180K that we had reserved at the time of the crash. If the
first EFI to be recovered is also pinning the tail of the log, we will
be unable to free any space in the log, and recovery livelocks.
Wengang confirmed this:
"Now we have the second transaction which has 180208 log bytes reserved
too. The second transaction is supposed to process intents including
extent freeing. With my hacking patch, I blocked the extent freeing 5
hours. So in that 5 hours, 180208 (NOT 360416) log bytes are reserved.
"With my test case, other transactions (update timestamps) then happen.
As my hacking patch pins the journal tail, those timestamp-updating
transactions finally use up (almost) all the left available log space
(in memory in on disk). And finally the on disk (and in memory)
available log space goes down near to 180208 bytes. Those 180208 bytes
are reserved by [the] second (extent-free) transaction [in the chain]."
Wengang and I noticed that EFI recovery starts a transaction, completes
one step of the chain, and commits the transaction without completing
any other steps of the chain. Those subsequent steps are completed by
xlog_finish_defer_ops, which allocates yet another transaction to
finish the rest of the chain. That transaction gets the same tr_logres
as the head transaction, but with tr_logcount = 1 to force regranting
with every roll to avoid livelocks.
In other words, we already figured this out in commit 929b92f64048d
("xfs: xfs_defer_capture should absorb remaining transaction
reservation"), but should have applied that logic to each intent item's
recovery function. For Wengang's case, the xfs_trans_alloc call in the
EFI recovery function should only be asking for a single transaction's
worth of log reservation -- 180K, not 360K.
Quoting Wengang again:
"With log recovery, during EFI recovery, we use tr_itruncate again to
reserve two transactions that needs 360416 log bytes. Reserving 360416
bytes fails [stalls] because we now only have about 180208 available.
"Actually during the EFI recover, we only need one transaction to free
the extents just like the 2nd transaction at RUNTIME. So it only needs
to reserve 180208 rather than 360416 bytes. We have (a bit) more than
180208 available log bytes on disk, so [if we decrease the reservation
to 180K] the reservation goes and the recovery [finishes]. That is to
say: we can fix the log recover part to fix the issue. We can introduce
a new xfs_trans_res xfs_mount->tr_ext_free
{
tr_logres = 175488,
tr_logcount = 0,
tr_logflags = 0,
}
"and use tr_ext_free instead of tr_itruncate in EFI recover."
However, I don't think it quite makes sense to create an entirely new
transaction reservation type to handle single-stepping during log
recovery. Instead, we should copy the transaction reservation
information in the xfs_mount, change tr_logcount to 1, and pass that
into xfs_trans_alloc. We know this won't risk changing the min log size
computation since we always ask for a fraction of the reservation for
all known transaction types.
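A minimal sketch of that idea, with made-up names (TransRes,
recovery_resv, space_needed) and a placeholder flag value rather than
the real kernel definitions: copy the reservation, force the count to
1, and leave the original untouched.

```python
from dataclasses import dataclass, replace

PERM_LOG_RES = 1  # placeholder flag value, for illustration only


@dataclass(frozen=True)
class TransRes:
    logres: int
    logcount: int
    logflags: int


def recovery_resv(res):
    """Copy a reservation but ask for a single transaction's worth."""
    return replace(res, logcount=1)


def space_needed(res, header_overhead):
    """Log bytes requested up front, including per-unit header space."""
    return (res.logres + header_overhead) * res.logcount


tr_itruncate = TransRes(logres=175488, logcount=2, logflags=PERM_LOG_RES)
overhead = 180208 - 175488  # header space per unit, from the report
print(space_needed(tr_itruncate, overhead))                 # runtime
print(space_needed(recovery_resv(tr_itruncate), overhead))  # recovery
```

Using the numbers from Wengang's report, the runtime request works out
to 360416 bytes and the recovery request to 180208 bytes, which fits
in the space that was still available after the crash.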
This looks like it's been lurking in the codebase since commit
3d3c8b5222b92, which changed the xfs_trans_reserve call in
xlog_recover_process_efi to use the tr_logcount in tr_itruncate.
That changed the EFI recovery transaction from making a
non-XFS_TRANS_PERM_LOG_RES request for one transaction's worth of log
space to an XFS_TRANS_PERM_LOG_RES request for two transactions' worth.
Fixes: 3d3c8b5222b92 ("xfs: refactor xfs_trans_reserve() interface")
Complements: 929b92f64048d ("xfs: xfs_defer_capture should absorb remaining transaction reservation")
Suggested-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Srikanth C S <srikanth.c.s@oracle.com>
[djwong: apply the same transformation to all log intent recovery]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Log recovery has always run on read only mounts, even where the primary
superblock advertises unknown rocompat bits. Due to a misunderstanding
between Eric and Darrick back in 2018, we accidentally changed the
superblock write verifier to shut down the fs over that exact scenario.
As a result, the log cleaning that occurs at the end of the mounting
process fails if there are unknown rocompat bits set.
As we now allow writing of the superblock if there are unknown rocompat
bits set on a RO mount, we no longer want to turn off RO state to allow
log recovery to succeed on a RO mount. Hence we also remove all the
(now unnecessary) RO state toggling from the log recovery path.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
The previous patch to reload unrecovered unlinked inodes when adding a
newly created inode to the unlinked list is missing a key piece of
functionality. It doesn't handle the case that someone calls xfs_iget
on an inode that is not the last item in the incore list. For example,
if at mount time the ondisk iunlink bucket looks like this:
AGI -> 7 -> 22 -> 3 -> NULL
None of these three inodes are cached in memory. Now let's say that
someone tries to open inode 3 by handle. We need to walk the list to
make sure that inodes 7 and 22 get loaded cold, and that the
i_prev_unlinked of inode 3 gets set to 22.
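The walk can be modelled in a few lines. This is a userspace sketch,
not the kernel implementation: reload_bucket and the dictionary layout
are invented here, and treating NULLAGINO as the back pointer of the
first inode in the bucket is an assumption of the model (0 is used for
"not on the incore list").

```python
NULLAGINO = 0xFFFFFFFF  # end-of-list marker in this model
NOT_ON_LIST = 0         # prev value meaning "not on the incore list"


def reload_bucket(bucket_head, ondisk_next, cache):
    """Walk one ondisk bucket and rebuild the incore back pointers.

    ondisk_next maps agino -> next agino on disk.
    cache maps agino -> {"prev": ..., "next": ...} for incore inodes.
    """
    prev = NULLAGINO
    agino = bucket_head
    seen = set()
    while agino != NULLAGINO:
        if agino in seen:
            raise ValueError("cycle in unlinked list")
        seen.add(agino)
        # Load the inode "cold" if it is not cached yet.
        inode = cache.setdefault(
            agino, {"prev": NOT_ON_LIST, "next": NULLAGINO})
        inode["next"] = ondisk_next[agino]
        inode["prev"] = prev
        prev = agino
        agino = inode["next"]
    return cache


ondisk = {7: 22, 22: 3, 3: NULLAGINO}
cache = reload_bucket(7, ondisk, {})
print(cache[3]["prev"])
```

For the bucket in the example, the walk loads inodes 7 and 22 along
the way and leaves inode 3 with a back pointer of 22.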
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
In the next patch, we're going to prohibit log recovery if the primary
superblock contains an unrecognized rocompat feature bit even on
readonly mounts. This requires removing all the code in the log
mounting process that temporarily disables the readonly state.
Unfortunately, inode inactivation disables itself on readonly mounts.
Clearing the iunlinked lists after log recovery needs inactivation to
run to free the unreferenced inodes, which (AFAICT) is the only reason
why log mounting plays games with the readonly state in the first place.
Therefore, change the inactivation predicates to allow inactivation
during log recovery of a readonly mount.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Alter the definition of i_prev_unlinked slightly to make it more obvious
when an inode with 0 link count is not part of the iunlink bucket lists
rooted in the AGI. This distinction is necessary because it is not
sufficient to check inode.i_nlink to decide if an inode is on the
unlinked list. Updates to i_nlink can happen while holding only
ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list
(which happen after the nlink update) require both ILOCK_EXCL and the
AGI buffer lock.
The next few patches will make it possible to reload an entire unlinked
bucket list when we're walking the inode table or performing handle
operations and need more than the ability to iget the last inode in the
chain.
The upcoming directory repair code also needs to be able to make this
distinction to decide if a zero link count directory should be moved to
the orphanage or allowed to inactivate. An upcoming enhancement to the
online AGI fsck code will need this distinction to check and rebuild the
AGI unlinked buckets.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
There are no users of the cpu hotplug hooks in xfs now, so remove them.
This reverts f1653c2e2831e ("xfs: introduce CPU hotplug
infrastructure").
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Revert commit 0ed17f01c8540 ("xfs: introduce all-mounts list for cpu
hotplug notifications") because the cpu hotplug hooks are now pointless,
so we don't need this list anymore.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Directly track which CPUs have contributed to the inodegc percpu lists
instead of trusting the cpu online mask. This eliminates a theoretical
problem where the inodegc flush functions might fail to flush a CPU's
inodes if that CPU happened to be dying at exactly the same time. Most
likely nobody's noticed this because the CPU dead hook moves the percpu
inodegc list to another CPU and schedules that worker immediately. But
it's quite possible that this is a subtle race leading to UAF if the
inodegc flush were part of an unmount.
Further benefits: This reduces the overhead of the inodegc flush code
slightly by allowing us to ignore CPUs that have empty lists. Better
yet, it reduces our dependence on the cpu online masks, which have been
the cause of confusion and drama lately.
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Dave Chinner reported that xfs/273 fails if the AG size happens to be an
exact power of two. I traced this to an agbno integer overflow when the
current GETFSMAP call is a continuation of a previous GETFSMAP call, and
the last record returned was non-shareable space at the end of an AG.
__xfs_getfsmap_datadev sets up a data device query by converting the
incoming fmr_physical into an xfs_fsblock_t and cracking it into an agno
and agbno pair. In the (failing) case where fmr_blockcount of the
low key is nonzero and the record was for a non-shareable extent, it
will add fmr_blockcount to start_fsb and info->low.rm_startblock.
If the low key was actually the last record for that AG, then this
addition causes info->low.rm_startblock to point beyond EOAG. When the
rmapbt range query starts, it'll return an empty set, and fsmap moves on
to the next AG.
Or so I thought. Remember how we added to start_fsb?
If agsize < 1<<agblklog, start_fsb points to the same AG as the original
fmr_physical from the low key. We run the rmapbt query, which returns
nothing, so getfsmap zeroes info->low and moves on to the next AG.
If agsize == 1<<agblklog, start_fsb now points to the next AG. We run
the rmapbt query on the next AG with the excessively large
rm_startblock. If this next AG is actually the last AG, we'll set
info->high to EOFS (which now has a lower rm_startblock than
info->low), and the ranged btree query code will return -EINVAL. If
it's not the last AG, we ignore all records for the intermediate AGs.
Oops.
Fix this by decoding start_fsb into agno and agbno only after making
adjustments to start_fsb. This means that info->low.rm_startblock will
always be set to a valid agbno, and we always start the rmapbt iteration
in the correct AG.
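To see the overflow concretely, here is a small model of the two
orderings. It assumes the usual encoding in which the AG number sits
above agblklog bits of AG block number; the function names are made up
for the sketch and the block numbers are arbitrary.

```python
def crack(fsb, agblklog):
    """Split a filesystem block number into (agno, agbno)."""
    return fsb >> agblklog, fsb & ((1 << agblklog) - 1)


def low_key_buggy(start_fsb, blockcount, agblklog):
    # Crack first, then adjust both values independently.
    agno, agbno = crack(start_fsb, agblklog)
    agbno += blockcount          # may run past the end of the AG
    start_fsb += blockcount      # may roll over into the next AG
    agno = start_fsb >> agblklog
    return agno, agbno


def low_key_fixed(start_fsb, blockcount, agblklog):
    # Adjust first, then crack, so agno and agbno always agree.
    return crack(start_fsb + blockcount, agblklog)


# AG size is exactly 1 << 10; the last record covers blocks 1014..1023.
print(low_key_buggy(1014, 10, 10))
print(low_key_fixed(1014, 10, 10))
```

In the power-of-two case the buggy ordering pairs the next AG with an
agbno of 1024, which no AG of that size contains, while the fixed
ordering yields block 0 of the next AG.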
While we're at it, fix the predicate for determining if an fsmap record
represents non-shareable space to include file data on pre-reflink
filesystems.
Reported-by: Dave Chinner <david@fromorbit.com>
Fixes: 63ef7a35912dd ("xfs: fix interval filtering in multi-step fsmap queries")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
In commit 7c8ade2121200 ("xfs: implement percpu cil space used
calculation"), the XFS committed (log) item list code was converted to
use per-cpu lists and space tracking to reduce cpu contention when
multiple threads are modifying different parts of the filesystem and
hence end up contending on the log structures during transaction commit.
Each CPU tracks its own commit items and space usage, and these do not
have to be merged into the main CIL until either someone wants to push
the CIL items, or we run over a soft threshold and switch to slower (but
more accurate) accounting with atomics.
Unfortunately, the for_each_cpu iteration suffers from the same race
with cpu dying problem that was identified in commit 8b57b11cca88f
("pcpcntrs: fix dying cpu summation race") -- CPUs are removed from
cpu_online_mask before the CPUHP_XFS_DEAD callback gets called. As a
result, both CIL percpu structure aggregation functions fail to collect
the items and accounted space usage at the correct point in time.
If we're lucky, the items that are collected from the online cpus exceed
the space given to those cpus, and the log immediately shuts down in
xlog_cil_insert_items due to the (apparent) log reservation overrun.
This happens periodically with generic/650, which exercises cpu hotplug
vs. the filesystem code:
smpboot: CPU 3 is now offline
XFS (sda3): ctx ticket reservation ran out. Need to up reservation
XFS (sda3): ticket reservation summary:
XFS (sda3): unit res = 9268 bytes
XFS (sda3): current res = -40 bytes
XFS (sda3): original count = 1
XFS (sda3): remaining count = 1
XFS (sda3): Filesystem has been shut down due to log error (0x2).
Applying the same sort of fix from 8b57b11cca88f to the CIL code seems
to make the generic/650 problem go away, but I've been told that tglx
was not happy when he saw:
"...the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when
there are no CPUs dying, it has no addition overhead except for a
cpumask_or() operation."
The CPU hotplug code is rather complex and difficult to understand and I
don't want to try to understand the cpu hotplug locking well enough to
use cpu_dying mask. Furthermore, there's a performance improvement that
could be had here. Attach a private cpu mask to the CIL structure so
that we can track exactly which cpus have accessed the percpu data at
all. It doesn't matter if the cpu has since gone offline; log item
aggregation will still find the items. Better yet, we skip cpus that
have not recently logged anything.
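The private mask idea can be illustrated with a toy model. Everything
here is hypothetical (the PercpuCil class and its methods are not
kernel interfaces); the point is only that aggregation driven by a
"touched" set does not care whether a CPU is still online.

```python
class PercpuCil:
    """Toy per-CPU item lists with a private 'touched' CPU mask."""

    def __init__(self, ncpus):
        self.space = [0] * ncpus
        self.items = [[] for _ in range(ncpus)]
        self.touched = set()  # CPUs that have added anything

    def insert(self, cpu, item, nbytes):
        self.items[cpu].append(item)
        self.space[cpu] += nbytes
        self.touched.add(cpu)

    def aggregate(self, cpus):
        """Collect and reset the per-CPU state for the given CPUs."""
        total, items = 0, []
        for cpu in sorted(cpus):
            total += self.space[cpu]
            items.extend(self.items[cpu])
            self.space[cpu] = 0
            self.items[cpu] = []
        return total, items

    def aggregate_touched(self):
        result = self.aggregate(self.touched)
        self.touched.clear()
        return result


cil = PercpuCil(4)
cil.insert(0, "a", 100)
cil.insert(3, "b", 40)
online = {0, 1, 2}              # CPU 3 has just gone offline
print(cil.aggregate(online))    # misses CPU 3's item and space
print(cil.aggregate_touched())  # still finds what CPU 3 logged
```

Iterating the online set skips the item and the space logged on the
dying CPU, whereas iterating the touched set picks them up and also
skips CPUs 1 and 2, which never logged anything.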
Worse yet, Ritesh Harjani and Eric Sandeen both reported today that CPU
hot remove racing with an xfs mount can crash if the cpu_dead notifier
tries to access the log but the mount hasn't yet set up the log.
Link: https://lore.kernel.org/linux-xfs/ZOLzgBOuyWHapOyZ@dread.disaster.area/T/
Link: https://lore.kernel.org/lkml/877cuj1mt1.ffs@tglx/
Link: https://lore.kernel.org/lkml/20230414162755.281993820@linutronix.de/
Link: https://lore.kernel.org/linux-xfs/ZOVkjxWZq0YmjrJu@dread.disaster.area/T/
Cc: tglx@linutronix.de
Cc: peterz@infradead.org
Reported-by: ritesh.list@gmail.com
Reported-by: sandeen@sandeen.net
Fixes: af1c2146a50b ("xfs: introduce per-cpu CIL tracking structure")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Commit d7a74cad8f45 ("xfs: track usage statistics of online fsck")
introduces config XFS_ONLINE_SCRUB_STATS, which selects the non-existing
config FS_DEBUG. It is probably intended to select the existing config
XFS_DEBUG.
Fix the select in config XFS_ONLINE_SCRUB_STATS.
Fixes: d7a74cad8f45 ("xfs: track usage statistics of online fsck")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
|
|
Pull drm ci scripts from Dave Airlie:
"This is a bunch of ci integration for the freedesktop gitlab instance
where we currently do upstream userspace testing on diverse sets of
GPU hardware. From my perspective I think it's an experiment worth
going with and seeing how the benefits/noise playout keeping these
files useful.
Ideally I'd like to get this so we can do pre-merge testing on PRs
eventually.
Below is some info from danvet on why we've ended up making the
decision and how we can roll it back if we decide it was a bad plan.
Why in upstream?
- like documentation, testcases, tools CI integration is one of these
things where you can waste endless amounts of time if you
accidentally have a version that doesn't match your source code
- but also like the above, there's a balance, this is the initial cut
of what we think makes sense to keep in sync vs out-of-tree,
probably needs adjustment
- gitlab supports out-of-repo gitlab integration and that's what's
been used for the kernel in drm, but it results in per-driver
fragmentation and lots of duplicated effort. the simple act of
smashing an arbitrary winner into a topic branch already started
surfacing patches on dri-devel and sparking good cross driver team
discussions
Why gitlab?
- it's not any more shit than any of the other CI
- drm userspace uses it extensively for everything in userspace, we
have a lot of people and experience with this, including
integration of hw testing labs
- media userspace like gstreamer is also on gitlab.fd.o, and there's
discussion to extend this to the media subsystem in some fashion
Can this be shared?
- there's definitely a pile of code that could move to scripts/ if
other subsystem adopt ci integration in upstream kernel git. other
bits are more drm/gpu specific like the igt-gpu-tests/tools
integration
- docker images can be run locally or in other CI runners
Will we regret this?
- it's all in one directory, intentionally, for easy deletion
- probably 1-2 years in upstream to see whether this is worth it or a
Big Mistake. that's roughly what it took to _really_ roll out solid
CI in the bigger userspace projects we have on gitlab.fd.o like
mesa3d"
* tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
drm: ci: docs: fix build warning - add missing escape
drm: Add initial ci/ subdirectory
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix preemption delays in the SGX code, remove unnecessarily
UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
make the x86 SMP init code a bit more conservative to fix kexec()
lockups"
* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
x86/smp: Don't send INIT to non-present and non-booted CPUs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fix from Ingo Molnar:
"Work around a firmware bug in the uncore PMU driver, affecting certain
Intel systems"
* tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Correct the number of CHAs on EMR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"perf tools maintainership:
- Add git information for perf-tools and perf-tools-next trees and
branches to the MAINTAINERS file. That is where development now
takes place and myself and Namhyung Kim have write access, more
people to come as we emulate other maintainer groups.
perf record:
- Record kernel data maps when 'perf record --data' is used, so that
global variables can be resolved and used in tools that do data
profiling.
perf trace:
- Remove the old, experimental support for BPF events in which a .c
file was passed as an event: "perf trace -e hello.c" to then get
compiled and loaded.
The only known usage for that, that shipped with the kernel as an
example for such events, augmented the raw_syscalls tracepoints and
was converted to a libbpf skeleton, reusing all the user space
components and the BPF code connected to the syscalls.
In the end just the way to glue the BPF part and the user space
type beautifiers changed, now being performed by libbpf skeletons.
The next step is to use BTF to do pretty printing of all syscall
types, as discussed with Alan Maguire and others.
Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
path/filenames/strings, some of the networking data structures,
perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls
and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5
seconds:
# perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5
0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0
10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ...
? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0
3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ...
3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0
Performance counter stats for 'sleep 5':
2,617,347 cycles
1,855,997 instructions # 0.71 insn per cycle
5.002282128 seconds time elapsed
0.000855000 seconds user
0.000852000 seconds sys
perf annotate:
- Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1)
for licensing reasons, and we missed a build test on
tools/perf/tests makefile.
Since we now default to NDEBUG=1, we ended up segfaulting when
building with BUILD_NONDISTRO=1 because a needed initialization
routine was being "error checked" via an assert.
Fix it by explicitly checking the result and aborting instead if it
fails.
It would be better to propagate the error back to the caller, but at
least 'perf annotate' on samples collected for a BPF program is back
working when perf is built with BUILD_NONDISTRO=1.
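The assert pitfall described in the perf annotate item can be sketched in userspace C. This is a minimal illustration, not the perf code: init_disassembler(), disasm_ready, setup_with_assert() and setup_with_check() are hypothetical names, and NDEBUG is defined in the file only to mimic a build that defaults to NDEBUG=1.

```c
#include <stdio.h>
#include <stdlib.h>

#define NDEBUG			/* mimic a build that defaults to NDEBUG=1 */
#include <assert.h>

/* Hypothetical flag and routine standing in for the needed initialization. */
static int disasm_ready;

static int init_disassembler(void)
{
	disasm_ready = 1;
	return 0;		/* 0 on success */
}

/*
 * Problem pattern: with NDEBUG defined, assert() expands to nothing,
 * so the initialization call inside it is never executed.
 */
static void setup_with_assert(void)
{
	assert(init_disassembler() == 0);
}

/* Fixed pattern: always make the call, then check the result explicitly. */
static void setup_with_check(void)
{
	if (init_disassembler() != 0) {
		fprintf(stderr, "initialization failed\n");
		abort();
	}
}

/*
 * <assert.h> may be included more than once; including it again after
 * #undef NDEBUG turns assert() back on for any code below this point.
 */
#undef NDEBUG
#include <assert.h>
```

After setup_with_assert() the flag is still 0, because the call was compiled out. After setup_with_check() it is 1. Any later use of the uninitialized state is where a crash of the kind described above would show up.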
perf report/top:
- Add back TUI hierarchy mode header, that is seen when using 'perf
report/top --hierarchy'.
- Fix the number of entries for 'e' key in the TUI that was
preventing navigation of lines when expanding an entry.
perf report/script:
- Support cross platform register handling, allowing a perf.data file
collected on one architecture to have registers sampled correctly
displayed when analysis tools such as 'perf report' and 'perf
script' are used on a different architecture.
- Fix handling of event attributes in pipe mode, i.e. when one uses:
perf record -o - | perf report -i -
When no perf.data files are used.
- Handle files generated via pipe mode with one version of perf and
then read, also via pipe mode, with a different version of perf,
where the event attr record may have changed. Use the record size
field to properly support this version mismatch.
perf probe:
- Accessing global variables from uprobes isn't supported, make the
error message state that instead of stating that some minimal
kernel version is needed to have that feature. This seems just a
tool limitation, the kernel probably has all that is needed.
perf tests:
- Fix a reference count related leak in the dlfilter v0 API where the
result of a thread__find_symbol_fb() is not matched with an
addr_location__exit() to drop the reference counts of the resolved
components (machine, thread, map, symbol, etc). Add a dlfilter test
to make sure that doesn't regress.
- Lots of fixes for the 'perf test' written in shell script related
to problems found with the shellcheck utility.
- Fixes for 'perf test' shell scripts testing features enabled when
perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
counters.
- Add perf record sample filtering test, things like the following
example, that gets implemented as a BPF filter attached to the
event:
# perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000'
- Improve the way the task_analyzer test checks if libtraceevent is
linked, using 'perf version --build-options' instead of the more
expensive 'perf record -e "sched:sched_switch"'.
- Add support for riscv in the mmap-basic test. (This went as well
via the RiscV tree, same contents).
libperf:
- Implement riscv mmap support (This went as well via the RiscV tree,
same contents).
perf script:
- New tool that converts perf.data files to the firefox profiler
format so that one can use the visualizer at
https://profiler.firefox.com/. Done by Anup Sharma as part of this
year's Google Summer of Code.
One can generate the output and upload it to the web interface but
Anup also automated everything:
perf script gecko -F 99 -a sleep 60
- Support syscall name parsing on arm64.
- Print "cgroup" field on the same line as "comm".
perf bench:
- Add new 'uprobe' benchmark to measure the overhead of uprobes
with/without BPF programs attached to it.
- breakpoints are not available on power9, skip that test.
perf stat:
- Add #num_cpus_online literal to be used in 'perf stat' metrics, and
add this extra 'perf test' check that exemplifies its purpose:
TEST_ASSERT_VAL("#num_cpus_online",
expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0);
TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online);
Miscellaneous:
- Improve tool startup time by lazily reading PMU, JSON, sysfs data.
- Improve error reporting in the parsing of events, passing YYLTYPE
to error routines, so that the output can show where the parsing
error was found.
- Add 'perf test' entries to check the parsing of events
improvements.
- Fix various leaks detected by -fsanitize=address, mostly
things that would be freed at tool exit, including:
- Free evsel->filter on the destructor.
- Allow tools to register a thread->priv destructor and use it in
'perf trace'.
- Free evsel->priv in 'perf trace'.
- Free string returned by synthesize_perf_probe_point() when the
caller fails to do all it needs.
- Adjust various compiler options so that some warnings are not
treated as errors when building with broken headers found in things
like python, flex, bison, as we otherwise build with -Werror. Some for
gcc, some for clang, some for some specific version of those, some
for some specific version of flex or bison, or some specific
combination of these components, bah.
- Allow customization of clang options for the BPF target. This helps
building on gentoo, where there are other oddities such as BPF targets
getting passed some compiler options intended for the native build, so
building with WERROR=0 helps while these oddities are fixed.
- Don't pass ERR_PTR() values to perf_session__delete() in 'perf top'
and 'perf lock', fixing some segfaults when handling some odd
failures.
- Add LTO build option.
- Fix format of unordered lists in the perf docs
(tools/perf/Documentation)
- Overhaul the bison files, using constructs such as YYNOMEM.
- Remove unused tokens from the bison .y files.
- Add more comments to various structs.
- A few LoongArch enablement patches.
Vendor events (JSON):
- Add JSON metrics for Yitian 710 DDR (aarch64). Things like:
EventName, BriefDescription
visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.",
op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.",
op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",
- Add AmpereOne metrics (aarch64).
- Update N2 and V2 metrics (aarch64) and events using Arm telemetry
repo.
- Update scale units and descriptions of common topdown metrics on
aarch64. Things like:
- "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
- "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
- Update events for intel: meteorlake to 1.04, sapphirerapids to
1.15, Icelake+ metric constraints.
- Update files for the power10 platform"
* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
perf parse-events: Fix driver config term
perf parse-events: Fixes relating to no_value terms
perf parse-events: Fix propagation of term's no_value when cloning
perf parse-events: Name the two term enums
perf list: Don't print Unit for "default_core"
perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
perf dlfilter: Avoid leak in v0 API test use of resolve_address()
perf metric: Add #num_cpus_online literal
perf pmu: Remove str from perf_pmu_alias
perf parse-events: Make common term list to strbuf helper
perf parse-events: Minor help message improvements
perf pmu: Avoid uninitialized use of alias->str
perf jevents: Use "default_core" for events with no Unit
perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
perf test shell stat_bpf_counters: Fix test on Intel
perf test shell record_bpf_filter: Skip 6.2 kernel
libperf: Get rid of attr.id field
perf tools: Convert to perf_record_header_attr_id()
libperf: Add perf_record_header_attr_id()
perf tools: Handle old data in PERF_RECORD_ATTR
...
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- six smb3 client fixes including ones to allow controlling smb3
directory caching timeout and limits, and one debugging improvement
- one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)
- one minor spnego registry update
* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
spnego: add missing OID to oid registry
smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
cifs: update internal module version number for cifs.ko
smb3: allow controlling maximum number of cached directories
smb3: add trace point for queryfs (statfs)
nls: Hide new NLS_UCS2_UTILS
smb3: allow controlling length of time directory entries are cached with dir leases
smb: propagate error code of extract_sharename()
|
|
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that can't be extracted.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that does nothing.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
iov_iter_extract_pages() doesn't correctly handle skipping over initial
zero-length entries in ITER_KVEC and ITER_BVEC-type iterators.
The problem is that it accidentally reduces maxsize to 0 while
skipping and thus runs to the end of the array and returns 0.
Fix this by sticking the calculated size-to-copy in a new variable
rather than back in maxsize.
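The effect can be sketched in userspace C. This is a simplified model rather than the kernel function: struct seg, min_size(), first_chunk_buggy() and first_chunk_fixed() are hypothetical names, and the real iterator bookkeeping (offsets, segment counts) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified segment, standing in for a kvec/bvec entry. */
struct seg {
	size_t len;
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Buggy pattern: the clamped value is stored back into maxsize, so the
 * first zero-length segment turns maxsize into 0 and every later
 * segment is then skipped as well.
 */
static size_t first_chunk_buggy(const struct seg *segs, size_t nr_segs,
				size_t maxsize)
{
	assert(segs != NULL || nr_segs == 0);
	for (size_t i = 0; i < nr_segs; i++) {
		maxsize = min_size(maxsize, segs[i].len);
		if (maxsize)
			return maxsize;
	}
	return 0;
}

/* Fixed pattern: keep maxsize intact and clamp into a separate variable. */
static size_t first_chunk_fixed(const struct seg *segs, size_t nr_segs,
				size_t maxsize)
{
	assert(segs != NULL || nr_segs == 0);
	for (size_t i = 0; i < nr_segs; i++) {
		size_t size = min_size(maxsize, segs[i].len);

		if (size)
			return size;
	}
	return 0;
}
```

With segments of length 0, 0 and 4096 and a maxsize of 8192, the first variant returns 0 while the second returns 4096.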
Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from Adrian Glaubitz:
- Fix a use-after-free bug in the push-switch driver (Duoming Zhou)
- Fix calls to dma_declare_coherent_memory() that incorrectly passed
the buffer end address instead of the buffer size as the size
parameter
* tag 'sh-for-v6.6-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: push-switch: Reorder cleanup operations to avoid use-after-free bug
sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- The kernel now dynamically probes for misaligned access speed, as
opposed to relying on a table of known implementations.
- Support for non-coherent devices on systems using the Andes AX45MP
core, including the RZ/Five SoCs.
- Support for the V extension in ptrace(), again.
- Support for KASLR.
- Support for the BPF prog pack allocator in RISC-V.
- A handful of bug fixes and cleanups.
* tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits)
soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
bpf, riscv: use prog pack allocator in the BPF JIT
riscv: implement a memset like function for text
riscv: extend patch_text_nosync() for multiple pages
bpf: make bpf_prog_pack allocator portable
riscv: libstub: Implement KASLR by using generic functions
libstub: Fix compilation warning for rv32
arm64: libstub: Move KASLR handling functions to kaslr.c
riscv: Dump out kernel offset information on panic
riscv: Introduce virtual kernel mapping KASLR
RISC-V: Add ptrace support for vectors
soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
cache: Add L2 cache management for Andes AX45MP RISC-V core
dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
riscv: mm: dma-noncoherent: nonstandard cache operations support
riscv: errata: Add Andes alternative ports
riscv: asm: vendorid_list: Add Andes Technology to the vendors list
...
|
|
The original code puts flush_work() before timer_shutdown_sync()
in switch_drv_remove(). Although we use flush_work() to stop
the worker, it could be rescheduled in switch_timer(). As a result,
a use-after-free bug can occur. The details are shown below:
(cpu 0) | (cpu 1)
switch_drv_remove() |
flush_work() |
... | switch_timer // timer
| schedule_work(&psw->work)
timer_shutdown_sync() |
... | switch_work_handler // worker
kfree(psw) // free |
| psw->state = 0 // use
This patch puts timer_shutdown_sync() before flush_work() to
fix the bug. As a result, the worker and timer will be
stopped safely before the structure is freed.
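The ordering argument can be illustrated with a small deterministic model in userspace C. This is a toy, not kernel code: struct model and all functions below are hypothetical, the "timer" fires at a fixed point instead of racing, and the return value only reports whether work would still be pending at the moment the object is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state: a timer that queues work when it fires. */
struct model {
	bool timer_armed;	/* the timer may still fire */
	bool work_queued;	/* a worker is pending and will touch the object */
};

/* Models the timer callback, which schedules the work item. */
static void timer_fires(struct model *m)
{
	if (m->timer_armed)
		m->work_queued = true;
}

static void model_flush_work(struct model *m)
{
	m->work_queued = false;
}

static void model_timer_shutdown(struct model *m)
{
	m->timer_armed = false;
}

/* Old order: flush first, so a later timer firing can requeue the work. */
static bool remove_old_order(struct model *m)
{
	assert(m != NULL);
	model_flush_work(m);
	timer_fires(m);		/* fires between the two calls */
	model_timer_shutdown(m);
	return m->work_queued;	/* true: work pending at free time */
}

/* New order: stop the timer first, then flush whatever it queued. */
static bool remove_new_order(struct model *m)
{
	assert(m != NULL);
	timer_fires(m);		/* work queued before shutdown */
	model_timer_shutdown(m);
	timer_fires(m);		/* a late attempt queues nothing */
	model_flush_work(m);
	return m->work_queued;	/* false: nothing left to run */
}
```

Starting from an armed timer and no queued work, the old order ends with work still pending, while the new order ends with none.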
Fixes: 9f5e8eee5cfe ("sh: generic push-switch framework.")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20230802033737.9738-1-duoming@zju.edu.cn
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
|
|
In all these cases, the last argument to dma_declare_coherent_memory() is
the buffer end address, but the expected value should be the size of the
reserved region.
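The difference between the two values can be shown with a short userspace sketch. The base address and size below are made up for illustration and are not taken from any board file. region_end() and region_size() are hypothetical helpers, the latter using the same arithmetic as the kernel's resource_size().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define BUF_BASE 0x0f000000u
#define BUF_SIZE 0x00400000u	/* 4 MiB */

/* Inclusive end address of a region: the value passed by mistake. */
static uint64_t region_end(uint64_t base, uint64_t size)
{
	assert(size > 0);
	return base + size - 1;
}

/* Size recovered from an inclusive [start, end] pair: the expected value. */
static uint64_t region_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}
```

For these example values the end address is 0x0f3fffff, which is far larger than the 0x00400000 bytes actually reserved, so passing it as a size would describe a region that extends well beyond the buffer.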
Fixes: 39fb993038e1 ("media: arch: sh: ap325rxa: Use new renesas-ceu camera driver")
Fixes: c2f9b05fd5c1 ("media: arch: sh: ecovec: Use new renesas-ceu camera driver")
Fixes: f3590dc32974 ("media: arch: sh: kfr2r09: Use new renesas-ceu camera driver")
Fixes: 186c446f4b84 ("media: arch: sh: migor: Use new renesas-ceu camera driver")
Fixes: 1a3c230b4151 ("media: arch: sh: ms7724se: Use new renesas-ceu camera driver")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://lore.kernel.org/r/20230724120742.2187-1-petrtesarik@huaweicloud.com
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
|
|
Pull more SCSI updates from James Bottomley:
"Mostly small stragglers that missed the initial merge.
Driver updates are qla2xxx and smartpqi (mpt3sas has a high diffstat
due to the volatile qualifier removal, fnic due to unused function
removal and sd.c has a lot of code shuffling to remove forward
declarations)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (38 commits)
scsi: ufs: core: No need to update UPIU.header.flags and lun in advanced RPMB handler
scsi: ufs: core: Add advanced RPMB support where UFSHCI 4.0 does not support EHS length in UTRD
scsi: mpt3sas: Remove volatile qualifier
scsi: mpt3sas: Perform additional retries if doorbell read returns 0
scsi: libsas: Simplify sas_queue_reset() and remove unused code
scsi: ufs: Fix the build for the old ARM OABI
scsi: qla2xxx: Fix unused variable warning in qla2xxx_process_purls_pkt()
scsi: fnic: Remove unused functions fnic_scsi_host_start/end_tag()
scsi: qla2xxx: Fix spelling mistake "tranport" -> "transport"
scsi: fnic: Replace sgreset tag with max_tag_id
scsi: qla2xxx: Remove unused variables in qla24xx_build_scsi_type_6_iocbs()
scsi: qla2xxx: Fix nvme_fc_rcv_ls_req() undefined error
scsi: smartpqi: Change driver version to 2.1.24-046
scsi: smartpqi: Enhance error messages
scsi: smartpqi: Enhance controller offline notification
scsi: smartpqi: Enhance shutdown notification
scsi: smartpqi: Simplify lun_number assignment
scsi: smartpqi: Rename pciinfo to pci_info
scsi: smartpqi: Rename MACRO to clarify purpose
scsi: smartpqi: Add abort handler
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver symbol lookup fix from Greg KH:
"Here is one last fixup for your tree for 6.6-rc1. It resolves a
problem with the way that symbol_get was changed in the module tree
merge in your tree to fix up the DVB drivers which rely on this old
api to attach new devices.
As the changelog comment says:
In commit 9011e49d54dc ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly
restricted to GPL-only marked symbols. This interacts oddly with the
DVB logic which only uses dvb_attach() to load the dvb driver which
then uses symbol_get().
Fix this up by properly marking all of the dvb_attach attach symbols
as EXPORT_SYMBOL_GPL().
This has been acked by Hans from the V4L driver side, Luis from the
module side, Mauro on the media side, and Christoph said it was the
correct solution, and was tested by the original reporter of the
issue.
It has passed 0-day testing, but has not been in linux-next due to it
only being sent yesterday"
* tag 'driver-core-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
media: dvb: symbol fixup for dvb_attach()
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- move a dma-debug call that prints a message out from a lock that's
causing problems with the lock order in serial drivers (Sergey
Senozhatsky)
- fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right
dependency and not default to y (Christoph Hellwig)
- move an ifdef a bit to remove a __maybe_unused that seems to trip up
some sensitivities (Christoph Hellwig)
- revert a bogus check in the CMA allocator (Zhenhua Huang)
* tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping:
Revert "dma-contiguous: check for memory region overlap"
dma-pool: remove a __maybe_unused label in atomic_pool_expand
dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA
dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Add PCI_DYNAMIC_OF_NODES dependency on OF_IRQ to fix sparc64 build
error (Lizhi Hou)
- After coalescing host bridge resources, free any released resources
to avoid a leak (Ross Lagerwall)
- Revert a quirk that prevented NVIDIA T4 GPUs from using Secondary Bus
Reset. The quirk worked around an issue that we now think is related
to the Root Port, not the GPU (Bjorn Helgaas)
* tag 'pci-v6.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: Mark NVIDIA T4 GPUs to avoid bus reset"
PCI: Free released resource after coalescing
PCI: Fix CONFIG_PCI_DYNAMIC_OF_NODES kconfig dependencies
|
|
Pull NTB updates from Jon Mason:
"Link toggling fixes and debugfs error path fixes"
[ And for everybody like me who always have to remind themselves what
the TLA of the day is, and what NTB stands for - it's a PCIe
"Non-Transparent Bridge" thing - Linus ]
* tag 'ntb-6.6' of https://github.com/jonmason/ntb:
ntb: Check tx descriptors outstanding instead of head/tail for tx queue
ntb: Fix calculation ntb_transport_tx_free_entry()
ntb: Drop packets when qp link is down
ntb: Clean up tx tail index on link down
ntb: amd: Drop unnecessary error check for debugfs_create_dir
NTB: ntb_tool: Switch to memdup_user_nul() helper
dtivers: ntb: fix parameter check in perf_setup_dbgfs()
ntb: Remove error checking for debugfs_create_dir()
|
|
Add missing OID to the registry. Some servers and clients (including
Windows) now request "NEGOEX - SPNEGO Extended Negotiation Security".
See https://datatracker.ietf.org/doc/html/draft-zhu-negoex-02
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In commit 9011e49d54dc ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly restricted
to GPL-only marked symbols. This interacts oddly with the DVB logic
which only uses dvb_attach() to load the dvb driver which then uses
symbol_get().
Fix this up by properly marking all of the dvb_attach attach symbols as
EXPORT_SYMBOL_GPL().
Fixes: 9011e49d54dc ("modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules")
Cc: stable <stable@kernel.org>
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-media@vger.kernel.org
Cc: linux-modules@vger.kernel.org
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Link: https://lore.kernel.org/r/20230908092035.3815268-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull smb server update from Steve French:
"After two years, many fixes and much testing, ksmbd is no longer
experimental"
* tag '6.6-rc-ksmbd' of git://git.samba.org/ksmbd:
ksmbd: remove experimental warning
|
|
Pull xarray fixes from Matthew Wilcox:
- Fix a bug encountered by people using bittorrent where they'd get
NULL pointer dereferences on page cache lookups when using XFS
- Two documentation fixes
* tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarray:
idr: fix param name in idr_alloc_cyclic() doc
xarray: Document necessary flag in alloc functions
XArray: Do not return sibling entries from xa_load()
|
|
Pull block fixes from Jens Axboe:
- Fix null_blk polled IO timeout handling (Chengming)
- Regression fix for swapped arguments in drbd bvec_set_page()
(Christoph)
- String length handling fix for s390 dasd (Heiko)
- Fixes for blk-throttle accounting (Yu)
- Fix page pinning issue for same page segments (Christoph)
- Remove redundant file_remove_privs() call (Christoph)
- Fix a regression in partition handling for devices not supporting
partitions (Li)
* tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux:
drbd: swap bvec_set_page len and offset
block: fix pin count management when merging same-page segments
null_blk: fix poll request timeout handling
s390/dasd: fix string length handling
block: don't add or resize partition on the disk with GENHD_FL_NO_PART
block: remove the call to file_remove_privs in blkdev_write_iter
blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
blk-throttle: fix wrong comparation while 'carryover_ios/bytes' is negative
blk-throttle: print signed value 'carryover_bytes/ios' for user
|
|
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into the 6.6-rc merge window:
- Fix for a regression this merge window caused by the SQPOLL
affinity patch, where we can race with SQPOLL thread shutdown and
cause an oops when trying to set affinity (Gabriel)
- Fix for a regression this merge window where reading fdinfo for a
ring set up with IORING_SETUP_NO_SQARRAY will attempt to
dereference the nonexistent SQ ring array (me)
- Add the patch that allows more fine-grained control over who can use
io_uring (Matteo)
- Locking fix for a regression added this merge window for IOPOLL
overflow (Pavel)
- IOPOLL fix for stable, breaking our loop if helper threads are
exiting (Pavel)
Also had a fix for unreaped iopoll requests from io-wq from Ming, but
we found an issue with that and hence it got reverted. Will get this
sorted for a future rc"
* tag 'io_uring-6.6-2023-09-08' of git://git.kernel.dk/linux:
Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()"
io_uring: fix unprotected iopoll overflow
io_uring: break out of iowq iopoll on teardown
io_uring: add a sysctl to disable io_uring system-wide
io_uring/fdinfo: only print ->sq_array[] if it's there
io_uring: fix IO hang in io_wq_put_and_exit from do_exit()
io_uring: Don't set affinity on a dying sqpoll thread
|
|
There was a minor typo in the define for SMB2_GLOBAL_CAP_LARGE_MTU:
0X00000004 instead of 0x00000004.
Make it consistent.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"Eliminate an obsolete thermal zone registration function"
* tag 'thermal-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Drop thermal_zone_device_register()
thermal: Use thermal_tripless_zone_device_register()
thermal: core: Add function for registering tripless thermal zones
thermal: core: Clean up headers of thermal zone registration functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix an Intel RAPL power capping driver regression introduced during
the 6.5 development cycle (Srinivas Pandruvada)"
* tag 'pm-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap: intel_rapl: Fix invalid setting of Power Limit 4
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fix from Bartosz Golaszewski:
- fix a regression in irqchip setup in gpio-zynq
* tag 'gpio-fixes-for-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: zynq: restore zynq_gpio_irq_reqres/zynq_gpio_irq_relres callbacks
|
|
This reverts commit d5af729dc2071273f14cbb94abbc60608142fd83.
d5af729dc207 ("PCI: Mark NVIDIA T4 GPUs to avoid bus reset") avoided
Secondary Bus Reset on the T4 because the reset seemed to not work when the
T4 was directly attached to a Root Port.
But NVIDIA thinks the issue is probably related to some issue with the Root
Port, not with the T4. The T4 provides neither PM nor FLR reset, so
masking bus reset compromises this device for assignment scenarios.
Revert d5af729dc207 as requested by Wu Zongyong. This will leave SBR
broken in the specific configuration Wu tested, as it was in v6.5, so Wu
will debug that further.
Link: https://lore.kernel.org/r/ZPqMCDWvITlOLHgJ@wuzongyong-alibaba
Link: https://lore.kernel.org/r/20230908201104.GA305023@bhelgaas
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fixes for 6.6-rc1. All small and easy ones.
- The corrections of the previous PCM iov_iter transitions
- Regression fixes in MIDI 2.0 / USB changes
- Various ASoC codec fixes for Cirrus, Realtek, WCD
- ASoC AMD quirks and ASoC Intel AVS driver workaround"
* tag 'sound-fix-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek - ALC287 I2S speaker platform support
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82TL
ASoC: Intel: avs: Provide support for fallback topology
ALSA: seq: Fix snd_seq_expand_var_event() call to user-space
ALSA: usb-audio: Fix potential memory leaks at error path for UMP open
ALSA: hda/cirrus: Fix broken audio on hardware with two CS42L42 codecs.
ASoC: rt5645: NULL pointer access when removing jack
ASoC: amd: yc: Add DMI entries to support Victus by HP Gaming Laptop 15-fb0xxx (8A3E)
MAINTAINERS: Update the MAINTAINERS enties for TEXAS INSTRUMENTS ASoC DRIVERS
ALSA: sb: Fix wrong argument in commented code
ALSA: pcm: Fix error checks of default read/write copy ops
ASoC: Name iov_iter argument as iterator instead of buffer
ASoC: dmaengine: Drop unused iov_iter for process callback
ALSA: hda/tas2781: Use standard clamp() macro
ASoC: cs35l56: Waiting for firmware to boot must be tolerant of I/O errors
ASoC: dt-bindings: fsl_easrc: Add support for imx8mp-easrc
ASoC: cs42l43: Fix missing error code in cs42l43_codec_probe()
ASoC: cs35l45: Rename DACPCM1 Source control
ASoC: cs35l45: Fix "Dead assigment" warning
ASoC: cs35l45: Add support for Chip ID 0x35A460
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main one is a fix for a broken strscpy() conversion that landed in
the merge window and broke early parsing of the kernel command line.
- Fix an incorrect mask in the CXL PMU driver
- Fix a regression in early parsing of the kernel command line
- Fix an IP checksum OoB access reported by syzbot"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: csum: Fix OoB access in IP checksum code for negative lengths
arm64/sysreg: Fix broken strncpy() -> strscpy() conversion
perf: CXL: fix mismatched number of counters mask
|