| Age | Commit message (Collapse) | Author | Files | Lines |
|
Wire up system vector 0xf5 for handling PMIs (i.e. interrupts delivered
through the LVTPC) while running KVM guests with a mediated PMU. Perf
currently delivers all PMIs as NMIs, e.g. so that events that trigger while
IRQs are disabled aren't delayed and generate useless records, but due to
the multiplexing of NMIs throughout the system, correctly identifying NMIs
for a mediated PMU is practically infeasible.
To (greatly) simplify identifying guest mediated PMU PMIs, perf will
switch the CPU's LVTPC between PERF_GUEST_MEDIATED_PMI_VECTOR and NMI when
guest PMU context is loaded/put. I.e. PMIs that are generated by the CPU
while the guest is active will be identified purely based on the IRQ
vector.
Route the vector through perf, e.g. as opposed to letting KVM attach a
handler directly a la posted interrupt notification vectors, as perf owns
the LVTPC and thus is the rightful owner of PERF_GUEST_MEDIATED_PMI_VECTOR.
Functionally, having KVM directly own the vector would be fine (both KVM
and perf will be completely aware of when a mediated PMU is active), but
would lead to an undesirable split in ownership: perf would be responsible
for installing the vector, but not handling the resulting IRQs.
Add a new perf_guest_info_callbacks hook (and static call) to allow KVM to
register its handler with perf when running guests with mediated PMUs.
Note, because KVM always runs guests with host IRQs enabled, there is no
danger of a PMI being delayed from the guest's perspective due to using a
regular IRQ instead of an NMI.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-9-seanjc@google.com
|
|
Add exported APIs to load/put a guest mediated PMU context. KVM will
load the guest PMU shortly before VM-Enter, and put the guest PMU shortly
after VM-Exit.
On the perf side of things, schedule out all exclude_guest events when the
guest context is loaded, and schedule them back in when the guest context
is put. I.e. yield the hardware PMU resources to the guest, by way of KVM.
Note, perf is only responsible for managing host context. KVM is
responsible for loading/storing guest state to/from hardware.
[sean: shuffle patches around, write changelog]
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-8-seanjc@google.com
|
|
Current perf doesn't explicitly schedule out all exclude_guest events
while the guest is running. There is no problem with the current
emulated vPMU. Because perf owns all the PMU counters. It can mask the
counter which is assigned to an exclude_guest event when a guest is
running (Intel way), or set the corresponding HOSTONLY bit in evsentsel
(AMD way). The counter doesn't count when a guest is running.
However, either way doesn't work with the introduced mediated vPMU.
A guest owns all the PMU counters when it's running. The host should not
mask any counters. The counter may be used by the guest. The evsentsel
may be overwritten.
Perf should explicitly schedule out all exclude_guest events to release
the PMU resources when entering a guest, and resume the counting when
exiting the guest.
It's possible that an exclude_guest event is created when a guest is
running. The new event should not be scheduled in as well.
The ctx time is shared among different PMUs. The time cannot be stopped
when a guest is running. It is required to calculate the time for events
from other PMUs, e.g., uncore events. Add timeguest to track the guest
run time. For an exclude_guest event, the elapsed time equals
the ctx time - guest time.
Cgroup has dedicated times. Use the same method to deduct the guest time
from the cgroup time as well.
[sean: massage comments]
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-7-seanjc@google.com
|
|
The current perf tracks two timestamps for the normal ctx and cgroup.
The same type of variables and similar codes are used to track the
timestamps. In the following patch, the third timestamp to track the
guest time will be introduced.
To avoid the code duplication, add a new struct perf_time_ctx and factor
out a generic function update_perf_time_ctx().
No functional change.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-6-seanjc@google.com
|
|
Currently, exposing PMU capabilities to a KVM guest is done by emulating
guest PMCs via host perf events, i.e. by having KVM be "just" another user
of perf. As a result, the guest and host are effectively competing for
resources, and emulating guest accesses to vPMU resources requires
expensive actions (expensive relative to the native instruction). The
overhead and resource competition results in degraded guest performance
and ultimately very poor vPMU accuracy.
To address the issues with the perf-emulated vPMU, introduce a "mediated
vPMU", where the data plane (PMCs and enable/disable knobs) is exposed
directly to the guest, but the control plane (event selectors and access
to fixed counters) is managed by KVM (via MSR interceptions). To allow
host perf usage of the PMU to (partially) co-exist with KVM/guest usage
of the PMU, KVM and perf will coordinate to a world switch between host
perf context and guest vPMU context near VM-Enter/VM-Exit.
Add two exported APIs, perf_{create,release}_mediated_pmu(), to allow KVM
to create and release a mediated PMU instance (per VM). Because host perf
context will be deactivated while the guest is running, mediated PMU usage
will be mutually exclusive with perf analysis of the guest, i.e. perf
events that do NOT exclude the guest will not behave as expected.
To avoid silent failure of !exclude_guest perf events, disallow creating a
mediated PMU if there are active !exclude_guest events, and on the perf
side, disallowing creating new !exclude_guest perf events while there is
at least one active mediated PMU.
Exempt PMU resources that do not support mediated PMU usage, i.e. that are
outside the scope/view of KVM's vPMU and will not be swapped out while the
guest is running.
Guard mediated PMU with a new kconfig to help readers identify code paths
that are unique to mediated PMU support, and to allow for adding arch-
specific hooks without stubs. KVM x86 is expected to be the only KVM
architecture to support a mediated PMU in the near future (e.g. arm64 is
trending toward a partitioned PMU implementation), and KVM x86 will select
PERF_GUEST_MEDIATED_PMU unconditionally, i.e. won't need stubs.
Immediately select PERF_GUEST_MEDIATED_PMU when KVM x86 is enabled so that
all paths are compile tested. Full KVM support is on its way...
[sean: add kconfig and WARNing, rewrite changelog, swizzle patch ordering]
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-5-seanjc@google.com
|
|
Cross-merge BPF and other fixes after downstream PR.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix BPF builds due to -fms-extensions. selftests (Alexei
Starovoitov), bpftool (Quentin Monnet).
- Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n
(Geert Uytterhoeven)
- Fix livepatch/BPF interaction and support reliable unwinding through
BPF stack frames (Josh Poimboeuf)
- Do not audit capability check in arm64 JIT (Ondrej Mosnacek)
- Fix truncated dmabuf BPF iterator reads (T.J. Mercier)
- Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)
- Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
C23 (Mikhail Gavrilov)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: add regression test for bpf_d_path()
bpf: Fix verifier assumptions of bpf_d_path's output buffer
selftests/bpf: Add test for truncated dmabuf_iter reads
bpf: Fix truncated dmabuf iterator reads
x86/unwind/orc: Support reliable unwinding through BPF stack frames
bpf: Add bpf_has_frame_pointer()
bpf, arm64: Do not audit capability check in do_jit()
libbpf: Fix -Wdiscarded-qualifiers under C23
bpftool: Fix build warnings due to MS extensions
net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
selftests/bpf: Add -fms-extensions to bpf build flags
|
|
Introduce a new structure for reporting an encrypted session over an
fc_rport. The encryption group is added as an attribute in struct
fc_rport and reports information in fc_encryption_info. This structure
contains a status member variable, which stores a bit value indicating
an encrypted session.
Signed-off-by: Sarah Catania <sarah.catania@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251211001659.138635-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add system cache table(SCT) and configs for Glymur SoC
Updated the list of usecase id's to enable additional clients for Glymur
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Pankaj Patil <pankaj.patil@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251211-glymur_llcc_enablement-v3-2-43457b354b0d@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Fix kernel-doc comments in include/linux/shdma-base.h to avoid
most warnings:
- prefix an enum name with "enum"
- prefix enum values with '@'
- prefix struct member names with '@'
shdma-base.h:28: warning: cannot understand function prototype:
'enum shdma_pm_state '
Warning: shdma-base.h:103 struct member 'desc_completed' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'halt_channel' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'channel_busy' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'slave_addr' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'desc_setup' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'set_slave' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'setup_xfer' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'start_xfer' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'embedded_desc' not described
in 'shdma_ops'
Warning: shdma-base.h:103 struct member 'chan_irq' not described
in 'shdma_ops'
This one is not fixed: from 4f46f8ac80416:
Warning: shdma-base.h:103 struct member 'get_partial' not described
in 'shdma_ops'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251104002001.445297-1-rdunlap@infradead.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Use the correct enum name in its kernel-doc heading.
Add ending ':' to struct member names.
Drop the @id: kernel-doc entry since there is no struct member named 'id'.
edma.h:46: warning: expecting prototype for struct dw_edma_core_ops.
Prototype was for struct dw_edma_plat_ops instead
Warning: edma.h:101 struct member 'ops' not described in 'dw_edma_chip'
Warning: edma.h:101 struct member 'flags' not described in 'dw_edma_chip'
Warning: edma.h:101 struct member 'reg_base' not described
in 'dw_edma_chip'
Warning: edma.h:101 struct member 'll_wr_cnt' not described
in 'dw_edma_chip'
Warning: edma.h:101 struct member 'll_rd_cnt' not described
in 'dw_edma_chip'
Warning: edma.h:101 struct member 'll_region_wr' not described
in 'dw_edma_chip'
Warning: edma.h:101 struct member 'll_region_rd' not described
in 'dw_edma_chip'
Warning: edma.h:101 struct member 'dt_region_wr' not described
in 'dw_edma_chip'
Warning: edma.h:101 struct member 'dt_region_rd' not described
in 'dw_edma_chip'
Warning: edma.h:101 struct member 'mf' not described in 'dw_edma_chip'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251101191524.1991135-1-rdunlap@infradead.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Netfilter code (net/netfilter/nft_log.c and net/netfilter/xt_AUDIT.c)
have to be kept in sync. Both source files had duplicated versions of
audit_ip4() and audit_ip6() functions, which can result in lack of
consistency and/or duplicated work.
This patch adds a helper function in audit.c that can be called by
netfilter code commonly, aiming to improve maintainability and
consistency.
Suggested-by: Florian Westphal <fw@strlen.de>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
In the EFI config table, rename LINUX_EFI_SCREEN_INFO_TABLE_GUID to
LINUX_EFI_PRIMARY_DISPLAY_TABLE_GUID. Read sysfb_primary_display from
the entry. In addition to the screen_info, the entry now also contains
EDID information.
In libstub, replace struct screen_info with struct sysfb_display_info
from the kernel's sysfb_primary_display and rename functions
accordingly. Transfer it to the runtime kernel using the kernel's
global state or the LINUX_EFI_PRIMARY_DISPLAY_TABLE_GUID config-table
entry.
With CONFIG_FIRMWARE_EDID=y, libstub now transfers the GOP device's EDID
information to the kernel. If CONFIG_FIRMWARE_EDID=n, EDID information
is disabled. Make the Kconfig symbol CONFIG_FIRMWARE_EDID available with
EFI. Setting the value to 'n' disables EDID support.
Also rename screen_info.c to primary_display.c and adapt the contained
comment according to the changes.
Link: https://lore.kernel.org/all/20251126160854.553077-8-tzimmermann@suse.de/
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
[ardb: depend on EFI_GENERIC_STUB not EFI, fix conflicts after dropping
the preceding patch from the series]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Move x86's edid_info into sysfb_primary_display as a new field named
edid. Adapt all users.
An instance of edid_info has only been defined on x86. With the move
into sysfb_primary_display, it becomes available on all architectures.
Therefore remove this contraint from CONFIG_FIRMWARE_EDID.
x86 fills the EDID data from boot_params.edid_info. DRM drivers pick
up the raw data and make it available to DRM clients. Replace the
drivers' references to edid_info and instead use the sysfb_display_info
as passed from sysfb.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Replace the global screen_info with sysfb_primary_display of type
struct sysfb_display_info. Adapt all users of screen_info.
Instances of screen_info are defined for x86, loongarch and EFI,
with only one instance compiled into a specific build. Replace all
of them with sysfb_primary_display.
All existing users of screen_info are updated by pointing them to
sysfb_primary_display.screen instead. This introduces some churn to
the code, but has no impact on functionality.
Boot parameters and EFI config tables are unchanged. They transfer
screen_info as before. The logic in EFI's alloc_screen_info() changes
slightly, as it now returns the screen field of sysfb_primary_display.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/
Reviewed-by: Richard Lyu <richard.lyu@suse.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Add struct sysfb_display_info to wrap display-related state. For now
it contains only the screen's video mode. Later EDID will be added as
well.
This struct will be helpful for passing display state to sysfb drivers
or from the EFI stub library.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Richard Lyu <richard.lyu@suse.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Replace usage of global screen_info with local pointers. This will
later reduce churn when screen_info is being moved.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Richard Lyu <richard.lyu@suse.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Add support for Dosilicon DS35Q1GA (3.3V) and DS35M1GA (1.8V) SPI NAND.
These are 1Gbit (128MB) devices with:
- 2048 byte pages + 64 byte OOB
- 64 pages per block, 1024 blocks
- On-die 4-bit ECC per 512 byte sector
The 64-byte OOB area is divided into 4 segments of 16 bytes, with each
segment containing 8 bytes of user data (M2+M1) and 8 bytes of ECC
parity (R1). This provides 30 bytes of usable OOB space after reserving
2 bytes for the bad block marker.
Tested on Genexis Platinum 4410 (EcoNet EN751221) by writing known
patterns to OOB and verifying ECC parity placement in R1 regions.
Datasheet:
https://www.dosilicon.com/resources/SPI%20NAND/DS35X1GAXXX_rev08.pdf
Signed-off-by: Ahmed Naseef <naseefkm@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
Pull shmem rename fixes from Al Viro:
"A couple of shmem rename fixes - recent regression from tree-in-dcache
series and older breakage from stable directory offsets stuff"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
shmem: fix recovery on rename failures
shmem_whiteout(): fix regression from tree-in-dcache series
|
|
maple_tree insertions can fail if we are seriously short on memory;
simple_offset_rename() does not recover well if it runs into that.
The same goes for simple_offset_rename_exchange().
Moreover, shmem_whiteout() expects that if it succeeds, the caller will
progress to d_move(), i.e. that shmem_rename2() won't fail past the
successful call of shmem_whiteout().
Not hard to fix, fortunately - mtree_store() can't fail if the index we
are trying to store into is already present in the tree as a singleton.
For simple_offset_rename_exchange() that's enough - we just need to be
careful about the order of operations.
For simple_offset_rename() solution is to preinsert the target into the
tree for new_dir; the rest can be done without any potentially failing
operations.
That preinsertion has to be done in shmem_rename2() rather than in
simple_offset_rename() itself - otherwise we'd need to deal with the
possibility of failure after successful shmem_whiteout().
Fixes: a2e459555c5f ("shmem: stable directory offsets")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Use the correct struct member names to avoid kernel-doc warnings:
Warning: include/linux/lsm_hooks.h:83 struct member 'name' not described
in 'lsm_id'
Warning: include/linux/lsm_hooks.h:183 struct member 'initcall_device' not
described in 'lsm_info'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
The Renesas RZ/T2H (R9A09G077) and Renesas RZ/N2H (R9A09G087) SoCs have an
Interrupt Controller (ICU) that supports interrupts from external pins IRQ0
to IRQ15, and SEI, and software-triggered interrupts INTCPU0 to INTCPU15.
INTCPU0 to INTCPU13, IRQ0 to IRQ13 are non-safety interrupts, while
INTCPU14, INTCPU15, IRQ14, IRQ15 and SEI are safety interrupts, and are
exposed via a separate register space.
Signed-off-by: Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251201112933.488801-3-cosmin-gabriel.tanislav.xa@renesas.com
|
|
Add infrastructure to redirect interrupt handler execution to a
different CPU when the current CPU is not part of the interrupt's CPU
affinity mask.
This is primarily aimed at (de)multiplexed interrupts, where the child
interrupt handler runs in the context of the parent interrupt handler,
and therefore CPU affinity control for the child interrupt is typically
not available.
With the new infrastructure, the child interrupt is allowed to freely
change its affinity setting, independently of the parent. If the
interrupt handler happens to be triggered on an "incompatible" CPU (a
CPU that's not part of the child interrupt's affinity mask), the handler
is redirected and runs in IRQ work context on a "compatible" CPU.
No functional change is being made to any existing irqchip driver, and
irqchip drivers must be explicitly modified to use the newly added
infrastructure to support interrupt redirection.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Link: https://patch.msgid.link/20251128212055.1409093-2-rrendec@redhat.com
|
|
setup_percpu_irq() was always a bad kludge, and should have never
been there the first place. Now that the last users are gone,
remove it for good.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251210082242.360936-7-maz@kernel.org
|
|
With the IRQ timing stuff being gone, there is no need to specify a flag
when requesting a percpu interrupt. Not only IRQF_TIMER was the only flag
(set of flags actually) allowed, but nobody ever passed it.
Get rid of __request_percpu_irq(), which was only getting 0 as flags, and
promote request_percpu_irq_affinity() as its replacement.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20251210082242.360936-3-maz@kernel.org
|
|
The IRQ timing tracking infrastructure was merged in 2019, but was never
plumbed in, is not selectable, and is therefore never used.
As Daniel agrees that there is little hope for this infrastructure to be
completed in the near term, drop it altogether.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/87zf7vex6h.wl-maz@kernel.org
Link: https://patch.msgid.link/20251210082242.360936-2-maz@kernel.org
|
|
Eliminate all kernel-doc warnings in <linux/msi.h>:
- add "struct" to struct kernel-doc headers
- add missing struct member descriptions or correct typos in them
Fixes these warnings:
Warning: include/linux/msi.h:60 cannot understand function prototype:
'struct msi_msg'
Warning: include/linux/msi.h:73 struct member 'arch_addr_lo' not described
in 'msi_msg'
Warning: include/linux/msi.h:73 struct member 'arch_addr_hi' not described
in 'msi_msg'
Warning: include/linux/msi.h:106 cannot understand function prototype:
'struct pci_msi_desc'
Warning: include/linux/msi.h:124 struct member 'msi_attrib' not described
in 'pci_msi_desc'
Warning: include/linux/msi.h:204 struct member 'sysfs_attrs' not described
in 'msi_desc'
Warning: include/linux/msi.h:227 struct member 'domain' not described in
'msi_dev_domain'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251214202341.2205675-1-rdunlap@infradead.org
|
|
New network transport protocols want NIC drivers to get hardware timestamps
of all incoming packets, and possibly all outgoing packets.
One example is the upcoming 'Swift congestion control' which is used by TCP
transport and is the primary need for timecounter_cyc2time(). This means
timecounter_cyc2time() can be called more than 100 million times per second
on a busy server.
Inlining timecounter_cyc2time() brings a 12% improvement on a UDP receive
stress test on a 100Gbit NIC.
Note that FDO, LTO, PGO are unable to magically help for this case,
presumably because NIC drivers are almost exclusively shipped as modules.
Add an unlikely() around the cc_cyc2ns_backwards() case, even if FDO (when
used) is able to take care of this optimization.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://research.google/pubs/swift-delay-is-simple-and-effective-for-congestion-control-in-the-datacenter/
Link: https://patch.msgid.link/20251129095740.3338476-1-edumazet@google.com
|
|
Requesting a delegation on a file from the userland fcntl() interface
currently succeeds when there are conflicting opens present.
This is because the lease handling code ignores conflicting opens for
FL_LAYOUT and FL_DELEG leases. This was a hack put in place long ago,
because nfsd already checks for conflicts in its own way. The kernel
needs to perform this check for userland delegations the same way it is
done for leases, however.
Make this dependent on the lease_manager by adding a new
->lm_open_conflict() lease_manager operation and have
generic_add_lease() call that instead of check_conflicting_open().
Morph check_conflicting_open() into a ->lm_open_conflict() op that is
only called for userland leases/delegations. Set the
->lm_open_conflict() operations for nfsd to trivial functions that
always return 0.
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251204-dir-deleg-ro-v2-2-22d37f92ce2c@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Zhang Yi points out that the dynamic folio_batch allocation in
iomap_fill_dirty_folios() is problematic for the ext4 on iomap work
that is under development because it doesn't sufficiently handle the
allocation failure case (by allowing a retry, for example). We've
also seen lockdep (via syzbot) complain recently about the scope of
the allocation.
The dynamic allocation was initially added for simplicity and to
help indicate whether the batch was used or not by the calling fs.
To address these issues, put the batch on the stack of
iomap_zero_range() and use a flag to control whether the batch
should be used in the iomap folio lookup path. This keeps things
simple and eliminates allocation issues with lockdep and for ext4 on
iomap.
While here, also clean up the fill helper signature to be more
consistent with the underlying filemap helper. Pass through the
return value of the filemap helper (folio count) and update the
lookup offset via an out param.
Fixes: 395ed1ef0012 ("iomap: optional zero range dirty folio processing")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://patch.msgid.link/20251208140548.373411-1-bfoster@redhat.com
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
While the used GML is consistent with the pattern for other Intel * Lake
SoCs, the de facto use is GLK. Update the acronym and users accordingly.
Note, a handful of the drivers for Gemini Lake in the Linux kernel use
GLK already (LPC, MEI, pin control, SDHCI, ...) and even some in ASoC.
The only ones in this patch used the inconsistent one.
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20251212181742.3944789-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Now that the last in-tree filesystem has been converted to the new mount
API, remove all legacy mount API code designed to handle un-converted
filesystems, and remove associated documentation as well.
(The code to handle the legacy mount(2) syscall from userspace is still
in place, of course.)
Tested with an allmodconfig build on x86_64, and a sanity check of an
old mount(2) syscall mount.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://patch.msgid.link/20251212174403.2882183-1-sandeen@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Note no effort is made to make sure structs embedding the namespace are
themselves aligned, so this is not guaranteed to eliminate cacheline
bouncing due to refcount management.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203092851.287617-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Opening and closing an inode dirties the ->i_readcount field.
Depending on the alignment of the inode, it may happen to false-share
with other fields loaded both for both operations to various extent.
This notably concerns the ->i_flctx field.
Since most inodes don't have the field populated, this bit can be managed
with a flag in ->i_opflags instead which bypasses the problem.
Here are results I obtained while opening a file read-only in a loop
with 24 cores doing the work on Sapphire Rapids. Utilizing the flag as
opposed to reading ->i_flctx field was toggled at runtime as the benchmark
was running, to make sure both results come from the same alignment.
before: 3233740
after: 3373346 (+4%)
before: 3284313
after: 3518711 (+7%)
before: 3505545
after: 4092806 (+16%)
Or to put it differently, this varies wildly depending on how (un)lucky
you get.
The primary bottleneck before and after is the avoidable lockref trip in
do_dentry_open().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203094837.290654-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Matches the idiom of storing a pointer with a release fence and safely
getting the content with a consume fence after.
Eliminates an actual fence on some archs.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203094837.290654-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
While knfsd offers combined exclusive create and open results to clients,
on some filesystems those results may not be atomic. This behavior can be
observed. For example, an open O_CREAT with mode 0 will succeed in creating
the file but unexpectedly return -EACCES from vfs_open().
Additionally reducing the number of remote RPC calls required for O_CREAT
on network filesystem provides a performance benefit in the open path.
Teach knfsd's helper dentry_create() to use atomic_open() for filesystems
that support it. The previously const @path is passed up to atomic_open()
and may be modified depending on whether an existing entry was found or if
the atomic_open() returned an error and consumed the passed-in dentry.
Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
Link: https://patch.msgid.link/8e449bfb64ab055abb9fd82641a171531415a88c.1764259052.git.bcodding@hammerspace.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Let's kickstart the v6.20 (7.0?) release cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
We have currently used up 30 out of the 32-bits in the struct ata_device
struct member quirks. Thus, it is only possible to add two more quirks.
Change the struct ata_device struct member quirks from an unsigned int to
an u64.
Doing this core level change now, will make it easier for us now, as we
will not need to also do core level changes once the final two bits are
used as well.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Modify the existing libata.force parameters "max_sec_128" and
"max_sec_1024" to use the generic ATA_QUIRK_MAX_SEC quirk rather than
individual quirks.
This also allows us to remove the individual quirks ATA_QUIRK_MAX_SEC_128
and ATA_QUIRK_MAX_SEC_1024.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Add a new quirk ATA_QUIRK_MAX_SEC, which has a separate table with device
specific values.
Convert all existing ATA_QUIRK_MAX_SEC_XXX device quirks in
__ata_dev_quirks to the new format.
Quirks ATA_QUIRK_MAX_SEC_128 and ATA_QUIRK_MAX_SEC_1024 cannot be removed
yet, since they are also used by libata.force, which functionally, is a
separate user of the quirks. The quirks will be removed once all users
have been converted to use the new format.
The quirk ATA_QUIRK_MAX_SEC_8191 can be removed since it has no equivalent
libata.force parameter.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
There's no real space concerns here and keeping these fields
in a union makes reading (and tracing) the scheduler code harder.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251201064647.1851919-4-mingo@kernel.org
|
|
Define __signed_scalar_typeof() to declare a signed scalar type, leaving
non-scalar types unchanged.
To be used to clean up the scheduler load-balancing code a bit.
[ mingo: Split off this patch from the scheduler patch. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://patch.msgid.link/20251127154725.413564507@infradead.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix error code in the irqchip/mchp-eic driver
- Fix setup_percpu_irq() affinity assumptions
- Remove the unused irq_domain_add_tree() function
* tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc()
irqdomain: Delete irq_domain_add_tree()
genirq: Allow NULL affinity for setup_percpu_irq()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
- Improve bug reporting
- Suppress W=1 format warning
- Improve rseq scalability on Clang builds
* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Always inline rseq_debug_syscall_return()
bug: Hush suggest-attribute=format for __warn_printf()
bug: Let report_bug_entry() provide the correct bugaddr
|
|
This has been unused since it was added 11 years ago in:
d17d8f9dedb9 ("x86/mm: Add tracepoints for TLB flushes")
Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://patch.msgid.link/20251212-tlb-trace-fix-v2-2-d322e0ad9b69@columbia.edu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next
Pull in pf1550-onkey driver.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"There are no significant series in this small merge. Please see the
individual changelogs for details"
[ Editor's note: it's mainly ocfs2 and a couple of random fixes ]
* tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: memfd_luo: add CONFIG_SHMEM dependency
mm: shmem: avoid build warning for CONFIG_SHMEM=n
ocfs2: fix memory leak in ocfs2_merge_rec_left()
ocfs2: invalidate inode if i_mode is zero after block read
ocfs2: avoid -Wflex-array-member-not-at-end warning
ocfs2: convert remaining read-only checks to ocfs2_emergency_state
ocfs2: add ocfs2_emergency_state helper and apply to setattr
checkpatch: add uninitialized pointer with __free attribute check
args: fix documentation to reflect the correct numbers
ocfs2: fix kernel BUG in ocfs2_find_victim_chain
liveupdate: luo_core: fix redundant bound check in luo_ioctl()
ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list
fs/fat: remove unnecessary wrapper fat_max_cache()
ocfs2: replace deprecated strcpy with strscpy
ocfs2: check tl_used after reading it from trancate log inode
liveupdate: luo_file: don't use invalid list iterator
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "powerpc/pseries/cmm: two smaller fixes" (David Hildenbrand)
fixes a couple of minor things in ppc land
- "Improve folio split related functions" (Zi Yan)
some cleanups and minorish fixes in the folio splitting code
* tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/tests/core-kunit: avoid damos_test_commit stack warning
mm: vmscan: correct nr_requested tracing in scan_folios
MAINTAINERS: add idr core-api doc file to XARRAY
mm/hugetlb: fix incorrect error return from hugetlb_reserve_pages()
mm: fix CONFIG_STACK_GROWSUP typo in mm.h
mm/huge_memory: fix folio split stats counting
mm/huge_memory: make min_order_for_split() always return an order
mm/huge_memory: replace can_split_folio() with direct refcount calculation
mm/huge_memory: change folio_split_supported() to folio_check_splittable()
mm/sparse: fix sparse_vmemmap_init_nid_early definition without CONFIG_SPARSEMEM
powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION
|
|
Brown paper bag time. This is a silly oversight where I missed to drop
the error condition checking to ensure we clean up on early error
returns. I have an internal unit testset coming up for this which will
catch all such issues going forward.
Reported-by: Chris Mason <clm@fb.com>
Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: 011703a9acd7 ("file: add FD_{ADD,PREPARE}()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull further i3c update from Alexandre Belloni:
"We are removing a legacy API callback and having this sooner rather
than later will help ensuring no one introduces a new driver using it.
I've also added patches removing the "__free(...) = NULL" pattern
because I'm sure we won't avoid people sending those following the
mailing list discussion..."
* tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: adi: Fix confusing cleanup.h syntax
i3c: master: Fix confusing cleanup.h syntax
i3c: master: cleanup callback .priv_xfers()
i3c: master: switch to use new callback .i3c_xfers() from .priv_xfers()
|