| Age | Commit message (Collapse) | Author | Files | Lines |
|
Provide individual LPAE and non-LPAE definitions for both these
functions, rather than having ifdefs inside the function body. This
places the functions closer to their associated definitions.
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
The FSR's fault status bits depend on whether LPAE is enabled. Rather
than always exposing both LPAE and non-LPAE to all code, move them
inside the ifdef blocks dependent on LPAE to restrict their visibility.
No code other than fsr_fs() makes use of these.
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Modernise the fault status field definitions by using BIT() and
GENMASK().
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
is_permission_fault() and is_translation_fault() are both conditional
on the FSR encodings, which are dependent on LPAE. We define the
constants in fault.h. Move these inline functions to fault.h to be
near the FSR definitions.
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Split the vmalloc() lazy-page table population from
do_translation_fault() into a new vmalloc_fault() function.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
__do_user_fault() may be called from fault handling paths where the
interrupts are enabled or disabled. E.g. do_page_fault() calls this
with interrupts enabled, whereas do_sect_fault()->do_bad_area()
will call this with interrupts disabled. Since this is a userspace
fault, we know that interrupts were enabled in the parent context,
so call local_irq_enable() here to give a consistent interrupt state.
This is necessary for force_sig_info() when PREEMPT_RT is enabled.
Reported-by: Yadi.hu <yadi.hu@windriver.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Fix incorrect model string in Devicetree for BPI-R4-Pro.
Fixes: f397471a6a8c ("arm64: dts: mediatek: mt7988: Add devicetree for BananaPi R4 Pro")
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
The SEV-SNP IBPB-on-Entry feature does not require a guest-side
implementation. It was added in Zen5 h/w, after the first SNP Zen
implementation, and thus was not accounted for when the initial set of SNP
features were added to the kernel.
In its abundant precaution, commit
8c29f0165405 ("x86/sev: Add SEV-SNP guest feature negotiation support")
included SEV_STATUS' IBPB-on-Entry bit as a reserved bit, thereby masking
guests from using the feature.
Allow guests to make use of IBPB-on-Entry when supported by the hypervisor, as
the bit is now architecturally defined and safe to expose.
Fixes: 8c29f0165405 ("x86/sev: Add SEV-SNP guest feature negotiation support")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@kernel.org
Link: https://patch.msgid.link/20260203222405.4065706-2-kim.phillips@amd.com
|
|
As part of the work to remove the dependency on calling into the decompressor
code (startup_64()) for a UEFI boot, a call to rmpadjust() was removed from
sev_enable() in favor of checking the value of the snp_vmpl variable.
When booting through a non-UEFI path and calling startup_64(), the call to
sev_enable() is performed before the BSS section is zeroed. With the removal
of the rmpadjust() call and the corresponding check of the return code, the
snp_vmpl variable is checked.
Since the kernel is running at VMPL0, the snp_vmpl variable will not have been
set and should be the default value of 0. However, since the call occurs
before the BSS is zeroed, the snp_vmpl variable may not actually be zero,
which will cause the guest boot to fail.
Since the decompressor relocates itself, the BSS would need to be cleared both
before and after the relocation, but this would, in effect, cause all of the
changes to BSS variables before relocation to be lost after relocation.
Instead, move the snp_vmpl variable into the .data section so that it is
initialized and the value made safe during relocation. As a pre-caution
against future changes, move other SEV-related decompressor variables into the
.data section, too.
Fixes: 68a501d7fd82 ("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Tested-by: Kevin Hui <kevinhui@meta.com>
Tested-by: Changyuan Lyu <changyuanl@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/5648b7de5b0a5d0dfef3785f9582b718678c6448.1770217260.git.thomas.lendacky@amd.com
|
|
Pull kvm fixes from Paolo Bonzini:
"Arm:
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature
set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered... [his words, not mine]
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
Generic:
- Remove internal Kconfigs that are now set on all architectures
- Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all
architectures finally enable it in Linux 7.0"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: always define KVM_CAP_SYNC_MMU
KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER
KVM: arm64: Deduplicate ASID retrieval code
irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag
KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs
KVM: arm64: Fix protected mode handling of pages larger than 4kB
KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state()
KVM: arm64: Fix ID register initialization for non-protected pKVM guests
KVM: arm64: Optimise away S1POE handling when not supported by host
KVM: arm64: Hide S1POE from guests when not supported by the host
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix speculative safety in fred_extint()
- Fix __WARN_printf() trap in early_fixup_exception()
- Fix clang-build boot bug for unusual alignments, triggered by
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y
- Replace the final few __ASSEMBLY__ stragglers that snuck in lately
into non-UAPI x86 headers and use __ASSEMBLER__ consistently (again)
* tag 'x86-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Replace __ASSEMBLY__ stragglers with __ASSEMBLER__
x86/cfi: Fix CFI rewrite for odd alignments
x86/bug: Handle __WARN_printf() trap in early_fixup_exception()
x86/fred: Correct speculative safety in fred_extint()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fixes from Ingo Molnar:
- Fix lock ordering bug found by lockdep in perf_event_wakeup()
- Fix uncore counter enumeration on Granite Rapids and Sierra Forest
- Fix perf_mmap() refcount bug found by Syzkaller
- Fix __perf_event_overflow() vs perf_remove_from_context() race
* tag 'perf-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix __perf_event_overflow() vs perf_remove_from_context() race
perf/core: Fix refcount bug and potential UAF in perf_mmap
perf/x86/intel/uncore: Add per-scheduler IMC CAS count events
perf/core: Fix invalid wait context in ctx_sched_in()
|
|
Cross-merge BPF and other fixes after downstream PR.
No conflicts.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix alignment of arm64 JIT buffer to prevent atomic tearing (Fuad
Tabba)
- Fix invariant violation for single value tnums in the verifier
(Harishankar Vishwanathan, Paul Chaignon)
- Fix a bunch of issues found by ASAN in selftests/bpf (Ihor Solodrai)
- Fix race in devmpa and cpumap on PREEMPT_RT (Jiayuan Chen)
- Fix show_fdinfo of kprobe_multi when cookies are not present (Jiri
Olsa)
- Fix race in freeing special fields in BPF maps to prevent memory
leaks (Kumar Kartikeya Dwivedi)
- Fix OOB read in dmabuf_collector (T.J. Mercier)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (36 commits)
selftests/bpf: Avoid simplification of crafted bounds test
selftests/bpf: Test refinement of single-value tnum
bpf: Improve bounds when tnum has a single possible value
bpf: Introduce tnum_step to step through tnum's members
bpf: Fix race in devmap on PREEMPT_RT
bpf: Fix race in cpumap on PREEMPT_RT
selftests/bpf: Add tests for special fields races
bpf: Retire rcu_trace_implies_rcu_gp() from local storage
bpf: Delay freeing fields in local storage
bpf: Lose const-ness of map in map_check_btf()
bpf: Register dtor for freeing special fields
selftests/bpf: Fix OOB read in dmabuf_collector
selftests/bpf: Fix a memory leak in xdp_flowtable test
bpf: Fix stack-out-of-bounds write in devmap
bpf: Fix kprobe_multi cookies access in show_fdinfo callback
bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing
selftests/bpf: Don't override SIGSEGV handler with ASAN
selftests/bpf: Check BPFTOOL env var in detect_bpftool_path()
selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN
selftests/bpf: Fix array bounds warning in jit_disasm_helpers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix guest pfault init to pass a physical address to DIAG 0x258,
restoring pfault interrupts and avoiding vCPU stalls during host
page-in
- Fix kexec/kdump hangs with stack protector by marking
s390_reset_system() __no_stack_protector; set_prefix(0) switches
lowcore and the canary no longer matches
- Fix idle/vtime cputime accounting (idle-exit ordering, vtimer
double-forwarding) and small cleanups
* tag 's390-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pfault: Fix virtual vs physical address confusion
s390/kexec: Disable stack protector in s390_reset_system()
s390/idle: Remove psw_idle() prototype
s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON()
s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE()
s390/irq/idle: Remove psw bits early
s390/idle: Inline update_timer_idle()
s390/idle: Slightly optimize idle time accounting
s390/idle: Add comment for non obvious code
s390/vtime: Fix virtual timer forwarding
s390/idle: Fix cpu idle exit cpu time accounting
|
|
Add required dt node for cmu_g3d block, which provides
clocks for G3D IP
Signed-off-by: Raghav Sharma <raghav.s@samsung.com>
Link: https://patch.msgid.link/20260202103555.2089376-4-raghav.s@samsung.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Most rails are the same between Pixel 6 and Pro, with the following
differences:
* only Pro has UWB
* Pro uses l2m, not l14m, for TCXO
* Pro uses bucka, not l31m, for NFC
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20260210-s2mpg1x-regulators-dts-v2-1-68783c9e0a32@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.0, take #1
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered...
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
|
|
KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always
available. Move the definition from individual architectures to common
code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
All architectures now use MMU notifier for KVM page table management.
Remove the Kconfig symbol and the code that is used when it is
disabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add support for LVDS controller node
Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Link: https://lore.kernel.org/r/20260225085430.480052-4-manikandan.m@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
|
|
Add support for LCD controller node
Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Link: https://lore.kernel.org/r/20260225085430.480052-2-manikandan.m@microchip.com
[claudiu.beznea: add a space b/w the node address and the next '{']
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
|
|
Add config support to enable LVDS serializer
Signed-off-by: Aubin Constans <aubin.constans@microchip.com>
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Link: https://lore.kernel.org/r/20260225085430.480052-7-manikandan.m@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
|
|
Add config support to enable maxtouch capacitive touchscreen
Signed-off-by: Romain Sioen <romain.sioen@microchip.com>
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Link: https://lore.kernel.org/r/20260225085430.480052-6-manikandan.m@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
|
|
Add configs for DRM Atmel LCD Controller, Backlight and Simple Panel
Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Link: https://lore.kernel.org/r/20260225085430.480052-5-manikandan.m@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
|
|
IBS OP on future hardware can indicate data source from remote socket
as well. Advertise this capability to userspace so that userspace tools
can decode IBS data accordingly.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260216042530.1546-8-ravi.bangoria@amd.com
|
|
IBS OP on future hardware supports recording samples only for instructions
that does streaming store. Like the existing IBS filters, samples pointing
to instruction which does not cause streaming store are discarded and IBS
restarts internally.
Example:
$ perf record -e ibs_op/strmst=1/ -- <workload>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260216042530.1546-7-ravi.bangoria@amd.com
|
|
IBS on future hardware adds the ability to filter IBS events by examining
RIP bit 63. Because Linux kernel addresses always have bit 63 set while
user-space addresses never do, this capability can be used as a privilege
filter.
So far, IBS supports privilege filtering in software (swfilt=1), where
samples are dropped in the NMI handler. The RIP bit63 hardware filter
enables IBS to be usable by unprivileged users without passing swfilt
flag. So, swfilt flag will silently be ignored when the hardware
filtering capability is present.
Example (non-root user):
$ perf record -e ibs_op//u -- <workload>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260216042530.1546-6-ravi.bangoria@amd.com
|
|
IBS Fetch on future hardware adds fetch latency filtering which
generates interrupt only when FetchLat value exceeds a programmable
threshold.
Hardware allows threshold in 128-cycle increment (i.e. 128, 256, 384
etc.) from 128 to 1920 cycles. Like the existing IBS filters, samples
that fail the latency test are dropped and IBS restarts internally.
Since hardware supports threshold in multiple of 128, add a software
filter on top to support latency threshold with the granularity of 1
cycle in between [128-1920].
Example:
# perf record -e ibs_fetch/fetchlat=128/ -c 10000 -a -- sleep 5
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260216042530.1546-5-ravi.bangoria@amd.com
|
|
The existing IBS_{FETCH|OP}_CTL MSRs combine control and status bits
which leads to RMW race between HW and SW:
HW SW
------------------------ ------------------------------
config = rdmsr(IBS_OP_CTL);
config &= ~EN;
Set IBS_OP_CTL[Val] to 1
trigger NMI
wrmsr(IBS_OP_CTL, config);
// Val is accidentally cleared
Future hardware adds a control-only MSR, IBS_{FETCH|OP}_CTL2, which
provides a second-level "disable" bit (Dis). IBS is now:
Enabled: IBS_{FETCH|OP}_CTL[En] = 1 && IBS_{FETCH|OP}_CTL2[Dis] = 0
Disabled: IBS_{FETCH|OP}_CTL[En] = 0 || IBS_{FETCH|OP}_CTL2[Dis] = 1
The separate "Dis" bit lets software disable IBS without touching any
status fields, eliminating the hardware/software race.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260216042530.1546-4-ravi.bangoria@amd.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The diffstat is dominated by changes to our TLB invalidation errata
handling and the introduction of a new GCS selftest to catch one of
the issues that is fixed here relating to PROT_NONE mappings.
- Fix cpufreq warning due to attempting a cross-call with interrupts
masked when reading local AMU counters
- Fix DEBUG_PREEMPT warning from the delay loop when it tries to
access per-cpu errata workaround state for the virtual counter
- Re-jig and optimise our TLB invalidation errata workarounds in
preparation for more hardware brokenness
- Fix GCS mappings to interact properly with PROT_NONE and to avoid
corrupting the pte on CPUs with FEAT_LPA2
- Fix ioremap_prot() to extract only the memory attributes from the
user pte and ignore all the other 'prot' bits"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: topology: Fix false warning in counters_read_on_cpu() for same-CPU reads
arm64: Fix sampling the "stable" virtual counter in preemptible section
arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI
arm64: tlb: Allow XZR argument to TLBI ops
kselftest: arm64: Check access to GCS after mprotect(PROT_NONE)
arm64: gcs: Honour mprotect(PROT_NONE) on shadow stack mappings
arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled
arm64: io: Extract user memory type in ioremap_prot()
arm64: io: Rename ioremap_prot() to __ioremap_prot()
|
|
IBS on upcoming microarch introduced two new control MSRs and couple of
new features. Define macros for them.
New capabilities:
o IBS_CAPS_DIS: Alternate Fetch and Op IBS disable bits
o IBS_CAPS_FETCHLAT: Fetch Latency filter
o IBS_CAPS_BIT63_FILTER: Virtual address bit 63 based filters for Fetch
and Op
o IBS_CAPS_STRMST_RMTSOCKET: Streaming store filter and indicator,
remote socket indicator
New control MSRs for above features:
o MSR_AMD64_IBSFETCHCTL2
o MSR_AMD64_IBSOPCTL2
Also do cosmetic alignment changes.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260216042530.1546-3-ravi.bangoria@amd.com
|
|
Load latency filter threshold is encoded in config1[11:0]. Define a mask
for it instead of hardcoded 0xFFF. Unlike "config" fields whose layout
maps to PERF_{FETCH|OP}_CTL MSR, layout of "config1" is custom defined
so a new set of macros are needed for "config1" fields.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260216042530.1546-2-ravi.bangoria@amd.com
|
|
Consider the following race:
--------
o OP_CTL contains stale value: OP_CTL[Val]=1, OP_CTL[En]=0
o A new IBS OP event is being added
o [P]: Process context, [N]: NMI context
[P] perf_ibs_add(event) {
[P] if (test_and_set_bit(IBS_ENABLED, pcpu->state))
[P] return;
[P] /* pcpu->state = IBS_ENABLED */
[P]
[P] pcpu->event = event;
[P]
[P] perf_ibs_start(event) {
[P] set_bit(IBS_STARTED, pcpu->state);
[P] /* pcpu->state = IBS_ENABLED | IBS_STARTED */
[P] clear_bit(IBS_STOPPING, pcpu->state);
[P] /* pcpu->state = IBS_ENABLED | IBS_STARTED */
[N] --> NMI due to genuine FETCH event. perf_ibs_handle_irq()
[N] called for OP PMU as well.
[N]
[N] perf_ibs_handle_irq(perf_ibs) {
[N] event = pcpu->event; /* See line 6 */
[N]
[N] if (!test_bit(IBS_STARTED, pcpu->state)) /* false */
[N] return 0;
[N]
[N] if (WARN_ON_ONCE(!event)) /* false */
[N] goto fail;
[N]
[N] if (!(*buf++ & perf_ibs->valid_mask)) /* false due to stale
[N] * IBS_OP_CTL value */
[N] goto fail;
[N]
[N] ...
[N]
[N] perf_ibs_enable_event() // *Accidentally* enable the event.
[N] }
[N]
[N] /*
[N] * Repeated NMIs may follow due to accidentally enabled IBS OP
[N] * event if the sample period is very low. It could also lead
[N] * to pcpu->state corruption if the event gets throttled due
[N] * to too frequent NMIs.
[N] */
[P] perf_ibs_enable_event();
[P] }
[P] }
--------
We cannot safely clear IBS_{FETCH|OP}_CTL while disabling the event,
because the register might be read again later. So, clear the register
in the enable path - before we update pcpu->state and enable the event.
This guarantees that any NMI that lands in the gap finds Val=0 and
bails out cleanly.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20260216042216.1440-6-ravi.bangoria@amd.com
|
|
Calling perf_allow_kernel() from the NMI context is unsafe and could be
fatal. Capture the permission at event-initialization time by storing it
in event->hw.flags, and have the NMI handler rely on that cached flag
instead of making the call directly.
Fixes: 50a53b60e141d ("perf/amd/ibs: Prevent leaking sensitive data to userspace")
Reported-by: Sadasivan Shaiju <sadasivan.shaiju2@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20260216042216.1440-5-ravi.bangoria@amd.com
|
|
Commit 50a53b60e141 ("perf/amd/ibs: Prevent leaking sensitive data to
userspace") zeroed the physical address and also cleared the PhyAddrVal
flag before copying the value into a perf sample to avoid exposing
physical addresses to unprivileged users.
Clearing PhyAddrVal, however, has an unintended side-effect: several
other IBS fields are considered valid only when this bit is set. As a
result, those otherwise correct fields are discarded, reducing IBS
functionality.
Continue to zero the physical address, but keep the PhyAddrVal bit
intact so the related fields remain usable while still preventing any
address leak.
Fixes: 50a53b60e141 ("perf/amd/ibs: Prevent leaking sensitive data to userspace")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20260216042216.1440-4-ravi.bangoria@amd.com
|
|
The ldlat dependency on l3missonly is specific to Zen 5; newer generations
are not affected. This quirk is documented as an erratum in the following
Revision Guide.
Erratum: 1606 IBS (Instruction Based Sampling) OP Load Latency Filtering
May Capture Unwanted Samples When L3Miss Filtering is Disabled
Revision Guide for AMD Family 1Ah Models 00h-0Fh Processors,
Pub. 58251 Rev. 1.30 July 2025
https://bugzilla.kernel.org/attachment.cgi?id=309193
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20260216042216.1440-3-ravi.bangoria@amd.com
|
|
Add interrupt throttling accounting for below cases:
o IBS Op PMU: A software filter (in addition to the hardware filter)
drops samples whose load latency is below the user-specified
threshold.
o IBS Fetch PMU: Samples discarded due to the zero-RIP erratum (#1197).
Although these samples are discarded, the NMI cost is still incurred, so
they should be counted for interrupt throttling.
Fixes: 26db2e0c51fe83e1dd852c1321407835b481806e ("perf/x86/amd/ibs: Work around erratum #1197")
Fixes: d20610c19b4a22bc69085b7eb7a02741d51de30e ("perf/amd/ibs: Add support for OP Load Latency Filtering")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20260216042216.1440-2-ravi.bangoria@amd.com
|
|
The TSC deadline timer is directly coupled to the TSC and setting the next
deadline is tedious as the clockevents core code converts the
CLOCK_MONOTONIC based absolute expiry time to a relative expiry by reading
the current time from the TSC. It converts that delta to cycles and hands
the result to lapic_next_deadline(), which then has read to the TSC and add
the delta to program the timer.
The core code now supports coupled clock event devices and can provide the
expiry time in TSC cycles directly without reading the TSC at all.
This obviouly works only when the TSC is the current clocksource, but
that's the default for all modern CPUs which implement the TSC deadline
timer. If the TSC is not the current clocksource (e.g. early boot) then the
core code falls back to the relative set_next_event() callback as before.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.076565985@kernel.org
|
|
XEN PV does not emulate the TSC deadline timer, so the PVOPS indirection
for writing the deadline MSR can be avoided completely.
Use native_wrmsrq() instead.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.877429827@kernel.org
|
|
lapic_next_deadline() contains a fence before the TSC read and the write to
the TSC_DEADLINE MSR with a content free and therefore useless comment:
/* This MSR is special and need a special fence: */
The MSR is not really special. It is just not a serializing MSR, but that
does not matter at all in this context as all of these operations are
strictly CPU local.
The only thing the fence prevents is that the RDTSC is speculated ahead,
but that's not really relevant as the delta is calculated way before based
on a previous TSC read and therefore inaccurate by definition.
So removing the fence is just making it slightly more inaccurate in the
worst case, but that is irrelevant as it's way below the actual system
immanent latencies and variations.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.809059527@kernel.org
|
|
Avoid the overhead of the indirect call for a single instruction to read
the TSC.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.741886362@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
"Two DMA-mapping fixes for the recently merged API rework (Jiri Pirko
and Stian Halseth)"
* tag 'dma-mapping-7.0-2026-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
sparc: Fix page alignment in dma mapping
dma-mapping: avoid random addr value print out on error path
|
|
The firmware trustzone needs a special call to bring up the secondary
cpu core on the Manta board. This seems to be not needed on other
exynos5 boards and comes down to the available firmware on
a particular board.
Signed-off-by: Alexandre Marquet <tb@a-marquet.fr>
Signed-off-by: Lukas Timmermann <linux@timmermann.space>
Reviewed-by: Henrik Grimler <henrik@grimler.se>
Link: https://patch.msgid.link/20260127-lat3st-staging-v4-3-797469aaaf9d@timmermann.space
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Manta is the code name for Google Nexus 10, and was manufactured by
Samsung with their Exynos5250 SoC.
Add an initial device-tree file for this board.
Signed-off-by: Alexandre Marquet <tb@a-marquet.fr>
Co-developed-by: Lukas Timmermann <linux@timmermann.space>
Signed-off-by: Lukas Timmermann <linux@timmermann.space>
Link: https://patch.msgid.link/20260127-lat3st-staging-v4-2-797469aaaf9d@timmermann.space
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
struct bpf_plt contains a u64 target field. Currently, the BPF JIT
allocator requests an alignment of 4 bytes (sizeof(u32)) for the JIT
buffer.
Because the base address of the JIT buffer can be 4-byte aligned (e.g.,
ending in 0x4 or 0xc), the relative padding logic in build_plt() fails
to ensure that target lands on an 8-byte boundary.
This leads to two issues:
1. UBSAN reports misaligned-access warnings when dereferencing the
structure.
2. More critically, target is updated concurrently via WRITE_ONCE() in
bpf_arch_text_poke() while the JIT'd code executes ldr. On arm64,
64-bit loads/stores are only guaranteed to be single-copy atomic if
they are 64-bit aligned. A misaligned target risks a torn read,
causing the JIT to jump to a corrupted address.
Fix this by increasing the allocation alignment requirement to 8 bytes
(sizeof(u64)) in bpf_jit_binary_pack_alloc(). This anchors the base of
the JIT buffer to an 8-byte boundary, allowing the relative padding math
in build_plt() to correctly align the target field.
Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20260226075525.233321-1-tabba@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit 3e86e4d74c04 ("kbuild: keep .modinfo section in
vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it
from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and
ELF_DETAILS was present in all architecture specific vmlinux linker
scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and
COMMON_DISCARDS may be used by other linker scripts, such as the s390
and x86 compressed boot images, which may not expect to have a .modinfo
section. In certain circumstances, this could result in a bootloader
failing to load the compressed kernel [1].
Commit ddc6cbef3ef1 ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with
SecureBoot trailer") recently addressed this for the s390 bzImage but
the same bug remains for arm, parisc, and x86. The presence of .modinfo
in the x86 bzImage was the root cause of the issue worked around with
commit d50f21091358 ("kbuild: align modinfo section for Secureboot
Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes
lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its
MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition.
Split .modinfo out from ELF_DETAILS into its own macro and handle it in
all vmlinux linker scripts. Discard .modinfo in the places where it was
previously being discarded from being in COMMON_DISCARDS, as it has
never been necessary in those uses.
Cc: stable@vger.kernel.org
Fixes: 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped")
Reported-by: Ed W <lists@wildgooses.com>
Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1]
Tested-by: Ed W <lists@wildgooses.com> # x86_64
Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
|
|
The counters_read_on_cpu() function warns when called with IRQs
disabled to prevent deadlock in smp_call_function_single(). However,
this warning is spurious when reading counters on the current CPU,
since no IPI is needed for same CPU reads.
Commit 12eb8f4fff24 ("cpufreq: CPPC: Update FIE arch_freq_scale in
ticks for non-PCC regs") changed the CPPC Frequency Invariance Engine
to read AMU counters directly from the scheduler tick for non-PCC
register spaces (like FFH), instead of deferring to a kthread. This
means counters_read_on_cpu() is now called with IRQs disabled from the
tick handler, triggering the warning.
Fix this by restructuring the logic: when IRQs are disabled (tick
context), call the function directly for same-CPU reads. Otherwise
use smp_call_function_single().
Fixes: 997c021abc6e ("cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Ben reports that when running with CONFIG_DEBUG_PREEMPT, using
__arch_counter_get_cntvct_stable() results in well deserves warnings,
as we access a per-CPU variable without preemption disabled.
Fix the issue by disabling preemption on reading the counter. We can
probably do a lot better by not disabling preemption on systems that
do not require horrible workarounds to return a valid counter value,
but this plugs the issue for the time being.
Fixes: 29cc0f3aa7c6 ("arm64: Force the use of CNTVCT_EL0 in __delay()")
Reported-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/aZw3EGs4rbQvbAzV@e134344.arm.com
Tested-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: André Draszik <andre.draszik@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
__READ_ONCE() with CONFIG_LTO=y
When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
could see:
| kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
| 982 | }
| | ^
| kernel/futex/core.c:976:2: note: spinlock acquired here
| 976 | spin_lock(lock_ptr);
| | ^
| kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
| 982 | }
| | ^
| kernel/futex/core.c:966:6: note: spinlock acquired here
| 966 | void futex_q_lockptr_lock(struct futex_q *q)
| | ^
| 2 warnings generated.
Where we have:
extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
..
void futex_q_lockptr_lock(struct futex_q *q)
{
spinlock_t *lock_ptr;
/*
* See futex_unqueue() why lock_ptr can change.
*/
guard(rcu)();
retry:
>> lock_ptr = READ_ONCE(q->lock_ptr);
spin_lock(lock_ptr);
...
}
At the time of the above report (prior to removal of the 'atomic' flag),
Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
'atomic ? __u.__val : q->lock_ptr' (now just '__u.__val'), and used
this as the identity of the context lock given it cannot "see through"
the inline assembly; however, we want 'q->lock_ptr' as the canonical
context lock.
While for code generation the compiler simplified to '__u.__val' for
pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
much earlier on the AST, and (b) would be the wrong deduction.
Now that we've gotten rid of the 'atomic' ternary comparison, we can
return '__u.__val' through a pointer that we initialize with '&x', but
then update via a pointer-to-pointer. When READ_ONCE()'ing a context
lock pointer, TSA's alias analysis does not invalidate the initial alias
when updated through the pointer-to-pointer, and we make it effectively
"see through" the __READ_ONCE().
Code generation is unchanged.
Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Boqun Feng <boqun@kernel.org>
Reviewed-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
|