Age | Commit message (Collapse) | Author | Files | Lines |
|
Unlike L3 and DF counters, UMC counters (PERF_CTRs) set the Overflow bit
(bit 48) and saturate on overflow. A subsequent pmu->read() of the event
reports an incorrect accumulated count as there is no difference between
the previous and the current values of the counter.
To avoid this, inspect the current counter value and proactively reset
the corresponding PERF_CTR register on every pmu->read(). Combined with
the periodic reads initiated by the hrtimer, the counters never get a
chance saturate but the resolution reduces to 47 bits.
Fixes: 25e56847821f ("perf/x86/amd/uncore: Add memory controller support")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/dee9c8af2c6d66814cf4c6224529c144c620cf2c.1744906694.git.sandipan.das@amd.com
|
|
Introduce a module parameter for configuring the hrtimer duration in
milliseconds. The default duration is 60000 milliseconds and the intent
is to allow users to customize it to suit jitter tolerances. It should
be noted that a longer duration will reduce jitter but affect accuracy
if the programmed events cause the counters to overflow multiple times
in a single interval.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/6cb0101da74955fa9c8361f168ffdf481ae8a200.1744906694.git.sandipan.das@amd.com
|
|
Uncore counters do not provide mechanisms like interrupts to report
overflows and the accumulated user-visible count is incorrect if there
is more than one overflow between two successive read requests for the
same event because the value of prev_count goes out-of-date for
calculating the correct delta.
To avoid this, start a hrtimer to periodically initiate a pmu->read() of
the active counters for keeping prev_count up-to-date. It should be
noted that the hrtimer duration should be lesser than the shortest time
it takes for a counter to overflow for this approach to be effective.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/8ecf5fe20452da1cd19cf3ff4954d3e7c5137468.1744906694.git.sandipan.das@amd.com
|
|
hrtimer handlers can be deferred to softirq context and affect timely
detection of counter overflows. Hence switch to HRTIMER_MODE_HARD.
Disabling and re-enabling IRQs in the hrtimer handler is not required
as pmu->start() and pmu->stop() can no longer intervene while updating
event->hw.prev_count.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/0ad4698465077225769e8edd5b2c7e8f48f636d5.1744906694.git.sandipan.das@amd.com
|
|
Fixes: d6389d3ccc13 ("perf/x86/amd/uncore: Refactor uncore management")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/30f9254c2de6c4318dd0809ef85a1677f68eef10.1744906694.git.sandipan.das@amd.com
|
|
Rename rep_nop() function to what it really does.
No functional change intended.
Suggested-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250418080805.83679-1-ubizjak@gmail.com
|
|
Current minimum required version of binutils is 2.25,
which supports PAUSE instruction mnemonic.
Replace "REP; NOP" with this proper mnemonic.
No functional change intended.
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250418080805.83679-2-ubizjak@gmail.com
|
|
Minimum version of binutils required to compile the kernel is 2.25.
This version correctly handles the "rep" prefixes, so it is possible
to remove the semicolon, which was used to support ancient versions
of GNU as.
Due to the semicolon, the compiler considers "rep; insn" (or its
alternate "rep\n\tinsn" form) as two separate instructions. Removing
the semicolon makes asm length calculations more accurate, consequently
making scheduling and inlining decisions of the compiler more accurate.
Removing the semicolon also enables assembler checks involving "rep"
prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in:
Error: invalid instruction `add' after `rep'
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250418071437.4144391-2-ubizjak@gmail.com
|
|
Minimum version of binutils required to compile the kernel is 2.25.
This version correctly handles the "rep" prefixes, so it is possible
to remove the semicolon, which was used to support ancient versions
of GNU as.
Due to the semicolon, the compiler considers "rep; insn" (or its
alternate "rep\n\tinsn" form) as two separate instructions. Removing
the semicolon makes asm length calculations more accurate, consequently
making scheduling and inlining decisions of the compiler more accurate.
Removing the semicolon also enables assembler checks involving "rep"
prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in:
Error: invalid instruction `add' after `rep'
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Mares <mj@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250418071437.4144391-1-ubizjak@gmail.com
|
|
Add support to emulate all NOP instructions as the original uprobe
instruction.
This change speeds up uprobe on top of all NOP instructions and is a
preparation for usdt probe optimization, that will be done on top of
NOP5 instructions.
With this change the usdt probe on top of NOP5s won't take the performance
hit compared to usdt probe on top of standard NOP instructions.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Hao Luo <haoluo@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20250414083647.1234007-1-jolsa@kernel.org
|
|
CE4100 PCI specific code has no 'pci' suffix in the filename,
intel_mid_pci.c is the only one that duplicates the folder name in its
filename, drop that redundancy.
While at it, group the respective modules in the Makefile.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20250407070321.3761063-1-andriy.shevchenko@linux.intel.com
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc3).
No conflicts. Adjacent changes:
tools/net/ynl/pyynl/ynl_gen_c.py
4d07bbf2d456 ("tools: ynl-gen: don't declare loop iterator in place")
7e8ba0c7de2b ("tools: ynl: don't use genlmsghdr in classic netlink")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All the users of SHARED_KERNEL_PMD are gone. Zap it.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173244.1125BEC3%40davehans-spike.ostc.intel.com
|
|
MAX_PREALLOCATED_PMDS and PREALLOCATED_PMDS are now identical. Just
use PREALLOCATED_PMDS and remove "MAX".
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173242.5ED13A5B%40davehans-spike.ostc.intel.com
|
|
Finally, move away from having PAE kernels share any PMDs across
processes.
This was already the default on PTI kernels which are the common
case.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173241.1288CAB4%40davehans-spike.ostc.intel.com
|
|
The "paravirt environment" is no longer in the tree. Axe that part of the
comment. Also add a blurb to remind readers that "USER_PMDS" refer to
the PTI user *copy* of the page tables, not the user *portion*.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173240.5B1AB322%40davehans-spike.ostc.intel.com
|
|
There are a few too many levels of abstraction here.
First, just expand the PREALLOCATED_PMDS macro in place to make it
clear that it is only conditional on PTI.
Second, MAX_PREALLOCATED_PMDS is only used in one spot for an
on-stack allocation. It has a *maximum* value of 4. Do not bother
with the macro MAX() magic. Just set it to 4.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173238.6E3CDA56%40davehans-spike.ostc.intel.com
|
|
Each mm_struct has its own copy of the page tables. When core mm code
makes changes to a copy of the page tables those changes sometimes
need to be synchronized with other mms' copies of the page tables. But
when this synchronization actually needs to happen is highly
architecture and configuration specific.
In cases where kernel PMDs are shared across processes
(SHARED_KERNEL_PMD) the core mm does not itself need to do that
synchronization for kernel PMD changes. The x86 code communicates
this by clearing the PGTBL_PMD_MODIFIED bit cleared in those
configs to avoid expensive synchronization.
The kernel is moving toward never sharing kernel PMDs on 32-bit.
Prepare for that and make 32-bit PAE always set PGTBL_PMD_MODIFIED,
even if there is no modification to synchronize. This obviously adds
some synchronization overhead in cases where the kernel page tables
are being changed.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173237.EC790E95%40davehans-spike.ostc.intel.com
|
|
Kernel PMDs can either be shared across processes or private to a
process. On 64-bit, they are always shared. 32-bit non-PAE hardware
does not have PMDs, but the kernel logically squishes them into the
PGD and treats them as private. Here are the four cases:
64-bit: Shared
32-bit: non-PAE: Private
32-bit: PAE+ PTI: Private
32-bit: PAE+noPTI: Shared
Note that 32-bit is all "Private" except for PAE+noPTI being an
oddball. The 32-bit+PAE+noPTI case will be made like the rest of
32-bit shortly.
But until that can be done, temporarily treat the 32-bit+PAE+noPTI
case as Private. This will do unnecessary walks across pgd_list and
unnecessary PTE setting but should be otherwise harmless.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173235.F63F50D1%40davehans-spike.ostc.intel.com
|
|
A hardware PAE PGD is only 32 bytes. A PGD is PAGE_SIZE in the other
paging modes. But for reasons*, the kernel _sometimes_ allocates a
whole page even though it only ever uses 32 bytes.
Make PAE less weird. Just allocate a page like the other paging modes.
This was already being done for PTI (and Xen in the past) and nobody
screamed that loudly about it so it can't be that bad.
* The original reason for PAGE_SIZE allocations for the PAE PGDs was
Xen's need to detect page table writes. But 32-bit PTI forced it too
for reasons I'm unclear about.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173234.D34F0C3E%40davehans-spike.ostc.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"Just a single fix for the Xen multicall driver avoiding a percpu
variable referencing initdata by its initializer"
* tag 'for-linus-6.15a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: fix multicall debug feature
|
|
Extend the aperture calculation to consider sizes beyond the maximum
size of a region third table. Attempt to always use the smallest
table size possible to avoid unnecessary extra steps during translation.
Update reserved region calculations to use the appropriate table size.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20250411202433.181683-6-mjrosato@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The origin_type of the dma_table is used to determine how many table
levels must be traversed for the translation.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20250411202433.181683-4-mjrosato@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The third argument of strscpy() is optional and can be left away iff
the destination is an array and the maximum size of the copy is the
size of destination.
Remove the third argument for those cases where this is possible.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Rename strncpy_skip_quote() to strscpy_skip_quote() and change its
implementation so that the destination string is always NUL terminated.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
There are hardly any strncpy() users left, therefore drop the
optimized s390 variant.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The CONFIG_DEBUG_VM=y warning in switch_mm_irqs_off() started
triggering in testing:
VM_WARN_ON_ONCE(prev != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(prev)));
AFAIU what happens is that unuse_temporary_mm() clears the mm_cpumask()
for the current CPU, while switch_mm_irqs_off() then checks that the
mm_cpumask() bit is set for the current CPU.
While this behaviour hasn't really changed since the following commit:
209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")
introduced both, but the warning is wrong, so remove it.
[ mingo: Patchified Peter's email. ]
Reported-by: syzbot+c2537ce72a879a38113e@syzkaller.appspotmail.com
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/20250414135629.GA17910@noisy.programming.kicks-ass.net
|
|
Arch-PEBS retires IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG MSRs, so
intel_pmu_pebs_enable/disable() and intel_pmu_pebs_enable/disable_all()
are not needed to call for ach-PEBS.
To make the code cleaner, introduce static calls
x86_pmu_pebs_enable/disable() and x86_pmu_pebs_enable/disable_all()
instead of adding "x86_pmu.arch_pebs" check directly in these helpers.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-7-dapeng1.mi@linux.intel.com
|
|
Since architectural PEBS would be introduced in subsequent patches,
rename x86_pmu.pebs to x86_pmu.ds_pebs for distinguishing with the
upcoming architectural PEBS.
Besides restrict reserve_ds_buffers() helper to work only for the
legacy DS based PEBS and avoid it to corrupt the pebs_active flag and
release PEBS buffer incorrectly for arch-PEBS since the later patch
would reuse these flags and alloc/release_pebs_buffer() helpers for
arch-PEBS.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-6-dapeng1.mi@linux.intel.com
|
|
Move x86_pmu.bts flag initialization into bts_init() from
intel_ds_init() and rename intel_ds_init() to intel_pebs_init() since it
fully initializes PEBS now after removing the x86_pmu.bts
initialization.
It's safe to move x86_pmu.bts into bts_init() since all x86_pmu.bts flag
are called after bts_init() execution.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-5-dapeng1.mi@linux.intel.com
|
|
CPUID archPerfmonExt (0x23) leaves are supported to enumerate CPU
level's PMU capabilities on non-hybrid processors as well.
This patch supports to parse archPerfmonExt leaves on non-hybrid
processors. Architectural PEBS leverages archPerfmonExt sub-leaves 0x4
and 0x5 to enumerate the PEBS capabilities as well. This patch is a
precursor of the subsequent arch-PEBS enabling patches.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-4-dapeng1.mi@linux.intel.com
|
|
From the PMU's perspective, Clearwater Forest is similar to the previous
generation Sierra Forest.
The key differences are the ARCH PEBS feature and the new added 3 fixed
counters for topdown L1 metrics events.
The ARCH PEBS is supported in the following patches. This patch provides
support for basic perfmon features and 3 new added fixed counters.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-3-dapeng1.mi@linux.intel.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
From PMU's perspective, Panther Lake is similar to the previous
generation Lunar Lake. Both are hybrid platforms, with e-core and
p-core.
The key differences are the ARCH PEBS feature and several new events.
The ARCH PEBS is supported in the following patches.
The new events will be supported later in perf tool.
Share the code path with the Lunar Lake. Only update the name.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-2-dapeng1.mi@linux.intel.com
|
|
Currently when a user samples user space GPRs (--user-regs option) with
PEBS, the user space GPRs actually always come from software PMI
instead of from PEBS hardware. This leads to the sampled GPRs to
possibly be inaccurate for single PEBS record case because of the
skid between counter overflow and GPRs sampling on PMI.
For the large PEBS case, it is even worse. If user sets the
exclude_kernel attribute, large PEBS would be used to sample user space
GPRs, but since PEBS GPRs group is not really enabled, it leads to all
samples in the large PEBS record to share the same piece of user space
GPRs, like this reproducer shows:
$ perf record -e branches:pu --user-regs=ip,ax -c 100000 ./foo
$ perf report -D | grep "AX"
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
So enable GPRs group for user space GPRs sampling and prioritize reading
GPRs from PEBS. If the PEBS sampled GPRs is not user space GPRs (single
PEBS record case), perf_sample_regs_user() modifies them to user space
GPRs.
[ mingo: Clarified the changelog. ]
Fixes: c22497f5838c ("perf/x86/intel: Support adaptive PEBS v4")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250415104135.318169-2-dapeng1.mi@linux.intel.com
|
|
The below code would always unconditionally clear other status bits like
perf metrics overflow bit once PEBS buffer overflows:
status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
This is incorrect. Perf metrics overflow bit should be cleared only when
fixed counter 3 in PEBS counter group. Otherwise perf metrics overflow
could be missed to handle.
Closes: https://lore.kernel.org/all/20250225110012.GK31462@noisy.programming.kicks-ass.net/
Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250415104135.318169-1-dapeng1.mi@linux.intel.com
|
|
The scale of IIO bandwidth in free running counters is inherited from
the ICX. The counter increments for every 32 bytes rather than 4 bytes.
The IIO bandwidth out free running counters don't increment with a
consistent size. The increment depends on the requested size. It's
impossible to find a fixed increment. Remove it from the event_descs.
Fixes: 0378c93a92e2 ("perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server")
Reported-by: Tang Jun <dukang.tj@alibaba-inc.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250416142426.3933977-3-kan.liang@linux.intel.com
|
|
There was a mistake in the ICX uncore spec too. The counter increments
for every 32 bytes rather than 4 bytes.
The same as SNR, there are 1 ioclk and 8 IIO bandwidth in free running
counters. Reuse the snr_uncore_iio_freerunning_events().
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support")
Reported-by: Tang Jun <dukang.tj@alibaba-inc.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250416142426.3933977-2-kan.liang@linux.intel.com
|
|
There was a mistake in the SNR uncore spec. The counter increments for
every 32 bytes of data sent from the IO agent to the SOC, not 4 bytes
which was documented in the spec.
The event list has been updated:
"EventName": "UNC_IIO_BANDWIDTH_IN.PART0_FREERUN",
"BriefDescription": "Free running counter that increments for every 32
bytes of data sent from the IO agent to the SOC",
Update the scale of the IIO bandwidth in free running counters as well.
Fixes: 210cc5f9db7a ("perf/x86/intel/uncore: Add uncore support for Snow Ridge server")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250416142426.3933977-1-kan.liang@linux.intel.com
|
|
When building with CONFIG_LTO_CLANG, there is an error in the x86 boot
startup code because it builds with a different code model than the rest
of the kernel:
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values: 'i32 2' from vmlinux.a(head64.o at 1302448), and 'i32 1' from vmlinux.a(map_kernel.o at 1314208)
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values: 'i32 2' from vmlinux.a(common.o at 1306108), and 'i32 1' from vmlinux.a(gdt_idt.o at 1314148)
As this directory is for code that only runs during early system
initialization, LTO is not very important, so filter out the LTO flags
from KBUILD_CFLAGS for arch/x86/boot/startup to resolve the build error.
Fixes: 4cecebf200ef ("x86/boot: Move the early GDT/IDT setup code into startup/")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20250414-x86-boot-startup-lto-error-v1-1-7c8bed7c131c@kernel.org
Closes: https://lore.kernel.org/CA+G9fYvnun+bhYgtt425LWxzOmj+8Jf3ruKeYxQSx-F6U7aisg@mail.gmail.com/
|
|
The RTAS call ibm,physical-attestation is used to retrieve
information about the trusted boot state of the firmware and
hypervisor on the system, and also Trusted Platform Modules (TPM)
data if the system is TCG 2.0 compliant.
This RTAS interface expects the caller to define different command
structs such as RetrieveTPMLog, RetrievePlatformCertificat and etc,
in a work area with a maximum size of 4K bytes and the response
buffer will be returned in the same work area.
The current implementation of this RTAS function is in the user
space but allocation of the work area is restricted with the system
lockdown. So this patch implements this RTAS function in the kernel
and expose to the user space with open/ioctl/read interfaces.
PAPR (2.13+ 21.3 ibm,physical-attestation) defines RTAS function:
- Pass the command struct to obtain the response buffer for the
specific command.
- This RTAS function is sequence RTAS call and has to issue RTAS
call multiple times to get the complete response buffer (max 64K).
The hypervisor expects the first RTAS call with the sequence 1 and
the subsequent calls with the sequence number returned from the
previous calls.
Expose these interfaces to user space with a
/dev/papr-physical-attestation character device using the following
programming model:
int devfd = open("/dev/papr-physical-attestation");
int fd = ioctl(devfd, PAPR_PHY_ATTEST_IOC_HANDLE,
struct papr_phy_attest_io_block);
- The user space defines the command struct and requests the
response for any command.
- Obtain the complete response buffer and returned the buffer as
blob to the command specific FD.
size = read(fd, buf, len);
- Can retrieve the response buffer once or multiple times until the
end of BLOB buffer.
Implemented this new kernel ABI support in librtas library for
system lockdown
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-8-haren@linux.ibm.com
|
|
ibm,platform-dump RTAS call in combination with writable mapping
/dev/mem is issued to collect platform dump from the hypervisor
and may need multiple calls to get the complete dump. The current
implementation uses rtas_platform_dump() API provided by librtas
library to issue these RTAS calls. But /dev/mem access by the
user space is prohibited under system lockdown.
The solution should be to restrict access to RTAS function in user
space and provide kernel interfaces to collect dump. This patch
adds papr-platform-dump character driver and expose standard
interfaces such as open / ioctl/ read to user space in ways that
are compatible with lockdown.
PAPR (7.3.3.4.1 ibm,platform-dump) provides a method to obtain
the complete dump:
- Each dump will be identified by ID called dump tag.
- A sequence of RTAS calls have to be issued until retrieve the
complete dump. The hypervisor expects the first RTAS call with
the sequence 0 and the subsequent calls with the sequence
number returned from the previous calls.
- The hypervisor returns "dump complete" status once the complete
dump is retrieved. But expects one more RTAS call from the
partition with the NULL buffer to invalidate dump which means
the dump will be removed in the hypervisor.
- Sequence of calls are allowed with different dump IDs at the
same time but not with the same dump ID.
Expose these interfaces to user space with a /dev/papr-platform-dump
character device using the following programming model:
int devfd = open("/dev/papr-platform-dump", O_RDONLY);
int fd = ioctl(devfd,PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE, &dump_id)
- Restrict user space to access with the same dump ID.
Typically we do not expect user space requests the dump
again for the same dump ID.
char *buf = malloc(size);
length = read(fd, buf, size);
- size should be minimum 1K based on PAPR and <= 4K based
on RTAS work area size. It will be restrict to RTAS work
area size. Using 4K work area based on the current
implementation in librtas library
- Each read call issue RTAS call to get the data based on
the size requirement and returns bytes returned from the
hypervisor
- If the previous call returns dump complete status, the
next read returns 0 like EOF.
ret = ioctl(PAPR_PLATFORM_DUMP_IOC_INVALIDATE, &dump_id)
- RTAS call with NULL buffer to invalidates the dump.
The read API should use the file descriptor obtained from ioctl
based on dump ID so that gets dump contents for the corresponding
dump ID. Implemented support in librtas (rtas_platform_dump()) for
this new ABI to support system lockdown.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-7-haren@linux.ibm.com
|
|
The RTAS call ibm,get-dynamic-sensor-state is used to get the
sensor state identified by the location code and the sensor
token. The librtas library provides an API
rtas_get_dynamic_sensor() which uses /dev/mem access for work
area allocation but is restricted under system lockdown.
This patch provides an interface with new ioctl
PAPR_DYNAMIC_SENSOR_IOC_GET to the papr-indices character
driver which executes this HCALL and copies the sensor state
in the user specified ioctl buffer.
Refer PAPR 7.3.19 ibm,get-dynamic-sensor-state for more
information on this RTAS call.
- User input parameters to the RTAS call: location code string
and the sensor token
Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
int fd = open("/dev/papr-indices", O_RDWR);
int ret = ioctl(fd, PAPR_DYNAMIC_SENSOR_IOC_GET,
struct papr_indices_io_block)
- The user space specifies input parameters in
papr_indices_io_block struct
- Returned state for the specified sensor is copied to
papr_indices_io_block.dynamic_param.state
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-6-haren@linux.ibm.com
|
|
The RTAS call ibm,set-dynamic-indicator is used to set the new
indicator state identified by a location code. The current
implementation uses rtas_set_dynamic_indicator() API provided by
librtas library which allocates RMO buffer and issue this RTAS
call in the user space. But /dev/mem access by the user space
is prohibited under system lockdown.
This patch provides an interface with new ioctl
PAPR_DYNAMIC_INDICATOR_IOC_SET to the papr-indices character
driver and expose this interface to the user space that is
compatible with lockdown.
Refer PAPR 7.3.18 ibm,set-dynamic-indicator for more
information on this RTAS call.
- User input parameters to the RTAS call: location code
string, indicator token and new state
Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
int fd = open("/dev/papr-indices", O_RDWR);
int ret = ioctl(fd, PAPR_DYNAMIC_INDICATOR_IOC_SET,
struct papr_indices_io_block)
- The user space passes input parameters in papr_indices_io_block
struct
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-5-haren@linux.ibm.com
|
|
The RTAS call ibm,get-indices is used to obtain indices and
location codes for a specified indicator or sensor token. The
current implementation uses rtas_get_indices() API provided by
librtas library which allocates RMO buffer and issue this RTAS
call in the user space. But writable mapping /dev/mem access by
the user space is prohibited under system lockdown.
To overcome the restricted access in the user space, the kernel
provide interfaces to collect indices data from the hypervisor.
This patch adds papr-indices character driver and expose standard
interfaces such as open / ioctl/ read to user space in ways that
are compatible with lockdown.
PAPR (2.13 7.3.17 ibm,get-indices RTAS Call) describes the
following steps to retrieve all indices data:
- User input parameters to the RTAS call: sensor or indicator,
and indice type
- ibm,get-indices is sequence RTAS call which means has to issue
multiple times to get the entire list of indicators or sensors
of a particular type. The hypervisor expects the first RTAS call
with the sequence 1 and the subsequent calls with the sequence
number returned from the previous calls.
- The OS may not interleave calls to ibm,get-indices for different
indicator or sensor types. Means other RTAS calls with different
type should not be issued while the previous type sequence is in
progress. So collect the entire list of indices and copied to
buffer BLOB during ioctl() and expose this buffer to the user
space with the file descriptor.
- The hypervisor fills the work area with a specific format but
does not return the number of bytes written to the buffer.
Instead of parsing the data for each call to determine the data
length, copy the work area size (RTAS_GET_INDICES_BUF_SIZE) to
the buffer. Return work-area size of data to the user space for
each read() call.
Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
int devfd = open("/dev/papr-indices", O_RDONLY);
int fd = ioctl(devfd, PAPR_INDICES_IOC_GET,
struct papr_indices_io_block)
- Collect all indices data for the specified token to the buffer
char *buf = malloc(RTAS_GET_INDICES_BUF_SIZE);
length = read(fd, buf, RTAS_GET_INDICES_BUF_SIZE)
- RTAS_GET_INDICES_BUF_SIZE of data is returned to the user
space.
- The user space retrieves the indices and their location codes
from the buffer
- Should issue multiple read() calls until reaches the end of
BLOB buffer.
The read() should use the file descriptor obtained from ioctl to
get the data that is exposed to file descriptor. Implemented
support in librtas (rtas_get_indices()) for this new ABI for
system lockdown.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-4-haren@linux.ibm.com
|
|
To issue ibm,get-indices, ibm,set-dynamic-indicator and
ibm,get-dynamic-sensor-state in the user space, the RMO buffer is
allocated for the work area which is restricted under system
lockdown. So instead of user space execution, the kernel will
provide /dev/papr-indices interface to execute these RTAS calls.
The user space assigns data in papr_indices_io_block struct
depends on the specific HCALL and passes to the following ioctls:
PAPR_INDICES_IOC_GET: Use for ibm,get-indices. Returns a
get-indices handle fd to read data.
PAPR_DYNAMIC_SENSOR_IOC_GET: Use for ibm,get-dynamic-sensor-state.
Updates the sensor state in
papr_indices_io_block.dynamic_param.state
PAPR_DYNAMIC_INDICATOR_IOC_SET: Use for ibm,set-dynamic-indicator.
Sets the new state for the input
indicator.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-3-haren@linux.ibm.com
|
|
The RTAS call can be normal where retrieves the data form the
hypervisor once or sequence based RTAS call which has to
issue multiple times until the complete data is obtained. For
some of these sequence RTAS calls, the OS should not interleave
calls with different input until the sequence is completed.
The data is collected for each call and copy to the buffer
for the entire sequence during ioctl() handle and then expose
this buffer to the user space with read() handle.
One such sequence RTAS call is ibm,get-vpd and its support is
already included in the current code. To add the similar support
for other sequence based calls, move the common functions in to
separate file and update papr_rtas_sequence struct with the
following callbacks so that RTAS call specific code will be
defined and executed to complete the sequence.
struct papr_rtas_sequence {
int error;
void params;
void (*begin) (struct papr_rtas_sequence *);
void (*end) (struct papr_rtas_sequence *);
const char * (*work) (struct papr_rtas_sequence *, size_t *);
};
params: Input parameters used to pass for RTAS call.
Begin: RTAS call specific function to initialize data
including work area allocation.
End: RTAS call specific function to free up resources
(free work area) after the sequence is completed.
Work: The actual RTAS call specific function which collects
the data from the hypervisor.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-2-haren@linux.ibm.com
|
|
The powerpc crc code was relying on pagefault_disable from being
pulled in by random header files.
Fix this by explicitly including uaccess.h. Also add other missing
header files to prevent similar problems in future.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 7ba8df47810f ("asm-generic: Make simd.h more resilient")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add the camera subsystem and CCI used to interface with cameras on the
Snapdragon 670.
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250205035013.206890-8-mailingradian@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into fixes
riscv fixes for 6.15-rc3
- A couple of fixes regarding module relocations
- Fix a build error by implementing missing alternative macros
- Another fix for kexec by fixing /proc/iomem
* tag 'riscv-fixes-6.15-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux:
riscv: Avoid fortify warning in syscall_get_arguments()
riscv: Provide all alternative macros all the time
riscv: module: Allocate PLT entries for R_RISCV_PLT32
riscv: module: Fix out-of-bounds relocation access
riscv: Properly export reserved regions in /proc/iomem
riscv: Fix unaligned access info messages
|