Age | Commit message (Collapse) | Author | Files | Lines |
|
throttling
amd_pmu_enable_all() does:
if (!test_bit(idx, cpuc->active_mask))
continue;
amd_pmu_enable_event(cpuc->events[idx]);
A perf NMI of another event can come between these two steps. Perf NMI
handler internally disables and enables _all_ events, including the one
which nmi-intercepted amd_pmu_enable_all() was in process of enabling.
If that unintentionally enabled event has very low sampling period and
causes immediate successive NMI, causing the event to be throttled,
cpuc->events[idx] and cpuc->active_mask gets cleared by x86_pmu_stop().
This will result in amd_pmu_enable_event() getting called with event=NULL
when amd_pmu_enable_all() resumes after handling the NMIs. This causes a
kernel crash:
BUG: kernel NULL pointer dereference, address: 0000000000000198
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
[...]
Call Trace:
<TASK>
amd_pmu_enable_all+0x68/0xb0
ctx_resched+0xd9/0x150
event_function+0xb8/0x130
? hrtimer_start_range_ns+0x141/0x4a0
? perf_duration_warn+0x30/0x30
remote_function+0x4d/0x60
__flush_smp_call_function_queue+0xc4/0x500
flush_smp_call_function_queue+0x11d/0x1b0
do_idle+0x18f/0x2d0
cpu_startup_entry+0x19/0x20
start_secondary+0x121/0x160
secondary_startup_64_no_verify+0xe5/0xeb
</TASK>
amd_pmu_disable_all()/amd_pmu_enable_all() calls inside perf NMI handler
were recently added as part of BRS enablement but I'm not sure whether
we really need them. We can just disable BRS in the beginning and enable
it back while returning from NMI. This will solve the issue by not
enabling those events whose active_masks are set but are not yet enabled
in hw pmu.
Fixes: ada543459cab ("perf/x86/amd: Add AMD Fam19h Branch Sampling support")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com
|
|
When a CPU comes online, the per-CPU NB and LLC uncore contexts are
freed but not the events array within the context structure. This
causes a memory leak as identified by the kmemleak detector.
[...]
unreferenced object 0xffff8c5944b8e320 (size 32):
comm "swapper/0", pid 1, jiffies 4294670387 (age 151.072s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000000759fb79>] amd_uncore_cpu_up_prepare+0xaf/0x230
[<00000000ddc9e126>] cpuhp_invoke_callback+0x2cf/0x470
[<0000000093e727d4>] cpuhp_issue_call+0x14d/0x170
[<0000000045464d54>] __cpuhp_setup_state_cpuslocked+0x11e/0x330
[<0000000069f67cbd>] __cpuhp_setup_state+0x6b/0x110
[<0000000015365e0f>] amd_uncore_init+0x260/0x321
[<00000000089152d2>] do_one_initcall+0x3f/0x1f0
[<000000002d0bd18d>] kernel_init_freeable+0x1ca/0x212
[<0000000030be8dde>] kernel_init+0x11/0x120
[<0000000059709e59>] ret_from_fork+0x22/0x30
unreferenced object 0xffff8c5944b8dd40 (size 64):
comm "swapper/0", pid 1, jiffies 4294670387 (age 151.072s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000306efe8b>] amd_uncore_cpu_up_prepare+0x183/0x230
[<00000000ddc9e126>] cpuhp_invoke_callback+0x2cf/0x470
[<0000000093e727d4>] cpuhp_issue_call+0x14d/0x170
[<0000000045464d54>] __cpuhp_setup_state_cpuslocked+0x11e/0x330
[<0000000069f67cbd>] __cpuhp_setup_state+0x6b/0x110
[<0000000015365e0f>] amd_uncore_init+0x260/0x321
[<00000000089152d2>] do_one_initcall+0x3f/0x1f0
[<000000002d0bd18d>] kernel_init_freeable+0x1ca/0x212
[<0000000030be8dde>] kernel_init+0x11/0x120
[<0000000059709e59>] ret_from_fork+0x22/0x30
[...]
Fix the problem by freeing the events array before freeing the uncore
context.
Fixes: 39621c5808f5 ("perf/x86/amd/uncore: Use dynamic events array")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/4fa9e5ac6d6e41fa889101e7af7e6ba372cfea52.1662613255.git.sandipan.das@amd.com
|
|
PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's
bit ambiguous name and also not generic enough to cover cxl.cache and
cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f6268268-b4e9-9ed6-0453-65792644d953@amd.com
|
|
In case of fused compare and taken branch instructions, the AMD LBR points to
the compare instruction instead of the branch. Users of LBR usually expects
the from address to point to a branch instruction. The kernel has code to
adjust the from address via get_branch_type_fused(). However this correction
is only applied when a branch filter is applied. That means that if no
filter is present, the quality of the data is lower.
Fix the problem by applying the adjustment regardless of the filter setting,
bringing the AMD LBR to the same level as other LBR implementations.
Fixes: 245268c19f70 ("perf/x86/amd/lbr: Use fusion-aware branch classifier")
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20220928184043.408364-3-eranian@google.com
|
|
IBS_DC_PHYSADDR provides the physical data address for the tagged load/
store operation. Populate perf sample physical address using it.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-7-ravi.bangoria@amd.com
|
|
IBS_DC_LINADDR provides the linear data address for the tagged load/
store operation. Populate perf sample address using it.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-6-ravi.bangoria@amd.com
|
|
IbsDcMissLat indicates the number of clock cycles from when a miss is
detected in the data cache to when the data was delivered to the core.
Similarly, IbsTagToRetCtr provides number of cycles from when the op
was tagged to when the op was retired. Consider these fields for
sample->weight.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-5-ravi.bangoria@amd.com
|
|
struct perf_mem_data_src is used to pass arch specific memory access
details into generic form. These details gets consumed by tools like
perf mem and c2c. IBS tagged load/store sample provides most of the
information needed for these tools. Add a logic to convert IBS
specific raw data into perf_mem_data_src.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-4-ravi.bangoria@amd.com
|
|
Use the new sample_flags to indicate whether the raw data field is
filled by the PMU driver. Although it could check with the NULL,
follow the same rule with other fields.
Remove the raw field from the perf_sample_data_init() to minimize
the number of cache lines touched.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220921220032.2858517-2-namhyung@kernel.org
|
|
There's no in-tree user anymore. Let's get rid of it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908214104.3851807-3-namhyung@kernel.org
|
|
So that it can call perf_callchain() only if needed. Historically it used
__PERF_SAMPLE_CALLCHAIN_EARLY but we can do that with sample_flags in the
struct perf_sample_data.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908214104.3851807-1-namhyung@kernel.org
|
|
In preparation for making it a static_call, change the signature.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.573713839@infradead.org
|
|
Use the new sample_flags to indicate whether the branch stack is filled
by the PMU driver.
Remove the br_stack from the perf_sample_data_init() to minimize the number
of cache lines touched.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com
|
|
Provide branch speculation information captured via AMD Last Branch Record
Extension Version 2 (LbrExtV2) by setting the speculation info in branch
records. The info is based on the "valid" and "spec" bits in the Branch To
registers.
Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ddc02f6320464cad0e3ff5bdb2314531568a91bc.1660211399.git.sandipan.das@amd.com
|
|
AMD Last Branch Record Extension Version 2 (LbrExtV2) can report a branch
from address that points to an instruction preceding the actual branch by
several bytes due to branch fusion and further optimizations in Zen4
processors.
In such cases, software should move forward sequentially in the instruction
stream from the reported address and the address of the first branch
encountered should be used instead. Hence, use the fusion-aware branch
classifier to determine the correct branch type and get the offset for
adjusting the branch from address.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/c324d2d0a9c3976da30b9563d09e50bfee0f264d.1660211399.git.sandipan.das@amd.com
|
|
With AMD Last Branch Record Extension Version 2 (LbrExtV2), it is necessary
to process the branch records further as hardware filtering is not granular
enough for identifying certain types of branches. E.g. to record system
calls, one should record far branches. The filter captures both far calls
and far returns but the irrelevant records are filtered out based on the
branch type as seen by the branch classifier.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/e51de057517f77788abd393c832e8dea616d489c.1660211399.git.sandipan.das@amd.com
|
|
If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
convert the requested branch filter (PERF_SAMPLE_BRANCH_* flags) to the
corresponding hardware filter value and stash it in the event data when
a branch stack is requested. The hardware filter value is also saved in
per-CPU areas for use during event scheduling.
Hardware filtering is provided by the LBR Branch Select register. It has
bits which when set, suppress recording of the following types of branches:
* CPL = 0 (Kernel only)
* CPL > 0 (Userspace only)
* Conditional Branches
* Near Relative Calls
* Near Indirect Calls
* Near Returns
* Near Indirect Jumps (excluding Near Indirect Calls and Near Returns)
* Near Relative Jumps (excluding Near Relative Calls)
* Far Branches
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/9336af5c9785b8e14c62220fc0e6cfb10ab97de3.1660211399.git.sandipan.das@amd.com
|
|
If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
enable it alongside LBR Freeze on PMI when an event requests branch stack
i.e. PERF_SAMPLE_BRANCH_STACK.
Each branch record is represented by a pair of registers, LBR From and LBR
To. The freeze feature prevents any updates to these registers once a PMC
overflows. The contents remain unchanged until the freeze bit is cleared by
the PMI handler.
The branch records are read and copied to sample data before unfreezing.
However, only valid entries are copied. There is no additional register to
denote which of the register pairs represent the top of the stack (TOS)
since internal register renaming always ensures that the first pair (i.e.
index 0) is the one representing the most recent branch and so on.
The LBR registers are per-thread resources and are cleared explicitly
whenever a new task is scheduled in. There are no special implications on
the contents of these registers when transitioning to deep C-states.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/d3b8500a3627a0d4d0259b005891ee248f248d91.1660211399.git.sandipan.das@amd.com
|
|
AMD Last Branch Record Extension Version 2 (LbrExtV2) is driven by Core PMC
overflows. It records recently taken branches up to the moment when the PMC
overflow occurs.
Detect the feature during PMU initialization and set the branch stack depth
using CPUID leaf 0x80000022 EBX.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/fc6e45378ada258f1bab79b0de6e05c393a8f1dd.1660211399.git.sandipan.das@amd.com
|
|
AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, introduce new static
calls which act as gateways to call into the feature-dependent functions
based on what is available on the processor.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/b75dbc32663cb395f0d701167e952c6a6b0445a3.1660211399.git.sandipan.das@amd.com
|
|
AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, reuse the "branches"
capability to advertise information about both BRS and LBR but make the
"branch-brs" event exclusive to Family 19h processors that support BRS.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ba4a4cde6db79b1c65c49834027bbdb8a915546b.1660211399.git.sandipan.das@amd.com
|
|
Move some of the Branch Sampling (BRS) specific functions out of the Core
events sources and into the BRS sources in preparation for adding other
mechanisms to record branches.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/b60283b57179475d18ee242d117c335c16733693.1660211399.git.sandipan.das@amd.com
|
|
The current RDPMC assignment scheme maps four DF PMCs and
six L3 PMCs from index 6 to 15.
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, there may be additional DF counters available
which are mapped starting from index 16 i.e. just after
the L3 counters. Update the RDPMC assignments accordingly.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/1359379ef34da760f108b075ac138ab082caa3ba.1652954372.git.sandipan.das@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use bits 0-7, 32-37 as EventSelect and bits
8-15, 24-27 as UnitMask for Data Fabric (DF) events.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ffc24d5a3375b1d6e457d88e83241114de5c1942.1652954372.git.sandipan.das@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use CPUID leaf 0x80000022 EBX to detect the
number of Data Fabric (DF) PMCs. This offers more
flexibility if the counts change in later processor
families.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/bac7b2806561e03f2acc7fdc9db94f102df80e1d.1652954372.git.sandipan.das@amd.com
|
|
Use the update_attrs attribute group introduced by commit
f3a3a8257e5a ("perf/core: Add attr_groups_update into struct
pmu") and the is_visible() callback to populate the family
specifc attributes for uncore events.
The changes apply to attributes that are unique to families
such as slicemask for Family 17h and coreid for Family 19h.
The addition of common attributes such as event and umask,
whose formats change across families, remain unchanged.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/a5e4f4dd5c459199fc497e82b858ba09dc91c064.1652954372.git.sandipan.das@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, the number of available counters for a given
uncore PMU may not be fixed across families and models
and has to be determined at runtime.
The per-cpu uncore PMU data currently uses a fixed-sized
array for event information. Make it dynamic based on the
number of available counters.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/21eea0cb6de9d14f78d52d1d62637ae02bc900f5.1652954372.git.sandipan.das@amd.com
|
|
Commit 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only
counting with SVM disabled") addresses an issue in which the
Host-Only bit in the counter control registers needs to be
masked off when SVM is not enabled.
The events need to be reloaded whenever SVM is enabled or
disabled for a CPU and this requires the PERF_CTL registers
to be reprogrammed using {enable,disable}_all(). However,
PerfMonV2 variants of these functions do not reprogram the
PERF_CTL registers. Hence, the legacy enable_all() function
should also be called.
Fixes: 9622e67e3980 ("perf/x86/amd/core: Add PerfMonV2 counter control")
Reported-by: Like Xu <likexu@tencent.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220518084327.464005-1-sandipan.das@amd.com
|
|
This fires on a Fam16h machine here:
unchecked MSR access error: WRMSR to 0xc000010f (tried to write 0x0000000000000018) \
at rIP: 0xffffffff81007db1 (amd_brs_reset+0x11/0x50)
Call Trace:
<TASK>
amd_pmu_cpu_starting
? x86_pmu_dead_cpu
x86_pmu_starting_cpu
cpuhp_invoke_callback
? x86_pmu_starting_cpu
? x86_pmu_dead_cpu
cpuhp_issue_call
? x86_pmu_starting_cpu
__cpuhp_setup_state_cpuslocked
? x86_pmu_dead_cpu
? x86_pmu_starting_cpu
__cpuhp_setup_state
? map_vsyscall
init_hw_perf_events
? map_vsyscall
do_one_initcall
? _raw_spin_unlock_irqrestore
? try_to_wake_up
kernel_init_freeable
? rest_init
kernel_init
ret_from_fork
because that CPU hotplug callback gets executed on any AMD CPU - not
only on the BRS-enabled ones. Check the BRS feature bit properly.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-By: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/r/20220516154838.7044-1-bp@alien8.de
|
|
There's two problems with the current amd_brs_adjust_period() code:
- it isn't in fact AMD specific and wil always adjust the period;
- it adjusts the period, while it should only adjust the event count,
resulting in repoting a short period.
Fix this by using x86_pmu.limit_period, this makes it specific to the
AMD BRS case and ensures only the event count is adjusted while the
reported period is unmodified.
Fixes: ba2fe7500845 ("perf/x86/amd: Add AMD branch sampling period adjustment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
'hwc' is never used in amd_pmu_enable_all(), so remove it.
Signed-off-by: Zucheng Zheng <zhengzucheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421111031.174698-1-zhengzucheng@huawei.com
|
|
PMU driver can advertise certain feature via capability attribute('caps'
sysfs directory) which can be consumed by userspace tools like perf. Add
zen4_ibs_extensions capability attribute for IBS pmus. This attribute
will be enabled when CPUID_Fn8000001B_EAX[11] is set.
With patch on Zen4:
$ ls /sys/bus/event_source/devices/ibs_op/caps
zen4_ibs_extensions
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-5-ravi.bangoria@amd.com
|
|
IBS L3 miss filtering works by tagging an instruction on IBS counter
overflow and generating an NMI if the tagged instruction causes an L3
miss. Samples without an L3 miss are discarded and counter is reset
with random value (between 1-15 for fetch pmu and 1-127 for op pmu).
This helps in reducing sampling overhead when user is interested only
in such samples. One of the use case of such filtered samples is to
feed data to page-migration daemon in tiered memory systems.
Add support for L3 miss filtering in IBS driver via new pmu attribute
"l3missonly". Example usage:
# perf record -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-4-ravi.bangoria@amd.com
|
|
Currently, some attributes are added at build time whereas others
at boot time depending on IBS pmu capabilities. Instead, we can
just add all attribute groups at build time but hide individual
group at boot time using more appropriate ->is_visible() callback.
Also, struct perf_ibs has bunch of fields for pmu attributes which
just pass on the pointer, does not do anything else. Remove them.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-3-ravi.bangoria@amd.com
|
|
IBS pmu initialization code ignores return value provided by
callee functions. Fix it.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-2-ravi.bangoria@amd.com
|
|
IbsOpRip is recorded when IBS interrupt is triggered. But there is
a skid from the time IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile processor would have
moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding stack using the ORC unwinder as it needs consistent rip,
rsp and rbp. Fix this by using rip from interrupt regs instead of
IbsOpRip for stack unwinding.
Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220429051441.14251-1-ravi.bangoria@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use a new scheme to process Core PMC overflows
in the NMI handler using the new global control and status
registers. This will be bypassed on unsupported hardware
(x86_pmu.version < 2).
In x86_pmu_handle_irq(), overflows are detected by testing
the contents of the PERF_CTR register for each active PMC in
a loop. The new scheme instead inspects the overflow bits of
the global status register.
The Performance Counter Global Status (PerfCntrGlobalStatus)
register has overflow (PerfCntrOvfl) bits for each PMC. This
is, however, a read-only MSR. To acknowledge that overflows
have been processed, the NMI handler must clear the bits by
writing to the PerfCntrGlobalStatusClr register.
In x86_pmu_handle_irq(), PMCs counting the same event that
are started and stopped at the same time record slightly
different counts due to delays in between reads from the
PERF_CTR registers. This is fixed by stopping and starting
the PMCs at the same before and with a single write to the
Performance Counter Global Control (PerfCntrGlobalCtl) upon
entering and before exiting the NMI handler.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f20b7e4da0b0a83bdbe05857f354146623bc63ab.1650515382.git.sandipan.das@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use a new scheme to manage the Core PMCs using
the new global control and status registers. This will be
bypassed on unsupported hardware (x86_pmu.version < 2).
Currently, all PMCs have dedicated control (PERF_CTL) and
counter (PERF_CTR) registers. For a given PMC, the enable
(En) bit of its PERF_CTL register is used to start or stop
counting.
The Performance Counter Global Control (PerfCntrGlobalCtl)
register has enable (PerfCntrEn) bits for each PMC. For a
PMC to start counting, both PERF_CTL and PerfCntrGlobalCtl
enable bits must be set. If either of those are cleared,
the PMC stops counting.
In x86_pmu_{en,dis}able_all(), the PERF_CTL registers of
all active PMCs are written to in a loop. Ideally, PMCs
counting the same event that were started and stopped at
the same time should record the same counts. Due to delays
in between writes to the PERF_CTL registers across loop
iterations, the PMCs cannot be enabled or disabled at the
same instant and hence, record slightly different counts.
This is fixed by enabling or disabling all active PMCs at
the same time with a single write to the PerfCntrGlobalCtl
register.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/dfe8e934074aaabc6ba748dfaccd0a77c974bb82.1650515382.git.sandipan.das@amd.com
|
|
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use CPUID leaf 0x80000022 EBX to detect the
number of Core PMCs. This offers more flexibility if the
counts change in later processor families.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/68a6d9688df189267db26530378870edd34f7b06.1650515382.git.sandipan.das@amd.com
|
|
AMD Performance Monitoring Version 2 (PerfMonV2) introduces
some new Core PMU features such as detection of the number
of available PMCs and managing PMCs using global registers
namely, PerfCntrGlobalCtl and PerfCntrGlobalStatus.
Clearing PerfCntrGlobalCtl and PerfCntrGlobalStatus ensures
that all PMCs are inactive and have no pending overflows
when CPUs are onlined or offlined.
The PMU version (x86_pmu.version) now indicates PerfMonV2
support and will be used to bypass the new features on
unsupported processors.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/dc8672ecbddff394e088ca8abf94b089b8ecc2e7.1650515382.git.sandipan.das@amd.com
|
|
On AMD Fam19h Zen3, the branch sampling (BRS) feature must be disabled before
entering low power and re-enabled (if was active) when returning from low
power. Otherwise, the NMI interrupt may be held up for too long and cause
problems. Stopping BRS will cause the NMI to be delivered if it was held up.
Define a perf_amd_brs_lopwr_cb() callback to stop/restart BRS. The callback
is protected by a jump label which is enabled only when AMD BRS is detected.
In all other cases, the callback is never called.
Signed-off-by: Stephane Eranian <eranian@google.com>
[peterz: static_call() and build fixes]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-10-eranian@google.com
|
|
Add a kernel config option CONFIG_PERF_EVENTS_AMD_BRS
to make the support for AMD Zen3 Branch Sampling (BRS) an opt-in
compile time option.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-8-eranian@google.com
|
|
The AMD Branch Sampling features does not provide hardware filtering by
privilege level. The associated PMU counter does but not the branch sampling
by itself. Given how BRS operates there is a possibility that BRS captures
kernel level branches even though the event is programmed to count only at
the user level.
Implement a workaround in software by removing the branches which belong to
the wrong privilege level. The privilege level is evaluated on the target of
the branch and not the source so as to be compatible with other architectures.
As a consequence of this patch, the number of entries in the
PERF_RECORD_BRANCH_STACK buffer may be less than the maximum (16). It could
even be zero. Another consequence is that consecutive entries in the branch
stack may not reflect actual code path and may have discontinuities, in case
kernel branches were suppressed. But this is no different than what happens
on other architectures.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-6-eranian@google.com
|
|
Add a pseudo event called branch-brs to help use the FAM Fam19h
Branch Sampling feature (BRS). BRS samples taken branches, so it is best used
when sampling on a retired taken branch event (0xc4) which is what BRS
captures. Instead of trying to remember the event code or actual event name,
users can simply do:
$ perf record -b -e cpu/branch-brs/ -c 1000037 .....
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-5-eranian@google.com
|
|
Add support for the AMD Fam19h 16-deep branch sampling feature as
described in the AMD PPR Fam19h Model 01h Revision B1. This is a model
specific extension. It is not an architected AMD feature.
The Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
registers. There is no branch type filtering. All control flow changes are
captured. BRS relies on specific programming of the core PMU of Fam19h. In
particular, the following requirements must be met:
- the sampling period be greater than 16 (BRS depth)
- the sampling period must use a fixed and not frequency mode
BRS interacts with the NMI interrupt as well. Because enabling BRS is
expensive, it is only activated after P event occurrences, where P is the
desired sampling period. At P occurrences of the event, the counter
overflows, the CPU catches the interrupt, activates BRS for 16 branches until
it saturates, and then delivers the NMI to the kernel. Between the overflow
and the time BRS activates more branches may be executed skewing the period.
All along, the sampling event keeps counting. The skid may be attenuated by
reducing the sampling period by 16 (subsequent patch).
BRS is integrated into perf_events seamlessly via the same
PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
records in the sampling buffer. No prediction information is supported. The
branches are stored in reverse order of execution. The most recent branch is
the first entry in each record.
No modification to the perf tool is necessary.
BRS can be used with any sampling event. However, it is recommended to use
the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
captures.
$ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test
$ perf report -D
56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
... branch stack: nr:16
..... 0: 0000000000401d24 -> 0000000000401d5a 0 cycles 0
..... 1: 0000000000401d5c -> 0000000000401d24 0 cycles 0
..... 2: 0000000000401d22 -> 0000000000401d5c 0 cycles 0
..... 3: 0000000000401d5e -> 0000000000401d22 0 cycles 0
..... 4: 0000000000401d20 -> 0000000000401d5e 0 cycles 0
..... 5: 0000000000401d3e -> 0000000000401d20 0 cycles 0
..... 6: 0000000000401d42 -> 0000000000401d3e 0 cycles 0
..... 7: 0000000000401d3c -> 0000000000401d42 0 cycles 0
..... 8: 0000000000401d44 -> 0000000000401d3c 0 cycles 0
..... 9: 0000000000401d3a -> 0000000000401d44 0 cycles 0
..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles 0
..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles 0
..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles 0
..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles 0
..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles 0
..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles 0
... thread: test:18230
...... dso: test
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.com
|
|
Variable shift is being initialized with a value that is never read, it
is being re-assigned later inside a loop. The assignment is redundant
and can be removed.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211207185001.1412413-1-colin.i.king@gmail.com
|
|
Add <asm/amd-ibs.h> with bitfield definitions for IBS MSRs,
and demonstrate usage within the driver.
Also move 'struct perf_ibs_data' where it can be shared with
the perf tool that will soon be using it.
No functional changes.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-9-kim.phillips@amd.com
|
|
Add support to build the AMD uncore driver as a module.
This is in order to facilitate development without having
to reboot the kernel in most cases.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-8-kim.phillips@amd.com
|
|
Factor out a helper function rather than export cpu_llc_id, which is
needed in order to be able to build the AMD uncore driver as a module.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-7-kim.phillips@amd.com
|
|
Found by checkpatch.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-6-kim.phillips@amd.com
|