Age | Commit message (Collapse) | Author | Files | Lines |
|
Add support for using 'perf report --dump-raw-trace' to parse PTT packet.
Example usage:
Output will contain raw PTT data and its textual representation, such
as (8DW format):
0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0
ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0
.
. ... HISI PTT data: size 4194304 bytes
. 00000000: 00 00 00 00 Prefix
. 00000004: 08 20 00 60 Header DW0
. 00000008: ff 02 00 01 Header DW1
. 0000000c: 20 08 00 00 Header DW2
. 00000010: 10 e7 44 ab Header DW3
. 00000014: 2a a8 1e 01 Time
. 00000020: 00 00 00 00 Prefix
. 00000024: 01 00 00 60 Header DW0
. 00000028: 0f 1e 00 01 Header DW1
. 0000002c: 04 00 00 00 Header DW2
. 00000030: 40 00 81 02 Header DW3
. 00000034: ee 02 00 00 Time
....
This patch only add basic parsing support according to the definition of
the PTT packet described in Documentation/trace/hisi-ptt.rst. And the
fields of each packet can be further decoded following the PCIe Spec's
definition of TLP packet.
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the
PCIe link's events, and trace the TLP headers).
This patch add support for PTT device in perf tool, so users could use
'perf record' to get TLP headers trace data.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
uClibc segfaulted because NULL was passed as the format to fprintf().
That happened because one of the format strings was missing and
intel_pt_print_info() didn't check that before calling fprintf().
Add the missing format string, and check format is not NULL before calling
fprintf().
Fixes: 11fa7cb86b56d361 ("perf tools: Pass Intel PT information for decoding MTC and CYC")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When a program header was added, it moved the text section but
GEN_ELF_TEXT_OFFSET was not updated.
Fix by adding the program header size and aligning.
Fixes: babd04386b1df8c3 ("perf jit: Include program header in ELF files")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lieven Hey <lieven.hey@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If the kernel exposes a new perf_event_attr field in a format attr, perf
will return an error stating the specified PMU can't be found. For
example, a format attr with 'config3:0-63' causes an error as config3 is
unknown to perf. This causes a compatibility issue between a newer
kernel with older perf tool.
Before this change with a kernel adding 'config3' I get:
$ perf record -e arm_spe// -- true
event syntax error: 'arm_spe//'
\___ Cannot find PMU `arm_spe'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list
available events
After this change, I get:
$ perf record -e arm_spe// -- true
WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf!
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.091 MB perf.data ]
To support unknown configN formats, rework the YACC implementation to
pass any config[0-9]+ format to perf_pmu__new_format() to handle with a
warning.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The recent change in the cgroup will break the backward compatiblity in
the BPF program. It should support both old and new kernels using BPF
CO-RE technique.
Like the task_struct->__state handling in the offcpu analysis, we can
check the field name in the cgroup struct.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: bpf@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: zefan li <lizefan.x@bytedance.com>
Link: http://lore.kernel.org/lkml/20221011052808.282394-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Add support for AMD on 'perf mem' and 'perf c2c', the kernel
enablement patches went via tip.
Example:
$ sudo perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
$ sudo perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access Samples Snoop
N/A 700620 N/A
L1 hit 126675 N/A
L2 hit 424 N/A
L3 hit 664 HitM
L3 hit 10 N/A
Local RAM hit 2 N/A
Remote RAM (1 hop) hit 8558 N/A
Remote Cache (1 hop) hit 3 N/A
Remote Cache (1 hop) hit 2 HitM
Remote Cache (2 hops) hit 10 HitM
Remote Cache (2 hops) hit 6 N/A
Uncached hit 4 N/A
$
- "perf lock" improvements:
- Add -E/--entries option to limit the number of entries to
display, say to ask for just the top 5 contended locks.
- Add -q/--quiet option to suppress header and debug messages.
- Add a 'perf test' kernel lock contention entry to test 'perf
lock'.
- "perf lock contention" improvements:
- Ask BPF's bpf_get_stackid() to skip some callchain entries.
The ones closer to the tooling are bpf related and not that
interesting, the ones calling the locking function are the ones
we're interested in, example of a full, unskipped callstack:
- Allow changing the callstack depth and number of entries to skip.
1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffbb8b8e75 bpf_trace_run2+0x35
0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb
0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5
0xffffffffbc1c26ff _raw_spin_lock+0x1f
0xffffffffbb841015 tick_do_update_jiffies64+0x25
0xffffffffbb8409ee tick_irq_enter+0x9e
- Show full callstack in verbose mode (-v option), sometimes this
is desirable instead of showing just one callstack entry.
- Allow multiple time ranges in 'perf record --delay' to help in
reducing the amount of data collected from hardware tracing (Intel
PT, etc) when there is a rough idea of periods of time where events
of interest take time.
- Add Intel PT to record only decoder debug messages when error
happens.
- Improve layout of Intel PT man page.
- Add new branch types: alignment, data and inst faults and arch
specific ones, such as fiq, debug_halt, debug_exit, debug_inst and
debug_data on arm64.
Kernel enablement went thru the tip tree.
- Fix 'perf probe' error log check in 'perf test' when no debuginfo is
available.
- Fix 'perf stat' aggregation mode logic, it should be looking at the
CPU not at the core number.
- Fix flags parsing in 'perf trace' filters.
- Introduce compact encoding of CPU range encoding on perf.data, to
avoid having a bitmap with all the CPUs.
- Improvements to the 'perf stat' metrics, including adding
"core_wide", and computing "smt" from the CPU topology.
- Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format,
that allows tooling to ask for the precise number of lost samples for
a given event.
- Add 'addr' sort key to see just the address of sampled instructions:
$ perf record -o- true | perf report -i- -s addr
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
# Samples: 12 of event 'cycles:u'
# Event count (approx.): 252512
#
# Overhead Address
# ........ ..................
42.96% 0x7f96f08443d7
29.55% 0x7f96f0859b50
14.76% 0x7f96f0852e02
8.30% 0x7f96f0855028
4.43% 0xffffffff8de01087
perf annotate: Toggle full address <-> offset display
- Add 'f' hotkey to the 'perf annotate' TUI interface when in
'disassembler output' mode ('o' hotkey) to toggle showing full
virtual address or just the offset.
- Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for
pre-existing threads, at the start of a 'perf record' session,
speeding up that record startup phase.
- Add a command line option to specify build ids in 'perf inject'.
- Update JSON event files for the Intel alderlake, broadwell,
broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake,
icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids,
skylake, skylakex, and tigerlake processors.
- Update vendor JSON event files for the ARM Neoverse V1 and E1
platforms.
- Add a 'perf test' entry for 'perf mem' where a struct has false
sharing and this gets detected in the 'perf mem' output, tested with
Intel, AMD and ARM64 systems.
- Add a 'perf test' entry to test the resolution of java symbols, where
an output like this is expected:
8.18% jshell jitted-50116-29.so [.] Interpreter
0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int)
- Add tests for the ARM64 CoreSight hardware tracing feature, with
specially crafted pureloop, memcpy, thread loop and unroll tread that
then gets traced and the output compared with expected output.
Documentation explaining it is also included.
- Add per thread Intel PT 'perf test' entry to check that
PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a
mixture of per thread and per CPU events and mmaps, verify that this
gets all recorded correctly.
- Introduce pthread mutex wrappers to allow for building with clang's
-Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by"
"lockable", "exclusive_lock_function", "exclusive_trylock_function",
"exclusive_locks_required", and "no_thread_safety_analysis" compiler
function attributes.
- Fix empty version number when building outside of a git repo.
- Improve feature detection display when multiple versions of a feature
are present, such as for binutils libbfd, that has a mix of possible
ways to detect according to the Linux distribution.
Previously in some cases we had:
Auto-detecting system features
<SNIP>
... libbfd: [ on ]
... libbfd-liberty: [ on ]
... libbfd-liberty-z: [ on ]
<SNIP>
Now for this case we show just the main feature:
Auto-detecting system features
<SNIP>
... libbfd: [ on ]
<SNIP>
- Remove some unused structs, variables, macros, function prototypes
and includes from various places.
* tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
perf script: Add missing fields in usage hint
perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB
perf mem/c2c: Avoid printing empty lines for unsupported events
perf mem/c2c: Add load store event mappings for AMD
perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}
perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
perf test coresight: Add relevant documentation about ARM64 CoreSight testing
perf test: Add git ignore for tmp and output files of ARM CoreSight tests
perf test coresight: Add unroll thread test shell script
perf test coresight: Add unroll thread test tool
perf test coresight: Add thread loop test shell scripts
perf test coresight: Add thread loop test tool
perf test coresight: Add memcpy thread test shell script
perf test coresight: Add memcpy thread test tool
perf test: Add git ignore for perf data generated by the ARM CoreSight tests
perf test: Add arm64 asm pureloop test shell script
perf test: Add asm pureloop test tool
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset now support isolated cpus.partition type, which will enable
dynamic CPU isolation
- pids.peak added to remember the max number of pids used
- holes in cgroup namespace plugged
- internal cleanups
* tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits)
cgroup: use strscpy() is more robust and safer
iocost_monitor: reorder BlkgIterator
cgroup: simplify code in cgroup_apply_control
cgroup: Make cgroup_get_from_id() prettier
cgroup/cpuset: remove unreachable code
cgroup: Remove CFTYPE_PRESSURE
cgroup: Improve cftype add/rm error handling
kselftest/cgroup: Add cpuset v2 partition root state test
cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule
cgroup/cpuset: Relocate a code block in validate_change()
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Relax constraints to partition & cpus changes
cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
cgroup/cpuset: Miscellaneous cleanups & add helper functions
cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
cgroup: add pids.peak interface for pids controller
cgroup: Remove data-race around cgrp_dfl_visible
cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG
...
|
|
A hw component to track outstanding L1 Data Cache misses is called LFB
(Line Fill Buffer) on Intel and Arm. However similar component exists on
other arch with different names, for ex, it's called MAB (Miss Address
Buffer) on AMD. Use 'LFB/MAB' instead of just 'LFB'.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-8-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'perf mem' and 'perf c2c' tools can be used with 3 different events:
load, store and combined load-store. Some architectures might support
only partial set of events in which case, perf prints an empty line for
unsupported events. Avoid that.
Ex, AMD Zen cpus supports only combined load-store event and does not
support individual load and store event.
Before patch:
$ perf mem record -e list
mem-ldst : available
$
After patch:
$ perf mem record -e list
mem-ldst : available
$
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-7-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add support for printing these new fields in perf mem report.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-4-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'perf stat' has options to aggregate the counts in different modes like
per socket, per core etc. The function "aggr_printout" in
util/stat-display.c which is used to print the aggregates, has a check
for cpu in case of AGGR_NONE.
This check was originally using condition : "if (id.cpu.cpu > -1)". But
this got changed after commit df936cadfb58 ("perf stat: Add JSON output
option"), which added option to output json format for different
aggregation modes. After this commit, the check in "aggr_printout" is
using "if (id.core > -1)".
The old code was using "id.cpu.cpu > -1" while the new code is using
"id.core > -1". But since the value printed is id.cpu.cpu, fix this
check to use cpu and not core.
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20221006114225.66303-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When synthesizing MMAP2 with build-id, it'd read the same file repeatedly as
it has no idea if it's done already. Maintain a dsos to check that and skip
the file access if possible.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220920222822.2171056-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The aggr_cpu_id has a thread value but it's actually an index to the
thread_map. To reduce possible confusion, rename it to thread_idx.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now it always uses the global rt_stat. Let's get rid of the field from
the saved_value. When the both evsels are NULL, it'd return 0 so remove
the block in the saved_value_cmp.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now it's using the global rt_stat, no need to use per-thread stats. Let
get rid of them.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When AGGR_THREAD is active, it aggregates the values for each thread.
Previously it used cpu map index which is invalid for AGGR_THREAD so
it had to use separate runtime stats with index 0.
But it can just use the rt_stat with thread_map_index. Rename the
first_shadow_map_idx() and make it return the thread index.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The cpu_map_idx fields is just to differentiate values from other
entries. It doesn't need to be strictly cpu map index. Actually we can
pass thread map index or aggr map index. So rename the fields first.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
evsel__reset_stat_priv() is called more than once if user gave -r option
for multiple runs. But it doesn't need to re-initialize the id.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It uses only one member, no need to have it as an array.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For system-wide evsels, the thread map should be dummy - i.e. it has a
single entry of -1. But the code guarantees such a thread map, so no
need to handle it specially.
No functional change intended.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221003204647.1481128-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a help to create a system-wide sched_switch event. One merit is
that it sets the system-wide bit before adding it to evlist so that
the libperf can handle the cpu and thread maps correctly.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221003204647.1481128-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The cpu and thread maps are properly handled in libperf now. No need to
do it in the perf tools anymore. Let's remove the logic.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221003204647.1481128-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Counts are scaled prior to going into saved_value, reverse the scaling
so that metrics don't double scale values.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221004021612.325521-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
jevents.py no longer lowercases metrics and altering the case can cause
hashmap lookups to fail, so remove.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221004021612.325521-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some TMA metrics have double if expressions like:
( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) if #core_wide < 1 else ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else CPU_CLK_UNHALTED.THREAD
This currently fails to parse as the left hand side if expression needs
to be in parentheses. By allowing the if expression to have a right hand
side that is an if expression we can parse the expression above, with
left to right evaluation order that matches languages like Python.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221004021612.325521-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Unused macros reported by [-Wunused-macros].
This macros were introduced as __PERF_COUNTER_FIELD and used for reading
the bit in config.
cdd6c482c9ff9c55 ("perf: Do the big rename: Performance Counters ->
Performance Events") Changes it to __PERF_EVENT_FIELD but at this commit
there is already nowhere else using these macros, also no macros called
PERF_EVENT_##name##_MASK/SHIFT.
Now we are not reading type or id from config. These macros are useless
and incomplete.
So removing them for code cleaning.
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220926031440.28275-5-chenzhongjin@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When perf is built in a full source tree that is not a git repository,
e.g. from a kernel source tarball, `perf version` will print empty tag
and commit strings:
$ perf version
perf version
Currently the tag version is only generated from the root Makefile when
building in a git repository. If PERF-VERSION-FILE has not been
generated and the source tree is not in a git repository, then
PERF-VERSION-GEN will return an empty version.
The problem can be reproduced with the following steps:
$ wget https://git.kernel.org/torvalds/t/linux-6.0-rc7.tar.gz
$ tar -xf linux-6.0-rc7.tar.gz && cd linux-6.0-rc7
$ make -C tools/perf
$ tools/perf/perf -v
perf version
Builds from tarballs generated with `make perf-tar-src-pkg` are not
impacted by this issue as PERF-VERSION-FILE is included in the archive.
The perf RPM provided by Fedora for 5.18+ is experiencing this problem.
Package build logs[0] show that the build is attempting to fall back on
PERF-VERSION-FILE, but it is not present.
To resolve this, revert back to the previous logic of using the kernel
Makefile version if not in a git repository and PERF-VERSION-FILE does
not exist.
[0] https://kojipkgs.fedoraproject.org/packages/kernel-tools/5.19.4/200.fc36/data/logs/x86_64/build.log
Fixes: 7572733b84997d23 ("perf tools: Fix version kernel tag")
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Will Chandler <wfc@wfchandler.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220930151157.529674-1-wfc@wfchandler.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The struct lock_contention_key is never used, remove it.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-6-yuancan@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The struct debug_line_info is never used, remove it.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-5-yuancan@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
After commit 46bdc0bf8d21 ("perf metric: Simplify metric_refs
calculation"), no one use struct metric_ref_node, so remove it.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-4-yuancan@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
After commit a93f0e551af9 ("perf symbols: Get kernel start address by symbol
name"), no one uses struct process_args any more, so remove it.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-2-yuancan@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It was reported that it failed to build the BPF lock contention skeleton
on 32 bit arch due to the size of long. The lost count is used only for
reporting errors due to lack of stackmap space through bad_hist which
type is 'int'. Let's use int type then.
Fixes: 6d499a6b3d90277d ("perf lock: Print the number of lost entries for BPF")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20220926215638.3931222-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Unused macro reported by [-Wunused-macros].
This macro is introduced to calculate the 'unit' size, in:
d2fb8b4151a92223 ("perf tools: Add new perf_atoll() function to parse string representing size in bytes")
8ba7f6c2faada3ad ("saner perf_atoll()")
This commit has simplified the perf_atoll() function and remove the
'unit' variable. This macro is not deleted, but nowhere else is using
it.
A single letter macro is confusing and easy to be misused. So remove it
for code cleaning.
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220926031440.28275-6-chenzhongjin@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add debug messages to enable scripts to track aspects of 'perf record'
behaviour. The messages will be consumed after 'perf record' has run,
with the exception of "perf record has started" which is consequently
flushed.
Put comments so developers know which messages are also being used by test
scripts.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220912083412.7058-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Handle 'f' key to toggle the display offset and full address. Obviously
it only works when users set to see disassembler output ('o' key). It'd
be useful when users want to see the full virtual address in the TUI
annotate browser.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220923173142.805896-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes users want to see actual (virtual) address of sampled instructions.
Add a new 'addr' sort key to display the raw addresses.
$ perf record -o- true | perf report -i- -s addr
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
#
# Total Lost Samples: 0
#
# Samples: 12 of event 'cycles:u'
# Event count (approx.): 252512
#
# Overhead Address
# ........ ..................
#
42.96% 0x7f96f08443d7
29.55% 0x7f96f0859b50
14.76% 0x7f96f0852e02
8.30% 0x7f96f0855028
4.43% 0xffffffff8de01087
Note that it just compares and displays the sample ip. Each process can
have a different memory layout and the ip will be different even if they run
the same binary. So this sort key is mostly meaningful for per-process
profile data.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220923173142.805896-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The error code is set to -1 at the beginning of jit_write_elf(), but it is
assigned by jit_add_eh_frame_info() in the middle, hence the following
error can only return the error code of jit_add_eh_frame_info(). Reset
the error code to the default value after being assigned by
jit_add_eh_frame_info().
Fixes: 086f9f3d7897d808 ("perf jit: Generate .eh_frame/.eh_frame_hdr in DSO")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stefano Sanfilippo <ssanfilippo@chromium.org>
Link: https://lore.kernel.org/r/20220922141438.22487-2-shangxiaojing@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently it collects stack traces to max size then skip entries.
Because we don't have control how to skip perf callchains. But BPF can
do it with bpf_get_stackid() with a flag.
Say we have max-stack=4 and stack-skip=2, we get these stack traces.
Before: After:
.---> +---+ <--. .---> +---+ <--.
| | | | | | | |
| +---+ usable | +---+ |
max | | | max | | |
stack +---+ <--' stack +---+ usable
| | X | | | | |
| +---+ skip | +---+ |
| | X | | | | |
`---> +---+ `---> +---+ <--' <=== collection
| X |
+---+ skip
| X |
+---+
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220912055314.744552-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It needs stack traces to find callers of locks. To minimize the
performance overhead it only collects up to 8 entries for each stack
trace. And it skips first 3 entries as they came from BPF, tracepoint
and lock functions which are not interested for most users.
But it turned out that those numbers are different in some
configuration. Using fixed number can result in non meaningful caller
names. Let's make them adjustable with --stack-depth and --skip-stack
options.
On my setup, the default output is like below:
# /perf lock con -ab -F contended,wait_total sleep 3
contended total wait type caller
28 4.55 ms rwlock:W __bpf_trace_contention_begin+0xb
33 1.67 ms rwlock:W __bpf_trace_contention_begin+0xb
12 580.28 us spinlock __bpf_trace_contention_begin+0xb
60 240.54 us rwsem:R __bpf_trace_contention_begin+0xb
27 64.45 us spinlock __bpf_trace_contention_begin+0xb
If I change the stack skip to 5, the result will be like:
# perf lock con -ab -F contended,wait_total --stack-skip 5 sleep 3
contended total wait type caller
32 715.45 us spinlock folio_lruvec_lock_irqsave+0x61
26 550.22 us spinlock folio_lruvec_lock_irqsave+0x61
15 486.93 us rwsem:R mmap_read_lock+0x13
12 139.66 us rwsem:W vm_mmap_pgoff+0x93
1 7.04 us spinlock tick_do_update_jiffies64+0x25
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220912055314.744552-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently it shows a caller function for each entry, but users need to see
the full call stacks sometimes. Use -v/--verbose option to do that.
# perf lock con -a -b -v sleep 3
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
contended total wait max wait avg wait type caller
1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffbb8b8e75 bpf_trace_run2+0x35
0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb
0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5
0xffffffffbc1c26ff _raw_spin_lock+0x1f
0xffffffffbb841015 tick_do_update_jiffies64+0x25
0xffffffffbb8409ee tick_irq_enter+0x9e
1 7.70 us 7.70 us 7.70 us spinlock __bpf_trace_contention_begin+0xb
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffbb8b8e75 bpf_trace_run2+0x35
0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb
0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5
0xffffffffbc1c26ff _raw_spin_lock+0x1f
0xffffffffbb7bc27e raw_spin_rq_lock_nested+0xe
0xffffffffbb7cef9c load_balance+0x66c
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220912055314.744552-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pass state necessary for core_wide into the expression parser. Add
system_wide and user_requested_cpu_list to perf_stat_config to make it
available at display time. evlist isn't used as the
evlist__create_maps, that computes user_requested_cpus, needs the list
of events which is generated by the metric.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Having metric parsing as part of argument processing causes issues as
flags like metric-no-group may be specified later. It also denies the
opportunity to optimize the events on SMT systems where fewer events
may be possible if we know the target is system-wide. Move metric
parsing to after command line option parsing. Because of how stat runs
this moves the parsing after record/report which fail to work with
metrics currently anyway.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It is possible to optimize metrics when all SMT threads (CPUs) on a
core are measuring events in system wide mode. For example, TMA
metrics defines CORE_CLKS for Sandybrdige as:
if SMT is disabled:
CPU_CLK_UNHALTED.THREAD
if SMT is enabled and recording on all SMT threads:
CPU_CLK_UNHALTED.THREAD_ANY / 2
if SMT is enabled and not recording on all SMT threads:
(CPU_CLK_UNHALTED.THREAD/2)*
(1+CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE/CPU_CLK_UNHALTED.REF_XCLK )
That is two more events are necessary when not gathering counts on all
SMT threads. To distinguish all SMT threads on a core vs system wide
(all CPUs) call the new property core wide. Add a core wide test that
determines the property from user requested CPUs, the topology and
system wide. System wide is required as other processes running on a
SMT thread will change the counts.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The topology records sibling threads. Rather than computing SMT using
siblings in sysfs, reuse the values in topology. This only applies
when the file smt/active isn't available.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We currently maintain the two independently and copy from one to the
other. This is a burden when additional scanner context values are
necessary, so combine them.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It builds without it, perhaps with some older combination of flex/bison
we needed this, clean it up a bit removing this.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220909044542.1087870-3-cuigaosheng1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
leftover declarations
The hist_entry__sort_list and sort__first_dimension functions have been
removed in commit cfaa154b2335d4c8 ("perf tools: Get rid of obsolete
hist_entry__sort_list"), remove them.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220909044542.1087870-2-cuigaosheng1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Often cpumaps encode a range of all CPUs, add a compact encoding that
doesn't require a bit mask or list of all CPUs.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It is possible for casts to introduce alignment issues, prefer a union
for perf_record_event_update.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|