Age | Commit message (Collapse) | Author | Files | Lines |
|
include/linux/bpf.h
1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value")
aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
f71b2f64177a ("bpf: Refactor map->off_arr handling")
https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A fix for the LLVM compilation error while building bpftool.
Replaces the expression:
_Static_assert((p) == NULL || ...)
by expression:
_Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) || ...)
When "p" is not a constant the former is not considered to be a
constant expression by LLVM 14.
The error was introduced in the following patch-set: [1].
The error was reported here: [2].
[1] https://lore.kernel.org/bpf/20221109142611.879983-1-eddyz87@gmail.com/
[2] https://lore.kernel.org/all/202211110355.BcGcbZxP-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: c302378bc157 ("libbpf: Hashmap interface update to allow both long and void* keys/values")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221110223240.1350810-1-eddyz87@gmail.com
|
|
An update for libbpf's hashmap interface from void* -> void* to a
polymorphic one, allowing both long and void* keys and values.
This simplifies many use cases in libbpf as hashmaps there are mostly
integer to integer.
Perf copies hashmap implementation from libbpf and has to be
updated as well.
Changes to libbpf, selftests/bpf and perf are packed as a single
commit to avoid compilation issues with any future bisect.
Polymorphic interface is acheived by hiding hashmap interface
functions behind auxiliary macros that take care of necessary
type casts, for example:
#define hashmap_cast_ptr(p) \
({ \
_Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\
#p " pointee should be a long-sized integer or a pointer"); \
(long *)(p); \
})
bool hashmap_find(const struct hashmap *map, long key, long *value);
#define hashmap__find(map, key, value) \
hashmap_find((map), (long)(key), hashmap_cast_ptr(value))
- hashmap__find macro casts key and value parameters to long
and long* respectively
- hashmap_cast_ptr ensures that value pointer points to a memory
of appropriate size.
This hack was suggested by Andrii Nakryiko in [1].
This is a follow up for [2].
[1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
[2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
|
|
Commit 3af1dfdd51e06697 ("perf build: Move perf_dlfilters.h in the
source tree") moved perf_dlfilters.h to the include/perf/ directory
while include/perf is ignored because it has 'perf' in the name. Newly
created files in the include/perf/ directory will be ignored.
Testing:
Before:
$ touch tools/perf/include/perf/junk
$ git status | grep junk
$ git check-ignore -v tools/perf/include/perf/junk
tools/perf/.gitignore:6:perf tools/perf/include/perf/junk
After:
$ git status | grep junk
tools/perf/include/perf/junk
$ git check-ignore -v tools/perf/include/perf/junk
Add !include/perf/ to perf's .gitignore file.
Fixes: 3af1dfdd51e06697 ("perf build: Move perf_dlfilters.h in the source tree")
Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221103092704.173391-1-dolinux.peng@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit f4a2aade6809c657 ("perf tests powerpc: Fix branch stack sampling
test to include sanity check for branch filter") added a skip if certain
branch options aren't available.
But the change added both -b (--branch-any) and --branch-filter options
at the same time, which will always result in a failure on any platform
because the arguments can't be used together.
Fix this by removing -b (--branch-any) and leaving --branch-filter which
already specifies 'any'. Also add warning messages to the test and perf
tool.
Output on x86 before this fix:
$ sudo ./perf test branch
108: Check branch stack sampling : Skip
After:
$ sudo ./perf test branch
108: Check branch stack sampling : Ok
Fixes: f4a2aade6809c657 ("perf tests powerpc: Fix branch stack sampling test to include sanity check for branch filter")
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221028121913.745307-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'perf stat' with CSV output option prints an extra empty string as first
field in metrics output line. Sample output below:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,1.78,msec,cpu-clock,1785146,100.00,0.973,CPUs utilized
S0,1,26,,context-switches,1781750,100.00,0.015,M/sec
S0,1,1,,cpu-migrations,1780526,100.00,0.561,K/sec
S0,1,1,,page-faults,1779060,100.00,0.561,K/sec
S0,1,875807,,cycles,1769826,100.00,0.491,GHz
S0,1,85281,,stalled-cycles-frontend,1767512,100.00,9.74,frontend cycles idle
S0,1,576839,,stalled-cycles-backend,1766260,100.00,65.86,backend cycles idle
S0,1,288430,,instructions,1762246,100.00,0.33,insn per cycle
====> ,S0,1,,,,,,,2.00,stalled cycles per insn
The above command line uses field separator as "," via "-x," option and
per-socket option displays socket value as first field. But here the
last line for "stalled cycles per insn" has "," in the beginning.
Sample output using interval mode:
# ./perf stat -I 1000 -x, --per-socket -a -C 1 ls
0.001813453,S0,1,1.87,msec,cpu-clock,1872052,100.00,0.002,CPUs utilized
0.001813453,S0,1,2,,context-switches,1868028,100.00,1.070,K/sec
------
0.001813453,S0,1,85379,,instructions,1856754,100.00,0.32,insn per cycle
====> 0.001813453,,S0,1,,,,,,,1.34,stalled cycles per insn
Above result also has an extra CSV separator after
the timestamp. Patch addresses extra field separator
in the beginning of the metric output line.
The counter stats are displayed by function
"perf_stat__print_shadow_stats" in code
"util/stat-shadow.c". While printing the stats info
for "stalled cycles per insn", function "new_line_csv"
is used as new_line callback.
The new_line_csv function has check for "os->prefix"
and if prefix is not null, it will be printed along
with cvs separator.
Snippet from "new_line_csv":
if (os->prefix)
fprintf(os->fh, "%s%s", os->prefix, config->csv_sep);
Here os->prefix gets printed followed by ","
which is the cvs separator. The os->prefix is
used in interval mode option ( -I ), to print
time stamp on every new line. But prefix is
already set to contain CSV separator when used
in interval mode for CSV option.
Reference: Function "static void print_interval"
Snippet:
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
Also if prefix is not assigned (if not used with
-I option), it gets set to empty string.
Reference: function printout() in util/stat-display.c
Snippet:
.prefix = prefix ? prefix : "",
Since prefix already set to contain cvs_sep in interval
option, patch removes printing config->csv_sep in
new_line_csv function to avoid printing extra field.
After the patch:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,2.04,msec,cpu-clock,2045202,100.00,1.013,CPUs utilized
S0,1,2,,context-switches,2041444,100.00,979.289,/sec
S0,1,0,,cpu-migrations,2040820,100.00,0.000,/sec
S0,1,2,,page-faults,2040288,100.00,979.289,/sec
S0,1,254589,,cycles,2036066,100.00,0.125,GHz
S0,1,82481,,stalled-cycles-frontend,2032420,100.00,32.40,frontend cycles idle
S0,1,113170,,stalled-cycles-backend,2031722,100.00,44.45,backend cycles idle
S0,1,88766,,instructions,2030942,100.00,0.35,insn per cycle
S0,1,,,,,,,1.27,stalled cycles per insn
Fixes: 92a61f6412d3a09d ("perf stat: Implement CSV metrics output")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-By: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221018085605.63834-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The following command will get segfault due to missing aggr_header_csv
for AGGR_NODE:
$ sudo perf stat -a --per-node -x, --metric-only true
Committer testing:
Before this patch:
# perf stat -a --per-node -x, --metric-only true
Segmentation fault (core dumped)
#
After:
# gdb perf
-bash: gdb: command not found
# perf stat -a --per-node -x, --metric-only true
node,Ghz,frontend cycles idle,backend cycles idle,insn per cycle,branch-misses of all branches,
N0,32,0.335,2.10,0.65,0.69,0.03,1.92,
#
Fixes: 86895b480a2f10c7 ("perf stat: Add --per-node agregation support")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/20221107213314.3239159-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit 0cc177cfc95d565e ("perf vendor events arm64: Add Hisi hip08 L3
metrics") add L3 metrics of hip08, but some metrics (IF_BP_MISP_BR_RET,
IF_BP_MISP_BR_RET, IF_BP_MISP_BR_BL) have incorrect event number due to
the mistakes in document, which caused incorrect result. Fix the
incorrect metrics.
Before:
65,811,214,308 armv8_pmuv3_0/event=0x1014/ # 18.87 push_branch
# -40.19 other_branch
3,564,316,780 BR_MIS_PRED # 0.51 indirect_branch
# 21.81 pop_branch
After:
6,537,146,245 BR_MIS_PRED # 0.48 indirect_branch
# 0.47 pop_branch
# 0.00 push_branch
# 0.05 other_branch
Fixes: 0cc177cfc95d565e ("perf vendor events arm64: Add Hisi hip08 L3 metrics")
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Acked-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221021105035.10000-2-shangxiaojing@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For modules, names from kallsyms__parse() contain the module name which
meant that module symbols did not match exactly by name.
Fix by matching the name string up to the separating tab character.
Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221026072736.2982-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes from:
825cf206ed510c4a ("statx: add direct I/O alignment information")
That add a constant that was manually added to tools/perf/trace/beauty/statx.c,
at some point this should move to the shell based automated way.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/stat.h' differs from latest version at 'include/uapi/linux/stat.h'
diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h
Cc: Eric Biggers <ebiggers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Y1gGQL5LonnuzeYd@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We also need to add SYM_TYPED_FUNC_START() to util/include/linux/linkage.h
and update tools/perf/check_headers.sh to ignore the include cfi_types.h
line when checking if the kernel original files drifted from the copies
we carry.
This is to get the changes from:
ccace936eec7b805 ("x86: Add types to indirectly called assembly functions")
Addressing these tools/perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/lkml/Y1f3VRIec9EBgX6F@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The virtual LBR test uses a python script to check the max size of
branch stack in the Intel-PT generated LBR. But it didn't check whether
python scripting is available (as it's optional).
Let's skip the test if the python support is not available.
Fixes: f77811a0f62577d2 ("perf test: test_intel_pt.sh: Add 9 tests")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ammy Yi <ammy.yi@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221021181055.60183-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit e0b23af82d6f454c ("perf list: Add PMU pai_crypto event
description for IBM z16") introduced the "Processor Activity
Instrumentation" for cryptographic counters for z16. The PMU device
driver exports the counters via sysfs files listed in directory
/sys/devices/pai_crypto.
To specify an event from that PMU, use 'perf stat -e pai_crypto/XXX/'.
However the JSON file mentioned in above commit exports the counter
decriptions in file pmu-events/arch/s390/cf_z16/pai.json. Rename this
file to pmu-events/arch/s390/cf_z16/pai_crypto.json to make the naming
consistent.
Now 'perf list' shows the counter names under pai_crypto section:
pai_crypto:
CRYPTO_ALL
[CRYPTO ALL. Unit: pai_crypto]
...
Output before was
pai:
CRYPTO_ALL
[CRYPTO ALL. Unit: pai_crypto]
...
Fixes: e0b23af82d6f454c ("perf list: Add PMU pai_crypto event description for IBM z16")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20221021082557.2695382-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The write call may set errno which is problematic if occurring in a
function also setting errno. Save and restore errno around the write
call.
done_fd may be used after close, clear it as part of the close and check
its validity in the signal handler.
Suggested-by: <gthelen@google.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anand K Mistry <amistry@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20221024011024.462518-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
bpf_program__set_insns() is available
During the transition to libbpf 1.0 some functions that perf used were
deprecated and finally removed from libbpf, so bpf_program__set_insns()
was introduced for perf to continue to use its bpf loader.
But when build with LIBBPF_DYNAMIC=1 we now need to check if that
function is available so that perf can build with older libbpf versions,
even if the end result is emitting a warning to the user that the use
of the perf BPF loader requires a newer libbpf, since bpf_program__set_insns()
touches libbpf objects internal state.
This affects only 'perf trace' when using bpf C code or pre-compiled
bytecode as an event.
Noticed on RHEL9, that has libbpf 0.7.0, where bpf_program__set_insns()
isn't available.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The bpf_load_program() prototype appeared in tools/lib/bpf/bpf.h as
deprecated, but nowadays its completely removed, so add it back for
building with the system libbpf when using 'make LIBBPF_DYNAMIC=1'.
This is a stop gap hack till we do like tools/bpf does with bpftool,
i.e. bootstrap the libbpf build and install it in the perf build
directory when not using 'make LIBBPF_DYNAMIC=1'.
That has to be done to all libraries in tools/lib/, so tha we can
remove -Itools/lib/ from the tools/perf CFLAGS.
Noticed when building with LIBBPF_DYNAMIC=1 and libbpf 0.7.0 on RHEL9.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Testcase stat_all_metrics.sh fails in powerpc:
90: perf all metrics test : FAILED!
The testcase "stat_all_metrics.sh" verifies perf stat result for all the
metric events present in perf list. It runs perf metric events with
various commands and expects non-empty metric result.
Incase of powerpc:hv-24x7 events, some of the event count can be 0 based
on system configuration. And if that event used as denominator in divide
equation, it can cause divide by 0 error. The current nest_metric.json
file creating divide by 0 issue for some of the metric events, which
results in failure of the "stat_all_metrics.sh" test case.
Most of the metrics events have cycles or an event which expect to have
a larger value as denominator, so adding 1 to the denominator of the
metric expression as a fix.
Result in powerpc box after this patch changes:
90: perf all metrics test : Ok
Fixes: a3cbcadfdfc330c2 ("perf vendor events power10: Adds 24x7 nest metric events for power10 platform")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221014140220.122251-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf build assumes documentation files starting with "perf-" are man
pages but perf-arm-coresight.txt is not a man page:
asciidoc: ERROR: perf-arm-coresight.txt: line 2: malformed manpage title
asciidoc: ERROR: perf-arm-coresight.txt: line 3: name section expected
asciidoc: FAILED: perf-arm-coresight.txt: line 3: section title expected
make[3]: *** [Makefile:266: perf-arm-coresight.xml] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [Makefile.perf:895: man] Error 2
Fix by renaming it.
Fixes: dc2e0fb00bb2b24f ("perf test coresight: Add relevant documentation about ARM64 CoreSight testing")
Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: coresight@lists.linaro.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/a176a3e1-6ddc-bb63-e41c-15cda8c2d5d2@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes in these csets:
e237506238352f3b ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
That doesn't cause any changes in the perf tools.
As a reminder, this table is used in tools perf to allow features such as:
[root@five ~]# perf trace -e set_mempolicy_home_node
^C[root@five ~]#
[root@five ~]# perf trace -v -e set_mempolicy_home_node
Using CPUID AuthenticAMD-25-21-0
event qualifier tracepoint filter: (common_pid != 253729 && common_pid != 3585) && (id == 450)
mmap size 528384B
^C[root@five ~]
[root@five ~]# perf trace -v -e set* --max-events 5
Using CPUID AuthenticAMD-25-21-0
event qualifier tracepoint filter: (common_pid != 253734 && common_pid != 3585) && (id == 38 || id == 54 || id == 105 || id == 106 || id == 109 || id == 112 || id == 113 || id == 114 || id == 116 || id == 117 || id == 119 || id == 122 || id == 123 || id == 141 || id == 160 || id == 164 || id == 170 || id == 171 || id == 188 || id == 205 || id == 218 || id == 238 || id == 273 || id == 308 || id == 450)
mmap size 528384B
0.000 ( 0.008 ms): bash/253735 setpgid(pid: 253735 (bash), pgid: 253735 (bash)) = 0
6849.011 ( 0.008 ms): bash/16046 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0
6849.080 ( 0.005 ms): bash/253736 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0
7437.718 ( 0.009 ms): gnome-shell/253737 set_robust_list(head: 0x7f34b527e920, len: 24) = 0
13445.986 ( 0.010 ms): bash/16046 setpgid(pid: 253738 (bash), pgid: 253738 (bash)) = 0
[root@five ~]#
That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.
$ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w set_mempolicy_home_node
tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node
tools/perf/arch/s390/entry/syscalls/syscall.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node
$
$ grep -w set_mempolicy_home_node /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c
[450] = "set_mempolicy_home_node",
$
This addresses these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/lkml/Y01HN2DGkWz8tC%2FJ@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add support for using 'perf report --dump-raw-trace' to parse PTT packet.
Example usage:
Output will contain raw PTT data and its textual representation, such
as (8DW format):
0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0
ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0
.
. ... HISI PTT data: size 4194304 bytes
. 00000000: 00 00 00 00 Prefix
. 00000004: 08 20 00 60 Header DW0
. 00000008: ff 02 00 01 Header DW1
. 0000000c: 20 08 00 00 Header DW2
. 00000010: 10 e7 44 ab Header DW3
. 00000014: 2a a8 1e 01 Time
. 00000020: 00 00 00 00 Prefix
. 00000024: 01 00 00 60 Header DW0
. 00000028: 0f 1e 00 01 Header DW1
. 0000002c: 04 00 00 00 Header DW2
. 00000030: 40 00 81 02 Header DW3
. 00000034: ee 02 00 00 Time
....
This patch only add basic parsing support according to the definition of
the PTT packet described in Documentation/trace/hisi-ptt.rst. And the
fields of each packet can be further decoded following the PCIe Spec's
definition of TLP packet.
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the
PCIe link's events, and trace the TLP headers).
This patch add support for PTT device in perf tool, so users could use
'perf record' to get TLP headers trace data.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add find_pmu_for_event() and use to simplify logic in
auxtrace_record_init(). find_pmu_for_event() will be reused in
subsequent patches.
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Testcase stat+json_output.sh fails in powerpc:
86: perf stat JSON output linter : FAILED!
The testcase "stat+json_output.sh" verifies perf stat JSON output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.
The values for socket, die, core and cpu are fetched from topology
directory:
/sys/devices/system/cpu/cpu*/topology.
For example, socket value is fetched from "physical_package_id" file of
topology directory. (cpu__get_topology_int() in util/cpumap.c)
If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.
Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts number of fields in the
output.
Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.
Patch here adds a sanity check in the testcase for topology. Check will
help to skip the test if -1 value found.
Fixes: 0c343af2a2f82844 ("perf test: JSON format checking")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Testcase stat+csv_output.sh fails in powerpc:
84: perf stat CSV output linter: FAILED!
The testcase "stat+csv_output.sh" verifies perf stat CSV output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.
The values for socket, die, core and cpu are fetched from topology
directory:
/sys/devices/system/cpu/cpu*/topology.
For example, socket value is fetched from "physical_package_id" file of
topology directory. (cpu__get_topology_int() in util/cpumap.c)
If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.
Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts number of fields in the
output.
Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.
Patch here adds a sanity check in the testcase for topology. Check will
help to skip the test if -1 value found.
Fixes: 7473ee56dbc91c98 ("perf test: Add checking for perf stat CSV output.")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
User space tasks can migrate between CPUs, so when tracing selected CPUs,
system-wide sideband is still needed, however evlist->core.has_user_cpus
is not set in the hybrid case, so check the target cpu_list instead.
Fixes: 7d189cadbeebc778 ("perf intel-pt: Track sideband system-wide when needed")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
uClibc segfaulted because NULL was passed as the format to fprintf().
That happened because one of the format strings was missing and
intel_pt_print_info() didn't check that before calling fprintf().
Add the missing format string, and check format is not NULL before calling
fprintf().
Fixes: 11fa7cb86b56d361 ("perf tools: Pass Intel PT information for decoding MTC and CYC")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since PERF_FORMAT_LOST was added, the default read format has that bit
set, so add it to the tests. Keep the old value as well so that the test
still passes on older kernels.
This fixes the following failure:
expected read_format=0|4, got 20
FAILED './tests/attr/test-record-C0' - match failure
Fixes: 85b425f31c8866e0 ("perf record: Set PERF_FORMAT_LOST by default")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221012094633.21669-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add tests:
Test with MTC and TSC disabled
Test with branches disabled
Test with/without CYC
Test recording with sample mode
Test with kernel trace
Test virtual LBR
Test power events
Test with TNT packets disabled
Test with event_trace
These tests mostly check that perf record works with the corresponding
Intel PT config terms, sometimes also checking that certain packets do or
do not appear in the resulting trace as appropriate.
The "Test virtual LBR" is slightly trickier, using a Python script to
check that branch stacks are actually synthesized.
Signed-off-by: Ammy Yi <ammy.yi@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-8-adrian.hunter@intel.com
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When a program header was added, it moved the text section but
GEN_ELF_TEXT_OFFSET was not updated.
Fix by adding the program header size and aligning.
Fixes: babd04386b1df8c3 ("perf jit: Include program header in ELF files")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lieven Hey <lieven.hey@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a test for decoding self-modifying code using a jitdump file.
The test creates a workload that uses self-modifying code and generates its
own jitdump file. The result is processed with perf inject --jit and
checked for decoding errors.
Note the test will fail without patch "perf inject: Fix GEN_ELF_TEXT_OFFSET
for jit" applied.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Tidy alignment of test function lines to make them more readable.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Messages display with the perf test -v option. Add a message to show when
skipping a test because the user cannot do kernel tracing.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When not decoding, the options "-B -N --no-bpf-event" speed up perf record.
Make a common function for them.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
count_result() does not always reset ret=0 which means the value can spill
into the next test result.
Fix by explicitly setting it to zero between tests.
Committer testing:
# perf test "Miscellaneous Intel PT testing"
110: Miscellaneous Intel PT testing : Ok
#
Tested as well with:
# perf test -v "Miscellaneous Intel PT testing"
Fixes: fd9b45e39cfaf885 ("perf test: test_intel_pt.sh: Fix return checking")
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If the kernel exposes a new perf_event_attr field in a format attr, perf
will return an error stating the specified PMU can't be found. For
example, a format attr with 'config3:0-63' causes an error as config3 is
unknown to perf. This causes a compatibility issue between a newer
kernel with older perf tool.
Before this change with a kernel adding 'config3' I get:
$ perf record -e arm_spe// -- true
event syntax error: 'arm_spe//'
\___ Cannot find PMU `arm_spe'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list
available events
After this change, I get:
$ perf record -e arm_spe// -- true
WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf!
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.091 MB perf.data ]
To support unknown configN formats, rework the YACC implementation to
pass any config[0-9]+ format to perf_pmu__new_format() to handle with a
warning.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
$ perf list metricgroups
gives
List of pre-defined events (to be used in -e):
Metric Groups:
Backend
Bad
BadSpec
But that's incorrect of course because metric groups or metrics can only
be specified with -M. So fix the message to say -e or -M
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221004192634.998984-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The -C/--cpu option was maily for report but it also affected record as
it ate the option. So users needed to use "--" after perf mem record to
pass the info to the perf record properly.
Check if this option is set for record, and pass it to the actual perf
record.
Before)
$ sudo perf --debug perf-event-open mem record -C 0 2>&1 | grep -a sys_perf_event_open
...
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 4
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 8
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 11
...
After)
$ sudo perf --debug perf-event-open mem record -C 0 2>&1 | grep -a sys_perf_event_open
...
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 4
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221004200211.1444521-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
According to the document [1], it can also have 'hs', 'lo', 'vc', 'vs' as a
condition code. Let's add them too.
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/condition-codes-1-condition-flags-and-codes
Reported-by: Kevin Nomura <nomurak@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20221006222232.266416-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This test commonly fails on Arm Juno because the instruction interval
is large enough to miss generating any samples for Perf in system-wide
mode.
Fix this by lowering the interval until a comfortable number of Perf
instructions are generated. The test is still quick to run because only
a small amount of trace is gathered.
Before:
sudo ./perf test coresight -vvv
...
Recording trace with system wide mode
Looking at perf.data file for dumping branch samples:
Looking at perf.data file for reporting branch samples:
Looking at perf.data file for instruction samples:
CoreSight system wide testing: FAIL
...
After:
sudo ./perf test coresight -vvv
...
Recording trace with system wide mode
Looking at perf.data file for dumping branch samples:
Looking at perf.data file for reporting branch samples:
Looking at perf.data file for instruction samples:
CoreSight system wide testing: PASS
...
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20221005140508.1537277-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The recent change in the cgroup will break the backward compatiblity in
the BPF program. It should support both old and new kernels using BPF
CO-RE technique.
Like the task_struct->__state handling in the offcpu analysis, we can
check the field name in the cgroup struct.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: bpf@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: zefan li <lizefan.x@bytedance.com>
Link: http://lore.kernel.org/lkml/20221011052808.282394-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Add support for AMD on 'perf mem' and 'perf c2c', the kernel
enablement patches went via tip.
Example:
$ sudo perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
$ sudo perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access Samples Snoop
N/A 700620 N/A
L1 hit 126675 N/A
L2 hit 424 N/A
L3 hit 664 HitM
L3 hit 10 N/A
Local RAM hit 2 N/A
Remote RAM (1 hop) hit 8558 N/A
Remote Cache (1 hop) hit 3 N/A
Remote Cache (1 hop) hit 2 HitM
Remote Cache (2 hops) hit 10 HitM
Remote Cache (2 hops) hit 6 N/A
Uncached hit 4 N/A
$
- "perf lock" improvements:
- Add -E/--entries option to limit the number of entries to
display, say to ask for just the top 5 contended locks.
- Add -q/--quiet option to suppress header and debug messages.
- Add a 'perf test' kernel lock contention entry to test 'perf
lock'.
- "perf lock contention" improvements:
- Ask BPF's bpf_get_stackid() to skip some callchain entries.
The ones closer to the tooling are bpf related and not that
interesting, the ones calling the locking function are the ones
we're interested in, example of a full, unskipped callstack:
- Allow changing the callstack depth and number of entries to skip.
1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffbb8b8e75 bpf_trace_run2+0x35
0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb
0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5
0xffffffffbc1c26ff _raw_spin_lock+0x1f
0xffffffffbb841015 tick_do_update_jiffies64+0x25
0xffffffffbb8409ee tick_irq_enter+0x9e
- Show full callstack in verbose mode (-v option), sometimes this
is desirable instead of showing just one callstack entry.
- Allow multiple time ranges in 'perf record --delay' to help in
reducing the amount of data collected from hardware tracing (Intel
PT, etc) when there is a rough idea of periods of time where events
of interest take time.
- Add Intel PT to record only decoder debug messages when error
happens.
- Improve layout of Intel PT man page.
- Add new branch types: alignment, data and inst faults and arch
specific ones, such as fiq, debug_halt, debug_exit, debug_inst and
debug_data on arm64.
Kernel enablement went thru the tip tree.
- Fix 'perf probe' error log check in 'perf test' when no debuginfo is
available.
- Fix 'perf stat' aggregation mode logic, it should be looking at the
CPU not at the core number.
- Fix flags parsing in 'perf trace' filters.
- Introduce compact encoding of CPU range encoding on perf.data, to
avoid having a bitmap with all the CPUs.
- Improvements to the 'perf stat' metrics, including adding
"core_wide", and computing "smt" from the CPU topology.
- Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format,
that allows tooling to ask for the precise number of lost samples for
a given event.
- Add 'addr' sort key to see just the address of sampled instructions:
$ perf record -o- true | perf report -i- -s addr
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
# Samples: 12 of event 'cycles:u'
# Event count (approx.): 252512
#
# Overhead Address
# ........ ..................
42.96% 0x7f96f08443d7
29.55% 0x7f96f0859b50
14.76% 0x7f96f0852e02
8.30% 0x7f96f0855028
4.43% 0xffffffff8de01087
perf annotate: Toggle full address <-> offset display
- Add 'f' hotkey to the 'perf annotate' TUI interface when in
'disassembler output' mode ('o' hotkey) to toggle showing full
virtual address or just the offset.
- Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for
pre-existing threads, at the start of a 'perf record' session,
speeding up that record startup phase.
- Add a command line option to specify build ids in 'perf inject'.
- Update JSON event files for the Intel alderlake, broadwell,
broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake,
icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids,
skylake, skylakex, and tigerlake processors.
- Update vendor JSON event files for the ARM Neoverse V1 and E1
platforms.
- Add a 'perf test' entry for 'perf mem' where a struct has false
sharing and this gets detected in the 'perf mem' output, tested with
Intel, AMD and ARM64 systems.
- Add a 'perf test' entry to test the resolution of java symbols, where
an output like this is expected:
8.18% jshell jitted-50116-29.so [.] Interpreter
0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int)
- Add tests for the ARM64 CoreSight hardware tracing feature, with
specially crafted pureloop, memcpy, thread loop and unroll tread that
then gets traced and the output compared with expected output.
Documentation explaining it is also included.
- Add per thread Intel PT 'perf test' entry to check that
PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a
mixture of per thread and per CPU events and mmaps, verify that this
gets all recorded correctly.
- Introduce pthread mutex wrappers to allow for building with clang's
-Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by"
"lockable", "exclusive_lock_function", "exclusive_trylock_function",
"exclusive_locks_required", and "no_thread_safety_analysis" compiler
function attributes.
- Fix empty version number when building outside of a git repo.
- Improve feature detection display when multiple versions of a feature
are present, such as for binutils libbfd, that has a mix of possible
ways to detect according to the Linux distribution.
Previously in some cases we had:
Auto-detecting system features
<SNIP>
... libbfd: [ on ]
... libbfd-liberty: [ on ]
... libbfd-liberty-z: [ on ]
<SNIP>
Now for this case we show just the main feature:
Auto-detecting system features
<SNIP>
... libbfd: [ on ]
<SNIP>
- Remove some unused structs, variables, macros, function prototypes
and includes from various places.
* tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
perf script: Add missing fields in usage hint
perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB
perf mem/c2c: Avoid printing empty lines for unsupported events
perf mem/c2c: Add load store event mappings for AMD
perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}
perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
perf test coresight: Add relevant documentation about ARM64 CoreSight testing
perf test: Add git ignore for tmp and output files of ARM CoreSight tests
perf test coresight: Add unroll thread test shell script
perf test coresight: Add unroll thread test tool
perf test coresight: Add thread loop test shell scripts
perf test coresight: Add thread loop test tool
perf test coresight: Add memcpy thread test shell script
perf test coresight: Add memcpy thread test tool
perf test: Add git ignore for perf data generated by the ARM CoreSight tests
perf test: Add arm64 asm pureloop test shell script
perf test: Add asm pureloop test tool
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset now support isolated cpus.partition type, which will enable
dynamic CPU isolation
- pids.peak added to remember the max number of pids used
- holes in cgroup namespace plugged
- internal cleanups
* tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits)
cgroup: use strscpy() is more robust and safer
iocost_monitor: reorder BlkgIterator
cgroup: simplify code in cgroup_apply_control
cgroup: Make cgroup_get_from_id() prettier
cgroup/cpuset: remove unreachable code
cgroup: Remove CFTYPE_PRESSURE
cgroup: Improve cftype add/rm error handling
kselftest/cgroup: Add cpuset v2 partition root state test
cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule
cgroup/cpuset: Relocate a code block in validate_change()
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Relax constraints to partition & cpus changes
cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
cgroup/cpuset: Miscellaneous cleanups & add helper functions
cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
cgroup: add pids.peak interface for pids controller
cgroup: Remove data-race around cgrp_dfl_visible
cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting
API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only
sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.
* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
KVM: PPC: Book3S HV: Fix stack frame regs marker
powerpc: Don't add __powerpc_ prefix to syscall entry points
powerpc/64s/interrupt: Fix stack frame regs marker
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
powerpc/pseries: Add firmware details to the hardware description
powerpc/powernv: Add opal details to the hardware description
powerpc: Add device-tree model to the hardware description
powerpc/64: Add logical PVR to the hardware description
powerpc: Add PVR & CPU name to hardware description
powerpc: Add hardware description string
powerpc/configs: Enable PPC_UV in powernv_defconfig
powerpc/configs: Update config files for removed/renamed symbols
powerpc/mm: Fix UBSAN warning reported on hugetlb
powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
powerpc: Drops STABS_DEBUG from linker scripts
powerpc/64s: Remove lost/old comment
powerpc/64s: Remove old STAB comment
powerpc: remove orphan systbl_chk.sh
...
|
|
A few fields are missing in the usage message printed when an unknown
field option is passed. Add them to the list.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-9-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
A hw component to track outstanding L1 Data Cache misses is called LFB
(Line Fill Buffer) on Intel and Arm. However similar component exists on
other arch with different names, for ex, it's called MAB (Miss Address
Buffer) on AMD. Use 'LFB/MAB' instead of just 'LFB'.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-8-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'perf mem' and 'perf c2c' tools can be used with 3 different events:
load, store and combined load-store. Some architectures might support
only partial set of events in which case, perf prints an empty line for
unsupported events. Avoid that.
Ex, AMD Zen cpus supports only combined load-store event and does not
support individual load and store event.
Before patch:
$ perf mem record -e list
mem-ldst : available
$
After patch:
$ perf mem record -e list
mem-ldst : available
$
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-7-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record'
with mem load/ store events. IBS tagged load/store sample provides most
of the information needed for these tools. Wire in the "ibs_op//" event
as mem-ldst event for AMD.
There are some limitations though: Only load/store micro-ops provide
mem/c2c information. Whereas, IBS does not have a way to choose a
particular type of micro-op to tag. This results in many non-LS
micro-ops being tagged which appear as N/A in the perf report. IBS,
being an uncore pmu from kernel point of view[1], does not support per
process monitoring. Thus, perf mem/c2c on AMD are currently supported in
per-cpu mode only.
Example:
$ sudo perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
$ sudo perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access Samples Snoop
N/A 700620 N/A
L1 hit 126675 N/A
L2 hit 424 N/A
L3 hit 664 HitM
L3 hit 10 N/A
Local RAM hit 2 N/A
Remote RAM (1 hop) hit 8558 N/A
Remote Cache (1 hop) hit 3 N/A
Remote Cache (1 hop) hit 2 HitM
Remote Cache (2 hops) hit 10 HitM
Remote Cache (2 hops) hit 6 N/A
Uncached hit 4 N/A
$
[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-6-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently perf sets PERF_SAMPLE_WEIGHT flag only for mem load events.
Set it for combined load-store event as well which will enable recording
of load latency by default on arch that does not support independent
mem load event.
Also document missing -W in perf-record man page.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-5-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add support for printing these new fields in perf mem report.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-4-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'perf stat' has options to aggregate the counts in different modes like
per socket, per core etc. The function "aggr_printout" in
util/stat-display.c which is used to print the aggregates, has a check
for cpu in case of AGGR_NONE.
This check was originally using condition : "if (id.cpu.cpu > -1)". But
this got changed after commit df936cadfb58 ("perf stat: Add JSON output
option"), which added option to output json format for different
aggregation modes. After this commit, the check in "aggr_printout" is
using "if (id.core > -1)".
The old code was using "id.cpu.cpu > -1" while the new code is using
"id.core > -1". But since the value printed is id.cpu.cpu, fix this
check to use cpu and not core.
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20221006114225.66303-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|