Age | Commit message (Collapse) | Author | Files | Lines |
|
Instructions with sign- and zero- extention like movsbl and movzwq were
not handled properly. As it can check different size suffix (-b, -w, -l
or -q) we can omit that and add the common parts even though some
combinations are not possible.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230908052216.566148-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
pmu_events_table__find() is no longer used so remove it and its Arm
specific version.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230913153355.138331-4-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Currently the while loop always either exits on the first iteration with
a core PMU, or exits with NULL on heterogeneous systems or when not all
CPUs are online.
Both of the latter behaviors are undesirable for platforms other than
Arm so simplify it to always return the first core PMU, or NULL if none
exist.
This behavior was depended on by the Arm version of
pmu_metrics_table__find(), so the logic has been moved there instead.
Signed-off-by: James Clark <james.clark@arm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230913153355.138331-3-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
pmu__find_core_pmu() more logically belongs in pmus.c because it
iterates over all PMUs, so move it to pmus.c
At the same time rename it to perf_pmus__find_core_pmu() to match the
naming convention in this file.
list_prepare_entry() can't be used in perf_pmus__scan_core() anymore now
that it's called from the same compilation unit. This is with -O2
(specifically -O1 -ftree-vrp -finline-functions
-finline-small-functions) which allow the bounds of the array
access to be determined at compile time. list_prepare_entry() subtracts
the offset of the 'list' member in struct perf_pmu from &core_pmus,
which isn't a struct perf_pmu. The compiler sees that pmu results in
&core_pmus - 8 and refuses to compile. At runtime this works because
list_for_each_entry_continue() always adds the offset back again before
dereferencing ->next, but it's technically undefined behavior. With
-fsanitize=undefined an additional warning is generated.
Using list_first_entry_or_null() to get the first entry here avoids
doing &core_pmus - 8 but has the same result and fixes both the compile
warning and the undefined behavior warning. There are other uses of
list_prepare_entry() in pmus.c, but the compiler doesn't seem to be
able to see that they can also be called with &core_pmus, so I won't
change any at this time.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230913153355.138331-2-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
syscalls with the kernel sources
To pick the changes in these csets:
c35559f94ebc3e3b ("x86/shstk: Introduce map_shadow_stack syscall")
78252deb023cf087 ("arch: Register fchmodat2, usually as syscall 452")
That add support for this new syscall in tools such as 'perf trace'.
For instance, this is now possible:
# perf trace -v -e fchmodat*,map_shadow_stack --max-events=4
Using CPUID AuthenticAMD-25-21-0
Reusing "openat" BPF sys_enter augmenter for "fchmodat"
event qualifier tracepoint filter: (common_pid != 3499340 && common_pid != 11259) && (id == 268 || id == 452 || id == 453)
^C#
And it'll work as with other syscalls, for instance openat:
# perf trace -e openat* --max-events=4
0.000 ( 0.015 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11
0.068 ( 0.019 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11
0.119 ( 0.008 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.current", flags: RDONLY|CLOEXEC) = 11
0.138 ( 0.006 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.min", flags: RDONLY|CLOEXEC) = 11
#
That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.
$ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -E fchmodat\|sys_map_shadow_stack
tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:258 n64 fchmodat sys_fchmodat
tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:452 n64 fchmodat2 sys_fchmodat2
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2
tools/perf/arch/s390/entry/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat sys_fchmodat
tools/perf/arch/s390/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2 sys_fchmodat2
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:268 common fchmodat sys_fchmodat
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:452 common fchmodat2 sys_fchmodat2
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:453 64 map_shadow_stack sys_map_shadow_stack
$
$ grep -Ew map_shadow_stack\|fchmodat2 /tmp/build/perf-tools/arch/x86/include/generated/asm/syscalls_64.c
[452] = "fchmodat2",
[453] = "map_shadow_stack",
$
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/lkml/ZP8bE7aXDBu%2Fdrak@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
parse_events_terms() existed in function names but was passed a
'struct list_head'.
As many parse_events functions take an evsel_config list as well as a
parse_event_term list, and the naming head_terms and head_config is
inconsistent, there's a potential to switch the lists and get errors.
Introduce a 'struct parse_events_terms', that just wraps a list_head, to
avoid this. Add the regular init/exit functions and transition the code
to use them.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230901233949.2930562-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The PMU name could be NULL in the case of the fake_pmu. Initialize the
name for the fake_pmu to "fake" so that all other logic can assume it
is initialized. Add a const to the type of name so that a literal can
be used to avoid additional initialization code. Propagate the cost
through related routines and remove now unnecessary "(char *)"
casts. Doing this located a bug in builtin-list for the pmu_glob that
was missing a strdup.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230825024002.801955-3-irogers@google.com
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Li <liwei391@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Ming Wang <wangming01@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Rather than read a sysfs events file into a 256 byte char buffer, pass
the FILE* directly to the lex/yacc parser.
This avoids there being a maximum events file size.
While changing the API, constify some arguments to remove unnecessary
casts.
Allocating the read buffer decreases the performance of pmu-scan by
around 3%.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230824041330.266337-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pass the PMU so the format list can be better abstracted and later
lazily loaded.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-8-irogers@google.com
[ Did missing conversions in tools/perf/arch/arm*/util/cs-etm.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Abstract the format list better, hiding it in the PMU, by changing
perf_pmu__config_terms() the PMU rather than the format list in the PMU.
Change the PMU test to pass a dummy PMU for this purpose. Changing the
test allows perf_pmu__del_formats() to become static.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This will allow writing formulas that are conditional on a specific
CPU type or CPU version. It calls through to the existing
strcmp_cpuid_str() function in Perf which has a default weak version,
and an arch specific version for x86 and arm64.
The function takes an 'ID' type value, which is a string. But in this
case Arm CPU IDs are hex numbers prefixed with '0x'. metric.py
assumes strings are only used by event names, and that they can't start
with a number ('0'), so an additional change has to be made to the
regex to convert hex numbers back to 'ID' types.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230816114841.1679234-5-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now that variant and revision fields are taken into account the behavior
is slightly more complicated so add a test to ensure that this behaves
as expected.
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230816114841.1679234-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently variant and revision fields are masked out of the MIDR so
it's not possible to compare different versions of the same CPU.
In a later commit a workaround will be removed just for N2 r0p3, so
enable comparisons on version.
This has the side effect of changing the MIDR stored in the header of
the perf.data file to no longer have masked version fields. It also
affects the lookups in mapfile.csv, but as that currently only has
zeroed version fields, it has no actual effect. The mapfile.csv
documentation also states to zero the version fields, so unless this
isn't done it will continue to have no effect.
There is an existing weak default strcmp_cpuid_str() function, and an
x86 version. This adds another version for arm64.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230816114841.1679234-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
util/perf_regs.h includes another perf_regs.h:
#include <perf_regs.h>
Here it includes architecture specific header, for example, if we build
arm64 target, the header tools/perf/arch/arm64/include/perf_regs.h is
included.
We use this implicit way to include architecture specific header, which
is not directive; furthermore, util/perf_regs.c is coupled with the
architecture specific definitions.
This patch moves out arch specific header from util/perf_regs.h for
generalizing the 'util' folder, as a result, the source files in 'arch'
folder explicitly include architecture's perf_regs.h.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20230606014559.21783-7-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The macros PERF_REGS_MAX and PERF_REGS_MASK are architecture specific,
let's remove them from the common file util/perf_regs.c.
As a side effect, the weak functions arch__intr_reg_mask() and
arch__user_reg_mask() just return zeros, every arch defines its own
functions in the 'arch' folder for returning right values.
Note, we don't need to return intr/user register masks dynamically, this
is because these two functions are invoked during recording phase but
not decoding phase, they are always invoked on the native environment,
thus we don't need to parse them dynamically.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20230606014559.21783-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The macros PERF_REG_{IP|SP} have been replaced by using functions
perf_arch_reg_{ip|sp}(), remove them!
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20230606014559.21783-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up the fixes that were just merged from perf-tools/perf-tools
for v6.5.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As topdown.slots may appear as slots it may get confused with
uops_retired.slots which is an invalid perf metric event group leader.
Special case uops_retired.slots to avoid this confusion.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230801053634.1142634-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
syscalltbl.sh
Running shellcheck on syscalltbl.sh generates below warning:
In ./tools/perf/arch/x86/entry/syscalls/syscalltbl.sh line 27:
while read nr abi name entry compat; do
^-^ SC2034 (warning): abi appears unused. Verify use (or export if used externally).
^----^ SC2034 (warning): compat appears unused. Verify use (or export if used externally).
These variables are intentionally unused since they are needed to parse
through the output. Use "_" as a prefix for these throw away variables.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230709182800.53002-22-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
arch_skip_callchain_idx function
'perf record; with callchain recording fails as below
in powerpc:
./perf record -a -gR sleep 10
./perf report
perf: Segmentation fault
gdb trace points to thread__find_map
0 0x00000000101df314 in atomic_cmpxchg (newval=1818846826, oldval=1818846827, v=0x1001a8f3) at /home/athira/linux/tools/include/asm-generic/atomic-gcc.h:70
1 refcount_sub_and_test (i=1, r=0x1001a8f3) at /home/athira/linux/tools/include/linux/refcount.h:135
2 refcount_dec_and_test (r=0x1001a8f3) at /home/athira/linux/tools/include/linux/refcount.h:148
3 map__put (map=0x1001a8b3) at util/map.c:311
4 0x000000001016842c in __map__zput (map=0x7fffffffa368) at util/map.h:190
5 thread__find_map (thread=0x105b92f0, cpumode=<optimized out>, addr=13835058055283572736, al=al@entry=0x7fffffffa358) at util/event.c:582
6 0x000000001016882c in thread__find_symbol (thread=<optimized out>, cpumode=<optimized out>, addr=<optimized out>, al=0x7fffffffa358) at util/event.c:656
7 0x00000000102e12b4 in arch_skip_callchain_idx (thread=<optimized out>, chain=<optimized out>) at arch/powerpc/util/skip-callchain-idx.c:255
8 0x00000000101d3bf4 in thread__resolve_callchain_sample (thread=0x105b92f0, cursor=0x1053d160, evsel=<optimized out>, sample=0x7fffffffa908, parent=0x7fffffffa778, root_al=0x7fffffffa710,
max_stack=<optimized out>) at util/machine.c:2940
9 0x00000000101cd210 in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=<optimized out>, evsel=<optimized out>, al=<optimized out>, max_stack=<optimized out>)
at util/callchain.c:1112
10 0x000000001022a9d8 in hist_entry_iter__add (iter=0x7fffffffa750, al=0x7fffffffa710, max_stack_depth=<optimized out>, arg=0x7fffffffbbd0) at util/hist.c:1232
11 0x0000000010056d98 in process_sample_event (tool=0x7fffffffbbd0, event=0x7ffff6223c38, sample=0x7fffffffa908, evsel=<optimized out>, machine=0x10524ef8) at builtin-report.c:332
Here arch_skip_callchain_idx calls thread__find_symbol and which
invokes thread__find_map with uninitialised "addr_location".
Snippet:
thread__find_symbol(thread, PERF_RECORD_MISC_USER, ip, &al);
Recent change with commit 0dd5041c9a0eaf8c ("perf addr_location: Add
init/exit/copy functions") , introduced "maps__zput" in the function
thread__find_map. This could result in segfault while accessing
uninitialised map from "struct addr_location". Fix this by adding
addr_location__init and addr_location__exit in arch_skip_callchain_idx.
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230724165815.17810-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit f8ad6018ce3c065a ("perf pmu: Remove duplication around
EVENT_SOURCE_DEVICE_PATH") uses sysfs__read_ull() to read a full sysfs
path, which will never succeeds as it already comes with the sysfs mount
point in it, which sysfs__read_ull() will add again.
Fix it by reading the file using filename__read_ull(), that will not add
the sysfs mount point.
Fixes: f8ad6018ce3c065a ("perf pmu: Remove duplication around EVENT_SOURCE_DEVICE_PATH")
Signed-off-by: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Tested-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/ZL4G7rWXkfv-Ectq@B-Q60VQ05P-2326.local
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
kernel sources
To pick the changes in these csets:
cf264e1329fb0307 ("cachestat: implement cachestat syscall")
That add support for this new syscall in tools such as 'perf trace'.
For instance, this is now possible:
# perf trace -e cachestat
^C[root@five ~]#
# perf trace -v -e cachestat
Using CPUID AuthenticAMD-25-21-0
event qualifier tracepoint filter: (common_pid != 3163687 && common_pid != 3147) && (id == 451)
mmap size 528384B
^C[root@five ~]
# perf trace -v -e *stat* --max-events=10
Using CPUID AuthenticAMD-25-21-0
event qualifier tracepoint filter: (common_pid != 3163713 && common_pid != 3147) && (id == 4 || id == 5 || id == 6 || id == 136 || id == 137 || id == 138 || id == 262 || id == 332 || id == 451)
mmap size 528384B
0.000 ( 0.009 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b60) = 0
0.012 ( 0.003 ms): Cache2 I/O/4544 newfstatat(dfd: CWD, filename: 0x45635288, statbuf: 0x7f874569d250) = 0
0.036 ( 0.002 ms): Cache2 I/O/4544 newfstatat(dfd: 138, filename: 0x541b7093, statbuf: 0x7f87457256f0, flag: 4096) = 0
0.372 ( 0.006 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b10) = 0
0.379 ( 0.003 ms): Cache2 I/O/4544 newfstatat(dfd: CWD, filename: 0x45635288, statbuf: 0x7f874569d250) = 0
0.390 ( 0.002 ms): Cache2 I/O/4544 newfstatat(dfd: 138, filename: 0x541b7093, statbuf: 0x7f87457256a0, flag: 4096) = 0
0.609 ( 0.005 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b60) = 0
0.615 ( 0.003 ms): Cache2 I/O/4544 newfstatat(dfd: CWD, filename: 0x45635288, statbuf: 0x7f874569d250) = 0
0.625 ( 0.002 ms): Cache2 I/O/4544 newfstatat(dfd: 138, filename: 0x541b7093, statbuf: 0x7f87457256f0, flag: 4096) = 0
0.826 ( 0.005 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b10) = 0
#
That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.
$ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w sys_cachestat
tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:451 n64 cachestat sys_cachestat
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:451 common cachestat sys_cachestat
tools/perf/arch/s390/entry/syscalls/syscall.tbl:451 common cachestat sys_cachestat sys_cachestat
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:451 common cachestat sys_cachestat
$
$ grep -w cachestat /tmp/build/perf-tools/arch/x86/include/generated/asm/syscalls_64.c
[451] = "cachestat",
$
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Link: https://lore.kernel.org/lkml/ZK1pVBJpbjujJNJW@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
|
|
AMD IBS can do per-process profiling[1] and is no longer restricted to
per-cpu or systemwide only. Remove stale error message. Also, checking
just exclude_kernel is not sufficient since IBS does not support any
privilege filters. So include all exclude_* checks. And finally, move
these checks under tools/perf/arch/x86/ from generic code.
Before:
$ sudo ./perf record -e ibs_op//k -C 0
Error:
AMD IBS may only be available in system-wide/per-cpu mode. Try
using -a, or -C and workload affinity
After:
$ sudo ./perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
[1] https://git.kernel.org/torvalds/c/30093056f7b2
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: ananth.narayan@amd.com
Cc: sandipan.das@amd.com
Cc: santosh.shukla@amd.com
Cc: irogers@google.com
Cc: peterz@infradead.org
Cc: adrian.hunter@intel.com
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: https://lore.kernel.org/r/20230630085230.437-1-ravi.bangoria@amd.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull perf tools updates from Namhyung Kim:
"Internal cleanup:
- Refactor PMU data management to handle hybrid systems in a generic
way.
Do more work in the lexer so that legacy event types parse more
easily. A side-effect of this is that if a PMU is specified,
scanning sysfs is avoided improving start-up time.
- Fix hybrid metrics, for example, the TopdownL1 works for both
performance and efficiency cores on Intel machines. To support
this, sort and regroup events after parsing.
- Add reference count checking for the 'thread' data structure.
- Lots of fixes for memory leaks in various places thanks to the ASAN
and Ian's refcount checker.
- Reduce the binary size by replacing static variables with local or
dynamically allocated memory.
- Introduce shared_mutex for annotate data to reduce memory
footprint.
- Make filesystem access library functions more thread safe.
Test:
- Organize cpu_map tests into a single suite.
- Add metric value validation test to check if the values are within
correct value ranges.
- Add perf stat stdio output test to check if event and metric names
match.
- Add perf data converter JSON output test.
- Fix a lot of issues reported by shellcheck(1). This is a
preparation to enable shellcheck by default.
- Make the large x86 new instructions test optional at build time
using EXTRA_TESTS=1.
- Add a test for libpfm4 events.
perf script:
- Add 'dsoff' outpuf field to display offset from the DSO.
$ perf script -F comm,pid,event,ip,dsoff
ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms])
ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms])
ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms])
ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms])
- Adjust width for large PID/TID values.
perf report:
- Robustify reading addr2line output for srcline by checking sentinel
output before the actual data and by using timeout of 1 second.
- Allow config terms (like 'name=ABC') with breakpoint events.
$ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
perf annotate:
- Handle x86 instruction suffix like 'l' in 'movl' generally.
- Parse instruction operands properly even with a whitespace. This is
needed for llvm-objdump output.
- Support RISC-V binutils lookup using the triplet prefixes.
- Add '<' and '>' key to navigate to prev/next symbols in TUI.
- Fix instruction association and parsing for LoongArch.
perf stat:
- Add --per-cache aggregation option, optionally specify a cache
level like `--per-cache=L2`.
$ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
taskset -c 0-15,64-79,128-143,192-207\
perf bench sched messaging -p -t -l 100000 -g 8
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 8 groups == 320 threads run
Total time: 7.648 [sec]
Performance counter stats for 'system wide':
S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote
7.779171484 seconds time elapsed
- Change default (no event/metric) formatting for default metrics so
that events are hidden and the metric and group appear.
Performance counter stats for 'ls /':
1.85 msec task-clock # 0.594 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
97 page-faults # 52.517 K/sec
2,187,173 cycles # 1.184 GHz
2,474,459 instructions # 1.13 insn per cycle
531,584 branches # 287.805 M/sec
13,626 branch-misses # 2.56% of all branches
TopdownL1 # 23.5 % tma_backend_bound
# 11.5 % tma_bad_speculation
# 39.1 % tma_frontend_bound
# 25.9 % tma_retiring
- Allow --cputype option to have any PMU name (not just hybrid).
- Fix output value not to added when it runs multiple times with -r
option.
perf list:
- Show metricgroup description from JSON file called
metricgroups.json.
- Allow 'pfm' argument to list only libpfm4 events and check each
event is supported before showing it.
JSON vendor events:
- Avoid event grouping using "NO_GROUP_EVENTS" constraints. The
topdown events are correctly grouped even if no group exists.
- Add "Default" metric group to print it in the default output. And
use "DefaultMetricgroupName" to indicate the real metric group
name.
- Add AmpereOne core PMU events.
Misc:
- Define man page date correctly.
- Track exception level properly on ARM CoreSight ETM.
- Allow anonymous struct, union or enum when retrieving type names
from DWARF.
- Fix incorrect filename when calling `perf inject --jit`.
- Handle PLT size correctly on LoongArch"
* tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits)
perf test: Skip metrics w/o event name in stat STD output linter
perf test: Reorder event name checks in stat STD output linter
perf pmu: Remove a hard coded cpu PMU assumption
perf pmus: Add notion of default PMU for JSON events
perf unwind: Fix map reference counts
perf test: Set PERF_EXEC_PATH for script execution
perf script: Initialize buffer for regs_map()
perf tests: Fix test_arm_callgraph_fp variable expansion
perf symbol: Add LoongArch case in get_plt_sizes()
perf test: Remove x permission from lib/stat_output.sh
perf test: Rerun failed metrics with longer workload
perf test: Add skip list for metrics known would fail
perf test: Add metric value validation test
perf jit: Fix incorrect file name in DWARF line table
perf annotate: Fix instruction association and parsing for LoongArch
perf annotation: Switch lock from a mutex to a sharded_mutex
perf sharded_mutex: Introduce sharded_mutex
tools: Fix incorrect calculation of object size by sizeof
perf subcmd: Fix missing check for return value of malloc() in add_cmdname()
perf parse-events: Remove unneeded semicolon
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Rework & fix the event forwarding logic by extending the core
interface.
This fixes AMD PMU events that have to be forwarded from the
core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Re-instate the linear PMU search
perf/x86/intel: Define bit macros for FixCntrCtl MSR
perf test: Add selftest to test IBS invocation via core pmu events
perf/core: Remove pmu linear searching code
perf/ibs: Fix interface via core pmu events
perf/core: Rework forwarding of {task|cpu}-clock events
|
|
In the perf annotate view for LoongArch, there is no arrowed line
pointing to the target from the branch instruction. This issue is
caused by incorrect instruction association and parsing.
$ perf record alloc-6276705c94ad1398 # rust benchmark
$ perf report
0.28 │ ori $a1, $zero, 0x63
│ move $a2, $zero
10.55 │ addi.d $a3, $a2, 1(0x1)
│ sltu $a4, $a3, $s7
9.53 │ masknez $a4, $s7, $a4
│ sub.d $a3, $a3, $a4
12.12 │ st.d $a1, $fp, 24(0x18)
│ st.d $a3, $fp, 16(0x10)
16.29 │ slli.d $a2, $a2, 0x2
│ ldx.w $a2, $s8, $a2
12.77 │ st.w $a2, $sp, 724(0x2d4)
│ st.w $s0, $sp, 720(0x2d0)
7.03 │ addi.d $a2, $sp, 720(0x2d0)
│ addi.d $a1, $a1, -1(0xfff)
12.03 │ move $a2, $a3
│ → bne $a1, $s3, -52(0x3ffcc) # 82ce8 <test::bench::Bencher::iter+0x3f4>
2.50 │ addi.d $a0, $a0, 1(0x1)
This patch fixes instruction association issues, such as associating
branch instructions with jump_ops instead of call_ops, and corrects
false instruction matches. It also implements branch instruction parsing
specifically for LoongArch. With this patch, we will be able to see the
arrowed line.
0.79 │3ec: ori $a1, $zero, 0x63
│ move $a2, $zero
10.32 │3f4:┌─→addi.d $a3, $a2, 1(0x1)
│ │ sltu $a4, $a3, $s7
10.44 │ │ masknez $a4, $s7, $a4
│ │ sub.d $a3, $a3, $a4
14.17 │ │ st.d $a1, $fp, 24(0x18)
│ │ st.d $a3, $fp, 16(0x10)
13.15 │ │ slli.d $a2, $a2, 0x2
│ │ ldx.w $a2, $s8, $a2
11.00 │ │ st.w $a2, $sp, 724(0x2d4)
│ │ st.w $s0, $sp, 720(0x2d0)
8.00 │ │ addi.d $a2, $sp, 720(0x2d0)
│ │ addi.d $a1, $a1, -1(0xfff)
11.99 │ │ move $a2, $a3
│ └──bne $a1, $s3, 3f4
3.17 │ addi.d $a0, $a0, 1(0x1)
Signed-off-by: WANG Rui <wangrui@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: loongarch@lists.linux.dev
Cc: loongson-kernel@lists.loongnix.cn
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cn
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf mem/c2c on AMD internally uses IBS OP PMU, not the core PMU. Also,
AMD platforms does not have heterogeneous PMUs.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615051700.1833-3-ravi.bangoria@amd.com
[ Added the improved comment for perf_pmus__num_mem_pmus() as b4 didn't from the per-patch (not series) newer version ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Found by leak sanitizer:
```
==1632594==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 21 byte(s) in 1 object(s) allocated from:
#0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
#1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369
#2 0x556701d70589 in perf_env__cpuid util/env.c:465
#3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14
#4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83
#5 0x556701d8f78b in evsel__config util/evsel.c:1366
#6 0x556701ef5872 in evlist__config util/record.c:108
#7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112
#8 0x556701cacd07 in run_test tests/builtin-test.c:236
#9 0x556701cacfac in test_and_print tests/builtin-test.c:265
#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402
#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559
#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323
#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377
#14 0x556701d3be90 in run_argv tools/perf/perf.c:421
#15 0x556701d3c3f8 in main tools/perf/perf.c:537
#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).
```
Fixes: f7b58cbdb3ff36eb ("perf mem/c2c: Add load store event mappings for AMD")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230613235416.1650755-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There are multiple places where x86 specific code determines AMD vs
Intel arch and acts based on that. Consolidate those checks into a
single function.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230613095506.547-3-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The "x86 instruction decoder - new instructions" test takes up space but
is only really useful to developers. Make it optional at build time.
Add variable EXTRA_TESTS which must be defined in order to build perf
with the test.
Example:
Before:
$ make -C tools/perf clean >/dev/null
$ make -C tools/perf >/dev/null
Makefile.config:650: No libunwind found. Please install libunwind-dev[el] >= 1.1 and/or set LIBUNWIND_DIR
Makefile.config:1149: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
PERF_VERSION = 6.4.rc3.gd15b8c76c964
$ readelf -SW tools/perf/perf | grep '\.rela.dyn\|.rodata\|\.data.rel.ro'
[10] .rela.dyn RELA 000000000002fcb0 02fcb0 0748b0 18 A 6 0 8
[18] .rodata PROGBITS 00000000002eb000 2eb000 6bac00 00 A 0 0 32
[25] .data.rel.ro PROGBITS 00000000009ea180 9e9180 04b540 00 WA 0 0 32
After:
$ make -C tools/perf clean >/dev/null
$ make -C tools/perf >/dev/null
Makefile.config:650: No libunwind found. Please install libunwind-dev[el] >= 1.1 and/or set LIBUNWIND_DIR
Makefile.config:1154: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
PERF_VERSION = 6.4.rc3.g4ea9c1569ea4
$ readelf -SW tools/perf/perf | grep '\.rela.dyn\|.rodata\|\.data.rel.ro'
[10] .rela.dyn RELA 000000000002f3c8 02f3c8 036d68 18 A 6 0 8
[18] .rodata PROGBITS 00000000002ac000 2ac000 68da80 00 A 0 0 32
[25] .data.rel.ro PROGBITS 000000000097d440 97c440 022280 00 WA 0 0 32
Committer notes:
Build with 'make EXTRA_TESTS=1 -C tools/perf O=/tmp/build/perf" and
reproduced the ELF section size differences.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/683fea7c-f5e9-fa20-f96b-f6233ed5d2a7@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Using accessors will make it easier to add reference count checking in
later patches.
Committer notes:
thread->nsinfo wasn't wrapped as it is used together with
nsinfo__zput(), where does a trick to set the field with a refcount
being dropped to NULL, and that doesn't work well with using
thread__nsinfo(thread), that loses the &thread->nsinfo pointer.
When refcount checking is added to 'struct thread', later in this
series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to
check the thread pointer.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brian Robbins <brianrob@linux.microsoft.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Ye Xingchen <ye.xingchen@zte.com.cn>
Cc: Yuan Can <yuancan@huawei.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230608232823.4027869-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now the suffix is handled in the general code. Let's get rid of them.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230524205054.3087004-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In order to print the numerical entries of the syscall table,
there is no need to call the host compiler to build and then
run a program, this can be done directly by the shell script.
This is similar with commit 9854e7ad35fe ("perf arm64: Simplify
mksyscalltbl"). For now, the mksyscalltbl file of LoongArch is
almost same with arm64.
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongarch@lists.linux.dev
Link: https://lore.kernel.org/r/1685441401-8709-6-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like x86, powerpc, mips and s390, use max_nr which is a digital
number to define SYSCALLTBL_ARM64_MAX_ID.
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongarch@lists.linux.dev
Link: https://lore.kernel.org/r/1685441401-8709-5-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
After commit 9854e7ad35fe ("perf arm64: Simplify mksyscalltbl"),
in the generated syscall table file syscalls.c, there exist some
__NR3264_ prefixed syscall numbers such as [__NR3264_ftruncate],
it looks like not so good, just do some small filter operations
to handle __NR3264_ prefixed syscall number as a digital number.
Without this patch:
[__NR3264_ftruncate] = "ftruncate",
With this patch:
[46] = "ftruncate",
Suggested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongarch@lists.linux.dev
Link: https://lore.kernel.org/r/1685441401-8709-4-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
After commit 9854e7ad35fecf30 ("perf arm64: Simplify mksyscalltbl") it
has been removed the temporary C program and used shell to generate
syscall table, so let us rename create_table_from_c() to
create_sc_table() to avoid confusion.
Suggested-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongarch@lists.linux.dev
Link: https://lore.kernel.org/r/1685441401-8709-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
syscalltbl_*[] should never be changing, let us declare it as const.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongarch@lists.linux.dev
Link: https://lore.kernel.org/r/1685441401-8709-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf tools fixes for v6.4: 2nd batch
- Fix BPF CO-RE naming convention for checking the availability of fields on
'union perf_mem_data_src' on the running kernel.
- Remove the use of llvm-strip on BPF skel object files, not needed, fixes a
build breakage when the llvm package, that contains it in most distros, isn't
installed.
- Fix tools that use both evsel->{bpf_counter_list,bpf_filters}, removing them from a
union.
- Remove extra "--" from the 'perf ftrace latency' --use-nsec option,
previously it was working only when using the '-n' alternative.
- Don't stop building when both binutils-devel and a C++ compiler isn't
available to compile the alternative C++ demangle support code, disable that
feature instead.
- Sync the linux/in.h and coresight-pmu.h header copies with the kernel sources.
- Fix relative include path to cs-etm.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_pmus__have_event()
Missed function rename from pmu_have_event to perf_pmus__have_event made
the perf build fail on powerpc.
Committer notes:
The perf_pmus__have_event() is declared in util/pmus.h, so use it
instead of by now needless util/pmu.h.
Fixes: 1eaf496ed386934f ("perf pmu: Separate pmu and pmus")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230530021433.3107580-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This allows the movement of 5,808 bytes from .data to .rodata.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230526183401.2326121-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This allows the movement of some sizeable data arrays (168,624 bytes) to
.data.relro. Without PIE or the strings it could be moved to .rodata.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230526183401.2326121-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_pmus__has_hybrid was used to detect when there was >1 core PMU,
this can be achieved with perf_pmus__num_core_pmus that doesn't depend
upon is_pmu_hybrid and PMU name comparisons. When modifying the
function calls take the opportunity to improve comments,
enable/simplify tests that were previously failing for hybrid but now
pass and to simplify generic code.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-34-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Scanning all PMUs is expensive as all PMUs sysfs entries are loaded,
benchmarking shows more than 4x the cost:
```
$ perf bench internals pmu-scan -i 1000
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 989.231 usec (+- 1.535 usec)
Average PMU scanning took: 4309.425 usec (+- 74.322 usec)
```
Add new perf_pmus__scan_core routine that scans just core
PMUs. Replace perf_pmus__scan calls with perf_pmus__scan_core when
non-core PMUs are being ignored.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-30-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Separate and hide the pmus list in pmus.[ch]. Move pmus functionality
out of pmu.[ch] into pmus.[ch] renaming pmus functions which were
prefixed perf_pmu__ to perf_pmus__.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-28-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Find the PMU and then the event off of it.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-27-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Rather than iterating over a separate hybrid list, iterate all PMUs
with the hybrid ones having is_core as true.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-18-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
__evlist__add_default adds a cycles event to a typically empty evlist
and was extended for hybrid with evlist__add_default_hybrid, as more
than 1 PMU was necessary. Rather than have dedicated logic for the
cycles event, this change switches to parsing 'cycles:P' which will
handle wildcarding the PMUs appropriately for hybrid.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Change "../cs-etm.h" to just "../../../util/cs-etm.h" as ../cs-etm.h
doesn't exist.
Suggested-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230515165039.544045-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up fixes that were already merged upstream.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|