Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch parses `owner_lock_stat` into a RB tree, enabling ordered
reporting of owner lock statistics with stack traces. It also updates
the documentation for the `-o` option in contention mode, decouples `-o`
from `-t`, and issues a warning to inform users about the new behavior
of `-ov`.
Example output:
$ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex-spin -E3 perf bench sched pipe
...
contended total wait max wait avg wait type caller
171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57
0xffffffffac6318e7 pipe_read+0x57
0xffffffffac623862 vfs_read+0x332
0xffffffffac62434b ksys_read+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50
0xffffffffac631ee0 pipe_write+0x50
0xffffffffac6241db vfs_write+0x3bb
0xffffffffac6244ab ksys_write+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d
0xffffffffac691f0d do_epoll_wait+0x24d
0xffffffffac69249b do_epoll_pwait.part.0+0xb
0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
=== owner stack trace ===
3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348
0xffffffffac631bd8 pipe_read+0x348
0xffffffffac623862 vfs_read+0x332
0xffffffffac62434b ksys_read+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
...
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-5-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This implements per-callstack aggregation of lock owners in addition to
per-thread. The owner callstack is captured using `bpf_get_task_stack()`
at `contention_begin()` and it also adds a custom stackid function for the
owner stacks to be compared easily.
The owner info is kept in a hash map using lock addr as a key to handle
multiple waiters for the same lock. At `contention_end()`, it updates the
owner lock stat based on the info that was saved at `contention_begin()`.
If there are more waiters, it'd update the owner pid to itself as
`contention_end()` means it gets the lock now. But it also needs to check
the return value of the lock function in case task was killed by a signal
or something.
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-3-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a struct and few bpf maps in order to tracing owner stack.
`struct owner_tracing_data`: Contains owner's pid, stack id, timestamp for
when the owner acquires lock, and the count of lock waiters.
`stack_buf`: Percpu buffer for retrieving owner stacktrace.
`owner_stacks`: For tracing owner stacktrace to customized owner stack id.
`owner_data`: For tracing lock_address to `struct owner_tracing_data` in
bpf program.
`owner_stat`: For reporting owner stacktrace in usermode.
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-2-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Fewer than 32k logical CPUs are currently supported by perf. A cpumap
is indexed by an integer (see perf_cpu_map__cpu) yielding a perf_cpu
that wraps a 4-byte int for the logical CPU - the wrapping is done
deliberately to avoid confusing a logical CPU with an index into a
cpumap. Using a 4-byte int within the perf_cpu is larger than required
so this patch reduces it to the 2-byte int16_t. For a cpumap
containing 16 entries this will reduce the array size from 64 to 32
bytes. For very large servers with lots of logical CPUs the size
savings will be greater.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250210191231.156294-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
After pmu_add_cpu_aliases() is called, perf_pmu__num_events() returns an
incorrect value that double counts common events and doesn't match the
actual count of events in the alias list. This is because after
'cpu_aliases_added == true', the number of events returned is
'sysfs_aliases + cpu_json_aliases'. But when adding 'case
EVENT_SRC_SYSFS' events, 'sysfs_aliases' and 'cpu_json_aliases' are both
incremented together, failing to account that these ones overlap and
only add a single item to the list. Fix it by adding another counter for
overlapping events which doesn't influence 'cpu_json_aliases'.
There doesn't seem to be a current issue because it's used in perf list
before pmu_add_cpu_aliases() so the correct value is returned. Other
uses in tests may also miss it for other reasons like only looking at
uncore events. However it's marked as a fixes commit in case any new fix
with new uses of perf_pmu__num_events() is backported.
Fixes: d9c5f5f94c2d ("perf pmu: Count sys and cpuid JSON events separately")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-3-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf_pmus__destroy() treats all PMUs as allocated and free's them so we
can't have any static PMUs that are added to the PMU lists. Fix it by
allocating the tool PMU in the same way as the others. Current users of
the tool PMU already use find_pmu() and not perf_pmus__tool_pmu(), so
rename the function to add 'new' to avoid it being misused in the
future.
perf_pmus__fake_pmu() can remain as static as it's not added to the
PMU lists.
Fixes the following error:
$ perf bench internals pmu-scan
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
munmap_chunk(): invalid pointer
Aborted (core dumped)
Fixes: 240505b2d0ad ("perf tool_pmu: Factor tool events into their own PMU")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Perf probe on vfs_fstatat fails as below on a powerpc system
$ ./perf probe -nf --max-probes=512 -a 'vfs_fstatat $params'
Segmentation fault (core dumped)
This is observed while running perftool-testsuite_probe testcase.
While running with verbose, its observed that segfault happens
at:
synthesize_probe_trace_arg ()
synthesize_probe_trace_command ()
probe_file.add_event ()
apply_perf_probe_events ()
__cmd_probe ()
cmd_probe ()
run_builtin ()
handle_internal_command ()
main ()
Code in synthesize_probe_trace_arg() access a null value and results in
segfault. Data structure which is null:
struct probe_trace_arg arg->value
We are hitting a case where arg->value is null in probe point:
"vfs_fstatat $params". This is happening since 'commit e896474fe485
("getname_maybe_null() - the third variant of pathname copy-in")'
Before the commit, probe point for vfs_fstatat was getting added only
for one location:
Writing event: p:probe/vfs_fstatat _text+6345404 dfd=%gpr3:s32 filename=%gpr4:x64 stat=%gpr5:x64 flags=%gpr6:s32
With this change, vfs_fstatat code is inlined for other locations in the
code:
Probe point found: __do_sys_lstat64+48
Probe point found: __do_sys_stat64+48
Probe point found: __do_sys_newlstat+48
Probe point found: __do_sys_newstat+48
Probe point found: vfs_fstatat+0
When trying to find matching dwarf information entry (DIE)
from the debuginfo, the code incorrectly picks DIE which is
not referring to vfs_fstatat. Snippet from dwarf entry in vmlinux
debuginfo file.
The main abstract die is:
<1><4214883>: Abbrev Number: 147 (DW_TAG_subprogram)
<4214885> DW_AT_external : 1
<4214885> DW_AT_name : (indirect string, offset: 0x17b9f3): vfs_fstatat
With formal parameters:
<2><4214896>: Abbrev Number: 51 (DW_TAG_formal_parameter)
<4214897> DW_AT_name : dfd
<2><42148a3>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148a4> DW_AT_name : (indirect string, offset: 0x8fda9): filename
<2><42148b0>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148b1> DW_AT_name : (indirect string, offset: 0x16bd9c): stat
<2><42148bd>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148be> DW_AT_name : (indirect string, offset: 0x39832b): flags
While collecting variables/parameters for a probe point, the function
copy_variables_cb() also looks at dwarf debug entries based on the
instruction address. Snippet
if (dwarf_haspc(die_mem, vf->pf->addr))
return DIE_FIND_CB_CONTINUE;
else
return DIE_FIND_CB_SIBLING;
But incase of inlined function instance for vfs_fstatat, there are two
entries which has the instruction address entry point as same.
Instance 1: which is for vfs_fstatat and DW_AT_abstract_origin points to
0x4214883 (reference above for main abstract die)
<3><42131fa>: Abbrev Number: 59 (DW_TAG_inlined_subroutine)
<42131fb> DW_AT_abstract_origin: <0x4214883>
<42131ff> DW_AT_entry_pc : 0xc00000000062b1e0
Instance 2: which is not for vfs_fstatat but for getname
<5><4213270>: Abbrev Number: 39 (DW_TAG_inlined_subroutine)
<4213271> DW_AT_abstract_origin: <0x4215b6b>
<4213275> DW_AT_entry_pc : 0xc00000000062b1e0
But the copy_variables_cb() continues to add parameters from second
instance also based on the dwarf_haspc() check. This results in
formal parameters for getname also appended to params. But while
filling in the args->value for these parameters, since these args
are not part of dwarf with offset "42131fa". Hence value will be
null. This incorrect args results in segfault when value field is
accessed.
Save the dwarf dieoffset of the actual DW_TAG_subprogram as part of
"struct probe_finder". In copy_variables_cb(), include check to make
sure the DW_AT_abstract_origin points to the correct entry if the
dwarf_haspc() matches the instruction address.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250225123042.37263-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Especially while using several buckets, it isn't uncommon to have some
of them empty and reading the histogram may be a bit more complex:
# perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency 200
# DURATION | COUNT | GRAPH |
0 - 5 us | 14816 | ###################################### |
5 - 10 us | 1228 | ### |
10 - 15 us | 438 | # |
15 - 20 us | 106 | |
20 - 25 us | 21 | |
25 - 30 us | 11 | |
30 - 35 us | 1 | |
35 - 40 us | 2 | |
40 - 45 us | 4 | |
45 - 50 us | 0 | |
50 - 55 us | 1 | |
55 - 60 us | 0 | |
60 - 65 us | 1 | |
65 - 70 us | 1 | |
70 - 75 us | 1 | |
75 - 80 us | 2 | |
80 - 85 us | 0 | |
85 - 90 us | 1 | |
90 - 95 us | 0 | |
95 - 100 us | 1 | |
100 - 105 us | 0 | |
105 - 110 us | 0 | |
110 - 115 us | 0 | |
115 - 120 us | 0 | |
120 - 125 us | 1 | |
125 - 130 us | 0 | |
130 - 135 us | 0 | |
135 - 140 us | 1 | |
140 - 145 us | 0 | |
145 - 150 us | 0 | |
150 - 155 us | 0 | |
155 - 160 us | 0 | |
160 - 165 us | 0 | |
165 - 170 us | 0 | |
170 - 175 us | 0 | |
175 - 180 us | 0 | |
180 - 185 us | 0 | |
185 - 190 us | 0 | |
190 - 195 us | 0 | |
195 - 200 us | 0 | |
200 - ... us | 2 | |
Allow the optional flag --hide-empty to remove buckets with no element
and produce a more compact graph. This feature could be misleading since
there is no clear indication for missing buckets, for this reason it's
disabled by default.
# perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency --hide-empty 200
# DURATION | COUNT | GRAPH |
0 - 5 us | 14816 | ###################################### |
5 - 10 us | 1228 | ### |
10 - 15 us | 438 | # |
15 - 20 us | 106 | |
20 - 25 us | 21 | |
25 - 30 us | 11 | |
30 - 35 us | 1 | |
35 - 40 us | 2 | |
40 - 45 us | 4 | |
50 - 55 us | 1 | |
60 - 65 us | 1 | |
65 - 70 us | 1 | |
70 - 75 us | 1 | |
75 - 80 us | 2 | |
85 - 90 us | 1 | |
95 - 100 us | 1 | |
120 - 125 us | 1 | |
135 - 140 us | 1 | |
200 - ... us | 2 | |
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250207080446.77630-2-gmonaco@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The max-latency value can make the histogram smaller, but not larger, we
have a maximum of 22 buckets and specifying a max-latency that would
require more buckets has no effect.
Dynamically allocate the buckets and compute the bucket number from the
max latency as (max-min) / range + 2
If the maximum is not specified, we still set the bucket number to 22
and compute the maximum accordingly.
Fail if the maximum is smaller than min+range, this way we make sure we
always have 3 buckets: those below min, those above max and one in the
middle.
Since max-latency is not available in log2 mode, always use 22 buckets.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250207080446.77630-1-gmonaco@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Sometimes compiler generates code to use the stack pointer register
without frame pointer. As we know RSP is the stack register on x86,
let's treat it as same as fbreg. But the offset would be opposite
direction so update the debug message accordingly.
Reported-by: Blake Jones <blakejones@google.com>
Link: https://lore.kernel.org/r/20250126210242.1181225-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Currently, stats->nr_samples is incremented per entry in the branch stack
instead of per sample taken. As a result, statistics of samples taken
during perf record in --branch-filter or --branch-any mode does not
seem correct. Instead call hists__inc_nr_samples() for each sample taken
instead of for each entry in the branch stack.
Before:
$ ./perf record -e cycles:u -b -c 10000000000 ./tchain_edit
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.005 MB perf.data (2 samples) ]
$ perf report -D | tail -n 16
Aggregated stats:
TOTAL events: 16
COMM events: 2 (12.5%)
EXIT events: 1 ( 6.2%)
SAMPLE events: 2 (12.5%)
MMAP2 events: 2 (12.5%)
KSYMBOL events: 1 ( 6.2%)
FINISHED_ROUND events: 1 ( 6.2%)
ID_INDEX events: 1 ( 6.2%)
THREAD_MAP events: 1 ( 6.2%)
CPU_MAP events: 1 ( 6.2%)
EVENT_UPDATE events: 2 (12.5%)
TIME_CONV events: 1 ( 6.2%)
FINISHED_INIT events: 1 ( 6.2%)
cpu_core/cycles/u stats:
SAMPLE events: 64
After:
$ ./perf report -D | tail -n 16
Aggregated stats:
TOTAL events: 16
COMM events: 2 (12.5%)
EXIT events: 1 ( 6.2%)
SAMPLE events: 2 (12.5%)
MMAP2 events: 2 (12.5%)
KSYMBOL events: 1 ( 6.2%)
FINISHED_ROUND events: 1 ( 6.2%)
ID_INDEX events: 1 ( 6.2%)
THREAD_MAP events: 1 ( 6.2%)
CPU_MAP events: 1 ( 6.2%)
EVENT_UPDATE events: 2 (12.5%)
TIME_CONV events: 1 ( 6.2%)
FINISHED_INIT events: 1 ( 6.2%)
cpu_core/cycles/u stats:
SAMPLE events: 2
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250220045942.114965-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than copying the path and appending the directory entry in a
fresh path buffer, append to the path at the end of where it is for
the recursion level. This saves a PATH_MAX buffer per recursion level
and some unnecessary copying.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This avoids scanddir reading the directory into memory that's
allocated and instead allocates on the stack.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250222061015.303622-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Switch memory_node__read and build_mem_topology from opendir/readdir
to io_dir__readdir, with smaller stack allocations. Reduces peak
memory consumption of perf record by 10kb.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Compared to glibc's opendir/readdir this lowers the max RSS of perf
record by 1.8MB on a Debian machine.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Prior to commit 70c90e4a6b2f ("perf parse-events: Avoid scanning PMUs
before parsing") names (generally event names) excluded hyphen (minus)
symbols as the formation of legacy names with hyphens was handled in
the yacc code. That commit allowed hyphens supposedly making
name_minus unnecessary. However, changing name_minus to name has
issues in the term config tokens as then name ends up having priority
over numbers and name allows matching numbers since commit
5ceb57990bf4 ("perf parse: Allow tracepoint names to start with digits
"). It is also permissable for a name to match with a colon (':') in
it when its in a config term list. To address this rename name_minus
to term_name, make the pattern match name's except for the colon, add
number matching into the config term region with a higher priority
than name matching. This addresses an inconsistency and allows greater
matching for names inside of term lists, for example, they may start
with a number.
Rename name_tag to quoted_name and update comments and helper
functions to avoid str detecting quoted strings which was already done
by the lexer.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250109175401.161340-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When testing perf trace on NixOS, I noticed significant startup delays:
- `ls`: ~2ms
- `strace ls`: ~10ms
- `perf trace ls`: ~550ms
Profiling showed that 51% of the time is spent reading files,
26% in loading BPF programs, and 11% in `newfstatat`.
This patch optimizes module path exploration by avoiding `stat()` calls
unless necessary. For filesystems that do not implement `d_type`
(DT_UNKNOWN), it falls back to the old behavior.
See `readdir(3)` for details.
This reduces `perf trace ls` time to ~500ms.
A more thorough startup optimization based on command parameters would
be ideal, but that is a larger effort.
Signed-off-by: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In sysfs, the perf events are all located in
/sys/bus/event_source/devices/ but some places ended up hard-coding the
location to be at the root of /sys/devices/ which could be very risky as
you do not exactly know what type of device you are accessing in sysfs
at that location.
So fix this all up by properly pointing everything at the bus device
list instead of the root of the sysfs devices/ tree.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/2025021955-implant-excavator-179d@gregkh
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Reorder the struct fields by size to reduce paddings and reduce
struct simd_flags size from 8 to 1 byte.
This reduces struct hist_entry size by 8 bytes (592->584),
and leaves a single more usable 6 byte padding hole.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/7c1cb1c8f9901e945162701ba7269d0f9c70be89.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add record/report --latency flag that allows to capture and show
latency-centric profiles rather than the default CPU-consumption-centric
profiles. For latency profiles record captures context switch events,
and report shows Latency as the first column.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/e9640464bcbc47dde2cb557003f421052ebc9eec.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Latency output field is similar to overhead, but represents overhead for
latency rather than CPU consumption. It's re-scaled from overhead by dividing
weight by the current parallelism level at the time of the sample.
It effectively models profiling with 1 sample taken per unit of wall-clock
time rather than unit of CPU time.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/b6269518758c2166e6ffdc2f0e24cfdecc8ef9c1.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add parallelism filter that can be used to look at specific parallelism
levels only. The format is the same as cpu lists. For example:
Only single-threaded samples: --parallelism=1
Low parallelism only: --parallelism=1-4
High parallelism only: --parallelism=64-128
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/e61348985ff0a6a14b07c39e880edbd60a8f8635.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
We already have all u8 bits taken, adding one more filter leads to unpleasant
failure mode, where code compiles w/o warnings, but the last filters silently
don't work. Add a typedef and switch to u16.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/32b4ce1731126c88a2d9e191dc87e39ae4651cb7.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Show parallelism level in profiles if requested by user.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/7f7bb87cbaa51bf1fb008a0d68b687423ce4bad4.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add calculation of the current parallelism level (number of threads actively
running on CPUs). The parallelism level can be shown in reports on its own,
and to calculate latency overheads.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/0f8c1b8eb12619029e31b3d5c0346f4616a5aeda.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The struct dump_regs contains 512 bytes of cache_regs, meaning the two
values in perf_sample contribute 1088 bytes of its total 1384 bytes
size. Initializing this much memory has a cost reported by Tavian
Barnes <tavianator@tavianator.com> as about 2.5% when running `perf
script --itrace=i0`:
https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/
Adrian Hunter <adrian.hunter@intel.com> replied that the zero
initialization was necessary and couldn't simply be removed.
This patch aims to strike a middle ground of still zeroing the
perf_sample, but removing 79% of its size by make user_regs and
intr_regs optional pointers to zalloc-ed memory. To support the
allocation accessors are created for user_regs and intr_regs. To
support correct cleanup perf_sample__init and perf_sample__exit
functions are created and added throughout the code base.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
I found that it failed to load a binary using --symfs option. Say I
have a binary in /home/user/prog/xxx and a perf data file with it. If I
move them to a different machine and use --symfs, it tries to find the
binary in some locations under symfs using dso__read_binary_type_filename(),
but not the last one.
${symfs}/usr/lib/debug/home/user/prog/xxx.debug
${symfs}/usr/lib/debug/home/user/prog/xxx
${symfs}/home/user/prog/.debug/xxx
/home/user/prog/xxx
It should check ${symfs}/home/usr/prog/xxx. Let's fix it.
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250212221445.437481-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It was only used in perf trace and it switched to use hashmap instead.
Let's delete the code.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250205205443.1986408-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Some topdown related metrics may fail on hybrid machines.
$ perf stat -M tma_frontend_bound
Cannot resolve IDs for tma_frontend_bound:
cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 * cpu_atom@CPU_CLK_UNHALTED.CORE@)
In the find_tool_events(), the tool_pmu__event_to_str() is used to
compare the tool_events. It only checks the event name, no PMU or arch.
So the tool_events[TOOL_PMU__EVENT_SLOTS] is set to true, because the
p-core Topdown metrics has "slots" event.
The tool_events is shared. So when parsing the e-core metrics, the
"slots" is automatically added.
The "slots" event as a tool event should only be available on arm64. It
has a different meaning on X86. The tool_pmu__skip_event() intends
handle the case. Apply it for tool_pmu__event_to_str() as well.
There is a lack of sanity check in the expr__get_id(). Add the check.
Closes: https://lore.kernel.org/lkml/608077bc-4139-4a97-8dc4-7997177d95c4@linux.intel.com/
Fixes: 069057239a67 ("perf tool_pmu: Move expr literals to tool_pmu")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: thomas.falcon@intel.com
Link: https://lore.kernel.org/r/20250207152844.302167-1-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The last use of machine__fprintf_vmlinux_path() was removed in 2011 by
commit ab81f3fd350c ("perf top: Reuse the 'report' hist_entry/hists
classes")
mmap_cpu_mask__duplicate() was added in 2021 by
commit 6bd006c6eb7f ("perf mmap: Introduce mmap_cpu_mask__duplicate()")
but hasn't been used since.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Tested-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250204220545.456435-1-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To get the various fixes in the current master.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The existing logic would disable uniquification on an evlist or enable
it per evsel, this is unfortunate as uniquification is most needed
when events have the same name and so the whole evlist must be
considered. Change the initial disable uniquify on an evlist
processing to also set a needs_uniquify flag, for cases like the
matching event names. This must be done as an initial pass as
uniquification of an event name will change the behavior of the
check. Keep the per counter uniquification but now only uniquify event
names when the needs_uniquify flag is set.
Before this change a hwmon like temp1 wouldn't be uniquified and
afterwards it will (ie the PMU is added to the temp1 event's name).
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250201074320.746259-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Counter merging was added in commit 942c5593393d ("perf stat: Add
perf_stat_merge_counters()"), however, it merges events with the same
name on different PMUs regardless of whether the different PMUs are
actually of the same type (ie they differ only in the suffix on the
PMU). For hwmon events there may be a temp1 event on every PMU, but
the PMU names are all unique and don't differ just by a suffix. The
merging is over eager and will merge all the hwmon counters together
meaning an aggregated and very large temp1 value is shown. The same
would be true for say cache events and memory controller events where
the PMUs differ but the event names are the same.
Fix the problem by correctly saying two PMUs alias when they differ
only by suffix.
Note, there is an overlap with evsel's merged_stat with aggregation
and the evsel's metric_leader where aggregation happens for metrics.
Fixes: 942c5593393d ("perf stat: Add perf_stat_merge_counters()")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250201074320.746259-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Wildcard PMU naming will match a name like pmu_1 to a PMU name like
pmu_10 but not to a PMU name like pmu_2 as the suffix forms part of
the match. No suffix matching will match pmu_10 to either pmu_1 or
pmu_2. Add or rename matching functions on PMU to make it clearer what
kind of matching is being performed.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250201074320.746259-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than scanning core or all PMUs, allow pmu_read_sysfs to read
some combination of core, other, hwmon and tool PMUs. The PMUs that
should be read and are already read are held as bitmaps. It is known
that a "hwmon_" prefix is necessary for a hwmon PMU's name, similarly
with "tool", so only scan those PMUs in situations the PMU name or the
PMU's type number make sense to.
The number of openat system calls reduces from 276 to 98 for a hwmon
event. The number of openats for regular perf events isn't changed.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250201074320.746259-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
evsel__is_hybrid returns true if there are multiple core PMUs and the
evsel is for a core PMU. Determining the number of core PMUs can
require loading/scanning PMUs. There's no point doing the scanning if
evsel for the is_hybrid test isn't core so reorder the tests to reduce
PMU scanning.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250201074320.746259-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf test 11 hwmon fails on s390 with this error
# ./perf test -Fv 11
--- start ---
---- end ----
11.1: Basic parsing test : Ok
--- start ---
Testing 'temp_test_hwmon_event1'
Using CPUID IBM,3931,704,A01,3.7,002f
temp_test_hwmon_event1 -> hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/
FAILED tests/hwmon_pmu.c:189 Unexpected config for
'temp_test_hwmon_event1', 292470092988416 != 655361
---- end ----
11.2: Parsing without PMU name : FAILED!
--- start ---
Testing 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/'
FAILED tests/hwmon_pmu.c:189 Unexpected config for
'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/',
292470092988416 != 655361
---- end ----
11.3: Parsing with PMU name : FAILED!
#
The root cause is in member test_event::config which is initialized
to 0xA0001 or 655361. During event parsing a long list event parsing
functions are called and end up with this gdb call stack:
#0 hwmon_pmu__config_term (hwm=0x168dfd0, attr=0x3ffffff5ee8,
term=0x168db60, err=0x3ffffff81c8) at util/hwmon_pmu.c:623
#1 hwmon_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8,
terms=0x3ffffff5ea8, err=0x3ffffff81c8) at util/hwmon_pmu.c:662
#2 0x00000000012f870c in perf_pmu__config_terms (pmu=0x168dfd0,
attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, zero=false,
apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1519
#3 0x00000000012f88a4 in perf_pmu__config (pmu=0x168dfd0, attr=0x3ffffff5ee8,
head_terms=0x3ffffff5ea8, apply_hardcoded=false, err=0x3ffffff81c8)
at util/pmu.c:1545
#4 0x00000000012680c4 in parse_events_add_pmu (parse_state=0x3ffffff7fb8,
list=0x168dc00, pmu=0x168dfd0, const_parsed_terms=0x3ffffff6090,
auto_merge_stats=true, alternate_hw_config=10)
at util/parse-events.c:1508
#5 0x00000000012684c6 in parse_events_multi_pmu_add (parse_state=0x3ffffff7fb8,
event_name=0x168ec10 "temp_test_hwmon_event1", hw_config=10,
const_parsed_terms=0x0, listp=0x3ffffff6230, loc_=0x3ffffff70e0)
at util/parse-events.c:1592
#6 0x00000000012f0e4e in parse_events_parse (_parse_state=0x3ffffff7fb8,
scanner=0x16878c0) at util/parse-events.y:293
#7 0x00000000012695a0 in parse_events__scanner (str=0x3ffffff81d8
"temp_test_hwmon_event1", input=0x0, parse_state=0x3ffffff7fb8)
at util/parse-events.c:1867
#8 0x000000000126a1e8 in __parse_events (evlist=0x168b580,
str=0x3ffffff81d8 "temp_test_hwmon_event1", pmu_filter=0x0,
err=0x3ffffff81c8, fake_pmu=false, warn_if_reordered=true,
fake_tp=false) at util/parse-events.c:2136
#9 0x00000000011e36aa in parse_events (evlist=0x168b580,
str=0x3ffffff81d8 "temp_test_hwmon_event1", err=0x3ffffff81c8)
at /root/linux/tools/perf/util/parse-events.h:41
#10 0x00000000011e3e64 in do_test (i=0, with_pmu=false, with_alias=false)
at tests/hwmon_pmu.c:164
#11 0x00000000011e422c in test__hwmon_pmu (with_pmu=false)
at tests/hwmon_pmu.c:219
#12 0x00000000011e431c in test__hwmon_pmu_without_pmu (test=0x1610368
<suite.hwmon_pmu>, subtest=1) at tests/hwmon_pmu.c:23
where the attr::config is set to value 292470092988416 or 0x10a0000000000
in line 625 of file ./util/hwmon_pmu.c:
attr->config = key.type_and_num;
However member key::type_and_num is defined as union and bit field:
union hwmon_pmu_event_key {
long type_and_num;
struct {
int num :16;
enum hwmon_type type :8;
};
};
s390 is big endian and Intel is little endian architecture.
The events for the hwmon dummy pmu have num = 1 or num = 2 and
type is set to HWMON_TYPE_TEMP (which is 10).
On s390 this assignes member key::type_and_num the value of
0x10a0000000000 (which is 292470092988416) as shown in above
trace output.
Fix this and export the structure/union hwmon_pmu_event_key
so the test shares the same implementation as the event parsing
functions for union and bit fields. This should avoid
endianess issues on all platforms.
Output after:
# ./perf test -F 11
11.1: Basic parsing test : Ok
11.2: Parsing without PMU name : Ok
11.3: Parsing with PMU name : Ok
#
Fixes: 531ee0fd4836 ("perf test: Add hwmon "PMU" test")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250131112400.568975-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This is also used in util/comm.c now, so instead of selectively doing
the feature test, always do it. If it's ever used anywhere else it's
less likely to cause another build failure.
This doesn't remove the need to manually include libc_compat.h, and
missing that will still cause an error for glibc < 2.26. There isn't a
way to fix that without poisoning reallocarray like libbpf did, but that
has other downsides like making memory debugging tools less useful. So
for Perf keep it like this and we'll have to fix up any missed includes.
Fixes the following build error:
util/comm.c:152:31: error: implicit declaration of function
'reallocarray' [-Wimplicit-function-declaration]
152 | tmp = reallocarray(comm_strs->strs,
| ^~~~~~~~~~~~
Fixes: 13ca628716c6 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Ali Utku Selen <ali.utku.selen@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250129154405.777533-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Legacy events typically don't have a PMU when added leading to
mismatched legacy/non-legacy cases in find_stat. Use evsel__find_pmu
to make sure the evsel PMU is looked up. Update the evsel__find_pmu
code to look for the PMU using the extended config type or, for legacy
hardware/hw_cache events on non-hybrid systems, just use the core PMU.
Before:
```
$ perf stat -e cycles,cpu/instructions/ -a sleep 1
Performance counter stats for 'system wide':
215,309,764 cycles
44,326,491 cpu/instructions/
1.002555314 seconds time elapsed
```
After:
```
$ perf stat -e cycles,cpu/instructions/ -a sleep 1
Performance counter stats for 'system wide':
990,676,332 cycles
1,235,762,487 cpu/instructions/ # 1.25 insn per cycle
1.002667198 seconds time elapsed
```
Fixes: 3612ca8e2935 ("perf stat: Fix the hard-coded metrics calculation on the hybrid")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250109222109.567031-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add helper to get the name of the evsel's PMU. This handles the case
where there's no sysfs PMU via parse_events event_type helper.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250109222109.567031-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now that filename__read_int() returns -errno instead of -1 these
statements need to be updated otherwise error values will be used as
die IDs.
This appears as a -2 die ID when the platform doesn't export one:
$ perf stat --per-core -a -- true
S36-D-2-C0 1 9.45 msec cpu-clock
And the session topology test fails:
$ perf test -vvv topology
CPU 0, core 0, socket 36
CPU 1, core 1, socket 36
CPU 2, core 2, socket 36
CPU 3, core 3, socket 36
FAILED tests/topology.c:137 Cpu map - Die ID doesn't match
---- end(-1) ----
38: Session topology : FAILED!
Fixes: 05be17eed774 ("tool api fs: Correctly encode errno for read/write open failures")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241218115552.912517-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Prior to this change a string was used which could cause issues with
an unrecognized disassembler in symbol__disassembler. Change to
initializing an array of perf_disassembler enum values. If a value
already exists then adding it a second time is ignored to avoid array
out of bounds problems present in the previous code, it also allows a
statically sized array and removes memory allocation needs. Errors in
the disassembler string are reported when the config is parsed during
perf annotate or perf top start up. If the array is uninitialized
after processing the config file the default llvm, capstone then
objdump values are added but without a need to parse a string.
Fixes: a6e8a58de629 ("perf disasm: Allow configuring what disassemblers to use")
Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250124043856.1177264-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
As reported by Namhyung Kim and acknowledged by Qiao Zhao (link:
https://lore.kernel.org/linux-perf-users/20241206001436.1947528-1-namhyung@kernel.org/),
on certain machines, perf trace failed to load the BPF program into the
kernel. The verifier runs perf trace's BPF program for up to 1 million
instructions, returning an E2BIG error, whereas the perf trace BPF
program should be much less complex than that. This patch aims to fix
the issue described above.
The E2BIG problem from clang-15 to clang-16 is cause by this line:
} else if (size < 0 && size >= -6) { /* buffer */
Specifically this check: size < 0. seems like clang generates a cool
optimization to this sign check that breaks things.
Making 'size' s64, and use
} else if ((int)size < 0 && size >= -6) { /* buffer */
Solves the problem. This is some Hogwarts magic.
And the unbounded access of clang-12 and clang-14 (clang-13 works this
time) is fixed by making variable 'aug_size' s64.
As for this:
-if (aug_size > TRACE_AUG_MAX_BUF)
- aug_size = TRACE_AUG_MAX_BUF;
+aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
This makes the BPF skel generated by clang-18 work. Yes, new clangs
introduce problems too.
Sorry, I only know that it works, but I don't know how it works. I'm not
an expert in the BPF verifier. I really hope this is not a kernel
version issue, as that would make the test case (kernel_nr) *
(clang_nr), a true horror story. I will test it on more kernel versions
in the future.
Fixes: 395d38419f18: ("perf trace augmented_raw_syscalls: Add more check s to pass the verifier")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241213023047.541218-1-howardchu95@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in annotate code. Internally the code just reads evsel->core.idx
so behavior is unchanged.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Chen Ni <nichen@iscas.ac.cn>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When a filtered column is not present in the sort order, profiles become
arbitrary broken. Filtered and non-filtered entries are collapsed
together, and the filtered-by field ends up with a random value (either
from a filtered or non-filtered entry). If we end up with filtered
entry/value, then the whole collapsed entry will be filtered out and will
be missing in the profile. If we end up with non-filtered entry/value,
then the overhead value will be wrongly larger (include some subset
of filtered out samples).
This leads to very confusing profiles. The problem is hard to notice,
and if noticed hard to understand. If the filter is for a single value,
then it can be fixed by adding the corresponding field to the sort order
(provided user understood the problem). But if the filter is for multiple
values, it's impossible to fix b/c there is no concept of binary sorting
based on filter predicate (we want to group all non-filtered values in
one bucket, and all filtered values in another).
Examples of affected commands:
perf report --tid=123
perf report --sort overhead,symbol --comm=foo,bar
Fix this by considering filtered status as the highest priority
sort/collapse predicate.
As a side effect this effectively adds a new feature of showing profile
where several lines are combined based on arbitrary filtering predicate.
For example, showing symbols from binaries foo and bar combined together,
but not from other binaries; or showing combined overhead of several
particular threads.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/359dc444ce94d20e59d3a9e360c36fbeac833a04.1736927981.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Application of cmp/sort/collapse fmt callbacks is duplicated 6 times.
Factor it into a common helper function. NFC.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/84c4b55614e24a344f86ae0db62e8fa8f251f874.1736927981.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To allow for setting a variable from some other tool, like with the
"wallclock" patchset needs to allow the user to opt-in to having
that key in the sort order for 'perf report'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/Z4akewi7UPXpagce@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|