diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-18 07:32:11 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-18 07:32:11 +0300 |
commit | 57d17378a4a042401b0c2fe211e5a0e3a276cb3d (patch) | |
tree | be65e51a7b11cccb903b44708b98b3af47477304 /tools/perf/util | |
parent | f0033681f0fe8421baf8db125e57fa6157824c2d (diff) | |
parent | 9bce13ea88f85344b765abe5d3dabdd0f44dc177 (diff) | |
download | linux-57d17378a4a042401b0c2fe211e5a0e3a276cb3d.tar.xz |
Merge tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"New features:
- Add 'trace' subcommand for 'perf ftrace', setting the stage for
more 'perf ftrace' subcommands. Not using a subcommand yields the
previous behaviour of 'perf ftrace'.
- Add 'latency' subcommand to 'perf ftrace', that can use the
function graph tracer or a BPF optimized one, via the -b/--use-bpf
option.
E.g.:
$ sudo perf ftrace latency -a -T mutex_lock sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 4596 | ######################## |
1 - 2 us | 1680 | ######### |
2 - 4 us | 1106 | ##### |
4 - 8 us | 546 | ## |
8 - 16 us | 562 | ### |
16 - 32 us | 1 | |
32 - 64 us | 0 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
The original implementation of this command was in the bcc tool.
- Support --cputype option for hybrid events in 'perf stat'.
Improvements:
- Call chain improvements for ARM64.
- No need to do any affinity setup when profiling pids.
- Reduce multiplexing with duration_time in 'perf stat' metrics.
- Improve error message for uncore events, stating that some event
groups are can only be used in system wide (-a) mode.
- perf stat metric group leader fixes/improvements, including arch
specific changes to better support Intel topdown events.
- Probe non-deprecated sysfs path first, i.e. try the path
/sys/devices/system/cpu/cpuN/topology/thread_siblings first, then
the old /sys/devices/system/cpu/cpuN/topology/core_cpus.
- Disable debuginfod by default in 'perf record', to avoid stalls on
distros such as Fedora 35.
- Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.
- Enable ignore_missing_thread in 'perf trace'
Fixes:
- Avoid TUI crash when navigating in the annotation of recursive
functions.
- Fix hex dump character output in 'perf script'.
- Fix JSON indentation to 4 spaces standard in the ARM vendor event
files.
- Fix use after free in metric__new().
- Fix IS_ERR_OR_NULL() usage in the perf BPF loader.
- Fix up cross-arch register support, i.e. when printing register
names take into account the architecture where the perf.data file
was collected.
- Fix SMT fallback with large core counts.
- Don't lower case MetricExpr when parsing JSON files so as not to
lose info such as the ":G" event modifier in metrics.
perf test:
- Add basic stress test for sigtrap handling to 'perf test'.
- Fix 'perf test' failures on s/390
- Enable system wide for metricgroups test in 'perf test´.
- Use 3 digits for test numbering now we can have more tests.
Arch specific:
- Add events for Arm Neoverse N2 in the ARM JSON vendor event files
- Support PERF_MEM_LVLNUM encodings in powerpc, that came from a
single patch series, where I incorrectly merged the kernel bits,
that were then reverted after coordination with Michael Ellerman
and Stephen Rothwell.
- Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.
- Update AMD documentation, with info on raw event encoding.
- Add support for global and local variants of the "p_stage_cyc" sort
key, applicable to perf.data files collected on powerpc.
- Remove duplicate and incorrect aux size checks in the ARM CoreSight
ETM code.
Refactorings:
- Add a perf_cpu abstraction to disambiguate CPUs and CPU map
indexes, fixing problems along the way.
- Document CPU map methods.
UAPI sync:
- Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench
mem memcpy'
- Sync UAPI files with the kernel sources: drm, msr-index,
cpufeatures.
Build system
- Enable warnings through HOSTCFLAGS.
- Drop requirement for libstdc++.so for libopencsd check
libperf:
- Make libperf adopt perf_counts_values__scale() from tools/perf/util/.
- Add a stat multiplexing test to libperf"
* tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits)
perf record: Disable debuginfod by default
perf evlist: No need to do any affinity setup when profiling pids
perf cpumap: Add is_dummy() method
perf metric: Fix metric_leader
perf cputopo: Fix CPU topology reading on s/390
perf metricgroup: Fix use after free in metric__new()
libperf tests: Update a use of the new cpumap API
perf arm: Fix off-by-one directory path
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Update tools's copy of drm.h header
tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
perf pmu-events: Don't lower case MetricExpr
perf expr: Add debug logging for literals
perf tools: Probe non-deprecated sysfs path 1st
perf tools: Fix SMT fallback with large core counts
perf cpumap: Give CPUs their own type
perf stat: Correct first_shadow_cpu to return index
perf script: Fix flipped index and cpu
perf c2c: Use more intention revealing iterator
...
Diffstat (limited to 'tools/perf/util')
64 files changed, 2317 insertions, 830 deletions
diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 2e5bfbb69960..2a403cefcaf2 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -1,3 +1,4 @@ +perf-y += arm64-frame-pointer-unwind-support.o perf-y += annotate.o perf-y += block-info.o perf-y += block-range.o @@ -144,6 +145,7 @@ perf-$(CONFIG_LIBBPF) += bpf-loader.o perf-$(CONFIG_LIBBPF) += bpf_map.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter_cgroup.o +perf-$(CONFIG_PERF_BPF_SKEL) += bpf_ftrace.o perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o perf-$(CONFIG_LIBELF) += symbol-elf.o perf-$(CONFIG_LIBELF) += probe-file.o diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c index 7b12bd7a3080..f1e30d566db3 100644 --- a/tools/perf/util/affinity.c +++ b/tools/perf/util/affinity.c @@ -11,7 +11,7 @@ static int get_cpu_set_size(void) { - int sz = cpu__max_cpu() + 8 - 1; + int sz = cpu__max_cpu().cpu + 8 - 1; /* * sched_getaffinity doesn't like masks smaller than the kernel. * Hopefully that's big enough. diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c index 3fc528c9270c..5e390a1a79ab 100644 --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c @@ -179,6 +179,8 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder) decoder->record.phys_addr = ip; break; case ARM_SPE_COUNTER: + if (idx == SPE_CNT_PKT_HDR_INDEX_TOTAL_LAT) + decoder->record.latency = payload; break; case ARM_SPE_CONTEXT: decoder->record.context_id = payload; diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h index 46a8556a9e95..69b31084d6be 100644 --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h @@ -33,6 +33,7 @@ struct arm_spe_record { enum arm_spe_sample_type type; int err; u32 op; + u32 latency; u64 from_ip; u64 to_ip; u64 timestamp; diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index fccac06b573a..d2b64e3f588b 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -58,6 +58,8 @@ struct arm_spe { u8 sample_branch; u8 sample_remote_access; u8 sample_memory; + u8 sample_instructions; + u64 instructions_sample_period; u64 l1d_miss_id; u64 l1d_access_id; @@ -68,6 +70,7 @@ struct arm_spe { u64 branch_miss_id; u64 remote_access_id; u64 memory_id; + u64 instructions_id; u64 kernel_start; @@ -90,6 +93,7 @@ struct arm_spe_queue { u64 time; u64 timestamp; struct thread *thread; + u64 period_instructions; }; static void arm_spe_dump(struct arm_spe *spe __maybe_unused, @@ -202,6 +206,7 @@ static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe, speq->pid = -1; speq->tid = -1; speq->cpu = -1; + speq->period_instructions = 0; /* params set */ params.get_trace = arm_spe_get_trace; @@ -330,6 +335,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq, sample.addr = record->virt_addr; sample.phys_addr = record->phys_addr; sample.data_src = data_src; + sample.weight = record->latency; return arm_spe_deliver_synth_event(spe, speq, event, &sample); } @@ -347,6 +353,36 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq, sample.id = spe_events_id; sample.stream_id = spe_events_id; sample.addr = record->to_ip; + sample.weight = record->latency; + + return arm_spe_deliver_synth_event(spe, speq, event, &sample); +} + +static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq, + u64 spe_events_id, u64 data_src) +{ + struct arm_spe *spe = speq->spe; + struct arm_spe_record *record = &speq->decoder->record; + union perf_event *event = speq->event_buf; + struct perf_sample sample = { .ip = 0, }; + + /* + * Handles perf instruction sampling period. + */ + speq->period_instructions++; + if (speq->period_instructions < spe->instructions_sample_period) + return 0; + speq->period_instructions = 0; + + arm_spe_prep_sample(spe, speq, event, &sample); + + sample.id = spe_events_id; + sample.stream_id = spe_events_id; + sample.addr = record->virt_addr; + sample.phys_addr = record->phys_addr; + sample.data_src = data_src; + sample.period = spe->instructions_sample_period; + sample.weight = record->latency; return arm_spe_deliver_synth_event(spe, speq, event, &sample); } @@ -480,6 +516,12 @@ static int arm_spe_sample(struct arm_spe_queue *speq) return err; } + if (spe->sample_instructions) { + err = arm_spe__synth_instruction_sample(speq, spe->instructions_id, data_src); + if (err) + return err; + } + return 0; } @@ -993,7 +1035,8 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session) attr.type = PERF_TYPE_HARDWARE; attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK; attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID | - PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC; + PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC | + PERF_SAMPLE_WEIGHT; if (spe->timeless_decoding) attr.sample_type &= ~(u64)PERF_SAMPLE_TIME; else @@ -1107,7 +1150,29 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session) return err; spe->memory_id = id; arm_spe_set_event_name(evlist, id, "memory"); + id += 1; + } + + if (spe->synth_opts.instructions) { + if (spe->synth_opts.period_type != PERF_ITRACE_PERIOD_INSTRUCTIONS) { + pr_warning("Only instruction-based sampling period is currently supported by Arm SPE.\n"); + goto synth_instructions_out; + } + if (spe->synth_opts.period > 1) + pr_warning("Arm SPE has a hardware-based sample period.\n" + "Additional instruction events will be discarded by --itrace\n"); + + spe->sample_instructions = true; + attr.config = PERF_COUNT_HW_INSTRUCTIONS; + attr.sample_period = spe->synth_opts.period; + spe->instructions_sample_period = attr.sample_period; + err = arm_spe_synth_event(session, &attr, id); + if (err) + return err; + spe->instructions_id = id; + arm_spe_set_event_name(evlist, id, "instructions"); } +synth_instructions_out: return 0; } diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c new file mode 100644 index 000000000000..2242a885fbd7 --- /dev/null +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c @@ -0,0 +1,63 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "arm64-frame-pointer-unwind-support.h" +#include "callchain.h" +#include "event.h" +#include "perf_regs.h" // SMPL_REG_MASK +#include "unwind.h" + +#define perf_event_arm_regs perf_event_arm64_regs +#include "../../arch/arm64/include/uapi/asm/perf_regs.h" +#undef perf_event_arm_regs + +struct entries { + u64 stack[2]; + size_t length; +}; + +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample) +{ + return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs + && sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_LR); +} + +static int add_entry(struct unwind_entry *entry, void *arg) +{ + struct entries *entries = arg; + + entries->stack[entries->length++] = entry->ip; + return 0; +} + +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx) +{ + int ret; + struct entries entries = {}; + struct regs_dump old_regs = sample->user_regs; + + if (!get_leaf_frame_caller_enabled(sample)) + return 0; + + /* + * If PC and SP are not recorded, get the value of PC from the stack + * and set its mask. SP is not used when doing the unwinding but it + * still needs to be set to prevent failures. + */ + + if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) { + sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC); + sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1]; + } + + if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) { + sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP); + sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0; + } + + ret = unwind__get_entries(add_entry, &entries, thread, sample, 2); + sample->user_regs = old_regs; + + if (ret || entries.length != 2) + return ret; + + return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1]; +} diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h new file mode 100644 index 000000000000..32af9ce94398 --- /dev/null +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H + +#include "event.h" +#include "thread.h" + +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx); + +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */ diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index c679394b898d..5632efc44738 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -123,7 +123,7 @@ int auxtrace_mmap__mmap(struct auxtrace_mmap *mm, mm->prev = 0; mm->idx = mp->idx; mm->tid = mp->tid; - mm->cpu = mp->cpu; + mm->cpu = mp->cpu.cpu; if (!mp->len) { mm->base = NULL; @@ -180,7 +180,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp, else mp->tid = -1; } else { - mp->cpu = -1; + mp->cpu.cpu = -1; mp->tid = perf_thread_map__pid(evlist->core.threads, idx); } } @@ -292,7 +292,7 @@ static int auxtrace_queues__queue_buffer(struct auxtrace_queues *queues, if (!queue->set) { queue->set = true; queue->tid = buffer->tid; - queue->cpu = buffer->cpu; + queue->cpu = buffer->cpu.cpu; } buffer->buffer_nr = queues->next_buffer_nr++; @@ -339,11 +339,11 @@ static int auxtrace_queues__split_buffer(struct auxtrace_queues *queues, return 0; } -static bool filter_cpu(struct perf_session *session, int cpu) +static bool filter_cpu(struct perf_session *session, struct perf_cpu cpu) { unsigned long *cpu_bitmap = session->itrace_synth_opts->cpu_bitmap; - return cpu_bitmap && cpu != -1 && !test_bit(cpu, cpu_bitmap); + return cpu_bitmap && cpu.cpu != -1 && !test_bit(cpu.cpu, cpu_bitmap); } static int auxtrace_queues__add_buffer(struct auxtrace_queues *queues, @@ -399,7 +399,7 @@ int auxtrace_queues__add_event(struct auxtrace_queues *queues, struct auxtrace_buffer buffer = { .pid = -1, .tid = event->auxtrace.tid, - .cpu = event->auxtrace.cpu, + .cpu = { event->auxtrace.cpu }, .data_offset = data_offset, .offset = event->auxtrace.offset, .reference = event->auxtrace.reference, diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index bbf0d78c6401..19910b9011f3 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -15,6 +15,7 @@ #include <linux/list.h> #include <linux/perf_event.h> #include <linux/types.h> +#include <internal/cpumap.h> #include <asm/bitsperlong.h> #include <asm/barrier.h> @@ -240,7 +241,7 @@ struct auxtrace_buffer { size_t size; pid_t pid; pid_t tid; - int cpu; + struct perf_cpu cpu; void *data; off_t data_offset; void *mmap_addr; @@ -350,7 +351,7 @@ struct auxtrace_mmap_params { int prot; int idx; pid_t tid; - int cpu; + struct perf_cpu cpu; }; /** diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c index 528aeb0ab79d..7ecfaac7536a 100644 --- a/tools/perf/util/bpf-loader.c +++ b/tools/perf/util/bpf-loader.c @@ -424,7 +424,7 @@ preproc_gen_prologue(struct bpf_program *prog, int n, size_t prologue_cnt = 0; int i, err; - if (IS_ERR(priv) || !priv || priv->is_tp) + if (IS_ERR_OR_NULL(priv) || priv->is_tp) goto errout; pev = &priv->pev; @@ -573,7 +573,7 @@ static int hook_load_preprocessor(struct bpf_program *prog) bool need_prologue = false; int err, i; - if (IS_ERR(priv) || !priv) { + if (IS_ERR_OR_NULL(priv)) { pr_debug("Internal error when hook preprocessor\n"); return -BPF_LOADER_ERRNO__INTERNAL; } @@ -645,8 +645,11 @@ int bpf__probe(struct bpf_object *obj) goto out; priv = bpf_program__priv(prog); - if (IS_ERR(priv) || !priv) { - err = PTR_ERR(priv); + if (IS_ERR_OR_NULL(priv)) { + if (!priv) + err = -BPF_LOADER_ERRNO__INTERNAL; + else + err = PTR_ERR(priv); goto out; } @@ -696,7 +699,7 @@ int bpf__unprobe(struct bpf_object *obj) struct bpf_prog_priv *priv = bpf_program__priv(prog); int i; - if (IS_ERR(priv) || !priv || priv->is_tp) + if (IS_ERR_OR_NULL(priv) || priv->is_tp) continue; for (i = 0; i < priv->pev.ntevs; i++) { @@ -754,7 +757,7 @@ int bpf__foreach_event(struct bpf_object *obj, struct perf_probe_event *pev; int i, fd; - if (IS_ERR(priv) || !priv) { + if (IS_ERR_OR_NULL(priv)) { pr_debug("bpf: failed to get private field\n"); return -BPF_LOADER_ERRNO__INTERNAL; } diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c index 5a97fd7d0a71..3ce8d03cb7ec 100644 --- a/tools/perf/util/bpf_counter.c +++ b/tools/perf/util/bpf_counter.c @@ -265,7 +265,7 @@ static int bpf_program_profiler__read(struct evsel *evsel) return 0; } -static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu, +static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu_map_idx, int fd) { struct bpf_prog_profiler_bpf *skel; @@ -277,7 +277,7 @@ static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu, assert(skel != NULL); ret = bpf_map_update_elem(bpf_map__fd(skel->maps.events), - &cpu, &fd, BPF_ANY); + &cpu_map_idx, &fd, BPF_ANY); if (ret) return ret; } @@ -554,7 +554,7 @@ static int bperf__load(struct evsel *evsel, struct target *target) filter_type == BPERF_FILTER_TGID) key = evsel->core.threads->map[i].pid; else if (filter_type == BPERF_FILTER_CPU) - key = evsel->core.cpus->map[i]; + key = evsel->core.cpus->map[i].cpu; else break; @@ -580,12 +580,12 @@ out: return err; } -static int bperf__install_pe(struct evsel *evsel, int cpu, int fd) +static int bperf__install_pe(struct evsel *evsel, int cpu_map_idx, int fd) { struct bperf_leader_bpf *skel = evsel->leader_skel; return bpf_map_update_elem(bpf_map__fd(skel->maps.events), - &cpu, &fd, BPF_ANY); + &cpu_map_idx, &fd, BPF_ANY); } /* @@ -598,7 +598,7 @@ static int bperf_sync_counters(struct evsel *evsel) num_cpu = all_cpu_map->nr; for (i = 0; i < num_cpu; i++) { - cpu = all_cpu_map->map[i]; + cpu = all_cpu_map->map[i].cpu; bperf_trigger_reading(evsel->bperf_leader_prog_fd, cpu); } return 0; @@ -619,15 +619,17 @@ static int bperf__disable(struct evsel *evsel) static int bperf__read(struct evsel *evsel) { struct bperf_follower_bpf *skel = evsel->follower_skel; - __u32 num_cpu_bpf = cpu__max_cpu(); + __u32 num_cpu_bpf = cpu__max_cpu().cpu; struct bpf_perf_event_value values[num_cpu_bpf]; int reading_map_fd, err = 0; - __u32 i, j, num_cpu; + __u32 i; + int j; bperf_sync_counters(evsel); reading_map_fd = bpf_map__fd(skel->maps.accum_readings); for (i = 0; i < bpf_map__max_entries(skel->maps.accum_readings); i++) { + struct perf_cpu entry; __u32 cpu; err = bpf_map_lookup_elem(reading_map_fd, &i, values); @@ -637,16 +639,15 @@ static int bperf__read(struct evsel *evsel) case BPERF_FILTER_GLOBAL: assert(i == 0); - num_cpu = all_cpu_map->nr; - for (j = 0; j < num_cpu; j++) { - cpu = all_cpu_map->map[j]; + perf_cpu_map__for_each_cpu(entry, j, all_cpu_map) { + cpu = entry.cpu; perf_counts(evsel->counts, cpu, 0)->val = values[cpu].counter; perf_counts(evsel->counts, cpu, 0)->ena = values[cpu].enabled; perf_counts(evsel->counts, cpu, 0)->run = values[cpu].running; } break; case BPERF_FILTER_CPU: - cpu = evsel->core.cpus->map[i]; + cpu = evsel->core.cpus->map[i].cpu; perf_counts(evsel->counts, i, 0)->val = values[cpu].counter; perf_counts(evsel->counts, i, 0)->ena = values[cpu].enabled; perf_counts(evsel->counts, i, 0)->run = values[cpu].running; @@ -771,11 +772,11 @@ static inline bool bpf_counter_skip(struct evsel *evsel) evsel->follower_skel == NULL; } -int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd) +int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd) { if (bpf_counter_skip(evsel)) return 0; - return evsel->bpf_counter_ops->install_pe(evsel, cpu, fd); + return evsel->bpf_counter_ops->install_pe(evsel, cpu_map_idx, fd); } int bpf_counter__load(struct evsel *evsel, struct target *target) diff --git a/tools/perf/util/bpf_counter.h b/tools/perf/util/bpf_counter.h index 65ebaa6694fb..4dbf26408b69 100644 --- a/tools/perf/util/bpf_counter.h +++ b/tools/perf/util/bpf_counter.h @@ -16,7 +16,7 @@ typedef int (*bpf_counter_evsel_op)(struct evsel *evsel); typedef int (*bpf_counter_evsel_target_op)(struct evsel *evsel, struct target *target); typedef int (*bpf_counter_evsel_install_pe_op)(struct evsel *evsel, - int cpu, + int cpu_map_idx, int fd); struct bpf_counter_ops { @@ -40,7 +40,7 @@ int bpf_counter__enable(struct evsel *evsel); int bpf_counter__disable(struct evsel *evsel); int bpf_counter__read(struct evsel *evsel); void bpf_counter__destroy(struct evsel *evsel); -int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd); +int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd); #else /* HAVE_BPF_SKEL */ diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c index cbc6c2bca488..631e34a0b66f 100644 --- a/tools/perf/util/bpf_counter_cgroup.c +++ b/tools/perf/util/bpf_counter_cgroup.c @@ -48,7 +48,7 @@ static int bperf_load_program(struct evlist *evlist) struct cgroup *cgrp, *leader_cgrp; __u32 i, cpu; __u32 nr_cpus = evlist->core.all_cpus->nr; - int total_cpus = cpu__max_cpu(); + int total_cpus = cpu__max_cpu().cpu; int map_size, map_fd; int prog_fd, err; @@ -125,7 +125,7 @@ static int bperf_load_program(struct evlist *evlist) for (cpu = 0; cpu < nr_cpus; cpu++) { int fd = FD(evsel, cpu); __u32 idx = evsel->core.idx * total_cpus + - evlist->core.all_cpus->map[cpu]; + evlist->core.all_cpus->map[cpu].cpu; err = bpf_map_update_elem(map_fd, &idx, &fd, BPF_ANY); @@ -212,7 +212,7 @@ static int bperf_cgrp__sync_counters(struct evlist *evlist) int prog_fd = bpf_program__fd(skel->progs.trigger_read); for (i = 0; i < nr_cpus; i++) { - cpu = evlist->core.all_cpus->map[i]; + cpu = evlist->core.all_cpus->map[i].cpu; bperf_trigger_reading(prog_fd, cpu); } @@ -245,7 +245,7 @@ static int bperf_cgrp__read(struct evsel *evsel) { struct evlist *evlist = evsel->evlist; int i, cpu, nr_cpus = evlist->core.all_cpus->nr; - int total_cpus = cpu__max_cpu(); + int total_cpus = cpu__max_cpu().cpu; struct perf_counts_values *counts; struct bpf_perf_event_value *values; int reading_map_fd, err = 0; @@ -272,7 +272,7 @@ static int bperf_cgrp__read(struct evsel *evsel) } for (i = 0; i < nr_cpus; i++) { - cpu = evlist->core.all_cpus->map[i]; + cpu = evlist->core.all_cpus->map[i].cpu; counts = perf_counts(evsel->counts, i, 0); counts->val = values[cpu].counter; diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c new file mode 100644 index 000000000000..d756cc66eef3 --- /dev/null +++ b/tools/perf/util/bpf_ftrace.c @@ -0,0 +1,152 @@ +#include <stdio.h> +#include <fcntl.h> +#include <stdint.h> +#include <stdlib.h> + +#include <linux/err.h> + +#include "util/ftrace.h" +#include "util/cpumap.h" +#include "util/thread_map.h" +#include "util/debug.h" +#include "util/evlist.h" +#include "util/bpf_counter.h" + +#include "util/bpf_skel/func_latency.skel.h" + +static struct func_latency_bpf *skel; + +int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace) +{ + int fd, err; + int i, ncpus = 1, ntasks = 1; + struct filter_entry *func; + + if (!list_is_singular(&ftrace->filters)) { + pr_err("ERROR: %s target function(s).\n", + list_empty(&ftrace->filters) ? "No" : "Too many"); + return -1; + } + + func = list_first_entry(&ftrace->filters, struct filter_entry, list); + + skel = func_latency_bpf__open(); + if (!skel) { + pr_err("Failed to open func latency skeleton\n"); + return -1; + } + + /* don't need to set cpu filter for system-wide mode */ + if (ftrace->target.cpu_list) { + ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus); + bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus); + } + + if (target__has_task(&ftrace->target) || target__none(&ftrace->target)) { + ntasks = perf_thread_map__nr(ftrace->evlist->core.threads); + bpf_map__set_max_entries(skel->maps.task_filter, ntasks); + } + + set_max_rlimit(); + + err = func_latency_bpf__load(skel); + if (err) { + pr_err("Failed to load func latency skeleton\n"); + goto out; + } + + if (ftrace->target.cpu_list) { + u32 cpu; + u8 val = 1; + + skel->bss->has_cpu = 1; + fd = bpf_map__fd(skel->maps.cpu_filter); + + for (i = 0; i < ncpus; i++) { + cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu; + bpf_map_update_elem(fd, &cpu, &val, BPF_ANY); + } + } + + if (target__has_task(&ftrace->target) || target__none(&ftrace->target)) { + u32 pid; + u8 val = 1; + + skel->bss->has_task = 1; + fd = bpf_map__fd(skel->maps.task_filter); + + for (i = 0; i < ntasks; i++) { + pid = perf_thread_map__pid(ftrace->evlist->core.threads, i); + bpf_map_update_elem(fd, &pid, &val, BPF_ANY); + } + } + + skel->links.func_begin = bpf_program__attach_kprobe(skel->progs.func_begin, + false, func->name); + if (IS_ERR(skel->links.func_begin)) { + pr_err("Failed to attach fentry program\n"); + err = PTR_ERR(skel->links.func_begin); + goto out; + } + + skel->links.func_end = bpf_program__attach_kprobe(skel->progs.func_end, + true, func->name); + if (IS_ERR(skel->links.func_end)) { + pr_err("Failed to attach fexit program\n"); + err = PTR_ERR(skel->links.func_end); + goto out; + } + + /* XXX: we don't actually use this fd - just for poll() */ + return open("/dev/null", O_RDONLY); + +out: + return err; +} + +int perf_ftrace__latency_start_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + skel->bss->enabled = 1; + return 0; +} + +int perf_ftrace__latency_stop_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + skel->bss->enabled = 0; + return 0; +} + +int perf_ftrace__latency_read_bpf(struct perf_ftrace *ftrace __maybe_unused, + int buckets[]) +{ + int i, fd, err; + u32 idx; + u64 *hist; + int ncpus = cpu__max_cpu().cpu; + + fd = bpf_map__fd(skel->maps.latency); + + hist = calloc(ncpus, sizeof(*hist)); + if (hist == NULL) + return -ENOMEM; + + for (idx = 0; idx < NUM_BUCKET; idx++) { + err = bpf_map_lookup_elem(fd, &idx, hist); + if (err) { + buckets[idx] = 0; + continue; + } + + for (i = 0; i < ncpus; i++) + buckets[idx] += hist[i]; + } + + free(hist); + return 0; +} + +int perf_ftrace__latency_cleanup_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + func_latency_bpf__destroy(skel); + return 0; +} diff --git a/tools/perf/util/bpf_skel/func_latency.bpf.c b/tools/perf/util/bpf_skel/func_latency.bpf.c new file mode 100644 index 000000000000..ea94187fe443 --- /dev/null +++ b/tools/perf/util/bpf_skel/func_latency.bpf.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +// Copyright (c) 2021 Google +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +// This should be in sync with "util/ftrace.h" +#define NUM_BUCKET 22 + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u64)); + __uint(value_size, sizeof(__u64)); + __uint(max_entries, 10000); +} functime SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} cpu_filter SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} task_filter SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u64)); + __uint(max_entries, NUM_BUCKET); +} latency SEC(".maps"); + + +int enabled = 0; +int has_cpu = 0; +int has_task = 0; + +SEC("kprobe/func") +int BPF_PROG(func_begin) +{ + __u64 key, now; + + if (!enabled) + return 0; + + key = bpf_get_current_pid_tgid(); + + if (has_cpu) { + __u32 cpu = bpf_get_smp_processor_id(); + __u8 *ok; + + ok = bpf_map_lookup_elem(&cpu_filter, &cpu); + if (!ok) + return 0; + } + + if (has_task) { + __u32 pid = key & 0xffffffff; + __u8 *ok; + + ok = bpf_map_lookup_elem(&task_filter, &pid); + if (!ok) + return 0; + } + + now = bpf_ktime_get_ns(); + + // overwrite timestamp for nested functions + bpf_map_update_elem(&functime, &key, &now, BPF_ANY); + return 0; +} + +SEC("kretprobe/func") +int BPF_PROG(func_end) +{ + __u64 tid; + __u64 *start; + + if (!enabled) + return 0; + + tid = bpf_get_current_pid_tgid(); + + start = bpf_map_lookup_elem(&functime, &tid); + if (start) { + __s64 delta = bpf_ktime_get_ns() - *start; + __u32 key; + __u64 *hist; + + bpf_map_delete_elem(&functime, &tid); + + if (delta < 0) + return 0; + + // calculate index using delta in usec + for (key = 0; key < (NUM_BUCKET - 1); key++) { + if (delta < ((1000UL) << key)) + break; + } + + hist = bpf_map_lookup_elem(&latency, &key); + if (!hist) + return 0; + + *hist += 1; + } + + return 0; +} diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c index 8e2777133bd9..131207b91d15 100644 --- a/tools/perf/util/callchain.c +++ b/tools/perf/util/callchain.c @@ -1600,7 +1600,7 @@ void callchain_cursor_reset(struct callchain_cursor *cursor) map__zput(node->ms.map); } -void callchain_param_setup(u64 sample_type) +void callchain_param_setup(u64 sample_type, const char *arch) { if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) { if ((sample_type & PERF_SAMPLE_REGS_USER) && @@ -1612,6 +1612,18 @@ void callchain_param_setup(u64 sample_type) else callchain_param.record_mode = CALLCHAIN_FP; } + + /* + * It's necessary to use libunwind to reliably determine the caller of + * a leaf function on aarch64, as otherwise we cannot know whether to + * start from the LR or FP. + * + * Always starting from the LR can result in duplicate or entirely + * erroneous entries. Always skipping the LR and starting from the FP + * can result in missing entries. + */ + if (callchain_param.record_mode == CALLCHAIN_FP && !strcmp(arch, "arm64")) + dwarf_callchain_users = true; } static bool chain_match(struct callchain_list *base_chain, diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h index 5824134f983b..d95615daed73 100644 --- a/tools/perf/util/callchain.h +++ b/tools/perf/util/callchain.h @@ -280,6 +280,8 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused, } #endif +void arch__add_leaf_frame_record_opts(struct record_opts *opts); + char *callchain_list__sym_name(struct callchain_list *cl, char *bf, size_t bfsize, bool show_dso); char *callchain_node__scnprintf_value(struct callchain_node *node, @@ -298,7 +300,7 @@ int callchain_branch_counts(struct callchain_root *root, u64 *branch_count, u64 *predicted_count, u64 *abort_count, u64 *cycles_count); -void callchain_param_setup(u64 sample_type); +void callchain_param_setup(u64 sample_type, const char *arch); bool callchain_cnode_matched(struct callchain_node *base_cnode, struct callchain_node *pair_cnode); diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c index 582f3aeaf5e4..2b81707b9dba 100644 --- a/tools/perf/util/counts.c +++ b/tools/perf/util/counts.c @@ -4,6 +4,7 @@ #include <string.h> #include "evsel.h" #include "counts.h" +#include <perf/threadmap.h> #include <linux/zalloc.h> struct perf_counts *perf_counts__new(int ncpus, int nthreads) @@ -55,9 +56,12 @@ void evsel__reset_counts(struct evsel *evsel) perf_counts__reset(evsel->counts); } -int evsel__alloc_counts(struct evsel *evsel, int ncpus, int nthreads) +int evsel__alloc_counts(struct evsel *evsel) { - evsel->counts = perf_counts__new(ncpus, nthreads); + struct perf_cpu_map *cpus = evsel__cpus(evsel); + int nthreads = perf_thread_map__nr(evsel->core.threads); + + evsel->counts = perf_counts__new(cpus ? cpus->nr : 1, nthreads); return evsel->counts != NULL ? 0 : -ENOMEM; } diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h index 7ff36bf6d644..5de275194f2b 100644 --- a/tools/perf/util/counts.h +++ b/tools/perf/util/counts.h @@ -18,21 +18,21 @@ struct perf_counts { static inline struct perf_counts_values* -perf_counts(struct perf_counts *counts, int cpu, int thread) +perf_counts(struct perf_counts *counts, int cpu_map_idx, int thread) { - return xyarray__entry(counts->values, cpu, thread); + return xyarray__entry(counts->values, cpu_map_idx, thread); } static inline bool -perf_counts__is_loaded(struct perf_counts *counts, int cpu, int thread) +perf_counts__is_loaded(struct perf_counts *counts, int cpu_map_idx, int thread) { - return *((bool *) xyarray__entry(counts->loaded, cpu, thread)); + return *((bool *) xyarray__entry(counts->loaded, cpu_map_idx, thread)); } static inline void -perf_counts__set_loaded(struct perf_counts *counts, int cpu, int thread, bool loaded) +perf_counts__set_loaded(struct perf_counts *counts, int cpu_map_idx, int thread, bool loaded) { - *((bool *) xyarray__entry(counts->loaded, cpu, thread)) = loaded; + *((bool *) xyarray__entry(counts->loaded, cpu_map_idx, thread)) = loaded; } struct perf_counts *perf_counts__new(int ncpus, int nthreads); @@ -40,7 +40,7 @@ void perf_counts__delete(struct perf_counts *counts); void perf_counts__reset(struct perf_counts *counts); void evsel__reset_counts(struct evsel *evsel); -int evsel__alloc_counts(struct evsel *evsel, int ncpus, int nthreads); +int evsel__alloc_counts(struct evsel *evsel); void evsel__free_counts(struct evsel *evsel); #endif /* __PERF_COUNTS_H */ diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c index 87d3eca9b872..12b2243222b0 100644 --- a/tools/perf/util/cpumap.c +++ b/tools/perf/util/cpumap.c @@ -13,9 +13,13 @@ #include <linux/ctype.h> #include <linux/zalloc.h> -static int max_cpu_num; -static int max_present_cpu_num; +static struct perf_cpu max_cpu_num; +static struct perf_cpu max_present_cpu_num; static int max_node_num; +/** + * The numa node X as read from /sys/devices/system/node/nodeX indexed by the + * CPU number. + */ static int *cpunode_map; static struct perf_cpu_map *cpu_map__from_entries(struct cpu_map_entries *cpus) @@ -33,9 +37,9 @@ static struct perf_cpu_map *cpu_map__from_entries(struct cpu_map_entries *cpus) * otherwise it would become 65535. */ if (cpus->cpu[i] == (u16) -1) - map->map[i] = -1; + map->map[i].cpu = -1; else - map->map[i] = (int) cpus->cpu[i]; + map->map[i].cpu = (int) cpus->cpu[i]; } } @@ -54,7 +58,7 @@ static struct perf_cpu_map *cpu_map__from_mask(struct perf_record_record_cpu_map int cpu, i = 0; for_each_set_bit(cpu, mask->mask, nbits) - map->map[i++] = cpu; + map->map[i++].cpu = cpu; } return map; @@ -87,7 +91,7 @@ struct perf_cpu_map *perf_cpu_map__empty_new(int nr) cpus->nr = nr; for (i = 0; i < nr; i++) - cpus->map[i] = -1; + cpus->map[i].cpu = -1; refcount_set(&cpus->refcnt, 1); } @@ -104,7 +108,7 @@ struct cpu_aggr_map *cpu_aggr_map__empty_new(int nr) cpus->nr = nr; for (i = 0; i < nr; i++) - cpus->map[i] = cpu_map__empty_aggr_cpu_id(); + cpus->map[i] = aggr_cpu_id__empty(); refcount_set(&cpus->refcnt, 1); } @@ -122,28 +126,21 @@ static int cpu__get_topology_int(int cpu, const char *name, int *value) return sysfs__read_int(path, value); } -int cpu_map__get_socket_id(int cpu) +int cpu__get_socket_id(struct perf_cpu cpu) { - int value, ret = cpu__get_topology_int(cpu, "physical_package_id", &value); + int value, ret = cpu__get_topology_int(cpu.cpu, "physical_package_id", &value); return ret ?: value; } -struct aggr_cpu_id cpu_map__get_socket(struct perf_cpu_map *map, int idx, - void *data __maybe_unused) +struct aggr_cpu_id aggr_cpu_id__socket(struct perf_cpu cpu, void *data __maybe_unused) { - int cpu; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id = aggr_cpu_id__empty(); - if (idx > map->nr) - return id; - - cpu = map->map[idx]; - - id.socket = cpu_map__get_socket_id(cpu); + id.socket = cpu__get_socket_id(cpu); return id; } -static int cmp_aggr_cpu_id(const void *a_pointer, const void *b_pointer) +static int aggr_cpu_id__cmp(const void *a_pointer, const void *b_pointer) { struct aggr_cpu_id *a = (struct aggr_cpu_id *)a_pointer; struct aggr_cpu_id *b = (struct aggr_cpu_id *)b_pointer; @@ -160,57 +157,64 @@ static int cmp_aggr_cpu_id(const void *a_pointer, const void *b_pointer) return a->thread - b->thread; } -int cpu_map__build_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **res, - struct aggr_cpu_id (*f)(struct perf_cpu_map *map, int cpu, void *data), - void *data) +struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus, + aggr_cpu_id_get_t get_id, + void *data) { - int nr = cpus->nr; - struct cpu_aggr_map *c = cpu_aggr_map__empty_new(nr); - int cpu, s2; - struct aggr_cpu_id s1; + int idx; + struct perf_cpu cpu; + struct cpu_aggr_map *c = cpu_aggr_map__empty_new(cpus->nr); if (!c) - return -1; + return NULL; /* Reset size as it may only be partially filled */ c->nr = 0; - for (cpu = 0; cpu < nr; cpu++) { - s1 = f(cpus, cpu, data); - for (s2 = 0; s2 < c->nr; s2++) { - if (cpu_map__compare_aggr_cpu_id(s1, c->map[s2])) + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + bool duplicate = false; + struct aggr_cpu_id cpu_id = get_id(cpu, data); + + for (int j = 0; j < c->nr; j++) { + if (aggr_cpu_id__equal(&cpu_id, &c->map[j])) { + duplicate = true; break; + } } - if (s2 == c->nr) { - c->map[c->nr] = s1; + if (!duplicate) { + c->map[c->nr] = cpu_id; c->nr++; } } + /* Trim. */ + if (c->nr != cpus->nr) { + struct cpu_aggr_map *trimmed_c = + realloc(c, + sizeof(struct cpu_aggr_map) + sizeof(struct aggr_cpu_id) * c->nr); + + if (trimmed_c) + c = trimmed_c; + } /* ensure we process id in increasing order */ - qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), cmp_aggr_cpu_id); + qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp); + + return c; - *res = c; - return 0; } -int cpu_map__get_die_id(int cpu) +int cpu__get_die_id(struct perf_cpu cpu) { - int value, ret = cpu__get_topology_int(cpu, "die_id", &value); + int value, ret = cpu__get_topology_int(cpu.cpu, "die_id", &value); return ret ?: value; } -struct aggr_cpu_id cpu_map__get_die(struct perf_cpu_map *map, int idx, void *data) +struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data) { - int cpu, die; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id; + int die; - if (idx > map->nr) - return id; - - cpu = map->map[idx]; - - die = cpu_map__get_die_id(cpu); + die = cpu__get_die_id(cpu); /* There is no die_id on legacy system. */ if (die == -1) die = 0; @@ -220,79 +224,59 @@ struct aggr_cpu_id cpu_map__get_die(struct perf_cpu_map *map, int idx, void *dat * with the socket ID and then add die to * make a unique ID. */ - id = cpu_map__get_socket(map, idx, data); - if (cpu_map__aggr_cpu_id_is_empty(id)) + id = aggr_cpu_id__socket(cpu, data); + if (aggr_cpu_id__is_empty(&id)) return id; id.die = die; return id; } -int cpu_map__get_core_id(int cpu) +int cpu__get_core_id(struct perf_cpu cpu) { - int value, ret = cpu__get_topology_int(cpu, "core_id", &value); + int value, ret = cpu__get_topology_int(cpu.cpu, "core_id", &value); return ret ?: value; } -int cpu_map__get_node_id(int cpu) -{ - return cpu__get_node(cpu); -} - -struct aggr_cpu_id cpu_map__get_core(struct perf_cpu_map *map, int idx, void *data) +struct aggr_cpu_id aggr_cpu_id__core(struct perf_cpu cpu, void *data) { - int cpu; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); - - if (idx > map->nr) - return id; + struct aggr_cpu_id id; + int core = cpu__get_core_id(cpu); - cpu = map->map[idx]; - - cpu = cpu_map__get_core_id(cpu); - - /* cpu_map__get_die returns a struct with socket and die set*/ - id = cpu_map__get_die(map, idx, data); - if (cpu_map__aggr_cpu_id_is_empty(id)) + /* aggr_cpu_id__die returns a struct with socket and die set. */ + id = aggr_cpu_id__die(cpu, data); + if (aggr_cpu_id__is_empty(&id)) return id; /* * core_id is relative to socket and die, we need a global id. * So we combine the result from cpu_map__get_die with the core id */ - id.core = cpu; + id.core = core; return id; + } -struct aggr_cpu_id cpu_map__get_node(struct perf_cpu_map *map, int idx, void *data __maybe_unused) +struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data) { - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id; - if (idx < 0 || idx >= map->nr) + /* aggr_cpu_id__core returns a struct with socket, die and core set. */ + id = aggr_cpu_id__core(cpu, data); + if (aggr_cpu_id__is_empty(&id)) return id; - id.node = cpu_map__get_node_id(map->map[idx]); + id.cpu = cpu; return id; -} -int cpu_map__build_socket_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **sockp) -{ - return cpu_map__build_map(cpus, sockp, cpu_map__get_socket, NULL); } -int cpu_map__build_die_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **diep) +struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data __maybe_unused) { - return cpu_map__build_map(cpus, diep, cpu_map__get_die, NULL); -} - -int cpu_map__build_core_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **corep) -{ - return cpu_map__build_map(cpus, corep, cpu_map__get_core, NULL); -} + struct aggr_cpu_id id = aggr_cpu_id__empty(); -int cpu_map__build_node_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **numap) -{ - return cpu_map__build_map(cpus, numap, cpu_map__get_node, NULL); + id.node = cpu__get_node(cpu); + return id; } /* setup simple routines to easily access node numbers given a cpu number */ @@ -335,8 +319,8 @@ static void set_max_cpu_num(void) int ret = -1; /* set up default */ - max_cpu_num = 4096; - max_present_cpu_num = 4096; + max_cpu_num.cpu = 4096; + max_present_cpu_num.cpu = 4096; mnt = sysfs__mountpoint(); if (!mnt) @@ -349,7 +333,7 @@ static void set_max_cpu_num(void) goto out; } - ret = get_max_num(path, &max_cpu_num); + ret = get_max_num(path, &max_cpu_num.cpu); if (ret) goto out; @@ -360,11 +344,11 @@ static void set_max_cpu_num(void) goto out; } - ret = get_max_num(path, &max_present_cpu_num); + ret = get_max_num(path, &max_present_cpu_num.cpu); out: if (ret) - pr_err("Failed to read max cpus, using default of %d\n", max_cpu_num); + pr_err("Failed to read max cpus, using default of %d\n", max_cpu_num.cpu); } /* Determine highest possible node in the system for sparse allocation */ @@ -403,31 +387,31 @@ int cpu__max_node(void) return max_node_num; } -int cpu__max_cpu(void) +struct perf_cpu cpu__max_cpu(void) { - if (unlikely(!max_cpu_num)) + if (unlikely(!max_cpu_num.cpu)) set_max_cpu_num(); return max_cpu_num; } -int cpu__max_present_cpu(void) +struct perf_cpu cpu__max_present_cpu(void) { - if (unlikely(!max_present_cpu_num)) + if (unlikely(!max_present_cpu_num.cpu)) set_max_cpu_num(); return max_present_cpu_num; } -int cpu__get_node(int cpu) +int cpu__get_node(struct perf_cpu cpu) { if (unlikely(cpunode_map == NULL)) { pr_debug("cpu_map not initialized\n"); return -1; } - return cpunode_map[cpu]; + return cpunode_map[cpu.cpu]; } static int init_cpunode_map(void) @@ -437,13 +421,13 @@ static int init_cpunode_map(void) set_max_cpu_num(); set_max_node_num(); - cpunode_map = calloc(max_cpu_num, sizeof(int)); + cpunode_map = calloc(max_cpu_num.cpu, sizeof(int)); if (!cpunode_map) { pr_err("%s: calloc failed\n", __func__); return -1; } - for (i = 0; i < max_cpu_num; i++) + for (i = 0; i < max_cpu_num.cpu; i++) cpunode_map[i] = -1; return 0; @@ -502,47 +486,39 @@ int cpu__setup_cpunode_map(void) return 0; } -bool cpu_map__has(struct perf_cpu_map *cpus, int cpu) -{ - return perf_cpu_map__idx(cpus, cpu) != -1; -} - -int cpu_map__cpu(struct perf_cpu_map *cpus, int idx) -{ - return cpus->map[idx]; -} - size_t cpu_map__snprint(struct perf_cpu_map *map, char *buf, size_t size) { - int i, cpu, start = -1; + int i, start = -1; bool first = true; size_t ret = 0; #define COMMA first ? "" : "," for (i = 0; i < map->nr + 1; i++) { + struct perf_cpu cpu = { .cpu = INT_MAX }; bool last = i == map->nr; - cpu = last ? INT_MAX : map->map[i]; + if (!last) + cpu = map->map[i]; if (start == -1) { start = i; if (last) { ret += snprintf(buf + ret, size - ret, "%s%d", COMMA, - map->map[i]); + map->map[i].cpu); } - } else if (((i - start) != (cpu - map->map[start])) || last) { + } else if (((i - start) != (cpu.cpu - map->map[start].cpu)) || last) { int end = i - 1; if (start == end) { ret += snprintf(buf + ret, size - ret, "%s%d", COMMA, - map->map[start]); + map->map[start].cpu); } else { ret += snprintf(buf + ret, size - ret, "%s%d-%d", COMMA, - map->map[start], map->map[end]); + map->map[start].cpu, map->map[end].cpu); } first = false; start = i; @@ -569,23 +545,23 @@ size_t cpu_map__snprint_mask(struct perf_cpu_map *map, char *buf, size_t size) int i, cpu; char *ptr = buf; unsigned char *bitmap; - int last_cpu = cpu_map__cpu(map, map->nr - 1); + struct perf_cpu last_cpu = perf_cpu_map__cpu(map, map->nr - 1); if (buf == NULL) return 0; - bitmap = zalloc(last_cpu / 8 + 1); + bitmap = zalloc(last_cpu.cpu / 8 + 1); if (bitmap == NULL) { buf[0] = '\0'; return 0; } for (i = 0; i < map->nr; i++) { - cpu = cpu_map__cpu(map, i); + cpu = perf_cpu_map__cpu(map, i).cpu; bitmap[cpu / 8] |= 1 << (cpu % 8); } - for (cpu = last_cpu / 4 * 4; cpu >= 0; cpu -= 4) { + for (cpu = last_cpu.cpu / 4 * 4; cpu >= 0; cpu -= 4) { unsigned char bits = bitmap[cpu / 8]; if (cpu % 8) @@ -614,32 +590,35 @@ const struct perf_cpu_map *cpu_map__online(void) /* thread unsafe */ return online; } -bool cpu_map__compare_aggr_cpu_id(struct aggr_cpu_id a, struct aggr_cpu_id b) +bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b) { - return a.thread == b.thread && - a.node == b.node && - a.socket == b.socket && - a.die == b.die && - a.core == b.core; + return a->thread == b->thread && + a->node == b->node && + a->socket == b->socket && + a->die == b->die && + a->core == b->core && + a->cpu.cpu == b->cpu.cpu; } -bool cpu_map__aggr_cpu_id_is_empty(struct aggr_cpu_id a) +bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a) { - return a.thread == -1 && - a.node == -1 && - a.socket == -1 && - a.die == -1 && - a.core == -1; + return a->thread == -1 && + a->node == -1 && + a->socket == -1 && + a->die == -1 && + a->core == -1 && + a->cpu.cpu == -1; } -struct aggr_cpu_id cpu_map__empty_aggr_cpu_id(void) +struct aggr_cpu_id aggr_cpu_id__empty(void) { struct aggr_cpu_id ret = { .thread = -1, .node = -1, .socket = -1, .die = -1, - .core = -1 + .core = -1, + .cpu = (struct perf_cpu){ .cpu = -1 }, }; return ret; } diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h index a27eeaf086e8..0d3c2006a15d 100644 --- a/tools/perf/util/cpumap.h +++ b/tools/perf/util/cpumap.h @@ -2,71 +2,135 @@ #ifndef __PERF_CPUMAP_H #define __PERF_CPUMAP_H +#include <stdbool.h> #include <stdio.h> #include <stdbool.h> #include <internal/cpumap.h> #include <perf/cpumap.h> +/** Identify where counts are aggregated, -1 implies not to aggregate. */ struct aggr_cpu_id { + /** A value in the range 0 to number of threads. */ int thread; + /** The numa node X as read from /sys/devices/system/node/nodeX. */ int node; + /** + * The socket number as read from + * /sys/devices/system/cpu/cpuX/topology/physical_package_id. + */ int socket; + /** The die id as read from /sys/devices/system/cpu/cpuX/topology/die_id. */ int die; + /** The core id as read from /sys/devices/system/cpu/cpuX/topology/core_id. */ int core; + /** CPU aggregation, note there is one CPU for each SMT thread. */ + struct perf_cpu cpu; }; +/** A collection of aggr_cpu_id values, the "built" version is sorted and uniqued. */ struct cpu_aggr_map { refcount_t refcnt; + /** Number of valid entries. */ int nr; + /** The entries. */ struct aggr_cpu_id map[]; }; struct perf_record_cpu_map_data; struct perf_cpu_map *perf_cpu_map__empty_new(int nr); -struct cpu_aggr_map *cpu_aggr_map__empty_new(int nr); struct perf_cpu_map *cpu_map__new_data(struct perf_record_cpu_map_data *data); size_t cpu_map__snprint(struct perf_cpu_map *map, char *buf, size_t size); size_t cpu_map__snprint_mask(struct perf_cpu_map *map, char *buf, size_t size); size_t cpu_map__fprintf(struct perf_cpu_map *map, FILE *fp); -int cpu_map__get_socket_id(int cpu); -struct aggr_cpu_id cpu_map__get_socket(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__get_die_id(int cpu); -struct aggr_cpu_id cpu_map__get_die(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__get_core_id(int cpu); -struct aggr_cpu_id cpu_map__get_core(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__get_node_id(int cpu); -struct aggr_cpu_id cpu_map__get_node(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__build_socket_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **sockp); -int cpu_map__build_die_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **diep); -int cpu_map__build_core_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **corep); -int cpu_map__build_node_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **nodep); const struct perf_cpu_map *cpu_map__online(void); /* thread unsafe */ -static inline int cpu_map__socket(struct perf_cpu_map *sock, int s) +int cpu__setup_cpunode_map(void); + +int cpu__max_node(void); +struct perf_cpu cpu__max_cpu(void); +struct perf_cpu cpu__max_present_cpu(void); + +/** + * cpu_map__is_dummy - Events associated with a pid, rather than a CPU, use a single dummy map with an entry of -1. + */ +static inline bool cpu_map__is_dummy(struct perf_cpu_map *cpus) { - if (!sock || s > sock->nr || s < 0) - return 0; - return sock->map[s]; + return cpus->nr == 1 && cpus->map[0].cpu == -1; } -int cpu__setup_cpunode_map(void); +/** + * cpu__get_node - Returns the numa node X as read from + * /sys/devices/system/node/nodeX for the given CPU. + */ +int cpu__get_node(struct perf_cpu cpu); +/** + * cpu__get_socket_id - Returns the socket number as read from + * /sys/devices/system/cpu/cpuX/topology/physical_package_id for the given CPU. + */ +int cpu__get_socket_id(struct perf_cpu cpu); +/** + * cpu__get_die_id - Returns the die id as read from + * /sys/devices/system/cpu/cpuX/topology/die_id for the given CPU. + */ +int cpu__get_die_id(struct perf_cpu cpu); +/** + * cpu__get_core_id - Returns the core id as read from + * /sys/devices/system/cpu/cpuX/topology/core_id for the given CPU. + */ +int cpu__get_core_id(struct perf_cpu cpu); -int cpu__max_node(void); -int cpu__max_cpu(void); -int cpu__max_present_cpu(void); -int cpu__get_node(int cpu); +/** + * cpu_aggr_map__empty_new - Create a cpu_aggr_map of size nr with every entry + * being empty. + */ +struct cpu_aggr_map *cpu_aggr_map__empty_new(int nr); + +typedef struct aggr_cpu_id (*aggr_cpu_id_get_t)(struct perf_cpu cpu, void *data); + +/** + * cpu_aggr_map__new - Create a cpu_aggr_map with an aggr_cpu_id for each cpu in + * cpus. The aggr_cpu_id is created with 'get_id' that may have a data value + * passed to it. The cpu_aggr_map is sorted with duplicate values removed. + */ +struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus, + aggr_cpu_id_get_t get_id, + void *data); -int cpu_map__build_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **res, - struct aggr_cpu_id (*f)(struct perf_cpu_map *map, int cpu, void *data), - void *data); +bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b); +bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a); +struct aggr_cpu_id aggr_cpu_id__empty(void); -int cpu_map__cpu(struct perf_cpu_map *cpus, int idx); -bool cpu_map__has(struct perf_cpu_map *cpus, int cpu); -bool cpu_map__compare_aggr_cpu_id(struct aggr_cpu_id a, struct aggr_cpu_id b); -bool cpu_map__aggr_cpu_id_is_empty(struct aggr_cpu_id a); -struct aggr_cpu_id cpu_map__empty_aggr_cpu_id(void); +/** + * aggr_cpu_id__socket - Create an aggr_cpu_id with the socket populated with + * the socket for cpu. The function signature is compatible with + * aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__socket(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__die - Create an aggr_cpu_id with the die and socket populated + * with the die and socket for cpu. The function signature is compatible with + * aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__core - Create an aggr_cpu_id with the core, die and socket + * populated with the core, die and socket for cpu. The function signature is + * compatible with aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__core(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__core - Create an aggr_cpu_id with the cpu, core, die and socket + * populated with the cpu, core, die and socket for cpu. The function signature + * is compatible with aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__node - Create an aggr_cpu_id with the numa node populated for + * cpu. The function signature is compatible with aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data); #endif /* __PERF_CPUMAP_H */ diff --git a/tools/perf/util/cputopo.c b/tools/perf/util/cputopo.c index 51b429c86f98..e20b835a1194 100644 --- a/tools/perf/util/cputopo.c +++ b/tools/perf/util/cputopo.c @@ -165,7 +165,8 @@ static bool has_die_topology(void) if (uname(&uts) < 0) return false; - if (strncmp(uts.machine, "x86_64", 6)) + if (strncmp(uts.machine, "x86_64", 6) && + strncmp(uts.machine, "s390x", 5)) return false; scnprintf(filename, MAXPATHLEN, DIE_CPUS_FMT, @@ -187,7 +188,7 @@ struct cpu_topology *cpu_topology__new(void) struct perf_cpu_map *map; bool has_die = has_die_topology(); - ncpus = cpu__max_present_cpu(); + ncpus = cpu__max_present_cpu().cpu; /* build online CPU map */ map = perf_cpu_map__new(NULL); @@ -218,7 +219,7 @@ struct cpu_topology *cpu_topology__new(void) tp->core_cpus_list = addr; for (i = 0; i < nr; i++) { - if (!cpu_map__has(map, i)) + if (!perf_cpu_map__has(map, (struct perf_cpu){ .cpu = i })) continue; ret = build_cpu_topology(tp, i); @@ -333,7 +334,7 @@ struct numa_topology *numa_topology__new(void) tp->nr = nr; for (i = 0; i < nr; i++) { - if (load_numa_node(&tp->nodes[i], node_map->map[i])) { + if (load_numa_node(&tp->nodes[i], node_map->map[i].cpu)) { numa_topology__delete(tp); tp = NULL; break; diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c index 2c06abf6dcd2..65e6c22f38e4 100644 --- a/tools/perf/util/debug.c +++ b/tools/perf/util/debug.c @@ -179,7 +179,7 @@ static int trace_event_printer(enum binary_printer_ops op, break; case BINARY_PRINT_CHAR_DATA: printed += color_fprintf(fp, color, "%c", - isprint(ch) ? ch : '.'); + isprint(ch) && isascii(ch) ? ch : '.'); break; case BINARY_PRINT_CHAR_PAD: printed += color_fprintf(fp, color, " "); diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index b9904896eb97..579e44c59914 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -285,13 +285,13 @@ out_enomem: int perf_env__read_cpu_topology_map(struct perf_env *env) { - int cpu, nr_cpus; + int idx, nr_cpus; if (env->cpu != NULL) return 0; if (env->nr_cpus_avail == 0) - env->nr_cpus_avail = cpu__max_present_cpu(); + env->nr_cpus_avail = cpu__max_present_cpu().cpu; nr_cpus = env->nr_cpus_avail; if (nr_cpus == -1) @@ -301,10 +301,12 @@ int perf_env__read_cpu_topology_map(struct perf_env *env) if (env->cpu == NULL) return -ENOMEM; - for (cpu = 0; cpu < nr_cpus; ++cpu) { - env->cpu[cpu].core_id = cpu_map__get_core_id(cpu); - env->cpu[cpu].socket_id = cpu_map__get_socket_id(cpu); - env->cpu[cpu].die_id = cpu_map__get_die_id(cpu); + for (idx = 0; idx < nr_cpus; ++idx) { + struct perf_cpu cpu = { .cpu = idx }; + + env->cpu[idx].core_id = cpu__get_core_id(cpu); + env->cpu[idx].socket_id = cpu__get_socket_id(cpu); + env->cpu[idx].die_id = cpu__get_die_id(cpu); } env->nr_cpus_avail = nr_cpus; @@ -381,7 +383,7 @@ static int perf_env__read_arch(struct perf_env *env) static int perf_env__read_nr_cpus_avail(struct perf_env *env) { if (env->nr_cpus_avail == 0) - env->nr_cpus_avail = cpu__max_present_cpu(); + env->nr_cpus_avail = cpu__max_present_cpu().cpu; return env->nr_cpus_avail ? 0 : -ENOENT; } @@ -487,7 +489,7 @@ const char *perf_env__pmu_mappings(struct perf_env *env) return env->pmu_mappings; } -int perf_env__numa_node(struct perf_env *env, int cpu) +int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu) { if (!env->nr_numa_map) { struct numa_node *nn; @@ -495,7 +497,7 @@ int perf_env__numa_node(struct perf_env *env, int cpu) for (i = 0; i < env->nr_numa_nodes; i++) { nn = &env->numa_nodes[i]; - nr = max(nr, perf_cpu_map__max(nn->map)); + nr = max(nr, perf_cpu_map__max(nn->map).cpu); } nr++; @@ -514,13 +516,14 @@ int perf_env__numa_node(struct perf_env *env, int cpu) env->nr_numa_map = nr; for (i = 0; i < env->nr_numa_nodes; i++) { - int tmp, j; + struct perf_cpu tmp; + int j; nn = &env->numa_nodes[i]; - perf_cpu_map__for_each_cpu(j, tmp, nn->map) - env->numa_map[j] = i; + perf_cpu_map__for_each_cpu(tmp, j, nn->map) + env->numa_map[tmp.cpu] = i; } } - return cpu >= 0 && cpu < env->nr_numa_map ? env->numa_map[cpu] : -1; + return cpu.cpu >= 0 && cpu.cpu < env->nr_numa_map ? env->numa_map[cpu.cpu] : -1; } diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index 163e5ec503a2..a3541f98e1fc 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -4,6 +4,7 @@ #include <linux/types.h> #include <linux/rbtree.h> +#include "cpumap.h" #include "rwsem.h" struct perf_cpu_map; @@ -170,5 +171,5 @@ struct bpf_prog_info_node *perf_env__find_bpf_prog_info(struct perf_env *env, bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node); struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id); -int perf_env__numa_node(struct perf_env *env, int cpu); +int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu); #endif /* __PERF_ENV_H */ diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 5f92319ce258..6e88d404b5b3 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -342,36 +342,65 @@ static int evlist__nr_threads(struct evlist *evlist, struct evsel *evsel) return perf_thread_map__nr(evlist->core.threads); } -void evlist__cpu_iter_start(struct evlist *evlist) -{ - struct evsel *pos; - - /* - * Reset the per evsel cpu_iter. This is needed because - * each evsel's cpumap may have a different index space, - * and some operations need the index to modify - * the FD xyarray (e.g. open, close) - */ - evlist__for_each_entry(evlist, pos) - pos->cpu_iter = 0; -} +struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity) +{ + struct evlist_cpu_iterator itr = { + .container = evlist, + .evsel = evlist__first(evlist), + .cpu_map_idx = 0, + .evlist_cpu_map_idx = 0, + .evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.all_cpus), + .cpu = (struct perf_cpu){ .cpu = -1}, + .affinity = affinity, + }; -bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu) -{ - if (ev->cpu_iter >= ev->core.cpus->nr) - return true; - if (cpu >= 0 && ev->core.cpus->map[ev->cpu_iter] != cpu) - return true; - return false; + if (itr.affinity) { + itr.cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0); + affinity__set(itr.affinity, itr.cpu.cpu); + itr.cpu_map_idx = perf_cpu_map__idx(itr.evsel->core.cpus, itr.cpu); + /* + * If this CPU isn't in the evsel's cpu map then advance through + * the list. + */ + if (itr.cpu_map_idx == -1) + evlist_cpu_iterator__next(&itr); + } + return itr; +} + +void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr) +{ + while (evlist_cpu_itr->evsel != evlist__last(evlist_cpu_itr->container)) { + evlist_cpu_itr->evsel = evsel__next(evlist_cpu_itr->evsel); + evlist_cpu_itr->cpu_map_idx = + perf_cpu_map__idx(evlist_cpu_itr->evsel->core.cpus, + evlist_cpu_itr->cpu); + if (evlist_cpu_itr->cpu_map_idx != -1) + return; + } + evlist_cpu_itr->evlist_cpu_map_idx++; + if (evlist_cpu_itr->evlist_cpu_map_idx < evlist_cpu_itr->evlist_cpu_map_nr) { + evlist_cpu_itr->evsel = evlist__first(evlist_cpu_itr->container); + evlist_cpu_itr->cpu = + perf_cpu_map__cpu(evlist_cpu_itr->container->core.all_cpus, + evlist_cpu_itr->evlist_cpu_map_idx); + if (evlist_cpu_itr->affinity) + affinity__set(evlist_cpu_itr->affinity, evlist_cpu_itr->cpu.cpu); + evlist_cpu_itr->cpu_map_idx = + perf_cpu_map__idx(evlist_cpu_itr->evsel->core.cpus, + evlist_cpu_itr->cpu); + /* + * If this CPU isn't in the evsel's cpu map then advance through + * the list. + */ + if (evlist_cpu_itr->cpu_map_idx == -1) + evlist_cpu_iterator__next(evlist_cpu_itr); + } } -bool evsel__cpu_iter_skip(struct evsel *ev, int cpu) +bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr) { - if (!evsel__cpu_iter_skip_no_inc(ev, cpu)) { - ev->cpu_iter++; - return false; - } - return true; + return evlist_cpu_itr->evlist_cpu_map_idx >= evlist_cpu_itr->evlist_cpu_map_nr; } static int evsel__strcmp(struct evsel *pos, char *evsel_name) @@ -400,31 +429,26 @@ static int evlist__is_enabled(struct evlist *evlist) static void __evlist__disable(struct evlist *evlist, char *evsel_name) { struct evsel *pos; + struct evlist_cpu_iterator evlist_cpu_itr; struct affinity affinity; - int cpu, i, imm = 0; bool has_imm = false; if (affinity__setup(&affinity) < 0) return; /* Disable 'immediate' events last */ - for (imm = 0; imm <= 1; imm++) { - evlist__for_each_cpu(evlist, i, cpu) { - affinity__set(&affinity, cpu); - - evlist__for_each_entry(evlist, pos) { - if (evsel__strcmp(pos, evsel_name)) - continue; - if (evsel__cpu_iter_skip(pos, cpu)) - continue; - if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd) - continue; - if (pos->immediate) - has_imm = true; - if (pos->immediate != imm) - continue; - evsel__disable_cpu(pos, pos->cpu_iter - 1); - } + for (int imm = 0; imm <= 1; imm++) { + evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) { + pos = evlist_cpu_itr.evsel; + if (evsel__strcmp(pos, evsel_name)) + continue; + if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd) + continue; + if (pos->immediate) + has_imm = true; + if (pos->immediate != imm) + continue; + evsel__disable_cpu(pos, evlist_cpu_itr.cpu_map_idx); } if (!has_imm) break; @@ -462,24 +486,19 @@ void evlist__disable_evsel(struct evlist *evlist, char *evsel_name) static void __evlist__enable(struct evlist *evlist, char *evsel_name) { struct evsel *pos; + struct evlist_cpu_iterator evlist_cpu_itr; struct affinity affinity; - int cpu, i; if (affinity__setup(&affinity) < 0) return; - evlist__for_each_cpu(evlist, i, cpu) { - affinity__set(&affinity, cpu); - - evlist__for_each_entry(evlist, pos) { - if (evsel__strcmp(pos, evsel_name)) - continue; - if (evsel__cpu_iter_skip(pos, cpu)) - continue; - if (!evsel__is_group_leader(pos) || !pos->core.fd) - continue; - evsel__enable_cpu(pos, pos->cpu_iter - 1); - } + evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) { + pos = evlist_cpu_itr.evsel; + if (evsel__strcmp(pos, evsel_name)) + continue; + if (!evsel__is_group_leader(pos) || !pos->core.fd) + continue; + evsel__enable_cpu(pos, evlist_cpu_itr.cpu_map_idx); } affinity__cleanup(&affinity); evlist__for_each_entry(evlist, pos) { @@ -800,7 +819,7 @@ perf_evlist__mmap_cb_get(struct perf_evlist *_evlist, bool overwrite, int idx) static int perf_evlist__mmap_cb_mmap(struct perf_mmap *_map, struct perf_mmap_param *_mp, - int output, int cpu) + int output, struct perf_cpu cpu) { struct mmap *map = container_of(_map, struct mmap, core); struct mmap_params *mp = container_of(_mp, struct mmap_params, core); @@ -1264,14 +1283,14 @@ void evlist__set_selected(struct evlist *evlist, struct evsel *evsel) void evlist__close(struct evlist *evlist) { struct evsel *evsel; + struct evlist_cpu_iterator evlist_cpu_itr; struct affinity affinity; - int cpu, i; /* * With perf record core.cpus is usually NULL. * Use the old method to handle this for now. */ - if (!evlist->core.cpus) { + if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) { evlist__for_each_entry_reverse(evlist, evsel) evsel__close(evsel); return; @@ -1279,15 +1298,12 @@ void evlist__close(struct evlist *evlist) if (affinity__setup(&affinity) < 0) return; - evlist__for_each_cpu(evlist, i, cpu) { - affinity__set(&affinity, cpu); - evlist__for_each_entry_reverse(evlist, evsel) { - if (evsel__cpu_iter_skip(evsel, cpu)) - continue; - perf_evsel__close_cpu(&evsel->core, evsel->cpu_iter - 1); - } + evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) { + perf_evsel__close_cpu(&evlist_cpu_itr.evsel->core, + evlist_cpu_itr.cpu_map_idx); } + affinity__cleanup(&affinity); evlist__for_each_entry_reverse(evlist, evsel) { perf_evsel__free_fd(&evsel->core); diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 97bfb8d0be4f..64cba56fbc74 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -64,6 +64,7 @@ struct evlist { struct evsel *selected; struct events_stats stats; struct perf_env *env; + const char *hybrid_pmu_name; void (*trace_event_sample_raw)(struct evlist *evlist, union perf_event *event, struct perf_sample *sample); @@ -110,6 +111,7 @@ int __evlist__add_default_attrs(struct evlist *evlist, __evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array)) int arch_evlist__add_default_attrs(struct evlist *evlist); +struct evsel *arch_evlist__leader(struct list_head *list); int evlist__add_dummy(struct evlist *evlist); @@ -325,17 +327,53 @@ void evlist__to_front(struct evlist *evlist, struct evsel *move_evsel); #define evlist__for_each_entry_safe(evlist, tmp, evsel) \ __evlist__for_each_entry_safe(&(evlist)->core.entries, tmp, evsel) -#define evlist__for_each_cpu(evlist, index, cpu) \ - evlist__cpu_iter_start(evlist); \ - perf_cpu_map__for_each_cpu (cpu, index, (evlist)->core.all_cpus) +/** Iterator state for evlist__for_each_cpu */ +struct evlist_cpu_iterator { + /** The list being iterated through. */ + struct evlist *container; + /** The current evsel of the iterator. */ + struct evsel *evsel; + /** The CPU map index corresponding to the evsel->core.cpus for the current CPU. */ + int cpu_map_idx; + /** + * The CPU map index corresponding to evlist->core.all_cpus for the + * current CPU. Distinct from cpu_map_idx as the evsel's cpu map may + * contain fewer entries. + */ + int evlist_cpu_map_idx; + /** The number of CPU map entries in evlist->core.all_cpus. */ + int evlist_cpu_map_nr; + /** The current CPU of the iterator. */ + struct perf_cpu cpu; + /** If present, used to set the affinity when switching between CPUs. */ + struct affinity *affinity; +}; + +/** + * evlist__for_each_cpu - without affinity, iterate over the evlist. With + * affinity, iterate over all CPUs and then the evlist + * for each evsel on that CPU. When switching between + * CPUs the affinity is set to the CPU to avoid IPIs + * during syscalls. + * @evlist_cpu_itr: the iterator instance. + * @evlist: evlist instance to iterate. + * @affinity: NULL or used to set the affinity to the current CPU. + */ +#define evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity) \ + for ((evlist_cpu_itr) = evlist__cpu_begin(evlist, affinity); \ + !evlist_cpu_iterator__end(&evlist_cpu_itr); \ + evlist_cpu_iterator__next(&evlist_cpu_itr)) + +/** Returns an iterator set to the first CPU/evsel of evlist. */ +struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity); +/** Move to next element in iterator, updating CPU, evsel and the affinity. */ +void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr); +/** Returns true when iterator is at the end of the CPUs and evlist. */ +bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr); struct evsel *evlist__get_tracking_event(struct evlist *evlist); void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel); -void evlist__cpu_iter_start(struct evlist *evlist); -bool evsel__cpu_iter_skip(struct evsel *ev, int cpu); -bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu); - struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str); struct evsel *evlist__event2evsel(struct evlist *evlist, union perf_event *event); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index f29d37004f55..2f6b18af49e5 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1372,9 +1372,9 @@ int evsel__append_addr_filter(struct evsel *evsel, const char *filter) } /* Caller has to clear disabled after going through all CPUs. */ -int evsel__enable_cpu(struct evsel *evsel, int cpu) +int evsel__enable_cpu(struct evsel *evsel, int cpu_map_idx) { - return perf_evsel__enable_cpu(&evsel->core, cpu); + return perf_evsel__enable_cpu(&evsel->core, cpu_map_idx); } int evsel__enable(struct evsel *evsel) @@ -1387,9 +1387,9 @@ int evsel__enable(struct evsel *evsel) } /* Caller has to set disabled after going through all CPUs. */ -int evsel__disable_cpu(struct evsel *evsel, int cpu) +int evsel__disable_cpu(struct evsel *evsel, int cpu_map_idx) { - return perf_evsel__disable_cpu(&evsel->core, cpu); + return perf_evsel__disable_cpu(&evsel->core, cpu_map_idx); } int evsel__disable(struct evsel *evsel) @@ -1455,7 +1455,7 @@ void evsel__delete(struct evsel *evsel) free(evsel); } -void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, +void evsel__compute_deltas(struct evsel *evsel, int cpu_map_idx, int thread, struct perf_counts_values *count) { struct perf_counts_values tmp; @@ -1463,12 +1463,12 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, if (!evsel->prev_raw_counts) return; - if (cpu == -1) { + if (cpu_map_idx == -1) { tmp = evsel->prev_raw_counts->aggr; evsel->prev_raw_counts->aggr = *count; } else { - tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread); - *perf_counts(evsel->prev_raw_counts, cpu, thread) = *count; + tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread); + *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count; } count->val = count->val - tmp.val; @@ -1476,46 +1476,28 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, count->run = count->run - tmp.run; } -void perf_counts_values__scale(struct perf_counts_values *count, - bool scale, s8 *pscaled) +static int evsel__read_one(struct evsel *evsel, int cpu_map_idx, int thread) { - s8 scaled = 0; + struct perf_counts_values *count = perf_counts(evsel->counts, cpu_map_idx, thread); - if (scale) { - if (count->run == 0) { - scaled = -1; - count->val = 0; - } else if (count->run < count->ena) { - scaled = 1; - count->val = (u64)((double) count->val * count->ena / count->run); - } - } - - if (pscaled) - *pscaled = scaled; -} - -static int evsel__read_one(struct evsel *evsel, int cpu, int thread) -{ - struct perf_counts_values *count = perf_counts(evsel->counts, cpu, thread); - - return perf_evsel__read(&evsel->core, cpu, thread, count); + return perf_evsel__read(&evsel->core, cpu_map_idx, thread, count); } -static void evsel__set_count(struct evsel *counter, int cpu, int thread, u64 val, u64 ena, u64 run) +static void evsel__set_count(struct evsel *counter, int cpu_map_idx, int thread, + u64 val, u64 ena, u64 run) { struct perf_counts_values *count; - count = perf_counts(counter->counts, cpu, thread); + count = perf_counts(counter->counts, cpu_map_idx, thread); count->val = val; count->ena = ena; count->run = run; - perf_counts__set_loaded(counter->counts, cpu, thread, true); + perf_counts__set_loaded(counter->counts, cpu_map_idx, thread, true); } -static int evsel__process_group_data(struct evsel *leader, int cpu, int thread, u64 *data) +static int evsel__process_group_data(struct evsel *leader, int cpu_map_idx, int thread, u64 *data) { u64 read_format = leader->core.attr.read_format; struct sample_read_value *v; @@ -1534,7 +1516,7 @@ static int evsel__process_group_data(struct evsel *leader, int cpu, int thread, v = (struct sample_read_value *) data; - evsel__set_count(leader, cpu, thread, v[0].value, ena, run); + evsel__set_count(leader, cpu_map_idx, thread, v[0].value, ena, run); for (i = 1; i < nr; i++) { struct evsel *counter; @@ -1543,13 +1525,13 @@ static int evsel__process_group_data(struct evsel *leader, int cpu, int thread, if (!counter) return -EINVAL; - evsel__set_count(counter, cpu, thread, v[i].value, ena, run); + evsel__set_count(counter, cpu_map_idx, thread, v[i].value, ena, run); } return 0; } -static int evsel__read_group(struct evsel *leader, int cpu, int thread) +static int evsel__read_group(struct evsel *leader, int cpu_map_idx, int thread) { struct perf_stat_evsel *ps = leader->stats; u64 read_format = leader->core.attr.read_format; @@ -1570,67 +1552,67 @@ static int evsel__read_group(struct evsel *leader, int cpu, int thread) ps->group_data = data; } - if (FD(leader, cpu, thread) < 0) + if (FD(leader, cpu_map_idx, thread) < 0) return -EINVAL; - if (readn(FD(leader, cpu, thread), data, size) <= 0) + if (readn(FD(leader, cpu_map_idx, thread), data, size) <= 0) return -errno; - return evsel__process_group_data(leader, cpu, thread, data); + return evsel__process_group_data(leader, cpu_map_idx, thread, data); } -int evsel__read_counter(struct evsel *evsel, int cpu, int thread) +int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread) { u64 read_format = evsel->core.attr.read_format; if (read_format & PERF_FORMAT_GROUP) - return evsel__read_group(evsel, cpu, thread); + return evsel__read_group(evsel, cpu_map_idx, thread); - return evsel__read_one(evsel, cpu, thread); + return evsel__read_one(evsel, cpu_map_idx, thread); } -int __evsel__read_on_cpu(struct evsel *evsel, int cpu, int thread, bool scale) +int __evsel__read_on_cpu(struct evsel *evsel, int cpu_map_idx, int thread, bool scale) { struct perf_counts_values count; size_t nv = scale ? 3 : 1; - if (FD(evsel, cpu, thread) < 0) + if (FD(evsel, cpu_map_idx, thread) < 0) return -EINVAL; - if (evsel->counts == NULL && evsel__alloc_counts(evsel, cpu + 1, thread + 1) < 0) + if (evsel->counts == NULL && evsel__alloc_counts(evsel) < 0) return -ENOMEM; - if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) <= 0) + if (readn(FD(evsel, cpu_map_idx, thread), &count, nv * sizeof(u64)) <= 0) return -errno; - evsel__compute_deltas(evsel, cpu, thread, &count); + evsel__compute_deltas(evsel, cpu_map_idx, thread, &count); perf_counts_values__scale(&count, scale, NULL); - *perf_counts(evsel->counts, cpu, thread) = count; + *perf_counts(evsel->counts, cpu_map_idx, thread) = count; return 0; } static int evsel__match_other_cpu(struct evsel *evsel, struct evsel *other, - int cpu) + int cpu_map_idx) { - int cpuid; + struct perf_cpu cpu; - cpuid = perf_cpu_map__cpu(evsel->core.cpus, cpu); - return perf_cpu_map__idx(other->core.cpus, cpuid); + cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx); + return perf_cpu_map__idx(other->core.cpus, cpu); } -static int evsel__hybrid_group_cpu(struct evsel *evsel, int cpu) +static int evsel__hybrid_group_cpu_map_idx(struct evsel *evsel, int cpu_map_idx) { struct evsel *leader = evsel__leader(evsel); if ((evsel__is_hybrid(evsel) && !evsel__is_hybrid(leader)) || (!evsel__is_hybrid(evsel) && evsel__is_hybrid(leader))) { - return evsel__match_other_cpu(evsel, leader, cpu); + return evsel__match_other_cpu(evsel, leader, cpu_map_idx); } - return cpu; + return cpu_map_idx; } -static int get_group_fd(struct evsel *evsel, int cpu, int thread) +static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread) { struct evsel *leader = evsel__leader(evsel); int fd; @@ -1644,11 +1626,11 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread) */ BUG_ON(!leader->core.fd); - cpu = evsel__hybrid_group_cpu(evsel, cpu); - if (cpu == -1) + cpu_map_idx = evsel__hybrid_group_cpu_map_idx(evsel, cpu_map_idx); + if (cpu_map_idx == -1) return -1; - fd = FD(leader, cpu, thread); + fd = FD(leader, cpu_map_idx, thread); BUG_ON(fd == -1); return fd; @@ -1662,16 +1644,16 @@ static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int } static int update_fds(struct evsel *evsel, - int nr_cpus, int cpu_idx, + int nr_cpus, int cpu_map_idx, int nr_threads, int thread_idx) { struct evsel *pos; - if (cpu_idx >= nr_cpus || thread_idx >= nr_threads) + if (cpu_map_idx >= nr_cpus || thread_idx >= nr_threads) return -EINVAL; evlist__for_each_entry(evsel->evlist, pos) { - nr_cpus = pos != evsel ? nr_cpus : cpu_idx; + nr_cpus = pos != evsel ? nr_cpus : cpu_map_idx; evsel__remove_fd(pos, nr_cpus, nr_threads, thread_idx); @@ -1685,10 +1667,10 @@ static int update_fds(struct evsel *evsel, return 0; } -bool evsel__ignore_missing_thread(struct evsel *evsel, - int nr_cpus, int cpu, - struct perf_thread_map *threads, - int thread, int err) +static bool evsel__ignore_missing_thread(struct evsel *evsel, + int nr_cpus, int cpu_map_idx, + struct perf_thread_map *threads, + int thread, int err) { pid_t ignore_pid = perf_thread_map__pid(threads, thread); @@ -1711,7 +1693,7 @@ bool evsel__ignore_missing_thread(struct evsel *evsel, * We should remove fd for missing_thread first * because thread_map__remove() will decrease threads->nr. */ - if (update_fds(evsel, nr_cpus, cpu, threads->nr, thread)) + if (update_fds(evsel, nr_cpus, cpu_map_idx, threads->nr, thread)) return false; if (thread_map__remove(threads, thread)) @@ -1993,9 +1975,9 @@ bool evsel__increase_rlimit(enum rlimit_action *set_rlimit) static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads, - int start_cpu, int end_cpu) + int start_cpu_map_idx, int end_cpu_map_idx) { - int cpu, thread, nthreads; + int idx, thread, nthreads; int pid = -1, err, old_errno; enum rlimit_action set_rlimit = NO_CHANGE; @@ -2022,7 +2004,7 @@ fallback_missing_features: display_attr(&evsel->core.attr); - for (cpu = start_cpu; cpu < end_cpu; cpu++) { + for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) { for (thread = 0; thread < nthreads; thread++) { int fd, group_fd; @@ -2033,17 +2015,17 @@ retry_open: if (!evsel->cgrp && !evsel->core.system_wide) pid = perf_thread_map__pid(threads, thread); - group_fd = get_group_fd(evsel, cpu, thread); + group_fd = get_group_fd(evsel, idx, thread); test_attr__ready(); pr_debug2_peo("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx", - pid, cpus->map[cpu], group_fd, evsel->open_flags); + pid, cpus->map[idx].cpu, group_fd, evsel->open_flags); - fd = sys_perf_event_open(&evsel->core.attr, pid, cpus->map[cpu], + fd = sys_perf_event_open(&evsel->core.attr, pid, cpus->map[idx].cpu, group_fd, evsel->open_flags); - FD(evsel, cpu, thread) = fd; + FD(evsel, idx, thread) = fd; if (fd < 0) { err = -errno; @@ -2053,10 +2035,10 @@ retry_open: goto try_fallback; } - bpf_counter__install_pe(evsel, cpu, fd); + bpf_counter__install_pe(evsel, idx, fd); if (unlikely(test_attr__enabled)) { - test_attr__open(&evsel->core.attr, pid, cpus->map[cpu], + test_attr__open(&evsel->core.attr, pid, cpus->map[idx], fd, group_fd, evsel->open_flags); } @@ -2097,7 +2079,7 @@ try_fallback: if (evsel__precise_ip_fallback(evsel)) goto retry_open; - if (evsel__ignore_missing_thread(evsel, cpus->nr, cpu, threads, thread, err)) { + if (evsel__ignore_missing_thread(evsel, cpus->nr, idx, threads, thread, err)) { /* We just removed 1 thread, so lower the upper nthreads limit. */ nthreads--; @@ -2112,7 +2094,7 @@ try_fallback: if (err == -EMFILE && evsel__increase_rlimit(&set_rlimit)) goto retry_open; - if (err != -EINVAL || cpu > 0 || thread > 0) + if (err != -EINVAL || idx > 0 || thread > 0) goto out_close; if (evsel__detect_missing_features(evsel)) @@ -2124,12 +2106,12 @@ out_close: old_errno = errno; do { while (--thread >= 0) { - if (FD(evsel, cpu, thread) >= 0) - close(FD(evsel, cpu, thread)); - FD(evsel, cpu, thread) = -1; + if (FD(evsel, idx, thread) >= 0) + close(FD(evsel, idx, thread)); + FD(evsel, idx, thread) = -1; } thread = nthreads; - } while (--cpu >= 0); + } while (--idx >= 0); errno = old_errno; return err; } @@ -2146,13 +2128,13 @@ void evsel__close(struct evsel *evsel) perf_evsel__free_id(&evsel->core); } -int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu) +int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu_map_idx) { - if (cpu == -1) + if (cpu_map_idx == -1) return evsel__open_cpu(evsel, cpus, NULL, 0, cpus ? cpus->nr : 1); - return evsel__open_cpu(evsel, cpus, NULL, cpu, cpu + 1); + return evsel__open_cpu(evsel, cpus, NULL, cpu_map_idx, cpu_map_idx + 1); } int evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads) @@ -2952,6 +2934,10 @@ int evsel__open_strerror(struct evsel *evsel, struct target *target, return scnprintf(msg, size, "wrong clockid (%d).", clockid); if (perf_missing_features.aux_output) return scnprintf(msg, size, "The 'aux_output' feature is not supported, update the kernel."); + if (!target__has_cpu(target)) + return scnprintf(msg, size, + "Invalid event (%s) in per-thread mode, enable system wide with '-a'.", + evsel__name(evsel)); break; case ENODATA: return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. " @@ -2975,15 +2961,15 @@ struct perf_env *evsel__env(struct evsel *evsel) static int store_evsel_ids(struct evsel *evsel, struct evlist *evlist) { - int cpu, thread; + int cpu_map_idx, thread; - for (cpu = 0; cpu < xyarray__max_x(evsel->core.fd); cpu++) { + for (cpu_map_idx = 0; cpu_map_idx < xyarray__max_x(evsel->core.fd); cpu_map_idx++) { for (thread = 0; thread < xyarray__max_y(evsel->core.fd); thread++) { - int fd = FD(evsel, cpu, thread); + int fd = FD(evsel, cpu_map_idx, thread); if (perf_evlist__id_add_fd(&evlist->core, &evsel->core, - cpu, thread, fd) < 0) + cpu_map_idx, thread, fd) < 0) return -1; } } diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 29d49a8c1e92..5720ceebffac 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -121,7 +121,6 @@ struct evsel { bool errored; struct hashmap *per_pkg_mask; int err; - int cpu_iter; struct { evsel__sb_cb_t *cb; void *data; @@ -195,9 +194,6 @@ static inline int evsel__nr_cpus(struct evsel *evsel) return evsel__cpus(evsel)->nr; } -void perf_counts_values__scale(struct perf_counts_values *count, - bool scale, s8 *pscaled); - void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, struct perf_counts_values *count); @@ -288,12 +284,12 @@ void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr); int evsel__set_filter(struct evsel *evsel, const char *filter); int evsel__append_tp_filter(struct evsel *evsel, const char *filter); int evsel__append_addr_filter(struct evsel *evsel, const char *filter); -int evsel__enable_cpu(struct evsel *evsel, int cpu); +int evsel__enable_cpu(struct evsel *evsel, int cpu_map_idx); int evsel__enable(struct evsel *evsel); int evsel__disable(struct evsel *evsel); -int evsel__disable_cpu(struct evsel *evsel, int cpu); +int evsel__disable_cpu(struct evsel *evsel, int cpu_map_idx); -int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu); +int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu_map_idx); int evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads); int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads); @@ -305,10 +301,6 @@ bool evsel__detect_missing_features(struct evsel *evsel); enum rlimit_action { NO_CHANGE, SET_TO_MAX, INCREASED_MAX }; bool evsel__increase_rlimit(enum rlimit_action *set_rlimit); -bool evsel__ignore_missing_thread(struct evsel *evsel, - int nr_cpus, int cpu, - struct perf_thread_map *threads, - int thread, int err); bool evsel__precise_ip_fallback(struct evsel *evsel); struct perf_sample; @@ -337,32 +329,32 @@ static inline bool evsel__match2(struct evsel *e1, struct evsel *e2) (e1->core.attr.config == e2->core.attr.config); } -int evsel__read_counter(struct evsel *evsel, int cpu, int thread); +int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread); -int __evsel__read_on_cpu(struct evsel *evsel, int cpu, int thread, bool scale); +int __evsel__read_on_cpu(struct evsel *evsel, int cpu_map_idx, int thread, bool scale); /** * evsel__read_on_cpu - Read out the results on a CPU and thread * * @evsel - event selector to read value - * @cpu - CPU of interest + * @cpu_map_idx - CPU of interest * @thread - thread of interest */ -static inline int evsel__read_on_cpu(struct evsel *evsel, int cpu, int thread) +static inline int evsel__read_on_cpu(struct evsel *evsel, int cpu_map_idx, int thread) { - return __evsel__read_on_cpu(evsel, cpu, thread, false); + return __evsel__read_on_cpu(evsel, cpu_map_idx, thread, false); } /** * evsel__read_on_cpu_scaled - Read out the results on a CPU and thread, scaled * * @evsel - event selector to read value - * @cpu - CPU of interest + * @cpu_map_idx - CPU of interest * @thread - thread of interest */ -static inline int evsel__read_on_cpu_scaled(struct evsel *evsel, int cpu, int thread) +static inline int evsel__read_on_cpu_scaled(struct evsel *evsel, int cpu_map_idx, int thread) { - return __evsel__read_on_cpu(evsel, cpu, thread, true); + return __evsel__read_on_cpu(evsel, cpu_map_idx, thread, true); } int evsel__parse_sample(struct evsel *evsel, union perf_event *event, diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c index 666b59baeb70..675f318ce7c1 100644 --- a/tools/perf/util/expr.c +++ b/tools/perf/util/expr.c @@ -405,12 +405,17 @@ double expr_id_data__source_count(const struct expr_id_data *data) double expr__get_literal(const char *literal) { static struct cpu_topology *topology; + double result = NAN; - if (!strcmp("#smt_on", literal)) - return smt_on() > 0 ? 1.0 : 0.0; + if (!strcasecmp("#smt_on", literal)) { + result = smt_on() > 0 ? 1.0 : 0.0; + goto out; + } - if (!strcmp("#num_cpus", literal)) - return cpu__max_present_cpu(); + if (!strcmp("#num_cpus", literal)) { + result = cpu__max_present_cpu().cpu; + goto out; + } /* * Assume that topology strings are consistent, such as CPUs "0-1" @@ -422,16 +427,24 @@ double expr__get_literal(const char *literal) topology = cpu_topology__new(); if (!topology) { pr_err("Error creating CPU topology"); - return NAN; + goto out; } } - if (!strcmp("#num_packages", literal)) - return topology->package_cpus_lists; - if (!strcmp("#num_dies", literal)) - return topology->die_cpus_lists; - if (!strcmp("#num_cores", literal)) - return topology->core_cpus_lists; + if (!strcmp("#num_packages", literal)) { + result = topology->package_cpus_lists; + goto out; + } + if (!strcmp("#num_dies", literal)) { + result = topology->die_cpus_lists; + goto out; + } + if (!strcmp("#num_cores", literal)) { + result = topology->core_cpus_lists; + goto out; + } pr_err("Unrecognized literal '%s'", literal); - return NAN; +out: + pr_debug2("literal: %s = %f\n", literal, result); + return result; } diff --git a/tools/perf/util/ftrace.h b/tools/perf/util/ftrace.h new file mode 100644 index 000000000000..887f68a185f7 --- /dev/null +++ b/tools/perf/util/ftrace.h @@ -0,0 +1,81 @@ +#ifndef __PERF_FTRACE_H__ +#define __PERF_FTRACE_H__ + +#include <linux/list.h> + +#include "target.h" + +struct evlist; + +struct perf_ftrace { + struct evlist *evlist; + struct target target; + const char *tracer; + struct list_head filters; + struct list_head notrace; + struct list_head graph_funcs; + struct list_head nograph_funcs; + unsigned long percpu_buffer_size; + bool inherit; + int graph_depth; + int func_stack_trace; + int func_irq_info; + int graph_nosleep_time; + int graph_noirqs; + int graph_verbose; + int graph_thresh; + unsigned int initial_delay; +}; + +struct filter_entry { + struct list_head list; + char name[]; +}; + +#define NUM_BUCKET 22 /* 20 + 2 (for outliers in both direction) */ + +#ifdef HAVE_BPF_SKEL + +int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace); +int perf_ftrace__latency_start_bpf(struct perf_ftrace *ftrace); +int perf_ftrace__latency_stop_bpf(struct perf_ftrace *ftrace); +int perf_ftrace__latency_read_bpf(struct perf_ftrace *ftrace, + int buckets[]); +int perf_ftrace__latency_cleanup_bpf(struct perf_ftrace *ftrace); + +#else /* !HAVE_BPF_SKEL */ + +static inline int +perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_start_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_stop_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_read_bpf(struct perf_ftrace *ftrace __maybe_unused, + int buckets[] __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_cleanup_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +#endif /* HAVE_BPF_SKEL */ + +#endif /* __PERF_FTRACE_H__ */ diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index e3c1a532d059..6da12e522edc 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -472,7 +472,7 @@ static int write_nrcpus(struct feat_fd *ff, u32 nrc, nra; int ret; - nrc = cpu__max_present_cpu(); + nrc = cpu__max_present_cpu().cpu; nr = sysconf(_SC_NPROCESSORS_ONLN); if (nr < 0) @@ -1163,7 +1163,7 @@ static int build_caches(struct cpu_cache_level caches[], u32 *cntp) u32 nr, cpu; u16 level; - nr = cpu__max_cpu(); + nr = cpu__max_cpu().cpu; for (cpu = 0; cpu < nr; cpu++) { for (level = 0; level < MAX_CACHE_LVL; level++) { @@ -1195,7 +1195,7 @@ static int build_caches(struct cpu_cache_level caches[], u32 *cntp) static int write_cache(struct feat_fd *ff, struct evlist *evlist __maybe_unused) { - u32 max_caches = cpu__max_cpu() * MAX_CACHE_LVL; + u32 max_caches = cpu__max_cpu().cpu * MAX_CACHE_LVL; struct cpu_cache_level caches[max_caches]; u32 cnt = 0, i, version = 1; int ret; diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b776465e04ef..0a8033b09e28 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -211,7 +211,9 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h) hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10); hists__new_col_len(hists, HISTC_LOCAL_INS_LAT, 13); hists__new_col_len(hists, HISTC_GLOBAL_INS_LAT, 13); - hists__new_col_len(hists, HISTC_P_STAGE_CYC, 13); + hists__new_col_len(hists, HISTC_LOCAL_P_STAGE_CYC, 13); + hists__new_col_len(hists, HISTC_GLOBAL_P_STAGE_CYC, 13); + if (symbol_conf.nanosecs) hists__new_col_len(hists, HISTC_TIME, 16); else diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 621f35ae1efa..2a15e22fb89c 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -75,7 +75,8 @@ enum hist_column { HISTC_MEM_BLOCKED, HISTC_LOCAL_INS_LAT, HISTC_GLOBAL_INS_LAT, - HISTC_P_STAGE_CYC, + HISTC_LOCAL_P_STAGE_CYC, + HISTC_GLOBAL_P_STAGE_CYC, HISTC_NR_COLS, /* Last entry */ }; diff --git a/tools/perf/util/libunwind/arm64.c b/tools/perf/util/libunwind/arm64.c index c397be0c2e32..15f60fd09424 100644 --- a/tools/perf/util/libunwind/arm64.c +++ b/tools/perf/util/libunwind/arm64.c @@ -23,7 +23,9 @@ #include "unwind.h" #include "libunwind-aarch64.h" +#define perf_event_arm_regs perf_event_arm64_regs #include <../../../../arch/arm64/include/uapi/asm/perf_regs.h> +#undef perf_event_arm_regs #include "../../arch/arm64/util/unwind-libunwind.c" /* NO_LIBUNWIND_DEBUG_FRAME is a feature flag for local libunwind, diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index fb8496df8432..3901440aeff9 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -34,6 +34,7 @@ #include "bpf-event.h" #include <internal/lib.h> // page_size #include "cgroup.h" +#include "arm64-frame-pointer-unwind-support.h" #include <linux/ctype.h> #include <symbol/kallsyms.h> @@ -2710,6 +2711,15 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread, return err; } +static u64 get_leaf_frame_caller(struct perf_sample *sample, + struct thread *thread, int usr_idx) +{ + if (machine__normalized_is(thread->maps->machine, "arm64")) + return get_leaf_frame_caller_aarch64(sample, thread, usr_idx); + else + return 0; +} + static int thread__resolve_callchain_sample(struct thread *thread, struct callchain_cursor *cursor, struct evsel *evsel, @@ -2723,9 +2733,10 @@ static int thread__resolve_callchain_sample(struct thread *thread, struct ip_callchain *chain = sample->callchain; int chain_nr = 0; u8 cpumode = PERF_RECORD_MISC_USER; - int i, j, err, nr_entries; + int i, j, err, nr_entries, usr_idx; int skip_idx = -1; int first_call = 0; + u64 leaf_frame_caller; if (chain) chain_nr = chain->nr; @@ -2850,6 +2861,34 @@ check_calls: continue; } + /* + * PERF_CONTEXT_USER allows us to locate where the user stack ends. + * Depending on callchain_param.order and the position of PERF_CONTEXT_USER, + * the index will be different in order to add the missing frame + * at the right place. + */ + + usr_idx = callchain_param.order == ORDER_CALLEE ? j-2 : j-1; + + if (usr_idx >= 0 && chain->ips[usr_idx] == PERF_CONTEXT_USER) { + + leaf_frame_caller = get_leaf_frame_caller(sample, thread, usr_idx); + + /* + * check if leaf_frame_Caller != ip to not add the same + * value twice. + */ + + if (leaf_frame_caller && leaf_frame_caller != ip) { + + err = add_callchain_ip(thread, cursor, parent, + root_al, &cpumode, leaf_frame_caller, + false, NULL, NULL, 0); + if (err) + return (err < 0) ? err : 0; + } + } + err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip, false, NULL, NULL, 0); @@ -3079,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid, } /* - * Compares the raw arch string. N.B. see instead perf_env__arch() if a - * normalized arch is needed. + * Compares the raw arch string. N.B. see instead perf_env__arch() or + * machine__normalized_is() if a normalized arch is needed. */ bool machine__is(struct machine *machine, const char *arch) { return machine && !strcmp(perf_env__raw_arch(machine->env), arch); } +bool machine__normalized_is(struct machine *machine, const char *arch) +{ + return machine && !strcmp(perf_env__arch(machine->env), arch); +} + int machine__nr_cpus_avail(struct machine *machine) { return machine ? perf_env__nr_cpus_avail(machine->env) : 0; diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h index a143087eeb47..c5a45dc8df4c 100644 --- a/tools/perf/util/machine.h +++ b/tools/perf/util/machine.h @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine) } bool machine__is(struct machine *machine, const char *arch); +bool machine__normalized_is(struct machine *machine, const char *arch); int machine__nr_cpus_avail(struct machine *machine); struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid); diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c index 3167b4628b6d..ed0ab838bcc5 100644 --- a/tools/perf/util/mem-events.c +++ b/tools/perf/util/mem-events.c @@ -309,6 +309,9 @@ static const char * const mem_hops[] = { * to be set with mem_hops field. */ "core, same node", + "node, same socket", + "socket, same board", + "board", }; int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) @@ -316,7 +319,7 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) size_t i, l = 0; u64 m = PERF_MEM_LVL_NA; u64 hit, miss; - int printed; + int printed = 0; if (mem_info) m = mem_info->data_src.mem_lvl; @@ -335,18 +338,22 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) l += 7; } - if (mem_info && mem_info->data_src.mem_hops) + /* + * Incase mem_hops field is set, we can skip printing data source via + * PERF_MEM_LVL namespace. + */ + if (mem_info && mem_info->data_src.mem_hops) { l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]); - - printed = 0; - for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) { - if (!(m & 0x1)) - continue; - if (printed++) { - strcat(out, " or "); - l += 4; + } else { + for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) { + if (!(m & 0x1)) + continue; + if (printed++) { + strcat(out, " or "); + l += 4; + } + l += scnprintf(out + l, sz - l, mem_lvl[i]); } - l += scnprintf(out + l, sz - l, mem_lvl[i]); } if (mem_info && mem_info->data_src.mem_lvl_num) { diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index fffe02aae3ed..d8492e339521 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -209,8 +209,8 @@ static struct metric *metric__new(const struct pmu_event *pe, m->metric_name = pe->metric_name; m->modifier = modifier ? strdup(modifier) : NULL; if (modifier && !m->modifier) { - free(m); expr__ctx_free(m->pctx); + free(m); return NULL; } m->metric_expr = pe->metric_expr; @@ -314,7 +314,7 @@ static int setup_metric_events(struct hashmap *ids, */ metric_id = evsel__metric_id(ev); evlist__for_each_entry_continue(metric_evlist, ev) { - if (!strcmp(evsel__metric_id(metric_events[i]), metric_id)) + if (!strcmp(evsel__metric_id(ev), metric_id)) ev->metric_leader = metric_events[i]; } } @@ -1115,13 +1115,27 @@ out: return ret; } +/** + * metric_list_cmp - list_sort comparator that sorts metrics with more events to + * the front. duration_time is excluded from the count. + */ static int metric_list_cmp(void *priv __maybe_unused, const struct list_head *l, const struct list_head *r) { const struct metric *left = container_of(l, struct metric, nd); const struct metric *right = container_of(r, struct metric, nd); + struct expr_id_data *data; + int left_count, right_count; + + left_count = hashmap__size(left->pctx->ids); + if (!expr__get_id(left->pctx, "duration_time", &data)) + left_count--; + + right_count = hashmap__size(right->pctx->ids); + if (!expr__get_id(right->pctx, "duration_time", &data)) + right_count--; - return hashmap__size(right->pctx->ids) - hashmap__size(left->pctx->ids); + return right_count - left_count; } /** @@ -1299,14 +1313,16 @@ err_out: /** * parse_ids - Build the event string for the ids and parse them creating an * evlist. The encoded metric_ids are decoded. + * @metric_no_merge: is metric sharing explicitly disabled. * @fake_pmu: used when testing metrics not supported by the current CPU. * @ids: the event identifiers parsed from a metric. * @modifier: any modifiers added to the events. * @has_constraint: false if events should be placed in a weak group. * @out_evlist: the created list of events. */ -static int parse_ids(struct perf_pmu *fake_pmu, struct expr_parse_ctx *ids, - const char *modifier, bool has_constraint, struct evlist **out_evlist) +static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu, + struct expr_parse_ctx *ids, const char *modifier, + bool has_constraint, struct evlist **out_evlist) { struct parse_events_error parse_error; struct evlist *parsed_evlist; @@ -1314,12 +1330,19 @@ static int parse_ids(struct perf_pmu *fake_pmu, struct expr_parse_ctx *ids, int ret; *out_evlist = NULL; - if (hashmap__size(ids->ids) == 0) { + if (!metric_no_merge || hashmap__size(ids->ids) == 0) { char *tmp; /* - * No ids/events in the expression parsing context. Events may - * have been removed because of constant evaluation, e.g.: - * event1 if #smt_on else 0 + * We may fail to share events between metrics because + * duration_time isn't present in one metric. For example, a + * ratio of cache misses doesn't need duration_time but the same + * events may be used for a misses per second. Events without + * sharing implies multiplexing, that is best avoided, so place + * duration_time in every group. + * + * Also, there may be no ids/events in the expression parsing + * context because of constant evaluation, e.g.: + * event1 if #smt_on else 0 * Add a duration_time event to avoid a parse error on an empty * string. */ @@ -1387,7 +1410,8 @@ static int parse_groups(struct evlist *perf_evlist, const char *str, ret = build_combined_expr_ctx(&metric_list, &combined); if (!ret && combined && hashmap__size(combined->ids)) { - ret = parse_ids(fake_pmu, combined, /*modifier=*/NULL, + ret = parse_ids(metric_no_merge, fake_pmu, combined, + /*modifier=*/NULL, /*has_constraint=*/true, &combined_evlist); } @@ -1435,7 +1459,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str, } } if (!metric_evlist) { - ret = parse_ids(fake_pmu, m->pctx, m->modifier, + ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier, m->has_constraint, &m->evlist); if (ret) goto out; diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index 23ecdba9e670..12261ed8c15b 100644 --- a/tools/perf/util/mmap.c +++ b/tools/perf/util/mmap.c @@ -94,7 +94,7 @@ static void perf_mmap__aio_free(struct mmap *map, int idx) } } -static int perf_mmap__aio_bind(struct mmap *map, int idx, int cpu, int affinity) +static int perf_mmap__aio_bind(struct mmap *map, int idx, struct perf_cpu cpu, int affinity) { void *data; size_t mmap_len; @@ -138,7 +138,7 @@ static void perf_mmap__aio_free(struct mmap *map, int idx) } static int perf_mmap__aio_bind(struct mmap *map __maybe_unused, int idx __maybe_unused, - int cpu __maybe_unused, int affinity __maybe_unused) + struct perf_cpu cpu __maybe_unused, int affinity __maybe_unused) { return 0; } @@ -240,7 +240,8 @@ void mmap__munmap(struct mmap *map) static void build_node_mask(int node, struct mmap_cpu_mask *mask) { - int c, cpu, nr_cpus; + int idx, nr_cpus; + struct perf_cpu cpu; const struct perf_cpu_map *cpu_map = NULL; cpu_map = cpu_map__online(); @@ -248,16 +249,16 @@ static void build_node_mask(int node, struct mmap_cpu_mask *mask) return; nr_cpus = perf_cpu_map__nr(cpu_map); - for (c = 0; c < nr_cpus; c++) { - cpu = cpu_map->map[c]; /* map c index to online cpu index */ + for (idx = 0; idx < nr_cpus; idx++) { + cpu = cpu_map->map[idx]; /* map c index to online cpu index */ if (cpu__get_node(cpu) == node) - set_bit(cpu, mask->bits); + set_bit(cpu.cpu, mask->bits); } } static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp) { - map->affinity_mask.nbits = cpu__max_cpu(); + map->affinity_mask.nbits = cpu__max_cpu().cpu; map->affinity_mask.bits = bitmap_zalloc(map->affinity_mask.nbits); if (!map->affinity_mask.bits) return -1; @@ -265,12 +266,12 @@ static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params * if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1) build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask); else if (mp->affinity == PERF_AFFINITY_CPU) - set_bit(map->core.cpu, map->affinity_mask.bits); + set_bit(map->core.cpu.cpu, map->affinity_mask.bits); return 0; } -int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu) +int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu cpu) { if (perf_mmap__mmap(&map->core, &mp->core, fd, cpu)) { pr_debug2("failed to mmap perf event ring buffer, error %d\n", diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h index 8e259b9610f8..83f6bd4d4082 100644 --- a/tools/perf/util/mmap.h +++ b/tools/perf/util/mmap.h @@ -7,6 +7,7 @@ #include <linux/types.h> #include <linux/ring_buffer.h> #include <linux/bitops.h> +#include <perf/cpumap.h> #include <stdbool.h> #include <pthread.h> // for cpu_set_t #ifdef HAVE_AIO_SUPPORT @@ -52,7 +53,7 @@ struct mmap_params { struct auxtrace_mmap_params auxtrace_mp; }; -int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu); +int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu cpu); void mmap__munmap(struct mmap *map); union perf_event *perf_mmap__read_forward(struct mmap *map); diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c index 608b20c72a5c..48aa3217300b 100644 --- a/tools/perf/util/namespaces.c +++ b/tools/perf/util/namespaces.c @@ -60,17 +60,49 @@ void namespaces__free(struct namespaces *namespaces) free(namespaces); } +static int nsinfo__get_nspid(struct nsinfo *nsi, const char *path) +{ + FILE *f = NULL; + char *statln = NULL; + size_t linesz = 0; + char *nspid; + + f = fopen(path, "r"); + if (f == NULL) + return -1; + + while (getline(&statln, &linesz, f) != -1) { + /* Use tgid if CONFIG_PID_NS is not defined. */ + if (strstr(statln, "Tgid:") != NULL) { + nsi->tgid = (pid_t)strtol(strrchr(statln, '\t'), + NULL, 10); + nsi->nstgid = nsi->tgid; + } + + if (strstr(statln, "NStgid:") != NULL) { + nspid = strrchr(statln, '\t'); + nsi->nstgid = (pid_t)strtol(nspid, NULL, 10); + /* + * If innermost tgid is not the first, process is in a different + * PID namespace. + */ + nsi->in_pidns = (statln + sizeof("NStgid:") - 1) != nspid; + break; + } + } + + fclose(f); + free(statln); + return 0; +} + int nsinfo__init(struct nsinfo *nsi) { char oldns[PATH_MAX]; char spath[PATH_MAX]; char *newns = NULL; - char *statln = NULL; - char *nspid; struct stat old_stat; struct stat new_stat; - FILE *f = NULL; - size_t linesz = 0; int rv = -1; if (snprintf(oldns, PATH_MAX, "/proc/self/ns/mnt") >= PATH_MAX) @@ -100,34 +132,9 @@ int nsinfo__init(struct nsinfo *nsi) if (snprintf(spath, PATH_MAX, "/proc/%d/status", nsi->pid) >= PATH_MAX) goto out; - f = fopen(spath, "r"); - if (f == NULL) - goto out; - - while (getline(&statln, &linesz, f) != -1) { - /* Use tgid if CONFIG_PID_NS is not defined. */ - if (strstr(statln, "Tgid:") != NULL) { - nsi->tgid = (pid_t)strtol(strrchr(statln, '\t'), - NULL, 10); - nsi->nstgid = nsi->tgid; - } - - if (strstr(statln, "NStgid:") != NULL) { - nspid = strrchr(statln, '\t'); - nsi->nstgid = (pid_t)strtol(nspid, NULL, 10); - /* If innermost tgid is not the first, process is in a different - * PID namespace. - */ - nsi->in_pidns = (statln + sizeof("NStgid:") - 1) != nspid; - break; - } - } - rv = 0; + rv = nsinfo__get_nspid(nsi, spath); out: - if (f != NULL) - (void) fclose(f); - free(statln); free(newns); return rv; } @@ -299,3 +306,12 @@ int nsinfo__stat(const char *filename, struct stat *st, struct nsinfo *nsi) return ret; } + +bool nsinfo__is_in_root_namespace(void) +{ + struct nsinfo nsi; + + memset(&nsi, 0x0, sizeof(nsi)); + nsinfo__get_nspid(&nsi, "/proc/self/status"); + return !nsi.in_pidns; +} diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h index ad9775db7b9c..9ceea9643507 100644 --- a/tools/perf/util/namespaces.h +++ b/tools/perf/util/namespaces.h @@ -59,6 +59,8 @@ void nsinfo__mountns_exit(struct nscookie *nc); char *nsinfo__realpath(const char *path, struct nsinfo *nsi); int nsinfo__stat(const char *filename, struct stat *st, struct nsinfo *nsi); +bool nsinfo__is_in_root_namespace(void); + static inline void __nsinfo__zput(struct nsinfo **nsip) { if (nsip) { diff --git a/tools/perf/util/parse-events-hybrid.c b/tools/perf/util/parse-events-hybrid.c index 9fc86971027b..284f8eabd3b9 100644 --- a/tools/perf/util/parse-events-hybrid.c +++ b/tools/perf/util/parse-events-hybrid.c @@ -63,10 +63,13 @@ static int create_event_hybrid(__u32 config_type, int *idx, static int pmu_cmp(struct parse_events_state *parse_state, struct perf_pmu *pmu) { - if (!parse_state->hybrid_pmu_name) - return 0; + if (parse_state->evlist && parse_state->evlist->hybrid_pmu_name) + return strcmp(parse_state->evlist->hybrid_pmu_name, pmu->name); + + if (parse_state->hybrid_pmu_name) + return strcmp(parse_state->hybrid_pmu_name, pmu->name); - return strcmp(parse_state->hybrid_pmu_name, pmu->name); + return 0; } static int add_hw_hybrid(struct parse_events_state *parse_state, diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index ba74fdf74af9..acf20ce98ce9 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -1824,6 +1824,11 @@ out: return ret; } +__weak struct evsel *arch_evlist__leader(struct list_head *list) +{ + return list_first_entry(list, struct evsel, core.node); +} + void parse_events__set_leader(char *name, struct list_head *list, struct parse_events_state *parse_state) { @@ -1837,9 +1842,10 @@ void parse_events__set_leader(char *name, struct list_head *list, if (parse_events__set_leader_for_uncore_aliase(name, list, parse_state)) return; - __perf_evlist__set_leader(list); - leader = list_entry(list->next, struct evsel, core.node); + leader = arch_evlist__leader(list); + __perf_evlist__set_leader(list, &leader->core); leader->group_name = name ? strdup(name) : NULL; + list_move(&leader->core.node, list); } /* list_event is assumed to point to malloc'ed memory */ diff --git a/tools/perf/util/perf_api_probe.c b/tools/perf/util/perf_api_probe.c index 020411682a3c..734d006d9a8c 100644 --- a/tools/perf/util/perf_api_probe.c +++ b/tools/perf/util/perf_api_probe.c @@ -11,7 +11,7 @@ typedef void (*setup_probe_fn_t)(struct evsel *evsel); -static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) +static int perf_do_probe_api(setup_probe_fn_t fn, struct perf_cpu cpu, const char *str) { struct evlist *evlist; struct evsel *evsel; @@ -29,7 +29,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) evsel = evlist__first(evlist); while (1) { - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu.cpu, -1, flags); if (fd < 0) { if (pid == -1 && errno == EACCES) { pid = 0; @@ -43,7 +43,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) fn(evsel); - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu.cpu, -1, flags); if (fd < 0) { if (errno == EINVAL) err = -EINVAL; @@ -61,7 +61,8 @@ static bool perf_probe_api(setup_probe_fn_t fn) { const char *try[] = {"cycles:u", "instructions:u", "cpu-clock:u", NULL}; struct perf_cpu_map *cpus; - int cpu, ret, i = 0; + struct perf_cpu cpu; + int ret, i = 0; cpus = perf_cpu_map__new(NULL); if (!cpus) @@ -136,15 +137,17 @@ bool perf_can_record_cpu_wide(void) .exclude_kernel = 1, }; struct perf_cpu_map *cpus; - int cpu, fd; + struct perf_cpu cpu; + int fd; cpus = perf_cpu_map__new(NULL); if (!cpus) return false; + cpu = cpus->map[0]; perf_cpu_map__put(cpus); - fd = sys_perf_event_open(&attr, -1, cpu, -1, 0); + fd = sys_perf_event_open(&attr, -1, cpu.cpu, -1, 0); if (fd < 0) return false; close(fd); diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 06a7461ba864..a982e40ee5a9 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include <errno.h> +#include <string.h> #include "perf_regs.h" #include "event.h" @@ -20,6 +21,671 @@ uint64_t __weak arch__user_reg_mask(void) } #ifdef HAVE_PERF_REGS_SUPPORT + +#define perf_event_arm_regs perf_event_arm64_regs +#include "../../arch/arm64/include/uapi/asm/perf_regs.h" +#undef perf_event_arm_regs + +#include "../../arch/arm/include/uapi/asm/perf_regs.h" +#include "../../arch/csky/include/uapi/asm/perf_regs.h" +#include "../../arch/mips/include/uapi/asm/perf_regs.h" +#include "../../arch/powerpc/include/uapi/asm/perf_regs.h" +#include "../../arch/riscv/include/uapi/asm/perf_regs.h" +#include "../../arch/s390/include/uapi/asm/perf_regs.h" +#include "../../arch/x86/include/uapi/asm/perf_regs.h" + +static const char *__perf_reg_name_arm64(int id) +{ + switch (id) { + case PERF_REG_ARM64_X0: + return "x0"; + case PERF_REG_ARM64_X1: + return "x1"; + case PERF_REG_ARM64_X2: + return "x2"; + case PERF_REG_ARM64_X3: + return "x3"; + case PERF_REG_ARM64_X4: + return "x4"; + case PERF_REG_ARM64_X5: + return "x5"; + case PERF_REG_ARM64_X6: + return "x6"; + case PERF_REG_ARM64_X7: + return "x7"; + case PERF_REG_ARM64_X8: + return "x8"; + case PERF_REG_ARM64_X9: + return "x9"; + case PERF_REG_ARM64_X10: + return "x10"; + case PERF_REG_ARM64_X11: + return "x11"; + case PERF_REG_ARM64_X12: + return "x12"; + case PERF_REG_ARM64_X13: + return "x13"; + case PERF_REG_ARM64_X14: + return "x14"; + case PERF_REG_ARM64_X15: + return "x15"; + case PERF_REG_ARM64_X16: + return "x16"; + case PERF_REG_ARM64_X17: + return "x17"; + case PERF_REG_ARM64_X18: + return "x18"; + case PERF_REG_ARM64_X19: + return "x19"; + case PERF_REG_ARM64_X20: + return "x20"; + case PERF_REG_ARM64_X21: + return "x21"; + case PERF_REG_ARM64_X22: + return "x22"; + case PERF_REG_ARM64_X23: + return "x23"; + case PERF_REG_ARM64_X24: + return "x24"; + case PERF_REG_ARM64_X25: + return "x25"; + case PERF_REG_ARM64_X26: + return "x26"; + case PERF_REG_ARM64_X27: + return "x27"; + case PERF_REG_ARM64_X28: + return "x28"; + case PERF_REG_ARM64_X29: + return "x29"; + case PERF_REG_ARM64_SP: + return "sp"; + case PERF_REG_ARM64_LR: + return "lr"; + case PERF_REG_ARM64_PC: + return "pc"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_arm(int id) +{ + switch (id) { + case PERF_REG_ARM_R0: + return "r0"; + case PERF_REG_ARM_R1: + return "r1"; + case PERF_REG_ARM_R2: + return "r2"; + case PERF_REG_ARM_R3: + return "r3"; + case PERF_REG_ARM_R4: + return "r4"; + case PERF_REG_ARM_R5: + return "r5"; + case PERF_REG_ARM_R6: + return "r6"; + case PERF_REG_ARM_R7: + return "r7"; + case PERF_REG_ARM_R8: + return "r8"; + case PERF_REG_ARM_R9: + return "r9"; + case PERF_REG_ARM_R10: + return "r10"; + case PERF_REG_ARM_FP: + return "fp"; + case PERF_REG_ARM_IP: + return "ip"; + case PERF_REG_ARM_SP: + return "sp"; + case PERF_REG_ARM_LR: + return "lr"; + case PERF_REG_ARM_PC: + return "pc"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_csky(int id) +{ + switch (id) { + case PERF_REG_CSKY_A0: + return "a0"; + case PERF_REG_CSKY_A1: + return "a1"; + case PERF_REG_CSKY_A2: + return "a2"; + case PERF_REG_CSKY_A3: + return "a3"; + case PERF_REG_CSKY_REGS0: + return "regs0"; + case PERF_REG_CSKY_REGS1: + return "regs1"; + case PERF_REG_CSKY_REGS2: + return "regs2"; + case PERF_REG_CSKY_REGS3: + return "regs3"; + case PERF_REG_CSKY_REGS4: + return "regs4"; + case PERF_REG_CSKY_REGS5: + return "regs5"; + case PERF_REG_CSKY_REGS6: + return "regs6"; + case PERF_REG_CSKY_REGS7: + return "regs7"; + case PERF_REG_CSKY_REGS8: + return "regs8"; + case PERF_REG_CSKY_REGS9: + return "regs9"; + case PERF_REG_CSKY_SP: + return "sp"; + case PERF_REG_CSKY_LR: + return "lr"; + case PERF_REG_CSKY_PC: + return "pc"; +#if defined(__CSKYABIV2__) + case PERF_REG_CSKY_EXREGS0: + return "exregs0"; + case PERF_REG_CSKY_EXREGS1: + return "exregs1"; + case PERF_REG_CSKY_EXREGS2: + return "exregs2"; + case PERF_REG_CSKY_EXREGS3: + return "exregs3"; + case PERF_REG_CSKY_EXREGS4: + return "exregs4"; + case PERF_REG_CSKY_EXREGS5: + return "exregs5"; + case PERF_REG_CSKY_EXREGS6: + return "exregs6"; + case PERF_REG_CSKY_EXREGS7: + return "exregs7"; + case PERF_REG_CSKY_EXREGS8: + return "exregs8"; + case PERF_REG_CSKY_EXREGS9: + return "exregs9"; + case PERF_REG_CSKY_EXREGS10: + return "exregs10"; + case PERF_REG_CSKY_EXREGS11: + return "exregs11"; + case PERF_REG_CSKY_EXREGS12: + return "exregs12"; + case PERF_REG_CSKY_EXREGS13: + return "exregs13"; + case PERF_REG_CSKY_EXREGS14: + return "exregs14"; + case PERF_REG_CSKY_TLS: + return "tls"; + case PERF_REG_CSKY_HI: + return "hi"; + case PERF_REG_CSKY_LO: + return "lo"; +#endif + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_mips(int id) +{ + switch (id) { + case PERF_REG_MIPS_PC: + return "PC"; + case PERF_REG_MIPS_R1: + return "$1"; + case PERF_REG_MIPS_R2: + return "$2"; + case PERF_REG_MIPS_R3: + return "$3"; + case PERF_REG_MIPS_R4: + return "$4"; + case PERF_REG_MIPS_R5: + return "$5"; + case PERF_REG_MIPS_R6: + return "$6"; + case PERF_REG_MIPS_R7: + return "$7"; + case PERF_REG_MIPS_R8: + return "$8"; + case PERF_REG_MIPS_R9: + return "$9"; + case PERF_REG_MIPS_R10: + return "$10"; + case PERF_REG_MIPS_R11: + return "$11"; + case PERF_REG_MIPS_R12: + return "$12"; + case PERF_REG_MIPS_R13: + return "$13"; + case PERF_REG_MIPS_R14: + return "$14"; + case PERF_REG_MIPS_R15: + return "$15"; + case PERF_REG_MIPS_R16: + return "$16"; + case PERF_REG_MIPS_R17: + return "$17"; + case PERF_REG_MIPS_R18: + return "$18"; + case PERF_REG_MIPS_R19: + return "$19"; + case PERF_REG_MIPS_R20: + return "$20"; + case PERF_REG_MIPS_R21: + return "$21"; + case PERF_REG_MIPS_R22: + return "$22"; + case PERF_REG_MIPS_R23: + return "$23"; + case PERF_REG_MIPS_R24: + return "$24"; + case PERF_REG_MIPS_R25: + return "$25"; + case PERF_REG_MIPS_R28: + return "$28"; + case PERF_REG_MIPS_R29: + return "$29"; + case PERF_REG_MIPS_R30: + return "$30"; + case PERF_REG_MIPS_R31: + return "$31"; + default: + break; + } + return NULL; +} + +static const char *__perf_reg_name_powerpc(int id) +{ + switch (id) { + case PERF_REG_POWERPC_R0: + return "r0"; + case PERF_REG_POWERPC_R1: + return "r1"; + case PERF_REG_POWERPC_R2: + return "r2"; + case PERF_REG_POWERPC_R3: + return "r3"; + case PERF_REG_POWERPC_R4: + return "r4"; + case PERF_REG_POWERPC_R5: + return "r5"; + case PERF_REG_POWERPC_R6: + return "r6"; + case PERF_REG_POWERPC_R7: + return "r7"; + case PERF_REG_POWERPC_R8: + return "r8"; + case PERF_REG_POWERPC_R9: + return "r9"; + case PERF_REG_POWERPC_R10: + return "r10"; + case PERF_REG_POWERPC_R11: + return "r11"; + case PERF_REG_POWERPC_R12: + return "r12"; + case PERF_REG_POWERPC_R13: + return "r13"; + case PERF_REG_POWERPC_R14: + return "r14"; + case PERF_REG_POWERPC_R15: + return "r15"; + case PERF_REG_POWERPC_R16: + return "r16"; + case PERF_REG_POWERPC_R17: + return "r17"; + case PERF_REG_POWERPC_R18: + return "r18"; + case PERF_REG_POWERPC_R19: + return "r19"; + case PERF_REG_POWERPC_R20: + return "r20"; + case PERF_REG_POWERPC_R21: + return "r21"; + case PERF_REG_POWERPC_R22: + return "r22"; + case PERF_REG_POWERPC_R23: + return "r23"; + case PERF_REG_POWERPC_R24: + return "r24"; + case PERF_REG_POWERPC_R25: + return "r25"; + case PERF_REG_POWERPC_R26: + return "r26"; + case PERF_REG_POWERPC_R27: + return "r27"; + case PERF_REG_POWERPC_R28: + return "r28"; + case PERF_REG_POWERPC_R29: + return "r29"; + case PERF_REG_POWERPC_R30: + return "r30"; + case PERF_REG_POWERPC_R31: + return "r31"; + case PERF_REG_POWERPC_NIP: + return "nip"; + case PERF_REG_POWERPC_MSR: + return "msr"; + case PERF_REG_POWERPC_ORIG_R3: + return "orig_r3"; + case PERF_REG_POWERPC_CTR: + return "ctr"; + case PERF_REG_POWERPC_LINK: + return "link"; + case PERF_REG_POWERPC_XER: + return "xer"; + case PERF_REG_POWERPC_CCR: + return "ccr"; + case PERF_REG_POWERPC_SOFTE: + return "softe"; + case PERF_REG_POWERPC_TRAP: + return "trap"; + case PERF_REG_POWERPC_DAR: + return "dar"; + case PERF_REG_POWERPC_DSISR: + return "dsisr"; + case PERF_REG_POWERPC_SIER: + return "sier"; + case PERF_REG_POWERPC_MMCRA: + return "mmcra"; + case PERF_REG_POWERPC_MMCR0: + return "mmcr0"; + case PERF_REG_POWERPC_MMCR1: + return "mmcr1"; + case PERF_REG_POWERPC_MMCR2: + return "mmcr2"; + case PERF_REG_POWERPC_MMCR3: + return "mmcr3"; + case PERF_REG_POWERPC_SIER2: + return "sier2"; + case PERF_REG_POWERPC_SIER3: + return "sier3"; + case PERF_REG_POWERPC_PMC1: + return "pmc1"; + case PERF_REG_POWERPC_PMC2: + return "pmc2"; + case PERF_REG_POWERPC_PMC3: + return "pmc3"; + case PERF_REG_POWERPC_PMC4: + return "pmc4"; + case PERF_REG_POWERPC_PMC5: + return "pmc5"; + case PERF_REG_POWERPC_PMC6: + return "pmc6"; + case PERF_REG_POWERPC_SDAR: + return "sdar"; + case PERF_REG_POWERPC_SIAR: + return "siar"; + default: + break; + } + return NULL; +} + +static const char *__perf_reg_name_riscv(int id) +{ + switch (id) { + case PERF_REG_RISCV_PC: + return "pc"; + case PERF_REG_RISCV_RA: + return "ra"; + case PERF_REG_RISCV_SP: + return "sp"; + case PERF_REG_RISCV_GP: + return "gp"; + case PERF_REG_RISCV_TP: + return "tp"; + case PERF_REG_RISCV_T0: + return "t0"; + case PERF_REG_RISCV_T1: + return "t1"; + case PERF_REG_RISCV_T2: + return "t2"; + case PERF_REG_RISCV_S0: + return "s0"; + case PERF_REG_RISCV_S1: + return "s1"; + case PERF_REG_RISCV_A0: + return "a0"; + case PERF_REG_RISCV_A1: + return "a1"; + case PERF_REG_RISCV_A2: + return "a2"; + case PERF_REG_RISCV_A3: + return "a3"; + case PERF_REG_RISCV_A4: + return "a4"; + case PERF_REG_RISCV_A5: + return "a5"; + case PERF_REG_RISCV_A6: + return "a6"; + case PERF_REG_RISCV_A7: + return "a7"; + case PERF_REG_RISCV_S2: + return "s2"; + case PERF_REG_RISCV_S3: + return "s3"; + case PERF_REG_RISCV_S4: + return "s4"; + case PERF_REG_RISCV_S5: + return "s5"; + case PERF_REG_RISCV_S6: + return "s6"; + case PERF_REG_RISCV_S7: + return "s7"; + case PERF_REG_RISCV_S8: + return "s8"; + case PERF_REG_RISCV_S9: + return "s9"; + case PERF_REG_RISCV_S10: + return "s10"; + case PERF_REG_RISCV_S11: + return "s11"; + case PERF_REG_RISCV_T3: + return "t3"; + case PERF_REG_RISCV_T4: + return "t4"; + case PERF_REG_RISCV_T5: + return "t5"; + case PERF_REG_RISCV_T6: + return "t6"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_s390(int id) +{ + switch (id) { + case PERF_REG_S390_R0: + return "R0"; + case PERF_REG_S390_R1: + return "R1"; + case PERF_REG_S390_R2: + return "R2"; + case PERF_REG_S390_R3: + return "R3"; + case PERF_REG_S390_R4: + return "R4"; + case PERF_REG_S390_R5: + return "R5"; + case PERF_REG_S390_R6: + return "R6"; + case PERF_REG_S390_R7: + return "R7"; + case PERF_REG_S390_R8: + return "R8"; + case PERF_REG_S390_R9: + return "R9"; + case PERF_REG_S390_R10: + return "R10"; + case PERF_REG_S390_R11: + return "R11"; + case PERF_REG_S390_R12: + return "R12"; + case PERF_REG_S390_R13: + return "R13"; + case PERF_REG_S390_R14: + return "R14"; + case PERF_REG_S390_R15: + return "R15"; + case PERF_REG_S390_FP0: + return "FP0"; + case PERF_REG_S390_FP1: + return "FP1"; + case PERF_REG_S390_FP2: + return "FP2"; + case PERF_REG_S390_FP3: + return "FP3"; + case PERF_REG_S390_FP4: + return "FP4"; + case PERF_REG_S390_FP5: + return "FP5"; + case PERF_REG_S390_FP6: + return "FP6"; + case PERF_REG_S390_FP7: + return "FP7"; + case PERF_REG_S390_FP8: + return "FP8"; + case PERF_REG_S390_FP9: + return "FP9"; + case PERF_REG_S390_FP10: + return "FP10"; + case PERF_REG_S390_FP11: + return "FP11"; + case PERF_REG_S390_FP12: + return "FP12"; + case PERF_REG_S390_FP13: + return "FP13"; + case PERF_REG_S390_FP14: + return "FP14"; + case PERF_REG_S390_FP15: + return "FP15"; + case PERF_REG_S390_MASK: + return "MASK"; + case PERF_REG_S390_PC: + return "PC"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_x86(int id) +{ + switch (id) { + case PERF_REG_X86_AX: + return "AX"; + case PERF_REG_X86_BX: + return "BX"; + case PERF_REG_X86_CX: + return "CX"; + case PERF_REG_X86_DX: + return "DX"; + case PERF_REG_X86_SI: + return "SI"; + case PERF_REG_X86_DI: + return "DI"; + case PERF_REG_X86_BP: + return "BP"; + case PERF_REG_X86_SP: + return "SP"; + case PERF_REG_X86_IP: + return "IP"; + case PERF_REG_X86_FLAGS: + return "FLAGS"; + case PERF_REG_X86_CS: + return "CS"; + case PERF_REG_X86_SS: + return "SS"; + case PERF_REG_X86_DS: + return "DS"; + case PERF_REG_X86_ES: + return "ES"; + case PERF_REG_X86_FS: + return "FS"; + case PERF_REG_X86_GS: + return "GS"; + case PERF_REG_X86_R8: + return "R8"; + case PERF_REG_X86_R9: + return "R9"; + case PERF_REG_X86_R10: + return "R10"; + case PERF_REG_X86_R11: + return "R11"; + case PERF_REG_X86_R12: + return "R12"; + case PERF_REG_X86_R13: + return "R13"; + case PERF_REG_X86_R14: + return "R14"; + case PERF_REG_X86_R15: + return "R15"; + +#define XMM(x) \ + case PERF_REG_X86_XMM ## x: \ + case PERF_REG_X86_XMM ## x + 1: \ + return "XMM" #x; + XMM(0) + XMM(1) + XMM(2) + XMM(3) + XMM(4) + XMM(5) + XMM(6) + XMM(7) + XMM(8) + XMM(9) + XMM(10) + XMM(11) + XMM(12) + XMM(13) + XMM(14) + XMM(15) +#undef XMM + default: + return NULL; + } + + return NULL; +} + +const char *perf_reg_name(int id, const char *arch) +{ + const char *reg_name = NULL; + + if (!strcmp(arch, "csky")) + reg_name = __perf_reg_name_csky(id); + else if (!strcmp(arch, "mips")) + reg_name = __perf_reg_name_mips(id); + else if (!strcmp(arch, "powerpc")) + reg_name = __perf_reg_name_powerpc(id); + else if (!strcmp(arch, "riscv")) + reg_name = __perf_reg_name_riscv(id); + else if (!strcmp(arch, "s390")) + reg_name = __perf_reg_name_s390(id); + else if (!strcmp(arch, "x86")) + reg_name = __perf_reg_name_x86(id); + else if (!strcmp(arch, "arm")) + reg_name = __perf_reg_name_arm(id); + else if (!strcmp(arch, "arm64")) + reg_name = __perf_reg_name_arm64(id); + + return reg_name ?: "unknown"; +} + int perf_reg_value(u64 *valp, struct regs_dump *regs, int id) { int i, idx = 0; diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index eeac181ebccf..ce1127af05e4 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -11,8 +11,11 @@ struct sample_reg { const char *name; uint64_t mask; }; -#define SMPL_REG(n, b) { .name = #n, .mask = 1ULL << (b) } -#define SMPL_REG2(n, b) { .name = #n, .mask = 3ULL << (b) } + +#define SMPL_REG_MASK(b) (1ULL << (b)) +#define SMPL_REG(n, b) { .name = #n, .mask = SMPL_REG_MASK(b) } +#define SMPL_REG2_MASK(b) (3ULL << (b)) +#define SMPL_REG2(n, b) { .name = #n, .mask = SMPL_REG2_MASK(b) } #define SMPL_REG_END { .name = NULL } enum { @@ -31,22 +34,16 @@ extern const struct sample_reg sample_reg_masks[]; #define DWARF_MINIMAL_REGS ((1ULL << PERF_REG_IP) | (1ULL << PERF_REG_SP)) +const char *perf_reg_name(int id, const char *arch); int perf_reg_value(u64 *valp, struct regs_dump *regs, int id); -static inline const char *perf_reg_name(int id) -{ - const char *reg_name = __perf_reg_name(id); - - return reg_name ?: "unknown"; -} - #else #define PERF_REGS_MASK 0 #define PERF_REGS_MAX 0 #define DWARF_MINIMAL_REGS PERF_REGS_MASK -static inline const char *perf_reg_name(int id __maybe_unused) +static inline const char *perf_reg_name(int id __maybe_unused, const char *arch __maybe_unused) { return "unknown"; } diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index 82c7f034d91a..f3e5131f183c 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -1059,7 +1059,7 @@ static struct mmap *get_md(struct evlist *evlist, int cpu) for (i = 0; i < evlist->core.nr_mmaps; i++) { struct mmap *md = &evlist->mmap[i]; - if (md->core.cpu == cpu) + if (md->core.cpu.cpu == cpu) return md; } @@ -1445,7 +1445,7 @@ error: * Dummy, to avoid dragging all the test_attr infrastructure in the python * binding. */ -void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu, +void test_attr__open(struct perf_event_attr *attr, pid_t pid, struct perf_cpu cpu, int fd, int group_fd, unsigned long flags) { } diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c index bff669b615ee..20461f174991 100644 --- a/tools/perf/util/record.c +++ b/tools/perf/util/record.c @@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call if (opts->group) evlist__set_leader(evlist); - if (evlist->core.cpus->map[0] < 0) + if (evlist->core.cpus->map[0].cpu < 0) opts->no_inherit = true; use_comm_exec = perf_can_comm_exec(); @@ -229,7 +229,8 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str) { struct evlist *temp_evlist; struct evsel *evsel; - int err, fd, cpu; + int err, fd; + struct perf_cpu cpu = { .cpu = 0 }; bool ret = false; pid_t pid = -1; @@ -246,14 +247,16 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str) if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) { struct perf_cpu_map *cpus = perf_cpu_map__new(NULL); - cpu = cpus ? cpus->map[0] : 0; + if (cpus) + cpu = cpus->map[0]; + perf_cpu_map__put(cpus); } else { cpu = evlist->core.cpus->map[0]; } while (1) { - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu.cpu, -1, perf_event_open_cloexec_flag()); if (fd < 0) { if (pid == -1 && errno == EACCES) { diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index d1f1501ce7fc..f5ad0e62227a 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -36,6 +36,7 @@ #include "../debug.h" #include "../dso.h" #include "../callchain.h" +#include "../env.h" #include "../evsel.h" #include "../event.h" #include "../thread.h" @@ -687,7 +688,7 @@ static void set_sample_datasrc_in_dict(PyObject *dict, _PyUnicode_FromString(decode)); } -static void regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size) +static void regs_map(struct regs_dump *regs, uint64_t mask, const char *arch, char *bf, int size) { unsigned int i = 0, r; int printed = 0; @@ -702,7 +703,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size) printed += scnprintf(bf + printed, size - printed, "%5s:0x%" PRIx64 " ", - perf_reg_name(r), val); + perf_reg_name(r, arch), val); } } @@ -711,6 +712,7 @@ static void set_regs_in_dict(PyObject *dict, struct evsel *evsel) { struct perf_event_attr *attr = &evsel->core.attr; + const char *arch = perf_env__arch(evsel__env(evsel)); /* * Here value 28 is a constant size which can be used to print @@ -722,12 +724,12 @@ static void set_regs_in_dict(PyObject *dict, int size = __sw_hweight64(attr->sample_regs_intr) * 28; char bf[size]; - regs_map(&sample->intr_regs, attr->sample_regs_intr, bf, sizeof(bf)); + regs_map(&sample->intr_regs, attr->sample_regs_intr, arch, bf, sizeof(bf)); pydict_set_item_string_decref(dict, "iregs", _PyUnicode_FromString(bf)); - regs_map(&sample->user_regs, attr->sample_regs_user, bf, sizeof(bf)); + regs_map(&sample->user_regs, attr->sample_regs_user, arch, bf, sizeof(bf)); pydict_set_item_string_decref(dict, "uregs", _PyUnicode_FromString(bf)); @@ -1555,7 +1557,7 @@ static void get_handler_name(char *str, size_t size, } static void -process_stat(struct evsel *counter, int cpu, int thread, u64 tstamp, +process_stat(struct evsel *counter, struct perf_cpu cpu, int thread, u64 tstamp, struct perf_counts_values *count) { PyObject *handler, *t; @@ -1575,7 +1577,7 @@ process_stat(struct evsel *counter, int cpu, int thread, u64 tstamp, return; } - PyTuple_SetItem(t, n++, _PyLong_FromLong(cpu)); + PyTuple_SetItem(t, n++, _PyLong_FromLong(cpu.cpu)); PyTuple_SetItem(t, n++, _PyLong_FromLong(thread)); tuple_set_u64(t, n++, tstamp); @@ -1599,7 +1601,7 @@ static void python_process_stat(struct perf_stat_config *config, int cpu, thread; if (config->aggr_mode == AGGR_GLOBAL) { - process_stat(counter, -1, -1, tstamp, + process_stat(counter, (struct perf_cpu){ .cpu = -1 }, -1, tstamp, &counter->counts->aggr); return; } diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index d8857d1b6d7c..f19348dddd55 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -15,6 +15,7 @@ #include "map_symbol.h" #include "branch.h" #include "debug.h" +#include "env.h" #include "evlist.h" #include "evsel.h" #include "memswap.h" @@ -1168,7 +1169,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack) } } -static void regs_dump__printf(u64 mask, u64 *regs) +static void regs_dump__printf(u64 mask, u64 *regs, const char *arch) { unsigned rid, i = 0; @@ -1176,7 +1177,7 @@ static void regs_dump__printf(u64 mask, u64 *regs) u64 val = regs[i++]; printf(".... %-5s 0x%016" PRIx64 "\n", - perf_reg_name(rid), val); + perf_reg_name(rid, arch), val); } } @@ -1194,7 +1195,7 @@ static inline const char *regs_dump_abi(struct regs_dump *d) return regs_abi[d->abi]; } -static void regs__printf(const char *type, struct regs_dump *regs) +static void regs__printf(const char *type, struct regs_dump *regs, const char *arch) { u64 mask = regs->mask; @@ -1203,23 +1204,23 @@ static void regs__printf(const char *type, struct regs_dump *regs) mask, regs_dump_abi(regs)); - regs_dump__printf(mask, regs->regs); + regs_dump__printf(mask, regs->regs, arch); } -static void regs_user__printf(struct perf_sample *sample) +static void regs_user__printf(struct perf_sample *sample, const char *arch) { struct regs_dump *user_regs = &sample->user_regs; if (user_regs->regs) - regs__printf("user", user_regs); + regs__printf("user", user_regs, arch); } -static void regs_intr__printf(struct perf_sample *sample) +static void regs_intr__printf(struct perf_sample *sample, const char *arch) { struct regs_dump *intr_regs = &sample->intr_regs; if (intr_regs->regs) - regs__printf("intr", intr_regs); + regs__printf("intr", intr_regs, arch); } static void stack_user__printf(struct stack_dump *dump) @@ -1304,7 +1305,7 @@ char *get_page_size_name(u64 size, char *str) } static void dump_sample(struct evsel *evsel, union perf_event *event, - struct perf_sample *sample) + struct perf_sample *sample, const char *arch) { u64 sample_type; char str[PAGE_SIZE_NAME_LEN]; @@ -1325,10 +1326,10 @@ static void dump_sample(struct evsel *evsel, union perf_event *event, branch_stack__printf(sample, evsel__has_branch_callstack(evsel)); if (sample_type & PERF_SAMPLE_REGS_USER) - regs_user__printf(sample); + regs_user__printf(sample, arch); if (sample_type & PERF_SAMPLE_REGS_INTR) - regs_intr__printf(sample); + regs_intr__printf(sample, arch); if (sample_type & PERF_SAMPLE_STACK_USER) stack_user__printf(&sample->user_stack); @@ -1502,7 +1503,7 @@ static int machines__deliver_event(struct machines *machines, ++evlist->stats.nr_unknown_id; return 0; } - dump_sample(evsel, event, sample); + dump_sample(evsel, event, sample, perf_env__arch(machine->env)); if (machine == NULL) { ++evlist->stats.nr_unprocessable_samples; return 0; @@ -2537,15 +2538,15 @@ int perf_session__cpu_bitmap(struct perf_session *session, } for (i = 0; i < map->nr; i++) { - int cpu = map->map[i]; + struct perf_cpu cpu = map->map[i]; - if (cpu >= nr_cpus) { + if (cpu.cpu >= nr_cpus) { pr_err("Requested CPU %d too large. " - "Consider raising MAX_NR_CPUS\n", cpu); + "Consider raising MAX_NR_CPUS\n", cpu.cpu); goto out_delete_map; } - set_bit(cpu, cpu_bitmap); + set_bit(cpu.cpu, cpu_bitmap); } err = 0; @@ -2597,7 +2598,7 @@ int perf_event__process_id_index(struct perf_session *session, if (!sid) return -ENOENT; sid->idx = e->idx; - sid->cpu = e->cpu; + sid->cpu.cpu = e->cpu; sid->tid = e->tid; } return 0; diff --git a/tools/perf/util/smt.c b/tools/perf/util/smt.c index 34f1b1b1176c..2b0a36ebf27a 100644 --- a/tools/perf/util/smt.c +++ b/tools/perf/util/smt.c @@ -5,6 +5,56 @@ #include "api/fs/fs.h" #include "smt.h" +/** + * hweight_str - Returns the number of bits set in str. Stops at first non-hex + * or ',' character. + */ +static int hweight_str(char *str) +{ + int result = 0; + + while (*str) { + switch (*str++) { + case '0': + case ',': + break; + case '1': + case '2': + case '4': + case '8': + result++; + break; + case '3': + case '5': + case '6': + case '9': + case 'a': + case 'A': + case 'c': + case 'C': + result += 2; + break; + case '7': + case 'b': + case 'B': + case 'd': + case 'D': + case 'e': + case 'E': + result += 3; + break; + case 'f': + case 'F': + result += 4; + break; + default: + goto done; + } + } +done: + return result; +} + int smt_on(void) { static bool cached; @@ -15,9 +65,12 @@ int smt_on(void) if (cached) return cached_result; - if (sysfs__read_int("devices/system/cpu/smt/active", &cached_result) >= 0) - goto done; + if (sysfs__read_int("devices/system/cpu/smt/active", &cached_result) >= 0) { + cached = true; + return cached_result; + } + cached_result = 0; ncpu = sysconf(_SC_NPROCESSORS_CONF); for (cpu = 0; cpu < ncpu; cpu++) { unsigned long long siblings; @@ -26,27 +79,21 @@ int smt_on(void) char fn[256]; snprintf(fn, sizeof fn, - "devices/system/cpu/cpu%d/topology/core_cpus", cpu); + "devices/system/cpu/cpu%d/topology/thread_siblings", cpu); if (sysfs__read_str(fn, &str, &strlen) < 0) { snprintf(fn, sizeof fn, - "devices/system/cpu/cpu%d/topology/thread_siblings", - cpu); + "devices/system/cpu/cpu%d/topology/core_cpus", cpu); if (sysfs__read_str(fn, &str, &strlen) < 0) continue; } /* Entry is hex, but does not have 0x, so need custom parser */ - siblings = strtoull(str, NULL, 16); + siblings = hweight_str(str); free(str); - if (hweight64(siblings) > 1) { + if (siblings > 1) { cached_result = 1; - cached = true; break; } } - if (!cached) { - cached_result = 0; -done: - cached = true; - } + cached = true; return cached_result; } diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index d9a106f0edb2..cfba8c337783 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -37,7 +37,7 @@ const char default_parent_pattern[] = "^sys_|^do_page_fault"; const char *parent_pattern = default_parent_pattern; const char *default_sort_order = "comm,dso,symbol"; const char default_branch_sort_order[] = "comm,dso_from,symbol_from,symbol_to,cycles"; -const char default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc"; +const char default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc"; const char default_top_sort_order[] = "dso,symbol"; const char default_diff_sort_order[] = "dso,symbol"; const char default_tracepoint_sort_order[] = "trace"; @@ -46,8 +46,8 @@ const char *field_order; regex_t ignore_callees_regex; int have_ignore_callees = 0; enum sort_mode sort__mode = SORT_MODE__NORMAL; -const char *dynamic_headers[] = {"local_ins_lat", "p_stage_cyc"}; -const char *arch_specific_sort_keys[] = {"p_stage_cyc"}; +static const char *const dynamic_headers[] = {"local_ins_lat", "ins_lat", "local_p_stage_cyc", "p_stage_cyc"}; +static const char *const arch_specific_sort_keys[] = {"local_p_stage_cyc", "p_stage_cyc"}; /* * Replaces all occurrences of a char used with the: @@ -1392,22 +1392,37 @@ struct sort_entry sort_global_ins_lat = { }; static int64_t -sort__global_p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right) +sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right) { return left->p_stage_cyc - right->p_stage_cyc; } +static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf, + size_t size, unsigned int width) +{ + return repsep_snprintf(bf, size, "%-*u", width, + he->p_stage_cyc * he->stat.nr_events); +} + + static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf, size_t size, unsigned int width) { return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc); } -struct sort_entry sort_p_stage_cyc = { - .se_header = "Pipeline Stage Cycle", - .se_cmp = sort__global_p_stage_cyc_cmp, +struct sort_entry sort_local_p_stage_cyc = { + .se_header = "Local Pipeline Stage Cycle", + .se_cmp = sort__p_stage_cyc_cmp, .se_snprintf = hist_entry__p_stage_cyc_snprintf, - .se_width_idx = HISTC_P_STAGE_CYC, + .se_width_idx = HISTC_LOCAL_P_STAGE_CYC, +}; + +struct sort_entry sort_global_p_stage_cyc = { + .se_header = "Pipeline Stage Cycle", + .se_cmp = sort__p_stage_cyc_cmp, + .se_snprintf = hist_entry__global_p_stage_cyc_snprintf, + .se_width_idx = HISTC_GLOBAL_P_STAGE_CYC, }; struct sort_entry sort_mem_daddr_sym = { @@ -1858,7 +1873,8 @@ static struct sort_dimension common_sort_dimensions[] = { DIM(SORT_CODE_PAGE_SIZE, "code_page_size", sort_code_page_size), DIM(SORT_LOCAL_INS_LAT, "local_ins_lat", sort_local_ins_lat), DIM(SORT_GLOBAL_INS_LAT, "ins_lat", sort_global_ins_lat), - DIM(SORT_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_p_stage_cyc), + DIM(SORT_LOCAL_PIPELINE_STAGE_CYC, "local_p_stage_cyc", sort_local_p_stage_cyc), + DIM(SORT_GLOBAL_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_global_p_stage_cyc), }; #undef DIM diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index 7b7145501933..f994261888e1 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -235,7 +235,8 @@ enum sort_type { SORT_CODE_PAGE_SIZE, SORT_LOCAL_INS_LAT, SORT_GLOBAL_INS_LAT, - SORT_PIPELINE_STAGE_CYC, + SORT_LOCAL_PIPELINE_STAGE_CYC, + SORT_GLOBAL_PIPELINE_STAGE_CYC, /* branch stack specific sort keys */ __SORT_BRANCH_STACK, diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index 588601000f3f..5db83e51ceef 100644 --- a/tools/perf/util/stat-display.c +++ b/tools/perf/util/stat-display.c @@ -4,6 +4,7 @@ #include <linux/string.h> #include <linux/time64.h> #include <math.h> +#include <perf/cpumap.h> #include "color.h" #include "counts.h" #include "evlist.h" @@ -120,11 +121,10 @@ static void aggr_printout(struct perf_stat_config *config, id.die, config->csv_output ? 0 : -3, id.core, config->csv_sep); - } else if (id.core > -1) { + } else if (id.cpu.cpu > -1) { fprintf(config->output, "CPU%*d%s", config->csv_output ? 0 : -7, - evsel__cpus(evsel)->map[id.core], - config->csv_sep); + id.cpu.cpu, config->csv_sep); } break; case AGGR_THREAD: @@ -327,26 +327,24 @@ static void print_metric_header(struct perf_stat_config *config, fprintf(os->fh, "%*s ", config->metric_only_len, unit); } -static int first_shadow_cpu(struct perf_stat_config *config, - struct evsel *evsel, struct aggr_cpu_id id) +static int first_shadow_cpu_map_idx(struct perf_stat_config *config, + struct evsel *evsel, const struct aggr_cpu_id *id) { - struct evlist *evlist = evsel->evlist; - int i; + struct perf_cpu_map *cpus = evsel__cpus(evsel); + struct perf_cpu cpu; + int idx; if (config->aggr_mode == AGGR_NONE) - return id.core; + return perf_cpu_map__idx(cpus, id->cpu); if (!config->aggr_get_id) return 0; - for (i = 0; i < evsel__nr_cpus(evsel); i++) { - int cpu2 = evsel__cpus(evsel)->map[i]; + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu); - if (cpu_map__compare_aggr_cpu_id( - config->aggr_get_id(config, evlist->core.cpus, cpu2), - id)) { - return cpu2; - } + if (aggr_cpu_id__equal(&cpu_id, id)) + return idx; } return 0; } @@ -505,7 +503,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int } perf_stat__print_shadow_stats(config, counter, uval, - first_shadow_cpu(config, counter, id), + first_shadow_cpu_map_idx(config, counter, &id), &out, &config->metric_events, st); if (!config->csv_output && !config->metric_only) { print_noise(config, counter, noise); @@ -516,23 +514,26 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int static void aggr_update_shadow(struct perf_stat_config *config, struct evlist *evlist) { - int cpu, s; + int idx, s; + struct perf_cpu cpu; struct aggr_cpu_id s2, id; u64 val; struct evsel *counter; + struct perf_cpu_map *cpus; for (s = 0; s < config->aggr_map->nr; s++) { id = config->aggr_map->map[s]; evlist__for_each_entry(evlist, counter) { + cpus = evsel__cpus(counter); val = 0; - for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) { - s2 = config->aggr_get_id(config, evlist->core.cpus, cpu); - if (!cpu_map__compare_aggr_cpu_id(s2, id)) + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + s2 = config->aggr_get_id(config, cpu); + if (!aggr_cpu_id__equal(&s2, &id)) continue; - val += perf_counts(counter->counts, cpu, 0)->val; + val += perf_counts(counter->counts, idx, 0)->val; } perf_stat__update_shadow_stats(counter, val, - first_shadow_cpu(config, counter, id), + first_shadow_cpu_map_idx(config, counter, &id), &rt_stat); } } @@ -627,25 +628,28 @@ struct aggr_data { u64 ena, run, val; struct aggr_cpu_id id; int nr; - int cpu; + int cpu_map_idx; }; static void aggr_cb(struct perf_stat_config *config, struct evsel *counter, void *data, bool first) { struct aggr_data *ad = data; - int cpu; + int idx; + struct perf_cpu cpu; + struct perf_cpu_map *cpus; struct aggr_cpu_id s2; - for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) { + cpus = evsel__cpus(counter); + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { struct perf_counts_values *counts; - s2 = config->aggr_get_id(config, evsel__cpus(counter), cpu); - if (!cpu_map__compare_aggr_cpu_id(s2, ad->id)) + s2 = config->aggr_get_id(config, cpu); + if (!aggr_cpu_id__equal(&s2, &ad->id)) continue; if (first) ad->nr++; - counts = perf_counts(counter->counts, cpu, 0); + counts = perf_counts(counter->counts, idx, 0); /* * When any result is bad, make them all to give * consistent output in interval mode. @@ -665,7 +669,7 @@ static void aggr_cb(struct perf_stat_config *config, static void print_counter_aggrdata(struct perf_stat_config *config, struct evsel *counter, int s, char *prefix, bool metric_only, - bool *first, int cpu) + bool *first, struct perf_cpu cpu) { struct aggr_data ad; FILE *output = config->output; @@ -695,10 +699,9 @@ static void print_counter_aggrdata(struct perf_stat_config *config, fprintf(output, "%s", prefix); uval = val * counter->scale; - if (cpu != -1) { - id = cpu_map__empty_aggr_cpu_id(); - id.core = cpu; - } + if (cpu.cpu != -1) + id = aggr_cpu_id__cpu(cpu, /*data=*/NULL); + printout(config, id, nr, counter, uval, prefix, run, ena, 1.0, &rt_stat); if (!metric_only) @@ -731,8 +734,8 @@ static void print_aggr(struct perf_stat_config *config, first = true; evlist__for_each_entry(evlist, counter) { print_counter_aggrdata(config, counter, s, - prefix, metric_only, - &first, -1); + prefix, metric_only, + &first, (struct perf_cpu){ .cpu = -1 }); } if (metric_only) fputc('\n', output); @@ -778,7 +781,7 @@ static struct perf_aggr_thread_value *sort_aggr_thread( continue; buf[i].counter = counter; - buf[i].id = cpu_map__empty_aggr_cpu_id(); + buf[i].id = aggr_cpu_id__empty(); buf[i].id.thread = thread; buf[i].uval = uval; buf[i].val = val; @@ -866,7 +869,7 @@ static void print_counter_aggr(struct perf_stat_config *config, fprintf(output, "%s", prefix); uval = cd.avg * counter->scale; - printout(config, cpu_map__empty_aggr_cpu_id(), 0, counter, uval, prefix, cd.avg_running, + printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled, cd.avg, &rt_stat); if (!metric_only) fprintf(output, "\n"); @@ -878,9 +881,9 @@ static void counter_cb(struct perf_stat_config *config __maybe_unused, { struct aggr_data *ad = data; - ad->val += perf_counts(counter->counts, ad->cpu, 0)->val; - ad->ena += perf_counts(counter->counts, ad->cpu, 0)->ena; - ad->run += perf_counts(counter->counts, ad->cpu, 0)->run; + ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val; + ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena; + ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run; } /* @@ -893,11 +896,12 @@ static void print_counter(struct perf_stat_config *config, FILE *output = config->output; u64 ena, run, val; double uval; - int cpu; + int idx; + struct perf_cpu cpu; struct aggr_cpu_id id; - for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) { - struct aggr_data ad = { .cpu = cpu }; + perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) { + struct aggr_data ad = { .cpu_map_idx = idx }; if (!collect_data(config, counter, counter_cb, &ad)) return; @@ -909,8 +913,7 @@ static void print_counter(struct perf_stat_config *config, fprintf(output, "%s", prefix); uval = val * counter->scale; - id = cpu_map__empty_aggr_cpu_id(); - id.core = cpu; + id = aggr_cpu_id__cpu(cpu, /*data=*/NULL); printout(config, id, 0, counter, uval, prefix, run, ena, 1.0, &rt_stat); @@ -922,29 +925,32 @@ static void print_no_aggr_metric(struct perf_stat_config *config, struct evlist *evlist, char *prefix) { - int cpu; - int nrcpus = 0; - struct evsel *counter; - u64 ena, run, val; - double uval; - struct aggr_cpu_id id; + int all_idx; + struct perf_cpu cpu; - nrcpus = evlist->core.cpus->nr; - for (cpu = 0; cpu < nrcpus; cpu++) { + perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) { + struct evsel *counter; bool first = true; if (prefix) fputs(prefix, config->output); evlist__for_each_entry(evlist, counter) { - id = cpu_map__empty_aggr_cpu_id(); - id.core = cpu; + u64 ena, run, val; + double uval; + struct aggr_cpu_id id; + int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu); + + if (counter_idx < 0) + continue; + + id = aggr_cpu_id__cpu(cpu, /*data=*/NULL); if (first) { aggr_printout(config, counter, id, 0); first = false; } - val = perf_counts(counter->counts, cpu, 0)->val; - ena = perf_counts(counter->counts, cpu, 0)->ena; - run = perf_counts(counter->counts, cpu, 0)->run; + val = perf_counts(counter->counts, counter_idx, 0)->val; + ena = perf_counts(counter->counts, counter_idx, 0)->ena; + run = perf_counts(counter->counts, counter_idx, 0)->run; uval = val * counter->scale; printout(config, id, 0, counter, uval, prefix, @@ -1208,19 +1214,23 @@ static void print_percore_thread(struct perf_stat_config *config, { int s; struct aggr_cpu_id s2, id; + struct perf_cpu_map *cpus; bool first = true; + int idx; + struct perf_cpu cpu; - for (int i = 0; i < evsel__nr_cpus(counter); i++) { - s2 = config->aggr_get_id(config, evsel__cpus(counter), i); + cpus = evsel__cpus(counter); + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + s2 = config->aggr_get_id(config, cpu); for (s = 0; s < config->aggr_map->nr; s++) { id = config->aggr_map->map[s]; - if (cpu_map__compare_aggr_cpu_id(s2, id)) + if (aggr_cpu_id__equal(&s2, &id)) break; } print_counter_aggrdata(config, counter, s, prefix, false, - &first, i); + &first, cpu); } } @@ -1243,8 +1253,8 @@ static void print_percore(struct perf_stat_config *config, fprintf(output, "%s", prefix); print_counter_aggrdata(config, counter, s, - prefix, metric_only, - &first, -1); + prefix, metric_only, + &first, (struct perf_cpu){ .cpu = -1 }); } if (metric_only) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 5c7308efa768..10af7804e482 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -32,7 +32,7 @@ struct saved_value { struct evsel *evsel; enum stat_type type; int ctx; - int cpu; + int cpu_map_idx; struct cgroup *cgrp; struct runtime_stat *stat; struct stats stats; @@ -47,8 +47,8 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry) rb_node); const struct saved_value *b = entry; - if (a->cpu != b->cpu) - return a->cpu - b->cpu; + if (a->cpu_map_idx != b->cpu_map_idx) + return a->cpu_map_idx - b->cpu_map_idx; /* * Previously the rbtree was used to link generic metrics. @@ -105,7 +105,7 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused, } static struct saved_value *saved_value_lookup(struct evsel *evsel, - int cpu, + int cpu_map_idx, bool create, enum stat_type type, int ctx, @@ -115,7 +115,7 @@ static struct saved_value *saved_value_lookup(struct evsel *evsel, struct rblist *rblist; struct rb_node *nd; struct saved_value dm = { - .cpu = cpu, + .cpu_map_idx = cpu_map_idx, .evsel = evsel, .type = type, .ctx = ctx, @@ -213,10 +213,10 @@ struct runtime_stat_data { static void update_runtime_stat(struct runtime_stat *st, enum stat_type type, - int cpu, u64 count, + int cpu_map_idx, u64 count, struct runtime_stat_data *rsd) { - struct saved_value *v = saved_value_lookup(NULL, cpu, true, type, + struct saved_value *v = saved_value_lookup(NULL, cpu_map_idx, true, type, rsd->ctx, st, rsd->cgrp); if (v) @@ -229,7 +229,7 @@ static void update_runtime_stat(struct runtime_stat *st, * instruction rates, etc: */ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, - int cpu, struct runtime_stat *st) + int cpu_map_idx, struct runtime_stat *st) { u64 count_ns = count; struct saved_value *v; @@ -241,88 +241,88 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, count *= counter->scale; if (evsel__is_clock(counter)) - update_runtime_stat(st, STAT_NSECS, cpu, count_ns, &rsd); + update_runtime_stat(st, STAT_NSECS, cpu_map_idx, count_ns, &rsd); else if (evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) - update_runtime_stat(st, STAT_CYCLES, cpu, count, &rsd); + update_runtime_stat(st, STAT_CYCLES, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, CYCLES_IN_TX)) - update_runtime_stat(st, STAT_CYCLES_IN_TX, cpu, count, &rsd); + update_runtime_stat(st, STAT_CYCLES_IN_TX, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TRANSACTION_START)) - update_runtime_stat(st, STAT_TRANSACTION, cpu, count, &rsd); + update_runtime_stat(st, STAT_TRANSACTION, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, ELISION_START)) - update_runtime_stat(st, STAT_ELISION, cpu, count, &rsd); + update_runtime_stat(st, STAT_ELISION, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS)) update_runtime_stat(st, STAT_TOPDOWN_TOTAL_SLOTS, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED)) update_runtime_stat(st, STAT_TOPDOWN_SLOTS_ISSUED, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED)) update_runtime_stat(st, STAT_TOPDOWN_SLOTS_RETIRED, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES)) update_runtime_stat(st, STAT_TOPDOWN_FETCH_BUBBLES, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES)) update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING)) update_runtime_stat(st, STAT_TOPDOWN_RETIRING, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC)) update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND)) update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND)) update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_HEAVY_OPS)) update_runtime_stat(st, STAT_TOPDOWN_HEAVY_OPS, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_BR_MISPREDICT)) update_runtime_stat(st, STAT_TOPDOWN_BR_MISPREDICT, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_LAT)) update_runtime_stat(st, STAT_TOPDOWN_FETCH_LAT, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_MEM_BOUND)) update_runtime_stat(st, STAT_TOPDOWN_MEM_BOUND, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND)) update_runtime_stat(st, STAT_STALLED_CYCLES_BACK, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS)) - update_runtime_stat(st, STAT_BRANCHES, cpu, count, &rsd); + update_runtime_stat(st, STAT_BRANCHES, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES)) - update_runtime_stat(st, STAT_CACHEREFS, cpu, count, &rsd); + update_runtime_stat(st, STAT_CACHEREFS, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1D)) - update_runtime_stat(st, STAT_L1_DCACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_L1_DCACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1I)) - update_runtime_stat(st, STAT_L1_ICACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_L1_ICACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_LL)) - update_runtime_stat(st, STAT_LL_CACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_LL_CACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_DTLB)) - update_runtime_stat(st, STAT_DTLB_CACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_DTLB_CACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB)) - update_runtime_stat(st, STAT_ITLB_CACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_ITLB_CACHE, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, SMI_NUM)) - update_runtime_stat(st, STAT_SMI_NUM, cpu, count, &rsd); + update_runtime_stat(st, STAT_SMI_NUM, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, APERF)) - update_runtime_stat(st, STAT_APERF, cpu, count, &rsd); + update_runtime_stat(st, STAT_APERF, cpu_map_idx, count, &rsd); if (counter->collect_stat) { - v = saved_value_lookup(counter, cpu, true, STAT_NONE, 0, st, + v = saved_value_lookup(counter, cpu_map_idx, true, STAT_NONE, 0, st, rsd.cgrp); update_stats(&v->stats, count); if (counter->metric_leader) v->metric_total += count; } else if (counter->metric_leader) { v = saved_value_lookup(counter->metric_leader, - cpu, true, STAT_NONE, 0, st, rsd.cgrp); + cpu_map_idx, true, STAT_NONE, 0, st, rsd.cgrp); v->metric_total += count; v->metric_other++; } @@ -464,12 +464,12 @@ void perf_stat__collect_metric_expr(struct evlist *evsel_list) } static double runtime_stat_avg(struct runtime_stat *st, - enum stat_type type, int cpu, + enum stat_type type, int cpu_map_idx, struct runtime_stat_data *rsd) { struct saved_value *v; - v = saved_value_lookup(NULL, cpu, false, type, rsd->ctx, st, rsd->cgrp); + v = saved_value_lookup(NULL, cpu_map_idx, false, type, rsd->ctx, st, rsd->cgrp); if (!v) return 0.0; @@ -477,12 +477,12 @@ static double runtime_stat_avg(struct runtime_stat *st, } static double runtime_stat_n(struct runtime_stat *st, - enum stat_type type, int cpu, + enum stat_type type, int cpu_map_idx, struct runtime_stat_data *rsd) { struct saved_value *v; - v = saved_value_lookup(NULL, cpu, false, type, rsd->ctx, st, rsd->cgrp); + v = saved_value_lookup(NULL, cpu_map_idx, false, type, rsd->ctx, st, rsd->cgrp); if (!v) return 0.0; @@ -490,7 +490,7 @@ static double runtime_stat_n(struct runtime_stat *st, } static void print_stalled_cycles_frontend(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -498,7 +498,7 @@ static void print_stalled_cycles_frontend(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_CYCLES, cpu, rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -513,7 +513,7 @@ static void print_stalled_cycles_frontend(struct perf_stat_config *config, } static void print_stalled_cycles_backend(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -521,7 +521,7 @@ static void print_stalled_cycles_backend(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_CYCLES, cpu, rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -532,7 +532,7 @@ static void print_stalled_cycles_backend(struct perf_stat_config *config, } static void print_branch_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -540,7 +540,7 @@ static void print_branch_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_BRANCHES, cpu, rsd); + total = runtime_stat_avg(st, STAT_BRANCHES, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -551,7 +551,7 @@ static void print_branch_misses(struct perf_stat_config *config, } static void print_l1_dcache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -559,7 +559,7 @@ static void print_l1_dcache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_L1_DCACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_L1_DCACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -570,7 +570,7 @@ static void print_l1_dcache_misses(struct perf_stat_config *config, } static void print_l1_icache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -578,7 +578,7 @@ static void print_l1_icache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_L1_ICACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_L1_ICACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -588,7 +588,7 @@ static void print_l1_icache_misses(struct perf_stat_config *config, } static void print_dtlb_cache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -596,7 +596,7 @@ static void print_dtlb_cache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_DTLB_CACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_DTLB_CACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -606,7 +606,7 @@ static void print_dtlb_cache_misses(struct perf_stat_config *config, } static void print_itlb_cache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -614,7 +614,7 @@ static void print_itlb_cache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_ITLB_CACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_ITLB_CACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -624,7 +624,7 @@ static void print_itlb_cache_misses(struct perf_stat_config *config, } static void print_ll_cache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -632,7 +632,7 @@ static void print_ll_cache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_LL_CACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_LL_CACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -690,61 +690,61 @@ static double sanitize_val(double x) return x; } -static double td_total_slots(int cpu, struct runtime_stat *st, +static double td_total_slots(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { - return runtime_stat_avg(st, STAT_TOPDOWN_TOTAL_SLOTS, cpu, rsd); + return runtime_stat_avg(st, STAT_TOPDOWN_TOTAL_SLOTS, cpu_map_idx, rsd); } -static double td_bad_spec(int cpu, struct runtime_stat *st, +static double td_bad_spec(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { double bad_spec = 0; double total_slots; double total; - total = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_ISSUED, cpu, rsd) - - runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, cpu, rsd) + - runtime_stat_avg(st, STAT_TOPDOWN_RECOVERY_BUBBLES, cpu, rsd); + total = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_ISSUED, cpu_map_idx, rsd) - + runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, cpu_map_idx, rsd) + + runtime_stat_avg(st, STAT_TOPDOWN_RECOVERY_BUBBLES, cpu_map_idx, rsd); - total_slots = td_total_slots(cpu, st, rsd); + total_slots = td_total_slots(cpu_map_idx, st, rsd); if (total_slots) bad_spec = total / total_slots; return sanitize_val(bad_spec); } -static double td_retiring(int cpu, struct runtime_stat *st, +static double td_retiring(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { double retiring = 0; - double total_slots = td_total_slots(cpu, st, rsd); + double total_slots = td_total_slots(cpu_map_idx, st, rsd); double ret_slots = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, - cpu, rsd); + cpu_map_idx, rsd); if (total_slots) retiring = ret_slots / total_slots; return retiring; } -static double td_fe_bound(int cpu, struct runtime_stat *st, +static double td_fe_bound(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { double fe_bound = 0; - double total_slots = td_total_slots(cpu, st, rsd); + double total_slots = td_total_slots(cpu_map_idx, st, rsd); double fetch_bub = runtime_stat_avg(st, STAT_TOPDOWN_FETCH_BUBBLES, - cpu, rsd); + cpu_map_idx, rsd); if (total_slots) fe_bound = fetch_bub / total_slots; return fe_bound; } -static double td_be_bound(int cpu, struct runtime_stat *st, +static double td_be_bound(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { - double sum = (td_fe_bound(cpu, st, rsd) + - td_bad_spec(cpu, st, rsd) + - td_retiring(cpu, st, rsd)); + double sum = (td_fe_bound(cpu_map_idx, st, rsd) + + td_bad_spec(cpu_map_idx, st, rsd) + + td_retiring(cpu_map_idx, st, rsd)); if (sum == 0) return 0; return sanitize_val(1.0 - sum); @@ -755,15 +755,15 @@ static double td_be_bound(int cpu, struct runtime_stat *st, * the ratios we need to recreate the sum. */ -static double td_metric_ratio(int cpu, enum stat_type type, +static double td_metric_ratio(int cpu_map_idx, enum stat_type type, struct runtime_stat *stat, struct runtime_stat_data *rsd) { - double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu, rsd); - double d = runtime_stat_avg(stat, type, cpu, rsd); + double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu_map_idx, rsd) + + runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu_map_idx, rsd) + + runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu_map_idx, rsd) + + runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu_map_idx, rsd); + double d = runtime_stat_avg(stat, type, cpu_map_idx, rsd); if (sum) return d / sum; @@ -775,23 +775,23 @@ static double td_metric_ratio(int cpu, enum stat_type type, * We allow two missing. */ -static bool full_td(int cpu, struct runtime_stat *stat, +static bool full_td(int cpu_map_idx, struct runtime_stat *stat, struct runtime_stat_data *rsd) { int c = 0; - if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu_map_idx, rsd) > 0) c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu_map_idx, rsd) > 0) c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu_map_idx, rsd) > 0) c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu_map_idx, rsd) > 0) c++; return c >= 2; } -static void print_smi_cost(struct perf_stat_config *config, int cpu, +static void print_smi_cost(struct perf_stat_config *config, int cpu_map_idx, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -799,9 +799,9 @@ static void print_smi_cost(struct perf_stat_config *config, int cpu, double smi_num, aperf, cycles, cost = 0.0; const char *color = NULL; - smi_num = runtime_stat_avg(st, STAT_SMI_NUM, cpu, rsd); - aperf = runtime_stat_avg(st, STAT_APERF, cpu, rsd); - cycles = runtime_stat_avg(st, STAT_CYCLES, cpu, rsd); + smi_num = runtime_stat_avg(st, STAT_SMI_NUM, cpu_map_idx, rsd); + aperf = runtime_stat_avg(st, STAT_APERF, cpu_map_idx, rsd); + cycles = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, rsd); if ((cycles == 0) || (aperf == 0)) return; @@ -818,7 +818,7 @@ static void print_smi_cost(struct perf_stat_config *config, int cpu, static int prepare_metric(struct evsel **metric_events, struct metric_ref *metric_refs, struct expr_parse_ctx *pctx, - int cpu, + int cpu_map_idx, struct runtime_stat *st) { double scale; @@ -836,7 +836,7 @@ static int prepare_metric(struct evsel **metric_events, scale = 1e-9; source_count = 1; } else { - v = saved_value_lookup(metric_events[i], cpu, false, + v = saved_value_lookup(metric_events[i], cpu_map_idx, false, STAT_NONE, 0, st, metric_events[i]->cgrp); if (!v) @@ -874,7 +874,7 @@ static void generic_metric(struct perf_stat_config *config, const char *metric_name, const char *metric_unit, int runtime, - int cpu, + int cpu_map_idx, struct perf_stat_output_ctx *out, struct runtime_stat *st) { @@ -889,7 +889,7 @@ static void generic_metric(struct perf_stat_config *config, return; pctx->runtime = runtime; - i = prepare_metric(metric_events, metric_refs, pctx, cpu, st); + i = prepare_metric(metric_events, metric_refs, pctx, cpu_map_idx, st); if (i < 0) { expr__ctx_free(pctx); return; @@ -934,7 +934,7 @@ static void generic_metric(struct perf_stat_config *config, expr__ctx_free(pctx); } -double test_generic_metric(struct metric_expr *mexp, int cpu, struct runtime_stat *st) +double test_generic_metric(struct metric_expr *mexp, int cpu_map_idx, struct runtime_stat *st) { struct expr_parse_ctx *pctx; double ratio = 0.0; @@ -943,7 +943,7 @@ double test_generic_metric(struct metric_expr *mexp, int cpu, struct runtime_sta if (!pctx) return NAN; - if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, cpu, st) < 0) + if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, cpu_map_idx, st) < 0) goto out; if (expr__parse(&ratio, pctx, mexp->metric_expr)) @@ -956,7 +956,7 @@ out: void perf_stat__print_shadow_stats(struct perf_stat_config *config, struct evsel *evsel, - double avg, int cpu, + double avg, int cpu_map_idx, struct perf_stat_output_ctx *out, struct rblist *metric_events, struct runtime_stat *st) @@ -975,7 +975,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, if (config->iostat_run) { iostat_print_metric(config, evsel, out); } else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) { - total = runtime_stat_avg(st, STAT_CYCLES, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, &rsd); if (total) { ratio = avg / total; @@ -985,11 +985,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0); } - total = runtime_stat_avg(st, STAT_STALLED_CYCLES_FRONT, cpu, &rsd); + total = runtime_stat_avg(st, STAT_STALLED_CYCLES_FRONT, cpu_map_idx, &rsd); total = max(total, runtime_stat_avg(st, STAT_STALLED_CYCLES_BACK, - cpu, &rsd)); + cpu_map_idx, &rsd)); if (total && avg) { out->new_line(config, ctxp); @@ -999,8 +999,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ratio); } } else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) { - if (runtime_stat_n(st, STAT_BRANCHES, cpu, &rsd) != 0) - print_branch_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_BRANCHES, cpu_map_idx, &rsd) != 0) + print_branch_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all branches", 0); } else if ( @@ -1009,8 +1009,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_L1_DCACHE, cpu, &rsd) != 0) - print_l1_dcache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_L1_DCACHE, cpu_map_idx, &rsd) != 0) + print_l1_dcache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all L1-dcache accesses", 0); } else if ( @@ -1019,8 +1019,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_L1_ICACHE, cpu, &rsd) != 0) - print_l1_icache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_L1_ICACHE, cpu_map_idx, &rsd) != 0) + print_l1_icache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all L1-icache accesses", 0); } else if ( @@ -1029,8 +1029,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_DTLB_CACHE, cpu, &rsd) != 0) - print_dtlb_cache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_DTLB_CACHE, cpu_map_idx, &rsd) != 0) + print_dtlb_cache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all dTLB cache accesses", 0); } else if ( @@ -1039,8 +1039,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_ITLB_CACHE, cpu, &rsd) != 0) - print_itlb_cache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_ITLB_CACHE, cpu_map_idx, &rsd) != 0) + print_itlb_cache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all iTLB cache accesses", 0); } else if ( @@ -1049,27 +1049,27 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_LL_CACHE, cpu, &rsd) != 0) - print_ll_cache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_LL_CACHE, cpu_map_idx, &rsd) != 0) + print_ll_cache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all LL-cache accesses", 0); } else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) { - total = runtime_stat_avg(st, STAT_CACHEREFS, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CACHEREFS, cpu_map_idx, &rsd); if (total) ratio = avg * 100 / total; - if (runtime_stat_n(st, STAT_CACHEREFS, cpu, &rsd) != 0) + if (runtime_stat_n(st, STAT_CACHEREFS, cpu_map_idx, &rsd) != 0) print_metric(config, ctxp, NULL, "%8.3f %%", "of all cache refs", ratio); else print_metric(config, ctxp, NULL, NULL, "of all cache refs", 0); } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) { - print_stalled_cycles_frontend(config, cpu, avg, out, st, &rsd); + print_stalled_cycles_frontend(config, cpu_map_idx, avg, out, st, &rsd); } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) { - print_stalled_cycles_backend(config, cpu, avg, out, st, &rsd); + print_stalled_cycles_backend(config, cpu_map_idx, avg, out, st, &rsd); } else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) { - total = runtime_stat_avg(st, STAT_NSECS, cpu, &rsd); + total = runtime_stat_avg(st, STAT_NSECS, cpu_map_idx, &rsd); if (total) { ratio = avg / total; @@ -1078,7 +1078,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, NULL, NULL, "Ghz", 0); } } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) { - total = runtime_stat_avg(st, STAT_CYCLES, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, &rsd); if (total) print_metric(config, ctxp, NULL, @@ -1088,8 +1088,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, NULL, NULL, "transactional cycles", 0); } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) { - total = runtime_stat_avg(st, STAT_CYCLES, cpu, &rsd); - total2 = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, &rsd); + total2 = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd); if (total2 < avg) total2 = avg; @@ -1099,19 +1099,19 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, else print_metric(config, ctxp, NULL, NULL, "aborted cycles", 0); } else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) { - total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd); if (avg) ratio = total / avg; - if (runtime_stat_n(st, STAT_CYCLES_IN_TX, cpu, &rsd) != 0) + if (runtime_stat_n(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd) != 0) print_metric(config, ctxp, NULL, "%8.0f", "cycles / transaction", ratio); else print_metric(config, ctxp, NULL, NULL, "cycles / transaction", 0); } else if (perf_stat_evsel__is(evsel, ELISION_START)) { - total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd); if (avg) ratio = total / avg; @@ -1124,28 +1124,28 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, else print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0); } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) { - double fe_bound = td_fe_bound(cpu, st, &rsd); + double fe_bound = td_fe_bound(cpu_map_idx, st, &rsd); if (fe_bound > 0.2) color = PERF_COLOR_RED; print_metric(config, ctxp, color, "%8.1f%%", "frontend bound", fe_bound * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) { - double retiring = td_retiring(cpu, st, &rsd); + double retiring = td_retiring(cpu_map_idx, st, &rsd); if (retiring > 0.7) color = PERF_COLOR_GREEN; print_metric(config, ctxp, color, "%8.1f%%", "retiring", retiring * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) { - double bad_spec = td_bad_spec(cpu, st, &rsd); + double bad_spec = td_bad_spec(cpu_map_idx, st, &rsd); if (bad_spec > 0.1) color = PERF_COLOR_RED; print_metric(config, ctxp, color, "%8.1f%%", "bad speculation", bad_spec * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) { - double be_bound = td_be_bound(cpu, st, &rsd); + double be_bound = td_be_bound(cpu_map_idx, st, &rsd); const char *name = "backend bound"; static int have_recovery_bubbles = -1; @@ -1158,14 +1158,14 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, if (be_bound > 0.2) color = PERF_COLOR_RED; - if (td_total_slots(cpu, st, &rsd) > 0) + if (td_total_slots(cpu_map_idx, st, &rsd) > 0) print_metric(config, ctxp, color, "%8.1f%%", name, be_bound * 100.); else print_metric(config, ctxp, NULL, NULL, name, 0); } else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) && - full_td(cpu, st, &rsd)) { - double retiring = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double retiring = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_RETIRING, st, &rsd); if (retiring > 0.7) @@ -1173,8 +1173,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "retiring", retiring * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) && - full_td(cpu, st, &rsd)) { - double fe_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double fe_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_FE_BOUND, st, &rsd); if (fe_bound > 0.2) @@ -1182,8 +1182,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "frontend bound", fe_bound * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) && - full_td(cpu, st, &rsd)) { - double be_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double be_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BE_BOUND, st, &rsd); if (be_bound > 0.2) @@ -1191,8 +1191,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "backend bound", be_bound * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) && - full_td(cpu, st, &rsd)) { - double bad_spec = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double bad_spec = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BAD_SPEC, st, &rsd); if (bad_spec > 0.1) @@ -1200,11 +1200,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "bad speculation", bad_spec * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_HEAVY_OPS) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double retiring = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double retiring = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_RETIRING, st, &rsd); - double heavy_ops = td_metric_ratio(cpu, + double heavy_ops = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_HEAVY_OPS, st, &rsd); double light_ops = retiring - heavy_ops; @@ -1220,11 +1220,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "light operations", light_ops * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_BR_MISPREDICT) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double bad_spec = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double bad_spec = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BAD_SPEC, st, &rsd); - double br_mis = td_metric_ratio(cpu, + double br_mis = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BR_MISPREDICT, st, &rsd); double m_clears = bad_spec - br_mis; @@ -1240,11 +1240,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "machine clears", m_clears * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_LAT) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double fe_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double fe_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_FE_BOUND, st, &rsd); - double fetch_lat = td_metric_ratio(cpu, + double fetch_lat = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_FETCH_LAT, st, &rsd); double fetch_bw = fe_bound - fetch_lat; @@ -1260,11 +1260,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "fetch bandwidth", fetch_bw * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_MEM_BOUND) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double be_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double be_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BE_BOUND, st, &rsd); - double mem_bound = td_metric_ratio(cpu, + double mem_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_MEM_BOUND, st, &rsd); double core_bound = be_bound - mem_bound; @@ -1281,12 +1281,12 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, core_bound * 100.); } else if (evsel->metric_expr) { generic_metric(config, evsel->metric_expr, evsel->metric_events, NULL, - evsel->name, evsel->metric_name, NULL, 1, cpu, out, st); - } else if (runtime_stat_n(st, STAT_NSECS, cpu, &rsd) != 0) { + evsel->name, evsel->metric_name, NULL, 1, cpu_map_idx, out, st); + } else if (runtime_stat_n(st, STAT_NSECS, cpu_map_idx, &rsd) != 0) { char unit = ' '; char unit_buf[10] = "/sec"; - total = runtime_stat_avg(st, STAT_NSECS, cpu, &rsd); + total = runtime_stat_avg(st, STAT_NSECS, cpu_map_idx, &rsd); if (total) ratio = convert_unit_double(1000000000.0 * avg / total, &unit); @@ -1294,7 +1294,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit); print_metric(config, ctxp, NULL, "%8.3f", unit_buf, ratio); } else if (perf_stat_evsel__is(evsel, SMI_NUM)) { - print_smi_cost(config, cpu, out, st, &rsd); + print_smi_cost(config, cpu_map_idx, out, st, &rsd); } else { num = 0; } @@ -1307,7 +1307,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, out->new_line(config, ctxp); generic_metric(config, mexp->metric_expr, mexp->metric_events, mexp->metric_refs, evsel->name, mexp->metric_name, - mexp->metric_unit, mexp->runtime, cpu, out, st); + mexp->metric_unit, mexp->runtime, cpu_map_idx, out, st); } } if (num == 0) diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c index 09ea334586f2..ee6f03481215 100644 --- a/tools/perf/util/stat.c +++ b/tools/perf/util/stat.c @@ -152,11 +152,13 @@ static void evsel__free_stat_priv(struct evsel *evsel) zfree(&evsel->stats); } -static int evsel__alloc_prev_raw_counts(struct evsel *evsel, int ncpus, int nthreads) +static int evsel__alloc_prev_raw_counts(struct evsel *evsel) { + int cpu_map_nr = evsel__nr_cpus(evsel); + int nthreads = perf_thread_map__nr(evsel->core.threads); struct perf_counts *counts; - counts = perf_counts__new(ncpus, nthreads); + counts = perf_counts__new(cpu_map_nr, nthreads); if (counts) evsel->prev_raw_counts = counts; @@ -177,12 +179,9 @@ static void evsel__reset_prev_raw_counts(struct evsel *evsel) static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw) { - int ncpus = evsel__nr_cpus(evsel); - int nthreads = perf_thread_map__nr(evsel->core.threads); - if (evsel__alloc_stat_priv(evsel) < 0 || - evsel__alloc_counts(evsel, ncpus, nthreads) < 0 || - (alloc_raw && evsel__alloc_prev_raw_counts(evsel, ncpus, nthreads) < 0)) + evsel__alloc_counts(evsel) < 0 || + (alloc_raw && evsel__alloc_prev_raw_counts(evsel) < 0)) return -ENOMEM; return 0; @@ -293,11 +292,12 @@ static bool pkg_id_equal(const void *__key1, const void *__key2, return *key1 == *key2; } -static int check_per_pkg(struct evsel *counter, - struct perf_counts_values *vals, int cpu, bool *skip) +static int check_per_pkg(struct evsel *counter, struct perf_counts_values *vals, + int cpu_map_idx, bool *skip) { struct hashmap *mask = counter->per_pkg_mask; struct perf_cpu_map *cpus = evsel__cpus(counter); + struct perf_cpu cpu = perf_cpu_map__cpu(cpus, cpu_map_idx); int s, d, ret = 0; uint64_t *key; @@ -328,7 +328,7 @@ static int check_per_pkg(struct evsel *counter, if (!(vals->run && vals->ena)) return 0; - s = cpu_map__get_socket(cpus, cpu, NULL).socket; + s = cpu__get_socket_id(cpu); if (s < 0) return -1; @@ -336,7 +336,7 @@ static int check_per_pkg(struct evsel *counter, * On multi-die system, die_id > 0. On no-die system, die_id = 0. * We use hashmap(socket, die) to check the used socket+die pair. */ - d = cpu_map__get_die(cpus, cpu, NULL).die; + d = cpu__get_die_id(cpu); if (d < 0) return -1; @@ -345,9 +345,10 @@ static int check_per_pkg(struct evsel *counter, return -ENOMEM; *key = (uint64_t)d << 32 | s; - if (hashmap__find(mask, (void *)key, NULL)) + if (hashmap__find(mask, (void *)key, NULL)) { *skip = true; - else + free(key); + } else ret = hashmap__add(mask, (void *)key, (void *)1); return ret; @@ -355,14 +356,14 @@ static int check_per_pkg(struct evsel *counter, static int process_counter_values(struct perf_stat_config *config, struct evsel *evsel, - int cpu, int thread, + int cpu_map_idx, int thread, struct perf_counts_values *count) { struct perf_counts_values *aggr = &evsel->counts->aggr; static struct perf_counts_values zero; bool skip = false; - if (check_per_pkg(evsel, count, cpu, &skip)) { + if (check_per_pkg(evsel, count, cpu_map_idx, &skip)) { pr_err("failed to read per-pkg counter\n"); return -1; } @@ -378,11 +379,11 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel, case AGGR_NODE: case AGGR_NONE: if (!evsel->snapshot) - evsel__compute_deltas(evsel, cpu, thread, count); + evsel__compute_deltas(evsel, cpu_map_idx, thread, count); perf_counts_values__scale(count, config->scale, NULL); if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) { perf_stat__update_shadow_stats(evsel, count->val, - cpu, &rt_stat); + cpu_map_idx, &rt_stat); } if (config->aggr_mode == AGGR_THREAD) { @@ -411,15 +412,15 @@ static int process_counter_maps(struct perf_stat_config *config, { int nthreads = perf_thread_map__nr(counter->core.threads); int ncpus = evsel__nr_cpus(counter); - int cpu, thread; + int idx, thread; if (counter->core.system_wide) nthreads = 1; for (thread = 0; thread < nthreads; thread++) { - for (cpu = 0; cpu < ncpus; cpu++) { - if (process_counter_values(config, counter, cpu, thread, - perf_counts(counter->counts, cpu, thread))) + for (idx = 0; idx < ncpus; idx++) { + if (process_counter_values(config, counter, idx, thread, + perf_counts(counter->counts, idx, thread))) return -1; } } @@ -531,7 +532,7 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp) int create_perf_stat_counter(struct evsel *evsel, struct perf_stat_config *config, struct target *target, - int cpu) + int cpu_map_idx) { struct perf_event_attr *attr = &evsel->core.attr; struct evsel *leader = evsel__leader(evsel); @@ -585,7 +586,7 @@ int create_perf_stat_counter(struct evsel *evsel, } if (target__has_cpu(target) && !target__has_per_thread(target)) - return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu); + return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu_map_idx); return evsel__open_per_thread(evsel, evsel->core.threads); } diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h index 32c8527de347..335d19cc3063 100644 --- a/tools/perf/util/stat.h +++ b/tools/perf/util/stat.h @@ -108,8 +108,7 @@ struct runtime_stat { struct rblist value_list; }; -typedef struct aggr_cpu_id (*aggr_get_id_t)(struct perf_stat_config *config, - struct perf_cpu_map *m, int cpu); +typedef struct aggr_cpu_id (*aggr_get_id_t)(struct perf_stat_config *config, struct perf_cpu cpu); struct perf_stat_config { enum aggr_mode aggr_mode; @@ -209,7 +208,7 @@ void perf_stat__init_shadow_stats(void); void perf_stat__reset_shadow_stats(void); void perf_stat__reset_shadow_per_stat(struct runtime_stat *st); void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, - int cpu, struct runtime_stat *st); + int cpu_map_idx, struct runtime_stat *st); struct perf_stat_output_ctx { void *ctx; print_metric_t print_metric; @@ -249,10 +248,10 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp); int create_perf_stat_counter(struct evsel *evsel, struct perf_stat_config *config, struct target *target, - int cpu); + int cpu_map_idx); void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *config, struct target *_target, struct timespec *ts, int argc, const char **argv); struct metric_expr; -double test_generic_metric(struct metric_expr *mexp, int cpu, struct runtime_stat *st); +double test_generic_metric(struct metric_expr *mexp, int cpu_map_idx, struct runtime_stat *st); #endif diff --git a/tools/perf/util/svghelper.c b/tools/perf/util/svghelper.c index 96f941e01681..4c9f211249db 100644 --- a/tools/perf/util/svghelper.c +++ b/tools/perf/util/svghelper.c @@ -728,7 +728,7 @@ static int str_to_bitmap(char *s, cpumask_t *b, int nr_cpus) int i; int ret = 0; struct perf_cpu_map *m; - int c; + struct perf_cpu c; m = perf_cpu_map__new(s); if (!m) @@ -736,12 +736,12 @@ static int str_to_bitmap(char *s, cpumask_t *b, int nr_cpus) for (i = 0; i < m->nr; i++) { c = m->map[i]; - if (c >= nr_cpus) { + if (c.cpu >= nr_cpus) { ret = -1; break; } - set_bit(c, cpumask_bits(b)); + set_bit(c.cpu, cpumask_bits(b)); } perf_cpu_map__put(m); diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 198982109f0f..c9ba8050cc2b 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1191,7 +1191,7 @@ static void synthesize_cpus(struct cpu_map_entries *cpus, cpus->nr = map->nr; for (i = 0; i < map->nr; i++) - cpus->cpu[i] = map->map[i]; + cpus->cpu[i] = map->map[i].cpu; } static void synthesize_mask(struct perf_record_record_cpu_map *mask, @@ -1203,7 +1203,7 @@ static void synthesize_mask(struct perf_record_record_cpu_map *mask, mask->long_size = sizeof(long); for (i = 0; i < map->nr; i++) - set_bit(map->map[i], mask->mask); + set_bit(map->map[i].cpu, mask->mask); } static size_t cpus_size(struct perf_cpu_map *map) @@ -1219,7 +1219,7 @@ static size_t mask_size(struct perf_cpu_map *map, int *max) for (i = 0; i < map->nr; i++) { /* bit position of the cpu is + 1 */ - int bit = map->map[i] + 1; + int bit = map->map[i].cpu + 1; if (bit > *max) *max = bit; @@ -1354,7 +1354,7 @@ int perf_event__synthesize_stat_config(struct perf_tool *tool, } int perf_event__synthesize_stat(struct perf_tool *tool, - u32 cpu, u32 thread, u64 id, + struct perf_cpu cpu, u32 thread, u64 id, struct perf_counts_values *count, perf_event__handler_t process, struct machine *machine) @@ -1366,7 +1366,7 @@ int perf_event__synthesize_stat(struct perf_tool *tool, event.header.misc = 0; event.id = id; - event.cpu = cpu; + event.cpu = cpu.cpu; event.thread = thread; event.val = count->val; event.ena = count->ena; @@ -1763,7 +1763,7 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_ } e->idx = sid->idx; - e->cpu = sid->cpu; + e->cpu = sid->cpu.cpu; e->tid = sid->tid; } } diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h index c931433bacbf..78a0450db164 100644 --- a/tools/perf/util/synthetic-events.h +++ b/tools/perf/util/synthetic-events.h @@ -6,6 +6,7 @@ #include <sys/types.h> // pid_t #include <linux/compiler.h> #include <linux/types.h> +#include <perf/cpumap.h> struct auxtrace_record; struct dso; @@ -63,7 +64,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); int perf_event__synthesize_stat_round(struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); -int perf_event__synthesize_stat(struct perf_tool *tool, u32 cpu, u32 thread, u64 id, struct perf_counts_values *count, perf_event__handler_t process, struct machine *machine); +int perf_event__synthesize_stat(struct perf_tool *tool, struct perf_cpu cpu, u32 thread, u64 id, struct perf_counts_values *count, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_thread_map2(struct perf_tool *tool, struct perf_thread_map *threads, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_thread_map(struct perf_tool *tool, struct perf_thread_map *threads, perf_event__handler_t process, struct machine *machine, bool needs_mmap, bool mmap_data); int perf_event__synthesize_threads(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine, bool needs_mmap, bool mmap_data, unsigned int nr_threads_synthesize); diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index df3c4671be72..fb4f6616b5fa 100644 --- a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -416,3 +416,18 @@ char *perf_exe(char *buf, int len) } return strcpy(buf, "perf"); } + +void perf_debuginfod_setup(struct perf_debuginfod *di) +{ + /* + * By default '!di->set' we clear DEBUGINFOD_URLS, so debuginfod + * processing is not triggered, otherwise we set it to 'di->urls' + * value. If 'di->urls' is "system" we keep DEBUGINFOD_URLS value. + */ + if (!di->set) + setenv("DEBUGINFOD_URLS", "", 1); + else if (di->urls && strcmp(di->urls, "system")) + setenv("DEBUGINFOD_URLS", di->urls, 1); + + pr_debug("DEBUGINFOD_URLS=%s\n", getenv("DEBUGINFOD_URLS")); +} diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h index 9f0d36ba77f2..7b625cbd2dd8 100644 --- a/tools/perf/util/util.h +++ b/tools/perf/util/util.h @@ -11,6 +11,9 @@ #include <stddef.h> #include <linux/compiler.h> #include <sys/types.h> +#ifndef __cplusplus +#include <internal/cpumap.h> +#endif /* General helper functions */ void usage(const char *err) __noreturn; @@ -66,6 +69,12 @@ extern bool test_attr__enabled; void test_attr__ready(void); void test_attr__init(void); struct perf_event_attr; -void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu, +void test_attr__open(struct perf_event_attr *attr, pid_t pid, struct perf_cpu cpu, int fd, int group_fd, unsigned long flags); + +struct perf_debuginfod { + const char *urls; + bool set; +}; +void perf_debuginfod_setup(struct perf_debuginfod *di); #endif /* GIT_COMPAT_UTIL_H */ |