diff options
author | Ingo Molnar <mingo@kernel.org> | 2019-05-18 11:24:43 +0300 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2019-05-18 11:24:43 +0300 |
commit | 62e1c09418fc16d27720b128275cac61367e2c1b (patch) | |
tree | 4759aa6662b1398e2b93696ace58f6f309722b06 /tools/perf | |
parent | 01be377c62210a8d8fef35be906f9349591bb7cd (diff) | |
parent | 4fc4d8dfa056dfd48afe73b9ea3b7570ceb80b9c (diff) | |
download | linux-62e1c09418fc16d27720b128275cac61367e2c1b.tar.xz |
Merge tag 'perf-core-for-mingo-5.2-20190517' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf.data:
Alexey Budankov:
- Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED
user space records, resulting in ~3-5x perf.data file size reduction
on variety of tested workloads what saves storage space on larger
server systems where perf.data size can easily reach several tens or
even hundreds of GiBs, especially when profiling with DWARF-based
stacks and tracing of context switches.
perf record:
Arnaldo Carvalho de Melo
- Improve -user-regs/intr-regs suggestions to overcome errors.
perf annotate:
Jin Yao:
- Remove hist__account_cycles() from callback, speeding up branch processing
(perf record -b).
perf stat:
- Add a 'percore' event qualifier, e.g.: -e cpu/event=0,umask=0x3,percore=1/,
that sums up the event counts for both hardware threads in a core.
We can already do this with --per-core, but it's often useful to do
this together with other metrics that are collected per hardware thread.
I.e. now its possible to do this per-event, and have it mixed with other
events not aggregated by core.
core libraries:
Donald Yandt:
- Check for errors when doing fgets(/proc/version).
Jiri Olsa:
- Speed up report for perf compiled with linbunwind.
tools headers:
Arnaldo Carvalho de Melo
- Update memcpy_64.S, x86's kvm.h and pt_regs.h.
arm64:
Florian Fainelli:
- Map Brahma-B53 CPUID to cortex-a53 events.
- Add Cortex-A57 and Cortex-A72 events.
csky:
Mao Han:
- Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work
on the C-SKY arch.
x86:
Andi Kleen/Kan Liang:
- Add support for recording and printing XMM registers, available, for
instance, on Icelake.
Kan Liang:
- Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support.
UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP.
Intel PT:
Adrian Hunter
. Fix instructions sampling rate.
. Timestamp fixes.
. Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste
the trees, useful for e-mailing.
Documentation:
Thomas Richter:
- Add description for 'perf --debug stderr=1', which redirects stderr to stdout.
libtraceevent:
Tzvetomir Stoyanov:
- Add man pages for the various APIs.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'tools/perf')
52 files changed, 1525 insertions, 205 deletions
diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index 138fb6e94b3c..18ed1b0fceb3 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -199,6 +199,18 @@ also be supplied. For example: perf stat -C 0 -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ... +EVENT QUALIFIERS: + +It is also possible to add extra qualifiers to an event: + +percore: + +Sums up the event counts for all hardware threads in a core, e.g.: + + + perf stat -e cpu/event=0,umask=0x3,percore=1/ + + EVENT GROUPS ------------ diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 58986f4cc190..de269430720a 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -406,7 +406,8 @@ symbolic names, e.g. on x86, ax, si. To list the available registers use --intr-regs=ax,bx. The list of register is architecture dependent. --user-regs:: -Capture user registers at sample time. Same arguments as -I. +Similar to -I, but capture user registers at sample time. To list the available +user registers use --user-regs=\?. --running-time:: Record running and enabled time for read events (:S) @@ -478,6 +479,11 @@ Also at some cases executing less output write syscalls with bigger data size can take less time than executing more output write syscalls with smaller data size thus lowering runtime profiling overhead. +-z:: +--compression-level[=n]:: +Produce compressed trace using specified level n (default: 1 - fastest compression, +22 - smallest trace) + --all-kernel:: Configure all used events to run in kernel space. diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 39c05f89104e..1e312c2672e4 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -43,6 +43,10 @@ report:: param1 and param2 are defined as formats for the PMU in /sys/bus/event_source/devices/<pmu>/format/* + 'percore' is a event qualifier that sums up the event counts for both + hardware threads in a core. For example: + perf stat -A -a -e cpu/event,percore=1/,otherevent ... + - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/' where M, N, K are numbers (in decimal, hex, octal format). Acceptable values for each of 'config', 'config1' and 'config2' diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt index 593ef49b273c..6967e9b02be5 100644 --- a/tools/perf/Documentation/perf.data-file-format.txt +++ b/tools/perf/Documentation/perf.data-file-format.txt @@ -272,6 +272,19 @@ struct { Two uint64_t for the time of first sample and the time of last sample. + HEADER_COMPRESSED = 27, + +struct { + u32 version; + u32 type; + u32 level; + u32 ratio; + u32 mmap_len; +}; + +Indicates that trace contains records of PERF_RECORD_COMPRESSED type +that have perf_events records in compressed form. + other bits are reserved and should ignored for now HEADER_FEAT_BITS = 256, @@ -437,6 +450,17 @@ struct auxtrace_error_event { Describes a header feature. These are records used in pipe-mode that contain information that otherwise would be in perf.data file's header. + PERF_RECORD_COMPRESSED = 81, + +struct compressed_event { + struct perf_event_header header; + char data[]; +}; + +The header is followed by compressed data frame that can be decompressed +into array of perf trace records. The size of the entire compressed event +record including the header is limited by the max value of header.size. + Event types Define the event attributes with their IDs. diff --git a/tools/perf/Documentation/perf.txt b/tools/perf/Documentation/perf.txt index 864e37597252..401f0ed67439 100644 --- a/tools/perf/Documentation/perf.txt +++ b/tools/perf/Documentation/perf.txt @@ -22,6 +22,8 @@ OPTIONS verbose - general debug messages ordered-events - ordered events object debug messages data-convert - data convert command debug messages + stderr - write debug output (option -v) to stderr + in browser mode --buildid-dir:: Setup buildid cache directory. It has higher priority than diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/include/perf_regs.h index 7f6d538f8a89..b7cd91a9014f 100644 --- a/tools/perf/arch/x86/include/perf_regs.h +++ b/tools/perf/arch/x86/include/perf_regs.h @@ -8,9 +8,10 @@ void perf_regs_load(u64 *regs); +#define PERF_REGS_MAX PERF_REG_X86_XMM_MAX +#define PERF_XMM_REGS_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) #ifndef HAVE_ARCH_X86_64_SUPPORT #define PERF_REGS_MASK ((1ULL << PERF_REG_X86_32_MAX) - 1) -#define PERF_REGS_MAX PERF_REG_X86_32_MAX #define PERF_SAMPLE_REGS_ABI PERF_SAMPLE_REGS_ABI_32 #else #define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \ @@ -18,7 +19,6 @@ void perf_regs_load(u64 *regs); (1ULL << PERF_REG_X86_FS) | \ (1ULL << PERF_REG_X86_GS)) #define PERF_REGS_MASK (((1ULL << PERF_REG_X86_64_MAX) - 1) & ~REG_NOSUPPORT) -#define PERF_REGS_MAX PERF_REG_X86_64_MAX #define PERF_SAMPLE_REGS_ABI PERF_SAMPLE_REGS_ABI_64 #endif #define PERF_REG_IP PERF_REG_X86_IP @@ -77,6 +77,28 @@ static inline const char *perf_reg_name(int id) case PERF_REG_X86_R15: return "R15"; #endif /* HAVE_ARCH_X86_64_SUPPORT */ + +#define XMM(x) \ + case PERF_REG_X86_XMM ## x: \ + case PERF_REG_X86_XMM ## x + 1: \ + return "XMM" #x; + XMM(0) + XMM(1) + XMM(2) + XMM(3) + XMM(4) + XMM(5) + XMM(6) + XMM(7) + XMM(8) + XMM(9) + XMM(10) + XMM(11) + XMM(12) + XMM(13) + XMM(14) + XMM(15) +#undef XMM default: return NULL; } diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c index fead6b3b4206..7886ca5263e3 100644 --- a/tools/perf/arch/x86/util/perf_regs.c +++ b/tools/perf/arch/x86/util/perf_regs.c @@ -31,6 +31,22 @@ const struct sample_reg sample_reg_masks[] = { SMPL_REG(R14, PERF_REG_X86_R14), SMPL_REG(R15, PERF_REG_X86_R15), #endif + SMPL_REG2(XMM0, PERF_REG_X86_XMM0), + SMPL_REG2(XMM1, PERF_REG_X86_XMM1), + SMPL_REG2(XMM2, PERF_REG_X86_XMM2), + SMPL_REG2(XMM3, PERF_REG_X86_XMM3), + SMPL_REG2(XMM4, PERF_REG_X86_XMM4), + SMPL_REG2(XMM5, PERF_REG_X86_XMM5), + SMPL_REG2(XMM6, PERF_REG_X86_XMM6), + SMPL_REG2(XMM7, PERF_REG_X86_XMM7), + SMPL_REG2(XMM8, PERF_REG_X86_XMM8), + SMPL_REG2(XMM9, PERF_REG_X86_XMM9), + SMPL_REG2(XMM10, PERF_REG_X86_XMM10), + SMPL_REG2(XMM11, PERF_REG_X86_XMM11), + SMPL_REG2(XMM12, PERF_REG_X86_XMM12), + SMPL_REG2(XMM13, PERF_REG_X86_XMM13), + SMPL_REG2(XMM14, PERF_REG_X86_XMM14), + SMPL_REG2(XMM15, PERF_REG_X86_XMM15), SMPL_REG_END }; @@ -254,3 +270,31 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op) return SDT_ARG_VALID; } + +uint64_t arch__intr_reg_mask(void) +{ + struct perf_event_attr attr = { + .type = PERF_TYPE_HARDWARE, + .config = PERF_COUNT_HW_CPU_CYCLES, + .sample_type = PERF_SAMPLE_REGS_INTR, + .sample_regs_intr = PERF_XMM_REGS_MASK, + .precise_ip = 1, + .disabled = 1, + .exclude_kernel = 1, + }; + int fd; + /* + * In an unnamed union, init it here to build on older gcc versions + */ + attr.sample_period = 1; + + event_attr_init(&attr); + + fd = sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd != -1) { + close(fd); + return (PERF_XMM_REGS_MASK | PERF_REGS_MASK); + } + + return PERF_REGS_MASK; +} diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 67f9d9ffacfb..77deb3a40596 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -159,8 +159,6 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, struct perf_evsel *evsel = iter->evsel; int err; - hist__account_cycles(sample->branch_stack, al, sample, false); - bi = he->branch_info; err = addr_map_symbol__inc_samples(&bi->from, sample, evsel); @@ -199,6 +197,8 @@ static int process_branch_callback(struct perf_evsel *evsel, if (a.map != NULL) a.map->dso->hit = 1; + hist__account_cycles(sample->branch_stack, al, sample, false); + ret = hist_entry_iter__add(&iter, &a, PERF_MAX_STACK_DEPTH, ann); return ret; } diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index 24086b7f1b14..8e0e06d3edfc 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -837,6 +837,9 @@ int cmd_inject(int argc, const char **argv) if (inject.session == NULL) return -1; + if (zstd_init(&(inject.session->zstd_data), 0) < 0) + pr_warning("Decompression initialization failed.\n"); + if (inject.build_ids) { /* * to make sure the mmap records are ordered correctly @@ -867,6 +870,7 @@ int cmd_inject(int argc, const char **argv) ret = __cmd_inject(&inject); out_delete: + zstd_fini(&(inject.session->zstd_data)); perf_session__delete(inject.session); return ret; } diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index c5e10552776a..e2c3a585a61e 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -133,6 +133,11 @@ static int record__write(struct record *rec, struct perf_mmap *map __maybe_unuse return 0; } +static int record__aio_enabled(struct record *rec); +static int record__comp_enabled(struct record *rec); +static size_t zstd_compress(struct perf_session *session, void *dst, size_t dst_size, + void *src, size_t src_size); + #ifdef HAVE_AIO_SUPPORT static int record__aio_write(struct aiocb *cblock, int trace_fd, void *buf, size_t size, off_t off) @@ -183,9 +188,9 @@ static int record__aio_complete(struct perf_mmap *md, struct aiocb *cblock) if (rem_size == 0) { cblock->aio_fildes = -1; /* - * md->refcount is incremented in perf_mmap__push() for - * every enqueued aio write request so decrement it because - * the request is now complete. + * md->refcount is incremented in record__aio_pushfn() for + * every aio write request started in record__aio_push() so + * decrement it because the request is now complete. */ perf_mmap__put(md); rc = 1; @@ -240,18 +245,89 @@ static int record__aio_sync(struct perf_mmap *md, bool sync_all) } while (1); } -static int record__aio_pushfn(void *to, struct aiocb *cblock, void *bf, size_t size, off_t off) +struct record_aio { + struct record *rec; + void *data; + size_t size; +}; + +static int record__aio_pushfn(struct perf_mmap *map, void *to, void *buf, size_t size) { - struct record *rec = to; - int ret, trace_fd = rec->session->data->file.fd; + struct record_aio *aio = to; - rec->samples++; + /* + * map->base data pointed by buf is copied into free map->aio.data[] buffer + * to release space in the kernel buffer as fast as possible, calling + * perf_mmap__consume() from perf_mmap__push() function. + * + * That lets the kernel to proceed with storing more profiling data into + * the kernel buffer earlier than other per-cpu kernel buffers are handled. + * + * Coping can be done in two steps in case the chunk of profiling data + * crosses the upper bound of the kernel buffer. In this case we first move + * part of data from map->start till the upper bound and then the reminder + * from the beginning of the kernel buffer till the end of the data chunk. + */ + + if (record__comp_enabled(aio->rec)) { + size = zstd_compress(aio->rec->session, aio->data + aio->size, + perf_mmap__mmap_len(map) - aio->size, + buf, size); + } else { + memcpy(aio->data + aio->size, buf, size); + } + + if (!aio->size) { + /* + * Increment map->refcount to guard map->aio.data[] buffer + * from premature deallocation because map object can be + * released earlier than aio write request started on + * map->aio.data[] buffer is complete. + * + * perf_mmap__put() is done at record__aio_complete() + * after started aio request completion or at record__aio_push() + * if the request failed to start. + */ + perf_mmap__get(map); + } + + aio->size += size; + + return size; +} - ret = record__aio_write(cblock, trace_fd, bf, size, off); +static int record__aio_push(struct record *rec, struct perf_mmap *map, off_t *off) +{ + int ret, idx; + int trace_fd = rec->session->data->file.fd; + struct record_aio aio = { .rec = rec, .size = 0 }; + + /* + * Call record__aio_sync() to wait till map->aio.data[] buffer + * becomes available after previous aio write operation. + */ + + idx = record__aio_sync(map, false); + aio.data = map->aio.data[idx]; + ret = perf_mmap__push(map, &aio, record__aio_pushfn); + if (ret != 0) /* ret > 0 - no data, ret < 0 - error */ + return ret; + + rec->samples++; + ret = record__aio_write(&(map->aio.cblocks[idx]), trace_fd, aio.data, aio.size, *off); if (!ret) { - rec->bytes_written += size; + *off += aio.size; + rec->bytes_written += aio.size; if (switch_output_size(rec)) trigger_hit(&switch_output_trigger); + } else { + /* + * Decrement map->refcount incremented in record__aio_pushfn() + * back if record__aio_write() operation failed to start, otherwise + * map->refcount is decremented in record__aio_complete() after + * aio write operation finishes successfully. + */ + perf_mmap__put(map); } return ret; @@ -273,7 +349,7 @@ static void record__aio_mmap_read_sync(struct record *rec) struct perf_evlist *evlist = rec->evlist; struct perf_mmap *maps = evlist->mmap; - if (!rec->opts.nr_cblocks) + if (!record__aio_enabled(rec)) return; for (i = 0; i < evlist->nr_mmaps; i++) { @@ -307,13 +383,8 @@ static int record__aio_parse(const struct option *opt, #else /* HAVE_AIO_SUPPORT */ static int nr_cblocks_max = 0; -static int record__aio_sync(struct perf_mmap *md __maybe_unused, bool sync_all __maybe_unused) -{ - return -1; -} - -static int record__aio_pushfn(void *to __maybe_unused, struct aiocb *cblock __maybe_unused, - void *bf __maybe_unused, size_t size __maybe_unused, off_t off __maybe_unused) +static int record__aio_push(struct record *rec __maybe_unused, struct perf_mmap *map __maybe_unused, + off_t *off __maybe_unused) { return -1; } @@ -372,6 +443,32 @@ static int record__mmap_flush_parse(const struct option *opt, return 0; } +#ifdef HAVE_ZSTD_SUPPORT +static unsigned int comp_level_default = 1; + +static int record__parse_comp_level(const struct option *opt, const char *str, int unset) +{ + struct record_opts *opts = opt->value; + + if (unset) { + opts->comp_level = 0; + } else { + if (str) + opts->comp_level = strtol(str, NULL, 0); + if (!opts->comp_level) + opts->comp_level = comp_level_default; + } + + return 0; +} +#endif +static unsigned int comp_level_max = 22; + +static int record__comp_enabled(struct record *rec) +{ + return rec->opts.comp_level > 0; +} + static int process_synthesized_event(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample __maybe_unused, @@ -385,6 +482,11 @@ static int record__pushfn(struct perf_mmap *map, void *to, void *bf, size_t size { struct record *rec = to; + if (record__comp_enabled(rec)) { + size = zstd_compress(rec->session, map->data, perf_mmap__mmap_len(map), bf, size); + bf = map->data; + } + rec->samples++; return record__write(rec, map, bf, size); } @@ -582,7 +684,7 @@ static int record__mmap_evlist(struct record *rec, opts->auxtrace_mmap_pages, opts->auxtrace_snapshot_mode, opts->nr_cblocks, opts->affinity, - opts->mmap_flush) < 0) { + opts->mmap_flush, opts->comp_level) < 0) { if (errno == EPERM) { pr_err("Permission error mapping pages.\n" "Consider increasing " @@ -771,6 +873,37 @@ static void record__adjust_affinity(struct record *rec, struct perf_mmap *map) } } +static size_t process_comp_header(void *record, size_t increment) +{ + struct compressed_event *event = record; + size_t size = sizeof(*event); + + if (increment) { + event->header.size += increment; + return increment; + } + + event->header.type = PERF_RECORD_COMPRESSED; + event->header.size = size; + + return size; +} + +static size_t zstd_compress(struct perf_session *session, void *dst, size_t dst_size, + void *src, size_t src_size) +{ + size_t compressed; + size_t max_record_size = PERF_SAMPLE_MAX_SIZE - sizeof(struct compressed_event) - 1; + + compressed = zstd_compress_stream_to_records(&session->zstd_data, dst, dst_size, src, src_size, + max_record_size, process_comp_header); + + session->bytes_transferred += src_size; + session->bytes_compressed += compressed; + + return compressed; +} + static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist, bool overwrite, bool synch) { @@ -779,7 +912,7 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli int rc = 0; struct perf_mmap *maps; int trace_fd = rec->data.file.fd; - off_t off; + off_t off = 0; if (!evlist) return 0; @@ -805,20 +938,14 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli map->flush = 1; } if (!record__aio_enabled(rec)) { - if (perf_mmap__push(map, rec, record__pushfn) != 0) { + if (perf_mmap__push(map, rec, record__pushfn) < 0) { if (synch) map->flush = flush; rc = -1; goto out; } } else { - int idx; - /* - * Call record__aio_sync() to wait till map->data buffer - * becomes available after previous aio write request. - */ - idx = record__aio_sync(map, false); - if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) { + if (record__aio_push(rec, map, &off) < 0) { record__aio_set_pos(trace_fd, off); if (synch) map->flush = flush; @@ -888,6 +1015,8 @@ static void record__init_features(struct record *rec) perf_header__clear_feat(&session->header, HEADER_CLOCKID); perf_header__clear_feat(&session->header, HEADER_DIR_FORMAT); + if (!record__comp_enabled(rec)) + perf_header__clear_feat(&session->header, HEADER_COMPRESSED); perf_header__clear_feat(&session->header, HEADER_STAT); } @@ -1186,6 +1315,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) bool disabled = false, draining = false; struct perf_evlist *sb_evlist = NULL; int fd; + float ratio = 0; atexit(record__sig_exit); signal(SIGCHLD, sig_handler); @@ -1215,6 +1345,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) fd = perf_data__fd(data); rec->session = session; + if (zstd_init(&session->zstd_data, rec->opts.comp_level) < 0) { + pr_err("Compression initialization failed.\n"); + return -1; + } + + session->header.env.comp_type = PERF_COMP_ZSTD; + session->header.env.comp_level = rec->opts.comp_level; + record__init_features(rec); if (rec->opts.use_clockid && rec->opts.clockid_res_ns) @@ -1244,6 +1382,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) err = -1; goto out_child; } + session->header.env.comp_mmap_len = session->evlist->mmap_len; err = bpf__apply_obj_config(); if (err) { @@ -1491,6 +1630,11 @@ out_child: record__mmap_read_all(rec, true); record__aio_mmap_read_sync(rec); + if (rec->session->bytes_transferred && rec->session->bytes_compressed) { + ratio = (float)rec->session->bytes_transferred/(float)rec->session->bytes_compressed; + session->header.env.comp_ratio = ratio + 0.5; + } + if (forks) { int exit_status; @@ -1537,12 +1681,19 @@ out_child: else samples[0] = '\0'; - fprintf(stderr, "[ perf record: Captured and wrote %.3f MB %s%s%s ]\n", + fprintf(stderr, "[ perf record: Captured and wrote %.3f MB %s%s%s", perf_data__size(data) / 1024.0 / 1024.0, data->path, postfix, samples); + if (ratio) { + fprintf(stderr, ", compressed (original %.3f MB, ratio is %.3f)", + rec->session->bytes_transferred / 1024.0 / 1024.0, + ratio); + } + fprintf(stderr, " ]\n"); } out_delete_session: + zstd_fini(&session->zstd_data); perf_session__delete(session); if (!opts->no_bpf_event) @@ -2017,10 +2168,10 @@ static struct option __record_options[] = { "use per-thread mmaps"), OPT_CALLBACK_OPTARG('I', "intr-regs", &record.opts.sample_intr_regs, NULL, "any register", "sample selected machine registers on interrupt," - " use -I ? to list register names", parse_regs), + " use '-I?' to list register names", parse_intr_regs), OPT_CALLBACK_OPTARG(0, "user-regs", &record.opts.sample_user_regs, NULL, "any register", "sample selected machine registers on interrupt," - " use -I ? to list register names", parse_regs), + " use '--user-regs=?' to list register names", parse_user_regs), OPT_BOOLEAN(0, "running-time", &record.opts.running_time, "Record running/enabled time of read (:S) events"), OPT_CALLBACK('k', "clockid", &record.opts, @@ -2068,6 +2219,11 @@ static struct option __record_options[] = { OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu", "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer", record__parse_affinity), +#ifdef HAVE_ZSTD_SUPPORT + OPT_CALLBACK_OPTARG('z', "compression-level", &record.opts, &comp_level_default, + "n", "Compressed records using specified level (default: 1 - fastest compression, 22 - greatest compression)", + record__parse_comp_level), +#endif OPT_END() }; @@ -2127,6 +2283,12 @@ int cmd_record(int argc, const char **argv) "cgroup monitoring only available in system-wide mode"); } + + if (rec->opts.comp_level != 0) { + pr_debug("Compression enabled, disabling build id collection at the end of the session.\n"); + rec->no_buildid = true; + } + if (rec->opts.record_switch_events && !perf_can_record_switch_events()) { ui__error("kernel does not support recording context switch events\n"); @@ -2272,12 +2434,15 @@ int cmd_record(int argc, const char **argv) if (rec->opts.nr_cblocks > nr_cblocks_max) rec->opts.nr_cblocks = nr_cblocks_max; - if (verbose > 0) - pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks); + pr_debug("nr_cblocks: %d\n", rec->opts.nr_cblocks); pr_debug("affinity: %s\n", affinity_tags[rec->opts.affinity]); pr_debug("mmap flush: %d\n", rec->opts.mmap_flush); + if (rec->opts.comp_level > comp_level_max) + rec->opts.comp_level = comp_level_max; + pr_debug("comp level: %d\n", rec->opts.comp_level); + err = __cmd_record(&record, argc, argv); out: perf_evlist__delete(rec->evlist); diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 4054eb1f98ac..1ca533f06a4c 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -136,9 +136,6 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter, if (!ui__has_annotation() && !rep->symbol_ipc) return 0; - hist__account_cycles(sample->branch_stack, al, sample, - rep->nonany_branch_mode); - if (sort__mode == SORT_MODE__BRANCH) { bi = he->branch_info; err = addr_map_symbol__inc_samples(&bi->from, sample, evsel); @@ -181,9 +178,6 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, if (!ui__has_annotation() && !rep->symbol_ipc) return 0; - hist__account_cycles(sample->branch_stack, al, sample, - rep->nonany_branch_mode); - bi = he->branch_info; err = addr_map_symbol__inc_samples(&bi->from, sample, evsel); if (err) @@ -282,6 +276,11 @@ static int process_sample_event(struct perf_tool *tool, if (al.map != NULL) al.map->dso->hit = 1; + if (ui__has_annotation() || rep->symbol_ipc) { + hist__account_cycles(sample->branch_stack, &al, sample, + rep->nonany_branch_mode); + } + ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep); if (ret < 0) pr_debug("problem adding hist entry, skipping event\n"); @@ -1259,6 +1258,9 @@ repeat: if (session == NULL) return -1; + if (zstd_init(&(session->zstd_data), 0) < 0) + pr_warning("Decompression initialization failed. Reported data may be incomplete.\n"); + if (report.queue_size) { ordered_events__set_alloc_size(&session->ordered_events, report.queue_size); @@ -1449,7 +1451,7 @@ repeat: error: if (report.ptime_range) zfree(&report.ptime_range); - + zstd_fini(&(session->zstd_data)); perf_session__delete(session); return ret; } diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index a3c060878faa..24b8e690fb69 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -847,6 +847,18 @@ static int perf_stat__get_core_cached(struct perf_stat_config *config, return perf_stat__get_aggr(config, perf_stat__get_core, map, idx); } +static bool term_percore_set(void) +{ + struct perf_evsel *counter; + + evlist__for_each_entry(evsel_list, counter) { + if (counter->percore) + return true; + } + + return false; +} + static int perf_stat_init_aggr_mode(void) { int nr; @@ -867,6 +879,15 @@ static int perf_stat_init_aggr_mode(void) stat_config.aggr_get_id = perf_stat__get_core_cached; break; case AGGR_NONE: + if (term_percore_set()) { + if (cpu_map__build_core_map(evsel_list->cpus, + &stat_config.aggr_map)) { + perror("cannot build core map"); + return -1; + } + stat_config.aggr_get_id = perf_stat__get_core_cached; + } + break; case AGGR_GLOBAL: case AGGR_THREAD: case AGGR_UNSET: diff --git a/tools/perf/perf.h b/tools/perf/perf.h index 369eae61068d..d59dee61b64d 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -86,6 +86,7 @@ struct record_opts { int nr_cblocks; int affinity; int mmap_flush; + unsigned int comp_level; }; enum perf_affinity { diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a57-a72/core-imp-def.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a57-a72/core-imp-def.json new file mode 100644 index 000000000000..0ac9b7927450 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a57-a72/core-imp-def.json @@ -0,0 +1,179 @@ +[ + { + "ArchStdEvent": "L1D_CACHE_RD", + }, + { + "ArchStdEvent": "L1D_CACHE_WR", + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_RD", + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_WR", + }, + { + "ArchStdEvent": "L1D_CACHE_WB_VICTIM", + }, + { + "ArchStdEvent": "L1D_CACHE_WB_CLEAN", + }, + { + "ArchStdEvent": "L1D_CACHE_INVAL", + }, + { + "ArchStdEvent": "L1D_TLB_REFILL_RD", + }, + { + "ArchStdEvent": "L1D_TLB_REFILL_WR", + }, + { + "ArchStdEvent": "L2D_CACHE_RD", + }, + { + "ArchStdEvent": "L2D_CACHE_WR", + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL_RD", + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL_WR", + }, + { + "ArchStdEvent": "L2D_CACHE_WB_VICTIM", + }, + { + "ArchStdEvent": "L2D_CACHE_WB_CLEAN", + }, + { + "ArchStdEvent": "L2D_CACHE_INVAL", + }, + { + "ArchStdEvent": "BUS_ACCESS_RD", + }, + { + "ArchStdEvent": "BUS_ACCESS_WR", + }, + { + "ArchStdEvent": "BUS_ACCESS_SHARED", + }, + { + "ArchStdEvent": "BUS_ACCESS_NOT_SHARED", + }, + { + "ArchStdEvent": "BUS_ACCESS_NORMAL", + }, + { + "ArchStdEvent": "BUS_ACCESS_PERIPH", + }, + { + "ArchStdEvent": "MEM_ACCESS_RD", + }, + { + "ArchStdEvent": "MEM_ACCESS_WR", + }, + { + "ArchStdEvent": "UNALIGNED_LD_SPEC", + }, + { + "ArchStdEvent": "UNALIGNED_ST_SPEC", + }, + { + "ArchStdEvent": "UNALIGNED_LDST_SPEC", + }, + { + "ArchStdEvent": "LDREX_SPEC", + }, + { + "ArchStdEvent": "STREX_PASS_SPEC", + }, + { + "ArchStdEvent": "STREX_FAIL_SPEC", + }, + { + "ArchStdEvent": "LD_SPEC", + }, + { + "ArchStdEvent": "ST_SPEC", + }, + { + "ArchStdEvent": "LDST_SPEC", + }, + { + "ArchStdEvent": "DP_SPEC", + }, + { + "ArchStdEvent": "ASE_SPEC", + }, + { + "ArchStdEvent": "VFP_SPEC", + }, + { + "ArchStdEvent": "PC_WRITE_SPEC", + }, + { + "ArchStdEvent": "CRYPTO_SPEC", + }, + { + "ArchStdEvent": "BR_IMMED_SPEC", + }, + { + "ArchStdEvent": "BR_RETURN_SPEC", + }, + { + "ArchStdEvent": "BR_INDIRECT_SPEC", + }, + { + "ArchStdEvent": "ISB_SPEC", + }, + { + "ArchStdEvent": "DSB_SPEC", + }, + { + "ArchStdEvent": "DMB_SPEC", + }, + { + "ArchStdEvent": "EXC_UNDEF", + }, + { + "ArchStdEvent": "EXC_SVC", + }, + { + "ArchStdEvent": "EXC_PABORT", + }, + { + "ArchStdEvent": "EXC_DABORT", + }, + { + "ArchStdEvent": "EXC_IRQ", + }, + { + "ArchStdEvent": "EXC_FIQ", + }, + { + "ArchStdEvent": "EXC_SMC", + }, + { + "ArchStdEvent": "EXC_HVC", + }, + { + "ArchStdEvent": "EXC_TRAP_PABORT", + }, + { + "ArchStdEvent": "EXC_TRAP_DABORT", + }, + { + "ArchStdEvent": "EXC_TRAP_OTHER", + }, + { + "ArchStdEvent": "EXC_TRAP_IRQ", + }, + { + "ArchStdEvent": "EXC_TRAP_FIQ", + }, + { + "ArchStdEvent": "RC_LD_SPEC", + }, + { + "ArchStdEvent": "RC_ST_SPEC", + }, +] diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv index 59cd8604b0bd..927fcddcb4aa 100644 --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv @@ -12,7 +12,10 @@ # # #Family-model,Version,Filename,EventType -0x00000000410fd03[[:xdigit:]],v1,arm/cortex-a53,core +0x00000000410fd030,v1,arm/cortex-a53,core +0x00000000420f1000,v1,arm/cortex-a53,core +0x00000000410fd070,v1,arm/cortex-a57-a72,core +0x00000000410fd080,v1,arm/cortex-a57-a72,core 0x00000000420f5160,v1,cavium/thunderx2,core 0x00000000430f0af0,v1,cavium/thunderx2,core 0x00000000480fd010,v1,hisilicon/hip08,core diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c index 68c92bb599ee..58f77fd0f59f 100644 --- a/tools/perf/pmu-events/jevents.c +++ b/tools/perf/pmu-events/jevents.c @@ -235,6 +235,7 @@ static struct map { { "iMPH-U", "uncore_arb" }, { "CPU-M-CF", "cpum_cf" }, { "CPU-M-SF", "cpum_sf" }, + { "UPI LL", "uncore_upi" }, {} }; @@ -414,7 +415,6 @@ static int save_arch_std_events(void *data, char *name, char *event, char *metric_name, char *metric_group) { struct event_struct *es; - struct stat *sb = data; es = malloc(sizeof(*es)); if (!es) diff --git a/tools/perf/scripts/python/exported-sql-viewer.py b/tools/perf/scripts/python/exported-sql-viewer.py index 74ef92f1d19a..affed7d149be 100755 --- a/tools/perf/scripts/python/exported-sql-viewer.py +++ b/tools/perf/scripts/python/exported-sql-viewer.py @@ -456,6 +456,10 @@ class CallGraphLevelItemBase(object): self.query_done = False; self.child_count = 0 self.child_items = [] + if parent_item: + self.level = parent_item.level + 1 + else: + self.level = 0 def getChildItem(self, row): return self.child_items[row] @@ -877,9 +881,14 @@ class TreeWindowBase(QMdiSubWindow): super(TreeWindowBase, self).__init__(parent) self.model = None - self.view = None self.find_bar = None + self.view = QTreeView() + self.view.setSelectionMode(QAbstractItemView.ContiguousSelection) + self.view.CopyCellsToClipboard = CopyTreeCellsToClipboard + + self.context_menu = TreeContextMenu(self.view) + def DisplayFound(self, ids): if not len(ids): return False @@ -921,7 +930,6 @@ class CallGraphWindow(TreeWindowBase): self.model = LookupCreateModel("Context-Sensitive Call Graph", lambda x=glb: CallGraphModel(x)) - self.view = QTreeView() self.view.setModel(self.model) for c, w in ((0, 250), (1, 100), (2, 60), (3, 70), (4, 70), (5, 100)): @@ -944,7 +952,6 @@ class CallTreeWindow(TreeWindowBase): self.model = LookupCreateModel("Call Tree", lambda x=glb: CallTreeModel(x)) - self.view = QTreeView() self.view.setModel(self.model) for c, w in ((0, 230), (1, 100), (2, 100), (3, 70), (4, 70), (5, 100)): @@ -1649,10 +1656,14 @@ class BranchWindow(QMdiSubWindow): self.view = QTreeView() self.view.setUniformRowHeights(True) + self.view.setSelectionMode(QAbstractItemView.ContiguousSelection) + self.view.CopyCellsToClipboard = CopyTreeCellsToClipboard self.view.setModel(self.model) self.ResizeColumnsToContents() + self.context_menu = TreeContextMenu(self.view) + self.find_bar = FindBar(self, self, True) self.finder = ChildDataItemFinder(self.model.root) @@ -2261,6 +2272,240 @@ class ResizeColumnsToContentsBase(QObject): self.data_model.rowsInserted.disconnect(self.UpdateColumnWidths) self.ResizeColumnsToContents() +# Convert value to CSV + +def ToCSValue(val): + if '"' in val: + val = val.replace('"', '""') + if "," in val or '"' in val: + val = '"' + val + '"' + return val + +# Key to sort table model indexes by row / column, assuming fewer than 1000 columns + +glb_max_cols = 1000 + +def RowColumnKey(a): + return a.row() * glb_max_cols + a.column() + +# Copy selected table cells to clipboard + +def CopyTableCellsToClipboard(view, as_csv=False, with_hdr=False): + indexes = sorted(view.selectedIndexes(), key=RowColumnKey) + idx_cnt = len(indexes) + if not idx_cnt: + return + if idx_cnt == 1: + with_hdr=False + min_row = indexes[0].row() + max_row = indexes[0].row() + min_col = indexes[0].column() + max_col = indexes[0].column() + for i in indexes: + min_row = min(min_row, i.row()) + max_row = max(max_row, i.row()) + min_col = min(min_col, i.column()) + max_col = max(max_col, i.column()) + if max_col > glb_max_cols: + raise RuntimeError("glb_max_cols is too low") + max_width = [0] * (1 + max_col - min_col) + for i in indexes: + c = i.column() - min_col + max_width[c] = max(max_width[c], len(str(i.data()))) + text = "" + pad = "" + sep = "" + if with_hdr: + model = indexes[0].model() + for col in range(min_col, max_col + 1): + val = model.headerData(col, Qt.Horizontal) + if as_csv: + text += sep + ToCSValue(val) + sep = "," + else: + c = col - min_col + max_width[c] = max(max_width[c], len(val)) + width = max_width[c] + align = model.headerData(col, Qt.Horizontal, Qt.TextAlignmentRole) + if align & Qt.AlignRight: + val = val.rjust(width) + text += pad + sep + val + pad = " " * (width - len(val)) + sep = " " + text += "\n" + pad = "" + sep = "" + last_row = min_row + for i in indexes: + if i.row() > last_row: + last_row = i.row() + text += "\n" + pad = "" + sep = "" + if as_csv: + text += sep + ToCSValue(str(i.data())) + sep = "," + else: + width = max_width[i.column() - min_col] + if i.data(Qt.TextAlignmentRole) & Qt.AlignRight: + val = str(i.data()).rjust(width) + else: + val = str(i.data()) + text += pad + sep + val + pad = " " * (width - len(val)) + sep = " " + QApplication.clipboard().setText(text) + +def CopyTreeCellsToClipboard(view, as_csv=False, with_hdr=False): + indexes = view.selectedIndexes() + if not len(indexes): + return + + selection = view.selectionModel() + + first = None + for i in indexes: + above = view.indexAbove(i) + if not selection.isSelected(above): + first = i + break + + if first is None: + raise RuntimeError("CopyTreeCellsToClipboard internal error") + + model = first.model() + row_cnt = 0 + col_cnt = model.columnCount(first) + max_width = [0] * col_cnt + + indent_sz = 2 + indent_str = " " * indent_sz + + expanded_mark_sz = 2 + if sys.version_info[0] == 3: + expanded_mark = "\u25BC " + not_expanded_mark = "\u25B6 " + else: + expanded_mark = unicode(chr(0xE2) + chr(0x96) + chr(0xBC) + " ", "utf-8") + not_expanded_mark = unicode(chr(0xE2) + chr(0x96) + chr(0xB6) + " ", "utf-8") + leaf_mark = " " + + if not as_csv: + pos = first + while True: + row_cnt += 1 + row = pos.row() + for c in range(col_cnt): + i = pos.sibling(row, c) + if c: + n = len(str(i.data())) + else: + n = len(str(i.data()).strip()) + n += (i.internalPointer().level - 1) * indent_sz + n += expanded_mark_sz + max_width[c] = max(max_width[c], n) + pos = view.indexBelow(pos) + if not selection.isSelected(pos): + break + + text = "" + pad = "" + sep = "" + if with_hdr: + for c in range(col_cnt): + val = model.headerData(c, Qt.Horizontal, Qt.DisplayRole).strip() + if as_csv: + text += sep + ToCSValue(val) + sep = "," + else: + max_width[c] = max(max_width[c], len(val)) + width = max_width[c] + align = model.headerData(c, Qt.Horizontal, Qt.TextAlignmentRole) + if align & Qt.AlignRight: + val = val.rjust(width) + text += pad + sep + val + pad = " " * (width - len(val)) + sep = " " + text += "\n" + pad = "" + sep = "" + + pos = first + while True: + row = pos.row() + for c in range(col_cnt): + i = pos.sibling(row, c) + val = str(i.data()) + if not c: + if model.hasChildren(i): + if view.isExpanded(i): + mark = expanded_mark + else: + mark = not_expanded_mark + else: + mark = leaf_mark + val = indent_str * (i.internalPointer().level - 1) + mark + val.strip() + if as_csv: + text += sep + ToCSValue(val) + sep = "," + else: + width = max_width[c] + if c and i.data(Qt.TextAlignmentRole) & Qt.AlignRight: + val = val.rjust(width) + text += pad + sep + val + pad = " " * (width - len(val)) + sep = " " + pos = view.indexBelow(pos) + if not selection.isSelected(pos): + break + text = text.rstrip() + "\n" + pad = "" + sep = "" + + QApplication.clipboard().setText(text) + +def CopyCellsToClipboard(view, as_csv=False, with_hdr=False): + view.CopyCellsToClipboard(view, as_csv, with_hdr) + +def CopyCellsToClipboardHdr(view): + CopyCellsToClipboard(view, False, True) + +def CopyCellsToClipboardCSV(view): + CopyCellsToClipboard(view, True, True) + +# Context menu + +class ContextMenu(object): + + def __init__(self, view): + self.view = view + self.view.setContextMenuPolicy(Qt.CustomContextMenu) + self.view.customContextMenuRequested.connect(self.ShowContextMenu) + + def ShowContextMenu(self, pos): + menu = QMenu(self.view) + self.AddActions(menu) + menu.exec_(self.view.mapToGlobal(pos)) + + def AddCopy(self, menu): + menu.addAction(CreateAction("&Copy selection", "Copy to clipboard", lambda: CopyCellsToClipboardHdr(self.view), self.view)) + menu.addAction(CreateAction("Copy selection as CS&V", "Copy to clipboard as CSV", lambda: CopyCellsToClipboardCSV(self.view), self.view)) + + def AddActions(self, menu): + self.AddCopy(menu) + +class TreeContextMenu(ContextMenu): + + def __init__(self, view): + super(TreeContextMenu, self).__init__(view) + + def AddActions(self, menu): + i = self.view.currentIndex() + text = str(i.data()).strip() + if len(text): + menu.addAction(CreateAction('Copy "' + text + '"', "Copy to clipboard", lambda: QApplication.clipboard().setText(text), self.view)) + self.AddCopy(menu) + # Table window class TableWindow(QMdiSubWindow, ResizeColumnsToContentsBase): @@ -2279,9 +2524,13 @@ class TableWindow(QMdiSubWindow, ResizeColumnsToContentsBase): self.view.verticalHeader().setVisible(False) self.view.sortByColumn(-1, Qt.AscendingOrder) self.view.setSortingEnabled(True) + self.view.setSelectionMode(QAbstractItemView.ContiguousSelection) + self.view.CopyCellsToClipboard = CopyTableCellsToClipboard self.ResizeColumnsToContents() + self.context_menu = ContextMenu(self.view) + self.find_bar = FindBar(self, self, True) self.finder = ChildDataItemFinder(self.data_model) @@ -2395,6 +2644,10 @@ class TopCallsWindow(QMdiSubWindow, ResizeColumnsToContentsBase): self.view.setModel(self.model) self.view.setEditTriggers(QAbstractItemView.NoEditTriggers) self.view.verticalHeader().setVisible(False) + self.view.setSelectionMode(QAbstractItemView.ContiguousSelection) + self.view.CopyCellsToClipboard = CopyTableCellsToClipboard + + self.context_menu = ContextMenu(self.view) self.ResizeColumnsToContents() @@ -2660,6 +2913,60 @@ class HelpOnlyWindow(QMainWindow): self.setCentralWidget(self.text) +# PostqreSQL server version + +def PostqreSQLServerVersion(db): + query = QSqlQuery(db) + QueryExec(query, "SELECT VERSION()") + if query.next(): + v_str = query.value(0) + v_list = v_str.strip().split(" ") + if v_list[0] == "PostgreSQL" and v_list[2] == "on": + return v_list[1] + return v_str + return "Unknown" + +# SQLite version + +def SQLiteVersion(db): + query = QSqlQuery(db) + QueryExec(query, "SELECT sqlite_version()") + if query.next(): + return query.value(0) + return "Unknown" + +# About dialog + +class AboutDialog(QDialog): + + def __init__(self, glb, parent=None): + super(AboutDialog, self).__init__(parent) + + self.setWindowTitle("About Exported SQL Viewer") + self.setMinimumWidth(300) + + pyside_version = "1" if pyside_version_1 else "2" + + text = "<pre>" + text += "Python version: " + sys.version.split(" ")[0] + "\n" + text += "PySide version: " + pyside_version + "\n" + text += "Qt version: " + qVersion() + "\n" + if glb.dbref.is_sqlite3: + text += "SQLite version: " + SQLiteVersion(glb.db) + "\n" + else: + text += "PostqreSQL version: " + PostqreSQLServerVersion(glb.db) + "\n" + text += "</pre>" + + self.text = QTextBrowser() + self.text.setHtml(text) + self.text.setReadOnly(True) + self.text.setOpenExternalLinks(True) + + self.vbox = QVBoxLayout() + self.vbox.addWidget(self.text) + + self.setLayout(self.vbox); + # Font resize def ResizeFont(widget, diff): @@ -2732,6 +3039,8 @@ class MainWindow(QMainWindow): file_menu.addAction(CreateExitAction(glb.app, self)) edit_menu = menu.addMenu("&Edit") + edit_menu.addAction(CreateAction("&Copy", "Copy to clipboard", self.CopyToClipboard, self, QKeySequence.Copy)) + edit_menu.addAction(CreateAction("Copy as CS&V", "Copy to clipboard as CSV", self.CopyToClipboardCSV, self)) edit_menu.addAction(CreateAction("&Find...", "Find items", self.Find, self, QKeySequence.Find)) edit_menu.addAction(CreateAction("Fetch &more records...", "Fetch more records", self.FetchMoreRecords, self, [QKeySequence(Qt.Key_F8)])) edit_menu.addAction(CreateAction("&Shrink Font", "Make text smaller", self.ShrinkFont, self, [QKeySequence("Ctrl+-")])) @@ -2755,6 +3064,21 @@ class MainWindow(QMainWindow): help_menu = menu.addMenu("&Help") help_menu.addAction(CreateAction("&Exported SQL Viewer Help", "Helpful information", self.Help, self, QKeySequence.HelpContents)) + help_menu.addAction(CreateAction("&About Exported SQL Viewer", "About this application", self.About, self)) + + def Try(self, fn): + win = self.mdi_area.activeSubWindow() + if win: + try: + fn(win.view) + except: + pass + + def CopyToClipboard(self): + self.Try(CopyCellsToClipboardHdr) + + def CopyToClipboardCSV(self): + self.Try(CopyCellsToClipboardCSV) def Find(self): win = self.mdi_area.activeSubWindow() @@ -2773,12 +3097,10 @@ class MainWindow(QMainWindow): pass def ShrinkFont(self): - win = self.mdi_area.activeSubWindow() - ShrinkFont(win.view) + self.Try(ShrinkFont) def EnlargeFont(self): - win = self.mdi_area.activeSubWindow() - EnlargeFont(win.view) + self.Try(EnlargeFont) def EventMenu(self, events, reports_menu): branches_events = 0 @@ -2828,6 +3150,10 @@ class MainWindow(QMainWindow): def Help(self): HelpWindow(self.glb, self) + def About(self): + dialog = AboutDialog(self.glb, self) + dialog.exec_() + # XED Disassembler class xed_state_t(Structure): diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c index 7f6c52021e41..946ab4b63acd 100644 --- a/tools/perf/tests/dso-data.c +++ b/tools/perf/tests/dso-data.c @@ -304,7 +304,7 @@ int test__dso_data_cache(struct test *test __maybe_unused, int subtest __maybe_u /* Make sure we did not leak any file descriptor. */ nr_end = open_files_cnt(); pr_debug("nr start %ld, nr stop %ld\n", nr, nr_end); - TEST_ASSERT_VAL("failed leadking files", nr == nr_end); + TEST_ASSERT_VAL("failed leaking files", nr == nr_end); return 0; } @@ -380,6 +380,6 @@ int test__dso_data_reopen(struct test *test __maybe_unused, int subtest __maybe_ /* Make sure we did not leak any file descriptor. */ nr_end = open_files_cnt(); pr_debug("nr start %ld, nr stop %ld\n", nr, nr_end); - TEST_ASSERT_VAL("failed leadking files", nr == nr_end); + TEST_ASSERT_VAL("failed leaking files", nr == nr_end); return 0; } diff --git a/tools/perf/tests/make b/tools/perf/tests/make index e46723568516..5363a12a8b9b 100644 --- a/tools/perf/tests/make +++ b/tools/perf/tests/make @@ -107,7 +107,7 @@ make_minimal := NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 make_minimal += NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 make_minimal += NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 make_minimal += NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 -make_minimal += NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 +make_minimal += NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 # $(run) contains all available tests run := make_pure diff --git a/tools/perf/tests/shell/record+zstd_comp_decomp.sh b/tools/perf/tests/shell/record+zstd_comp_decomp.sh new file mode 100755 index 000000000000..5dcba800109f --- /dev/null +++ b/tools/perf/tests/shell/record+zstd_comp_decomp.sh @@ -0,0 +1,34 @@ +#!/bin/sh +# Zstd perf.data compression/decompression + +trace_file=$(mktemp /tmp/perf.data.XXX) +perf_tool=perf + +skip_if_no_z_record() { + $perf_tool record -h 2>&1 | grep -q '\-z, \-\-compression\-level' +} + +collect_z_record() { + echo "Collecting compressed record file:" + $perf_tool record -o $trace_file -g -z -F 5000 -- \ + dd count=500 if=/dev/random of=/dev/null +} + +check_compressed_stats() { + echo "Checking compressed events stats:" + $perf_tool report -i $trace_file --header --stats | \ + grep -E "(# compressed : Zstd,)|(COMPRESSED events:)" +} + +check_compressed_output() { + $perf_tool inject -i $trace_file -o $trace_file.decomp && + $perf_tool report -i $trace_file --stdio | head -n -3 > $trace_file.comp.output && + $perf_tool report -i $trace_file.decomp --stdio | head -n -3 > $trace_file.decomp.output && + diff $trace_file.comp.output $trace_file.decomp.output +} + +skip_if_no_z_record || exit 2 +collect_z_record && check_compressed_stats && check_compressed_output +err=$? +rm -f $trace_file* +exit $err diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 8dd3102301ea..6d5bbc8b589b 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -145,6 +145,8 @@ perf-y += scripting-engines/ perf-$(CONFIG_ZLIB) += zlib.o perf-$(CONFIG_LZMA) += lzma.o +perf-$(CONFIG_ZSTD) += zstd.o + perf-y += demangle-java.o perf-y += demangle-rust.o diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 09762985c713..0b8573fd9b05 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1021,7 +1021,7 @@ static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 float ipc = n_insn / ((double)ch->cycles / (double)ch->num); /* Hide data when there are too many overlaps. */ - if (ch->reset >= 0x7fff || ch->reset >= ch->num / 2) + if (ch->reset >= 0x7fff) return; for (offset = start; offset <= end; offset++) { diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h index 892e92e7e7fc..0cd3369af2a4 100644 --- a/tools/perf/util/compress.h +++ b/tools/perf/util/compress.h @@ -2,6 +2,11 @@ #ifndef PERF_COMPRESS_H #define PERF_COMPRESS_H +#include <stdbool.h> +#ifdef HAVE_ZSTD_SUPPORT +#include <zstd.h> +#endif + #ifdef HAVE_ZLIB_SUPPORT int gzip_decompress_to_file(const char *input, int output_fd); bool gzip_is_compressed(const char *input); @@ -12,4 +17,52 @@ int lzma_decompress_to_file(const char *input, int output_fd); bool lzma_is_compressed(const char *input); #endif +struct zstd_data { +#ifdef HAVE_ZSTD_SUPPORT + ZSTD_CStream *cstream; + ZSTD_DStream *dstream; +#endif +}; + +#ifdef HAVE_ZSTD_SUPPORT + +int zstd_init(struct zstd_data *data, int level); +int zstd_fini(struct zstd_data *data); + +size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size, + void *src, size_t src_size, size_t max_record_size, + size_t process_header(void *record, size_t increment)); + +size_t zstd_decompress_stream(struct zstd_data *data, void *src, size_t src_size, + void *dst, size_t dst_size); +#else /* !HAVE_ZSTD_SUPPORT */ + +static inline int zstd_init(struct zstd_data *data __maybe_unused, int level __maybe_unused) +{ + return 0; +} + +static inline int zstd_fini(struct zstd_data *data __maybe_unused) +{ + return 0; +} + +static inline +size_t zstd_compress_stream_to_records(struct zstd_data *data __maybe_unused, + void *dst __maybe_unused, size_t dst_size __maybe_unused, + void *src __maybe_unused, size_t src_size __maybe_unused, + size_t max_record_size __maybe_unused, + size_t process_header(void *record, size_t increment) __maybe_unused) +{ + return 0; +} + +static inline size_t zstd_decompress_stream(struct zstd_data *data __maybe_unused, void *src __maybe_unused, + size_t src_size __maybe_unused, void *dst __maybe_unused, + size_t dst_size __maybe_unused) +{ + return 0; +} +#endif + #endif /* PERF_COMPRESS_H */ diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index 4f8e2b485c01..271a90b326c4 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -62,6 +62,11 @@ struct perf_env { struct cpu_topology_map *cpu; struct cpu_cache_level *caches; int caches_cnt; + u32 comp_ratio; + u32 comp_ver; + u32 comp_type; + u32 comp_level; + u32 comp_mmap_len; struct numa_node *numa_nodes; struct memory_node *memory_nodes; unsigned long long memory_bsize; @@ -80,6 +85,12 @@ struct perf_env { } bpf_progs; }; +enum perf_compress_type { + PERF_COMP_NONE = 0, + PERF_COMP_ZSTD, + PERF_COMP_MAX +}; + struct bpf_prog_info_node; struct btf_node; diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index ba7be74fad6e..d1ad6c419724 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -68,6 +68,7 @@ static const char *perf_event__names[] = { [PERF_RECORD_EVENT_UPDATE] = "EVENT_UPDATE", [PERF_RECORD_TIME_CONV] = "TIME_CONV", [PERF_RECORD_HEADER_FEATURE] = "FEATURE", + [PERF_RECORD_COMPRESSED] = "COMPRESSED", }; static const char *perf_ns__names[] = { diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index 4e908ec1ef64..9e999550f247 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -255,6 +255,7 @@ enum perf_user_event_type { /* above any possible kernel type */ PERF_RECORD_EVENT_UPDATE = 78, PERF_RECORD_TIME_CONV = 79, PERF_RECORD_HEADER_FEATURE = 80, + PERF_RECORD_COMPRESSED = 81, PERF_RECORD_HEADER_MAX }; @@ -627,6 +628,11 @@ struct feature_event { char data[]; }; +struct compressed_event { + struct perf_event_header header; + char data[]; +}; + union perf_event { struct perf_event_header header; struct mmap_event mmap; @@ -660,6 +666,7 @@ union perf_event { struct feature_event feat; struct ksymbol_event ksymbol_event; struct bpf_event bpf_event; + struct compressed_event pack; }; void perf_event__print_totals(void); diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 4b6783ff5813..69d0fa8ab16f 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1009,7 +1009,8 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str, */ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages, unsigned int auxtrace_pages, - bool auxtrace_overwrite, int nr_cblocks, int affinity, int flush) + bool auxtrace_overwrite, int nr_cblocks, int affinity, int flush, + int comp_level) { struct perf_evsel *evsel; const struct cpu_map *cpus = evlist->cpus; @@ -1019,7 +1020,8 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages, * Its value is decided by evsel's write_backward. * So &mp should not be passed through const pointer. */ - struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity, .flush = flush }; + struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity, .flush = flush, + .comp_level = comp_level }; if (!evlist->mmap) evlist->mmap = perf_evlist__alloc_mmap(evlist, false); @@ -1051,7 +1053,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages, int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages) { - return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, PERF_AFFINITY_SYS, 1); + return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, PERF_AFFINITY_SYS, 1, 0); } int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index c9a0f72677fd..49354fe24d5f 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -178,7 +178,7 @@ unsigned long perf_event_mlock_kb_in_pages(void); int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages, unsigned int auxtrace_pages, bool auxtrace_overwrite, int nr_cblocks, - int affinity, int flush); + int affinity, int flush, int comp_level); int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages); void perf_evlist__munmap(struct perf_evlist *evlist); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index a10cf4cde920..a6f572a40deb 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -813,6 +813,8 @@ static void apply_config_terms(struct perf_evsel *evsel, break; case PERF_EVSEL__CONFIG_TERM_DRV_CFG: break; + case PERF_EVSEL__CONFIG_TERM_PERCORE: + break; default: break; } diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 6d190cbf1070..cad54e8ba522 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -50,6 +50,7 @@ enum term_type { PERF_EVSEL__CONFIG_TERM_OVERWRITE, PERF_EVSEL__CONFIG_TERM_DRV_CFG, PERF_EVSEL__CONFIG_TERM_BRANCH, + PERF_EVSEL__CONFIG_TERM_PERCORE, }; struct perf_evsel_config_term { @@ -67,6 +68,7 @@ struct perf_evsel_config_term { bool overwrite; char *branch; unsigned long max_events; + bool percore; } val; bool weak; }; @@ -158,6 +160,7 @@ struct perf_evsel { struct perf_evsel **metric_events; bool collect_stat; bool weak_group; + bool percore; const char *pmu_name; struct { perf_evsel__sb_cb_t *cb; diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index 2d2af2ac2b1e..847ae51a524b 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -1344,6 +1344,30 @@ out: return ret; } +static int write_compressed(struct feat_fd *ff __maybe_unused, + struct perf_evlist *evlist __maybe_unused) +{ + int ret; + + ret = do_write(ff, &(ff->ph->env.comp_ver), sizeof(ff->ph->env.comp_ver)); + if (ret) + return ret; + + ret = do_write(ff, &(ff->ph->env.comp_type), sizeof(ff->ph->env.comp_type)); + if (ret) + return ret; + + ret = do_write(ff, &(ff->ph->env.comp_level), sizeof(ff->ph->env.comp_level)); + if (ret) + return ret; + + ret = do_write(ff, &(ff->ph->env.comp_ratio), sizeof(ff->ph->env.comp_ratio)); + if (ret) + return ret; + + return do_write(ff, &(ff->ph->env.comp_mmap_len), sizeof(ff->ph->env.comp_mmap_len)); +} + static void print_hostname(struct feat_fd *ff, FILE *fp) { fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname); @@ -1688,6 +1712,13 @@ static void print_cache(struct feat_fd *ff, FILE *fp __maybe_unused) } } +static void print_compressed(struct feat_fd *ff, FILE *fp) +{ + fprintf(fp, "# compressed : %s, level = %d, ratio = %d\n", + ff->ph->env.comp_type == PERF_COMP_ZSTD ? "Zstd" : "Unknown", + ff->ph->env.comp_level, ff->ph->env.comp_ratio); +} + static void print_pmu_mappings(struct feat_fd *ff, FILE *fp) { const char *delimiter = "# pmu mappings: "; @@ -2667,6 +2698,27 @@ out: return err; } +static int process_compressed(struct feat_fd *ff, + void *data __maybe_unused) +{ + if (do_read_u32(ff, &(ff->ph->env.comp_ver))) + return -1; + + if (do_read_u32(ff, &(ff->ph->env.comp_type))) + return -1; + + if (do_read_u32(ff, &(ff->ph->env.comp_level))) + return -1; + + if (do_read_u32(ff, &(ff->ph->env.comp_ratio))) + return -1; + + if (do_read_u32(ff, &(ff->ph->env.comp_mmap_len))) + return -1; + + return 0; +} + struct feature_ops { int (*write)(struct feat_fd *ff, struct perf_evlist *evlist); void (*print)(struct feat_fd *ff, FILE *fp); @@ -2730,6 +2782,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = { FEAT_OPN(DIR_FORMAT, dir_format, false), FEAT_OPR(BPF_PROG_INFO, bpf_prog_info, false), FEAT_OPR(BPF_BTF, bpf_btf, false), + FEAT_OPR(COMPRESSED, compressed, false), }; struct header_print_data { diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h index 386da49e1bfa..5b3abe4172e2 100644 --- a/tools/perf/util/header.h +++ b/tools/perf/util/header.h @@ -42,6 +42,7 @@ enum { HEADER_DIR_FORMAT, HEADER_BPF_PROG_INFO, HEADER_BPF_BTF, + HEADER_COMPRESSED, HEADER_LAST_FEATURE, HEADER_FEAT_BITS = 256, }; diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 872fab163585..f4c3c84b090f 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -58,6 +58,7 @@ enum intel_pt_pkt_state { INTEL_PT_STATE_NO_IP, INTEL_PT_STATE_ERR_RESYNC, INTEL_PT_STATE_IN_SYNC, + INTEL_PT_STATE_TNT_CONT, INTEL_PT_STATE_TNT, INTEL_PT_STATE_TIP, INTEL_PT_STATE_TIP_PGD, @@ -72,8 +73,9 @@ static inline bool intel_pt_sample_time(enum intel_pt_pkt_state pkt_state) case INTEL_PT_STATE_NO_IP: case INTEL_PT_STATE_ERR_RESYNC: case INTEL_PT_STATE_IN_SYNC: - case INTEL_PT_STATE_TNT: + case INTEL_PT_STATE_TNT_CONT: return true; + case INTEL_PT_STATE_TNT: case INTEL_PT_STATE_TIP: case INTEL_PT_STATE_TIP_PGD: case INTEL_PT_STATE_FUP: @@ -888,16 +890,20 @@ static uint64_t intel_pt_next_period(struct intel_pt_decoder *decoder) timestamp = decoder->timestamp + decoder->timestamp_insn_cnt; masked_timestamp = timestamp & decoder->period_mask; if (decoder->continuous_period) { - if (masked_timestamp != decoder->last_masked_timestamp) + if (masked_timestamp > decoder->last_masked_timestamp) return 1; } else { timestamp += 1; masked_timestamp = timestamp & decoder->period_mask; - if (masked_timestamp != decoder->last_masked_timestamp) { + if (masked_timestamp > decoder->last_masked_timestamp) { decoder->last_masked_timestamp = masked_timestamp; decoder->continuous_period = true; } } + + if (masked_timestamp < decoder->last_masked_timestamp) + return decoder->period_ticks; + return decoder->period_ticks - (timestamp - masked_timestamp); } @@ -926,7 +932,10 @@ static void intel_pt_sample_insn(struct intel_pt_decoder *decoder) case INTEL_PT_PERIOD_TICKS: timestamp = decoder->timestamp + decoder->timestamp_insn_cnt; masked_timestamp = timestamp & decoder->period_mask; - decoder->last_masked_timestamp = masked_timestamp; + if (masked_timestamp > decoder->last_masked_timestamp) + decoder->last_masked_timestamp = masked_timestamp; + else + decoder->last_masked_timestamp += decoder->period_ticks; break; case INTEL_PT_PERIOD_NONE: case INTEL_PT_PERIOD_MTC: @@ -1254,7 +1263,9 @@ static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder) return -ENOENT; } decoder->tnt.count -= 1; - if (!decoder->tnt.count) + if (decoder->tnt.count) + decoder->pkt_state = INTEL_PT_STATE_TNT_CONT; + else decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; decoder->tnt.payload <<= 1; decoder->state.from_ip = decoder->ip; @@ -1285,7 +1296,9 @@ static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder) if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) { decoder->tnt.count -= 1; - if (!decoder->tnt.count) + if (decoder->tnt.count) + decoder->pkt_state = INTEL_PT_STATE_TNT_CONT; + else decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; if (decoder->tnt.payload & BIT63) { decoder->tnt.payload <<= 1; @@ -1305,8 +1318,11 @@ static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder) return 0; } decoder->ip += intel_pt_insn.length; - if (!decoder->tnt.count) + if (!decoder->tnt.count) { + decoder->sample_timestamp = decoder->timestamp; + decoder->sample_insn_cnt = decoder->timestamp_insn_cnt; return -EAGAIN; + } decoder->tnt.payload <<= 1; continue; } @@ -2365,6 +2381,7 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder) err = intel_pt_walk_trace(decoder); break; case INTEL_PT_STATE_TNT: + case INTEL_PT_STATE_TNT_CONT: err = intel_pt_walk_tnt(decoder); if (err == -EAGAIN) err = intel_pt_walk_trace(decoder); diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 3c520baa198c..28a9541c4835 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1234,8 +1234,9 @@ static char *get_kernel_version(const char *root_dir) if (!file) return NULL; - version[0] = '\0'; tmp = fgets(version, sizeof(version), file); + if (!tmp) + *version = '\0'; fclose(file); name = strstr(version, prefix); diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index ef3d79b2c90b..868c0b0e909c 100644 --- a/tools/perf/util/mmap.c +++ b/tools/perf/util/mmap.c @@ -157,6 +157,10 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb } #ifdef HAVE_AIO_SUPPORT +static int perf_mmap__aio_enabled(struct perf_mmap *map) +{ + return map->aio.nr_cblocks > 0; +} #ifdef HAVE_LIBNUMA_SUPPORT static int perf_mmap__aio_alloc(struct perf_mmap *map, int idx) @@ -198,7 +202,7 @@ static int perf_mmap__aio_bind(struct perf_mmap *map, int idx, int cpu, int affi return 0; } -#else +#else /* !HAVE_LIBNUMA_SUPPORT */ static int perf_mmap__aio_alloc(struct perf_mmap *map, int idx) { map->aio.data[idx] = malloc(perf_mmap__mmap_len(map)); @@ -285,81 +289,12 @@ static void perf_mmap__aio_munmap(struct perf_mmap *map) zfree(&map->aio.cblocks); zfree(&map->aio.aiocb); } - -int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx, - int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off), - off_t *off) +#else /* !HAVE_AIO_SUPPORT */ +static int perf_mmap__aio_enabled(struct perf_mmap *map __maybe_unused) { - u64 head = perf_mmap__read_head(md); - unsigned char *data = md->base + page_size; - unsigned long size, size0 = 0; - void *buf; - int rc = 0; - - rc = perf_mmap__read_init(md); - if (rc < 0) - return (rc == -EAGAIN) ? 0 : -1; - - /* - * md->base data is copied into md->data[idx] buffer to - * release space in the kernel buffer as fast as possible, - * thru perf_mmap__consume() below. - * - * That lets the kernel to proceed with storing more - * profiling data into the kernel buffer earlier than other - * per-cpu kernel buffers are handled. - * - * Coping can be done in two steps in case the chunk of - * profiling data crosses the upper bound of the kernel buffer. - * In this case we first move part of data from md->start - * till the upper bound and then the reminder from the - * beginning of the kernel buffer till the end of - * the data chunk. - */ - - size = md->end - md->start; - - if ((md->start & md->mask) + size != (md->end & md->mask)) { - buf = &data[md->start & md->mask]; - size = md->mask + 1 - (md->start & md->mask); - md->start += size; - memcpy(md->aio.data[idx], buf, size); - size0 = size; - } - - buf = &data[md->start & md->mask]; - size = md->end - md->start; - md->start += size; - memcpy(md->aio.data[idx] + size0, buf, size); - - /* - * Increment md->refcount to guard md->data[idx] buffer - * from premature deallocation because md object can be - * released earlier than aio write request started - * on mmap->data[idx] is complete. - * - * perf_mmap__put() is done at record__aio_complete() - * after started request completion. - */ - perf_mmap__get(md); - - md->prev = head; - perf_mmap__consume(md); - - rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off); - if (!rc) { - *off += size0 + size; - } else { - /* - * Decrement md->refcount back if aio write - * operation failed to start. - */ - perf_mmap__put(md); - } - - return rc; + return 0; } -#else + static int perf_mmap__aio_mmap(struct perf_mmap *map __maybe_unused, struct mmap_params *mp __maybe_unused) { @@ -374,6 +309,10 @@ static void perf_mmap__aio_munmap(struct perf_mmap *map __maybe_unused) void perf_mmap__munmap(struct perf_mmap *map) { perf_mmap__aio_munmap(map); + if (map->data != NULL) { + munmap(map->data, perf_mmap__mmap_len(map)); + map->data = NULL; + } if (map->base != NULL) { munmap(map->base, perf_mmap__mmap_len(map)); map->base = NULL; @@ -442,6 +381,19 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c map->flush = mp->flush; + map->comp_level = mp->comp_level; + + if (map->comp_level && !perf_mmap__aio_enabled(map)) { + map->data = mmap(NULL, perf_mmap__mmap_len(map), PROT_READ|PROT_WRITE, + MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); + if (map->data == MAP_FAILED) { + pr_debug2("failed to mmap data buffer, error %d\n", + errno); + map->data = NULL; + return -1; + } + } + if (auxtrace_mmap__mmap(&map->auxtrace_mmap, &mp->auxtrace_mp, map->base, fd)) return -1; @@ -540,7 +492,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to, rc = perf_mmap__read_init(md); if (rc < 0) - return (rc == -EAGAIN) ? 0 : -1; + return (rc == -EAGAIN) ? 1 : -1; size = md->end - md->start; diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h index b82f8c2d55c4..274ce389cd84 100644 --- a/tools/perf/util/mmap.h +++ b/tools/perf/util/mmap.h @@ -40,6 +40,8 @@ struct perf_mmap { #endif cpu_set_t affinity_mask; u64 flush; + void *data; + int comp_level; }; /* @@ -71,7 +73,7 @@ enum bkw_mmap_state { }; struct mmap_params { - int prot, mask, nr_cblocks, affinity, flush; + int prot, mask, nr_cblocks, affinity, flush, comp_level; struct auxtrace_mmap_params auxtrace_mp; }; @@ -99,18 +101,6 @@ union perf_event *perf_mmap__read_event(struct perf_mmap *map); int perf_mmap__push(struct perf_mmap *md, void *to, int push(struct perf_mmap *map, void *to, void *buf, size_t size)); -#ifdef HAVE_AIO_SUPPORT -int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx, - int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off), - off_t *off); -#else -static inline int perf_mmap__aio_push(struct perf_mmap *md __maybe_unused, void *to __maybe_unused, int idx __maybe_unused, - int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off) __maybe_unused, - off_t *off __maybe_unused) -{ - return 0; -} -#endif size_t perf_mmap__mmap_len(struct perf_mmap *map); diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 4432bfe039fd..cf0b9b81c5aa 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -950,6 +950,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = { [PARSE_EVENTS__TERM_TYPE_OVERWRITE] = "overwrite", [PARSE_EVENTS__TERM_TYPE_NOOVERWRITE] = "no-overwrite", [PARSE_EVENTS__TERM_TYPE_DRV_CFG] = "driver-config", + [PARSE_EVENTS__TERM_TYPE_PERCORE] = "percore", }; static bool config_term_shrinked; @@ -970,6 +971,7 @@ config_term_avail(int term_type, struct parse_events_error *err) case PARSE_EVENTS__TERM_TYPE_CONFIG2: case PARSE_EVENTS__TERM_TYPE_NAME: case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD: + case PARSE_EVENTS__TERM_TYPE_PERCORE: return true; default: if (!err) @@ -1061,6 +1063,14 @@ do { \ case PARSE_EVENTS__TERM_TYPE_MAX_EVENTS: CHECK_TYPE_VAL(NUM); break; + case PARSE_EVENTS__TERM_TYPE_PERCORE: + CHECK_TYPE_VAL(NUM); + if ((unsigned int)term->val.num > 1) { + err->str = strdup("expected 0 or 1"); + err->idx = term->err_val; + return -EINVAL; + } + break; default: err->str = strdup("unknown term"); err->idx = term->err_term; @@ -1199,6 +1209,10 @@ do { \ case PARSE_EVENTS__TERM_TYPE_DRV_CFG: ADD_CONFIG_TERM(DRV_CFG, drv_cfg, term->val.str); break; + case PARSE_EVENTS__TERM_TYPE_PERCORE: + ADD_CONFIG_TERM(PERCORE, percore, + term->val.num ? true : false); + break; default: break; } @@ -1260,6 +1274,18 @@ int parse_events_add_tool(struct parse_events_state *parse_state, return add_event_tool(list, &parse_state->idx, tool_event); } +static bool config_term_percore(struct list_head *config_terms) +{ + struct perf_evsel_config_term *term; + + list_for_each_entry(term, config_terms, list) { + if (term->type == PERF_EVSEL__CONFIG_TERM_PERCORE) + return term->val.percore; + } + + return false; +} + int parse_events_add_pmu(struct parse_events_state *parse_state, struct list_head *list, char *name, struct list_head *head_config, @@ -1333,6 +1359,7 @@ int parse_events_add_pmu(struct parse_events_state *parse_state, evsel->metric_name = info.metric_name; evsel->pmu_name = name; evsel->use_uncore_alias = use_uncore_alias; + evsel->percore = config_term_percore(&evsel->config_terms); } return evsel ? 0 : -ENOMEM; diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h index a052cd6ac63e..f7139e1a2fd3 100644 --- a/tools/perf/util/parse-events.h +++ b/tools/perf/util/parse-events.h @@ -75,6 +75,7 @@ enum { PARSE_EVENTS__TERM_TYPE_NOOVERWRITE, PARSE_EVENTS__TERM_TYPE_OVERWRITE, PARSE_EVENTS__TERM_TYPE_DRV_CFG, + PARSE_EVENTS__TERM_TYPE_PERCORE, __PARSE_EVENTS__TERM_TYPE_NR, }; diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l index c54bfe88626c..ca6098874fe2 100644 --- a/tools/perf/util/parse-events.l +++ b/tools/perf/util/parse-events.l @@ -283,6 +283,7 @@ inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_INHERIT); } no-inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); } overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); } no-overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); } +percore { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); } , { return ','; } "/" { BEGIN(INITIAL); return '/'; } {name_minus} { return str(yyscanner, PE_NAME); } diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c index e6599e290f46..08581e276225 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -5,13 +5,14 @@ #include <subcmd/parse-options.h> #include "util/parse-regs-options.h" -int -parse_regs(const struct option *opt, const char *str, int unset) +static int +__parse_regs(const struct option *opt, const char *str, int unset, bool intr) { uint64_t *mode = (uint64_t *)opt->value; const struct sample_reg *r; char *s, *os = NULL, *p; int ret = -1; + uint64_t mask; if (unset) return 0; @@ -22,6 +23,11 @@ parse_regs(const struct option *opt, const char *str, int unset) if (*mode) return -1; + if (intr) + mask = arch__intr_reg_mask(); + else + mask = arch__user_reg_mask(); + /* str may be NULL in case no arg is passed to -I */ if (str) { /* because str is read-only */ @@ -37,19 +43,20 @@ parse_regs(const struct option *opt, const char *str, int unset) if (!strcmp(s, "?")) { fprintf(stderr, "available registers: "); for (r = sample_reg_masks; r->name; r++) { - fprintf(stderr, "%s ", r->name); + if (r->mask & mask) + fprintf(stderr, "%s ", r->name); } fputc('\n', stderr); /* just printing available regs */ return -1; } for (r = sample_reg_masks; r->name; r++) { - if (!strcasecmp(s, r->name)) + if ((r->mask & mask) && !strcasecmp(s, r->name)) break; } if (!r->name) { - ui__warning("unknown register %s," - " check man page\n", s); + ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n", + s, intr ? "-I" : "--user-regs="); goto error; } @@ -65,8 +72,20 @@ parse_regs(const struct option *opt, const char *str, int unset) /* default to all possible regs */ if (*mode == 0) - *mode = PERF_REGS_MASK; + *mode = mask; error: free(os); return ret; } + +int +parse_user_regs(const struct option *opt, const char *str, int unset) +{ + return __parse_regs(opt, str, unset, false); +} + +int +parse_intr_regs(const struct option *opt, const char *str, int unset) +{ + return __parse_regs(opt, str, unset, true); +} diff --git a/tools/perf/util/parse-regs-options.h b/tools/perf/util/parse-regs-options.h index cdefb1acf6be..2b23d25c6394 100644 --- a/tools/perf/util/parse-regs-options.h +++ b/tools/perf/util/parse-regs-options.h @@ -2,5 +2,6 @@ #ifndef _PERF_PARSE_REGS_OPTIONS_H #define _PERF_PARSE_REGS_OPTIONS_H 1 struct option; -int parse_regs(const struct option *opt, const char *str, int unset); +int parse_user_regs(const struct option *opt, const char *str, int unset); +int parse_intr_regs(const struct option *opt, const char *str, int unset); #endif /* _PERF_PARSE_REGS_OPTIONS_H */ diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 2acfcc527cac..2774cec1f15f 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -13,6 +13,16 @@ int __weak arch_sdt_arg_parse_op(char *old_op __maybe_unused, return SDT_ARG_SKIP; } +uint64_t __weak arch__intr_reg_mask(void) +{ + return PERF_REGS_MASK; +} + +uint64_t __weak arch__user_reg_mask(void) +{ + return PERF_REGS_MASK; +} + #ifdef HAVE_PERF_REGS_SUPPORT int perf_reg_value(u64 *valp, struct regs_dump *regs, int id) { diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index c9319f8d17a6..cb9c246c8962 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -12,6 +12,7 @@ struct sample_reg { uint64_t mask; }; #define SMPL_REG(n, b) { .name = #n, .mask = 1ULL << (b) } +#define SMPL_REG2(n, b) { .name = #n, .mask = 3ULL << (b) } #define SMPL_REG_END { .name = NULL } extern const struct sample_reg sample_reg_masks[]; @@ -22,6 +23,8 @@ enum { }; int arch_sdt_arg_parse_op(char *old_op, char **new_op); +uint64_t arch__intr_reg_mask(void); +uint64_t arch__user_reg_mask(void); #ifdef HAVE_PERF_REGS_SUPPORT #include <perf_regs.h> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index bad5f87ae001..2310a1752983 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -29,6 +29,61 @@ #include "stat.h" #include "arch/common.h" +#ifdef HAVE_ZSTD_SUPPORT +static int perf_session__process_compressed_event(struct perf_session *session, + union perf_event *event, u64 file_offset) +{ + void *src; + size_t decomp_size, src_size; + u64 decomp_last_rem = 0; + size_t decomp_len = session->header.env.comp_mmap_len; + struct decomp *decomp, *decomp_last = session->decomp_last; + + decomp = mmap(NULL, sizeof(struct decomp) + decomp_len, PROT_READ|PROT_WRITE, + MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); + if (decomp == MAP_FAILED) { + pr_err("Couldn't allocate memory for decompression\n"); + return -1; + } + + decomp->file_pos = file_offset; + decomp->head = 0; + + if (decomp_last) { + decomp_last_rem = decomp_last->size - decomp_last->head; + memcpy(decomp->data, &(decomp_last->data[decomp_last->head]), decomp_last_rem); + decomp->size = decomp_last_rem; + } + + src = (void *)event + sizeof(struct compressed_event); + src_size = event->pack.header.size - sizeof(struct compressed_event); + + decomp_size = zstd_decompress_stream(&(session->zstd_data), src, src_size, + &(decomp->data[decomp_last_rem]), decomp_len - decomp_last_rem); + if (!decomp_size) { + munmap(decomp, sizeof(struct decomp) + decomp_len); + pr_err("Couldn't decompress data\n"); + return -1; + } + + decomp->size += decomp_size; + + if (session->decomp == NULL) { + session->decomp = decomp; + session->decomp_last = decomp; + } else { + session->decomp_last->next = decomp; + session->decomp_last = decomp; + } + + pr_debug("decomp (B): %ld to %ld\n", src_size, decomp_size); + + return 0; +} +#else /* !HAVE_ZSTD_SUPPORT */ +#define perf_session__process_compressed_event perf_session__process_compressed_event_stub +#endif + static int perf_session__deliver_event(struct perf_session *session, union perf_event *event, struct perf_tool *tool, @@ -197,6 +252,21 @@ static void perf_session__delete_threads(struct perf_session *session) machine__delete_threads(&session->machines.host); } +static void perf_session__release_decomp_events(struct perf_session *session) +{ + struct decomp *next, *decomp; + size_t decomp_len; + next = session->decomp; + decomp_len = session->header.env.comp_mmap_len; + do { + decomp = next; + if (decomp == NULL) + break; + next = decomp->next; + munmap(decomp, decomp_len + sizeof(struct decomp)); + } while (1); +} + void perf_session__delete(struct perf_session *session) { if (session == NULL) @@ -205,6 +275,7 @@ void perf_session__delete(struct perf_session *session) auxtrace_index__free(&session->auxtrace_index); perf_session__destroy_kernel_maps(session); perf_session__delete_threads(session); + perf_session__release_decomp_events(session); perf_env__exit(&session->header.env); machines__exit(&session->machines); if (session->data) @@ -358,6 +429,14 @@ static int process_stat_round_stub(struct perf_session *perf_session __maybe_unu return 0; } +static int perf_session__process_compressed_event_stub(struct perf_session *session __maybe_unused, + union perf_event *event __maybe_unused, + u64 file_offset __maybe_unused) +{ + dump_printf(": unhandled!\n"); + return 0; +} + void perf_tool__fill_defaults(struct perf_tool *tool) { if (tool->sample == NULL) @@ -430,6 +509,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool) tool->time_conv = process_event_op2_stub; if (tool->feature == NULL) tool->feature = process_event_op2_stub; + if (tool->compressed == NULL) + tool->compressed = perf_session__process_compressed_event; } static void swap_sample_id_all(union perf_event *event, void *data) @@ -1373,7 +1454,9 @@ static s64 perf_session__process_user_event(struct perf_session *session, int fd = perf_data__fd(session->data); int err; - dump_event(session->evlist, event, file_offset, &sample); + if (event->header.type != PERF_RECORD_COMPRESSED || + tool->compressed == perf_session__process_compressed_event_stub) + dump_event(session->evlist, event, file_offset, &sample); /* These events are processed right away */ switch (event->header.type) { @@ -1426,6 +1509,11 @@ static s64 perf_session__process_user_event(struct perf_session *session, return tool->time_conv(session, event); case PERF_RECORD_HEADER_FEATURE: return tool->feature(session, event); + case PERF_RECORD_COMPRESSED: + err = tool->compressed(session, event, file_offset); + if (err) + dump_event(session->evlist, event, file_offset, &sample); + return err; default: return -EINVAL; } @@ -1708,6 +1796,8 @@ static int perf_session__flush_thread_stacks(struct perf_session *session) volatile int session_done; +static int __perf_session__process_decomp_events(struct perf_session *session); + static int __perf_session__process_pipe_events(struct perf_session *session) { struct ordered_events *oe = &session->ordered_events; @@ -1788,6 +1878,10 @@ more: if (skip > 0) head += skip; + err = __perf_session__process_decomp_events(session); + if (err) + goto out_err; + if (!session_done()) goto more; done: @@ -1836,6 +1930,39 @@ fetch_mmaped_event(struct perf_session *session, return event; } +static int __perf_session__process_decomp_events(struct perf_session *session) +{ + s64 skip; + u64 size, file_pos = 0; + struct decomp *decomp = session->decomp_last; + + if (!decomp) + return 0; + + while (decomp->head < decomp->size && !session_done()) { + union perf_event *event = fetch_mmaped_event(session, decomp->head, decomp->size, decomp->data); + + if (!event) + break; + + size = event->header.size; + + if (size < sizeof(struct perf_event_header) || + (skip = perf_session__process_event(session, event, file_pos)) < 0) { + pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n", + decomp->file_pos + decomp->head, event->header.size, event->header.type); + return -EINVAL; + } + + if (skip) + size += skip; + + decomp->head += size; + } + + return 0; +} + /* * On 64bit we can mmap the data file in one go. No need for tiny mmap * slices. On 32bit we use 32MB. @@ -1945,6 +2072,10 @@ more: head += size; file_pos += size; + err = __perf_session__process_decomp_events(session); + if (err) + goto out; + ui_progress__update(prog, size); if (session_done()) diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h index d96eccd7d27f..dd8920b745bc 100644 --- a/tools/perf/util/session.h +++ b/tools/perf/util/session.h @@ -8,6 +8,7 @@ #include "machine.h" #include "data.h" #include "ordered-events.h" +#include "util/compress.h" #include <linux/kernel.h> #include <linux/rbtree.h> #include <linux/perf_event.h> @@ -35,6 +36,19 @@ struct perf_session { struct ordered_events ordered_events; struct perf_data *data; struct perf_tool *tool; + u64 bytes_transferred; + u64 bytes_compressed; + struct zstd_data zstd_data; + struct decomp *decomp; + struct decomp *decomp_last; +}; + +struct decomp { + struct decomp *next; + u64 file_pos; + u64 head; + size_t size; + char data[]; }; struct perf_tool; diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index 3324f23c7efc..4c53bae5644b 100644 --- a/tools/perf/util/stat-display.c +++ b/tools/perf/util/stat-display.c @@ -88,9 +88,17 @@ static void aggr_printout(struct perf_stat_config *config, config->csv_sep); break; case AGGR_NONE: - fprintf(config->output, "CPU%*d%s", - config->csv_output ? 0 : -4, - perf_evsel__cpus(evsel)->map[id], config->csv_sep); + if (evsel->percore) { + fprintf(config->output, "S%d-C%*d%s", + cpu_map__id_to_socket(id), + config->csv_output ? 0 : -5, + cpu_map__id_to_cpu(id), config->csv_sep); + } else { + fprintf(config->output, "CPU%*d%s ", + config->csv_output ? 0 : -5, + perf_evsel__cpus(evsel)->map[id], + config->csv_sep); + } break; case AGGR_THREAD: fprintf(config->output, "%*s-%*d%s", @@ -594,6 +602,41 @@ static void aggr_cb(struct perf_stat_config *config, } } +static void print_counter_aggrdata(struct perf_stat_config *config, + struct perf_evsel *counter, int s, + char *prefix, bool metric_only, + bool *first) +{ + struct aggr_data ad; + FILE *output = config->output; + u64 ena, run, val; + int id, nr; + double uval; + + ad.id = id = config->aggr_map->map[s]; + ad.val = ad.ena = ad.run = 0; + ad.nr = 0; + if (!collect_data(config, counter, aggr_cb, &ad)) + return; + + nr = ad.nr; + ena = ad.ena; + run = ad.run; + val = ad.val; + if (*first && metric_only) { + *first = false; + aggr_printout(config, counter, id, nr); + } + if (prefix && !metric_only) + fprintf(output, "%s", prefix); + + uval = val * counter->scale; + printout(config, id, nr, counter, uval, prefix, + run, ena, 1.0, &rt_stat); + if (!metric_only) + fputc('\n', output); +} + static void print_aggr(struct perf_stat_config *config, struct perf_evlist *evlist, char *prefix) @@ -601,9 +644,7 @@ static void print_aggr(struct perf_stat_config *config, bool metric_only = config->metric_only; FILE *output = config->output; struct perf_evsel *counter; - int s, id, nr; - double uval; - u64 ena, run, val; + int s; bool first; if (!(config->aggr_map || config->aggr_get_id)) @@ -616,33 +657,14 @@ static void print_aggr(struct perf_stat_config *config, * Without each counter has its own line. */ for (s = 0; s < config->aggr_map->nr; s++) { - struct aggr_data ad; if (prefix && metric_only) fprintf(output, "%s", prefix); - ad.id = id = config->aggr_map->map[s]; first = true; evlist__for_each_entry(evlist, counter) { - ad.val = ad.ena = ad.run = 0; - ad.nr = 0; - if (!collect_data(config, counter, aggr_cb, &ad)) - continue; - nr = ad.nr; - ena = ad.ena; - run = ad.run; - val = ad.val; - if (first && metric_only) { - first = false; - aggr_printout(config, counter, id, nr); - } - if (prefix && !metric_only) - fprintf(output, "%s", prefix); - - uval = val * counter->scale; - printout(config, id, nr, counter, uval, prefix, - run, ena, 1.0, &rt_stat); - if (!metric_only) - fputc('\n', output); + print_counter_aggrdata(config, counter, s, + prefix, metric_only, + &first); } if (metric_only) fputc('\n', output); @@ -1089,6 +1111,30 @@ static void print_footer(struct perf_stat_config *config) "the same PMU. Try reorganizing the group.\n"); } +static void print_percore(struct perf_stat_config *config, + struct perf_evsel *counter, char *prefix) +{ + bool metric_only = config->metric_only; + FILE *output = config->output; + int s; + bool first = true; + + if (!(config->aggr_map || config->aggr_get_id)) + return; + + for (s = 0; s < config->aggr_map->nr; s++) { + if (prefix && metric_only) + fprintf(output, "%s", prefix); + + print_counter_aggrdata(config, counter, s, + prefix, metric_only, + &first); + } + + if (metric_only) + fputc('\n', output); +} + void perf_evlist__print_counters(struct perf_evlist *evlist, struct perf_stat_config *config, @@ -1139,7 +1185,10 @@ perf_evlist__print_counters(struct perf_evlist *evlist, print_no_aggr_metric(config, evlist, prefix); else { evlist__for_each_entry(evlist, counter) { - print_counter(config, counter, prefix); + if (counter->percore) + print_percore(config, counter, prefix); + else + print_counter(config, counter, prefix); } } break; diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c index 2856cc9d5a31..c3115d939b0b 100644 --- a/tools/perf/util/stat.c +++ b/tools/perf/util/stat.c @@ -277,9 +277,11 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel if (!evsel->snapshot) perf_evsel__compute_deltas(evsel, cpu, thread, count); perf_counts_values__scale(count, config->scale, NULL); - if (config->aggr_mode == AGGR_NONE) - perf_stat__update_shadow_stats(evsel, count->val, cpu, - &rt_stat); + if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) { + perf_stat__update_shadow_stats(evsel, count->val, + cpu, &rt_stat); + } + if (config->aggr_mode == AGGR_THREAD) { if (config->stats) perf_stat__update_shadow_stats(evsel, diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c index 50678d318185..403045a2bbea 100644 --- a/tools/perf/util/thread.c +++ b/tools/perf/util/thread.c @@ -15,6 +15,7 @@ #include "map.h" #include "symbol.h" #include "unwind.h" +#include "callchain.h" #include <api/fs/fs.h> @@ -327,7 +328,7 @@ static int thread__prepare_access(struct thread *thread) { int err = 0; - if (symbol_conf.use_callchain) + if (dwarf_callchain_users) err = __thread__prepare_access(thread); return err; diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h index 250391672f9f..9096a6e3de59 100644 --- a/tools/perf/util/tool.h +++ b/tools/perf/util/tool.h @@ -28,6 +28,7 @@ typedef int (*event_attr_op)(struct perf_tool *tool, typedef int (*event_op2)(struct perf_session *session, union perf_event *event); typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event); +typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data); typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event, struct ordered_events *oe); @@ -72,6 +73,7 @@ struct perf_tool { stat, stat_round, feature; + event_op4 compressed; event_op3 auxtrace; bool ordered_events; bool ordering_requires_timestamps; diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c index f3c666a84e4d..25e1406b1f8b 100644 --- a/tools/perf/util/unwind-libunwind-local.c +++ b/tools/perf/util/unwind-libunwind-local.c @@ -617,8 +617,6 @@ static unw_accessors_t accessors = { static int _unwind__prepare_access(struct thread *thread) { - if (!dwarf_callchain_users) - return 0; thread->addr_space = unw_create_addr_space(&accessors, 0); if (!thread->addr_space) { pr_err("unwind: Can't create unwind address space.\n"); @@ -631,15 +629,11 @@ static int _unwind__prepare_access(struct thread *thread) static void _unwind__flush_access(struct thread *thread) { - if (!dwarf_callchain_users) - return; unw_flush_cache(thread->addr_space, 0, 0); } static void _unwind__finish_access(struct thread *thread) { - if (!dwarf_callchain_users) - return; unw_destroy_addr_space(thread->addr_space); } diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c index 9778b3133b77..c0811977d7d5 100644 --- a/tools/perf/util/unwind-libunwind.c +++ b/tools/perf/util/unwind-libunwind.c @@ -5,6 +5,7 @@ #include "session.h" #include "debug.h" #include "env.h" +#include "callchain.h" struct unwind_libunwind_ops __weak *local_unwind_libunwind_ops; struct unwind_libunwind_ops __weak *x86_32_unwind_libunwind_ops; @@ -24,6 +25,9 @@ int unwind__prepare_access(struct thread *thread, struct map *map, struct unwind_libunwind_ops *ops = local_unwind_libunwind_ops; int err; + if (!dwarf_callchain_users) + return 0; + if (thread->addr_space) { pr_debug("unwind: thread map already set, dso=%s\n", map->dso->name); @@ -65,12 +69,18 @@ out_register: void unwind__flush_access(struct thread *thread) { + if (!dwarf_callchain_users) + return; + if (thread->unwind_libunwind_ops) thread->unwind_libunwind_ops->flush_access(thread); } void unwind__finish_access(struct thread *thread) { + if (!dwarf_callchain_users) + return; + if (thread->unwind_libunwind_ops) thread->unwind_libunwind_ops->finish_access(thread); } diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c new file mode 100644 index 000000000000..23bdb9884576 --- /dev/null +++ b/tools/perf/util/zstd.c @@ -0,0 +1,111 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <string.h> + +#include "util/compress.h" +#include "util/debug.h" + +int zstd_init(struct zstd_data *data, int level) +{ + size_t ret; + + data->dstream = ZSTD_createDStream(); + if (data->dstream == NULL) { + pr_err("Couldn't create decompression stream.\n"); + return -1; + } + + ret = ZSTD_initDStream(data->dstream); + if (ZSTD_isError(ret)) { + pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret)); + return -1; + } + + if (!level) + return 0; + + data->cstream = ZSTD_createCStream(); + if (data->cstream == NULL) { + pr_err("Couldn't create compression stream.\n"); + return -1; + } + + ret = ZSTD_initCStream(data->cstream, level); + if (ZSTD_isError(ret)) { + pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret)); + return -1; + } + + return 0; +} + +int zstd_fini(struct zstd_data *data) +{ + if (data->dstream) { + ZSTD_freeDStream(data->dstream); + data->dstream = NULL; + } + + if (data->cstream) { + ZSTD_freeCStream(data->cstream); + data->cstream = NULL; + } + + return 0; +} + +size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size, + void *src, size_t src_size, size_t max_record_size, + size_t process_header(void *record, size_t increment)) +{ + size_t ret, size, compressed = 0; + ZSTD_inBuffer input = { src, src_size, 0 }; + ZSTD_outBuffer output; + void *record; + + while (input.pos < input.size) { + record = dst; + size = process_header(record, 0); + compressed += size; + dst += size; + dst_size -= size; + output = (ZSTD_outBuffer){ dst, (dst_size > max_record_size) ? + max_record_size : dst_size, 0 }; + ret = ZSTD_compressStream(data->cstream, &output, &input); + ZSTD_flushStream(data->cstream, &output); + if (ZSTD_isError(ret)) { + pr_err("failed to compress %ld bytes: %s\n", + (long)src_size, ZSTD_getErrorName(ret)); + memcpy(dst, src, src_size); + return src_size; + } + size = output.pos; + size = process_header(record, size); + compressed += size; + dst += size; + dst_size -= size; + } + + return compressed; +} + +size_t zstd_decompress_stream(struct zstd_data *data, void *src, size_t src_size, + void *dst, size_t dst_size) +{ + size_t ret; + ZSTD_inBuffer input = { src, src_size, 0 }; + ZSTD_outBuffer output = { dst, dst_size, 0 }; + + while (input.pos < input.size) { + ret = ZSTD_decompressStream(data->dstream, &output, &input); + if (ZSTD_isError(ret)) { + pr_err("failed to decompress (B): %ld -> %ld : %s\n", + src_size, output.size, ZSTD_getErrorName(ret)); + break; + } + output.dst = dst + output.pos; + output.size = dst_size - output.pos; + } + + return output.pos; +} |