Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/perf-annotate.txt | 3
-rw-r--r-- | tools/perf/Documentation/perf-check.txt | 82
-rw-r--r-- | tools/perf/Documentation/perf-ftrace.txt | 48
-rw-r--r-- | tools/perf/Documentation/perf-kvm.txt | 6
-rw-r--r-- | tools/perf/Documentation/perf-list.txt | 1
-rw-r--r-- | tools/perf/Documentation/perf-mem.txt | 94
-rw-r--r-- | tools/perf/Documentation/perf-record.txt | 14
-rw-r--r-- | tools/perf/Documentation/perf-report.txt | 1
-rw-r--r-- | tools/perf/Documentation/perf-sched.txt | 9
-rw-r--r-- | tools/perf/Documentation/perf-script.txt | 5
-rw-r--r-- | tools/perf/Documentation/perf-stat.txt | 8
-rw-r--r-- | tools/perf/Documentation/perf-top.txt | 4
-rw-r--r-- | tools/perf/Documentation/perf-trace.txt | 4
-rw-r--r-- | tools/perf/Documentation/topdown.txt | 30
14 files changed, 265 insertions, 44 deletions
diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index b95524bea021..156c5f37b051 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -165,6 +165,9 @@ include::itrace.txt[]
 --type-stat::
 	Show stats for the data type annotation.
 
+--skip-empty::
+	Do not display empty (or dummy) events.
+
 SEE ALSO
 --------
 
diff --git a/tools/perf/Documentation/perf-check.txt b/tools/perf/Documentation/perf-check.txt
new file mode 100644
index 000000000000..10f69fb6850b
--- /dev/null
+++ b/tools/perf/Documentation/perf-check.txt
@@ -0,0 +1,82 @@
+perf-check(1)
+===============
+
+NAME
+----
+perf-check - check if features are present in perf
+
+SYNOPSIS
+--------
+[verse]
+'perf check' [<options>]
+'perf check' {feature <feature_list>} [<options>]
+
+DESCRIPTION
+-----------
+With no subcommands given, 'perf check' command just prints the command
+usage on the standard output.
+
+If the subcommand 'feature' is used, then status of feature is printed
+on the standard output (unless '-q' is also passed), ie. whether it is
+compiled-in/built-in or not.
+Also, 'perf check feature' returns with exit status 0 if the feature
+is built-in, otherwise returns with exit status 1.
+
+SUBCOMMANDS
+-----------
+
+feature::
+
+	Print whether feature(s) is compiled-in or not, and also returns with an
+	exit status of 0, if passed feature(s) are compiled-in, else 1.
+
+	It expects a feature list as an argument. There can be a single feature
+	name/macro, or multiple features can also be passed as a comma-separated
+	list, in which case the exit status will be 0 only if all of the passed
+	features are compiled-in.
+
+	The feature names/macros are case-insensitive.
+
+	Example Usage:
+		perf check feature libtraceevent
+		perf check feature HAVE_LIBTRACEEVENT
+		perf check feature libtraceevent,bpf
+
+	Supported feature names/macro:
+		aio / HAVE_AIO_SUPPORT
+		bpf / HAVE_LIBBPF_SUPPORT
+		bpf_skeletons / HAVE_BPF_SKEL
+		debuginfod / HAVE_DEBUGINFOD_SUPPORT
+		dwarf / HAVE_DWARF_SUPPORT
+		dwarf_getlocations / HAVE_DWARF_GETLOCATIONS_SUPPORT
+		dwarf-unwind / HAVE_DWARF_UNWIND_SUPPORT
+		auxtrace / HAVE_AUXTRACE_SUPPORT
+		libaudit / HAVE_LIBAUDIT_SUPPORT
+		libbfd / HAVE_LIBBFD_SUPPORT
+		libcapstone / HAVE_LIBCAPSTONE_SUPPORT
+		libcrypto / HAVE_LIBCRYPTO_SUPPORT
+		libdw-dwarf-unwind / HAVE_DWARF_SUPPORT
+		libelf / HAVE_LIBELF_SUPPORT
+		libnuma / HAVE_LIBNUMA_SUPPORT
+		libopencsd / HAVE_CSTRACE_SUPPORT
+		libperl / HAVE_LIBPERL_SUPPORT
+		libpfm4 / HAVE_LIBPFM
+		libpython / HAVE_LIBPYTHON_SUPPORT
+		libslang / HAVE_SLANG_SUPPORT
+		libtraceevent / HAVE_LIBTRACEEVENT
+		libunwind / HAVE_LIBUNWIND_SUPPORT
+		lzma / HAVE_LZMA_SUPPORT
+		numa_num_possible_cpus / HAVE_LIBNUMA_SUPPORT
+		syscall_table / HAVE_SYSCALL_TABLE_SUPPORT
+		zlib / HAVE_ZLIB_SUPPORT
+		zstd / HAVE_ZSTD_SUPPORT
+
+OPTIONS
+-------
+-q::
+--quiet::
+	Do not print any messages or warnings
+
+	This can be used along with subcommands such as 'perf check feature'
+	to hide unnecessary output in test scripts, eg.
+	'perf check feature --quiet libtraceevent'
diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
index d780b93fcf87..eaec8253be68 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -9,7 +9,7 @@ perf-ftrace - simple wrapper for kernel's ftrace functionality
 SYNOPSIS
 --------
 [verse]
-'perf ftrace' {trace|latency} <command>
+'perf ftrace' {trace|latency|profile} <command>
 
 DESCRIPTION
 -----------
@@ -23,6 +23,9 @@ kernel's ftrace infrastructure.
   'perf ftrace latency' calculates execution latency of a given function
   (optionally with BPF) and display it as a histogram.
+  'perf ftrace profile' show a execution profile for each function including
+  total, average, max time and the number of calls.
+
 The following options apply to perf ftrace.
 
 COMMON OPTIONS
@@ -125,6 +128,7 @@ OPTIONS for 'perf ftrace trace'
 	  - verbose - Show process names, PIDs, timestamps, etc.
 	  - thresh=<n> - Setup trace duration threshold in microseconds.
 	  - depth=<n> - Set max depth for function graph tracer to follow.
+	  - tail - Print function name at the end.
 
 
 OPTIONS for 'perf ftrace latency'
@@ -145,6 +149,48 @@ OPTIONS for 'perf ftrace latency'
 	Use nano-second instead of micro-second as a base unit of the histogram.
 
 
+OPTIONS for 'perf ftrace profile'
+---------------------------------
+
+-T::
+--trace-funcs=::
+	Set function filter on the given function (or a glob pattern).
+	Multiple functions can be given by using this option more than once.
+	The function argument also can be a glob pattern. It will be passed
+	to 'set_ftrace_filter' in tracefs.
+
+-N::
+--notrace-funcs=::
+	Do not trace functions given by the argument. Like -T option, this
+	can be used more than once to specify multiple functions (or glob
+	patterns). It will be passed to 'set_ftrace_notrace' in tracefs.
+
+-G::
+--graph-funcs=::
+	Set graph filter on the given function (or a glob pattern). This is
+	useful to trace for functions executed from the given function. This
+	can be used more than once to specify multiple functions. It will be
+	passed to 'set_graph_function' in tracefs.
+
+-g::
+--nograph-funcs=::
+	Set graph notrace filter on the given function (or a glob pattern).
+	Like -G option, this is useful for the function_graph tracer only and
+	disables tracing for function executed from the given function. This
+	can be used more than once to specify multiple functions. It will be
+	passed to 'set_graph_notrace' in tracefs.
+
+-m::
+--buffer-size::
+	Set the size of per-cpu tracing buffer, <size> is expected to
+	be a number with appended unit character - B/K/M/G.
+
+-s::
+--sort=::
+	Sort the result by the given field. Available values are:
+	total, avg, max, count, name. Default is 'total'.
+
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-trace[1]
diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index b66be66fe836..c26524d38f47 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -115,9 +115,9 @@ STAT LIVE OPTIONS
 
 -m::
 --mmap-pages=::
-	Number of mmap data pages (must be a power of two) or size
-	specification with appended unit character - B/K/M/G. The
-	size is rounded up to have nearest pages power of two value.
+	Number of mmap data pages (must be a power of two) or size
+	specification in bytes with appended unit character - B/K/M/G.
+	The size is rounded up to the nearest power-of-two page value.
 
 -a::
 --all-cpus::
diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 6bf2468f59d3..dea005410ec0 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -72,6 +72,7 @@ counted. The following modifiers exist:
  W - group is weak and will fallback to non-group if not schedulable,
  e - group or event are exclusive and do not share the PMU
  b - use BPF aggregration (see perf stat --bpf-counters)
+ R - retire latency value of the event
 
 The 'p' modifier can be used for specifying how precise the instruction
 address should be. The 'p' modifier can be specified multiple times:
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 47456b212e99..8a1bd9ff0f86 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -28,15 +28,8 @@ and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
 Due to the statistical nature of SPE sampling, not every memory operation will
 be sampled.
 
-OPTIONS
--------
-<command>...::
-	Any command you can specify in a shell.
-
--i::
---input=<file>::
-	Input file name.
-
+COMMON OPTIONS
+--------------
 -f::
 --force::
 	Don't do ownership validation
@@ -45,24 +38,9 @@ OPTIONS
 --type=<type>::
 	Select the memory operation type: load or store (default: load,store)
 
--D::
---dump-raw-samples::
-	Dump the raw decoded samples on the screen in a format that is easy to parse with
-	one sample per line.
-
--x::
---field-separator=<separator>::
-	Specify the field separator used when dump raw samples (-D option). By default,
-	The separator is the space character.
-
--C::
---cpu=<cpu>::
-	Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
-	comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. Default
-	is to monitor all CPUS.
--U::
---hide-unresolved::
-	Only display entries resolved to a symbol.
+-v::
+--verbose::
+	Be more verbose (show counter open errors, etc)
 
 -p::
 --phys-data::
@@ -73,6 +51,9 @@ OPTIONS
 
 RECORD OPTIONS
 --------------
+<command>...::
+	Any command you can specify in a shell.
+
 -e::
 --event <event>::
 	Event selector. Use 'perf mem record -e list' to list available events.
@@ -85,14 +66,65 @@ RECORD OPTIONS
 --all-user::
 	Configure all used events to run in user space.
 
--v::
---verbose::
-	Be more verbose (show counter open errors, etc)
-
 --ldlat <n>::
 	Specify desired latency for loads event. Supported on Intel and Arm64
 	processors only. Ignored on other archs.
 
+REPORT OPTIONS
+--------------
+-i::
+--input=<file>::
+	Input file name.
+
+-C::
+--cpu=<cpu>::
+	Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
+	comma-separated list with no space: 0,1. Ranges of CPUs are specified with -
+	like 0-2. Default is to monitor all CPUS.
+
+-D::
+--dump-raw-samples::
+	Dump the raw decoded samples on the screen in a format that is easy to parse with
+	one sample per line.
+
+-s::
+--sort=<key>::
+	Group result by given key(s) - multiple keys can be specified
+	in CSV format. The keys are specific to memory samples are:
+	symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop,
+	dcacheline, phys_daddr, data_page_size, blocked.
+
+	- symbol_daddr: name of data symbol being executed on at the time of sample
+	- symbol_iaddr: name of code symbol being executed on at the time of sample
+	- dso_daddr: name of library or module containing the data being executed
+	             on at the time of the sample
+	- locked: whether the bus was locked at the time of the sample
+	- tlb: type of tlb access for the data at the time of the sample
+	- mem: type of memory access for the data at the time of the sample
+	- snoop: type of snoop (if any) for the data at the time of the sample
+	- dcacheline: the cacheline the data address is on at the time of the sample
+	- phys_daddr: physical address of data being executed on at the time of sample
+	- data_page_size: the data page size of data being executed on at the time of sample
+	- blocked: reason of blocked load access for the data at the time of the sample
+
+	And the default sort keys are changed to local_weight, mem, sym, dso,
+	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.
+
+-T::
+--type-profile::
+	Show data-type profile result instead of code symbols. This requires
+	the debug information and it will change the default sort keys to:
+	mem, snoop, tlb, type.
+
+-U::
+--hide-unresolved::
+	Only display entries resolved to a symbol.
+
+-x::
+--field-separator=<separator>::
+	Specify the field separator used when dump raw samples (-D option). By default,
+	The separator is the space character.
+
 In addition, for report all perf report options are valid, and for record
 all perf record options.
 
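Reviewer's note (a sketch, not part of the patch): the -C/--cpu text in the perf-mem.txt REPORT OPTIONS above describes a comma-separated list with no spaces (0,1) and ranges written with a dash (0-2). The Python below shows one way such a list could be expanded. parse_cpu_list is a hypothetical helper, not perf's own parser, and rejecting reversed ranges is an assumption made here.

```python
import re

# One list item: either a single number ("3") or a dash range ("0-2").
_ITEM = re.compile(r"(\d+)(?:-(\d+))?")


def parse_cpu_list(spec):
    """Expand a list such as "0,2-3" into a sorted list of CPU numbers.

    Hypothetical helper for illustration only; it follows the wording of
    the option text (comma-separated, no spaces, ranges with '-').
    """
    cpus = set()
    for item in spec.split(","):
        match = _ITEM.fullmatch(item)
        if match is None:
            # Covers empty items, embedded spaces and non-numeric input.
            raise ValueError("bad CPU list item: %r" % item)
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) is not None else first
        if last < first:
            # Assumption: a reversed range is treated as an error.
            raise ValueError("reversed range: %r" % item)
        cpus.update(range(first, last + 1))
    return sorted(cpus)
```

For instance, parse_cpu_list("0,2-3") expands the range and merges it with the single entry, while an input containing a space such as "0, 1" is rejected.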
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d6532ed97c02..242223240a08 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -273,10 +273,11 @@ OPTIONS
 -m::
 --mmap-pages=::
 	Number of mmap data pages (must be a power of two) or size
-	specification with appended unit character - B/K/M/G. The
-	size is rounded up to have nearest pages power of two value.
-	Also, by adding a comma, the number of mmap pages for AUX
-	area tracing can be specified.
+	specification in bytes with appended unit character - B/K/M/G.
+	The size is rounded up to the nearest power-of-two page value.
+	By adding a comma, an additional parameter with the same
+	semantics used for the normal mmap areas can be specified for
+	AUX tracing area.
 
 -g::
 	Enables call-graph (stack chain/backtrace) recording for both
@@ -828,6 +829,11 @@ filtered through the mask provided by -C option.
 	only, as of now. So the applications built without the frame
 	pointer might see bogus addresses.
 
+--setup-filter=<action>::
+	Prepare BPF filter to be used by regular users. The action should be
+	either "pin" or "unpin". The filter can be used after it's pinned.
+
+
 include::intel-hybrid.txt[]
 
 SEE ALSO
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index d2b1593ef700..7c66d81ab978 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -614,6 +614,7 @@ include::itrace.txt[]
 	'Avg Cycles%' - block average sampled cycles / sum of total block average
 			sampled cycles
 	'Avg Cycles' - block average sampled cycles
+	'Branch Counter' - block branch counter histogram (with -v showing the number)
 
 --skip-empty::
 	Do not print 0 results in the --stat output.
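Reviewer's note (a sketch, not part of the patch): the reworded --mmap-pages text in the perf-record.txt hunk above accepts either a page count that is already a power of two, or a size in bytes with a B/K/M/G suffix that is rounded up to a power-of-two number of pages. The Python below illustrates that arithmetic only. mmap_pages is a hypothetical name, the 4096-byte page size and the 1024-based unit multipliers are assumptions, and none of this is perf's actual option-parsing code.

```python
import re

# Assumed multipliers for the unit suffixes named in the option text.
UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def mmap_pages(spec, page_size=4096):
    """Return the number of mmap data pages implied by an -m argument.

    Illustrative only: a bare number is taken as a page count and must
    be a power of two; a number with a unit suffix is a size in bytes,
    converted to whole pages and rounded up to the next power of two.
    """
    match = re.fullmatch(r"(\d+)([BKMG]?)", spec.strip().upper())
    if match is None:
        raise ValueError("unrecognised size specification: %r" % spec)
    value, unit = int(match.group(1)), match.group(2)

    if not unit:
        # Plain page count: the documentation requires a power of two.
        if value == 0 or value & (value - 1):
            raise ValueError("page count must be a power of two")
        return value

    # Size in bytes: ceiling division gives the whole pages needed.
    pages = -(-(value * UNITS[unit]) // page_size)
    if pages == 0:
        raise ValueError("size must be non-zero")
    # Round up to the nearest power of two (exact powers are unchanged).
    return 1 << (pages - 1).bit_length()
```

With the assumed 4096-byte page, "300K" needs 75 pages and is rounded up to 128, while "1M" is already 256 pages and stays unchanged.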
diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 84d49f9241b1..3db64954a267 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -212,6 +212,15 @@ OPTIONS for 'perf sched timehist'
 --state::
 	Show task state when it switched out.
 
+--show-prio::
+	Show task priority.
+
+--prio::
+	Only show events for given task priority(ies). Multiple priorities can be
+	provided as a comma-separated list with no spaces: 0,120. Ranges of
+	priorities are specified with -: 120-129. A combination of both can also be
+	provided: 0,120-129.
+
 OPTIONS for 'perf sched replay'
 ------------------------------
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index ff086ef05a0c..b72866ef270b 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -134,7 +134,7 @@ OPTIONS
 	srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
 	brstackinsn, brstackinsnlen, brstackdisasm, brstackoff, callindent, insn, disasm,
 	insnlen, synth, phys_addr, metric, misc, srccode, ipc, data_page_size,
-	code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat,
+	code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat, brcntr,
 
 	Field list can be prepended with the type, trace, sw or hw,
 	to indicate to which event type the field list applies.
@@ -369,6 +369,9 @@ OPTIONS
 --demangle-kernel::
 	Demangle kernel symbol names to human readable form (for C++ kernels).
 
+--addr2line=<path>::
+	Path to addr2line binary.
+
 --header
 	Show perf.data header.
 
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 29756a87ab6f..2bc063672486 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -498,6 +498,14 @@ To interpret the results it is usually needed to know on which
 CPUs the workload runs on.
 If needed the CPUs can be forced using taskset.
 
+--record-tpebs::
+Enable automatic sampling on Intel TPEBS retire_latency events (event with :R
+modifier). Without this option, perf would not capture dynamic retire_latency
+at runtime. Currently, a zero value is assigned to the retire_latency event when
+this option is not set. The TPEBS hardware feature starts from Intel Granite
+Rapids microarchitecture. This option only exists in X86_64 and is meaningful on
+Intel platforms with TPEBS feature.
+
 --td-level::
 Print the top-down statistics that equal the input level. It allows
 users to print the interested top-down metrics level instead of the
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 667e5102075e..af3e4230c72f 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -83,8 +83,8 @@ Default is to monitor all CPUS.
 -m <pages>::
 --mmap-pages=<pages>::
 	Number of mmap data pages (must be a power of two) or size
-	specification with appended unit character - B/K/M/G. The
-	size is rounded up to have nearest pages power of two value.
+	specification in bytes with appended unit character - B/K/M/G.
+	The size is rounded up to the nearest power-of-two page value.
 
 -p <pid>::
 --pid=<pid>::
diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index f0da8cf63e9a..6e0cc50bbc13 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -106,8 +106,8 @@ filter out the startup phase of the program, which is often very different.
 -m::
 --mmap-pages=::
 	Number of mmap data pages (must be a power of two) or size
-	specification with appended unit character - B/K/M/G. The
-	size is rounded up to have nearest pages power of two value.
+	specification in bytes with appended unit character - B/K/M/G.
+	The size is rounded up to the nearest power-of-two page value.
 
 -C::
 --cpu::
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index ae0aee86844f..5c17fff694ee 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -325,6 +325,36 @@ other four level 2 metrics by subtracting corresponding metrics as below.
     Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
     Core_Bound = Backend_Bound - Memory_Bound
 
+TPEBS in TopDown
+================
+
+TPEBS (Timed PEBS) is one of the new Intel PMU features provided since Granite
+Rapids microarchitecture. The TPEBS feature adds a 16 bit retire_latency field
+in the Basic Info group of the PEBS record. It records the Core cycles since the
+retirement of the previous instruction to the retirement of current instruction.
+Please refer to Section 8.4.1 of "Intel® Architecture Instruction Set Extensions
+Programming Reference" for more details about this feature. Because this feature
+extends PEBS record, sampling with weight option is required to get the
+retire_latency value.
+
+	perf record -e event_name -W ...
+
+In the most recent release of TMA, the metrics begin to use event retire_latency
+values in some of the metrics’ formulas on processors that support TPEBS feature.
+For previous generations that do not support TPEBS, the values are static and
+predefined per processor family by the hardware architects. Due to the diversity
+of workloads in execution environments, retire_latency values measured at real
+time are more accurate. Therefore, new TMA metrics that use TPEBS will provide
+more accurate performance analysis results.
+
+To support TPEBS in TMA metrics, a new modifier :R on event is added. Perf would
+capture retire_latency value of required events(event with :R in metric formula)
+with perf record. The retire_latency value would be used in metric calculation.
+Currently, this feature is supported through perf stat
+
+	perf stat -M metric_name --record-tpebs ...
+
+
 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
 [2] https://sites.google.com/site/analysismethods/yasin-pubs
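Reviewer's note (a sketch, not part of the patch): the context lines at the top of the topdown.txt hunk give two level 2 metrics as differences, Fetch_Bandwidth = Frontend_Bound - Fetch_Latency and Core_Bound = Backend_Bound - Memory_Bound. The Python below only restates those two subtractions. derive_level2 and its argument names are hypothetical, and the sample percentages in the usage line are made up for illustration rather than measured.

```python
def derive_level2(frontend_bound, fetch_latency, backend_bound, memory_bound):
    """Derive two level 2 TopDown metrics by subtraction.

    Hypothetical helper: inputs are assumed to be fractions or
    percentages on the same scale, as in the formulas quoted in the
    topdown.txt context lines.
    """
    return {
        # Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
        "Fetch_Bandwidth": frontend_bound - fetch_latency,
        # Core_Bound = Backend_Bound - Memory_Bound
        "Core_Bound": backend_bound - memory_bound,
    }


# Made-up sample values, in percent of pipeline slots.
example = derive_level2(frontend_bound=30.0, fetch_latency=20.0,
                        backend_bound=40.0, memory_bound=25.0)
```

The helper deliberately covers only the two formulas visible in the hunk; the remaining level 2 metrics are outside what this diff shows.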