summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r--tools/perf/Documentation/perf-annotate.txt3
-rw-r--r--tools/perf/Documentation/perf-check.txt82
-rw-r--r--tools/perf/Documentation/perf-ftrace.txt48
-rw-r--r--tools/perf/Documentation/perf-kvm.txt6
-rw-r--r--tools/perf/Documentation/perf-list.txt1
-rw-r--r--tools/perf/Documentation/perf-mem.txt94
-rw-r--r--tools/perf/Documentation/perf-record.txt14
-rw-r--r--tools/perf/Documentation/perf-report.txt1
-rw-r--r--tools/perf/Documentation/perf-sched.txt9
-rw-r--r--tools/perf/Documentation/perf-script.txt5
-rw-r--r--tools/perf/Documentation/perf-stat.txt8
-rw-r--r--tools/perf/Documentation/perf-top.txt4
-rw-r--r--tools/perf/Documentation/perf-trace.txt4
-rw-r--r--tools/perf/Documentation/topdown.txt30
14 files changed, 265 insertions, 44 deletions
diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index b95524bea021..156c5f37b051 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -165,6 +165,9 @@ include::itrace.txt[]
--type-stat::
Show stats for the data type annotation.
+--skip-empty::
+ Do not display empty (or dummy) events.
+
SEE ALSO
--------
diff --git a/tools/perf/Documentation/perf-check.txt b/tools/perf/Documentation/perf-check.txt
new file mode 100644
index 000000000000..10f69fb6850b
--- /dev/null
+++ b/tools/perf/Documentation/perf-check.txt
@@ -0,0 +1,82 @@
+perf-check(1)
+===============
+
+NAME
+----
+perf-check - check if features are present in perf
+
+SYNOPSIS
+--------
+[verse]
+'perf check' [<options>]
+'perf check' {feature <feature_list>} [<options>]
+
+DESCRIPTION
+-----------
+With no subcommands given, 'perf check' command just prints the command
+usage on the standard output.
+
+If the subcommand 'feature' is used, then status of feature is printed
+on the standard output (unless '-q' is also passed), ie. whether it is
+compiled-in/built-in or not.
+Also, 'perf check feature' returns with exit status 0 if the feature
+is built-in, otherwise returns with exit status 1.
+
+SUBCOMMANDS
+-----------
+
+feature::
+
+ Print whether feature(s) is compiled-in or not, and also returns with an
+ exit status of 0, if passed feature(s) are compiled-in, else 1.
+
+ It expects a feature list as an argument. There can be a single feature
+ name/macro, or multiple features can also be passed as a comma-separated
+ list, in which case the exit status will be 0 only if all of the passed
+ features are compiled-in.
+
+ The feature names/macros are case-insensitive.
+
+ Example Usage:
+ perf check feature libtraceevent
+ perf check feature HAVE_LIBTRACEEVENT
+ perf check feature libtraceevent,bpf
+
+ Supported feature names/macro:
+ aio / HAVE_AIO_SUPPORT
+ bpf / HAVE_LIBBPF_SUPPORT
+ bpf_skeletons / HAVE_BPF_SKEL
+ debuginfod / HAVE_DEBUGINFOD_SUPPORT
+ dwarf / HAVE_DWARF_SUPPORT
+ dwarf_getlocations / HAVE_DWARF_GETLOCATIONS_SUPPORT
+ dwarf-unwind / HAVE_DWARF_UNWIND_SUPPORT
+ auxtrace / HAVE_AUXTRACE_SUPPORT
+ libaudit / HAVE_LIBAUDIT_SUPPORT
+ libbfd / HAVE_LIBBFD_SUPPORT
+ libcapstone / HAVE_LIBCAPSTONE_SUPPORT
+ libcrypto / HAVE_LIBCRYPTO_SUPPORT
+ libdw-dwarf-unwind / HAVE_DWARF_SUPPORT
+ libelf / HAVE_LIBELF_SUPPORT
+ libnuma / HAVE_LIBNUMA_SUPPORT
+ libopencsd / HAVE_CSTRACE_SUPPORT
+ libperl / HAVE_LIBPERL_SUPPORT
+ libpfm4 / HAVE_LIBPFM
+ libpython / HAVE_LIBPYTHON_SUPPORT
+ libslang / HAVE_SLANG_SUPPORT
+ libtraceevent / HAVE_LIBTRACEEVENT
+ libunwind / HAVE_LIBUNWIND_SUPPORT
+ lzma / HAVE_LZMA_SUPPORT
+ numa_num_possible_cpus / HAVE_LIBNUMA_SUPPORT
+ syscall_table / HAVE_SYSCALL_TABLE_SUPPORT
+ zlib / HAVE_ZLIB_SUPPORT
+ zstd / HAVE_ZSTD_SUPPORT
+
+OPTIONS
+-------
+-q::
+--quiet::
+ Do not print any messages or warnings
+
+ This can be used along with subcommands such as 'perf check feature'
+ to hide unnecessary output in test scripts, eg.
+ 'perf check feature --quiet libtraceevent'
diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
index d780b93fcf87..eaec8253be68 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -9,7 +9,7 @@ perf-ftrace - simple wrapper for kernel's ftrace functionality
SYNOPSIS
--------
[verse]
-'perf ftrace' {trace|latency} <command>
+'perf ftrace' {trace|latency|profile} <command>
DESCRIPTION
-----------
@@ -23,6 +23,9 @@ kernel's ftrace infrastructure.
'perf ftrace latency' calculates execution latency of a given function
(optionally with BPF) and display it as a histogram.
+ 'perf ftrace profile' show a execution profile for each function including
+ total, average, max time and the number of calls.
+
The following options apply to perf ftrace.
COMMON OPTIONS
@@ -125,6 +128,7 @@ OPTIONS for 'perf ftrace trace'
- verbose - Show process names, PIDs, timestamps, etc.
- thresh=<n> - Setup trace duration threshold in microseconds.
- depth=<n> - Set max depth for function graph tracer to follow.
+ - tail - Print function name at the end.
OPTIONS for 'perf ftrace latency'
@@ -145,6 +149,48 @@ OPTIONS for 'perf ftrace latency'
Use nano-second instead of micro-second as a base unit of the histogram.
+OPTIONS for 'perf ftrace profile'
+---------------------------------
+
+-T::
+--trace-funcs=::
+ Set function filter on the given function (or a glob pattern).
+ Multiple functions can be given by using this option more than once.
+ The function argument also can be a glob pattern. It will be passed
+ to 'set_ftrace_filter' in tracefs.
+
+-N::
+--notrace-funcs=::
+ Do not trace functions given by the argument. Like -T option, this
+ can be used more than once to specify multiple functions (or glob
+ patterns). It will be passed to 'set_ftrace_notrace' in tracefs.
+
+-G::
+--graph-funcs=::
+ Set graph filter on the given function (or a glob pattern). This is
+ useful to trace for functions executed from the given function. This
+ can be used more than once to specify multiple functions. It will be
+ passed to 'set_graph_function' in tracefs.
+
+-g::
+--nograph-funcs=::
+ Set graph notrace filter on the given function (or a glob pattern).
+ Like -G option, this is useful for the function_graph tracer only and
+ disables tracing for function executed from the given function. This
+ can be used more than once to specify multiple functions. It will be
+ passed to 'set_graph_notrace' in tracefs.
+
+-m::
+--buffer-size::
+ Set the size of per-cpu tracing buffer, <size> is expected to
+ be a number with appended unit character - B/K/M/G.
+
+-s::
+--sort=::
+ Sort the result by the given field. Available values are:
+ total, avg, max, count, name. Default is 'total'.
+
+
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-trace[1]
diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index b66be66fe836..c26524d38f47 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -115,9 +115,9 @@ STAT LIVE OPTIONS
-m::
--mmap-pages=::
- Number of mmap data pages (must be a power of two) or size
- specification with appended unit character - B/K/M/G. The
- size is rounded up to have nearest pages power of two value.
+ Number of mmap data pages (must be a power of two) or size
+ specification in bytes with appended unit character - B/K/M/G.
+ The size is rounded up to the nearest power-of-two page value.
-a::
--all-cpus::
diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 6bf2468f59d3..dea005410ec0 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -72,6 +72,7 @@ counted. The following modifiers exist:
W - group is weak and will fallback to non-group if not schedulable,
e - group or event are exclusive and do not share the PMU
b - use BPF aggregration (see perf stat --bpf-counters)
+ R - retire latency value of the event
The 'p' modifier can be used for specifying how precise the instruction
address should be. The 'p' modifier can be specified multiple times:
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 47456b212e99..8a1bd9ff0f86 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -28,15 +28,8 @@ and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.
-OPTIONS
--------
-<command>...::
- Any command you can specify in a shell.
-
--i::
---input=<file>::
- Input file name.
-
+COMMON OPTIONS
+--------------
-f::
--force::
Don't do ownership validation
@@ -45,24 +38,9 @@ OPTIONS
--type=<type>::
Select the memory operation type: load or store (default: load,store)
--D::
---dump-raw-samples::
- Dump the raw decoded samples on the screen in a format that is easy to parse with
- one sample per line.
-
--x::
---field-separator=<separator>::
- Specify the field separator used when dump raw samples (-D option). By default,
- The separator is the space character.
-
--C::
---cpu=<cpu>::
- Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
- comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. Default
- is to monitor all CPUS.
--U::
---hide-unresolved::
- Only display entries resolved to a symbol.
+-v::
+--verbose::
+ Be more verbose (show counter open errors, etc)
-p::
--phys-data::
@@ -73,6 +51,9 @@ OPTIONS
RECORD OPTIONS
--------------
+<command>...::
+ Any command you can specify in a shell.
+
-e::
--event <event>::
Event selector. Use 'perf mem record -e list' to list available events.
@@ -85,14 +66,65 @@ RECORD OPTIONS
--all-user::
Configure all used events to run in user space.
--v::
---verbose::
- Be more verbose (show counter open errors, etc)
-
--ldlat <n>::
Specify desired latency for loads event. Supported on Intel and Arm64
processors only. Ignored on other archs.
+REPORT OPTIONS
+--------------
+-i::
+--input=<file>::
+ Input file name.
+
+-C::
+--cpu=<cpu>::
+ Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
+ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -
+ like 0-2. Default is to monitor all CPUS.
+
+-D::
+--dump-raw-samples::
+ Dump the raw decoded samples on the screen in a format that is easy to parse with
+ one sample per line.
+
+-s::
+--sort=<key>::
+ Group result by given key(s) - multiple keys can be specified
+ in CSV format. The keys are specific to memory samples are:
+ symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop,
+ dcacheline, phys_daddr, data_page_size, blocked.
+
+ - symbol_daddr: name of data symbol being executed on at the time of sample
+ - symbol_iaddr: name of code symbol being executed on at the time of sample
+ - dso_daddr: name of library or module containing the data being executed
+ on at the time of the sample
+ - locked: whether the bus was locked at the time of the sample
+ - tlb: type of tlb access for the data at the time of the sample
+ - mem: type of memory access for the data at the time of the sample
+ - snoop: type of snoop (if any) for the data at the time of the sample
+ - dcacheline: the cacheline the data address is on at the time of the sample
+ - phys_daddr: physical address of data being executed on at the time of sample
+ - data_page_size: the data page size of data being executed on at the time of sample
+ - blocked: reason of blocked load access for the data at the time of the sample
+
+ And the default sort keys are changed to local_weight, mem, sym, dso,
+ symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.
+
+-T::
+--type-profile::
+ Show data-type profile result instead of code symbols. This requires
+ the debug information and it will change the default sort keys to:
+ mem, snoop, tlb, type.
+
+-U::
+--hide-unresolved::
+ Only display entries resolved to a symbol.
+
+-x::
+--field-separator=<separator>::
+ Specify the field separator used when dump raw samples (-D option). By default,
+ The separator is the space character.
+
In addition, for report all perf report options are valid, and for record
all perf record options.
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d6532ed97c02..242223240a08 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -273,10 +273,11 @@ OPTIONS
-m::
--mmap-pages=::
Number of mmap data pages (must be a power of two) or size
- specification with appended unit character - B/K/M/G. The
- size is rounded up to have nearest pages power of two value.
- Also, by adding a comma, the number of mmap pages for AUX
- area tracing can be specified.
+ specification in bytes with appended unit character - B/K/M/G.
+ The size is rounded up to the nearest power-of-two page value.
+ By adding a comma, an additional parameter with the same
+ semantics used for the normal mmap areas can be specified for
+ AUX tracing area.
-g::
Enables call-graph (stack chain/backtrace) recording for both
@@ -828,6 +829,11 @@ filtered through the mask provided by -C option.
only, as of now. So the applications built without the frame
pointer might see bogus addresses.
+--setup-filter=<action>::
+ Prepare BPF filter to be used by regular users. The action should be
+ either "pin" or "unpin". The filter can be used after it's pinned.
+
+
include::intel-hybrid.txt[]
SEE ALSO
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index d2b1593ef700..7c66d81ab978 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -614,6 +614,7 @@ include::itrace.txt[]
'Avg Cycles%' - block average sampled cycles / sum of total block average
sampled cycles
'Avg Cycles' - block average sampled cycles
+ 'Branch Counter' - block branch counter histogram (with -v showing the number)
--skip-empty::
Do not print 0 results in the --stat output.
diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 84d49f9241b1..3db64954a267 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -212,6 +212,15 @@ OPTIONS for 'perf sched timehist'
--state::
Show task state when it switched out.
+--show-prio::
+ Show task priority.
+
+--prio::
+ Only show events for given task priority(ies). Multiple priorities can be
+ provided as a comma-separated list with no spaces: 0,120. Ranges of
+ priorities are specified with -: 120-129. A combination of both can also be
+ provided: 0,120-129.
+
OPTIONS for 'perf sched replay'
------------------------------
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index ff086ef05a0c..b72866ef270b 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -134,7 +134,7 @@ OPTIONS
srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
brstackinsn, brstackinsnlen, brstackdisasm, brstackoff, callindent, insn, disasm,
insnlen, synth, phys_addr, metric, misc, srccode, ipc, data_page_size,
- code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat,
+ code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat, brcntr,
Field list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
@@ -369,6 +369,9 @@ OPTIONS
--demangle-kernel::
Demangle kernel symbol names to human readable form (for C++ kernels).
+--addr2line=<path>::
+ Path to addr2line binary.
+
--header
Show perf.data header.
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 29756a87ab6f..2bc063672486 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -498,6 +498,14 @@ To interpret the results it is usually needed to know on which
CPUs the workload runs on. If needed the CPUs can be forced using
taskset.
+--record-tpebs::
+Enable automatic sampling on Intel TPEBS retire_latency events (event with :R
+modifier). Without this option, perf would not capture dynamic retire_latency
+at runtime. Currently, a zero value is assigned to the retire_latency event when
+this option is not set. The TPEBS hardware feature starts from Intel Granite
+Rapids microarchitecture. This option only exists in X86_64 and is meaningful on
+Intel platforms with TPEBS feature.
+
--td-level::
Print the top-down statistics that equal the input level. It allows
users to print the interested top-down metrics level instead of the
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 667e5102075e..af3e4230c72f 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -83,8 +83,8 @@ Default is to monitor all CPUS.
-m <pages>::
--mmap-pages=<pages>::
Number of mmap data pages (must be a power of two) or size
- specification with appended unit character - B/K/M/G. The
- size is rounded up to have nearest pages power of two value.
+ specification in bytes with appended unit character - B/K/M/G.
+ The size is rounded up to the nearest power-of-two page value.
-p <pid>::
--pid=<pid>::
diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index f0da8cf63e9a..6e0cc50bbc13 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -106,8 +106,8 @@ filter out the startup phase of the program, which is often very different.
-m::
--mmap-pages=::
Number of mmap data pages (must be a power of two) or size
- specification with appended unit character - B/K/M/G. The
- size is rounded up to have nearest pages power of two value.
+ specification in bytes with appended unit character - B/K/M/G.
+ The size is rounded up to the nearest power-of-two page value.
-C::
--cpu::
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index ae0aee86844f..5c17fff694ee 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -325,6 +325,36 @@ other four level 2 metrics by subtracting corresponding metrics as below.
Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
Core_Bound = Backend_Bound - Memory_Bound
+TPEBS in TopDown
+================
+
+TPEBS (Timed PEBS) is one of the new Intel PMU features provided since Granite
+Rapids microarchitecture. The TPEBS feature adds a 16 bit retire_latency field
+in the Basic Info group of the PEBS record. It records the Core cycles since the
+retirement of the previous instruction to the retirement of current instruction.
+Please refer to Section 8.4.1 of "Intel® Architecture Instruction Set Extensions
+Programming Reference" for more details about this feature. Because this feature
+extends PEBS record, sampling with weight option is required to get the
+retire_latency value.
+
+ perf record -e event_name -W ...
+
+In the most recent release of TMA, the metrics begin to use event retire_latency
+values in some of the metrics’ formulas on processors that support TPEBS feature.
+For previous generations that do not support TPEBS, the values are static and
+predefined per processor family by the hardware architects. Due to the diversity
+of workloads in execution environments, retire_latency values measured at real
+time are more accurate. Therefore, new TMA metrics that use TPEBS will provide
+more accurate performance analysis results.
+
+To support TPEBS in TMA metrics, a new modifier :R on event is added. Perf would
+capture retire_latency value of required events(event with :R in metric formula)
+with perf record. The retire_latency value would be used in metric calculation.
+Currently, this feature is supported through perf stat
+
+ perf stat -M metric_name --record-tpebs ...
+
+
[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
[2] https://sites.google.com/site/analysismethods/yasin-pubs