diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-27 01:54:00 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-27 01:54:00 +0300 |
commit | b50ecc5aca4d18f1f0c4942f5c797bc85edef144 (patch) | |
tree | 4bb02793452d5f8a38922f1d740ea08627819f32 /tools/perf/scripts/python/arm-cs-trace-disasm.py | |
parent | 9160b68e0cf8d57243f17debcb564ce01e327ada (diff) | |
parent | 6d78089da9805787a72e52604ad4b2ed7380be3f (diff) | |
download | linux-b50ecc5aca4d18f1f0c4942f5c797bc85edef144.tar.xz |
Merge tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf record:
- Enable leader sampling for inherited task events. It was supported
only for system-wide events but the kernel started to support such
a setup since v6.12.
This is to reduce the number of PMU interrupts. The samples of the
leader event will contain counts of other events and no samples
will be generated for the other member events.
$ perf record -e '{cycles,instructions}:S' ${MYPROG}
perf report:
- Fix --branch-history option to display more branch-related
information like prediction, abort and cycles which is available
on Intel machines.
$ perf record -bg -- perf test -w brstack
$ perf report --branch-history
...
#
# Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage]
# ........ ........................ .............. .................... ......... ..... ...... ....................
#
8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - -
|
---xas_load xarray.h:171
|
|--5.68%--xas_load xarray.c:245 (cycles:1)
| xas_load xarray.c:242
| xas_load xarray.h:1260 (cycles:1)
| xas_descend xarray.c:146
| xas_load xarray.c:244 (cycles:2)
| xas_load xarray.c:245
| xas_descend xarray.c:218 (cycles:10)
...
perf stat:
- Add HWMON PMU support.
The HWMON provides various system information like CPU/GPU
temperature, fan speed and so on. Expose them as PMU events so that
users can see the values using perf stat commands.
$ perf stat -e temp_cpu,fan1 true
Performance counter stats for 'true':
60.00 'C temp_cpu
0 rpm fan1
0.000745382 seconds time elapsed
0.000883000 seconds user
0.000000000 seconds sys
- Display metric threshold in JSON output.
Some metrics define thresholds to classify value ranges. It used to
be in a different color but it won't work for JSON.
Add "metric-threshold" field to the JSON that can be one of "good",
"less good", "nearly bad" and "bad".
# perf stat -a -M TopdownL1 -j true
{"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"}
{"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}
{"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"}
{"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
...
perf sched:
- Add -P/--pre-migrations option for 'timehist' sub-command to track
time a task waited on a run-queue before migrating to a different
CPU.
$ perf sched timehist -P
time cpu task name wait time sch delay run time pre-mig time
[tid/pid] (msec) (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000
585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000
585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000
585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000
585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000
585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000
585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000
585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000
585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001
585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000
...
Build:
- Make libunwind opt-in (LIBUNWIND=1) rather than opt-out.
The perf tools are generally built with libelf and libdw which has
unwinder functionality. The libunwind support predates it and no
need to have duplicate unwinders by default.
- Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify
it's using libdw for handling DWARF information.
Internals:
- Do not set exclude_guest bit in the perf_event_attr by default.
This was causing a trouble in AMD IBS PMU as it doesn't support the
bit. The bit will be set when it's needed later by the fallback
logic. Also update the missing feature detection logic to make sure
not clear supported bits unnecessarily.
- Run perf test in parallel by default and mark flaky tests
"exclusive" to run them serially at the end. Some test numbers are
changed but the test can complete in less than half the time.
JSON vendor events:
- Add AMD Zen 5 events and metrics.
- Add i.MX91 and i.MX95 DDR metrics
- Fix HiSilicon HIP08 Topdown metric name.
- Support compat events on PowerPC"
* tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (232 commits)
perf tests: Fix hwmon parsing with PMU name test
perf hwmon_pmu: Ensure hwmon key union is zeroed before use
perf tests hwmon_pmu: Remove double evlist__delete()
perf/test: fix perf ftrace test on s390
perf bpf-filter: Return -ENOMEM directly when pfi allocation fails
perf test: Correct hwmon test PMU detection
perf: Remove unused del_perf_probe_events()
perf pmu: Move pmu_metrics_table__find and remove ARM override
perf jevents: Add map_for_cpu()
perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str
perf header: Avoid transitive PMU includes
perf arm64 header: Use cpu argument in get_cpuid
perf header: Refactor get_cpuid to take a CPU for ARM
perf header: Move is_cpu_online to numa bench
perf jevents: fix breakage when do perf stat on system metric
perf test: Add missing __exit calls in tool/hwmon tests
perf tests: Make leader sampling test work without branch event
perf util: Remove kernel version deadcode
perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved
perf test shell trace_exit_race: Show what went wrong in verbose mode
...
Diffstat (limited to 'tools/perf/scripts/python/arm-cs-trace-disasm.py')
-rwxr-xr-x | tools/perf/scripts/python/arm-cs-trace-disasm.py | 143 |
1 files changed, 112 insertions, 31 deletions
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py index 7aff02d84ffb..ba208c90d631 100755 --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -11,36 +11,74 @@ import os from os import path import re from subprocess import * -from optparse import OptionParser, make_option +import argparse +import platform -from perf_trace_context import perf_set_itrace_options, \ - perf_sample_insn, perf_sample_srccode +from perf_trace_context import perf_sample_srccode, perf_config_get # Below are some example commands for using this script. +# Note a --kcore recording is required for accurate decode +# due to the alternatives patching mechanism. However this +# script only supports reading vmlinux for disassembly dump, +# meaning that any patched instructions will appear +# as unpatched, but the instruction ranges themselves will +# be correct. In addition to this, source line info comes +# from Perf, and when using kcore there is no debug info. The +# following lists the supported features in each mode: +# +# +-----------+-----------------+------------------+------------------+ +# | Recording | Accurate decode | Source line dump | Disassembly dump | +# +-----------+-----------------+------------------+------------------+ +# | --kcore | yes | no | yes | +# | normal | no | yes | yes | +# +-----------+-----------------+------------------+------------------+ +# +# Output disassembly with objdump and auto detect vmlinux +# (when running on same machine.) +# perf script -s scripts/python/arm-cs-trace-disasm.py -d # -# Output disassembly with objdump: -# perf script -s scripts/python/arm-cs-trace-disasm.py \ -# -- -d objdump -k path/to/vmlinux # Output disassembly with llvm-objdump: # perf script -s scripts/python/arm-cs-trace-disasm.py \ # -- -d llvm-objdump-11 -k path/to/vmlinux +# # Output only source line and symbols: # perf script -s scripts/python/arm-cs-trace-disasm.py -# Command line parsing. -option_list = [ - # formatting options for the bottom entry of the stack - make_option("-k", "--vmlinux", dest="vmlinux_name", - help="Set path to vmlinux file"), - make_option("-d", "--objdump", dest="objdump_name", - help="Set path to objdump executable file"), - make_option("-v", "--verbose", dest="verbose", - action="store_true", default=False, - help="Enable debugging log") -] +def default_objdump(): + config = perf_config_get("annotate.objdump") + return config if config else "objdump" -parser = OptionParser(option_list=option_list) -(options, args) = parser.parse_args() +# Command line parsing. +def int_arg(v): + v = int(v) + if v < 0: + raise argparse.ArgumentTypeError("Argument must be a positive integer") + return v + +args = argparse.ArgumentParser() +args.add_argument("-k", "--vmlinux", + help="Set path to vmlinux file. Omit to autodetect if running on same machine") +args.add_argument("-d", "--objdump", nargs="?", const=default_objdump(), + help="Show disassembly. Can also be used to change the objdump path"), +args.add_argument("-v", "--verbose", action="store_true", help="Enable debugging log") +args.add_argument("--start-time", type=int_arg, help="Monotonic clock time of sample to start from. " + "See 'time' field on samples in -v mode.") +args.add_argument("--stop-time", type=int_arg, help="Monotonic clock time of sample to stop at. " + "See 'time' field on samples in -v mode.") +args.add_argument("--start-sample", type=int_arg, help="Index of sample to start from. " + "See 'index' field on samples in -v mode.") +args.add_argument("--stop-sample", type=int_arg, help="Index of sample to stop at. " + "See 'index' field on samples in -v mode.") + +options = args.parse_args() +if (options.start_time and options.stop_time and + options.start_time >= options.stop_time): + print("--start-time must less than --stop-time") + exit(2) +if (options.start_sample and options.stop_sample and + options.start_sample >= options.stop_sample): + print("--start-sample must less than --stop-sample") + exit(2) # Initialize global dicts and regular expression disasm_cache = dict() @@ -48,11 +86,23 @@ cpu_data = dict() disasm_re = re.compile(r"^\s*([0-9a-fA-F]+):") disasm_func_re = re.compile(r"^\s*([0-9a-fA-F]+)\s.*:") cache_size = 64*1024 +sample_idx = -1 glb_source_file_name = None glb_line_number = None glb_dso = None +kver = platform.release() +vmlinux_paths = [ + f"/usr/lib/debug/boot/vmlinux-{kver}.debug", + f"/usr/lib/debug/lib/modules/{kver}/vmlinux", + f"/lib/modules/{kver}/build/vmlinux", + f"/usr/lib/debug/boot/vmlinux-{kver}", + f"/boot/vmlinux-{kver}", + f"/boot/vmlinux", + f"vmlinux" +] + def get_optional(perf_dict, field): if field in perf_dict: return perf_dict[field] @@ -63,12 +113,25 @@ def get_offset(perf_dict, field): return "+%#x" % perf_dict[field] return "" +def find_vmlinux(): + if hasattr(find_vmlinux, "path"): + return find_vmlinux.path + + for v in vmlinux_paths: + if os.access(v, os.R_OK): + find_vmlinux.path = v + break + else: + find_vmlinux.path = None + + return find_vmlinux.path + def get_dso_file_path(dso_name, dso_build_id): if (dso_name == "[kernel.kallsyms]" or dso_name == "vmlinux"): - if (options.vmlinux_name): - return options.vmlinux_name; + if (options.vmlinux): + return options.vmlinux; else: - return dso_name + return find_vmlinux() if find_vmlinux() else dso_name if (dso_name == "[vdso]") : append = "/vdso" @@ -92,7 +155,7 @@ def read_disam(dso_fname, dso_start, start_addr, stop_addr): else: start_addr = start_addr - dso_start; stop_addr = stop_addr - dso_start; - disasm = [ options.objdump_name, "-d", "-z", + disasm = [ options.objdump, "-d", "-z", "--start-address="+format(start_addr,"#x"), "--stop-address="+format(stop_addr,"#x") ] disasm += [ dso_fname ] @@ -112,10 +175,10 @@ def print_disam(dso_fname, dso_start, start_addr, stop_addr): def print_sample(sample): print("Sample = { cpu: %04d addr: 0x%016x phys_addr: 0x%016x ip: 0x%016x " \ - "pid: %d tid: %d period: %d time: %d }" % \ + "pid: %d tid: %d period: %d time: %d index: %d}" % \ (sample['cpu'], sample['addr'], sample['phys_addr'], \ sample['ip'], sample['pid'], sample['tid'], \ - sample['period'], sample['time'])) + sample['period'], sample['time'], sample_idx)) def trace_begin(): print('ARM CoreSight Trace Data Assembler Dump') @@ -177,6 +240,7 @@ def print_srccode(comm, param_dict, sample, symbol, dso): def process_event(param_dict): global cache_size global options + global sample_idx sample = param_dict["sample"] comm = param_dict["comm"] @@ -187,11 +251,26 @@ def process_event(param_dict): dso_start = get_optional(param_dict, "dso_map_start") dso_end = get_optional(param_dict, "dso_map_end") symbol = get_optional(param_dict, "symbol") + map_pgoff = get_optional(param_dict, "map_pgoff") + # check for valid map offset + if (str(map_pgoff) == '[unknown]'): + map_pgoff = 0 cpu = sample["cpu"] ip = sample["ip"] addr = sample["addr"] + sample_idx += 1 + + if (options.start_time and sample["time"] < options.start_time): + return + if (options.stop_time and sample["time"] > options.stop_time): + exit(0) + if (options.start_sample and sample_idx < options.start_sample): + return + if (options.stop_sample and sample_idx > options.stop_sample): + exit(0) + if (options.verbose == True): print("Event type: %s" % name) print_sample(sample) @@ -243,9 +322,10 @@ def process_event(param_dict): # Record for previous sample packet cpu_data[str(cpu) + 'addr'] = addr - # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4 - if (start_addr == 0 and stop_addr == 4): - print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu) + # Filter out zero start_address. Optionally identify CS_ETM_TRACE_ON packet + if (start_addr == 0): + if ((stop_addr == 4) and (options.verbose == True)): + print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu) return if (start_addr < int(dso_start) or start_addr > int(dso_end)): @@ -256,19 +336,20 @@ def process_event(param_dict): print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso)) return - if (options.objdump_name != None): + if (options.objdump != None): # It doesn't need to decrease virtual memory offset for disassembly # for kernel dso and executable file dso, so in this case we set # vm_start to zero. if (dso == "[kernel.kallsyms]" or dso_start == 0x400000): dso_vm_start = 0 + map_pgoff = 0 else: dso_vm_start = int(dso_start) dso_fname = get_dso_file_path(dso, dso_bid) if path.exists(dso_fname): - print_disam(dso_fname, dso_vm_start, start_addr, stop_addr) + print_disam(dso_fname, dso_vm_start, start_addr + map_pgoff, stop_addr + map_pgoff) else: - print("Failed to find dso %s for address range [ 0x%x .. 0x%x ]" % (dso, start_addr, stop_addr)) + print("Failed to find dso %s for address range [ 0x%x .. 0x%x ]" % (dso, start_addr + map_pgoff, stop_addr + map_pgoff)) print_srccode(comm, param_dict, sample, symbol, dso) |