| author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-25 23:20:41 +0300 | 
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-25 23:20:41 +0300 | 
| commit | 7e4dc77b2869a683fc43c0394fca5441816390ba (patch) | |
| tree | 62e734c599bc1da2712fdb63be996622c415a83a /tools/perf/scripts/python/stackcollapse.py | |
| parent | 89e7eb098adfe342bc036f00201eb579d448f033 (diff) | |
| parent | 5048c2af078d5976895d521262a8802ea791f3b0 (diff) | |
| download | linux-7e4dc77b2869a683fc43c0394fca5441816390ba.tar.xz | |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "With over 300 commits it's been a busy cycle - with most of the work
  concentrated on the tooling side (as it should).
  The main kernel side enhancements were:
   - Add per event callchain limit: Recently we introduced a sysctl to
     tune the max-stack for all events for which callchains were
     requested:
       $ sysctl kernel.perf_event_max_stack
       kernel.perf_event_max_stack = 127
     Now this patch introduces a way to configure this per event, i.e.
     this becomes possible:
       $ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a
     allowing finer tuning of how much buffer space callchains use.
     This uses an u16 from the reserved space at the end, leaving
     another u16 for future use.
     There has been interest in even finer tuning, namely to control the
     max stack for kernel and userspace callchains separately.  Further
     discussion is needed, we may for instance use the remaining u16 for
     that and when it is present, assume that the sample_max_stack
     introduced in this patch applies for the kernel, and the u16 left
     is used for limiting the userspace callchain (Arnaldo Carvalho de
     Melo)
   - Optimize AUX event (hardware assisted side-band event) delivery
     (Kan Liang)
   - Rework Intel family name macro usage (this is partially x86 arch
     work) (Dave Hansen)
   - Refine and fix Intel LBR support (David Carrillo-Cisneros)
   - Add support for Intel 'TopDown' events (Andi Kleen)
   - Intel uncore PMU driver fixes and enhancements (Kan Liang)
   - ... other misc changes.
  Here's an incomplete list of the tooling enhancements (but there's
  much more, see the shortlog and the git log for details):
   - Support cross unwinding, i.e.  collecting '--call-graph dwarf'
     perf.data files in one machine and then doing analysis in another
     machine of a different hardware architecture.  This enables, for
     instance, to do:
       $ perf record -a --call-graph dwarf
     on a x86-32 or aarch64 system and then do 'perf report' on it on a
     x86_64 workstation (He Kuang)
   - Allow reading from a backward ring buffer (one setup via
     sys_perf_event_open() with perf_event_attr.write_backward = 1)
     (Wang Nan)
   - Finish merging initial SDT (Statically Defined Traces) support, see
     cset comments for details about how it all works (Masami Hiramatsu)
   - Support attaching eBPF programs to tracepoints (Wang Nan)
   - Add demangling of symbols in programs written in the Rust language
     (David Tolnay)
   - Add support for tracepoints in the python binding, including an
     example, that sets up and parses sched:sched_switch events,
     tools/perf/python/tracepoint.py (Jiri Olsa)
   - Introduce --stdio-color to set up the color output mode selection
     in 'annotate' and 'report', allowing emit color escape sequences
     when redirecting the output of these tools (Arnaldo Carvalho de
     Melo)
   - Add 'callindent' option to 'perf script -F', to indent the Intel PT
     call stack, making this output more ftrace-like (Adrian Hunter,
     Andi Kleen)
   - Allow dumping the object files generated by llvm when processing
     eBPF scriptlet events (Wang Nan)
   - Add stackcollapse.py script to help generating flame graphs (Paolo
     Bonzini)
   - Add --ldlat option to 'perf mem' to specify load latency for loads
     event (e.g. cpu/mem-loads/ ) (Jiri Olsa)
   - Tooling support for Intel TopDown counters, recently added to the
     kernel (Andi Kleen)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
  perf tests: Add is_printable_array test
  perf tools: Make is_printable_array global
  perf script python: Fix string vs byte array resolving
  perf probe: Warn unmatched function filter correctly
  perf cpu_map: Add more helpers
  perf stat: Balance opening and reading events
  tools: Copy linux/{hash,poison}.h and check for drift
  perf tools: Remove include/linux/list.h from perf's MANIFEST
  tools: Copy the bitops files accessed from the kernel and check for drift
  Remove: kernel unistd*h files from perf's MANIFEST, not used
  perf tools: Remove tools/perf/util/include/linux/const.h
  perf tools: Remove tools/perf/util/include/asm/byteorder.h
  perf tools: Add missing linux/compiler.h include to perf-sys.h
  perf jit: Remove some no-op error handling
  perf jit: Add missing curly braces
  objtool: Initialize variable to silence old compiler
  objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
  perf record: Add --tail-synthesize option
  perf session: Don't warn about out of order event if write_backward is used
  perf tools: Enable overwrite settings
  ...
Diffstat (limited to 'tools/perf/scripts/python/stackcollapse.py')
| -rwxr-xr-x | tools/perf/scripts/python/stackcollapse.py | 125 | 
1 files changed, 125 insertions, 0 deletions
diff --git a/tools/perf/scripts/python/stackcollapse.py b/tools/perf/scripts/python/stackcollapse.py
new file mode 100755
index 000000000000..5a605f70ef32
--- /dev/null
+++ b/tools/perf/scripts/python/stackcollapse.py
@@ -0,0 +1,125 @@
+# stackcollapse.py - format perf samples with one line per distinct call stack
+#
+# This script's output has two space-separated fields.  The first is a semicolon
+# separated stack including the program name (from the "comm" field) and the
+# function names from the call stack.  The second is a count:
+#
+#  swapper;start_kernel;rest_init;cpu_idle;default_idle;native_safe_halt 2
+#
+# The file is sorted according to the first field.
+#
+# Input may be created and processed using:
+#
+#  perf record -a -g -F 99 sleep 60
+#  perf script report stackcollapse > out.stacks-folded
+#
+# (perf script record stackcollapse works too).
+#
+# Written by Paolo Bonzini <pbonzini@redhat.com>
+# Based on Brendan Gregg's stackcollapse-perf.pl script.
+
+import os
+import sys
+from collections import defaultdict
+from optparse import OptionParser, make_option
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+                '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from EventClass import *
+
+# command line parsing
+
+option_list = [
+    # formatting options for the bottom entry of the stack
+    make_option("--include-tid", dest="include_tid",
+                 action="store_true", default=False,
+                 help="include thread id in stack"),
+    make_option("--include-pid", dest="include_pid",
+                 action="store_true", default=False,
+                 help="include process id in stack"),
+    make_option("--no-comm", dest="include_comm",
+                 action="store_false", default=True,
+                 help="do not separate stacks according to comm"),
+    make_option("--tidy-java", dest="tidy_java",
+                 action="store_true", default=False,
+                 help="beautify Java signatures"),
+    make_option("--kernel", dest="annotate_kernel",
+                 action="store_true", default=False,
+                 help="annotate kernel functions with _[k]")
+]
+
+parser = OptionParser(option_list=option_list)
+(opts, args) = parser.parse_args()
+
+if len(args) != 0:
+    parser.error("unexpected command line argument")
+if opts.include_tid and not opts.include_comm:
+    parser.error("requesting tid but not comm is invalid")
+if opts.include_pid and not opts.include_comm:
+    parser.error("requesting pid but not comm is invalid")
+
+# event handlers
+
+lines = defaultdict(lambda: 0)
+
+def process_event(param_dict):
+    def tidy_function_name(sym, dso):
+        if sym is None:
+            sym = '[unknown]'
+
+        sym = sym.replace(';', ':')
+        if opts.tidy_java:
+            # the original stackcollapse-perf.pl script gives the
+            # example of converting this:
+            #    Lorg/mozilla/javascript/MemberBox;.<init>(Ljava/lang/reflect/Method;)V
+            # to this:
+            #    org/mozilla/javascript/MemberBox:.init
+            sym = sym.replace('<', '')
+            sym = sym.replace('>', '')
+            if sym[0] == 'L' and sym.find('/'):
+                sym = sym[1:]
+            try:
+                sym = sym[:sym.index('(')]
+            except ValueError:
+                pass
+
+        if opts.annotate_kernel and dso == '[kernel.kallsyms]':
+            return sym + '_[k]'
+        else:
+            return sym
+
+    stack = list()
+    if 'callchain' in param_dict:
+        for entry in param_dict['callchain']:
+            entry.setdefault('sym', dict())
+            entry['sym'].setdefault('name', None)
+            entry.setdefault('dso', None)
+            stack.append(tidy_function_name(entry['sym']['name'],
+                                            entry['dso']))
+    else:
+        param_dict.setdefault('symbol', None)
+        param_dict.setdefault('dso', None)
+        stack.append(tidy_function_name(param_dict['symbol'],
+                                        param_dict['dso']))
+
+    if opts.include_comm:
+        comm = param_dict["comm"].replace(' ', '_')
+        sep = "-"
+        if opts.include_pid:
+            comm = comm + sep + str(param_dict['sample']['pid'])
+            sep = "/"
+        if opts.include_tid:
+            comm = comm + sep + str(param_dict['sample']['tid'])
+        stack.append(comm)
+
+    stack_string = ';'.join(reversed(stack))
+    lines[stack_string] = lines[stack_string] + 1
+
+def trace_end():
+    list = lines.keys()
+    list.sort()
+    for stack in list:
+        print "%s %d" % (stack, lines[stack])
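
The folding step that stackcollapse.py performs can be tried outside of perf with a small standalone sketch. The Python 3 code below is an illustration only and is not part of this commit: the names `tidy_symbol`, `fold` and `format_folded` are hypothetical, the sample dictionaries are simplified (a flat `sym` key instead of perf's nested `entry['sym']['name']`), and the `--tidy-java`, `--include-pid` and `--include-tid` handling is left out. It assumes, as the script does, that each callchain is listed leaf first.

```python
from collections import Counter


def tidy_symbol(sym, dso=None, annotate_kernel=False):
    """Normalize one frame name (hypothetical helper, simplified)."""
    if sym is None:
        sym = "[unknown]"
    # ';' separates frames in the folded output, so it must not
    # appear inside a frame name.
    sym = sym.replace(";", ":")
    if annotate_kernel and dso == "[kernel.kallsyms]":
        return sym + "_[k]"
    return sym


def fold(samples, annotate_kernel=False):
    """Count samples per distinct stack; callchains are leaf first."""
    counts = Counter()
    for sample in samples:
        frames = [
            tidy_symbol(frame.get("sym"), frame.get("dso"), annotate_kernel)
            for frame in sample["callchain"]
        ]
        # The comm becomes the bottom entry of the stack.
        frames.append(sample["comm"].replace(" ", "_"))
        counts[";".join(reversed(frames))] += 1
    return counts


def format_folded(counts):
    """Return 'stack count' lines sorted by the stack field."""
    return ["%s %d" % (stack, counts[stack]) for stack in sorted(counts)]


# Made-up samples in the simplified layout assumed above.
idle = {
    "comm": "swapper",
    "callchain": [
        {"sym": "native_safe_halt", "dso": "[kernel.kallsyms]"},
        {"sym": "default_idle", "dso": "[kernel.kallsyms]"},
    ],
}
unknown = {"comm": "my app", "callchain": [{"sym": None, "dso": None}]}

folded_lines = format_folded(fold([idle, idle, unknown]))
print("\n".join(folded_lines))
```

Each output line has the same two-field shape that the script's header comment describes: a semicolon-separated stack starting with the comm, then the number of samples that shared that stack.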
