<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/tools/perf/util/thread-stack.h, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-03-25T11:59:10+00:00</updated>
<entry>
<title>perf tools: Remove duplicate struct forward declarations</title>
<updated>2021-03-25T11:59:10+00:00</updated>
<author>
<name>Wan Jiabing</name>
<email>wanjiabing@vivo.com</email>
</author>
<published>2021-03-25T04:39:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=405e07010d375d2123ec9d2e22197490eb698f74'/>
<id>urn:sha1:405e07010d375d2123ec9d2e22197490eb698f74</id>
<content type='text'>
'struct evlist' has been declared at 10th line.

'struct comm' has been declared at 15th line.

Remove the duplicates

Signed-off-by: Wan Jiabing &lt;wanjiabing@vivo.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: kael_w@yeah.net
Link: http://lore.kernel.org/lkml/20210325043947.846093-1-wanjiabing@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Add thread_stack__br_sample_late()</title>
<updated>2020-05-05T19:35:29+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2020-04-29T15:07:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3749e0bbdef24efbf1698bf0dbd9575fddb9ed22'/>
<id>urn:sha1:3749e0bbdef24efbf1698bf0dbd9575fddb9ed22</id>
<content type='text'>
Add a thread stack function to create a branch stack for hardware events
where the sample records get created some time after the event occurred.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lore.kernel.org/lkml/20200429150751.12570-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Add branch stack support</title>
<updated>2020-05-05T19:35:29+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2020-04-29T15:07:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=86d67180b920d178ae1c2923f50a0759d6ce1a10'/>
<id>urn:sha1:86d67180b920d178ae1c2923f50a0759d6ce1a10</id>
<content type='text'>
Intel PT already has support for creating branch stacks for each context
(per-cpu or per-thread). In the more common per-cpu case, the branch stack
is not separated for different threads, instead being cleared in between
each sample.

That approach will not work very well for adding branch stacks to
regular events. The branch stacks really need to be accumulated
separately for each thread.

As a start to accomplishing that, this patch adds support for putting
branch stack support into the thread-stack. The advantages are:

1. the branches are accumulated separately for each thread
2. the branch stack is cleared only in between continuous traces

This helps pave the way for adding branch stacks to regular events, not
just synthesized events as at present.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lore.kernel.org/lkml/20200429150751.12570-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Add thread_stack__sample_late()</title>
<updated>2020-04-16T15:19:15+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2020-04-01T10:16:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4fef41bfb1d8d2ada4a18eb3ab80c2682bcbae12'/>
<id>urn:sha1:4fef41bfb1d8d2ada4a18eb3ab80c2682bcbae12</id>
<content type='text'>
Add a thread stack function to create a call chain for hardware events
where the sample records get created some time after the event occurred.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lore.kernel.org/lkml/20200401101613.6201-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'perf-core-for-mingo-5.3-20190611' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core</title>
<updated>2019-06-17T18:48:14+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2019-06-17T18:48:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ce5aceb5dee298b082adfa2baa0df5a447c1b0b'/>
<id>urn:sha1:3ce5aceb5dee298b082adfa2baa0df5a447c1b0b</id>
<content type='text'>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf record:

  Alexey Budankov:

  - Allow mixing --user-regs with --call-graph=dwarf, making sure that
    the minimal set of registers for DWARF unwinding is present in the
    set of user registers requested to be present in each sample, while
    warning the user that this may make callchains unreliable if more
    that the minimal set of registers is needed to unwind.

  yuzhoujian:

  - Add support to collect callchains from kernel or user space only,
    IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user}
    bits from the command line.

perf trace:

  Arnaldo Carvalho de Melo:

  - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls
    BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit}
    payloads, use instead the syscall numbers obtainer either by the
    arch specific syscalltbl generators or from audit-libs.

  - Allow 'perf trace' to ask for the number of bytes to collect for
    string arguments, for now ask for PATH_MAX, i.e. the whole
    pathnames, which ends up being just a way to speficy which syscall
    args are pathnames and thus should be read using bpf_probe_read_str().

  - Skip unknown syscalls when expanding strace like syscall groups.
    This helps using the 'string' group of syscalls to work in arm64,
    where some of the syscalls present in x86_64 that deal with
    strings, for instance 'access', are deprecated and this should not
    be asked for tracing.

  Leo Yan:

  - Exit when failing to build eBPF program.

perf config:

  Arnaldo Carvalho de Melo:

  - Bail out when a handler returns failure for a key-value pair. This
    helps with cases where processing a key-value pair is not just a
    matter of setting some tool specific knob, involving, for instance
    building a BPF program to then attach to the list of events 'perf
    trace' will use, e.g. augmented_raw_syscalls.c.

perf.data:

  Kan Liang:

  - Read and store die ID information available in new Intel processors
    in CPUID.1F in the CPU topology written in the perf.data header.

perf stat:

  Kan Liang:

  - Support per-die aggregation.

Documentation:

  Arnaldo Carvalho de Melo:

  - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY,
    CLOCKID and DIR_FORMAT headers.

  Song Liu:

  - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF.

  Leo Yan:

  - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'.

JVMTI:

  Jiri Olsa:

  - Address gcc string overflow warning for strncpy()

core:

  - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd().

Intel PT:

  Adrian Hunter:

  - Add support for samples to contain IPC ratio, collecting cycles
    information from CYC packets, showing the IPC info periodically, because
    Intel PT does not update the cycle count on every branch or instruction,
    the incremental values will often be zero.  When there are values, they
    will be the number of instructions and number of cycles since the last
    update, and thus represent the average IPC since the last IPC value.

    E.g.:

    # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001
    rounding mmap pages size to 1024M (262144 pages)
    [ perf record: Woken up 0 times to write data ]
    [ perf record: Captured and wrote 2.208 MB perf.data ]
    # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid
    #
    &lt;SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)&gt;
    1   cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f   jnz 0x7f5219ac2af0       IPC: 0.81 (36/44)
    2   cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45   cmp $0x1f, %rbp
    3   cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49   jbe 0x7f5219ac2b00
    4   cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f   test $0x8, %al
    5   cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51   jnz 0x7f5219ac2b00
    6   cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57   movq  0x13c58a(%rip), %rcx
    7   cc1 63501.650479626: 7f5219ac27de _int_free+0x5e   mov %rdi, %r12
    8   cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61   movq  %fs:(%rcx), %rax
    9   cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65   test %rax, %rax
   10   cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68   jz 0x7f5219ac2821
   11   cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a   leaq  -0x11(%rbp), %rdi
   12   cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e   mov %rdi, %rsi
   13   cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71   shr $0x4, %rsi
   14   cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75   cmpq  %rsi, 0x13caf4(%rip)
   15   cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c   jbe 0x7f5219ac2821
   16   cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1   cmpq  0x13f138(%rip), %rbp
   17   cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8   jnbe 0x7f5219ac28d8
   18   cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158  testb  $0x2, 0x8(%rbx)
   19   cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c  jnz 0x7f5219ac2ab0       IPC: 6.00 (18/3)
    &lt;SNIP&gt;

  - Allow using time ranges with Intel PT, i.e. these features, already
    present but not optimially usable with Intel PT, should be now:

        Select the second 10% time slice:

        $ perf script --time 10%/2

        Select from 0% to 10% time slice:

        $ perf script --time 0%-10%

        Select the first and second 10% time slices:

        $ perf script --time 10%/1,10%/2

        Select from 0% to 10% and 30% to 40% slices:

        $ perf script --time 0%-10%,30%-40%

cs-etm (ARM):

  Mathieu Poirier:

  - Add support for CPU-wide trace scenarios.

s390:

  Thomas Richter:

  - Fix missing kvm module load for s390.

  - Fix OOM error in TUI mode on s390

  - Support s390 diag event display when doing analysis on !s390
    architectures.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288</title>
<updated>2019-06-05T15:36:37+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-29T14:18:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2025cf9e193de05b0654570dd639acb49ebd3adf'/>
<id>urn:sha1:2025cf9e193de05b0654570dd639acb49ebd3adf</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms and conditions of the gnu general public license
  version 2 as published by the free software foundation this program
  is distributed in the hope it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 263 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Alexios Zavras &lt;alexios.zavras@intel.com&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Accumulate IPC information</title>
<updated>2019-06-05T12:47:57+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-05-20T11:37:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=003ccdc7165accee073ce261fc670f64cc98d0f7'/>
<id>urn:sha1:003ccdc7165accee073ce261fc670f64cc98d0f7</id>
<content type='text'>
Cycle and instruction counts are added to the stack. The IPC of a
function and all functions it calls, is also recorded.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lkml.kernel.org/r/20190520113728.14389-14-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf db-export: Add calls parent_id to enable creation of call trees</title>
<updated>2019-03-01T17:50:47+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-02-28T13:00:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f435887ec0c941b97301bd6ed1f3e4b5200df409'/>
<id>urn:sha1:f435887ec0c941b97301bd6ed1f3e4b5200df409</id>
<content type='text'>
The call_path can be used to find the parent symbol for a call but not
the exact parent call. To do that add parent_id to the call_return
export. This enables the creation of a call tree from the exported data.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: https://lkml.kernel.org/n/tip-6j7tzdxo67cox6kan7k22oo6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Represent jmps to the start of a different symbol</title>
<updated>2019-02-06T13:00:40+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-01-09T09:18:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f08046cb3082b313e7b08dc35838cf8bd902c36b'/>
<id>urn:sha1:f08046cb3082b313e7b08dc35838cf8bd902c36b</id>
<content type='text'>
The compiler might optimize a call/ret combination by making it a jmp.
However the thread-stack does not presently cater for that, so that such
control flow is not visible in the call graph. Make it visible by
recording on the stack a branch to the start of a different symbol.
Note, that means when a ret pops the stack, all jmps must be popped off
first.

Example:

  $ cat jmp-to-fn.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  __attribute__((noinline)) int foo(void)
  {
          return bar() + 1;
  }

  int main()
  {
          return foo();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
  $ objdump -d jmp-to-fn
  &lt;SNIP&gt;
  0000000000001040 &lt;main&gt;:
      1040:       31 c0                   xor    %eax,%eax
      1042:       e9 09 01 00 00          jmpq   1150 &lt;foo&gt;
  &lt;SNIP&gt;
  0000000000001140 &lt;bar&gt;:
      1140:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1145:       c3                      retq
  &lt;SNIP&gt;
  0000000000001150 &lt;foo&gt;:
      1150:       31 c0                   xor    %eax,%eax
      1152:       e8 e9 ff ff ff          callq  1140 &lt;bar&gt;
      1157:       83 c0 01                add    $0x1,%eax
      115a:       c3                      retq
  &lt;SNIP&gt;
  $ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
  $ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
  2019-01-08 13:24:58.783069 Creating database...
  2019-01-08 13:24:58.794650 Writing records...
  2019-01-08 13:24:59.008050 Adding indexes
  2019-01-08 13:24:59.015802 Done
  $  ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db

Before:

    main
        -&gt; bar

After:

    main
        -&gt; foo
            -&gt; bar

Committer testing:

Install the python2-pyside package, then select these menu options
on the GUI:

   "Reports"
      "Context sensitive callgraphs"

Then go on expanding the symbols, to get, full picture when doing this
on a fedora:29 with gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC):

jmp-to-fn
  PID:TID
    _start                (ld-2.28.so)
      __libc_start_main
        main
          foo
            bar

To verify that indeed, this fixes the problem.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Fix thread stack processing for the idle task</title>
<updated>2019-01-02T14:03:17+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2018-12-21T12:06:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=256d92bc93fd40411a02be5cdba74a7bf91e6e09'/>
<id>urn:sha1:256d92bc93fd40411a02be5cdba74a7bf91e6e09</id>
<content type='text'>
perf creates a single 'struct thread' to represent the idle task. That
is because threads are identified by PID and TID, and the idle task
always has PID == TID == 0.

However, there are actually separate idle tasks for each CPU. That
creates a problem for thread stack processing which assumes that each
thread has a single stack, not one stack per CPU.

Fix that by passing through the CPU number, and in the case of the idle
"thread", pick the thread stack from an array based on the CPU number.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20181221120620.9659-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
</feed>
