summaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2019-02-22perf thread-stack: Improve thread_stack__no_call_return()Adrian Hunter1-3/+46
Improve thread_stack__no_call_return() to better handle 'returns' that do not match the stack i.e. 'no call'. See code comments for details. The example below shows how retpolines are affected: Example: $ cat simple-retpoline.c __attribute__((noinline)) int bar(void) { return -1; } int foo(void) { return bar() + 1; } __attribute__((indirect_branch("thunk"))) int main() { int (*volatile fn)(void) = foo; fn(); return fn(); } $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c $ objdump -d simple-retpoline <SNIP> 0000000000001040 <main>: 1040: 48 83 ec 18 sub $0x18,%rsp 1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo> 104b: 48 89 44 24 08 mov %rax,0x8(%rsp) 1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax 1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax> 105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax 105f: 48 83 c4 18 add $0x18,%rsp 1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax> <SNIP> 0000000000001160 <bar>: 1160: b8 ff ff ff ff mov $0xffffffff,%eax 1165: c3 retq <SNIP> 0000000000001170 <foo>: 1170: e8 eb ff ff ff callq 1160 <bar> 1175: 83 c0 01 add $0x1,%eax 1178: c3 retq 0000000000001179 <__x86_indirect_thunk_rax>: 1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc> 117e: f3 90 pause 1180: 0f ae e8 lfence 1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5> 1185: 48 89 04 24 mov %rax,(%rsp) 1189: c3 retq <SNIP> $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ] $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls 2019-01-08 14:03:37.851655 Creating database... 2019-01-08 14:03:37.863256 Writing records... 2019-01-08 14:03:38.069750 Adding indexes 2019-01-08 14:03:38.078799 Done $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db Before: main -> __x86_indirect_thunk_rax -> __x86_indirect_thunk_rax -> __x86_indirect_thunk_rax -> bar After: main -> __x86_indirect_thunk_rax -> __x86_indirect_thunk_rax -> foo -> bar Committer testing: Chose "Reports", Then "Context-Sensitive Call Graph" and then go on expanding: Before: simple-retpolin PID:PID _start _start __libc_start_main main __x86_indirect_thunk_rax __x86_indirect_thunk_rax bar After: Remove the "simple.retpoline.db" file, run again the 'perf script' line to regenerate the .db file and run the exported-sql-viewer.py again to get the same all the way to 'main', then, from there, including 'main': main __x86_indirect_thunk_rax __x86_indirect_thunk_rax foo bar Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-21perf annotate: Fix getting source line failureWei Li1-2/+2
The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0de33 ("perf annotate: No need to calculate notes->start twice") removed notes->start assignment in symbol__calc_lines(). It will get failed in find_address_in_section() from symbol__tty_annotate() subroutine as the a2l->addr is wrong. So the annotate summary doesn't report the line number of source code correctly. Before fix: liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c void hotspot_1(void) { volatile int i; for (i = 0; i < 0x10000000; i++); for (i = 0; i < 0x10000000; i++); for (i = 0; i < 0x10000000; i++); } int main(void) { hotspot_1(); return 0; } liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1 liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ] liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1 ---------------------------------------------- 19.30 common_while_1[32] 19.03 common_while_1[4e] 19.01 common_while_1[16] 5.04 common_while_1[13] 4.99 common_while_1[4b] 4.78 common_while_1[2c] 4.77 common_while_1[10] 4.66 common_while_1[2f] 4.59 common_while_1[51] 4.59 common_while_1[35] 4.52 common_while_1[19] 4.20 common_while_1[56] 0.51 common_while_1[48] Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period) ----------------------------------------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : 00000000000005fa <hotspot_1>: : hotspot_1(): : void hotspot_1(void) : { 0.00 : 5fa: push %rbp 0.00 : 5fb: mov %rsp,%rbp : volatile int i; : : for (i = 0; i < 0x10000000; i++); 0.00 : 5fe: movl $0x0,-0x4(%rbp) 0.00 : 605: jmp 610 <hotspot_1+0x16> 0.00 : 607: mov -0x4(%rbp),%eax common_while_1[10] 4.77 : 60a: add $0x1,%eax common_while_1[13] 5.04 : 60d: mov %eax,-0x4(%rbp) common_while_1[16] 19.01 : 610: mov -0x4(%rbp),%eax common_while_1[19] 4.52 : 613: cmp $0xfffffff,%eax 0.00 : 618: jle 607 <hotspot_1+0xd> : for (i = 0; i < 0x10000000; i++); ... After fix: liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ] liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1 ---------------------------------------------- 33.34 common_while_1.c:5 33.34 common_while_1.c:6 33.32 common_while_1.c:7 Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period) ----------------------------------------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : 00000000000005fa <hotspot_1>: : hotspot_1(): : void hotspot_1(void) : { 0.00 : 5fa: push %rbp 0.00 : 5fb: mov %rsp,%rbp : volatile int i; : : for (i = 0; i < 0x10000000; i++); 0.00 : 5fe: movl $0x0,-0x4(%rbp) 0.00 : 605: jmp 610 <hotspot_1+0x16> 0.00 : 607: mov -0x4(%rbp),%eax common_while_1.c:5 4.70 : 60a: add $0x1,%eax 4.89 : 60d: mov %eax,-0x4(%rbp) common_while_1.c:5 19.03 : 610: mov -0x4(%rbp),%eax common_while_1.c:5 4.72 : 613: cmp $0xfffffff,%eax 0.00 : 618: jle 607 <hotspot_1+0xd> : for (i = 0; i < 0x10000000; i++); 0.00 : 61a: movl $0x0,-0x4(%rbp) 0.00 : 621: jmp 62c <hotspot_1+0x32> 0.00 : 623: mov -0x4(%rbp),%eax common_while_1.c:6 4.54 : 626: add $0x1,%eax 4.73 : 629: mov %eax,-0x4(%rbp) common_while_1.c:6 19.54 : 62c: mov -0x4(%rbp),%eax common_while_1.c:6 4.54 : 62f: cmp $0xfffffff,%eax ... Signed-off-by: Wei Li <liwei391@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 425859ff0de33 ("perf annotate: No need to calculate notes->start twice") Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-20perf tools: Make rm_rf() remove single fileJiri Olsa1-3/+13
Let rm_rf() remove a file if it's provided by path, not just directories. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190220122800.864-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-20perf cpumap: Increase debug level for cpu_map__snprint verbose outputJiri Olsa1-1/+1
So it does not screw up single -v verbose output. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190220122800.864-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-20perf bpf-event: Add missing new line into pr_debug callJiri Olsa1-1/+1
Add a missing new line into pr_debug call in perf_event__synthesize_bpf_events(), so that the error message does not screw the verbose output. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lkml.kernel.org/r/20190220122800.864-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-20perf evsel: Force sample_type for slave eventsJiri Olsa1-0/+8
Force sample_type setup for slave events in group leader sessions. We don't get sample for slave events, we make them when delivering group leader sample. Set the slave event to follow the master sample_type to ease up report. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190220122800.864-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-20perf session: Don't report zero period samples for slave eventsJiri Olsa1-0/+7
There's no reason to deliver a sample with zero period. It means there was no value for slave event since its last group leader sample. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190220122800.864-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-19perf bpf: Add bpf_map dumperArnaldo Carvalho de Melo3-0/+95
At some point I'll suggest moving this to libbpf, for now I'll experiment with ways to dump BPF maps set by events in 'perf trace', starting with a very basic dumper for the current very limited needs of the augmented_raw_syscalls code: dumping booleans. Having functions that apply to the map keys and values and do table lookup in things like syscall id to string tables should come next. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Yonghong Song <yhs@fb.com> Link: https://lkml.kernel.org/n/tip-lz14w0esqyt1333aon05jpwc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-19perf report: Don't shadow inlined symbol with different addr rangeHe Kuang2-3/+9
We can't assume inlined symbols with the same name are equal, because their address range may be different. This will cause the symbols with different addresses be shadowed when adding to the hist entry, and lead to ERANGE error when checking the symbol address during sample parse, the addr should be within the range of [sym.start, sym.end]. The error message is like: "0x36aea60 [0x8]: failed to process type: 68". The second parameter of symbol__new() is the length of the fake symbol for the inline frame, which is the subtraction of the end and start address of base_sym. Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: aa441895f7b4 ("perf report: Compare symbol name for inlined frames when sorting") Link: http://lkml.kernel.org/r/20190219130531.15692-1-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-19perf tools: Use sysfs__mountpoint() when reading cpu topologyJiri Olsa1-7/+22
Use sysfs__mountpoint() when reading sysfs files to obtain cpu/numa topologies. Also use scnprintf instead of sprintf as suggested by Namhyung. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190219095815.15931-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-19perf tools: Add numa_topology objectJiri Olsa3-93/+160
Add the numa_topology object to return the list of numa nodes together with their cpus. It will replace the numa code in header.c and will be used from 'perf record' in the following patches. Add the following interface functions to load numa details: struct numa_topology *numa_topology__new(void); void numa_topology__delete(struct numa_topology *tp); And replace the current (copied) local interface, with no functional changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190219095815.15931-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-19perf tools: Add cpu_topology objectJiri Olsa4-146/+166
Make struct cpu_topo global and rename it to 'struct cpu_topology', so that it can be used from the 'perf record' command in the following patches. Add the following interface functions to load/free cpu topology details: struct cpu_topology *cpu_topology__new(void); void cpu_topology__delete(struct cpu_topology *tp); Move it to a separate source file cputopo.c together with numa related object in the following patches. No functional change, the new interface will be used in upcoming changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190219095815.15931-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-19perf header: Fix wrong node write in NUMA_TOPOLOGY featureJiri Olsa1-1/+1
We are currently passing the node index instead of the real node number. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: fbe96f29ce4b ("perf tools: Make perf.data more self-descriptive (v8)" Link: http://lkml.kernel.org/r/20190219095815.15931-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf header: Remove unused 'cpu_nr' field from 'struct cpu_topo'Jiri Olsa1-2/+0
Not used at all. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190213123246.4015-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf header: Get rid of write_it labelJiri Olsa1-4/+2
Simplifying the code a bit. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190213123246.4015-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf list: Display metric expressions for --details optionJiri Olsa3-3/+10
Display metric expression itself when --details is specified. Current list with no details: # perf list metrics ... TopDownL1: IPC [Instructions Per Cycle (per logical thread)] SLOTS [Total issue-pipeline slots] ... Detailed output with metric formula: # perf list --details metrics ... TopDownL1: IPC [Instructions Per Cycle (per logical thread)] [inst_retired.any / cpu_clk_unhalted.thread] SLOTS [Total issue-pipeline slots] [4*(( cpu_clk_unhalted.thread_any / 2 ) if #smt_on else cycles)] ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190213123246.4015-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf tools: Fix legacy events symbol separator parsingJiri Olsa1-2/+2
Fixing legacy symbol events parsing. We can't support single slash separator, like 'cycles/u', because it conflicts with non empty terms, like 'cycles/period/u'. Keeping only '//' and ':' separator for these events: cycles//u cycles:k And removing '/' separator support, which is not working anymore. Also adding automated tests for above events. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190213123246.4015-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf tools: Rename build libperf to perfJiri Olsa5-144/+144
Rename build libperf to perf, because it's used to build perf. The libperf build object name will be used for libperf library. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190213123246.4015-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Modularize auxtrace_buffer fetch functionMathieu Poirier1-12/+29
Making the auxtrace_buffer fetch function modular so that it can be called from different decoding context (timeless vs. non-timeless), avoiding to repeat code. No change in functionality is introduced by this patch. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-14-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Modularize main packet processing loopMathieu Poirier1-57/+72
Making the main packet processing loop modular so that it can be called from different decoding context (timeless vs. non-timless), avoiding to repeat code. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-13-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Modularize main decoder functionMathieu Poirier1-12/+29
Making the main decoder block modular so that it can be called from different decoding context (timeless vs. non-timeless), avoiding to repeat code. No change in functionality is introduced by this patch. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-12-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Make cs_etm__run_decoder() queue independentMathieu Poirier2-33/+26
This patch makes decoding of auxtrace buffer centered around a struct cs_etm_queue. This eliminates surperflous variables and is a precursor for work that simplifies the main decoder loop. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-11-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Rethink kernel address initialisationMathieu Poirier1-4/+3
Moving initialisation of the kernel start address to function cs_etm__setup_queues(), considered to be the common denominator for queue initialisation. That way we don't have to repeat the same code at different places. No change of functionatlity is introduced by this patch. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-10-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Cleaning up function cs_etm__alloc_queue()Mathieu Poirier1-21/+16
Function cs_etm__alloc_queue() should only be concerned with the allocation of memory for the etmq and accompanying decoder. Everything else should be done in the calling function. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-9-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Fix erroneous commentMathieu Poirier1-1/+1
The comment just before initialising the decoder is plane wrong since it is part of the decoding queue setup function and the operation code specifically mention that trace data is to be decoded rather than printed out. This patch simply fix the comment to prevent people from getting really confused. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-8-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Introducing function cs_etm__init_trace_params()Mathieu Poirier2-58/+58
The trace parameter initialisation code is repeated in two different places, something that bloats the file and can lead to errors. This is fixed by introducing a helper function and calling the right protocol initialisation code when required. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-7-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Fix memory leak in error pathMathieu Poirier1-7/+13
Memory allocated for variable 't_params' isn't released properly in the error path of function cs_etm_queue *cs_etm__alloc_queue() and cs_etm__dump_event(), something this patch addresses. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-6-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Introducing function cs_etm_decoder__init_dparams()Mathieu Poirier2-14/+30
Introducing function cs_etm_decoder__init_dparams() to avoid repeating code at two different places. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-5-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Fix wrong return values in error pathMathieu Poirier1-2/+2
Function cs_etm__mem_access() is supposed to return a u32 but the error path returns negative values at a couple of places, something that really throws off the clients using it. Fix the situation by return '0'. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-4-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Remove unused structure field "time" and "timestamp"Mathieu Poirier1-8/+4
Field "time" and "timestamp" in structure cs_etm_queue are no longer used and need to be removed. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-3-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Remove unused structure field "state"Mathieu Poirier1-1/+0
Field "state" in structure cs_etm_queue is no longer used and needs to be removed. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190212171618.25355-2-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf utils: Silence "Couldn't synthesize bpf events" warning for EPERMSong Liu1-2/+2
Synthesizing BPF events is only supported for root. Silent warning msg when non-root user runs perf-record. Reported-by: David Carrillo-Cisneros <davidca@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Tested-by: David Carrillo-Cisneros <davidca@fb.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/20190204193140.719740-1-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf report: Add s390 diagnosic sampling descriptor sizeThomas Richter1-0/+5
On IBM z13 machine types 2964 and 2965 the descriptor sizes for sampling and diagnostic sampling entries might be missing in the trailer entry and are set to zero. This leads to a perf report failure when processing diagnostic sampling entries. This patch adds missing descriptor sizes when the trailer entry contains zero for these fields. Output before: [root@s38lp82 perf]# ./perf report --stdio | fgrep Samples 0xabbf0 [0x8]: failed to process type: 68 Error: failed to process sample [root@s38lp82 perf]# Output after: [root@s38lp82 perf]# ./perf report --stdio | fgrep Samples # Total Lost Samples: 0 # Samples: 3K of event 'SF_CYCLES_BASIC_DIAG' # Samples: 162 of event 'CF_DIAG' [root@s38lp82 perf]# Fixes: 2b1444f2e28b ("perf report: Add raw report support for s390 auxiliary trace") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20190211100627.85714-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf cs-etm: Add proper header file for symbolsMathieu Poirier1-0/+1
After 'commit e22c1c751140 ("perf thread: Don't include symbol.h, symbol_conf.h is enough")' Compilation of the perf tools is broken when using the functionality provided by the openCSD library: [...] ... timerfd: [ on ] ... sched_getcpu: [ on ] ... sdt: [ OFF ] ... setns: [ on ] ... libopencsd: [ on ] [...] CC util/arm-spe.o CC util/arm-spe-pkt-decoder.o CC util/s390-cpumsf.o CC util/cs-etm.o CC util/parse-branch-options.o util/cs-etm.c: In function ‘cs_etm__mem_access’: util/cs-etm.c:297:24: error: storage size of ‘al’ isn’t known struct addr_location al; And rightly so since file cs-etm.c doesn't include symbol.h, something that is rectified in this patch. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190208223543.31836-1-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-09Merge tag 'perf-core-for-mingo-5.1-20190206' of ↵Ingo Molnar62-342/+1017
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: Hardware tracing: Adrian Hunter: - Handle calls optimized into jumps to a different symbol in the thread stack routines used to process hardware traces (Adrian Hunter) Intel PT: Adrian Hunter: - Fix overlap calculation for padding. - Fix CYC timestamp calculation after OVF. - Packet splitting can only happen in 32-bit. - Add timestamp to auxtrace errors. ARM CoreSight: Leo Yan: - Add last instruction information in packet - Set sample flags for instruction range, exception and return packets and for a trace discontinuity. - Add exception number in exception packet - Change tuple from traceID-CPU# to traceID-metadata - Add traceID in packet Mathieu Poirier: - Add "sinks" group to PMU directory - Use event attributes to send sink information to kernel - Remove set_drv_config() API, no longer used. perf annotate: Jiri Olsa: - Delay symbol annotation to the resort phase, speeding up 'perf report' startup. perf record: Alexey Budankov: - Allow binding userspace buffers to NUMA nodes. Symbols: Adrian Hunter: - Fix calculating of symbol sizes when splitting kallsyms into maps for kcore processing. Vendor events: William Cohen: - Intel: Fix Load_Miss_Real_Latency on CLX Misc: Arnaldo Carvalho de Melo: - Streamline headers, removing includes when all that is needed are just forward declarations, fixup the fallout for cases where headers should have been explicitely included but were instead obtained indirectly, by sheer luck. - Add fallback versions for CPU_{OR,EQUAL}(), so that code using it continue to build on older systems where those were not yet introduced or in systems using some other libc than the GNU one where those helpers aren't present. Documentation: Changbin Du: - Add documentation for BPF event selection. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-09Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar3-3/+24
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-06perf auxtrace: Add timestamp to auxtrace errorsAdrian Hunter7-16/+48
The timestamp can use useful to find part of a trace that has an error without outputting all of the trace e.g. using the itrace 's' option to skip initial number of events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190206103947.15750-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf intel-pt: Packet splitting can happen only on 32-bitAdrian Hunter1-1/+1
Data is copied when the trace is stopped, so packets are never split between buffers except when processing if the buffer cannot fit in the address space which can only happen on 32-bit systems. Change the logic to reflect that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190206103947.15750-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf intel-pt: Fix CYC timestamp calculation after OVFAdrian Hunter1-1/+0
CYC packet timestamp calculation depends upon CBR which was being cleared upon overflow (OVF). That can cause errors due to failing to synchronize with sideband events. Even if a CBR change has been lost, the old CBR is still a better estimate than zero. So remove the clearing of CBR. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf intel-pt: Fix overlap calculation for paddingAdrian Hunter1-2/+34
Auxtrace records might have up to 7 bytes of padding appended. Adjust the overlap accordingly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190206103947.15750-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf auxtrace: Define auxtrace record alignmentAdrian Hunter2-2/+5
Define auxtrace record alignment so that it can be referenced elsewhere. Note this is preparation for patch "perf intel-pt: Fix overlap calculation for padding" Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190206103947.15750-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf thread-stack: Represent jmps to the start of a different symbolAdrian Hunter2-2/+31
The compiler might optimize a call/ret combination by making it a jmp. However the thread-stack does not presently cater for that, so that such control flow is not visible in the call graph. Make it visible by recording on the stack a branch to the start of a different symbol. Note, that means when a ret pops the stack, all jmps must be popped off first. Example: $ cat jmp-to-fn.c __attribute__((noinline)) int bar(void) { return -1; } __attribute__((noinline)) int foo(void) { return bar() + 1; } int main() { return foo(); } $ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c $ objdump -d jmp-to-fn <SNIP> 0000000000001040 <main>: 1040: 31 c0 xor %eax,%eax 1042: e9 09 01 00 00 jmpq 1150 <foo> <SNIP> 0000000000001140 <bar>: 1140: b8 ff ff ff ff mov $0xffffffff,%eax 1145: c3 retq <SNIP> 0000000000001150 <foo>: 1150: 31 c0 xor %eax,%eax 1152: e8 e9 ff ff ff callq 1140 <bar> 1157: 83 c0 01 add $0x1,%eax 115a: c3 retq <SNIP> $ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ] $ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls 2019-01-08 13:24:58.783069 Creating database... 2019-01-08 13:24:58.794650 Writing records... 2019-01-08 13:24:59.008050 Adding indexes 2019-01-08 13:24:59.015802 Done $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db Before: main -> bar After: main -> foo -> bar Committer testing: Install the python2-pyside package, then select these menu options on the GUI: "Reports" "Context sensitive callgraphs" Then go on expanding the symbols, to get, full picture when doing this on a fedora:29 with gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC): jmp-to-fn PID:TID _start (ld-2.28.so) __libc_start_main main foo bar To verify that indeed, this fixes the problem. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190109091835.5570-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf thread-stack: Tidy thread_stack__no_call_return() by adding more local ↵Adrian Hunter1-18/+17
variables Make thread_stack__no_call_return() more readable by adding more local variables. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190109091835.5570-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf thread-stack: Tidy thread_stack__push_cp() usageAdrian Hunter1-10/+3
If 'cp' is checked in thread_stack__push_cp() a number of error checks can be removed, reducing code size and improving readability. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190109091835.5570-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf tools: Fix split_kallsyms_for_kcore() for trampoline symbolsAdrian Hunter1-0/+2
Kallsyms symbols do not have a size, so the size becomes the distance to the next symbol. Consequently the recently added trampoline symbols end up with large sizes because the trampolines are some distance from one another and the main kernel map. However, symbols that end outside their map can disrupt the symbol tree because, after mapping, it can appear incorrectly that they overlap other symbols. Add logic to truncate symbol size to the end of the corresponding map. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Fixes: d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines") Link: http://lkml.kernel.org/r/20190109091835.5570-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf cs-etm: Set sample flags for exception return packetLeo Yan1-0/+44
When return from exception, we need to distinguish if it's system call return or for other type exceptions for setting sample flags. Due to the exception return packet doesn't contain exception number, so we cannot decide sample flags based on exception number. On the other hand, the exception return packet is followed by an instruction range packet; this range packet deliveries the start address after exception handling, we can check if it is a SVC instruction just before the start address. If there has one SVC instruction is found ahead the return address, this means it's an exception return for system call; otherwise it is an normal return for other exceptions. This patch is to set sample flags for exception return packet, firstly it simply set sample flags as PERF_IP_FLAG_INTERRUPT for all exception returns since at this point it doesn't know what's exactly the exception type. We will defer to decide if it's an exception return for system call when the next instruction range packet comes, it checks if there has one SVC instruction prior to the start address and if so we will change sample flags to PERF_IP_FLAG_SYSCALLRET for system call return. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190129122842.32041-9-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf cs-etm: Set sample flags for exception packetLeo Yan2-0/+259
The exception taken and returning are typical flow for instruction jump but it needs to be handled with exception packets. This patch is to set sample flags for exception packet. Since the exception packet contains the exception number, according to the exception number this patch makes decision for belonging to which exception types. The decoder have defined different exception number for ETMv3 and ETMv4 separately, hence this patch needs firstly decide the ETM version by using the metadata magic number, and this patch adds helper function cs_etm__get_magic() for easily getting magic number. Based on different ETM version, the exception packet contains the exception number, according to the exception number this patch makes decision for the exception belonging to which exception types. In this patch, it introduces helper function cs_etm__is_svc_instr(); for ETMv4 CS_ETMV4_EXC_CALL covers SVC, SMC and HVC cases in the single exception number, thus need to use cs_etm__is_svc_instr() to decide an exception taken for system call. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Robert Walker <robert.walker@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190129122842.32041-8-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf cs-etm: Add traceID in packetLeo Yan2-0/+3
Add traceID in packet, thus we can use traceID to retrieve metadata pointer from traceID-metadata tuple. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190129122842.32041-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf cs-etm: Change tuple from traceID-CPU# to traceID-metadataLeo Yan3-12/+31
If packet processing wants to know the packet is bound with which ETM version, it needs to access metadata to decide that based on metadata magic number; but we cannot simply to use CPU logic ID number as index to access metadata sequential array, especially when system have hotplugged off CPUs, the metadata array are only allocated for online CPUs but not offline CPUs, so the CPU logic number doesn't match with its index in the array. This patch is to change tuple from traceID-CPU# to traceID-metadata, thus it can use the tuple to retrieve metadata pointer according to traceID. For safe accessing metadata fields, this patch provides helper function cs_etm__get_cpu() which is used to return CPU number according to traceID; cs_etm_decoder__buffer_packet() is the first consumer for this helper function. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190129122842.32041-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf cs-etm: Add exception number in exception packetLeo Yan2-4/+17
When an exception packet comes, it contains the information for exception number; the exception number indicates the exception types, so from it we can know if the exception is taken for interrupt, system call or other traps, etc. This patch simply adds a field in cs_etm_packet struct, it records exception number for exception packet that will then be used to properly identify exception types to the perf synthesize mechanic. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190129122842.32041-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>