summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2020-11-26perf arm-spe: Add more sub classes for operation packetLeo Yan1-2/+16
For the operation type packet payload with load/store class, it misses to support these sub classes: - A load/store targeting the general-purpose registers; - A load/store targeting unspecified registers; - The ARMv8.4 nested virtualisation extension can redirect system register accesses to a memory page controlled by the hypervisor. The SPE profiling feature in newer implementations can tag those memory accesses accordingly. Add the bit pattern describing load/store sub classes, so that the perf tool can decode it properly. Inspired by Andre Przywara, refined the commit log and code for more clear description. Co-developed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-15-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor operation packet handlingLeo Yan2-12/+37
Defines macros for operation packet header and formats (support sub classes for 'other', 'branch', 'load and store', etc). Uses these macros for operation packet decoding and dumping. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-14-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Add new function arm_spe_pkt_desc_op_type()Leo Yan1-34/+45
The operation type packet is complex and contains subclass; the parsing flow causes deep indentation; for more readable, this patch introduces a new function arm_spe_pkt_desc_op_type() which is used for operation type parsing. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-13-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Remove size condition checking for eventsLeo Yan2-14/+9
In the Armv8 ARM (ARM DDI 0487F.c), chapter "D10.2.6 Events packet", it describes the event bit is valid with specific payload requirement. For example, the Last Level cache access event, the bit is defined as: E[8], byte 1 bit [0], when SZ == 0b01 , when SZ == 0b10 , or when SZ == 0b11 It requires the payload size is at least 2 bytes, when byte 1 (start counting from 0) is valid, E[8] (bit 0 in byte 1) can be used for LLC access event type. For safety, the code checks the condition for payload size firstly, if meet the requirement for payload size, then continue to parse event type. If review function arm_spe_get_payload(), it has used cast, so any bytes beyond the valid size have been set to zeros. For this reason, we don't need to check payload size anymore afterwards when parse events, thus this patch removes payload size conditions. Suggested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-12-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor event type handlingLeo Yan3-28/+29
Move the enums of event types to arm-spe-pkt-decoder.h, thus function arm_spe_pkt_desc_event() can use them for bitmasks. Suggested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-11-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Add new function arm_spe_pkt_desc_event()Leo Yan1-26/+37
This patch moves out the event packet parsing from arm_spe_pkt_desc() to the new function arm_spe_pkt_desc_event(). Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-10-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor counter packet handlingLeo Yan2-5/+11
This patch defines macros for counter packet header, and uses macros to replace hard code values in functions arm_spe_get_counter() and arm_spe_pkt_desc(). In the function arm_spe_get_counter(), adds a new line for more readable. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-9-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Add new function arm_spe_pkt_desc_counter()Leo Yan1-15/+28
This patch moves out the counter packet parsing code from arm_spe_pkt_desc() to the new function arm_spe_pkt_desc_counter(). Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-8-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor context packet handlingLeo Yan2-1/+4
Minor refactoring to use macro for index mask. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm_spe: Fixup top byte for data virtual addressLeo Yan1-3/+17
To establish a valid address from the address packet payload and finally the address value can be used for parsing data symbol in DSO, current code uses 0xff to replace the tag in the top byte of data virtual address. So far the code only fixups top byte for the memory layouts with 4KB pages, it misses to support memory layouts with 64KB pages. This patch adds the conditions for checking bits [55:52] are 0xf, if detects the pattern it will fill 0xff into the top byte of the address, also adds comment to explain the fixing up. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor address packet handlingLeo Yan3-42/+48
This patch is to refactor address packet handling, it defines macros for address packet's header and payload, these macros are used by decoder and the dump flow. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Add new function arm_spe_pkt_desc_addr()Leo Yan1-26/+38
This patch moves out the address parsing code from arm_spe_pkt_desc() and uses the new introduced function arm_spe_pkt_desc_addr() to process address packet. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor packet header parsingLeo Yan2-51/+61
The packet header parsing uses the hard coded values and it uses nested if-else statements. To improve the readability, this patch refactors the macros for packet header format so it removes the hard coded values. Furthermore, based on the new mask macros it reduces the nested if-else statements and changes to use the flat conditions checking, this is directive and can easily map to the descriptions in ARMv8-a architecture reference manual (ARM DDI 0487E.a), chapter 'D10.1.5 Statistical Profiling Extension protocol packet headers'. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor printing string to bufferLeo Yan2-153/+151
When outputs strings to the decoding buffer with function snprintf(), SPE decoder needs to detects if any error returns from snprintf() and if so needs to directly bail out. If snprintf() returns success, it needs to update buffer pointer and reduce the buffer length so can continue to output the next string into the consequent memory space. This complex logics are spreading in the function arm_spe_pkt_desc() so there has many duplicate codes for handling error detecting, increment buffer pointer and decrement buffer size. To avoid the duplicate code, this patch introduces a new helper function arm_spe_pkt_out_string() which is used to wrap up the complex logics, and it's used by the caller arm_spe_pkt_desc(). This patch moves the variable 'blen' as the function's local variable so allows to remove the unnecessary braces and improve the readability. This patch simplifies the return value for arm_spe_pkt_desc(): '0' means success and other values mean an error has occurred. To realize this, it relies on arm_spe_pkt_out_string()'s parameter 'err', the 'err' is a cumulative value, returns its final value if printing buffer is called for one time or multiple times. Finally, the error is handled in a central place, rather than directly bailing out in switch-cases, it returns error at the end of arm_spe_pkt_desc(). This patch changes the caller arm_spe_dump() to respect the updated return value semantics of arm_spe_pkt_desc(). Suggested-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16perf expr: Force encapsulation on expr_id_dataIan Rogers4-25/+66
This patch resolves some undefined behavior where variables in expr_id_data were accessed (for debugging) without being defined. To better enforce the tagged union behavior, the struct is moved into expr.c and accessors provided. Tag values (kinds) are explicitly identified. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-By: Kajol Jain<kjain@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20200826153055.2067780-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16perf vendor events: Update Skylake client events to v50Jin Yao8-4502/+4552
- Update Skylake events to v50. - Update Skylake JSON metrics from TMAM 4.0. - Fix the issue in DRAM_Parallel_Reads - Fix the perf test warning Before: root@kbl-ppc:~# perf stat -M DRAM_Parallel_Reads -- sleep 1 event syntax error: '{arb/event=0x80,umask=0x2/,arb/event=0x80,umask=0x2,thresh=1/}:W' \___ unknown term 'thresh' for pmu 'uncore_arb' valid terms: event,edge,inv,umask,cmask,config,config1,config2,name,period,percore Initial error: event syntax error: '..umask=0x2/,arb/event=0x80,umask=0x2,thresh=1/}:W' \___ Cannot find PMU `arb'. Missing kernel support? root@kbl-ppc:~# perf test metrics 10: PMU events : 10.3: Parsing of PMU event table metrics : Skip (some metrics failed) 10.4: Parsing of PMU event table metrics with fake PMUs: Ok 67: Parse and process metrics : Ok After: root@kbl-ppc:~# perf stat -M MEM_Parallel_Reads -- sleep 1 Performance counter stats for 'system wide': 4,951,646 arb/event=0x80,umask=0x2/ # 26.30 MEM_Parallel_Reads (50.04%) 188,251 arb/event=0x80,umask=0x2,cmask=1/ (49.96%) 1.000867010 seconds time elapsed root@kbl-ppc:~# perf test metrics 10: PMU events : 10.3: Parsing of PMU event table metrics : Ok 10.4: Parsing of PMU event table metrics with fake PMUs: Ok 67: Parse and process metrics : Ok Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/93fae76f-ce2b-ab0b-3ae9-cc9a2b4cbaec@linux.intel.com/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16perf data: Allow to use stdio functions for pipe modeNamhyung Kim5-11/+58
When perf data is in a pipe, it reads each event separately using read(2) syscall. This is a huge performance bottleneck when processing large data like in perf inject. Also perf inject needs to use write(2) syscall for the output. So convert it to use buffer I/O functions in stdio library for pipe data. This makes inject-build-id bench time drops from 20ms to 8ms. $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 8.074 msec (+- 0.013 msec) Average time per event: 0.792 usec (+- 0.001 usec) Average memory usage: 8328 KB (+- 0 KB) Average build-id-all injection took: 5.490 msec (+- 0.008 msec) Average time per event: 0.538 usec (+- 0.001 usec) Average memory usage: 7563 KB (+- 0 KB) This patch enables it just for perf inject when used with pipe (it's a default behavior). Maybe we could do it for perf record and/or report later.. Committer testing: Before: $ perf stat -r 5 perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 13.605 msec (+- 0.064 msec) Average time per event: 1.334 usec (+- 0.006 usec) Average memory usage: 12220 KB (+- 7 KB) Average build-id-all injection took: 11.458 msec (+- 0.058 msec) Average time per event: 1.123 usec (+- 0.006 usec) Average memory usage: 11546 KB (+- 8 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 13.673 msec (+- 0.057 msec) Average time per event: 1.341 usec (+- 0.006 usec) Average memory usage: 12508 KB (+- 8 KB) Average build-id-all injection took: 11.437 msec (+- 0.046 msec) Average time per event: 1.121 usec (+- 0.004 usec) Average memory usage: 11812 KB (+- 7 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 13.641 msec (+- 0.069 msec) Average time per event: 1.337 usec (+- 0.007 usec) Average memory usage: 12302 KB (+- 8 KB) Average build-id-all injection took: 10.820 msec (+- 0.106 msec) Average time per event: 1.061 usec (+- 0.010 usec) Average memory usage: 11616 KB (+- 7 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 13.379 msec (+- 0.074 msec) Average time per event: 1.312 usec (+- 0.007 usec) Average memory usage: 12334 KB (+- 8 KB) Average build-id-all injection took: 11.288 msec (+- 0.071 msec) Average time per event: 1.107 usec (+- 0.007 usec) Average memory usage: 11657 KB (+- 8 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 13.534 msec (+- 0.058 msec) Average time per event: 1.327 usec (+- 0.006 usec) Average memory usage: 12264 KB (+- 8 KB) Average build-id-all injection took: 11.557 msec (+- 0.076 msec) Average time per event: 1.133 usec (+- 0.007 usec) Average memory usage: 11593 KB (+- 8 KB) Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,060.05 msec task-clock:u # 1.566 CPUs utilized ( +- 0.65% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 101,888 page-faults:u # 0.025 M/sec ( +- 0.12% ) 3,745,833,163 cycles:u # 0.923 GHz ( +- 0.10% ) (83.22%) 194,346,613 stalled-cycles-frontend:u # 5.19% frontend cycles idle ( +- 0.57% ) (83.30%) 708,495,034 stalled-cycles-backend:u # 18.91% backend cycles idle ( +- 0.48% ) (83.48%) 5,629,328,628 instructions:u # 1.50 insn per cycle # 0.13 stalled cycles per insn ( +- 0.21% ) (83.57%) 1,236,697,927 branches:u # 304.602 M/sec ( +- 0.16% ) (83.44%) 17,564,877 branch-misses:u # 1.42% of all branches ( +- 0.23% ) (82.99%) 2.5934 +- 0.0128 seconds time elapsed ( +- 0.49% ) $ After: $ perf stat -r 5 perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 8.560 msec (+- 0.125 msec) Average time per event: 0.839 usec (+- 0.012 usec) Average memory usage: 12520 KB (+- 8 KB) Average build-id-all injection took: 5.789 msec (+- 0.054 msec) Average time per event: 0.568 usec (+- 0.005 usec) Average memory usage: 11919 KB (+- 9 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 8.639 msec (+- 0.111 msec) Average time per event: 0.847 usec (+- 0.011 usec) Average memory usage: 12732 KB (+- 8 KB) Average build-id-all injection took: 5.647 msec (+- 0.069 msec) Average time per event: 0.554 usec (+- 0.007 usec) Average memory usage: 12093 KB (+- 7 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 8.551 msec (+- 0.096 msec) Average time per event: 0.838 usec (+- 0.009 usec) Average memory usage: 12739 KB (+- 8 KB) Average build-id-all injection took: 5.617 msec (+- 0.061 msec) Average time per event: 0.551 usec (+- 0.006 usec) Average memory usage: 12105 KB (+- 7 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 8.403 msec (+- 0.097 msec) Average time per event: 0.824 usec (+- 0.010 usec) Average memory usage: 12770 KB (+- 8 KB) Average build-id-all injection took: 5.611 msec (+- 0.085 msec) Average time per event: 0.550 usec (+- 0.008 usec) Average memory usage: 12134 KB (+- 8 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 8.518 msec (+- 0.102 msec) Average time per event: 0.835 usec (+- 0.010 usec) Average memory usage: 12518 KB (+- 10 KB) Average build-id-all injection took: 5.503 msec (+- 0.073 msec) Average time per event: 0.540 usec (+- 0.007 usec) Average memory usage: 11882 KB (+- 8 KB) Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 2,394.88 msec task-clock:u # 1.577 CPUs utilized ( +- 0.83% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 103,181 page-faults:u # 0.043 M/sec ( +- 0.11% ) 3,548,172,030 cycles:u # 1.482 GHz ( +- 0.30% ) (83.26%) 81,537,700 stalled-cycles-frontend:u # 2.30% frontend cycles idle ( +- 1.54% ) (83.24%) 876,631,544 stalled-cycles-backend:u # 24.71% backend cycles idle ( +- 1.14% ) (83.45%) 5,960,361,707 instructions:u # 1.68 insn per cycle # 0.15 stalled cycles per insn ( +- 0.27% ) (83.26%) 1,269,413,491 branches:u # 530.054 M/sec ( +- 0.10% ) (83.48%) 11,372,453 branch-misses:u # 0.90% of all branches ( +- 0.52% ) (83.31%) 1.51874 +- 0.00642 seconds time elapsed ( +- 0.42% ) $ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201030054742.87740-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf arm-spe: Fix packet length handlingLeo Yan1-22/+12
When processing address packet and counter packet, if the packet contains extended header, it misses to account the extra one byte for header length calculation, thus returns the wrong packet length. To correct the packet length calculation, one possible fixing is simply to plus extra 1 for extended header, but will spread some duplicate code in the flows for processing address packet and counter packet. Alternatively, we can refine the function arm_spe_get_payload() to not only support short header and allow it to support extended header, and rely on it for the packet length calculation. So this patch refactors function arm_spe_get_payload() with a new argument 'ext_hdr' for support extended header; the packet processing flows can invoke this function to unify the packet length calculation. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20201111071149.815-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf arm-spe: Refactor arm_spe_get_events()Leo Yan1-4/+2
In function arm_spe_get_events(), the event packet's 'index' is assigned as payload length, but the flow is not directive: it firstly gets the packet length from the return value of arm_spe_get_payload(), the value includes header length (1) and payload length: int ret = arm_spe_get_payload(buf, len, packet); and then reduces header length from packet length, so finally get the payload length: packet->index = ret - 1; To simplify the code, this patch directly assigns payload length to event packet's index; and at the end it calls arm_spe_get_payload() to return the payload value. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20201111071149.815-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf arm-spe: Refactor payload size calculationLeo Yan1-9/+9
This patch defines macro to extract "sz" field from header, and renames the function payloadlen() to arm_spe_payload_len(). Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20201111071149.815-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf arm-spe: Fix a typo in commentLeo Yan1-1/+1
Fix a typo: s/iff/if. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20201111071149.815-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf arm-spe: Include bitops.h for BIT() macroLeo Yan2-8/+4
Include header linux/bitops.h, directly use its BIT() macro and remove the self defined macros. Committer notes: Use BIT_ULL() instead of BIT to build on 32-bit arches as mentioned in review by Andre Przywara <andre.przywara@arm.com>. I noticed the build failure when crossbuilding to arm32 from x86_64. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20201111071149.815-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf mem: Support ARM SPE eventsLeo Yan2-1/+38
This patch adds ARM SPE events for perf memory profiling: 'spe-load': event for only recording memory load ops; 'spe-store': event for only recording memory store ops; 'spe-ldst': event for recording memory load and store ops. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-10-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf c2c: Support AUX traceLeo Yan1-0/+12
This patch adds the AUX callbacks in session structure, so support AUX trace for "perf c2c" tool; make itrace memory event as default for "perf c2c", this tells the AUX trace decoder to synthesize samples and can be used for statistics. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-9-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf mem: Support AUX traceLeo Yan1-0/+13
The 'perf mem' tool doesn't support AUX trace data so it cannot receive the hardware tracing data. On arm64, although it doesn't support PMU events for memory load and store, ARM SPE is a good candidate for memory profiling, the hardware tracer can record memory accessing operations with affiliated information (e.g. physical address and virtual address for accessing, cache levels, TLB walking, latency, etc). To allow "perf mem" tool to support AUX trace, this patch adds the AUX callbacks for session structure; make itrace memory event as default for "perf mem", this tells the AUX trace decoder to synthesize memory samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-8-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf auxtrace: Add itrace option '-M' for memory eventsLeo Yan3-0/+7
This patch is to add itrace option '-M' to synthesize memory event. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf mem: Only initialize memory event for recordingLeo Yan1-5/+5
It's needless to initialize memory events for reporting, this patch moves memory event initialization for only recording. Furthermore, the change allows to parse perf data on cross platforms, e.g. perf tool can report result properly even the machine doesn't support the memory events. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORELeo Yan1-4/+13
When user doesn't specify event name, perf c2c tool enables both the load and store events, and this leads to failure for opening the duplicate PMU device for AUX trace. After the memory event PERF_MEM_EVENTS__LOAD_STORE is introduced, when the user doesn't specify event name, this patch converts the required operation to PERF_MEM_EVENTS__LOAD_STORE if the arch supports it. Otherwise, the tool still rolls back to enable events PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORELeo Yan3-7/+31
On the architectures with perf memory profiling, two types of hardware events have been supported: load and store; if want to profile memory for both load and store operations, the tool will use these two events at the same time, the usage is: # perf mem record -t load,store -- uname But this cannot be applied for AUX tracing event, the same PMU event can be used to only trace memory load, or only memory store, or trace for both memory load and store. This patch introduces a new event PERF_MEM_EVENTS__LOAD_STORE, which is used to support the event which can record both memory load and store operations. When user specifies memory operation type as 'load,store', or doesn't set type so use 'load,store' as default, if the arch supports the event PERF_MEM_EVENTS__LOAD_STORE, the tool will convert the required operations to this single event; otherwise, if the arch doesn't support PERF_MEM_EVENTS__LOAD_STORE, the tool rolls back to enable both events PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE, which keeps the same behaviour with before. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf mem: Introduce weak function perf_mem_events__ptr()Leo Yan4-21/+46
Different architectures might use different event or different event parameters for memory profiling, this patch introduces a weak perf_mem_events__ptr() function which allows to return back a architecture specific memory event. Since the variable 'perf_mem_events' can be only accessed by the perf_mem_events__ptr() function, mark the variable as 'static', this allows the architectures to define its own memory event array. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf mem: Search event name with more flexible pathLeo Yan1-3/+3
The perf tool searches a memory event name under the folder '/sys/devices/cpu/events/', this leads to the limitation for the selection of a memory profiling event which must be under this folder. Thus it's impossible to use any other event as memory event which is not under this specific folder, e.g. Arm SPE hardware event is not located in '/sys/devices/cpu/events/' so it cannot be enabled for memory profiling. This patch changes to search folder from '/sys/devices/cpu/events/' to '/sys/devices', so it give flexibility to find events which can be used for memory profiling. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf jevents: Add test for arch std eventsJohn Garry4-0/+31
Recently there was an undetected breakage for std arch event support. Add support in "PMU events" testcase to detect such breakages. For this, the "test" arch needs has support added to process std arch events. And a test event is added for the test, ifself. Also add a few code comments to help understand the code a bit better. Committer testing: Before: # perf test -vv pmu |& grep l3_cache_rd # After: # perf test -vv pmu |& grep l3_cache_rd testing event table l3_cache_rd: pass testing aliases PMU cpu: matched event l3_cache_rd # Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-By: Kajol Jain<kjain@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/1603364547-197086-3-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf jevents: Tidy error handlingJohn Garry1-48/+35
There is much duplication in the error handling for directory transvering for prcessing JSONs. Factor out the common code to tidy a bit. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-By: Kajol Jain<kjain@linux.ibm.com> Link: https://lore.kernel.org/r/1603364547-197086-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf trace beauty: Allow header files in a different pathNamhyung Kim2-3/+3
Current script to generate mmap flags and prot checks headers from the uapi/asm-generic directory but it might come from a different directory in some environment. So change the pattern to accept it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201023020628.346257-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf stat: Add --quiet optionAndi Kleen3-1/+10
Add a new --quiet option to 'perf stat'. This is useful with 'perf stat record' to write the data only to the perf.data file, which can lower measurement overhead because the data doesn't need to be formatted. On my 4C desktop: % time ./perf stat record -e $(python -c 'print ",".join(["cycles"]*1000)') -a -I 1000 sleep 5 ... real 0m5.377s user 0m0.238s sys 0m0.452s % time ./perf stat record --quiet -e $(python -c 'print ",".join(["cycles"]*1000)') -a -I 1000 sleep 5 real 0m5.452s user 0m0.183s sys 0m0.423s In this example it cuts the user time by 20%. On systems with more cores the savings are higher. Signed-off-by: Andi Kleen <andi@firstfloor.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Link: http://lore.kernel.org/lkml/20201027002737.30942-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf stat: Support regex pattern in --for-each-cgroupNamhyung Kim3-26/+182
To make the command line even more compact with cgroups, support regex pattern matching in cgroup names. $ perf stat -a -e cpu-clock,cycles --for-each-cgroup ^foo sleep 1 3,000.73 msec cpu-clock foo # 2.998 CPUs utilized 12,530,992,699 cycles foo # 7.517 GHz (100.00%) 1,000.61 msec cpu-clock foo/bar # 1.000 CPUs utilized 4,178,529,579 cycles foo/bar # 2.506 GHz (100.00%) 1,000.03 msec cpu-clock foo/baz # 0.999 CPUs utilized 4,176,104,315 cycles foo/baz # 2.505 GHz (100.00%) 1.000892614 seconds time elapsed Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201027072855.655449-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf test: Use generic event for expand_libpfm_events()Namhyung Kim1-1/+1
I found that the UNHALTED_CORE_CYCLES event is only available in the Intel machines and it makes other vendors/archs fail on the test. As libpfm4 can parse the generic events like cycles, let's use them. Fixes: 40b74c30ffb9 ("perf test: Add expand cgroup event test") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201027072855.655449-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf kvm: Add kvm-stat for arm64Sergey Senozhatsky4-0/+179
Add support for 'perf kvm stat' on arm64 platform. Example: # perf kvm stat report Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time DABT_LOW 661867 98.91% 40.45% 2.19us 3364.65us 6.24us ( +- 0.34% ) IRQ 4598 0.69% 57.44% 2.89us 3397.59us 1276.27us ( +- 1.61% ) WFx 1475 0.22% 1.71% 2.22us 3388.63us 118.31us ( +- 8.69% ) IABT_LOW 1018 0.15% 0.38% 2.22us 2742.07us 38.29us ( +- 12.55% ) SYS64 180 0.03% 0.01% 2.07us 112.91us 6.57us ( +- 14.95% ) HVC64 17 0.00% 0.01% 2.19us 322.35us 42.95us ( +- 58.98% ) Total Samples:669155, Total events handled time:10216387.86us. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Suleiman Souhlal <suleiman@google.com> Link: http://lore.kernel.org/lkml/20201027062421.463355-1-sergey.senozhatsky@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf env: Conditionally compile BPF support code on having HAVE_LIBBPF_SUPPORTArnaldo Carvalho de Melo4-22/+32
If libbpf isn't selected, no need for a bunch of related code, that were not even being used, as code using these perf_env methods was also enclosed in HAVE_LIBBPF_SUPPORT. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf annotate: Move bpf header inclusion to inside HAVE_LIBBPF_SUPPORTArnaldo Carvalho de Melo1-4/+4
No need to include it otherwise. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf tests: Skip the llvm and bpf tests if HAVE_LIBBPF_SUPPORT isn't definedArnaldo Carvalho de Melo2-13/+21
If either NO_LIBBPF=1 is passed, explicitely disabling it or if libbpf is not available due to some missing dependency, skip its tests, telling the user the feature isn't available. # perf test <SNIP> 40: LLVM search and compile : Skip (not compiled in) 41: Session topology : Ok 42: BPF filter : Skip (not compiled in) <SNIP> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf bpf: Enclose libbpf.h include within HAVE_LIBBPF_SUPPORTArnaldo Carvalho de Melo2-0/+28
As it uses the 'deprecated' attribute in a way that breaks the build with old gcc compilers, so to continue being able to build in such systems where NO_LIBBPF=1 is being used, enclose it under HAVE_LIBBPF_SUPPORT. 1 centos:6 : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 2 oraclelinux:6 : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) CC /tmp/build/perf/builtin-record.o In file included from util/bpf-loader.h:11, from builtin-record.c:39: /git/linux/tools/lib/bpf/libbpf.h:203: error: wrong number of arguments specified for 'deprecated' attribute Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf test: Implement skip_reason callback for watchpoint testsTommi Rantala3-6/+17
Currently reason for skipping the read only watchpoint test is only seen when running in verbose mode: $ perf test watchpoint 23: Watchpoint : 23.1: Read Only Watchpoint : Skip 23.2: Write Only Watchpoint : Ok 23.3: Read / Write Watchpoint : Ok 23.4: Modify Watchpoint : Ok $ perf test -v watchpoint 23: Watchpoint : 23.1: Read Only Watchpoint : --- start --- test child forked, pid 60204 Hardware does not support read only watchpoints. test child finished with -2 Implement skip_reason callback for the watchpoint tests, so that it's easy to see reason why the test is skipped: $ perf test watchpoint 23: Watchpoint : 23.1: Read Only Watchpoint : Skip (missing hardware support) 23.2: Write Only Watchpoint : Ok 23.3: Read / Write Watchpoint : Ok 23.4: Modify Watchpoint : Ok Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201016131650.72476-1-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf tests tsc: Add checking helper is_supported()Leo Yan3-0/+15
So far tsc is enabled on x86_64, i386 and Arm64 architectures, add checking helper to skip this testing for other architectures. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20201019100236.23675-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf tests tsc: Make tsc testing as a common testingLeo Yan7-10/+8
x86 arch provides the testing for conversion between tsc and perf time, the testing is located in x86 arch folder. Move this testing out from x86 arch folder and place it into the common testing folder, so allows to execute tsc testing on other architectures (e.g. Arm64). This patch removes the inclusion of "arch-tests.h" from the testing code, this can avoid building failure if any arch has no this header file. Committer testing: $ perf test -v tsc Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc 70: Convert perf time to TSC : --- start --- test child forked, pid 4032834 mmap size 528384B 1st event perf time 165409788843605 tsc 336578703793868 rdtsc time 165409788854986 tsc 336578703837038 2nd event perf time 165409788855487 tsc 336578703838935 test child finished with 0 ---- end ---- Convert perf time to TSC: Ok $ Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20201019100236.23675-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf mem2node: Improve warning if detected no memory nodesLeo Yan1-1/+2
Some archs (e.g. x86 and Arm64) don't enable the configuration CONFIG_MEMORY_HOTPLUG by default, if this configuration is not enabled when build the kernel image, the SysFS for memory nodes will be missed. This results in perf tool has no chance to catpure the memory nodes information, when perf tool reports the result and detects no memory nodes, it outputs "assertion failed at util/mem2node.c:99". The output log doesn't give out reason for the failure and users have no clue for how to fix it. This patch changes to use explicit way for warning: it tells user that detected no memory nodes and suggests to enable CONFIG_MEMORY_HOTPLUG for kernel building. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201019003613.8399-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf version: Add a feature for libpfm4Ian Rogers1-0/+1
If perf is built with libpfm4 (LIBPFM4=1) then advertise it in perf -vv. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201019232545.4047264-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf annotate mips: Add perf arch instructions annotate handlersDengcheng Zhu3-1/+55
Support the MIPS architecture using the ins_ops association method. With this patch, perf-annotate can work well on MIPS. Testing it with a perf.data file collected on a mips machine: $./perf annotate -i perf.data : Disassembly of section .text: : : 00000000000be6a0 <get_next_seq>: : get_next_seq(): 0.00 : be6a0: lw v0,0(a0) 0.00 : be6a4: daddiu sp,sp,-128 0.00 : be6a8: ld a7,72(a0) 0.00 : be6ac: gssq s5,s4,80(sp) 0.00 : be6b0: gssq s1,s0,48(sp) 0.00 : be6b4: gssq s8,gp,112(sp) 0.00 : be6b8: gssq s7,s6,96(sp) 0.00 : be6bc: gssq s3,s2,64(sp) 0.00 : be6c0: sd a3,0(sp) 0.00 : be6c4: move s0,a0 0.00 : be6c8: sd v0,32(sp) 0.00 : be6cc: sd a5,8(sp) 0.00 : be6d0: sd zero,8(a0) 0.00 : be6d4: sd a6,16(sp) 0.00 : be6d8: ld s2,48(a0) 8.53 : be6dc: ld s1,40(a0) 9.42 : be6e0: ld v1,32(a0) 0.00 : be6e4: nop 0.00 : be6e8: ld s4,24(a0) 0.00 : be6ec: ld s5,16(a0) 0.00 : be6f0: sd a7,40(sp) 10.11 : be6f4: ld s6,64(a0) ... The original patch link: https://lore.kernel.org/patchwork/patch/1180480/ Signed-off-by: Dengcheng Zhu <dzhu@wavecomp.com> Cc: Dengcheng Zhu <dzhu@wavecomp.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Cc: linux-mips@vger.kernel.org [ fanpeng@loongson.cn: Add missing "bgtzl", "bltzl", "bgezl", "blezl", "beql" and "bnel" for pre-R6processors ] Signed-off-by: Peng Fan <fanpeng@loongson.cn> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03perf tools: Add missing swap for cgroup eventsNamhyung Kim1-0/+13
It was missed to add a swap function for PERF_RECORD_CGROUP. Fixes: ba78c1c5461c ("perf tools: Basic support for CGROUP event") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201102140228.303657-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03perf tools: Add missing swap for ino_generationJiri Olsa1-0/+1
We are missing swap for ino_generation field. Fixes: 5c5e854bc760 ("perf tools: Add attr->mmap2 support") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201101233103.3537427-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>