<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/tools/perf/util/sort.c, branch v5.19</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.19</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.19'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-02-16T14:21:22+00:00</updated>
<entry>
<title>perf report: Add "addr_from" and "addr_to" sort dimensions</title>
<updated>2022-02-16T14:21:22+00:00</updated>
<author>
<name>Stephane Eranian</name>
<email>eranian@google.com</email>
</author>
<published>2022-02-08T21:16:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=052747700e914896e8c78ff019411487dc7c12a0'/>
<id>urn:sha1:052747700e914896e8c78ff019411487dc7c12a0</id>
<content type='text'>
With the existing symbol_from/symbol_to, branches captured in the same
function would be collapsed into a single function if the latencies
associated with the each branch (cycles) were all the same.  That is the
case on Intel Broadwell, for instance. Since Intel Skylake, the latency
is captured by hardware and therefore is used to disambiguate branches.

Add addr_from/addr_to sort dimensions to sort branches based on their
addresses and not the function there are in. The output is still the
function name but the offset within the function is provided to uniquely
identify each branch.  These new sort dimensions also help with annotate
because they create different entries in the histogram which, in turn,
generates proper branch annotations.

Here is an example using AMD's branch sampling:

  $ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg

  $ perf report
  Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
  Overhead  Command          Source Shared Object  Source Symbol                                   Target Symbol                                   Basic Block Cycle
    99.65%  test_prg	   test_prg              [.] test_thread                                 [.] test_thread                                 -
     0.02%  test_prg         [kernel.vmlinux]      [k] asm_sysvec_apic_timer_interrupt             [k] error_entry                                 -

  $ perf report -F overhead,comm,dso,addr_from,addr_to
  Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
  Overhead  Command          Shared Object     Source Address          Target Address
     4.22%  test_prg         test_prg          [.] test_thread+0x3c    [.] test_thread+0x4
     4.13%  test_prg         test_prg          [.] test_thread+0x4     [.] test_thread+0x3a
     4.09%  test_prg         test_prg          [.] test_thread+0x3a    [.] test_thread+0x6
     4.08%  test_prg         test_prg          [.] test_thread+0x2     [.] test_thread+0x3c
     4.06%  test_prg         test_prg          [.] test_thread+0x3e    [.] test_thread+0x2
     3.87%  test_prg         test_prg          [.] test_thread+0x6     [.] test_thread+0x38
     3.84%  test_prg         test_prg          [.] test_thread         [.] test_thread+0x3e
     3.76%  test_prg         test_prg          [.] test_thread+0x1e    [.] test_thread
     3.76%  test_prg         test_prg          [.] test_thread+0x38    [.] test_thread+0x8
     3.56%  test_prg         test_prg          [.] test_thread+0x22    [.] test_thread+0x1e
     3.54%  test_prg         test_prg          [.] test_thread+0x8     [.] test_thread+0x36
     3.47%  test_prg         test_prg          [.] test_thread+0x1c    [.] test_thread+0x22
     3.45%  test_prg         test_prg          [.] test_thread+0x36    [.] test_thread+0xa
     3.28%  test_prg         test_prg          [.] test_thread+0x24    [.] test_thread+0x1c
     3.25%  test_prg         test_prg          [.] test_thread+0xa     [.] test_thread+0x34
     3.24%  test_prg         test_prg          [.] test_thread+0x1a    [.] test_thread+0x24
     3.20%  test_prg         test_prg          [.] test_thread+0x34    [.] test_thread+0xc
     3.04%  test_prg         test_prg          [.] test_thread+0x26    [.] test_thread+0x1a
     3.01%  test_prg         test_prg          [.] test_thread+0xc     [.] test_thread+0x32
     2.98%  test_prg         test_prg          [.] test_thread+0x18    [.] test_thread+0x26
     2.94%  test_prg         test_prg          [.] test_thread+0x32    [.] test_thread+0xe
     2.76%  test_prg         test_prg          [.] test_thread+0x28    [.] test_thread+0x18
     2.73%  test_prg         test_prg          [.] test_thread+0xe     [.] test_thread+0x30
     2.67%  test_prg         test_prg          [.] test_thread+0x30    [.] test_thread+0x10
     2.67%  test_prg         test_prg          [.] test_thread+0x16    [.] test_thread+0x28
     2.46%  test_prg         test_prg          [.] test_thread+0x10    [.] test_thread+0x2e
     2.44%  test_prg         test_prg          [.] test_thread+0x2a    [.] test_thread+0x16
     2.38%  test_prg         test_prg          [.] test_thread+0x14    [.] test_thread+0x2a
     2.32%  test_prg         test_prg          [.] test_thread+0x2e    [.] test_thread+0x12
     2.28%  test_prg         test_prg          [.] test_thread+0x12    [.] test_thread+0x2c
     2.16%  test_prg         test_prg          [.] test_thread+0x2c    [.] test_thread+0x14
     0.02%  test_prg         [kernel.vmlinux]  [k] asm_sysvec_apic_ti+0x5  [k] error_entry

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kim Phillips &lt;kim.phillips@amd.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Link: http://lore.kernel.org/lkml/20220208211637.2221872-13-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Apply correct label to user/kernel symbols in branch mode</title>
<updated>2022-02-06T12:03:06+00:00</updated>
<author>
<name>German Gomez</name>
<email>german.gomez@arm.com</email>
</author>
<published>2022-01-26T10:59:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=05b5a9d6285412d97fc61b8ec113d1d4f6b950c2'/>
<id>urn:sha1:05b5a9d6285412d97fc61b8ec113d1d4f6b950c2</id>
<content type='text'>
In branch mode, the branch symbols were being displayed with incorrect
cpumode labels. So fix this.

For example, before:
  # perf record -b -a -- sleep 1
  # perf report -b

  Overhead  Command  Source Shared Object  Source Symbol               Target Symbol
     0.08%  swapper  [kernel.kallsyms]     [k] rcu_idle_enter          [k] cpuidle_enter_state
 ==&gt; 0.08%  cmd0     [kernel.kallsyms]     [.] psi_group_change        [.] psi_group_change
     0.08%  cmd1     [kernel.kallsyms]     [k] psi_group_change        [k] psi_group_change

After:
  # perf report -b

  Overhead  Command  Source Shared Object  Source Symbol               Target Symbol
     0.08%  swapper  [kernel.kallsyms]     [k] rcu_idle_enter          [k] cpuidle_enter_state
     0.08%  cmd0     [kernel.kallsyms]     [k] psi_group_change        [k] pei_group_change
     0.08%  cmd1     [kernel.kallsyms]     [k] psi_group_change        [k] psi_group_change

Reviewed-by: James Clark &lt;james.clark@arm.com&gt;
Signed-off-by: German Gomez &lt;german.gomez@arm.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20220126105927.3411216-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux</title>
<updated>2022-01-18T04:32:11+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-01-18T04:32:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=57d17378a4a042401b0c2fe211e5a0e3a276cb3d'/>
<id>urn:sha1:57d17378a4a042401b0c2fe211e5a0e3a276cb3d</id>
<content type='text'>
Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Add 'trace' subcommand for 'perf ftrace', setting the stage for
     more 'perf ftrace' subcommands. Not using a subcommand yields the
     previous behaviour of 'perf ftrace'.

   - Add 'latency' subcommand to 'perf ftrace', that can use the
     function graph tracer or a BPF optimized one, via the -b/--use-bpf
     option.

     E.g.:

	$ sudo perf ftrace latency -a -T mutex_lock sleep 1
	#   DURATION     |      COUNT | GRAPH                          |
	     0 - 1    us |       4596 | ########################       |
	     1 - 2    us |       1680 | #########                      |
	     2 - 4    us |       1106 | #####                          |
	     4 - 8    us |        546 | ##                             |
	     8 - 16   us |        562 | ###                            |
	    16 - 32   us |          1 |                                |
	    32 - 64   us |          0 |                                |
	    64 - 128  us |          0 |                                |
	   128 - 256  us |          0 |                                |
	   256 - 512  us |          0 |                                |
	   512 - 1024 us |          0 |                                |
	     1 - 2    ms |          0 |                                |
	     2 - 4    ms |          0 |                                |
	     4 - 8    ms |          0 |                                |
	     8 - 16   ms |          0 |                                |
	    16 - 32   ms |          0 |                                |
	    32 - 64   ms |          0 |                                |
	    64 - 128  ms |          0 |                                |
	   128 - 256  ms |          0 |                                |
	   256 - 512  ms |          0 |                                |
	   512 - 1024 ms |          0 |                                |
	     1 - ...   s |          0 |                                |

     The original implementation of this command was in the bcc tool.

   - Support --cputype option for hybrid events in 'perf stat'.

  Improvements:

   - Call chain improvements for ARM64.

   - No need to do any affinity setup when profiling pids.

   - Reduce multiplexing with duration_time in 'perf stat' metrics.

   - Improve error message for uncore events, stating that some event
     groups are can only be used in system wide (-a) mode.

   - perf stat metric group leader fixes/improvements, including arch
     specific changes to better support Intel topdown events.

   - Probe non-deprecated sysfs path first, i.e. try the path
     /sys/devices/system/cpu/cpuN/topology/thread_siblings first, then
     the old /sys/devices/system/cpu/cpuN/topology/core_cpus.

   - Disable debuginfod by default in 'perf record', to avoid stalls on
     distros such as Fedora 35.

   - Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.

   - Enable ignore_missing_thread in 'perf trace'

  Fixes:

   - Avoid TUI crash when navigating in the annotation of recursive
     functions.

   - Fix hex dump character output in 'perf script'.

   - Fix JSON indentation to 4 spaces standard in the ARM vendor event
     files.

   - Fix use after free in metric__new().

   - Fix IS_ERR_OR_NULL() usage in the perf BPF loader.

   - Fix up cross-arch register support, i.e. when printing register
     names take into account the architecture where the perf.data file
     was collected.

   - Fix SMT fallback with large core counts.

   - Don't lower case MetricExpr when parsing JSON files so as not to
     lose info such as the ":G" event modifier in metrics.

  perf test:

   - Add basic stress test for sigtrap handling to 'perf test'.

   - Fix 'perf test' failures on s/390

   - Enable system wide for metricgroups test in 'perf test´.

   - Use 3 digits for test numbering now we can have more tests.

  Arch specific:

   - Add events for Arm Neoverse N2 in the ARM JSON vendor event files

   - Support PERF_MEM_LVLNUM encodings in powerpc, that came from a
     single patch series, where I incorrectly merged the kernel bits,
     that were then reverted after coordination with Michael Ellerman
     and Stephen Rothwell.

   - Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.

   - Update AMD documentation, with info on raw event encoding.

   - Add support for global and local variants of the "p_stage_cyc" sort
     key, applicable to perf.data files collected on powerpc.

   - Remove duplicate and incorrect aux size checks in the ARM CoreSight
     ETM code.

  Refactorings:

   - Add a perf_cpu abstraction to disambiguate CPUs and CPU map
     indexes, fixing problems along the way.

   - Document CPU map methods.

  UAPI sync:

   - Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench
     mem memcpy'

   - Sync UAPI files with the kernel sources: drm, msr-index,
     cpufeatures.

  Build system

   - Enable warnings through HOSTCFLAGS.

   - Drop requirement for libstdc++.so for libopencsd check

  libperf:

   - Make libperf adopt perf_counts_values__scale() from tools/perf/util/.

   - Add a stat multiplexing test to libperf"

* tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits)
  perf record: Disable debuginfod by default
  perf evlist: No need to do any affinity setup when profiling pids
  perf cpumap: Add is_dummy() method
  perf metric: Fix metric_leader
  perf cputopo: Fix CPU topology reading on s/390
  perf metricgroup: Fix use after free in metric__new()
  libperf tests: Update a use of the new cpumap API
  perf arm: Fix off-by-one directory path
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Update tools's copy of drm.h header
  tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
  perf pmu-events: Don't lower case MetricExpr
  perf expr: Add debug logging for literals
  perf tools: Probe non-deprecated sysfs path 1st
  perf tools: Fix SMT fallback with large core counts
  perf cpumap: Give CPUs their own type
  perf stat: Correct first_shadow_cpu to return index
  perf script: Fix flipped index and cpu
  perf c2c: Use more intention revealing iterator
  ...
</content>
</entry>
<entry>
<title>perf sort: Include global and local variants for p_stage_cyc sort key</title>
<updated>2022-01-10T18:39:00+00:00</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2021-12-03T02:20:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e3304c21357268ecbe156ed6247a03dc78d3fce4'/>
<id>urn:sha1:e3304c21357268ecbe156ed6247a03dc78d3fce4</id>
<content type='text'>
Sort key 'p_stage_cyc' is used to present the latency cycles spent in
pipeline stages.

perf has local 'p_stage_cyc' sort key to display this info. There is no
global variant available for this sort key. The local variant shows
latency in a single sample, whereas the global value will be useful to
present the total latency (sum of latencies) in the hist entry. It
represents the latency number multiplied by the number of samples.

Add global ('p_stage_cyc') and local variant ('local_p_stage_cyc') for
this sort key. Use 'local_p_stage_cyc' as default option for "mem" sort
mode.

Also add this to the list of dynamic sort keys and made the
"dynamic_headers" and "arch_specific_sort_keys" as static.

Reported-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Tested-by: Nageswara R Sastry &lt;rnsastry@linux.ibm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.vnet.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20211203022038.48240-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>tools/perf: Add '__rel_loc' event field parsing support</title>
<updated>2021-12-06T20:37:22+00:00</updated>
<author>
<name>Masami Hiramatsu</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2021-11-22T09:30:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7c689c839734a23eda855e69a56ed4795533bf71'/>
<id>urn:sha1:7c689c839734a23eda855e69a56ed4795533bf71</id>
<content type='text'>
Add new '__rel_loc' dynamic data location attribute support.
This type attribute is similar to the '__data_loc' but records the
offset from the field itself.
The libtraceevent adds TEP_FIELD_IS_RELATIVE to the
'tep_format_field::flags' with TEP_FIELD_IS_DYNAMIC for'__rel_loc'.

Link: https://lkml.kernel.org/r/163757344810.510314.12449413842136229871.stgit@devnote2

Cc: Beau Belgrave &lt;beaub@linux.microsoft.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Tom Zanussi &lt;zanussi@kernel.org&gt;
Signed-off-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>perf sort: Fix the 'p_stage_cyc' sort key behavior</title>
<updated>2021-11-18T13:08:07+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2021-11-05T22:56:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db4b284029099224f387d75198e5995df1cb8aef'/>
<id>urn:sha1:db4b284029099224f387d75198e5995df1cb8aef</id>
<content type='text'>
andle 'p_stage_cyc' (for pipeline stage cycles) sort key with the same
rationale as for the 'weight' and 'local_weight', see the fix in this
series for a full explanation.

Not sure it also needs the local and global variants.

But I couldn't test it actually because I don't have the machine.

Reviewed-by: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/r/20211105225617.151364-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf sort: Fix the 'ins_lat' sort key behavior</title>
<updated>2021-11-18T13:08:07+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2021-11-05T22:56:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d03c75363eeca861c843319a0e6f4426234ed6c'/>
<id>urn:sha1:4d03c75363eeca861c843319a0e6f4426234ed6c</id>
<content type='text'>
Handle 'ins_lat' (for instruction latency) and 'local_ins_lat' sort keys
with the same rationale as for the 'weight' and 'local_weight', see the
previous fix in this series for a full explanation.

But I couldn't test it actually, so only build tested.

Reviewed-by: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/r/20211105225617.151364-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf sort: Fix the 'weight' sort key behavior</title>
<updated>2021-11-18T13:08:07+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2021-11-05T22:56:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=784e8adda4cdb3e2510742023729851b6c08803c'/>
<id>urn:sha1:784e8adda4cdb3e2510742023729851b6c08803c</id>
<content type='text'>
Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses.  And perf tool has 'weight'
and 'local_weight' sort keys to display the info.

But it's somewhat confusing what it shows exactly.  In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.

For example:

  $ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M

  $ perf report --stdio -n -s +local_weight
  ...
  #
  # Overhead  Samples  Command  Shared Object     Symbol                     Local Weight
  # ........  .......  .......  ................  .........................  ............
  #
      21.23%      313  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   32
      12.43%      183  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   35
      11.97%      159  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   36
      10.40%      141  dd       [kernel.vmlinux]  [k] lockref_put_return     32
       7.63%      113  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   33
       6.37%       92  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   34
       6.15%       90  dd       [kernel.vmlinux]  [k] lockref_put_return     33
  ...

So let's look at the 'lockref_get_not_zero' symbols.  The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016.  But it's not the case:

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  ......
  #
       1.36%        4  dd       [kernel.vmlinux]  36            144
       0.47%        4  dd       [kernel.vmlinux]  37            148
       0.42%        4  dd       [kernel.vmlinux]  32            128
       0.40%        4  dd       [kernel.vmlinux]  34            136
       0.35%        4  dd       [kernel.vmlinux]  36            144
       0.34%        4  dd       [kernel.vmlinux]  35            140
       0.30%        4  dd       [kernel.vmlinux]  36            144
       0.30%        4  dd       [kernel.vmlinux]  34            136
       0.30%        4  dd       [kernel.vmlinux]  32            128
       0.30%        4  dd       [kernel.vmlinux]  32            128
  ...

With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight').  I don't think this is
what we want.

I found this because of the way it aggregates the 'weight' value.  Since
it's not a period, we should not add them in the he-&gt;stat.  Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.

After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.

Let's keep the weight and display it differently.  For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.

With this change, I can see the expected numbers.

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  .....
  #
      21.23%      313  dd       [kernel.vmlinux]  32            10016
      12.43%      183  dd       [kernel.vmlinux]  35            6405
      11.97%      159  dd       [kernel.vmlinux]  36            5724
       7.63%      113  dd       [kernel.vmlinux]  33            3729
       6.37%       92  dd       [kernel.vmlinux]  34            3128
       4.17%       59  dd       [kernel.vmlinux]  37            2183
       0.08%        1  dd       [kernel.vmlinux]  269           269
       0.08%        1  dd       [kernel.vmlinux]  38            38

Reviewed-by: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Athira Jajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Free generated help strings for sort option</title>
<updated>2021-07-15T20:27:52+00:00</updated>
<author>
<name>Riccardo Mancini</name>
<email>rickyman7@gmail.com</email>
</author>
<published>2021-07-15T16:07:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a37338aad8c4d8676173ead14e881d2ec308155c'/>
<id>urn:sha1:a37338aad8c4d8676173ead14e881d2ec308155c</id>
<content type='text'>
ASan reports the memory leak of the strings allocated by sort_help() when
running perf report.

This patch changes the returned pointer to char* (instead of const
char*), saves it in a temporary variable, and finally deallocates it at
function exit.

Signed-off-by: Riccardo Mancini &lt;rickyman7@gmail.com&gt;
Fixes: 702fb9b415e7c99b ("perf report: Show all sort keys in help output")
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/lkml/a38b13f02812a8a6759200b9063c6191337f44d4.1626343282.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf sort: Display sort dimension p_stage_cyc only on supported archs</title>
<updated>2021-03-26T11:50:00+00:00</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2021-03-22T14:57:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=50fa3a531e8e4b58550171fb159d0aa578c6b52d'/>
<id>urn:sha1:50fa3a531e8e4b58550171fb159d0aa578c6b52d</id>
<content type='text'>
The sort dimension "p_stage_cyc" is used to represent pipeline
stage cycle information. Presently, this is used only in powerpc.

For unsupported platforms, we don't want to display it
in the perf report output columns. Hence add check in sort_dimension__add()
and skip the sort key incase it is not applicable for the particular arch.

Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/1616425047-1666-6-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
</feed>
