<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/tools/perf/util/sort.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-08-21T14:48:43+00:00</updated>
<entry>
<title>perf annotate-data: Add 'typecln' sort key</title>
<updated>2024-08-21T14:48:43+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2024-08-19T23:36:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fd45d52eae5c42fdb68802127fa98d3718c80043'/>
<id>urn:sha1:fd45d52eae5c42fdb68802127fa98d3718c80043</id>
<content type='text'>
Sometimes it's useful to organize member fields in cache-line boundary.

The 'typecln' sort key is short for type-cacheline and to show samples
in each cacheline.  The cacheline size is fixed to 64 for now, but it
can read the actual size once it saves the value from sysfs.

For example, you maybe want to which cacheline in a target is hot or
cold.  The following shows members in the cfs_rq's first cache line.

  $ perf report -s type,typecln,typeoff -H
  ...
  -    2.67%        struct cfs_rq
     +    1.23%        struct cfs_rq: cache-line 2
     +    0.57%        struct cfs_rq: cache-line 4
     +    0.46%        struct cfs_rq: cache-line 6
     -    0.41%        struct cfs_rq: cache-line 0
             0.39%        struct cfs_rq +0x14 (h_nr_running)
             0.02%        struct cfs_rq +0x38 (tasks_timeline.rb_leftmost)
  ...

Committer testing:

  # root@number:~# perf report -s type,typecln,typeoff -H --stdio
  # Total Lost Samples: 0
  #
  # Samples: 5K of event 'cpu_atom/mem-loads,ldlat=5/P'
  # Event count (approx.): 312251
  #
  #       Overhead  Data Type / Data Type Cacheline / Data Type Offset
  # ..............  ..................................................
  #
  &lt;SNIP&gt;
       0.07%        struct sigaction
          0.05%        struct sigaction: cache-line 1
             0.02%        struct sigaction +0x58 (sa_mask)
             0.02%        struct sigaction +0x78 (sa_mask)
          0.03%        struct sigaction: cache-line 0
             0.02%        struct sigaction +0x38 (sa_mask)
             0.01%        struct sigaction +0x8 (sa_mask)
  &lt;SNIP&gt;

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20240819233603.54941-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Add mode argument to sort_help()</title>
<updated>2024-08-01T21:55:55+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2024-07-31T23:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=871893d748cce346097d907b9937604af3dafcf1'/>
<id>urn:sha1:871893d748cce346097d907b9937604af3dafcf1</id>
<content type='text'>
Some sort keys are meaningful only in a specific mode - like branch
stack and memory (data-src).  Add the mode to skip unnecessary ones.
This will be used for 'perf mem report' later.

While at it, change the prefix for the -F/--fields option to remove
the duplicate part.

Before:

  $ perf report -F
   Error: switch `F' requires a value
   Usage: perf report [&lt;options&gt;]

      -F, --fields &lt;key[,keys...]&gt;
  			  output field(s): overhead period sample  overhead overhead_sys
  			  overhead_us overhead_guest_sys overhead_guest_us overhead_children
  			  sample period weight1 weight2 weight3 ins_lat retire_lat
  			  ...
After:

  $ perf report -F
   Error: switch `F' requires a value
   Usage: perf report [&lt;options&gt;]

      -F, --fields &lt;key[,keys...]&gt;
  			  output field(s): overhead overhead_sys overhead_us
  			  overhead_guest_sys overhead_guest_us overhead_children
  			  sample period weight1 weight2 weight3 ins_lat retire_lat
  			  ...

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/r/20240731235505.710436-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Add weight[123] output fields</title>
<updated>2024-04-17T15:21:39+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2024-04-11T18:17:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7043dc5286a8c082d449e2257f9f87d7f764e7d4'/>
<id>urn:sha1:7043dc5286a8c082d449e2257f9f87d7f764e7d4</id>
<content type='text'>
Add weight1, weight2 and weight3 fields to -F/--fields and their aliases
like 'ins_lat', 'p_stage_cyc' and 'retire_lat'.  Note that they are in
the sort keys too but the difference is that output fields will sum up
the weight values and display the average.

In the sort key, users can see the distribution of weight value and I
think it's confusing we have local vs. global weight for the same weight.

For example, I experiment with mem-loads events to get the weights.  On
my laptop, it seems only weight1 field is supported.

  $ perf mem record -- perf test -w noploop

Let's look at the noploop function only.  It has 7 samples.

  $ perf script -F event,ip,sym,weight | grep noploop
  # event                         weight     ip           sym
  cpu/mem-loads,ldlat=30/P:           43     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           48     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    &lt;--- same weight
  cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    &lt;--- same weight
  cpu/mem-loads,ldlat=30/P:           59     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           33     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    &lt;--- same weight

When you use the 'weight' sort key, it'd show entries with a separate
weight value separately.  Also note that the first entry has 3 samples
with weight value 38, so they are displayed together and the weight
value is the sum of 3 samples (114 = 38 * 3).

  $ perf report -n -s +weight | grep -e Weight -e noploop
  # Overhead  Samples  Command   Shared Object   Symbol         Weight
       0.53%        3     perf   perf            [.] noploop    114
       0.18%        1     perf   perf            [.] noploop    59
       0.18%        1     perf   perf            [.] noploop    48
       0.18%        1     perf   perf            [.] noploop    43
       0.18%        1     perf   perf            [.] noploop    33

If you use 'local_weight' sort key, you can see the actual weight.

  $ perf report -n -s +local_weight | grep -e Weight -e noploop
  # Overhead  Samples  Command   Shared Object   Symbol         Local Weight
       0.53%        3     perf   perf            [.] noploop    38
       0.18%        1     perf   perf            [.] noploop    59
       0.18%        1     perf   perf            [.] noploop    48
       0.18%        1     perf   perf            [.] noploop    43
       0.18%        1     perf   perf            [.] noploop    33

But when you use the -F/--field option instead, you can see the average
weight for the while noploop function (as it won't group samples by
weight value and use the default 'comm,dso,sym' sort keys).

  $ perf report -n -F +weight | grep -e Weight -e noploop
  Warning:
  --fields weight shows the average value unlike in the --sort key.
  # Overhead  Samples  Weight1  Command  Shared Object  Symbol
       1.23%        7     42.4  perf     perf           [.] noploop

The weight1 field shows the average value:
  (38 * 3 + 59 + 48 + 43 + 33) / 7 = 42.4

Also it'd show the warning that 'weight' field has the average value.
Using 'weight1' can remove the warning.

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/r/20240411181718.2367948-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf hist: Move histogram related code to hist.h</title>
<updated>2024-04-17T15:21:39+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2024-04-11T18:17:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0993d724674a9d905ebdc9b9cd0b64e30523c5d6'/>
<id>urn:sha1:0993d724674a9d905ebdc9b9cd0b64e30523c5d6</id>
<content type='text'>
It's strange that sort.h has the definition of struct hist_entry.  As
sort.h already includes hist.h, let's move the data structure to hist.h.

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/r/20240411181718.2367948-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Add 'symoff' sort key</title>
<updated>2023-12-24T01:39:42+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-12-13T00:13:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2c1c8ff2d2ffec340b8fc73ee13b8fb516d1c6d'/>
<id>urn:sha1:e2c1c8ff2d2ffec340b8fc73ee13b8fb516d1c6d</id>
<content type='text'>
The symoff sort key is to print symbol and offset of sample.  This is
useful for data type profiling to show exact instruction in the function
which refers the data.

  $ perf report -s type,sym,typeoff,symoff --hierarchy
  ...
  #       Overhead  Data Type / Symbol / Data Type Offset / Symbol Offset
  # ..............  .....................................................
  #
      1.23%         struct cfs_rq
        0.84%         update_blocked_averages
          0.19%         struct cfs_rq +336 (leaf_cfs_rq_list.next)
             0.19%         [k] update_blocked_averages+0x96
          0.19%         struct cfs_rq +0 (load.weight)
             0.14%         [k] update_blocked_averages+0x104
             0.04%         [k] update_blocked_averages+0x31c
          0.17%         struct cfs_rq +404 (throttle_count)
             0.12%         [k] update_blocked_averages+0x9d
             0.05%         [k] update_blocked_averages+0x1f9
          0.08%         struct cfs_rq +272 (propagate)
             0.07%         [k] update_blocked_averages+0x3d3
             0.02%         [k] update_blocked_averages+0x45b
  ...

Committer testing:

  # perf report --stdio -s type,typeoff,symoff
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4  of event 'cpu_atom/mem-loads,ldlat=30/P'
  # Event count (approx.): 7
  #
  # Overhead  Data Type  Data Type Offset  Symbol Offset
  # ........  .........  ................  .............
  #
      42.86%  struct list_head  struct list_head +8 (prev)  [k] __list_del_entry_valid_or_report+0x7
      28.57%  (unknown)  (unknown) +0 (no field)  [.] _nl_intern_locale_data+0x25
      14.29%  char       char +0 (no field)  [k] strncpy_from_user+0xa5
      14.29%  (unknown)  (unknown) +0 (no field)  [.] _dl_lookup_symbol_x+0x50

  #
  # (Tip: To change sampling frequency to 100 Hz: perf record -F 100)
  #

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-14-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Add 'typeoff' sort key</title>
<updated>2023-12-24T01:39:42+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-12-13T00:13:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=871304a79f755b2ab594bbd21857ecb4c4aa57c9'/>
<id>urn:sha1:871304a79f755b2ab594bbd21857ecb4c4aa57c9</id>
<content type='text'>
The typeoff sort key shows the data type name, offset and the name of
the field.  This is useful to see which field in the struct is accessed
most frequently.

  $ perf report -s type,typeoff --hierarchy --stdio
  ...
  #     Overhead  Data Type / Data Type Offset
  # ............  ............................
  #
  ...
        1.23%     struct cfs_rq
           0.19%    struct cfs_rq +404 (throttle_count)
           0.19%    struct cfs_rq +0 (load.weight)
           0.19%    struct cfs_rq +336 (leaf_cfs_rq_list.next)
           0.09%    struct cfs_rq +272 (propagate)
           0.09%    struct cfs_rq +196 (removed.nr)
           0.09%    struct cfs_rq +80 (curr)
           0.09%    struct cfs_rq +544 (lt_b_children_throttled)
           0.06%    struct cfs_rq +320 (rq)

Committer testing:

Again with the perf.data from the previous csets:

  # perf report --stdio -s type,typeoff
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4  of event 'cpu_atom/mem-loads,ldlat=30/P'
  # Event count (approx.): 7
  #
  # Overhead  Data Type  Data Type Offset
  # ........  .........  ................
  #
      42.86%  struct list_head  struct list_head +8 (prev)
      42.86%  (unknown)  (unknown) +0 (no field)
      14.29%  char       char +0 (no field)

  #
  # (Tip: To see callchains in a more compact form: perf report -g folded)
  #
  # perf report --stdio -s dso,type,typeoff
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4  of event 'cpu_atom/mem-loads,ldlat=30/P'
  # Event count (approx.): 7
  #
  # Overhead  Shared Object         Data Type  Data Type Offset
  # ........  ....................  .........  ................
  #
      42.86%  [kernel.kallsyms]     struct list_head  struct list_head +8 (prev)
      28.57%  libc.so.6             (unknown)  (unknown) +0 (no field)
      14.29%  [kernel.kallsyms]     char       char +0 (no field)
      14.29%  ld-linux-x86-64.so.2  (unknown)  (unknown) +0 (no field)

  #
  # (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline)
  #
  #

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-13-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Add 'type' sort key</title>
<updated>2023-12-24T01:39:42+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-12-13T00:13:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f2c41bdd87f450a6a71c5d090d42c248ca4bf1e'/>
<id>urn:sha1:2f2c41bdd87f450a6a71c5d090d42c248ca4bf1e</id>
<content type='text'>
The 'type' sort key is to aggregate hist entries by data type they
access.  Add mem_type field to hist_entry struct to save the type.  If
hist_entry__get_data_type() returns NULL, it'd use the 'unknown_type'
instance.

Committer testing:

Before:

  # perf mem record  sleep 2s
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.037 MB perf.data (4 samples) ]
  root@number:/home/acme/Downloads# perf report --stdio -s type
  Error:
  Unknown --sort key: `type'
   Usage: perf report [&lt;options&gt;]

      -s, --sort &lt;key[,key2...]&gt;
                            sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys
                            overhead_guest_us overhead_children sample period
                            pid comm dso symbol parent cpu socket srcline srcfile
                            local_weight weight transaction trace symbol_size
                            dso_size cgroup cgroup_id ipc_null time code_page_size
                            local_ins_lat ins_lat local_p_stage_cyc p_stage_cyc
                            addr local_retire_lat retire_lat simd dso_from dso_to
                            symbol_from symbol_to mispredict abort in_tx cycles
                            srcline_from srcline_to ipc_lbr addr_from addr_to
                            symbol_daddr dso_daddr locked tlb mem snoop dcacheline
                            symbol_iaddr phys_daddr data_page_size blocked
  #

After:

  # perf report --stdio -s type
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4  of event 'cpu_atom/mem-loads,ldlat=30/P'
  # Event count (approx.): 7
  #
  # Overhead  Data Type
  # ........  .........
  #
     100.00%  (unknown)

  #
  # (Tip: Print event counts in CSV format with: perf stat -x,)
  #
  # rpm -q kernel-debuginfo
  kernel-debuginfo-6.6.4-200.fc39.x86_64
  # uname -r
  6.6.4-200.fc39.x86_64
  #

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: linux-toolchains@vger.kernel.org&gt;
Cc: linux-trace-devel@vger.kernel.org&gt;
Link: https://lore.kernel.org/r/20231213001323.718046-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf report: Add 'simd' sort field</title>
<updated>2023-03-20T22:28:21+00:00</updated>
<author>
<name>German Gomez</name>
<email>german.gomez@arm.com</email>
</author>
<published>2023-03-20T15:15:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ea15483e7c55f73809cd9e208fff511966539ee5'/>
<id>urn:sha1:ea15483e7c55f73809cd9e208fff511966539ee5</id>
<content type='text'>
Add 'simd' sort field to visualize SIMD ops in 'perf report'.

Rows are labeled with the SIMD ISA, and the type of predicate (if any):

  - [p] partial predicate
  - [e] empty predicate (no elements in the vector being used)

Example with Arm SPE and SVE (Scalable Vector Extension):

  #include &lt;arm_sve.h&gt;

  double src[1025], dst[1025];

  int main(void) {
    svfloat64_t vc = svdup_f64(1);
    for(;;)
      for(int i = 0; i &lt; 1025; i += svcntd())
      {
        svbool_t pg = svwhilelt_b64(i, 1025);
        svfloat64_t vsrc = svld1(pg, &amp;src[i]);
        svfloat64_t vdst = svadd_x(pg, vsrc, vc);
        svst1(pg, &amp;dst[i], vdst);
      }
    return 0;
  }

  ... compiled using "gcc-11 -march=armv8-a+sve -O3"

Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:

  $ perf record -e arm_spe_0// -- ./a.out
  $ perf report --itrace=i1i -s overhead,pid,simd,sym

  Overhead      Pid:Command   Simd     Symbol
  ........  ................  .......  ......................

    53.76%    10758:program            [.] main
    46.14%    10758:program   [.] SVE  [.] main
     0.09%    10758:program   [p] SVE  [.] main

The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.

Signed-off-by: German Gomez &lt;german.gomez@arm.com&gt;
Acked-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Leo Yan &lt;leo.yan@linaro.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark &lt;james.clark@arm.com&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf hist: Add 'kvm_info' field in histograms entry</title>
<updated>2023-03-15T19:47:20+00:00</updated>
<author>
<name>Leo Yan</name>
<email>leo.yan@linaro.org</email>
</author>
<published>2023-03-15T14:51:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ebf39d29b985b0c0f75c2e752781a9a80daadc0e'/>
<id>urn:sha1:ebf39d29b985b0c0f75c2e752781a9a80daadc0e</id>
<content type='text'>
__hists__add_entry() creates a temporary entry and compare it with
existed histograms entries, if any existed entry equals to the
temporary entry it skips to allocation to avoid duplication.

The problem for support KVM event in histograms is it doesn't contain
any info to identify KVM event and can be used for comparison entries.

This patch adds 'kvm_info' field in the histograms entry which contains
the KVM event's key, this identifier will be used for comparison
histograms entries in later change.

Signed-off-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf c2c: Add report option to show false sharing in adjacent cachelines</title>
<updated>2023-02-16T12:33:45+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@intel.com</email>
</author>
<published>2023-02-14T07:58:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1470a108a60e8c0c4d19da10117c9b98f0078654'/>
<id>urn:sha1:1470a108a60e8c0c4d19da10117c9b98f0078654</id>
<content type='text'>
Many platforms have feature of adjacent cachelines prefetch, when it is
enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity, if
one is fetched to cache, the other one could likely be fetched too,
which sort of extends the cacheline size to double, thus the false
sharing could happens in adjacent cachelines.

0Day has captured performance changed related with this [1], and some
commercial software explicitly makes its hot global variables 128 bytes
aligned (2 cache lines) to avoid this kind of extended false sharing.

So add an option "--double-cl" for 'perf c2c report' to show false
sharing in double cache line granularity, which acts just like the
cacheline size is doubled. There is no change to c2c record. The
hardware events of shared cacheline are still per cacheline, and this
option just changes the granularity of how events are grouped and
displayed.

In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on old kernel):

  ----------------------------------------------------------------------
     26       31        2        0        0        0  0xffff888103ec6000
  ----------------------------------------------------------------------
   35.48%   50.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff8133148b   1153   66    971   3748   74  [k] get_mem_cgroup_from_mm
    6.45%    0.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff813396e4    570    0   1531    879   75  [k] mem_cgroup_charge
   25.81%   50.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81331472    949   70    593   3359   74  [k] get_mem_cgroup_from_mm
   19.35%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81339686   1352    0   1073   1022   74  [k] mem_cgroup_charge
    9.68%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff813396d6   1401    0    863    768   74  [k] mem_cgroup_charge
    3.23%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81333106    618    0    804     11    9  [k] uncharge_batch

The offset 0x10 and 0x54 used to displayed in 2 groups, and now they are
listed together to give users a hint of extended false sharing.

[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Committer notes:

Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx

Removed -a, leaving just as --double-cl, as this probably is not used so
frequently and perhaps will be even auto-detected if we manage to record
the MSR where this is configured.

Reviewed-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Reviewed-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Tested-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Acked-by: Joe Mario &lt;jmario@redhat.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Xing Zhengjun &lt;zhengjun.xing@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
</feed>
