<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/tools/perf/Documentation, branch v6.7.3</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.7.3</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.7.3'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-10-25T17:02:10+00:00</updated>
<entry>
<title>perf bench sched pipe: Add -G/--cgroups option</title>
<updated>2023-10-25T17:02:10+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-10-17T20:23:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=79a3371bdf453bedb9d2c80df5adc201dddc11b5'/>
<id>urn:sha1:79a3371bdf453bedb9d2c80df5adc201dddc11b5</id>
<content type='text'>
The -G/--cgroups option is to put sender and receiver in different
cgroups in order to measure cgroup context switch overheads.

Users need to make sure the cgroups exist and accessible.  The following
example should the effect of this change.  Please don't forget taskset
before the perf bench to measure cgroup switches properly.  Otherwise
each task would run on a different CPU and generate cgroup switches
regardless of this change.

  # perf stat -e context-switches,cgroup-switches \
  &gt; taskset -c 0 perf bench sched pipe -l 10000 &gt; /dev/null

   Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000':

              20,001      context-switches
                   2      cgroup-switches

         0.053449651 seconds time elapsed

         0.011286000 seconds user
         0.041869000 seconds sys

  # perf stat -e context-switches,cgroup-switches \
  &gt; taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB &gt; /dev/null

   Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB':

              20,001      context-switches
              20,001      cgroup-switches

         0.052768627 seconds time elapsed

         0.006284000 seconds user
         0.046266000 seconds sys

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Link: https://lore.kernel.org/r/20231017202342.1353124-1-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf lock contention: Add -G/--cgroup-filter option</title>
<updated>2023-09-12T20:32:00+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-09-06T17:49:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4fd06bd2dcc893d922985cc4e9082477f6d115e6'/>
<id>urn:sha1:4fd06bd2dcc893d922985cc4e9082477f6d115e6</id>
<content type='text'>
The -G/--cgroup-filter is to limit lock contention collection on the
tasks in the specific cgroups only.

  $ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \
    ./perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.174 [sec]
   contended   total wait     max wait     avg wait          pid   comm

           4    114.45 us     60.06 us     28.61 us       214847   sched-messaging
           2    111.40 us     60.84 us     55.70 us       214848   sched-messaging
           2    106.09 us     59.42 us     53.04 us       214837   sched-messaging
           1     81.70 us     81.70 us     81.70 us       214709   sched-messaging
          68     78.44 us      6.83 us      1.15 us       214633   sched-messaging
          69     73.71 us      2.69 us      1.07 us       214632   sched-messaging
           4     72.62 us     60.83 us     18.15 us       214850   sched-messaging
           2     71.75 us     67.60 us     35.88 us       214840   sched-messaging
           2     69.29 us     67.53 us     34.65 us       214804   sched-messaging
           2     69.00 us     68.23 us     34.50 us       214826   sched-messaging
  ...

Export cgroup__new() function as it's needed from outside.

Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf lock contention: Add --lock-cgroup option</title>
<updated>2023-09-12T20:32:00+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-09-06T17:49:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d1792d0a2564caf1620a2cc9aa6f12284b3a56f'/>
<id>urn:sha1:4d1792d0a2564caf1620a2cc9aa6f12284b3a56f</id>
<content type='text'>
The --lock-cgroup option shows lock contention stats break down by
cgroups.

Add LOCK_AGGR_CGROUP mode and use it instead of use_cgroup field.

  $ sudo ./perf lock con -ab --lock-cgroup sleep 1
   contended   total wait     max wait     avg wait   cgroup

           8     15.70 us      6.34 us      1.96 us   /
           2      1.48 us       747 ns       738 ns   /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope
           1       848 ns       848 ns       848 ns   /user.slice/.../session.slice/org.gnome.Shell@x11.service
           1       220 ns       220 ns       220 ns   /user.slice/.../session.slice/pipewire-pulse.service

For now, the cgroup mode only works with BPF (-b).

Committer notes:

Remove -g as it is used in the other tools with a clear meaning of
collect/show callchains. As agreed with Namhyung off list.

Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf kwork top: Implements BPF-based cpu usage statistics</title>
<updated>2023-09-12T20:31:59+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-08-12T08:49:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8c98420987cda501924b7a34f2413b982517d244'/>
<id>urn:sha1:8c98420987cda501924b7a34f2413b982517d244</id>
<content type='text'>
Use BPF to collect statistics on the CPU usage based on perf BPF skeletons.

Example usage:

  # perf kwork top -h

   Usage: perf kwork top [&lt;options&gt;]

      -b, --use-bpf         Use BPF to measure task cpu usage
      -C, --cpu &lt;cpu&gt;       list of cpus to profile
      -i, --input &lt;file&gt;    input file name
      -n, --name &lt;name&gt;     event name to profile
      -s, --sort &lt;key[,key2...]&gt;
                            sort by key(s): rate, runtime, tid
          --time &lt;str&gt;      Time span for analysis (start,stop)

  #
  # perf kwork -k sched top -b
  Starting trace, Hit &lt;Ctrl+C&gt; to stop and report
  ^C
  Total  : 160702.425 ms, 8 cpus
  %Cpu(s):  36.00% id,   0.00% hi,   0.00% si
  %Cpu0   [||||||||||||||||||              61.66%]
  %Cpu1   [||||||||||||||||||              61.27%]
  %Cpu2   [|||||||||||||||||||             66.40%]
  %Cpu3   [||||||||||||||||||              61.28%]
  %Cpu4   [||||||||||||||||||              61.82%]
  %Cpu5   [|||||||||||||||||||||||         77.41%]
  %Cpu6   [||||||||||||||||||              61.73%]
  %Cpu7   [||||||||||||||||||              63.25%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
          0        0   38.72       8089.463 ms  [swapper/1]
          0        0   38.71       8084.547 ms  [swapper/3]
          0        0   38.33       8007.532 ms  [swapper/0]
          0        0   38.26       7992.985 ms  [swapper/6]
          0        0   38.17       7971.865 ms  [swapper/4]
          0        0   36.74       7447.765 ms  [swapper/7]
          0        0   33.59       6486.942 ms  [swapper/2]
          0        0   22.58       3771.268 ms  [swapper/5]
       9545     9351    2.48        447.136 ms  sched-messaging
       9574     9351    2.09        418.583 ms  sched-messaging
       9724     9351    2.05        372.407 ms  sched-messaging
       9531     9351    2.01        368.804 ms  sched-messaging
       9512     9351    2.00        362.250 ms  sched-messaging
       9514     9351    1.95        357.767 ms  sched-messaging
       9538     9351    1.86        384.476 ms  sched-messaging
       9712     9351    1.84        386.490 ms  sched-messaging
       9723     9351    1.83        380.021 ms  sched-messaging
       9722     9351    1.82        382.738 ms  sched-messaging
       9517     9351    1.81        354.794 ms  sched-messaging
       9559     9351    1.79        344.305 ms  sched-messaging
       9725     9351    1.77        365.315 ms  sched-messaging
  &lt;SNIP&gt;

  # perf kwork -k sched top -b -n perf
  Starting trace, Hit &lt;Ctrl+C&gt; to stop and report
  ^C
  Total  : 151563.332 ms, 8 cpus
  %Cpu(s):  26.49% id,   0.00% hi,   0.00% si
  %Cpu0   [                                 0.01%]
  %Cpu1   [                                 0.00%]
  %Cpu2   [                                 0.00%]
  %Cpu3   [                                 0.00%]
  %Cpu4   [                                 0.00%]
  %Cpu5   [                                 0.00%]
  %Cpu6   [                                 0.00%]
  %Cpu7   [                                 0.00%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
       9754     9754    0.01          2.303 ms  perf

  #
  # perf kwork -k sched top -b -C 2,3,4
  Starting trace, Hit &lt;Ctrl+C&gt; to stop and report
  ^C
  Total  :  48016.721 ms, 3 cpus
  %Cpu(s):  27.82% id,   0.00% hi,   0.00% si
  %Cpu2   [||||||||||||||||||||||          74.68%]
  %Cpu3   [|||||||||||||||||||||           71.06%]
  %Cpu4   [|||||||||||||||||||||           70.91%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
          0        0   29.08       4734.998 ms  [swapper/4]
          0        0   28.93       4710.029 ms  [swapper/3]
          0        0   25.31       3912.363 ms  [swapper/2]
      10248    10158    1.62        264.931 ms  sched-messaging
      10253    10158    1.62        265.136 ms  sched-messaging
      10158    10158    1.60        263.013 ms  bash
      10360    10158    1.49        243.639 ms  sched-messaging
      10413    10158    1.48        238.604 ms  sched-messaging
      10531    10158    1.47        234.067 ms  sched-messaging
      10400    10158    1.47        240.631 ms  sched-messaging
      10355    10158    1.47        230.586 ms  sched-messaging
      10377    10158    1.43        234.835 ms  sched-messaging
      10526    10158    1.42        232.045 ms  sched-messaging
      10298    10158    1.41        222.396 ms  sched-messaging
      10410    10158    1.38        221.853 ms  sched-messaging
      10364    10158    1.38        226.042 ms  sched-messaging
      10480    10158    1.36        213.633 ms  sched-messaging
      10370    10158    1.36        223.620 ms  sched-messaging
      10553    10158    1.34        217.169 ms  sched-messaging
      10291    10158    1.34        211.516 ms  sched-messaging
      10251    10158    1.34        218.813 ms  sched-messaging
      10522    10158    1.33        218.498 ms  sched-messaging
      10288    10158    1.33        216.787 ms  sched-messaging
  &lt;SNIP&gt;

Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Link: https://lore.kernel.org/r/20230812084917.169338-15-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf kwork top: Add -C/--cpu -i/--input -n/--name -s/--sort --time options</title>
<updated>2023-09-12T20:31:59+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-08-12T08:49:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aa172a5ad3159c1cc97da37253c972683681adaa'/>
<id>urn:sha1:aa172a5ad3159c1cc97da37253c972683681adaa</id>
<content type='text'>
Provide the following options for perf kwork top:

1. -C, --cpu &lt;cpu&gt;		list of cpus to profile
2. -i, --input &lt;file&gt;		input file name
3. -n, --name &lt;name&gt;		event name to profile
4. -s, --sort &lt;key[,key2...]&gt;	sort by key(s): rate, runtime, tid
5. --time &lt;str&gt;		Time span for analysis (start,stop)

Example usage:

  # perf kwork top -h

   Usage: perf kwork top [&lt;options&gt;]

      -C, --cpu &lt;cpu&gt;       list of cpus to profile
      -i, --input &lt;file&gt;    input file name
      -n, --name &lt;name&gt;     event name to profile
      -s, --sort &lt;key[,key2...]&gt;
                            sort by key(s): rate, runtime, tid
          --time &lt;str&gt;      Time span for analysis (start,stop)

  # perf kwork top -C 2,4,5

  Total  :  51226.940 ms, 3 cpus
  %Cpu(s):  92.59% id,   0.00% hi,   0.09% si
  %Cpu2   [|                                4.61%]
  %Cpu4   [                                 0.01%]
  %Cpu5   [|||||                           17.31%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.98      17073.515 ms  swapper/4
          0   95.17      16250.874 ms  swapper/2
          0   82.62      14108.577 ms  swapper/5
       4342   21.70       3708.358 ms  perf
         16    0.13         22.296 ms  rcu_preempt
         75    0.02          4.261 ms  kworker/2:1
         98    0.01          2.540 ms  jbd2/sda-8
         61    0.01          3.404 ms  kcompactd0
         87    0.00          0.145 ms  kworker/5:1H
         73    0.00          0.596 ms  kworker/5:1
         41    0.00          0.041 ms  ksoftirqd/5
         40    0.00          0.718 ms  migration/5
         64    0.00          0.115 ms  kworker/4:1
         35    0.00          0.556 ms  migration/4
        353    0.00          1.143 ms  sshd
         26    0.00          1.665 ms  ksoftirqd/2
         25    0.00          0.662 ms  migration/2

  # perf kwork top -i perf.data

  Total  : 136601.588 ms, 8 cpus
  %Cpu(s):  95.66% id,   0.04% hi,   0.05% si
  %Cpu0   [                                 0.02%]
  %Cpu1   [                                 0.01%]
  %Cpu2   [|                                4.61%]
  %Cpu3   [                                 0.04%]
  %Cpu4   [                                 0.01%]
  %Cpu5   [|||||                           17.31%]
  %Cpu6   [                                 0.51%]
  %Cpu7   [|||                             11.42%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.98      17073.515 ms  swapper/4
          0   99.98      17072.173 ms  swapper/1
          0   99.93      17064.229 ms  swapper/3
          0   99.62      17011.013 ms  swapper/0
          0   99.47      16985.180 ms  swapper/6
          0   95.17      16250.874 ms  swapper/2
          0   88.51      15111.684 ms  swapper/7
          0   82.62      14108.577 ms  swapper/5
       4342   33.00       5644.045 ms  perf
       4344    0.43         74.351 ms  perf
         16    0.13         22.296 ms  rcu_preempt
       4345    0.05         10.093 ms  perf
       4343    0.05          8.769 ms  perf
       4341    0.02          4.882 ms  perf
       4095    0.02          4.605 ms  kworker/7:1
         75    0.02          4.261 ms  kworker/2:1
        120    0.01          1.909 ms  systemd-journal
         98    0.01          2.540 ms  jbd2/sda-8
         61    0.01          3.404 ms  kcompactd0
        667    0.01          2.542 ms  kworker/u16:2
       4340    0.00          1.052 ms  kworker/7:2
         97    0.00          0.489 ms  kworker/7:1H
         51    0.00          0.209 ms  ksoftirqd/7
         50    0.00          0.646 ms  migration/7
         76    0.00          0.753 ms  kworker/6:1
         45    0.00          0.572 ms  migration/6
         87    0.00          0.145 ms  kworker/5:1H
         73    0.00          0.596 ms  kworker/5:1
         41    0.00          0.041 ms  ksoftirqd/5
         40    0.00          0.718 ms  migration/5
         64    0.00          0.115 ms  kworker/4:1
         35    0.00          0.556 ms  migration/4
        353    0.00          2.600 ms  sshd
         74    0.00          0.205 ms  kworker/3:1
         33    0.00          1.576 ms  kworker/3:0H
         30    0.00          0.996 ms  migration/3
         26    0.00          1.665 ms  ksoftirqd/2
         25    0.00          0.662 ms  migration/2
        397    0.00          0.057 ms  kworker/1:1
         20    0.00          1.005 ms  migration/1
       2909    0.00          1.053 ms  kworker/0:2
         17    0.00          0.720 ms  migration/0
         15    0.00          0.039 ms  ksoftirqd/0

  # perf kwork top -n perf

  Total  : 136601.588 ms, 8 cpus
  %Cpu(s):  95.66% id,   0.04% hi,   0.05% si
  %Cpu0   [                                 0.01%]
  %Cpu1   [                                 0.00%]
  %Cpu2   [|                                4.44%]
  %Cpu3   [                                 0.00%]
  %Cpu4   [                                 0.00%]
  %Cpu5   [                                 0.00%]
  %Cpu6   [                                 0.49%]
  %Cpu7   [|||                             11.38%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
       4342   15.74       2695.516 ms  perf
       4344    0.43         74.351 ms  perf
       4345    0.05         10.093 ms  perf
       4343    0.05          8.769 ms  perf
       4341    0.02          4.882 ms  perf

  # perf kwork top -s tid

  Total  : 136601.588 ms, 8 cpus
  %Cpu(s):  95.66% id,   0.04% hi,   0.05% si
  %Cpu0   [                                 0.02%]
  %Cpu1   [                                 0.01%]
  %Cpu2   [|                                4.61%]
  %Cpu3   [                                 0.04%]
  %Cpu4   [                                 0.01%]
  %Cpu5   [|||||                           17.31%]
  %Cpu6   [                                 0.51%]
  %Cpu7   [|||                             11.42%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.62      17011.013 ms  swapper/0
          0   99.98      17072.173 ms  swapper/1
          0   95.17      16250.874 ms  swapper/2
          0   99.93      17064.229 ms  swapper/3
          0   99.98      17073.515 ms  swapper/4
          0   82.62      14108.577 ms  swapper/5
          0   99.47      16985.180 ms  swapper/6
          0   88.51      15111.684 ms  swapper/7
         15    0.00          0.039 ms  ksoftirqd/0
         16    0.13         22.296 ms  rcu_preempt
         17    0.00          0.720 ms  migration/0
         20    0.00          1.005 ms  migration/1
         25    0.00          0.662 ms  migration/2
         26    0.00          1.665 ms  ksoftirqd/2
         30    0.00          0.996 ms  migration/3
         33    0.00          1.576 ms  kworker/3:0H
         35    0.00          0.556 ms  migration/4
         40    0.00          0.718 ms  migration/5
         41    0.00          0.041 ms  ksoftirqd/5
         45    0.00          0.572 ms  migration/6
         50    0.00          0.646 ms  migration/7
         51    0.00          0.209 ms  ksoftirqd/7
         61    0.01          3.404 ms  kcompactd0
         64    0.00          0.115 ms  kworker/4:1
         73    0.00          0.596 ms  kworker/5:1
         74    0.00          0.205 ms  kworker/3:1
         75    0.02          4.261 ms  kworker/2:1
         76    0.00          0.753 ms  kworker/6:1
         87    0.00          0.145 ms  kworker/5:1H
         97    0.00          0.489 ms  kworker/7:1H
         98    0.01          2.540 ms  jbd2/sda-8
        120    0.01          1.909 ms  systemd-journal
        353    0.00          2.600 ms  sshd
        397    0.00          0.057 ms  kworker/1:1
        667    0.01          2.542 ms  kworker/u16:2
       2909    0.00          1.053 ms  kworker/0:2
       4095    0.02          4.605 ms  kworker/7:1
       4340    0.00          1.052 ms  kworker/7:2
       4341    0.02          4.882 ms  perf
       4342   33.00       5644.045 ms  perf
       4343    0.05          8.769 ms  perf
       4344    0.43         74.351 ms  perf
       4345    0.05         10.093 ms  perf

  # perf kwork top --time 128800,

  Total  :  53495.122 ms, 8 cpus
  %Cpu(s):  94.71% id,   0.09% hi,   0.09% si
  %Cpu0   [                                 0.07%]
  %Cpu1   [                                 0.04%]
  %Cpu2   [||                               8.49%]
  %Cpu3   [                                 0.09%]
  %Cpu4   [                                 0.02%]
  %Cpu5   [                                 0.06%]
  %Cpu6   [                                 0.12%]
  %Cpu7   [||||||                          21.24%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.96       3981.363 ms  swapper/4
          0   99.94       3978.955 ms  swapper/1
          0   99.91       9329.375 ms  swapper/5
          0   99.87       4906.829 ms  swapper/3
          0   99.86       9028.064 ms  swapper/6
          0   98.67       3928.161 ms  swapper/0
          0   91.17       8388.432 ms  swapper/2
          0   78.65       7125.602 ms  swapper/7
       4342   29.42       2675.198 ms  perf
         16    0.18         16.817 ms  rcu_preempt
       4345    0.09          8.183 ms  perf
       4344    0.04          4.290 ms  perf
       4343    0.03          2.844 ms  perf
        353    0.03          2.600 ms  sshd
       4095    0.02          2.702 ms  kworker/7:1
        120    0.02          1.909 ms  systemd-journal
         98    0.02          2.540 ms  jbd2/sda-8
         61    0.02          1.886 ms  kcompactd0
        667    0.02          1.011 ms  kworker/u16:2
         75    0.02          2.693 ms  kworker/2:1
       4341    0.01          1.838 ms  perf
         30    0.01          0.788 ms  migration/3
         26    0.01          1.665 ms  ksoftirqd/2
         20    0.01          0.752 ms  migration/1
       2909    0.01          0.604 ms  kworker/0:2
       4340    0.00          0.635 ms  kworker/7:2
         97    0.00          0.214 ms  kworker/7:1H
         51    0.00          0.209 ms  ksoftirqd/7
         50    0.00          0.646 ms  migration/7
         76    0.00          0.602 ms  kworker/6:1
         45    0.00          0.366 ms  migration/6
         87    0.00          0.145 ms  kworker/5:1H
         40    0.00          0.446 ms  migration/5
         35    0.00          0.318 ms  migration/4
         74    0.00          0.205 ms  kworker/3:1
         33    0.00          0.080 ms  kworker/3:0H
         25    0.00          0.448 ms  migration/2
        397    0.00          0.057 ms  kworker/1:1
         17    0.00          0.365 ms  migration/0

Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Link: https://lore.kernel.org/r/20230812084917.169338-14-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf kwork top: Introduce new top utility</title>
<updated>2023-09-12T20:31:59+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-08-12T08:49:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=55c40e505234588d049384aa45a6931438c3028d'/>
<id>urn:sha1:55c40e505234588d049384aa45a6931438c3028d</id>
<content type='text'>
Some common tools for collecting statistics on CPU usage, such as top,
obtain statistics from timer interrupt sampling, and then periodically
read statistics from /proc/stat.

This method has some deviations:

1. In the tick interrupt, the time between the last tick and the current
   tick is counted in the current task. However, the task may be running
   only part of the time.
2. For each task, the top tool periodically reads the /proc/{PID}/status
   information. For tasks with a short life cycle, it may be missed.

In conclusion, the top tool cannot accurately collect statistics on the
CPU usage and running time of tasks.

The statistical method based on sched_switch tracepoint can accurately
calculate the CPU usage of all tasks. This method is applicable to
scenarios where performance comparison data is of high precision.

Example usage:

  # perf kwork

   Usage: perf kwork [&lt;options&gt;] {record|report|latency|timehist|top}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -k, --kwork &lt;kwork&gt;   list of kwork to profile (irq, softirq, workqueue, sched, etc)
      -v, --verbose         be more verbose (show symbol address, etc)

  # perf kwork -k sched record -- perf bench sched messaging -g 1 -l 10000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 1 groups == 40 processes run

       Total time: 14.074 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 15.886 MB perf.data (129472 samples) ]
  # perf kwork top

  Total  : 115708.178 ms, 8 cpus
  %Cpu(s):   9.78% id
  %Cpu0   [|||||||||||||||||||||||||||     90.55%]
  %Cpu1   [|||||||||||||||||||||||||||     90.51%]
  %Cpu2   [||||||||||||||||||||||||||      88.57%]
  %Cpu3   [|||||||||||||||||||||||||||     91.18%]
  %Cpu4   [|||||||||||||||||||||||||||     91.09%]
  %Cpu5   [|||||||||||||||||||||||||||     90.88%]
  %Cpu6   [||||||||||||||||||||||||||      88.64%]
  %Cpu7   [|||||||||||||||||||||||||||     90.28%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
       4113   22.23       3221.547 ms  sched-messaging
       4105   21.61       3131.495 ms  sched-messaging
       4119   21.53       3120.937 ms  sched-messaging
       4103   21.39       3101.614 ms  sched-messaging
       4106   21.37       3095.209 ms  sched-messaging
       4104   21.25       3077.269 ms  sched-messaging
       4115   21.21       3073.188 ms  sched-messaging
       4109   21.18       3069.022 ms  sched-messaging
       4111   20.78       3010.033 ms  sched-messaging
       4114   20.74       3007.073 ms  sched-messaging
       4108   20.73       3002.137 ms  sched-messaging
       4107   20.47       2967.292 ms  sched-messaging
       4117   20.39       2955.335 ms  sched-messaging
       4112   20.34       2947.080 ms  sched-messaging
       4118   20.32       2942.519 ms  sched-messaging
       4121   20.23       2929.865 ms  sched-messaging
       4110   20.22       2930.078 ms  sched-messaging
       4122   20.15       2919.542 ms  sched-messaging
       4120   19.77       2866.032 ms  sched-messaging
       4116   19.72       2857.660 ms  sched-messaging
       4127   16.19       2346.334 ms  sched-messaging
       4142   15.86       2297.600 ms  sched-messaging
       4141   15.62       2262.646 ms  sched-messaging
       4136   15.41       2231.408 ms  sched-messaging
       4130   15.38       2227.008 ms  sched-messaging
       4129   15.31       2217.692 ms  sched-messaging
       4126   15.21       2201.711 ms  sched-messaging
       4139   15.19       2200.722 ms  sched-messaging
       4137   15.10       2188.633 ms  sched-messaging
       4134   15.06       2182.082 ms  sched-messaging
       4132   15.02       2177.530 ms  sched-messaging
       4131   14.73       2131.973 ms  sched-messaging
       4125   14.68       2125.439 ms  sched-messaging
       4128   14.66       2122.255 ms  sched-messaging
       4123   14.65       2122.113 ms  sched-messaging
       4135   14.56       2107.144 ms  sched-messaging
       4133   14.51       2103.549 ms  sched-messaging
       4124   14.27       2066.671 ms  sched-messaging
       4140   14.17       2052.251 ms  sched-messaging
       4138   13.81       2000.361 ms  sched-messaging
          0   11.42       1652.009 ms  swapper/2
          0   11.35       1641.694 ms  swapper/6
          0    9.71       1405.108 ms  swapper/7
          0    9.48       1372.338 ms  swapper/1
          0    9.44       1366.013 ms  swapper/0
          0    9.11       1318.382 ms  swapper/5
          0    8.90       1287.582 ms  swapper/4
          0    8.81       1274.356 ms  swapper/3
       4100    2.61        379.328 ms  perf
       4101    1.16        169.487 ms  perf-exec
        151    0.65         94.741 ms  systemd-resolve
        249    0.36         53.030 ms  sd-resolve
        153    0.14         21.405 ms  systemd-timesyn
          1    0.10         16.200 ms  systemd
         16    0.09         15.785 ms  rcu_preempt
       4102    0.06          9.727 ms  perf
       4095    0.03          5.464 ms  kworker/7:1
         98    0.02          3.231 ms  jbd2/sda-8
        353    0.02          4.115 ms  sshd
         75    0.02          3.889 ms  kworker/2:1
         73    0.01          1.552 ms  kworker/5:1
         64    0.01          1.591 ms  kworker/4:1
         74    0.01          1.952 ms  kworker/3:1
         61    0.01          2.608 ms  kcompactd0
        397    0.01          1.602 ms  kworker/1:1
         69    0.01          1.817 ms  kworker/1:1H
         10    0.01          2.553 ms  kworker/u16:0
       2909    0.01          2.684 ms  kworker/0:2
       1211    0.00          0.426 ms  kworker/7:0
         97    0.00          0.153 ms  kworker/7:1H
         51    0.00          0.100 ms  ksoftirqd/7
        120    0.00          0.856 ms  systemd-journal
         76    0.00          1.414 ms  kworker/6:1
         46    0.00          0.246 ms  ksoftirqd/6
         45    0.00          0.164 ms  migration/6
         41    0.00          0.098 ms  ksoftirqd/5
         40    0.00          0.207 ms  migration/5
         86    0.00          1.339 ms  kworker/4:1H
         36    0.00          0.252 ms  ksoftirqd/4
         35    0.00          0.090 ms  migration/4
         31    0.00          0.156 ms  ksoftirqd/3
         30    0.00          0.073 ms  migration/3
         26    0.00          0.180 ms  ksoftirqd/2
         25    0.00          0.085 ms  migration/2
         21    0.00          0.106 ms  ksoftirqd/1
         20    0.00          0.118 ms  migration/1
        302    0.00          1.440 ms  systemd-logind
         17    0.00          0.132 ms  migration/0
         15    0.00          0.255 ms  ksoftirqd/0

Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Link: https://lore.kernel.org/r/20230812084917.169338-10-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf kwork: Add sched record support</title>
<updated>2023-09-12T20:31:59+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-08-12T08:49:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=38d8d013a525daf4f98a2b53065d8e89d5f810dd'/>
<id>urn:sha1:38d8d013a525daf4f98a2b53065d8e89d5f810dd</id>
<content type='text'>
The kwork_class type of sched is added to support recording and parsing of
sched_switch events.

As follows:

  # perf kwork -h

   Usage: perf kwork [&lt;options&gt;] {record|report|latency|timehist}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -k, --kwork &lt;kwork&gt;   list of kwork to profile (irq, softirq, workqueue, sched, etc)
      -v, --verbose         be more verbose (show symbol address, etc)

  # perf kwork -k sched record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.083 MB perf.data (47 samples) ]
  # perf evlist
  sched:sched_switch
  dummy:HG
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Link: https://lore.kernel.org/r/20230812084917.169338-8-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf kwork: Add the supported subcommands to the document</title>
<updated>2023-09-12T20:31:59+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-08-12T08:49:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=76e0d8c821bbd952730799cc7af841f9de67b7f7'/>
<id>urn:sha1:76e0d8c821bbd952730799cc7af841f9de67b7f7</id>
<content type='text'>
Add missing report, latency and timehist subcommands to the document.

Fixes: f98919ec4fccdacf ("perf kwork: Implement 'report' subcommand")
Fixes: ad3d9f7a929ab2df ("perf kwork: Implement perf kwork latency")
Fixes: bcc8b3e88d6fa1a3 ("perf kwork: Implement perf kwork timehist")
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Link: https://lore.kernel.org/r/20230812084917.169338-3-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf record: Track sideband events for all CPUs when tracing selected CPUs</title>
<updated>2023-09-12T20:31:43+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-09-04T02:33:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=74b4f3ecdf64b62446abfb36669b3d40a42d34eb'/>
<id>urn:sha1:74b4f3ecdf64b62446abfb36669b3d40a42d34eb</id>
<content type='text'>
User space tasks can migrate between CPUs, we need to track side-band
events for all CPUs.

The specific scenarios are as follows:

         CPU0                                 CPU1
  perf record -C 0 start
                              taskA starts to be created and executed
                                -&gt; PERF_RECORD_COMM and PERF_RECORD_MMAP
                                   events only deliver to CPU1
                              ......
                                |
                          migrate to CPU0
                                |
  Running on CPU0    &lt;----------/
  ...

  perf record -C 0 stop

Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.

The solution is to record sideband events for all CPUs when tracing
selected CPUs. Because this modifies the default behavior, add related
comments to the perf record man page.

The sys_perf_event_open invoked is as follows:

  # perf --debug verbose=3 record -e cpu-clock -C 1 true
  &lt;SNIP&gt;
  Opening: cpu-clock
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0 (PERF_COUNT_SW_CPU_CLOCK)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    freq                             1
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
  Opening: dummy:u
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0x9 (PERF_COUNT_SW_DUMMY)
    { sample_period, sample_freq }   1
    sample_type                      IP|TID|TIME|CPU|IDENTIFIER
    read_format                      ID|LOST
    inherit                          1
    exclude_kernel                   1
    exclude_hv                       1
    mmap                             1
    comm                             1
    task                             1
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
  &lt;SNIP&gt;

Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Tested-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Richter &lt;tmricht@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20230904023340.12707-5-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf docs: Fix format of unordered lists</title>
<updated>2023-08-16T11:37:49+00:00</updated>
<author>
<name>Changbin Du</name>
<email>changbin.du@huawei.com</email>
</author>
<published>2023-07-18T08:52:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a1ef3aaf6ada374818ebcf978c175e65a4cd60ab'/>
<id>urn:sha1:a1ef3aaf6ada374818ebcf978c175e65a4cd60ab</id>
<content type='text'>
Fix the format of unordered lists so the can wrap properly.

Signed-off-by: Changbin Du &lt;changbin.du@huawei.com&gt;
Acked-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230718085242.3090797-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
</feed>
