summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide/perf
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide/perf')
-rw-r--r--Documentation/admin-guide/perf/dwc_pcie_pmu.rst6
-rw-r--r--Documentation/admin-guide/perf/hisi-pmu.rst5
-rw-r--r--Documentation/admin-guide/perf/index.rst2
-rw-r--r--Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst80
-rw-r--r--Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst37
-rw-r--r--Documentation/admin-guide/perf/nvidia-pmu.rst52
6 files changed, 169 insertions, 13 deletions
diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
index 39b8e1fdd0cd..cb376f335f40 100644
--- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -60,7 +60,7 @@ description of available events and configuration options in sysfs, see
The "format" directory describes format of the config fields of the
perf_event_attr structure. The "events" directory provides configuration
templates for all documented events. For example,
-"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
+"rx_pcie_tlp_data_payload" is an equivalent of "eventid=0x21,type=0x0".
The "perf list" command shall list the available events from sysfs, e.g.::
@@ -79,8 +79,8 @@ Example usage of counting PCIe RX TLP data payload (Units of bytes)::
The average RX/TX bandwidth can be calculated using the following formula:
- PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
- PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
+ PCIe RX Bandwidth = rx_pcie_tlp_data_payload / Measure_Time_Window
+ PCIe TX Bandwidth = tx_pcie_tlp_data_payload / Measure_Time_Window
Lane Event Usage
-------------------------------
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 5cc248d18c63..48992a0b8e94 100644
--- a/Documentation/admin-guide/perf/hisi-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -35,7 +35,10 @@ e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
SCCL ID #1.
The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
-ID used to count the uncore PMU event.
+ID used to count the uncore PMU event. An "associated_cpus" sysfs attribute is
+also provided to show the CPUs associated with this PMU. The "cpumask" indicates
+the CPUs to open the events, usually as a hint for userspaces tools like perf.
+It only contains one associated CPU from the "associated_cpus".
Example usage of perf::
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index a58bd3f7e190..072b510385c4 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -14,6 +14,8 @@ Performance monitor support
qcom_l2_pmu
qcom_l3_pmu
starfive_starlink_pmu
+ mrvl-odyssey-ddr-pmu
+ mrvl-odyssey-tad-pmu
arm-ccn
arm-cmn
arm-ni
diff --git a/Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst b/Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst
new file mode 100644
index 000000000000..2e817593a4d9
--- /dev/null
+++ b/Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst
@@ -0,0 +1,80 @@
+===================================================================
+Marvell Odyssey DDR PMU Performance Monitoring Unit (PMU UNCORE)
+===================================================================
+
+Odyssey DRAM Subsystem supports eight counters for monitoring performance
+and software can program those counters to monitor any of the defined
+performance events. Supported performance events include those counted
+at the interface between the DDR controller and the PHY, interface between
+the DDR Controller and the CHI interconnect, or within the DDR Controller.
+
+Additionally DSS also supports two fixed performance event counters, one
+for ddr reads and the other for ddr writes.
+
+The counter will be operating in either manual or auto mode.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+ /sys/bus/event_source/devices/mrvl_ddr_pmu_<>/events/
+ /sys/bus/event_source/devices/mrvl_ddr_pmu_<>/format/
+
+Examples::
+
+ $ perf list | grep ddr
+ mrvl_ddr_pmu_<>/ddr_act_bypass_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_bsm_alloc/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_bsm_starvation/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_active_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_mwr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_rd_active_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_rd_or_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_read/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_write/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_capar_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_crit_ref/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_ddr_reads/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_ddr_writes/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_cmd_is_retry/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_cycles/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_parity_poison/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_rd_data_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_wr_data_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dqsosc_mpc/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dqsosc_mrr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_enter_mpsm/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_enter_powerdown/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_enter_selfref/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_pri_rdaccess/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_rd_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_rd_or_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_rmw_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hpri_sched_rd_crit_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_load_mode/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_lpri_sched_rd_crit_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_precharge/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_precharge_for_other/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_precharge_for_rdwr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_raw_hazard/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rd_bypass_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rd_crc_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rd_uc_ecc_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rdwr_transitions/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_refresh/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_retry_fifo_full/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_spec_ref/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_tcr_mrr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_war_hazard/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_waw_hazard/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_win_limit_reached_rd/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_win_limit_reached_wr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_wr_crc_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_wr_trxn_crit_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_write_combine/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_zqcl/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_zqlatch/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_zqstart/ [Kernel PMU event]
+
+ $ perf stat -e ddr_cam_read,ddr_cam_write,ddr_cam_active_access,ddr_cam
+ rd_or_wr_access,ddr_cam_rd_active_access,ddr_cam_mwr <workload>
diff --git a/Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst b/Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst
new file mode 100644
index 000000000000..ad1975b14087
--- /dev/null
+++ b/Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst
@@ -0,0 +1,37 @@
+====================================================================
+Marvell Odyssey LLC-TAD Performance Monitoring Unit (PMU UNCORE)
+====================================================================
+
+Each TAD provides eight 64-bit counters for monitoring
+cache behavior.The driver always configures the same counter for
+all the TADs. The user would end up effectively reserving one of
+eight counters in every TAD to look across all TADs.
+The occurrences of events are aggregated and presented to the user
+at the end of running the workload. The driver does not provide a
+way for the user to partition TADs so that different TADs are used for
+different applications.
+
+The performance events reflect various internal or interface activities.
+By combining the values from multiple performance counters, cache
+performance can be measured in terms such as: cache miss rate, cache
+allocations, interface retry rate, internal resource occupancy, etc.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+ /sys/bus/event_source/devices/tad/events/
+ /sys/bus/event_source/devices/tad/format/
+
+Examples::
+
+ $ perf list | grep tad
+ tad/tad_alloc_any/ [Kernel PMU event]
+ tad/tad_alloc_dtg/ [Kernel PMU event]
+ tad/tad_alloc_ltg/ [Kernel PMU event]
+ tad/tad_hit_any/ [Kernel PMU event]
+ tad/tad_hit_dtg/ [Kernel PMU event]
+ tad/tad_hit_ltg/ [Kernel PMU event]
+ tad/tad_req_msh_in_exlmn/ [Kernel PMU event]
+ tad/tad_tag_rd/ [Kernel PMU event]
+ tad/tad_tot_cycle/ [Kernel PMU event]
+
+ $ perf stat -e tad_alloc_dtg,tad_alloc_ltg,tad_alloc_any,tad_hit_dtg,tad_hit_ltg,tad_hit_any,tad_tag_rd <workload>
diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-pmu.rst
index 2e0d47cfe7ea..f538ef67e0e8 100644
--- a/Documentation/admin-guide/perf/nvidia-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-pmu.rst
@@ -34,7 +34,7 @@ strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.
Example usage:
@@ -66,7 +66,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
Example usage:
@@ -86,6 +86,22 @@ Example usage:
perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/
+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x3/
+
NVLink-C2C1 PMU
-------------------
@@ -96,7 +112,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
Example usage:
@@ -116,6 +132,22 @@ Example usage:
perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/
+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x3/
+
CNVLink PMU
---------------
@@ -125,13 +157,14 @@ to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
for more info about the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>.
Each SoC socket can be connected to one or more sockets via CNVLink. The user can
use "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
Each bit represents the socket number, e.g. "rem_socket=0xE" corresponds to
-socket 1 to 3.
-/sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
+socket 1 to 3. The PMU will monitor all remote sockets by default if not
+specified.
+/sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
shows the valid bits that can be set in the "rem_socket" parameter.
The PMU can not distinguish the remote traffic initiator, therefore it does not
@@ -165,12 +198,13 @@ local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section
for more info about the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>.
Each SoC socket can support multiple root ports. The user can use
"root_port" bitmap parameter to select the port(s) to monitor, i.e.
-"root_port=0xF" corresponds to root port 0 to 3.
-/sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
+"root_port=0xF" corresponds to root port 0 to 3. The PMU will monitor all root
+ports by default if not specified.
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
shows the valid bits that can be set in the "root_port" parameter.
Example usage: