perf: add NVIDIA Tegra410 CPU Memory Latency PMU

Add CPU Memory (CMEM) Latency PMU support for the Tegra410 SoC. The PMU measures latency from the edge of the Unified Coherence Fabric to the local system DRAM.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
author: Besar Wicaksono <bwicaksono@nvidia.com> 2026-03-24 04:29:50 +0300
committer: Will Deacon <will@kernel.org> 2026-03-24 15:37:32 +0300
commit: 429b7638b2df5538e945aaa2cc189cf0d6e8fb3a (patch)
tree: 60f76092c02b55d3c5191703e3a024ccd452928d /Documentation/admin-guide
parent: 3dd73022306bfdb29b1c33cb106fe337f46a6105 (diff)
download: linux-429b7638b2df5538e945aaa2cc189cf0d6e8fb3a.tar.xz
1 files changed, 25 insertions, 0 deletions
diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index c065764d41fe..9945c43f6a7a 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -8,6 +8,7 @@ metrics like memory bandwidth, latency, and utilization:
 * Unified Coherence Fabric (UCF)
 * PCIE
 * PCIE-TGT
+* CPU Memory (CMEM) Latency
 
 PMU Driver
 ----------
@@ -344,3 +345,27 @@ Example usage:
   0x10000 to 0x100FF on socket 0's PCIE RC-1::
 
     perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/
+
+CPU Memory (CMEM) Latency PMU
+-----------------------------
+
+This PMU monitors latency events of memory read requests from the edge of the
+Unified Coherence Fabric (UCF) to local CPU DRAM:
+
+  * RD_REQ counters: count read requests (32B per request).
+  * RD_CUM_OUTS counters: accumulated outstanding request counters, which track
+    how many cycles the read requests are in flight.
+  * CYCLES counter: counts the number of elapsed cycles.
+
+The average latency is calculated as::
+
+   FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+   AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+   AVERAGE_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_<socket-id>.
+
+Example usage::
+
+  perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
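
The average-latency formulas in the added documentation can be checked with a small calculation. The following is a minimal Python sketch, assuming the three counter values and the elapsed time have already been read from the perf stat output. The function name `average_read_latency_ns` and the sample numbers are hypothetical and are not part of the patch or the driver.

```python
def average_read_latency_ns(rd_req, rd_cum_outs, cycles, elapsed_ns):
    """Apply the three formulas from the CMEM Latency PMU documentation.

    All arguments are raw values: the RD_REQ, RD_CUM_OUTS and CYCLES
    counters, plus the measurement interval in nanoseconds.
    """
    if rd_req == 0 or cycles == 0 or elapsed_ns == 0:
        raise ValueError("counters and elapsed time must be non-zero")

    # FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
    freq_in_ghz = cycles / elapsed_ns
    # AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
    avg_latency_in_cycles = rd_cum_outs / rd_req
    # AVERAGE_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
    return avg_latency_in_cycles / freq_in_ghz


# Hypothetical sample values, not measured data: one second at 2 GHz,
# one million reads, 200 cycles in flight per read on average.
latency = average_read_latency_ns(
    rd_req=1_000_000,
    rd_cum_outs=200_000_000,
    cycles=2_000_000_000,
    elapsed_ns=1_000_000_000,
)
print(f"{latency:.1f} ns")  # prints "100.0 ns"
```

Because the frequency is derived from the CYCLES counter, all three events need to be counted over the same interval, which is presumably why the example command places them in one event group.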