| author | Besar Wicaksono <bwicaksono@nvidia.com> | 2026-03-24 04:29:50 +0300 |
|---|---|---|
| committer | Will Deacon <will@kernel.org> | 2026-03-24 15:37:32 +0300 |
| commit | 429b7638b2df5538e945aaa2cc189cf0d6e8fb3a (patch) | |
| tree | 60f76092c02b55d3c5191703e3a024ccd452928d /Documentation/admin-guide | |
| parent | 3dd73022306bfdb29b1c33cb106fe337f46a6105 (diff) | |
| download | linux-429b7638b2df5538e945aaa2cc189cf0d6e8fb3a.tar.xz | |
perf: add NVIDIA Tegra410 CPU Memory Latency PMU
Add CPU Memory (CMEM) Latency PMU support for the Tegra410 SoC.
The PMU measures the latency of memory read requests from the edge
of the Unified Coherence Fabric to the local system DRAM.
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
Diffstat (limited to 'Documentation/admin-guide')
| -rw-r--r-- | Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index c065764d41fe..9945c43f6a7a 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -8,6 +8,7 @@ metrics like memory bandwidth, latency, and utilization:
 * Unified Coherence Fabric (UCF)
 * PCIE
 * PCIE-TGT
+* CPU Memory (CMEM) Latency

 PMU Driver
 ----------
@@ -344,3 +345,27 @@ Example usage:
 0x10000 to 0x100FF on socket 0's PCIE RC-1::

   perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/
+
+CPU Memory (CMEM) Latency PMU
+-----------------------------
+
+This PMU monitors latency events of memory read requests from the edge of the
+Unified Coherence Fabric (UCF) to local CPU DRAM:
+
+  * RD_REQ counters: count read requests (32B per request).
+  * RD_CUM_OUTS counters: accumulated outstanding request counter, which track
+    how many cycles the read requests are in flight.
+  * CYCLES counter: counts the number of elapsed cycles.
+
+The average latency is calculated as::
+
+   FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+   AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+   AVERAGE_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_<socket-id>.
+
+Example usage::
+
+  perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
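The latency formulas in the documentation added by this patch can be worked through with a small helper. The following is a minimal sketch, not part of the patch: the function name `average_latency_ns` and the sample counter values are hypothetical, and it assumes the three counts and the elapsed time have already been read from a `perf stat` run like the one in the example usage.

```python
def average_latency_ns(rd_req, rd_cum_outs, cycles, elapsed_ns):
    """Apply the CMEM latency formulas from the documentation.

    rd_req      -- RD_REQ count (read requests)
    rd_cum_outs -- RD_CUM_OUTS count (accumulated outstanding requests)
    cycles      -- CYCLES count
    elapsed_ns  -- wall-clock measurement time in nanoseconds
    """
    if rd_req <= 0 or cycles <= 0 or elapsed_ns <= 0:
        raise ValueError("counts and elapsed time must be positive")

    # FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
    freq_in_ghz = cycles / elapsed_ns
    # AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
    avg_latency_in_cycles = rd_cum_outs / rd_req
    # AVERAGE_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
    return avg_latency_in_cycles / freq_in_ghz


# Hypothetical sample values, chosen only to make the arithmetic easy:
# 2e9 cycles in 1 s gives 2 GHz; 2e8 / 1e6 gives 200 cycles per request.
latency = average_latency_ns(
    rd_req=1_000_000,
    rd_cum_outs=200_000_000,
    cycles=2_000_000_000,
    elapsed_ns=1_000_000_000,
)
print(latency)  # 100.0
```

With these made-up inputs the PMU clock works out to 2 GHz and each read spends 200 cycles in flight on average, so the result is 100 ns. Real counter values will differ, and the helper rejects zero counts rather than dividing by zero.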
