author: Ankit Agrawal <ankita@nvidia.com> 2025-01-24 21:31:01 +0300
committer: Alex Williamson <alex.williamson@redhat.com> 2025-01-27 19:43:33 +0300
commit: d85f69d520e6aca8ae5ce353666e2fc2756eb9e7 (patch)
tree: 72e90c9c541b34fe184030d200c8582106afa5b9 /tools/perf/scripts/python/gecko.py
parent: 6a9eb2d125ba90d13b45bcfabcddf9f61268f6a8 (diff)
vfio/nvgrace-gpu: Check the HBM training and C2C link status

In contrast to Grace Hopper systems, the HBM training has been moved out of the UEFI on the Grace Blackwell systems. This reduces the system bootup time significantly. The onus of checking whether the HBM training has completed thus falls on the module.

The HBM training status can be determined from a BAR0 register. Similarly, another BAR0 register exposes the status of the CPU-GPU chip-to-chip (C2C) cache coherent interconnect.

Based on testing, 30s is determined to be sufficient to ensure initialization completion on all the Grace based systems. Thus poll these registers for up to 30s. If the HBM training is not complete or if the C2C link is not ready by then, fail the probe.

While the wait is not required on Grace Hopper systems, it is beneficial to make the check to ensure the device is in an expected state. Hence the check is kept common to both generations.

Ensure that BAR0 is enabled before accessing the registers.

CC: Alex Williamson <alex.williamson@redhat.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20250124183102.3976-4-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
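
The poll-with-timeout behaviour described in the commit message can be sketched in user-space C as below. This is an illustration under stated assumptions, not the driver's code: the register ids, the "ready" status value, the 100 ms polling interval, and every function and struct name are hypothetical, since the commit message gives none of them. Only the 30 s budget and the "both conditions must hold, else fail" rule come from the message. A simulated clock replaces real sleeping so the sketch runs instantly; an in-kernel version would instead read BAR0 through the kernel's MMIO accessors after enabling the BAR.

```c
#include <assert.h>
#include <stdint.h>

#define POLL_TIMEOUT_MS  30000u /* 30 s budget, from the commit message */
#define POLL_INTERVAL_MS 100u   /* assumed polling interval */
#define STATUS_READY     0xFFu  /* assumed "ready" value, hypothetical */

/* Hypothetical register ids standing in for the two BAR0 registers. */
enum { REG_HBM_TRAINING = 0, REG_C2C_LINK = 1 };

typedef uint32_t (*reg_read_fn)(void *ctx, unsigned int reg);
typedef void (*sleep_fn)(void *ctx, uint32_t ms);

/* Simulated device: each status register reads "ready" once the
 * simulated clock reaches the configured time. */
struct fake_dev {
    uint32_t now_ms;
    uint32_t hbm_ready_at_ms;
    uint32_t c2c_ready_at_ms;
};

static uint32_t fake_read(void *ctx, unsigned int reg)
{
    struct fake_dev *d = ctx;
    uint32_t ready_at = (reg == REG_HBM_TRAINING) ? d->hbm_ready_at_ms
                                                  : d->c2c_ready_at_ms;
    return d->now_ms >= ready_at ? STATUS_READY : 0;
}

/* Advances the simulated clock instead of actually sleeping. */
static void fake_sleep(void *ctx, uint32_t ms)
{
    struct fake_dev *d = ctx;
    d->now_ms += ms;
}

/* Returns 0 once both registers report ready, or -1 if the 30 s
 * budget is exhausted first (the case where the probe would fail). */
static int wait_device_ready(void *ctx, reg_read_fn read_reg, sleep_fn sleep_ms)
{
    uint32_t waited_ms = 0;

    assert(read_reg && sleep_ms);

    for (;;) {
        if (read_reg(ctx, REG_C2C_LINK) == STATUS_READY &&
            read_reg(ctx, REG_HBM_TRAINING) == STATUS_READY)
            return 0;
        if (waited_ms >= POLL_TIMEOUT_MS)
            return -1;
        sleep_ms(ctx, POLL_INTERVAL_MS);
        waited_ms += POLL_INTERVAL_MS;
    }
}
```

Passing the register reader and the sleep routine as function pointers keeps the timeout logic separate from the hardware access, which is what lets the simulated device above exercise both the success and the timeout path.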
Diffstat (limited to 'tools/perf/scripts/python/gecko.py')
0 files changed, 0 insertions, 0 deletions