diff options
| author | Benjamin Welton <bewelton@amd.com> | 2026-02-08 19:42:00 +0300 |
|---|---|---|
| committer | Alex Deucher <alexander.deucher@amd.com> | 2026-05-11 22:55:55 +0300 |
| commit | a789761de3053d25f03787ac40897dbea14ee368 (patch) | |
| tree | 564e29b645169b4398c5df730829a1179995d636 /include/uapi/linux | |
| parent | f96538285cfdbb3acf5e3356e0bb88c38815790b (diff) | |
| download | linux-a789761de3053d25f03787ac40897dbea14ee368.tar.xz | |
amd/amdkfd: Add kfd_ioctl_profiler to contain profiler kernel driver changes
kfd_ioctl_profiler takes a similar approach to that of
kfd_ioctl_dbg_trap (which contains debugger related IOCTL
services) where kfd_ioctl_profiler will contain all profiler
related IOCTL services. The IOCTL is designed to be expanded
as needed to support additional profiler functionality.
The current functionality of the IOCTL is to allow for profilers
which need PMC counters from GPU devices to both signal to other
profilers that may be on the system that the device has active PMC
profiling taking place on it (multiple PMC profilers on the same
device can result in corrupted counter data) and to setup the device
to allow for the collection of SQ PMC data on all queues on the device.
For PMC data for the SQ block (such as SQ_WAVES) to be available
to a profiler, mmPERFCOUNT_ENABLE must be set on the queues. When
profiling a single process, the profiler can inject PM4 packets into
each queue to turn on PERFCOUNT_ENABLE. When profiling system wide,
the profiler does not have this option and must have a way to turn
on profiling for queues in which it cannot inject packets into directly.
Accomplishing this requires a few steps:
1. Checking if the user has the necessary permissions to profile system
wide on the device. This check uses the same check that linux perf
uses to determine if a user has the necessary permissions to profile
at this scope (primarily if the process has CAP_SYS_PERFMON or is root).
2. Locking the device for profiling. This is done by setting a lock bit
on the device struct and storing the process that locked the device.
3. Iterating all queues on the device and issuing an MQD Update to enable
perfcounting on the queues.
4. Actions to cleanup if the process exits or releases the lock.
The IOCTL also contains a link to the existing PC Sampling IOCTL as well.
This is per a suggestion that we should potentially remove the PC Sampling
IOCTL to have it be a part of the profiler IOCTL. This is a future change.
In addition, we do expect to expand the profiler IOCTL to include
additional profiler functionality in the future (which necessitates the
use of a version number).
v2: sqaush in proper IOCTL number
Proposed userpace support:
https://github.com/ROCm/rocm-systems/commit/40abc95a6463a61bb318a67efd6d9cc3e5ee8839
Signed-off-by: Benjamin Welton <benjamin.welton@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Acked-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Diffstat (limited to 'include/uapi/linux')
| -rw-r--r-- | include/uapi/linux/kfd_ioctl.h | 28 |
1 files changed, 27 insertions, 1 deletions
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index e72359370857..cc3ed0765c83 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1558,6 +1558,29 @@ struct kfd_ioctl_dbg_trap_args { }; }; +#define KFD_IOC_PROFILER_VERSION_NUM 1 +enum kfd_profiler_ops { + KFD_IOC_PROFILER_PMC = 0, + KFD_IOC_PROFILER_VERSION = 2, +}; + +/** + * Enables/Disables GPU Specific profiler settings + */ +struct kfd_ioctl_pmc_settings { + __u32 gpu_id; /* This is the user_gpu_id */ + __u32 lock; /* Lock GPU for Profiling */ + __u32 perfcount_enable; /* Force Perfcount Enable for queues on GPU */ +}; + +struct kfd_ioctl_profiler_args { + __u32 op; /* kfd_profiler_op */ + union { + struct kfd_ioctl_pmc_settings pmc; + __u32 version; /* KFD_IOC_PROFILER_VERSION_NUM */ + }; +}; + #define AMDKFD_IOCTL_BASE 'K' #define AMDKFD_IO(nr) _IO(AMDKFD_IOCTL_BASE, nr) #define AMDKFD_IOR(nr, type) _IOR(AMDKFD_IOCTL_BASE, nr, type) @@ -1681,7 +1704,10 @@ struct kfd_ioctl_dbg_trap_args { #define AMDKFD_IOC_CREATE_PROCESS \ AMDKFD_IO(0x27) +#define AMDKFD_IOC_PROFILER \ + AMDKFD_IOWR(0x28, struct kfd_ioctl_profiler_args) + #define AMDKFD_COMMAND_START 0x01 -#define AMDKFD_COMMAND_END 0x28 +#define AMDKFD_COMMAND_END 0x29 #endif |
