kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c, branch v6.6.131

drm/amdgpu: Fix fence put before wait in amdgpu_amdkfd_submit_ib

2026-04-02T11:07:23+00:00

[ Upstream commit 7150850146ebfa4ca998f653f264b8df6f7f85be ] amdgpu_amdkfd_submit_ib() submits a GPU job and gets a fence from amdgpu_ib_schedule(). This fence is used to wait for job completion. Currently, the code drops the fence reference using dma_fence_put() before calling dma_fence_wait(). If dma_fence_put() releases the last reference, the fence may be freed before dma_fence_wait() is called. This can lead to a use-after-free. Fix this by waiting on the fence first and releasing the reference only after dma_fence_wait() completes. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:697 amdgpu_amdkfd_submit_ib() warn: passing freed memory 'f' (line 696) Fixes: 9ae55f030dc5 ("drm/amdgpu: Follow up change to previous drm scheduler change.") Cc: Felix Kuehling Cc: Dan Carpenter Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Reviewed-by: Christian König Signed-off-by: Alex Deucher (cherry picked from commit 8b9e5259adc385b61a6590a13b82ae0ac2bd3482) Signed-off-by: Sasha Levin

drm/amdkfd: drop struct kfd_cu_info

2025-01-02T09:32:09+00:00

[ Upstream commit 0021d70a0654e668d457758110abec33dfbd3ba5 ] I think this was an abstraction back from when kfd supported both radeon and amdgpu. Since we just support amdgpu now, there is no more need for this and we can use the amdgpu structures directly. This also avoids having the kfd_cu_info structures on the stack when inlining which can blow up the stack. Cc: Arnd Bergmann Acked-by: Arnd Bergmann Reviewed-by: Felix Kuehling Acked-by: Christian König Signed-off-by: Alex Deucher Stable-dep-of: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd") Signed-off-by: Sasha Levin

drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer

2024-10-10T09:57:32+00:00

[ Upstream commit c86ad39140bbcb9dc75a10046c2221f657e8083b ] Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherwise amdgpu_bo_unref clear the local variable, the original pointer not set to NULL, this could cause use-after-free bug. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Acked-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3

2023-09-11T22:16:31+00:00

Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: use correct method to get clock under SRIOV

2023-08-31T21:58:29+00:00

[What] Current SRIOV still using adev->clock.default_XX which gets from atomfirmware. But these fields are abandoned in atomfirmware long ago. Which may cause function to return a 0 value. [How] We don't need to check whether SR-IOV. For SR-IOV one-vf-mode, pm is enabled and VF is able to read dpm clock from pmfw, so we can use dpm clock interface directly. For multi-VF mode, VF pm is disabled, so driver can just react as pm disabled. One-vf-mode is introduced from GFX9 so it shall not have any backward compatibility issue. Signed-off-by: Horace Chen Acked-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: drop IOMMUv2 support

2023-08-11T18:47:25+00:00

Now that we use the dGPU path for all APUs, drop the IOMMUv2 support. v2: drop the now unused queue manager functions for gfx7/8 APUs Reviewed-by: Felix Kuehling Acked-by: Christian König Tested-by: Mike Lothian Signed-off-by: Alex Deucher

drm/amdkfd: Fix stack size in 'amdgpu_amdkfd_unmap_hiq'

2023-07-12T16:22:52+00:00

Dynamically allocate large local variable instead of putting it onto the stack to avoid exceeding the stack size: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c: In function ‘amdgpu_amdkfd_unmap_hiq’: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:868:1: warning: the frame size of 1280 bytes is larger than 1024 bytes [-Wframe-larger-than=] Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202307080505.V12qS0oz-lkp@intel.com Suggested-by: Guchun Chen Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Acked-by: Guchun Chen Reviewed-by: Mukul Joshi Signed-off-by: Alex Deucher

drm/amdkfd: Use KIQ to unmap HIQ

2023-07-07T17:51:48+00:00

Currently, we unmap HIQ by directly writing to HQD registers. This doesn't work for GFX9.4.3. Instead, use KIQ to unmap HIQ, similar to how we use KIQ to map HIQ. Using KIQ to unmap HIQ works for all GFX series post GFXv9. Signed-off-by: Mukul Joshi Reviewed-by: Harish Kasiviswanathan Signed-off-by: Alex Deucher

drm/amdkfd: add debug suspend and resume process queues operation

2023-06-09T16:36:43+00:00

In order to inspect waves from the saved context at any point during a debug session, the debugger must be able to preempt queues to trigger context save by suspending them. On queue suspend, the KFD will copy the context save header information so that the debugger can correctly crawl the appropriate size of the saved context. The debugger must then also be allowed to resume suspended queues. A queue that is newly created cannot be suspended because queue ids are recycled after destruction so the debugger needs to know that this has occurred. Query functions will be later added that will clear a given queue of its new queue status. A queue cannot be destroyed while it is suspended to preserve its saved context during debugger inspection. Have queue destruction block while a queue is suspended and unblocked when it is resumed. Likewise, if a queue is about to be destroyed, it cannot be suspended. Return the number of queues successfully suspended or resumed along with a per queue status array where the upper bits per queue status show that the request was invalid (new/destroyed queue suspend request, missing queue) or an error occurred (HWS in a fatal state so it can't suspend or resume queues). Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: update process interrupt handling for debug events

2023-06-09T16:36:17+00:00

The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts. If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation. The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status. The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime. GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher