kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c, branch v6.12.80

drm/amdgpu: prevent immediate PASID reuse case

2026-04-02T11:09:45+00:00

commit 14b81abe7bdc25f8097906fc2f91276ffedb2d26 upstream.

PASID reuse could cause interrupt issues when a process immediately runs into hardware state left by a previous process that exited with the same PASID: page faults may still be pending in the IH ring buffer when that process exits and frees its PASID. To prevent this, use the idr cyclic allocator, as is done for kernel PIDs.

Signed-off-by: Eric Huang
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
(cherry picked from commit 8f1de51f49be692de137c8525106e0fce2d1912d)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
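The cyclic allocation policy can be illustrated with a small userspace model. This is a sketch, not the kernel's idr API: `pasid_alloc_cyclic()`, `pasid_free()` and the deliberately tiny `PASID_MAX` range are hypothetical names chosen for illustration. Like the kernel's `idr_alloc_cyclic()`, the search starts just after the most recently allocated ID, so a PASID that was just freed is not handed out again until the counter wraps around.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, deliberately tiny ID space so wrap-around is easy to see. */
#define PASID_MIN 1
#define PASID_MAX 8 /* exclusive upper bound: usable IDs are 1..7 */

static bool pasid_used[PASID_MAX];
static int pasid_next = PASID_MIN; /* where the next search starts */

/* Cyclic allocation: search upward from pasid_next, wrapping once back to
 * PASID_MIN. Returns the allocated ID, or -1 if every ID is in use. */
static int pasid_alloc_cyclic(void)
{
	const int range = PASID_MAX - PASID_MIN;

	for (int i = 0; i < range; i++) {
		int id = PASID_MIN + (pasid_next - PASID_MIN + i) % range;

		if (!pasid_used[id]) {
			pasid_used[id] = true;
			/* Continue after this ID next time, wrapping at the end. */
			pasid_next = id + 1 < PASID_MAX ? id + 1 : PASID_MIN;
			return id;
		}
	}
	return -1;
}

/* Freeing does not rewind pasid_next, so the ID is not reused at once. */
static void pasid_free(int id)
{
	pasid_used[id] = false;
}
```

With this policy, allocating 1, freeing it and allocating again yields 2 rather than 1; ID 1 only comes back after the counter has wrapped past the end of the range, which gives stale interrupts for the old owner time to drain.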

drm/amdgpu: Forward VMID reservation errors

2026-01-11T14:25:20+00:00

[ Upstream commit 8defb4f081a5feccc3ea8372d0c7af3522124e1f ]

Otherwise userspace may be fooled into believing it has a reserved VMID when in reality it doesn't, ultimately leading to GPU hangs when SPM is used.

Fixes: 80e709ee6ecc ("drm/amdgpu: add option params to enforce process isolation between graphics and compute")
Cc: stable@vger.kernel.org
Reviewed-by: Christian König
Signed-off-by: Natalie Vock
Signed-off-by: Alex Deucher
[ adapted 3-argument amdgpu_vmid_alloc_reserved(adev, vm, vmhub) call to 2-argument version and added separate error check to preserve reserved_vmid tracking logic. ]
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
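The difference between swallowing and forwarding the allocator's error can be shown with a hypothetical model. `reserve_vmid()`, the two ioctl-style wrappers and the `-EINVAL` return value are illustrative assumptions, not amdgpu functions or its actual error code; the point is only that the caller must return the allocator's result instead of reporting success unconditionally.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state: whether a VMID can actually be reserved. */
static bool vmid_available;
static bool vmid_reserved;

/* Hypothetical allocator: 0 on success, negative errno on failure. */
static int reserve_vmid(void)
{
	if (!vmid_available)
		return -EINVAL;
	vmid_reserved = true;
	return 0;
}

/* Buggy pattern: the result is discarded, so userspace is told the
 * reservation succeeded even when no VMID was reserved. */
static int ioctl_reserve_vmid_buggy(void)
{
	reserve_vmid();
	return 0;
}

/* Fixed pattern: forward the error so userspace sees the failure. */
static int ioctl_reserve_vmid_fixed(void)
{
	int r = reserve_vmid();

	if (r)
		return r;
	return 0;
}
```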

Revert "drm/amdgpu: Avoid extra evict-restore process."

2025-09-09T16:58:23+00:00

This reverts commit 71598a5a7797f0052aaa7bcff0b8d4b8f20f1441, which is commit 1f02f2044bda1db1fd995bc35961ab075fa7b5a2 upstream.

This commit introduced a regression; however, the fix for the regression, aa5fc4362fac ("drm/amdgpu: fix task hang from failed job submission during process kill"), depends on things not yet present in 6.12.y and older kernels. Since this commit is more of an optimization, just revert it for 6.12.y and older stable kernels.

Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org # 6.1.x - 6.12.x
Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: Avoid extra evict-restore process.

2025-08-28T14:31:02+00:00

commit 1f02f2044bda1db1fd995bc35961ab075fa7b5a2 upstream.

If the VM belongs to another process (this is the fclose after fork), the wait may enable signaling on the KFD eviction fence and cause the parent process's queues to be evicted.

[677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu]
[677852.634814] __dma_fence_enable_signaling+0x3e/0xe0
[677852.634820] dma_fence_wait_timeout+0x3a/0x140
[677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl]
[677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu]
[677852.635026] amdgpu_flush+0x34/0x50 [amdgpu]
[677852.635208] filp_flush+0x38/0x90
[677852.635213] filp_close+0x14/0x30
[677852.635216] do_close_on_exec+0xdd/0x130
[677852.635221] begin_new_exec+0x1da/0x490
[677852.635225] load_elf_binary+0x307/0xea0
[677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5
[677852.635235] ? ima_bprm_check+0xa2/0xd0
[677852.635240] search_binary_handler+0xda/0x260
[677852.635245] exec_binprm+0x58/0x1a0
[677852.635249] bprm_execve.part.0+0x16f/0x210
[677852.635254] bprm_execve+0x45/0x80
[677852.635257] do_execveat_common.isra.0+0x190/0x200

Suggested-by: Christian König
Signed-off-by: Gang Ba
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
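The trace shows the mechanism: closing the inherited file descriptor on exec waits on the VM's fences, and waiting on a fence first enables signaling on it, which for a KFD eviction fence evicts the owning process's queues. The hypothetical userspace model below (all structs and functions are illustrative, and it does not claim to reproduce the commit's actual change) expresses the condition the message describes: skip the wait when the closing process does not own the VM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fence: enabling signaling on a KFD eviction fence is what
 * triggers eviction of the owning process's queues. */
struct fence {
	bool is_kfd_eviction;
	bool signaling_enabled;
};

static int queue_evictions; /* counts simulated queue evictions */

static void fence_enable_signaling(struct fence *f)
{
	if (!f->signaling_enabled) {
		f->signaling_enabled = true;
		if (f->is_kfd_eviction)
			queue_evictions++;
	}
}

/* Hypothetical VM: an owner PID plus the fences it would wait on. */
struct vm {
	int owner_pid;
	struct fence *fences;
	int nfences;
};

/* Waiting starts by enabling signaling on every fence, mirroring the
 * dma_fence_wait_timeout() -> __dma_fence_enable_signaling() frames. */
static void vm_wait_idle(struct vm *vm)
{
	for (int i = 0; i < vm->nfences; i++)
		fence_enable_signaling(&vm->fences[i]);
}

/* Flush on close: a child that inherited the fd (fork + exec) does not
 * own the VM, so it skips the wait and leaves the parent's queues alone. */
static void flush_on_close(struct vm *vm, int current_pid)
{
	if (vm->owner_pid != current_pid)
		return;
	vm_wait_idle(vm);
}
```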

drm/amdgpu: Unlocked unmap only clear page table leaves

2025-04-20T08:15:23+00:00

[ Upstream commit 23b645231eeffdaf44021debac881d2f26824150 ]

SVM migration unmaps pages from the GPU and then updates the GPU mapping to recover from the page fault. Currently, unmap clears the PDE entry for a range length >= huge page and frees the PTB BO, and updating the mapping allocates a new PT BO. There is a race: the freed PTB BO may still be on the pt_free list, get reused when updating the mapping and then be freed, leaving an invalid PDE entry and causing a GPU page fault.

Setting the update to clear only one PDE entry or clear the PTB avoids freeing the PTB BO on unmap. This fixes the race and improves unmap and map-to-GPU performance. Updating the mapping to a huge page will still free the PTB BO.

With this change, the vm->pt_freed list and work are no longer needed. Add WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch an unmap that frees the PTB.

Signed-off-by: Philip Yang
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
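A toy two-level page table can illustrate the rule "unlocked unmap only clears leaves". Everything here is hypothetical (`struct pd`, `struct ptb`, `unmap_range()`, the tiny sizes, and `warn_count` standing in for `WARN_ON(unlocked)`); it is a sketch of the policy, not amdgpu's page-table walker. An unlocked unmap zeroes PTEs and keeps the PTB allocated, while a locked unmap covering a whole PTB may free it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PTES_PER_PTB 4 /* hypothetical, tiny for illustration */
#define PDES 2

struct ptb {
	unsigned long pte[PTES_PER_PTB];
};

struct pd {
	struct ptb *pde[PDES];
};

static int warn_count; /* stands in for WARN_ON(unlocked) firing */

static void pt_free(struct pd *pd, int pde_idx, bool unlocked)
{
	if (unlocked) /* freeing a PTB from the unlocked path is the bug */
		warn_count++;
	free(pd->pde[pde_idx]);
	pd->pde[pde_idx] = NULL;
}

/* Unmap [start, end) in PTE units. Unlocked callers only clear leaf
 * entries; a locked caller covering a whole PTB may free it instead. */
static void unmap_range(struct pd *pd, int start, int end, bool unlocked)
{
	for (int i = start; i < end; i++) {
		int d = i / PTES_PER_PTB, t = i % PTES_PER_PTB;
		struct ptb *ptb = pd->pde[d];

		if (!ptb)
			continue;
		if (!unlocked && t == 0 && i + PTES_PER_PTB <= end) {
			pt_free(pd, d, unlocked);
			i += PTES_PER_PTB - 1; /* skip the rest of this PTB */
			continue;
		}
		ptb->pte[t] = 0; /* clear the leaf, keep the PTB */
	}
}
```

Because the unlocked path never frees a PTB, there is nothing for a concurrent map to pick up from a free list, which is the race the commit describes.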

drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update

2024-12-27T13:02:10+00:00

commit 85230ee36d88e7a09fb062d43203035659dd10a5 upstream.

Third time's the charm, I hope?

Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837
Reviewed-by: Christian König
Signed-off-by: Michel Dänzer
Signed-off-by: Alex Deucher
(cherry picked from commit 695c2c745e5dff201b75da8a1d237ce403600d04)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
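Since commit d3116756a710 turned `bo->mem` into the pointer `bo->tbo.resource`, any code that inspects the placement has to tolerate that pointer being NULL. The sketch below shows the guard pattern with hypothetical stand-in types (`enum mem_type`, `struct resource`, `struct bo` and `uses_dma_addresses()` are illustrative, not the real TTM definitions): check the pointer before reading `mem_type`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for TTM placement types, not the real enums. */
enum mem_type { PL_SYSTEM, PL_TT, PL_VRAM, PL_PREEMPT };

struct resource {
	enum mem_type mem_type;
};

struct bo {
	struct resource *resource; /* may be NULL */
};

/* Returns 1 if the BO's pages would be mapped via its DMA address array.
 * A NULL resource is treated as "not in TT" instead of being
 * dereferenced. */
static int uses_dma_addresses(const struct bo *bo)
{
	const struct resource *mem = bo->resource;

	return mem && (mem->mem_type == PL_TT || mem->mem_type == PL_PREEMPT);
}
```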

drm/amdgpu: fix when the cleaner shader is emitted

2024-12-19T17:13:07+00:00

commit f4df208177d02f1c90f3644da3a2453080b8c24f upstream.

Emitting the cleaner shader must come after the check of whether a VM switch is necessary. Otherwise we emit the cleaner shader every time, not just when it is needed because we switched between applications. This can crash on gang submit and probably decreases performance quite a bit.

v2: squash in fix from Srini (Alex)

Signed-off-by: Christian König
Fixes: ee7a846ea27b ("drm/amdgpu: Emit cleaner shader at end of IB submission")
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
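The ordering fix can be sketched as "decide whether a VM switch happens, then emit the cleaner shader only in that case". The ring structure, the `isolation` flag and `emit_vm_flush()` are hypothetical names for this illustration, not amdgpu's `amdgpu_vm_flush()` logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring state: which VM was last active on this ring. */
struct ring {
	int last_vm_id;
	int cleaner_emits; /* counts emitted cleaner-shader packets */
};

/* Check for a VM switch first; only a switch between applications
 * needs the cleaner shader. */
static void emit_vm_flush(struct ring *ring, int vm_id, bool isolation)
{
	bool need_switch = ring->last_vm_id != vm_id;

	if (!need_switch)
		return; /* same application as before: nothing to clean */

	if (isolation)
		ring->cleaner_emits++;
	ring->last_vm_id = vm_id;
}
```

Emitting before the `need_switch` check would instead run the cleaner shader on every submission, including back-to-back jobs from the same VM.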

drm/amdgpu: sync to KFD fences before clearing PTEs

2024-09-25T16:55:44+00:00

This patch tries to solve the basic problem that we also need to sync to the KFD fences of the BO; otherwise we may clear PTEs while the KFD queues are still running.

Signed-off-by: Christian König
Acked-by: Felix Kuehling
Signed-off-by: Alex Deucher
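A minimal model of the sync decision, assuming each fence on the BO carries an owner tag. The `enum owner`, `struct fence` and `can_clear_ptes()` names are hypothetical and do not mirror amdgpu's `amdgpu_sync` API; the sketch only shows why skipping KFD-owned fences lets PTEs be cleared while a KFD queue is still running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fence owners on a BO's reservation object. */
enum owner { OWNER_VM, OWNER_KFD };

struct fence {
	enum owner owner;
	bool signaled;
};

/* Returns true if PTEs may be cleared now. With sync_kfd == false the
 * KFD-owned fences are skipped, modeling the problem described above. */
static bool can_clear_ptes(const struct fence *fences, int n, bool sync_kfd)
{
	for (int i = 0; i < n; i++) {
		if (fences[i].owner == OWNER_KFD && !sync_kfd)
			continue;
		if (!fences[i].signaled)
			return false;
	}
	return true;
}
```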

drm/amdgpu: nuke the VM PD/PT shadow handling

2024-09-18T20:15:06+00:00

This was only used as a workaround for recovering the page tables after VRAM was lost, and is no longer necessary since amdgpu_vm_bo_reset_state_machine() started to do the same. Compute never used shadows either, so the only problematic case left is SVM, and that is most likely not recoverable in any way when VRAM is lost.

Signed-off-by: Christian König
Acked-by: Lijo Lazar
Signed-off-by: Alex Deucher

drm/amdgpu: revert "use CPU for page table update if SDMA is unavailable"

2024-09-06T21:55:06+00:00

That is clearly not something we should do upstream. The SDMA is mandatory for the driver to work correctly. We could do this for emulation and bringup, but in those cases the engineer should probably enable CPU-based updates manually.

This reverts commit 62eefd10ac1c7e976bda47ff311bd87cee40ab8d.

Signed-off-by: Christian König
Reviewed-by: Yifan Zhang
Signed-off-by: Alex Deucher