| author | Chenglei Xie <Chenglei.Xie@amd.com> | 2026-04-07 17:51:24 +0300 |
|---|---|---|
| committer | Alex Deucher <alexander.deucher@amd.com> | 2026-04-17 21:47:06 +0300 |
| commit | ddda81c4d7e71e41b1be91d921fd85747eddbd12 (patch) | |
| tree | 5caa72674aff23ff41080c18cfd9654e319f773e | |
| parent | 574b3b14f7d1b329fc6e67b79328f0e6f4d4b3d4 (diff) | |
| download | linux-ddda81c4d7e71e41b1be91d921fd85747eddbd12.tar.xz | |
drm/amdgpu: gate VM CPU HDP flush on reset lock
During GPU reset, the application could still run CPU page table updates. Each commit called
amdgpu_device_flush_hdp(), which on SR-IOV sends work through the KIQ ring.
That can advance sync_seq while the GPU is being reset,
leaving fence writeback out of sync and causing amdgpu_fence_emit_polling()
to time out on later KIQ use.
Fix:
amdgpu_vm_cpu_commit():
A reset flushes HDP anyway, so the HDP flush in amdgpu_vm_cpu_commit() can be skipped
when a reset is ongoing.
Take reset_domain->sem with down_read_trylock() before amdgpu_device_flush_hdp().
If the reset path holds the write lock, skip the HDP flush so no HDP-related HW
access (including KIQ) runs during reset; state is re-established after reset.
Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
| -rw-r--r-- | drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c | 12 |
1 files changed, 11 insertions, 1 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
index 22e2e5b47341..f078db3fef79 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
@@ -21,6 +21,8 @@
  */

 #include "amdgpu_vm.h"
+#include "amdgpu.h"
+#include "amdgpu_reset.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"

@@ -108,11 +110,19 @@ static int amdgpu_vm_cpu_update(struct amdgpu_vm_update_params *p,
 static int amdgpu_vm_cpu_commit(struct amdgpu_vm_update_params *p,
 				struct dma_fence **fence)
 {
+	struct amdgpu_device *adev = p->adev;
+
 	if (p->needs_flush)
 		atomic64_inc(&p->vm->tlb_seq);

 	mb();
-	amdgpu_device_flush_hdp(p->adev, NULL);
+	/* A reset flushed the HDP anyway, so that here can be skipped when a reset is ongoing */
+	if (!down_read_trylock(&adev->reset_domain->sem))
+		return 0;
+
+	amdgpu_device_flush_hdp(adev, NULL);
+	up_read(&adev->reset_domain->sem);
+
 	return 0;
 }
