summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
diff options
context:
space:
mode:
authorChenglei Xie <Chenglei.Xie@amd.com>2025-08-07 23:52:34 +0300
committerAlex Deucher <alexander.deucher@amd.com>2025-08-15 20:07:30 +0300
commitd2fa0ec6e0aea6ffbd41939d0c7671db16991ca4 (patch)
tree1b32b60d80016dba1fb45df8fa1b35e4b6388a9d /drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
parentfc4e990a326e608eb8937eba737908c660b7a410 (diff)
downloadlinux-d2fa0ec6e0aea6ffbd41939d0c7671db16991ca4.tar.xz
drm/amdgpu: refactor bad_page_work for corner case handling
When a poison is consumed on the guest before the guest receives the host's poison creation msg, a corner case may occur to have poison_handler complete processing earlier than it should to cause the guest to hang waiting for the req_bad_pages reply during a VF FLR, resulting in the VM becoming inaccessible in stress tests. To fix this issue, this patch refactored the mailbox sequence by seperating the bad_page_work into two parts req_bad_pages_work and handle_bad_pages_work. Old sequence: 1.Stop data exchange work 2.Guest sends MB_REQ_RAS_BAD_PAGES to host and keep polling for IDH_RAS_BAD_PAGES_READY 3.If the IDH_RAS_BAD_PAGES_READY arrives within timeout limit, re-init the data exchange region for updated bad page info else timeout with error message New sequence: req_bad_pages_work: 1.Stop data exhange work 2.Guest sends MB_REQ_RAS_BAD_PAGES to host Once Guest receives IDH_RAS_BAD_PAGES_READY event handle_bad_pages_work: 3.re-init the data exchange region for updated bad page info Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com> Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h')
-rw-r--r--drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h3
1 files changed, 2 insertions, 1 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index 3da3ebb1d9a1..58accf2259b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -267,7 +267,8 @@ struct amdgpu_virt {
struct amdgpu_irq_src rcv_irq;
struct work_struct flr_work;
- struct work_struct bad_pages_work;
+ struct work_struct req_bad_pages_work;
+ struct work_struct handle_bad_pages_work;
struct amdgpu_mm_table mm_table;
const struct amdgpu_virt_ops *ops;