kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h, branch v6.19.11

drm/amdgpu: always backup and reemit fences

2026-01-05T22:28:45+00:00

If when we backup the ring contents for reemit before a ring reset, we skip jobs associated with the bad context, however, we need to make sure the fences are reemited as unprocessed submissions may depend on them. v2: clean up fence handling, make helpers static Reviewed-by: Timur Kristóf Signed-off-by: Alex Deucher (cherry picked from commit 155a748f14bc0b72783994dea7c5a12276730342)

drm/amdgpu: don't reemit ring contents more than once

2026-01-05T22:28:45+00:00

If we cancel a bad job and reemit the ring contents, and we get another timeout, cancel everything rather than reemitting. The wptr markers are only relevant for the original emit. If we reemit, the wptr markers are no longer correct. Reviewed-by: Timur Kristóf Signed-off-by: Alex Deucher (cherry picked from commit fb62a2067ca4555a6572d911e05919a311c010aa)

drm/amdgpu: Implement user queue reset functionality

2025-11-04T16:53:05+00:00

This patch adds robust reset handling for user queues (userq) to improve recovery from queue failures. The key components include: 1. Queue detection and reset logic: - amdgpu_userq_detect_and_reset_queues() identifies failed queues - Per-IP detect_and_reset callbacks for targeted recovery - Falls back to full GPU reset when needed 2. Reset infrastructure: - Adds userq_reset_work workqueue for async reset handling - Implements pre/post reset handlers for queue state management - Integrates with existing GPU reset framework 3. Error handling improvements: - Enhanced state tracking with HUNG state - Automatic reset triggering on critical failures - VRAM loss handling during recovery 4. Integration points: - Added to device init/reset paths - Called during queue destroy, suspend, and isolation events - Handles both individual queue and full GPU resets The reset functionality works with both gfx/compute and sdma queues, providing better resilience against queue failures while minimizing disruption to unaffected queues. v2: add detection and reset calls when preemption/unmaped fails. add a per device userq counter for each user queue type.(Alex) v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex) warn if the adev->userq_mutex is not held. v4: make sure we have all of the uqm->userq_mutex held. warn if the uqm->userq_mutex is not held. v5: Use array for user queue type counters.(Alex) all of the uqm->userq_mutex need to be held when calling detect and reset. (Alex) v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo) v8: remove atomic_set(&userq_mgr->userq_count[i], 0). it should already be 0 since we kzalloc the structure (Alex) v9: For consistency with kernel queues, We may want something like: amdgpu_userq_is_reset_type_supported (Alex) Signed-off-by: Jesse Zhang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: clean up and unify hw fence handling

2025-10-13T18:14:35+00:00

Decouple the amdgpu fence from the amdgpu_job structure. This lets us clean up the separate fence ops for the embedded fence and other fences. This also allows us to allocate the vm fence up front when we allocate the job. v2: Additional cleanup suggested by Christian v3: Additional cleanups suggested by Christian v4: Additional cleanups suggested by David and vm fence fix v5: cast seqno (David) Cc: David.Wu3@amd.com Cc: christian.koenig@amd.com Tested-by: David (Ming Qiang) Wu Reviewed-by: David (Ming Qiang) Wu Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: set an error on all fences from a bad context

2025-10-13T18:14:15+00:00

When we backup ring contents to reemit after a queue reset, we don't backup ring contents from the bad context. When we signal the fences, we should set an error on those fences as well. v2: misc cleanups v3: add locking for fence error, fix comment (Christian) v4: fix wrap around, locking (Christian) Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Use memset32 for ring clearing

2025-09-15T20:55:22+00:00

Use memset32 instead of open coding it, just because it is a tiny bit nicer. Reviewed-by: Christian König Signed-off-by: Tvrtko Ursulin Signed-off-by: Alex Deucher

drm/amdgpu: Fix allocating extra dwords for rings (v2)

2025-09-15T20:52:52+00:00

Rename extra_dw to extra_bytes and document what it's for. The value is already used as if it were bytes in vcn_v4_0.c and in amdgpu_ring_init. Just adjust the dword count in jpeg_v1_0.c so that it becomes a byte count. v2: Rename extra_dw to extra_bytes as discussed during review. Fixes: c8c1a1d2ef04 ("drm/amdgpu: define and add extra dword for jpeg ring") Signed-off-by: Timur Kristóf Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Remove volatile from ring manipulation

2025-09-15T20:51:15+00:00

None of the pointer operations handled by the ring file requires volatile, for this reason, this commit removes all occurrences of volatile associated with rings. Reviewed-by: Christian König Signed-off-by: Rodrigo Siqueira Signed-off-by: Alex Deucher

drm/amdgpu: move reset support type checks into the caller

2025-07-17T16:36:56+00:00

Rather than checking in the callbacks, check if the reset type is supported in the caller. Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher

drm/amdgpu: track ring state associated with a fence

2025-07-16T20:14:11+00:00

We need to know the wptr and sequence number associated with a fence so that we can re-emit the unprocessed state after a ring reset. Pre-allocate storage space for the ring buffer contents and add helpers to save off and re-emit the unprocessed state so that it can be re-emitted after the queue is reset. Reviewed-by: Christian König Signed-off-by: Alex Deucher