<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-05T22:28:45+00:00</updated>
<entry>
<title>drm/amdgpu: always backup and reemit fences</title>
<updated>2026-01-05T22:28:45+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-11-13T19:12:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=531b43260928a53f4190f3e61de788551af6157e'/>
<id>urn:sha1:531b43260928a53f4190f3e61de788551af6157e</id>
<content type='text'>
If when we backup the ring contents for reemit before a
ring reset, we skip jobs associated with the bad
context, however, we need to make sure the fences
are reemited as unprocessed submissions may depend on
them.

v2: clean up fence handling, make helpers static

Reviewed-by: Timur Kristóf &lt;timur.kristof@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 155a748f14bc0b72783994dea7c5a12276730342)
</content>
</entry>
<entry>
<title>drm/amdgpu: don't reemit ring contents more than once</title>
<updated>2026-01-05T22:28:45+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-11-13T18:24:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9fc27cbabee6d2e63b0268ca709ad3129b3ac50d'/>
<id>urn:sha1:9fc27cbabee6d2e63b0268ca709ad3129b3ac50d</id>
<content type='text'>
If we cancel a bad job and reemit the ring contents, and
we get another timeout, cancel everything rather than reemitting.
The wptr markers are only relevant for the original emit.  If
we reemit, the wptr markers are no longer correct.

Reviewed-by: Timur Kristóf &lt;timur.kristof@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit fb62a2067ca4555a6572d911e05919a311c010aa)
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement user queue reset functionality</title>
<updated>2025-11-04T16:53:05+00:00</updated>
<author>
<name>Jesse.Zhang</name>
<email>Jesse.Zhang@amd.com</email>
</author>
<published>2025-10-24T02:51:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=290f46cf5726509f55fac4c7abe9b9d311aa5a3a'/>
<id>urn:sha1:290f46cf5726509f55fac4c7abe9b9d311aa5a3a</id>
<content type='text'>
This patch adds robust reset handling for user queues (userq) to improve
recovery from queue failures. The key components include:

1. Queue detection and reset logic:
   - amdgpu_userq_detect_and_reset_queues() identifies failed queues
   - Per-IP detect_and_reset callbacks for targeted recovery
   - Falls back to full GPU reset when needed

2. Reset infrastructure:
   - Adds userq_reset_work workqueue for async reset handling
   - Implements pre/post reset handlers for queue state management
   - Integrates with existing GPU reset framework

3. Error handling improvements:
   - Enhanced state tracking with HUNG state
   - Automatic reset triggering on critical failures
   - VRAM loss handling during recovery

4. Integration points:
   - Added to device init/reset paths
   - Called during queue destroy, suspend, and isolation events
   - Handles both individual queue and full GPU resets

The reset functionality works with both gfx/compute and sdma queues,
providing better resilience against queue failures while minimizing
disruption to unaffected queues.

v2: add detection and reset calls when preemption/unmaped fails.
    add a per device userq counter for each user queue type.(Alex)
v3: make sure we hold the adev-&gt;userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex)
   warn if the adev-&gt;userq_mutex is not held.
v4: make sure we have all of the uqm-&gt;userq_mutex held.
   warn if the uqm-&gt;userq_mutex is not held.

v5: Use array for user queue type counters.(Alex)
    all of the uqm-&gt;userq_mutex need to be held when calling detect and reset.  (Alex)

v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process

v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo)
v8: remove atomic_set(&amp;userq_mgr-&gt;userq_count[i], 0).
   it should already be 0 since we kzalloc the structure (Alex)
v9: For consistency with kernel queues, We may want something like:
   amdgpu_userq_is_reset_type_supported (Alex)

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: clean up and unify hw fence handling</title>
<updated>2025-10-13T18:14:35+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-08-27T15:34:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db36632ea51e8b02dd637d291c755899b20a5a31'/>
<id>urn:sha1:db36632ea51e8b02dd637d291c755899b20a5a31</id>
<content type='text'>
Decouple the amdgpu fence from the amdgpu_job structure.
This lets us clean up the separate fence ops for the embedded
fence and other fences.  This also allows us to allocate the
vm fence up front when we allocate the job.

v2: Additional cleanup suggested by Christian
v3: Additional cleanups suggested by Christian
v4: Additional cleanups suggested by David and
    vm fence fix
v5: cast seqno (David)

Cc: David.Wu3@amd.com
Cc: christian.koenig@amd.com
Tested-by: David (Ming Qiang) Wu &lt;David.Wu3@amd.com&gt;
Reviewed-by: David (Ming Qiang) Wu &lt;David.Wu3@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: set an error on all fences from a bad context</title>
<updated>2025-10-13T18:14:15+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-09-03T17:48:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ff780f4f80323148d43198f2052c14160c8428d3'/>
<id>urn:sha1:ff780f4f80323148d43198f2052c14160c8428d3</id>
<content type='text'>
When we backup ring contents to reemit after a queue reset,
we don't backup ring contents from the bad context.  When
we signal the fences, we should set an error on those
fences as well.

v2: misc cleanups
v3: add locking for fence error, fix comment (Christian)
v4: fix wrap around, locking (Christian)

Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence")
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Use memset32 for ring clearing</title>
<updated>2025-09-15T20:55:22+00:00</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2025-09-09T14:49:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2ee0f1b1ab1cfdab1ddc832e336fc85da286dfe'/>
<id>urn:sha1:e2ee0f1b1ab1cfdab1ddc832e336fc85da286dfe</id>
<content type='text'>
Use memset32 instead of open coding it, just because it is
a tiny bit nicer.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix allocating extra dwords for rings (v2)</title>
<updated>2025-09-15T20:52:52+00:00</updated>
<author>
<name>Timur Kristóf</name>
<email>timur.kristof@gmail.com</email>
</author>
<published>2025-09-09T14:49:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ae5c2bee1680436d9bf8bfaca7416496adff0ee0'/>
<id>urn:sha1:ae5c2bee1680436d9bf8bfaca7416496adff0ee0</id>
<content type='text'>
Rename extra_dw to extra_bytes and document what it's for.

The value is already used as if it were bytes in vcn_v4_0.c
and in amdgpu_ring_init. Just adjust the dword count in
jpeg_v1_0.c so that it becomes a byte count.

v2:
Rename extra_dw to extra_bytes as discussed during review.

Fixes: c8c1a1d2ef04 ("drm/amdgpu: define and add extra dword for jpeg ring")
Signed-off-by: Timur Kristóf &lt;timur.kristof@gmail.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Remove volatile from ring manipulation</title>
<updated>2025-09-15T20:51:15+00:00</updated>
<author>
<name>Rodrigo Siqueira</name>
<email>siqueira@igalia.com</email>
</author>
<published>2025-09-08T23:15:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f307cfb91734b36bde2597552971553d91838048'/>
<id>urn:sha1:f307cfb91734b36bde2597552971553d91838048</id>
<content type='text'>
None of the pointer operations handled by the ring file requires
volatile, for this reason, this commit removes all occurrences of
volatile associated with rings.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Rodrigo Siqueira &lt;siqueira@igalia.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: move reset support type checks into the caller</title>
<updated>2025-07-17T16:36:56+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-07-15T15:55:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ac55eab4fc41e0ea80f9064945e4340f13d8b5c'/>
<id>urn:sha1:6ac55eab4fc41e0ea80f9064945e4340f13d8b5c</id>
<content type='text'>
Rather than checking in the callbacks, check if the reset
type is supported in the caller.

Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: track ring state associated with a fence</title>
<updated>2025-07-16T20:14:11+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-05-28T01:35:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=77cc0da39c7ce203cd3ce6bc5696421947a979d7'/>
<id>urn:sha1:77cc0da39c7ce203cd3ce6bc5696421947a979d7</id>
<content type='text'>
We need to know the wptr and sequence number associated
with a fence so that we can re-emit the unprocessed state
after a ring reset.  Pre-allocate storage space for
the ring buffer contents and add helpers to save off
and re-emit the unprocessed state so that it can be
re-emitted after the queue is reset.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
