<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-04T12:21:02+00:00</updated>
<entry>
<title>drm/amdgpu: avoid a warning in timedout job handler</title>
<updated>2026-03-04T12:21:02+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-12-12T16:46:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d89aef80790f488d5123179326fa9ee8d5be67bc'/>
<id>urn:sha1:d89aef80790f488d5123179326fa9ee8d5be67bc</id>
<content type='text'>
[ Upstream commit c8cf9ddc549fb93cb5a35f3fe23487b1e6707e74 ]

Only set an error on the fence if the fence is not
signalled.  We can end up with a warning if the
per queue reset path signals the fence and sets an error
as part of the reset, but fails to recover.

Reviewed-by: Timur Kristóf &lt;timur.kristof@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: switch job hw_fence to amdgpu_fence</title>
<updated>2025-07-06T09:01:47+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-06-02T15:31:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5f2e040f19c4d73496ac8c32f2c441a6146430c9'/>
<id>urn:sha1:5f2e040f19c4d73496ac8c32f2c441a6146430c9</id>
<content type='text'>
commit ebe43542702c3d15d1a1d95e8e13b1b54076f05a upstream.

Use the amdgpu fence container so we can store additional
data in the fence.  This also fixes the start_time handling
for MCBP since we were casting the fence to an amdgpu_fence
and it wasn't.

Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)")
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit bf1cd14f9e2e1fdf981eed273ddd595863f5288c)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: don't access invalid sched</title>
<updated>2024-12-27T13:02:11+00:00</updated>
<author>
<name>Pierre-Eric Pelloux-Prayer</name>
<email>pierre-eric.pelloux-prayer@amd.com</email>
</author>
<published>2024-12-06T12:17:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=67291d601f2b032062b1b2f60ffef1b63e10094c'/>
<id>urn:sha1:67291d601f2b032062b1b2f60ffef1b63e10094c</id>
<content type='text'>
[ Upstream commit a93b1020eb9386d7da11608477121b10079c076a ]

Since 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
accessing job-&gt;base.sched can produce unexpected results as the initialisation
of (*job)-&gt;base.sched done in amdgpu_job_alloc is overwritten by the
memset.

This commit fixes an issue when a CS would fail validation and would
be rejected after job-&gt;num_ibs is incremented. In this case,
amdgpu_ib_free(ring-&gt;adev, ...) will be called, which would crash the
machine because the ring value is bogus.

To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this
because the device is actually not used in this function.

The next commit will remove the ring argument completely.

Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
Signed-off-by: Pierre-Eric Pelloux-Prayer &lt;pierre-eric.pelloux-prayer@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 2ae520cb12831d264ceb97c61f72c59d33c0dbd7)
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip coredump after job timeout in SRIOV</title>
<updated>2024-09-25T16:55:52+00:00</updated>
<author>
<name>ZhenGuo Yin</name>
<email>zhenguo.yin@amd.com</email>
</author>
<published>2024-09-19T03:38:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e1d27f7a9cea1e0c06699164e3b177862e7b4096'/>
<id>urn:sha1:e1d27f7a9cea1e0c06699164e3b177862e7b4096</id>
<content type='text'>
VF FLR will be triggered by host driver before job timeout,
hence the error status of GPU get cleared. Performing a
coredump here is unnecessary.

Signed-off-by: ZhenGuo Yin &lt;zhenguo.yin@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Move the dumping log out of for loop</title>
<updated>2024-08-29T17:39:19+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2024-08-28T08:06:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=30e8f4c2bd532c44af0e0fad9c04e7d2970b91a6'/>
<id>urn:sha1:30e8f4c2bd532c44af0e0fad9c04e7d2970b91a6</id>
<content type='text'>
log message "Dumping IP State Completed" needs to
be logged only once when state dumping is complete.

Hence moving it out of the for loop.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Acked-by: Trigger Huang &lt;Trigger.Huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Do core dump immediately when job tmo</title>
<updated>2024-08-29T17:39:00+00:00</updated>
<author>
<name>Trigger Huang</name>
<email>Trigger.Huang@amd.com</email>
</author>
<published>2024-08-19T08:04:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c67db6a6a6be4bb1db1b0fd5b24040d68e461cb1'/>
<id>urn:sha1:c67db6a6a6be4bb1db1b0fd5b24040d68e461cb1</id>
<content type='text'>
Do the coredump immediately after a job timeout to get a closer
representation of GPU's error status.

V2: This will skip printing vram_lost as the GPU reset is not
happened yet (Alex)

V3: Unconditionally call the core dump as we care about all the reset
functions(soft-recovery and queue reset and full adapter reset, Alex)

V4: Do the dump after adev-&gt;job_hang = true (Sunil)

Signed-off-by: Trigger Huang &lt;Trigger.Huang@amd.com&gt;
Acked-by:  Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'amd-drm-next-6.12-2024-08-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next</title>
<updated>2024-08-27T12:33:12+00:00</updated>
<author>
<name>Daniel Vetter</name>
<email>daniel.vetter@ffwll.ch</email>
</author>
<published>2024-08-27T12:33:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e55ef65510a401862b902dc979441ea10ae25c61'/>
<id>urn:sha1:e55ef65510a401862b902dc979441ea10ae25c61</id>
<content type='text'>
amd-drm-next-6.12-2024-08-26:

amdgpu:
- SDMA devcoredump support
- DCN 4.0.1 updates
- DC SUBVP fixes
- Refactor OPP in DC
- Refactor MMHUBBUB in DC
- DC DML 2.1 updates
- DC FAMS2 updates
- RAS updates
- GFX12 updates
- VCN 4.0.3 updates
- JPEG 4.0.3 updates
- Enable wave kill (soft recovery) for compute queues
- Clean up CP error interrupt handling
- Enable CP bad opcode interrupts
- VCN 4.x fixes
- VCN 5.x fixes
- GPU reset fixes
- Fix vbios embedded EDID size handling
- SMU 14.x updates
- Misc code cleanups and spelling fixes
- VCN devcoredump support
- ISP MFD i2c support
- DC vblank fixes
- GFX 12 fixes
- PSR fixes
- Convert vbios embedded EDID to drm_edid
- DCN 3.5 updates
- DMCUB updates
- Cursor fixes
- Overdrive support for SMU 14.x
- GFX CP padding optimizations
- DCC fixes
- DSC fixes
- Preliminary per queue reset infrastructure
- Initial per queue reset support for GFX 9
- Initial per queue reset support for GFX 7, 8
- DCN 3.2 fixes
- DP MST fixes
- SR-IOV fixes
- GFX 9.4.3/4 devcoredump support
- Add process isolation framework
- Enable process isolation support for GFX 9.4.3/4
- Take IOMMU remapping into account for P2P DMA checks

amdkfd:
- CRIU fixes
- Improved input validation for user queues
- HMM fix
- Enable process isolation support for GFX 9.4.3/4
- Initial per queue reset support for GFX 9
- Allow users to target recommended SDMA engines

radeon:
- remove .load and drm_dev_alloc
- Fix vbios embedded EDID size handling
- Convert vbios embedded EDID to drm_edid
- Use GEM references instead of TTM
- r100 cp init cleanup
- Fix potential overflows in evergreen CS offset tracking

UAPI:
- KFD support for targetting queues on recommended SDMA engines
  Proposed userspace:
  https://github.com/ROCm/ROCR-Runtime/commit/2f588a24065f41c208c3701945e20be746d8faf7
  https://github.com/ROCm/ROCR-Runtime/commit/eb30a5bbc7719c6ffcf2d2dd2878bc53a47b3f30

drm/buddy:
- Add start address support for trim function

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com
</content>
</entry>
<entry>
<title>drm/amdgpu: increase the reset counter for the queue reset</title>
<updated>2024-08-16T18:17:56+00:00</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2024-06-12T07:49:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fb0a5834a338329bc665c7ce2b89f3e376557565'/>
<id>urn:sha1:fb0a5834a338329bc665c7ce2b89f3e376557565</id>
<content type='text'>
Update the reset counter for the amdgpu_cs_query_reset_state()

Acked-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add per ring reset support (v5)</title>
<updated>2024-08-16T18:17:52+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2024-06-03T18:38:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=15789fa0f0e29cf802f30d0e308da9c6b18c116a'/>
<id>urn:sha1:15789fa0f0e29cf802f30d0e308da9c6b18c116a</id>
<content type='text'>
If a specific job is hung, try and reset just the
ring associated with the job.

v2: move to amdgpu_job.c
v3: fix drm_sched_stop() handling when ring reset fails
v4: drop unnecessary amdgpu_fence_driver_clear_job_fences() and
    drm_sched_increase_karma()
v5: rework sched_stop handling

Acked-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Forward soft recovery errors to userspace</title>
<updated>2024-08-07T22:20:07+00:00</updated>
<author>
<name>Joshua Ashton</name>
<email>joshua@froggi.es</email>
</author>
<published>2024-03-07T19:04:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=829798c789f567ef6ba4b084c15b7b5f3bd98d51'/>
<id>urn:sha1:829798c789f567ef6ba4b084c15b7b5f3bd98d51</id>
<content type='text'>
As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps will keep submitting hanging
command buffers cascading us to a hard reset.

1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton &lt;joshua@froggi.es&gt;
Reviewed-by: Marek Olšák &lt;marek.olsak@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 434967aadbbbe3ad9103cc29e9a327de20fdba01)
Cc: stable@vger.kernel.org
</content>
</entry>
</feed>
