<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-07-06T09:00:15+00:00</updated>
<entry>
<title>drm/amdgpu: switch job hw_fence to amdgpu_fence</title>
<updated>2025-07-06T09:00:15+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-06-02T15:31:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8302adf60aba60da707af566801c0641c8d367c9'/>
<id>urn:sha1:8302adf60aba60da707af566801c0641c8d367c9</id>
<content type='text'>
commit ebe43542702c3d15d1a1d95e8e13b1b54076f05a upstream.

Use the amdgpu fence container so we can store additional
data in the fence.  This also fixes the start_time handling
for MCBP since we were casting the fence to an amdgpu_fence
and it wasn't.

Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)")
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit bf1cd14f9e2e1fdf981eed273ddd595863f5288c)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix</title>
<updated>2023-08-16T19:46:39+00:00</updated>
<author>
<name>Tim Huang</name>
<email>Tim.Huang@amd.com</email>
</author>
<published>2023-08-14T07:13:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1740b1ab2703b2a057da7cf33b03297e0381aa0'/>
<id>urn:sha1:f1740b1ab2703b2a057da7cf33b03297e0381aa0</id>
<content type='text'>
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.

[  203.349571] [drm] Fence fallback timer expired on ring sdma0
[  203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[  203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0

For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.

Signed-off-by: Tim Huang &lt;Tim.Huang@amd.com&gt;
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdgpu: mark force completed fences with -ECANCELED</title>
<updated>2023-06-15T15:37:55+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2023-04-17T10:52:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a33b11d26c6b7e975b54d469a739ffac29f67ab'/>
<id>urn:sha1:0a33b11d26c6b7e975b54d469a739ffac29f67ab</id>
<content type='text'>
When we force complete fences we should mark them as canceled.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add amdgpu_error_* debugfs file</title>
<updated>2023-06-15T15:37:54+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2023-04-19T10:51:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b13eb02ba8ba7617d41212121891756da31f1d8b'/>
<id>urn:sha1:b13eb02ba8ba7617d41212121891756da31f1d8b</id>
<content type='text'>
This allows us to insert some error codes into the bottom of the pipeline
on an engine.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: remove unnecessary (void*) conversions</title>
<updated>2023-06-09T14:40:12+00:00</updated>
<author>
<name>Su Hui</name>
<email>suhui@nfschina.com</email>
</author>
<published>2023-05-15T01:34:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=109b4d8cfe4279da1cbcbcd99ae54cb2b2aee521'/>
<id>urn:sha1:109b4d8cfe4279da1cbcbcd99ae54cb2b2aee521</id>
<content type='text'>
No need cast (void*) to (struct amdgpu_device *).

Signed-off-by: Su Hui &lt;suhui@nfschina.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged</title>
<updated>2023-06-09T13:39:09+00:00</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2023-05-09T08:15:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3083b1007d4b8d377f8e2b5ce349a275a2fff725'/>
<id>urn:sha1:3083b1007d4b8d377f8e2b5ce349a275a2fff725</id>
<content type='text'>
When performing device unbind or halt, we have disabled all irqs at the
very begining like amdgpu_pci_remove or amdgpu_device_halt. So
amdgpu_irq_put for irqs stored in fence driver should not be called
any more, otherwise, below calltrace will arrive.

[  139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu]
[  139.114655] Call Trace:
[  139.114655]  &lt;TASK&gt;
[  139.114657]  amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu]
[  139.114836]  amdgpu_device_fini_hw+0xb6/0x350 [amdgpu]
[  139.114955]  amdgpu_driver_unload_kms+0x51/0x70 [amdgpu]
[  139.115075]  amdgpu_pci_remove+0x63/0x160 [amdgpu]
[  139.115193]  ? __pm_runtime_resume+0x64/0x90
[  139.115195]  pci_device_remove+0x3a/0xb0
[  139.115197]  device_remove+0x43/0x70
[  139.115198]  device_release_driver_internal+0xbd/0x140

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: improve wait logic at fence polling</title>
<updated>2023-06-09T13:39:06+00:00</updated>
<author>
<name>Alex Sierra</name>
<email>alex.sierra@amd.com</email>
</author>
<published>2023-04-24T19:27:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6e87c4229513904295674b84b6e2d12951567191'/>
<id>urn:sha1:6e87c4229513904295674b84b6e2d12951567191</id>
<content type='text'>
Accomplish this by reading the seq number right away instead of sleep
for 5us. There are certain cases where the fence is ready almost
immediately. Sleep number granularity was also reduced as the majority
of the kiq tlb flush takes between 2us to 6us.

Signed-off-by: Alex Sierra &lt;alex.sierra@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: Fix errors &amp; warnings in amdgpu _bios, _cs, _dma_buf, _fence.c</title>
<updated>2023-06-09T13:29:51+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2023-05-03T08:40:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e690184586bfb88efa176cdf912414f6c53519c'/>
<id>urn:sha1:9e690184586bfb88efa176cdf912414f6c53519c</id>
<content type='text'>
The following checkpatch errors &amp; warning is removed.

ERROR: else should follow close brace '}'
ERROR: trailing statements should be on next line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Possible repeated word: 'Fences'
WARNING: Missing a blank line after declarations
WARNING: braces {} are not necessary for single statement blocks
WARNING: Comparisons should place the constant on the right side of the test
WARNING: printk() should include KERN_&lt;LEVEL&gt; facility level

Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs</title>
<updated>2023-03-23T13:31:30+00:00</updated>
<author>
<name>YuBiao Wang</name>
<email>YuBiao.Wang@amd.com</email>
</author>
<published>2023-03-16T03:30:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=033c56474acf567a450f8bafca50e0b610f2b716'/>
<id>urn:sha1:033c56474acf567a450f8bafca50e0b610f2b716</id>
<content type='text'>
[Why]
For engines not supporting soft reset, i.e. VCN, there will be a failed
ib test before mode 1 reset during asic reset. The fences in this case
are never signaled and next time when we try to free the sa_bo, kernel
will hang.

[How]
During pre_asic_reset, driver will clear job fences and afterwards the
fences' refcount will be reduced to 1. For drm_sched_jobs it will be
released in job_free_cb, and for non-sched jobs like ib_test, it's meant
to be released in sa_bo_free but only when the fences are signaled. So
we have to force signal the non_sched bad job's fence during
pre_asic_reset or the clear is not complete.

Signed-off-by: YuBiao Wang &lt;YuBiao.Wang@amd.com&gt;
Acked-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini</title>
<updated>2023-02-09T03:32:50+00:00</updated>
<author>
<name>Guilherme G. Piccoli</name>
<email>gpiccoli@igalia.com</email>
</author>
<published>2023-02-02T13:48:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5ad7bbf3dba5c4a684338df1f285080f2588b535'/>
<id>urn:sha1:5ad7bbf3dba5c4a684338df1f285080f2588b535</id>
<content type='text'>
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 &lt;TASK&gt;
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.

Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Guchun Chen &lt;guchun.chen@amd.com&gt;
Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Guilherme G. Piccoli &lt;gpiccoli@igalia.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
</feed>
