<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-08-23T15:52:40+00:00</updated>
<entry>
<title>drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix</title>
<updated>2023-08-23T15:52:40+00:00</updated>
<author>
<name>Tim Huang</name>
<email>Tim.Huang@amd.com</email>
</author>
<published>2023-08-14T07:13:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9c8c2cf9f9bce4fa4c4c4e821bdaa4e15762e403'/>
<id>urn:sha1:9c8c2cf9f9bce4fa4c4c4e821bdaa4e15762e403</id>
<content type='text'>
commit f1740b1ab2703b2a057da7cf33b03297e0381aa0 upstream.

GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.

[  203.349571] [drm] Fence fallback timer expired on ring sdma0
[  203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[  203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0

For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.

Signed-off-by: Tim Huang &lt;Tim.Huang@amd.com&gt;
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged</title>
<updated>2023-06-09T08:34:16+00:00</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2023-05-09T08:15:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=68ce1d57e52c6f88ffffb0d81b10650eafc8e540'/>
<id>urn:sha1:68ce1d57e52c6f88ffffb0d81b10650eafc8e540</id>
<content type='text'>
[ Upstream commit c1a322a7a4a96cd0a3dde32ce37af437a78bf8cd ]

When performing device unbind or halt, we have disabled all irqs at the
very begining like amdgpu_pci_remove or amdgpu_device_halt. So
amdgpu_irq_put for irqs stored in fence driver should not be called
any more, otherwise, below calltrace will arrive.

[  139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu]
[  139.114655] Call Trace:
[  139.114655]  &lt;TASK&gt;
[  139.114657]  amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu]
[  139.114836]  amdgpu_device_fini_hw+0xb6/0x350 [amdgpu]
[  139.114955]  amdgpu_driver_unload_kms+0x51/0x70 [amdgpu]
[  139.115075]  amdgpu_pci_remove+0x63/0x160 [amdgpu]
[  139.115193]  ? __pm_runtime_resume+0x64/0x90
[  139.115195]  pci_device_remove+0x3a/0xb0
[  139.115197]  device_remove+0x43/0x70
[  139.115198]  device_release_driver_internal+0xbd/0x140

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs</title>
<updated>2023-04-20T10:35:11+00:00</updated>
<author>
<name>YuBiao Wang</name>
<email>YuBiao.Wang@amd.com</email>
</author>
<published>2023-03-16T03:30:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b4efb6af93ca755f5469114adfd7f26bb944962c'/>
<id>urn:sha1:b4efb6af93ca755f5469114adfd7f26bb944962c</id>
<content type='text'>
[ Upstream commit 033c56474acf567a450f8bafca50e0b610f2b716 ]

[Why]
For engines not supporting soft reset, i.e. VCN, there will be a failed
ib test before mode 1 reset during asic reset. The fences in this case
are never signaled and next time when we try to free the sa_bo, kernel
will hang.

[How]
During pre_asic_reset, driver will clear job fences and afterwards the
fences' refcount will be reduced to 1. For drm_sched_jobs it will be
released in job_free_cb, and for non-sched jobs like ib_test, it's meant
to be released in sa_bo_free but only when the fences are signaled. So
we have to force signal the non_sched bad job's fence during
pre_asic_reset or the clear is not complete.

Signed-off-by: YuBiao Wang &lt;YuBiao.Wang@amd.com&gt;
Acked-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini</title>
<updated>2023-02-14T18:11:55+00:00</updated>
<author>
<name>Guilherme G. Piccoli</name>
<email>gpiccoli@igalia.com</email>
</author>
<published>2023-02-02T13:48:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2bcbbef9cace772f5b7128b11401c515982de34b'/>
<id>urn:sha1:2bcbbef9cace772f5b7128b11401c515982de34b</id>
<content type='text'>
commit 5ad7bbf3dba5c4a684338df1f285080f2588b535 upstream.

Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 &lt;TASK&gt;
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.

Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Guchun Chen &lt;guchun.chen@amd.com&gt;
Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Guilherme G. Piccoli &lt;gpiccoli@igalia.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Remove fence_process in count_emitted</title>
<updated>2022-09-27T22:03:09+00:00</updated>
<author>
<name>Jiadong.Zhu</name>
<email>Jiadong.Zhu@amd.com</email>
</author>
<published>2022-09-23T13:17:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=11e38360cc15e2d5995449d19030b13a67d73cb3'/>
<id>urn:sha1:11e38360cc15e2d5995449d19030b13a67d73cb3</id>
<content type='text'>
The function amdgpu_fence_count_emitted used in work_hander should not call
amdgpu_fence_process which must be used in irq handler.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Jiadong.Zhu &lt;Jiadong.Zhu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd: Fix typo 'the the' in comment</title>
<updated>2022-07-25T13:31:05+00:00</updated>
<author>
<name>Slark Xiao</name>
<email>slark_xiao@163.com</email>
</author>
<published>2022-07-21T06:09:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9dd4545f65679042273a93054ec3bb665ecf7366'/>
<id>urn:sha1:9dd4545f65679042273a93054ec3bb665ecf7366</id>
<content type='text'>
Replace 'the the' with 'the' in the comment.

Signed-off-by: Slark Xiao &lt;slark_xiao@163.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: support reset flag set for gpu reset</title>
<updated>2022-07-13T15:25:17+00:00</updated>
<author>
<name>Likun Gao</name>
<email>Likun.Gao@amd.com</email>
</author>
<published>2022-07-08T03:14:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1549c09c520877be211d483d3c6f4e7f77d2588'/>
<id>urn:sha1:f1549c09c520877be211d483d3c6f4e7f77d2588</id>
<content type='text'>
Move reset_context out of gpu recover function to make it configurable
for different reset purpose.
For the reset way of call gpu_recovery sysfs, force to use full reset
method. Otherwise, try soft reset by default if the related ASIC
supportted, if soft reset failed, will use full reset.

Signed-off-by: Likun Gao &lt;Likun.Gao@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Follow up change to previous drm scheduler change.</title>
<updated>2022-06-28T15:24:41+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2022-06-20T21:33:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9ae55f030dc523fc4dc6069557e4a887ea815453'/>
<id>urn:sha1:9ae55f030dc523fc4dc6069557e4a887ea815453</id>
<content type='text'>
Align refcount behaviour for amdgpu_job embedded HW fence with
classic pointer style HW fences by increasing refcount each
time emit is called so amdgpu code doesn't need to make workarounds
using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.

Also since in the previous patch we resumed setting s_fence-&gt;parent to NULL
in drm_sched_stop switch to directly checking if job-&gt;hw_fence is
signaled to short circuit reset if already signed.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Tested-by: Yiqing Yao &lt;yiqing.yao@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Prevent race between late signaled fences and GPU reset.</title>
<updated>2022-06-28T15:24:24+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2022-06-18T04:28:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e225fb9e636b31b97e9d35324c2f9e43ee0aab4'/>
<id>urn:sha1:9e225fb9e636b31b97e9d35324c2f9e43ee0aab4</id>
<content type='text'>
Problem:
After we start handling timed out jobs we assume there fences won't be
signaled but we cannot be sure and sometimes they fire late. We need
to prevent concurrent accesses to fence array from
amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process
from a late EOP interrupt.

Fix:
Before accessing fence array in GPU disable EOP interrupt and flush
all pending interrupt handlers for amdgpu device's interrupt line.

v2: Switch from irq_get/put to full enable/disable_irq for amdgpu

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences</title>
<updated>2022-06-28T15:24:18+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2022-06-20T17:08:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dd70748eda3f63217d5284f48651239a9721245e'/>
<id>urn:sha1:dd70748eda3f63217d5284f48651239a9721245e</id>
<content type='text'>
This function should drop the fence refcount when it extracts the
fence from the fence array, just as it's done in amdgpu_fence_process.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
