<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu, branch v6.4.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.4.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.4.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-08-16T16:32:20+00:00</updated>
<entry>
<title>drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation</title>
<updated>2023-08-16T16:32:20+00:00</updated>
<author>
<name>Evan Quan</name>
<email>evan.quan@amd.com</email>
</author>
<published>2023-05-04T09:09:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=737d519e7b38b75f052f46708cfe7e6bccd733fe'/>
<id>urn:sha1:737d519e7b38b75f052f46708cfe7e6bccd733fe</id>
<content type='text'>
commit b75efe88b20c2be28b67e2821a794cc183e32374 upstream.

An intentional delay is added on soft ctf triggered. Then there will
be a double check for the GPU temperature before taking further
action. This can avoid unintended shutdown due to temperature
momentary fluctuation.

Signed-off-by: Evan Quan &lt;evan.quan@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
[ Hand-modified because XCP support added to amdgpu.h in kernel 6.5
  and is not necessary for this fix. ]
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1267
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2779
Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amd: Disable S/G for APUs when 64GB or more host memory</title>
<updated>2023-08-16T16:32:19+00:00</updated>
<author>
<name>Mario Limonciello</name>
<email>mario.limonciello@amd.com</email>
</author>
<published>2023-07-27T15:22:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=349a7b42bcd571670eed4f7e28caad1f355ae6cd'/>
<id>urn:sha1:349a7b42bcd571670eed4f7e28caad1f355ae6cd</id>
<content type='text'>
commit 08fffa74d9772d9538338be3f304006c94dde6f0 upstream.

Users report a white flickering screen on multiple systems that
is tied to having 64GB or more memory.  When S/G is enabled pages
will get pinned to both VRAM carve out and system RAM leading to
this.

Until it can be fixed properly, disable S/G when 64GB of memory or
more is detected.  This will force pages to be pinned into VRAM.
This should fix white screen flickers but if VRAM pressure is
encountered may lead to black screens.  It's a trade-off for now.

Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Cc: Hamza Mahfooz &lt;Hamza.Mahfooz@amd.com&gt;
Cc: Roman Li &lt;roman.li@amd.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 6.1.y: bf0207e172703 ("drm/amdgpu: add S/G display parameter")
Cc: &lt;stable@vger.kernel.org&gt; # 6.4.y
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2735
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix possible UAF in amdgpu_cs_pass1()</title>
<updated>2023-08-16T16:32:19+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2023-07-28T15:14:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e08e9dd09809b16f8f8cee8c466841b33d24ed96'/>
<id>urn:sha1:e08e9dd09809b16f8f8cee8c466841b33d24ed96</id>
<content type='text'>
commit 90e065677e0362a777b9db97ea21d43a39211399 upstream.

Since the gang_size check is outside of chunk parsing
loop, we need to reset i before we free the chunk data.

Suggested by Ye Zhang (@VAR10CK) of Baidu Security.

Reviewed-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Use apt name for FW reserved region</title>
<updated>2023-08-11T10:14:27+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2023-02-24T12:31:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=33f735ef8dfecdd8bf3c8544d1db15079353e920'/>
<id>urn:sha1:33f735ef8dfecdd8bf3c8544d1db15079353e920</id>
<content type='text'>
commit db3b5cb64a9ca301d14ed027e470834316720e42 upstream.

Use the generic term fw_reserved_memory for FW reserve region. This
region may also hold discovery TMR in addition to other reserve
regions. This region size could be larger than discovery tmr size, hence
don't change the discovery tmr size based on this.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Le Ma &lt;le.ma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
This change fixes reading IP discovery from debugfs.
It needed to be hand modified because GC 9.4.3 support isn't
introduced in older kernels until 228ce176434b ("drm/amdgpu: Handle
VRAM dependencies on GFXIP9.4.3")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2748
Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amd: Fix an error handling mistake in psp_sw_init()</title>
<updated>2023-08-03T08:25:58+00:00</updated>
<author>
<name>Mario Limonciello</name>
<email>mario.limonciello@amd.com</email>
</author>
<published>2023-07-13T05:14:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d3e51257ec317a44b24964090ad245f19c828a43'/>
<id>urn:sha1:d3e51257ec317a44b24964090ad245f19c828a43</id>
<content type='text'>
[ Upstream commit c01aebeef3ce45f696ffa0a1303cea9b34babb45 ]

If the second call to amdgpu_bo_create_kernel() fails, the memory
allocated from the first call should be cleared.  If the third call
fails, the memory from the second call should be cleared.

Fixes: b95b5391684b ("drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init")
Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amd: Move helper for dynamic speed switch check out of smu13</title>
<updated>2023-08-03T08:25:40+00:00</updated>
<author>
<name>Mario Limonciello</name>
<email>mario.limonciello@amd.com</email>
</author>
<published>2023-07-08T02:26:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d7a757d2c15ef26ebdac20bf6f10da8d94b7918'/>
<id>urn:sha1:3d7a757d2c15ef26ebdac20bf6f10da8d94b7918</id>
<content type='text'>
commit 188623076d0f1a500583d392b6187056bf7cc71a upstream.

This helper is used for checking if the connected host supports
the feature, it can be moved into generic code to be used by other
smu implementations as well.

Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Reviewed-by: Evan Quan &lt;evan.quan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel</title>
<updated>2023-07-27T06:56:38+00:00</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2023-07-06T07:57:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ac157c8a7ea5868c47da9b43de7c10708c31df2'/>
<id>urn:sha1:1ac157c8a7ea5868c47da9b43de7c10708c31df2</id>
<content type='text'>
commit b42ae87a7b3878afaf4c3852ca66c025a5b996e0 upstream.

In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.

do {
	xrandr --output Virtual --rotate inverted
	xrandr --output Virtual --rotate right
	xrandr --output Virtual --rotate left
	xrandr --output Virtual --rotate normal
} while (1);

NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable

It's caused by a stuck in lock dependency in such scenario on different
CPUs.

CPU1                                             CPU2
drm_crtc_vblank_off                              hrtimer_interrupt
    grab event_lock (irq disabled)                   __hrtimer_run_queues
        grab vbl_lock/vblank_time_block                  amdgpu_vkms_vblank_simulate
            amdgpu_vkms_disable_vblank                       drm_handle_vblank
                hrtimer_cancel                                         grab dev-&gt;event_lock

So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.

So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.

v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well

v3: drop warn printing (Christian)

v4: drop superfluous check of blank-&gt;enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)

Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable@vger.kernel.org
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: avoid restore process run into dead loop.</title>
<updated>2023-07-23T11:54:04+00:00</updated>
<author>
<name>gaba</name>
<email>gaba@amd.com</email>
</author>
<published>2023-03-03T00:03:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=478e83b96931669615dfdd7fecffab03bdf70a2b'/>
<id>urn:sha1:478e83b96931669615dfdd7fecffab03bdf70a2b</id>
<content type='text'>
commit 8a774fe912ff09e39c2d3a3589c729330113f388 upstream.

In restore process worker, pinned BO cause update PTE fail, then
the function re-schedule the restore_work. This will generate dead loop.

Signed-off-by: gaba &lt;gaba@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix clearing mappings for BOs that are always valid in VM</title>
<updated>2023-07-23T11:54:04+00:00</updated>
<author>
<name>Samuel Pitoiset</name>
<email>samuel.pitoiset@gmail.com</email>
</author>
<published>2023-06-16T13:14:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=733a1854db14792a47caabdd19f08694e90c1203'/>
<id>urn:sha1:733a1854db14792a47caabdd19f08694e90c1203</id>
<content type='text'>
commit ea2c3c08554601b051d91403a241266e1cf490a5 upstream.

Per VM BOs must be marked as moved or otherwise their ranges are not
updated on use which might be necessary when the replace operation
splits mappings.

This fixes random GPU hangs when replacing sparse mappings from the
userspace, while OP_MAP/OP_UNMAP works fine because always valid BOs
are correctly handled there.

Cc: stable@vger.kernel.org
Signed-off-by: Samuel Pitoiset &lt;samuel.pitoiset@gmail.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: check RAS irq existence for VCN/JPEG</title>
<updated>2023-07-19T14:37:02+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-07-07T15:07:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=14c6b21717f05e39aa2412a26762c1439e05e9cd'/>
<id>urn:sha1:14c6b21717f05e39aa2412a26762c1439e05e9cd</id>
<content type='text'>
commit 4ff96bcc0d40b66bf3ddd6010830e9a4f9b85d53 upstream

No RAS irq is allowed.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
