<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v6.1.2</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.2</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.2'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-12-31T12:33:03+00:00</updated>
<entry>
<title>drm/amdgpu: Fix potential double free and null pointer dereference</title>
<updated>2022-12-31T12:33:03+00:00</updated>
<author>
<name>Liang He</name>
<email>windhl@126.com</email>
</author>
<published>2022-11-22T04:28:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b9f8ed9b01c40167bf46f6c25391a875b41c60b5'/>
<id>urn:sha1:b9f8ed9b01c40167bf46f6c25391a875b41c60b5</id>
<content type='text'>
[ Upstream commit dfd0287bd3920e132a8dae2a0ec3d92eaff5f2dd ]

In amdgpu_get_xgmi_hive(), we should not call kfree() after
kobject_put() as the PUT will call kfree().

In amdgpu_device_ip_init(), we need to check the returned *hive*
which can be NULL before we dereference it.

Signed-off-by: Liang He &lt;windhl@126.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix pci device refcount leak</title>
<updated>2022-12-31T12:32:15+00:00</updated>
<author>
<name>Yang Yingliang</name>
<email>yangyingliang@huawei.com</email>
</author>
<published>2022-11-17T15:00:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d7352b410471cbebf6350b2990bae82bb0d59a76'/>
<id>urn:sha1:d7352b410471cbebf6350b2990bae82bb0d59a76</id>
<content type='text'>
[ Upstream commit b85e285e3d6352b02947fc1b72303673dfacb0aa ]

As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().

So before returning from amdgpu_device_resume|suspend_display_audio(),
pci_dev_put() is called to avoid refcount leak.

Fixes: 3f12acc8d6d4 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Reviewed-by: Evan Quan &lt;evan.quan@amd.com&gt;
Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: there is no vbios fb on devices with no display hw (v2)</title>
<updated>2022-11-15T18:24:40+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2022-11-11T17:50:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f8794f31abf33a3b22c72002783670a95e6efc51'/>
<id>urn:sha1:f8794f31abf33a3b22c72002783670a95e6efc51</id>
<content type='text'>
If we enable virtual display functionality on parts with
no display hardware we can end up trying to check for and
reserve the vbios FB area on devices where it doesn't exist.
Check if display hardware is actually present on the hardware
before trying to reserve the memory.

v2: move the check into common code

Acked-by: Evan Quan &lt;evan.quan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd: Fail the suspend if resources can't be evicted</title>
<updated>2022-11-02T21:00:39+00:00</updated>
<author>
<name>Mario Limonciello</name>
<email>mario.limonciello@amd.com</email>
</author>
<published>2022-10-26T19:03:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8d4de331f1b24a22d18e3c6116aa25228cf54854'/>
<id>urn:sha1:8d4de331f1b24a22d18e3c6116aa25228cf54854</id>
<content type='text'>
If a system does not have swap and memory is under 100% usage,
amdgpu will fail to evict resources.  Currently the suspend
carries on proceeding to reset the GPU:

```
[drm] evicting device resources failed
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block &lt;vcn_v3_0&gt; failed -12
[drm] free PSP TMR buffer
[TTM] Failed allocating page table
[drm] evicting device resources failed
amdgpu 0000:03:00.0: amdgpu: MODE1 reset
amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
```

At this point if the suspend actually succeeded I think that amdgpu
would have recovered because the GPU would have power cut off and
restored.  However the kernel fails to continue the suspend from the
memory pressure and amdgpu fails to run the "resume" from the aborted
suspend.

```
ACPI: PM: Preparing to enter system sleep state S3
SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
  cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
  node 0: slabs: 22, objs: 1122, free: 0
ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)

[drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
[drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block &lt;psp&gt; failed -62
amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
amdgpu 0000:03:00.0: PM: failed to resume async: error -62
```

To avoid this series of unfortunate events, fail amdgpu's suspend
when the memory eviction fails.  This will let the system gracefully
recover and the user can try suspend again when the memory pressure
is relieved.

Reported-by: post@davidak.de
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume</title>
<updated>2022-10-26T21:48:43+00:00</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2022-10-21T02:04:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d61e1d1d5225a9baeb995bcbdb904f66f70ed87e'/>
<id>urn:sha1:d61e1d1d5225a9baeb995bcbdb904f66f70ed87e</id>
<content type='text'>
In the S2idle suspend/resume phase the gfxoff is keeping functional so
some IP blocks will be likely to reinitialize at gfxoff entry and that
will result in failing to program GC registers.Therefore, let disallow
gfxoff until AMDGPU IPs reinitialized completely.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 5.15.x
</content>
</entry>
<entry>
<title>drm/amdgpu: skip mes self test for gc 11.0.3 in recover</title>
<updated>2022-10-24T18:44:03+00:00</updated>
<author>
<name>YuBiao Wang</name>
<email>YuBiao.Wang@amd.com</email>
</author>
<published>2022-10-19T03:36:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e105b6212f1f90c56c04439279d0ef0f8dd1c308'/>
<id>urn:sha1:e105b6212f1f90c56c04439279d0ef0f8dd1c308</id>
<content type='text'>
Temporary disable mes self teset for gc 11.0.3 during gpu_recovery.

Signed-off-by: YuBiao Wang &lt;YuBiao.Wang@amd.com&gt;
Acked-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/pm: disable cstate feature for gpu reset scenario</title>
<updated>2022-10-19T02:12:20+00:00</updated>
<author>
<name>Evan Quan</name>
<email>evan.quan@amd.com</email>
</author>
<published>2022-09-29T02:50:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3059cd8c5f797ad83d2b194ae66339f5c007ca43'/>
<id>urn:sha1:3059cd8c5f797ad83d2b194ae66339f5c007ca43</id>
<content type='text'>
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan &lt;evan.quan@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>Revert "drm/amdgpu: let mode2 reset fallback to default when failure"</title>
<updated>2022-10-19T02:08:33+00:00</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2022-10-13T03:06:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a340847b0214aa9b8fd9839f7b2822ccc607edab'/>
<id>urn:sha1:a340847b0214aa9b8fd9839f7b2822ccc607edab</id>
<content type='text'>
This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV</title>
<updated>2022-09-29T13:41:46+00:00</updated>
<author>
<name>Bokun Zhang</name>
<email>Bokun.Zhang@amd.com</email>
</author>
<published>2022-09-27T16:30:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d7274ec723cd0c2359ef75f84deca462a60c5025'/>
<id>urn:sha1:d7274ec723cd0c2359ef75f84deca462a60c5025</id>
<content type='text'>
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
  during the suspend time. Furthermore, we cannot request a
  mode 1 reset under SRIOV as VF. Therefore, we will skip it
  as it is called in suspend_noirq() function.

- In the resume code path, we need to send REQ_GPU_INIT to the
  hypervisor and also resume PSP IP block under SRIOV.

Signed-off-by: Bokun Zhang &lt;Bokun.Zhang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Use simplified API for p2p dist calc</title>
<updated>2022-09-29T13:41:42+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2022-09-21T12:24:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bb66ecbf122cc5ca52c569f0f84b5d1b2c00f6b9'/>
<id>urn:sha1:bb66ecbf122cc5ca52c569f0f84b5d1b2c00f6b9</id>
<content type='text'>
Use the simpified API that calculates distance between two devices.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
