kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu.h, branch v5.15.89

drm/amdgpu: Increase tlb flush timeout for sriov

2022-09-05T08:30:11+00:00

[ Upstream commit 373008bfc9cdb0f050258947fa5a095f0657e1bc ] [Why] During multi-vf executing benchmark (Luxmark) observed kiq error timeout. It happenes because all of VFs do the tlb invalidation at the same time. Although each VF has the invalidate register set, from hardware side the invalidate requests are queue to execute. [How] In case of 12 VF increase timeout on 12*100ms Signed-off-by: Dusica Milinkovic Acked-by: Shaoyun Liu Acked-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amd: Refactor `amdgpu_aspm` to be evaluated per device

2022-07-12T14:35:06+00:00

[ Upstream commit 0ab5d711ec74d9e60673900974806b7688857947 ] Evaluating `pcie_aspm_enabled` as part of driver probe has the implication that if one PCIe bridge with an AMD GPU connected doesn't support ASPM then none of them do. This is an invalid assumption as the PCIe core will configure ASPM for individual PCIe bridges. Create a new helper function that can be called by individual dGPUs to react to the `amdgpu_aspm` module parameter without having negative results for other dGPUs on the PCIe bus. Suggested-by: Lijo Lazar Reviewed-by: Lijo Lazar Signed-off-by: Mario Limonciello Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amd: Don't reset dGPUs if the system is going to s2idle

2022-05-25T07:57:28+00:00

commit 7123d39dc24dcd21ff23d75f46f926b15269b9da upstream. An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Reviewed-by: Alex Deucher Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amd: add support to check whether the system is set to s3

2022-02-23T11:03:07+00:00

[ Upstream commit f52a2b8badbd24faf73a13c9c07fdb9d07352944 ] This will be used to help make decisions on what to do in misconfigured systems. v2: squash in semicolon fix from Stephen Rothwell Signed-off-by: Mario Limonciello Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amd: Warn users about potential s0ix problems

2022-02-23T11:03:06+00:00

[ Upstream commit a6ed2035878e5ad2e43ed175d8812ac9399d6c40 ] On some OEM setups users can configure the BIOS for S3 or S2idle. When configured to S3 users can still choose 's2idle' in the kernel by using `/sys/power/mem_sleep`. Before commit 6dc8265f9803 ("drm/amdgpu: always reset the asic in suspend (v2)"), the GPU would crash. Now when configured this way, the system should resume but will use more power. As such, adjust the `amdpu_acpi_is_s0ix function` to warn users about potential power consumption issues during their first attempt at suspending. Reported-by: Bjoren Dasse Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1824 Reviewed-by: Alex Deucher Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: disable runpm if we are the primary adapter

2022-01-11T14:35:17+00:00

commit b95dc06af3e683d6b7ddbbae178b2b2a21ee8b2b upstream. If we are the primary adapter (i.e., the one used by the firwmare framebuffer), disable runtime pm. This fixes a regression caused by commit 55285e21f045 which results in the displays waking up shortly after they go to sleep due to the device coming out of runtime suspend and sending a hotplug uevent. v2: squash in reworked fix from Evan Fixes: 55285e21f045 ("fbdev/efifb: Release PCI device's runtime PM ref during FB destroy") Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215203 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1840 Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8"

2021-11-06T13:13:31+00:00

commit c8365dbda056578eebe164bf110816b1a39b4b7f upstream. This reverts commit 728e7e0cd61899208e924472b9e641dbeb0775c4. Further discussion reveals that this feature is severely broken and needs to be reverted ASAP. GPU reset can never be delayed by userspace even for debugging or otherwise we can run into in kernel deadlocks. Signed-off-by: Christian König Acked-by: Alex Deucher Acked-by: Nirmoy Das Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume

2021-10-05T17:02:31+00:00

In current code, when a PCI error state pci_channel_io_normal is detectd, it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI driver will continue the execution of PCI resume callback report_resume by pci_walk_bridge, and the callback will go into amdgpu_pci_resume finally, where write lock is releasd unconditionally without acquiring such lock first. In this case, a deadlock will happen when other threads start to acquire the read lock. To fix this, add a member in amdgpu_device strucutre to cache pci_channel_state, and only continue the execution in amdgpu_pci_resume when it's pci_channel_io_frozen. Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery") Suggested-by: Andrey Grodzovsky Signed-off-by: Guchun Chen Reviewed-by: Andrey Grodzovsky Signed-off-by: Alex Deucher

drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10

2021-09-14T20:14:41+00:00

Seems like newer cards can have even more instances now. Found by UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29 index 8 is out of range for type 'uint32_t *[8]' Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697 Cc: stable@vger.kernel.org Signed-off-by: Ernst Sjöstrand Signed-off-by: Alex Deucher

drm/amdgpu: Add driver infrastructure for MCA RAS

2021-08-24T19:36:18+00:00

Add MCA specific IP blocks targetting RAS features Reviewed-by: Hawking Zhang Signed-off-by: John Clements Signed-off-by: Alex Deucher