summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-03-30drm/amdgpu: reposition the gpu reset checking for reuseTim Huang2-20/+25
commit aaee0ce460b954e08b6e630d7e54b2abb672feb8 upstream. Move the amdgpu_acpi_should_gpu_reset out of CONFIG_SUSPEND to share it with hibernate case. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-30drm/amdgpu: skip ASIC reset for APUs when go to S4Tim Huang1-1/+4
commit b589626674de94d977e81c99bf7905872b991197 upstream. For GC IP v11.0.4/11, PSP TMR need to be reserved for ASIC mode2 reset. But for S4, when psp suspend, it will destroy the TMR that fails the ASIC reset. [ 96.006101] amdgpu 0000:62:00.0: amdgpu: MODE2 reset [ 100.409717] amdgpu 0000:62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000011 SMN_C2PMSG_82:0x00000002 [ 100.411593] amdgpu 0000:62:00.0: amdgpu: Mode2 reset failed! [ 100.412470] amdgpu 0000:62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62 [ 100.414020] amdgpu 0000:62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62 [ 100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs [ 100.416608] amdgpu 0000:62:00.0: PM: failed to freeze async: error -62 We can skip the reset on APUs, assuming we can resume them properly. Verified on some GFX11, GFX10 and old GFX9 APUs. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-30drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD NaviKai-Heng Feng4-17/+18
commit 2b072442f4962231a8516485012bb2d2551ef2fe upstream. S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default"). The root cause is still not clear for now. So extend and apply the ASPM quirk from commit e02fe3bc7aba ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to workaround the issue on Navi cards too. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-30drm/amd: Fix initialization mistake for NBIO 7.3.0Mario Limonciello1-5/+9
[ Upstream commit 1717cc5f2962a4652c76ed3858b499ccae6c277c ] The same strapping initialization issue that happened on NBIO 7.5.1 appears to be happening on NBIO 7.3.0. Apply the same fix to 7.3.0 as well. Note: This workaround relies upon the integrated GPU being enabled in BIOS. If the integrated GPU is disabled in BIOS a different workaround will be required. Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Cc: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30drm/amdgpu: Fix call trace warning and hang when removing amdgpu devicelyndonli1-1/+1
[ Upstream commit 93bb18d2a873d2fa9625c8ea927723660a868b95 ] On GPUs with RAS enabled, below call trace and hang are observed when shutting down device. v2: use DRM device unplugged flag instead of shutdown flag as the check to prevent memory wipe in shutdown stage. [ +0.000000] RIP: 0010:amdgpu_vram_mgr_fini+0x18d/0x1c0 [amdgpu] [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] amdgpu_ttm_fini+0x140/0x1c0 [amdgpu] [ +0.000183] amdgpu_bo_fini+0x27/0xa0 [amdgpu] [ +0.000184] gmc_v11_0_sw_fini+0x2b/0x40 [amdgpu] [ +0.000163] amdgpu_device_fini_sw+0xb6/0x510 [amdgpu] [ +0.000152] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] [ +0.000090] drm_dev_release+0x28/0x50 [drm] [ +0.000016] devm_drm_dev_init_release+0x38/0x60 [drm] [ +0.000011] devm_action_release+0x15/0x20 [ +0.000003] release_nodes+0x40/0xc0 [ +0.000001] devres_release_all+0x9e/0xe0 [ +0.000001] device_unbind_cleanup+0x12/0x80 [ +0.000003] device_release_driver_internal+0xff/0x160 [ +0.000001] driver_detach+0x4a/0x90 [ +0.000001] bus_remove_driver+0x6c/0xf0 [ +0.000001] driver_unregister+0x31/0x50 [ +0.000001] pci_unregister_driver+0x40/0x90 [ +0.000003] amdgpu_exit+0x15/0x120 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-22drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSesGuilherme G. Piccoli1-0/+19
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream. The VCN firmware loading path enables the indirect SRAM mode if it's advertised as supported. We might have some cases of FW issues that prevents this mode to working properly though, ending-up in a failed probe. An example below, observed in the Steam Deck: [...] [drm] failed to load ucode VCN0_RAM(0x3A) [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000) amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110) [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110 amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init [...] Disabling the VCN block circumvents this, but it's a very invasive workaround that turns off the entire feature. So, let's add a quirk on VCN loading that checks for known problematic BIOSes on Vangogh, so we can proactively disable the indirect SRAM mode and allow the HW proper probe and VCN IP block to work fine. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385 Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode") Cc: stable@vger.kernel.org Cc: James Zhu <James.Zhu@amd.com> Cc: Leo Liu <leo.liu@amd.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_finiHoratio Zhang1-3/+3
[ Upstream commit 23f4a2d29ba57bf88095f817de5809d427fcbe7e ] The call trace occurs when the amdgpu is removed after the mode1 reset. During mode1 reset, from suspend to resume, there is no need to reinitialize the ta firmware buffer which caused the bo pin_count increase redundantly. [ 489.885525] Call Trace: [ 489.885525] <TASK> [ 489.885526] amdttm_bo_put+0x34/0x50 [amdttm] [ 489.885529] amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu] [ 489.885620] psp_free_shared_bufs+0xb7/0x150 [amdgpu] [ 489.885720] psp_hw_fini+0xce/0x170 [amdgpu] [ 489.885815] amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu] [ 489.885960] ? blocking_notifier_chain_unregister+0x56/0xb0 [ 489.885962] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 489.886049] amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 489.886132] ? __pm_runtime_resume+0x60/0x90 [ 489.886134] pci_device_remove+0x3e/0xb0 [ 489.886135] __device_release_driver+0x1ab/0x2a0 [ 489.886137] driver_detach+0xf3/0x140 [ 489.886138] bus_remove_driver+0x6c/0xf0 [ 489.886140] driver_unregister+0x31/0x60 [ 489.886141] pci_unregister_driver+0x40/0x90 [ 489.886142] amdgpu_exit+0x15/0x451 [amdgpu] Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Signed-off-by: longlyao <Longlong.Yao@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17drm/amdgpu/soc21: Add video cap query support for VCN_4_0_4Veerabadhran Gopalakrishnan1-0/+1
[ Upstream commit 6ce2ea07c5ff0a8188eab0e5cd1f0e4899b36835 ] Added the video capability query support for VCN version 4_0_4 Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17drm/amdgpu/soc21: don't expose AV1 if VCN0 is harvestedAlex Deucher1-13/+48
[ Upstream commit a6de636eb04f146d23644dbbb7173e142452a9b7 ] Only VCN0 supports AV1. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 6ce2ea07c5ff ("drm/amdgpu/soc21: Add video cap query support for VCN_4_0_4") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17drm/amdgpu: fix error checking in amdgpu_read_mm_registers for nvAlex Deucher1-3/+4
commit b42fee5e0b44344cfe4c38e61341ee250362c83f upstream. Properly skip non-existent registers as well. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc21Alex Deucher1-3/+4
commit 2915e43a033a778816fa4bc621f033576796521e upstream. Properly skip non-existent registers as well. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc15Alex Deucher1-2/+3
commit 0dcdf8498eae2727bb33cef3576991dc841d4343 upstream. Properly skip non-existent registers as well. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10drm/amd: Fix initialization for nbio 7.5.1Mario Limonciello1-0/+5
commit 65a24000808f70ac69bd2a96381fa0c7341f20c0 upstream. A mistake has been made in the BIOS for some ASICs with NBIO 7.5.1 where some NBIO registers aren't properly setup. Ensure that they're set during initialization. Tested-by: Richard Gong <richard.gong@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10Revert "drm/amdgpu: TA unload messages are not actually sent to psp when ↵Vitaly Prosyak2-3/+4
amdgpu is uninstalled" [ Upstream commit 39934d3ed5725c5e3570ed1b67f612f1ea60ce03 ] This reverts commit fac53471d0ea9693d314aa2df08d62b2e7e3a0f8. The following change: move the drm_dev_unplug call after amdgpu_driver_unload_kms in amdgpu_pci_remove. The reason is the following: amdgpu_pci_remove calls drm_dev_unregister and it should be called first to ensure userspace can't access the device instance anymore. If we call drm_dev_unplug after amdgpu_driver_unload_kms then we observe IGT PCI software unplug test failure (kernel hung) for all ASICs. This is how this regression was found. After this revert, the following commands do work not, but it would be fixed in the next commit: - sudo modprobe -r amdgpu - sudo modprobe amdgpu Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10drm/amdkfd: Page aligned memory reserve sizePhilip Yang2-6/+8
[ Upstream commit 0c2dece8fb541ab07b68c3312a1065fa9c927a81 ] Use page aligned size to reserve memory usage because page aligned TTM BO size is used to unreserve memory usage, otherwise no page aligned size causes memory usage accounting unbalanced. Change vram_used definition type to int64_t to be able to trigger WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the accounting issue with warning and backtrace. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10drm/amd: Avoid BUG() for case of SRIOV missing IP versionMario Limonciello1-1/+1
[ Upstream commit 93fec4f8c158584065134b4d45e875499bf517c8 ] No need to crash the kernel. AMDGPU will now fail to probe. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10drm/amdgpu: Use the sched from entity for amdgpu_cs traceLeo Liu1-2/+2
[ Upstream commit cf22ef78f22ce4df4757472c5dbd33c430c5b659 ] The problem is that base sched hasn't been assigned yet at this moment, causing something like "ring=0" all the time from trace. mpv:cs0-3473 [002] ..... 129.047431: amdgpu_cs: ring=0, dw=48, fences=0 mpv:cs0-3473 [002] ..... 129.089125: amdgpu_cs: ring=0, dw=48, fences=0 mpv:cs0-3473 [002] ..... 129.130987: amdgpu_cs: ring=0, dw=48, fences=0 mpv:cs0-3473 [002] ..... 129.172478: amdgpu_cs: ring=0, dw=48, fences=0 Fixes: 4624459c84d7 ("drm/amdgpu: add gang submit frontend v6") Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-16drm/amd/amdgpu: fix warning during suspendJack Xiao2-1/+4
Freeing memory was warned during suspend. Move the self test out of suspend. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825 Cc: jfalempe@redhat.com Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com> Tested-by: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-02-10Merge tag 'amd-drm-fixes-6.2-2023-02-09' of ↵Dave Airlie2-0/+12
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.2-2023-02-09: amdgpu: - Add a parameter to disable S/G display - Re-enable S/G display on all DCNs Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230209174504.7577-1-alexander.deucher@amd.com
2023-02-10Merge tag 'drm-misc-fixes-2023-02-09' of ↵Dave Airlie1-1/+4
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A fix for a circular refcounting in drm/client, one for a memory leak in amdgpu and a virtio fence fix when interrupted Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230209083600.7hi6roht6xxgldgz@houat
2023-02-09drm/amdgpu: add S/G display parameterAlex Deucher2-0/+12
Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). Add a option to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. v2: fix typo Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-09amd/amdgpu: remove test ib on hw ringJesseZhang2-2/+1
test ib function is not necessary on hw ring, so remove it. v2: squash in NULL check fix Signed-off-by: JesseZhang <Jesse.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-09drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/finiGuilherme G. Piccoli1-1/+7
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully. Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops: amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: <TASK> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...] To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part. Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1]. [0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ [1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/ Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Suggested-by: Christian König <christian.koenig@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-02-09drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptesFriedrich Vock1-1/+1
The pid field corresponds to the result of gettid() in userspace. However, userspace cannot reliably attribute PTE events to processes with just the thread id. This patch allows userspace to easily attribute PTE update events to specific processes by comparing this field with the result of getpid(). For attributing events to specific threads, the thread id is also contained in the common fields of each trace event. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-02-09drm/amd/amdgpu: enable athub cg 11.0.3Kenneth Feng1-1/+3
enable athub cg on gc 11.0.3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-03drm/amdgpu: fix memory leak in amdgpu_cs_sync_ringsBert Karwatzki1-1/+4
amdgpu_sync_get_fence deletes the returned fence from the syncobj, so the refcount of fence needs to lowered to avoid a memory leak. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2360 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/3b590ba0f11d24b8c6c39c3d38250129c1116af4.camel@web.de
2023-02-02drm/amd: Fix initialization for nbio 4.3.0Mario Limonciello1-1/+7
A mistake has been made on some boards with NBIO 4.3.0 where some NBIO registers aren't properly set by the hardware. Ensure that they're set during initialization. Cc: Natikar Basavaraj <Basavaraj.Natikar@amd.com> Tested-by: Satyanarayana ReddyTVN <Satyanarayana.ReddyTVN@amd.com> Tested-by: Rutvij Gajjar <Rutvij.Gajjar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-02-02drm/amdgpu: enable HDP SD for gfx 11.0.3Evan Quan1-1/+2
Enable HDP clock gating control for gfx 11.0.3. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-02drm/amdgpu: update wave data type to 3 for gfx11Graham Sider1-2/+2
SQ_WAVE_INST_DW0 isn't present on gfx11 compared to gfx10, so update wave data type to signify a difference. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-25drm/amdgpu: declare firmware for new MES 11.0.4Li Ma1-0/+2
To support new mes ip block Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu: enable imu firmware for GC 11.0.4Li Ma1-0/+1
The GC 11.0.4 needs load IMU to power up GFX before loads GFX firmware. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu: remove unconditional trap enable on add gfx11 queuesJonathan Kim1-1/+0
Rebase of driver has incorrect unconditional trap enablement for GFX11 when adding mes queues. Reported-by: Graham Sider <graham.sider@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Graham Sider <graham.sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-19drm/amdgpu: allow multipipe policy on ASICs with one MECLang Yu1-0/+3
Always enable multipipe policy on ASICs with GC VERSION > 9.0.0 instead of MEC number > 1. This will allow multipipe policy on ASICs with one MEC, e.g., gfx11 APUs. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-19drm/amdgpu: correct MEC number for gfx11 APUsLang Yu1-2/+9
There is only one MEC on these APUs. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-19drm/amdgpu: fix amdgpu_job_free_resources v2Christian König1-2/+8
It can be that neither fence were initialized when we run out of UVD streams for example. v2: fix typo breaking compile Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2324 Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-19drm/amdgpu: fix cleaning up reserved VMID on releaseChristian König1-0/+1
We need to reset this or otherwise run into list corruption later on. Fixes: e44a0fe630c5 ("drm/amdgpu: rework reserved VMID handling") Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Candice Li <candice.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-13Merge tag 'amd-drm-fixes-6.2-2023-01-11' of ↵Dave Airlie5-10/+13
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.2-2023-01-11: amdgpu: - SMU13 fan speed fix - SMU13 fix power cap handling - SMU13 BACO fix - Fix a possible segfault in bo validation error case - Delay removal of firmware framebuffer - Fix error when unloading amdkfd: - SVM fix when clearing vram - GC11 fix for multi-GPU Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230112033004.8184-1-alexander.deucher@amd.com
2023-01-13Merge tag 'drm-misc-fixes-2023-01-12' of ↵Dave Airlie2-18/+37
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Several fixes for amdgpu (all addressing issues with fences), yet another orientation quirk for a Lenovo device, a use-after-free fix for virtio, a regression fix in TTM and a performance regression in drm buddy. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230112130954.pxt77g3a7rokha42@houat
2023-01-11drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPUEric Huang1-1/+1
The point bo->kfd_bo is NULL for queue's write pointer BO when creating queue on mGPU. To avoid using the pointer fixes the error. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10drm/amdgpu: fix pipeline sync v2Christian König1-16/+30
This fixes a potential memory leak of dma_fence objects in the CS code as well as glitches in firefox because of missing pipeline sync. v2: use the scheduler instead of the fence context Signed-off-by: Christian König <christian.koenig@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2323 Tested-by: Michal Kubecek mkubecek@suse.cz Tested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230109130120.73389-1-christian.koenig@amd.com
2023-01-10drm/amdgpu: Fixed bug on error when unloading amdgpuYiPeng Chai1-1/+1
Fixed bug on error when unloading amdgpu. The error message is as follows: [ 377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278! [ 377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G IOE 6.0.0-thomas #1 [ 377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021 [ 377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy] [ 377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53 [ 377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287 [ 377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000 [ 377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70 [ 377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001 [ 377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70 [ 377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70 [ 377.706325] FS: 00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000 [ 377.706334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0 [ 377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 377.706361] Call Trace: [ 377.706365] <TASK> [ 377.706369] drm_buddy_free_list+0x2a/0x60 [drm_buddy] [ 377.706376] amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu] [ 377.706572] amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu] [ 377.706650] amdgpu_bo_fini+0x22/0x90 [amdgpu] [ 377.706727] gmc_v11_0_sw_fini+0x26/0x30 [amdgpu] [ 377.706821] amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu] [ 377.706897] amdgpu_driver_release_kms+0x12/0x30 [amdgpu] [ 377.706975] drm_dev_release+0x20/0x40 [drm] [ 377.707006] release_nodes+0x35/0xb0 [ 377.707014] devres_release_all+0x8b/0xc0 [ 377.707020] device_unbind_cleanup+0xe/0x70 [ 377.707027] device_release_driver_internal+0xee/0x160 [ 377.707033] driver_detach+0x44/0x90 [ 377.707039] bus_remove_driver+0x55/0xe0 [ 377.707045] pci_unregister_driver+0x3b/0x90 [ 377.707052] amdgpu_exit+0x11/0x6c [amdgpu] [ 377.707194] __x64_sys_delete_module+0x142/0x2b0 [ 377.707201] ? fpregs_assert_state_consistent+0x22/0x50 [ 377.707208] ? exit_to_user_mode_prepare+0x3e/0x190 [ 377.707215] do_syscall_64+0x38/0x90 [ 377.707221] entry_SYSCALL_64_after_hwframe+0x63/0xcd Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-01-10drm/amd: Delay removal of the firmware framebufferMario Limonciello2-6/+8
Removing the firmware framebuffer from the driver means that even if the driver doesn't support the IP blocks in a GPU it will no longer be functional after the driver fails to initialize. This change will ensure that unsupported IP blocks at least cause the driver to work with the EFI framebuffer. Cc: stable@vger.kernel.org Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10drm/amdgpu: Fix potential NULL dereferenceLuben Tuikov1-2/+3
Fix potential NULL dereference, in the case when "man", the resource manager might be NULL, when/if we print debug information. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Fixes: 7554886daa31ea ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-05drm/amdgpu: fix missing dma_fence_put in error pathChristian König1-1/+3
When the fence can't be added we need to drop the reference. Suggested-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-2-christian.koenig@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
2023-01-05drm/amdgpu: fix another missing fence reference in the CS codeChristian König1-1/+4
drm_sched_job_add_dependency() consumes the references of the gang members. Only triggered by mesh shaders. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 1728baa7e4e6 ("drm/amdgpu: use scheduler dependencies for CS") Tested-by: Mike Lothian <mike@fireburn.co.uk> Tested-by: Bert Karwatzki <spasswolf@web.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-1-christian.koenig@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
2023-01-05Revert "drm/amd/display: Enable Freesync Video Mode by default"Michel Dänzer2-0/+28
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-12-23Merge tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds22-234/+355
Pull drm fixes from Dave Airlie: "Holiday fixes! Two batches from amd, and one group of i915 changes. amdgpu: - Spelling fix - BO pin fix - Properly handle polaris 10/11 overlap asics - GMC9 fix - SR-IOV suspend fix - DCN 3.1.4 fix - KFD userptr locking fix - SMU13.x fixes - GDS/GWS/OA handling fix - Reserved VMID handling fixes - FRU EEPROM fix - BO validation fixes - Avoid large variable on the stack - S0ix fixes - SMU 13.x fixes - VCN fix - Add missing fence reference amdkfd: - Fix init vm error handling - Fix double release of compute pasid i915 - Documentation fixes - OA-perf related fix - VLV/CHV HDMI/DP audio fix - Display DDI/Transcoder fix - Migrate fixes" * tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drm: (39 commits) drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency drm/amdgpu: enable VCN DPG for GC IP v11.0.4 drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0 drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34 drm/amdgpu: skip MES for S0ix as well since it's part of GFX drm/amd/pm: avoid large variable on kernel stack drm/amdkfd: Fix double release compute pasid drm/amdkfd: Fix kfd_process_device_init_vm error handling drm/amd/pm: update SMU13.0.0 reported maximum shader clock drm/amd/pm: correct SMU13.0.0 pstate profiling clock settings drm/amd/pm: enable GPO dynamic control support for SMU13.0.7 drm/amd/pm: enable GPO dynamic control support for SMU13.0.0 drm/amdgpu: revert "generally allow over-commit during BO allocation" drm/amdgpu: Remove unnecessary domain argument drm/amdgpu: Fix size validation for non-exclusive domains (v4) drm/amdgpu: Check if fru_addr is not NULL (v2) drm/i915/ttm: consider CCS for backup objects drm/i915/migrate: fix corner case in CCS aux copying drm/amdgpu: rework reserved VMID handling ...
2022-12-21drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependencyChristian König1-0/+2
That function consumes the reference. Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: aab9cf7b6954 ("drm/amdgpu: use scheduler dependencies for VM updates") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-21drm/amdgpu: enable VCN DPG for GC IP v11.0.4Saleemkhan Jamadar1-0/+1
Enable VCN Dynamic Power Gating control for GC IP v11.0.4. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0Tim Huang1-1/+2
MES is part of gfxoff and MES suspend and resume are skipped for S0i3. But the mes_self_test call path is still in the amdgpu_device_ip_late_init. it's should also be skipped for s0ix as no hardware re-initialization happened. Besides, mes_self_test will free the BO that triggers a lot of warning messages while in the suspend state. [ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu] [ 81.679435] Call Trace: [ 81.679726] <TASK> [ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu] [ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu] [ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu] [ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu] [ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu] [ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu] [ 81.684818] pci_pm_resume+0x5c/0xa0 [ 81.685247] ? pci_pm_thaw+0x90/0x90 [ 81.685658] dpm_run_callback+0x4e/0x160 [ 81.686110] device_resume+0xad/0x210 [ 81.686529] async_resume+0x1e/0x40 [ 81.686931] async_run_entry_fn+0x33/0x120 [ 81.687405] process_one_work+0x21d/0x3f0 [ 81.687869] worker_thread+0x4a/0x3c0 [ 81.688293] ? process_one_work+0x3f0/0x3f0 [ 81.688777] kthread+0xff/0x130 [ 81.689157] ? kthread_complete_and_exit+0x20/0x20 [ 81.689707] ret_from_fork+0x22/0x30 [ 81.690118] </TASK> [ 81.690380] ---[ end trace 0000000000000000 ]--- v2: make the comment clean and use adev->in_s0ix instead of adev->suspend Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0, 6.1