Age | Commit message (Collapse) | Author | Files | Lines |
|
commit aaee0ce460b954e08b6e630d7e54b2abb672feb8 upstream.
Move the amdgpu_acpi_should_gpu_reset out of
CONFIG_SUSPEND to share it with hibernate case.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b589626674de94d977e81c99bf7905872b991197 upstream.
For GC IP v11.0.4/11, PSP TMR need to be reserved
for ASIC mode2 reset. But for S4, when psp suspend,
it will destroy the TMR that fails the ASIC reset.
[ 96.006101] amdgpu 0000:62:00.0: amdgpu: MODE2 reset
[ 100.409717] amdgpu 0000:62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000011 SMN_C2PMSG_82:0x00000002
[ 100.411593] amdgpu 0000:62:00.0: amdgpu: Mode2 reset failed!
[ 100.412470] amdgpu 0000:62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62
[ 100.414020] amdgpu 0000:62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62
[ 100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs
[ 100.416608] amdgpu 0000:62:00.0: PM: failed to freeze async: error -62
We can skip the reset on APUs, assuming we can resume them
properly. Verified on some GFX11, GFX10 and old GFX9 APUs.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2b072442f4962231a8516485012bb2d2551ef2fe upstream.
S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is
caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default").
The root cause is still not clear for now.
So extend and apply the ASPM quirk from commit e02fe3bc7aba
("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to
workaround the issue on Navi cards too.
Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1717cc5f2962a4652c76ed3858b499ccae6c277c ]
The same strapping initialization issue that happened on NBIO 7.5.1
appears to be happening on NBIO 7.3.0.
Apply the same fix to 7.3.0 as well.
Note: This workaround relies upon the integrated GPU being enabled
in BIOS. If the integrated GPU is disabled in BIOS a different
workaround will be required.
Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 93bb18d2a873d2fa9625c8ea927723660a868b95 ]
On GPUs with RAS enabled, below call trace and hang are observed when
shutting down device.
v2: use DRM device unplugged flag instead of shutdown flag as the check to
prevent memory wipe in shutdown stage.
[ +0.000000] RIP: 0010:amdgpu_vram_mgr_fini+0x18d/0x1c0 [amdgpu]
[ +0.000001] PKRU: 55555554
[ +0.000001] Call Trace:
[ +0.000001] <TASK>
[ +0.000002] amdgpu_ttm_fini+0x140/0x1c0 [amdgpu]
[ +0.000183] amdgpu_bo_fini+0x27/0xa0 [amdgpu]
[ +0.000184] gmc_v11_0_sw_fini+0x2b/0x40 [amdgpu]
[ +0.000163] amdgpu_device_fini_sw+0xb6/0x510 [amdgpu]
[ +0.000152] amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
[ +0.000090] drm_dev_release+0x28/0x50 [drm]
[ +0.000016] devm_drm_dev_init_release+0x38/0x60 [drm]
[ +0.000011] devm_action_release+0x15/0x20
[ +0.000003] release_nodes+0x40/0xc0
[ +0.000001] devres_release_all+0x9e/0xe0
[ +0.000001] device_unbind_cleanup+0x12/0x80
[ +0.000003] device_release_driver_internal+0xff/0x160
[ +0.000001] driver_detach+0x4a/0x90
[ +0.000001] bus_remove_driver+0x6c/0xf0
[ +0.000001] driver_unregister+0x31/0x50
[ +0.000001] pci_unregister_driver+0x40/0x90
[ +0.000003] amdgpu_exit+0x15/0x120 [amdgpu]
Signed-off-by: lyndonli <Lyndon.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:
[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]
Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: stable@vger.kernel.org
Cc: James Zhu <James.Zhu@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 23f4a2d29ba57bf88095f817de5809d427fcbe7e ]
The call trace occurs when the amdgpu is removed after
the mode1 reset. During mode1 reset, from suspend to resume,
there is no need to reinitialize the ta firmware buffer
which caused the bo pin_count increase redundantly.
[ 489.885525] Call Trace:
[ 489.885525] <TASK>
[ 489.885526] amdttm_bo_put+0x34/0x50 [amdttm]
[ 489.885529] amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu]
[ 489.885620] psp_free_shared_bufs+0xb7/0x150 [amdgpu]
[ 489.885720] psp_hw_fini+0xce/0x170 [amdgpu]
[ 489.885815] amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu]
[ 489.885960] ? blocking_notifier_chain_unregister+0x56/0xb0
[ 489.885962] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[ 489.886049] amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[ 489.886132] ? __pm_runtime_resume+0x60/0x90
[ 489.886134] pci_device_remove+0x3e/0xb0
[ 489.886135] __device_release_driver+0x1ab/0x2a0
[ 489.886137] driver_detach+0xf3/0x140
[ 489.886138] bus_remove_driver+0x6c/0xf0
[ 489.886140] driver_unregister+0x31/0x60
[ 489.886141] pci_unregister_driver+0x40/0x90
[ 489.886142] amdgpu_exit+0x15/0x451 [amdgpu]
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6ce2ea07c5ff0a8188eab0e5cd1f0e4899b36835 ]
Added the video capability query support for VCN version 4_0_4
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a6de636eb04f146d23644dbbb7173e142452a9b7 ]
Only VCN0 supports AV1.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 6ce2ea07c5ff ("drm/amdgpu/soc21: Add video cap query support for VCN_4_0_4")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b42fee5e0b44344cfe4c38e61341ee250362c83f upstream.
Properly skip non-existent registers as well.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2915e43a033a778816fa4bc621f033576796521e upstream.
Properly skip non-existent registers as well.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0dcdf8498eae2727bb33cef3576991dc841d4343 upstream.
Properly skip non-existent registers as well.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 65a24000808f70ac69bd2a96381fa0c7341f20c0 upstream.
A mistake has been made in the BIOS for some ASICs with NBIO 7.5.1
where some NBIO registers aren't properly setup.
Ensure that they're set during initialization.
Tested-by: Richard Gong <richard.gong@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
amdgpu is uninstalled"
[ Upstream commit 39934d3ed5725c5e3570ed1b67f612f1ea60ce03 ]
This reverts commit fac53471d0ea9693d314aa2df08d62b2e7e3a0f8.
The following change: move the drm_dev_unplug call after
amdgpu_driver_unload_kms in amdgpu_pci_remove. The reason is
the following: amdgpu_pci_remove calls drm_dev_unregister
and it should be called first to ensure userspace can't access the
device instance anymore. If we call drm_dev_unplug after
amdgpu_driver_unload_kms then we observe IGT PCI software unplug
test failure (kernel hung) for all ASICs. This is how this
regression was found.
After this revert, the following commands do work not, but it would
be fixed in the next commit:
- sudo modprobe -r amdgpu
- sudo modprobe amdgpu
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0c2dece8fb541ab07b68c3312a1065fa9c927a81 ]
Use page aligned size to reserve memory usage because page aligned TTM
BO size is used to unreserve memory usage, otherwise no page aligned
size causes memory usage accounting unbalanced.
Change vram_used definition type to int64_t to be able to trigger
WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the
accounting issue with warning and backtrace.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 93fec4f8c158584065134b4d45e875499bf517c8 ]
No need to crash the kernel. AMDGPU will now fail to probe.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cf22ef78f22ce4df4757472c5dbd33c430c5b659 ]
The problem is that base sched hasn't been assigned yet at this moment,
causing something like "ring=0" all the time from trace.
mpv:cs0-3473 [002] ..... 129.047431: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473 [002] ..... 129.089125: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473 [002] ..... 129.130987: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473 [002] ..... 129.172478: amdgpu_cs: ring=0, dw=48, fences=0
Fixes: 4624459c84d7 ("drm/amdgpu: add gang submit frontend v6")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Freeing memory was warned during suspend.
Move the self test out of suspend.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825
Cc: jfalempe@redhat.com
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Tested-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.2-2023-02-09:
amdgpu:
- Add a parameter to disable S/G display
- Re-enable S/G display on all DCNs
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230209174504.7577-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A fix for a circular refcounting in drm/client, one for a memory leak in
amdgpu and a virtio fence fix when interrupted
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230209083600.7hi6roht6xxgldgz@houat
|
|
Some users have reported flickerng with S/G display. We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to. We
disabled S/G display on a number of platforms to address this
but that leads to failure to pin framebuffers errors and
blank displays when there is memory pressure or no displays
at all on systems with limited carveout (e.g., Chromebooks).
Add a option to disable this as a debugging option as a
way for users to disable this, depending on their use case,
and for us to help debug this further.
v2: fix typo
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
test ib function is not necessary on hw ring,
so remove it.
v2: squash in NULL check fix
Signed-off-by: JesseZhang <Jesse.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.
Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:
amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
<TASK>
amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
devm_drm_dev_init_release+0x49/0x70
[...]
To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.
Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/
Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
The pid field corresponds to the result of gettid() in userspace.
However, userspace cannot reliably attribute PTE events to processes
with just the thread id. This patch allows userspace to easily
attribute PTE update events to specific processes by comparing this
field with the result of getpid().
For attributing events to specific threads, the thread id is also
contained in the common fields of each trace event.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
enable athub cg on gc 11.0.3
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_sync_get_fence deletes the returned fence from the
syncobj, so the refcount of fence needs to lowered to avoid
a memory leak.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2360
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/3b590ba0f11d24b8c6c39c3d38250129c1116af4.camel@web.de
|
|
A mistake has been made on some boards with NBIO 4.3.0 where some
NBIO registers aren't properly set by the hardware.
Ensure that they're set during initialization.
Cc: Natikar Basavaraj <Basavaraj.Natikar@amd.com>
Tested-by: Satyanarayana ReddyTVN <Satyanarayana.ReddyTVN@amd.com>
Tested-by: Rutvij Gajjar <Rutvij.Gajjar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
|
|
Enable HDP clock gating control for gfx 11.0.3.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SQ_WAVE_INST_DW0 isn't present on gfx11 compared to gfx10, so update
wave data type to signify a difference.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
|
|
To support new mes ip block
Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The GC 11.0.4 needs load IMU to power up GFX before loads GFX firmware.
Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Rebase of driver has incorrect unconditional trap enablement
for GFX11 when adding mes queues.
Reported-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
|
|
Always enable multipipe policy on ASICs with GC VERSION > 9.0.0
instead of MEC number > 1.
This will allow multipipe policy on ASICs with one MEC,
e.g., gfx11 APUs.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
|
|
There is only one MEC on these APUs.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
|
|
It can be that neither fence were initialized when we run out of UVD
streams for example.
v2: fix typo breaking compile
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2324
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
|
|
We need to reset this or otherwise run into list corruption later on.
Fixes: e44a0fe630c5 ("drm/amdgpu: rework reserved VMID handling")
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.2-2023-01-11:
amdgpu:
- SMU13 fan speed fix
- SMU13 fix power cap handling
- SMU13 BACO fix
- Fix a possible segfault in bo validation error case
- Delay removal of firmware framebuffer
- Fix error when unloading
amdkfd:
- SVM fix when clearing vram
- GC11 fix for multi-GPU
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230112033004.8184-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Several fixes for amdgpu (all addressing issues with fences), yet
another orientation quirk for a Lenovo device, a use-after-free fix for
virtio, a regression fix in TTM and a performance regression in drm
buddy.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230112130954.pxt77g3a7rokha42@houat
|
|
The point bo->kfd_bo is NULL for queue's write pointer BO
when creating queue on mGPU. To avoid using the pointer
fixes the error.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This fixes a potential memory leak of dma_fence objects in the CS code
as well as glitches in firefox because of missing pipeline sync.
v2: use the scheduler instead of the fence context
Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2323
Tested-by: Michal Kubecek mkubecek@suse.cz
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230109130120.73389-1-christian.koenig@amd.com
|
|
Fixed bug on error when unloading amdgpu.
The error message is as follows:
[ 377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
[ 377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G IOE 6.0.0-thomas #1
[ 377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[ 377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
[ 377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
[ 377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287
[ 377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000
[ 377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70
[ 377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001
[ 377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70
[ 377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70
[ 377.706325] FS: 00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000
[ 377.706334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0
[ 377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 377.706361] Call Trace:
[ 377.706365] <TASK>
[ 377.706369] drm_buddy_free_list+0x2a/0x60 [drm_buddy]
[ 377.706376] amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
[ 377.706572] amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu]
[ 377.706650] amdgpu_bo_fini+0x22/0x90 [amdgpu]
[ 377.706727] gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
[ 377.706821] amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu]
[ 377.706897] amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
[ 377.706975] drm_dev_release+0x20/0x40 [drm]
[ 377.707006] release_nodes+0x35/0xb0
[ 377.707014] devres_release_all+0x8b/0xc0
[ 377.707020] device_unbind_cleanup+0xe/0x70
[ 377.707027] device_release_driver_internal+0xee/0x160
[ 377.707033] driver_detach+0x44/0x90
[ 377.707039] bus_remove_driver+0x55/0xe0
[ 377.707045] pci_unregister_driver+0x3b/0x90
[ 377.707052] amdgpu_exit+0x11/0x6c [amdgpu]
[ 377.707194] __x64_sys_delete_module+0x142/0x2b0
[ 377.707201] ? fpregs_assert_state_consistent+0x22/0x50
[ 377.707208] ? exit_to_user_mode_prepare+0x3e/0x190
[ 377.707215] do_syscall_64+0x38/0x90
[ 377.707221] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Removing the firmware framebuffer from the driver means that even
if the driver doesn't support the IP blocks in a GPU it will no
longer be functional after the driver fails to initialize.
This change will ensure that unsupported IP blocks at least cause
the driver to work with the EFI framebuffer.
Cc: stable@vger.kernel.org
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix potential NULL dereference, in the case when "man", the resource manager
might be NULL, when/if we print debug information.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Fixes: 7554886daa31ea ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When the fence can't be added we need to drop the reference.
Suggested-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-2-christian.koenig@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
|
|
drm_sched_job_add_dependency() consumes the references of the gang
members. Only triggered by mesh shaders.
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 1728baa7e4e6 ("drm/amdgpu: use scheduler dependencies for CS")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Bert Karwatzki <spasswolf@web.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-1-christian.koenig@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
|
|
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad.
The bug referenced below was bisected to this commit. There has been no
activity toward fixing it in 3 months, so let's revert for now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Pull drm fixes from Dave Airlie:
"Holiday fixes!
Two batches from amd, and one group of i915 changes.
amdgpu:
- Spelling fix
- BO pin fix
- Properly handle polaris 10/11 overlap asics
- GMC9 fix
- SR-IOV suspend fix
- DCN 3.1.4 fix
- KFD userptr locking fix
- SMU13.x fixes
- GDS/GWS/OA handling fix
- Reserved VMID handling fixes
- FRU EEPROM fix
- BO validation fixes
- Avoid large variable on the stack
- S0ix fixes
- SMU 13.x fixes
- VCN fix
- Add missing fence reference
amdkfd:
- Fix init vm error handling
- Fix double release of compute pasid
i915
- Documentation fixes
- OA-perf related fix
- VLV/CHV HDMI/DP audio fix
- Display DDI/Transcoder fix
- Migrate fixes"
* tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drm: (39 commits)
drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency
drm/amdgpu: enable VCN DPG for GC IP v11.0.4
drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0
drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics
drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34
drm/amdgpu: skip MES for S0ix as well since it's part of GFX
drm/amd/pm: avoid large variable on kernel stack
drm/amdkfd: Fix double release compute pasid
drm/amdkfd: Fix kfd_process_device_init_vm error handling
drm/amd/pm: update SMU13.0.0 reported maximum shader clock
drm/amd/pm: correct SMU13.0.0 pstate profiling clock settings
drm/amd/pm: enable GPO dynamic control support for SMU13.0.7
drm/amd/pm: enable GPO dynamic control support for SMU13.0.0
drm/amdgpu: revert "generally allow over-commit during BO allocation"
drm/amdgpu: Remove unnecessary domain argument
drm/amdgpu: Fix size validation for non-exclusive domains (v4)
drm/amdgpu: Check if fru_addr is not NULL (v2)
drm/i915/ttm: consider CCS for backup objects
drm/i915/migrate: fix corner case in CCS aux copying
drm/amdgpu: rework reserved VMID handling
...
|
|
That function consumes the reference.
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: aab9cf7b6954 ("drm/amdgpu: use scheduler dependencies for VM updates")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable VCN Dynamic Power Gating control for GC IP v11.0.4.
Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
|
|
MES is part of gfxoff and MES suspend and resume are skipped for S0i3.
But the mes_self_test call path is still in the amdgpu_device_ip_late_init.
it's should also be skipped for s0ix as no hardware re-initialization
happened.
Besides, mes_self_test will free the BO that triggers a lot of warning
messages while in the suspend state.
[ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[ 81.679435] Call Trace:
[ 81.679726] <TASK>
[ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu]
[ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[ 81.684818] pci_pm_resume+0x5c/0xa0
[ 81.685247] ? pci_pm_thaw+0x90/0x90
[ 81.685658] dpm_run_callback+0x4e/0x160
[ 81.686110] device_resume+0xad/0x210
[ 81.686529] async_resume+0x1e/0x40
[ 81.686931] async_run_entry_fn+0x33/0x120
[ 81.687405] process_one_work+0x21d/0x3f0
[ 81.687869] worker_thread+0x4a/0x3c0
[ 81.688293] ? process_one_work+0x3f0/0x3f0
[ 81.688777] kthread+0xff/0x130
[ 81.689157] ? kthread_complete_and_exit+0x20/0x20
[ 81.689707] ret_from_fork+0x22/0x30
[ 81.690118] </TASK>
[ 81.690380] ---[ end trace 0000000000000000 ]---
v2: make the comment clean and use adev->in_s0ix instead of
adev->suspend
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
|