summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2022-11-26drm/amd: Fail the suspend if resources can't be evictedMario Limonciello1-5/+10
[ Upstream commit 8d4de331f1b24a22d18e3c6116aa25228cf54854 ] If a system does not have swap and memory is under 100% usage, amdgpu will fail to evict resources. Currently the suspend carries on proceeding to reset the GPU: ``` [drm] evicting device resources failed [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12 [drm] free PSP TMR buffer [TTM] Failed allocating page table [drm] evicting device resources failed amdgpu 0000:03:00.0: amdgpu: MODE1 reset amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset ``` At this point if the suspend actually succeeded I think that amdgpu would have recovered because the GPU would have power cut off and restored. However the kernel fails to continue the suspend from the memory pressure and amdgpu fails to run the "resume" from the aborted suspend. ``` ACPI: PM: Preparing to enter system sleep state S3 SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO) cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0 node 0: slabs: 22, objs: 1122, free: 0 ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651) [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed! [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62 amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62). PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62 amdgpu 0000:03:00.0: PM: failed to resume async: error -62 ``` To avoid this series of unfortunate events, fail amdgpu's suspend when the memory eviction fails. This will let the system gracefully recover and the user can try suspend again when the memory pressure is relieved. Reported-by: post@davidak.de Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26drm/amdgpu: set fb_modifiers_not_supported in vkmsYifan Zhang1-0/+2
[ Upstream commit 89b3554782e6b65894f0551e9e0a82ad02dac94d ] This patch to fix the gdm3 start failure with virual display: /usr/libexec/gdm-x-session[1711]: (II) AMDGPU(0): Setting screen physical size to 270 x 203 /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to make import prime FD as pixmap: 22 /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument /usr/libexec/gdm-x-session[1711]: (WW) AMDGPU(0): Failed to set mode on CRTC 0 /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to enable any CRTC gnome-shell[1840]: Running GNOME Shell (using mutter 42.2) as a X11 window and compositing manager /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument vkms doesn't have modifiers support, set fb_modifiers_not_supported to bring the gdm back. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26drm/amdgpu: Adjust MES polling timeout for sriovYiqing Yao1-1/+8
[ Upstream commit 226dcfad349f23f7744d02b24f8ec3bc4f6198ac ] [why] MES response time in sriov may be longer than default value due to reset or init in other VF. A timeout value specific to sriov is needed. [how] When in sriov, adjust the timeout value to calculated worst case scenario. Signed-off-by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-16drm/amdgpu: workaround for TLB seq raceChristian König1-0/+15
commit 77c092e054262b594614bad5e5f47e57c5d29639 upstream. It can happen that we query the sequence value before the callback had a chance to run. Workaround that by grabbing the fence lock and releasing it again. Should be replaced by hw handling soon. Signed-off-by: Christian König <christian.koenig@amd.com> CC: stable@vger.kernel.org # 5.19+ Fixes: 5255e146c99a6 ("drm/amdgpu: rework TLB flushing") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113 Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Stefan Springer <stefanspr94@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-16drm/amdgpu: Fix the lpfn checking condition in drm buddyMa Jun1-1/+1
commit e0b26b9482461e9528552f54fa662c2269f75b3f upstream. Because the value of man->size is changed during suspend/resume process, use mgr->mm.size instead of man->size here for lpfn checking. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220914125331.2467162-1-Jun.Ma2@amd.com Signed-off-by: Christian König <christian.koenig@amd.com> Cc: "Limonciello, Mario" <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10drm/amdgpu: disable GFXOFF during compute for GFX11Graham Sider1-0/+7
commit a3e5ce56f3d260f2ec8e5242c33f57e60ae9eba7 upstream. Temporary workaround to fix issues observed in some compute applications when GFXOFF is enabled on GFX11. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10drm/amdgpu: dequeue mes scheduler during finiYuBiao Wang1-3/+39
[ Upstream commit 2abe92c7adc9c0397ba51bf74909b85bc0fff84b ] [Why] If mes is not dequeued during fini, mes will be in an uncleaned state during reload, then mes couldn't receive some commands which leads to reload failure. [How] Perform MES dequeue via MMIO after all the unmap jobs are done by mes and before kiq fini. v2: Move the dequeue operation inside kiq_hw_fini. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-10drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11Yifan Zha3-9/+13
[ Upstream commit 97a3d6090f5c2a165dc88bda05c1dcf9f08bf886 ] [Why] L1 blocks most of GC registers accessing by MMIO. [How] Use RLCG interface to program GC registers under SRIOV VF in full access time. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-10drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV caseDanijel Slivka3-1/+15
[ Upstream commit 65f8682b9aaae20c2cdee993e6fe52374ad513c9 ] For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT which indicates that VF MMIO write access is not allowed in sriov runtime Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-03drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr()Rafael Mendonca1-3/+3
[ Upstream commit 90bfee142af0f0e9d3bec80e7acd5f49b230acf7 ] If the number of pages from the userptr BO differs from the SG BO then the allocated memory for the SG table doesn't get freed before returning -EINVAL, which may lead to a memory leak in some error paths. Fix this by checking the number of pages before allocating memory for the SG table. Fixes: 264fb4d332f5 ("drm/amdgpu: Add multi-GPU DMA mapping helpers") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-03drm/amdgpu: fix pstate setting issueChengming Gui1-1/+4
commit 79610d3041338dc1ef554d6fd8b3b3e23be527f5 upstream. [WHY] 0, original pstate X 1, ctx_A_create -> ctx_A->stable_pstate = X 2, ctx_A_set_pstate (Y) -> current pstate is Y (PEAK or STANDARD) 3, ctx_B_create -> ctx_B->stable_pstate = Y 4, ctx_A_destroy -> restore pstate to X 5, ctx_B_destroy -> restore pstate to Y Above sequence will cause final pstate is wrong (Y), should be original X. [HOW] When ctx_B create, if ctx_A touched pstate setting (not auto, stable_pstate_ctx != NULL), set ctx_B->stable_pstate the same value as ctx_A saved, if stable_pstate_ctx == NULL, fetch current pstate to fill ctx_B->stable_pstate. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-03drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resumePrike Liang1-0/+16
commit d61e1d1d5225a9baeb995bcbdb904f66f70ed87e upstream. In the S2idle suspend/resume phase the gfxoff is keeping functional so some IP blocks will be likely to reinitialize at gfxoff entry and that will result in failing to program GC registers.Therefore, let disallow gfxoff until AMDGPU IPs reinitialized completely. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-03drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.xLijo Lazar1-20/+8
commit d2c4c1569a7d7d5c8f75963bf2d62d7aeac30e2a upstream. MMHUB 2.1.x versions don't have ATCL2. Remove accesses to ATCL2 registers. Since they are non-existing registers, read access will cause a 'Completer Abort' and gets reported when AER is enabled with the below patch. Tagging with the patch so that this is backported along with it. v2: squash in uninitialized warning fix (Nathan Chancellor) Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-03drm/amdgpu: Fix for BO move issueArunpravin Paneer Selvam1-0/+3
commit 8273b4048664fff356fd10059033f0e2f5a422a1 upstream. A user reported a bug on CAPE VERDE system where uvd_v3_1 IP component failed to initialize as there is an issue with BO move code from one memory to other. In function amdgpu_mem_visible() called by amdgpu_bo_move(), when there are no blocks to compare or if we have a single block then break the loop. Fixes: 312b4dc11d4f ("drm/amdgpu: Fix VRAM BO swap issue") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: "Limonciello, Mario" <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-03drm/amdgpu: Fix VRAM BO swap issueArunpravin Paneer Selvam1-5/+12
commit 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9 upstream. DRM buddy manager allocates the contiguous memory requests in a single block or multiple blocks. So for the ttm move operation (incase of low vram memory) we should consider all the blocks to compute the total memory size which compared with the struct ttm_resource num_pages in order to verify that the blocks are contiguous for the eviction process. v2: Added a Fixes tag v3: Rewrite the code to save a bit of calculations and variables (Christian) Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: "Limonciello, Mario" <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-29drm/amdgpu: fix sdma doorbell init ordering on APUsAlex Deucher2-5/+21
commit 50b0e4d4da09fa501e722af886f97e60a4f820d6 upstream. Commit 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") uncovered a bug in amdgpu that required a reordering of the driver init sequence to avoid accessing a special register on the GPU before it was properly set up leading to an PCI AER error. This reordering uncovered a different hw programming ordering dependency in some APUs where the SDMA doorbells need to be programmed before the GFX doorbells. To fix this, move the SDMA doorbell programming back into the soc15 common code, but use the actual doorbell range values directly rather than the values stored in the ring structure since those will not be initialized at this point. This is a partial revert, but with the doorbell assignment fixed so the proper doorbell index is set before it's used. Fixes: e3163bc8ffdfdb ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: skhan@linuxfoundation.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26drm/amd/pm: disable cstate feature for gpu reset scenarioEvan Quan1-0/+8
commit 3059cd8c5f797ad83d2b194ae66339f5c007ca43 upstream. Suggested by PMFW team and same as what did for gfxoff feature. This can address some Mode1Reset failures observed on SMU13.0.0. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-21drm/amdgpu: fix initial connector audio valuehongao1-1/+6
[ Upstream commit 4bb71fce58f30df3f251118291d6b0187ce531e6 ] This got lost somewhere along the way, This fixes audio not working until set_property was called. Signed-off-by: hongao <hongao@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21drm/amdgpu: SDMA update use unlocked iteratorPhilip Yang1-3/+6
[ Upstream commit 3913f0179ba366f7d7d160c506ce00de1602bbc4 ] SDMA update page table may be called from unlocked context, this generate below warning. Use unlocked iterator to handle this case. WARNING: CPU: 0 PID: 1475 at drivers/dma-buf/dma-resv.c:483 dma_resv_iter_next Call Trace: dma_resv_iter_first+0x43/0xa0 amdgpu_vm_sdma_update+0x69/0x2d0 [amdgpu] amdgpu_vm_ptes_update+0x29c/0x870 [amdgpu] amdgpu_vm_update_range+0x2f6/0x6c0 [amdgpu] svm_range_unmap_from_gpus+0x115/0x300 [amdgpu] svm_range_cpu_invalidate_pagetables+0x510/0x5e0 [amdgpu] __mmu_notifier_invalidate_range_start+0x1d3/0x230 unmap_vmas+0x140/0x150 unmap_region+0xa8/0x110 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21drm/admgpu: Skip CG/PG on SOC21 under SRIOV VFYifan Zha1-0/+4
[ Upstream commit 828418259254863e0af5805bd712284e2bd88e3b ] [Why] There is no CG(Clock Gating)/PG(Power Gating) requirement on SRIOV VF. For multi VF, VF should not enable any CG/PG features. For one VF, PF will program CG/PG related registers. [How] Do not set any cg/pg flag bit at early init under sriov. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21drm/amdgpu: Skip the program of MMMC_VM_AGP_* in SRIOV on MMHUB v3_0_0Yifan Zha1-5/+5
[ Upstream commit c1026c6f319724dc88fc08d9d9d35bcbdf492b42 ] [Why] VF should not program these registers, the value were defined in the host. [How] Skip writing them in SRIOV environment and program them on host side. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Signed-off-by: Horace Chen <horace.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume()Yang Yingliang1-1/+4
[ Upstream commit 6b11af6d1c8f5d4135332bb932baaa06e511173d ] Add missing pci_disable_device() if amdgpu_device_resume() fails. Fixes: 8e4d5d43cc6c ("drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21drm/amdgpu: Enable F32_WPTR_POLL_ENABLE in mqdRuili Ji1-1/+2
commit 21a550de5faf9f54013334c9a6a7643b8fd80b36 upstream. This patch is to fix the SDMA user queue doorbell missing issue on SDMA 6.0. F32_WPTR_POLL_ENABLE has to be set if doorbell mode is used. Otherwise ringing SDMA user queue doorbell can't wake up system from gfxoff. Signed-off-by: Ruili Ji <ruiliji2@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-21drm/amdgpu: Enable VCN PG on GC11_0_1Sonny Jiang1-0/+1
commit e626d9b9c6e038a6918aad1b5affd38f6b9deaed upstream. Enable VCN PG on GC11_0_1 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-21Revert "drm/amdgpu: use dirty framebuffer helper"Hamza Mahfooz1-12/+2
commit 17d819e2828cacca2e4c909044eb9798ed379cd2 upstream. This reverts commit 66f99628eb24409cb8feb5061f78283c8b65f820. Unfortunately, that commit causes performance regressions on non-PSR setups. So, just revert it until FB_DAMAGE_CLIPS support can be added. Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2189 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216554 Fixes: 66f99628eb2440 ("drm/amdgpu: use dirty framebuffer helper") Fixes: abbc7a3dafb91b ("drm/amdgpu: don't register a dirty callback for non-atomic") Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-30drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcodeHawking Zhang1-152/+4
switch to common helper to initialize rlc firmware for gfx11 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30drm/amdgpu: add helper to init rlc firmwareHawking Zhang2-1/+38
To initialzie rlc firmware according to rlc firmware header version v2: squash in backwards compat fix Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30drm/amdgpu: add helper to init rlc fw in header v2_4Hawking Zhang1-0/+60
To initialize rlc firmware in header v2_4 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30drm/amdgpu: add helper to init rlc fw in header v2_3Hawking Zhang1-0/+35
To initialize rlc firmware in header v2_3 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30drm/amdgpu: add helper to init rlc fw in header v2_2Hawking Zhang1-0/+30
To initialize rlc firmware in header v2_2 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30drm/amdgpu: add helper to init rlc fw in header v2_1Hawking Zhang1-0/+40
To initialize rlc firmware in header v2_1 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30drm/amdgpu: add helper to init rlc fw in header v2_0Hawking Zhang1-0/+64
To initialize rlc firmware in header v2_0 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfxHawking Zhang3-0/+13
cache rlcv/rlcvp ucode version info in amdgpu_gfx structure Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29drm/amdgpu: Enable sram on vcn_4_0_2Sonny Jiang1-1/+1
Enable sram on vcn_4_0_2 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29drm/amdgpu: Enable VCN DPG for GC11_0_1Sonny Jiang1-0/+1
Enable VCN DPG on GC11_0_1 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-28drm/amdgpu: Add amdgpu suspend-resume code path under SRIOVBokun Zhang2-1/+30
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor during the suspend time. Furthermore, we cannot request a mode 1 reset under SRIOV as VF. Therefore, we will skip it as it is called in suspend_noirq() function. - In the resume code path, we need to send REQ_GPU_INIT to the hypervisor and also resume PSP IP block under SRIOV. Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-09-28drm/amdgpu: Remove fence_process in count_emittedJiadong.Zhu1-1/+0
The function amdgpu_fence_count_emitted used in work_hander should not call amdgpu_fence_process which must be used in irq handler. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-28drm/amdgpu: Correct the position in patch_cond_execJiadong.Zhu1-1/+1
The current position calulated in gfx_v9_0_ring_emit_patch_cond_exec underflows when the wptr is divisible by ring->buf_mask + 1. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-28drm/amdgpu: pass queue size and is_aql_queue to MESGraham Sider2-0/+6
Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also re-use gds_size for the queue size (unused for KFD). MES requires the queue size in order to compute the actual wptr offset within the queue RB since it increases monotonically for AQL queues. v2: Make is_aql_queue assign clearer Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-28drm/amdgpu: avoid gfx register accessing during gfxoffEvan Quan1-0/+4
Make sure gfxoff is disabled before gfx register accessing. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-22drm/amdgpu: don't register a dirty callback for non-atomicAlex Deucher1-1/+10
Some asics still support non-atomic code paths. Fixes: 66f99628eb2440 ("drm/amdgpu: use dirty framebuffer helper") Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-22drm/amdgpu: Update PTE flags with TF enabledMukul Joshi2-4/+6
This patch updates the PTE flags when translate further (TF) is enabled: - With translate_further enabled, invalid PTEs can be 0. Reading consecutive invalid PTEs as 0 is considered a fault. To prevent this, ensure invalid PTEs have at least 1 bit set. - The current invalid PTE flags settings to translate a retry fault into a no-retry fault, doesn't work with TF enabled. As a result, update invalid PTE flags settings which works for both TF enabled and disabled case. Fixes: 352e683b72e79d ("drm/amdgpu: Enable translate_further to extend UTCL2 reach") Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-14drm/amdgpu/mes: zero the sdma_hqd_mask of 2nd SDMA engine for SDMA 6.0.1Yifan Zhang1-0/+3
there is only one SDMA engine in SDMA 6.0.1, the sdma_hqd_mask has to be zeroed for the 2nd engine, otherwise MES scheduler will consider 2nd engine exists and map/unmap SDMA queues to the non-existent engine. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-14drm/amdgpu: make sure to init common IP before gmcAlex Deucher1-3/+11
Move common IP init before GMC init so that HDP gets remapped before GMC init which uses it. This fixes the Unsupported Request error reported through AER during driver load. The error happens as a write happens to the remap offset before real remapping is done. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-09-14drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vegaAlex Deucher2-22/+5
This mirrors what we do for other asics and this way we are sure the sdma doorbell range is properly initialized. There is a comment about the way doorbells on gfx9 work that requires that they are initialized for other IPs before GFX is initialized. However, the statement says that it applies to multimedia as well, but the VCN code currently initializes doorbells after GFX and there are no known issues there. In my testing at least I don't see any problems on SDMA. This is a prerequisite for fixing the Unsupported Request error reported through AER during driver load. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-09-14drm/amdgpu: move nbio ih_doorbell_range() into ih code for vegaAlex Deucher3-3/+8
This mirrors what we do for other asics and this way we are sure the ih doorbell range is properly initialized. There is a comment about the way doorbells on gfx9 work that requires that they are initialized for other IPs before GFX is initialized. In this case IH is initialized before GFX, so there should be no issue. This is a prerequisite for fixing the Unsupported Request error reported through AER during driver load. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-09-13drm/amdgpu: Skip reset error status for psp v13_0_0Candice Li1-1/+2
No need to reset error status since only umc ras supported on psp v13_0_0. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdgpu: add HDP remap functionality to nbio 7.7Alex Deucher1-0/+9
Was missing before and would have resulted in a write to a non-existant register. Normally APUs don't use HDP, but other asics could use this code and APUs do use the HDP when used in passthrough. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdgpu: change the alignment size of TMR BO to 1MYang Wang2-1/+2
align TMR BO size TO tmr size is not necessary, modify the size to 1M to avoid re-create BO fail when serious VRAM fragmentation. v2: add new macro PSP_TMR_ALIGNMENT for TMR BO alignment size Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdgpu: Enable full reset when RAS is supported on gc v11_0_0Candice Li1-0/+1
Enable full reset for RAS supported configuration on gc v11_0_0. v2: simplify the code. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>