summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-06-09drm/amdgpu/gfx8: always restore kcq MQDsAlex Deucher1-4/+2
Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. v2: always reset ring pointer as well (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/gfx11: drop unused variableAlex Deucher1-3/+1
Just check the return value directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/gfx10: drop unused variableAlex Deucher1-3/+1
Just check the return value directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/gfx11: use generic [en/dis]able_kgq() helpersAlex Deucher1-47/+2
And remove the duplicate local variants. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/gfx10: use generic [en/dis]able_kgq() helpersAlex Deucher1-46/+2
And remove the duplicate local variants. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add [en/dis]able_kgq() functionsAlex Deucher2-0/+70
To replace the IP specific variants which are largely duplicate. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amd/amdgpu: Fix style problems in amdgpu_psp.cSrinivasan Shanmugam1-31/+20
Fix the following checkpatch warnings & error in amdgpu_psp.c WARNING: Comparisons should place the constant on the right side of the test WARNING: braces {} are not necessary for single statement blocks WARNING: please, no space before tabs WARNING: braces {} are not necessary for single statement blocks ERROR: that open brace { should be on the previous line Suggested-by: Christian König <christian.koenig@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/gfx10: drop old bring up codeAlex Deucher1-67/+3
No longer used. Remove it. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/gfx11: drop old bring up codeAlex Deucher1-67/+3
No longer used. Remove it. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amd/amdgpu: Fix style problems in amdgpu_debugfs.cSrinivasan Shanmugam1-13/+15
Fix the following issues reported by checkpatch: WARNING: please, no space before tabs WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: sizeof *rd should be sizeof(*rd) WARNING: Missing a blank line after declarations WARNING: sizeof rd->id should be sizeof(rd->id) WARNING: static const char * array should probably be static const char * const WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. WARNING: Prefer seq_puts to seq_printf ERROR: space prohibited after that open parenthesis '(' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Enable mcbp under sriov by defaultYuBiao Wang1-2/+2
Enable mcbp under sriov by default. Asics with soc21 supports mcbp now so we should set it enabled. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: remove pasid_src field from IV entryXiaomeng Hou2-2/+0
PASID_SRC is not actually present in the Interrupt Packet, the field is taken as reserved bits now. So remove it from IV entry to avoid misuse. Signed-off-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amd/amdgpu: Simplify switch case statements in amdgpu_connectors.cSrinivasan Shanmugam1-10/+29
Fix the following checkpatch errors: ERROR: trailing statements should be on next line ERROR: space required after that ',' (ctx:VxV) ERROR: code indent should use tabs where possible Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOVHorace Chen1-4/+1
[Why] This WPTR_POLL_ENABLE is a hardware contigious polling which will cause FCLK and UCLK to keep on a high level. Mostly its case can be covered by F32_WPTR_POLL_ENABLE which polls by firmware. So to save power, SR-IOV also needs to disable this bit Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add SDMA_UTCL1_WR_FIFO_SED field for sdma_v4_4_ras_fieldStanley.Yang1-0/+4
Query sdma_utcl1_wr_fifo_sed fiel to detect UTCL1_WR_FIFO SED error counts Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add a missing lock for AMDGPU_SCHEDChia-I Wu1-1/+5
mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message v3: add a Fixes tag Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 52c6a62c64fa ("drm/amdgpu: add interface for editing a foreign process's priority v3") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Update KFD TTM mem limitMukul Joshi2-2/+7
Use the helper function in TTM to get TTM memory limit and set KFD's internal mem limit. This ensures that KFD's TTM mem limit and actual TTM mem limit are exactly same. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Set GTT size equal to TTM mem limitMukul Joshi2-20/+6
Use the helper function in TTM to get TTM mem limit and set GTT size to be equal to TTL mem limit. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: mark gfx_v9_4_3_disable_gpa_mode() staticGuchun Chen1-1/+1
This was left global by accident, the corresponding functions for other hardware types are already static: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1072:6: error: no previous prototype for function 'gfx_v9_4_3_disable_gpa_mode' [-Werror,-Wmissing-prototypes] Fixes: 86301129698b ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: check correct allocated mqd_backup object after allocGuchun Chen1-4/+5
Instead of the default one, check the right mqd_backup object. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix a build warning by a typo in amdgpu_gfx.cGuchun Chen1-2/+2
This should be a typo when intruducing multi-xx support. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v10_0_hw_fini, which also leads to the call trace. [ 82.340264] Call Trace: [ 82.340265] <TASK> [ 82.340269] gmc_v10_0_hw_fini+0x83/0xa0 [amdgpu] [ 82.340447] gmc_v10_0_suspend+0xe/0x20 [amdgpu] [ 82.340623] amdgpu_device_ip_suspend_phase2+0x127/0x1c0 [amdgpu] [ 82.340789] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 82.340955] amdgpu_device_pre_asic_reset+0xdd/0x2b0 [amdgpu] [ 82.341122] amdgpu_device_gpu_recover.cold+0x4dd/0xbb2 [amdgpu] [ 82.341359] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 82.341529] process_one_work+0x21d/0x3f0 [ 82.341535] worker_thread+0x1fa/0x3c0 [ 82.341538] ? process_one_work+0x3f0/0x3f0 [ 82.341540] kthread+0xff/0x130 [ 82.341544] ? kthread_complete_and_exit+0x20/0x20 [ 82.341547] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v11_0_hw_fini, which also leads to the call trace. [ 102.980303] Call Trace: [ 102.980303] <TASK> [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [ 102.980357] gmc_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.980409] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.980459] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.980520] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.980573] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.980687] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.980740] process_one_work+0x21f/0x3f0 [ 102.980741] worker_thread+0x200/0x3e0 [ 102.980742] ? process_one_work+0x3f0/0x3f0 [ 102.980743] kthread+0xfd/0x130 [ 102.980743] ? kthread_complete_and_exit+0x20/0x20 [ 102.980744] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Enable doorbell selfring after resize FB BARShane Xiao3-30/+41
[Why] The selfring doorbell aperture will change when resize FB BAR successfully during gmc sw init, we should reorder the sequence of enabling doorbell selfring aperture. [How] Move enable_doorbell_selfring_aperture from *_common_hw_init to *_common_late_init. This fixes the potential issue that GPU ring its own doorbell when this device is in translated mode when iommu is on. v2: Remove *_enable_doorbell_aperture functions (Christian) v3: Add comments to note that why we need enable doorbell selfring late (Christian) Signed-off-by: Shane Xiao <shane.xiao@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Christian K�nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amd/amdgpu: Fix style errors in amdgpu_display.cSrinivasan Shanmugam1-9/+8
Fix following checkpatch errors in amdgpu_display.c ERROR: spaces required around that '=' (ctx:VxW) ERROR: that open brace { should be on the previous line ERROR: else should follow close brace '}' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Use the default reset when loading or reloading the driverlyndonli1-0/+7
Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix mode2 reset for sienna cichlidlyndonli1-1/+1
Before this change, sienna_cichlid_get_reset_handler will always return NULL, although the module parameter reset_method is 3 when loading amdgpu driver. Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add new flag to AMDGPU_CTX_QUERY2Pierre-Eric Pelloux-Prayer2-1/+5
OpenGL EXT_robustness extension expects the driver to stop reporting GUILTY_CONTEXT_RESET when the reset has completed and the GPU is ready to accept submission again. This commit adds a AMDGPU_CTX_QUERY2_FLAGS_RESET_IN_PROGRESS flag, that let the UMD know that the reset is still not finished. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm:amd:amdgpu: Fix missing bo unlock in failure pathSukrut Bellary1-1/+3
smatch warning - inconsistent handling of buffer object reserve and unreserve. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sukrut Bellary <sukrut.bellary@linux.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-01Merge drm/drm-next into drm-misc-nextThomas Zimmermann8-61/+41
Backmerging from drm-next to get commit e24e6d695377 ("drm/i915/display: Implement fb_mmap callback function"). Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-05-26drm/amdgpu: Fix no-procfs buildRob Clark1-0/+2
Fixes undefined symbol when PROC_FS is not enabled. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202305251510.U0R2as7k-lkp@intel.com/ Fixes: 376c25f8ca47 ("drm/amdgpu: Switch to fdinfo helper") Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: <alexander.deucher@amd.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230525155227.560094-1-robdclark@gmail.com
2023-05-24drm/amdgpu: Switch to fdinfo helperRob Clark3-19/+18
v2: Rebase on drm-misc-next Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230524155956.382440-5-robdclark@gmail.com
2023-05-11drm/amdgpu: change gfx 11.0.4 external_id rangeYifan Zhang1-1/+1
gfx 11.0.4 range starts from 0x80. Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 11.0.4") Cc: stable@vger.kernel.org Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reported-by: Yogesh Mohan Marimuthu <Yogesh.Mohanmarimuthu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-11drm/amdgpu/jpeg: Remove harvest checking for JPEG3Saleemkhan Jamadar1-0/+1
Register CC_UVD_HARVESTING is obsolete for JPEG 3.1.2 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-05-11drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx rasGuchun Chen1-1/+2
gfx9 cp_ecc_error_irq is only enabled when legacy gfx ras is assert. So in gfx_v9_0_hw_fini, interrupt disablement for cp_ecc_error_irq should be executed under such condition, otherwise, an amdgpu_irq_put calltrace will occur. [ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246 [ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000 [ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000 [ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006 [ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050 [ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.170978] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.170981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.170986] Call Trace: [ 7283.170988] <TASK> [ 7283.170989] gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu] [ 7283.171655] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.172245] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.172823] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.173412] pci_pm_freeze+0x54/0xc0 [ 7283.173419] ? __pfx_pci_pm_freeze+0x10/0x10 [ 7283.173425] dpm_run_callback+0x98/0x200 [ 7283.173430] __device_suspend+0x164/0x5f0 v2: drop gfx11 as it's fixed in a different solution by retiring cp_ecc_irq funcs(Hawking) Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspendGuchun Chen1-3/+5
sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini, driver unconditionally disables ecc_irq which is only enabled on those asics enabling sdma ecc. This will introduce a warning in suspend cycle on those chips with sdma ip v4.0, while without sdma ecc. So this patch correct this. [ 7283.166354] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.167001] RSP: 0018:ffff9a5fc3967d08 EFLAGS: 00010246 [ 7283.167019] RAX: ffff98d88afd3770 RBX: 0000000000000001 RCX: 0000000000000000 [ 7283.167023] RDX: 0000000000000000 RSI: ffff98d89da30390 RDI: ffff98d89da20000 [ 7283.167025] RBP: ffff98d89da20000 R08: 0000000000036838 R09: 0000000000000006 [ 7283.167028] R10: ffffd5764243c008 R11: 0000000000000000 R12: ffff98d89da30390 [ 7283.167030] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.167032] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.167036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.167039] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.167041] Call Trace: [ 7283.167046] <TASK> [ 7283.167048] sdma_v4_0_hw_fini+0x38/0xa0 [amdgpu] [ 7283.167704] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.168296] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.168875] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.169464] pci_pm_freeze+0x54/0xc0 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)Lin.Cao1-1/+5
v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we should set shadow as vmbo->shadow to recover vmbo->bo v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow) continue; Fixes: e18aaea733da ("drm/amdgpu: move shadow_list to amdgpu_bo_vm") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcsHoratio Zhang2-49/+5
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still use amdgpu_irq_put to disable this interrupt, which caused the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] <TASK> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] kthread+0xfd/0x130 [ 102.874380] ? kthread_complete_and_exit+0x20/0x20 [ 102.874381] ret_from_fork+0x22/0x30 v2: - Handle umc and gfx ras cases in separated patch - Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11 v3: - Improve the subject and code comments - Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init v4: - Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state - Check cp_ecc_error_irq.funcs rather than ip version for a more sustainable life v5: - Simplify judgment conditions Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: set gfx9 onwards APU atomics support to be trueYifan Zhang1-0/+6
APUs w/ gfx9 onwards doesn't reply on PCIe atomics, rather it is internal path w/ native atomic support. Set have_atomics_support to true. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-11drm/amdgpu/nv: update VCN 3 max HEVC encoding resolutionThong Thai1-6/+16
Update the maximum resolution reported for HEVC encoding on VCN 3 devices to reflect its 8K encoding capability. v2: Also update the max height for H.264 encoding to match spec. (Ruijing) Signed-off-by: Thong Thai <thong.thai@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-05Merge tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drmLinus Torvalds12-65/+76
Pull more drm fixes from Dave Airlie: "This is the fixes for the last couple of weeks for i915 and last 3 weeks for amdgpu, lots of them but pretty scattered around and all pretty small. amdgpu: - SR-IOV fixes - DCN 3.2 fixes - DC mclk handling fixes - eDP fixes - SubVP fixes - HDCP regression fix - DSC fixes - DC FP fixes - DCN 3.x fixes - Display flickering fix when switching between vram and gtt - Z8 power saving fix - Fix hang when skipping modeset - GPU reset fixes - Doorbell fix when resizing BARs - Fix spurious warnings in gmc - Locking fix for AMDGPU_SCHED IOCTL - SR-IOV fix - DCN 3.1.4 fix - DCN 3.2 fix - Fix job cleanup when CS is aborted i915: - skl pipe source size check - mtl transcoder mask fix - DSI power on sequence fix - GuC versioning corner case fix" * tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drm: (48 commits) drm/amdgpu: drop redundant sched job cleanup when cs is aborted drm/amd/display: filter out invalid bits in pipe_fuses drm/amd/display: Change default Z8 watermark values drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV drm/amdgpu: add a missing lock for AMDGPU_SCHED drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini() drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini drm/amdgpu: Enable doorbell selfring after resize FB BAR drm/amdgpu: Use the default reset when loading or reloading the driver drm/amdgpu: Fix mode2 reset for sienna cichlid drm/i915/dsi: Use unconditional msleep() instead of intel_dsi_msleep() drm/i915/mtl: Add the missing CPU transcoder mask in intel_device_info drm/i915/guc: Actually return an error if GuC version range check fails drm/amd/display: Lowering min Z8 residency time drm/amd/display: fix flickering caused by S/G mode drm/amd/display: Set min_width and min_height capability for DCN30 drm/amd/display: Isolate remaining FPU code in DCN32 drm/amd/display: Update bounding box values for DCN321 drm/amd/display: Do not clear GPINT register when releasing DMUB from reset ...
2023-05-04drm/amdgpu: drop redundant sched job cleanup when cs is abortedGuchun Chen1-10/+3
Once command submission failed due to userptr invalidation in amdgpu_cs_submit, legacy code will perform cleanup of scheduler job. However, it's not needed at all, as former commit has integrated job cleanup stuff into amdgpu_job_free. Otherwise, because of double free, a NULL pointer dereference will occur in such scenario. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2457 Fixes: f7d66fb2ea43 ("drm/amdgpu: cleanup scheduler job initialization v2") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-04drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOVHorace Chen1-4/+1
[Why] This WPTR_POLL_ENABLE is a hardware contigious polling which will cause FCLK and UCLK to keep on a high level. Mostly its case can be covered by F32_WPTR_POLL_ENABLE which polls by firmware. So to save power, SR-IOV also needs to disable this bit Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-04drm/amdgpu: add a missing lock for AMDGPU_SCHEDChia-I Wu1-1/+5
mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message v3: add a Fixes tag Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 52c6a62c64fa ("drm/amdgpu: add interface for editing a foreign process's priority v3") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-04drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()Hamza Mahfooz1-1/+0
As made mention of in commit 08c677cb0b43 ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit 13af556104fa ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"). It is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it from gmc_v9_0_hw_fini(). Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: 3029c855d79f ("drm/amdgpu: Fix desktop freezed after gpu-reset") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-04drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v10_0_hw_fini, which also leads to the call trace. [ 82.340264] Call Trace: [ 82.340265] <TASK> [ 82.340269] gmc_v10_0_hw_fini+0x83/0xa0 [amdgpu] [ 82.340447] gmc_v10_0_suspend+0xe/0x20 [amdgpu] [ 82.340623] amdgpu_device_ip_suspend_phase2+0x127/0x1c0 [amdgpu] [ 82.340789] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 82.340955] amdgpu_device_pre_asic_reset+0xdd/0x2b0 [amdgpu] [ 82.341122] amdgpu_device_gpu_recover.cold+0x4dd/0xbb2 [amdgpu] [ 82.341359] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 82.341529] process_one_work+0x21d/0x3f0 [ 82.341535] worker_thread+0x1fa/0x3c0 [ 82.341538] ? process_one_work+0x3f0/0x3f0 [ 82.341540] kthread+0xff/0x130 [ 82.341544] ? kthread_complete_and_exit+0x20/0x20 [ 82.341547] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: stable@vger.kernel.org
2023-05-04drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v11_0_hw_fini, which also leads to the call trace. [ 102.980303] Call Trace: [ 102.980303] <TASK> [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [ 102.980357] gmc_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.980409] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.980459] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.980520] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.980573] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.980687] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.980740] process_one_work+0x21f/0x3f0 [ 102.980741] worker_thread+0x200/0x3e0 [ 102.980742] ? process_one_work+0x3f0/0x3f0 [ 102.980743] kthread+0xfd/0x130 [ 102.980743] ? kthread_complete_and_exit+0x20/0x20 [ 102.980744] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: stable@vger.kernel.org
2023-05-04drm/amdgpu: Enable doorbell selfring after resize FB BARShane Xiao3-30/+41
[Why] The selfring doorbell aperture will change when resize FB BAR successfully during gmc sw init, we should reorder the sequence of enabling doorbell selfring aperture. [How] Move enable_doorbell_selfring_aperture from *_common_hw_init to *_common_late_init. This fixes the potential issue that GPU ring its own doorbell when this device is in translated mode when iommu is on. v2: Remove *_enable_doorbell_aperture functions (Christian) v3: Add comments to note that why we need enable doorbell selfring late (Christian) Signed-off-by: Shane Xiao <shane.xiao@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Christian K�nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-04drm/amdgpu: Use the default reset when loading or reloading the driverlyndonli1-0/+7
Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-04drm/amdgpu: Fix mode2 reset for sienna cichlidlyndonli1-1/+1
Before this change, sienna_cichlid_get_reset_handler will always return NULL, although the module parameter reset_method is 3 when loading amdgpu driver. Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>