summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
AgeCommit message (Collapse)AuthorFilesLines
2023-08-31drm/amdgpu: revise the device initialization sequencesEvan Quan1-16/+21
By placing the sysfs interfaces creation after `.late_int`. Since some operations performed during `.late_init` may affect how the sysfs interfaces should be created. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31drm/amd/pm: introduce a new set of OD interfacesEvan Quan1-0/+2
There will be multiple interfaces(sysfs files) exposed with each representing a single OD functionality. And all those interface will be arranged in a tree liked hierarchy with the top dir as "gpu_od". Meanwhile all functionalities for the same component will be arranged under the same directory. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30drm/amdgpu: Allocate coredump memory in a nonblocking wayAndré Almeida1-1/+1
During a GPU reset, a normal memory reclaim could block to reclaim memory. Giving that coredump is a best effort mechanism, it shouldn't disturb the reset path. Change its memory allocation flag to a nonblocking one. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30drm/amd/amdgpu/amdgpu_device: Provide suitable description for param 'xcc_id'Lee Jones1-0/+1
Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:516: warning: Function parameter or member 'xcc_id' not described in 'amdgpu_mm_wreg_mmio_rlc' Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30drm/amdgpu: Fix the return for gpu mode1_resetHawking Zhang1-2/+11
amdgpu_device_mode1_reset will return gpu mode1_reset succeed (ret = 0) as long as wait_for_bootloader call succeed, regardless of the status reported by smu or psp firmware. This results to driver continue executing recovery even smu or psp fail to perform mode1 reset. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30drm/amdgpu: Add bootloader status checkLijo Lazar1-3/+14
Add a function to wait till bootloader has reached steady state. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16drm/amd: flush any delayed gfxoff on suspend entryMario Limonciello1-0/+1
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity is happening during entry. This is because GFXOFF was scheduled as delayed but RLC gets disabled in s2idle entry sequence which will hang GFX IP if not already in GFXOFF. To help this problem, flush any delayed work for GFXOFF early in s2idle entry sequence to ensure that it's off when RLC is changed. commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during suspend without delay") modified power gating flow so that if called in s0ix that it ensured that GFXOFF wasn't put in work queue but instead processed immediately. This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly called as part of the suspend entry code. Remove that dead code. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16drm/amdgpu: disable mcbp if parameter zero is setJiadong Zhu1-4/+5
The parameter amdgpu_mcbp shall have priority against the default value calculated from the chip version. User could disable mcbp by setting the parameter mcbp as zero. v2: do not trigger preemption in sw ring muxer when mcbp is disabled. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16drm/amdgpu: Fix missing comment for mb() in 'amdgpu_device_aper_access'Srinivasan Shanmugam1-0/+6
This patch adds the missing code comment for memory barrier WARNING: memory barrier without comment + mb(); WARNING: memory barrier without comment + mb(); Cc: Guchun Chen <guchun.chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-11drm/amdkfd: drop IOMMUv2 supportAlex Deucher1-9/+0
Now that we use the dGPU path for all APUs, drop the IOMMUv2 support. v2: drop the now unused queue manager functions for gfx7/8 APUs Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-09drm/amdgpu/irq: Move irq resume to the beginningEmily Deng1-1/+1
Need to move irq resume to the beginning of reset sriov, or if one interrupt occurs before irq resume, then the irq won't work anymore. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-09drm/amdgpu: Add FRU sysfs nodes only if neededLijo Lazar1-68/+3
Create sysfs nodes for FRU data only if FRU data is available. Move the logic to FRU specific file. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd: Disable S/G for APUs when 64GB or more host memoryMario Limonciello1-0/+26
Users report a white flickering screen on multiple systems that is tied to having 64GB or more memory. When S/G is enabled pages will get pinned to both VRAM carve out and system RAM leading to this. Until it can be fixed properly, disable S/G when 64GB of memory or more is detected. This will force pages to be pinned into VRAM. This should fix white screen flickers but if VRAM pressure is encountered may lead to black screens. It's a trade-off for now. Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)") Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: <stable@vger.kernel.org> # 6.1.y: bf0207e172703 ("drm/amdgpu: add S/G display parameter") Cc: <stable@vger.kernel.org> # 6.4.y Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2735 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-27drm/amdgpu: Fix ENOSYS means 'invalid syscall nr' in amdgpu_device.cSrinivasan Shanmugam1-29/+31
ENOSYS should be used for nonexistent syscalls only, replace ENOSYS with EOPNOTSUPP for reset handlers that are not implemented for respective ASIC. WARNING: ENOSYS means 'invalid syscall nr' and nothing else + if (r == -ENOSYS) WARNING: ENOSYS means 'invalid syscall nr' and nothing else + if (r == -ENOSYS) And other following style fixes in amdgpu_device.c: WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. WARNING: Block comments should align the * on each line WARNING: Missing a blank line after declarations WARNING: braces {} are not necessary for single statement blocks Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Kent Russell <kent.russell@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: set sw state to gfxoff after SR-IOV resetHorace Chen1-0/+3
[Why] Current SR-IOV will not set GC to off state, while it is a real GC hard reset. Whthout GFX off flag, driver may do gfxhub invalidation before firmware load and gfxhub gart enable. This operation may cause CP to become busy because GC is not in the right state for invalidation. [How] Add a function for SR-IOV to clean up some sw state before recover. Set adev->gfx.is_poweron to false to prevent gfxhub invalidation before gfx firmware autoload complete. Signed-off-by: Horace Chen <horace.chen@amd.com> Reviewed-by: HaiJun Chang <HaiJun.Chang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: Add RLCG interface driver implementation for gfx v9.4.3 (v3)Victor Lu1-2/+3
Add RLCG interface support for gfx v9.4.3 and multiple XCCs. Do not enable it yet. v2: Fix amdgpu_rlcg_reg_access_ctrl init, add support for multiple XCCs in amdgpu_mm_wreg_mmio_rlc v3: Use GET_INST() when indexing amdgpu_rlcg_reg_access_ctrl Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: create a new file for doorbell managerShashank Sharma1-170/+5
This patch: - creates a new file for doorbell management. - moves doorbell code from amdgpu_device.c to this file. V2: - remove doc from function declaration (Christian) - remove 'device' from function names to make it consistent (Alex) - add SPDX license identifier (Luben) V3: - change license to MIT license(Christian) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12drm/amd: Move helper for dynamic speed switch check out of smu13Mario Limonciello1-0/+19
This helper is used for checking if the connected host supports the feature, it can be moved into generic code to be used by other smu implementations as well. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()Arnd Bergmann1-0/+3
On 32-bit architectures comparing a resource against a value larger than U32_MAX can cause a warning: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1344:18: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] res->start > 0x100000000ull) ~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ As gcc does not warn about this in dead code, add an IS_ENABLED() check at the start of the function. This will always return success but not actually resize the BAR on 32-bit architectures without high memory, which is exactly what we want here, as the driver can fall back to bank switching the VRAM access. Fixes: 31b8adab3247 ("drm/amdgpu: require a root bus window above 4GB for BAR resize") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07drm/amd: Use attribute groups for PSP flashing attributesMario Limonciello1-10/+0
Individually creating attributes can be racy, instead make attributes using attribute groups and control their visibility with an is_visible callback to only show when using appropriate products. v2: squash in fix for PSP 13.0.10 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: enable mcbp by default on gfx9Alex Deucher1-0/+5
It's required for high priority queues. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535 Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: make mcbp a per device settingAlex Deucher1-4/+15
So we can selectively enable it on certain devices. No intended functional change. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu: Add vbios attribute only if supportedLijo Lazar1-0/+5
Not all devices carry VBIOS version information. Add the device attribute only if supported. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix up kdoc in amdgpu_device.cSrinivasan Shanmugam1-4/+0
Fix these warnings by deleting the deviant arguments. gcc with W=1 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg' drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg' drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg64' drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg64' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: bypass bios dependent operationsShiwu Zhang1-25/+41
Since bios reading does not work currently so just bypass all operations related to bios v2: hardcode the vram info for APP_APU case (hawking) v3: correct the vram_width with channel number * channel size (lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix incorrect pcie_gen_mask in passthrough caseTong Liu011-3/+3
[why] Passthrough case is treated as root bus and pcie_gen_mask is set as default value that does not support GEN 3 and GEN 4 for PCIe link speed. So PCIe link speed will be downgraded at smu hw init in passthrough condition [how] Move get pci info after detect virtualization and check if it is passthrough case when set pcie_gen_mask Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Refactor migrate init to support partition switchPhilip Yang1-1/+3
Rename smv_migrate_init to a better name kgd2kfd_init_zone_device because it setup zone devive pgmap for page migration and keep it in kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it only once in amdgpu_device_ip_init after adev ip blocks are initialized, but before amdgpu_amdkfd_device_init initialize kfd nodes which enable SVM support based on pgmap. svm_range_set_max_pages is called by kgd2kfd_device_init everytime after switching compute partition mode. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add partition scheduler list updateJames Zhu1-0/+2
Add partition scheduler list update in late init and xcp partition mode switch. Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: support partition drm devicesJames Zhu1-0/+1
Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3). This is a temporary solution and will be superceded. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Checked if the pointer NULL before use it.Gavin Wan1-8/+12
For SRIOV on some parts, the host driver does not post VBIOS. So the guest cannot get bios information. Therefore, adev->virt.fw_reserve.p_pf2vf and adev->mode_info.atom_context are NULL. Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix unmapping of apertureLijo Lazar1-1/+1
When aperture size is zero, there is no mapping done. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: change the print level to warn for ip block disabledLe Ma1-1/+1
Avoid to mislead users as it's not a real error. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add indirect r/w interface for smn address greater than 32bitsLe Ma1-0/+98
On multiple AIDs platform, bit[34:32] in SMD address is leveraged to access nonAID0 register smn address and new PCI_INDEX_HI register is introduced to access the higher bits. v2: rebase on latest register accessors (Alex) Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: assign the doorbell index in 1st page to sdma page queueLe Ma1-1/+2
Previously for vega10, the sdma_doorbell_range is only enough for sdma gfx queue, thus the index on second doorbell page is allocated for sdma page queue. From vega20, the sdma_doorbell_range on 1st page is enlarged. Therefore, just leverage these index instead of allocation on 2nd page. v2: change "(x << 1) + 2" to "(x + 1) << 1" for readability and add comments. Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Use new atomfirmware init for GC 9.4.3Lijo Lazar1-1/+2
Use the new atomfirmware initialization logic for GC 9.4.3 based ASICs also. ASIC init logic doesn't consider boot clocks during init. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/: add more macro to support offset variantJames Zhu1-0/+28
Add more macro to support offset variant and simplify macro SOC15_WAIT_ON_RREG. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add sysfs node for compute partition modeLe Ma1-0/+1
Add current/available compute partitin mode sysfs node. v2: make the sysfs node as IP independent one in amdgpu_gfx.c Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)Lin.Cao1-1/+5
v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we should set shadow as vmbo->shadow to recover vmbo->bo v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow) continue; Fixes: e18aaea733da ("drm/amdgpu: move shadow_list to amdgpu_bo_vm") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: set gfx9 onwards APU atomics support to be trueYifan Zhang1-0/+6
APUs w/ gfx9 onwards doesn't reply on PCIe atomics, rather it is internal path w/ native atomic support. Set have_atomics_support to true. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Use the default reset when loading or reloading the driverlyndonli1-0/+7
Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-21drm/amd/amdgpu: Fix style errors in amdgpu_drv.c & amdgpu_device.cSrinivasan Shanmugam1-32/+33
Fix following checkpatch style errors in amdgpu_drv.c & amdgpu_device.c ERROR: exactly one space required after that #ifdef ERROR: spaces required around that '+=' (ctx:WxV) ERROR: space required before the open brace '{' ERROR: spaces required around that '||' (ctx:VxE) ERROR: space prohibited before that close parenthesis ')' ERROR: space required before the open parenthesis '(' ERROR: space required before the open brace '{' ERROR: code indent should use tabs where possible Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: release gpu full access after "amdgpu_device_ip_late_init"Chong Li1-15/+17
[WHY] Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is an atomic context. We shouldn't access registers through KIQ since "msleep()" may be called in "amdgpu_kiq_rreg()". [HOW] Move function "amdgpu_virt_release_full_gpu()" after function "amdgpu_device_ip_late_init()", to ensure that registers be accessed through RLCG instead of KIQ. Call Trace: <TASK> show_stack+0x52/0x69 dump_stack_lvl+0x49/0x6d dump_stack+0x10/0x18 __schedule_bug.cold+0x4f/0x6b __schedule+0x473/0x5d0 ? __wake_up_klogd.part.0+0x40/0x70 ? vprintk_emit+0xbe/0x1f0 schedule+0x68/0x110 schedule_timeout+0x87/0x160 ? timer_migration_handler+0xa0/0xa0 msleep+0x2d/0x50 amdgpu_kiq_rreg+0x18d/0x1f0 [amdgpu] amdgpu_device_rreg.part.0+0x59/0xd0 [amdgpu] amdgpu_device_rreg+0x3a/0x50 [amdgpu] amdgpu_sriov_rreg+0x3c/0xb0 [amdgpu] gfx_v10_0_set_gfx_eop_interrupt_state.constprop.0+0x16c/0x190 [amdgpu] gfx_v10_0_set_eop_interrupt_state+0xa5/0xb0 [amdgpu] amdgpu_irq_update+0x53/0x80 [amdgpu] amdgpu_irq_get+0x7c/0xb0 [amdgpu] amdgpu_fence_driver_hw_init+0x58/0x90 [amdgpu] amdgpu_device_init.cold+0x16b7/0x2022 [amdgpu] Signed-off-by: Chong Li <chongli2@amd.com> Reviewed-by: JingWen.Chen2@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: skip kfd-iommu suspend/resume for S0ixAaron Liu1-3/+5
GFX is in gfxoff mode during s0ix so we shouldn't need to actually execute kfd_iommu_suspend/kfd_iommu_resume operation. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: rename num_doorbellsShashank Sharma1-11/+11
Rename doorbell.num_doorbells to doorbell.num_kernel_doorbells to make it more readable. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Acked-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-12drm/amdkfd: Check PCIe atomics support on GFX11 to set CP_HQD_HQ_STATUS0[29]Sreekant Somasekharan1-1/+1
CP_HQD_HQ_STATUS0[29] bit will be used by CPFW to acknowledge whether PCIe atomics are supported. The default value of this bit is set to 0. Driver will check whether PCIe atomics are supported and set the bit to 1 if supported. This will force CPFW to use real atomic ops. If the bit is not set, CPFW will default to read/modify/write using the firmware itself. This is applicable only to GFX11 RS64 CP with MEC FW >= 509. If MEC FW < 509 and for all GFX11 F32 CP, PCIe atomics needs to be supported else it will skip the device. This commit also involves moving amdgpu_amdkfd_device_probe() function call after per-IP early_init loop in amdgpu_device_ip_early_init() function so as to check for RS64 enabled device. Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com> Reviewed-by: Graham Sider <Graham.Sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-12drm/amd/amdgpu: Drop the hang limit parameterSrinivasan Shanmugam1-1/+1
The driver doesn't resubmit jobs on hangs any more, hence drop the hang limit parameter - amdgpu_job_hang_limit, wherever it is used. Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-31drm/amdgpu: Add JPEG IP block to SRIOV reinitYifan Zha1-1/+2
[Why] Reset(mode1) failed as JPRG IP did not reinit under sriov. [How] Add JPEG IP block to sriov reinit function. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Horace Chen <Horace.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-22drm/amdgpu: resume ras for gfx v11_0_3 during reset on SRIOVYiPeng Chai1-2/+3
Gfx v11_0_3 supports ras on SRIOV, so need to resume ras during reset. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-22drm/amdgpu: reinit mes ip block during reset on SRIOVYiPeng Chai1-0/+1
Reinit mes ip block during reset on SRIOV. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-22drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD NaviKai-Heng Feng1-0/+15
S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default"). The root cause is still not clear for now. So extend and apply the ASPM quirk from commit e02fe3bc7aba ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to workaround the issue on Navi cards too. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>