summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2021-03-24drm/amdgpu: nuke the ih reentrant lockChristian König3-7/+0
Interrupts on are non-reentrant on linux. This is just an ancient leftover from radeon where irq processing was kicked of from different places. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdkfd: Fix recursive lock warningsFelix Kuehling1-3/+3
memalloc_nofs_save/restore are no longer sufficient to prevent recursive lock warnings when holding locks that can be taken in MMU notifiers. Use memalloc_noreclaim_save/restore instead. Fixes: f920e413ff9c ("mm: track mmu notifiers in fs_reclaim_acquire/release") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: fix send ras disable cmd when asic not support rasStanley.Yang4-15/+62
cause: It is necessary to send ras disable command to ras-ta during gfx block ras later init, because the ras capability is disable read from vbios for vega20 gaming, but the ras context is released during ras init process, this will cause send ras disable command to ras-to failed. how: Delay releasing ras context, the ras context will be released after gfx block later init done. Changed from V1: move release_ras_context into ras_resume Changed from V2: check BIT(UMC) is more reasonable before access eeprom table Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: update ecc query support for arcturusHawking Zhang1-6/+10
arcturus and sienna_cichlid share the same version of umc_info interface (umc_info v33). arcturus uses umc_config to indicate ECC capability, while sienna_cichlid uses umc_config1 to indicate ECC capability. driver needs to check either umc_config or umc_config1 to decide ECC capability for ASICs that use umc_info v33 interface. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: use the new cursor in the VM codeChristian König1-37/+18
Separate the drm_mm_node walking from the actual handling. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: use the new cursor in amdgpu_ttm_bo_eviction_valuableChristian König1-6/+8
Separate the drm_mm_node walking from the actual handling. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: use new cursor in amdgpu_mem_visibleChristian König1-4/+6
Separate the drm_mm_node walking from the actual handling. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: use the new cursor in amdgpu_ttm_access_memoryChristian König1-49/+18
Separate the drm_mm_node walking from the actual handling. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: use new cursor in amdgpu_ttm_io_mem_pfnChristian König1-5/+3
Separate the drm_mm_node walking from the actual handling. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: use the new cursor in amdgpu_fill_bufferChristian König1-50/+15
Separate the drm_mm_node walking from the actual handling. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: use the new cursor in amdgpu_ttm_copy_mem_to_memChristian König1-61/+26
Separate the drm_mm_node walking from the actual handling. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: new resource cursor (v2)Christian König1-0/+105
Allows to walk over the drm_mm nodes in a TTM resource object. v2: squash in fix from Felix Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Free PDB0 bo before bo_finiLijo Lazar1-2/+1
Cleanup pdb0 bo before bo_fini gets called Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: support query ecc cap for SIENNA_CICHLIDHawking Zhang2-7/+25
driver needs to query umc_info_v3_3 for ecc capability in sienna_cichlid Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: fix a few compiler warningsOak Zeng2-3/+3
1. make function mmhub_v1_7_setup_vm_pt_regs static 2. indent a if statement Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: fix compile error on architecture s390 (v2)Oak Zeng1-0/+2
ioremap_cache is not supported on some architecture such as s390. Put the codes into a #ifdef to fix some compile error reported by test robot. v2: squash in non-x86 fix Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reported-by: Kernel test robot <lkp@intel.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24Revert "drm/amdgpu: During compute disable GFXOFF for Sienna_Cichlid"Harish Kasiviswanathan1-7/+0
This reverts commit 73bf5cad2696fe3a21f70101821405db839ea18e. Fixed in newer firmware Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Replace in_task() in gfx_v8_0_parse_sq_irq()Sebastian Andrzej Siewior1-4/+5
gfx_v8_0_parse_sq_irq() is using in_task() to distinguish if it is invoked from a workqueue worker or directly from the interrupt handler. The usage of in_interrupt() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be separated or the context be conveyed in an argument passed by the caller, which usually knows the context. gfx_v8_0_parse_sq_irq() is invoked directly either from a worker or from the interrupt service routine. The worker is only bypassed if the worker is already busy. Add an argument `from_wq' to gfx_v8_0_parse_sq_irq() which is true if invoked from the worker. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Remove in_interrupt() usage in gfx_v9_0_kiq_read_clock()Sebastian Andrzej Siewior1-1/+1
gfx_v9_0_get_gpu_clock_counter() acquires a mutex_t lock and is the only caller of gfx_v9_0_kiq_read_clock(). If it safe to acquire a mutex_t then gfx_v9_0_get_gpu_clock_counter() is always invoked from preemptible context. Remove in_interrupt() because it superfluous as it will always return false. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Replace in_interrupt() usage in gmc_v*_process_interrupt()Sebastian Andrzej Siewior2-2/+2
The usage of in_interrupt() in gmc_v*_process_interrupt() is intended to use a different code path if invoked from the interrupt handler vs invoked from the workqueue. The usage of in_interrupt() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be separated or the context be conveyed in an argument passed by the caller, which usually knows the context. gmc_v*_process_interrupt() is invoked via the ->process() callback from amdgpu_ih_process() which in turn is invoked either from amdgpu_irq_handler() (the interrupt handler) or from amdgpu_irq_handle_*() which is a workqueue. amdgpu_irq::ih is always processed from the interrupt handler, the other three struct amdgpu_ih_ring members are processed from a workqueue. Replace the in_interrupt() check with a comparison against adev->irq.ih. A similar check is already done to check if the ih pointer is from ih_soft. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Fix spelling mistake "disabed" -> "disabled"Colin Ian King1-1/+1
There is a spelling mistake in a drm debug message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: update secure display TA headerJinzhou Su2-0/+4
update secure display TA header file. Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu:disable XGMI TA unload for A+A aldebaranFeifei Xu1-2/+3
In gpu reset, XGMI TA unload will cause gpu hang. Skip it on A+A aldebaran. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Enable light SBR for SMU on passthrough and XGMI configurationshaoyunl1-0/+4
SMU introduce the new interface to enable light Secondary Bus Reset mode, driver enable it on passthrough + XGMI configuration Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: skip read eeprom for device that pending on XGMI resetshaoyunl1-0/+6
Read eeprom through SMU doesn't works stable on XGMI reset during test. skip it for now Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=mAlex Deucher1-1/+1
Need to check the module variant as well. Acked-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4fshaoyunl1-1/+1
This recent change introduce SDMA interrupt info printing with irq->process function. These functions do not require a set function to enable/disable the irq Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: add ih waiter on process until checkpointJonathan Kim2-0/+54
Add IH function to allow caller to wait until ring entries are processed until the checkpoint write pointer. This will be primarily used by HMM to drain pending page fault interrupts before memory unmap to prevent HMM from handling stale interrupts. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: fb BO should be ttm_bo_type_deviceNirmoy Das1-1/+1
FB BO should not be ttm_bo_type_kernel type and amdgpufb_create_pinned_object() pins the FB BO anyway. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Reset the devices in the XGMI hive duirng probeshaoyunl6-31/+164
In passthrough configuration, hypervisior will trigger the SBR(Secondary bus reset) to the devices without sync to each other. This could cause device hang since for XGMI configuration, all the devices within the hive need to be reset at a limit time slot. This serial of patches try to solve this issue by co-operate with new SMU which will only do minimum house keeping to response the SBR request but don't do the real reset job and leave it to driver. Driver need to do the whole sw init and minimum HW init to bring up the SMU and trigger the reset(possibly BACO) on all the ASICs at the same time Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Add reset_list for device list used for resetshaoyunl2-15/+19
The gmc.xgmi.head list originally is designed for device list in the XGMI hive. Mix use it for reset purpose will prevent the reset function to adjust XGMI device list which is required in next change Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Init the cp MQD if it's not be initialized beforeshaoyunl1-3/+17
The MQD might not be initialized duirng first init period if the device need to be reset druing probe. Driver need to proper init them in gpu recovery period Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Add kfd init_complete flag to check from amdgpu sideshaoyunl3-2/+11
amdgpu driver may be in reset state during init which will not initialize the kfd, driver need to initialize the KFD after reset by check the flag Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24Revert "drm/amdgpu: add psp RAP L0 check support"Kevin Wang1-9/+1
This reverts commit d86fd724e59af96ae5ab6630f4f07a076e9b80cd. Disable PSP RAP L0 self test until to RAP feature ready. Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Verify bo size can fit framebuffer size on init.Mark Yacoub3-15/+65
To initialize the framebuffer, call drm_gem_fb_init_with_funcs which verifies that the BO size can fit the FB size by calculating the minimum expected size of each plane. The bug was caught using igt-gpu-tools test: kms_addfb_basic.too-high and kms_addfb_basic.bo-too-small Tested on ChromeOS Zork by turning on the display and running a YT video. === Changes from v1 === 1. Added new line under declarations. 2. Use C style comment. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Sean Paul <seanpaul@chromium.org> Signed-off-by: Mark Yacoub <markyacoub@chromium.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Check if FB BAR is enabled for ROM readLijo Lazar1-0/+4
Some configurations don't have FB BAR enabled. Avoid reading ROM image from FB BAR region in such cases. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Set GTT_USWC flag to enable freesync v2Shashank Sharma1-3/+11
This patch sets 'AMDGPU_GEM_CREATE_CPU_GTT_USWC' as input parameter flag, during object creation of an imported DMA buffer. In absence of this flag: 1. Function amdgpu_display_supported_domains() doesn't add AMDGPU_GEM_DOMAIN_GTT as supported domain. 2. Due to which, Function amdgpu_display_user_framebuffer_create() refuses to create framebuffer for imported DMA buffers. 3. Due to which, AddFB() IOCTL fails. 4. Due to which, amdgpu_present_check_flip() check fails in DDX 5. Due to which DDX driver doesn't allow flips (goes to blitting) 6. Due to which setting Freesync/VRR property fails for PRIME buffers. So, this patch finally enables Freesync with PRIME buffer offloading. v2 (chk): instead of just checking the flag we copy it over if the exporter is an amdgpu device as well. Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: clean-up unused variableShashank Sharma1-8/+0
Variable 'bp' seems to be unused residue from previous logic, and is not required anymore. Cc: Koenig Christian <christian.koenig@amd.com> Cc: Deucher Alexander <alexander.deucher@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24Revert freesync video patches temporarilyAurabindo Pillai2-13/+0
This temporarily reverts freesync video patches since it causes regression with eDP displays. This patch is a squashed revert of the following patches: 6f59f229f8ed ("drm/amd/display: Skip modeset for front porch change") d10cd527f5e5 ("drm/amd/display: Add freesync video modes based on preferred modes") 0eb1af2e8205 ("drm/amd/display: Add module parameter for freesync video mode") Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Anson Jacob <anson.jacob@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Increase PSP runtime TMR region sizeOak Zeng2-3/+3
Aldebaran uses more than 4M runtime TMR. The current hard coded 4M TMR is not big enough for Aldebaran. Increase it to 8M. v2: Only do 8M size for ALDEBARAN (Hawking) Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdkfd: apply uncached flag for aldebaranEric Huang1-1/+7
The flag is only applied on fine-grained memory. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: set snoop bit in pde/pte entries for A+AEric Huang1-0/+4
Page tables in vram mapping to cpu is changed from uncached to cached in A+A, the snoop bit in VM_CONTEXTx_PAGE_TABLE_BASE_ADDR/ PDE0s/PDE1s/PDE2s/PTE.TFs has to be set so gpuvm walker snoop page table data out of CPU cache. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: set CPU mapping of vram as cached for A+A modeEric Huang1-1/+4
New A+A HW supports cached vram mapped to cpu. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: harvest edc status when connected to host via xGMIDennis Li7-28/+107
When connected to a host via xGMI, system fatal errors may trigger warm reset, driver has no change to query edc status before reset. Therefore in this case, driver should harvest previous error loging registers during boot, instead of only resetting them. v2: 1. IP's ras_manager object is created when its ras feature is enabled, so change to query edc status after amdgpu_ras_late_init called 2. change to enable watchdog timer after finishing gfx edc init Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reivewed-by: Hawking Zhang <hawking.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Make noretry the default on AldebaranFelix Kuehling1-0/+1
This is needed for best machine learning performance. XNACK can still be enabled per-process if needed. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: update default timeout of Aldebaran SQ watchdogHarish Kasiviswanathan2-2/+9
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reivewed-by: Hawking Zhang <hawking.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: add psp RAP L0 check supportKevin Wang1-1/+9
add PSP RAP L0 check when RAP TA is loaded. Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: change psp_rap_invoke() function return valueKevin Wang3-16/+22
RAP TA is an optional firmware. if it doesn’t exist, the driver should bypass psp_rap_invoke() function. 1. bypass psp_rap_invoke() when RAP TA is not loaded. 2. add new parameter (status) to query RAP TA status. (the status value is different with psp_ta_invoke(), 3. fix the 'rap_status' MThread critical problem. (used without lock) Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: Let KFD use more VMIDs on AldebaranFelix Kuehling1-1/+2
When there is no graphics support, KFD can use more of the VMIDs. Graphics VMIDs are only used for video decoding/encoding and post processing. With two VCE engines, there is no reason to reserve more than 2 VMIDs for that. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: enable watchdog feature for SQ of aldebaranDennis Li8-0/+174
SQ's watchdog timer monitors forward progress, a mask of which waves caused the watchdog timeout is recorded into ras status registers and then trigger a system fatal error event. v2: 1. change *query_timeout_status to *query_sq_timeout_status. 2. move query_sq_timeout_status into amdgpu_ras_do_recovery. 3. add module parameters to enable/disable fatal error event and modify the watchdog timer. v3: 1. remove unused parameters of *enable_watchdog_timer Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>