kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c, branch v5.10.28

drm/amdgpu: fix compute queue priority if num_kcq is less than 4

2020-12-30T10:53:09+00:00

[ Upstream commit 3f66bf401e9fde1c35bb8b02dd7975659c40411d ] Compute queues are configurable with module param, num_kcq. amdgpu_gfx_is_high_priority_compute_queue was setting 1st 4 queues to high priority queue leaving a null drm scheduler in adev->gpu_sched[hw_ip]["normal_prio"].sched if num_kcq < 5. This patch tries to fix it by alternating compute queue priority between normal and high priority. Fixes: 33abcb1f5a1719b1c (drm/amdgpu: set compute queue priority at mqd_init) Signed-off-by: Nirmoy Das Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: Avoid accessing HW when suspending SW state

2020-09-15T21:24:39+00:00

At this point the ASIC is already post reset by the HW/PSP so the HW not in proper state to be configured for suspension, some blocks might be even gated and so best is to avoid touching it. v2: Rename in_dpc to more meaningful name Signed-off-by: Andrey Grodzovsky Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: refine message print for devices of hive

2020-08-24T16:24:06+00:00

Using dev_xxx instead of DRM_xxx/pr_xxx to indicate which device of a hive is the message for. Reviewed-by: Christian König Signed-off-by: Dennis Li Signed-off-by: Alex Deucher

drm/amdgpu: refine codes to avoid reentering GPU recovery

2020-08-24T16:22:56+00:00

if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: Hawking Zhang Signed-off-by: Dennis Li Signed-off-by: Alex Deucher

drm/amdgpu: revert "fix system hang issue during GPU reset"

2020-08-14T20:22:40+00:00

The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König Acked-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: reconfigure spm golden settings on Navi1x after GFXOFF exit(v3)

2020-08-14T20:22:39+00:00

On Navi1x, the SPM golden settings are lost after GFXOFF enter/exit, so reconfigure the golden settings after GFXOFF exit. Reviewed-by: Hawking Zhang Signed-off-by: Tianci.Yin Signed-off-by: Alex Deucher

drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)

2020-08-04T21:27:29+00:00

what: the MQD's save and restore of KCQ (kernel compute queue) cost lots of clocks during world switch which impacts a lot to multi-VF performance how: introduce a paramter to control the number of KCQ to avoid performance drop if there is no kernel compute queue needed notes: this paramter only affects gfx 8/9/10 v2: refine namings v3: choose queues for each ring to that try best to cross pipes evenly. v4: fix indentation some cleanupsin the gfx_compute_queue_acquire() v5: further fix on indentations more cleanupsin gfx_compute_queue_acquire() TODO: in the future we will let hypervisor driver to set this paramter automatically thus no need for user to configure it through modprobe in virtual machine Signed-off-by: Monk Liu Reviewed-by: Felix Kuehling Acked-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: fix system hang issue during GPU reset

2020-07-27T20:21:37+00:00

when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: Hawking Zhang Suggested-by: Andrey Grodzovsky Suggested-by: Christian König Suggested-by: Felix Kuehling Suggested-by: Lijo Lazar Suggested-by: Luben Tukov Signed-off-by: Dennis Li Signed-off-by: Alex Deucher

drm/amdgpu: add read amdgpu_gfxoff status in debugfs

2020-07-21T19:37:37+00:00

Add interface for SMU12 device, used by UMR. v2: fix code style Signed-off-by: Jinzhou.Su Reviewed-by: Evan Quan Reviewed-by: Huang Rui Signed-off-by: Alex Deucher

drm/amdgpu: Rename amdgpu_gfx_kcq_queue_mask_transform()

2020-05-01T19:19:07+00:00

Rename it to amdgpu_queue_mask_bit_to_set_resource_bit() to be more specific about its functionality. KFD will use it later. Signed-off-by: Yong Zhao Signed-off-by: Alex Deucher