kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v6.11.8

drm/amdgpu: sysfs node disable query error count during gpu reset

2024-07-08T20:46:14+00:00

Disable the sysfs error count query while a GPU reset is in progress.

Signed-off-by: YiPeng Chai
Reviewed-by: Stanley.Yang
Signed-off-by: Alex Deucher
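
The behaviour described above can be modelled in userspace as a "show" callback that refuses to answer while a reset flag is set. This is only a sketch under stated assumptions: `demo_device`, `demo_error_count_show` and the `-EBUSY` return value are hypothetical and are not taken from the amdgpu driver, which may report the condition differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the amdgpu structures. */
struct demo_device {
    atomic_bool in_reset;       /* set while a GPU reset is in flight */
    unsigned long error_count;  /* value the sysfs node would report */
};

/*
 * Model of a sysfs "show" callback. While a reset is in progress the
 * hardware must not be queried, so the callback bails out early.
 * Returning -EBUSY is an assumption made for this sketch.
 */
static int demo_error_count_show(struct demo_device *dev, char *buf, size_t len)
{
    if (atomic_load(&dev->in_reset))
        return -EBUSY;

    /* Returns the number of characters written, like snprintf(). */
    return snprintf(buf, len, "%lu\n", dev->error_count);
}
```

A caller would set `in_reset` before starting the reset and clear it once the reset has completed, so that readers of the node never touch hardware in between.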

drm/amdgpu: process RAS fatal error MB notification

2024-06-27T21:31:37+00:00

In the RAS error scenario, the VF guest driver checks the mailbox and sets the fed flag to avoid unnecessary HW accesses. Additionally, poll for the reset completion message first to avoid accidentally spamming the host with multiple reset requests.

v2: add another mailbox check to handle the case where kfd detects the timeout first
v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander
Reviewed-by: Zhigang Luo
Signed-off-by: Alex Deucher

drm/amdgpu: Fix pci state save during mode-1 reset

2024-06-27T21:09:46+00:00

Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset.

Fixes: 5c03e5843e6b ("drm/amdgpu:add smu mode1/2 support for aldebaran")
Signed-off-by: Lijo Lazar
Reviewed-by: Feifei Xu
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher
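
Why the ordering matters can be shown with a small userspace model: if the state is cached after bus mastering is cleared, any later restore brings back a command register with bus mastering off. The sketch below is illustrative only; `demo_pci_dev` and the `demo_*` helpers are hypothetical names, not the kernel's PCI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CMD_BUS_MASTER 0x4u  /* bit 2 of the PCI command register */

/* Hypothetical model of the part of config space relevant here. */
struct demo_pci_dev {
    uint16_t command;        /* live command register */
    uint16_t saved_command;  /* cached copy used by later restores */
    bool state_saved;
};

static void demo_save_state(struct demo_pci_dev *d)
{
    d->saved_command = d->command;
    d->state_saved = true;
}

static void demo_clear_master(struct demo_pci_dev *d)
{
    d->command = (uint16_t)(d->command & ~DEMO_CMD_BUS_MASTER);
}

static void demo_restore_state(struct demo_pci_dev *d)
{
    if (d->state_saved)
        d->command = d->saved_command;
}

/* Ordering described in the commit: cache first, then disable bus master. */
static void demo_mode1_reset_prepare(struct demo_pci_dev *d)
{
    demo_save_state(d);
    demo_clear_master(d);
}
```

With this ordering, a later `demo_restore_state()` returns the command register to its pre-reset value with bus mastering still enabled.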

drm/amdgpu: fix using the reserved VMID with gang submit

2024-06-19T16:48:00+00:00

We need to ensure that the gang members can still run in parallel even when using a reserved VMID.

Signed-off-by: Christian König
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher

drm/amdgpu: create amdgpu_ras_in_recovery to simplify code

2024-06-14T20:18:26+00:00

Reduce redundant code so that callers do not need to pay attention to RAS details.

Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher

drm/amdgpu: refine gpu_info firmware loading

2024-06-14T20:15:59+00:00

Refine gpu_info firmware loading.

Signed-off-by: Yang Wang
Reviewed-by: Christian König
Signed-off-by: Alex Deucher

drm/amdgpu: fix sriov host flr handler

2024-06-14T20:15:58+00:00

We send back the ready-to-reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen.

In the current state, since we take tens of seconds to stop everything, it is very likely that the host would give up waiting and reset the GPU before we send ready, so it would be the same as before. But this gets rid of the hack with reset_domain locking and also lets us tell from the host how slow ready-to-reset actually is. The ready-to-reset speed can be improved later.

Signed-off-by: Yunxiang Li
Acked-by: Christian König
Reviewed-by: Emily Deng
Signed-off-by: Alex Deucher

drm/amdkfd: add reset cause in gpu pre-reset smi event

2024-06-05T15:25:14+00:00

The reset cause was requested by a customer as additional info for the GPU reset SMI event.

v2: integrate reset sources suggested by Lijo Lazar

Signed-off-by: Eric Huang
Reviewed-by: Lijo Lazar
Signed-off-by: Alex Deucher

drm/amdgpu: Add lock around VF RLCG interface

2024-05-29T18:48:30+00:00

flush_gpu_tlb may be called from another thread while device_gpu_recover is running. Both of these threads access registers through the VF RLCG interface during VF Full Access. Add a lock around this interface to prevent race conditions between these threads.

Signed-off-by: Victor Skvortsov
Reviewed-by: Zhigang Luo
Signed-off-by: Alex Deucher
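
The race arises because a register access through a shared interface takes several steps (stage an address, stage a value, then trigger), and two threads interleaving those steps corrupt each other's request. A minimal userspace sketch of serializing such an interface follows. It uses a C11 `atomic_flag` spinlock as a stand-in; the lock type actually used by the driver, and all `demo_*` names, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NUM_REGS 16u

/* Hypothetical model of a multi-step, shared register interface. */
struct demo_rlcg {
    atomic_flag lock;        /* userspace stand-in for a kernel lock */
    uint32_t scratch_addr;   /* step 1: staged register offset */
    uint32_t scratch_data;   /* step 2: staged value */
    uint32_t regs[DEMO_NUM_REGS];
};

static void demo_lock(struct demo_rlcg *r)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;
}

static void demo_unlock(struct demo_rlcg *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/*
 * The whole stage-and-trigger sequence runs under the lock, so a second
 * thread cannot overwrite scratch_addr/scratch_data halfway through.
 */
static uint32_t demo_rlcg_rw(struct demo_rlcg *r, uint32_t offset,
                             uint32_t value, int is_write)
{
    uint32_t ret;

    demo_lock(r);
    r->scratch_addr = offset % DEMO_NUM_REGS;
    r->scratch_data = value;
    if (is_write)
        r->regs[r->scratch_addr] = r->scratch_data;
    ret = r->regs[r->scratch_addr];
    demo_unlock(r);

    return ret;
}
```

Both the TLB flush path and the recovery path would call `demo_rlcg_rw()` in this model, so neither can observe the other's half-staged request.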

drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()

2024-05-29T18:08:44+00:00

Use current speed/width on devices which don't support dynamic PCIe switching.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König
Signed-off-by: Alex Deucher
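
The adjusted selection can be sketched as a simple branch on whether dynamic switching is supported. The commit text only states the non-switching case; falling back to the link capabilities when switching is supported is an assumption of this sketch, and `demo_dev`, `demo_link` and `demo_partner_bandwidth` are hypothetical names rather than the driver's own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a PCIe link: generation and lane count. */
struct demo_link {
    int speed_gen;
    int width;
};

struct demo_dev {
    bool dynamic_switching;  /* can the link change speed/width at runtime? */
    struct demo_link cap;    /* maximum the link is capable of */
    struct demo_link cur;    /* what the link is currently trained to */
};

/*
 * Without dynamic switching the link stays at its current speed/width,
 * so that is the value to report. Using the capabilities otherwise is
 * an assumption made for this sketch.
 */
static struct demo_link demo_partner_bandwidth(const struct demo_dev *d)
{
    return d->dynamic_switching ? d->cap : d->cur;
}
```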