kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h, branch v6.6.132

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

2022-10-19T02:08:33+00:00

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957. This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it. Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher

drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-29T13:43:52+00:00

For xgmi sriov, the reset is handled by host driver and hive->reset_domain is not initialized so need to check if it exists before doing a put. Signed-off-by: Vignesh Chander Reviewed-by: Shaoyun Liu Signed-off-by: Alex Deucher

drm/amdgpu: Adjust removal control flow for smu v13_0_2

2022-09-19T19:17:20+00:00

Adjust removal control flow for smu v13_0_2: During amdgpu uninstallation, when removing the first device, the kernel needs to first send a mode1reset message to all gpu devices. Otherwise, smu initialization will fail the next time amdgpu is installed. V2: 1. Update commit comments. 2. Remove the global variable amdgpu_device_remove_cnt and add a variable to the structure amdgpu_hive_info. 3. Use hive to detect the first removed device instead of a global variable. V3: 1. Update commit comments. 2. Split a patch into multiple patches. 3. The current patch does: a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into the existing gpu recover path, which make all devices in hive list only have HW reset but no resume (except the base IP). b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover in amdgpu_pci_remove when removing the first device in hive list. c. When removing the first device, the IP blocks keyword function call sequence is as follows: .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini. ^ | |-<----------<---------<----| The first three sequences are because of a call to amdgpu_device_gpu_recover. The three sequences will be executed in a loop until all devices in the hive list are iterated. The sequences starting from .hw_fini only apply to the first device. Since .suspend has been called before, except the resumed phase1 basic ip blocks, all other ip blocks .hw_fini of current device will do nothing. d. When removing other devices, the calling sequences is the same as legacy: .hw_fini -> .early_fini -> .sw_fini. Since .suspend has been called when removing the first device, except the resumed phase1 basic ip blocks, all of other ip blocks .hw_fini of current device will do nothing. Signed-off-by: YiPeng Chai Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher

drm/amdgpu: let mode2 reset fallback to default when failure

2022-08-16T22:14:31+00:00

- introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed v2: move this part out from the asic specific part Signed-off-by: Victor Zhao Acked-by: Andrey Grodzovsky Signed-off-by: Alex Deucher

drm/amdgpu: Avoid another list of reset devices

2022-08-10T19:07:14+00:00

A list of devices to be reset is already created in amdgpu_device_gpu_recover function. Creating another list with the same nodes is incorrect and not supported in list_head. Instead, pass the device list as part of reset context. Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2) Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher

drm/amdgpu: Cache result of last reset at reset domain level.

2022-06-10T19:25:34+00:00

Will be read by executors of async reset like debugfs. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Fix compile error.

2022-02-10T09:23:40+00:00

Seems I forgot to add this to the relevant commit when submitting. Signed-off-by: Andrey Grodzovsky Reported-by: kernel test robot Reviewed-by: Christian König Signed-off-by: Christian König Link: https://patchwork.freedesktop.org/patch/msgid/20220210031724.440943-1-andrey.grodzovsky@amd.com

drm/amdgpu: Rework amdgpu_device_lock_adev

2022-02-09T17:18:39+00:00

This functions needs to be split into 2 parts where one is called only once for locking single instance of reset_domain's sem and reset flag and the other part which handles MP1 states should still be called for each device in XGMI hive. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König Link: https://www.spinics.net/lists/amd-gfx/msg74118.html

drm/amdgpu: Move in_gpu_reset into reset_domain

2022-02-09T17:17:57+00:00

We should have a single instance per entrire reset domain. Signed-off-by: Andrey Grodzovsky Suggested-by: Lijo Lazar Reviewed-by: Christian König Link: https://www.spinics.net/lists/amd-gfx/msg74116.html

drm/amdgpu: Move reset sem into reset_domain

2022-02-09T17:17:32+00:00

We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König Link: https://www.spinics.net/lists/amd-gfx/msg74117.html