kernel/linux.git/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c, branch v6.12.80

drm/amdkfd: Use the correct wptr size

2024-12-09T09:41:14+00:00

commit cdc6705f98ea3f854a60ba8c9b19228e197ae384 upstream. Write pointer could be 32-bit or 64-bit. Use the correct size during initialization. Signed-off-by: Lijo Lazar Acked-by: Alex Deucher Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Use device based logging for errors

2024-07-01T20:10:47+00:00

Convert some pr_* to some dev_* APIs to identify the device. Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu/kfd: remove is_hws_hang and is_resetting

2024-06-14T20:15:58+00:00

is_hws_hang and is_resetting serves pretty much the same purpose and they all duplicates the work of the reset_domain lock, just check that directly instead. This also eliminate a few bugs listed below and get rid of dqm->ops.pre_reset. kfd_hws_hang did not need to avoid scheduling another reset. If the on-going reset decided to skip GPU reset we have a bad time, otherwise the extra reset will get cancelled anyway. remove_queue_mes forgot to check is_resetting flag compared to the pre-MES path unmap_queue_cpsch, so it did not block hw access during reset correctly. Signed-off-by: Yunxiang Li Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Skip packet submission on fatal error

2024-02-26T16:14:31+00:00

If fatal error is detected, packet submission won't go through. Return error in such cases. Also, avoid waiting for fence when fatal error is detected. Signed-off-by: Lijo Lazar Reviewed-by: Asad Kamal Signed-off-by: Alex Deucher

drm/amdkfd: Introduce kfd_node struct (v5)

2023-06-09T13:42:27+00:00

Introduce a new structure, kfd_node, which will now represent a compute node. kfd_node is carved out of kfd_dev structure. kfd_dev struct now will become the parent of kfd_node, and will store common resources such as doorbells, GTT sub-alloctor etc. kfd_node struct will store all resources specific to a compute node, such as device queue manager, interrupt handling etc. This is the first step in adding compute partition support in KFD. v2: introduce kfd_node struct to gc v11 (Hawking) v3: make reference to kfd_dev struct through kfd_node (Morris) v4: use kfd_node instead for kfd isr/mqd functions (Morris) v5: rebase (Alex) Signed-off-by: Mukul Joshi Tested-by: Amber Lin Reviewed-by: Felix Kuehling Signed-off-by: Hawking Zhang Signed-off-by: Morris Zhang Signed-off-by: Alex Deucher

drm/amdkfd: update SPDX license header

2022-02-14T20:08:40+00:00

Update the SPDX License header for all the KFD files. Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: Alex Deucher

drm/amdkfd: add kfd_device_info_init function

2021-12-01T21:15:37+00:00

Initializes kfd->device_info given either asic_type (enum) if GFX version is less than GFX9, or GC IP version if greater. Also takes in vf and the target compiler gfx version. Uses SDMA version to determine num_sdma_queues_per_engine. Convert device_info to a non-pointer member of kfd, change references accordingly. Change unsupported asic condition to only probe f2g, move device_info initialization post-switch. Signed-off-by: Graham Sider Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: replace asic_family with asic_type

2021-11-17T22:10:01+00:00

asic_family was a duplicate of asic_type, both of type amd_asic_type. Replace all instances of device_info->asic_family with adev->asic_type and remove asic_family from device_info. Signed-off-by: Graham Sider Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Remove cu mask from struct queue_properties(v2)

2021-10-28T18:26:13+00:00

Actually, cu_mask has been copied to mqd memory and does't have to persist in queue_properties. Remove it from queue_properties. And use struct mqd_update_info to store such properties, then pass it to update queue operation. v2: * Rename pqm_update_queue to pqm_update_queue_properties. * Rename struct queue_update_info to struct mqd_update_info. * Rename pqm_set_cu_mask to pqm_update_mqd. Suggested-by: Felix Kuehling Signed-off-by: Lang Yu Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Enable over-subscription with >1 GWS queue

2020-04-28T20:20:30+00:00

The current GWS usage model will only allows a single GWS-enabled process to be active on the GPU at once. This ensures that a barrier-using kernel gets a known amount of GPU hardware, to prevent deadlock due to inability to go beyond the GWS barrier. The HWS watches how many GWS entries are assigned to each process, and goes into over-subscription mode when two processes need more than the 64 that are available. The current KFD method for working with this is to allocate all 64 GWS entries to each GWS-capable process. When more than one GWS-enabled process is in the runlist, we must make sure the runlist is in over-subscription mode, so that the HWS gets a chained RUN_LIST packet and continues scheduling kernels. Signed-off-by: Joseph Greathouse Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling Signed-off-by: Alex Deucher