kernel/linux.git/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c, branch v6.19.11

drm/amdgpu: resume MES scheduling after user queue hang detection and recovery

2025-11-12T02:54:17+00:00

This patch ensures the Micro-Engine Scheduler (MES) is properly resumed after detecting and recovering from a user queue hang condition. Key changes: 1. Track when a hung user queue is detected using found_hung_queue flag 2. Call amdgpu_mes_resume() to restart MES scheduling after completing the hang recovery process 3. This complements the existing recovery steps (fence force completion and device wedging) by ensuring the scheduler can process new work Without this resume call, the MES scheduler may remain in a paused state even after the hung queue has been handled, preventing newly submitted work from being processed and leading to system stalls. Acked-by: Alex Deucher Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu/userq: fix SDMA and compute validation

2025-10-28T13:59:48+00:00

The CSA and EOP buffers have different alignement requirements. Hardcode them for now as a bug fix. A proper query will be added in a subsequent patch. v2: verify gfx shadow helper callback (Prike) Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size") Reviewed-by: Prike Liang Signed-off-by: Alex Deucher

drm/amdgpu: Convert amdgpu userqueue management from IDR to XArray

2025-10-28T13:59:22+00:00

This commit refactors the AMDGPU userqueue management subsystem to replace IDR (ID Allocation) with XArray for improved performance, scalability, and maintainability. The changes address several issues with the previous IDR implementation and provide better locking semantics. Key changes: 1. **Global XArray Introduction**: - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking - Uses doorbell_index as key for efficient global lookup - Replaces the previous `userq_mgr_list` linked list approach 2. **Per-process XArray Conversion**: - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr` - Maintains per-process queue tracking with queue_id as key - Uses XA_FLAGS_ALLOC for automatic ID allocation 3. **Locking Improvements**: - Removed global `userq_mutex` from `struct amdgpu_device` - Replaced with fine-grained XArray locking using XArray's internal spinlocks 4. **Runtime Idle Check Optimization**: - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty 5. **Queue Management Functions**: - Converted all IDR operations to equivalent XArray functions: - `idr_alloc()` → `xa_alloc()` - `idr_find()` → `xa_load()` - `idr_remove()` → `xa_erase()` - `idr_for_each()` → `xa_for_each()` Benefits: - **Performance**: XArray provides better scalability for large numbers of queues - **Memory Efficiency**: Reduced memory overhead compared to IDR - **Thread Safety**: Improved locking semantics with XArray's internal spinlocks v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa Remove xa_lock and use its own lock. v3: Set queue->userq_mgr = uq_mgr in amdgpu_userq_create() v4: use xa_store_irq (Christian) hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian) Acked-by: Alex Deucher Suggested-by: Christian König Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu: add userq object va track helpers

2025-10-13T18:14:33+00:00

Add the userq object virtual address list_add() helpers for tracking the userq obj va address usage. Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: fix hung reset queue array memory allocation

2025-10-13T18:14:28+00:00

By design the MES will return an array result that is twice the number of hung doorbells it can report. i.e. if up k reported doorbells are supported, then the second half of the array, also of length k, holds the HQD information (type/queue/pipe) where queue 1 corresponds to index 0 and k, queue 2 corresponds to index 1 and k + 1 etc ... The driver will use the HDQ info to target queue/pipe reset for hardware scheduled user compute queues. Signed-off-by: Jonathan Kim Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: adjust MES API used for suspend and resume

2025-09-15T21:02:28+00:00

Use the suspend and resume API rather than remove queue and add queue API. The former just preempts the queue while the latter remove it from the scheduler completely. There is no need to do that, we only need preemption in this case. V2: replace queue_active with queue state v3: set the suspend_fence_addr v4: allocate another per queue buffer for the suspend fence, and set the sequence number. also wait for the suspend fence. (Alex) v5: use a wb slot (Alex) v6: Change the timeout period. For MES, the default timeout is 2100000; /* 2100 ms */ (Alex) Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu: validate userq buffer virtual address and size

2025-09-15T20:52:15+00:00

It needs to validate the userq object virtual address to determine whether it is residented in a valid vm mapping. Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: validate userq input args

2025-09-09T20:17:57+00:00

This will help on validating the userq input args, and rejecting for the invalid userq request at the IOCTLs first place. Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu/userq: add a detect and reset callback

2025-09-05T21:38:39+00:00

Add a detect and reset callback and add the implementation for mes. The callback will detect all hung queues of a particular ip type (e.g., GFX or compute or SDMA) and reset them. v2: increase reset counter and set fence force completion v3: Removed userq_mutex in mes_userq_detect_and_reset since the driver holds it when calling Reviewed-by: Alex Deucher Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu/userq: use consistent function naming

2025-04-22T12:51:46+00:00

s/userqueue/userq/ 1. remove the mix of amdgpu_userqueue and amdgpu_userq 2. to be consistent with other amdgpu_userq_fence.c 3. it's shorter Reviewed-by: Prike Liang Signed-off-by: Alex Deucher