kernel/linux.git/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c, branch v7.0-rc7

drm/amdgpu/userq: fix memory leak in MQD creation error paths

2026-03-30T20:13:47+00:00

In mes_userq_mqd_create(), the memdup_user() allocations for IP-specific MQD structs are not freed when subsequent VA validation fails. The goto free_mqd label only cleans up the MQD BO object and userq_props. Fix by adding kfree() before each goto free_mqd on VA validation failure in the COMPUTE, GFX, and SDMA branches. Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size") Reported-by: Yuhao Jiang Signed-off-by: Junrui Luo Reviewed-by: Prike Liang Signed-off-by: Alex Deucher (cherry picked from commit 27f5ff9e4a4150d7cf8b4085aedd3b77ddcc5d08) Cc: stable@vger.kernel.org

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook

drm/amdgpu: Use AMDGPU_MQD_SIZE_ALIGN in KGD

2026-01-29T17:26:55+00:00

Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue. Signed-off-by: Lang Yu Reviewed-by: David Belanger Reviewed-by: Hawking Zhang Reviewed-by: Mukul Joshi Signed-off-by: Alex Deucher

drm/amdgpu: do not use amdgpu_bo_gpu_offset_no_check individually

2025-12-16T18:27:13+00:00

This should not be used indiviually, use amdgpu_bo_gpu_offset with bo reserved. v3 - unpin bo in queue destroy (Christian) v2 - pin bo so that offset returned won't change after unlock (Christian) Signed-off-by: Saleemkhan Jamadar Suggested-by: Christian König Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Change user queue interface signatures

2025-12-08T18:56:39+00:00

A userq is associated with its queue manager. Use that and make the userqueue interfaces to operate on queue. Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Update vm start, end, hole to support 57bit address

2025-12-08T18:56:30+00:00

Change gmc macro AMDGPU_GMC_HOLE_START/END/MASK to 57bit if vm root level is PDB3 for 5-level page tables. The macro access adev without passing adev as parameter is to minimize the code change to support 57bit, then we have to add adev variable in several places to use the macro. Because adev definition is not available in all amdgpu c files which include amdgpu_gmc.h, change inline function amdgpu_gmc_sign_extend to macro. Signed-off-by: Philip Yang Acked-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdgpu/mes: add multi-xcc support

2025-12-08T18:56:29+00:00

a. extend mes pipe instances to num_xcc * max_mes_pipe b. initialize mes schq/kiq pipes per xcc c. submit mes packet to mes ring according to xcc_id v2: rebase (Alex) Signed-off-by: Jack Xiao Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher

drm/amdgpu: resume MES scheduling after user queue hang detection and recovery

2025-11-12T02:54:17+00:00

This patch ensures the Micro-Engine Scheduler (MES) is properly resumed after detecting and recovering from a user queue hang condition. Key changes: 1. Track when a hung user queue is detected using found_hung_queue flag 2. Call amdgpu_mes_resume() to restart MES scheduling after completing the hang recovery process 3. This complements the existing recovery steps (fence force completion and device wedging) by ensuring the scheduler can process new work Without this resume call, the MES scheduler may remain in a paused state even after the hung queue has been handled, preventing newly submitted work from being processed and leading to system stalls. Acked-by: Alex Deucher Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu/userq: fix SDMA and compute validation

2025-10-28T13:59:48+00:00

The CSA and EOP buffers have different alignement requirements. Hardcode them for now as a bug fix. A proper query will be added in a subsequent patch. v2: verify gfx shadow helper callback (Prike) Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size") Reviewed-by: Prike Liang Signed-off-by: Alex Deucher