kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v7.0.12

drm/amdkfd: Check for pdd drm file first in CRIU restore path

2026-06-09T10:32:47+00:00

commit 6842b6a4b72da9b2906ffc5ca9d846ace2c54c14 upstream. CRIU restore ioctls are meant to be called by CRIU with no existing drm file. There's an error path for if the drm file unexpectedly exists. It was positioned so it was missing a fput(drm_file). Do that check earlier, as soon as we have the pdd. Signed-off-by: David Francis Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 2bab781dac78916c5cc8de76345a4102449267d7) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger

2026-06-09T10:32:47+00:00

commit 93f5534b35a05ef8a0109c1eefa800062fee810a upstream. get_queue_ids() computes array_size = num_queues * sizeof(uint32_t), which could overflow on 32-bit size_t build. using array_size() instead, it saturates to SIZE_MAX on overflow. Signed-off-by: Eric Huang Acked-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 2d57a0475f085c08b49312dfd8edcb461845f285) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: fix NULL pointer bug in svm_range_set_attr

2026-06-09T10:32:47+00:00

commit e984d61d92e702096058f0f828f4b2b8563b88ce upstream. The process_info could be NULL if user doesn't call kfd_ioctl_acquire_vm before calling kfd_ioctl_svm. Signed-off-by: Eric Huang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 83a26c812e0529eb040d31a76f73e33e637243d4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Update queue properties for metadata ring

2026-05-23T11:08:46+00:00

[ Upstream commit 189208d3d503090d95a39e85433bd608a0d84511 ] Metadata ring and queue ring is allocated as one buffer and map to GPU, so update queue peoperties should add the queue metadata size and ring size as buffer size to validate queue ring buffer. Fixes: c51bb53d5c68 ("drm/amdkfd: Add metadata ring buffer for compute") Signed-off-by: Philip Yang Reviewed-by: Alex Sierra Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: GFX12.1 scratch memory limit up to 57-bit

2026-05-23T11:08:43+00:00

[ Upstream commit b2d13a41da94008fdd3786b396a6375c12454522 ] The scratch aperture or gmc private aperture in flat memory contains 57 bits of data on gfx v12.1.0 compared to the 32 bits from previous. Add new helper kfd_init_apertures_v12 for gfx version >= v12.1.0 which supports 57-bit VA space. v2: - update pdd->scratch_limit (Yu, Lang) - update fixes tag (Felix Kuehling) - add helper kfd_init_apertures_v12 Fixes: db1882b3ff0c ("drm/amdkfd: Update LDS, Scratch base for 57bit address") Signed-off-by: Philip Yang Reviewed-by: Lang Yu Acked-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdkfd: Removed commented line for MQD queue priority

2026-05-23T11:08:42+00:00

[ Upstream commit bfe60e539cf7690a6739466b41fb6be250bb783e ] Missed deleting the commented line in the original patch. Fixes: 73463e26f7e2 ("drm/amdkfd: Disable MQD queue priority") Signed-off-by: Andrew Martin Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdkfd: Make all TLB-flushes heavy-weight

2026-05-17T15:16:32+00:00

commit 9b4e3495d1bd2469bf94b74930c153c2d534ddb7 upstream. With only one sequence number we cannot track the need for legacy vs heavy-weight flushes reliably. Always use heavy-weight. Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang Signed-off-by: Alex Deucher (cherry picked from commit c1a3ff1d327820cd9a52bc1056b98681fc088949) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: validate SVM ioctl nattr against buffer size

2026-05-17T15:16:31+00:00

commit 045e0ff208f0838a246c10204105126611b267a1 upstream. Validate nattr field against the buffer size, preventing out-of-bounds buffer access via user-controlled attribute count. Reviewed-by: Amir Shetaia Signed-off-by: Alysa Liu Signed-off-by: Alex Deucher (cherry picked from commit 5eca8bfdfa456c3304ca77523718fe24254c172f) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Add upper bound check for num_of_nodes

2026-05-17T15:16:30+00:00

commit 74b73fa56a395d46745e4f245225963e9f8be7f1 upstream. drm/amdkfd: Add upper bound check for num_of_nodes in kfd_ioctl_get_process_apertures_new. Reviewed-by: Harish Kasiviswanathan Signed-off-by: Alysa Liu Signed-off-by: Alex Deucher (cherry picked from commit 98ff46a5ea090c14d2cdb4f5b993b05d74f3949f) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size

2026-03-30T20:22:44+00:00

The control stack size is calculated based on the number of CUs and waves, and is then aligned to PAGE_SIZE. When the resulting control stack size is aligned to 64 KB, GPU hangs and queue preemption failures are observed while running RCCL unit tests on systems with more than two GPUs. amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008 amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4 amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008 amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues This issue is observed on both 4 KB and 64 KB system page-size configurations. This patch fixes the issue by aligning the control stack size to AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size will not be 64 KB on systems with a 64 KB page size and queue preemption works correctly. Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE, which can waste memory if the system page size is large. In this patch, wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated from wg_data_size and the control stack size, is aligned to PAGE_SIZE. Reviewed-by: Felix Kuehling Signed-off-by: Donet Tom Signed-off-by: Alex Deucher (cherry picked from commit a3e14436304392fbada359edd0f1d1659850c9b7)