kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v6.6.143

drm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11

2026-06-19T11:39:37+00:00

commit 352ea59028ea48a6fff77f19ae28f98f71946a80 upstream. The v11 MQD manager incorrectly assigned the CP-compute variants of checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These functions use sizeof(struct v11_compute_mqd) (2048 bytes) instead of sizeof(struct v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow. During CRIU checkpoint of an SDMA queue on Navi3x: - checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer, leaking 1536 bytes of adjacent GTT memory to userspace During CRIU restore: - restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer, corrupting 1536 bytes of adjacent GTT memory (often the ring buffer or neighboring MQDs) This is a copy-paste regression unique to v11. All other ASIC backends (cik, vi, v9, v10, v12) correctly use the SDMA-specific variants. Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that properly handle the smaller v11_sdma_mqd structure, matching the pattern used in other MQD managers. Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") Assisted-by: Claude:Sonnet 4-5 Signed-off-by: Andrew Martin Acked-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 6fa41db7ffdec97d62433adf03b7b9b759af8c2c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: fix NULL dereference in get_queue_ids()

2026-06-19T11:39:37+00:00

commit 2bd550b547deabef98bd3b017ff743b7c34d3a6d upstream. When usr_queue_id_array is NULL and num_queues is non-zero, get_queue_ids() returns NULL. The callers check only IS_ERR() on the return value; since IS_ERR(NULL) == false the check passes, and suspend_queues() calls q_array_invalidate() which immediately dereferences NULL while iterating num_queues times. Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying num_queues > 0 with a zero queue_array_ptr, causing a kernel panic. A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op (q_array_invalidate never executes, and resume_queues already guards all queue_ids dereferences behind a NULL check). Return ERR_PTR(-EINVAL) only when num_queues is non-zero and the pointer is absent; both callers already propagate IS_ERR() returns correctly to userspace. Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation") Signed-off-by: Muhammad Bilal Signed-off-by: Alex Deucher (cherry picked from commit f165a82cdf503884bb1797771c61b2fcc72113d4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Check for pdd drm file first in CRIU restore path

2026-06-19T11:39:23+00:00

commit 6842b6a4b72da9b2906ffc5ca9d846ace2c54c14 upstream. CRIU restore ioctls are meant to be called by CRIU with no existing drm file. There's an error path for if the drm file unexpectedly exists. It was positioned so it was missing a fput(drm_file). Do that check earlier, as soon as we have the pdd. Signed-off-by: David Francis Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 2bab781dac78916c5cc8de76345a4102449267d7) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger

2026-06-19T11:39:23+00:00

commit 93f5534b35a05ef8a0109c1eefa800062fee810a upstream. get_queue_ids() computes array_size = num_queues * sizeof(uint32_t), which could overflow on 32-bit size_t build. using array_size() instead, it saturates to SIZE_MAX on overflow. Signed-off-by: Eric Huang Acked-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 2d57a0475f085c08b49312dfd8edcb461845f285) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: fix NULL pointer bug in svm_range_set_attr

2026-06-19T11:39:23+00:00

commit e984d61d92e702096058f0f828f4b2b8563b88ce upstream. The process_info could be NULL if user doesn't call kfd_ioctl_acquire_vm before calling kfd_ioctl_svm. Signed-off-by: Eric Huang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 83a26c812e0529eb040d31a76f73e33e637243d4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: validate SVM ioctl nattr against buffer size

2026-05-17T15:13:46+00:00

commit 045e0ff208f0838a246c10204105126611b267a1 upstream. Validate nattr field against the buffer size, preventing out-of-bounds buffer access via user-controlled attribute count. Reviewed-by: Amir Shetaia Signed-off-by: Alysa Liu Signed-off-by: Alex Deucher (cherry picked from commit 5eca8bfdfa456c3304ca77523718fe24254c172f) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Add upper bound check for num_of_nodes

2026-05-17T15:13:46+00:00

commit 74b73fa56a395d46745e4f245225963e9f8be7f1 upstream. drm/amdkfd: Add upper bound check for num_of_nodes in kfd_ioctl_get_process_apertures_new. Reviewed-by: Harish Kasiviswanathan Signed-off-by: Alysa Liu Signed-off-by: Alex Deucher (cherry picked from commit 98ff46a5ea090c14d2cdb4f5b993b05d74f3949f) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Fix out-of-bounds write in kfd_event_page_set()

2026-03-04T12:21:15+00:00

[ Upstream commit 8a70a26c9f34baea6c3199a9862ddaff4554a96d ] The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8 bytes via memset without checking the buffer size parameter. This allows unprivileged userspace to trigger an out-of bounds kernel memory write by passing a small buffer, leading to potential privilege escalation. Signed-off-by: Sunday Clement Reviewed-by: Alexander Deucher Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin

drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()

2026-03-04T12:20:40+00:00

[ Upstream commit 6c160001661b6c4e20f5c31909c722741e14c2d8 ] In svm_migrate_gart_map(), while migrating GART mapping, the number of bytes copied for the GART table only accounts for CPU pages. On non-4K systems, each CPU page can contain multiple GPU pages, and the GART requires one 8-byte PTE per GPU page. As a result, an incorrect size was passed to the DMA, causing only a partial update of the GART table. Fix this function to work correctly on non-4K page-size systems by accounting for the number of GPU pages per CPU page when calculating the number of bytes to be copied. Acked-by: Christian König Reviewed-by: Philip Yang Signed-off-by: Ritesh Harjani (IBM) Signed-off-by: Donet Tom Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdkfd: Fix watch_id bounds checking in debug address watch v2

2026-03-04T12:20:26+00:00

[ Upstream commit 5a19302cab5cec7ae7f1a60c619951e6c17d8742 ] The address watch clear code receives watch_id as an unsigned value (u32), but some helper functions were using a signed int and checked bits by shifting with watch_id. If a very large watch_id is passed from userspace, it can be converted to a negative value. This can cause invalid shifts and may access memory outside the watch_points array. drm/amdkfd: Fix watch_id bounds checking in debug address watch v2 Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before using it. Also use BIT(watch_id) to test and clear bits safely. This keeps the behavior unchanged for valid watch IDs and avoids undefined behavior for invalid ones. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448 kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow 'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c 433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, 434 uint32_t watch_id) 435 { 436 int r; 437 438 if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if watch_id is larger than INT_MAX it leads to a buffer overflow. (Negative shifts are undefined). 439 return -EINVAL; 440 441 if (!pdd->dev->kfd->shared_resources.enable_mes) { 442 r = debug_lock_and_unmap(pdd->dev->dqm); 443 if (r) 444 return r; 445 } 446 447 amdgpu_gfx_off_ctrl(pdd->dev->adev, false); --> 448 pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( 449 pdd->dev->adev, 450 watch_id); v2: (as per, Jonathan Kim) - Add early watch_id >= MAX_WATCH_ADDRESSES validation in the set path to match the clear path. - Drop the redundant bounds check in kfd_dbg_owns_dev_watch_id(). Fixes: e0f85f4690d0 ("drm/amdkfd: add debug set and clear address watch points operation") Reported-by: Dan Carpenter Cc: Jonathan Kim Cc: Felix Kuehling Cc: Alex Deucher Cc: Christian König Signed-off-by: Srinivasan Shanmugam Reviewed-by: Jonathan Kim Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin