kernel/linux.git/drivers/gpu/drm/amd/amdkfd/kfd_queue.c, branch v6.12.80

drm/amdkfd: Relax size checking during queue buffer get

2026-03-04T12:21:05+00:00

[ Upstream commit 42ea9cf2f16b7131cb7302acb3dac510968f8bdc ] HW-supported EOP buffer sizes are 4K and 32K. On systems that do not use 4K pages, the minimum buffer object (BO) allocation size is PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver currently checks the allocated BO size against the supported EOP buffer size. Since the allocated BO is larger than the expected size, this check fails, preventing queue creation. Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used. Only the required 4K region of the buffer will be used as the EOP buffer and avoids queue creation failures on non-4K page systems. Acked-by: Christian König Suggested-by: Philip Yang Signed-off-by: Donet Tom Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdkfd: bump minimum vgpr size for gfx1151

2026-01-08T09:14:54+00:00

commit cf326449637a566ba98fb82c47d46cd479608c88 upstream. GFX1151 has 1.5x the number of available physical VGPRs per SIMD. Bump total memory availability for acquire checks on queue creation. Signed-off-by: Jonathan Kim Reviewed-by: Mario Limonciello Signed-off-by: Alex Deucher (cherry picked from commit b42f3bf9536c9b710fd1d4deb7d1b0dc819dc72d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: relax checks for over allocation of save area

2025-11-24T09:36:03+00:00

commit d15deafab5d722afb9e2f83c5edcdef9d9d98bd1 upstream. Over allocation of save area is not fatal, only under allocation is. ROCm has various components that independently claim authority over save area size. Unless KFD decides to claim single authority, relax size checks. Signed-off-by: Jonathan Kim Reviewed-by: Philip Yang Signed-off-by: Alex Deucher (cherry picked from commit 15bd4958fe38e763bc17b607ba55155254a01f55) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Fix user queue validation on Gfx7/8

2025-03-28T21:03:32+00:00

commit 542c3bb836733a1325874310d54d25b4907ed10e upstream. To workaround queue full h/w issue on Gfx7/8, when application create AQL queue, the ring buffer bo allocate size is queue_size/2 and map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each attachment map size is queue_size/2, with same ring_bo backing memory. For Gfx7/8, user queue buffer validation should use queue_size/2 to verify ring_bo allocation and mapping size. Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers") Suggested-by: Tomáš Trnka Signed-off-by: Philip Yang Acked-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit e7a477735f1771b9a9346a5fbd09d7ff0641723a) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Fix NULL Pointer Dereference in KFD queue

2025-03-13T12:01:53+00:00

commit fd617ea3b79d2116d53f76cdb5a3601c0ba6e42f upstream. Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence when calling kfd_queue_acquire_buffers. Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size") Signed-off-by: Andrew Martin Reviewed-by: Philip Yang Signed-off-by: Andrew Martin Signed-off-by: Alex Deucher (cherry picked from commit 049e5bf3c8406f87c3d8e1958e0a16804fa1d530) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Handle queue destroy buffer access race

2024-08-13T14:43:16+00:00

Add helper function kfd_queue_unreference_buffers to reduce queue buffer refcount, separate it from release queue buffers. Because it is circular locking to hold dqm_lock to take vm lock, kfd_ioctl_destroy_queue should take vm lock, unreference queue buffers first, but not release queue buffers, to handle error in case failed to hold vm lock. Then hold dqm_lock to remove queue from queue list and then release queue buffers. Restore process worker restore queue hold dqm_lock, will always find the queue with valid queue buffers. v2 (Felix): - renamed kfd_queue_unreference_buffer(s) to kfd_queue_unref_bo_va(s) - added two FIXME comments for follow up Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Fix compile error if HMM support not enabled

2024-08-06T14:40:30+00:00

Fixes the below if kernel config not enable HMM support >> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:26: error: implicit declaration of function 'svm_range_from_addr' >> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:24: error: assignment to 'struct svm_range *' from 'int' makes pointer from integer without a cast [-Wint-conversion] >> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:111:28: error: invalid use of undefined type 'struct svm_range' Fixes: b049504e211e ("drm/amdkfd: Validate user queue svm memory residency") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202407252127.zvnxaKRA-lkp@intel.com/ Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Fix missing error code in kfd_queue_acquire_buffers

2024-07-27T21:29:37+00:00

The fix involves setting 'err' to '-EINVAL' before each 'goto out_err_unreserve'. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:265 kfd_queue_acquire_buffers() warn: missing error code 'err' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c 226 int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties) 227 { 228 struct kfd_topology_device *topo_dev; 229 struct amdgpu_vm *vm; 230 u32 total_cwsr_size; 231 int err; 232 233 topo_dev = kfd_topology_device_by_id(pdd->dev->id); 234 if (!topo_dev) 235 return -EINVAL; 236 237 vm = drm_priv_to_vm(pdd->drm_priv); 238 err = amdgpu_bo_reserve(vm->root.bo, false); 239 if (err) 240 return err; 241 242 err = kfd_queue_buffer_get(vm, properties->write_ptr, &properties->wptr_bo, PAGE_SIZE); 243 if (err) 244 goto out_err_unreserve; 245 246 err = kfd_queue_buffer_get(vm, properties->read_ptr, &properties->rptr_bo, PAGE_SIZE); 247 if (err) 248 goto out_err_unreserve; 249 250 err = kfd_queue_buffer_get(vm, (void *)properties->queue_address, 251 &properties->ring_bo, properties->queue_size); 252 if (err) 253 goto out_err_unreserve; 254 255 /* only compute queue requires EOP buffer and CWSR area */ 256 if (properties->type != KFD_QUEUE_TYPE_COMPUTE) 257 goto out_unreserve; This is clearly a success path. 258 259 /* EOP buffer is not required for all ASICs */ 260 if (properties->eop_ring_buffer_address) { 261 if (properties->eop_ring_buffer_size != topo_dev->node_props.eop_buffer_size) { 262 pr_debug("queue eop bo size 0x%lx not equal to node eop buf size 0x%x\n", 263 properties->eop_buf_bo->tbo.base.size, 264 topo_dev->node_props.eop_buffer_size); --> 265 goto out_err_unreserve; This has err in the label name. err = -EINVAL? 266 } 267 err = kfd_queue_buffer_get(vm, (void *)properties->eop_ring_buffer_address, 268 &properties->eop_buf_bo, 269 properties->eop_ring_buffer_size); 270 if (err) 271 goto out_err_unreserve; 272 } 273 274 if (properties->ctl_stack_size != topo_dev->node_props.ctl_stack_size) { 275 pr_debug("queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n", 276 properties->ctl_stack_size, 277 topo_dev->node_props.ctl_stack_size); 278 goto out_err_unreserve; err? 279 } 280 281 if (properties->ctx_save_restore_area_size != topo_dev->node_props.cwsr_size) { 282 pr_debug("queue cwsr size 0x%x not equal to node cwsr size 0x%x\n", 283 properties->ctx_save_restore_area_size, 284 topo_dev->node_props.cwsr_size); 285 goto out_err_unreserve; err? Not sure. 286 } 287 288 total_cwsr_size = (topo_dev->node_props.cwsr_size + topo_dev->node_props.debug_memory_size) 289 * NUM_XCC(pdd->dev->xcc_mask); 290 total_cwsr_size = ALIGN(total_cwsr_size, PAGE_SIZE); 291 292 err = kfd_queue_buffer_get(vm, (void *)properties->ctx_save_restore_area_address, 293 &properties->cwsr_bo, total_cwsr_size); 294 if (!err) 295 goto out_unreserve; 296 297 amdgpu_bo_unreserve(vm->root.bo); 298 299 err = kfd_queue_buffer_svm_get(pdd, properties->ctx_save_restore_area_address, 300 total_cwsr_size); 301 if (err) 302 goto out_err_release; 303 304 return 0; 305 306 out_unreserve: 307 amdgpu_bo_unreserve(vm->root.bo); 308 return 0; 309 310 out_err_unreserve: 311 amdgpu_bo_unreserve(vm->root.bo); 312 out_err_release: 313 kfd_queue_release_buffers(pdd, properties); 314 return err; 315 } Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size") Reported-by: Dan Carpenter Cc: Philip Yang Cc: Felix Kuehling Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Reviewed-by: Philip Yang Signed-off-by: Alex Deucher

drm/amdkfd: Validate queue cwsr area and eop buffer size

2024-07-24T18:46:06+00:00

When creating KFD user compute queue, check if queue eop buffer size, cwsr area size, ctl stack size equal to the size of KFD node properities. Check the entire cwsr area which may split into multiple svm ranges aligned to granularity boundary. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Acked-by: Christian König Signed-off-by: Alex Deucher

drm/amdkfd: Store queue cwsr area size to node properties

2024-07-24T18:45:58+00:00

Use the queue eop buffer size, cwsr area size, ctl stack size calculation from Thunk, store the value to KFD node properties. Those will be used to validate queue eop buffer size, cwsr area size, ctl stack size when creating KFD user compute queue. Those will be exposed to user space via sysfs KFD node properties, to remove the duplicate calculation code from Thunk. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Acked-by: Christian König Signed-off-by: Alex Deucher