kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c, branch v6.18.21

drm/amd: Set num IP blocks to 0 if discovery fails

2026-03-19T15:08:41+00:00

commit 3646ff28780b4c52c5b5081443199e7a430110e5 upstream. If discovery has failed for any reason (such as no support for a block) then there is no need to unwind all the IP blocks in fini. In this condition there can actually be failures during the unwind too. Reset num_ip_blocks to zero during failure path and skip the unnecessary cleanup path. Suggested-by: Lijo Lazar Reviewed-by: Lijo Lazar Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher (cherry picked from commit fae5984296b981c8cc3acca35b701c1f332a6cd8) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: Fix query for VPE block_type and ip_count

2026-01-17T15:35:13+00:00

commit 72d7f4573660287f1b66c30319efecd6fcde92ee upstream. [Why] Query for VPE block_type and ip_count is missing. [How] Add VPE case in ip_block_type and hw_ip_count query. Reviewed-by: Lang Yu Signed-off-by: Alan Liu Signed-off-by: Alex Deucher (cherry picked from commit a6ea0a430aca5932b9c75d8e38deeb45665dd2ae) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices

2025-10-13T18:14:15+00:00

Previously, APU platforms (and other scenarios with uninitialized VRAM managers) triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL, but that `man->bdev` (the backing device pointer within the manager) remains uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to a kernel OOPS. 1. **amdgpu_cs.c**: Extend the existing bandwidth control check in `amdgpu_cs_get_threshold_for_moves()` to include a check for `ttm_resource_manager_used()`. If the manager is not used (uninitialized `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific logic that would trigger the NULL dereference. 2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info reporting to use a conditional: if the manager is used, return the real VRAM usage; otherwise, return 0. This avoids accessing `man->bdev` when it is NULL. 3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function) data write path. Use `ttm_resource_manager_used()` to check validity: if the manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set `fb_usage` to 0 (APUs have no discrete framebuffer to report). This approach is more robust than APU-specific checks because it: - Works for all scenarios where the VRAM manager is uninitialized (not just APUs), - Aligns with TTM's design by using its native helper function, - Preserves correct behavior for discrete GPUs (which have fully initialized `man->bdev` and pass the `ttm_resource_manager_used()` check). v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian) Reviewed-by: Christian König Suggested-by: Lijo Lazar Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init

2025-10-07T18:09:07+00:00

As KFD no longer uses a separate PASID, the global amdgpu_vm_set_pasid()function is no longer necessary. Merge its functionality directly intoamdgpu_vm_init() to simplify code flow and eliminate redundant locking. v2: remove superflous check adjust amdgpu_vm_fin and remove amdgpu_vm_set_pasid (Chritian) v3: drop amdgpu_vm_assert_locked (Chritian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4614 Fixes: 59e4405e9ee2 ("drm/amdgpu: revert to old status lock handling v3") Reviewed-by: Christian König Suggested-by: Christian König Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu: add AMDGPU_IDS_FLAGS_GANG_SUBMIT

2025-09-15T21:04:42+00:00

Add a UAPI flag indicating if gang submit is supported or not. Signed-off-by: Christian König Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Replace HQD terminology with slots naming

2025-07-16T20:17:36+00:00

The term "HQD" is CP-specific and doesn't accurately describe the queue resources for other IP blocks like SDMA, VCN, or VPE. This change: 1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect the generic nature of the resource counting 2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots` 3. Maintains the same functionality while using more appropriate terminology Signed-off-by: Jesse Zhang Reviewed-by: Marek Olšák Signed-off-by: Alex Deucher

drm/amdgpu: Add user queue instance count in HW IP info

2025-07-16T20:17:35+00:00

This change exposes the number of available user queue instances for each hardware IP type (GFX, COMPUTE, SDMA) through the drm_amdgpu_info_hw_ip interface. Key changes: 1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure 2. Implemented counting of available HQD slots using: - mes.gfx_hqd_mask for GFX queues - mes.compute_hqd_mask for COMPUTE queues - mes.sdma_hqd_mask for SDMA queues 3. Only counts available instances when user queues are enabled (!disable_uq) v2: using the adev->mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks to determine the number of queue slots available for each engine type (Alex) v3: rename userq_num_instance to userq_num_hqds (Alex) Suggested-by: Alex Deucher Reviewed-by: Alex Deucher Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

Merge tag 'amd-drm-next-6.17-2025-07-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

2025-07-11T21:55:40+00:00

amd-drm-next-6.17-2025-07-11: amdgpu: - Clean up function signatures - GC 10 KGQ reset fix - SDMA reset cleanups - Misc fixes - LVDS fixes - UserQ fix amdkfd: - Reset fix Signed-off-by: Simona Vetter From: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/20250711205548.21052-1-alexander.deucher@amd.com

Merge remote-tracking branch 'drm/drm-next' into drm-misc-next

2025-07-08T14:49:07+00:00

Pull in drm-intel-next for the updates to drm panic handling. Signed-off-by: Maarten Lankhorst

Revert "drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini"

2025-07-07T17:48:50+00:00

This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b. The original patch moved `amdgpu_userq_mgr_fini()` to the driver's `postclose` callback, which is called after `drm_gem_release()` in the DRM file cleanup sequence.If a user application crashes or aborts without cleaning up its user queues, 'drm_gem_release()` may free GEM objects that are still referenced by active user queues, leading to use-after-free. By reverting, we ensure that user queues are disabled and cleaned up before any GEM objects are released, preventing this class of bug. However, this reintroduces a race during PCI hot-unplug, where device removal can race with per-file cleanup, leading to use-after-free in suspend/unplug paths. This will be fixed in the next patch. Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c") Signed-off-by: Vitaly Prosyak Acked-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Alex Deucher