kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c, branch v6.18.21

drm/amdgpu: keep vga memory on MacBooks with switchable graphics

2026-03-04T12:21:34+00:00

[ Upstream commit 096bb75e13cc508d3915b7604e356bcb12b17766 ] On Intel MacBookPros with switchable graphics, when the iGPU is enabled, the address of VRAM gets put at 0 in the dGPU's virtual address space. This is non-standard and seems to cause issues with the cursor if it ends up at 0. We have the framework to reserve memory at 0 in the address space, so enable it here if the vram start address is 0. Reviewed-and-tested-by: Mario Kleiner Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4302 Cc: stable@vger.kernel.org Cc: Mario Kleiner Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: Use kvfree instead of kfree in amdgpu_gmc_get_nps_memranges()

2026-02-26T22:59:42+00:00

[ Upstream commit 0c44d61945c4a80775292d96460aa2f22e62f86c ] amdgpu_discovery_get_nps_info() internally allocates memory for ranges using kvcalloc(), which may use vmalloc() for large allocation. Using kfree() to release vmalloc memory will lead to a memory corruption. Use kvfree() to safely handle both kmalloc and vmalloc allocations. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: b194d21b9bcc ("drm/amdgpu: Use NPS ranges from discovery table") Reviewed-by: Lijo Lazar Signed-off-by: Zilin Guan Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove

2026-02-06T15:57:44+00:00

commit 8b1ecc9377bc641533cd9e76dfa3aee3cd04a007 upstream. On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and ih2 interrupt ring buffers are not initialized. This is by design, as these secondary IH rings are only available on discrete GPUs. See vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when AMD_IS_APU is set. However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to get the timestamp of the last interrupt entry. When retry faults are enabled on APUs (noretry=0), this function is called from the SVM page fault recovery path, resulting in a NULL pointer dereference when amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[]. The crash manifests as: BUG: kernel NULL pointer dereference, address: 0000000000000004 RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu] Call Trace: amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu] svm_range_restore_pages+0xae5/0x11c0 [amdgpu] amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu] gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu] amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu] amdgpu_ih_process+0x84/0x100 [amdgpu] This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1") which changed the default for Renoir APU from noretry=1 to noretry=0, enabling retry fault handling and thus exercising the buggy code path. Fix this by adding a check for ih1.ring_size before attempting to use it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu: Rework retry fault removal"). This is needed if the hardware doesn't support secondary HW IH rings. v2: additional updates (Alex) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814 Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal") Reviewed-by: Timur Kristóf Reviewed-by: Philip Yang Signed-off-by: Jon Doron Signed-off-by: Alex Deucher (cherry picked from commit 6ce8d536c80aa1f059e82184f0d1994436b1d526) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amd/amdgpu: reserve vm invalidation engine for uni_mes

2025-11-24T18:25:31+00:00

Reserve vm invalidation engine 6 when uni_mes enabled. It is used in processing tlb flush request from host. Signed-off-by: Michael Chen Acked-by: Alex Deucher Reviewed-by: Shaoyun liu Signed-off-by: Alex Deucher (cherry picked from commit 873373739b9b150720ea2c5390b4e904a4d21505) Cc: stable@vger.kernel.org

drm/amdgpu: give each kernel job a unique id

2025-09-01T05:19:31+00:00

Userspace jobs have drm_file.client_id as a unique identifier as job's owners. For kernel jobs, we can allocate arbitrary values - the risk of overlap with userspace ids is small (given that it's a u64 value). In the unlikely case the overlap happens, it'll only impact trace events. Since this ID is traced in the gpu_scheduler trace events, this allows to determine the source of each job sent to the hardware. To make grepping easier, the IDs are defined as they will appear in the trace output. Signed-off-by: Pierre-Eric Pelloux-Prayer Acked-by: Alex Deucher Signed-off-by: Arunpravin Paneer Selvam Link: https://lore.kernel.org/r/20250604122827.2191-1-pierre-eric.pelloux-prayer@amd.com

drm/amdgpu: Convert init_mem_ranges into common helpers

2025-06-24T14:04:16+00:00

They can be shared across multiple products Signed-off-by: Hawking Zhang Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher

drm/amdgpu: Convert query_memory_partition into common helpers

2025-06-24T14:04:09+00:00

The query_memory_partition does not need to remain as soc specific callbacks. They can be shared across multiple products Signed-off-by: Hawking Zhang Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher

drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-06-18T16:19:15+00:00

When switching to new GPU index after hibernation and then resume, VRAM offset of each VRAM BO will be changed, and the cached gpu addresses needed to updated. This is to enable pdb0 and switch to use pdb0-based virtual gpu address by default in amdgpu_bo_create_reserved(). since the virtual addresses do not change, this can avoid the need to update all cached gpu addresses all over the codebase. Signed-off-by: Emily Deng Signed-off-by: Samuel Zhang Acked-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Disallow partition query during reset

2025-04-30T22:03:02+00:00

Reject queries to get current partition modes during reset. Also, don't accept sysfs interface requests to switch compute partition mode while in reset. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Reviewed-by: Asad Kamal Signed-off-by: Alex Deucher

drm/amdgpu: Increase KIQ invalidate_tlbs timeout

2025-04-07T19:18:30+00:00

KIQ invalidate_tlbs request has been seen to marginally exceed the configured 100 ms timeout on systems under load. All other KIQ requests in the driver use a 10 second timeout. Use a similar timeout implementation on the invalidate_tlbs path. v2: Poll once before msleep v3: Fix return value Signed-off-by: Jay Cornwall Cc: Kent Russell Reviewed-by: Harish Kasiviswanathan Signed-off-by: Alex Deucher