kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c, branch v7.2-rc1

drm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)

2026-06-03T17:54:14+00:00

Problem: While developing the amd_close_race IGT test (which intentionally triggers execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce zero diagnostic output. The GPU simply hangs silently for ~10s until the scheduler timeout fires. There is no way to distinguish an execute permission fault from any other type of GPU hang. Root cause: GFX 10.1.x defaults to noretry=0, which sets RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers (gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE, wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the translation indefinitely, expecting software to eventually fix the permission bits (as happens in SVM/HMM recovery). No interrupt of any kind reaches the IH ring. This is different from invalid-page faults (V=0) which DO generate a retry fault interrupt that the driver can escalate to a no-retry fault. Permission faults with valid PTEs loop silently forever in hardware. GFX 10.3+ already defaults to noretry=1, which makes permission faults generate immediate L2 protection fault interrupts. GFX 10.1.x was inadvertently left out of this default. Fix: Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer generations. With noretry=1, the existing non-retry fault handler (gmc_v10_0_process_interrupt) already decodes and prints the full GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS, faulting address, VMID, PASID, and process name. No additional logging code is needed — the fix is purely routing permission faults to the existing, fully-capable non-retry interrupt handler. v2: Dropped GFX10-specific logging from gmc_v10_0.c and kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry fault handler, but with noretry=1 permission faults take the non-retry path — the v1 retry handler code was dead and would never execute. Tested on Navi10 (GFX 10.1.10): - Execute permission faults now produce immediate, clear output: [gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592) Process amd_close_race pid 13380 thread amd_close_race pid 13384 in page at address 0x40001000 from client 0x1b (UTCL2) GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881 PERMISSION_FAULTS: 0x8 - No regressions with properly-mapped GPU workloads Cc: Christian Koenig Cc: Alex Deucher Cc: Felix Kuehling Signed-off-by: Vitaly Prosyak Acked-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Use asic specific pte_addr_mask

2026-06-03T17:53:46+00:00

For PTE creation use asic specific physical page base address mask v2: Change variable name from pa_mask to pte_addr_mask Signed-off-by: Harish Kasiviswanathan Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher

drm/amdgpu: Add support for GC IP version 11.5.6

2026-06-03T17:50:15+00:00

Initialize GC IP 11_5_6 Signed-off-by: Pratik Vishwakarma Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Add helper to set gart size

2026-05-05T13:56:51+00:00

Add a helper to make any adjustments to gart size based on other parameters or conditions. Suggested-by: Christian König Signed-off-by: Lijo Lazar Reviewed-by: Christian König Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Set default xnack mode for gfx_v12.1 A0/B0

2026-04-24T14:57:29+00:00

For A0, default xnack mode is off For BO, default xnack mode is on Signed-off-by: Harish Kasiviswanathan Reviewed-by: Philip.Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM

2026-04-24T14:47:30+00:00

When the GART placement is set to AMDGPU_GART_PLACEMENT_LOW: Make sure that GART does not overlap with VRAM when VRAM is configured to be in the low address space. Solve this according to the following logic: - When GART fits before VRAM, use zero address for GART - Otherwise, put GART after the end of VRAM, aligned to 4 GiB Previously, I had assumed this was not possible so it was OK to not handle it, but now we got a report from a user who has a board that is configured this way. Fixes: 917f91d8d8e8 ("drm/amdgpu/gmc: add a way to force a particular placement for GART") Signed-off-by: Timur Kristóf Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: pass all the sdma scheds to amdgpu_mman

2026-04-17T19:41:12+00:00

This will allow the use of all of them for clear/fill buffer operations. Since drm_sched_entity_init requires a scheduler array, we store schedulers rather than rings. For the few places that need access to a ring, we can get it from the sched using container_of. Since the code is the same for all sdma versions, add a new helper amdgpu_sdma_set_buffer_funcs_scheds to set buffer_funcs_scheds based on the number of sdma instances. Note: the new sched array is identical to the amdgpu_vm_manager one. These 2 could be merged. Signed-off-by: Pierre-Eric Pelloux-Prayer Acked-by: Felix Kuehling Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Group filling reserve region details

2026-04-03T17:50:09+00:00

Add a function which groups filling of reserve region information. It may not cover all as info on some regions are still filled outside like those from atomfirmware tables. Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Add stolen_reserved reserve-region

2026-04-03T17:49:43+00:00

Use reserve region helpers for initializing/reserving stolen_reserved region. Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Add extended stolen vga reserve-region

2026-04-03T17:49:41+00:00

Use reserve region helpers for initializing/reserving extended stolen vga region. Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher