summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)AuthorFilesLines
2026-03-13drm/amd: Fix hang on amdgpu unload by using pci_dev_is_disconnected()Mario Limonciello1-2/+2
[ Upstream commit f7afda7fcd169a9168695247d07ad94cf7b9798f ] The commit 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") introduced early KFD cleanup when drm_dev_is_unplugged() returns true. However, this causes hangs during normal module unload (rmmod amdgpu). The issue occurs because drm_dev_unplug() is called in amdgpu_pci_remove() for all removal scenarios, not just surprise disconnects. This was done intentionally in commit 39934d3ed572 ("Revert "drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled"") to fix IGT PCI software unplug test failures. As a result, drm_dev_is_unplugged() returns true even during normal module unload, triggering the early KFD cleanup inappropriately. The correct check should distinguish between: - Actual surprise disconnect (eGPU unplugged): pci_dev_is_disconnected() returns true - Normal module unload (rmmod): pci_dev_is_disconnected() returns false Replace drm_dev_is_unplugged() with pci_dev_is_disconnected() to ensure the early cleanup only happens during true hardware disconnect events. Cc: stable@vger.kernel.org Reported-by: Cal Peake <cp@absolutedigital.net> Closes: https://lore.kernel.org/all/b0c22deb-c0fa-3343-33cf-fd9a77d7db99@absolutedigital.net/ Fixes: 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13drm/amdgpu: Fix locking bugs in error pathsBart Van Assche1-5/+7
[ Upstream commit 480ad5f6ead4a47b969aab6618573cd6822bb6a4 ] Do not unlock psp->ras_context.mutex if it has not been locked. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: YiPeng Chai <YiPeng.Chai@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: b3fb79cda568 ("drm/amdgpu: add mutex to protect ras shared memory") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6fa01b4335978051d2cd80841728fd63cc597970) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13drm/amdgpu: Replace kzalloc + copy_from_user with memdup_userThorsten Blum1-14/+6
[ Upstream commit 99eeb8358e6cdb7050bf2956370c15dcceda8c7e ] Replace kzalloc() followed by copy_from_user() with memdup_user() to improve and simplify ta_if_load_debugfs_write() and ta_if_invoke_debugfs_write(). No functional changes intended. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 480ad5f6ead4 ("drm/amdgpu: Fix locking bugs in error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13drm/amdgpu: Unlock a mutex before destroying itBart Van Assche1-0/+1
[ Upstream commit 5e0bcc7b88bcd081aaae6f481b10d9ab294fcb69 ] Mutexes must be unlocked before these are destroyed. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 270258ba320beb99648dceffb67e86ac76786e55) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: keep vga memory on MacBooks with switchable graphicsAlex Deucher1-0/+10
[ Upstream commit 096bb75e13cc508d3915b7604e356bcb12b17766 ] On Intel MacBookPros with switchable graphics, when the iGPU is enabled, the address of VRAM gets put at 0 in the dGPU's virtual address space. This is non-standard and seems to cause issues with the cursor if it ends up at 0. We have the framework to reserve memory at 0 in the address space, so enable it here if the vram start address is 0. Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4302 Cc: stable@vger.kernel.org Cc: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notifyPierre-Eric Pelloux-Prayer1-1/+8
[ Upstream commit b18fc0ab837381c1a6ef28386602cd888f2d9edf ] Invalidating a dmabuf will impact other users of the shared BO. In the scenario where process A moves the BO, it needs to inform process B about the move and process B will need to update its page table. The commit fixes a synchronisation bug caused by the use of the ticket: it made amdgpu_vm_handle_moved behave as if updating the page table immediately was correct but in this case it's not. An example is the following scenario, with 2 GPUs and glxgears running on GPU0 and Xorg running on GPU1, on a system where P2P PCI isn't supported: glxgears: export linear buffer from GPU0 and import using GPU1 submit frame rendering to GPU0 submit tiled->linear blit Xorg: copy of linear buffer The sequence of jobs would be: drm_sched_job_run # GPU0, frame rendering drm_sched_job_queue # GPU0, blit drm_sched_job_done # GPU0, frame rendering drm_sched_job_run # GPU0, blit move linear buffer for GPU1 access # amdgpu_dma_buf_move_notify -> update pt # GPU0 It this point the blit job on GPU0 is still running and would likely produce a page fault. Cc: stable@vger.kernel.org Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Increase DCN35 SR enter/exit latencyLeo Li2-10/+10
[ Upstream commit 318917e1d8ecc89f820f4fabf79935f4fed718cd ] [Why & How] On Framework laptops with DDR5 modules, underflow can be observed. It's unclear why it only occurs on specific desktop contents. However, increasing enter/exit latencies by 3us seems to resolve it. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4463 Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Fix out-of-bounds write in kfd_event_page_set()Sunday Clement1-0/+6
[ Upstream commit 8a70a26c9f34baea6c3199a9862ddaff4554a96d ] The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8 bytes via memset without checking the buffer size parameter. This allows unprivileged userspace to trigger an out-of bounds kernel memory write by passing a small buffer, leading to potential privilege escalation. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Remove conditional for shaper 3DLUT power-onAlex Hung1-2/+1
[ Upstream commit 1b38a87b8f8020e8ef4563e7752a64182b5a39b9 ] [Why] Shaper programming has high chance to fail on first time after power-on or reboot. This can be verified by running IGT's kms_colorop. [How] Always power on the shaper and 3DLUT before programming by removing the debug flag of low power mode. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: bypass post csc for additional color spaces in dalClay King3-6/+25
[ Upstream commit 7d9ec9dc20ecdb1661f4538cd9112cd3d6a5f15a ] [Why] For RGB BT2020 full and limited color spaces, overlay adjustments were applied twice (once by MM and once by DAL). This results in incorrect colours and a noticeable difference between mpo and non-mpo cases. [How] Add RGB BT2020 full and limited color spaces to list that bypasses post csc adjustment. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Clay King <clayking@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Add HAINAN clock adjustmentdecce61-0/+5
[ Upstream commit 49fe2c57bdc0acff9d2551ae337270b6fd8119d9 ] This patch limits the clock speeds of the AMD Radeon R5 M420 GPU from 850/1000MHz (core/memory) to 800/950 MHz, making it work stably. This patch is for amdgpu. Signed-off-by: decce6 <decce6@proton.me> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Adjust usleep_range in fence waitCe Sun1-1/+1
[ Upstream commit 3ee1c72606bd2842f0f377fd4b118362af0323ae ] Tune the sleep interval in the PSP fence wait loop from 10-100us to 60-100us.This adjustment results in an overall wait window of 1.2s (60us * 20000 iterations) to 2 seconds (100us * 20000 iterations), which guarantees that we can retrieve the correct fence value Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Avoid updating surface with the same surface under MPOWayne Lin1-1/+1
[ Upstream commit 1a38ded4bc8ac09fd029ec656b1e2c98cc0d238c ] [Why & How] Although it's dummy updates of surface update for committing stream updates, we should not have dummy_updates[j].surface all indicating to the same surface under multiple surfaces case. Otherwise, copy_surface_update_to_plane() in update_planes_and_stream_state() will update to the same surface only. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix system resume lag issueTom Chung1-0/+10
[ Upstream commit 64c94cd9be2e188ed07efeafa6a109bce638c967 ] [Why] System will try to apply idle power optimizations setting during system resume. But system power state is still in D3 state, and it will cause the idle power optimizations command not actually to be sent to DMUB and cause some platforms to go into IPS. [How] Set power state to D0 first before calling the dc_dmub_srv_apply_idle_power_optimizations(dm->dc, false) Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix writeback on DCN 3.2+Alex Hung1-4/+15
[ Upstream commit 9ef84a307582a92ef055ef0bd3db10fd8ac75960 ] [WHAT] 1. Set no scaling for writeback as they are hardcoded in DCN3.2+. 2. Set no fast plane update for writeback commits. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: avoid dig reg access timeout on usb4 link training failZhongwei1-2/+10
[ Upstream commit 15b1d7b77e9836ff4184093163174a1ef28bbdd7 ] [Why] When usb4 link training fails, the dpia sym clock will be disabled and SYMCLK source should be changed back to phy clock. In enable_streams, it is assumed that link training succeeded and will switch from refclk to phy clock. But phy clk here might not be on. Dig reg access timeout will occur. [How] When enable_stream is hit, check if link training failed for usb4. If it did, fall back to the ref clock to avoid reg access timeout. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Zhongwei <Zhongwei.Zhang@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix GFX12 family constant checksMatthew Stewart2-3/+3
[ Upstream commit bdad08670278829771626ea7b57c4db531e2544f ] Using >=, <= for checking the family is not always correct. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Disable FEC when powering down encodersOvidiu Bunea1-9/+15
[ Upstream commit 8cee62904caf95e5698fa0f2d420f5f22b4dea15 ] [why & how] VBIOS DMCUB FW can enable FEC for capable eDPs, but S/W DC state is only updated for link0 when transitioning into OS with driver loaded. This causes issues when the eDP is immediately hidden and DIG0 is assigned to another link that does not support FEC. Driver will attempt to disable FEC but FEC enablement occurs based on the link state, which does not have fec_state updated since it is a different link. Thus, FEC disablement on DIG0 will get skipped and cause no light up. Reviewed-by: Karen Chen <karen.chen@amd.com> Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()Donet Tom1-1/+1
[ Upstream commit 6c160001661b6c4e20f5c31909c722741e14c2d8 ] In svm_migrate_gart_map(), while migrating GART mapping, the number of bytes copied for the GART table only accounts for CPU pages. On non-4K systems, each CPU page can contain multiple GPU pages, and the GART requires one 8-byte PTE per GPU page. As a result, an incorrect size was passed to the DMA, causing only a partial update of the GART table. Fix this function to work correctly on non-4K page-size systems by accounting for the number of GPU pages per CPU page when calculating the number of bytes to be copied. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Relax size checking during queue buffer getDonet Tom1-3/+3
[ Upstream commit 42ea9cf2f16b7131cb7302acb3dac510968f8bdc ] HW-supported EOP buffer sizes are 4K and 32K. On systems that do not use 4K pages, the minimum buffer object (BO) allocation size is PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver currently checks the allocated BO size against the supported EOP buffer size. Since the allocated BO is larger than the expected size, this check fails, preventing queue creation. Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used. Only the required 4K region of the buffer will be used as the EOP buffer and avoids queue creation failures on non-4K page systems. Acked-by: Christian König <christian.koenig@amd.com> Suggested-by: Philip Yang <yangp@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: only power down dig on phy endpointsDmytro Laktyushkin1-0/+2
[ Upstream commit 0839d8d24e6f1fc2587c4a976f44da9fa69ae3d0 ] This avoids any issues with dpia endpoints Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Skip loading SDMA_RS64 in VFYuBiao Wang1-0/+1
[ Upstream commit 39c21b81112321cbe1267b02c77ecd2161ce19aa ] VFs use the PF SDMA ucode and are unable to load SDMA_RS64. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Signed-off-by: Victor Skvortsov <Victor.Skvortsov@amd.com> Reviewed-by: Gavin Wan <gavin.wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Add signal type check for dcn401 get_phyd32clk_srcDmytro Laktyushkin1-3/+3
[ Upstream commit c979d8db7b0f293111f2e83795ea353c8ed75de9 ] Trying to access link enc on a dpia link will cause a crash otherwise Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: avoid a warning in timedout job handlerAlex Deucher1-1/+2
[ Upstream commit c8cf9ddc549fb93cb5a35f3fe23487b1e6707e74 ] Only set an error on the fence if the fence is not signalled. We can end up with a warning if the per queue reset path signals the fence and sets an error as part of the reset, but fails to recover. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix dsc eDP issueCharlene Liu1-3/+9
[ Upstream commit 878a4b73c11111ff5f820730f59a7f8c6fd59374 ] [why] Need to add function hook check before use Reviewed-by: Mohit Bawa <mohit.bawa@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: add support for HDP IP version 6.1.1Tim Huang1-0/+1
[ Upstream commit e2fd14f579b841f54a9b7162fef15234d8c0627a ] This initializes HDP IP version 6.1.1. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Add USB-C DP Alt Mode lane limitation in DCN32LinCheng Ku1-3/+12
[ Upstream commit cea573a8e1ed83840a2173d153dd68e172849d44 ] [Why] USB-C DisplayPort Alt Mode with concurrent USB data needs lane count limitation to prevent incorrect 4-lane DP configuration when only 2 lanes are available due to hardware lane sharing between DP and USB3. [How] Query DMUB for Alt Mode status (is_dp_alt_disable, is_usb, is_dp4) in dcn32_link_encoder_get_max_link_cap() and cap DP to 2 lanes when USB is active on USB-C port. Added inline documentation explaining the USB-C lane sharing constraint. Reviewed-by: PeiChen Huang <peichen.huang@amd.com> Signed-off-by: LinCheng Ku <lincheng.ku@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Handle GPU reset and drain retry fault racePhilip Yang1-1/+6
[ Upstream commit 5b57c3c3f22336e8fd5edb7f0fef3c7823f8eac1 ] Only check and drain IH1 ring if CAM is not enabled. If GPU is under reset, don't access IH to drain retry fault. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: fix NULL pointer issue buffer funcsLikun Gao1-1/+2
[ Upstream commit 9877a865d62c9c3e0f4cc369dc9ca9f7f24f5ee9 ] If SDMA block not enabled, buffer_funcs will not initialize, fix the null pointer issue if buffer_funcs not initialized. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Use same max plane scaling limits for all 64 bpp formatsMario Kleiner1-0/+5
[ Upstream commit f0157ce46cf0e5e2257e19d590c9b16036ce26d4 ] The plane scaling hw seems to have the same min/max plane scaling limits for all 16 bpc / 64 bpp interleaved pixel color formats. Therefore add cases to amdgpu_dm_plane_get_min_max_dc_plane_scaling() for all the 16 bpc fixed-point / unorm formats to use the same .fp16 up/downscaling factor limits as used by the fp16 floating point formats. So far, 16 bpc unorm formats were not handled, and the default: path returned max/min factors for 32 bpp argb8888 formats, which were wrong and bigger than what many DCE / DCN hw generations could handle. The result sometimes was misscaling of framebuffers with DRM_FORMAT_XRGB16161616, DRM_FORMAT_ARGB16161616, DRM_FORMAT_XBGR16161616, DRM_FORMAT_ABGR16161616, leading to very wrong looking display, as tested on Polaris11 / DCE-11.2. So far this went unnoticed, because only few userspace clients used such 16 bpc unorm framebuffers, and those didn't use hw plane scaling, so they did not experience this issue. With upcoming Mesa 26 exposing 16 bpc unorm formats under both OpenGL and Vulkan under Wayland, and the upcoming GNOME 50 Mutter Wayland compositor allowing for direct scanout of these formats, the scaling hw will be used on these formats if possible for HiDPI display scaling, so it is important to use the correct hw scaling limits to avoid wrong display. Tested on AMD Polaris 11 / DCE 11.2 with upcoming Mesa 26 and GNOME 50 on HiDPI displays with scaling enabled. The mutter Wayland compositor now correctly falls back to scaling via desktop compositing instead of direct scanout, thereby avoiding wrong image display. For unscaled mode, it correctly uses direct scanout. Fixes: 580204038f5b ("drm/amd/display: Enable support for 16 bpc fixed-point framebuffers.") Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix out-of-bounds stream encoder index v3Srinivasan Shanmugam6-24/+24
[ Upstream commit abde491143e4e12eecc41337910aace4e8d59603 ] eng_id can be negative and that stream_enc_regs[] can be indexed out of bounds. eng_id is used directly as an index into stream_enc_regs[], which has only 5 entries. When eng_id is 5 (ENGINE_ID_DIGF) or negative, this can access memory past the end of the array. Add a bounds check using ARRAY_SIZE() before using eng_id as an index. The unsigned cast also rejects negative values. This avoids out-of-bounds access. Fixes the below smatch error: dcn*_resource.c: stream_encoder_create() may index stream_enc_regs[eng_id] out of bounds (size 5). drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn351/dcn351_resource.c 1246 static struct stream_encoder *dcn35_stream_encoder_create( 1247 enum engine_id eng_id, 1248 struct dc_context *ctx) 1249 { ... 1255 1256 /* Mapping of VPG, AFMT, DME register blocks to DIO block instance */ 1257 if (eng_id <= ENGINE_ID_DIGF) { ENGINE_ID_DIGF is 5. should <= be <? Unrelated but, ugh, why is Smatch saying that "eng_id" can be negative? end_id is type signed long, but there are checks in the caller which prevent it from being negative. 1258 vpg_inst = eng_id; 1259 afmt_inst = eng_id; 1260 } else 1261 return NULL; 1262 ... 1281 1282 dcn35_dio_stream_encoder_construct(enc1, ctx, ctx->dc_bios, 1283 eng_id, vpg, afmt, --> 1284 &stream_enc_regs[eng_id], ^^^^^^^^^^^^^^^^^^^^^^^ This stream_enc_regs[] array has 5 elements so we are one element beyond the end of the array. ... 1287 return &enc1->base; 1288 } v2: use explicit bounds check as suggested by Roman/Dan; avoid unsigned int cast v3: The compiler already knows how to compare the two values, so the cast (int) is not needed. (Roman) Fixes: 2728e9c7c842 ("drm/amd/display: add DC changes for DCN351") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Mario Limonciello <superm1@kernel.org> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Reject cursor plane on DCE when scaled differently than primaryTimur Kristóf1-3/+8
[ Upstream commit 41af6215cdbcecd12920f211239479027904abf3 ] Currently DCE doesn't support the overlay cursor, so the dm_crtc_get_cursor_mode() function returns DM_CURSOR_NATIVE_MODE unconditionally. The outcome is that it doesn't check for the conditions that would necessitate the overlay cursor, meaning that it doesn't reject cases where the native cursor mode isn't supported on DCE. Remove the early return from dm_crtc_get_cursor_mode() for DCE and instead let it perform the necessary checks and return DM_CURSOR_OVERLAY_MODE. Add a later check that rejects when DM_CURSOR_OVERLAY_MODE would be used with DCE. Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4600 Suggested-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Fix watch_id bounds checking in debug address watch v2Srinivasan Shanmugam1-8/+12
[ Upstream commit 5a19302cab5cec7ae7f1a60c619951e6c17d8742 ] The address watch clear code receives watch_id as an unsigned value (u32), but some helper functions were using a signed int and checked bits by shifting with watch_id. If a very large watch_id is passed from userspace, it can be converted to a negative value. This can cause invalid shifts and may access memory outside the watch_points array. drm/amdkfd: Fix watch_id bounds checking in debug address watch v2 Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before using it. Also use BIT(watch_id) to test and clear bits safely. This keeps the behavior unchanged for valid watch IDs and avoids undefined behavior for invalid ones. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448 kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow 'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c 433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, 434 uint32_t watch_id) 435 { 436 int r; 437 438 if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if watch_id is larger than INT_MAX it leads to a buffer overflow. (Negative shifts are undefined). 439 return -EINVAL; 440 441 if (!pdd->dev->kfd->shared_resources.enable_mes) { 442 r = debug_lock_and_unmap(pdd->dev->dqm); 443 if (r) 444 return r; 445 } 446 447 amdgpu_gfx_off_ctrl(pdd->dev->adev, false); --> 448 pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( 449 pdd->dev->adev, 450 watch_id); v2: (as per, Jonathan Kim) - Add early watch_id >= MAX_WATCH_ADDRESSES validation in the set path to match the clear path. - Drop the redundant bounds check in kfd_dbg_owns_dev_watch_id(). Fixes: e0f85f4690d0 ("drm/amdkfd: add debug set and clear address watch points operation") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Fix memory leak in amdgpu_ras_init()Zilin Guan1-1/+1
[ Upstream commit ee41e5b63c8210525c936ee637a2c8d185ce873c ] When amdgpu_nbio_ras_sw_init() fails in amdgpu_ras_init(), the function returns directly without freeing the allocated con structure, leading to a memory leak. Fix this by jumping to the release_con label to properly clean up the allocated memory before returning the error code. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: fdc94d3a8c88 ("drm/amdgpu: Rework pcie_bif ras sw_init") Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Use kvfree instead of kfree in amdgpu_gmc_get_nps_memranges()Zilin Guan1-1/+1
[ Upstream commit 0c44d61945c4a80775292d96460aa2f22e62f86c ] amdgpu_discovery_get_nps_info() internally allocates memory for ranges using kvcalloc(), which may use vmalloc() for large allocation. Using kfree() to release vmalloc memory will lead to a memory corruption. Use kvfree() to safely handle both kmalloc and vmalloc allocations. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: b194d21b9bcc ("drm/amdgpu: Use NPS ranges from discovery table") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Fix memory leak in amdgpu_acpi_enumerate_xcc()Zilin Guan1-1/+3
[ Upstream commit c9be63d565789b56ca7b0197e2cb78a3671f95a8 ] In amdgpu_acpi_enumerate_xcc(), if amdgpu_acpi_dev_init() returns -ENOMEM, the function returns directly without releasing the allocated xcc_info, resulting in a memory leak. Fix this by ensuring that xcc_info is properly freed in the error paths. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: 4d5275ab0b18 ("drm/amdgpu: Add parsing of acpi xcc objects") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Use explicit VCN instance 0 in SR-IOV initSrinivasan Shanmugam1-22/+23
[ Upstream commit af26fa751c2eef66916acbf0d3c3e9159da56186 ] vcn_v2_0_start_sriov() declares a local variable "i" initialized to zero and uses it only as the instance index in SOC15_REG_OFFSET(UVD, i, ...). The value is never changed and all other fields are taken from adev->vcn.inst[0], so this path only ever programs VCN instance 0. This triggered a Smatch: warn: iterator 'i' not incremented Replace the dummy iterator with an explicit instance index of 0 in SOC15_REG_OFFSET() calls. Fixes: dd26858a9cd8 ("drm/amdgpu: implement initialization part on VCN2.0 for SRIOV") Reported by: Dan Carpenter <dan.carpenter@linaro.org> Cc: darlington Opara <darlington.opara@amd.com> Cc: Jinage Zhao <jiange.zhao@amd.com> Cc: Monk Liu <Monk.Liu@amd.com> Cc: Emily Deng <Emily.Deng@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Fix signal_eviction_fence() bool return valueSrinivasan Shanmugam1-1/+1
[ Upstream commit 31dc58adda9874420ab8fa5a2f9c43377745753a ] signal_eviction_fence() is declared to return bool, but returns -EINVAL when no eviction fence is present. This makes the "no fence" or "the NULL-fence" path evaluate to true and triggers a Smatch warning. v2: Return true instead to explicitly indicate that there is no eviction fence to signal and that eviction is already complete. This matches the existing caller logic where a NULL fence means "nothing to do" and allows restore handling to proceed normally. (Christian) Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:2099 signal_eviction_fence() warn: '(-22)' is not bool drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c 2090 static bool signal_eviction_fence(struct kfd_process *p) ^^^^ 2091 { 2092 struct dma_fence *ef; 2093 bool ret; 2094 2095 rcu_read_lock(); 2096 ef = dma_fence_get_rcu_safe(&p->ef); 2097 rcu_read_unlock(); 2098 if (!ef) --> 2099 return -EINVAL; This should be either true or false. Probably true because presumably it has been tested? 2100 2101 ret = dma_fence_check_and_signal(ef); 2102 dma_fence_put(ef); 2103 2104 return ret; 2105 } Fixes: 37865e02e6cc ("drm/amdkfd: Fix eviction fence handling") Reported by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Philip Yang <Philip.Yang@amd.com> Cc: Gang BA <Gang.Ba@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd: Drop "amdgpu kernel modesetting enabled" messageMario Limonciello (AMD)1-1/+0
[ Upstream commit 8644084a74a4573278d6f454c6638ccd5965f4e2 ] The behavior for amdgpu was changed with commit e00e5c223878 ("drm/amdgpu: adjust drm_firmware_drivers_only() handling") to potentially allow loading even if nomodeset was set, so the message is no longer accurate. Just drop it to avoid confusion. Fixes: e00e5c223878 ("drm/amdgpu: adjust drm_firmware_drivers_only() handling") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-19drm/amd/display: remove assert around dpp_base replacementMelissa Wen1-1/+0
[ Upstream commit 84962445cd8a83dc5bed4c8ad5bbb2c1cdb249a0 ] There is nothing wrong if in_shaper_func type is DISTRIBUTED POINTS. Remove the assert placed for a TODO to avoid misinterpretations. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1714dcc4c2c53e41190896eba263ed6328bcf415) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-19drm/amd/display: extend delta clamping logic to CM3 LUT helperMelissa Wen5-25/+49
[ Upstream commit d25b32aa829a3ed5570138e541a71fb7805faec3 ] Commit 27fc10d1095f ("drm/amd/display: Fix the delta clamping for shaper LUT") fixed banding when using plane shaper LUT in DCN10 CM helper. The problem is also present in DCN30 CM helper, fix banding by extending the same bug delta clamping fix to CM3. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0274a54897f356f9c78767c4a2a5863f7dde90c6) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11drm/amd/display: fix wrong color value mapping on MCM shaper LUTMelissa Wen1-2/+5
[ Upstream commit 8f959d37c1f2efec6dac55915ee82302e98101fb ] Some shimmer/colorful points appears when using the steamOS color pipeline for HDR on gaming with DCN32. These points look like black values being wrongly mapped to red/blue/green values. It was caused because the number of hw points in regular LUTs and in a shaper LUT was treated as the same. DCN3+ regular LUTs have 257 bases and implicit deltas (i.e. HW calculates them), but shaper LUT is a special case: it has 256 bases and 256 deltas, as in DCN1-2 regular LUTs, and outputs 14-bit values. Fix that by setting by decreasing in 1 the number of HW points computed in the LUT segmentation so that shaper LUT (i.e. fixpoint == true) keeps the same DCN10 CM logic and regular LUTs go with `hw_points + 1`. CC: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Fixes: 4d5fd3d08ea9 ("drm/amd/display: PQ tail accuracy") Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5006505b19a2119e71c008044d59f6d753c858b9) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11drm/amd/pm: Disable MMIO access during SMU Mode 1 resetPerry Yuan3-3/+16
[ Upstream commit 0de604d0357d0d22cbf03af1077d174b641707b6 ] During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily inaccessible via PCIe. Any attempt to access MMIO registers during this window (e.g., from interrupt handlers or other driver threads) can result in uncompleted PCIe transactions, leading to NMI panics or system hangs. To prevent this, set the `no_hw_access` flag to true immediately after triggering the reset. This signals other driver components to skip register accesses while the device is offline. A memory barrier `smp_mb()` is added to ensure the flag update is globally visible to all cores before the driver enters the sleep/wait state. Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7edb503fe4b6d67f47d8bb0dfafb8e699bb0f8a4) Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11Revert "drm/amd: Check if ASPM is enabled from PCIe subsystem"Bert Karwatzki1-3/+0
commit 243b467dea1735fed904c2e54d248a46fa417a2d upstream. This reverts commit 7294863a6f01248d72b61d38478978d638641bee. This commit was erroneously applied again after commit 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device") removed it, leading to very hard to debug crashes, when used with a system with two AMD GPUs of which only one supports ASPM. Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/ Link: https://github.com/acpica/acpica/issues/1060 Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device") Signed-off-by: Bert Karwatzki <spasswolf@web.de> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 97a9689300eb2b393ba5efc17c8e5db835917080) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06drm/amdgpu/gfx11: adjust KGQ reset sequenceAlex Deucher1-21/+24
[ Upstream commit 3eb46fbb601f9a0b4df8eba79252a0a85e983044 ] Kernel gfx queues do not need to be reinitialized or remapped after a reset. This fixes queue reset failures on APUs. v2: preserve init and remap for MMIO case. Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b340ff216fdabfe71ba0cdd47e9835a141d08e10) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule()Alex Deucher1-2/+3
commit b1defcdc4457649db236415ee618a7151e28788c upstream. The EXEC_COUNT field must be > 0. In the gfx shadow handling we always emit a cond_exec packet after the gfx_shadow packet, but the EXEC_COUNT never gets patched. This leads to a hang when we try and reset queues on gfx11 APUs. Fixes: c68cbbfd54c6 ("drm/amdgpu: cleanup conditional execution") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ba205ac3d6e83f56c4f824f23f1b4522cb844ff3) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_removeJon Doron1-1/+6
commit 8b1ecc9377bc641533cd9e76dfa3aee3cd04a007 upstream. On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and ih2 interrupt ring buffers are not initialized. This is by design, as these secondary IH rings are only available on discrete GPUs. See vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when AMD_IS_APU is set. However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to get the timestamp of the last interrupt entry. When retry faults are enabled on APUs (noretry=0), this function is called from the SVM page fault recovery path, resulting in a NULL pointer dereference when amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[]. The crash manifests as: BUG: kernel NULL pointer dereference, address: 0000000000000004 RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu] Call Trace: amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu] svm_range_restore_pages+0xae5/0x11c0 [amdgpu] amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu] gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu] amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu] amdgpu_ih_process+0x84/0x100 [amdgpu] This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1") which changed the default for Renoir APU from noretry=1 to noretry=0, enabling retry fault handling and thus exercising the buggy code path. Fix this by adding a check for ih1.ring_size before attempting to use it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu: Rework retry fault removal"). This is needed if the hardware doesn't support secondary HW IH rings. v2: additional updates (Alex) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814 Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6ce8d536c80aa1f059e82184f0d1994436b1d526) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06drm/amdgpu/gfx12: fix wptr reset in KGQ initAlex Deucher1-1/+1
commit 9077d32a4b570fa20500aa26e149981c366c965d upstream. wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a2918f958d3f677ea93c0ac257cb6ba69b7abb7c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06drm/amdgpu/gfx11: fix wptr reset in KGQ initAlex Deucher1-1/+1
commit b1f810471c6a6bd349f7f9f2f2fed96082056d46 upstream. wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1f16866bdb1daed7a80ca79ae2837a9832a74fbc) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06drm/amdgpu/gfx10: fix wptr reset in KGQ initAlex Deucher1-1/+1
commit cc4f433b14e05eaa4a98fd677b836e9229422387 upstream. wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e80b1d1aa1073230b6c25a1a72e88f37e425ccda) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>