summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)AuthorFilesLines
2026-03-04drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and ↵Srinivasan Shanmugam1-36/+37
Timeline Management v7 [ Upstream commit efdc66fe12b07e7b7d28650bd8d4f7e3bb92c5d4 ] When GPU memory mappings are updated, the driver returns a fence so userspace knows when the update is finished. The previous refactor could pick the wrong fence or rely on checks that are not safe for GPU mappings that stay valid even when memory is missing. In some cases this could return an invalid fence or cause fence reference counting problems. Fix this by (v5,v6, per Christian): - Starting from the VM’s existing last update fence, so a valid and meaningful fence is always returned even when no new work is required. - Selecting the VM-level fence only for always-valid / PRT mappings using the required combined bo_va + bo guard. - Using the per-BO page table update fence for normal MAP and REPLACE operations. - For UNMAP and CLEAR, returning the fence provided by amdgpu_vm_clear_freed(), which may remain unchanged when nothing needs clearing. - Keeping fence reference counting balanced. v7: Drop the extra bo_va/bo NULL guard since amdgpu_vm_is_bo_always_valid() handles NULL BOs correctly (including PRT). (Christian) This makes VM timeline fences correct and prevents crashes caused by incorrect fence handling. Fixes: bd8150a1b337 ("drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4") Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: keep vga memory on MacBooks with switchable graphicsAlex Deucher1-0/+10
[ Upstream commit 096bb75e13cc508d3915b7604e356bcb12b17766 ] On Intel MacBookPros with switchable graphics, when the iGPU is enabled, the address of VRAM gets put at 0 in the dGPU's virtual address space. This is non-standard and seems to cause issues with the cursor if it ends up at 0. We have the framework to reserve memory at 0 in the address space, so enable it here if the vram start address is 0. Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4302 Cc: stable@vger.kernel.org Cc: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Correct logic check error for fastbootCharlene Liu1-2/+2
[ Upstream commit b6a65009e7ce3f0cc72da18f186adb60717b51a0 ] [Why] Fix fastboot broken in driver. This is caused by an open source backport change 7495962c. from the comment, the intended check is to disable fastboot for pre-DCN10. but the logic check is reversed, and causes fastboot to be disabled on all DCN10 and after. fastboot is for driver trying to pick up bios used hw setting and bypass reprogramming the hw if dc_validate_boot_timing() condition meets. Fixes: 7495962cbceb ("drm/amd/display: Disable fastboot on DCE 6 too") Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello <Mario.Limonciello@amd.com> Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notifyPierre-Eric Pelloux-Prayer1-1/+8
[ Upstream commit b18fc0ab837381c1a6ef28386602cd888f2d9edf ] Invalidating a dmabuf will impact other users of the shared BO. In the scenario where process A moves the BO, it needs to inform process B about the move and process B will need to update its page table. The commit fixes a synchronisation bug caused by the use of the ticket: it made amdgpu_vm_handle_moved behave as if updating the page table immediately was correct but in this case it's not. An example is the following scenario, with 2 GPUs and glxgears running on GPU0 and Xorg running on GPU1, on a system where P2P PCI isn't supported: glxgears: export linear buffer from GPU0 and import using GPU1 submit frame rendering to GPU0 submit tiled->linear blit Xorg: copy of linear buffer The sequence of jobs would be: drm_sched_job_run # GPU0, frame rendering drm_sched_job_queue # GPU0, blit drm_sched_job_done # GPU0, frame rendering drm_sched_job_run # GPU0, blit move linear buffer for GPU1 access # amdgpu_dma_buf_move_notify -> update pt # GPU0 It this point the blit job on GPU0 is still running and would likely produce a page fault. Cc: stable@vger.kernel.org Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Increase DCN35 SR enter/exit latencyLeo Li2-10/+10
[ Upstream commit 318917e1d8ecc89f820f4fabf79935f4fed718cd ] [Why & How] On Framework laptops with DDR5 modules, underflow can be observed. It's unclear why it only occurs on specific desktop contents. However, increasing enter/exit latencies by 3us seems to resolve it. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4463 Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Use 5-level paging if gmc support 57-bit VAPhilip Yang1-17/+0
[ Upstream commit 3b948dd0366a0b64c02e4ed1aefdf7825942e803 ] Regardless if CPU enable 5-level paging, GPU vm use 5-level paging if gmc init with 57-bit address space support, because ARM64 4-level paging support 48-bit VA, x86 and GPU 4-level paging support 47-bit VA, require 5-level paging on GPU to support ARM64. NPA address space 52-bit mapping on NPA GPU VM require 5-level paging. Debugger trap get device snapshot expect LDS and Scratch base, limit above 57-bit, which is set only for 5-level paging. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.19.x Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: GPU vm support 5-level page tablePhilip Yang3-1/+23
[ Upstream commit f6b1c1f5fd7237f77fc3880603ea54dcf0371a20 ] If GPU supports 5-level page table, but CPU disable 5-level page table by using boot option no5lvl or CPU feature not available, the virtual address will be 48bit, not needed to enable 5-level page table on GPU vm. If adev->vm_manager.num_level, number of pde levels, set to 4, then gfxhub and mmhub register VM_CONTEXTx_CNTL/PAGE_TABLE_DEPTH will set to 4 to enable 5-level page table in page table walker. Set vm_manager.root_level to AMDGPU_VM_PDE3, then update GPU mapping will allocate and update PDE3/PDE2/PDE1/PDE0/PTB 5-level page tables. If max_level is not 4, no change for the logic to support features needed by old ASICs. v2: squash in CONFIG fix Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 3b948dd0366a ("drm/amdgpu: Use 5-level paging if gmc support 57-bit VA") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Protect GPU register accesses in powergated state in some pathsYifan Zhang1-3/+6
[ Upstream commit 39fc2bc4da0082c226cbee331f0a5d44db3997da ] Ungate GPU CG/PG in device_fini_hw and device_halt to protect GPU register accesses, e.g. GC registers are accessed in amdgpu_irq_disable_all() and amdgpu_fence_driver_hw_fini(). Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Fix out-of-bounds write in kfd_event_page_set()Sunday Clement1-0/+6
[ Upstream commit 8a70a26c9f34baea6c3199a9862ddaff4554a96d ] The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8 bytes via memset without checking the buffer size parameter. This allows unprivileged userspace to trigger an out-of bounds kernel memory write by passing a small buffer, leading to potential privilege escalation. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: avoid sdma ring reset in sriovVictor Zhao1-0/+3
[ Upstream commit 5cc7bbd9f1b74d9fe2f7ac08d6ba0477e8d2d65f ] sdma ring reset is not supported in SRIOV. kfd driver does not check reset mask, and could queue sdma ring reset during unmap_queues_cpsch. Avoid the ring reset for sriov. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Remove conditional for shaper 3DLUT power-onAlex Hung1-2/+1
[ Upstream commit 1b38a87b8f8020e8ef4563e7752a64182b5a39b9 ] [Why] Shaper programming has high chance to fail on first time after power-on or reboot. This can be verified by running IGT's kms_colorop. [How] Always power on the shaper and 3DLUT before programming by removing the debug flag of low power mode. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: bypass post csc for additional color spaces in dalClay King3-6/+25
[ Upstream commit 7d9ec9dc20ecdb1661f4538cd9112cd3d6a5f15a ] [Why] For RGB BT2020 full and limited color spaces, overlay adjustments were applied twice (once by MM and once by DAL). This results in incorrect colours and a noticeable difference between mpo and non-mpo cases. [How] Add RGB BT2020 full and limited color spaces to list that bypasses post csc adjustment. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Clay King <clayking@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Add HAINAN clock adjustmentdecce61-0/+5
[ Upstream commit 49fe2c57bdc0acff9d2551ae337270b6fd8119d9 ] This patch limits the clock speeds of the AMD Radeon R5 M420 GPU from 850/1000MHz (core/memory) to 800/950 MHz, making it work stably. This patch is for amdgpu. Signed-off-by: decce6 <decce6@proton.me> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: set enable_legacy_fast_update to false for DCN36YiLing Chen1-1/+1
[ Upstream commit d0728aee5090853d0b9982757f5fb1b13e2e2b27 ] [Why/How] Align the default value of the flag with DCN35/351. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: YiLing Chen <yi-lchen@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Adjust usleep_range in fence waitCe Sun1-1/+1
[ Upstream commit 3ee1c72606bd2842f0f377fd4b118362af0323ae ] Tune the sleep interval in the PSP fence wait loop from 10-100us to 60-100us.This adjustment results in an overall wait window of 1.2s (60us * 20000 iterations) to 2 seconds (100us * 20000 iterations), which guarantees that we can retrieve the correct fence value Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: return when ras table checksum is errorGangliang Xie1-1/+3
[ Upstream commit 044f8d3b1fac6ac89c560f61415000e6bdab3a03 ] end the function flow when ras table checksum is error Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Avoid updating surface with the same surface under MPOWayne Lin1-1/+1
[ Upstream commit 1a38ded4bc8ac09fd029ec656b1e2c98cc0d238c ] [Why & How] Although it's dummy updates of surface update for committing stream updates, we should not have dummy_updates[j].surface all indicating to the same surface under multiple surfaces case. Otherwise, copy_surface_update_to_plane() in update_planes_and_stream_state() will update to the same surface only. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix system resume lag issueTom Chung1-0/+10
[ Upstream commit 64c94cd9be2e188ed07efeafa6a109bce638c967 ] [Why] System will try to apply idle power optimizations setting during system resume. But system power state is still in D3 state, and it will cause the idle power optimizations command not actually to be sent to DMUB and cause some platforms to go into IPS. [How] Set power state to D0 first before calling the dc_dmub_srv_apply_idle_power_optimizations(dm->dc, false) Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Skip vcn poison irq release on VFLijo Lazar1-1/+3
[ Upstream commit 8980be03b3f9a4b58197ef95d3b37efa41a25331 ] VF doesn't enable VCN poison irq in VCNv2.5. Skip releasing it and avoid call trace during deinitialization. [ 71.913601] [drm] clean up the vf2pf work item [ 71.915088] ------------[ cut here ]------------ [ 71.915092] WARNING: CPU: 3 PID: 1079 at /tmp/amd.aFkFvSQl/amd/amdgpu/amdgpu_irq.c:641 amdgpu_irq_put+0xc6/0xe0 [amdgpu] [ 71.915355] Modules linked in: amdgpu(OE-) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) drm_suballoc_helper drm_display_helper cec rc_core i2c_algo_bit video wmi binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common input_leds joydev serio_raw mac_hid qemu_fw_cfg sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel usbhid 8139too sha256_ssse3 sha1_ssse3 hid psmouse bochs i2c_i801 ahci drm_vram_helper libahci i2c_smbus lpc_ich drm_ttm_helper 8139cp mii ttm aesni_intel crypto_simd cryptd [ 71.915484] CPU: 3 PID: 1079 Comm: rmmod Tainted: G OE 6.8.0-87-generic #88~22.04.1-Ubuntu [ 71.915489] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-2.el9_5.1 04/01/2014 [ 71.915492] RIP: 0010:amdgpu_irq_put+0xc6/0xe0 [amdgpu] [ 71.915768] Code: 75 84 b8 ea ff ff ff eb d4 44 89 ea 48 89 de 4c 89 e7 e8 fd fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 55 30 3b c7 <0f> 0b eb d4 b8 fe ff ff ff eb a8 e9 b7 3b 8a 00 66 2e 0f 1f 84 00 [ 71.915771] RSP: 0018:ffffcf0800eafa30 EFLAGS: 00010246 [ 71.915775] RAX: 0000000000000000 RBX: ffff891bda4b0668 RCX: 0000000000000000 [ 71.915777] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 71.915779] RBP: ffffcf0800eafa50 R08: 0000000000000000 R09: 0000000000000000 [ 71.915781] R10: 0000000000000000 R11: 0000000000000000 R12: ffff891bda480000 [ 71.915782] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000 [ 71.915792] FS: 000070cff87c4c40(0000) GS:ffff893abfb80000(0000) knlGS:0000000000000000 [ 71.915795] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.915797] CR2: 00005fa13073e478 CR3: 000000010d634006 CR4: 0000000000770ef0 [ 71.915800] PKRU: 55555554 [ 71.915802] Call Trace: [ 71.915805] <TASK> [ 71.915809] vcn_v2_5_hw_fini+0x19e/0x1e0 [amdgpu] Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix writeback on DCN 3.2+Alex Hung1-4/+15
[ Upstream commit 9ef84a307582a92ef055ef0bd3db10fd8ac75960 ] [WHAT] 1. Set no scaling for writeback as they are hardcoded in DCN3.2+. 2. Set no fast plane update for writeback commits. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/pm: Fix null pointer dereference issueJinzhou Su1-0/+3
[ Upstream commit 1197366cca89a4c44c541ddedb8ce8bf0757993d ] If SMU is disabled, during RAS initialization, there will be null pointer dereference issue here. Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: validate user queue size constraintsJesse.Zhang1-0/+11
[ Upstream commit 8079b87c02e531cc91601f72ea8336dd2262fdf1 ] Add validation to ensure user queue sizes meet hardware requirements: - Size must be a power of two for efficient ring buffer wrapping - Size must be at least AMDGPU_GPU_PAGE_SIZE to prevent undersized allocations This prevents invalid configurations that could lead to GPU faults or unexpected behavior. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: avoid dig reg access timeout on usb4 link training failZhongwei1-2/+10
[ Upstream commit 15b1d7b77e9836ff4184093163174a1ef28bbdd7 ] [Why] When usb4 link training fails, the dpia sym clock will be disabled and SYMCLK source should be changed back to phy clock. In enable_streams, it is assumed that link training succeeded and will switch from refclk to phy clock. But phy clk here might not be on. Dig reg access timeout will occur. [How] When enable_stream is hit, check if link training failed for usb4. If it did, fall back to the ref clock to avoid reg access timeout. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Zhongwei <Zhongwei.Zhang@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix GFX12 family constant checksMatthew Stewart2-3/+3
[ Upstream commit bdad08670278829771626ea7b57c4db531e2544f ] Using >=, <= for checking the family is not always correct. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: mark invalid records with U64_MAXGangliang Xie1-0/+6
[ Upstream commit 0028b86b52f7609e36af635ef6cb908925306233 ] set retired_page of invalid ras records to U64_MAX, and skip them when reading ras records Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Revert "init dispclk from bootup clock for DCN315"Wang, Sung-huai2-88/+3
[ Upstream commit a625dc4989a2affb8f06e7b418bf30e1474b99c1 ] [Why&How] This reverts commit 14bb17cc37e0. Due to the change, the display shows garbage on startup. We have an alternative solution for the original issue: d24203bb629f ("drm/amd/display: Re-check seamless boot can be enabled or not") Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Wang, Sung-huai <Danny.Wang@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Revert "init dispclk from bootup clock for DCN314"Wang, Sung-huai2-134/+4
[ Upstream commit bdc26342c49e1dc1afb48feeb20c9d74d15b784c ] [Why&How] This reverts commit f082daf08f2f. Due to the change, the display shows garbage on startup. We have an alternative solution for the original issue: d24203bb629f ("drm/amd/display: Re-check seamless boot can be enabled or not") Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Wang, Sung-huai <Danny.Wang@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Ensure link output is disabled in backend reset for PLL_ONNicholas Kazlauskas1-1/+15
[ Upstream commit 4589712e0111352973131bad975023b25569287c ] [Why] We're missing the code to actually disable the link output when we have to leave the SYMCLK_ON but the TX remains OFF. [How] Port the code from DCN401 that detects SYMCLK_ON_TX_OFF and disable the link output when the backend is reset. Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Disable FEC when powering down encodersOvidiu Bunea1-9/+15
[ Upstream commit 8cee62904caf95e5698fa0f2d420f5f22b4dea15 ] [why & how] VBIOS DMCUB FW can enable FEC for capable eDPs, but S/W DC state is only updated for link0 when transitioning into OS with driver loaded. This causes issues when the eDP is immediately hidden and DIG0 is assigned to another link that does not support FEC. Driver will attempt to disable FEC but FEC enablement occurs based on the link state, which does not have fec_state updated since it is a different link. Thus, FEC disablement on DIG0 will get skipped and cause no light up. Reviewed-by: Karen Chen <karen.chen@amd.com> Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()Donet Tom1-1/+1
[ Upstream commit 6c160001661b6c4e20f5c31909c722741e14c2d8 ] In svm_migrate_gart_map(), while migrating GART mapping, the number of bytes copied for the GART table only accounts for CPU pages. On non-4K systems, each CPU page can contain multiple GPU pages, and the GART requires one 8-byte PTE per GPU page. As a result, an incorrect size was passed to the DMA, causing only a partial update of the GART table. Fix this function to work correctly on non-4K page-size systems by accounting for the number of GPU pages per CPU page when calculating the number of bytes to be copied. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Relax size checking during queue buffer getDonet Tom1-3/+3
[ Upstream commit 42ea9cf2f16b7131cb7302acb3dac510968f8bdc ] HW-supported EOP buffer sizes are 4K and 32K. On systems that do not use 4K pages, the minimum buffer object (BO) allocation size is PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver currently checks the allocated BO size against the supported EOP buffer size. Since the allocated BO is larger than the expected size, this check fails, preventing queue creation. Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used. Only the required 4K region of the buffer will be used as the EOP buffer and avoids queue creation failures on non-4K page systems. Acked-by: Christian König <christian.koenig@amd.com> Suggested-by: Philip Yang <yangp@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Adjust PHY FSM transition to TX_EN-to-PLL_ON for TMDS on DCN35Nicholas Kazlauskas3-1/+56
[ Upstream commit 75372d75a4e23783583998ed99d5009d555850da ] [Why] A backport of the change made for DCN401 that addresses an issue where we turn off the PHY PLL when disabling TMDS output, which causes the OTG to remain stuck. The OTG being stuck can lead to a hang in the DCHVM's ability to ACK invalidations when it thinks the HUBP is still on but it's not receiving global sync. The transition to PLL_ON needs to be atomic as there's no guarantee that the thread isn't pre-empted or is able to complete before the IOMMU watchdog times out. [How] Backport the implementation from dcn401 back to dcn35. There's a functional difference in when the eDP output is disabled in dcn401 code so we don't want to utilize it directly. Reviewed-by: Yihan Zhu <yihan.zhu@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: only power down dig on phy endpointsDmytro Laktyushkin1-0/+2
[ Upstream commit 0839d8d24e6f1fc2587c4a976f44da9fa69ae3d0 ] This avoids any issues with dpia endpoints Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Skip loading SDMA_RS64 in VFYuBiao Wang1-0/+1
[ Upstream commit 39c21b81112321cbe1267b02c77ecd2161ce19aa ] VFs use the PF SDMA ucode and are unable to load SDMA_RS64. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Signed-off-by: Victor Skvortsov <Victor.Skvortsov@amd.com> Reviewed-by: Gavin Wan <gavin.wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and ↵Srinivasan Shanmugam1-53/+82
Timeline Management v4 [ Upstream commit bd8150a1b3370a9f7761c5814202a3fe5a79f44f ] This commit simplifies the amdgpu_gem_va_ioctl function, key updates include: - Moved the logic for managing the last update fence directly into amdgpu_gem_va_update_vm. - Introduced checks for the timeline point to enable conditional replacement or addition of fences. v2: Addressed review comments from Christian. v3: Updated comments (Christian). v4: The previous version selected the fence too early and did not manage its reference correctly, which could lead to stale or freed fences being used. This resulted in refcount underflows and could crash when updating GPU timelines. The fence is now chosen only after the VA mapping work is completed, and its reference is taken safely. After exporting it to the VM timeline syncobj, the driver always drops its local fence reference, ensuring balanced refcounting and avoiding use-after-free on dma_fence. Crash signature: [ 205.828135] refcount_t: underflow; use-after-free. [ 205.832963] WARNING: CPU: 30 PID: 7274 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110 ... [ 206.074014] Call Trace: [ 206.076488] <TASK> [ 206.078608] amdgpu_gem_va_ioctl+0x6ea/0x740 [amdgpu] [ 206.084040] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [ 206.089994] drm_ioctl_kernel+0x86/0xe0 [drm] [ 206.094415] drm_ioctl+0x26e/0x520 [drm] [ 206.098424] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [ 206.104402] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu] [ 206.109387] __x64_sys_ioctl+0x96/0xe0 [ 206.113156] do_syscall_64+0x66/0x2d0 ... [ 206.553351] BUG: unable to handle page fault for address: ffffffffc0dfde90 ... [ 206.553378] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0 ... [ 206.553405] Call Trace: [ 206.553409] <IRQ> [ 206.553415] ? __pfx_drm_sched_fence_free_rcu+0x10/0x10 [gpu_sched] [ 206.553424] dma_fence_signal+0x30/0x60 [ 206.553427] drm_sched_job_done.isra.0+0x123/0x150 [gpu_sched] [ 206.553434] dma_fence_signal_timestamp_locked+0x6e/0xe0 [ 206.553437] dma_fence_signal+0x30/0x60 [ 206.553441] amdgpu_fence_process+0xd8/0x150 [amdgpu] [ 206.553854] sdma_v4_0_process_trap_irq+0x97/0xb0 [amdgpu] [ 206.554353] edac_mce_amd(E) ee1004(E) [ 206.554270] amdgpu_irq_dispatch+0x150/0x230 [amdgpu] [ 206.554702] amdgpu_ih_process+0x6a/0x180 [amdgpu] [ 206.555101] amdgpu_irq_handler+0x23/0x60 [amdgpu] [ 206.555500] __handle_irq_event_percpu+0x4a/0x1c0 [ 206.555506] handle_irq_event+0x38/0x80 [ 206.555509] handle_edge_irq+0x92/0x1e0 [ 206.555513] __common_interrupt+0x3e/0xb0 [ 206.555519] common_interrupt+0x80/0xa0 [ 206.555525] </IRQ> [ 206.555527] <TASK> ... [ 206.555650] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0 ... [ 206.555667] Kernel panic - not syncing: Fatal exception in interrupt Link: https://patchwork.freedesktop.org/patch/654669/ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Add signal type check for dcn401 get_phyd32clk_srcDmytro Laktyushkin1-3/+3
[ Upstream commit c979d8db7b0f293111f2e83795ea353c8ed75de9 ] Trying to access link enc on a dpia link will cause a crash otherwise Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: avoid a warning in timedout job handlerAlex Deucher1-1/+2
[ Upstream commit c8cf9ddc549fb93cb5a35f3fe23487b1e6707e74 ] Only set an error on the fence if the fence is not signalled. We can end up with a warning if the per queue reset path signals the fence and sets an error as part of the reset, but fails to recover. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix dsc eDP issueCharlene Liu1-3/+9
[ Upstream commit 878a4b73c11111ff5f820730f59a7f8c6fd59374 ] [why] Need to add function hook check before use Reviewed-by: Mohit Bawa <mohit.bawa@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix mismatched unlock for DMUB HW lock in HWSS fast pathNicholas Kazlauskas1-4/+6
[ Upstream commit af3303970da5ce5bfe6dffdd07f38f42aad603e0 ] [Why] The evaluation for whether we need to use the DMUB HW lock isn't the same as whether we need to unlock which results in a hang when the fast path is used for ASIC without FAMS support. [How] Store a flag that indicates whether we should use the lock and use that same flag to specify whether unlocking is needed. Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: add support for HDP IP version 6.1.1Tim Huang1-0/+1
[ Upstream commit e2fd14f579b841f54a9b7162fef15234d8c0627a ] This initializes HDP IP version 6.1.1. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Don't disable DPCD mst_en if sink connectedPeichen Huang1-5/+11
[ Upstream commit 9aeb31b2456452257ad1ff7ec566f21bab1f3e8a ] [WHY] User may connect mst dock with multi monitors and do quick unplug and plug in one of the monitor. This operatioin may create CSN from dock to display driver. Then display driver would disable and then enable mst link and also disable/enable DPCD mst_en bit in dock RX. However, when mst_en bit being disabled, if dock has another CSN message to transmit then the message would be removed because of the disabling of mst_en. In this case, the message is missing and it ends up no display in the replugged monitor. [HOW] Don't disable mst_en bit when link still has sink connected. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Add USB-C DP Alt Mode lane limitation in DCN32LinCheng Ku1-3/+12
[ Upstream commit cea573a8e1ed83840a2173d153dd68e172849d44 ] [Why] USB-C DisplayPort Alt Mode with concurrent USB data needs lane count limitation to prevent incorrect 4-lane DP configuration when only 2 lanes are available due to hardware lane sharing between DP and USB3. [How] Query DMUB for Alt Mode status (is_dp_alt_disable, is_usb, is_dp4) in dcn32_link_encoder_get_max_link_cap() and cap DP to 2 lanes when USB is active on USB-C port. Added inline documentation explaining the USB-C lane sharing constraint. Reviewed-by: PeiChen Huang <peichen.huang@amd.com> Signed-off-by: LinCheng Ku <lincheng.ku@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix DP no audio issueCharlene Liu1-1/+0
[ Upstream commit bf5e396957acafd46003318965500914d5f4edfa ] [why] need to enable APG_CLOCK_ENABLE enable first also need to wake up az from D3 before access az block Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdkfd: Handle GPU reset and drain retry fault racePhilip Yang1-1/+6
[ Upstream commit 5b57c3c3f22336e8fd5edb7f0fef3c7823f8eac1 ] Only check and drain IH1 ring if CAM is not enabled. If GPU is under reset, don't access IH to drain retry fault. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Guard FAMS2 configuration updatesDillon Varone1-2/+5
[ Upstream commit 7dedb906cdfec100061daf41f8e54266e975987d ] [WHY&HOW] If DMCUB is not initialized or FAMS2 is not supported, the interface should not be called. Reviewed-by: Sridevi Arvindekar <sridevi.arvindekar@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Correct FIXED_VS Link Rate Toggle ConditionJing Zhou1-1/+1
[ Upstream commit 531fe6e0fee85a1bdb5b8223a706fff654ed0a61 ] [WHY&HOW] The condition is only perform toggle if FIXED_VS LTTPR reports no IEEE OUI. The literal "\x0,\x0,\x0" contains commas changes the bytes being compared to {0x00,0x2C,0X00}. The correct literal should be "\x00\x00\x00" without commas. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Jing Zhou <Jing.Zhou@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Fix wrong x_pos and y_pos for cursor offloadNicholas Kazlauskas2-8/+9
[ Upstream commit c02288724b98cbc018231200891d66578f83f848 ] [Why] The hubp401_cursor_set_position function programs a different value than it stores for use with cursor offload. This can cause a desync when switching between cursor programming paths. [How] We do the translation to destination space currently twice: once in the HWSS layer, and then again in the HUBP layer since we never store the translated result. HUBP expects to program the pos->x and pos->y directly for other ASIC, so follow that pattern here as well. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amd/display: Correct DSC padding accountingRelja Vojvodic5-7/+8
[ Upstream commit c7062be3380cb20c8b1c4a935a13f1848ead0719 ] [WHY] - After the addition of all OVT patches, DSC padding was being accounted for multiple times, effectively doubling the padding - This caused compliance failures or corruption [HOW] - Add padding to DSC pic width when required by HW, and do not re-add when calculating reg values - Do not add padding when computing PPS values, and instead track padding separately to add when calculating slice width values Reviewed-by: Chris Park <chris.park@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Relja Vojvodic <rvojvodi@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu/ras: Move ras data alloc before bad page checkAsad Kamal1-5/+5
[ Upstream commit bd68a1404b6fa2e7e9957b38ba22616faba43e75 ] In the rare event if eeprom has only invalid address entries, allocation is skipped, this causes following NULL pointer issue [ 547.103445] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 547.118897] #PF: supervisor read access in kernel mode [ 547.130292] #PF: error_code(0x0000) - not-present page [ 547.141689] PGD 124757067 P4D 0 [ 547.148842] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 547.158504] CPU: 49 PID: 8167 Comm: cat Tainted: G OE 6.8.0-38-generic #38-Ubuntu [ 547.177998] Hardware name: Supermicro AS -8126GS-TNMR/H14DSG-OD, BIOS 1.7 09/12/2025 [ 547.195178] RIP: 0010:amdgpu_ras_sysfs_badpages_read+0x2f2/0x5d0 [amdgpu] [ 547.210375] Code: e8 63 78 82 c0 45 31 d2 45 3b 75 08 48 8b 45 a0 73 44 44 89 f1 48 8b 7d 88 48 89 ca 48 c1 e2 05 48 29 ca 49 8b 4d 00 48 01 d1 <48> 83 79 10 00 74 17 49 63 f2 48 8b 49 08 41 83 c2 01 48 8d 34 76 [ 547.252045] RSP: 0018:ffa0000067287ac0 EFLAGS: 00010246 [ 547.263636] RAX: ff11000167c28130 RBX: ff11000127600000 RCX: 0000000000000000 [ 547.279467] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff11000125b1c800 [ 547.295298] RBP: ffa0000067287b50 R08: 0000000000000000 R09: 0000000000000000 [ 547.311129] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 547.326959] R13: ff11000217b1de00 R14: 0000000000000000 R15: 0000000000000092 [ 547.342790] FS: 0000746e59d14740(0000) GS:ff11017dfda80000(0000) knlGS:0000000000000000 [ 547.360744] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 547.373489] CR2: 0000000000000010 CR3: 000000019585e001 CR4: 0000000000f71ef0 [ 547.389321] PKRU: 55555554 [ 547.395316] Call Trace: [ 547.400737] <TASK> [ 547.405386] ? show_regs+0x6d/0x80 [ 547.412929] ? __die+0x24/0x80 [ 547.419697] ? page_fault_oops+0x99/0x1b0 [ 547.428588] ? do_user_addr_fault+0x2ee/0x6b0 [ 547.438249] ? exc_page_fault+0x83/0x1b0 [ 547.446949] ? asm_exc_page_fault+0x27/0x30 [ 547.456225] ? amdgpu_ras_sysfs_badpages_read+0x2f2/0x5d0 [amdgpu] [ 547.470040] ? mas_wr_modify+0xcd/0x140 [ 547.478548] sysfs_kf_bin_read+0x63/0xb0 [ 547.487248] kernfs_file_read_iter+0xa1/0x190 [ 547.496909] kernfs_fop_read_iter+0x25/0x40 [ 547.506182] vfs_read+0x255/0x390 This also result in space left assigned to negative values. Moving data alloc call before bad page check resolves both the issue. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drm/amdgpu: fix the calculation of RAS bad page numberTao Zhou1-4/+0
[ Upstream commit f752e79d38857011f1293fcb6c810409c3b669ee ] __amdgpu_ras_restore_bad_pages is responsible for the maintenance of bad page number, drop the unnecessary bad page number update in the error handling path of add_bad_pages. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>