path: root/drivers/gpu/drm/amd
Age | Commit message | Author | Files | Lines
3 days | Merge tag 'drm-misc-fixes-2026-06-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes | Dave Airlie | 1 | -1/+5
Short summary of fixes pull:
amd:
- track colorop changes correctly
amdxdna:
- fix possible leak of mm_struct
colorop:
- make lut interpolation mutable
- track colorop updates correctly
ivpu:
- fix integer truncation
vc4:
- fix leak in krealloc() error handling
virtio:
- fix dma_fence ref-count leak
Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260612081418.GA17001@2a02-2455-9062-2500-e496-5a17-62ba-545e.dyn6.pyur.net
6 days | drm/amd/display: use plane color_mgmt_changed to track colorop changes | Melissa Wen | 1 | -1/+5
Ensure the driver tracks changes in any colorop property of a plane color pipeline by using the same mechanism of CRTC color management and update plane color blocks when any colorop property changes. It fixes an issue observed on gamescope settings for night mode which is done via shaper/3D-LUT updates. Fixes: 9ba25915efba ("drm/amd/display: Add support for sRGB EOTF in DEGAM block") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patch.msgid.link/20260609110420.1298352-5-mwen@igalia.com
11 days | drm/amd/display: Consult MCCS FreeSync cap only if requested & supported | Michel Dänzer | 1 | -8/+6
When the do_mccs parameter is false, we don't call dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is unlikely to be true. Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs") Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 115bf5ca318e18a3dc1888ec6271c7052774952a)
11 days | drm/amdkfd: Unwind debug trap enable on copy_to_user failure | Yongqiang Sun | 1 | -0/+6
If kfd_dbg_trap_enable() failed while copying runtime_info to userspace, it had already activated the trap, set debug_trap_enabled, taken an extra process reference, and opened the debug event file. It then returned -EFAULT without unwinding that state, leaving inconsistent trap state and a refcount imbalance that could break later DISABLE/ENABLE. On copy_to_user failure, deactivate the trap and undo the rest of the enable setup before returning. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 01112e241e37f9ac98b6f418d93ce2e0b87b7ee0)
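The unwind described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct dbg_state`, `trap_enable()` and the `copy_ok` flag (standing in for the result of copy_to_user()) are hypothetical and do not mirror the driver's actual data structures.

```c
#include <errno.h>
#include <stdbool.h>

struct dbg_state {              /* hypothetical stand-in for per-process debug state */
	bool trap_active;
	bool trap_enabled;
	int  refcount;
	bool event_file_open;
};

/* copy_ok stands in for "copy_to_user() succeeded". */
int trap_enable(struct dbg_state *s, bool copy_ok)
{
	/* Setup steps, in the order the commit message lists them. */
	s->trap_active = true;
	s->trap_enabled = true;
	s->refcount++;
	s->event_file_open = true;

	if (!copy_ok) {
		/* Undo in reverse order so a later enable starts from a clean state. */
		s->event_file_open = false;
		s->refcount--;
		s->trap_enabled = false;
		s->trap_active = false;
		return -EFAULT;
	}
	return 0;
}
```

After a failed call the state is identical to the state before the call, which is the property the fix restores.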
11 days | drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTS | Sunday Clement | 1 | -0/+2
The kfd_wait_on_events ioctl passes a user-supplied num_events parameter directly to alloc_event_waiters() which calls kcalloc() without validation. This allows unprivileged users with /dev/kfd access to trigger large kernel memory allocations, potentially causing memory exhaustion and denial of service via the OOM killer. Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT (4096), which is the maximum number of events a single process can create. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 39eb6da7acee8d0cc12a8959235b590f295d7b4c)
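A minimal userspace sketch of the check, assuming a hypothetical `alloc_waiters_checked()` helper and `struct event_waiter`; the limit value is the one quoted in the commit message, while the -EINVAL return code and the zero-count shortcut are assumptions made for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Limit quoted in the commit message; the real constant lives in the KFD headers. */
#define SIGNAL_EVENT_LIMIT 4096u

struct event_waiter {           /* hypothetical stand-in for the kernel's waiter struct */
	uint64_t event_id;
	int activated;
};

/*
 * Validate a caller-supplied count before sizing an allocation with it.
 * Returns 0 and stores the array in *out, or a negative errno value.
 */
int alloc_waiters_checked(uint32_t num_events, struct event_waiter **out)
{
	*out = NULL;
	if (num_events > SIGNAL_EVENT_LIMIT)
		return -EINVAL;         /* reject before touching the allocator */
	if (num_events == 0)
		return 0;               /* nothing to allocate (sketch-only choice) */

	*out = calloc(num_events, sizeof(**out));
	return *out ? 0 : -ENOMEM;
}
```

The point of the ordering is that an oversized request never reaches the allocator at all.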
11 days | drm/amdgpu: restart the CS if some parts of the VM are still invalidated | Christian König | 1 | -1/+3
Make sure that we only submit work with fully up-to-date VM page tables. Backport to 7.1 and older. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213) Cc: stable@vger.kernel.org
11 days | drm/amdgpu/userq: Fix reading timeline points in wait ioctl | David Rosca | 1 | -4/+5
Use correct u64 type. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0ac98160dfb6ab3c6d7b38e0ff9687780beed9cb)
11 days | drm/amdkfd: fix SMI event cross-process information leak | Yongqiang Sun | 1 | -3/+5
kfd_smi_ev_enabled() skips the suser privilege check when pid=0. PROCESS_START, PROCESS_END, and VMFAULT events are emitted with pid=0 while carrying another process's PID and command name, so any /dev/kfd user in the render group can monitor all GPU workloads. Pass the target process PID into kfd_smi_event_add() for these events so the existing per-client filter restricts delivery to the owning process or CAP_SYS_ADMIN subscribers. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 92a8dba246d371fe268280e5fd74b0955688e6df)
13 days | drm/amd/pm: smu_v14_0_0: use SoftMin for gfxclk in set_soft_freq_limited_range | Priya Hosur | 1 | -1/+2
In smu_v14_0_0_set_soft_freq_limited_range(), the gfxclk floor is programmed via SetHardMinGfxClk together with SetSoftMaxGfxClk. Under power_dpm_force_performance_level=high this pins HardMin to peak gfxclk. In PMFW arbitration HardMin has higher priority than SoftMax, so the firmware thermal/PPT throttler cannot clamp gfxclk via SoftMax once HardMin is set to peak. Replace SetHardMinGfxClk with SetSoftMinGfxclk so the driver still requests peak performance but the firmware throttler retains the ability to clamp gfxclk under thermal/PPT pressure. SoftMax handling is unchanged and no other clock domains are affected. Signed-off-by: Priya Hosur <Priya.Hosur@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ea273267fd29cbf6d83ee72329f59eb5042605b) Cc: stable@vger.kernel.org
13 days | drm/amdgpu: Fix incorrect VRAM GART mappings on non-4K page size systems | Donet Tom | 1 | -4/+8
When mapping VRAM pages into the GART page table, amdgpu_gart_map_vram_range() assumes that the system page size is the same as the GPU page size. On systems with non-4K page sizes, multiple GPU pages can exist within a single CPU page. As a result, the mappings are created incorrectly because fewer page table entries are programmed than required. Fix this by programming the mappings correctly for non-4K page size systems. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8f0bc22388f74e0cf4ed8b7d1846c580eaf44cc) Cc: stable@vger.kernel.org
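The arithmetic behind the fix can be illustrated with a short sketch. It assumes 4 KiB GPU pages, as the commit message implies; the helper names are hypothetical and this is not the driver's mapping code.

```c
#include <stdint.h>

#define GPU_PAGE_SHIFT 12u                      /* 4 KiB GPU pages (assumed) */
#define GPU_PAGE_SIZE  (1ull << GPU_PAGE_SHIFT)

/* Number of GPU page-table entries needed to cover one CPU page. */
uint64_t gpu_pages_per_cpu_page(uint64_t cpu_page_size)
{
	return cpu_page_size / GPU_PAGE_SIZE;
}

/* Number of GPU PTEs needed to map num_cpu_pages CPU pages. */
uint64_t gart_entries_needed(uint64_t num_cpu_pages, uint64_t cpu_page_size)
{
	return num_cpu_pages * gpu_pages_per_cpu_page(cpu_page_size);
}
```

With 64 KiB CPU pages each CPU page needs 16 entries, so code that programs one entry per CPU page covers only a sixteenth of the range.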
13 days | drm/amdgpu/userq: move wptr_obj cleanup in mqd_destroy | Sunil Khatri | 2 | -4/+5
When queue_create fails after the mqd has already been allocated, wptr_obj is not cleaned up. Move that cleanup into mqd_destroy so that it covers every cleanup case as well as queue teardown. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 43355f62cd2ef5386c2693df537c232ea0f2ce6c)
13 days | drm/amdgpu: improve the userq seq BO free bit lookup | Prike Liang | 1 | -5/+6
Use find_next_zero_bit() to locate the next free seq slot bit instead of the current walk, for more efficient bitmap scanning. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ff905a9b6228de9eedd0db71ecb1bdde91fb898d)
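For readers unfamiliar with this style of bitmap scan, here is a userspace sketch of the idea. `next_zero_bit()` is a hypothetical analogue written for this example, not the kernel helper itself; it follows the same convention of returning `size` when no clear bit is found.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first clear bit at or after `offset`,
 * or `size` if every bit in [offset, size) is set.
 */
size_t next_zero_bit(const unsigned long *map, size_t size, size_t offset)
{
	for (size_t i = offset; i < size; ) {
		unsigned long word = map[i / BITS_PER_WORD];

		/* Skip a fully-set word in one step when aligned on its first bit. */
		if (i % BITS_PER_WORD == 0 && word == ULONG_MAX) {
			i += BITS_PER_WORD;
			continue;
		}
		if (!(word & (1ul << (i % BITS_PER_WORD))))
			return i;
		i++;
	}
	return size;
}
```

Skipping whole words is where the efficiency gain over a bit-by-bit walk comes from.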
13 days | drm/amdgpu/userq: remove the vital queue unmap logging | Sunil Khatri | 3 | -10/+5
Mesa's user queue free does not wait for the free to complete and goes ahead with unmapping the vital BOs while the kernel is still freeing the queue and doing the corresponding cleanup. The warning is therefore not needed: this is expected behaviour, and functionally we make sure to wait for the required fences before unmap. Remove the warn message. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 758a868043dcb07eca923bc451c16da3e73dc47c)
13 days | drm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11 | Andrew Martin | 1 | -8/+41
The v11 MQD manager incorrectly assigned the CP-compute variants of checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These functions use sizeof(struct v11_compute_mqd) (2048 bytes) instead of sizeof(struct v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow.
During CRIU checkpoint of an SDMA queue on Navi3x:
- checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer, leaking 1536 bytes of adjacent GTT memory to userspace
During CRIU restore:
- restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer, corrupting 1536 bytes of adjacent GTT memory (often the ring buffer or neighboring MQDs)
This is a copy-paste regression unique to v11. All other ASIC backends (cik, vi, v9, v10, v12) correctly use the SDMA-specific variants. Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that properly handle the smaller v11_sdma_mqd structure, matching the pattern used in other MQD managers.
Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") Assisted-by: Claude:Sonnet 4-5 Signed-off-by: Andrew Martin <andrew.martin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6fa41db7ffdec97d62433adf03b7b9b759af8c2c) Cc: stable@vger.kernel.org
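The size mismatch can be reproduced in miniature. The two structs below are stand-ins sized like the structures named in the commit message, and `checkpoint_sdma()` is a hypothetical helper; none of this is the driver's code.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins sized like the structures named in the commit message. */
struct compute_mqd { unsigned char raw[2048]; };
struct sdma_mqd    { unsigned char raw[512];  };

/* Bytes a compute-sized copy touches beyond an SDMA-sized buffer. */
size_t mqd_overrun_bytes(void)
{
	return sizeof(struct compute_mqd) - sizeof(struct sdma_mqd);
}

/* SDMA-specific checkpoint: the copy length comes from the SDMA type itself. */
void checkpoint_sdma(void *dst, const struct sdma_mqd *src)
{
	memcpy(dst, src, sizeof(*src));
}
```

Deriving the length from `sizeof(*src)` ties the copy size to the type actually being copied, which is what a per-queue-type variant achieves.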
13 days | drm/amdkfd: fix NULL dereference in get_queue_ids() | Muhammad Bilal | 1 | -1/+1
When usr_queue_id_array is NULL and num_queues is non-zero, get_queue_ids() returns NULL. The callers check only IS_ERR() on the return value; since IS_ERR(NULL) == false the check passes, and suspend_queues() calls q_array_invalidate() which immediately dereferences NULL while iterating num_queues times. Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying num_queues > 0 with a zero queue_array_ptr, causing a kernel panic. A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op (q_array_invalidate never executes, and resume_queues already guards all queue_ids dereferences behind a NULL check). Return ERR_PTR(-EINVAL) only when num_queues is non-zero and the pointer is absent; both callers already propagate IS_ERR() returns correctly to userspace. Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation") Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f165a82cdf503884bb1797771c61b2fcc72113d4) Cc: stable@vger.kernel.org
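The pitfall is that an error-pointer check does not catch NULL. The sketch below re-creates the error-pointer helpers in userspace under lowercase names (an assumption made so the example compiles outside the kernel); `get_ids()` is a hypothetical, heavily simplified shape of the fixed lookup.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
void *err_ptr(long error) { return (void *)(intptr_t)error; }
long ptr_err(const void *p) { return (long)(intptr_t)p; }
bool is_err(const void *p)
{
	/* Only the top MAX_ERRNO addresses encode errors; NULL is not among them. */
	return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

/*
 * A missing array is an error only when the caller claims it holds entries.
 */
void *get_ids(uint32_t num_queues, void *usr_array)
{
	if (!usr_array)
		return num_queues ? err_ptr(-EINVAL) : NULL;
	return usr_array;       /* the real function copies the IDs from userspace */
}
```

With this shape, a caller that only tests `is_err()` sees the bad request as an error instead of receiving a NULL it will later dereference.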
13 days | drm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14) | Vitaly Prosyak | 1 | -1/+1
Problem: While developing the amd_close_race IGT test (which intentionally triggers execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce zero diagnostic output. The GPU simply hangs silently for ~10s until the scheduler timeout fires. There is no way to distinguish an execute permission fault from any other type of GPU hang.
Root cause: GFX 10.1.x defaults to noretry=0, which sets RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers (gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE, wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the translation indefinitely, expecting software to eventually fix the permission bits (as happens in SVM/HMM recovery). No interrupt of any kind reaches the IH ring.
This is different from invalid-page faults (V=0) which DO generate a retry fault interrupt that the driver can escalate to a no-retry fault. Permission faults with valid PTEs loop silently forever in hardware.
GFX 10.3+ already defaults to noretry=1, which makes permission faults generate immediate L2 protection fault interrupts. GFX 10.1.x was inadvertently left out of this default.
Fix: Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer generations.
With noretry=1, the existing non-retry fault handler (gmc_v10_0_process_interrupt) already decodes and prints the full GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS, faulting address, VMID, PASID, and process name. No additional logging code is needed; the fix is purely routing permission faults to the existing, fully-capable non-retry interrupt handler.
v2: Dropped GFX10-specific logging from gmc_v10_0.c and kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry fault handler, but with noretry=1 permission faults take the non-retry path, so the v1 retry handler code was dead and would never execute.
Tested on Navi10 (GFX 10.1.10):
- Execute permission faults now produce immediate, clear output:
  [gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592)
  Process amd_close_race pid 13380 thread amd_close_race pid 13384
  in page at address 0x40001000 from client 0x1b (UTCL2)
  GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881
  PERMISSION_FAULTS: 0x8
- No regressions with properly-mapped GPU workloads
Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eb21edd24c40d81066753f8ac6f23bce15745395) Cc: stable@vger.kernel.org
13 days | drm/amdgpu/gfxhub: Program CRASH_ON_*_FAULT bits to 0 as needed | Timur Kristóf | 9 | -56/+38
When the fault stop mode isn't AMDGPU_VM_FAULT_STOP_ALWAYS, these bits should be programmed to 0. Program CRASH_ON_NO_RETRY_FAULT and CRASH_ON_RETRY_FAULT always, to make sure to clear the bits when we don't want to crash. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d0cd99e73090700b7a942b98a3327ec966597d0a)
13 days | drm/amdgpu: fix waiting for all submissions for userptrs | Christian König | 1 | -2/+4
Wait for all submissions when userptrs need to be invalidated by the MMU notifier, not just the one the userptr was involved in. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91250893cbaa25c86872deca95a540d08de1f91e) Cc: stable@vger.kernel.org
13 days | drm/amdgpu: drm/amdgpu: Set correct DMA mask for gfx12.1 | Harish Kasiviswanathan | 1 | -4/+8
Set correct DMA mask for gfx12 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a2ef14ee2593b48242b8d90f229f71c1710529da)
13 days | drm/amdgpu: Use asic specific pte_addr_mask | Harish Kasiviswanathan | 9 | -1/+12
For PTE creation, use the ASIC-specific physical page base address mask. v2: Change variable name from pa_mask to pte_addr_mask Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2ea989885941a6e5607ef86dbe309e90b7191f21)
13 days | drm/amd/pm: zero unused SMU argument registers | Yang Wang | 1 | -2/+6
SMU messages may use fewer arguments than the available argument registers. The previous code only wrote the used registers and left the rest unchanged, so stale values from a prior message could persist. Write all argument registers for each message and zero the unused tail to keep command arguments deterministic and avoid unintended carry-over. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e03b635f61f77ebd5107ef82f48e3221cb695856)
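A sketch of the "write everything, zero the tail" pattern. The register count and the `write_msg_args()` helper are hypothetical, and a plain array stands in for the MMIO argument registers.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_ARG_REGS 4  /* hypothetical register count, chosen for illustration */

/*
 * Write every argument register for a message: the supplied arguments first,
 * then zeros for the unused tail, so nothing from a prior message survives.
 */
void write_msg_args(volatile uint32_t regs[NUM_ARG_REGS],
		    const uint32_t *args, size_t num_args)
{
	for (size_t i = 0; i < NUM_ARG_REGS; i++)
		regs[i] = (i < num_args) ? args[i] : 0;
}
```

Looping over the register count rather than the argument count is the whole change: every register gets a defined value on every message.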
13 days | drm/amd/pm: mark metrics.energy_accumulator is invalid for smu 14.0.2 | Yang Wang | 1 | -1/+0
EnergyAccumulator is unsupported on SMU 14.0.2, mark it invalid. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 646b05043eeed04b51c14aad22a400a8250af4b7) Cc: stable@vger.kernel.org
13 days | drm/amd/pm: fix smu13 power limit default/cap calculation | Yang Wang | 2 | -29/+35
smu_v13_0_0_get_power_limit() and smu_v13_0_7_get_power_limit() mix runtime power_limit with PP table limits when reporting default/min/max. When current power limit query succeeds, default_power_limit was set to the runtime value instead of the PP table default, and min/max could be derived from inconsistent bases (MsgLimits/runtime), leading to incorrect cap info. Use SocketPowerLimitAc/Dc as the PP default base (pp_limit), keep current_power_limit as runtime value, and derive min/max from pp_limit with OD percentages. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5227 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1eaf26db95901ca70737503a89b831dd763c8453) Cc: stable@vger.kernel.org
13 days | drm/amd/pm: apply SMU 13.0.10 workaround during MP1 unload | Yang Wang | 1 | -1/+9
On SMU v13.0.10, sending PrepareMp1ForUnload with the default parameter may leave the device in an inaccessible state. This can affect runtime power management and partial PnP flows, e.g. kexec, driver unload, boco/d3cold. Pass the required workaround parameter 0x55 when preparing MP1 for unload on SMU v13.0.10, and keep the existing behavior for other SMU versions. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5133 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e8ee1afeedb8d24dd22cdd5ae9f98a6d76ebe4b) Cc: stable@vger.kernel.org
13 days | drm/amdgpu: Align amdgpu_gtt_mgr entries to TLB size on all SI | Timur Kristóf | 1 | -1/+1
It seems that Pitcairn has the same issues as Tahiti with regards to the TLB size. This commit fixes a VCE1 FW validation timeout on suspend/resume on Pitcairn. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5336 Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 629279e2e798cd161cf74f40aaebfeb16d45eb01)
13 days | drm/amdgpu: unmap userq for evicting user queue | Prike Liang | 1 | -2/+2
If the driver only preempts queues, there can still be inflight waves, pending dispatch state, or resume/redispatch possibility tied to the same queue. Then the VM/TTM side may proceed to move/unmap queue related BOs while evicting userq objects while shader TCP clients still need to access them. So for eviction, unmap is safer because it makes the queue nonrunnable before memory backing is invalidated. Meanwhile, for an idle queue, unmapping is more suitable than preempting, and unmapping can also save processing time compared with preempt. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d87c9d86727a0bcc95c3009a213a1b27a11b691e)
13 days | drm/amdgpu/sdma7.1: fix support for disable_kq | Alex Deucher | 1 | -0/+1
Set the flag in the ring structure. Fixes: 80d4d3a45b86 ("drm/amdgpu/sdma7.1: add support for disable_kq") Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e0a3aa8a6750e8cf067fe2146dc618ffd296d5ef)
13 days | drm/amdkfd: fix UAF race in destroy_queue_cpsch | Alysa Liu | 1 | -1/+7
wait_on_destroy_queue() drops locks to wait for queue resume, allowing a concurrent destroy to free the queue. Use is_being_destroyed flag to serialize destruction. Reviewed-by: Amir Shetaia <Amir.Shetaia@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ac081deaf16a639ea7dff2f285fe421a33c1ade0)
13 days | drm/amd/display: Bound VBIOS record-chain walk loops | Harry Wentland | 3 | -14/+33
[Why & How] All record-chain walk loops in bios_parser.c and bios_parser2.c use for(;;) and only terminate on a 0xFF record_type sentinel or zero record_size. A malformed VBIOS image missing the terminator record causes unbounded iteration at probe time, potentially hundreds of thousands of iterations with record_size=1. In the final iterations near the BIOS image boundary, struct casts beyond the 2-byte header validated by GET_IMAGE can also read out of bounds. Cap all 14 record-chain walk loops to BIOS_MAX_NUM_RECORD (256) iterations. The atombios.h defines up to 22 distinct record types and atomfirmware.h has 13. Assuming an average of less than 10 records per type (which is reasonable since most are connector- based) 256 is a generous upper bound. Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Assisted-by: Copilot:claude-opus-4.6 Mythos Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 95700a3d660287ed657d6892f7be9ffc0e294a93) Cc: stable@vger.kernel.org
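A bounded record walk might look like the sketch below. The two-byte {type, size} header layout and the `find_record()` helper are assumptions made for illustration; only the cap of 256 and the 0xFF terminator come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NUM_RECORD 256   /* same cap as the commit's BIOS_MAX_NUM_RECORD */
#define RECORD_END     0xFF  /* terminator record type */

/*
 * Walk a chain of {type, size, payload...} records inside `img`.
 * Returns the offset of the first record of `wanted` type, or -1.
 */
long find_record(const uint8_t *img, size_t len, uint8_t wanted)
{
	size_t off = 0;

	for (int i = 0; i < MAX_NUM_RECORD; i++) {
		if (off >= len || len - off < 2)
			return -1;      /* header would cross the image end */

		uint8_t type = img[off];
		uint8_t size = img[off + 1];

		if (type == RECORD_END || size == 0)
			return -1;      /* normal termination */
		if (type == wanted)
			return (long)off;
		off += size;
	}
	return -1;                      /* cap reached: treat image as malformed */
}
```

The loop counter bounds the work even when the image never supplies a terminator, and the length check keeps the header read inside the image.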
13 days | drm/amd/display: Clamp HDMI HDCP2 rx_id_list read to buffer size | Harry Wentland | 1 | -1/+2
[Why & How] During HDCP 2.x repeater authentication over HDMI, the driver reads the sink's RxStatus register and extracts a 10-bit message size field (max value 1023). This value is used as the read length for the ReceiverID list without being clamped to the size of the destination buffer rx_id_list[177]. A malicious HDMI repeater could advertise a message size larger than the buffer, causing an out-of-bounds write during the I2C read. Clamp the read length in mod_hdcp_read_rx_id_list() to the size of the rx_id_list buffer, matching the approach already used in the DP branch. Fixes: eff682f83c9c ("drm/amd/display: Add DDC handles for HDCP2.2") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 229212219e4247d9486f8ba41ef087358490be09) Cc: stable@vger.kernel.org
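The clamp itself is small. In this sketch the buffer size of 177 and the 10-bit field width come from the commit message, while the helper names and the assumption that the size occupies the low 10 bits of the status word are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_ID_LIST_SIZE 177     /* destination buffer size from the commit message */

/* Extract a 10-bit message size; the low-bits position is an assumed layout. */
uint16_t msg_size_from_status(uint16_t rx_status)
{
	return rx_status & 0x3FF;
}

/* Never ask the bus for more bytes than the destination can hold. */
size_t clamp_read_len(size_t advertised, size_t buf_size)
{
	return advertised < buf_size ? advertised : buf_size;
}
```

A sink advertising the maximum of 1023 bytes then yields a 177-byte read instead of an overrun.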
13 days | drm/amd/display: Reject gpio_bitshift >= 32 in bios_parser_get_gpio_pin_info() | Harry Wentland | 1 | -2/+4
[Why & How] gpio_bitshift is a uint8_t read directly from the VBIOS GPIO pin table. If the value is >= 32, the expression "1 << gpio_bitshift" triggers undefined behaviour in C (shift count exceeds type width). On x86 the shift is silently masked to 5 bits, producing an incorrect GPIO mask that may cause wrong MMIO register bits to be toggled. Validate gpio_bitshift before use and return BP_RESULT_BADBIOSTABLE for out-of-range values. Fixes: ae79c310b1a6 ("drm/amd/display: Add DCE12 bios parser support") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eadf438ab8d370b9d19acee9359918c85afeb80d) Cc: stable@vger.kernel.org
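The validation can be sketched as follows. `gpio_mask_from_shift()` is a hypothetical helper, and the boolean return stands in for the BP_RESULT_BADBIOSTABLE error path mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Build a single-bit 32-bit mask from an untrusted shift count.
 * Returns false (leaving *mask untouched) when the shift would be undefined.
 */
bool gpio_mask_from_shift(uint8_t bitshift, uint32_t *mask)
{
	if (bitshift >= 32)
		return false;
	*mask = UINT32_C(1) << bitshift;  /* unsigned, so bit 31 is well defined too */
	return true;
}
```

Using an unsigned constant also avoids shifting a signed 1 into the sign bit when the count is 31.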
13 days | drm/amd/display: Use krealloc_array() in dal_vector_reserve() | Harry Wentland | 1 | -2/+2
[Why & How] dal_vector_reserve() computes the allocation size as "capacity * vector->struct_size" using uint32_t arithmetic, which can silently wrap to a small value on overflow. This would cause krealloc to return a smaller buffer than expected, leading to heap overflows on subsequent vector appends. Replace krealloc() with krealloc_array() which performs an internal overflow check and returns NULL on wrap, preventing the issue. Fixes: 2004f45ef83f ("drm/amd/display: Use kernel alloc/free") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 37668568641ccc4cc1dbca4923d0a16609dd5707) Cc: stable@vger.kernel.org
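The wrap and its guard can be shown in userspace. `size_wrapping()` and `realloc_array_checked()` are hypothetical names for this sketch; the second is only an analogue of an overflow-checked array reallocation, built on the C library's realloc().

```c
#include <stdint.h>
#include <stdlib.h>

/* 32-bit product as the unpatched code would compute it: wraps silently. */
uint32_t size_wrapping(uint32_t count, uint32_t elem)
{
	return count * elem;
}

/*
 * Overflow-checked array realloc: returns NULL instead of a too-small
 * buffer when count * elem does not fit in size_t.
 */
void *realloc_array_checked(void *p, size_t count, size_t elem)
{
	if (elem != 0 && count > SIZE_MAX / elem)
		return NULL;
	return realloc(p, count * elem);
}
```

For example, 0x10000 elements of 0x10001 bytes wrap to a 0x10000-byte request in 32-bit arithmetic, which is far smaller than the caller believes it has.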
13 days | drm/amd/display: Fix NULL deref and buffer over-read in SDP debugfs | Harry Wentland | 1 | -0/+5
[Why & How] dp_sdp_message_debugfs_write() dereferences connector->base.state->crtc without checking for NULL. A connector can be connected but not bound to any CRTC (e.g. after hot-plug before the next atomic commit), causing a kernel crash when writing to the sdp_message debugfs node. The function also ignores the user-provided size argument and always passes 36 bytes to copy_from_user(), reading past the user buffer when size < 36.
Fix both issues by:
- Returning -ENODEV when connector->base.state or state->crtc is NULL
- Clamping write_size to min(size, sizeof(data))
Fixes: c7ba3653e977 ("drm/amd/display: Generic SDP message access in amdgpu") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6ab4c36a522842ff70474a1c0af2e40e50fc8300) Cc: stable@vger.kernel.org
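Both checks fit in a few lines. Everything below is a hypothetical userspace model: the struct names are stand-ins, memcpy() stands in for copy_from_user(), and the zero-fill and the returned byte count are choices made for this sketch rather than documented driver behaviour.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SDP_SIZE 36     /* fixed message size mentioned in the commit message */

struct crtc;                                    /* opaque stand-ins */
struct conn_state { struct crtc *crtc; };
struct connector  { struct conn_state *state; };

/*
 * Refuse unbound connectors, and copy no more than the caller supplied.
 * Returns the number of bytes consumed or a negative errno value.
 */
long sdp_write(const struct connector *c, const void *buf, size_t size,
	       unsigned char data[SDP_SIZE])
{
	size_t n = size < SDP_SIZE ? size : SDP_SIZE;

	if (!c->state || !c->state->crtc)
		return -ENODEV;

	memset(data, 0, SDP_SIZE);      /* sketch-only: defined bytes past a short write */
	memcpy(data, buf, n);           /* stands in for copy_from_user() */
	return (long)n;
}
```

The clamp is computed from the caller's `size`, so a two-byte write reads exactly two bytes from the source buffer.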
13 days | drm/amd/display: Clamp VBIOS HDMI retimer register count to array size | Harry Wentland | 1 | -16/+32
[Why & How] The VBIOS integrated info tables (v1_11 and v2_1) contain HdmiRegNum and Hdmi6GRegNum fields that are used as loop bounds when copying retimer I2C register settings into fixed-size arrays (dp*_ext_hdmi_reg_settings[9] and dp*_ext_hdmi_6g_reg_settings[3]). These u8 fields are not validated before use, so a malformed VBIOS can specify values up to 255, causing an out-of-bounds heap write during driver probe. Clamp each register count to the destination array size using min_t() before the copy loops, in both get_integrated_info_v11() and get_integrated_info_v2_1(). Assisted-by: GitHub Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5a7f0ef90195940c54b0f5bb85b87da55f038c69) Cc: stable@vger.kernel.org
13 days | drm/amd/display: Fix out-of-bounds read in dp_get_eq_aux_rd_interval() | Harry Wentland | 1 | -1/+1
[Why & How] The aux_rd_interval array in struct dc_lttpr_caps is declared with MAX_REPEATER_CNT - 1 (7) elements, indexed 0..6. However, the offset parameter passed to dp_get_eq_aux_rd_interval() can be as large as MAX_REPEATER_CNT (8) when a sink reports 8 LTTPR repeaters via DPCD. This leads to an out-of-bounds read of aux_rd_interval[7] when offset is 8. Fix this by growing aux_rd_interval to MAX_REPEATER_CNT elements to accommodate the full range of valid repeater counts defined by the DP spec. Assisted-by: GitHub Copilot:Claude claude-4-opus Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a55a458a8df37a65ffda5cf721d554a8f74f6b04) Cc: stable@vger.kernel.org
13 days | drm/amd/display: add missing CSC entries for BT.2020 for DCE IPs | Leorize | 2 | -2/+18
DCE-based hardware does not have the CSC matrices for BT.2020, which causes the driver to fall back to the GPU built-in matrices. This does not appear to cause any issues for RGB sinks, but causes major color artifacts for YCbCr ones (e.g. black becomes green). This commit adds the missing CSC matrices (taken from DC common) to DCE CSC tables, resolving the issue. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3358 Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5333 Assisted-by: oh-my-pi:GPT-5.5 Signed-off-by: Leorize <leorize+oss@disroot.org> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51e6668ab4baf55b082c376318d51ef965757196) Cc: stable@vger.kernel.org
2026-05-27 | drm/amdgpu: fix calling VM invalidation in amdgpu_hmm_invalidate_gfx | Christian König | 2 | -2/+6
Otherwise we don't invalidate page tables on next CS. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b6444d1bcbc34f6f2a31a3aab3059be082f3683e) Cc: stable@vger.kernel.org
2026-05-27 | drm/amdgpu: fix amdgpu_hmm_range_get_pages | Christian König | 1 | -8/+8
The notifier sequence must only be read once or otherwise we could work with invalid pages. While at it also fix the coding style, e.g. drop the pre-initialized return value and use the common define for 2G range. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c08972f555945cda57b0adb72272a37910153390) Cc: stable@vger.kernel.org
2026-05-27 | drm/amdgpu/userq: use array instead of list for userq_vas | Sunil Khatri | 3 | -77/+45
Use an array instead of a list for userq_vas since we have a fixed number of BOs. Also, we don't have to worry about freeing that memory later, since the array is freed along with the queue. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ef7dc711a664b0c548ecfdf13a00436b7446b8e7)
2026-05-27 | drm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid | Sunil Khatri | 1 | -5/+4
mqd_destroy cleans up queue core objects like mqd and fw_object which are needed for any pending fence to signal properly. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4ad65d610096498c8e265615aba42b3c47441bb5)
2026-05-27 | drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger | Eric Huang | 1 | -3/+5
get_queue_ids() computes array_size = num_queues * sizeof(uint32_t), which could overflow on builds with a 32-bit size_t. Use array_size() instead, which saturates to SIZE_MAX on overflow. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2d57a0475f085c08b49312dfd8edcb461845f285) Cc: stable@vger.kernel.org
2026-05-27 | drm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper | Sunil Khatri | 3 | -82/+21
Remove the amdgpu_userq_create/destroy_object wrappers and directly use the kernel BO allocation function, which already does everything the wrappers did. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit deb02080ca5d3f015cf71e56067a39ef2f141998)
2026-05-27 | drm/amd/pm/si: Disregard vblank time when no displays are connected | Timur Kristóf | 1 | -0/+4
When no displays are connected, there is no vblank happening so the power management code shouldn't worry about it. This fixes a regression that caused the memory clock to be stuck at maximum when there were no displays connected to a SI GPU. Fixes: 9003a0746864 ("drm/amd/pm: Treat zero vblank time as too short in si_dpm (v3)") Fixes: 9d73b107a61b ("drm/amd/pm: Use pm_display_cfg in legacy DPM (v2)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6d87e0199f7b83735b56e422d59f170a201897a8) Cc: stable@vger.kernel.org
2026-05-27 | drm/amdkfd: Check for pdd drm file first in CRIU restore path | David Francis | 1 | -5/+5
CRIU restore ioctls are meant to be called by CRIU with no existing drm file. There's an error path for the case where the drm file unexpectedly exists, but it was positioned so that it missed an fput(drm_file). Do that check earlier, as soon as we have the pdd. Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2bab781dac78916c5cc8de76345a4102449267d7) Cc: stable@vger.kernel.org
2026-05-27 | drm/amdgpu: fix potential overflow in fs_info.debugfs_name | Stanley.Yang | 1 | -1/+2
Use snprintf() with sizeof(fs_info.debugfs_name) so a long RAS block name plus the "_err_inject" suffix cannot overflow the 32-byte buffer. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1a58070fda26857a8f6acc0ab05428e60d5c6844)
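The bounded formatting described above can be sketched in plain C as follows. The helper build_inject_name() and the constant DEBUGFS_NAME_LEN are hypothetical names chosen for illustration; only snprintf(), the "_err_inject" suffix and the 32-byte buffer size come from the commit message.

```c
#include <stdio.h>
#include <string.h>

#define DEBUGFS_NAME_LEN 32	/* buffer size quoted in the commit message */

/*
 * Hypothetical helper: format "<block_name>_err_inject" into dst without
 * ever writing more than dst_len bytes. Returns 1 if the whole name fit,
 * 0 if snprintf() had to truncate it (dst is still NUL-terminated).
 */
static int build_inject_name(char *dst, size_t dst_len, const char *block_name)
{
	int n = snprintf(dst, dst_len, "%s_err_inject", block_name);

	/* snprintf() returns the length it wanted to write, excluding the NUL */
	return n >= 0 && (size_t)n < dst_len;
}
```

Passing sizeof() of the destination buffer, rather than a separate constant, keeps the bound correct if the buffer is ever resized.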
2026-05-27 | drm/amdgpu/userq: make sure queue is valid in the hang_detect_work | Sunil Khatri | 1 | -7/+7
Thread 1: runs amdgpu_userq_destroy, which eventually removes the queue from the doorbell and sets userq_mgr = NULL. Thread 2: an interrupt might have scheduled hang_detect_work, which still needs userq_mgr to be valid but could see a NULL pointer. To fix that, make sure we cancel hang_detect_work again before setting userq_mgr to NULL. In addition, all the queue VAs need to remain valid for as long as anything could be running on the queue, so move the userq_va cleanup to after the hang_detect handler is cancelled. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1a66ceb98b137d18d303b9889f0e7d8c4db73943)
2026-05-27 | drm/amdgpu/userq: reserve root bo without interruption | Sunil Khatri | 1 | -5/+1
Fix the code to make it an uninterruptible reservation for root bo. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d409ab4e387d94b2e593d558b54b7bfd315e0e75)
2026-05-27 | drm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails | Sunil Khatri | 1 | -1/+3
Unpin the wptr_obj->obj when amdgpu_ttm_alloc_gart fails. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d8145c437ccdc2d91c579787290f82788172bea0)
2026-05-27 | drm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index | Sunil Khatri | 2 | -14/+11
amdgpu_userq_get_doorbell_index returns a uint64 index as well as int failure values. Simplify this by returning an int and passing the index back through a uint64 pointer argument. Also, since it's used in only one place, make it static. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e947ec9d0529d5f93dbdb33cd197347f6a7b2922)
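The calling convention described above (an int status as the return value, with the 64-bit index delivered through an output pointer) might look roughly like the sketch below. The function get_doorbell_index(), its parameters and the slot-to-index mapping are all hypothetical and are not taken from the driver; the point is only that error codes and valid indices no longer share one return type.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical sketch: return 0 on success or a negative errno on failure,
 * and hand the 64-bit index back through *index. The output is written
 * only on success, so callers never see a half-valid value.
 */
static int get_doorbell_index(uint32_t slot, uint32_t num_slots,
			      uint64_t *index)
{
	if (!index || slot >= num_slots)
		return -EINVAL;

	/* Illustrative mapping only: assume two index units per slot. */
	*index = (uint64_t)slot * 2;
	return 0;
}
```

With this shape a caller checks the int result first and only then reads the index, which avoids mixing negative error values into an unsigned 64-bit return.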
2026-05-27 | drm/amdkfd: fix NULL pointer bug in svm_range_set_attr | Eric Huang | 1 | -0/+3
The process_info could be NULL if the user doesn't call kfd_ioctl_acquire_vm before calling kfd_ioctl_svm. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 83a26c812e0529eb040d31a76f73e33e637243d4) Cc: stable@vger.kernel.org