summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)AuthorFilesLines
2025-08-20drm/amd/display: Allow DCN301 to clear update flagsIvan Lipski1-1/+2
commit 2d418e4fd9f1eca7dfce80de86dd702d36a06a25 upstream. [Why & How] Not letting DCN301 to clear after surface/stream update results in artifacts when switching between active overlay planes. The issue is known and has been solved initially. See below: (https://gitlab.freedesktop.org/drm/amd/-/issues/3441) Fixes: f354556e29f4 ("drm/amd/display: limit clear_update_flags t dcn32 and above") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-20drm/amdgpu: fix incorrect vm flags to map boJack Xiao1-2/+2
[ Upstream commit 040bc6d0e0e9c814c9c663f6f1544ebaff6824a8 ] It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: fix vram reservation issueYiPeng Chai1-2/+1
[ Upstream commit 10ef476aad1c848449934e7bec2ab2374333c7b6 ] The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d38eaf27de1b8584f42d6fb3f717b7ec44b3a7a1) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe/hwmon: Add SW clamp for power limits writesKarthik Poosa1-0/+29
[ Upstream commit 55d49f06162e45686399df4ae6292167f0deab7c ] Clamp writes to power limits powerX_crit/currX_crit, powerX_cap, powerX_max, to the maximum supported by the pcode mailbox when sysfs-provided values exceed this limit. Although the pcode already performs clamping, values beyond the pcode mailbox's supported range get truncated, leading to incorrect critical power settings. This patch ensures proper clamping to prevent such truncation. v2: - Address below review comments. (Riana) - Split comments into multiple sentences. - Use local variables for readability. - Add a debug log. - Use u64 instead of unsigned long. v3: - Change drm_dbg logs to drm_info. (Badal) v4: - Rephrase the drm_info log. (Rodrigo, Riana) - Rename variable max_mbx_power_limit to max_supp_power_limit, as limit is same for platforms with and without mailbox power limit support. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power") Fixes: fb1b70607f73 ("drm/xe/hwmon: Expose power attributes") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250808185310.3466529-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit d301eb950da59f962bafe874cf5eb6d61a85b2c2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe/migrate: prevent potential UAFMatthew Auld1-4/+5
[ Upstream commit 145832fbdd17b1d77ffd6cdd1642259e101d1b7e ] If we hit the error path, the previous fence (if there is one) has already been put() prior to this, so doing a fence_wait could lead to UAF. Tweak the flow to do to the put() until after we do the wait. Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-8-matthew.auld@intel.com (cherry picked from commit 9b7ca35ed28fe5fad86e9d9c24ebd1271e4c9c3e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe/migrate: don't overflow max copy sizeMatthew Auld1-0/+6
[ Upstream commit 4126cb327a2e3273c81fcef1c594c5b7b645c44c ] With non-page aligned copy, we need to use 4 byte aligned pitch, however the size itself might still be close to our maximum of ~8M, and so the dimensions of the copy can easily exceed the S16_MAX limit of the copy command leading to the following assert: xe 0000:03:00.0: [drm] Assertion `size / pitch <= ((s16)(((u16)~0U) >> 1))` failed! platform: BATTLEMAGE subplatform: 1 graphics: Xe2_HPG 20.01 step A0 media: Xe2_HPM 13.01 step A1 tile: 0 VRAM 10.0 GiB GT: 0 type 1 WARNING: CPU: 23 PID: 10605 at drivers/gpu/drm/xe/xe_migrate.c:673 emit_copy+0x4b5/0x4e0 [xe] To fix this account for the pitch when calculating the number of current bytes to copy. Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-7-matthew.auld@intel.com (cherry picked from commit 8c2d61e0e916e077fda7e7b8e67f25ffe0f361fc) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe/migrate: prevent infinite recursionMatthew Auld1-12/+17
[ Upstream commit 9d7a1cbebbb691891671def57407ba2f8ee914e8 ] If the buf + offset is not aligned to XE_CAHELINE_BYTES we fallback to using a bounce buffer. However the bounce buffer here is allocated on the stack, and the only alignment requirement here is that it's naturally aligned to u8, and not XE_CACHELINE_BYTES. If the bounce buffer is also misaligned we then recurse back into the function again, however the new bounce buffer might also not be aligned, and might never be until we eventually blow through the stack, as we keep recursing. Instead of using the stack use kmalloc, which should respect the power-of-two alignment request here. Fixes a kernel panic when triggering this path through eudebug. v2 (Stuart): - Add build bug check for power-of-two restriction - s/EINVAL/ENOMEM/ Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-6-matthew.auld@intel.com (cherry picked from commit 38b34e928a08ba594c4bbf7118aa3aadacd62fff) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/i915/psr: Do not trigger Frame Change events from frontbuffer flushJouni Högander1-5/+9
[ Upstream commit 184889dfe0568528fd6d14bba864dd57ed45bbf2 ] We want to get rid of triggering "Frame Change" events from frontbuffer flush calls. We are about to move using TRANS_PUSH register for this on LunarLake and onwards. Touching TRANS_PUSH register from fronbuffer flush would be problematic as it's written by DSB as well. Fix this by using intel_psr_exit when flush or invalidate is done on LunarLake and onwards. This is not possible on AlderLake and MeteorLake due to HW bug in PSR2 disable. This patch is also fixing problems with cursor plane where cursor is disappearing or duplicate cursor is seen on the screen. v2: Commit message updated Bspec: 68927, 68934, 66624 Reported-by: Janna Martl <janna.martl109@gmail.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5522 Fixes: 411ad63877bb ("drm/i915/psr: Use SFF_CTL on invalidate/flush for LunarLake onwards") Tested-by: Janna Martl <janna.martl109@gmail.com> Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/20250801062905.564453-1-jouni.hogander@intel.com (cherry picked from commit 46fb38cb20c0d185a6391ab524b23e0e0219c41f) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/i915/fbc: fix the implementation of wa_18038517565Vinod Govindapillai1-4/+4
[ Upstream commit fd56b9c9507f32b16159f9a922e1af5628254567 ] As per the wa_18038517565, we need to disable FBC compressor clock gating before enabling FBC and enable after disabling FBC. Placing the enabling of clock gating in the fbc deactivate function can make the above wa logic go wrong in case of frontbuffer rendering FBC mechanism. FBC deactivate can get called during fb invalidate and then the corresponding FBC activate can get called without properly disabling the clock gating and can result in compression stalled. So move the enable clock gating at the end of one FBC session after FBC is completely disabled for a pipe. Bspec: 74212, 72197, 69741, 65555 Fixes: 010363c46189 ("drm/i915/display: implement wa_18038517565") Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Link: https://lore.kernel.org/r/20250729124648.288497-1-vinod.govindapillai@intel.com (cherry picked from commit 82dde0407ab126f8413fd6c51429e5057ced5ba2) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Disable dsc_power_gate for dcn314 by defaultRoman Li1-0/+1
[ Upstream commit 02f3ec53177243d32ee8b6f8ba99136d7887ee3a ] [Why] "REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line" warnings seen after resuming from s2idle. DCN314 has issues with DSC power gating that cause REG_WAIT timeouts when attempting to power down DSC blocks. [How] Disable dsc_power_gate for dcn314 by default. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supportedMario Limonciello1-2/+4
[ Upstream commit a5ce8695d6d1b40d6960d2d298b579042c158f25 ] [Why] If PSR-SU is disabled on the link, then configuring su_y granularity in mod_power_calc_psr_configs() can lead to assertions in psr_su_set_dsc_slice_height(). [How] Check the PSR version in amdgpu_dm_link_setup_psr() to determine whether or not to configure granularity. Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Only finalize atomic_obj if it was initializedMario Limonciello1-1/+2
[ Upstream commit b174084b3fe15ad1acc69530e673c1535d2e4f85 ] [Why] If amdgpu_dm failed to initalize before amdgpu_dm_initialize_drm_device() completed then freeing atomic_obj will lead to list corruption. [How] Check if atomic_obj state is initialized before trying to free. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/ttm: Respect the shrinker core free targetTvrtko Ursulin1-3/+5
[ Upstream commit eac21f8ebeb4f84d703cf41dc3f81d16fa9dc00a ] Currently the TTM shrinker aborts shrinking as soon as it frees pages from any of the page order pools and by doing so it can fail to respect the freeing target which was configured by the shrinker core. We use the wording "can fail" because the number of freed pages will depend on the presence of pages in the pools and the order of the pools on the LRU list. For example if there are no free pages in the high order pools the shrinker core may require multiple passes over the TTM shrinker before it will free the default target of 128 pages (assuming there are free pages in the low order pools). This inefficiency can be compounded by the pool LRU where multiple further calls into the TTM shrinker are required to end up looking at the pool with pages. Improve this by never freeing less than the shrinker core has requested. At the same time we start reporting the number of scanned pages (freed in this case), which prevents the core shrinker from giving up on the TTM shrinker too soon and moving on. v2: * Simplify loop logic. (Christian) * Improve commit message. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250603112750.34997-2-tvrtko.ursulin@igalia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Avoid trying AUX transactions on disconnected portsWayne Lin1-1/+2
[ Upstream commit deb24e64c8881c462b29e2c69afd9e6669058be5 ] [Why & How] Observe that we try to access DPCD 0x600h of disconnected DP ports. In order not to wasting time on retrying these ports, call dpcd_write_rx_power_ctrl() after checking its connection status. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Update DMCUB loading sequence for DCN3.5Nicholas Kazlauskas1-13/+3
[ Upstream commit d42b2331e158fa6bcdc89e4c8c470dc5da20be1f ] [Why] New sequence from HW for reset and firmware reloading has been provided that aims to stabilize the reload sequence in the case the firmware is hung or has outstanding requests. [How] Update the sequence to remove the DMUIF reset and the redundant writes in the release. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/ttm: Should to return the evict errorEmily Deng1-0/+3
[ Upstream commit 4e16a9a00239db5d819197b9a00f70665951bf50 ] For the evict fail case, the evict error should be returned. v2: Consider ENOENT case. v3: Abort directly when the eviction failed for some reason (except for -ENOENT) and not wait for the move to finish Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250603091154.3472646-1-Emily.Deng@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm: renesas: rz-du: mipi_dsi: Add min check for VCLK rangeLad Prabhakar1-0/+3
[ Upstream commit e37a95d01d5acce211da8446fefbd8684c67f516 ] The VCLK range for Renesas RZ/G2L SoC is 5.803 MHz to 148.5 MHz. Add a minimum clock check in the mode_valid callback to ensure that the clock value does not fall below the valid range. Co-developed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com> Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://lore.kernel.org/r/20250609225630.502888-2-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe: Make dma-fences compliant with the safe access rulesTvrtko Ursulin3-1/+11
[ Upstream commit 6bd90e700b4285e6a7541e00f969cab0d696adde ] Xe can free some of the data pointed to by the dma-fences it exports. Most notably the timeline name can get freed if userspace closes the associated submit queue. At the same time the fence could have been exported to a third party (for example a sync_fence fd) which will then cause an use- after-free on subsequent access. To make this safe we need to make the driver compliant with the newly documented dma-fence rules. Driver has to ensure a RCU grace period between signalling a fence and freeing any data pointed to by said fence. For the timeline name we simply make the queue be freed via kfree_rcu and for the shared lock associated with multiple queues we add a RCU grace period before freeing the per GT structure holding the lock. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250610164226.10817-5-tvrtko.ursulin@igalia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: clear pa and mca record counter when resetting eepromganglxie1-0/+2
[ Upstream commit d0cc8d2b7df1848f98f0fea8135ba706814b1d13 ] clear pa and mca record counter when resetting eeprom, so that ras_num_bad_pages can be calculated correctly Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: Suspend IH during mode-2 resetLijo Lazar1-4/+29
[ Upstream commit 3f1e81ecb61923934bd11c3f5c1e10893574e607 ] On multi-aid SOCs, there could be a continuous stream of interrupts from GC after poison consumption. Suspend IH to disable them before doing mode-2 reset. This avoids conflicts in hardware accesses during interrupt handlers while a reset is ongoing. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Stop storing failures into adev->dm.cached_stateMario Limonciello1-7/+18
[ Upstream commit 709a37ab9c63297da2194dc36f604537f9d2d417 ] If drm_atomic_helper_suspend() has failed for any reason, it's stored in adev->dm.cached_state. This isn't expected because the resume (or complete()) sequence will attempt to use the stored state to resume. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250602014432.3538345-3-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd: Allow printing VanGogh OD SCLK levels without setting dpm to manualMario Limonciello1-22/+15
[ Upstream commit 2d1ec1e955414e8e8358178011c35afca1a1c0b1 ] Several other ASICs allow printing OD SCLK levels without setting DPM control to manual. When OD is disabled it will show the range the hardware supports. When OD is enabled it will show what values have been programmed. Adjust VanGogh to work the same. Cc: Pierre-Loup A. Griffais <pgriffais@valvesoftware.com> Reported-by: Vicki Pfau <vi@endrift.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250609031227.479079-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/pm: Use pointer type for typecheck()Lijo Lazar1-20/+21
[ Upstream commit b49e3d7ca71aaf1e3412d41522a11a56563799b5 ] typecheck creates local variables based on the type passed. That could result in stack frame size warnings like below in certain configs: drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:2885:1: error: the frame size of 8304 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Checking against the pointer type is sufficient for the purpose of getting a diagnostic message during build time. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com> Tested-by: Palmer Dabbelt <palmer@dabbelt.com> Link: https://lore.kernel.org/r/20250610212141.19445-1-palmer@dabbelt.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Initialize mode_select to 0Alex Hung1-1/+1
[ Upstream commit 592ddac93f8c02e13f19175745465f8c4d0f56cd ] [WHAT] mode_select was supposed to be initialized in mpc_read_gamut_remap but is not set in default case. This can cause indeterminate behaviors. This is reported as an UNINIT error by Coverity. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Fix 'failed to blank crtc!'Wen Chen1-1/+1
[ Upstream commit 01f60348d8fb6b3fbcdfc7bdde5d669f95b009a4 ] [why] DCN35 is having “DC: failed to blank crtc!” when running HPO test cases. It's caused by not having sufficient udelay time. [how] Replace the old wait_for_blank_complete function with fsleep function to sleep just until the next frame should come up. This way it doesn't poll in case the pixel clock or other clock was bugged or until vactive and the vblank are hit again. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Wen Chen <Wen.Chen3@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: Add more checks to PSP mailboxLijo Lazar9-61/+107
[ Upstream commit 8345a71fc54b28e4d13a759c45ce2664d8540d28 ] Instead of checking the response flag, use status mask also to check against any unexpected failures like a device drop. Also, log error if waiting on a psp response fails/times out. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/panel: raydium-rm67200: Move initialization from enable() to prepare stageAndy Yan1-15/+7
[ Upstream commit 691674a282bdbf8f8bce4094369a2d1e4b5645e9 ] The DSI host has different modes in prepare() and enable() functions, prepare() is in LP command mode and enable() is in HS video mode. >From our experience, generally the initialization sequence needs to be sent in the LP command mode. Move the setup init function from enable() to prepare() to fix a display shift on rk3568 evb. Tested on rk3568/rk3576/rk3588 EVB. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250618091520.691590-1-andyshrk@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: add null checkPeichen Huang1-2/+4
[ Upstream commit 158b9201c17fc93ed4253c2f03b77fd2671669a1 ] [WHY] Prevents null pointer dereferences to enhance function robustness [HOW] Adds early null check and return false if invalid. Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Separate set_gsl from set_gsl_source_selectIlya Bakoulin1-5/+4
[ Upstream commit 660a467a5e7366cd6642de61f1aaeaf0d253ee68 ] [Why/How] Separate the checks for set_gsl and set_gsl_source_select, since source_select may not be implemented/necessary. Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: Use correct severity for BP threshold exceed eventXiang Liu1-2/+4
[ Upstream commit 4a33ca3f6ee9a013a423a867426704e9c9d785bd ] The severity of CPER for BP threshold exceed event should be set as CPER_SEV_FATAL to match the OOB implementation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe/xe_query: Use separate iterator while filling GT listMatt Roper1-12/+15
[ Upstream commit d4eb4a010262ea7801e576d1033b355910f2f7d4 ] The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs being iterated over, and may skip over values if a GT is not present on the device. Use a separate iterator for GT list array assignments to ensure that the array will be filled properly on future platforms where index in the GT query list may not match the uapi ID. v2: - Include the missing increment of the iterator. (Jonathan) Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/imagination: Clear runtime PM errors while resetting the GPUAlessio Belle1-1/+58
[ Upstream commit 551507e0d0bf32ce1d7d27533c4b98307380804c ] The runtime PM might be left in error state if one of the callbacks returned an error, e.g. if the (auto)suspend callback failed following a firmware crash. When that happens, any further attempt to acquire or release a power reference will then also fail, making it impossible to do anything else with the GPU. The driver logic will eventually reach the reset code. In pvr_power_reset(), replace pvr_power_get() with a new API pvr_power_get_clear() which also attempts to clear any runtime PM error state if acquiring a power reference is not possible. Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://lore.kernel.org/r/20250624-clear-rpm-errors-gpu-reset-v1-1-b8ff2ae55aac@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/msm: Add error handling for krealloc in metadata setupYuan Chen1-1/+8
[ Upstream commit 1c8c354098ea9d4376a58c96ae6b65288a6f15d8 ] Function msm_ioctl_gem_info_set_metadata() now checks for krealloc failure and returns -ENOMEM, avoiding potential NULL pointer dereference. Explicitly avoids __GFP_NOFAIL due to deadlock risks and allocation constraints. Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Patchwork: https://patchwork.freedesktop.org/patch/661235/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/msm: use trylock for debugfsRob Clark2-1/+8
[ Upstream commit 0a1ff88ec5b60b41ba830c5bf08b6cd8f45ab411 ] This resolves a potential deadlock vs msm_gem_vm_close(). Otherwise for _NO_SHARE buffers msm_gem_describe() could be trying to acquire the shared vm resv, while already holding priv->obj_lock. But _vm_close() might drop the last reference to a GEM obj while already holding the vm resv, and msm_gem_free_object() needs to grab priv->obj_lock, a locking inversion. OTOH this is only for debugfs and it isn't critical if we undercount by skipping a locked obj. So just use trylock() and move along if we can't get the lock. Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661525/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/msm: Update register xmlRob Clark14-3027/+3312
[ Upstream commit 6733d8276ac02a8790e571d2af4a69a9039d0522 ] Sync register xml from mesa commit eb3e0b7164a3 ("freedreno/a6xx: Split descriptors out into their own file"). Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/662470/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/fbdev-client: Skip DRM clients if modesetting is absentThierry Reding1-0/+5
[ Upstream commit cce91f29c088ba902dd2abfc9c3216ba9a2fb2fe ] Recent generations of Tegra have moved the display components outside of host1x, leading to a device that has no CRTCs attached and hence doesn't support any of the modesetting functionality. When this is detected, the driver clears the DRIVER_MODESET and DRIVER_ATOMIC flags for the device. Unfortunately, this causes the following errors during boot: [ 15.418958] ERR KERN drm drm: [drm] *ERROR* Failed to register client: -95 [ 15.425311] WARNING KERN drm drm: [drm] Failed to set up DRM client; error -95 These originate from the fbdev client checking for the presence of the DRIVER_MODESET flag and returning -EOPNOTSUPP. However, if a driver does not support DRIVER_MODESET this is entirely expected and the error isn't helpful. Prevent this misleading error message by setting up the DRM clients only if modesetting is enabled. Changes in v2: - use DRIVER_MODESET check to avoid registering any clients Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250613122838.2082334-1-thierry.reding@gmail.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/sched: Avoid memory leaks with cancel_job() callbackPhilipp Stanner1-13/+21
[ Upstream commit bf8bbaefaa6ae0a07971ea57b3208df60e8ad0a4 ] Since its inception, the GPU scheduler can leak memory if the driver calls drm_sched_fini() while there are still jobs in flight. The simplest way to solve this in a backwards compatible manner is by adding a new callback, drm_sched_backend_ops.cancel_job(), which instructs the driver to signal the hardware fence associated with the job. Afterwards, the scheduler can safely use the established free_job() callback for freeing the job. Implement the new backend_ops callback cancel_job(). Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursulin@igalia.com/ Reviewed-by: Maíra Canal <mcanal@igalia.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-4-phasta@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/sched/tests: Add unit test for cancel_job()Philipp Stanner1-0/+42
[ Upstream commit c2668a0e03501a26cedc82e32fe9f3f692d330db ] The scheduler unit tests now provide a new callback, cancel_job(). This callback gets used by drm_sched_fini() for all still pending jobs to cancel them. Implement a new unit test to test this. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-6-phasta@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: limit clear_update_flags to dcn32 and aboveCharlene Liu1-3/+2
[ Upstream commit f354556e29f40ef44fa8b13dc914817db3537e20 ] [why] dc has some code out of sync: dc_commit_updates_for_stream handles v1/v2/v3, but dc_update_planes_and_stream makes v1 asic to use v2. as a reression fix: limit clear_update_flags to dcn32 or newer asic. need to follow up that v1 asic using v2 issue. Reviewed-by: Syed Hassan <syed.hassan@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/pm: fix null pointer accessUmio Yasuno1-0/+5
[ Upstream commit d524d40e3a6152a3ea1125af729f8cd8ca65efde ] Writing a string without delimiters (' ', '\n', '\0') to the under gpu_od/fan_ctrl sysfs or pp_power_profile_mode for the CUSTOM profile will result in a null pointer dereference. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4401 Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/xe/pf: Disable PF restart worker on device removalMichal Wajdeczko1-1/+31
[ Upstream commit c286ce6b01f633806b4db3e4ec8e0162928299cd ] We can't let restart worker run once device is removed, since other data that it might want to access could be already released. Explicitly disable worker as part of device cleanup action. Fixes: a4d1c5d0b99b ("drm/xe/pf: Move VFs reprovisioning to worker") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com (cherry picked from commit a424353937c24554bb242a6582ed8f018b4a411c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/xe/vf: Disable CSC support on VFLukasz Laguna1-0/+1
[ Upstream commit f62408efc8669b82541295a4611494c8c8c52684 ] CSC is not accessible by VF drivers, so disable its support flag on VF to prevent further initialization attempts. Fixes: e02cea83d32d ("drm/xe/gsc: add Battlemage support") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Cc: Alexander Usyskin <alexander.usyskin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com (cherry picked from commit 552dbba1caaf0cb40ce961806d757615e26ec668) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/xe/configfs: Fix pci_dev reference leakMichal Wajdeczko1-1/+2
[ Upstream commit 942ac8da6388c25fe62b2792c78715e0ea6e649b ] We are using pci_get_domain_bus_and_slot() function to verify if the given config directory name matches any existing PCI device, but we missed to call matching pci_dev_put() to release reference. While around, also change error code in case of no device match, to make it more specific than generic formatting error. Fixes: 16280ded45fb ("drm/xe: Add configfs to enable survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 0bdd05c2a82bbf2419415d012fd4f5faeca7f1af) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx10: fix kiq locking in KCQ resetAlex Deucher1-4/+2
[ Upstream commit a4b2ba8f631d3e44b30b9b46ee290fbfe608b7d0 ] The ring test needs to be inside the lock. Fixes: 097af47d3cfb ("drm/amdgpu/gfx10: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx9.4.3: fix kiq locking in KCQ resetAlex Deucher1-2/+1
[ Upstream commit 08f116c59310728ea8b7e9dc3086569006c861cf ] The ring test needs to be inside the lock. Fixes: 4c953e53cc34 ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx9: fix kiq locking in KCQ resetAlex Deucher1-1/+1
[ Upstream commit 730ea5074dac1b105717316be5d9c18b09829385 ] The ring test needs to be inside the lock. Fixes: fdbd69486b46 ("drm/amdgpu/gfx9: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/panthor: Fix UAF in panthor_gem_create_with_handle() debugfs codeSimona Vetter2-19/+15
[ Upstream commit fe69a391808404977b1f002a6e7447de3de7a88e ] The object is potentially already gone after the drm_gem_object_put(). In general the object should be fully constructed before calling drm_gem_handle_create(), except the debugfs tracking uses a separate lock and list and separate flag to denotate whether the object is actually initialized. Since I'm touching this all anyway simplify this by only adding the object to the debugfs when it's ready for that, which allows us to delete that separate flag. panthor_gem_debugfs_bo_rm() already checks whether we've actually been added to the list or this is some error path cleanup. v2: Fix build issues for !CONFIG_DEBUGFS (Adrián) v3: Add linebreak and remove outdated comment (Liviu) Fixes: a3707f53eb3f ("drm/panthor: show device-wide list of DRM GEM objects over DebugFS") Cc: Adrián Larumbe <adrian.larumbe@collabora.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Simona Vetter <simona.vetter@intel.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250709135220.1428931-1-simona.vetter@ffwll.ch Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/rockchip: vop2: Fix the update of LAYER/PORT select registers when there ↵Andy Yan3-25/+122
are multi display output on rk3588/rk3568 [ Upstream commit 3e89a8c6835476aa782da80585dee9ddae651eea ] The all video ports of rk3568/rk3588 share the same OVL_LAYER_SEL and OVL_PORT_SEL registers, and the configuration of these two registers can be set to take effect when the vsync signal arrives at a certain Video Port. If two threads for two display output choose to update these two registers simultaneously to meet their own plane adjustment requirements(change plane zpos or switch plane from one crtc to another), then no matter which Video Port'svsync signal we choose to follow for these two registers, the display output of the other Video Port will be abnormal. This is because the configuration of this Video Port does not take effect at the right time (its configuration should take effect when its VSYNC signal arrives). In order to solve this problem, when performing plane migration or change the zpos of planes, there are two things to be observed and followed: 1. When a plane is migrated from one VP to another, the configuration of the layer can only take effect after the Port mux configuration is enabled. 2. When change the zpos of planes, we must ensure that the change for the previous VP takes effect before we proceed to change the next VP. Otherwise, the new configuration might overwrite the previous one for the previous VP, or it could lead to the configuration of the previous VP being take effect along with the VSYNC of the new VP. This issue only occurs in scenarios where multi-display output is enabled. Fixes: c5996e4ab109 ("drm/rockchip: vop2: Make overlay layer select register configuration take effect by vsync") Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250421102156.424480-1-andyshrk@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/rockchip: vop2: fail cleanly if missing a primary plane for a video-portHeiko Stuebner1-0/+4
[ Upstream commit f9f68bf1d0efeadb6c427c9dbb30f307a7def19b ] Each window of a vop2 is usable by a specific set of video ports, so while binding the vop2, we look through the list of available windows trying to find one designated as primary-plane and usable by that specific port. The code later wants to use drm_crtc_init_with_planes with that found primary plane, but nothing has checked so far if a primary plane was actually found. For whatever reason, the rk3576 vp2 does not have a usable primary window (if vp0 is also in use) which brought the issue to light and ended in a null-pointer dereference further down. As we expect a primary-plane to exist for a video-port, add a check at the end of the window-iteration and fail probing if none was found. Fixes: 604be85547ce ("drm/rockchip: Add VOP2 driver") Reviewed-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250610212748.1062375-1-heiko@sntech.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu: fix use-after-free in amdgpu_userq_suspend+0x51a/0x5a0Vitaly Prosyak1-1/+1
[ Upstream commit a886d26f2c8f9e3f3c1869ae368d09c75daac553 ] [ +0.000020] BUG: KASAN: slab-use-after-free in amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000817] Read of size 8 at addr ffff88812eec8c58 by task amd_pci_unplug/1733 [ +0.000027] CPU: 10 UID: 0 PID: 1733 Comm: amd_pci_unplug Tainted: G W 6.14.0+ #2 [ +0.000009] Tainted: [W]=WARN [ +0.000003] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000003] dump_stack_lvl+0x76/0xa0 [ +0.000011] print_report+0xce/0x600 [ +0.000009] ? srso_return_thunk+0x5/0x5f [ +0.000006] ? kasan_complete_mode_report_info+0x76/0x200 [ +0.000007] ? kasan_addr_to_slab+0xd/0xb0 [ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000707] kasan_report+0xbe/0x110 [ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000541] __asan_report_load8_noabort+0x14/0x30 [ +0.000005] amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000535] ? stop_cpsch+0x396/0x600 [amdgpu] [ +0.000556] ? stop_cpsch+0x429/0x600 [amdgpu] [ +0.000536] ? __pfx_amdgpu_userq_suspend+0x10/0x10 [amdgpu] [ +0.000536] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? kgd2kfd_suspend+0x132/0x1d0 [amdgpu] [ +0.000542] amdgpu_device_fini_hw+0x581/0xe90 [amdgpu] [ +0.000485] ? down_write+0xbb/0x140 [ +0.000007] ? __mutex_unlock_slowpath.constprop.0+0x317/0x360 [ +0.000005] ? __pfx_amdgpu_device_fini_hw+0x10/0x10 [amdgpu] [ +0.000482] ? __kasan_check_write+0x14/0x30 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? up_write+0x55/0xb0 [ +0.000007] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? blocking_notifier_chain_unregister+0x6c/0xc0 [ +0.000008] amdgpu_driver_unload_kms+0x69/0x90 [amdgpu] [ +0.000484] amdgpu_pci_remove+0x93/0x130 [amdgpu] [ +0.000482] pci_device_remove+0xae/0x1e0 [ +0.000008] device_remove+0xc7/0x180 [ +0.000008] device_release_driver_internal+0x3d4/0x5a0 [ +0.000007] device_release_driver+0x12/0x20 [ +0.000004] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000005] remove_store+0xd7/0xf0 [ +0.000005] ? __pfx_remove_store+0x10/0x10 [ +0.000006] ? __pfx__copy_from_iter+0x10/0x10 [ +0.000006] ? __pfx_dev_attr_store+0x10/0x10 [ +0.000006] dev_attr_store+0x3f/0x80 [ +0.000006] sysfs_kf_write+0x125/0x1d0 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000005] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000005] ? rw_verify_area+0x70/0x420 [ +0.000005] ? __pfx_kernfs_fop_write_iter+0x10/0x10 [ +0.000006] vfs_write+0x90d/0xe70 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __pfx_vfs_write+0x10/0x10 [ +0.000004] ? local_clock+0x15/0x30 [ +0.000008] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_slab_free+0x5f/0x80 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fdget_pos+0x1d3/0x500 [ +0.000007] ksys_write+0x119/0x220 [ +0.000005] ? putname+0x1c/0x30 [ +0.000006] ? __pfx_ksys_write+0x10/0x10 [ +0.000007] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __pfx___x64_sys_openat+0x10/0x10 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? do_syscall_64+0x88/0x170 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? irqentry_exit+0x43/0x50 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? exc_page_fault+0x7c/0x110 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000006] RIP: 0033:0x7480c0b14887 [ +0.000005] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ +0.000005] RSP: 002b:00007fff142b0058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007480c0b14887 [ +0.000003] RDX: 0000000000000001 RSI: 00007480c0e7365a RDI: 0000000000000004 [ +0.000003] RBP: 00007fff142b0080 R08: 0000563b2e73c170 R09: 0000000000000000 [ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff142b02f8 [ +0.000003] R13: 0000563b159a72a9 R14: 0000563b159a9d48 R15: 00007480c0f19040 [ +0.000008] </TASK> [ +0.000445] Allocated by task 427 on cpu 5 at 29.342331s: [ +0.000011] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000006] kasan_save_alloc_info+0x38/0x60 [ +0.000005] __kasan_kmalloc+0xc1/0xd0 [ +0.000006] __kmalloc_cache_noprof+0x1bd/0x430 [ +0.000007] amdgpu_driver_open_kms+0x172/0x760 [amdgpu] [ +0.000493] drm_file_alloc+0x569/0x9a0 [ +0.000007] drm_client_init+0x1b7/0x410 [ +0.000007] drm_fbdev_client_setup+0x174/0x470 [ +0.000006] drm_client_setup+0x8a/0xf0 [ +0.000006] amdgpu_pci_probe+0x510/0x10c0 [amdgpu] [ +0.000483] local_pci_probe+0xe7/0x1b0 [ +0.000006] pci_device_probe+0x5bf/0x890 [ +0.000006] really_probe+0x1fd/0x950 [ +0.000005] __driver_probe_device+0x307/0x410 [ +0.000006] driver_probe_device+0x4e/0x150 [ +0.000005] __driver_attach+0x223/0x510 [ +0.000006] bus_for_each_dev+0x102/0x1a0 [ +0.000005] driver_attach+0x3d/0x60 [ +0.000006] bus_add_driver+0x309/0x650 [ +0.000005] driver_register+0x13d/0x490 [ +0.000006] __pci_register_driver+0x1ee/0x2b0 [ +0.000006] rfcomm_dlc_clear_state+0x69/0x220 [rfcomm] [ +0.000011] do_one_initcall+0x9c/0x3e0 [ +0.000007] do_init_module+0x29e/0x7f0 [ +0.000006] load_module+0x5c75/0x7c80 [ +0.000006] init_module_from_file+0x106/0x180 [ +0.000006] idempotent_init_module+0x377/0x740 [ +0.000006] __x64_sys_finit_module+0xd7/0x180 [ +0.000006] x64_sys_call+0x1f0b/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000013] Freed by task 1733 on cpu 5 at 59.907086s: [ +0.000011] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000005] kasan_save_free_info+0x3b/0x60 [ +0.000005] __kasan_slab_free+0x54/0x80 [ +0.000006] kfree+0x127/0x470 [ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu] [ +0.000493] drm_file_free.part.0+0x5b1/0xba0 [ +0.000006] drm_file_free+0x13/0x30 [ +0.000006] drm_client_release+0x1c4/0x2b0 [ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper] [ +0.000007] put_fb_info+0x97/0xe0 [ +0.000007] unregister_framebuffer+0x197/0x380 [ +0.000005] drm_fb_helper_unregister_info+0x94/0x100 [ +0.000005] drm_fbdev_client_unregister+0x3c/0x80 [ +0.000007] drm_client_dev_unregister+0x144/0x330 [ +0.000006] drm_dev_unregister+0x49/0x1b0 [ +0.000006] drm_dev_unplug+0x4c/0xd0 [ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu] [ +0.000484] pci_device_remove+0xae/0x1e0 [ +0.000008] device_remove+0xc7/0x180 [ +0.000007] device_release_driver_internal+0x3d4/0x5a0 [ +0.000006] device_release_driver+0x12/0x20 [ +0.000007] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000006] remove_store+0xd7/0xf0 [ +0.000006] dev_attr_store+0x3f/0x80 [ +0.000005] sysfs_kf_write+0x125/0x1d0 [ +0.000006] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000006] vfs_write+0x90d/0xe70 [ +0.000006] ksys_write+0x119/0x220 [ +0.000006] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000005] do_syscall_64+0x7c/0x170 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000012] The buggy address belongs to the object at ffff88812eec8000 which belongs to the cache kmalloc-rnd-07-4k of size 4096 [ +0.000016] The buggy address is located 3160 bytes inside of freed 4096-byte region [ffff88812eec8000, ffff88812eec9000) [ +0.000023] The buggy address belongs to the physical page: [ +0.000009] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12eec8 [ +0.000007] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ +0.000005] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: f5(slab) [ +0.000008] raw: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000 [ +0.000005] raw: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000 [ +0.000005] head: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000003 ffffea0004bbb201 ffffffffffffffff 0000000000000000 [ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 [ +0.000005] page dumped because: kasan: bad access detected [ +0.000010] Memory state around the buggy address: [ +0.000009] ffff88812eec8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff88812eec8b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] >ffff88812eec8c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ^ [ +0.000010] ffff88812eec8c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ffff88812eec8d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ================================================================== The use-after-free occurs because a delayed work item (`suspend_work`) may still be pending or running when resources it accesses are freed during device removal or file close. The previous code used `flush_work(&fpriv->evf_mgr.suspend_work.work)`, which does not wait for delayed work that has not yet started. As a result, the delayed work could run after its memory was freed, causing a use-after-free. By switching to `flush_delayed_work(&fpriv->evf_mgr.suspend_work)`, we ensure that the kernel waits for both queued and delayed work to finish before freeing memory, closing this race. Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error") Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>