summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2025-08-20drm: renesas: rz-du: mipi_dsi: Add min check for VCLK rangeLad Prabhakar1-0/+3
[ Upstream commit e37a95d01d5acce211da8446fefbd8684c67f516 ] The VCLK range for Renesas RZ/G2L SoC is 5.803 MHz to 148.5 MHz. Add a minimum clock check in the mode_valid callback to ensure that the clock value does not fall below the valid range. Co-developed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com> Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://lore.kernel.org/r/20250609225630.502888-2-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe: Make dma-fences compliant with the safe access rulesTvrtko Ursulin3-1/+11
[ Upstream commit 6bd90e700b4285e6a7541e00f969cab0d696adde ] Xe can free some of the data pointed to by the dma-fences it exports. Most notably the timeline name can get freed if userspace closes the associated submit queue. At the same time the fence could have been exported to a third party (for example a sync_fence fd) which will then cause an use- after-free on subsequent access. To make this safe we need to make the driver compliant with the newly documented dma-fence rules. Driver has to ensure a RCU grace period between signalling a fence and freeing any data pointed to by said fence. For the timeline name we simply make the queue be freed via kfree_rcu and for the shared lock associated with multiple queues we add a RCU grace period before freeing the per GT structure holding the lock. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250610164226.10817-5-tvrtko.ursulin@igalia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: clear pa and mca record counter when resetting eepromganglxie1-0/+2
[ Upstream commit d0cc8d2b7df1848f98f0fea8135ba706814b1d13 ] clear pa and mca record counter when resetting eeprom, so that ras_num_bad_pages can be calculated correctly Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: Suspend IH during mode-2 resetLijo Lazar1-4/+29
[ Upstream commit 3f1e81ecb61923934bd11c3f5c1e10893574e607 ] On multi-aid SOCs, there could be a continuous stream of interrupts from GC after poison consumption. Suspend IH to disable them before doing mode-2 reset. This avoids conflicts in hardware accesses during interrupt handlers while a reset is ongoing. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Stop storing failures into adev->dm.cached_stateMario Limonciello1-7/+18
[ Upstream commit 709a37ab9c63297da2194dc36f604537f9d2d417 ] If drm_atomic_helper_suspend() has failed for any reason, it's stored in adev->dm.cached_state. This isn't expected because the resume (or complete()) sequence will attempt to use the stored state to resume. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250602014432.3538345-3-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd: Allow printing VanGogh OD SCLK levels without setting dpm to manualMario Limonciello1-22/+15
[ Upstream commit 2d1ec1e955414e8e8358178011c35afca1a1c0b1 ] Several other ASICs allow printing OD SCLK levels without setting DPM control to manual. When OD is disabled it will show the range the hardware supports. When OD is enabled it will show what values have been programmed. Adjust VanGogh to work the same. Cc: Pierre-Loup A. Griffais <pgriffais@valvesoftware.com> Reported-by: Vicki Pfau <vi@endrift.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250609031227.479079-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/pm: Use pointer type for typecheck()Lijo Lazar1-20/+21
[ Upstream commit b49e3d7ca71aaf1e3412d41522a11a56563799b5 ] typecheck creates local variables based on the type passed. That could result in stack frame size warnings like below in certain configs: drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:2885:1: error: the frame size of 8304 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Checking against the pointer type is sufficient for the purpose of getting a diagnostic message during build time. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com> Tested-by: Palmer Dabbelt <palmer@dabbelt.com> Link: https://lore.kernel.org/r/20250610212141.19445-1-palmer@dabbelt.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Initialize mode_select to 0Alex Hung1-1/+1
[ Upstream commit 592ddac93f8c02e13f19175745465f8c4d0f56cd ] [WHAT] mode_select was supposed to be initialized in mpc_read_gamut_remap but is not set in default case. This can cause indeterminate behaviors. This is reported as an UNINIT error by Coverity. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Fix 'failed to blank crtc!'Wen Chen1-1/+1
[ Upstream commit 01f60348d8fb6b3fbcdfc7bdde5d669f95b009a4 ] [why] DCN35 is having “DC: failed to blank crtc!” when running HPO test cases. It's caused by not having sufficient udelay time. [how] Replace the old wait_for_blank_complete function with fsleep function to sleep just until the next frame should come up. This way it doesn't poll in case the pixel clock or other clock was bugged or until vactive and the vblank are hit again. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Wen Chen <Wen.Chen3@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: Add more checks to PSP mailboxLijo Lazar9-61/+107
[ Upstream commit 8345a71fc54b28e4d13a759c45ce2664d8540d28 ] Instead of checking the response flag, use status mask also to check against any unexpected failures like a device drop. Also, log error if waiting on a psp response fails/times out. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/panel: raydium-rm67200: Move initialization from enable() to prepare stageAndy Yan1-15/+7
[ Upstream commit 691674a282bdbf8f8bce4094369a2d1e4b5645e9 ] The DSI host has different modes in prepare() and enable() functions, prepare() is in LP command mode and enable() is in HS video mode. >From our experience, generally the initialization sequence needs to be sent in the LP command mode. Move the setup init function from enable() to prepare() to fix a display shift on rk3568 evb. Tested on rk3568/rk3576/rk3588 EVB. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250618091520.691590-1-andyshrk@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: add null checkPeichen Huang1-2/+4
[ Upstream commit 158b9201c17fc93ed4253c2f03b77fd2671669a1 ] [WHY] Prevents null pointer dereferences to enhance function robustness [HOW] Adds early null check and return false if invalid. Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: Separate set_gsl from set_gsl_source_selectIlya Bakoulin1-5/+4
[ Upstream commit 660a467a5e7366cd6642de61f1aaeaf0d253ee68 ] [Why/How] Separate the checks for set_gsl and set_gsl_source_select, since source_select may not be implemented/necessary. Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amdgpu: Use correct severity for BP threshold exceed eventXiang Liu1-2/+4
[ Upstream commit 4a33ca3f6ee9a013a423a867426704e9c9d785bd ] The severity of CPER for BP threshold exceed event should be set as CPER_SEV_FATAL to match the OOB implementation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/xe/xe_query: Use separate iterator while filling GT listMatt Roper1-12/+15
[ Upstream commit d4eb4a010262ea7801e576d1033b355910f2f7d4 ] The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs being iterated over, and may skip over values if a GT is not present on the device. Use a separate iterator for GT list array assignments to ensure that the array will be filled properly on future platforms where index in the GT query list may not match the uapi ID. v2: - Include the missing increment of the iterator. (Jonathan) Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/imagination: Clear runtime PM errors while resetting the GPUAlessio Belle1-1/+58
[ Upstream commit 551507e0d0bf32ce1d7d27533c4b98307380804c ] The runtime PM might be left in error state if one of the callbacks returned an error, e.g. if the (auto)suspend callback failed following a firmware crash. When that happens, any further attempt to acquire or release a power reference will then also fail, making it impossible to do anything else with the GPU. The driver logic will eventually reach the reset code. In pvr_power_reset(), replace pvr_power_get() with a new API pvr_power_get_clear() which also attempts to clear any runtime PM error state if acquiring a power reference is not possible. Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://lore.kernel.org/r/20250624-clear-rpm-errors-gpu-reset-v1-1-b8ff2ae55aac@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/msm: Add error handling for krealloc in metadata setupYuan Chen1-1/+8
[ Upstream commit 1c8c354098ea9d4376a58c96ae6b65288a6f15d8 ] Function msm_ioctl_gem_info_set_metadata() now checks for krealloc failure and returns -ENOMEM, avoiding potential NULL pointer dereference. Explicitly avoids __GFP_NOFAIL due to deadlock risks and allocation constraints. Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Patchwork: https://patchwork.freedesktop.org/patch/661235/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/msm: use trylock for debugfsRob Clark2-1/+8
[ Upstream commit 0a1ff88ec5b60b41ba830c5bf08b6cd8f45ab411 ] This resolves a potential deadlock vs msm_gem_vm_close(). Otherwise for _NO_SHARE buffers msm_gem_describe() could be trying to acquire the shared vm resv, while already holding priv->obj_lock. But _vm_close() might drop the last reference to a GEM obj while already holding the vm resv, and msm_gem_free_object() needs to grab priv->obj_lock, a locking inversion. OTOH this is only for debugfs and it isn't critical if we undercount by skipping a locked obj. So just use trylock() and move along if we can't get the lock. Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661525/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/msm: Update register xmlRob Clark14-3027/+3312
[ Upstream commit 6733d8276ac02a8790e571d2af4a69a9039d0522 ] Sync register xml from mesa commit eb3e0b7164a3 ("freedreno/a6xx: Split descriptors out into their own file"). Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/662470/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/fbdev-client: Skip DRM clients if modesetting is absentThierry Reding1-0/+5
[ Upstream commit cce91f29c088ba902dd2abfc9c3216ba9a2fb2fe ] Recent generations of Tegra have moved the display components outside of host1x, leading to a device that has no CRTCs attached and hence doesn't support any of the modesetting functionality. When this is detected, the driver clears the DRIVER_MODESET and DRIVER_ATOMIC flags for the device. Unfortunately, this causes the following errors during boot: [ 15.418958] ERR KERN drm drm: [drm] *ERROR* Failed to register client: -95 [ 15.425311] WARNING KERN drm drm: [drm] Failed to set up DRM client; error -95 These originate from the fbdev client checking for the presence of the DRIVER_MODESET flag and returning -EOPNOTSUPP. However, if a driver does not support DRIVER_MODESET this is entirely expected and the error isn't helpful. Prevent this misleading error message by setting up the DRM clients only if modesetting is enabled. Changes in v2: - use DRIVER_MODESET check to avoid registering any clients Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250613122838.2082334-1-thierry.reding@gmail.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/sched: Avoid memory leaks with cancel_job() callbackPhilipp Stanner1-13/+21
[ Upstream commit bf8bbaefaa6ae0a07971ea57b3208df60e8ad0a4 ] Since its inception, the GPU scheduler can leak memory if the driver calls drm_sched_fini() while there are still jobs in flight. The simplest way to solve this in a backwards compatible manner is by adding a new callback, drm_sched_backend_ops.cancel_job(), which instructs the driver to signal the hardware fence associated with the job. Afterwards, the scheduler can safely use the established free_job() callback for freeing the job. Implement the new backend_ops callback cancel_job(). Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursulin@igalia.com/ Reviewed-by: Maíra Canal <mcanal@igalia.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-4-phasta@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/sched/tests: Add unit test for cancel_job()Philipp Stanner1-0/+42
[ Upstream commit c2668a0e03501a26cedc82e32fe9f3f692d330db ] The scheduler unit tests now provide a new callback, cancel_job(). This callback gets used by drm_sched_fini() for all still pending jobs to cancel them. Implement a new unit test to test this. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250710125412.128476-6-phasta@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/display: limit clear_update_flags to dcn32 and aboveCharlene Liu1-3/+2
[ Upstream commit f354556e29f40ef44fa8b13dc914817db3537e20 ] [why] dc has some code out of sync: dc_commit_updates_for_stream handles v1/v2/v3, but dc_update_planes_and_stream makes v1 asic to use v2. as a reression fix: limit clear_update_flags to dcn32 or newer asic. need to follow up that v1 asic using v2 issue. Reviewed-by: Syed Hassan <syed.hassan@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drm/amd/pm: fix null pointer accessUmio Yasuno1-0/+5
[ Upstream commit d524d40e3a6152a3ea1125af729f8cd8ca65efde ] Writing a string without delimiters (' ', '\n', '\0') to the under gpu_od/fan_ctrl sysfs or pp_power_profile_mode for the CUSTOM profile will result in a null pointer dereference. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4401 Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/xe/pf: Disable PF restart worker on device removalMichal Wajdeczko1-1/+31
[ Upstream commit c286ce6b01f633806b4db3e4ec8e0162928299cd ] We can't let restart worker run once device is removed, since other data that it might want to access could be already released. Explicitly disable worker as part of device cleanup action. Fixes: a4d1c5d0b99b ("drm/xe/pf: Move VFs reprovisioning to worker") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com (cherry picked from commit a424353937c24554bb242a6582ed8f018b4a411c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/xe/vf: Disable CSC support on VFLukasz Laguna1-0/+1
[ Upstream commit f62408efc8669b82541295a4611494c8c8c52684 ] CSC is not accessible by VF drivers, so disable its support flag on VF to prevent further initialization attempts. Fixes: e02cea83d32d ("drm/xe/gsc: add Battlemage support") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Cc: Alexander Usyskin <alexander.usyskin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com (cherry picked from commit 552dbba1caaf0cb40ce961806d757615e26ec668) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/xe/configfs: Fix pci_dev reference leakMichal Wajdeczko1-1/+2
[ Upstream commit 942ac8da6388c25fe62b2792c78715e0ea6e649b ] We are using pci_get_domain_bus_and_slot() function to verify if the given config directory name matches any existing PCI device, but we missed to call matching pci_dev_put() to release reference. While around, also change error code in case of no device match, to make it more specific than generic formatting error. Fixes: 16280ded45fb ("drm/xe: Add configfs to enable survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 0bdd05c2a82bbf2419415d012fd4f5faeca7f1af) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx10: fix kiq locking in KCQ resetAlex Deucher1-4/+2
[ Upstream commit a4b2ba8f631d3e44b30b9b46ee290fbfe608b7d0 ] The ring test needs to be inside the lock. Fixes: 097af47d3cfb ("drm/amdgpu/gfx10: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx9.4.3: fix kiq locking in KCQ resetAlex Deucher1-2/+1
[ Upstream commit 08f116c59310728ea8b7e9dc3086569006c861cf ] The ring test needs to be inside the lock. Fixes: 4c953e53cc34 ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx9: fix kiq locking in KCQ resetAlex Deucher1-1/+1
[ Upstream commit 730ea5074dac1b105717316be5d9c18b09829385 ] The ring test needs to be inside the lock. Fixes: fdbd69486b46 ("drm/amdgpu/gfx9: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/panthor: Fix UAF in panthor_gem_create_with_handle() debugfs codeSimona Vetter2-19/+15
[ Upstream commit fe69a391808404977b1f002a6e7447de3de7a88e ] The object is potentially already gone after the drm_gem_object_put(). In general the object should be fully constructed before calling drm_gem_handle_create(), except the debugfs tracking uses a separate lock and list and separate flag to denotate whether the object is actually initialized. Since I'm touching this all anyway simplify this by only adding the object to the debugfs when it's ready for that, which allows us to delete that separate flag. panthor_gem_debugfs_bo_rm() already checks whether we've actually been added to the list or this is some error path cleanup. v2: Fix build issues for !CONFIG_DEBUGFS (Adrián) v3: Add linebreak and remove outdated comment (Liviu) Fixes: a3707f53eb3f ("drm/panthor: show device-wide list of DRM GEM objects over DebugFS") Cc: Adrián Larumbe <adrian.larumbe@collabora.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Simona Vetter <simona.vetter@intel.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250709135220.1428931-1-simona.vetter@ffwll.ch Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/rockchip: vop2: Fix the update of LAYER/PORT select registers when there ↵Andy Yan3-25/+122
are multi display output on rk3588/rk3568 [ Upstream commit 3e89a8c6835476aa782da80585dee9ddae651eea ] The all video ports of rk3568/rk3588 share the same OVL_LAYER_SEL and OVL_PORT_SEL registers, and the configuration of these two registers can be set to take effect when the vsync signal arrives at a certain Video Port. If two threads for two display output choose to update these two registers simultaneously to meet their own plane adjustment requirements(change plane zpos or switch plane from one crtc to another), then no matter which Video Port'svsync signal we choose to follow for these two registers, the display output of the other Video Port will be abnormal. This is because the configuration of this Video Port does not take effect at the right time (its configuration should take effect when its VSYNC signal arrives). In order to solve this problem, when performing plane migration or change the zpos of planes, there are two things to be observed and followed: 1. When a plane is migrated from one VP to another, the configuration of the layer can only take effect after the Port mux configuration is enabled. 2. When change the zpos of planes, we must ensure that the change for the previous VP takes effect before we proceed to change the next VP. Otherwise, the new configuration might overwrite the previous one for the previous VP, or it could lead to the configuration of the previous VP being take effect along with the VSYNC of the new VP. This issue only occurs in scenarios where multi-display output is enabled. Fixes: c5996e4ab109 ("drm/rockchip: vop2: Make overlay layer select register configuration take effect by vsync") Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250421102156.424480-1-andyshrk@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/rockchip: vop2: fail cleanly if missing a primary plane for a video-portHeiko Stuebner1-0/+4
[ Upstream commit f9f68bf1d0efeadb6c427c9dbb30f307a7def19b ] Each window of a vop2 is usable by a specific set of video ports, so while binding the vop2, we look through the list of available windows trying to find one designated as primary-plane and usable by that specific port. The code later wants to use drm_crtc_init_with_planes with that found primary plane, but nothing has checked so far if a primary plane was actually found. For whatever reason, the rk3576 vp2 does not have a usable primary window (if vp0 is also in use) which brought the issue to light and ended in a null-pointer dereference further down. As we expect a primary-plane to exist for a video-port, add a check at the end of the window-iteration and fail probing if none was found. Fixes: 604be85547ce ("drm/rockchip: Add VOP2 driver") Reviewed-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250610212748.1062375-1-heiko@sntech.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu: fix use-after-free in amdgpu_userq_suspend+0x51a/0x5a0Vitaly Prosyak1-1/+1
[ Upstream commit a886d26f2c8f9e3f3c1869ae368d09c75daac553 ] [ +0.000020] BUG: KASAN: slab-use-after-free in amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000817] Read of size 8 at addr ffff88812eec8c58 by task amd_pci_unplug/1733 [ +0.000027] CPU: 10 UID: 0 PID: 1733 Comm: amd_pci_unplug Tainted: G W 6.14.0+ #2 [ +0.000009] Tainted: [W]=WARN [ +0.000003] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000003] dump_stack_lvl+0x76/0xa0 [ +0.000011] print_report+0xce/0x600 [ +0.000009] ? srso_return_thunk+0x5/0x5f [ +0.000006] ? kasan_complete_mode_report_info+0x76/0x200 [ +0.000007] ? kasan_addr_to_slab+0xd/0xb0 [ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000707] kasan_report+0xbe/0x110 [ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000541] __asan_report_load8_noabort+0x14/0x30 [ +0.000005] amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000535] ? stop_cpsch+0x396/0x600 [amdgpu] [ +0.000556] ? stop_cpsch+0x429/0x600 [amdgpu] [ +0.000536] ? __pfx_amdgpu_userq_suspend+0x10/0x10 [amdgpu] [ +0.000536] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? kgd2kfd_suspend+0x132/0x1d0 [amdgpu] [ +0.000542] amdgpu_device_fini_hw+0x581/0xe90 [amdgpu] [ +0.000485] ? down_write+0xbb/0x140 [ +0.000007] ? __mutex_unlock_slowpath.constprop.0+0x317/0x360 [ +0.000005] ? __pfx_amdgpu_device_fini_hw+0x10/0x10 [amdgpu] [ +0.000482] ? __kasan_check_write+0x14/0x30 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? up_write+0x55/0xb0 [ +0.000007] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? blocking_notifier_chain_unregister+0x6c/0xc0 [ +0.000008] amdgpu_driver_unload_kms+0x69/0x90 [amdgpu] [ +0.000484] amdgpu_pci_remove+0x93/0x130 [amdgpu] [ +0.000482] pci_device_remove+0xae/0x1e0 [ +0.000008] device_remove+0xc7/0x180 [ +0.000008] device_release_driver_internal+0x3d4/0x5a0 [ +0.000007] device_release_driver+0x12/0x20 [ +0.000004] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000005] remove_store+0xd7/0xf0 [ +0.000005] ? __pfx_remove_store+0x10/0x10 [ +0.000006] ? __pfx__copy_from_iter+0x10/0x10 [ +0.000006] ? __pfx_dev_attr_store+0x10/0x10 [ +0.000006] dev_attr_store+0x3f/0x80 [ +0.000006] sysfs_kf_write+0x125/0x1d0 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000005] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000005] ? rw_verify_area+0x70/0x420 [ +0.000005] ? __pfx_kernfs_fop_write_iter+0x10/0x10 [ +0.000006] vfs_write+0x90d/0xe70 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __pfx_vfs_write+0x10/0x10 [ +0.000004] ? local_clock+0x15/0x30 [ +0.000008] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_slab_free+0x5f/0x80 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fdget_pos+0x1d3/0x500 [ +0.000007] ksys_write+0x119/0x220 [ +0.000005] ? putname+0x1c/0x30 [ +0.000006] ? __pfx_ksys_write+0x10/0x10 [ +0.000007] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __pfx___x64_sys_openat+0x10/0x10 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? do_syscall_64+0x88/0x170 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? irqentry_exit+0x43/0x50 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? exc_page_fault+0x7c/0x110 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000006] RIP: 0033:0x7480c0b14887 [ +0.000005] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ +0.000005] RSP: 002b:00007fff142b0058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007480c0b14887 [ +0.000003] RDX: 0000000000000001 RSI: 00007480c0e7365a RDI: 0000000000000004 [ +0.000003] RBP: 00007fff142b0080 R08: 0000563b2e73c170 R09: 0000000000000000 [ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff142b02f8 [ +0.000003] R13: 0000563b159a72a9 R14: 0000563b159a9d48 R15: 00007480c0f19040 [ +0.000008] </TASK> [ +0.000445] Allocated by task 427 on cpu 5 at 29.342331s: [ +0.000011] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000006] kasan_save_alloc_info+0x38/0x60 [ +0.000005] __kasan_kmalloc+0xc1/0xd0 [ +0.000006] __kmalloc_cache_noprof+0x1bd/0x430 [ +0.000007] amdgpu_driver_open_kms+0x172/0x760 [amdgpu] [ +0.000493] drm_file_alloc+0x569/0x9a0 [ +0.000007] drm_client_init+0x1b7/0x410 [ +0.000007] drm_fbdev_client_setup+0x174/0x470 [ +0.000006] drm_client_setup+0x8a/0xf0 [ +0.000006] amdgpu_pci_probe+0x510/0x10c0 [amdgpu] [ +0.000483] local_pci_probe+0xe7/0x1b0 [ +0.000006] pci_device_probe+0x5bf/0x890 [ +0.000006] really_probe+0x1fd/0x950 [ +0.000005] __driver_probe_device+0x307/0x410 [ +0.000006] driver_probe_device+0x4e/0x150 [ +0.000005] __driver_attach+0x223/0x510 [ +0.000006] bus_for_each_dev+0x102/0x1a0 [ +0.000005] driver_attach+0x3d/0x60 [ +0.000006] bus_add_driver+0x309/0x650 [ +0.000005] driver_register+0x13d/0x490 [ +0.000006] __pci_register_driver+0x1ee/0x2b0 [ +0.000006] rfcomm_dlc_clear_state+0x69/0x220 [rfcomm] [ +0.000011] do_one_initcall+0x9c/0x3e0 [ +0.000007] do_init_module+0x29e/0x7f0 [ +0.000006] load_module+0x5c75/0x7c80 [ +0.000006] init_module_from_file+0x106/0x180 [ +0.000006] idempotent_init_module+0x377/0x740 [ +0.000006] __x64_sys_finit_module+0xd7/0x180 [ +0.000006] x64_sys_call+0x1f0b/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000013] Freed by task 1733 on cpu 5 at 59.907086s: [ +0.000011] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000005] kasan_save_free_info+0x3b/0x60 [ +0.000005] __kasan_slab_free+0x54/0x80 [ +0.000006] kfree+0x127/0x470 [ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu] [ +0.000493] drm_file_free.part.0+0x5b1/0xba0 [ +0.000006] drm_file_free+0x13/0x30 [ +0.000006] drm_client_release+0x1c4/0x2b0 [ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper] [ +0.000007] put_fb_info+0x97/0xe0 [ +0.000007] unregister_framebuffer+0x197/0x380 [ +0.000005] drm_fb_helper_unregister_info+0x94/0x100 [ +0.000005] drm_fbdev_client_unregister+0x3c/0x80 [ +0.000007] drm_client_dev_unregister+0x144/0x330 [ +0.000006] drm_dev_unregister+0x49/0x1b0 [ +0.000006] drm_dev_unplug+0x4c/0xd0 [ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu] [ +0.000484] pci_device_remove+0xae/0x1e0 [ +0.000008] device_remove+0xc7/0x180 [ +0.000007] device_release_driver_internal+0x3d4/0x5a0 [ +0.000006] device_release_driver+0x12/0x20 [ +0.000007] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000006] remove_store+0xd7/0xf0 [ +0.000006] dev_attr_store+0x3f/0x80 [ +0.000005] sysfs_kf_write+0x125/0x1d0 [ +0.000006] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000006] vfs_write+0x90d/0xe70 [ +0.000006] ksys_write+0x119/0x220 [ +0.000006] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000005] do_syscall_64+0x7c/0x170 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000012] The buggy address belongs to the object at ffff88812eec8000 which belongs to the cache kmalloc-rnd-07-4k of size 4096 [ +0.000016] The buggy address is located 3160 bytes inside of freed 4096-byte region [ffff88812eec8000, ffff88812eec9000) [ +0.000023] The buggy address belongs to the physical page: [ +0.000009] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12eec8 [ +0.000007] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ +0.000005] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: f5(slab) [ +0.000008] raw: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000 [ +0.000005] raw: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000 [ +0.000005] head: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000003 ffffea0004bbb201 ffffffffffffffff 0000000000000000 [ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 [ +0.000005] page dumped because: kasan: bad access detected [ +0.000010] Memory state around the buggy address: [ +0.000009] ffff88812eec8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff88812eec8b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] >ffff88812eec8c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ^ [ +0.000010] ffff88812eec8c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ffff88812eec8d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ================================================================== The use-after-free occurs because a delayed work item (`suspend_work`) may still be pending or running when resources it accesses are freed during device removal or file close. The previous code used `flush_work(&fpriv->evf_mgr.suspend_work.work)`, which does not wait for delayed work that has not yet started. As a result, the delayed work could run after its memory was freed, causing a use-after-free. By switching to `flush_delayed_work(&fpriv->evf_mgr.suspend_work)`, we ensure that the kernel waits for both queued and delayed work to finish before freeing memory, closing this race. Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error") Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15Revert "drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini"Vitaly Prosyak2-4/+15
[ Upstream commit a73345b866ff8bbd93135af667c973a8fb4b2c40 ] This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b. The original patch moved `amdgpu_userq_mgr_fini()` to the driver's `postclose` callback, which is called after `drm_gem_release()` in the DRM file cleanup sequence.If a user application crashes or aborts without cleaning up its user queues, 'drm_gem_release()` may free GEM objects that are still referenced by active user queues, leading to use-after-free. By reverting, we ensure that user queues are disabled and cleaned up before any GEM objects are released, preventing this class of bug. However, this reintroduces a race during PCI hot-unplug, where device removal can race with per-file cleanup, leading to use-after-free in suspend/unplug paths. This will be fixed in the next patch. Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c") Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amd/pm/powerplay/hwmgr/smu_helper: fix order of mask and valueFedor Pchelkin1-1/+1
[ Upstream commit a54e4639c4ef37a0241bac7d2a77f2e6ffb57099 ] There is a small typo in phm_wait_on_indirect_register(). Swap mask and value arguments provided to phm_wait_on_register() so that they satisfy the function signature and actual usage scheme. Found by Linux Verification Center (linuxtesting.org) with Svace static analysis tool. In practice this doesn't fix any issues because the only place this function is used uses the same value for the value and mask. Fixes: 3bace3591493 ("drm/amd/powerplay: add hardware manager sub-component") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/gfx10: fix KGQ reset sequenceAlex Deucher1-5/+15
[ Upstream commit 14b2d71a9a24727f1b9f2131ed5eb2e345840a3a ] Need to reinit the ring before remapping it and all of the KIQ handling needs to be within the kiq lock. Fixes: 1741281a157f ("drm/amdgpu/gfx10: add ring reset callbacks") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu: move force completion into ring resetsAlex Deucher21-30/+152
[ Upstream commit 2dee58ca471dae05c473270d0fb74efe01a78ccb ] Move the force completion handling into each ring reset function so that each engine can determine whether or not it needs to force completion on the jobs in the ring. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 14b2d71a9a24 ("drm/amdgpu/gfx10: fix KGQ reset sequence") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu: rework queue reset scheduler interactionChristian König1-15/+20
[ Upstream commit 821aacb2dcf0d1fbc3c0f7803b6089b01addb8bf ] Stopping the scheduler for queue reset is generally a good idea because it prevents any worker from touching the ring buffer. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 14b2d71a9a24 ("drm/amdgpu/gfx10: fix KGQ reset sequence") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdkfd: Move the process suspend and resume out of full accessEmily Deng6-36/+86
[ Upstream commit 54f7a24e1437d66c9ff36d727a9dff1beeeab429 ] For the suspend and resume process, exclusive access is not required. Therefore, it can be moved out of the full access section to reduce the duration of exclusive access. v3: Move suspend processes before hardware fini. Remove twice call for bare metal. v4: Refine code Signed-off-by: Emily Deng <Emily.Deng@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 14b2d71a9a24 ("drm/amdgpu/gfx10: fix KGQ reset sequence") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/msm/dpu: Fill in min_prefill_lines for SC8180XKonrad Dybcio1-0/+1
[ Upstream commit 5136acc40afc0261802e5cb01b04f871bf6d876b ] Based on the downstream release, predictably same value as for SM8150. Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Fixes: f3af2d6ee9ab ("drm/msm/dpu: Add SC8180x to hw catalog") Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657794/ Link: https://lore.kernel.org/r/20250610-topic-dpu_8180_mpl-v1-1-f480cd22f11c@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70cVitaly Prosyak2-15/+4
[ Upstream commit 5fb90421fa0fbe0a968274912101fe917bf1c47b ] The issue was reproduced on NV10 using IGT pci_unplug test. It is expected that `amdgpu_driver_postclose_kms()` is called prior to `amdgpu_drm_release()`. However, the bug is that `amdgpu_fpriv` was freed in `amdgpu_driver_postclose_kms()`, and then later accessed in `amdgpu_drm_release()` via a call to `amdgpu_userq_mgr_fini()`. As a result, KASAN detected a use-after-free condition, as shown in the log below. The proposed fix is to move the calls to `amdgpu_eviction_fence_destroy()` and `amdgpu_userq_mgr_fini()` into `amdgpu_driver_postclose_kms()`, so they are invoked before `amdgpu_fpriv` is freed. This also ensures symmetry with the initialization path in `amdgpu_driver_open_kms()`, where the following components are initialized: - `amdgpu_userq_mgr_init()` - `amdgpu_eviction_fence_init()` - `amdgpu_ctx_mgr_init()` Correspondingly, in `amdgpu_driver_postclose_kms()` we should clean up using: - `amdgpu_userq_mgr_fini()` - `amdgpu_eviction_fence_destroy()` - `amdgpu_ctx_mgr_fini()` This change eliminates the use-after-free and improves consistency in resource management between open and close paths. [ +0.094367] ================================================================== [ +0.000026] BUG: KASAN: slab-use-after-free in amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000866] Write of size 8 at addr ffff88811c068c60 by task amd_pci_unplug/1737 [ +0.000026] CPU: 3 UID: 0 PID: 1737 Comm: amd_pci_unplug Not tainted 6.14.0+ #2 [ +0.000008] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000003] dump_stack_lvl+0x76/0xa0 [ +0.000010] print_report+0xce/0x600 [ +0.000009] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000790] ? srso_return_thunk+0x5/0x5f [ +0.000007] ? kasan_complete_mode_report_info+0x76/0x200 [ +0.000008] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000684] kasan_report+0xbe/0x110 [ +0.000007] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000601] __asan_report_store8_noabort+0x17/0x30 [ +0.000007] amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000801] ? __pfx_amdgpu_userq_mgr_fini+0x10/0x10 [amdgpu] [ +0.000819] ? srso_return_thunk+0x5/0x5f [ +0.000008] amdgpu_drm_release+0xa3/0xe0 [amdgpu] [ +0.000604] __fput+0x354/0xa90 [ +0.000010] __fput_sync+0x59/0x80 [ +0.000005] __x64_sys_close+0x7d/0xe0 [ +0.000006] x64_sys_call+0x2505/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000004] ? kasan_record_aux_stack+0xae/0xd0 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? kmem_cache_free+0x398/0x580 [ +0.000006] ? __fput+0x543/0xa90 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __fput+0x543/0xa90 [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000007] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? do_syscall_64+0x88/0x170 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? do_syscall_64+0x88/0x170 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? irqentry_exit+0x43/0x50 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? exc_page_fault+0x7c/0x110 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7ffff7b14f67 [ +0.000005] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff [ +0.000004] RSP: 002b:00007fffffffe358 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67 [ +0.000003] RDX: 0000000000000000 RSI: 00007ffff7f5755a RDI: 0000000000000003 [ +0.000003] RBP: 00007fffffffe380 R08: 0000555555568170 R09: 0000000000000000 [ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8 [ +0.000003] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040 [ +0.000007] </TASK> [ +0.000286] Allocated by task 425 on cpu 11 at 29.751192s: [ +0.000013] kasan_save_stack+0x28/0x60 [ +0.000008] kasan_save_track+0x18/0x70 [ +0.000006] kasan_save_alloc_info+0x38/0x60 [ +0.000006] __kasan_kmalloc+0xc1/0xd0 [ +0.000005] __kmalloc_cache_noprof+0x1bd/0x430 [ +0.000006] amdgpu_driver_open_kms+0x172/0x760 [amdgpu] [ +0.000521] drm_file_alloc+0x569/0x9a0 [ +0.000008] drm_client_init+0x1b7/0x410 [ +0.000007] drm_fbdev_client_setup+0x174/0x470 [ +0.000007] drm_client_setup+0x8a/0xf0 [ +0.000006] amdgpu_pci_probe+0x50b/0x10d0 [amdgpu] [ +0.000482] local_pci_probe+0xe7/0x1b0 [ +0.000008] pci_device_probe+0x5bf/0x890 [ +0.000005] really_probe+0x1fd/0x950 [ +0.000007] __driver_probe_device+0x307/0x410 [ +0.000005] driver_probe_device+0x4e/0x150 [ +0.000006] __driver_attach+0x223/0x510 [ +0.000005] bus_for_each_dev+0x102/0x1a0 [ +0.000006] driver_attach+0x3d/0x60 [ +0.000005] bus_add_driver+0x309/0x650 [ +0.000005] driver_register+0x13d/0x490 [ +0.000006] __pci_register_driver+0x1ee/0x2b0 [ +0.000006] xfrm_ealg_get_byidx+0x43/0x50 [xfrm_algo] [ +0.000008] do_one_initcall+0x9c/0x3e0 [ +0.000007] do_init_module+0x29e/0x7f0 [ +0.000006] load_module+0x5c75/0x7c80 [ +0.000006] init_module_from_file+0x106/0x180 [ +0.000007] idempotent_init_module+0x377/0x740 [ +0.000006] __x64_sys_finit_module+0xd7/0x180 [ +0.000006] x64_sys_call+0x1f0b/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000013] Freed by task 1737 on cpu 9 at 76.455063s: [ +0.000010] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000005] kasan_save_free_info+0x3b/0x60 [ +0.000006] __kasan_slab_free+0x54/0x80 [ +0.000005] kfree+0x127/0x470 [ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu] [ +0.000485] drm_file_free.part.0+0x5b1/0xba0 [ +0.000007] drm_file_free+0x13/0x30 [ +0.000006] drm_client_release+0x1c4/0x2b0 [ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper] [ +0.000007] put_fb_info+0x97/0xe0 [ +0.000006] unregister_framebuffer+0x197/0x380 [ +0.000005] drm_fb_helper_unregister_info+0x94/0x100 [ +0.000005] drm_fbdev_client_unregister+0x3c/0x80 [ +0.000007] drm_client_dev_unregister+0x144/0x330 [ +0.000006] drm_dev_unregister+0x49/0x1b0 [ +0.000006] drm_dev_unplug+0x4c/0xd0 [ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu] [ +0.000482] pci_device_remove+0xae/0x1e0 [ +0.000006] device_remove+0xc7/0x180 [ +0.000006] device_release_driver_internal+0x3d4/0x5a0 [ +0.000007] device_release_driver+0x12/0x20 [ +0.000006] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000005] remove_store+0xd7/0xf0 [ +0.000007] dev_attr_store+0x3f/0x80 [ +0.000006] sysfs_kf_write+0x125/0x1d0 [ +0.000005] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000007] vfs_write+0x90d/0xe70 [ +0.000006] ksys_write+0x119/0x220 [ +0.000006] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000005] do_syscall_64+0x7c/0x170 [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000013] The buggy address belongs to the object at ffff88811c068000 which belongs to the cache kmalloc-rnd-01-4k of size 4096 [ +0.000016] The buggy address is located 3168 bytes inside of freed 4096-byte region [ffff88811c068000, ffff88811c069000) [ +0.000022] The buggy address belongs to the physical page: [ +0.000010] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88811c06e000 pfn:0x11c068 [ +0.000006] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ +0.000006] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: f5(slab) [ +0.000007] raw: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000 [ +0.000005] raw: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000 [ +0.000005] head: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000003 ffffea0004701a01 ffffffffffffffff 0000000000000000 [ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 [ +0.000004] page dumped because: kasan: bad access detected [ +0.000011] Memory state around the buggy address: [ +0.000009] ffff88811c068b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff88811c068b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] >ffff88811c068c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ^ [ +0.000010] ffff88811c068c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ffff88811c068d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ================================================================== Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Jesse Zhang <Jesse.Zhang@amd.com> Cc: Arvind Yadav <arvind.yadav@amd.com> v2: drop amdgpu_drm_release() and assign drm_release() as the callback directly.(Alex) Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu/sdma: handle paging queues in amdgpu_sdma_reset_engine()Alex Deucher1-2/+8
[ Upstream commit 9a9e87d15297ce72507178e93cbb773510c061cd ] Need to properly start and stop paging queues if they are present. This is not an issue today since we don't support a paging queue on any chips with queue reset. Fixes: b22659d5d352 ("drm/amdgpu: switch amdgpu_sdma_reset_engine to use the new sdma function pointers") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/amdgpu: Remove nbiov7.9 replay count reportingLijo Lazar1-20/+0
[ Upstream commit 0f566f0e9c614aa3d95082246f5b8c9e8a09c8b3 ] Direct pcie replay count reporting is not available on nbio v7.9. Reporting is done through firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Fixes: 50709d18f4a6 ("drm/amdgpu: Add pci replay count to nbio v7.9") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/vmwgfx: Fix Host-Backed userspace on Guest-Backed kernelIan Forbes1-1/+1
[ Upstream commit 7872997c048e989c7689c2995d230fdca7798000 ] Running 3D applications with SVGA_FORCE_HOST_BACKED=1 or using an ancient version of mesa was broken because the buffer was pinned in VMW_BO_DOMAIN_SYS and could not be moved to VMW_BO_DOMAIN_MOB during validation. The compat_shader buffer should not pinned. Fixes: 668b206601c5 ("drm/vmwgfx: Stop using raw ttm_buffer_object's") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://lore.kernel.org/r/20250429203427.1742331-1-ian.forbes@broadcom.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/panfrost: Fix panfrost device variable name in devfreqAdrián Larumbe1-2/+2
[ Upstream commit 6048f5587614bb4919c54966913452c1a0a43138 ] Commit 64111a0e22a9 ("drm/panfrost: Fix incorrect updating of current device frequency") was a Panfrost port of a similar fix in Panthor. Fix the Panfrost device pointer variable name so that it follows Panfrost naming conventions. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Fixes: 64111a0e22a9 ("drm/panfrost: Fix incorrect updating of current device frequency") Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250520174634.353267-6-adrian.larumbe@collabora.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/connector: hdmi: Evaluate limited range after computing formatCristian Ciocaltea1-2/+2
[ Upstream commit 21f627139652dd8329a88e281df6600f3866d238 ] Evaluating the requirement to use a limited RGB quantization range involves a verification of the output format, among others, but this is currently performed before actually computing the format, hence relying on the old connector state. Move the call to hdmi_is_limited_range() after hdmi_compute_config() to ensure the verification is done on the updated output format. Fixes: 027d43590649 ("drm/connector: hdmi: Add RGB Quantization Range to the connector state") Reviewed-by: Dmitry Baryshkov <lumag@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-1-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/sitronix: Remove broken backwards-compatibility layerGeert Uytterhoeven1-10/+0
[ Upstream commit a3f7d26dfce9e2d547a58f4941881843a391a6cc ] When moving the Sitronix DRM drivers and renaming their Kconfig symbols, the old symbols were kept, aiming to provide a seamless migration path when running "make olddefconfig" or "make oldconfig". However, the old compatibility symbols are not visible. Hence unless they are selected by another symbol (which they are not), they can never be enabled, and no backwards compatibility is provided. Drop the broken mechanism and the old symbols. Fixes: 9b8f32002cddf792 ("drm/sitronix: move tiny Sitronix drivers to their own subdir") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20395b14effe5e2e05a4f0856fdcda51c410329d.1747751592.git.geert+renesas@glider.be Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/rockchip: cleanup fb when drm_gem_fb_afbc_init failedAndy Yan1-8/+1
[ Upstream commit 099593a28138b48feea5be8ce700e5bc4565e31d ] In the function drm_gem_fb_init_with_funcs, the framebuffer (fb) and its corresponding object ID have already been registered. So we need to cleanup the drm framebuffer if the subsequent execution of drm_gem_fb_afbc_init fails. Directly call drm_framebuffer_put to ensure that all fb related resources are cleanup. Fixes: 7707f7227f09 ("drm/rockchip: Add support for afbc") Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250509031607.2542187-1-andyshrk@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15drm/xe: Correct BMG VSEC header sizingMichael J. Ruhl1-15/+4
[ Upstream commit 5b27388171a18cf6842c700520086ec50194e858 ] The intel_vsec_header information for the crashlog feature is incorrect. Update the VSEC header with correct sizing and count. Since the crashlog entries are "merged" (num_entries = 2), the separate capabilities entries must be merged as well. Fixes: 0c45e76fcc62 ("drm/xe/vsec: Support BMG devices") Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20250713172943.7335-4-michael.j.ruhl@intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>