summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)AuthorFilesLines
2025-04-20drm/amdgpu: grab an additional reference on the gang fence v2Christian König1-1/+9
[ Upstream commit 0d9a95099dcb05b5f4719c830d15bf4fdcad0dc2 ] We keep the gang submission fence around in adev, make sure that it stays alive. v2: fix memory leak on retry Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amdgpu: Fix the race condition for draining retry faultEmily Deng1-14/+17
[ Upstream commit f844732e3ad9c4b78df7436232949b8d2096d1a6 ] Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continues execution and reaches svm_range_from_addr. This results in a "failed to find prange..." error, causing the page recovery to fail. How to fix: Move the timestamp check code under the protection of svm->lock. v2: Make sure all right locks are released before go out. v3: Directly goto out_unlock_svms, and return -EAGAIN. v4: Refine code. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amd/display: Prevent VStartup OverflowRyan Seto1-0/+2
[ Upstream commit 29c1c20496a7a9bafe2bc2f833d69aa52e0f2c2d ] [Why] For some VR headsets with large blanks, it's possible to overflow the OTG_VSTARTUP_PARAM:VSTARTUP_START register. This can lead to incorrect DML calculations and underflow downstream. [How] Min the calcualted max_vstartup_lines with the max value of the register. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amdgpu: handle amdgpu_cgs_create_device() errors in amd_powerplay_create()Wentao Liang1-0/+5
[ Upstream commit 1435e895d4fc967d64e9f5bf81e992ac32f5ac76 ] Add error handling to propagate amdgpu_cgs_create_device() failures to the caller. When amdgpu_cgs_create_device() fails, release hwmgr and return -ENOMEM to prevent null pointer dereference. [v1]->[v2]: Change error code from -EINVAL to -ENOMEM. Free hwmgr. Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/mediatek: mtk_dpi: Explicitly manage TVD clock in power on/offAngeloGioacchino Del Regno1-0/+9
[ Upstream commit 473c33f5ce651365468503c76f33158aaa1c7dd2 ] In preparation for adding support for MT8195's HDMI reserved DPI, add calls to clk_prepare_enable() / clk_disable_unprepare() for the TVD clock: in this particular case, the aforementioned clock is not (and cannot be) parented to neither pixel or engine clocks hence it won't get enabled automatically by the clock framework. Please note that on all of the currently supported MediaTek platforms, the TVD clock is always a parent of either pixel or engine clocks, and this means that the common clock framework is already enabling this clock before the children. On such platforms, this commit will only increase the refcount of the TVD clock without any functional change. Reviewed-by: CK Hu <ck.hu@mediatek.com> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250217154836.108895-10-angelogioacchino.delregno@collabora.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/mediatek: mtk_dpi: Move the input_2p_en bit to platform dataAngeloGioacchino Del Regno1-7/+7
[ Upstream commit c90876a695dd83e76680b88b40067275a5982811 ] In preparation for adding support for MT8195's HDMI reserved DPI instance, move the input_2p_en bit for DP_INTF to platform data. While at it, remove the input_2pixel member from platform data as having this bit implies that the 2pixel feature must be enabled. Reviewed-by: CK Hu <ck.hu@mediatek.com> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250217154836.108895-7-angelogioacchino.delregno@collabora.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/xe/xelp: Move Wa_16011163337 from tunings to workaroundsTvrtko Ursulin2-8/+7
[ Upstream commit d9b5d83c5a4d720af6ddbefe2825c78f0325a3fd ] Workaround database specifies 16011163337 as a workaround so lets move it there. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-3-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amdkfd: debugfs hang_hws skip GPU with MESPhilip Yang1-0/+5
[ Upstream commit fe9d0061c413f8fb8c529b18b592b04170850ded ] debugfs hang_hws is used by GPU reset test with HWS, for MES this crash the kernel with NULL pointer access because dqm->packet_mgr is not setup for MES path. Skip GPU with MES for now, MES hang_hws debugfs interface will be supported later. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amdkfd: Fix pqm_destroy_queue race with GPU resetPhilip Yang1-1/+1
[ Upstream commit 7919b4cad5545ed93778f11881ceee72e4dbed66 ] If GPU in reset, destroy_queue return -EIO, pqm_destroy_queue should delete the queue from process_queue_list and free the resource. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amdkfd: Fix mode1 reset crash issuePhilip Yang1-0/+17
[ Upstream commit f0b4440cdc1807bb6ec3dce0d6de81170803569b ] If HW scheduler hangs and mode1 reset is used to recover GPU, KFD signal user space to abort the processes. After process abort exit, user queues still use the GPU to access system memory before h/w is reset while KFD cleanup worker free system memory and free VRAM. There is use-after-free race bug that KFD allocate and reuse the freed system memory, and user queue write to the same system memory to corrupt the data structure and cause driver crash. To fix this race, KFD cleanup worker terminate user queues, then flush reset_domain wq to wait for any GPU ongoing reset complete, and then free outstanding BOs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amdkfd: clamp queue size to minimumDavid Yat Sin1-0/+10
[ Upstream commit e90711946b53590371ecce32e8fcc381a99d6333 ] If queue size is less than minimum, clamp it to minimum to prevent underflow when writing queue mqd. Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amd/display: stop DML2 from removing pipes based on planesMike Katsnelson1-26/+0
[ Upstream commit 8adeff83a3b07fa6d0958ed51e1b38ba7469e448 ] [Why] Transitioning from low to high resolutions at high refresh rates caused grey corruption. During the transition state, there is a period where plane size is based on low resultion state and ODM slices are based on high resoultion state, causing the entire plane to be contained in one ODM slice. DML2 would turn off the pipe for the ODM slice with no plane, causing an underflow since the pixel rate for the higher resolution cannot be supported on one pipe. This change stops DML2 from turning off pipes that are mapped to an ODM slice with no plane. This is possible to do without negative consequences because pipes can now take the minimum viewport and draw with zero recout size, removing the need to have the pipe turned off. [How] In map_pipes_from_plane(), remove "check" that skips ODM slices that are not covered by the plane. This prevents the pipes for those ODM slices from being freed. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Mike Katsnelson <mike.katsnelson@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amd/display: Update FIXED_VS Link Rate Toggle Workaround UsageMichael Strauss3-3/+20
[ Upstream commit 7c6518c1c73199a230b5fc55ddfed3e5b9dc3290 ] [WHY] Previously the 128b/132b LTTPR support DPCD field was used to decide if FIXED_VS training sequence required a rate toggle before initiating LT. When running DP2.1 4.9.x.x compliance tests, emulated LTTPRs can report no-128b/132b support which is then forwarded by the FIXED_VS retimer. As a result this test exposes the rate toggle again, erroneously causing failures as certain compliance sinks don't expect this behaviour. [HOW] Add new DPCD register defines/reads to read LTTPR IEEE OUI and device ID. Decide whether to perform the rate toggle based on the LTTPR's IEEE OUI which guarantees that we only perform the toggle on affected retimers. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/bridge: panel: forbid initializing a panel with unknown connector typeLuca Ceresoli1-1/+4
[ Upstream commit b296955b3a740ecc8b3b08e34fd64f1ceabb8fb4 ] Having an DRM_MODE_CONNECTOR_Unknown connector type is considered bad, and drm_panel_bridge_add_typed() and derivatives are deprecated for this. drm_panel_init() won't prevent initializing a panel with a DRM_MODE_CONNECTOR_Unknown connector type. Luckily there are no in-tree users doing it, so take this as an opportinuty to document a valid connector type must be passed. Returning an error if this rule is violated is not possible because drm_panel_init() is a void function. Add at least a warning to make any violations noticeable, especially to non-upstream drivers. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250214-drm-assorted-cleanups-v7-5-88ca5827d7af@bootlin.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/debugfs: fix printk format for bridge indexLuca Ceresoli1-1/+1
[ Upstream commit 72443c730b7a7b5670a921ea928e17b9b99bd934 ] idx is an unsigned int, use %u for printk-style strings. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250214-drm-assorted-cleanups-v7-1-88ca5827d7af@bootlin.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm: panel-orientation-quirks: Add quirk for OneXPlayer Mini (Intel)Andrew Wyatt1-0/+12
[ Upstream commit b24dcc183583fc360ae0f0899e286a68f46abbd0 ] The Intel model of the OneXPlayer Mini uses a 1200x1920 portrait LCD panel. The DMI strings are the same as the OneXPlayer, which already has a DMI quirk, but the panel is different. Add a DMI match to correctly rotate this panel. Signed-off-by: Andrew Wyatt <fewtarius@steamfork.org> Co-developed-by: John Edwards <uejji@uejji.net> Signed-off-by: John Edwards <uejji@uejji.net> Tested-by: João Pedro Kurtz <joexkurtz@gmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250213222455.93533-6-uejji@uejji.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm: panel-orientation-quirks: Add new quirk for GPD Win 2Andrew Wyatt1-0/+6
[ Upstream commit a860eb9c6ba6cdbf32e3e01a606556e5a90a2931 ] Some GPD Win 2 units shipped with the correct DMI strings. Add a DMI match to correctly rotate the panel on these units. Signed-off-by: Andrew Wyatt <fewtarius@steamfork.org> Signed-off-by: John Edwards <uejji@uejji.net> Tested-by: Paco Avelar <pacoavelar@hotmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250213222455.93533-5-uejji@uejji.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm: panel-orientation-quirks: Add quirk for AYA NEO SlideAndrew Wyatt1-0/+6
[ Upstream commit 132c89ef8872e602cfb909377815111d121fe8d7 ] The AYANEO Slide uses a 1080x1920 portrait LCD panel. This is the same panel used on the AYANEO Air Plus, but the DMI data is too different to match both with one entry. Add a DMI match to correctly rotate the panel on the AYANEO Slide. This also covers the Antec Core HS, which is a rebranded AYANEO Slide with the exact same hardware and DMI strings. Signed-off-by: Andrew Wyatt <fewtarius@steamfork.org> Signed-off-by: John Edwards <uejji@uejji.net> Tested-by: John Edwards <uejji@uejji.net> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250213222455.93533-4-uejji@uejji.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm: panel-orientation-quirks: Add quirks for AYA NEO Flip DS and KBAndrew Wyatt1-0/+18
[ Upstream commit 529741c331da1fbf54f86c6ec3a4558b9b0b16dc ] The AYA NEO Flip DS and KB both use a 1080x1920 portrait LCD panel. The Flip DS additionally uses a 640x960 portrait LCD panel as a second display. Add DMI matches to correctly rotate these panels. Signed-off-by: Andrew Wyatt <fewtarius@steamfork.org> Co-developed-by: John Edwards <uejji@uejji.net> Signed-off-by: John Edwards <uejji@uejji.net> Tested-by: Paco Avelar <pacoavelar@hotmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250213222455.93533-3-uejji@uejji.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm: panel-orientation-quirks: Add support for AYANEO 2SAndrew Wyatt1-2/+2
[ Upstream commit eb8f1e3e8ee10cff591d4a47437dfd34d850d454 ] AYANEO 2S uses the same panel and orientation as the AYANEO 2. Update the AYANEO 2 DMI match to also match AYANEO 2S. Signed-off-by: Andrew Wyatt <fewtarius@steamfork.org> Signed-off-by: John Edwards <uejji@uejji.net> Tested-by: John Edwards <uejji@uejji.net> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250213222455.93533-2-uejji@uejji.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amdgpu: Unlocked unmap only clear page table leavesPhilip Yang3-38/+13
[ Upstream commit 23b645231eeffdaf44021debac881d2f26824150 ] SVM migration unmap pages from GPU and then update mapping to GPU to recover page fault. Currently unmap clears the PDE entry for range length >= huge page and free PTB bo, update mapping to alloc new PT bo. There is race bug that the freed entry bo maybe still on the pt_free list, reused when updating mapping and then freed, leave invalid PDE entry and cause GPU page fault. By setting the update to clear only one PDE entry or clear PTB, to avoid unmap to free PTE bo. This fixes the race bug and improve the unmap and map to GPU performance. Update mapping to huge page will still free the PTB bo. With this change, the vm->pt_freed list and work is not needed. Add WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the PTB. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amd/display: add workaround flag to link to force FFE presetBrendan Tam2-0/+4
[ Upstream commit 51d1b338541dea83fec8e6f95d3e46fa469a73a8 ] [Why] There have been instances of some monitors being unable to link train on their reported link speed using their selected FFE preset. If a different FFE preset is found that has a higher rate of success during link training this workaround can be used to force its FFE preset. [How] A new link workaround flag is made called force_dp_ffe_preset. The flag is checked in override_training_settings and will set lt_settings->ffe_preset which is null if the flag is not set. The flag is then set in override_lane_settings. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Brendan Tam <Brendan.Tam@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/amd/display: Update Cursor request mode to the beginning prefetch alwaysZhikai Zhai2-14/+10
[ Upstream commit 4a4077b4b63a8404efd6d37fc2926f03fb25bace ] [Why] The double buffer cursor registers is updated by the cursor vupdate event. There is a gap between vupdate and cursor data fetch if cursor fetch data reletive to cursor position. Cursor corruption will happen if we update the cursor surface in this gap. [How] Modify the cursor request mode to the beginning prefetch always and avoid wraparound calculation issues. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Zhikai Zhai <zhikai.zhai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/xe/vf: Don't try to trigger a full GT reset if VFMichal Wajdeczko3-0/+21
[ Upstream commit 459777724d306315070d24608fcd89aea85516d6 ] VFs don't have access to the GDRST(0x941c) register that driver uses to reset a GT. Attempt to trigger a reset using debugfs: $ cat /sys/kernel/debug/dri/0000:00:02.1/gt0/force_reset or due to a hang condition detected by the driver leads to: [ ] xe 0000:00:02.1: [drm] GT0: trying reset from force_reset [xe] [ ] xe 0000:00:02.1: [drm] GT0: reset queued [ ] xe 0000:00:02.1: [drm] GT0: reset started [ ] ------------[ cut here ]------------ [ ] xe 0000:00:02.1: [drm] GT0: VF is trying to write 0x1 to an inaccessible register 0x941c+0x0 [ ] WARNING: CPU: 3 PID: 3069 at drivers/gpu/drm/xe/xe_gt_sriov_vf.c:996 xe_gt_sriov_vf_write32+0xc6/0x580 [xe] [ ] RIP: 0010:xe_gt_sriov_vf_write32+0xc6/0x580 [xe] [ ] Call Trace: [ ] <TASK> [ ] ? show_regs+0x6c/0x80 [ ] ? __warn+0x93/0x1c0 [ ] ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe] [ ] ? report_bug+0x182/0x1b0 [ ] ? handle_bug+0x6e/0xb0 [ ] ? exc_invalid_op+0x18/0x80 [ ] ? asm_exc_invalid_op+0x1b/0x20 [ ] ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe] [ ] ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe] [ ] ? xe_gt_tlb_invalidation_reset+0xef/0x110 [xe] [ ] ? __mutex_unlock_slowpath+0x41/0x2e0 [ ] xe_mmio_write32+0x64/0x150 [xe] [ ] do_gt_reset+0x2f/0xa0 [xe] [ ] gt_reset_worker+0x14e/0x1e0 [xe] [ ] process_one_work+0x21c/0x740 [ ] worker_thread+0x1db/0x3c0 Fix that by sending H2G VF_RESET(0x5507) action instead. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4078 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250131182502.852-1-michal.wajdeczko@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/xe/pf: Don't send BEGIN_ID if VF has no context/doorbellsMichal Wajdeczko1-2/+2
[ Upstream commit 21ccac0e22aaf27b767f9de4bf573e7c47f619c8 ] It turned out that GuC validates VF configuration immediately after receiving "some" set of configuration KLVs and complains if one of the critical, from GuC understanding, resource is left unprovisioned, even if PF should be still allowed to make late VF config adjustments, since VF was not yet started. This issue was discovered after we decided to asynchronously re-send configuration KLVs after GT reset/resume, as then fair VF auto-provisioning could already allocate some of the resources, which was a prerequiste for sending those config KLVs: # fair GGTT provisioning [] xe 0000:00:02.0: [drm] GT0: PF: pushed VF1 config with 2 KLVs: [] xe 0000:00:02.0: [drm] GT0: { key 0x0001 : 64b value 0x176a000 } # ggtt_start [] xe 0000:00:02.0: [drm] GT0: { key 0x0002 : 64b value 0xfd696000 } # ggtt_size [] xe 0000:00:02.0: [drm] GT0: PF: VF1 provisioned with 4251541504 (3.96 GiB) GGTT # re-provisioning worker [] xe 0000:00:02.0: [drm] *ERROR* GT0: H2G request 0x5503 failed: error 0x60 hint 0x0 [] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 14 config KLVs (-EIO) [] xe 0000:00:02.0: [drm] GT0: { key 0x0001 : 64b value 0x176a000 } # ggtt_start [] xe 0000:00:02.0: [drm] GT0: { key 0x0002 : 64b value 0xfd696000 } # ggtt_size [] xe 0000:00:02.0: [drm] GT0: { key 0x8a0b : 32b value 0 } # begin_ctx_id [] xe 0000:00:02.0: [drm] GT0: { key 0x0004 : 32b value 0 } # num_contexts [] xe 0000:00:02.0: [drm] GT0: { key 0x8a0a : 32b value 0 } # begin_db_id [] xe 0000:00:02.0: [drm] GT0: { key 0x0006 : 32b value 0 } # num_doorbells [] xe 0000:00:02.0: [drm] GT0: { key 0x8a01 : 32b value 0 } # exec_quantum [] xe 0000:00:02.0: [drm] GT0: { key 0x8a02 : 32b value 0 } # preempt_timeout [] xe 0000:00:02.0: [drm] GT0: { key 0x8a03 : 32b value 0 } # cat_error_count [] xe 0000:00:02.0: [drm] GT0: { key 0x8a04 : 32b value 0 } # engine_reset_count [] xe 0000:00:02.0: [drm] GT0: { key 0x8a05 : 32b value 0 } # page_fault_count [] xe 0000:00:02.0: [drm] GT0: { key 0x8a06 : 32b value 0 } # guc_time_us [] xe 0000:00:02.0: [drm] GT0: { key 0x8a07 : 32b value 0 } # irq_time_us [] xe 0000:00:02.0: [drm] GT0: { key 0x8a08 : 32b value 0 } # doorbell_time_us [] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-EIO) To avoid such errors stop sending BEGIN_CONTEXT/DOORBELL_ID KLVs if no GuC context/doorbell IDs were provisioned to VF. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4176 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129195947.764-2-michal.wajdeczko@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm: allow encoder mode_set even when connectors change for crtcAbhinav Kumar1-1/+1
[ Upstream commit 7e182cb4f5567f53417b762ec0d679f0b6f0039d ] In certain use-cases, a CRTC could switch between two encoders and because the mode being programmed on the CRTC remains the same during this switch, the CRTC's mode_changed remains false. In such cases, the encoder's mode_set also gets skipped. Skipping mode_set on the encoder for such cases could cause an issue because even though the same CRTC mode was being used, the encoder type could have changed like the CRTC could have switched from a real time encoder to a writeback encoder OR vice-versa. Allow encoder's mode_set to happen even when connectors changed on a CRTC and not just when the mode changed. Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Signed-off-by: Jessica Zhang <quic_jesszhan@quicinc.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241211-abhinavk-modeset-fix-v3-1-0de4bf3e7c32@quicinc.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/i915/huc: Fix fence not released on early probe errorsJanusz Krzysztofik3-6/+7
[ Upstream commit e3ea2eae70692a455e256787e4f54153fb739b90 ] HuC delayed loading fence, introduced with commit 27536e03271da ("drm/i915/huc: track delayed HuC load with a fence"), is registered with object tracker early on driver probe but unregistered only from driver remove, which is not called on early probe errors. Since its memory is allocated under devres, then released anyway, it may happen to be allocated again to the fence and reused on future driver probes, resulting in kernel warnings that taint the kernel: <4> [309.731371] ------------[ cut here ]------------ <3> [309.731373] ODEBUG: init destroyed (active state 0) object: ffff88813d7dd2e0 object type: i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x20 [i915] <4> [309.731575] WARNING: CPU: 2 PID: 3161 at lib/debugobjects.c:612 debug_print_object+0x93/0xf0 ... <4> [309.731693] CPU: 2 UID: 0 PID: 3161 Comm: i915_module_loa Tainted: G U 6.14.0-CI_DRM_16362-gf0fd77956987+ #1 ... <4> [309.731700] RIP: 0010:debug_print_object+0x93/0xf0 ... <4> [309.731728] Call Trace: <4> [309.731730] <TASK> ... <4> [309.731949] __debug_object_init+0x17b/0x1c0 <4> [309.731957] debug_object_init+0x34/0x50 <4> [309.732126] __i915_sw_fence_init+0x34/0x60 [i915] <4> [309.732256] intel_huc_init_early+0x4b/0x1d0 [i915] <4> [309.732468] intel_uc_init_early+0x61/0x680 [i915] <4> [309.732667] intel_gt_common_init_early+0x105/0x130 [i915] <4> [309.732804] intel_root_gt_init_early+0x63/0x80 [i915] <4> [309.732938] i915_driver_probe+0x1fa/0xeb0 [i915] <4> [309.733075] i915_pci_probe+0xe6/0x220 [i915] <4> [309.733198] local_pci_probe+0x44/0xb0 <4> [309.733203] pci_device_probe+0xf4/0x270 <4> [309.733209] really_probe+0xee/0x3c0 <4> [309.733215] __driver_probe_device+0x8c/0x180 <4> [309.733219] driver_probe_device+0x24/0xd0 <4> [309.733223] __driver_attach+0x10f/0x220 <4> [309.733230] bus_for_each_dev+0x7d/0xe0 <4> [309.733236] driver_attach+0x1e/0x30 <4> [309.733239] bus_add_driver+0x151/0x290 <4> [309.733244] driver_register+0x5e/0x130 <4> [309.733247] __pci_register_driver+0x7d/0x90 <4> [309.733251] i915_pci_register_driver+0x23/0x30 [i915] <4> [309.733413] i915_init+0x34/0x120 [i915] <4> [309.733655] do_one_initcall+0x62/0x3f0 <4> [309.733667] do_init_module+0x97/0x2a0 <4> [309.733671] load_module+0x25ff/0x2890 <4> [309.733688] init_module_from_file+0x97/0xe0 <4> [309.733701] idempotent_init_module+0x118/0x330 <4> [309.733711] __x64_sys_finit_module+0x77/0x100 <4> [309.733715] x64_sys_call+0x1f37/0x2650 <4> [309.733719] do_syscall_64+0x91/0x180 <4> [309.733763] entry_SYSCALL_64_after_hwframe+0x76/0x7e <4> [309.733792] </TASK> ... <4> [309.733806] ---[ end trace 0000000000000000 ]--- That scenario is most easily reproducible with igt@i915_module_load@reload-with-fault-injection. Fix the issue by moving the cleanup step to driver release path. Fixes: 27536e03271da ("drm/i915/huc: track delayed HuC load with a fence") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13592 Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250402172057.209924-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 795dbde92fe5c6996a02a5b579481de73035e7bf) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/tests: probe-helper: Fix drm_display_mode memory leakMaxime Ripard1-1/+7
[ Upstream commit 8b6f2e28431b2f9f84073bff50353aeaf25559d0 ] drm_analog_tv_mode() and its variants return a drm_display_mode that needs to be destroyed later one. The drm_test_connector_helper_tv_get_modes_check() test never does however, which leads to a memory leak. Let's make sure it's freed. Reported-by: Philipp Stanner <phasta@mailbox.org> Closes: https://lore.kernel.org/dri-devel/a7655158a6367ac46194d57f4b7433ef0772a73e.camel@mailbox.org/ Fixes: 1e4a91db109f ("drm/probe-helper: Provide a TV get_modes helper") Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250408-drm-kunit-drm-display-mode-memleak-v1-7-996305a2e75a@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/tests: modes: Fix drm_display_mode memory leakMaxime Ripard1-0/+22
[ Upstream commit d34146340f95cd9bf06d4ce71cca72127dc0b7cd ] drm_analog_tv_mode() and its variants return a drm_display_mode that needs to be destroyed later one. The drm_modes_analog_tv tests never do however, which leads to a memory leak. Let's make sure it's freed. Reported-by: Philipp Stanner <phasta@mailbox.org> Closes: https://lore.kernel.org/dri-devel/a7655158a6367ac46194d57f4b7433ef0772a73e.camel@mailbox.org/ Fixes: 4fcd238560ee ("drm/modes: Add a function to generate analog display modes") Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250408-drm-kunit-drm-display-mode-memleak-v1-5-996305a2e75a@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/tests: cmdline: Fix drm_display_mode memory leakMaxime Ripard1-1/+9
[ Upstream commit 70f29ca3117a8796cd6bde7612a3ded96d0f2dde ] drm_analog_tv_mode() and its variants return a drm_display_mode that needs to be destroyed later one. The drm_test_cmdline_tv_options() test never does however, which leads to a memory leak. Let's make sure it's freed. Reported-by: Philipp Stanner <phasta@mailbox.org> Closes: https://lore.kernel.org/dri-devel/a7655158a6367ac46194d57f4b7433ef0772a73e.camel@mailbox.org/ Fixes: e691c9992ae1 ("drm/modes: Introduce the tv_mode property as a command-line option") Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250408-drm-kunit-drm-display-mode-memleak-v1-4-996305a2e75a@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/tests: helpers: Create kunit helper to destroy a drm_display_modeMaxime Ripard1-0/+22
[ Upstream commit 13c1d5f3a7fa7b55a26e73bb9e95342374a489b2 ] A number of test suites call functions that expect the returned drm_display_mode to be destroyed eventually. However, none of the tests called drm_mode_destroy, which results in a memory leak. Since drm_mode_destroy takes two pointers as argument, we can't use a kunit wrapper. Let's just create a helper every test suite can use. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250408-drm-kunit-drm-display-mode-memleak-v1-1-996305a2e75a@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org> Stable-dep-of: 70f29ca3117a ("drm/tests: cmdline: Fix drm_display_mode memory leak") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/tests: modeset: Fix drm_display_mode memory leakMaxime Ripard1-0/+3
[ Upstream commit dacafdcc7789cfeb0f0552716db56f210238225d ] drm_mode_find_dmt() returns a drm_display_mode that needs to be destroyed later one. The drm_test_pick_cmdline_res_1920_1080_60() test never does however, which leads to a memory leak. Let's make sure it's freed. Reported-by: Philipp Stanner <phasta@mailbox.org> Closes: https://lore.kernel.org/dri-devel/a7655158a6367ac46194d57f4b7433ef0772a73e.camel@mailbox.org/ Fixes: 8fc0380f6ba7 ("drm/client: Add some tests for drm_connector_pick_cmdline_mode()") Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250408-drm-kunit-drm-display-mode-memleak-v1-2-996305a2e75a@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/xe: Restore EIO errno return when GuC PC start failsRodrigo Vivi1-0/+1
[ Upstream commit 88ecb66b9956a14577d513a6c8c28bb2e7989703 ] Commit b4b05e53b550 ("drm/xe/guc_pc: Retry and wait longer for GuC PC start"), leads to the following Smatch static checker warning: drivers/gpu/drm/xe/xe_guc_pc.c:1073 xe_guc_pc_start() warn: missing error code here? '_dev_err()' failed. 'ret' = '0' Fixes: c605acb53f44 ("drm/xe/guc_pc: Retry and wait longer for GuC PC start") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/intel-xe/1454a5f1-ee18-4df1-a6b2-a4a3dddcd1cb@stanley.mountain/ Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250328181752.26677-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 3f2bdccbccdcb53b0d316474eafff2e3462a51ad) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/xe/hw_engine: define sysfs_ops on all directoriesTejas Upadhyay1-56/+52
[ Upstream commit a5c71fd5b69b9da77e5e0b268e69e256932ba49c ] Sysfs_ops needs to be defined on all directories which can have attr files with set/get method. Add sysfs_ops to even those directories which is currently empty but would have attr files with set/get method in future. Leave .default with default sysfs_ops as it will never have setter method. V2(Himal/Rodrigo): - use single sysfs_ops for all dir and attr with set/get - add default ops as ./default does not need runtime pm at all Fixes: 3f0e14651ab0 ("drm/xe: Runtime PM wake on every sysfs call") Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250327122647.886637-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit 40780b9760b561e093508d07b8b9b06c94ab201e) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20drm/i915: Disable RPG during live selftestBadal Nilawar2-15/+22
[ Upstream commit 9d3d9776bd3bd9c32d460dfe6c3363134de578bc ] The Forcewake timeout issue has been observed on Gen 12.0 and above. To address this, disable Render Power-Gating (RPG) during live self-tests for these generations. The temporary workaround 'drm/i915/mtl: do not enable render power-gating on MTL' disables RPG globally, which is unnecessary since the issues were only seen during self-tests. v2: take runtime pm wakeref Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/9413 Fixes: 25e7976db86b ("drm/i915/mtl: do not enable render power-gating on MTL") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250310152821.2931678-1-sk.anirban@intel.com (cherry picked from commit 0a4ae87706c6d15d14648e428c3a76351f823e48) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10Remove unnecessary firmware version check for gc v9_4_2Candice Li1-0/+1
commit 5b3c08ae9ed324743f5f7286940d45caeb656e6e upstream. GC v9_4_2 uses a new versioning scheme for CP firmware, making the warning ("CP firmware version too old, please update!") irrelevant. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10drm/amdgpu/gfx12: fix num_mecAlex Deucher1-1/+1
[ Upstream commit dce8bd9137b88735dd0efc4e2693213d98c15913 ] GC12 only has 1 mec. Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/amdgpu/gfx11: fix num_mecAlex Deucher1-1/+1
[ Upstream commit 4161050d47e1b083a7e1b0b875c9907e1a6f1f1f ] GC11 only has 1 mec. Fixes: 3d879e81f0f9 ("drm/amdgpu: add init support for GFX11 (v2)") Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/xe: Fix unmet direct dependencies warningYue Haibing1-1/+1
[ Upstream commit 5e66cf6edddb5f6237e3afb07475ace57ecb56bc ] WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n] Selected by [m]: - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=m] && DRM_XE [=m] && DRM_XE [=m]=m [=m] && HAS_IOPORT [=y] DRM_XE_DISPLAY requires FB_IOMEM_HELPERS, but the dependency FB_CORE is missing, selecting FB_IOMEM_HELPERS if DRM_FBDEV_EMULATION is set as other drm drivers. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250323114103.1960511-1-yuehaibing@huawei.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 689582882802cd64986c1eb584c9f5184d67f0cf) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/amd/display: Fix incorrect fw_state address in dmub_srvLo-an Chen1-1/+1
[ Upstream commit d60073294cc3b46b73d6de247e0e5ae8684a6241 ] [WHY] The fw_state in dmub_srv was assigned with wrong address. The address was pointed to the firmware region. [HOW] Fix the firmware state by using DMUB_DEBUG_FW_STATE_OFFSET in dmub_cmd.h. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Lo-an Chen <lo-an.chen@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f57b38ac85a01bf03020cc0a9761d63e5c0ce197) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/amd: Keep display off while going into S4Mario Limonciello2-2/+14
[ Upstream commit 4afacc9948e1f8fdbca401d259ae65ad93d298c0 ] When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 68bfdc8dc0a1a7fdd9ab61e69907ae71a6fd3d91) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/xe/guc_pc: Retry and wait longer for GuC PC startRodrigo Vivi1-13/+40
[ Upstream commit c605acb53f449f6289f042790307d7dc9e62d03d ] In a rare situation of thermal limit during resume, GuC can be slow and run into delays like this: xe 0000:00:02.0: [drm] GT1: excessive init time: 667ms! \ [status = 0x8002F034, timeouts = 0] xe 0000:00:02.0: [drm] GT1: excessive init time: \ [freq = 100MHz (req = 800MHz), before = 100MHz, \ perf_limit_reasons = 0x1C001000] xe 0000:00:02.0: [drm] *ERROR* GT1: GuC PC Start failed ------------[ cut here ]------------ xe 0000:00:02.0: [drm] GT1: Failed to start GuC PC: -EIO When this happens, it will block entirely the GPU to be used. So, let's try and with a huge timeout in the hope it comes back. Also, let's collect some information on how long it is usually taking on situations like this, so perhaps the time can be tuned later. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307160307.1093391-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit b4b05e53b550a886b4754b87fd0dd2b304579e85) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/amd/display: avoid NPD when ASIC does not support DMUBThadeu Lima de Souza Cascardo1-0/+4
[ Upstream commit 42d9d7bed270247f134190ba0cb05bbd072f58c2 ] ctx->dmub_srv will de NULL if the ASIC does not support DMUB, which is tested in dm_dmub_sw_init. However, it will be dereferenced in dmub_hw_lock_mgr_cmd if should_use_dmub_lock returns true. This has been the case since dmub support has been added for PSR1. Fix this by checking for dmub_srv in should_use_dmub_lock. [ 37.440832] BUG: kernel NULL pointer dereference, address: 0000000000000058 [ 37.447808] #PF: supervisor read access in kernel mode [ 37.452959] #PF: error_code(0x0000) - not-present page [ 37.458112] PGD 0 P4D 0 [ 37.460662] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ 37.465553] CPU: 2 UID: 1000 PID: 1745 Comm: DrmThread Not tainted 6.14.0-rc1-00003-gd62e938120f0 #23 99720e1cb1e0fc4773b8513150932a07de3c6e88 [ 37.478324] Hardware name: Google Morphius/Morphius, BIOS Google_Morphius.13434.858.0 10/26/2023 [ 37.487103] RIP: 0010:dmub_hw_lock_mgr_cmd+0x77/0xb0 [ 37.492074] Code: 44 24 0e 00 00 00 00 48 c7 04 24 45 00 00 0c 40 88 74 24 0d 0f b6 02 88 44 24 0c 8b 01 89 44 24 08 85 f6 75 05 c6 44 24 0e 01 <48> 8b 7f 58 48 89 e6 ba 01 00 00 00 e8 08 3c 2a 00 65 48 8b 04 5 [ 37.510822] RSP: 0018:ffff969442853300 EFLAGS: 00010202 [ 37.516052] RAX: 0000000000000000 RBX: ffff92db03000000 RCX: ffff969442853358 [ 37.523185] RDX: ffff969442853368 RSI: 0000000000000001 RDI: 0000000000000000 [ 37.530322] RBP: 0000000000000001 R08: 00000000000004a7 R09: 00000000000004a5 [ 37.537453] R10: 0000000000000476 R11: 0000000000000062 R12: ffff92db0ade8000 [ 37.544589] R13: ffff92da01180ae0 R14: ffff92da011802a8 R15: ffff92db03000000 [ 37.551725] FS: 0000784a9cdfc6c0(0000) GS:ffff92db2af00000(0000) knlGS:0000000000000000 [ 37.559814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.565562] CR2: 0000000000000058 CR3: 0000000112b1c000 CR4: 00000000003506f0 [ 37.572697] Call Trace: [ 37.575152] <TASK> [ 37.577258] ? __die_body+0x66/0xb0 [ 37.580756] ? page_fault_oops+0x3e7/0x4a0 [ 37.584861] ? exc_page_fault+0x3e/0xe0 [ 37.588706] ? exc_page_fault+0x5c/0xe0 [ 37.592550] ? asm_exc_page_fault+0x22/0x30 [ 37.596742] ? dmub_hw_lock_mgr_cmd+0x77/0xb0 [ 37.601107] dcn10_cursor_lock+0x1e1/0x240 [ 37.605211] program_cursor_attributes+0x81/0x190 [ 37.609923] commit_planes_for_stream+0x998/0x1ef0 [ 37.614722] update_planes_and_stream_v2+0x41e/0x5c0 [ 37.619703] dc_update_planes_and_stream+0x78/0x140 [ 37.624588] amdgpu_dm_atomic_commit_tail+0x4362/0x49f0 [ 37.629832] ? srso_return_thunk+0x5/0x5f [ 37.633847] ? mark_held_locks+0x6d/0xd0 [ 37.637774] ? _raw_spin_unlock_irq+0x24/0x50 [ 37.642135] ? srso_return_thunk+0x5/0x5f [ 37.646148] ? lockdep_hardirqs_on+0x95/0x150 [ 37.650510] ? srso_return_thunk+0x5/0x5f [ 37.654522] ? _raw_spin_unlock_irq+0x2f/0x50 [ 37.658883] ? srso_return_thunk+0x5/0x5f [ 37.662897] ? wait_for_common+0x186/0x1c0 [ 37.666998] ? srso_return_thunk+0x5/0x5f [ 37.671009] ? drm_crtc_next_vblank_start+0xc3/0x170 [ 37.675983] commit_tail+0xf5/0x1c0 [ 37.679478] drm_atomic_helper_commit+0x2a2/0x2b0 [ 37.684186] drm_atomic_commit+0xd6/0x100 [ 37.688199] ? __cfi___drm_printfn_info+0x10/0x10 [ 37.692911] drm_atomic_helper_update_plane+0xe5/0x130 [ 37.698054] drm_mode_cursor_common+0x501/0x670 [ 37.702600] ? __cfi_drm_mode_cursor_ioctl+0x10/0x10 [ 37.707572] drm_mode_cursor_ioctl+0x48/0x70 [ 37.711851] drm_ioctl_kernel+0xf2/0x150 [ 37.715781] drm_ioctl+0x363/0x590 [ 37.719189] ? __cfi_drm_mode_cursor_ioctl+0x10/0x10 [ 37.724165] amdgpu_drm_ioctl+0x41/0x80 [ 37.728013] __se_sys_ioctl+0x7f/0xd0 [ 37.731685] do_syscall_64+0x87/0x100 [ 37.735355] ? vma_end_read+0x12/0xe0 [ 37.739024] ? srso_return_thunk+0x5/0x5f [ 37.743041] ? find_held_lock+0x47/0xf0 [ 37.746884] ? vma_end_read+0x12/0xe0 [ 37.750552] ? srso_return_thunk+0x5/0x5f [ 37.754565] ? lock_release+0x1c4/0x2e0 [ 37.758406] ? vma_end_read+0x12/0xe0 [ 37.762079] ? exc_page_fault+0x84/0xe0 [ 37.765921] ? srso_return_thunk+0x5/0x5f [ 37.769938] ? lockdep_hardirqs_on+0x95/0x150 [ 37.774303] ? srso_return_thunk+0x5/0x5f [ 37.778317] ? exc_page_fault+0x84/0xe0 [ 37.782163] entry_SYSCALL_64_after_hwframe+0x55/0x5d [ 37.787218] RIP: 0033:0x784aa5ec3059 [ 37.790803] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1d 48 8b 45 c8 64 48 2b 04 25 28 00 0 [ 37.809553] RSP: 002b:0000784a9cdf90e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 37.817121] RAX: ffffffffffffffda RBX: 0000784a9cdf917c RCX: 0000784aa5ec3059 [ 37.824256] RDX: 0000784a9cdf917c RSI: 00000000c01c64a3 RDI: 0000000000000020 [ 37.831391] RBP: 0000784a9cdf9130 R08: 0000000000000100 R09: 0000000000ff0000 [ 37.838525] R10: 0000000000000000 R11: 0000000000000246 R12: 0000025c01606ed0 [ 37.845657] R13: 0000025c00030200 R14: 00000000c01c64a3 R15: 0000000000000020 [ 37.852799] </TASK> [ 37.854992] Modules linked in: [ 37.864546] gsmi: Log Shutdown Reason 0x03 [ 37.868656] CR2: 0000000000000058 [ 37.871979] ---[ end trace 0000000000000000 ]--- [ 37.880976] RIP: 0010:dmub_hw_lock_mgr_cmd+0x77/0xb0 [ 37.885954] Code: 44 24 0e 00 00 00 00 48 c7 04 24 45 00 00 0c 40 88 74 24 0d 0f b6 02 88 44 24 0c 8b 01 89 44 24 08 85 f6 75 05 c6 44 24 0e 01 <48> 8b 7f 58 48 89 e6 ba 01 00 00 00 e8 08 3c 2a 00 65 48 8b 04 5 [ 37.904703] RSP: 0018:ffff969442853300 EFLAGS: 00010202 [ 37.909933] RAX: 0000000000000000 RBX: ffff92db03000000 RCX: ffff969442853358 [ 37.917068] RDX: ffff969442853368 RSI: 0000000000000001 RDI: 0000000000000000 [ 37.924201] RBP: 0000000000000001 R08: 00000000000004a7 R09: 00000000000004a5 [ 37.931336] R10: 0000000000000476 R11: 0000000000000062 R12: ffff92db0ade8000 [ 37.938469] R13: ffff92da01180ae0 R14: ffff92da011802a8 R15: ffff92db03000000 [ 37.945602] FS: 0000784a9cdfc6c0(0000) GS:ffff92db2af00000(0000) knlGS:0000000000000000 [ 37.953689] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.959435] CR2: 0000000000000058 CR3: 0000000112b1c000 CR4: 00000000003506f0 [ 37.966570] Kernel panic - not syncing: Fatal exception [ 37.971901] Kernel Offset: 0x30200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 37.982840] gsmi: Log Shutdown Reason 0x02 Fixes: b5c764d6ed55 ("drm/amd/display: Use HW lock mgr for PSR1") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Daniel Wheeler <daniel.wheeler@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/mediatek: dsi: fix error codes in mtk_dsi_host_transfer()Dan Carpenter1-3/+3
[ Upstream commit dcb166ee43c3d594e7b73a24f6e8cf5663eeff2c ] There is a type bug because the return statement: return ret < 0 ? ret : recv_cnt; The issue is that ret is an int, recv_cnt is a u32 and the function returns ssize_t, which is a signed long. The way that the type promotion works is that the negative error codes are first cast to u32 and then to signed long. The error codes end up being positive instead of negative and the callers treat them as success. Fixes: 81cc7e51c4f1 ("drm/mediatek: Allow commands to be sent during video mode") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202412210801.iADw0oIH-lkp@intel.com/ Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/b754a408-4f39-4e37-b52d-7706c132e27f@stanley.mountain/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/mediatek: dp: drm_err => dev_err in HPD path to avoid NULL ptrDouglas Anderson1-3/+3
[ Upstream commit 106a6de46cf4887d535018185ec528ce822d6d84 ] The function mtk_dp_wait_hpd_asserted() may be called before the `mtk_dp->drm_dev` pointer is assigned in mtk_dp_bridge_attach(). Specifically it can be called via this callpath: - mtk_edp_wait_hpd_asserted - [panel probe] - dp_aux_ep_probe Using "drm" level prints anywhere in this callpath causes a NULL pointer dereference. Change the error message directly in mtk_dp_wait_hpd_asserted() to dev_err() to avoid this. Also change the error messages in mtk_dp_parse_capabilities(), which is called by mtk_dp_wait_hpd_asserted(). While touching these prints, also add the error code to them to make future debugging easier. Fixes: 7eacba9a083b ("drm/mediatek: dp: Add .wait_hpd_asserted() for AUX bus") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250116094249.1.I29b0b621abb613ddc70ab4996426a3909e1aa75f@changeid/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/mediatek: Fix config_updating flag never false when no mbox channelJason-JH Lin1-2/+5
[ Upstream commit 4ba973c8bad04d59fd4efa62512f4d9cee131714 ] When CONFIG_MTK_CMDQ is enabled, if the display is controlled by the CPU while other hardware is controlled by the GCE, the display will encounter a mbox request channel failure. However, it will still enter the CONFIG_MTK_CMDQ statement, causing the config_updating flag to never be set to false. As a result, no page flip event is sent back to user space, and the screen does not update. Fixes: da03801ad08f ("drm/mediatek: Move mtk_crtc_finish_page_flip() to ddp_cmdq_cb()") Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250224051301.3538484-1-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/msm/a6xx: Fix a6xx indexed-regs in devcoreduumpRob Clark1-0/+2
[ Upstream commit 06dd5d86c6aef1c7609ca3a5ffa4097e475e2213 ] Somehow, possibly as a result of rebase gone badly, setting nr_indexed_regs for pre-a650 a6xx devices lost the setting of nr_indexed_regs, resulting in values getting snapshot, but omitted from the devcoredump. Fixes: e997ae5f45ca ("drm/msm/a6xx: Mostly implement A7xx gpu_state") Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/640289/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/amd/display: fix type mismatch in CalculateDynamicMetadataParameters()Vitaliy Shevtsov1-6/+6
[ Upstream commit c3c584c18c90a024a54716229809ba36424f9660 ] There is a type mismatch between what CalculateDynamicMetadataParameters() takes and what is passed to it. Currently this function accepts several args as signed long but it's called with unsigned integers and integer. On some systems where long is 32 bits and one of these unsigned int params is greater than INT_MAX it may cause passing input params as negative values. Fix this by changing these argument types from long to unsigned int and to int respectively. Also this will align the function's definition with similar functions in other dcn* drivers. Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: 6725a88f88a7 ("drm/amd/display: Add DCN3 DML") Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/panthor: Clean up FW version information displaySteven Price1-5/+4
[ Upstream commit 3b87886bfb038de2c62e627079472ba612e89410 ] Assigning a string to an array which is too small to include the NUL byte at the end causes a warning on some compilers. But this function also has some other oddities like the 'header' array which is only ever used within sizeof(). Tidy up the function by removing the 'header' array, allow the NUL byte to be present in git_sha_header, and calculate the length directly from git_sha_header. Reported-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/all/20250213154237.GA11897@willie-the-truck/ Fixes: 9d443deb0441 ("drm/panthor: Display FW version information") Signed-off-by: Steven Price <steven.price@arm.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213161248.1642392-1-steven.price@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/panthor: Update CS_STATUS_ defines to correct valuesAshley Smith1-3/+3
[ Upstream commit c82734fbdc50dc9e568e8686622eaa4498acb81e ] Values for SC_STATUS_BLOCKED_REASON_ are documented in the G610 "Odin" GPU specification (CS_STATUS_BLOCKED_REASON register). This change updates the defines to the correct values. Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block") Signed-off-by: Ashley Smith <ashley.smith@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250303180444.3768993-1-ashley.smith@collabora.com Signed-off-by: Sasha Levin <sashal@kernel.org>