summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2024-12-09Revert "drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs"Ashutosh Dixit1-14/+0
commit 0191fddf53748cf2b473d78faeabe6dcb47689d2 upstream. This reverts commit 55858fa7eb2f163f7aa34339fd3399ba4ff564c6. '55858fa7eb2f ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs")' was not properly reviewed and also causes dmesg asserts in CI. Revert it. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295 Fixes: 55858fa7eb2f ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029200147.1476513-1-ashutosh.dixit@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd/display: Remove PIPE_DTO_SRC_SEL programming from set_dtbclk_dtoOvidiu Bunea1-6/+9
commit a3e6079bd93d5c66a43bf6a5f90e5b98465dc7b3 upstream. There are cases where an OTG is remapped from driving a regular HDMI display to a DP/eDP display. There are also cases where DTBCLK needs to be enabled for HPO, but DTBCLK DTO programming may be done while OTG is still enabled which is dangerous as the PIPE_DTO_SRC_SEL programming may change the pixel clock generator source for a mapped and running OTG and cause it to hang. Remove the PIPE_DTO_SRC_SEL programming from this sequence since it is already done in program_pixel_clk(). Additionally, make sure that program_pixel_clk sets DTBCLK DTO as source for special HDMI cases. Cc: stable@vger.kernel.org # 6.11+ Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd/display: update pipe selection policy to check head pipeYihan Zhu1-1/+22
commit 8fef253c94a5312b9150b2ff8e633b331bac7e88 upstream. [Why] No check on head pipe during the dml to dc hw mapping will allow illegal pipe usage. This will result in a wrong pipe topology to cause mpcc tree totally mess up then cause a display hang. [How] Avoid to use the pipe is head in all check and avoid ODM slice during preferred pipe check. Cc: stable@vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd/display: Fix handling of plane refcountJoshua Aberback1-0/+3
commit 27227a234c1487cb7a684615f0749c455218833a upstream. [Why] The mechanism to backup and restore plane states doesn't maintain refcount, which can cause issues if the refcount of the plane changes in between backup and restore operations, such as memory leaks if the refcount was supposed to go down, or double frees / invalid memory accesses if the refcount was supposed to go up. [How] Cache and re-apply current refcount when restoring plane states. Cc: stable@vger.kernel.org Reviewed-by: Josip Pavic <josip.pavic@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd/pm: Remove arcturus min power limitLijo Lazar1-1/+5
commit da868898cf4c5ddbd1f7406e356edce5d7211eb5 upstream. As per power team, there is no need to impose a lower bound on arcturus power limit. Any unreasonable limit set will result in frequent throttling. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3Kenneth Feng1-3/+23
commit b0df0e777874549c128b43f7bf4989a2ed24b37a upstream. disable pcie speed switching on Intel platform for smu v14.0.2/3 based on Intel's requirement. v2: align the setting with smu v13. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd/pm: update current_socclk and current_uclk in gpu_metrics on smu v13.0.7Umio Yasuno1-0/+2
commit 2abf2f7032df4c4e7f6cf7906da59d0e614897d6 upstream. These were missed before. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3751 Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd: Fix initialization mistake for NBIO 7.11 devicesMario Limonciello1-0/+9
commit 349af06a3abd0bb3787ee2daf3ac508412fe8dcc upstream. There is a strapping issue on NBIO 7.11.x that can lead to spurious PME events while in the D0 state. Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241118174611.10700-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd/pm: skip setting the power source on smu v14.0.2/3Kenneth Feng1-1/+0
commit 76c7f08094767b5df3b60e18d1bdecddd4a5c844 upstream. skip setting power source on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amdgpu: fix usage slab after freeVitaly Prosyak2-4/+4
commit b61badd20b443eabe132314669bb51a263982e5c upstream. [ +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147 [ +0.000023] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1 [ +0.000016] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000016] Call Trace: [ +0.000008] <TASK> [ +0.000009] dump_stack_lvl+0x76/0xa0 [ +0.000017] print_report+0xce/0x5f0 [ +0.000017] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] ? srso_return_thunk+0x5/0x5f [ +0.000015] ? kasan_complete_mode_report_info+0x72/0x200 [ +0.000016] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] kasan_report+0xbe/0x110 [ +0.000015] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000023] __asan_report_load8_noabort+0x14/0x30 [ +0.000014] drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000016] ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? enable_work+0x124/0x220 [ +0.000015] ? __pfx_enable_work+0x10/0x10 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? free_large_kmalloc+0x85/0xf0 [ +0.000016] drm_sched_entity_destroy+0x18/0x30 [gpu_sched] [ +0.000020] amdgpu_vce_sw_fini+0x55/0x170 [amdgpu] [ +0.000735] ? __kasan_check_read+0x11/0x20 [ +0.000016] vce_v4_0_sw_fini+0x80/0x110 [amdgpu] [ +0.000726] amdgpu_device_fini_sw+0x331/0xfc0 [amdgpu] [ +0.000679] ? mutex_unlock+0x80/0xe0 [ +0.000017] ? __pfx_amdgpu_device_fini_sw+0x10/0x10 [amdgpu] [ +0.000662] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? mutex_unlock+0x80/0xe0 [ +0.000016] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000663] drm_minor_release+0xc9/0x140 [drm] [ +0.000081] drm_release+0x1fd/0x390 [drm] [ +0.000082] __fput+0x36c/0xad0 [ +0.000018] __fput_sync+0x3c/0x50 [ +0.000014] __x64_sys_close+0x7d/0xe0 [ +0.000014] x64_sys_call+0x1bc6/0x2680 [ +0.000014] do_syscall_64+0x70/0x130 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit_to_user_mode+0x60/0x190 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit+0x43/0x50 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? exc_page_fault+0x7c/0x110 [ +0.000015] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] RIP: 0033:0x7ffff7b14f67 [ +0.000013] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff [ +0.000026] RSP: 002b:00007fffffffe378 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ +0.000019] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67 [ +0.000014] RDX: 0000000000000000 RSI: 00007ffff7f6f47a RDI: 0000000000000003 [ +0.000014] RBP: 00007fffffffe3a0 R08: 0000555555569890 R09: 0000000000000000 [ +0.000014] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8 [ +0.000013] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040 [ +0.000020] </TASK> [ +0.000016] Allocated by task 383 on cpu 7 at 26.880319s: [ +0.000014] kasan_save_stack+0x28/0x60 [ +0.000008] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_alloc_info+0x38/0x60 [ +0.000007] __kasan_kmalloc+0xc1/0xd0 [ +0.000007] kmalloc_trace_noprof+0x180/0x380 [ +0.000007] drm_sched_init+0x411/0xec0 [gpu_sched] [ +0.000012] amdgpu_device_init+0x695f/0xa610 [amdgpu] [ +0.000658] amdgpu_driver_load_kms+0x1a/0x120 [amdgpu] [ +0.000662] amdgpu_pci_probe+0x361/0xf30 [amdgpu] [ +0.000651] local_pci_probe+0xe7/0x1b0 [ +0.000009] pci_device_probe+0x248/0x890 [ +0.000008] really_probe+0x1fd/0x950 [ +0.000008] __driver_probe_device+0x307/0x410 [ +0.000007] driver_probe_device+0x4e/0x150 [ +0.000007] __driver_attach+0x223/0x510 [ +0.000006] bus_for_each_dev+0x102/0x1a0 [ +0.000007] driver_attach+0x3d/0x60 [ +0.000006] bus_add_driver+0x2ac/0x5f0 [ +0.000006] driver_register+0x13d/0x490 [ +0.000008] __pci_register_driver+0x1ee/0x2b0 [ +0.000007] llc_sap_close+0xb0/0x160 [llc] [ +0.000009] do_one_initcall+0x9c/0x3e0 [ +0.000008] do_init_module+0x241/0x760 [ +0.000008] load_module+0x51ac/0x6c30 [ +0.000006] __do_sys_init_module+0x234/0x270 [ +0.000007] __x64_sys_init_module+0x73/0xc0 [ +0.000006] x64_sys_call+0xe3/0x2680 [ +0.000006] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000015] Freed by task 2147 on cpu 6 at 160.507651s: [ +0.000013] kasan_save_stack+0x28/0x60 [ +0.000007] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_free_info+0x3b/0x60 [ +0.000007] poison_slab_object+0x115/0x1c0 [ +0.000007] __kasan_slab_free+0x34/0x60 [ +0.000007] kfree+0xfa/0x2f0 [ +0.000007] drm_sched_fini+0x19d/0x410 [gpu_sched] [ +0.000012] amdgpu_fence_driver_sw_fini+0xc4/0x2f0 [amdgpu] [ +0.000662] amdgpu_device_fini_sw+0x77/0xfc0 [amdgpu] [ +0.000653] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000655] drm_minor_release+0xc9/0x140 [drm] [ +0.000071] drm_release+0x1fd/0x390 [drm] [ +0.000071] __fput+0x36c/0xad0 [ +0.000008] __fput_sync+0x3c/0x50 [ +0.000007] __x64_sys_close+0x7d/0xe0 [ +0.000007] x64_sys_call+0x1bc6/0x2680 [ +0.000007] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] The buggy address belongs to the object at ffff8881b8605f80 which belongs to the cache kmalloc-64 of size 64 [ +0.000020] The buggy address is located 8 bytes inside of freed 64-byte region [ffff8881b8605f80, ffff8881b8605fc0) [ +0.000028] The buggy address belongs to the physical page: [ +0.000011] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1b8605 [ +0.000008] anon flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: 0xffffefff(slab) [ +0.000009] raw: 0017ffffc0000000 ffff8881000428c0 0000000000000000 dead000000000001 [ +0.000006] raw: 0000000000000000 0000000000200020 00000001ffffefff 0000000000000000 [ +0.000006] page dumped because: kasan: bad access detected [ +0.000012] Memory state around the buggy address: [ +0.000011] ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000015] ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000013] ^ [ +0.000011] ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [ +0.000014] ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb [ +0.000013] ================================================================== The issue reproduced on VG20 during the IGT pci_unplug test. The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill. In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by each entity within the run queue, leading to invalid memory access. To resolve this, the order of cleanup calls is updated: Before: amdgpu_fence_driver_sw_fini amdgpu_device_ip_fini After: amdgpu_device_ip_fini amdgpu_fence_driver_sw_fini This updated order ensures that all entities in the IPs are cleaned up first, followed by proper cleanup of the schedulers. Additional Investigation: During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo buffer must be freed only as the final step in the cleanup process to prevent any premature access during earlier cleanup stages. v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amd: Add some missing straps from NBIO 7.11.0Mario Limonciello2-0/+15
commit 902fbbf429b8213232b18de0ddfd5c0f3851cb8f upstream. Earlier ASICs have strap information exported, and this is missing for NBIO 7.11.0. Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Fixes: ca8c68142ad8 ("drm/amdgpu: add nbio 7.11 registers") Link: https://lore.kernel.org/r/20241118174611.10700-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amdgpu/pm: add gen5 display to the user on smu v14.0.2/3Kenneth Feng4-6/+12
commit 6719ab8234ce4b0c0e9aa93aaa94961e5b2bc852 upstream. add gen5 display to the user on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/amdkfd: Use the correct wptr sizeLijo Lazar1-1/+1
commit cdc6705f98ea3f854a60ba8c9b19228e197ae384 upstream. Write pointer could be 32-bit or 64-bit. Use the correct size during initialization. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/xe/guc_submit: fix race around suspend_pendingMatthew Auld1-2/+15
commit 87651f31ae4e6e6e7e6c7270b9b469405e747407 upstream. Currently in some testcases we can trigger: xe 0000:03:00.0: [drm] Assertion `exec_queue_destroyed(q)` failed! .... WARNING: CPU: 18 PID: 2640 at drivers/gpu/drm/xe/xe_guc_submit.c:1826 xe_guc_sched_done_handler+0xa54/0xef0 [xe] xe 0000:03:00.0: [drm] *ERROR* GT1: DEREGISTER_DONE: Unexpected engine state 0x00a1, guc_id=57 Looking at a snippet of corresponding ftrace for this GuC id we can see: 162.673311: xe_sched_msg_add: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3 162.673317: xe_sched_msg_recv: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3 162.673319: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0 162.674089: xe_exec_queue_kill: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0 162.674108: xe_exec_queue_close: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0 162.674488: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0 162.678452: xe_exec_queue_deregister: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa1, flags=0x0 It looks like we try to suspend the queue (opcode=3), setting suspend_pending and triggering a disable_scheduling. The user then closes the queue. However the close will also forcefully signal the suspend fence after killing the queue, later when the G2H response for disable_scheduling comes back we have now cleared suspend_pending when signalling the suspend fence, so the disable_scheduling now incorrectly tries to also deregister the queue. This leads to warnings since the queue has yet to even be marked for destruction. We also seem to trigger errors later with trying to double unregister the same queue. To fix this tweak the ordering when handling the response to ensure we don't race with a disable_scheduling that didn't actually intend to perform an unregister. The destruction path should now also correctly wait for any pending_disable before marking as destroyed. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3371 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241122161914.321263-6-matthew.auld@intel.com (cherry picked from commit f161809b362f027b6d72bd998e47f8f0bad60a2e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/xe/migrate: use XE_BO_FLAG_PAGETABLEMatthew Auld1-1/+2
commit c78f4399188369a55eed69cbf19a8aad2a65ac75 upstream. On some HW we want to avoid the host caching PTEs, since access from GPU side can be incoherent. However here the special migrate object is mapping PTEs which are written from the host and potentially cached. Use XE_BO_FLAG_PAGETABLE to ensure that non-cached mapping is used, on platforms where this matters. Fixes: 7a060d786cc1 ("drm/xe/mtl: Map PPGTT as CPU:WC") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241126181259.159713-4-matthew.auld@intel.com (cherry picked from commit febc689b27d28973cd02f667548a5dca383d859a) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09Revert "drm/radeon: Delay Connector detecting when HPD singals is unstable"Alex Deucher1-10/+0
commit 979bfe291b5b30a9132c2fd433247e677b24c6aa upstream. This reverts commit 949658cb9b69ab9d22a42a662b2fdc7085689ed8. This causes a blank screen on boot. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3696 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shixiong Ou <oushixiong@kylinos.cn> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/xe/migrate: fix pat index usageMatthew Auld1-1/+2
commit 23346f85163de83aca6dc30dde3944131cf54706 upstream. XE_CACHE_WB must be converted into the per-platform pat index for that particular caching mode, otherwise we are just encoding whatever happens to be the value of that enum. Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241126181259.159713-3-matthew.auld@intel.com (cherry picked from commit f3dc9246f9c3cd5a7d8fd70cfd805bfc52214e2e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/xe/xe_guc_ads: save/restore OA registers and allowlist regsJonathan Cavitt1-0/+14
commit 55858fa7eb2f163f7aa34339fd3399ba4ff564c6 upstream. Several OA registers and allowlist registers were missing from the save/restore list for GuC and could be lost during an engine reset. Add them to the list. v2: - Fix commit message (Umesh) - Add missing closes (Ashutosh) v3: - Add missing fixes (Ashutosh) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Suggested-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: stable@vger.kernel.org # v6.11+ Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023200716.82624-1-jonathan.cavitt@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm: xlnx: zynqmp_dpsub: fix hotplug detectionSteffen Dirkwinkel1-2/+2
commit 71ba1c9b1c717831920c3d432404ee5a707e04b4 upstream. drm_kms_helper_poll_init needs to be called after zynqmp_dpsub_kms_init. zynqmp_dpsub_kms_init creates the connector and without it we don't enable hotplug detection. Fixes: eb2d64bfcc17 ("drm: xlnx: zynqmp_dpsub: Report HPD through the bridge") Cc: stable@vger.kernel.org Signed-off-by: Steffen Dirkwinkel <s.dirkwinkel@beckhoff.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241028134218.54727-1-lists@steffen.cc Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/etnaviv: flush shader L1 cache after user commandstreamLucas Stach1-1/+2
commit 4f8dbadef085ab447a01a8d4806a3f629fea05ed upstream. The shader L1 cache is a writeback cache for shader loads/stores and thus must be flushed before any BOs backing the shader buffers are potentially freed. Cc: stable@vger.kernel.org Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/bridge: it6505: Fix inverted reset polarityChen-Yu Tsai1-4/+4
commit c5f3f21728b069412e8072b8b1d0a3d9d3ab0265 upstream. The IT6505 bridge chip has a active low reset line. Since it is a "reset" and not an "enable" line, the GPIO should be asserted to put it in reset and deasserted to bring it out of reset during the power on sequence. The polarity was inverted when the driver was first introduced, likely because the device family that was targeted had an inverting level shifter on the reset line. The MT8186 Corsola devices already have the IT6505 in their device tree, but the whole display pipeline is actually disabled and won't be enabled until some remaining issues are sorted out. The other known user is the MT8183 Kukui / Jacuzzi family; their device trees currently do not have the IT6505 included. Fix the polarity in the driver while there are no actual users. Fixes: b5c84a9edcd4 ("drm/bridge: add it6505 driver") Cc: stable@vger.kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241029095411.657616-1-wenst@chromium.org Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/mediatek: Fix child node refcount handling in early exitJavier Carrasco1-1/+3
commit f708e8b4cfd16e5c8cd8d7fcfcb2fb2c6ed93af3 upstream. Early exits (goto, break, return) from for_each_child_of_node() required an explicit call to of_node_put(), which was not introduced with the break if cnt == MAX_CRTC. Add the missing of_node_put() before the break. Cc: stable@vger.kernel.org Fixes: d761b9450e31 ("drm/mediatek: Add cnt checking for coverity issue") Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241011-mtk_drm_drv_memleak-v1-1-2b40c74c8d75@gmail.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/fbdev-dma: Select FB_DEFERRED_IOThomas Zimmermann1-1/+1
commit 67c40c9b2ec5f375bf78274d4e9ef0e3b8315bea upstream. Commit 808a40b69468 ("drm/fbdev-dma: Implement damage handling and deferred I/O") added deferred I/O for fbdev-dma. Also select the Kconfig symbol FB_DEFERRED_IO (via FB_DMAMEM_HELPERS_DEFERRED). Fixes build errors about missing fbdefio, such as drivers/gpu/drm/drm_fbdev_dma.c:218:26: error: 'struct drm_fb_helper' has no member named 'fbdefio' 218 | fb_helper->fbdefio.delay = HZ / 20; | ^~ drivers/gpu/drm/drm_fbdev_dma.c:219:26: error: 'struct drm_fb_helper' has no member named 'fbdefio' 219 | fb_helper->fbdefio.deferred_io = drm_fb_helper_deferred_io; | ^~ drivers/gpu/drm/drm_fbdev_dma.c:221:21: error: 'struct fb_info' has no member named 'fbdefio' 221 | info->fbdefio = &fb_helper->fbdefio; | ^~ drivers/gpu/drm/drm_fbdev_dma.c:221:43: error: 'struct drm_fb_helper' has no member named 'fbdefio' 221 | info->fbdefio = &fb_helper->fbdefio; | ^~ Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410050241.Mox9QRjP-lkp@intel.com/ Fixes: 808a40b69468 ("drm/fbdev-dma: Implement damage handling and deferred I/O") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: <stable@vger.kernel.org> # v6.11+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-4-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/sti: avoid potential dereference of error pointersMa Ke1-0/+3
commit 831214f77037de02afc287eae93ce97f218d8c04 upstream. The return value of drm_atomic_get_crtc_state() needs to be checked. To avoid use of error pointer 'crtc_state' in case of the failure. Cc: stable@vger.kernel.org Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Link: https://patchwork.freedesktop.org/patch/msgid/20240913090412.2022848-1-make24@iscas.ac.cn Signed-off-by: Alain Volmat <alain.volmat@foss.st.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm: panel: jd9365da-h3: Remove unused num_init_cmds structure memberHugo Villeneuve1-1/+0
commit 66ae275365be4f118abe2254a0ced1d913af93f2 upstream. Now that the driver has been converted to use wrapped MIPI DCS functions, the num_init_cmds structure member is no longer needed, so remove it. Fixes: 35583e129995 ("drm/panel: panel-jadard-jd9365da-h3: use wrapped MIPI DCS functions") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Jessica Zhang <quic_jesszhan@quicinc.com> Link: https://lore.kernel.org/r/20240930170503.1324560-1-hugo@hugovil.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240930170503.1324560-1-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/sti: avoid potential dereference of error pointers in sti_gdp_atomic_checkMa Ke1-0/+3
commit e965e771b069421c233d674c3c8cd8c7f7245f42 upstream. The return value of drm_atomic_get_crtc_state() needs to be checked. To avoid use of error pointer 'crtc_state' in case of the failure. Cc: stable@vger.kernel.org Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Acked-by: Alain Volmat <alain.volmat@foss.st.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240909063359.1197065-1-make24@iscas.ac.cn Signed-off-by: Alain Volmat <alain.volmat@foss.st.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/sti: avoid potential dereference of error pointers in sti_hqvdp_atomic_checkMa Ke1-0/+3
commit c1ab40a1fdfee732c7e6ff2fb8253760293e47e8 upstream. The return value of drm_atomic_get_crtc_state() needs to be checked. To avoid use of error pointer 'crtc_state' in case of the failure. Cc: stable@vger.kernel.org Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Link: https://patchwork.freedesktop.org/patch/msgid/20240913090926.2023716-1-make24@iscas.ac.cn Signed-off-by: Alain Volmat <alain.volmat@foss.st.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09drm/panic: Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=nLyude Paul1-1/+1
commit 319e53f155907cf2c6dabc16ec9dce0179bc04d1 upstream. It turns out that if you happen to have a kernel config where CONFIG_DRM_PANIC is disabled and spinlock debugging is enabled, along with KMS being enabled - we'll end up trying to acquire an uninitialized spin_lock with drm_panic_lock() when we try to do a commit: rvkms rvkms.0: [drm:drm_atomic_commit] committing 0000000068d2ade1 INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 4 PID: 1347 Comm: modprobe Not tainted 6.10.0-rc1Lyude-Test+ #272 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20240524-3.fc40 05/24/2024 Call Trace: <TASK> dump_stack_lvl+0x77/0xa0 assign_lock_key+0x114/0x120 register_lock_class+0xa8/0x2c0 __lock_acquire+0x7d/0x2bd0 ? __vmap_pages_range_noflush+0x3a8/0x550 ? drm_atomic_helper_swap_state+0x2ad/0x3a0 lock_acquire+0xec/0x290 ? drm_atomic_helper_swap_state+0x2ad/0x3a0 ? lock_release+0xee/0x310 _raw_spin_lock_irqsave+0x4e/0x70 ? drm_atomic_helper_swap_state+0x2ad/0x3a0 drm_atomic_helper_swap_state+0x2ad/0x3a0 drm_atomic_helper_commit+0xb1/0x270 drm_atomic_commit+0xaf/0xe0 ? __pfx___drm_printfn_info+0x10/0x10 drm_client_modeset_commit_atomic+0x1a1/0x250 drm_client_modeset_commit_locked+0x4b/0x180 drm_client_modeset_commit+0x27/0x50 __drm_fb_helper_restore_fbdev_mode_unlocked+0x76/0x90 drm_fb_helper_set_par+0x38/0x40 fbcon_init+0x3c4/0x690 visual_init+0xc0/0x120 do_bind_con_driver+0x409/0x4c0 do_take_over_console+0x233/0x280 do_fb_registered+0x11f/0x210 fbcon_fb_registered+0x2c/0x60 register_framebuffer+0x248/0x2a0 __drm_fb_helper_initial_config_and_unlock+0x58a/0x720 drm_fbdev_generic_client_hotplug+0x6e/0xb0 drm_client_register+0x76/0xc0 _RNvXs_CsHeezP08sTT_5rvkmsNtB4_5RvkmsNtNtCs1cdwasc6FUb_6kernel8platform6Driver5probe+0xed2/0x1060 [rvkms] ? _RNvMs_NtCs1cdwasc6FUb_6kernel8platformINtB4_7AdapterNtCsHeezP08sTT_5rvkms5RvkmsE14probe_callbackBQ_+0x2b/0x70 [rvkms] ? acpi_dev_pm_attach+0x25/0x110 ? platform_probe+0x6a/0xa0 ? really_probe+0x10b/0x400 ? __driver_probe_device+0x7c/0x140 ? driver_probe_device+0x22/0x1b0 ? __device_attach_driver+0x13a/0x1c0 ? __pfx___device_attach_driver+0x10/0x10 ? bus_for_each_drv+0x114/0x170 ? __device_attach+0xd6/0x1b0 ? bus_probe_device+0x9e/0x120 ? device_add+0x288/0x4b0 ? platform_device_add+0x75/0x230 ? platform_device_register_full+0x141/0x180 ? rust_helper_platform_device_register_simple+0x85/0xb0 ? _RNvMs2_NtCs1cdwasc6FUb_6kernel8platformNtB5_6Device13create_simple+0x1d/0x60 ? _RNvXs0_CsHeezP08sTT_5rvkmsNtB5_5RvkmsNtCs1cdwasc6FUb_6kernel6Module4init+0x11e/0x160 [rvkms] ? 0xffffffffc083f000 ? init_module+0x20/0x1000 [rvkms] ? kernfs_xattr_get+0x3e/0x80 ? do_one_initcall+0x148/0x3f0 ? __lock_acquire+0x5ef/0x2bd0 ? __lock_acquire+0x5ef/0x2bd0 ? __lock_acquire+0x5ef/0x2bd0 ? put_cpu_partial+0x51/0x1d0 ? lock_acquire+0xec/0x290 ? put_cpu_partial+0x51/0x1d0 ? lock_release+0xee/0x310 ? put_cpu_partial+0x51/0x1d0 ? fs_reclaim_acquire+0x69/0xf0 ? lock_acquire+0xec/0x290 ? fs_reclaim_acquire+0x69/0xf0 ? kfree+0x22f/0x340 ? lock_release+0xee/0x310 ? kmalloc_trace_noprof+0x48/0x340 ? do_init_module+0x22/0x240 ? kmalloc_trace_noprof+0x155/0x340 ? do_init_module+0x60/0x240 ? __se_sys_finit_module+0x2e0/0x3f0 ? do_syscall_64+0xa4/0x180 ? syscall_exit_to_user_mode+0x108/0x140 ? do_syscall_64+0xb0/0x180 ? vma_end_read+0xd0/0xe0 ? do_user_addr_fault+0x309/0x640 ? clear_bhb_loop+0x45/0xa0 ? clear_bhb_loop+0x45/0xa0 ? clear_bhb_loop+0x45/0xa0 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> Fix this by stubbing these macros out when this config option isn't enabled, along with fixing the unused variable warning that introduces. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Fixes: e2a1cda3e0c7 ("drm/panic: Add drm panic locking") Cc: <stable@vger.kernel.org> # v6.10+ Link: https://patchwork.freedesktop.org/patch/msgid/20240916230103.611490-1-lyude@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05drm/xe/ufence: Wake up waiters after setting ufence->signalledNirmoy Das1-1/+5
[ Upstream commit 37a1cf288e4538eb39b38dbc745fe0da7ae53d94 ] If a previous ufence is not signalled, vm_bind will return -EBUSY. Delaying the modification of ufence->signalled can cause issues if the UMD reuses the same ufence so update ufence->signalled before waking up waiters. Cc: Matthew Brost <matthew.brost@intel.com> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3233 Fixes: 977e5b82e090 ("drm/xe: Expose user fence from xe_sync_entry") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241114150537.4161573-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 553a5d14fcd927194c409b10faced6a6dbc678d1) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amd/display: Fix null check for pipe_ctx->plane_state in hwss_setup_dppZicheng Qu1-0/+3
[ Upstream commit 2bc96c95070571c6c824e0d4c7783bee25a37876 ] This commit addresses a null pointer dereference issue in hwss_setup_dpp(). The issue could occur when pipe_ctx->plane_state is null. The fix adds a check to ensure `pipe_ctx->plane_state` is not null before accessing. This prevents a null pointer dereference. Fixes: 0baae6246307 ("drm/amd/display: Refactor fast update to use new HWSS build sequence") Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipeZicheng Qu1-3/+3
[ Upstream commit 6a057072ddd127255350357dd880903e8fa23f36 ] This commit addresses a null pointer dereference issue in dcn20_program_pipe(). Previously, commit 8e4ed3cf1642 ("drm/amd/display: Add null check for pipe_ctx->plane_state in dcn20_program_pipe") partially fixed the null pointer dereference issue. However, in dcn20_update_dchubp_dpp(), the variable pipe_ctx is passed in, and plane_state is accessed again through pipe_ctx. Multiple if statements directly call attributes of plane_state, leading to potential null pointer dereference issues. This patch adds necessary null checks to ensure stability. Fixes: 8e4ed3cf1642 ("drm/amd/display: Add null check for pipe_ctx->plane_state in dcn20_program_pipe") Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/radeon: Fix spurious unplug event on radeon HDMISteven 'Steve' Kendall1-4/+8
[ Upstream commit 7037bb04265ef05c6ffad56d884b0df76f57b095 ] On several HP models (tested on HP 3125 and HP Probook 455 G2), spurious unplug events are emitted upon login on Chrome OS. This is likely due to the way Chrome OS restarts graphics upon login, so it's possible it's an issue on other distributions but not as common, though I haven't reproduced the issue elsewhere. Use logic from an earlier version of the merged change (see link below) which iterates over connectors and finds matching encoders, rather than the other way around. Also fixes an issue with screen mirroring on Chrome OS. I've deployed this patch on Fedora and did not observe any regression on these devices. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1569#note_1603002 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3771 Fixes: 20ea34710f7b ("drm/radeon: Add HD-audio component notifier support (v6)") Signed-off-by: Steven 'Steve' Kendall <skend@chromium.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amdkfd: Fix wrong usage of INIT_WORK()Yuan Can1-2/+3
[ Upstream commit 21cae8debc6a1d243f64fa82cd1b41cb612b5c61 ] In kfd_procfs_show(), the sdma_activity_work_handler is a local variable and the sdma_activity_work_handler.sdma_activity_work should initialize with INIT_WORK_ONSTACK() instead of INIT_WORK(). Fixes: 32cb59f31362 ("drm/amdkfd: Track SDMA utilization per process") Signed-off-by: Yuan Can <yuancan@huawei.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amdgpu: Fix map/unmap queue logicLijo Lazar3-20/+63
[ Upstream commit fa31798582882740f2b13d19e1bd43b4ef918e2f ] In current logic, it calls ring_alloc followed by a ring_test. ring_test in turn will call another ring_alloc. This is illegal usage as a ring_alloc is expected to be closed properly with a ring_commit. Change to commit the map/unmap queue packet first followed by a ring_test. Add a comment about the usage of ring_test. Also, reorder the current pre-condition checks of job hang or kiq ring scheduler not ready. Without them being met, it is not useful to attempt ring or memory allocations. Fixes tag refers to the original patch which introduced this issue which then got carried over into newer code. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Fixes: 6c10b5cc4eaa ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amdgpu: fix ACA bank count boundary check errorYang Wang1-1/+1
[ Upstream commit 2bb7dced1c2f8c0e705cc74840f776406db492c3 ] fix ACA bank count boundary check error. Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/panthor: Fix OPP refcnt leaks in devfreq initialisationAdrián Larumbe1-9/+8
[ Upstream commit 21c23e4b64e360d74d31b480f0572c2add0e8558 ] Rearrange lookup of recommended OPP for the Mali GPU device and its refcnt decremental to make sure no OPP object leaks happen in the error path. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Fixes: fac9b22df4b1 ("drm/panthor: Add the devfreq logical block") Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105205458.1318989-2-adrian.larumbe@collabora.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/panthor: record current and maximum device clock frequenciesAdrián Larumbe2-1/+23
[ Upstream commit 37591ae11f89cdfc0a647945a589468642a44c17 ] In order to support UM in calculating rates of GPU utilisation, the current operating and maximum GPU clock frequencies must be recorded during device initialisation, and also during OPP state transitions. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240923230912.2207320-3-adrian.larumbe@collabora.com Stable-dep-of: 21c23e4b64e3 ("drm/panthor: Fix OPP refcnt leaks in devfreq initialisation") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/panthor: introduce job cycle and timestamp accountingAdrián Larumbe2-49/+306
[ Upstream commit f8ff51a4708451763e6cfa36cc83dea8513d3318 ] Enable calculations of job submission times in clock cycles and wall time. This is done by expanding the boilerplate command stream when running a job to include instructions that compute said times right before and after a user CS. A separate kernel BO is created per queue to store those values. Jobs can access their sampled data through an index different from that of the queue's ringbuffer. The reason for this is saving memory on the profiling information kernel BO, since the amount of simultaneous profiled jobs we can write into the queue's ringbuffer might be much smaller than for regular jobs, as the former take more CSF instructions. This commit is done in preparation for enabling DRM fdinfo support in the Panthor driver, which depends on the numbers calculated herein. A profile mode mask has been added that will in a future commit allow UM to toggle performance metric sampling behaviour, which is disabled by default to save power. When a ringbuffer CS is constructed, timestamp and cycling sampling instructions are added depending on the enabled flags in the profiling mask. A helper was provided that calculates the number of instructions for a given set of enablement mask, and these are passed as the number of credits when initialising a DRM scheduler job. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240923230912.2207320-2-adrian.larumbe@collabora.com Stable-dep-of: 21c23e4b64e3 ("drm/panthor: Fix OPP refcnt leaks in devfreq initialisation") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/panfrost: Add missing OPP table refcnt decrementalAdrián Larumbe1-1/+2
[ Upstream commit 043e8afebf6c19abde9da1ac3d5cbf8b7ac8393f ] Commit f11b0417eec2 ("drm/panfrost: Add fdinfo support GPU load metrics") retrieves the OPP for the maximum device clock frequency, but forgets to keep the reference count balanced by putting the returned OPP object. This eventually leads to an OPP core warning when removing the device. Fix it by putting OPP objects as many times as they're retrieved. Also remove an unnecessary whitespace. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Fixes: f11b0417eec2 ("drm/panfrost: Add fdinfo support GPU load metrics") Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105205458.1318989-1-adrian.larumbe@collabora.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm: use ATOMIC64_INIT() for atomic64_tJonathan Gray1-1/+1
[ Upstream commit 9877bb2775d020fb7000af5ca989331d09d0e372 ] use ATOMIC64_INIT() not ATOMIC_INIT() for atomic64_t Fixes: 3f09a0cd4ea3 ("drm: Add common fdinfo helper") Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240111023045.50013-1-jsg@jsg.id.au Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amdkfd: Use dynamic allocation for CU occupancy array in ↵Srinivasan Shanmugam1-3/+6
'kfd_get_cu_occupancy()' [ Upstream commit 922f0e00017b09d9d47e3efac008c8b20ed546a0 ] The `kfd_get_cu_occupancy` function previously declared a large `cu_occupancy` array as a local variable, which could lead to stack overflows due to excessive stack usage. This commit replaces the static array allocation with dynamic memory allocation using `kcalloc`, thereby reducing the stack size. This change avoids the risk of stack overflows in kernel space, in scenarios where `AMDGPU_MAX_QUEUES` is large. The allocated memory is freed using `kfree` before the function returns to prevent memory leaks. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c: In function ‘kfd_get_cu_occupancy’: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:322:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] 322 | } | ^ Fixes: 6ae9e1aba97e ("drm/amdkfd: Update logic for CU occupancy calculations") Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info()Li Huafei1-3/+5
[ Upstream commit a1144da794adedb9447437c57d69add56494309d ] Fix two issues with memory allocation in amdgpu_discovery_get_nps_info() for mem_ranges: - Add a check for allocation failure to avoid dereferencing a null pointer. - As suggested by Christophe, use kvcalloc() for memory allocation, which checks for multiplication overflow. Additionally, assign the output parameters nps_type and range_cnt after the kvcalloc() call to prevent modifying the output parameters in case of an error return. Fixes: b194d21b9bcc ("drm/amdgpu: Use NPS ranges from discovery table") Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/vkms: Drop unnecessary call to drm_crtc_cleanup()José Expósito1-4/+1
[ Upstream commit 1d43dddd7c38ea1aa93f78f7ee10087afb0a561f ] CRTC creation uses drmm_crtc_init_with_planes(), which automatically handles cleanup. However, an unnecessary call to drm_crtc_cleanup() is still present in the vkms_output_init() error path. Fixes: 99cc528ebe92 ("drm/vkms: Use drmm_crtc_init_with_planes()") Signed-off-by: José Expósito <jose.exposito89@gmail.com> Reviewed-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241031183835.3633-1-jose.exposito89@gmail.com Acked-by: Louis Chauvet <louis.chauvet@bootlin.com> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/msm/dpu: cast crtc_clk calculation to u64 in _dpu_core_perf_calc_clk()Zichen Xie1-1/+1
[ Upstream commit 20c7b42d9dbd048019bfe0af39229e3014007a98 ] There may be a potential integer overflow issue in _dpu_core_perf_calc_clk(). crtc_clk is defined as u64, while mode->vtotal, mode->hdisplay, and drm_mode_vrefresh(mode) are defined as a smaller data type. The result of the calculation will be limited to "int" in this case without correct casting. In screen with high resolution and high refresh rate, integer overflow may happen. So, we recommend adding an extra cast to prevent potential integer overflow. Fixes: c33b7c0389e1 ("drm/msm/dpu: add support for clk and bw scaling for display") Signed-off-by: Zichen Xie <zichenxie0106@gmail.com> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/622206/ Link: https://lore.kernel.org/r/20241029194209.23684-1-zichenxie0106@gmail.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm: xlnx: zynqmp_disp: layer may be null while releasingSteffen Dirkwinkel1-0/+3
[ Upstream commit 223842c7702b52846b1c5aef8aca7474ec1fd29b ] layer->info can be null if we have an error on the first layer in zynqmp_disp_create_layers Fixes: 1836fd5ed98d ("drm: xlnx: zynqmp_dpsub: Minimize usage of global flag") Signed-off-by: Steffen Dirkwinkel <s.dirkwinkel@beckhoff.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241028133941.54264-1-lists@steffen.cc Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm: zynqmp_kms: Unplug DRM device before removalSean Anderson1-1/+1
[ Upstream commit 2e07c88914fc5289c21820b1aa94f058feb38197 ] Prevent userspace accesses to the DRM device from causing use-after-frees by unplugging the device before we remove it. This causes any further userspace accesses to result in an error without further calls into this driver's internals. Fixes: d76271d22694 ("drm: xlnx: DRM/KMS driver for Xilinx ZynqMP DisplayPort Subsystem") Closes: https://lore.kernel.org/dri-devel/4d8f4c9b-2efb-4774-9a37-2f257f79b2c9@linux.dev/ Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240809193600.3360015-2-sean.anderson@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/nouveau/gr/gf100: Fix missing unlock in gf100_gr_chan_new()Li Huafei1-0/+1
[ Upstream commit a2f599046c671d6b46d93aed95b37241ce4504cf ] When the call to gf100_grctx_generate() fails, unlock gr->fecs.mutex before returning the error. Fixes smatch warning: drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:480 gf100_gr_chan_new() warn: inconsistent returns '&gr->fecs.mutex'. Fixes: ca081fff6ecc ("drm/nouveau/gr/gf100-: generate golden context during first object alloc") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241026173844.2392679-1-lihuafei1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amd/display: Reduce HPD Detection Interval for IPSFangzhi Zuo1-1/+1
[ Upstream commit a88b19b13fb41a3fa03ec67b5f57cc267fbfb160 ] Fix DP Compliance test 4.2.1.3, 4.2.2.8, 4.3.1.12, 4.3.1.13 when IPS enabled. Original HPD detection interval is set to 5s which violates DP compliance. Reduce the interval parameter, such that link training can be finished within 5 seconds. Fixes: afca033f10d3 ("drm/amd/display: Add periodic detection for IPS") Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/amd/display: Increase idle worker HPD detection timeRoman Li1-1/+1
[ Upstream commit 60612f75992d96955fb7154468c58d5d168cf1ab ] [Why] Idle worker thread waits HPD_DETECTION_TIME for HPD processing complete. Some displays require longer time for that. [How] Increase HPD_DETECTION_TIME to 100ms. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: a88b19b13fb4 ("drm/amd/display: Reduce HPD Detection Interval for IPS") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05drm/etnaviv: hold GPU lock across perfmon samplingLucas Stach1-6/+14
[ Upstream commit 37dc4737447a7667f8e9ec790dac251da057eb27 ] The perfmon sampling mutates shared GPU state (e.g. VIVS_HI_CLOCK_CONTROL to select the pipe for the perf counter reads). To avoid clashing with other functions mutating the same state (e.g. etnaviv_gpu_update_clock) the perfmon sampling needs to hold the GPU lock. Fixes: 68dc0b295dcb ("drm/etnaviv: use 'sync points' for performance monitor requests") Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>