summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)AuthorFilesLines
2025-07-15drm/bridge: megachips-stdpxxxx-ge-b850v3-fw: Fix a compile error due to ↵Andy Yan1-1/+1
bridge->detect parameter changes Fix the compile error due to bridge->detect parameter changes. Reported-by: Dixit Ashutosh <ashutosh.dixit@intel.com> Closes: https://lore.kernel.org/dri-devel/175250667117.3567548.8371527247937906463.b4-ty@oss.qualcomm.com/T/#m8ecd00a05a330bc9c76f11c981daafcb30a7c2e0 Fixes: 5d156a9c3d5e ("drm/bridge: Pass down connector to drm bridge detect hook") Signed-off-by: Andy Yan <andyshrk@163.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250715054754.800765-1-andyshrk@163.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-07-15drm/panfrost: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the resetMaíra Canal1-4/+4
Panfrost can skip the reset if TDR has fired before the free-job worker. Currently, since Panfrost doesn't take any action on these scenarios, the job is being leaked, considering that `free_job()` won't be called. To avoid such leaks, inform the scheduler that the job did not actually timeout and no reset was performed through the new status code DRM_GPU_SCHED_STAT_NO_HANG. Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-8-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the resetMaíra Canal1-9/+3
Xe can skip the reset if TDR has fired before the free job worker and can also re-arm the timeout timer in some scenarios. Instead of manipulating scheduler's internals, inform the scheduler that the job did not actually timeout and no reset was performed through the new status code DRM_GPU_SCHED_STAT_NO_HANG. Note that, in the first case, there is no need to restart submission if it hasn't been stopped. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-7-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the resetMaíra Canal1-8/+4
Etnaviv can skip a hardware reset in two situations: 1. TDR has fired before the free-job worker and the timeout is spurious. 2. The GPU is still making progress on the front-end and we can give the job a chance to complete. Instead of manipulating scheduler's internals, inform the scheduler that the job did not actually timeout and no reset was performed through the new status code DRM_GPU_SCHED_STAT_NO_HANG. Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-6-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the resetMaíra Canal1-14/+2
When a CL/CSD job times out, we check if the GPU has made any progress since the last timeout. If so, instead of resetting the hardware, we skip the reset and allow the timer to be rearmed. This gives long-running jobs a chance to complete. Instead of manipulating scheduler's internals, inform the scheduler that the job did not actually timeout and no reset was performed through the new status code DRM_GPU_SCHED_STAT_NO_HANG. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-5-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANGMaíra Canal3-0/+49
Add a test to submit a single job against a scheduler with the timeout configured and verify that if the job is still running, the timeout handler will skip the reset and allow the job to complete. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-4-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/sched: Make timeout KUnit tests fasterMaíra Canal1-3/+5
As more KUnit tests are introduced to evaluate the basic capabilities of the `timedout_job()` hook, the test suite will continue to increase in duration. To reduce the overall running time of the test suite, decrease the scheduler's timeout for the timeout tests. Before this commit: [15:42:26] Elapsed time: 15.637s total, 0.002s configuring, 10.387s building, 5.229s running After this commit: [15:45:26] Elapsed time: 9.263s total, 0.002s configuring, 5.168s building, 4.037s running Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-3-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/sched: Allow drivers to skip the reset and keep on runningMaíra Canal1-2/+44
When the DRM scheduler times out, it's possible that the GPU isn't hung; instead, a job just took unusually long (longer than the timeout) but is still running, and there is, thus, no reason to reset the hardware. This can occur in two scenarios: 1. The job is taking longer than the timeout, but the driver determined through a GPU-specific mechanism that the hardware is still making progress. Hence, the driver would like the scheduler to skip the timeout and treat the job as still pending from then onward. This happens in v3d, Etnaviv, and Xe. 2. Timeout has fired before the free-job worker. Consequently, the scheduler calls `sched->ops->timedout_job()` for a job that isn't timed out. These two scenarios are problematic because the job was removed from the `sched->pending_list` before calling `sched->ops->timedout_job()`, which means that when the job finishes, it won't be freed by the scheduler though `sched->ops->free_job()` - leading to a memory leak. To solve these problems, create a new `drm_gpu_sched_stat`, called DRM_GPU_SCHED_STAT_NO_HANG, which allows a driver to skip the reset. The new status will indicate that the job must be reinserted into `sched->pending_list`, and the hardware / driver will still complete that job. Reviewed-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-2-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESETMaíra Canal13-23/+23
Among the scheduler's statuses, the only one that indicates an error is DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV signifies that the operation succeeded and the GPU is in a nominal state. However, to provide more information about the GPU's status, it is needed to convey more information than just "OK". Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has hung, but it has been successfully reset and is now in a nominal state again. Reviewed-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15drm/xe/pf: Invalidate LMTT after completing changesMichal Wajdeczko1-1/+11
Once we finish populating all leaf pages in the VF's LMTT we should make sure that hardware will not access any stale data. Explicitly force LMTT invalidation (as it was already planned in the past). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-7-michal.wajdeczko@intel.com
2025-07-15drm/xe/pf: Invalidate LMTT during LMEM unprovisioningMichal Wajdeczko5-0/+94
Invalidate LMTT immediately after removing VF's LMTT page tables and clearing root PTE in the LMTT PD to avoid any invalid access by the hardware (and VF) due to stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-6-michal.wajdeczko@intel.com
2025-07-15drm/xe/pf: Force GuC virtualization modeMichal Wajdeczko1-0/+26
By default the GuC starts in the 'native' mode and enables the VGT mode (aka 'virtualization' mode) only after it receives at least one set of VF configuration data. While this happens naturally while PF begins VFs provisioning, we might need this sooner as some actions, like TLB_INVALIDATION_ALL(0x7002), is supported by the GuC only in the VGT mode. And this becomes a real problem if we would want to use above action to invalidate the LMTT early during VFs auto-provisioning, before VFs are enabled, as such H2G would be rejected: [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: FAST_REQ H2G fence 0x804e failed! e=0x30, h=0 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: Fence 0x804e was used by action 0x7002 sent at: h2g_write+0x33e/0x870 [xe] __guc_ct_send_locked+0x1e1/0x1110 [xe] guc_ct_send_locked+0x9f/0x740 [xe] xe_guc_ct_send_locked+0x19/0x60 [xe] send_tlb_invalidation+0xc2/0x470 [xe] xe_gt_tlb_invalidation_all_async+0x45/0xa0 [xe] xe_gt_tlb_invalidation_all+0x4b/0xa0 [xe] lmtt_invalidate_hw+0x64/0x1a0 [xe] xe_lmtt_invalidate_hw+0x5c/0x340 [xe] pf_update_vf_lmtt+0x398/0xae0 [xe] pf_provision_vf_lmem+0x350/0xa60 [xe] xe_gt_sriov_pf_config_bulk_set_lmem+0xe2/0x410 [xe] xe_gt_sriov_pf_config_set_fair_lmem+0x1c6/0x620 [xe] xe_gt_sriov_pf_config_set_fair+0xd5/0x3f0 [xe] xe_pci_sriov_configure+0x360/0x1200 [xe] sriov_numvfs_store+0xbc/0x1d0 dev_attr_store+0x17/0x40 sysfs_kf_write+0x4a/0x80 kernfs_fop_write_iter+0x166/0x220 vfs_write+0x2ba/0x580 ksys_write+0x77/0x100 __x64_sys_write+0x19/0x30 x64_sys_call+0x2bf/0x2660 do_syscall_64+0x93/0x7a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: CT dequeue failed: -71 [ ] xe 0000:4d:00.0: [drm] GT0: trying reset from receive_g2h [xe] This could be mitigated by pushing earlier a PF self-configuration with some hard-coded values that cover unlimited access to the GGTT, use of all GuC contexts and doorbells. This step is sufficient for the GuC to switch into the VGT mode. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-5-michal.wajdeczko@intel.com
2025-07-15drm/xe/pf: Move GGTT config KLVs encoding to helperMichal Wajdeczko1-11/+20
In upcoming patch we will want to encode GGTT config KLVs based on raw numbers, without relying on the allocated GGTT node. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-4-michal.wajdeczko@intel.com
2025-07-15drm/xe/pf: Resend PF provisioning after GT resetMichal Wajdeczko1-0/+27
If we reload the GuC due to suspend/resume or GT reset then we have to resend not only any VFs provisioning data, but also PF configuration, like scheduling parameters (EQ, PT), as otherwise GuC will continue to use default values. Fixes: 411220808cee ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com
2025-07-15drm/xe/pf: Prepare to stop SR-IOV support prior GT resetMichal Wajdeczko3-0/+27
As part of the resume or GT reset, the PF driver schedules work which is then used to complete restarting of the SR-IOV support, including resending to the GuC configurations of provisioned VFs. However, in case of short delay between those two actions, which could be seen by triggering a GT reset on the suspened device: $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset this PF worker might be still busy, which lead to errors due to just stopped or disabled GuC CTB communication: [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe] [ ] xe 0000:00:02.0: [drm] GT0: reset queued [ ] xe 0000:00:02.0: [drm] GT0: reset started [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled! [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED) [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed While this VFs reprovisioning will be successful during next spin of the worker, to avoid those errors, make sure to cancel restart worker if we are about to trigger next reset. Fixes: 411220808cee ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
2025-07-14drm/xe/lrc: Add table with LRC layoutLucas De Marchi1-0/+24
Add a table to document the LRC's BO layout to make it easier to visualize how each region stacks on top of each other. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-4-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Waste fewer instructions in emit_wa_job()Tvrtko Ursulin3-41/+49
I was debugging some unrelated issue and noticed the current code was very verbose. We can improve it easily by using the more common batch buffer building pattern. Before: bb->cs[bb->len++] = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO; c4d: 41 8b 56 10 mov 0x10(%r14),%edx c51: 49 8b 4e 08 mov 0x8(%r14),%rcx c55: 8d 72 01 lea 0x1(%rdx),%esi c58: 41 89 76 10 mov %esi,0x10(%r14) c5c: c7 04 91 01 00 08 15 movl $0x15080001,(%rcx,%rdx,4) bb->cs[bb->len++] = entry->reg.addr; c63: 8b 08 mov (%rax),%ecx c65: 41 8b 56 10 mov 0x10(%r14),%edx c69: 49 8b 76 08 mov 0x8(%r14),%rsi c6d: 81 e1 ff ff 3f 00 and $0x3fffff,%ecx c73: 8d 7a 01 lea 0x1(%rdx),%edi c76: 41 89 7e 10 mov %edi,0x10(%r14) c7a: 89 0c 96 mov %ecx,(%rsi,%rdx,4) ..etc.. After: *cs++ = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO; c52: 41 c7 04 24 01 00 08 movl $0x15080001,(%r12) c59: 15 *cs++ = entry->reg.addr; c5a: 8b 10 mov (%rax),%edx ..etc.. Resulting in the following binary change: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-348 (-348) Function old new delta xe_gt_record_default_lrcs.cold 304 296 -8 xe_gt_record_default_lrcs 2200 1860 -340 Total: Before=13554, After=13206, chg -2.57% Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-7-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe/gt: Drop third submission for default contextLucas De Marchi1-8/+0
There's no need to submit the nop job again on the first queue. Any state needed is already saved when the first LRC is switched out. The comment is a little misleading regarding indirect W/A: first of all there's still no indirect W/A enabled and secondly, even after they are, there's no need to submit this job again for having their state propagated: the indirect W/A will actually run on every LRC switch. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-6-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe/lrc: Remove leftover TODO/FIXMELucas De Marchi1-6/+0
There isn't anything to set for CTX_TIMESTAMP handling in the empty LRC: that is set on every LRC init since it should always start from 0 rather than the value saved in the image after first submission. The FIXME about perma-pinning also doesn't make much sense as we will always going to pin the lrc and the GGTT mapping has nothing to do with VM bind. Nuke these leftover comments. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-5-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe/gt: Extract emit_job_sync()Lucas De Marchi1-32/+22
Both the nop and wa jobs are going through the same boiler plate calls to emit the job with a timeout and handling error for both bb and job. Extract emit_job_sync() so those functions create the bb, handling possible errors and delegate the part about really emitting the job and waiting for its completion. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-3-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Count dwords before allocatingLucas De Marchi2-15/+25
The bb allocation in emit_wa_job() is wrong in 2 ways: first it's allocating enough space for the 3DSTATE or hardcoding 4k depending on the engine. In the first case it doesn't account for the WAs and in the former it may not be sufficient. Secondly it's using the size instead of number of dwords, causing the buffer to be 4x bigger than needed: xe_bb_new() receives number of dwords as parameter and its declaration was also not following its implementation. Lastly, reword the debug message since it's not only about the LRC WAs anymore as it also include the 3DSTATE for render. While it's unlikely this is causing any real issue, let's calculate the needed space and allocate just enough. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-2-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe/lrc: Reduce scope of empty lrc dataLucas De Marchi1-11/+11
The only case in which new lrc data is created from scratch is when it's called prior to recording the default lrc. There's no need to check for NULL init_data since in that case the function already failed: just move the allocation where it's needed. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-1-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/panel-edp: Add BOE NE14QDM panel for Dell Latitude 7455Val Packett1-0/+1
Cannot confirm which variant exactly it is, as the EDID alphanumeric data contains '0RGNR' <0x80> 'NE14QDM' and ends there; but it's 60 Hz and with touch. I do not have access to datasheets for these panels, so the timing is a guess that was tested to work fine on this laptop. Raw EDID dump: 00 ff ff ff ff ff ff 00 09 e5 1e 0b 00 00 00 00 10 20 01 04 a5 1e 13 78 07 fd 85 a7 53 4c 9b 25 0f 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 a7 6d 00 a0 a0 40 78 60 30 20 36 00 2e bc 10 00 00 1a b9 57 00 a0 a0 40 78 60 30 20 36 00 2e bc 10 00 00 1a 00 00 00 fe 00 30 52 47 4e 52 80 4e 45 31 34 51 44 4d 00 00 00 00 00 02 41 31 a8 00 01 00 00 1a 41 0a 20 20 00 8f Signed-off-by: Val Packett <val@packett.cool> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250706205723.9790-7-val@packett.cool
2025-07-14drm/xe/vf: Store negotiated VF/PF ABI version at device levelMichal Wajdeczko3-24/+30
There is no need to maintain PF ABI version on per-GT level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-8-michal.wajdeczko@intel.com
2025-07-14drm/xe/pf: Stop requiring VF/PF version negotiation on every GTMichal Wajdeczko12-403/+548
While some VF/PF relay actions must be handled on the GT level, like query for runtime registers, it was clarified by the arch team that initial version negotiation can be done by the VF just once, by using any available GuC/GT. Move handling of the VF/PF ABI version negotiation on the PF side from the GT level functions to the device level functions. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-7-michal.wajdeczko@intel.com
2025-07-14drm/xe/pf: Expose basic info about VFs in debugfsMichal Wajdeczko3-0/+53
We already have function to print summary about VFs, but we missed to add debugfs attribute to make it visible. Do it now. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-6-michal.wajdeczko@intel.com
2025-07-14drm/xe: Introduce xe_gt_is_main_type helperMichal Wajdeczko10-34/+39
Instead of checking for not being a media type GT provide a small helper to explicitly express our intentions. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-5-michal.wajdeczko@intel.com
2025-07-14drm/xe: Introduce xe_tile_is_root helperMichal Wajdeczko3-2/+10
Instead of looking at the tile->id member provide a small helper to explicitly express our intentions. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-4-michal.wajdeczko@intel.com
2025-07-14drm/xe: Move PF and VF device types to separate headersMichal Wajdeczko4-36/+58
We plan to add more PF and VF types and mixing them in a single file is not desired. Move them out to new dedicated files. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-3-michal.wajdeczko@intel.com
2025-07-14drm/panthor: Remove dead VM flushing codeAdrián Larumbe2-12/+0
Commit ec62d37d2c0d("drm/panthor: Fix the fast-reset logic") did away with the only reference to panthor_vm_flush_all(), so let's get rid of the orphaned definition. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250711154557.739326-1-adrian.larumbe@collabora.com
2025-07-14drm/xe: Combine PF and VF device data into unionMichal Wajdeczko1-4/+6
There is no need to keep PF and VF data fields fully separate since we can be only in one mode at the time. Move them into a anonymous union to save few bytes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-2-michal.wajdeczko@intel.com
2025-07-14drm/panfrost: Fix scheduler workqueue bugPhilipp Stanner1-1/+1
When the GPU scheduler was ported to using a struct for its initialization parameters, it was overlooked that panfrost creates a distinct workqueue for timeout handling. The pointer to this new workqueue is not initialized to the struct, resulting in NULL being passed to the scheduler, which then uses the system_wq for timeout handling. Set the correct workqueue to the init args struct. Cc: stable@vger.kernel.org # 6.15+ Fixes: 796a9f55a8d1 ("drm/sched: Use struct for drm_sched_init() params") Reported-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Closes: https://lore.kernel.org/dri-devel/b5d0921c-7cbf-4d55-aa47-c35cd7861c02@igalia.com/ Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250709102957.100849-2-phasta@kernel.org
2025-07-14drm/xe: Update register definitions in LRC layout headerXin Wang2-4/+3
Update the register definitions in xe_lrc_layout.h to align with the official hardware specification (Bspec) terminology. Specifically: - rename PVC_CTX_ACC_CTR_THOLD to CTX_ACC_CTR_THOLD - rename PVC_CTX_ASID to CTX_ASID Signed-off-by: Xin Wang <x.wang@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711060924.7373-1-x.wang@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/bridge: Pass down connector to drm bridge detect hookAndy Yan31-38/+65
In some application scenarios, we hope to get the corresponding connector when the bridge's detect hook is invoked. In most cases, we can get the connector by drm_atomic_get_connector_for_encoder if the encoder attached to the bridge is enabled, however there will still be some scenarios where the detect hook of the bridge is called but the corresponding encoder has not been enabled yet. For instance, this occurs when the device is hot plug in for the first time. Since the call to bridge's detect is initiated by the connector, passing down the corresponding connector directly will make things simpler. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250703125027.311109-3-andyshrk@163.com [DB: added the chunk to the cdn-dp driver] Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-07-14drm/bridge: Make dp/hdmi_audio_* callback keep the same paramter order with ↵Andy Yan11-59/+59
get_modes Make the dp/hdmi_audio_* callback maintain the same parameter order as get_modes and edid_read: first the bridge, then the connector. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250703125027.311109-2-andyshrk@163.com [DB: added the chunk to the cdn-dp driver] Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-07-14drm/xe: Add plumbing for indirect context workaroundsTvrtko Ursulin3-3/+89
Some upcoming workarounds need to be emitted from the indirect workaround context so lets add some plumbing where they will be able to easily slot in. No functional changes for now since everything is still deactivated. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Bspec: 45954 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-7-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Allow specifying number of extra dwords at the end of wa bb emissionTvrtko Ursulin1-3/+5
Indirect context setup will need more than one. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-6-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Track number of written dwords from workaround batch buffer emissionTvrtko Ursulin1-1/+4
Indirect context setup will need to get to the number of written dwords. Lets add it as an output parameter so it can be accessed from the finish helper regardless of whether code is writing directly or via an shadow buffer. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-5-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Rename utilization workaround emission functionTvrtko Ursulin1-3/+5
Lucas suggested to consolidate to a slightly different naming scheme which will align with the upcoming additions better. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-4-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Pass wa bb setup arguments in a structTvrtko Ursulin1-40/+53
Group the function arguments in a struct for more readable code and easier extending. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-3-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Generalize wa bb emission codeTvrtko Ursulin1-22/+48
Generalize the wa bb emission by splitting it into three phases - setup, emit and finish, and extract setup and finish steps into helpers. This will enable using the same infrastructure for emitting the indirect context workarounds. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-2-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Fix missing kernel-docLucas De Marchi1-0/+1
Fix warning: Warning: drivers/gpu/drm/xe/xe_device_types.h:658 struct member 'wa_active' not described in 'xe_device' Fixes: 661a6950e061 ("drm/xe: Add infrastructure for Device OOB workarounds") Cc: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Jonathan Cavitt <joanthan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250711214911.2009714-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Remove unused functionsDr. David Alan Gilbert6-51/+0
xe_bo_create_from_data() last use was removed in 2023 by commit 0e1a47fcabc8 ("drm/xe: Add a helper for DRM device-lifetime BO create") xe_rtp_match_first_gslice_fused_off() last use was removed in 2023 by commit 4e124151fcfc ("drm/xe/dg2: Drop pre-production workarounds") Remove them, and xe_dss_mask_empty whose last use was by xe_rtp_match_first_gslice_fused_off(). (Xe has a bunch ofother symbols that have been added but not used, given how new it is, I've left those, as opposed to these that had the code that used them removed). Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250713152531.219326-1-linux@treblig.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-12Merge tag 'amd-drm-next-6.17-2025-07-11' of ↵Simona Vetter26-239/+270
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-11: amdgpu: - Clean up function signatures - GC 10 KGQ reset fix - SDMA reset cleanups - Misc fixes - LVDS fixes - UserQ fix amdkfd: - Reset fix Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250711205548.21052-1-alexander.deucher@amd.com
2025-07-11drm/xe: Normalize default param valuesLucas De Marchi1-12/+23
Document xe module params with the default values following a similar strategy for all of them: 1) Define a DEFAULT_* macro with the default value. When the value can't be directly stringified, also define a *_STR variant 2) Use __stringify() or the _STR variant to make sure the default value shows up in the param description This allows us to show the correct default according to the configuration. max_vfs for example was wrongly documented for CONFIG_DRM_XE_DEBUG and svm_notifier_size didn't have its default documented. Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250626-guc-log-level-v3-1-c3ed8b452e91@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11drm/xe/migrate: Fix alignment checkLucas De Marchi1-2/+2
The check would fail if the address is unaligned, but not when accounting the offset. Instead of `buf | offset` it should have been `buf + offset`. To make it more readable and also drop the uintptr_t, just use the IS_ALIGNED() macro. Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11drm/nouveau: check ioctl command codes betterArnd Bergmann1-6/+5
nouveau_drm_ioctl() only checks the _IOC_NR() bits in the DRM_NOUVEAU_NVIF command, but ignores the type and direction bits, so any command with '7' in the low eight bits gets passed into nouveau_abi16_ioctl() instead of drm_ioctl(). Check for all the bits except the size that is handled inside of the handler. Fixes: 27111a23d01c ("drm/nouveau: expose the full object/event interfaces to userspace") Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ Fix up two checkpatch warnings and a typo. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250711072458.2665325-1-arnd@kernel.org
2025-07-11drm/xe: Remove references to CONFIG_DRM_XE_DEVMEM_MIRRORMatthew Brost1-2/+2
The prefetch code was referencing CONFIG_DRM_XE_DEVMEM_MIRROR, which has been replaced by CONFIG_DRM_XE_PAGEMAP. As a result, prefetches were limited to SRAM. Update the code to use CONFIG_DRM_XE_PAGEMAP instead of the deprecated option. Fixes: f86ad0ed620c ("drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250710205413.1105595-1-matthew.brost@intel.com
2025-07-11drm/xe: Move page fault init after topology initMatthew Brost1-3/+3
We need the topology to determine GT page fault queue size, move page fault init after topology init. Cc: stable@vger.kernel.org Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com
2025-07-11drm/xe/migrate: fix copy direction in access_memoryMatthew Auld1-1/+1
After we do the modification on the host side, ensure we write the result back to VRAM and not the other way around, otherwise the modification will be lost if treated like a read. Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com