path: root/drivers/gpu
Age | Commit message | Author | Files | Lines
2025-10-09 | drm/i915/fb: Fix the set_tiling vs. addfb race, again | Ville Syrjälä | 1 file | -18/+20
intel_frontbuffer_get() is what locks out subsequent set_tiling changes to the bo. Thus the fence vs. modifier check must be done after intel_frontbuffer_get(), or else a concurrent set_tiling ioctl might sneak in and change the fence after the check has been done. Close the race again. See commit dd689287b977 ("drm/i915: Prevent concurrent tiling/framebuffer modifications") for the previous instance. v2: Reorder intel_user_framebuffer_destroy() to match the unwind (Jani) Cc: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Fixes: 10690b8a49bc ("drm/i915/display: Add intel_fb_bo_framebuffer_fini") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20251003145734.7634-3-ville.syrjala@linux.intel.com
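The ordering described above can be modeled in a few lines. This is a single-threaded sketch, not the i915 code: `struct bo_model`, `model_set_tiling()` and `model_addfb()` are hypothetical names, and a plain boolean stands in for the lock-out that `intel_frontbuffer_get()` provides. It only illustrates why the tiling check belongs after the point where further tiling changes are refused.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a buffer object; not the i915 structure. */
struct bo_model {
    bool frontbuffer_held; /* stands in for a held intel_frontbuffer reference */
    int tiling;            /* current fence/tiling mode */
};

/* Tiling changes are refused once the frontbuffer reference pins the bo. */
static int model_set_tiling(struct bo_model *bo, int tiling)
{
    if (bo->frontbuffer_held)
        return -EBUSY;
    bo->tiling = tiling;
    return 0;
}

/*
 * addfb: take the frontbuffer reference first, and only then compare the
 * bo's tiling against what the framebuffer modifier expects. Checking
 * before taking the reference would leave a window for set_tiling.
 */
static int model_addfb(struct bo_model *bo, int modifier_tiling)
{
    bo->frontbuffer_held = true;
    if (bo->tiling != modifier_tiling) {
        bo->frontbuffer_held = false; /* unwind in reverse order */
        return -EINVAL;
    }
    return 0;
}
```

With this order, a tiling change either lands before the reference is taken (and is seen by the check) or is rejected afterwards.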
2025-10-09 | drm/i915/frontbuffer: Move bo refcounting intel_frontbuffer_{get,release}() | Ville Syrjälä | 2 files | -3/+9
Currently xe's intel_frontbuffer implementation forgets to hold a reference on the bo. This makes the entire thing extremely fragile as the cleanup order now depends on bo references held by other things (namely intel_fb_bo_framebuffer_fini()). Move the bo refcounting to intel_frontbuffer_{get,release}() so that both i915 and xe do this the same way. I first tried to fix this by having xe do the refcounting from its intel_bo_set_frontbuffer() implementation (which is what i915 does currently), but turns out xe's drm_gem_object_free() can sleep and thus drm_gem_object_put() isn't safe to call while we hold fb_tracking.lock. Fixes: 10690b8a49bc ("drm/i915/display: Add intel_fb_bo_framebuffer_fini") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20251003145734.7634-2-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2025-10-09 | drm/panfrost: Name scheduler queues after their job slots | Adrián Larumbe | 3 files | -11/+15
Drawing from commit d2624d90a0b7 ("drm/panthor: assign unique names to queues"), give scheduler queues proper names that reflect the function of their JM slot, so that this will be shown when gathering DRM scheduler tracepoints. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20251009114313.1374948-1-adrian.larumbe@collabora.com
2025-10-09 | drm/xe: Increase global invalidation timeout to 1000us | Kenneth Graunke | 1 file | -1/+1
The previous timeout of 500us seems to be too small; panning the map in the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered timeouts within a few seconds of usage, causing the monitor to freeze and the following to be printed to dmesg: [Jul30 13:44] xe 0000:03:00.0: [drm] *ERROR* GT0: Global invalidation timeout [Jul30 13:48] xe 0000:03:00.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out I haven't hit a single timeout since increasing it to 1000us even after several multi-hour testing sessions. Fixes: 0dd2dd0182bc ("drm/xe: Move DSB l2 flush to a more sensible place") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-09 | drm: Prevent sign extension on hdisplay and vdisplay | Jonathan Cavitt | 2 files | -2/+2
Some functions in drm multiply hdisplay and vdisplay with a third factor, which can result in a sign extension according to static analysis due to an implicit s32 promotion. Use a cast to u32 to prevent this. Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Krzystof Karas <krzysztof.karas@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Link: https://lore.kernel.org/r/20251007153645.90920-2-jonathan.cavitt@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
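The effect the static analyzer warns about can be shown outside the kernel. The sketch below assumes a 32-bit `int`; `widen_s32()`, `widen_u32()` and `scaled_area()` are hypothetical helpers, not DRM functions. Two 16-bit operands are promoted to `int` before multiplication, so a product with bit 31 set is a negative s32 (and overflowing it is undefined behaviour); widening that value to 64 bits copies the sign bit into the upper half. Casting one operand to a 32-bit unsigned type keeps the whole multiplication unsigned.

```c
#include <stdint.h>

/* Widening a negative 32-bit signed value to 64 bits sign-extends it. */
static uint64_t widen_s32(int32_t v)
{
    return (uint64_t)v;
}

/* Widening an unsigned 32-bit value zero-extends it. */
static uint64_t widen_u32(uint32_t v)
{
    return (uint64_t)v;
}

/*
 * Hypothetical helper: the cast on the first operand makes every
 * multiplication unsigned 32-bit, so the result is zero-extended
 * when it is returned as a 64-bit value.
 */
static uint64_t scaled_area(uint16_t hdisplay, uint16_t vdisplay, int factor)
{
    return (uint32_t)hdisplay * vdisplay * factor;
}
```

Note that the cast only removes the sign extension; a product larger than 32 bits still wraps.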
2025-10-09 | drm/i915/display: Make intel_crtc_get_vblank_counter safe on PREEMPT_RT | Maarten Lankhorst | 1 file | -2/+7
drm_crtc_accurate_vblank_count takes a spinlock, which we should avoid in tracepoints and debug functions. This also prevents taking the spinlock 2x during the critical section of pipe updates for DSI updates. Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://lore.kernel.org/r/20250829131701.15607-2-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-10-09 | drm/i915: Disable tracepoints for PREEMPT_RT | Maarten Lankhorst | 1 file | -0/+5
Luca Abeni reported this: | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003 | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10 | Call Trace: | rt_spin_lock+0x3f/0x50 | gen6_read32+0x45/0x1d0 [i915] | g4x_get_vblank_counter+0x36/0x40 [i915] | trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915] Tracing events such as trace_intel_pipe_update_start() use functions that acquire spinlock_t locks, which are transformed into sleeping locks on PREEMPT_RT. A few trace points use intel_get_crtc_scanline(), others use ->get_vblank_counter(), which also might acquire a sleeping lock on PREEMPT_RT. At the time the arguments are evaluated within the trace point, preemption is disabled and so the locks must not be acquired on PREEMPT_RT. Based on this I don't see any other way than to disable trace points on PREEMPT_RT. [mlankhorst] The original patch was insufficient, and the tracing infrastructure does not allow for partial disabling of tracepoints. Completely disable tracing for the entire i915 driver on PREEMPT_RT; a separate fix for display tracepoints on xe is added to make those work. Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reported-by: Luca Abeni <lucabe72@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250828090944.101069-1-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-10-09 | drm/panthor: skip regulator setup if no such prop | Rain Yang | 1 file | -2/+1
The regulator is optional, skip the setup instead of returning an error if it is not present Signed-off-by: Rain Yang <jiyu.yang@nxp.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250928090334.35389-2-jiyu.yang@oss.nxp.com
2025-10-09 | drm/panthor: Ensure MCU is disabled on suspend | Ketil Johnsen | 1 file | -0/+1
Currently the Panthor driver needs the GPU to be powered down between suspend and resume. If this is not done, then the MCU_CONTROL register will be preserved as AUTO, which again will cause a premature FW boot on resume. The FW will go directly into fatal state in this case. This case needs to be handled as there is no guarantee that the GPU will be powered down after the suspend callback on all platforms. The fix is to call panthor_fw_stop() in "pre-reset" path to ensure the MCU_CONTROL register is cleared (set DISABLE). This matches well with the already existing call to panthor_fw_start() from the "post-reset" path. Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block") Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20251008105112.4077015-1-ketil.johnsen@arm.com
2025-10-09 | drm/xe/guc: Increase wait timeout to 2sec after BUSY reply from GuC | Satyanarayana K V P | 1 file | -1/+1
Some VF2GUC actions may take longer to process. Increase the default timeout after receiving a BUSY indication to 2sec to cover all worst-case scenarios. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-35-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Rebase CCS save/restore BB GGTT addresses | Matthew Brost | 3 files | -0/+33
Rebase the CCS save/restore BB's GGTT addresses during VF post-migration recovery by setting the software ring tail to zero, the LRC ring head to zero, and rewriting the jump-to-BB instructions. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-34-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Ensure media GT VF recovery runs after primary GT on PTL | Matthew Brost | 1 file | -2/+27
It is possible that the media GT's VF post-migration recovery work item gets scheduled before the primary GT's work item. Since the media GT depends on the primary GT's work item to complete CCS restore, if the media GT's work item is scheduled first, detect this condition and re-queue the media GT's work item for a later time. v5: - Adjust debug message (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-33-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Use primary GT ordered work queue on media GT on PTL VF | Matthew Brost | 4 files | -4/+19
VF CCS restore is a primary GT operation on which the media GT depends. Therefore, it doesn't make much sense to run these operations in parallel. To address this, point the media GT's ordered work queue to the primary GT's ordered work queue on platforms that require (PTL VFs) CCS restore as part of VF post-migration recovery. v7: - Remove bool from xe_gt_alloc (Lucas) v9: - Fix typo (Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-32-matthew.brost@intel.com
2025-10-09 | drm/xe: Use PPGTT addresses for TLB invalidation to avoid GGTT fixups | Satyanarayana K V P | 1 file | -8/+20
The migrate VM builds the CCS metadata save/restore batch buffer (BB) in advance and retains it so the GuC can submit it directly when saving a VM’s state. When a VM migrates between VFs, the GGTT base can change. Any GGTT-based addresses embedded in the BB would then have to be parsed and patched. Use PPGTT addresses in the BB (including for TLB invalidation) so the BB remains GGTT-agnostic and requires no address fixups during migration. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-31-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Workaround for race condition in GuC firmware during VF pause | Matthew Brost | 1 file | -0/+4
A race condition exists where a paused VF's H2G request can be processed and subsequently rejected. This rejection results in a FAST_REQ failure being delivered to the KMD, which then terminates the CT via a dead worker and triggers a GT reset—an undesirable outcome. This workaround mitigates the issue by checking if a VF post-migration recovery is in progress and aborting these adverse actions accordingly. The GuC firmware will address this bug in an upcoming release. Once that version is available and VF migration depends on it, this workaround can be safely removed. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-30-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Add debug prints for GuC replaying state during VF recovery | Matthew Brost | 1 file | -3/+20
Helpful to manually verify the GuC state machine can correctly replay the state during a VF post-migration recovery. All replay paths have been manually verified as triggered and working during testing. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-29-matthew.brost@intel.com
2025-10-09 | drm/xe: Move queue init before LRC creation | Matthew Brost | 7 files | -13/+98
A queue must be in the submission backend's tracking state before the LRC is created to avoid a race condition where the LRC's GGTT addresses are not properly fixed up during VF post-migration recovery. Move the queue initialization—which adds the queue to the submission backend's tracking state—before LRC creation. Also wait on pending GGTT fixups before allocating LRCs to avoid racing with fixups. v2: - Wait on VF GGTT fixes before creating LRC (testing) v5: - Adjust comment in code (Tomasz) - Reduce race window v7: - Only wakeup waiters in recovery path (CI) - Wakeup waiters on abort - Use GT warn on (Michal) - Fix kernel doc for LRC ring size function (Tomasz) v8: - Guard against migration not supported or no memirq (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-28-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Replay GuC submission state on pause / unpause | Matthew Brost | 7 files | -32/+295
Fixup GuC submission pause / unpause functions to properly replay any possible state lost during VF post migration recovery. v3: - Add helpers for revert / replay (Tomasz) - Add comment around WQ NOPs (Tomasz) v7: - Only fixup / replay parallel queues once (Testing) - Skip unpause step on queues created after resfix done (Testing) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-27-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Abort VF post migration recovery on failure | Matthew Brost | 3 files | -0/+31
If VF post-migration recovery fails, the device is wedged. However, submission queues still need to be enabled for proper cleanup. In such cases, call into the GuC submission backend to restart all queues that were previously paused. v3: - s/Avort/Abort (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-26-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Start CTs before resfix VF post migration recovery | Matthew Brost | 3 files | -13/+55
Before RESFIX_DONE, all CTs stuck in the H2G queue need to be squashed, as they may contain actions which contain invalid GGTT references or are unnecessary after HW change. Starting the CTs clears all H2Gs in the queue. Any lost H2Gs are resubmitted by the GuC submission state machine. v3: - Don't mess with head / tail values (Michal) v4: - Don't mess with broke (Michal) - Add CTB_H2G_BUFFER_OFFSET (Michal) v5: - Adjust commit message (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-25-matthew.brost@intel.com
2025-10-09 | drm/xe: Add CTB_H2G_BUFFER_OFFSET define | Matthew Brost | 1 file | -5/+6
Rather than open coding the H2G buffer offset as 'CTB_DESC_SIZE * 2' add CTB_H2G_BUFFER_OFFSET define. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-24-matthew.brost@intel.com
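A minimal sketch of the change, assuming an illustrative descriptor size: the value of `CTB_DESC_SIZE` below is a placeholder rather than the driver's definition, and `h2g_buffer_offset()` is a hypothetical accessor added only so the macro can be exercised.

```c
/* Assumed descriptor size, for illustration only; the driver defines its own. */
#define CTB_DESC_SIZE 2048u

/*
 * Named offset of the H2G buffer, replacing the open-coded
 * 'CTB_DESC_SIZE * 2' at each use site.
 */
#define CTB_H2G_BUFFER_OFFSET (CTB_DESC_SIZE * 2)

/* Hypothetical accessor showing a use site. */
static unsigned int h2g_buffer_offset(void)
{
    return CTB_H2G_BUFFER_OFFSET;
}
```

Naming the expression gives later patches, such as the CT restart in VF post-migration recovery, a single symbol to reference.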
2025-10-09 | drm/xe/vf: Kickstart after resfix in VF post migration recovery | Matthew Brost | 1 file | -8/+9
GuC needs to be live for the GuC submission state machine to resubmit anything lost during VF post-migration recovery. Therefore, move the kickstart step after `resfix` to ensure proper resubmission. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-23-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Reset TLB invalidations during VF post migration recovery | Matthew Brost | 1 file | -0/+2
TLB invalidation requests can be lost during VF post-migration recovery. Since the VF has migrated, these invalidations are no longer needed. Reset the TLB invalidation frontend, which will signal all pending fences. v3: - Move TLB invalidation reset after pausing submission (Tomasz) - Adjust commit message (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-22-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Flush and stop CTs in VF post migration recovery | Matthew Brost | 3 files | -0/+12
Flushing CTs (i.e., progressing all pending G2H messages) gives VF post-migration recovery an accurate view of which H2G messages the GuC has processed, enabling the GuC submission state machine to correctly rebuild all state. Also, stop all CT traffic, as the CT is not live during VF post-migration recovery. v3: - xe_guc_ct_flush_and_stop rename (Michal) - Drop extra GuC CT WQ wake up (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-21-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Use GUC_HXG_TYPE_EVENT for GuC context register | Matthew Brost | 1 file | -8/+27
The only case where the GuC submission backend cannot reason 100% correctly is when a GuC context is registered during VF post-migration recovery. In this scenario, it's possible that the GuC context register H2G is processed, but the immediately following schedule-enable H2G gets lost. The schedule-enable G2H "done" response is how the GuC state machine determines whether context registration has completed. A double register is harmless when using `GUC_HXG_TYPE_EVENT`, as GuC simply drops the duplicate H2G. To keep things simple, use `GUC_HXG_TYPE_EVENT` for all context registrations on VFs. v5: - Check for xe_sriov_vf_migration_supported (Tomasz) v7: - Add comment about subsequent protocol failures (Tomasz) - Modify commit message (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-20-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Avoid indefinite blocking in preempt rebind worker for VFs supporting migration | Matthew Brost | 1 file | -1/+25
Blocking in work queues on a hardware action that may never occur, especially when it depends on a software fixup also scheduled on a work queue, is a recipe for deadlock. This situation arises with the preempt rebind worker and VF post-migration recovery. To prevent potential deadlocks, avoid indefinite blocking in the preempt rebind worker for VFs that support migration. v4: - Use dma_fence_wait_timeout (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-19-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Wakeup in GuC backend on VF post migration recovery | Matthew Brost | 5 files | -21/+99
If VF post-migration recovery is in progress, the recovery flow will rebuild all GuC submission state. In this case, exit all waiters to ensure that submission queue scheduling can also be paused. Avoid taking any adverse actions after aborting the wait. As part of waking up the GuC backend, suspend_wait can now return -EAGAIN, indicating the wait should be retried. If the caller is running on a work item, that work item needs to be requeued to avoid a deadlock caused by the work item blocking the VF migration recovery work item. v3: - Don't block in preempt fence work queue as this can interfere with VF post-migration work queue scheduling leading to deadlock (Testing) - Use xe_gt_recovery_inprogress (Michal) v5: - Use static function for vf_recovery (Michal) - Add helper to wake CT waiters (Michal) - Move some code to following patch (Michal) - Adjust commit message to explain suspend_wait returning -EAGAIN (Michal) - Add kernel doc to suspend_wait around returning -EAGAIN v7: - Add comment on why a shared wait queue is needed on VFs (Michal) - Guard against suspend_wait signaling early on resfix done (Tomasz) v8: - Fix kernel doc (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-18-matthew.brost@intel.com
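The retry contract described for suspend_wait can be sketched as a small model. Everything here is hypothetical (`struct gt_model`, `model_suspend_wait()`, `model_work_needs_requeue()`); it is not the xe implementation, only an illustration of a waiter that bails out with -EAGAIN while recovery is pending and a work item that requeues itself instead of blocking.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-GT state; not the xe structures. */
struct gt_model {
    bool recovery_pending; /* VF post-migration recovery in progress */
    bool suspend_done;     /* the event the waiter is waiting for */
};

/*
 * Model of a suspend wait: when recovery is pending the waiter is woken
 * early and told to retry, rather than sleeping on an event that cannot
 * arrive until recovery has run.
 */
static int model_suspend_wait(const struct gt_model *gt)
{
    if (gt->recovery_pending)
        return -EAGAIN;
    return gt->suspend_done ? 0 : -ETIMEDOUT;
}

/*
 * Work item body: returns true when the item must be requeued so the
 * recovery work item is not blocked behind it.
 */
static bool model_work_needs_requeue(const struct gt_model *gt)
{
    return model_suspend_wait(gt) == -EAGAIN;
}
```

Requeueing matters because both the waiter and the recovery flow run as work items; a waiter that blocks can starve the very work it is waiting on.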
2025-10-09 | drm/xe/vf: Don't allow GT reset to be queued during VF post migration recovery | Matthew Brost | 4 files | -56/+5
With well-behaved software, a GT reset should never occur, nor should it happen during VF post-migration recovery. If it does, trigger a warning but suppress the GT reset, as VF post-migration recovery is expected to bring the VF back to a working state. v3: - Better commit message (Tomasz) v5: - Use xe_gt_WARN_ON (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-17-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Teardown VF post migration worker on driver unload | Matthew Brost | 4 files | -2/+43
Be cautious and ensure the VF post-migration worker is not running during driver unload. v3: - More teardown later in driver init, use devm (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-16-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Close multi-GT GGTT shift race | Matthew Brost | 6 files | -101/+123
As multi-GT VF post-migration recovery can run in parallel on different workqueues, but both GTs point to the same GGTT, only one GT needs to shift the GGTT. However, both GTs need to know when this step has completed. To coordinate this, perform the GGTT shift under the GGTT lock. With the shift being done under the lock, storing the shift value becomes unnecessary. In addition to the above, move the GGTT VF config from the GT to the tile. v3: - Update commit message (Tomasz) v4: - Move GGTT values to tile state (Michal) - Use GGTT lock (Michal) v5: - Only take GGTT lock during recovery (CI) - Drop goto in vf_get_submission_cfg (Michal) - Add kernel doc around recovery in xe_gt_sriov_vf_query_config (Michal) v7: - Drop recovery variable (Michal) - Use _locked naming (Michal) - Use guard (Michal) v9: - Break LMEM changes into different patch (Michal) - Fix layering (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-15-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Move LMEM config to tile layer | Matthew Brost | 7 files | -30/+71
The LMEM VF provision is tile-layer-specific information. Move the LMEM configuration to the tile layer accordingly. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-14-matthew.brost@intel.com
2025-10-09 | drm/xe: Move GGTT lock init to alloc | Matthew Brost | 1 file | -16/+23
The GGTT lock is needed very early during GT initialization for a VF; move the GGTT lock initialization to the allocation phase. v8: - Rework function structure (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-13-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Remove memory allocations from VF post migration recovery | Matthew Brost | 2 files | -10/+15
VF post-migration recovery is in the path of dma-fence signaling / reclaim; avoid memory allocations in this path. v3: - s/lrc_wa_bb/scratch (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-12-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Abort H2G sends during VF post-migration recovery | Matthew Brost | 1 file | -2/+2
While VF post-migration recovery is in progress, abort H2G sends with -ECANCELED. These messages are treated as lost, and TLB invalidation errors are suppressed. During this phase, the H2G channel is down, and VF recovery requires the CT lock to proceed. v3: - Use xe_gt_recovery_inprogress (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-11-matthew.brost@intel.com
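The abort-and-suppress behaviour can be sketched with a small model. The names `struct ct_model`, `model_h2g_send()` and `model_tlb_invalidate()` are hypothetical and the code is not taken from the xe driver; `ECANCELED` is the standard errno name for a cancelled operation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical CT channel state; not the xe structures. */
struct ct_model {
    bool recovery_pending; /* VF post-migration recovery in progress */
    unsigned int sent;     /* number of messages actually sent */
};

/* Sends are aborted while recovery is pending; the message counts as lost. */
static int model_h2g_send(struct ct_model *ct)
{
    if (ct->recovery_pending)
        return -ECANCELED;
    ct->sent++;
    return 0;
}

/*
 * Caller side: a cancelled TLB invalidation is not reported as an error,
 * since recovery makes the stale request irrelevant.
 */
static int model_tlb_invalidate(struct ct_model *ct)
{
    int err = model_h2g_send(ct);

    return err == -ECANCELED ? 0 : err;
}
```

Returning early also keeps senders from holding the channel while recovery needs exclusive access to it.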
2025-10-09 | drm/xe/vf: Make VF recovery run on per-GT worker | Matthew Brost | 7 files | -257/+185
VF recovery is a per-GT operation, so it makes sense to isolate it to a per-GT queue. Scheduling this operation on the same worker as the GT reset and TDR not only aligns with this design but also helps avoid race conditions, as those operations can also modify the queue state. v2: - Fix lockdep splat (Adam) - Use xe_sriov_vf_migration_supported helper v3: - Drop xe_gt_sriov_ prefix for private functions (Michal) - Drop message in xe_gt_sriov_vf_migration_init_early (Michal) - Logic rework in vf_post_migration_notify_resfix_done (Michal) - Rework init sequence layering (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-10-matthew.brost@intel.com
2025-10-09 | drm/xe/vf: Add xe_gt_recovery_pending helper | Matthew Brost | 6 files | -4/+99
Add xe_gt_recovery_pending helper. This helper serves as the singular point to determine whether a GT recovery is currently in progress. Expected callers include the GuC CT layer and the GuC submission layer. Atomically visible as soon as the vCPUs are unhalted until VF recovery completes. v3: - Add GT layer xe_gt_recovery_inprogress (Michal) - Don't blow up in memirq not enabled (CI) - Add __memirq_received with clear argument (Michal) - xe_memirq_sw_int_0_irq_pending rename (Michal) - Use offset in xe_memirq_sw_int_0_irq_pending (Michal) v4: - Refactor xe_gt_recovery_inprogress logic around memirq (Michal) v5: - s/inprogress/pending (Michal) v7: - Fix typos, adjust comment (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-9-matthew.brost@intel.com
2025-10-09 | drm/xe: Make LRC W/A scratch buffer usage consistent | Matthew Brost | 1 file | -1/+1
The LRC W/A currently checks for LRC being iomem in some places, while in others it checks if the scratch buffer is non-NULL. This inconsistency causes issues with the VF post-migration recovery code, which blindly passes in a scratch buffer. This patch standardizes the check by consistently verifying whether the LRC is iomem to determine if the scratch buffer should be used. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-8-matthew.brost@intel.com
2025-10-09 | drm/xe: Don't change LRC ring head on job resubmission | Matthew Brost | 1 file | -2/+16
Now that we save the job's head during submission, it's no longer necessary to adjust the LRC ring head during resubmission. Instead, a software-based adjustment of the tail will overwrite the old jobs in place. For some odd reason, adjusting the LRC ring head didn't work on parallel queues, which was causing issues in our CI. v5: - Add comment in guc_exec_queue_start explaining why the function works (Auld) v7: - Only adjust first state on first unsignaled job (Auld) v8: - Break unsignaled job handling to separate patch (Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-7-matthew.brost@intel.com
2025-10-09 | drm/xe: Return first unsignaled job first pending job helper | Matthew Brost | 1 file | -4/+17
In all cases where the first pending job helper is called, we only want to retrieve the first unsignaled pending job, as this helper is used exclusively in recovery flows. It is possible for signaled jobs to remain in the pending list as the scheduler is stopped, so those should be skipped. Also, add kernel documentation to clarify this behavior. v8: - Split out into own patch (Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-6-matthew.brost@intel.com
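The skip-signaled behaviour can be illustrated with a plain linked list. This is a sketch under stated assumptions: `struct job_model` and `first_unsignaled_job()` are hypothetical, and the real helper walks the DRM scheduler's pending list rather than this simplified structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending-list entry; not the DRM scheduler's job type. */
struct job_model {
    bool signaled;          /* fence already signaled */
    struct job_model *next; /* next job in submission order */
};

/*
 * Return the first job whose fence has not signaled. Signaled jobs can
 * linger in the list while the scheduler is stopped, so they are skipped.
 * Returns NULL when every pending job has already signaled.
 */
static struct job_model *first_unsignaled_job(struct job_model *pending)
{
    for (struct job_model *job = pending; job; job = job->next) {
        if (!job->signaled)
            return job;
    }
    return NULL;
}
```

A recovery flow that replayed from the list head instead would resubmit work that has already completed.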
2025-10-09 | drm/xe: Track LR jobs in DRM scheduler pending list | Matthew Brost | 4 files | -45/+31
VF migration requires jobs to remain pending so they can be replayed after the VF comes back. Previously, LR job fences were intentionally signaled immediately after submission to avoid the risk of exporting them, as these fences do not naturally signal in a timely manner and could break dma-fence contracts. A side effect of this approach was that LR jobs were never added to the DRM scheduler’s pending list, preventing them from being tracked for later resubmission. We now avoid signaling LR job fences and ensure they are never exported; Xe already guards against exporting these internal fences. With that guarantee in place, we can safely track LR jobs in the scheduler’s pending list so they are eligible for resubmission during VF post-migration recovery (and similar recovery paths). An added benefit is that LR queues now gain the DRM scheduler’s built-in flow control over ring usage rather than rejecting new jobs in the exec IOCTL if the ring is full. v2: - Ensure DRM scheduler TDR doesn't run for LR jobs - Stack variable for killed_or_banned_or_wedged v4: - Clarify commit message (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-5-matthew.brost@intel.com
2025-10-09 | drm/xe/guc: Track pending-enable source in submission state | Matthew Brost | 1 file | -0/+36
Add explicit tracking in the GuC submission state to record the source of a pending enable (TDR vs. queue resume path vs. submission). Disambiguating the origin lets the GuC submission state machine apply the correct recovery/replay behavior. This helps VF restore: when the device comes back, the state machine knows whether the pending enable stems from timeout recovery, from a queue resume sequence, or submission and can gate sequencing and fixups accordingly. v4: - Clarify commit message (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-4-matthew.brost@intel.com
2025-10-09 | drm/xe: Save off position in ring in which a job was programmed | Matthew Brost | 2 files | -4/+24
VF post-migration recovery needs to modify the ring with updated GGTT addresses for pending jobs. Save off the position in the ring at which a job was programmed to facilitate this. v4: - s/VF resume/VF post-migration recovery (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-3-matthew.brost@intel.com
2025-10-09 | drm/xe: Add NULL checks to scratch LRC allocation | Matthew Brost | 1 file | -4/+9
kmalloc can fail, so the returned value must be NULL-checked. The check should come immediately after kmalloc for clarity. v5: - Assert state->buffer in setup_bo if buffer is iomem (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-2-matthew.brost@intel.com
2025-10-09drm/i915/guc: Skip communication warning on reset in progressZhanjun Dong1-1/+8
The GuC IRQ and tasklet handlers receive just a single G2H message and leave the remaining messages to be received by the next tasklet. If a reset starts during this chained tasklet processing, communication will be disabled. Skip the warning for this condition. Fixes: 65dd4ed0f4e1 ("drm/i915/guc: Don't receive all G2H messages in irq handler") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15018 Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250929152904.269776-1-zhanjun.dong@intel.com
2025-10-09drm/i915/alpm: Remove parameters suffix from intel_dp->alpm_parametersJouni Högander3-15/+15
Now that intel_dp->alpm_parameters doesn't really contain any parameters, it doesn't make sense to call it alpm_parameters, so remove the parameters suffix. Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Animesh Manna <animesh.manna@intel.com> Link: https://lore.kernel.org/r/20250929130003.28365-2-jouni.hogander@intel.com
2025-10-09drm/i915/alpm: Compute ALPM parameters into crtc_state->alpm_stateJouni Högander4-42/+53
Currently ALPM parameters are computed directly into intel_dp->alpm_parameters. This is a problem when compute config ends up not using the computed state. Fix this by adding ALPM parameters into intel_crtc_state and computing them there. Copy needed parameters (io_wake_lines and fast_wake_lines used by PSR activate/exit) from crtc_state->alpm_state into intel_dp->alpm.alpm_parameters when they are configured into HW. v3: - enhance commit message v2: - store io/fast wake lines into intel_dp->dp instead of intel_dp->alpm_parameters and do it in intel_psr_enable_locked - rename crtc_state->alpm_parameters -> crtc_state->alpm_state - clarify commit message Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Animesh Manna <animesh.manna@intel.com> Link: https://lore.kernel.org/r/20250929130003.28365-1-jouni.hogander@intel.com
2025-10-09drm/virtgpu: Use vblank timerThomas Zimmermann1-2/+27
Use a vblank timer to simulate the vblank interrupt. The DRM vblank helpers provide an implementation on top of Linux' hrtimer. Virtgpu enables and disables the timer as part of the CRTC. The atomic_flush callback sets up the event. Like vblank interrupts, the vblank timer fires at the rate of the display refresh. Most userspace limits its page flip rate according to the DRM vblank event. Virtgpu's virtual hardware does not provide vblank interrupts, so DRM sends each event ASAP. With the fast access times of virtual display memory, the event rate is much higher than the display mode's refresh rate, so the next page flip is created almost immediately. This leads to excessive CPU overhead from even small display updates, such as moving the mouse pointer. This problem affects virtgpu and all other virtual displays. See [1] for a discussion in the context of hypervdrm. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/dri-devel/SN6PR02MB415702B00D6D52B0EE962C98D46CA@SN6PR02MB4157.namprd02.prod.outlook.com/ # [1] Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://lore.kernel.org/r/20251008130701.246988-1-tzimmermann@suse.de
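To make the pacing argument concrete, the arithmetic behind a timer that fires at the display refresh rate can be sketched as follows. This is only the timing math in userspace C; it does not use the DRM vblank helpers or hrtimer, and the function names frame_period_ns and next_vblank_ns are invented for this example.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Duration of one frame in nanoseconds for a given refresh rate in Hz. */
static uint64_t frame_period_ns(uint32_t refresh_hz)
{
	return NSEC_PER_SEC / refresh_hz;
}

/*
 * Time of the first simulated vblank strictly after 'now', for a timer
 * that started at 'start' and fires once every 'period' nanoseconds.
 * Assumes now >= start and period > 0.
 */
static uint64_t next_vblank_ns(uint64_t now, uint64_t start, uint64_t period)
{
	uint64_t elapsed_frames = (now - start) / period;

	return start + (elapsed_frames + 1) * period;
}
```

With a 60 Hz mode the period is about 16.67 ms, so a page flip completed shortly after one simulated vblank has its event delivered at the next period boundary instead of immediately, which is what throttles userspace to the refresh rate.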
2025-10-09drm/virtio: Handle drm_crtc_init_with_planes() errorsAlexandr Sapozhnikov1-2/+5
Return value of function drm_crtc_init_with_planes(), called by vgdev_output_init(), is not checked, but it is usually checked for this function. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Alexandr Sapozhnikov <alsp705@gmail.com> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> [dmitry.osipenko@collabora.com: coding style fix, edit commit message] Link: https://lore.kernel.org/r/20250922144418.41-1-alsp705@gmail.com
2025-10-08drm/xe: Move declarations under conditional branchTejas Upadhyay1-3/+3
The xe_device_shutdown() function had a few declarations that were only needed under a specific condition. Move those declarations into that conditional branch to avoid unnecessary declarations. Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20251007100208.1407021-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-10-08drm: atmel-hlcdc: replace dev_* print functions with drm_* variantsEslam Khafagy4-21/+23
Update the Atmel HLCDC code to use DRM print macros drm_*() instead of dev_warn() and dev_err(). This change ensures consistency with DRM subsystem logging conventions [1]. [1] Link: https://docs.kernel.org/gpu/todo.html#convert-logging-to-drm-functions-with-drm-device-parameter Signed-off-by: Eslam Khafagy <eslam.medhat1993@gmail.com> Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com> Link: https://lore.kernel.org/r/20250813224000.130292-1-eslam.medhat1993@gmail.com Signed-off-by: Dharma Balasubiramani <dharma.b@microchip.com>