Age | Commit message | Author | Files | Lines
2026-01-12 | drm/xe/gsc: Make GSC FW load optional for newer platforms | Daniele Ceraolo Spurio | 3 | -10/+15
On newer platforms GSC FW is only required for content protection features, so the core driver features work perfectly fine without it (and we did in fact not enable it to start with on PTL). Therefore, we can selectively enable the GSC only if the FW is found on disk, without failing if it is not found. Note that this means that the FW can now be enabled (i.e., we're looking for it) but not available (i.e., we haven't found it), so checks on FW support should use the latter state to decide whether to go on or not. As part of the rework, the message for FW not found has been cleaned up to be more readable. While at it, drop the comment about xe_uc_fw_init() since the code has been reworked and the statement no longer applies. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patch.msgid.link/20260108011340.2562349-6-daniele.ceraolospurio@intel.com
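The distinction between "enabled" and "available" can be sketched as below. This is only an illustration under assumptions: the enum `fw_status` and the helpers `fw_is_enabled()`, `fw_is_available()` and `optional_fw_load()` are hypothetical names, not the driver's API; only the two states and the rule "check availability, do not fail when the file is missing" come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; names are illustrative, not taken from the driver. */
enum fw_status {
	FW_DISABLED,	/* the driver is not looking for the firmware */
	FW_ENABLED,	/* the driver is looking for it, but has not found it */
	FW_AVAILABLE,	/* the firmware file was found on disk */
};

static bool fw_is_enabled(enum fw_status s)
{
	return s != FW_DISABLED;
}

static bool fw_is_available(enum fw_status s)
{
	return s == FW_AVAILABLE;
}

/*
 * Optional load: a missing firmware is not an error. The decision to go on
 * is based on availability, not on whether the firmware is merely enabled.
 */
static int optional_fw_load(enum fw_status s, bool *loaded)
{
	assert(loaded != NULL);

	*loaded = false;
	if (!fw_is_available(s))
		return 0;	/* enabled but not found: skip, do not fail */

	*loaded = true;
	return 0;
}
```

A caller that gated on `fw_is_enabled()` alone would try to use firmware that was never found, which is the case the commit message warns about.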
2026-01-12 | drm/xe/device: Convert wait for lmem init into an assert | Balasubramani Vivekanandan | 1 | -57/+16
Prior to lmem init check, driver is waiting for the pcode uncore_init status. uncore_init status will be flagged after the complete boot and initialization of the SoC by the pcode. uncore_init confirms that lmem init and mmio unblock has been already completed. It makes no sense to check for lmem init after the pcode uncore_init check. So change the wait for lmem init check into an assert which confirms lmem init is set. Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251219145024.2955946-2-balasubramani.vivekanandan@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-01-12 | drm/xe: Privatize xe_ggtt_node | Maarten Lankhorst | 2 | -18/+19
Nothing requires it any more, make the member private. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-16-dev@lankhorst.se
2026-01-12 | drm/xe: Improve xe_gt_sriov_pf_config GGTT handling | Maarten Lankhorst | 3 | -7/+20
Do not directly dereference xe_ggtt_node, and add a function to retrieve the allocated GGTT size. Reviewed-by: Matthew.brost@intel.com Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-15-dev@lankhorst.se
2026-01-12 | drm/xe: Do not dereference ggtt_node in xe_bo.c | Maarten Lankhorst | 1 | -3/+3
A careful inspection of __xe_ggtt_insert_bo_at() shows that the ggtt_node can always be seen as inserted from xe_bo.c due to the way error handling is performed. The checks are also a little bit too paranoid, since we never create a bo with ggtt_node[id] initialised but not inserted into the GGTT, which can be seen by looking at __xe_ggtt_insert_bo_at(). Additionally, the size of the GGTT is never bigger than 4 GB, so adding a check at that level is incorrect. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108101014.579906-14-dev@lankhorst.se
2026-01-12 | drm/xe/display: Avoid dereferencing xe_ggtt_node | Maarten Lankhorst | 3 | -5/+5
Start using xe_ggtt_node_addr, and avoid comparing the base offset as vma->node is dynamically allocated. Also sneak in a xe_bo_size() for stolen, too small to put as separate commit. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260108101014.579906-13-dev@lankhorst.se
2026-01-12 | drm/xe: Add xe_ggtt_node_addr() to avoid dereferencing xe_ggtt_node | Maarten Lankhorst | 3 | -3/+18
This function makes it possible to add an offset that is applied to all xe_ggtt_node's, and hides the internals from all its users. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108101014.579906-12-dev@lankhorst.se
2026-01-12 | drm/xe: Convert xe_fb_pin to use a callback for insertion into GGTT | Maarten Lankhorst | 4 | -81/+131
The rotation details belong in xe_fb_pin.c, while the operations involving GGTT belong to xe_ggtt.c. As directly locking xe_ggtt etc results in exposing all of xe_ggtt details anyway, create a special function that allocates a ggtt_node, and allow display to populate it using a callback as a compromise. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-11-dev@lankhorst.se
2026-01-12 | drm/xe: Start using ggtt->start in preparation of balloon removal | Maarten Lankhorst | 5 | -34/+60
Instead of having ggtt->size point to the end of ggtt, have ggtt->size be the actual size of the GGTT, and introduce ggtt->start to point to the beginning of GGTT. This will allow a massive cleanup of GGTT in case of SRIOV-VF. Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-10-dev@lankhorst.se
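The start/size convention described above can be sketched as follows. This is a minimal illustration, not the driver's code: the struct `ggtt_range` and the helpers `ggtt_range_end()` and `ggtt_range_contains()` are hypothetical names, and only the meaning of `start` and `size` is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields; not the real struct xe_ggtt. */
struct ggtt_range {
	uint64_t start;	/* first address of the GGTT */
	uint64_t size;	/* actual size in bytes, no longer the end address */
};

/* The (exclusive) end address is derived instead of being stored in size. */
static uint64_t ggtt_range_end(const struct ggtt_range *r)
{
	assert(r->size <= UINT64_MAX - r->start);	/* no wrap-around */
	return r->start + r->size;
}

/* True if [addr, addr + len) lies entirely inside the range. */
static bool ggtt_range_contains(const struct ggtt_range *r,
				uint64_t addr, uint64_t len)
{
	return addr >= r->start &&
	       len <= r->size &&
	       addr - r->start <= r->size - len;
}
```

Keeping `size` as a true size means a range that does not begin at zero needs no special handling beyond a non-zero `start`.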
2026-01-12 | drm/xe/mert: Move MERT initialization to xe_mert.c | Michal Wajdeczko | 3 | -3/+15
Most of the MERT code is already in dedicated file, no reason to keep internal MERT data structure initialization elsewhere. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-6-michal.wajdeczko@intel.com
2026-01-12 | drm/xe/mert: Use local mert variable to simplify the code | Michal Wajdeczko | 1 | -5/+6
There is no need to always refer to MERT data using tile pointer. Use of local mert pointer will simplify the code and make it look like other existing MERT function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-5-michal.wajdeczko@intel.com
2026-01-12 | drm/xe/mert: Always refer to MERT using xe_device | Michal Wajdeczko | 3 | -8/+6
There is only one MERT instance and while it is located on the root tile, it is safer to refer to it using xe_device rather than xe_tile. This will also allow to align signature with other MERT function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260111213847.27869-1-michal.wajdeczko@intel.com
2026-01-12 | drm/xe/mert: Fix kernel-doc for struct xe_mert | Michal Wajdeczko | 1 | -1/+4
Add a simple top-level kernel-doc for the struct itself so the script can recognize it, and fix the tag of the one member. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-3-michal.wajdeczko@intel.com
2026-01-12 | drm/xe/mert: Normalize xe_mert.h include guards | Michal Wajdeczko | 1 | -3/+3
Most of our header files are using include guard names with single underscore and we don't use trailing comments on final #endif. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-2-michal.wajdeczko@intel.com
2026-01-11 | drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR | Matthew Brost | 6 | -96/+78
We now have proper infrastructure to accurately check the LRC timestamp without toggling the scheduling state for non-VFs. For VFs, it is still possible to get an inaccurate view if the context is on hardware. We guard against free-running contexts on VFs by banning jobs whose timestamps are not moving. In addition, VFs have a timeslice quantum that naturally triggers context switches when more than one VF is running, thus updating the LRC timestamp. For multi-queue, it is desirable to avoid scheduling toggling in the TDR because this scheduling state is shared among many queues. Furthermore, this change simplifies the GuC state machine. The trade-off for VF cases seems worthwhile. v5: - Add xe_lrc_timestamp helper (Umesh) v6: - Reduce number of tries on stuck timestamp (VF testing) - Convert job timestamp save to a memory copy (VF testing) v7: - Save ctx timestamp to LRC when start VF job (VF testing) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-8-matthew.brost@intel.com
2026-01-11 | drm/xe: Disable timestamp WA on VFs | Matthew Brost | 1 | -0/+3
The timestamp WA does not work on a VF because it requires reading MMIO registers, which are inaccessible on a VF. This timestamp WA confuses LRC sampling on a VF during TDR, as the LRC timestamp would always read as 1 for any active context. Disable the timestamp WA on VFs to avoid this confusion. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization") Link: https://patch.msgid.link/20260110012739.2888434-7-matthew.brost@intel.com
2026-01-11 | drm/xe: Remove special casing for LR queues in submission | Matthew Brost | 3 | -127/+11
Now that LR jobs are tracked by the DRM scheduler, there's no longer a need to special-case LR queues. This change removes all LR queue-specific handling, including dedicated TDR logic, reference counting schemes, and other related mechanisms. v4: - Remove xe_exec_queue_lr_cleanup tracepoint (Niranjana) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-6-matthew.brost@intel.com
2026-01-11 | drm/xe: Do not deregister queues in TDR | Matthew Brost | 1 | -60/+10
Deregistering queues in the TDR introduces unnecessary complexity, requiring reference-counting techniques to function correctly, particularly to prevent use-after-free (UAF) issues while a deregistration initiated from the TDR is in progress. All that's needed in the TDR is to kick the queue off the hardware, which is achieved by disabling scheduling. Queue deregistration should be handled in a single, well-defined point in the cleanup path, tied to the queue's reference count. v4: - Explain why extra ref were needed prior to this patch (Niranjana) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-5-matthew.brost@intel.com
2026-01-11 | drm/xe: Only toggle scheduling in TDR if GuC is running | Matthew Brost | 1 | -1/+2
If the firmware is not running during TDR (e.g., when the driver is unloading), there's no need to toggle scheduling in the GuC. In such cases, skip this step. v4: - Bail on wait UC not running (Niranjana) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-4-matthew.brost@intel.com
2026-01-11 | drm/xe: Stop abusing DRM scheduler internals | Matthew Brost | 6 | -120/+27
Use new pending job list iterator and new helper functions in Xe to avoid reaching into DRM scheduler internals. Part of this change involves removing pending jobs debug information from debugfs and devcoredump. As agreed, the pending job list should only be accessed when the scheduler is stopped. However, it's not straightforward to determine whether the scheduler is stopped from the shared debugfs/devcoredump code path. Additionally, the pending job list provides little useful information, as pending jobs can be inferred from seqnos and ring head/tail positions. Therefore, this debug information is being removed. v4: - Add comment around DRM_GPU_SCHED_STAT_NO_HANG (Niranjana) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-3-matthew.brost@intel.com
2026-01-11 | drm/xe: Add dedicated message lock | Matthew Brost | 3 | -4/+7
Stop abusing DRM scheduler job list lock for messages, add dedicated message lock. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Acked-by: Philipp Stanner <phasta@kernel.org> Link: https://patch.msgid.link/20260110012739.2888434-2-matthew.brost@intel.com
2026-01-10 | drm/xe: Allow compressible surfaces to be 1-way coherent | Xin Wang | 7 | -14/+109
Previously, compressible surfaces were required to be non-coherent (allocated as WC) because compression and coherency were mutually exclusive. Starting with Xe3, hardware supports combining compression with 1-way coherency, allowing compressible surfaces to be allocated as WB memory. This provides applications with more efficient memory allocation by avoiding WC allocation overhead that can cause system stuttering and memory management challenges. The implementation adds support for compressed+coherent PAT entry for the xe3_lpg devices and updates the driver logic to handle the new compression capabilities. v2: (Matthew Auld) - Improved error handling with XE_IOCTL_DBG() - Enhanced documentation and comments - Fixed xe_bo_needs_ccs_pages() outdated compression assumptions v3: - Improve WB compression support detection by checking PAT table instead of version check v4: - Add XE_CACHE_WB_COMPRESSION, which simplifies the logic. v5: - Use U16_MAX for the invalid PAT index. (Matthew Auld) Bspec: 71582, 59361, 59399 Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Xin Wang <x.wang@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260109093007.546784-1-x.wang@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-01-09 | drm/xe: improve header check | Jani Nikula | 1 | -1/+2
Improve header check: Remove unused -DHDRTEST. Include the header twice to check for include guards. Run kernel-doc on the header. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260107155401.2379127-5-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-01-09 | drm/xe/vm: fix xe_vm_validation_exec() kernel-doc | Jani Nikula | 1 | -1/+1
Fix kernel-doc warnings on xe_vm_validation_exec(): Warning: ../drivers/gpu/drm/xe/xe_vm.h:392 expecting prototype for xe_vm_set_validation_exec(). Prototype was for xe_vm_validation_exec() instead Fixes: 0131514f9789 ("drm/xe: Pass down drm_exec context to validation") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260107155401.2379127-4-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-01-09 | drm/xe/xe_late_bind_fw: fix enum xe_late_bind_fw_id kernel-doc | Jani Nikula | 1 | -1/+3
Fix kernel-doc warnings on enum xe_late_bind_fw_id: Warning: ../drivers/gpu/drm/xe/xe_late_bind_fw_types.h:19 cannot understand function prototype: 'enum xe_late_bind_fw_id' Fixes: 45832bf9c10f ("drm/xe/xe_late_bind_fw: Initialize late binding firmware") Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patch.msgid.link/20260107155401.2379127-3-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-01-09 | drm/xe/vf: fix struct xe_gt_sriov_vf_migration kernel-doc | Jani Nikula | 1 | -2/+2
Fix kernel-doc warnings on struct xe_gt_sriov_vf_migration: Warning: ../drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h:47 cannot understand function prototype: 'struct xe_gt_sriov_vf_migration' Fixes: e1d2e2d878bf ("drm/xe/vf: Add xe_gt_recovery_pending helper") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260107155401.2379127-2-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-01-09 | drm/xe/guc: fix struct guc_lfd_file_header kernel-doc | Jani Nikula | 1 | -3/+2
Fix kernel-doc warnings on struct guc_lfd_file_header: Warning: ../drivers/gpu/drm/xe/abi/guc_lfd_abi.h:168 expecting prototype for struct guc_logfile_header. Prototype was for struct guc_lfd_file_header instead Fixes: 7eeb0e5408bd ("drm/xe/guc: Add LFD related abi definitions") Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260107155401.2379127-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-01-09 | drm/xe: Add page reclamation related stats | Brian Nguyen | 6 | -0/+20
Add page reclaim list (PRL) related stats to GT stats to assist in debugging and tuning of page reclaim related actions. Include counters of page sizes added to PRL and if PRL action is issued. v2: - Add PRL_ABORTED_COUNT stats and corresponding changes. (Matthew B) Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260107010447.4125005-10-brian3.nguyen@intel.com
2026-01-09 | drm/xe: Fix page reclaim entry handling for large pages | Brian Nguyen | 1 | -17/+47
For 64KB pages, XE_PTE_PS64 is defined for all consecutive 4KB pages and are all considered leaf nodes, so existing check was falsely adding multiple 64KB pages to PRL. For larger entries such as 2MB PDE, the check for pte->base.children is insufficient since this array is always defined for page directory, level 1 and above, so perform a check on the entry itself pointing to the correct page. For unmaps, if the range is properly covered by the page full directory, page walker may finish without walking to the leaf nodes. For example, a 1G range can be fully covered by 512 2MB pages if alignment allows. In this case, the page walker will walk until it reaches this corresponding directory which can correlate to the 1GB range. Page walker will simply complete its walk and the individual 2MB PDE leaves won't get accessed. In this case, PRL invalidation is also required, so add a check to see if pt entry cover the entire range since the walker will complete the walk. There are possible race conditions that will cause driver to read a pte that hasn't been written to yet. The 2 scenarios are: - Another issued TLB invalidation such as from userptr or MMU notifier. - Dependencies on original bind that has yet to be executed with an unbind on that job. The expectation is these race conditions are likely rare cases so simply perform a fallback to full PPC flush invalidation instead. v2: - Reword commit and updated zero-pte handling. (Matthew B) v3: - Rework if statement for abort case with additional comments. (Matthew B) Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind") Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260107010447.4125005-9-brian3.nguyen@intel.com
2026-01-09 | drm/xe: Add explicit abort page reclaim list | Brian Nguyen | 2 | -12/+28
A PRL could be invalidated to indicate that it is being dropped from the current scope while still being valid. So standardize the calls and add an explicit abort to clearly define when an invalidation is a real abort and the PRL should fall back. v3: - Update abort function to macro. (Matthew B) Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260107010447.4125005-8-brian3.nguyen@intel.com
2026-01-09 | drm/xe: Remove debug comment in page reclaim | Brian Nguyen | 1 | -1/+0
Drop debug comment erroneously added in patch commit. Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260107010447.4125005-7-brian3.nguyen@intel.com
2026-01-09 | drm/xe: fix WQ_MEM_RECLAIM passed as max_active to alloc_workqueue() | Marco Crivellari | 1 | -1/+1
Workqueue xe-ggtt-wq has been allocated using WQ_MEM_RECLAIM, but the flag has been passed as 3rd parameter (max_active) instead of 2nd (flags) creating the workqueue as per-cpu with max_active = 8 (the WQ_MEM_RECLAIM value). So change this by set WQ_MEM_RECLAIM as the 2nd parameter with a default max_active. Fixes: 60df57e496e4 ("drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM") Cc: stable@vger.kernel.org Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108180148.423062-1-marco.crivellari@suse.com
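The argument mix-up can be reproduced with a small userspace mock. This is a sketch under assumptions, not kernel code: `mock_alloc_workqueue()`, `struct mock_wq` and `MOCK_WQ_MEM_RECLAIM` are hypothetical stand-ins that only mirror the parameter order (name, flags, max_active) and the flag value 8 given in the commit message, and passing 0 as max_active is assumed to mean "use the default".

```c
#include <assert.h>
#include <stdbool.h>

/* Mock flag; the value 8 matches what the commit message states. */
#define MOCK_WQ_MEM_RECLAIM (1u << 3)

struct mock_wq {
	const char *name;
	unsigned int flags;
	int max_active;
};

/* Same parameter order as described in the commit: name, flags, max_active. */
static struct mock_wq mock_alloc_workqueue(const char *name,
					   unsigned int flags, int max_active)
{
	struct mock_wq wq = { name, flags, max_active };

	assert(max_active >= 0);
	return wq;
}

static bool mock_wq_is_mem_reclaim(const struct mock_wq *wq)
{
	return (wq->flags & MOCK_WQ_MEM_RECLAIM) != 0;
}

/* Buggy call: the flag lands in max_active and flags stay 0. */
static struct mock_wq make_wq_buggy(void)
{
	return mock_alloc_workqueue("xe-ggtt-wq", 0, MOCK_WQ_MEM_RECLAIM);
}

/* Fixed call: the flag is in the flags slot, 0 requests the default limit. */
static struct mock_wq make_wq_fixed(void)
{
	return mock_alloc_workqueue("xe-ggtt-wq", MOCK_WQ_MEM_RECLAIM, 0);
}
```

Both calls compile without complaint because the flag is just an integer, which is why the swap went unnoticed: the buggy queue ends up with max_active equal to 8 and no reclaim flag at all.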
2026-01-09 | drm/xe: Add missing newlines to drm_warn messages | Osama Abdelkader | 1 | -7/+7
The drm_warn() calls in the default cases of various switch statements in xe_vm.c were missing trailing newlines, which can cause log messages to be concatenated with subsequent output. Add '\n' to all affected messages. Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Link: https://patch.msgid.link/20251224212116.59021-1-osama.abdelkader@gmail.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-09 | drm/xe/pf: Allow upon-any-hang wedged mode only in debug config | Lukasz Laguna | 1 | -1/+2
The GuC reset policy is global, so disabling it on PF can affect all running VFs. To avoid unintended side effects, restrict setting upon-any-hang (2) wedged mode on the PF to debug builds only. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260107174741.29163-5-lukasz.laguna@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-09 | drm/xe/vf: Disallow setting wedged mode to upon-any-hang | Lukasz Laguna | 1 | -0/+5
In upon-any-hang (2) wedged mode, engine resets need to be disabled, which requires changing the GuC reset policy. VFs are not permitted to do that. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260107174741.29163-4-lukasz.laguna@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-09 | drm/xe: Update wedged.mode only after successful reset policy change | Lukasz Laguna | 4 | -20/+71
Previously, the driver's internal wedged.mode state was updated without verifying whether the corresponding engine reset policy update in GuC succeeded. This could leave the driver reporting a wedged.mode state that doesn't match the actual reset behavior programmed in GuC. With this change, the reset policy is updated first, and the driver's wedged.mode state is modified only if the policy update succeeds on all available GTs. This patch also introduces two functional improvements: - The policy is sent to GuC only when a change is required. An update is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG, because only in that case the reset policy changes. For example, switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no need to send the same value to GuC. - An inconsistent_reset flag is added to track cases where reset policy update succeeds only on a subset of GTs. If such inconsistency is detected, future wedged mode configuration will force a retry of the reset policy update to restore a consistent state across all GTs. Fixes: 6b8ef44cc0a9 ("drm/xe: Introduce the wedged_mode debugfs") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260107174741.29163-3-lukasz.laguna@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
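The "send only when required" rule can be sketched as a small predicate. This is an illustration under assumptions, not the patch itself: the function `needs_reset_policy_update()` is a hypothetical name, the enumerators are shortened from the XE_WEDGED_MODE_* names in the commit message, and the values 0 and 1 are assumed (only upon-any-hang being 2 is stated in this log).

```c
#include <assert.h>
#include <stdbool.h>

/* Shortened enumerator names; values other than 2 are assumptions. */
enum wedged_mode {
	WEDGED_MODE_NEVER = 0,
	WEDGED_MODE_UPON_CRITICAL_ERROR = 1,
	WEDGED_MODE_UPON_ANY_HANG = 2,
};

/*
 * The reset policy differs only for upon-any-hang, so an update is needed
 * when entering or leaving that mode, or when an earlier update succeeded
 * on only a subset of GTs and a retry is forced.
 */
static bool needs_reset_policy_update(enum wedged_mode old_mode,
				      enum wedged_mode new_mode,
				      bool inconsistent_reset)
{
	bool was_any_hang = old_mode == WEDGED_MODE_UPON_ANY_HANG;
	bool is_any_hang = new_mode == WEDGED_MODE_UPON_ANY_HANG;

	assert(new_mode <= WEDGED_MODE_UPON_ANY_HANG);	/* validated input */
	return inconsistent_reset || was_any_hang != is_any_hang;
}
```

In this sketch the caller would store the new mode only after the update (when one is needed) has succeeded on every GT, and would set the inconsistency flag if it succeeded on only some of them.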
2026-01-09 | drm/xe: Validate wedged_mode parameter and define enum for modes | Lukasz Laguna | 9 | -20/+94
Check correctness of the wedged_mode parameter input to ensure only supported values are accepted. Additionally, replace magic numbers with a clearly defined enum. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260107174741.29163-2-lukasz.laguna@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 | drm/xe/pm: Handle GT resume failure | Raag Jadav | 1 | -4/+22
We've been historically ignoring GT resume failure. Since the function can return error, handle it properly. v2: Bring up display before bailing (Matt Roper, Rodrigo) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20251220073657.166810-1-raag.jadav@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 | drm/xe/nvls: Define GuC firmware for NVL-S | Matt Roper | 1 | -0/+1
Although NVL-S has a similar Xe3 to PTL/WCL, it requires a unique GuC firmware. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-12-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patch.msgid.link/20260108181956.1254908-9-julia.filipchuk@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 | drm/pagemap: Disable device-to-device migration | Matthew Brost | 1 | -2/+12
Device-to-device migration is causing xe_exec_system_allocator --r *race*no* to intermittently fail with engine resets and a kernel hang on a page lock. This should work but is clearly buggy somewhere. Disable device-to-device migration in the interim until the issue can be root-caused. The only downside of disabling device-to-device migration is that memory will bounce through system memory during migration. However, this path should be rare, as it only occurs when madvise attributes are changed or atomics are used. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: ec265e1f1cfc ("drm/pagemap: Support source migration over interconnect") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20260107182716.2236607-3-matthew.brost@intel.com
2026-01-08 | drm/pagemap Fix error paths in drm_pagemap_migrate_to_devmem | Matthew Brost | 1 | -3/+5
Avoid unlocking and putting device pages unless they were successfully locked, and do not calculate migrated_pages on error paths. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 75af93b3f5d0 ("drm/pagemap, drm/xe: Support destination migration over interconnect") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20260107182716.2236607-2-matthew.brost@intel.com
2026-01-08 | drm/xe: Adjust page count tracepoints in shrinker | Matthew Brost | 1 | -2/+7
Page accounting can change via the shrinker without calling xe_ttm_tt_unpopulate(), which normally updates page count tracepoints through update_global_total_pages. Add a call to update_global_total_pages when the shrinker successfully shrinks a BO. v2: - Don't adjust global accounting when pinning (Stuart) Cc: stable@vger.kernel.org Fixes: ce3d39fae3d3 ("drm/xe/bo: add GPU memory trace points") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260107205732.2267541-1-matthew.brost@intel.com
2026-01-08 | Merge drm/drm-next into drm-xe-next | Rodrigo Vivi | 708 | -6643/+12921
Bring some drm-scheduler patches to Xe. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-07 | drm/xe: Validate preferred system memory placement in xe_svm_range_validate | Matthew Brost | 1 | -0/+2
Ensure preferred system memory placement is checked in xe_svm_range_validate when dpagemap is NULL. Without this check, a prefetch to system memory may become a no-op because device memory is considered a valid placement. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 238dbc9d9f4a ("drm/xe: Use the vma attibute drm_pagemap to select where to migrate") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patch.msgid.link/20260106213443.1866797-1-matthew.brost@intel.com
2026-01-06 | drm/xe/doc: Remove KEEP_ACTIVE feature | Niranjana Vishwanathapura | 1 | -3/+2
The KEEP_ACTIVE feature is being reverted, update documentation. Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260106191051.2866538-6-niranjana.vishwanathapura@intel.com
2026-01-06 | Revert "drm/xe/multi_queue: Support active group after primary is destroyed" | Niranjana Vishwanathapura | 5 | -69/+3
This reverts commit 3131a43ecb346ae3b5287ee195779fc38c6fcd11. There is no must have requirement for this feature from Compute UMD. Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260106191051.2866538-5-niranjana.vishwanathapura@intel.com
2026-01-05 | drm/xe/i2c: Force polling mode in survivability | Raag Jadav | 2 | -4/+7
SGUnit interrupts are not initialized in survivability. Force I2C controller to polling mode while in survivability. v2: Use helper function instead of manual check (Riana) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://patch.msgid.link/20260105080750.16605-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-01-01 | Merge tag 'drm-xe-next-2025-12-30' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next | Dave Airlie | 62 | -483/+3879
Core Changes:
- Dynamic pagemaps and multi-device SVM (Thomas)
Driver Changes:
- Introduce SRIOV scheduler Groups (Daniele)
- Configure migration queue as low latency (Francois)
- Don't use absolute path in generated header comment (Calvin Owens)
- Add SoC remapper support for system controller (Umesh)
- Insert compiler barriers in GuC code (Jonathan)
- Rebar updates (Lucas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aVOiULyYdnFbq-JB@fedora
2025-12-27 | Merge tag 'drm-xe-next-2025-12-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next | Dave Airlie | 152 | -1498/+5329
[airlied: fix guc submit double definition]
UAPI Changes:
- Multi-Queue support (Niranjana)
- Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE (Brost)
- Add NO_COMPRESSION BO flag and query capability (Sanjay)
- Add gt_id to struct drm_xe_oa_unit (Ashutosh)
- Expose MERT OA unit (Ashutosh)
- Sysfs Survivability refactor (Riana)
Cross-subsystem Changes:
- VFIO: Add device specific vfio_pci driver variant for Intel graphics (Winiarski)
Driver Changes:
- MAINTAINERS update (Lucas -> Matt)
- Add helper to query compression enable status (Xin)
- Xe_VM fixes and updates (Shuicheng, Himal)
- Documentation fixes (Winiarski, Swaraj, Niranjana)
- Kunit fix (Roper)
- Fix potential leaks, uaf, null deref, and oversized allocations (Shuicheng, Sanjay, Mika, Tapani)
- Other minor fixes like kbuild duplication and sysfs_emit (Shuicheng, Madhur)
- Handle msix vector0 interrupt (Venkata)
- Scope-based forcewake and runtime PM (Roper, Raag)
- GuC/HuC related fixes and refactors (Lucas, Zhanjun, Brost, Julia, Wajdeczko)
- Fix conversion from clock ticks to milliseconds (Harish)
- SRIOV PF: Add support for MERT (Lukasz)
- Enable SR-IOV VF migration and other SRIOV updates (Winiarski, Satya, Brost, Wajdeczko, Piotr, Tomasz, Daniele)
- Optimize runtime suspend/resume and other PM improvements (Raag)
- Some W/a additions and updates (Bala, Harish, Roper)
- Use for_each_tlb_inval() to calculate invalidation fences (Roper)
- Fix VFIO link error (Arnd)
- Fix drm_gpusvm_init() arguments (Arnd)
- Other OA refactor (Ashutosh)
- Refactor PAT and expose debugfs (Xin)
- Enable Indirect Ring State for xe3p_xpc (Niranjana)
- MEI interrupt fix (Junxiao)
- Add stats for mode switching on hw_engine_group (Francois)
- DMA-Buf related changes (Thomas)
- Multi Queue feature support (Niranjana)
- Enable I2C controller for Crescent Island (Raag)
- Enable NVM for Crescent Island (Sasha)
- Increase TDF timeout (Jagmeet)
- Restore engine registers before restarting schedulers after GT reset (Jan)
- Page Reclamation Support for Xe3p Platforms (Brian, Brost, Oak)
- Fix performance when pagefaults and 3d/display share resources (Brost)
- More OA MERT work (Ashutosh)
- Fix return values (Dan)
- Some log level and messages improvements (Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aUXUhEgzs6hDLQuu@intel.com
2025-12-27 | Merge tag 'drm-intel-next-2025-12-19' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next | Dave Airlie | 176 | -2506/+3796
Beyond Display related:
- Switch to use kernel standard fault injection in i915 (Juha-Pekka)
Display uAPI related:
- Display uapi vs. hw state fixes (Ville)
- Expose sharpness only if num_scalers is >= 2 (Nemesa)
Display related:
- More display driver refactor and clean-ups, especially towards separation (Jani)
- Add initial support Xe3p_LPD for NVL (Gustavo, Sai)
- BMG FBC W/a (Vinod)
- RPM fix (Dibin)
- Add MTL+ platforms to support dpll framework (Mika, Imre)
- Other PLL related fixes (Imre)
- Fix DIMM_S DRAM decoding on ICL (Ville)
- Async flip refactor (Ville, Jouni)
- Go back to using AUX interrupts (Ville)
- Reduce severity of failed DII FEC enabling (Grzelak)
- Enable system cache support for FBC (Vinod)
- Move PSR/Panel Replay sink data into intel_connector and other PSR changes (Jouni)
- Detect AuxCCS support via display parent interface (Tvrtko)
- Clean up link BW/DSC slice config computation (Imre)
- Toggle powerdown states for C10 on HDMI (Gustavo)
- Add parent interface for PC8 forcewake tricks (Ville)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aUW3bVDdE63aSFOJ@intel.com