summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/xe
AgeCommit message (Collapse)AuthorFilesLines
2025-07-11drm/xe: Add infrastructure for Device OOB workaroundsMatt Atwood5-2/+108
Some workarounds need to be able to be applied ahead of any GT initialization for example 15015404425. This patch creates XE_DEVICE_WA macro, in the same vein as XE_WA. This macro can be used ahead of GT initialization, and can be tracked in sysfs. This should alleviate some of the complexities that exist in i915. v2: name change SoC to Device, address style issues v5: split into separate patch from RTP changes, put oob within a struct, move the initiation of oob workarounds into xe_device_probe_early(), clean up the comments around XE_WA. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-5-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11drm/xe: add new type to RTP contextMatt Atwood3-1/+36
Prepare the RTP context to be used before GT init. Add the xe device as a type, put WARN_ONs to protect existing RTP_MATCHes. v5: split out into separate patch, change definition order v6: catch missing cases for checking gt init Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-4-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11drm/xe: add xe_device_wa infrastructureMatt Atwood2-1/+8
There are some workarounds that must be appplied before gt init, wa_15015404425 for example. Instead of sprinking them conditionally throughout the driver as we did for i915 generate an oob.rules file reusing the RTP infrastructure to make these easier to track. v2: rename xe_soc_wa to xe_device_wa v5: derive prefix from argument rather than hard coding the values. v6: split out xe_gen-wa_oob changes Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-3-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11drm/xe: prepare xe_gen_wa_oob to be multi-useMatt Atwood1-7/+38
There is a need for additional oob rules files. Make the current gen file more robust to support more files. Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250709221605.172516-2-matthew.s.atwood@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10drm/xe/guc: Cancel ongoing H2G requests when stopping CTMichal Wajdeczko1-0/+24
Once we have started a GT reset sequence, which includes stopping GuC CTB communication, we should also cancel all ongoing H2G send- recv requests, as either GuC is already dead, or due to imminent reset GuC will not be able to reply, or due to internal cleanup we will lose pending fences. With this we will report dedicated -ECANCELED error instead of misleading -ETIME. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-4-michal.wajdeczko@intel.com
2025-07-10drm/xe/guc: Move state change logger to helperMichal Wajdeczko1-1/+6
In the state change helper we are already doing extra stuff, move debug state logger there to cover all state changes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-3-michal.wajdeczko@intel.com
2025-07-10drm/xe/guc: Rename CT state change helperMichal Wajdeczko1-4/+4
In this helper we are already doing much more than just setting a new CT state and its name was little misleading. Rename it. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-2-michal.wajdeczko@intel.com
2025-07-10drm/xe/pm: Correct comment of xe_pm_set_vram_threshold()Shuicheng Lin1-3/+5
The parameter threshold is with size in MiB, not in bits. Correct it to avoid any confusion. v2: s/mb/MiB, s/vram/VRAM, fix return section. (Michal) Fixes: 30c399529f4c ("drm/xe: Document Xe PM component") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250708021450.3602087-2-shuicheng.lin@intel.com Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VFMichal Wajdeczko1-2/+2
These workarounds are not applicable for use by the VFs. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Link: https://lore.kernel.org/r/20250710103040.375610-2-jakub1.kolakowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10drm/xe/bo: add GPU memory trace pointsJuston Li2-0/+25
Add TRACE_GPU_MEM tracepoints for tracking global GPU memory usage. These are required by VSR on Android 12+ for reporting GPU driver memory allocations. v5: - Drop process_mem tracking - Set the gpu_id field to dev->primary->index (Lucas, Tvrtko) - Formatting cleanup under 80 columns v3: - Use now configurable CONFIG_TRACE_GPU_MEM instead of adding a per-driver Kconfig (Lucas) v2: - Use u64 as preferred by checkpatch (Tvrtko) - Fix errors in comments/Kconfig description (Tvrtko) - drop redundant "CONFIG" in Kconfig Signed-off-by: Juston Li <justonli@chromium.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250709192313.479336-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10drm/xe/xe_i2c: Add support for i2c in survivability modeRiana Tauro1-8/+11
Initialize i2c in survivability mode to allow firmware update of Add-In Management Controller (AMC) in survivability mode. Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-6-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10drm/xe/pm: Wire up suspend/resume for I2C controllerRaag Jadav4-0/+47
Wire up suspend/resume handles for I2C controller to match its power state with SGUnit. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-5-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10drm/xe: Support for I2C attached MCUsHeikki Krogerus11-1/+390
Adding adaption/glue layer where the I2C host adapter (Synopsys DesignWare I2C adapter) and the I2C clients (the microcontroller units) are enumerated. The microcontroller units (MCU) that are attached to the GPU depend on the OEM. The initially supported MCU will be the Add-In Management Controller (AMC). Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo fixed the co-developed tags and SPDX format in the .c file]
2025-07-10drm/xe/guc: Don't allocate temporary policies objectMichal Wajdeczko1-18/+9
Since we are already using reusable buffer objects from the GuC buffer cache, we can directly write into their CPU pointers and spare unnecessary temporary allocation. While around, also make sure to clear obtained buffer, to avoid sending some stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com
2025-07-10drm/xe/pf: Print configuration KLVs using debug printerMichal Wajdeczko1-5/+5
While we print VF's configuration KLVs only under DEBUG_SRIOV config, we should be doing it at debug level, not info level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250703145709.1832-1-michal.wajdeczko@intel.com
2025-07-10drm/xe/pf: Print runtime registers using debug printerMichal Wajdeczko1-1/+1
While we already print VF's runtime registers only under DEBUG_SRIOV config, we should be still doing it at debug level, not info. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250604190021.725-2-michal.wajdeczko@intel.com
2025-07-10drm/xe: Expose fan control and voltage regulator versionRaag Jadav2-1/+157
Add sysfs attributes for late binding features which expose bound version to the user. v2: Rework attribute and macro naming (Badal) v3: Drop fancy formatting (Rodrigo) v4: Form version string using local variables (Rodrigo) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250709164224.2676086-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-09drm/xe: Release runtime pm for error path of xe_devcoredump_read()Shuicheng Lin1-10/+28
xe_pm_runtime_put() is missed to be called for the error path in xe_devcoredump_read(). Add function description comments for xe_devcoredump_read() to help understand it. v2: more detail function comments and refine goto logic (Matt) Fixes: c4a2e5f865b7 ("drm/xe: Add devcoredump chunking") Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com
2025-07-09drm/xe: Remove unused code in devcoredump_snapshot()Shuicheng Lin1-12/+0
The deleted code is no longer needed because patch "drm/xe/guc: Plumb GuC-capture into dev coredump" has removed the related usage code. Remove the code to tidy up the function. v2: s/bacause/because Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707004911.3502904-5-shuicheng.lin@intel.com
2025-07-09drm/xe/uc: Disable GuC communication on hardware initialization errorZhanjun Dong2-7/+19
Disable GuC communication on Xe micro controller hardware initialization error. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917 Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707231108.3217573-1-zhanjun.dong@intel.com
2025-07-08drm/xe/pm: Restore display pm if there is error after display suspendShuicheng Lin1-2/+1
xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there is error with xe_bo_evict_all(), display pm should be restored. Fixes: 51462211f4a9 ("drm/xe/pxp: add PXP PM support") Fixes: cb8f81c17531 ("drm/xe/display: Make display suspend/resume work on discrete") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-08drm/xe/ptl: Drop force_probe requirementMatt Atwood1-1/+0
Panther Lake has proven to be stable through testing and use. Remove the force_probe requirement and enable the platform by default. Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-08Merge tag 'drm-intel-next-2025-07-04' of ↵Simona Vetter8-41/+140
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next drm/i915 feature pull #2 for v6.17: Features and functionality: - Add drm_panic support for both i915 and xe drivers (Jocelyn Falempe) - Add initial flip queue implementation, disabled by default, for LNL and PTL (Ville) - Add support for Wildcat Lake (WCL) display, version 30.02 (Matt Roper, Matt Atwood, Dnyaneshwar) - Extend drm_panel and follower support to DDI eDP (Arun) Refactoring and cleanups: - Make all global state objects opaque (Jani) - Move display works to display specific unordered workqueue (Luca) - Add and use struct drm_device based pcode interface (Jani, Lucas) - Use clamp() instead of max()+min() combo (Ankit) - Simplify wait for power well disable (Jani) - Various stylistics cleanups and renames (Jani) Fixes: - Deal with loss of pipe DMC state (Ville) - Fix PTL HDCP2 stream status check (Suraj) - Add workaround for ADL-P DKL PHY DP and HDMI (Nemesa) - Fix skl_print_wm_changes() stack usage with KMSAN (Arnd Bergmann) - Fix PCON capability reads on non-branch devices (Chaitanya) - Fix which platforms have ultra joiner (Ankit) DRM core changes: - Add ttm_bo_kmap_try_from_panic() for xe drm_panic support (Jocelyn Falempe) - Add private pointer to struct drm_scanout buffer for xe/i915 drm_panic support (Jocelyn Falempe) Merges: - Backmerge drm-next for drm_panel and xe changes (Jani) Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6d728bf6ef23681b00dfbc7da9aeae41042dee02@intel.com
2025-07-08drm/xe/bmg: fix compressed VRAM handlingMatthew Auld1-1/+1
There looks to be an issue in our compression handling when the BO pages are very fragmented, where we choose to skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, such that we can hopefully do more work per blit operation. However in such a case we need to ensure the src PTEs are correctly tagged with a compression enabled PAT index on dgpu xe2+, otherwise the copy will simply treat the src memory as uncompressed, leading to corruption if the memory was compressed by the user. To fix this pass along use_comp_pat into emit_pte() on the src side, to indicate that compression should be considered. v2 (Jonathan): tweak the commit message Fixes: 523f191cc0c7 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com (cherry picked from commit f7a2fd776e57bd6468644bdecd91ab3aba57ba58) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-08Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"Matthew Brost1-1/+0
This reverts commit fe0154cf8222d9e38c60ccc124adb2f9b5272371. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit 3a1edef8f4b5 ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: fe0154cf8222 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 03d85ab36bcbcbe9dc962fccd3f8e54d7bb93b35) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-08drm/xe: Allocate PF queue size on pow2 boundaryMatthew Brost1-0/+1
CIRC_SPACE does not work unless the size argument is a power of 2, allocate PF queue size on power of 2 boundary. Cc: stable@vger.kernel.org Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size") Fixes: 29582e0ea75c ("drm/xe: Add page queue multiplier") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com (cherry picked from commit 491b9783126303755717c0cbde0b08ee59b6abab) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-08drm/xe/pf: Clear all LMTT pages on allocMichal Wajdeczko1-0/+11
Our LMEM buffer objects are not cleared by default on alloc and during VF provisioning we only setup LMTT PTEs for the actually provisioned LMEM range. But beyond that valid range we might leave some stale data that could either point to some other VFs allocations or even to the PF pages. Explicitly clear all new LMTT page to avoid the risk that a malicious VF would try to exploit that gap. While around add asserts to catch any undesired PTE overwrites and low-level debug traces to track LMTT PT life-cycle. Fixes: b1d204058218 ("drm/xe/pf: Introduce Local Memory Translation Table") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com (cherry picked from commit 3fae6918a3e27cce20ded2551f863fb05d4bef8d) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07drm/xe/ptl: Add HuC FW definition for PTLDaniele Ceraolo Spurio1-0/+1
Add the unversioned define for the PTL HuC FW. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-15-daniele.ceraolospurio@intel.com
2025-07-07drm/xe/ptl: Add GuC FW definition for PTLDaniele Ceraolo Spurio1-0/+1
The first official GuC relase for PTL is 70.47.0, which maps to API version 1.22.4. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-14-daniele.ceraolospurio@intel.com
2025-07-07drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2Julia Filipchuk1-3/+3
UAPI compatibility version 1.22.2 Resolves various bugs. Recommend newer version. Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-13-daniele.ceraolospurio@intel.com
2025-07-04drm/xe/bmg: fix compressed VRAM handlingMatthew Auld1-1/+1
There looks to be an issue in our compression handling when the BO pages are very fragmented, where we choose to skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, such that we can hopefully do more work per blit operation. However in such a case we need to ensure the src PTEs are correctly tagged with a compression enabled PAT index on dgpu xe2+, otherwise the copy will simply treat the src memory as uncompressed, leading to corruption if the memory was compressed by the user. To fix this pass along use_comp_pat into emit_pte() on the src side, to indicate that compression should be considered. v2 (Jonathan): tweak the commit message Fixes: 523f191cc0c7 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com
2025-07-04Merge tag 'drm-misc-next-2025-07-03' of ↵Dave Airlie1-1/+7
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - bridge: More reference counting - dp: Implement backlight control helpers - fourcc: Add half-float and 32b float formats, RGB161616, BGR161616 - mipi-dsi: Drop MIPI_DSI_MODE_VSYNC_FLUSH flag - ttm: Improve eviction Driver Changes: - i915: Use backlight control helpers for eDP - tidss: Add AM65x OLDI bridge support - panels: - panel-edp: Add CMN N116BCJ-EAK support - raydium-rm67200: misc cleanups, optional reset - new panel: DJN HX83112B Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250703-chirpy-lilac-dalmatian-2c5838@houat
2025-07-03Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"Matthew Brost1-1/+0
This reverts commit fe0154cf8222d9e38c60ccc124adb2f9b5272371. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit 3a1edef8f4b5 ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: fe0154cf8222 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-03drm/xe/vf: Make multi-GT migration less error proneTomasz Lis1-90/+77
There is a remote chance that after migration, some GTs will not send the MIGRATED interrupt, or due to current VF KMD state the interrupt will not lead to marking the GT for recovery. Requiring IRQs from all GTs before starting migration introduces the possibility that the process will get stalled due to one GuC. One could argue it is also waste of time to wait for all IRQs, but we should get them all IRQs as soon as VGPU starts, so that's not really an impactful argument. Still, not waiting for all GTs makes it easier to handle situations: * where one GuC IRQ is missing * where state before probe is unclean - getting MIGRATED IRQ as soon as interrupts are enabled * where multiple migrations happen close to each other To help with these cases, this patch alters the post-migration recovery so that recovery task is started as soon as one GuC IRQ is handled, and other GTs are included in recovery later as the subsequent IRQs are serviced. The post-migration recovery can now be called for any selection of GTs, and it will perform recovery on all GTs for which IRQs have arrived, even multiple times if necessary. v2: Typos and style fixes v3: Transferring gt_flags by value rather than reference to last function where it is used Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250630152155.195648-1-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-07-03drm/xe: Allocate PF queue size on pow2 boundaryMatthew Brost1-0/+1
CIRC_SPACE does not work unless the size argument is a power of 2, allocate PF queue size on power of 2 boundary. Cc: stable@vger.kernel.org Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size") Fixes: 29582e0ea75c ("drm/xe: Add page queue multiplier") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com
2025-07-03drm/xe/pf: Clear all LMTT pages on allocMichal Wajdeczko1-0/+11
Our LMEM buffer objects are not cleared by default on alloc and during VF provisioning we only setup LMTT PTEs for the actually provisioned LMEM range. But beyond that valid range we might leave some stale data that could either point to some other VFs allocations or even to the PF pages. Explicitly clear all new LMTT page to avoid the risk that a malicious VF would try to exploit that gap. While around add asserts to catch any undesired PTE overwrites and low-level debug traces to track LMTT PT life-cycle. Fixes: b1d204058218 ("drm/xe/pf: Introduce Local Memory Translation Table") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com
2025-07-03drm/xe: Do not wedge device on killed exec queuesMatthew Brost1-4/+6
When a user closes an exec queue or interrupts an app with Ctrl-C, this does not warrant wedging the device in mode 2. Avoid this by skipping the wedge check for killed exec queues in the TDR and LR exec queue cleanup worker. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com (cherry picked from commit 5a2f117a80c207372513ca8964eeb178874f4990) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-03drm/xe: Extend WA 14018094691 to BMGDaniele Ceraolo Spurio1-1/+2
This WA is applicable to BMG as well. Note that this is a GSC WA and we don't load the GSC on BMG, so extending the WA to BMG won't do anything right now. However, it helps future-proof the driver so that if we ever turn the GSC on we won't have to remember to extend this WA. v2: don't use VERSION_RANGE from 2001 to 2004 (Matt) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250613231128.1261815-2-daniele.ceraolospurio@intel.com (cherry picked from commit 1a5ce0c5b95b0624ebd44f574b98003a466973be) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-03drm/xe/xe_pmu: Validate gt in event supportedRiana Tauro1-2/+5
Validate gt instead of checking gt_id is lesser than max gts per tile Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250630093741.2435281-1-riana.tauro@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-03drm/xe/xe_query: Use separate iterator while filling GT listMatt Roper1-12/+15
The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs being iterated over, and may skip over values if a GT is not present on the device. Use a separate iterator for GT list array assignments to ensure that the array will be filled properly on future platforms where index in the GT query list may not match the uapi ID. v2: - Include the missing increment of the iterator. (Jonathan) Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-03drm/xe: Don't compare GT ID to GT count when determining valid GTsMatt Roper4-9/+8
On current platforms with multiple GTs, all of the GT IDs are consecutive; as a result we know that the GT IDs range from 0 to gt_count-1 and can determine if a GT ID is valid by comparing against the count. The consecutive nature of GT IDs may not hold true on future platforms if/when we have platforms that are both multi-tile and have multiple GTs within each tile. Once such platforms exist, it's quite possible that we could wind up with something like a GT list composed of IDs 0, 2, and 3 with no GT 1 (which would be a 2-tile platform with media only on the second tile). To future-proof the code we should stop comparing against the GT count to determine whether a GT ID is valid or not. Instead we should do an actual lookup of the ID to determine whether the GT exists. This also means that our GT loop macro should not end at the GT count, but should rather examine the entire space up to (# of tiles) * (max GT per tile) to ensure it doesn't stop prematurely. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-15-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-03drm/xe: Assign GT IDs properly on multi-tile + multi-GT platformsMatt Roper2-16/+14
Although "multi-tile" and "multiple GTs per tile" are mutually-exclusive characteristics on all of our platforms today, this may not always be true. Assign GT IDs according to xe->info.max_gt_per_tile in a way that should work even if future platforms have different configurations. This patch should not change the behavior of current platforms; it only future-proofs for potential future designs. v2: - Re-calculate gt_count if tile count gets reduced by MTCFG. (PVC CI) Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-14-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-03drm/xe/tests/pci: Ensure all platforms have a valid GT/tile countMatt Roper5-39/+85
Add a simple kunit test to ensure each platform's GT per tile count is non-zero and does not exceed the global XE_MAX_GT_PER_TILE definition. We need to move 'struct xe_subplatform_desc' from the .c file to the types header to ensure it is accessible from the kunit test. v2: - Rebase on latest xe_pci test rework from Michal and convert to a parameterized test that runs on each PCI ID supported by the driver. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-03drm/xe: Track maximum GTs per tile on a per-platform basisMatt Roper4-24/+41
Today all of our platforms fall into one of three cases: * Single tile platforms with a single (primary) GT * Single tile platforms with two GTs (primary + media) * Two-tile platforms with a single GT (primary) in each Our numbering of GTs has been a bit inconsistent between platforms (e.g., GT1 is the media GT on some platforms, but the second tile's primary GT on others). In the future we'll likely have platforms that are both multi-tile and multi-GT, which will make the situation more confusing. We could also wind up with more than just two types of GTs at some point in the future. Going forward we should standardize the way we assign uapi GT IDs to internal GT structures. Let's declare that for userspace GT ID n, GT[n]'s tile = n / (max gt per tile) GT[n]'s slot within tile = n % (max gt per tile) We don't want the GT numbering to change for any of our current platforms since the current IDs are part of our ABI contract with userspace so this means we should track the 'max gt per tile' value on a per-platform basis rather than just using a single value across the driver. Encode this into device descriptors in xe_pci.c and use the per-platform number for various checks in the code. Constant XE_MAX_GT_PER_TILE will remain just as the maximum across all platforms for easy of sizing array allocations. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-03drm/xe: Export xe_step_name for kunit testsMatt Roper1-0/+2
xe_step_name() is used by xe_assert(), so adding assertions to functions like xe_device_get_gt() will result in ERROR: modpost: "xe_step_name" [drivers/gpu/drm/xe/tests/xe_test.ko] undefined! while building the kunit tests. Export xe_step_name to avoid these build failures when adding assertions. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-11-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-02drm/xe/hw_engine_group: Fix potential leakMichal Wajdeczko1-14/+5
If we fail to allocate a workqueue we will leak kzalloc'ed group object since it was designed to be kfree'ed in the drmm cleanup action, but we didn't have a chance to register this action yet. To avoid this leak allocate a group object using drmm_kzalloc() and start using predefined drmm action to release the workqueue. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250627184143.1480-1-michal.wajdeczko@intel.com
2025-07-01drm/xe: Allow dropping kunit dependency as built-inHarry Austen1-1/+2
Fix Kconfig symbol dependency on KUNIT, which isn't actually required for XE to be built-in. However, if KUNIT is enabled, it must be built-in too. Fixes: 08987a8b6820 ("drm/xe: Fix build with KUNIT=m") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Harry Austen <hpausten@protonmail.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit a559434880b320b83733d739733250815aecf1b0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-01drm/xe: Fix kconfig promptLucas De Marchi2-3/+4
The xe driver is the official driver for Intel Xe2 and later, while maintaining experimental support for earlier GPUs. Reword the help message accordingly. Reviewed-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://lore.kernel.org/r/20250611-xe-kconfig-help-v1-1-8bcc6b47d11a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 1488a3089de3d0bcdc9532da7ce04cf0af9d7dd0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-01drm/xe/bmg: Update Wa_22019338487Vinay Belgaumkar4-2/+135
Limit GT max frequency to 2600MHz and wait for frequency to reduce before proceeding with a transient flush. This is really only needed for the transient flush: if L2 flush is needed due to 16023588340 then there's no need to do this additional wait since we are already using the bigger hammer. v2: Use generic names, ensure user set max frequency requests wait for flush to complete (Rodrigo) v3: - User requests wait via wait_var_event_timeout (Lucas) - Close races on flush + user requests (Lucas) - Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt rather than root gt (Lucas) v4: - Only apply the freq reducing part if a TDF is needed: L2 flush trumps the need for waiting a lower frequency Fixes: aaa08078e725 ("drm/xe/bmg: Apply Wa_22019338487") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit deea6a7d6d803d6bb874a3e6f1b312e560e6c6df) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-01drm/xe/bmg: Update Wa_14022085890Vinay Belgaumkar2-0/+9
Set GT min frequency to 1200Mhz once driver load is complete. v2: Review comments (Rodrigo) v3: Apply Wa earlier so user_req_min is not clobbered. v4: Apply to all GTs (Lucas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-3-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit bdde16c9ac5cb56ad2ee19792222fa1853577af7) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>