summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/xe
AgeCommit message (Collapse)AuthorFilesLines
2025-06-13drm/xe/guc: Ignore GuC CT errors when wedgedVinay Belgaumkar1-5/+5
Messaging to GuC may get canceled when device is wedged. Don't flag this as an error in xe_guc_pc code. Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-1-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-13drm/xe: Add helper function to inject fault into ct_dead_capture()Satyanarayana K V P2-0/+26
When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv() functions, the CI test systems are going out of space and crashing. To avoid this issue, a new helper function is created and when fault is injected into this xe_is_injection_active() helper function, ct dead capture is avoided which suppresses ct dumps in the log. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Suggested-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250612080402.22011-1-satyanarayana.k.v.p@intel.com
2025-06-13drm/xe: Fix early wedge on GuC load failureDaniele Ceraolo Spurio4-2/+21
When the GuC fails to load we declare the device wedged. However, the very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done before the GT1 GuC objects are initialized, so things go bad when the wedge code attempts to cleanup GT1. To fix this, check the initialization status in the functions called during wedge. Fixes: 7dbe8af13c18 ("drm/xe: Wedge the entire device") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: stable@vger.kernel.org # v6.12+: 1e1981b16bb1: drm/xe: Fix taking invalid lock on wedge Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-12drm/xe: Make WA BB part of LRC BOMatthew Brost2-21/+18
No idea why, but without this GuC context switches randomly fail when running IGTs in a loop. Need to follow up why this fixes the aforementioned issue but can live with a stable driver for now. Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com
2025-06-12drm/xe: Use WRITE_ONCE for range->tile_invalidated updateMatthew Brost1-2/+5
Updating range->tile_invalidated should be done with WRITE_ONCE to pair with READ_ONCE in opportunistic checks. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhrost <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/20250604234712.2441130-1-matthew.brost@intel.com
2025-06-12drm/xe: Don't use drm exec locking in SVM pagefaultsMatthew Brost1-23/+13
Only the VM dma-resv lock is needed in SVM pagefaults so xe_vm_lock/unlock can be used instead of drm exec. Micro optimization but should save some CPU cycles in a critical path. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250603174012.2195759-1-matthew.brost@intel.com
2025-06-12drm/xe/lrc: Use a temporary buffer for WA BBLucas De Marchi1-4/+20
In case the BO is in iomem, we can't simply take the vaddr and write to it. Instead, prepare a separate buffer that is later copied into io memory. Right now it's just a few words that could be using xe_map_write32(), but the intention is to grow the WA BB for other uses. Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization") Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit ef48715b2d3df17c060e23b9aa636af3d95652f8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-12drm/xe/lrc: Prepare WA BB setup for more usersLucas De Marchi1-14/+55
The post context restore (WA BB) is a mechanism in HW that may be used for things other than the utilization setup. Create a new function called setup_wa_bb() that wraps any function writing useful commands in the buffer. Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-2-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-12drm/xe/lrc: Use a temporary buffer for WA BBLucas De Marchi1-4/+20
In case the BO is in iomem, we can't simply take the vaddr and write to it. Instead, prepare a separate buffer that is later copied into io memory. Right now it's just a few words that could be using xe_map_write32(), but the intention is to grow the WA BB for other uses. Fixes: 82b98cadb01f ("drm/xe: Add WA BB to capture active context utilization") Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-11drm/xe/xe2_hpg: Define additional Xe2_HPG GMD_IDShekhar Chauhan1-0/+1
Add another GMD_ID for Xe2_HPG Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: James Ausmus <james.ausmus@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250605190804.1287289-4-dnyaneshwar.bhadane@intel.com
2025-06-11drm/xe/xe2_hpg: Add set of workaroundsShekhar Chauhan2-21/+29
Add set of workarounds for xe2_hpg. -v2: Fix xe2_hpg GMD version for some workarounds. -v3: Removed extra Workaround (Matt Roper) Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250605190804.1287289-3-dnyaneshwar.bhadane@intel.com
2025-06-11drm/i915/display: drop i915_reg.h include where possibleJani Nikula1-1/+0
A number of files have unnecessary i915_reg.h includes. Drop them. Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> Link: https://lore.kernel.org/r/7c4002322f4d8132fd2eaa1a4d688539cdd043c3.1749469962.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-11Revert "drm/xe/display: use xe->display to decide whether to do anything"Vivek Kasireddy1-20/+20
This reverts commit 5a9f299f956ef9764f56044cfca7aafa23cea1d1. The following crash/regression was seen with the reverted commit on a specific BMG SKU with no display capabilities: [ 115.582833] BUG: kernel NULL pointer dereference, address: 00000000000005d0 [ 115.589775] #PF: supervisor write access in kernel mode [ 115.594976] #PF: error_code(0x0002) - not-present page [ 115.600088] PGD 0 P4D 0 [ 115.602617] Oops: Oops: 0002 [#1] SMP [ 115.606267] CPU: 14 UID: 0 PID: 1547 Comm: kworker/14:3 Tainted: G U E 6.15.0-local+ #62 PREEMPT(voluntary) [ 115.617332] Tainted: [U]=USER, [E]=UNSIGNED_MODULE [ 115.622100] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPEMI1.R00.3471.D49.2401260852 01/26/2024 [ 115.635314] Workqueue: pm pm_runtime_work [ 115.639309] RIP: 0010:_raw_spin_lock+0x17/0x30 [ 115.662382] RSP: 0018:ffffd13f82e7bc30 EFLAGS: 00010246 [ 115.667581] RAX: 0000000000000000 RBX: ffff8be919076000 RCX: 0000000000000002 [ 115.674675] RDX: 0000000000000001 RSI: 000000000000004b RDI: 00000000000005d0 [ 115.681775] RBP: ffffd13f82e7bc60 R08: ffffd13f82e7bb00 R09: ffff8beb0c1b06c0 [ 115.688869] R10: ffff8be7c034f4c0 R11: fefefefefefefeff R12: fffffffffffffff0 [ 115.695965] R13: ffff8be9190762e8 R14: ffff8be919077798 R15: 00000000000005d0 [ 115.703062] FS: 0000000000000000(0000) GS:ffff8beb552b6000(0000) knlGS:0000000000000000 [ 115.711106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.716826] CR2: 00000000000005d0 CR3: 000000024c68d002 CR4: 0000000000f72ef0 [ 115.723921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.731015] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 115.738113] PKRU: 55555554 [ 115.740816] Call Trace: [ 115.743258] <TASK> [ 115.745363] ? xe_display_flush_cleanup_work+0x92/0x120 [xe] [ 115.751102] xe_display_pm_runtime_suspend+0x42/0x80 [xe] [ 115.756542] xe_pm_runtime_suspend+0x11b/0x1b0 [xe] [ 115.761463] xe_pci_runtime_suspend+0x23/0xd0 [xe] [ 115.766291] pci_pm_runtime_suspend+0x6b/0x1a0 [ 115.770717] ? pci_pm_thaw_noirq+0xa0/0xa0 [ 115.774797] __rpm_callback+0x48/0x1e0 [ 115.778531] ? pci_pm_thaw_noirq+0xa0/0xa0 [ 115.782614] rpm_callback+0x66/0x70 [ 115.786090] ? pci_pm_thaw_noirq+0xa0/0xa0 [ 115.790173] rpm_suspend+0xe1/0x5e0 [ 115.793647] ? psi_task_switch+0xb8/0x200 [ 115.797643] ? finish_task_switch.isra.0+0x8d/0x270 [ 115.802502] pm_runtime_work+0xa6/0xc0 [ 115.806238] process_one_work+0x186/0x350 [ 115.810234] worker_thread+0x33a/0x480 [ 115.813968] ? process_one_work+0x350/0x350 [ 115.818132] kthread+0x10c/0x220 [ 115.821350] ? kthreads_online_cpu+0x120/0x120 [ 115.825774] ret_from_fork+0x3a/0x60 [ 115.829339] ? kthreads_online_cpu+0x120/0x120 [ 115.833768] ret_from_fork_asm+0x11/0x20 [ 115.829339] ? kthreads_online_cpu+0x120/0x120 [ 115.833768] ret_from_fork_asm+0x11/0x20 [ 115.837680] </TASK> [ 115.839907] acpi_tad(E) drm(E) [ 115.931629] CR2: 00000000000005d0 [ 115.934935] ---[ end trace 0000000000000000 ]--- [ 115.939531] RIP: 0010:_raw_spin_lock+0x17/0x30 We cannot yet use xe->display to determine whether display hardware has been successfully probed/initialized or not. This is because xe->display would not be set to NULL even with GPUs with no display capabilities (e.g, GMD_ID_DISPLAY = 0). However, this might change in the future as Xe and i915 code is unified to deal with no display cases. Therefore, for now we have to continue to rely on xe->info.probe_display (which would be set to false with display-less GPUs) to decide whether to invoke any display related functions or not. Cc: Jani Nikula <jani.nikula@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250605054247.386633-1-vivek.kasireddy@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-11Merge drm/drm-next into drm-misc-nextThomas Zimmermann39-253/+815
Backmerging to forward to v6.16-rc1 Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-06-10drm/xe: Reorder 'Get pages failed' messageMatthew Brost1-2/+2
Print the error from get pages failing, not the cast to -ENODATA. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>> Link: https://lore.kernel.org/r/20250610045649.3149801-1-matthew.brost@intel.com
2025-06-09Merge drm/drm-next into drm-xe-nextThomas Hellström11-184/+48
Backmerging to bring in 6.16 Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-09drm/xe: GSM size should be constant on most platformsMatt Roper1-1/+1
On old Intel platforms, the size of the GSM (i.e., the stolen memory that holds the GGTT page table entries) could vary, so the driver needed to read the actual size from the PCI config space. However from Xe_HP onward, the GSM is now always guaranteed to be exactly 8MB (which translates to a 4GB GGTT address space); this is always true regardless of what the platform's much larger PPGTT address space is. The bspec doesn't document the PCI config space as being a valid way to query the size of the GSM after Xe_LP platforms, although so far it still seems to be giving us proper values for Xe_HP, Xe2, and Xe3. However we suspect that the config space will stop providing correct values on some upcoming platforms, so we should stop relying on it. Instead just use the hardcoded 8MB value as documented elsewhere in the bspec. Bspec: 49636, 67090, 50589 Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20250605225352.2333981-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-06-09drm/xe/svm: Fix regression disallowing 64K SVM migrationMaarten Lankhorst1-1/+1
When changing the condition from >= SZ_64K, it was changed to <= SZ_64K. This disallows migration of 64K, which is the exact minimum allowed. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5057 Fixes: 794f5493f518 ("drm/xe: Strict migration policy for atomic SVM faults") Cc: stable@vger.kernel.org Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://lore.kernel.org/r/20250521090102.2965100-1-dev@lankhorst.se (cherry picked from commit 531bef26d189b28bf0d694878c0e064b30990b6c) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-09drm/xe/uc: Use GT-oriented firmware messagesMichal Wajdeczko1-9/+10
We are already prepared to define firmwares per-GT type, so we should also prepare our messages to be GT-oriented. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250606204311.813-1-michal.wajdeczko@intel.com
2025-06-09drm/i915: split out display register macros to a separate fileJani Nikula1-0/+1
This is a scripted split of the display related register macros from i915_reg.h to display/intel_display_regs.h. As a starting point, move all the macros that are only used in display code (or GVT). If there are users in core i915 code or soc/, or no users anywhere, keep the macros in i915_reg.h. This is done in groups of macros separated by blank lines, moving the comments along with the groups. Some manually picked macro groups are kept/moved regardless of the heuristics above. This is obviously a very crude approach. It's not perfect. But there are 4.2k lines in i915_reg.h, and its refactoring has ground to a halt. This is the big hammer that splits the file to two, and enables further cleanup. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> # v2 Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250606102256.2080073-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-09Merge drm/drm-next into drm-intel-nextJani Nikula40-257/+831
Sync to v6.16-rc1, among other things to get the fixed size GENMASK_U*() and BIT_U*() macros. Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-06-09drm/xe: Do not rely on GGTT internals in xe_guc_buf kunit testsMaarten Lankhorst3-11/+24
Add a function to init ggtt for kunit, and use the GGTT function for initialising the GGTT node without populating it. This prevents the test from ever knowing about struct xe_ggtt. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-11-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe: Implement a helper for reading out a GGTT PTE at a specified offsetMaarten Lankhorst3-5/+14
Split the GGTT PTE readout to a separate function, this is useful for adding testcases in the next commit, and also cleaner than manually reading out GGTT. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://lore.kernel.org/r/20250505121924.921544-10-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe: Remove pte_encode_bo callbackMaarten Lankhorst2-17/+0
The users inside display have been converted to use thepte_encode_flags callback, we can now remove the pte_encode_bo cb. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://lore.kernel.org/r/20250505121924.921544-9-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe/display: Convert GGTT mapping to use pte_encode_flagsMaarten Lankhorst1-12/+5
Another small step in removing pte_encode_bo callback. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://lore.kernel.org/r/20250505121924.921544-8-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe/display: Dont poke into GGTT internals to fill a DPTMaarten Lankhorst3-11/+26
For DPT, it is sufficient to get the GGTT encode flags to fill the DPT. Create a function to return the encode flags, and then encode using the BO address. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://lore.kernel.org/r/20250505121924.921544-7-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe/ggtt: Seperate flags and address in PTE encodingMaarten Lankhorst3-27/+65
Pinning large linear display framebuffers is becoming a bottleneck. My plan of attack is doing a custom walk over the BO, this allows for easier optimization of consecutive entries. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-6-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe/display: Remove dereferences of ggtt for tile idMaarten Lankhorst1-6/+7
Obtain the id from the root tile. Likely this can be hardcoded to 0, but use the clean solution of obtaining root id and doing that. to_xe_device(ggtt->tile) can also be easily replaced with xe. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-5-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe: Add xe_ggtt_allocMaarten Lankhorst3-5/+19
Instead of allocating inside xe_tile, create a new function that returns an allocated struct xe_ggtt from xe_ggtt.c Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-4-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe: Add xe_ggtt_might_lockMaarten Lankhorst3-1/+15
Another requirement of hiding more of struct xe_ggtt. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-3-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe: Use xe_ggtt_map_bo_unlocked for resumeMaarten Lankhorst3-5/+17
This is the first step to hide the details of struct xe_ggtt. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-2-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-09drm/xe: Remove IOSF_MBI select.Maarten Lankhorst1-1/+0
IOSF_MBI was only useful for some gen8 platforms, which were never supported by Xe. Presumably needed for display at one point, but display is fixed to put stubs in compat-i915-headers/vlv_sideband.h. (in drm-intel-next: vlv_iosf_sb.h) Link: https://lore.kernel.org/r/20250605074644.71036-1-dev@lankhorst.se Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-06drm/xe/topology: Stop trying to fix programming mistakesMichal Wajdeczko1-2/+1
We shouldn't ever pass more DSS registers than our hardcoded limit, it should be sufficient to just assert that instead of trying to fix it, as this will never happen in the production driver. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250604202908.769-4-michal.wajdeczko@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-06-06drm/xe/topology: Use register array size instead magic numberMichal Wajdeczko1-2/+2
Since we keep registers in the array we can simply count them and stop relying on magic number when checking if didn't make mistake. Also we can switch to use xe_gt_assert() since it could be just our programming mistake during platform bringup, no need to keep drm_WARN() in the production driver. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250604202908.769-3-michal.wajdeczko@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-06-06drm/xe/topology: Simplify code for loading DSS maskMichal Wajdeczko1-15/+18
Instead of passing registers using va_list we can keep them in the static array and pass as such and also lower driver footprint: add/remove: 2/0 grow/shrink: 0/2 up/down: 24/-175 (-151) Function old new delta geometry_regs - 12 +12 compute_regs - 12 +12 xe_gt_topology_init 550 527 -23 load_dss_mask 449 297 -152 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250604202908.769-2-michal.wajdeczko@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-06-06drm/xe: don't store the xe device pointer inside xe_ttm_ttDave Airlie2-31/+32
This device pointer is nearly always available without storing an extra copy for each tt in the system. Just noticed this while reading over the xe shrinker code. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250605062103.1234620-1-airlied@gmail.com
2025-06-06drm/xe/vf: Store the GuC FW info in guc->fwDaniele Ceraolo Spurio4-10/+64
The GuC compatibility version that we read from the CSS header in native/PF and the GuC VF version that we get when a VF handshakes with the GuC are the same version number, so we should store it into the same structure. This makes all the checks based on the compatibility version automatically work for VFs without having to copy the value over. For completion, also copy the wanted version and set the path to a known string to indicate that the FW is PF-loaded. This way all the info will be coherent when dumped from debugfs. v2: several code cleanups and style changes (Michal), rebase on bootstrap changes. v3: s/min/wanted/, clarify that handshake must happen before we can get the VF versions (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250603235432.720833-10-daniele.ceraolospurio@intel.com
2025-06-06drm/xe/vf: Use uc_fw_version to store the negotiated GuC ABIDaniele Ceraolo Spurio2-76/+78
Instead of using a VF-specific type, we can use the common uc_fw_version structure. This also means that we can use the available macros to compare ABI versions. While at it, exit early from the bootstrap if this is not the first time we're doing it and the version hasn't changed, so we don't end up logging it multiple times. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250603235432.720833-9-daniele.ceraolospurio@intel.com
2025-06-06drm/xe/vf: Boostrap all GTs immediately after MMIO initDaniele Ceraolo Spurio2-13/+6
Currently we perform the bootstrap for the primary GT early on during device init, while the media GT bootstrap happens when we try and fetch the hwconfig table. For consistency, move the bootstrap of the media GT happen at the same time as the primary GT, so that all the subsequent code can rely on both GTs being in the same state. v2: Also drop config query from min_guc_load since we now do it early (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250603235432.720833-8-daniele.ceraolospurio@intel.com
2025-06-06drm/xe/uc: Prepare uc_fw_version for storing the VF ABI versionDaniele Ceraolo Spurio2-1/+3
The VF ABI version has a branch field, so to store it inside the uc_fw_version we need to add a new branch variable to the latter. Existing code needs to be updated to handle the fact that we have the new field. v2: split out to its own patch (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250603235432.720833-7-daniele.ceraolospurio@intel.com
2025-06-06Merge tag 'drm-next-2025-06-06' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds20-176/+481
Pull drm fixes from Dave Airlie: "This is pretty much two weeks worth of fixes, plus one thing that might be considered next: amdkfd is now able to be enabled on risc-v platforms. Otherwise, amdgpu and xe with the majority of fixes, and then a smattering all over. panel: - nt37801: fix IS_ERR - nt37801: fix KConfig connector: - Fix null deref in HDMI audio helper. bridge: - analogix_dp: fixup clk-disable removal nouveau: - minor typo fix (',' vs ';') msm: - mailmap updates i915: - Fix the enabling/disabling of DP audio SDP splitting - Fix PSR register definitions for ALPM - Fix u32 overflow in SNPS PHY HDMI PLL setup - Fix GuC pending message underflow when submit fails - Fix GuC wakeref underflow race during reset xe: - Two documentation fixes - A couple of vm init fixes - Hwmon fixes - Drop reduntant conversion to bool - Fix CONFIG_INTEL_VSEC dependency - Rework eviction rejection of bound external bos - Stop re-submitting signalled jobs - A couple of pxp fixes - Add back a fix that got lost in a merge - Create LRC bo without VM - Fix for the above fix amdgpu: - UserQ fixes - SMU 13.x fixes - VCN fixes - JPEG fixes - Misc cleanups - runtime pm fix - DCN 4.0.1 fixes - Misc display fixes - ISP fix - VRAM manager fix - RAS fixes - IP discovery fix - Cleaner shader fix for GC 10.1.x - OD fix - Non-OLED panel fix - Misc display fixes - Brightness fixes amdkfd: - Enable CONFIG_HSA_AMD on RISCV - SVM fix - Misc cleanups - Ref leak fix - WPTR BO fix radeon: - Misc cleanups" * tag 'drm-next-2025-06-06' of https://gitlab.freedesktop.org/drm/kernel: (105 commits) drm/nouveau/vfn/r535: Convert comma to semicolon drm/xe: remove unmatched xe_vm_unlock() from __xe_exec_queue_init() drm/xe: Create LRC BO without VM drm/xe/guc_submit: add back fix drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not ready drm/xe/pxp: Use the correct define in the set_property_funcs array drm/xe/sched: stop re-submitting signalled jobs drm/xe: Rework eviction rejection of bound external bos drm/xe/vsec: fix CONFIG_INTEL_VSEC dependency drm/xe: drop redundant conversion to bool drm/xe/hwmon: Move card reactive critical power under channel card drm/xe/hwmon: Add support to manage power limits though mailbox drm/xe/vm: move xe_svm_init() earlier drm/xe/vm: move rebind_work init earlier MAINTAINERS: .mailmap: update Rob Clark's email address mailmap: Update entry for Akhil P Oommen MAINTAINERS: update my email address MAINTAINERS: drop myself as maintainer drm/i915/display: Fix u32 overflow in SNPS PHY HDMI PLL setup drm/amd/display: Fix default DC and AC levels ...
2025-06-06drm/xe/xe3: Disable null query for anyhit shaderNitin Gote1-0/+5
Set DIS_NULL_QUERY bit of RT_CTRL register to disable null query for anyhit shader for Xe3 IP. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://lore.kernel.org/r/20250605100812.2547808-1-nitin.r.gote@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-06-05drm/xe: remove unmatched xe_vm_unlock() from __xe_exec_queue_init()Maciej Patelczyk1-5/+1
There is unmatched xe_vm_unlock() in the __xe_exec_queue_init(). Leftover from commit fbeaad071a98 ("drm/xe: Create LRC BO without VM") Fixes: 2b0a0ce0c20b ("drm/xe: Create LRC BO without VM") Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://lore.kernel.org/r/20250530135627.2821612-1-maciej.patelczyk@intel.com (cherry picked from commit 28b996ce73982a44fa86736ca0e3684cb1ae8b24) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-05drm/xe: Create LRC BO without VMNiranjana Vishwanathapura2-28/+4
Specifying VM during lrc->bo creation requires VM's reference to be held for the lifetime of lrc->bo as it will use VM's dma reservation object. Using VM's dma reservation object for lrc->bo doesn't provide any advantage. Hence do not pass VM while creating lrc->bo. v2: Use xe_bo_unpin_map_no_vm (Matthew Brost) Fixes: 264eecdba211 ("drm/xe: Decouple xe_exec_queue and xe_lrc") Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250529052031.2429120-2-niranjana.vishwanathapura@intel.com (cherry picked from commit fbeaad071a98fef87deccee81d564de1c8e8e16d) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-05drm/xe/guc_submit: add back fixMatthew Auld1-0/+11
Daniele noticed that the fix in commit 2d2be279f1ca ("drm/xe: fix UAF around queue destruction") looks to have been unintentionally removed as part of handling a conflict in some past merge commit. Add it back. Fixes: ac44ff7cec33 ("Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes") Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250603174213.1543579-2-matthew.auld@intel.com (cherry picked from commit 9d9fca62dc49d96f97045b6d8e7402a95f8cf92a) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-05drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not readyDaniele Ceraolo Spurio1-2/+6
The expected flow of operations when using PXP is to query the PXP status and wait for it to transition to "ready" before attempting to create an exec_queue. This flow is followed by the Mesa driver, but there is no guarantee that an incorrectly coded (or malicious) app will not attempt to create the queue first without querying the status. Therefore, we need to clarify what the expected behavior of the queue creation ioctl is in this scenario. Currently, the ioctl always fails with an -EBUSY code no matter the error, but for consistency it is better to distinguish between "failed to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl does. Note that, while this is a change in the return code of an ioctl, the behavior of the ioctl in this particular corner case was not clearly spec'd, so no one should have been relying on it (and we know that Mesa, which is the only known userspace for this, didn't). v2: Minor rework of the doc (Rodrigo) Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com (cherry picked from commit 21784ca96025b62d95b670b7639ad70ddafa69b8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-05drm/xe/pxp: Use the correct define in the set_property_funcs arrayDaniele Ceraolo Spurio1-1/+1
The define of the extension type was accidentally used instead of the one of the property itself. They're both zero, so no functional issue, but we should use the correct define for code correctness. Fixes: 41a97c4a1294 ("drm/xe/pxp/uapi: Add API to mark a BO as using PXP") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250522225401.3953243-6-daniele.ceraolospurio@intel.com (cherry picked from commit 1d891ee820fd0fbb4101eacb0d922b5050a24933) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-05drm/xe/sched: stop re-submitting signalled jobsMatthew Auld1-1/+9
Customer is reporting a really subtle issue where we get random DMAR faults, hangs and other nasties for kernel migration jobs when stressing stuff like s2idle/s3/s4. The explosions seems to happen somewhere after resuming the system with splats looking something like: PM: suspend exit rfkill: input handler disabled xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0 xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1] xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out The likely cause appears to be a race between suspend cancelling the worker that processes the free_job()'s, such that we still have pending jobs to be freed after the cancel. Following from this, on resume the pending_list will now contain at least one already complete job, but it looks like we call drm_sched_resubmit_jobs(), which will then call run_job() on everything still on the pending_list. But if the job was already complete, then all the resources tied to the job, like the bb itself, any memory that is being accessed, the iommu mappings etc. might be long gone since those are usually tied to the fence signalling. This scenario can be seen in ftrace when running a slightly modified xe_pm IGT (kernel was only modified to inject artificial latency into free_job to make the race easier to hit): xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ... xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13 xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4 xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3 xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3 xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3 xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13 xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ... ..... xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13 So the job_run() is clearly triggered twice for the same job, even though the first must have already signalled to completion during suspend. We can also see a CAT error after the re-submit. To prevent this only resubmit jobs on the pending_list that have not yet signalled. v2: - Make sure to re-arm the fence callbacks with sched_start(). v3 (Matt B): - Stop using drm_sched_resubmit_jobs(), which appears to be deprecated and just open-code a simple loop such that we skip calling run_job() on anything already signalled. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: William Tseng <william.tseng@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com (cherry picked from commit 38fafa9f392f3110d2de431432d43f4eef99cd1b) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-05drm/xe: Rework eviction rejection of bound external bosThomas Hellström3-18/+105
For preempt_fence mode VM's we're rejecting eviction of shared bos during VM_BIND. However, since we do this in the move() callback, we're getting an eviction failure warning from TTM. The TTM callback intended for these things is eviction_valuable(). However, the latter doesn't pass in the struct ttm_operation_ctx needed to determine whether the caller needs this. Instead, attach the needed information to the vm under the vm->resv, until we've been able to update TTM to provide the needed information. And add sufficient lockdep checks to prevent misuse and races. v2: - Fix a copy-paste error in xe_vm_clear_validating() v3: - Fix kerneldoc errors. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 0af944f0e308 ("drm/xe: Reject BO eviction if BO is bound to current VM") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250528164105.234718-1-thomas.hellstrom@linux.intel.com (cherry picked from commit 9d5558649f68e2e84a87a909631b30e15ca0f8ec) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-05drm/xe/vsec: fix CONFIG_INTEL_VSEC dependencyArnd Bergmann1-1/+2
The XE driver can be built with or without VSEC support, but fails to link as built-in if vsec is in a loadable module: x86_64-linux-ld: vmlinux.o: in function `xe_vsec_init': (.text+0x1e83e16): undefined reference to `intel_vsec_register' The normal fix for this is to add a 'depends on INTEL_VSEC || !INTEL_VSEC', forcing XE to be a loadable module as well, but that causes a circular dependency: symbol DRM_XE depends on INTEL_VSEC symbol INTEL_VSEC depends on X86_PLATFORM_DEVICES symbol X86_PLATFORM_DEVICES is selected by DRM_XE The problem here is selecting a symbol from another subsystem, so change that as well and rephrase the 'select' into the corresponding dependency. Since X86_PLATFORM_DEVICES is 'default y', there is no change to defconfig builds here. Fixes: 0c45e76fcc62 ("drm/xe/vsec: Support BMG devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250529172355.2395634-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit e4931f8be347ec5f19df4d6d33aea37145378c42) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>