kernel/linux.git/drivers/gpu/drm/xe, branch v7.0-rc7

drm/xe: Avoid memory allocations in xe_device_declare_wedged()

2026-03-30T12:52:20+00:00

xe_device_declare_wedged() runs in the DMA-fence signaling path, where GFP_KERNEL memory allocations are not allowed. However, registering xe_device_wedged_fini via drmm_add_action_or_reset() triggers a GFP_KERNEL allocation. Fix this by deferring the registration of xe_device_wedged_fini until late in the driver load sequence. Additionally, drop the wedged PM reference only if the device is actually wedged in xe_device_wedged_fini. Fixes: 452bca0edbd0 ("drm/xe: Don't suspend device upon wedge") Signed-off-by: Matthew Brost Reviewed-by: Rodrigo Vivi Link: https://patch.msgid.link/20260326210116.202585-2-matthew.brost@intel.com (cherry picked from commit b08ceb443866808b881b12d4183008d214d816c1) Signed-off-by: Rodrigo Vivi

drm/xe: Disable garbage collector work item on SVM close

2026-03-30T12:52:14+00:00

When an SVM is closed, the garbage collector work item must be stopped synchronously and any future queuing must be prevented. Replace flush_work() with disable_work_sync() to ensure both conditions are met. Fixes: 63f6e480d115 ("drm/xe: Add SVM garbage collector") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost Reviewed-by: Thomas Hellström Link: https://patch.msgid.link/20260227015225.3081787-1-matthew.brost@intel.com (cherry picked from commit 2247feb9badca5a4774df9a437bfc44fba4f22de) Signed-off-by: Rodrigo Vivi

drm/xe/pxp: Don't allow PXP on older PTL GSC FWs

2026-03-30T12:52:08+00:00

On PTL, older GSC FWs have a bug that can cause them to crash during PXP invalidation events, which leads to a complete loss of power management on the media GT. Therefore, we can't use PXP on FWs that have this bug, which was fixed in PTL GSC build 1396. Fixes: b1dcec9bd8a1 ("drm/xe/ptl: Enable PXP for PTL") Signed-off-by: Daniele Ceraolo Spurio Cc: Julia Filipchuk Reviewed-by: Julia Filipchuk Acked-by: Rodrigo Vivi Link: https://patch.msgid.link/20260324153718.3155504-10-daniele.ceraolospurio@intel.com (cherry picked from commit 6eb04caaa972934c9b6cea0e0c29e466bf9a346f) Signed-off-by: Rodrigo Vivi

drm/xe/pxp: Clear restart flag in pxp_start after jumping back

2026-03-30T12:52:02+00:00

If we don't clear the flag we'll keep jumping back at the beginning of the function once we reach the end. Fixes: ccd3c6820a90 ("drm/xe/pxp: Decouple queue addition from PXP start") Signed-off-by: Daniele Ceraolo Spurio Cc: Julia Filipchuk Reviewed-by: Julia Filipchuk Link: https://patch.msgid.link/20260324153718.3155504-9-daniele.ceraolospurio@intel.com (cherry picked from commit 0850ec7bb2459602351639dccf7a68a03c9d1ee0) Signed-off-by: Rodrigo Vivi

drm/xe/pxp: Remove incorrect handling of impossible state during suspend

2026-03-30T12:51:55+00:00

The default case of the PXP suspend switch is incorrectly exiting without releasing the lock. However, this case is impossible to hit because we're switching on an enum and all the valid enum values have their own cases. Therefore, we can just get rid of the default case and rely on the compiler to warn us if a new enum value is added and we forget to add it to the switch. Fixes: 51462211f4a9 ("drm/xe/pxp: add PXP PM support") Signed-off-by: Daniele Ceraolo Spurio Cc: Alan Previn Teres Alexis Cc: Julia Filipchuk Reviewed-by: Julia Filipchuk Link: https://patch.msgid.link/20260324153718.3155504-8-daniele.ceraolospurio@intel.com (cherry picked from commit f1b5a77fc9b6a90cd9a5e3db9d4c73ae1edfcfac) Signed-off-by: Rodrigo Vivi

drm/xe/pxp: Clean up termination status on failure

2026-03-30T12:51:49+00:00

If the PXP HW termination fails during PXP start, the normal completion code won't be called, so the termination will remain uncomplete. To avoid unnecessary waits, mark the termination as completed from the error path. Note that we already do this if the termination fails when handling a termination irq from the HW. Fixes: f8caa80154c4 ("drm/xe/pxp: Add PXP queue tracking and session start") Signed-off-by: Daniele Ceraolo Spurio Cc: Alan Previn Teres Alexis Cc: Julia Filipchuk Reviewed-by: Julia Filipchuk Link: https://patch.msgid.link/20260324153718.3155504-7-daniele.ceraolospurio@intel.com (cherry picked from commit 5d9e708d2a69ab1f64a17aec810cd7c70c5b9fab) Signed-off-by: Rodrigo Vivi

drm/xe/madvise: Accept canonical GPU addresses in xe_vm_madvise_ioctl

2026-03-30T12:51:42+00:00

Userspace passes canonical (sign-extended) GPU addresses where bits 63:48 mirror bit 47. The internal GPUVM uses non-canonical form (upper bits zeroed), so passing raw canonical addresses into GPUVM lookups causes mismatches for addresses above 128TiB. Strip the sign extension with xe_device_uncanonicalize_addr() at the top of xe_vm_madvise_ioctl(). Non-canonical addresses are unaffected. Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe") Suggested-by: Matthew Brost Cc: Thomas Hellström Reviewed-by: Matthew Brost Signed-off-by: Himal Prasad Ghimiray Signed-off-by: Arvind Yadav Signed-off-by: Matthew Brost Link: https://patch.msgid.link/20260326130843.3545241-13-arvind.yadav@intel.com (cherry picked from commit 05c8b1cdc54036465ea457a0501a8c2f9409fce7) Signed-off-by: Rodrigo Vivi

drm/xe/xe_pagefault: Disallow writes to read-only VMAs

2026-03-30T12:51:29+00:00

The page fault handler should reject write/atomic access to read only VMAs. Add code to handle this in xe_pagefault_service after the VMA lookup. v2: - Apply max line length (Matthew) Fixes: fb544b844508 ("drm/xe: Implement xe_pagefault_queue_work") Signed-off-by: Jonathan Cavitt Suggested-by: Matthew Brost Cc: Shuicheng Lin Reviewed-by: Matthew Brost Signed-off-by: Matthew Brost Link: https://patch.msgid.link/20260324152935.72444-7-jonathan.cavitt@intel.com (cherry picked from commit 714ee6754ac5fa3dc078856a196a6b124cd797a0) Signed-off-by: Rodrigo Vivi

drm/xe: always keep track of remap prev/next

2026-03-25T12:05:33+00:00

During 3D workload, user is reporting hitting: [ 413.361679] WARNING: drivers/gpu/drm/xe/xe_vm.c:1217 at vm_bind_ioctl_ops_unwind+0x1e2/0x2e0 [xe], CPU#7: vkd3d_queue/9925 [ 413.361944] CPU: 7 UID: 1000 PID: 9925 Comm: vkd3d_queue Kdump: loaded Not tainted 7.0.0-070000rc3-generic #202603090038 PREEMPT(lazy) [ 413.361949] RIP: 0010:vm_bind_ioctl_ops_unwind+0x1e2/0x2e0 [xe] [ 413.362074] RSP: 0018:ffffd4c25c3df930 EFLAGS: 00010282 [ 413.362077] RAX: 0000000000000000 RBX: ffff8f3ee817ed10 RCX: 0000000000000000 [ 413.362078] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 413.362079] RBP: ffffd4c25c3df980 R08: 0000000000000000 R09: 0000000000000000 [ 413.362081] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8f41fbf99380 [ 413.362082] R13: ffff8f3ee817e968 R14: 00000000ffffffef R15: ffff8f43d00bd380 [ 413.362083] FS: 00000001040ff6c0(0000) GS:ffff8f4696d89000(0000) knlGS:00000000330b0000 [ 413.362085] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 413.362086] CR2: 00007ddfc4747000 CR3: 00000002e6262005 CR4: 0000000000f72ef0 [ 413.362088] PKRU: 55555554 [ 413.362089] Call Trace: [ 413.362092] [ 413.362096] xe_vm_bind_ioctl+0xa9a/0xc60 [xe] Which seems to hint that the vma we are re-inserting for the ops unwind is either invalid or overlapping with something already inserted in the vm. It shouldn't be invalid since this is a re-insertion, so must have worked before. Leaving the likely culprit as something already placed where we want to insert the vma. Following from that, for the case where we do something like a rebind in the middle of a vma, and one or both mapped ends are already compatible, we skip doing the rebind of those vma and set next/prev to NULL. As well as then adjust the original unmap va range, to avoid unmapping the ends. However, if we trigger the unwind path, we end up with three va, with the two ends never being removed and the original va range in the middle still being the shrunken size. If this occurs, one failure mode is when another unwind op needs to interact with that range, which can happen with a vector of binds. For example, if we need to re-insert something in place of the original va. In this case the va is still the shrunken version, so when removing it and then doing a re-insert it can overlap with the ends, which were never removed, triggering a warning like above, plus leaving the vm in a bad state. With that, we need two things here: 1) Stop nuking the prev/next tracking for the skip cases. Instead relying on checking for skip prev/next, where needed. That way on the unwind path, we now correctly remove both ends. 2) Undo the unmap va shrinkage, on the unwind path. With the two ends now removed the unmap va should expand back to the original size again, before re-insertion. v2: - Update the explanation in the commit message, based on an actual IGT of triggering this issue, rather than conjecture. - Also undo the unmap shrinkage, for the skip case. With the two ends now removed, the original unmap va range should expand back to the original range. v3: - Track the old start/range separately. vma_size/start() uses the va info directly. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7602 Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds") Signed-off-by: Matthew Auld Cc: Matthew Brost Cc: # v6.8+ Reviewed-by: Matthew Brost Link: https://patch.msgid.link/20260318100208.78097-2-matthew.auld@intel.com (cherry picked from commit aec6969f75afbf4e01fd5fb5850ed3e9c27043ac) Signed-off-by: Rodrigo Vivi

drm/xe: Implement recent spec updates to Wa_16025250150

2026-03-24T13:29:10+00:00

The hardware teams noticed that the originally documented workaround steps for Wa_16025250150 may not be sufficient to fully avoid a hardware issue. The workaround documentation has been augmented to suggest programming one additional register; make the corresponding change in the driver. Fixes: 7654d51f1fd8 ("drm/xe/xe2hpg: Add Wa_16025250150") Reviewed-by: Matt Atwood Link: https://patch.msgid.link/20260319-wa_16025250150_part2-v1-1-46b1de1a31b2@intel.com Signed-off-by: Matt Roper (cherry picked from commit a31566762d4075646a8a2214586158b681e94305) Signed-off-by: Rodrigo Vivi