kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
6 days	drm/xe: fix job timeout recovery for unstarted jobs and kernel queues	Rodrigo Vivi	1	-14/+35
	A job that GuC never scheduled (never started) indicates a GuC scheduling failure; previously such jobs were silently errored out instead of triggering a GT reset to recover. Trigger a GT reset and resubmit them, but only when the queue was not already killed or banned: an unstarted job on an already banned queue is the ban working as intended and must neither clear the ban nor kick off a reset, otherwise a banned userspace queue could be resurrected and spam GT resets. Kernel queues are always recovered this way and wedge the device once recovery attempts are exhausted, since kernel work must not silently fail. A started job that times out on a userspace VM bind queue stays banned rather than being reset and retried. The queue is banned early in the timeout handler to signal the G2H scheduling-done handler so it wakes the disable-scheduling waiter; without it the waiter sleeps the full 5s timeout. When a reset is warranted the ban is cleared before rearming so that guc_exec_queue_start() can resubmit jobs after the GT reset - a still-banned queue would block resubmission and cause an infinite TDR loop. The already-banned case is gated out before this point via skip_timeout_check, so it is unaffected. v2: (Himal) Do it for any queue type, not just kernel/migration v3: - (Sashiko and Sanjay): don't clear the ban / GT reset for already killed/banned queues on unstarted-job timeout - Update commit message - (Matt) Add Fixes tag Fixes: fe05cee4d953 ("drm/xe: Don't short circuit TDR on jobs not started") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Assisted-by: GitHub-Copilot:claude-opus-4.8 Tested-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patch.msgid.link/20260610152548.404575-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit b1107d085e7e8ed15ba6f80c102528a9c8a6cb0e) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
6 days	drm/xe: fix refcount leak in xe_range_fence_insert()	Wentao Liang	1	-0/+2
	xe_range_fence_insert() acquires a reference on fence via dma_fence_get() and stores it in rfence->fence. It then calls dma_fence_add_callback() and handles two cases: when the callback is successfully registered (err == 0) the fence is transferred to the tree for later cleanup; when the fence is already signaled (err == -ENOENT) it manually drops the extra reference with dma_fence_put(fence). However, dma_fence_add_callback() can fail with other errors (e.g. -EINVAL) and in that case the code falls through to the free: label without releasing the acquired reference, leaking it. Fix the leak by adding an else branch that calls dma_fence_put() before jumping to free: for any error other than -ENOENT. Fixes: 845f64bdbfc9 ("drm/xe: Introduce a range-fence utility") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260610172705.3450560-1-matthew.brost@intel.com (cherry picked from commit 98c4a4201290823c2c5c7ba21692bd9a64b61021) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
7 days	drm/xe: include all registered queues in TLB invalidation	Tangudu Tilak Tirumalesh	1	-4/+3
	Context-based TLB invalidation currently selects only scheduling-active exec queues via q->ops->active(). During rebind flows, queues may be suspended (or transitioning through resume) while still owning valid translations, causing them to be skipped from invalidation and leading to missed TLB invalidations on LR rebinds. The underlying issue is a TOCTOU: q->guc->state bits are flipped lock-free from enable_scheduling(), disable_scheduling{,_deregister}(), the suspend/resume sched-msg handlers, handle_sched_done(), and guc_exec_queue_stop(); nothing in send_tlb_inval_ctx_ppgtt() serializes against them, so any state-based predicate can race. Include all the registered queues so that TLB invalidations are not missed. This is race-free because list membership on vm->exec_queues.list is stable under vm->exec_queues.lock held by the caller. The performance impact is expected to be minimal and harmless. If it does turn out to be a concern, we can come back with a race-safe solution to ignore certain queues. Fixes: 6cdaa5346d6f ("drm/xe: Add context-based invalidation to GuC TLB invalidation backend") Assisted-by: Claude:claude-opus-4.6 Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260608162745.338725-2-tilak.tirumalesh.tangudu@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit aa625e1e9f0710e424fe4f0e3f032807df81b5b0) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
7 days	drm/xe/hw_error: Use HW_ERR prefix in log	Raag Jadav	1	-6/+6
	Hardware errors should be logged with HW_ERR prefix. Make them consistent with existing logs. Fixes: 01aab7e1c9d4 ("drm/xe/xe_hw_error: Add support for PVC SoC errors") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patch.msgid.link/20260602044919.702209-5-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit ad60a618c49fef07d1860bfb1091140d29f5eddb) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
7 days	drm/xe/drm_ras: Add per node cleanup action	Raag Jadav	1	-35/+23
	cleanup_node_param() is not registered for previous node in case of counter allocation failure, which results in stale memory of previous node that isn't cleaned up on unwind. Add per node cleanup action which guarantees cleanup on unwind and also simplifies the cleanup logic. Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patch.msgid.link/20260602044919.702209-4-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 67fc5543d8274b2fcbef87734fad0469358f4478) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
7 days	drm/xe/drm_ras: Make counter allocation drm managed	Raag Jadav	1	-2/+1
	cleanup_node_param() is not registered for previous node in case of counter allocation failure, which results in stale memory of previous node that isn't cleaned up on unwind. Fix this using drm managed allocation, which is guaranteed to be cleaned up on unwind. Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patch.msgid.link/20260602044919.702209-3-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 58d77c77ea0c5cb2b755ebe23e973c8272acd896) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
7 days	drm/xe/display: fix oops in suspend/shutdown without display	Jani Nikula	1	-2/+9
	The xe driver keeps track of whether to probe display, and whether display hardware is there, using xe->info.probe_display. It gets set to false if there's no display after intel_display_device_probe(). However, the display may also be disabled via fuses, detected at a later time in intel_display_device_info_runtime_init(). In this case, the xe driver does for_each_intel_crtc() on uninitialized mode config in xe_display_flush_cleanup_work(), leading to a NULL pointer dereference, and generally calls display code with display info cleared. Check for intel_display_device_present() after intel_display_device_info_runtime_init(), and reset xe->info.probe_display as necessary. Also do unset_display_features() for completeness, although display runtime init has already done that. This will need to be unified across all cases later. Move intel_display_device_info_runtime_init() call slightly earlier, similar to i915, to avoid a bunch of unnecessary setup for no display cases. Note #1: The xe driver has no business doing low level display plumbing like for_each_intel_crtc() to begin with. It all needs to happen in display code. Note #2: The actual bug is present already in commit 44e694958b95 ("drm/xe/display: Implement display support"), but the oops was likely introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display suspend/resume work on discrete"). Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150 Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260515160920.1082842-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 7c3eb9f47533220888a67266448185fd0775d4da) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
13 days	drm/xe/multi_queue: skip submit when primary queue is suspended	Niranjana Vishwanathapura	1	-1/+4
	Return early in submit path when the multi-queue primary exec queue is suspended to avoid submitting while suspended. v2: Remove idle_skip_suspend fix as that feature is being reverted here https://patchwork.freedesktop.org/series/167262/ Fixes: bc5775c59258 ("drm/xe/multi_queue: Add GuC interface for multi queue support") Cc: stable@vger.kernel.org # v7.0+ Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260603233946.863663-2-niranjana.vishwanathapura@intel.com (cherry picked from commit b7fb55cc3364ca128cfff9d50649ffd4327cd01e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
13 days	drm/xe: Clear pending_disable before signaling suspend fence	Tangudu Tilak Tirumalesh	1	-1/+1
	In the schedule-disable done path for suspend, we signal the suspend fence before clearing pending_disable. That wakeup can let suspend_wait complete and resume be queued immediately. The resume path may then reach enable_scheduling() while pending_disable is still set and hit the !exec_queue_pending_disable(q) assertion. Fix this by clearing pending_disable before signaling the suspend fence, so any resumed transition observes a consistent state. Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
13 days	Revert "drm/xe: Skip exec queue schedule toggle if queue is idle during suspend"	Tangudu Tilak Tirumalesh	3	-77/+5
	This reverts commit 8533051ce92015e9cc6f75e0d52119b9d91610b6. The idle-skip optimization bypasses GuC suspend, so the GPU may not perform the context switch that flushes TLB entries for invalidated userptr VMAs. In LR/preempt-fence VM mode, this can lead to missed TLB invalidation and page faults during userptr invalidation tests. Restore unconditional schedule toggling on suspend so the context-switch TLB flush is always performed. This optimization will be reintroduced with a fix that does not skip suspend in LR/preempt-fence VM mode. Fixes: 8533051ce920 ("drm/xe: Skip exec queue schedule toggle if queue is idle during suspend") Cc: stable@vger.kernel.org # v7.0+ Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-2-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 6a1e7934d9a6cf46aecae00a99c2603d1295e170) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-06-01	Revert "drm/xe/nvls: Define GuC firmware for NVL-S"	Daniele Ceraolo Spurio	1	-1/+0
	This reverts commit 4e88de313ff4d1c67b644b1f39f9fb4089711b71. The early GuC FW definition meant for our CI branch was accidentally merged to the drm-xe-next branch instead. This GuC FW will never be released to linux-firmware, so we do not want the definition to be available in the mainline Linux codebase. Fixes: 4e88de313ff4 ("drm/xe/nvls: Define GuC firmware for NVL-S") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: stable@vger.kernel.org # v7.0+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260529193558.185436-11-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 65b8e0ac86e48cfc9128c04dfc53ea3395d030dd) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-27	drm/xe: Restore IDLEDLY regiter on engine reset	Balasubramani Vivekanandan	1	-0/+5
	Wa_16023105232 programs the register IDLEDLY. The register is reset whenever the engine is reset. Therefore it should be added to the GuC save-restore register list for it to be restored after reset. Fixes: 7c53ff050ba8 ("drm/xe: Apply Wa_16023105232") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260522163531.1365540-2-balasubramani.vivekanandan@intel.com Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> (cherry picked from commit df1cfe24743a93b71eab27687e148ab8ae9b69e3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-21	drm/xe/oa: Fix exec_queue leak on width check in stream open	Shuicheng Lin	1	-2/+4
	In xe_oa_stream_open_ioctl(), when param.exec_q->width > 1 the function returns -EOPNOTSUPP directly, skipping the existing err_exec_q cleanup path. The exec_queue reference obtained by xe_exec_queue_lookup() is leaked. The exec queue holds a reference on the xe_file, which is only dropped during queue teardown. The leaked lookup ref is not on the file's exec_queue xarray, so file close cannot release it. This keeps both the exec queue and the file private state pinned indefinitely. Jump to err_exec_q instead of returning directly so the reference is released. Fixes: f0ed39830e60 ("xe/oa: Fix query mode of operation for OAR/OAC") Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20260514203210.593488-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 339fa0be9e4a5d69fa47e91f4a36574224fb478f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-19	drm/xe/multi_queue: Fix secondary queue error case	Niranjana Vishwanathapura	1	-8/+8
	If xe_lrc_create() fails, the secondary queue added to the multi-queue group list is not removed before freeing the queue. Fix error path handling for secondary queues by removing it from the multi-queue group list at the right place. Reported-by: Sebastian Österlund <sebastian.osterlund@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7979 Fixes: d716a5088c88 ("drm/xe/multi_queue: Handle tearing down of a multi queue") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260518191639.320890-2-niranjana.vishwanathapura@intel.com (cherry picked from commit d2d23c12789cf69eddc35b8d38cd8eaabd0168f1) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-18	drm/xe: Define and use MCR version of COMMON_SLICE_CHICKEN4	Gustavo Sousa	3	-3/+4
	The register COMMON_SLICE_CHICKEN4 is a MCR register on both Xe2 and Xe3. Let's make sure to define a MCR version of it and use it for the relevant IP versions. Use XEHP_ as prefix for the register name, since it is MCR as of Xe_HP. v2: - Also change for one entry in lrc_tunnings, which was caught by manual testing and add corresponging Fixes tag in commit message. (Gustavo) Fixes: 8d6f16f1f082 ("drm/xe: Extend Wa_22021007897 to Xe3 platforms") Fixes: e5c13e2c505b ("drm/xe/xe2hpg: Add Wa_22021007897") Fixes: 8ccf5f6b2295 ("drm/xe/tuning: Apply windower hardware filtering setting on Xe3 and Xe3p") Bspec: 66534, 71185, 74417 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260514-rtp-mcr-check-v3-3-30dd47855fee@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> (cherry picked from commit 75f65f1a4c06da1d87f28570a9d4cdad28f13360) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-18	drm/xe: Define and use MCR version of COMMON_SLICE_CHICKEN1	Gustavo Sousa	2	-1/+2
	The register COMMON_SLICE_CHICKEN1 is a MCR register on Xe2. Let's make sure to define a MCR version of it and use it for the relevant IP versions. Use XEHP_ as prefix for the register name, since it is MCR as of Xe_HP. Fixes: a5d221924e13 ("drm/xe/xe2_hpg: Add set of workarounds") Fixes: 9f18b55b6d3f ("drm/xe/xe2: Add workaround 18033852989") Bspec: 66534, 71185 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260514-rtp-mcr-check-v3-2-30dd47855fee@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> (cherry picked from commit a672725fdbfc3ea430130039d677c7dc98d59df8) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-18	drm/xe: Define CACHE_MODE_1 as MCR register	Gustavo Sousa	1	-1/+1
	CACHE_MODE_1 is a MCR register for all platforms that currently use it in the Xe driver. Use XE_REG_MCR() when defining it. Fixes: 8cd7e9759766 ("drm/xe: Add missing DG2 lrc workarounds") Fixes: ff063430caa8 ("drm/xe/mtl: Add some initial MTL workarounds") Bspec: 66534, 67788 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260514-rtp-mcr-check-v3-1-30dd47855fee@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> (cherry picked from commit 8f765f0c054e0fb39980a76b4c899b027395929d) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-18	drm/xe/pf: Fix CFI failure in debugfs access	Mohanram Meenakshisundaram	2	-2/+6
	Reading debugfs file (/sys/kernel/debug/dri/0/gt*/pf/adverse_events) with CFI (Control Flow Integrity) enabled, the kernel panics at xe_gt_debugfs_simple_show+0x82/0xc0. xe_gt_debugfs_simple_show() declare a function pointer expecting int return type, but xe_gt_sriov_pf_monitor_print_events() is void return type, leading to CFI failure and kernel panic. [507620.973657] CFI failure at xe_gt_debugfs_simple_show+0x82/0xc0 [xe] (target: xe_gt_sriov_pf_monitor_print_events+0x0/0x130 [xe]; expected type: 0xd72c7139) Fix xe_gt_sriov_pf_monitor_print_events() function by updating to return an int type. Fixes: 1c99d3d3edab ("drm/xe/pf: Expose PF monitor details via debugfs") Signed-off-by: Mohanram Meenakshisundaram <mohanram.meenakshisundaram@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260514174918.1556357-2-mohanram.meenakshisundaram@intel.com (cherry picked from commit ff1d386a8359746d9699ac30336e3b0684c68958) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-18	drm/xe/vf: Fix signature of print functions	Michal Wajdeczko	2	-9/+21
	We have plugged-in existing VF print functions into our GT debugfs show helper as-is, but we missed that the helper expects functions to return int, while they were defined as void. This can lead to errors being reported when CFI is enabled. Fixes: 63d8cb8fe3dd ("drm/xe/vf: Expose SR-IOV VF attributes to GT debugfs") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Mohanram Meenakshisundaram <mohanram.meenakshisundaram@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260514155726.7165-1-michal.wajdeczko@intel.com (cherry picked from commit 314e31c9a8a1c421ee4f7f755b9348aefbbca090) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-18	drm/xe/gsc: Fix double-free of managed BO in error path	Shuicheng Lin	1	-4/+1
	The error path in xe_gsc_init_post_hwconfig() explicitly frees a BO allocated with xe_managed_bo_create_pin_map() via xe_bo_unpin_map_no_vm(). Since the managed BO already has a devm cleanup action registered, this causes a double-free when devm unwinds during probe failure. Remove the explicit free and let devm handle it, consistent with all other xe_managed_bo_create_pin_map() callers. Fixes: 2e5d47fe7839 ("drm/xe/uc: Use managed bo for HuC and GSC objects") Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Assisted-by: Claude:claude-opus-4.6 Link: https://patch.msgid.link/20260511154134.223696-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 71d61e3e299a17139e47f980a4d6f425b2c59bf7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-18	drm/xe/memirq: Update interrupt handler logic	Michal Wajdeczko	1	-4/+22
	To workaround some corner case hardware limitations, new programming note for the memory based interrupt handler suggests to assume that some status bytes, like GT_MI_USER_INTERRUPT and GUC_INTR_GUC2HOST, are always set. Update our interrupt handler to follow the new rules. Bspec: 53672 Fixes: a6581ebe7685 ("drm/xe/vf: Introduce Memory Based Interrupts Handler") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patch.msgid.link/20260511172838.2299-2-michal.wajdeczko@intel.com (cherry picked from commit 284f4cae4579eed9dd4406f18a6c1becc69f8931) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-13	drm/xe: Drop unused ggtt_balloon field	Michal Wajdeczko	1	-2/+0
	During recent GGTT refactoring we missed to drop now unused field from the xe_tile. Drop it now. Fixes: e904c56ba6e0 ("drm/xe: Rewrite GGTT VF initialization") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260510205605.642-1-michal.wajdeczko@intel.com (cherry picked from commit 21d5a871f57909dc4d8e4f5d3bf92f9ccf2597b2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-11	drm/xe/dma-buf: fix UAF with retry loop	Matthew Auld	1	-37/+12
	Retry doesn't work here, since bo will be freed on error, leading to UAF. However, now that we do the alloc & init before the attach, we can now combine this as one unit and have the init do the alloc for us. This should make the retry safe. Reported by Sashiko. v2: Fix up the error unwind (CI) Closes: https://sashiko.dev/#/patchset/20260506184332.86743-2-matthew.auld%40intel.com Fixes: eb289a5f6cc6 ("drm/xe: Convert xe_dma_buf.c for exhaustive eviction") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260508102635.149172-4-matthew.auld@intel.com (cherry picked from commit 479669418253e0f27f8cf5db01a731352ea592e7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-11	drm/xe/dma-buf: handle empty bo and UAF races	Matthew Auld	1	-15/+16
	There look to be some nasty races here when triggering the invalidate_mappings hook: 1) We do xe_bo_alloc() followed by the attach, before the actual full bo init step in xe_dma_buf_init_obj(). However the bo is visible on the attachments list after the attach. This is bad since exporter driver, say amdgpu, can at any time call back into our invalidate_mappings hook, with an empty/bogus bo, leading to potential bugs/crashes. 2) Similar to 1) but here we get a UAF, when the invalidate_mappings hook is triggered. For example, we get as far as xe_bo_init_locked() but this fails in some way. But here the bo will be freed on error, but we still have it attached from dma-buf pov, so if the invalidate_mappings is now triggered then the bo we access is gone and we trigger UAF and more bugs/crashes. To fix this, move the attach step until after we actually have a fully set up buffer object. Note that the bo is not published to userspace until later, so not sure what the comment "Don't publish the bo until we have a valid attachment", is referring to. We have at least two different customers reporting hitting a NULL ptr deref in evict_flags when importing something from amdgpu, followed by triggering the evict flow. Hit rate is also pretty low, which would hint at some kind of race, so something like 1) or 2) might explain this. v2: - Shuffle the order of the ops slightly (no functional change) - Improve the comment to better explain the ordering (Matt B) Assisted-by: Gemini:gemini-3 #debug Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7903 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/4055 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260508102635.149172-3-matthew.auld@intel.com (cherry picked from commit af1f2ad0c59fe4e2f924c526f66e968289d77971) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-11	drm/xe: Make decision to use Xe2-style blitter instructions a feature flag	Matt Roper	3	-16/+18
	The blitter engines' MEM_COPY and MEM_SET instructions were added as part of the same hardware change that introduced service copy engines (i.e., BCS1-BCS8) which is why the driver checks for service copy engine presence when deciding whether to use these instructions or the older XY_* instructions. However when making this decision the driver should consider which engines are part of the hardware architecture, not which engines are present/usable on the current device. For graphics IP versions that architecturally include service copy engines (i.e., everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and MEM_COPY even in if all of the service copy engines wind up getting fused off. I.e., we need to decide based on whether the platform's graphics descriptor contains these engines, rather than whether the usable engine mask contains them. This logic got broken when gt->info.__engine_mask was removed, although in practice that mistake has been harmless so far because there haven't been any hardware SKUs that fuse off all of the service copy engines yet. Replace the incorrect has_service_copy_support() function with a GT feature flag that tracks more accurately whether the new blitter instructions are usable. In addition to fixing incorrect logic if all service copies are fused off, the flag also makes it more obvious what the calling code is trying to do; previously it wasn't terribly obvious why "has service copy engines" was being used as the condition for using different instructions on all copy engine types. The new feature flag is named 'has_xe2_blt_instructions' because we expect this flag to be set for all Xe2 and later platforms (i.e., everything officially supported by the Xe driver). Technically there's also one Xe1-era platform (PVC) that supports these engines/instructions and will set this flag, but this still seems to be the most clear and understandable name for the flag. Fixes: 61549a2ee594 ("drm/xe: Drop __engine_mask") Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 09b399842907565a64e351fb22da790b4c673ffb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-11	drm/xe/madvise: Track purgeability with BO-local counters	Arvind Yadav	7	-175/+190
	xe_bo_recompute_purgeable_state() walks all VMAs of a BO to determine whether the BO can be made purgeable. This makes VMA create/destroy and madvise updates O(n) in the number of mappings. Replace the walk with BO-local counters protected by the BO dma-resv lock: - vma_count tracks the number of VMAs mapping the BO. - willneed_count tracks active WILLNEED holders, including WILLNEED VMAs and active dma-buf exports for non-imported BOs. A DONTNEED BO is promoted back to WILLNEED on a 0->1 transition of willneed_count. A BO is demoted to DONTNEED on a 1->0 transition only when it still has VMAs, preserving the previous behaviour where a BO with no mappings keeps its current madvise state. PURGED remains terminal, preserving the existing "once purged, always purged" rule. Fixes: 4f44961eab84 ("drm/xe/vm: Prevent binding of purged buffer objects") v2: - Use early return for imported BOs in all four helpers to avoid nesting (Matt B). - Group purgeability state into a purgeable sub-struct on struct xe_bo (Matt B). - Reword xe_bo_willneed_put_locked() kernel-doc to explain that a 1->0 transition means all remaining active VMAs are DONTNEED (Matt B). v3: - Move DONTNEED/PURGED reject from vma_lock_and_validate() into xe_vma_create(), gated on attr->purgeable_state == WILLNEED. Fixes vm_bind bypass and partial-unbind rejection on DONTNEED BOs (Matt B). - Drop .check_purged from MAP and REMAP; keep it for PREFETCH and add a comment why (Matt B). - Skip BO validation in vma_lock_and_validate() for non-WILLNEED VMA remnants so cleanup/remap paths do not repopulate DONTNEED/PURGED BOs. Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260506132027.2556046-1-arvind.yadav@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> (cherry picked from commit 23fb2ea56cb4fa2587bc072b04e4e698687a48e4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-05-06	drm/xe/guc: Exclude indirect ring state page from ADS engine state size	Satyanarayana K V P	3	-7/+11
	The engine state size reported to GuC via ADS should only include the engine state portion and should not include the indirect ring state page that comes after it in the context image. The GuC uses this size to overwrite the engine state in the LRC on watchdog resets and we don't want it to overwrite the indirect ring state as well. Fixes: d6219e1cd5e3 ("drm/xe: Add Indirect Ring State support") Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260504094924.3760713-4-satyanarayana.k.v.p@intel.com (cherry picked from commit 3ec5f003f6c377beda8bd5438941f5a7795e1848) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
2026-05-06	drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration	Shuicheng Lin	1	-4/+4
	pf_migration_mmio_save() and pf_migration_mmio_restore() initialize a local VF-specific MMIO view via xe_mmio_init_vf_view() but then pass &gt->mmio (the PF base) to all xe_mmio_read32()/xe_mmio_write32() calls instead of the local &mmio. This causes the PF own SW flag registers to be saved/restored rather than the target VF registers, silently corrupting migration state. Use the VF MMIO view for all register accesses, matching the correct pattern used in pf_clear_vf_scratch_regs(). Fixes: b7c1b990f719 ("drm/xe/pf: Handle MMIO migration data as part of PF control") Cc: Michał Winiarski <michal.winiarski@intel.com> Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260429192259.4009211-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 7d9c39cfb31ff389490ca1308767c2807a9829a6) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
2026-05-06	drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()	Shuicheng Lin	1	-3/+4
	PTR_ERR() returns a negative value, so comparing against the positive EAGAIN is always true for ERR_PTR(-EAGAIN), causing pf_migration_consume() to bail out instead of continuing to the remaining GTs. On multi-GT platforms this can skip GTs that already have data ready. Compare against -EAGAIN to match the intent (and the following line that correctly uses -EAGAIN). While at it, gate PTR_ERR() with IS_ERR(). v2: add IS_ERR() guard before PTR_ERR(). (Gustavo) Fixes: 67df4a5cbc58 ("drm/xe/pf: Add data structures and handlers for migration rings") Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20260428201448.3999428-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 9d770e72e1edb54beacfce5f402edb51632811e3) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
2026-05-06	drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()	Gustavo Sousa	1	-2/+10
	When media GT is disabled via configfs, there is no allocation for media_gt, which is kept as NULL. In such scenario, intel_hdcp_gsc_check_status() results in a kernel pagefault error due to &gt->uc.gsc being evaluated as an invalid memory address. Fix that by introducing a NULL check on media_gt and bailing out early if so. While at it, also drop the NULL check for gsc, since it can't be NULL if media_gt is not NULL. v2: - Get address for gsc only after checking that gt is not NULL. (Shuicheng) - Drop the NULL check for gsc. (Shuicheng) v3: - Add "Fixes" and "Cc: <stable...>" tags. (Matt) Fixes: 4af50beb4e0f ("drm/xe: Use gsc_proxy_init_done to check proxy status") Cc: <stable@vger.kernel.org> # v6.10+ Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260416-check-for-null-media_gt-in-intel_hdcp_gsc_check_status-v2-1-9adb9fd3b621@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> (cherry picked from commit bfaf87e84ca3ca3f6e275f9ae56da47a8b55ffd1) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
2026-04-29	drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR	Jia Yao	1	-0/+2
	Add validation in xe_vm_bind_ioctl() to reject PAT indices with XE_COH_NONE coherency mode when used with DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR. CPU address mirror mappings use system memory that is CPU cached, which makes them incompatible with COH_NONE PAT indices. Allowing COH_NONE with CPU cached buffers is a security risk, as the GPU may bypass CPU caches and read stale sensitive data from DRAM. Although CPU_ADDR_MIRROR does not create an immediate mapping, the backing system memory is still CPU cached. Apply the same PAT coherency restrictions as DRM_XE_VM_BIND_OP_MAP_USERPTR. v2: - Correct fix tag v6: - No change v7: - Correct fix tag v8: - Rebase v9: - Limit the restrictions to iGPU v10: - Just add the iGPU logic but keep dGPU logic Fixes: b43e864af0d4 ("drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR") Cc: <stable@vger.kernel.org> # v6.15+ Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260417055917.2027459-3-jia.yao@intel.com (cherry picked from commit 4d58d7535e826a3175527b6174502f0db319d7f6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise	Jia Yao	1	-0/+47
	Add validation in xe_vm_madvise_ioctl() to reject PAT indices with XE_COH_NONE coherency mode when applied to CPU cached memory. Using coh_none with CPU cached buffers is a security issue. When the kernel clears pages before reallocation, the clear operation stays in CPU cache (dirty). GPU with coh_none can bypass CPU caches and read stale sensitive data directly from DRAM, potentially leaking data from previously freed pages of other processes. This aligns with the existing validation in vm_bind path (xe_vm_bind_ioctl_validate_bo). v2(Matthew brost) - Add fixes - Move one debug print to better place v3(Matthew Auld) - Should be drm/xe/uapi - More Cc v4(Shuicheng Lin) - Fix kmem leak issues by the way v5 - Remove kmem leak because it has been merged by another patch v6 - Remove the fix which is not related to current fix v7 - No change v8 - Rebase v9 - Limit the restrictions to iGPU v10 - No change Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe") Cc: <stable@vger.kernel.org> # v6.18+ Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260417055917.2027459-2-jia.yao@intel.com (cherry picked from commit 016ccdb674b8c899940b3944952c96a6a490d10a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/xelp: Fix Wa_18022495364	Tvrtko Ursulin	1	-1/+1
	Command parser relative MMIO addressing needs to be enabled when writing to the register. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: ca33cd271ef9 ("drm/xe/xelp: Add Wa_18022495364") Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260420131603.70357-1-tvrtko.ursulin@igalia.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 5627392001802a98ed6cf8cf79a303abd00d1c0f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/gsc: Fix BO leak on error in query_compatibility_version()	Shuicheng Lin	1	-1/+1
	When xe_gsc_read_out_header() fails, query_compatibility_version() returns directly instead of jumping to the out_bo label. This skips the xe_bo_unpin_map_no_vm() call, leaving the BO pinned and mapped with no remaining reference to free it. Fix by using goto out_bo so the error path properly cleans up the BO, consistent with the other error handling in the same function. Fixes: 0881cbe04077 ("drm/xe/gsc: Query GSC compatibility version") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260417163308.3416147-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 8de86d0a843c32ca9d36864bdb92f0376a830bce) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/eustall: Fix drm_dev_put called before stream disable in close	Shuicheng Lin	1	-2/+2
	In xe_eu_stall_stream_close(), drm_dev_put() is called before the stream is disabled and its resources are freed. If this drops the last reference, the device structures could be freed while the subsequent cleanup code still accesses them, leading to a use-after-free. Fix this by moving drm_dev_put() after all device accesses are complete. This matches the ordering in xe_oa_release(). Fixes: 9a0b11d4cf3b ("drm/xe/eustall: Add support to init, enable and disable EU stall sampling") Cc: Harish Chegondi <harish.chegondi@intel.com> Assisted-by: Claude:claude-opus-4.6 Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20260415225428.3399934-1-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 35aff528f7297e949e5e19c9cd7fd748cf1cf21c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()	Shuicheng Lin	1	-2/+5
	Two error handling issues exist in xe_exec_queue_create_ioctl(): 1. When xe_hw_engine_group_add_exec_queue() fails, the error path jumps to put_exec_queue which skips xe_exec_queue_kill(). If the VM is in preempt fence mode, xe_vm_add_compute_exec_queue() has already added the queue to the VM's compute exec queue list. Skipping the kill leaves the queue on that list, leading to a dangling pointer after the queue is freed. 2. When xa_alloc() fails after xe_hw_engine_group_add_exec_queue() has succeeded, the error path does not call xe_hw_engine_group_del_exec_queue() to remove the queue from the hw engine group list. The queue is then freed while still linked into the hw engine group, causing a use-after-free. Fix both by: - Changing the xe_hw_engine_group_add_exec_queue() failure path to jump to kill_exec_queue so that xe_exec_queue_kill() properly removes the queue from the VM's compute list. - Adding a del_hw_engine_group label before kill_exec_queue for the xa_alloc() failure path, which removes the queue from the hw engine group before proceeding with the rest of the cleanup. Fixes: 7970cb36966c ("'drm/xe/hw_engine_group: Register hw engine group's exec queues") Cc: Francois Dugast <francois.dugast@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408020647.3397933-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 37c831f401746a45d510b312b0ed7a77b1e06ec8) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import()	Shuicheng Lin	1	-4/+7
	When xe_dma_buf_init_obj() fails, the attachment from dma_buf_dynamic_attach() is not detached. Add dma_buf_detach() before returning the error. Note: we cannot use goto out_err here because xe_dma_buf_init_obj() already frees bo on failure, and out_err would double-free it. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Mattheq Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408175255.3402838-5-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit a828eb185aac41800df8eae4b60501ccc0dbbe51) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure	Shuicheng Lin	1	-1/+11
	When drm_gpuvm_resv_object_alloc() fails, the pre-allocated storage bo is not freed. Add xe_bo_free(storage) before returning the error. xe_dma_buf_init_obj() calls xe_bo_init_locked(), which frees the bo on error. Therefore, xe_dma_buf_init_obj() must also free the bo on its own error paths. Otherwise, since xe_gem_prime_import() cannot distinguish whether the failure originated from xe_dma_buf_init_obj() or from xe_bo_init_locked(), it cannot safely decide whether the bo should be freed. Add comments documenting the ownership semantics: on success, ownership of storage is transferred to the returned drm_gem_object; on failure, storage is freed before returning. v2: Add comments to explain the free logic. Fixes: eb289a5f6cc6 ("drm/xe: Convert xe_dma_buf.c for exhaustive eviction") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408175255.3402838-4-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 78a6c5f899f22338bbf48b44fb8950409c5a69b9) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked()	Shuicheng Lin	1	-1/+3
	When XE_BO_FLAG_GGTT_ALL is set without XE_BO_FLAG_GGTT, the function returns an error without freeing a caller-provided bo, violating the documented contract that bo is freed on failure. Add xe_bo_free(bo) before returning the error. Fixes: 5a3b0df25d6a ("drm/xe: Allow bo mapping on multiple ggtts") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408175255.3402838-3-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 3fbd6cf43cac7b60757f3ce3d95195d3843a902c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked()	Shuicheng Lin	1	-1/+3
	When type is ttm_bo_type_device and aligned_size != size, the function returns an error without freeing a caller-provided bo, violating the documented contract that bo is freed on failure. Add xe_bo_free(bo) before returning the error. Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408175255.3402838-2-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 601c2aa087b6f21014300a3f107a08ee4dde7bdf) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Fix potential NULL deref in ↵	Shuicheng Lin	1	-1/+1
	xe_exec_queue_tlb_inval_last_fence_put_unlocked xe_exec_queue_tlb_inval_last_fence_put_unlocked() uses q->vm->xe as the first argument to xe_assert(). This function is called unconditionally from xe_exec_queue_destroy() for all queues, including kernel queues that have q->vm == NULL (e.g., queues created during GT init in xe_gt_record_default_lrcs() with vm=NULL). While current compilers optimize away the q->vm->xe dereference (even in CONFIG_DRM_XE_DEBUG=y builds, the compiler pushes the dereference into the WARN branch that is only taken when the assert condition is false), the code is semantically incorrect and constitutes undefined behavior in the C abstract machine for the NULL pointer case. Use gt_to_xe(q->gt) instead, which is always valid for any exec queue. This is consistent with how xe_exec_queue_destroy() itself obtains the xe_device pointer in its own xe_assert at the top of the function. Fixes: b2d7ec41f2a3 ("drm/xe: Attach last fence to TLB invalidation job queues") Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260409003449.3405767-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 96078a1c68bf97f17fd1d08c3f58f5c5cc9ccd65) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/vf: Use drm mm instead of drm sa for CCS read/write	Satyanarayana K V P	4	-55/+63
	The suballocator algorithm tracks a hole cursor at the last allocation and tries to allocate after it. This is optimized for fence-ordered progress, where older allocations are expected to become reusable first. In fence-enabled mode, that ordering assumption holds. In fence-disabled mode, allocations may be freed in arbitrary order, so limiting allocation to the current hole window can miss valid free space and fail allocations despite sufficient total space. Use DRM memory manager instead of sub-allocator to get rid of this issue as CCS read/write operations do not use fences. Fixes: 864690cf4dd6 ("drm/xe/vf: Attach and detach CCS copy commands with BO") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408110145.1639937-6-satyanarayana.k.v.p@intel.com (cherry picked from commit 6c84b493012aeb05dec29c709377bf0e17ac6815) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Add memory pool with shadow support	Satyanarayana K V P	4	-0/+460
	Add a memory pool to allocate sub-ranges from a BO-backed pool using drm_mm. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408110145.1639937-5-satyanarayana.k.v.p@intel.com (cherry picked from commit 1ce3229f8f269a245ff3b8c65ffae36b4d6afb93) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/debugfs: Correct printing of register whitelist ranges	Matt Roper	1	-1/+1
	The register-save-restore debugfs prints whitelist entries as offset ranges. E.g., REG[0x39319c-0x39319f]: allow read access for a single dword-sized register. However the GENMASK value used to set the lower bits to '1' for the upper bound of the whitelist range incorrectly included one more bit than it should have, causing the whitelist ranges to sometimes appear twice as large as they really were. For example, REG[0x6210-0x6217]: allow rw access was also intended to be a single dword-sized register whitelist (with a range 0x6210-0x6213) but was printed incorrectly as a qword-sized range because one too many bits was flipped on. Similar 'off by one' logic was applied when printing 4-dword register ranges and 64-dword register ranges as well. Correct the GENMASK logic to print these ranges in debugfs correctly. No impact outside of correcting the misleading debugfs output. Fixes: d855d2246ea6 ("drm/xe: Print whitelist while applying") Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260408-regsr_wl_range-v1-1-e9a28c8b4264@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 1a2a722ff96749734a5585dfe7f0bea7719caa8b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Mark ROW_CHICKEN5 as a masked register	Matt Roper	1	-1/+1
	ROW_CHICKEN5 is a masked register (i.e., to adjust the value of any of the lower 16 bits, the corresponding bit in the upper 16 bits must also be set). Add the XE_REG_OPTION_MASKED to its definition; failure to do so will cause workaround updates of this register to not apply properly. Bspec: 56853 Fixes: 835cd6cbb0d0 ("drm/xe/xe3p_lpg: Add initial workarounds for graphics version 35.10") Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20260410-xe3p_tuning-v1-3-e206a62ee38f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit cd84bfbba7feb4c1e72356f14de026dfda1a9e2a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL	Matt Roper	1	-1/+1
	From Xe2 onward (i.e., all platforms officially supported by the Xe driver), the GAMSTLB_CTRL register is located at offset 0x477C and represented by the macro "GAMSTLB_CTRL" in code. However the register formerly resided at offset 0xCF4C on Xe1-era platforms, and we also have macro XEHP_GAMSTLB_CTRL that represents this old offset in the unofficial/developer-only Xe1 code. When tuning for the register was added for Xe3p_LPG, the old Xe1-era macro was accidentally used instead of the proper macro for Xe2 and beyond, causing the tuning to not be applied properly. Use the proper definition so that the correct offset is written to. Bspec: 59298 Fixes: 377c89bfaa5d ("drm/xe/xe3p_lpg: Set STLB bank hash mode to 4KB") Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20260410-xe3p_tuning-v1-2-e206a62ee38f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 0b1676eafdd1ba5a5436bdca0d2a25ce56699783) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/xe3p_lpg: Add missing indirect ring state feature flag	Gustavo Sousa	1	-0/+1
	Even though commit 8fcb7dfb8bbf ("drm/xe/xe3p_lpg: Add support for graphics IP 35.10") mentions that the support for Indirect Ring State exists for Xe3p_LPG, it missed actually setting the feature flag in graphics_xe3p_lpg. Fix that by adding the missing member. Fixes: 8fcb7dfb8bbf ("drm/xe/xe3p_lpg: Add support for graphics IP 35.10") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260401-xe3p_lpg-indirect-ring-state-v1-1-0e4b5edf6898@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> (cherry picked from commit ec4f4970eb744fd7d6d135f40f5c83bd05982e72) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138	Matt Roper	1	-8/+0
	There appears to have been a silent merge conflict between some commits updating the workaround tables on Xe's -fixes and -next branches: - Commit bc6387a2e0c1 ("drm/xe/xe2_hpg: Fix handling of Wa_14019988906 & Wa_14019877138") from the fixes branch moved the Xe2_HPG instance of two workarounds touching the PSS_CHICKEN register from the engine_was[] table to the lrc_was[] table; the equivalent implementation for all other platforms/IPs were already properly located on lrc_was[]. This commit on the fixes branch is a cherry-pick of commit e04c609eedf4 ("drm/xe/xe2_hpg: Fix handling of Wa_14019988906 & Wa_14019877138") that already existed on the next branch. - Commit 55b19abb6c44 ("drm/xe: Consolidate workaround entries for Wa_14019877138") and commit c2142a1a8415 ("drm/xe: Consolidate workaround entries for Wa_14019988906") consolidated the individual entries per IP generation for each workaround into single, larger range-based entries. During merge conflict resolution the Xe2_HPG-specific entries (i.e., those with rule "GRAPHICS_VERSION_RANGE(2001, 2002)") were accidentally resurrected, even though the table already contains the consolidated entries that match a superset of thse ranges. These redundant entries don't cause any build failures but do trigger a dmesg error during probe on BMG-G21 devices: xe 0000:03:00.0: [drm] ERROR Tile0: GT0: discarding save-restore reg 7044 (clear: 00000400, set: 00000400, masked: yes, mcr: yes): ret=-22 xe 0000:03:00.0: [drm] ERROR Tile0: GT0: discarding save-restore reg 7044 (clear: 00000020, set: 00000020, masked: yes, mcr: yes): ret=-22 Re-drop the Xe2_HPG-specific table entries to eliminate the error. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7433 Fixes: 17b95278ae6a ("Merge tag 'drm-xe-next-2026-03-02' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next") Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260401-wa_merge_conflict-v1-1-b477ab53fedc@intel.com Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> (cherry picked from commit c79bc999442ff3c0908ab8bce92b2a3cb7d59861) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe/vm: Add missing pad and extensions check	Jonathan Cavitt	1	-1/+2
	Add missing pad and extensions check to xe_vm_get_property_ioctl v2: - Combine with other check (Auld) Fixes: 50c577eab051 ("drm/xe/xe_vm: Implement xe_vm_get_property_ioctl") Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260331181216.37775-2-jonathan.cavitt@intel.com (cherry picked from commit 896070686b16cc45cca7854be2049923b2b303d3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29	drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()	Matthew Brost	1	-24/+9
	xe_guc_submit_wedge() runs in the DMA-fence signaling path, where GFP_KERNEL memory allocations are not permitted. However, registering guc_submit_wedged_fini via drmm_add_action_or_reset() triggers such an allocation. Avoid this by moving the logic from guc_submit_wedged_fini() into guc_submit_fini(), where wedged exec queue references are dropped during normal teardown. Fixes: 8ed9aaae39f3 ("drm/xe: Force wedged state and block GT reset upon any GPU hang") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260326210116.202585-3-matthew.brost@intel.com (cherry picked from commit 4a706bd93c4fb156a13477e26ffdf2e633edeb10) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>