kernel/linux.git/drivers/gpu/drm/xe/xe_exec_queue.c, branch v6.18.34

drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()

2026-05-23T11:07:12+00:00

[ Upstream commit f3cc22d4df3ed58439ea7e21daa54c3608e03b78 ] Two error handling issues exist in xe_exec_queue_create_ioctl(): 1. When xe_hw_engine_group_add_exec_queue() fails, the error path jumps to put_exec_queue which skips xe_exec_queue_kill(). If the VM is in preempt fence mode, xe_vm_add_compute_exec_queue() has already added the queue to the VM's compute exec queue list. Skipping the kill leaves the queue on that list, leading to a dangling pointer after the queue is freed. 2. When xa_alloc() fails after xe_hw_engine_group_add_exec_queue() has succeeded, the error path does not call xe_hw_engine_group_del_exec_queue() to remove the queue from the hw engine group list. The queue is then freed while still linked into the hw engine group, causing a use-after-free. Fix both by: - Changing the xe_hw_engine_group_add_exec_queue() failure path to jump to kill_exec_queue so that xe_exec_queue_kill() properly removes the queue from the VM's compute list. - Adding a del_hw_engine_group label before kill_exec_queue for the xa_alloc() failure path, which removes the queue from the hw engine group before proceeding with the rest of the cleanup. Fixes: 7970cb36966c ("'drm/xe/hw_engine_group: Register hw engine group's exec queues") Cc: Francois Dugast Cc: Matthew Brost Cc: Niranjana Vishwanathapura Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Matthew Brost Link: https://patch.msgid.link/20260408020647.3397933-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin (cherry picked from commit 37c831f401746a45d510b312b0ed7a77b1e06ec8) Signed-off-by: Rodrigo Vivi Signed-off-by: Sasha Levin

drm/xe/uapi: disallow bind queue sharing

2026-01-30T09:32:19+00:00

[ Upstream commit 6f4b7aed61817624250e590ba0ef304146d34614 ] Currently this is very broken if someone attempts to create a bind queue and share it across multiple VMs. For example currently we assume it is safe to acquire the user VM lock to protect some of the bind queue state, but if allow sharing the bind queue with multiple VMs then this quickly breaks down. To fix this reject using a bind queue with any VM that is not the same VM that was originally passed when creating the bind queue. This a uAPI change, however this was more of an oversight on kernel side that we didn't reject this, and expectation is that userspace shouldn't be using bind queues in this way, so in theory this change should go unnoticed. Based on a patch from Matt Brost. v2 (Matt B): - Hold the vm lock over queue create, to ensure it can't be closed as we attach the user_vm to the queue. - Make sure we actually check for NULL user_vm in destruction path. v3: - Fix error path handling. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Thomas Hellström Signed-off-by: Matthew Auld Cc: José Roberto de Souza Cc: Matthew Brost Cc: Michal Mrozek Cc: Carl Zhang Cc: # v6.8+ Acked-by: José Roberto de Souza Reviewed-by: Matthew Brost Reviewed-by: Arvind Yadav Acked-by: Michal Mrozek Link: https://patch.msgid.link/20260120110609.77958-3-matthew.auld@intel.com (cherry picked from commit 9dd08fdecc0c98d6516c2d2d1fa189c1332f8dab) Signed-off-by: Thomas Hellström Stable-dep-of: 772157f626d0 ("drm/xe/migrate: fix job lock assert") Signed-off-by: Sasha Levin

drm/xe: Enforce correct user fence signaling order using

2025-11-07T11:55:19+00:00

Prevent application hangs caused by out-of-order fence signaling when user fences are attached. Use drm_syncobj (via dma-fence-chain) to guarantee that each user fence signals in order, regardless of the signaling order of the attached fences. Ensure user fence writebacks to user space occur in the correct sequence. v7: - Skip drm_syncbj create of error (CI) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost Reviewed-by: Thomas Hellström Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com (cherry picked from commit adda4e855ab6409a3edaa585293f1f2069ab7299) Signed-off-by: Lucas De Marchi

drm/xe: Fix error handling if PXP fails to start

2025-09-16T22:54:28+00:00

Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini). Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio Cc: John Harrison Cc: Matthew Brost Reviewed-by: John Harrison Signed-off-by: John Harrison Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com

drm/xe: s/tlb_invalidation/tlb_inval

2025-08-27T18:49:00+00:00

tlb_invalidation is a bit verbose leading to ugly wraps in the code, shorten to tlb_inval. Signed-off-by: Stuart Summers Reviewed-by: Stuart Summers Signed-off-by: Matthew Brost Link: https://lore.kernel.org/r/20250826182911.392550-4-stuart.summers@intel.com

drm/xe/vf: Refactor CCS save/restore to use default migration context

2025-08-08T17:29:37+00:00

Previously, CCS save/restore operations created separate migration contexts with new VM memory allocations, resulting in significant overhead. This commit eliminates redundant context creation reusing the default migration context by registering new execution queues for CCS save and restore on the existing migrate VM. Signed-off-by: Satyanarayana K V P Suggested-by: Matthew Brost Cc: Michal Wajdeczko Cc: John Harrison Reviewed-by: Matthew Brost Reviewed-by: Stuart Summers Signed-off-by: Matthew Brost Link: https://lore.kernel.org/r/20250808073628.32745-2-satyanarayana.k.v.p@intel.com

drm/xe/vf: Refresh utilization buffer during migration recovery

2025-08-04T14:47:07+00:00

The WA buffer we use to capture context utilization contains GGTT references. This means its instructions have to be either fixed or re-emitted during VF post-migration recovery. This patch adds re-emitting content of the utilization WA BB during the recovery. The way we write to vram requires scratch buffer to be used before the whole block is memcopied. We are re-using a scratch buffer introduced in earlier part of the recovery. This is not a performance optimization, but a necessity to avoid creating dependencies between locks. v2: Notable rebase after "Prepare WA BB setup for more users" patch v3: Added error propagation Signed-off-by: Tomasz Lis Cc: Michal Wajdeczko Cc: Michal Winiarski Reviewed-by: Michal Winiarski Link: https://lore.kernel.org/r/20250802031045.1127138-8-tomasz.lis@intel.com Signed-off-by: Michał Winiarski

drm/xe/vf: Post migration, repopulate ring area for pending request

2025-08-04T14:46:56+00:00

The commands within ring area allocated for a request may contain references to GGTT. These references require update after VF migration, in order to continue any preempted LRCs, or jobs which were emitted to the ring but not sent to GuC yet. This change calls the emit function again for all such jobs, as part of post-migration recovery. v2: Moved few functions to better files v3: Take job_list_lock v4: Rephrased comments Signed-off-by: Tomasz Lis Cc: Michal Wajdeczko Cc: Michal Winiarski Reviewed-by: Michal Winiarski Cc: Jonathan Cavitt Reviewed-by: Jonathan Cavitt Link: https://lore.kernel.org/r/20250802031045.1127138-7-tomasz.lis@intel.com Signed-off-by: Michał Winiarski

drm/xe/vf: Rebase MEMIRQ structures for all contexts after migration

2025-08-04T14:46:52+00:00

All contexts require an update of state data, as the data includes GGTT references to memirq-related buffers. Default contexts need these references updated as well, because they are not refreshed when a new context is created from them. The way we write to vram requires scratch buffer to be used before the whole block is memcopied. Since using kalloc() within specific recovery functions would lead to unintended relations between locks, we are allocating the buffer earlier, before any locks are taken. The same buffer will be used for other steps of the recovery. v2: Update addresses by xe_lrc_write_ctx_reg() rather than set_memory_based_intr() v3: Renamed parameter, reordered parameters in some functs v4: Check if have MEMIRQ, move `xe_gt*` funct to proper file v5: Revert back to requiring scratch buffer, but allocate it earlier this time Signed-off-by: Tomasz Lis Cc: Michal Wajdeczko Cc: Michal Winiarski Acked-by: Satyanarayana K V P Reviewed-by: Michal Winiarski Link: https://lore.kernel.org/r/20250802031045.1127138-6-tomasz.lis@intel.com Signed-off-by: Michał Winiarski

drm/xe/vf: Rebase HWSP of all contexts after migration

2025-08-04T14:46:47+00:00

All contexts require an update due to GGTT range shift, as that affects their HWSP. The HW status page of a context contains GGTT references, which need to be shifted to a new range (or re-computed using the previously updated vma nodes). The references include ring start address and indirect state address. v2: move some functions to better matched files v3: Add missing kerneldocs v4: Style fix Signed-off-by: Tomasz Lis Cc: Michal Wajdeczko Cc: Michal Winiarski Acked-by: Satyanarayana K V P Reviewed-by: Michal Winiarski Link: https://lore.kernel.org/r/20250802031045.1127138-5-tomasz.lis@intel.com Signed-off-by: Michał Winiarski