<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/xe, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-16T17:18:52+00:00</updated>
<entry>
<title>drm/xe: Add compact-PT and addr mask handling for page reclaim</title>
<updated>2026-06-16T17:18:52+00:00</updated>
<author>
<name>Brian Nguyen</name>
<email>brian3.nguyen@intel.com</email>
</author>
<published>2026-06-05T22:42:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b5ed2756d45b04669502a1f13b1657ec7664571'/>
<id>urn:sha1:0b5ed2756d45b04669502a1f13b1657ec7664571</id>
<content type='text'>
Current implementation of generate_reclaim_entry() overlooks some
differences between the different page implementations: address masking
and compact 64K page handling.

Address masking of each leaf varies depending on the leaf entry size.
generate_reclaim_entry() is using XE_PTE_ADDR_MASK [51:12] for all leaf
entries. For 2MB PTEs, bit 12 (PAT) is part of the flags so the old mask
corrupts the physical address extraction.

64K pages can be represented as PS64 and a compact PT, which the latter
was not handled. Compact pages aren't walked by the unbind walker, so we
separately walk through the compact PT to ensure none of the leaf 64K
PTEs are dropped. Previously, compact PT were causing an abort since it
was considered covered and not descended into.

v2:
 - Update 64K entry/unbind walker for 64K compact PT handling. (Matthew)
 - Rework calculations of reclamation and address mask size.
 - Add new func abstracting the error handling before generating the
   reclaim entry.

v3:
 - Report finer addr granularity in abort debug print for compact.
   (Zongyao)
 - Add comments for ADDR_MASK usage. (Zongyao)
 - Drop existing phys_addr asserts, the new XE_PAGE_ADDR_MASK clears
   bits checked, so redundant asserts. (Sashiko)
 - WARN_ON to verify compact pt and edge pt won't be possible.

Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind")
Assisted-by: Sashiko-Review:gemini-3.1-pro-preview
Cc: stable@vger.kernel.org
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Suggested-by: Zongyao Bai &lt;zongyao.bai@intel.com&gt;
Signed-off-by: Brian Nguyen &lt;brian3.nguyen@intel.com&gt;
Reviewed-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Reviewed-by: Zongyao Bai &lt;zongyao.bai@intel.com&gt;
Link: https://patch.msgid.link/20260605224257.2194194-2-brian3.nguyen@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit 669252801a4aa4098fbc5dd9dd0bd93f0625abd7)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/guc: Fix buffer overflow in steered register list allocation</title>
<updated>2026-06-16T17:18:49+00:00</updated>
<author>
<name>Tejas Upadhyay</name>
<email>tejas.upadhyay@intel.com</email>
</author>
<published>2026-06-12T07:04:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=632ecc90e1ca5d3b6822bb4d08f84a175b6c42c0'/>
<id>urn:sha1:632ecc90e1ca5d3b6822bb4d08f84a175b6c42c0</id>
<content type='text'>
The size calculation for the steered register extarray uses only the
geometry DSS mask (g_dss_mask) to determine the number of entries to
allocate:

  total = bitmap_weight(gt-&gt;fuse_topo.g_dss_mask, ...) * steer_reg_num;

However, the filling loop uses for_each_dss_steering(), which iterates
over for_each_dss(), defined as the union of g_dss_mask and c_dss_mask
(geometry + compute DSS). On platforms with compute-only DSS bits, the
loop writes past the allocated buffer, corrupting adjacent slab objects.

This manifests as list_del corruption and SLUB redzone overwrites during
drm_managed_release on device unbind, since the overflow corrupts the
drmres list_head of neighboring allocations.

Fix by computing the allocation size using the union of both DSS masks,
matching the iteration pattern of for_each_dss_steering().

--
v2:
- use bitmap_weighted_or() (Zhanjun)

Fixes: b170d696c1e2 ("drm/xe/guc: Add XE_LP steered register lists")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/8049
Cc: Zhanjun Dong &lt;zhanjun.dong@intel.com&gt;
Cc: stable@vger.kernel.org
Assisted-by: GitHub-Copilot:claude-opus-4.6
Reviewed-by: Zhanjun Dong &lt;zhanjun.dong@intel.com&gt;
Link: https://patch.msgid.link/20260612070401.543305-2-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
(cherry picked from commit 0a78a44f4901aa6c9263e66be7fce02282f1109f)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Set TTM device beneficial_order to 9 (2M)</title>
<updated>2026-06-16T17:18:46+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2026-06-11T23:58:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ba7fd163422877ad5a5cf31306a38c07d3932b0c'/>
<id>urn:sha1:ba7fd163422877ad5a5cf31306a38c07d3932b0c</id>
<content type='text'>
Set the TTM device beneficial_order to 9 (2M), which is the sweet
spot for Xe when attempting reclaim on system memory BOs, as it matches
the large GPU page size. This ensures reclaim is attempted at the most
effective order for the driver.

This fixes an issue where an order-10 (4M) allocation cannot be found
despite an abundance of memory. The 4M allocation triggers reclaim,
unnecessarily evicting the working set and hurting performance. Since
the TTM infrastructure was introduced recently, we are tagging the TTM
patch as the Fixes target, even though this resolves an Xe-side problem.

Fixes: 7e9c548d3709 ("drm/ttm: Allow drivers to specify maximum beneficial TTM pool size")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Andi Shyti &lt;andi.shyti@linux.intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Link: https://patch.msgid.link/20260611235844.3725147-1-matthew.brost@intel.com
(cherry picked from commit 0d81db90d364cb3d733410829118759f28957c5a)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Fix wa_oob codegen recipe for external module builds</title>
<updated>2026-06-16T17:18:43+00:00</updated>
<author>
<name>Thomas Hellström</name>
<email>thomas.hellstrom@linux.intel.com</email>
</author>
<published>2026-06-04T07:45:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=92dc59ab2a09097cdf249e0288ff9b69261761c6'/>
<id>urn:sha1:92dc59ab2a09097cdf249e0288ff9b69261761c6</id>
<content type='text'>
When building with 'make M=drivers/gpu/drm/xe modules', kbuild invokes
scripts/Makefile.build with obj=., causing $(obj) to expand to '.'.
Make normalizes './xe_gen_wa_oob' to 'xe_gen_wa_oob' when constructing
the $^ automatic variable (target name normalization), so the recipe
command becomes just 'xe_gen_wa_oob ...' without any path prefix, and
the shell cannot find the tool.

Fix by replacing $^ with explicit $(obj)/xe_gen_wa_oob and
$(src)/&lt;rules-file&gt; references in both wa_oob recipe commands.
In recipe strings, make does not apply target name normalization, so
$(obj)/xe_gen_wa_oob correctly expands to './xe_gen_wa_oob' and the
shell can execute it. This matches the pattern already used by other
DRM drivers (e.g. radeon's mkregtable).

Fixes: f037e0b78e6d ("drm/xe: add xe_device_wa infrastructure")
Cc: Matt Atwood &lt;matthew.s.atwood@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: intel-xe@lists.freedesktop.org
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://patch.msgid.link/20260604074501.172129-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 3a11a63cc16660d514ff584e7551589655337e87)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: fix job timeout recovery for unstarted jobs and kernel queues</title>
<updated>2026-06-16T17:18:40+00:00</updated>
<author>
<name>Rodrigo Vivi</name>
<email>rodrigo.vivi@intel.com</email>
</author>
<published>2026-06-10T15:25:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=770031ec2312bfab307d05db5469f24fd297e758'/>
<id>urn:sha1:770031ec2312bfab307d05db5469f24fd297e758</id>
<content type='text'>
A job that GuC never scheduled (never started) indicates a GuC
scheduling failure; previously such jobs were silently errored out
instead of triggering a GT reset to recover. Trigger a GT reset and
resubmit them, but only when the queue was not already killed or banned:
an unstarted job on an already banned queue is the ban working as
intended and must neither clear the ban nor kick off a reset, otherwise
a banned userspace queue could be resurrected and spam GT resets.

Kernel queues are always recovered this way and wedge the device once
recovery attempts are exhausted, since kernel work must not silently
fail. A started job that times out on a userspace VM bind queue stays
banned rather than being reset and retried.

The queue is banned early in the timeout handler to signal the G2H
scheduling-done handler so it wakes the disable-scheduling waiter;
without it the waiter sleeps the full 5s timeout. When a reset is
warranted the ban is cleared before rearming so that
guc_exec_queue_start() can resubmit jobs after the GT reset - a
still-banned queue would block resubmission and cause an infinite TDR
loop. The already-banned case is gated out before this point via
skip_timeout_check, so it is unaffected.

v2: (Himal) Do it for any queue type, not just kernel/migration
v3: - (Sashiko and Sanjay): don't clear the ban / GT reset for already
      killed/banned queues on unstarted-job timeout
    - Update commit message
    - (Matt) Add Fixes tag

Fixes: fe05cee4d953 ("drm/xe: Don't short circuit TDR on jobs not started")
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Cc: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Assisted-by: GitHub-Copilot:claude-opus-4.8
Tested-by: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Reviewed-by: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Link: https://patch.msgid.link/20260610152548.404575-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
(cherry picked from commit b1107d085e7e8ed15ba6f80c102528a9c8a6cb0e)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: fix refcount leak in xe_range_fence_insert()</title>
<updated>2026-06-16T17:18:37+00:00</updated>
<author>
<name>Wentao Liang</name>
<email>vulab@iscas.ac.cn</email>
</author>
<published>2026-06-10T17:27:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b837315ca0adc317e5c8a9c7e484a0e29199b9b'/>
<id>urn:sha1:0b837315ca0adc317e5c8a9c7e484a0e29199b9b</id>
<content type='text'>
xe_range_fence_insert() acquires a reference on fence via
dma_fence_get() and stores it in rfence-&gt;fence.  It then calls
dma_fence_add_callback() and handles two cases: when the callback
is successfully registered (err == 0) the fence is transferred to
the tree for later cleanup; when the fence is already signaled
(err == -ENOENT) it manually drops the extra reference with
dma_fence_put(fence).

However, dma_fence_add_callback() can fail with other errors
(e.g. -EINVAL) and in that case the code falls through to the free:
label without releasing the acquired reference, leaking it.

Fix the leak by adding an else branch that calls dma_fence_put()
before jumping to free: for any error other than -ENOENT.

Fixes: 845f64bdbfc9 ("drm/xe: Introduce a range-fence utility")
Signed-off-by: Wentao Liang &lt;vulab@iscas.ac.cn&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patch.msgid.link/20260610172705.3450560-1-matthew.brost@intel.com
(cherry picked from commit 98c4a4201290823c2c5c7ba21692bd9a64b61021)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: include all registered queues in TLB invalidation</title>
<updated>2026-06-16T17:18:35+00:00</updated>
<author>
<name>Tangudu Tilak Tirumalesh</name>
<email>tilak.tirumalesh.tangudu@intel.com</email>
</author>
<published>2026-06-08T16:27:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2d238408db2b963951f0f2ae70ff7f9e337b540'/>
<id>urn:sha1:f2d238408db2b963951f0f2ae70ff7f9e337b540</id>
<content type='text'>
Context-based TLB invalidation currently selects only scheduling-active
exec queues via q-&gt;ops-&gt;active(). During rebind flows, queues may be
suspended (or transitioning through resume) while still owning valid
translations, causing them to be skipped from invalidation and leading
to missed TLB invalidations on LR rebinds.

The underlying issue is a TOCTOU: q-&gt;guc-&gt;state bits are flipped lock-free
from enable_scheduling(), disable_scheduling{,_deregister}(), the
suspend/resume sched-msg handlers, handle_sched_done(), and
guc_exec_queue_stop(); nothing in send_tlb_inval_ctx_ppgtt() serializes
against them, so any state-based predicate can race.

Include all the registered queues so that TLB invalidations are not
missed. This is race-free because list membership on vm-&gt;exec_queues.list
is stable under vm-&gt;exec_queues.lock held by the caller. The performance
impact is expected to be minimal and harmless. If it does turn out to be
a concern, we can come back with a race-safe solution to ignore certain
queues.

Fixes: 6cdaa5346d6f ("drm/xe: Add context-based invalidation to GuC TLB invalidation backend")
Assisted-by: Claude:claude-opus-4.6
Suggested-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Tangudu Tilak Tirumalesh &lt;tilak.tirumalesh.tangudu@intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patch.msgid.link/20260608162745.338725-2-tilak.tirumalesh.tangudu@intel.com
Signed-off-by: Shuicheng Lin &lt;shuicheng.lin@intel.com&gt;
(cherry picked from commit aa625e1e9f0710e424fe4f0e3f032807df81b5b0)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/hw_error: Use HW_ERR prefix in log</title>
<updated>2026-06-16T17:18:32+00:00</updated>
<author>
<name>Raag Jadav</name>
<email>raag.jadav@intel.com</email>
</author>
<published>2026-06-02T04:48:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d46319bc8273c1018fa532a383eb515ee1b1de4b'/>
<id>urn:sha1:d46319bc8273c1018fa532a383eb515ee1b1de4b</id>
<content type='text'>
Hardware errors should be logged with HW_ERR prefix. Make them
consistent with existing logs.

Fixes: 01aab7e1c9d4 ("drm/xe/xe_hw_error: Add support for PVC SoC errors")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-5-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit ad60a618c49fef07d1860bfb1091140d29f5eddb)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/drm_ras: Add per node cleanup action</title>
<updated>2026-06-16T17:18:29+00:00</updated>
<author>
<name>Raag Jadav</name>
<email>raag.jadav@intel.com</email>
</author>
<published>2026-06-02T04:48:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=042f1d5f20d63fb9b0e2fbba97648d16fe1a845d'/>
<id>urn:sha1:042f1d5f20d63fb9b0e2fbba97648d16fe1a845d</id>
<content type='text'>
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Add per node cleanup action which guarantees
cleanup on unwind and also simplifies the cleanup logic.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-4-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit 67fc5543d8274b2fcbef87734fad0469358f4478)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/drm_ras: Make counter allocation drm managed</title>
<updated>2026-06-16T17:18:27+00:00</updated>
<author>
<name>Raag Jadav</name>
<email>raag.jadav@intel.com</email>
</author>
<published>2026-06-02T04:48:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45fa032b68570a9ccd80328a857ac5872217be92'/>
<id>urn:sha1:45fa032b68570a9ccd80328a857ac5872217be92</id>
<content type='text'>
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Fix this using drm managed allocation, which is
guaranteed to be cleaned up on unwind.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-3-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit 58d77c77ea0c5cb2b755ebe23e973c8272acd896)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
</entry>
</feed>
