<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/xe/xe_exec_queue.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-01-09T12:33:38+00:00</updated>
<entry>
<title>drm/xe: Fix fault on fd close after unbind</title>
<updated>2025-01-09T12:33:38+00:00</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@intel.com</email>
</author>
<published>2024-12-18T05:31:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=924d43bd10a1f6723ac5181a6e6cc2196ba98cdd'/>
<id>urn:sha1:924d43bd10a1f6723ac5181a6e6cc2196ba98cdd</id>
<content type='text'>
[ Upstream commit fe39b222a4139354d32ff9d46b88757f63f71d63 ]

If userspace holds an fd open, unbinds the device and then closes it,
the driver shouldn't try to access the hardware. Protect it by using
drm_dev_enter()/drm_dev_exit(). This fixes the following page fault:

&lt;6&gt; [IGT] xe_wedged: exiting, ret=98
&lt;1&gt; BUG: unable to handle page fault for address: ffffc901bc5e508c
&lt;1&gt; #PF: supervisor read access in kernel mode
&lt;1&gt; #PF: error_code(0x0000) - not-present page
...
&lt;4&gt;   xe_lrc_update_timestamp+0x1c/0xd0 [xe]
&lt;4&gt;   xe_exec_queue_update_run_ticks+0x50/0xb0 [xe]
&lt;4&gt;   xe_exec_queue_fini+0x16/0xb0 [xe]
&lt;4&gt;   __guc_exec_queue_fini_async+0xc4/0x190 [xe]
&lt;4&gt;   guc_exec_queue_fini_async+0xa0/0xe0 [xe]
&lt;4&gt;   guc_exec_queue_fini+0x23/0x40 [xe]
&lt;4&gt;   xe_exec_queue_destroy+0xb3/0xf0 [xe]
&lt;4&gt;   xe_file_close+0xd4/0x1a0 [xe]
&lt;4&gt;   drm_file_free+0x210/0x280 [drm]
&lt;4&gt;   drm_close_helper.isra.0+0x6d/0x80 [drm]
&lt;4&gt;   drm_release_noglobal+0x20/0x90 [drm]

Fixes: 514447a12190 ("drm/xe: Stop accumulating LRC timestamp on job_free")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3421
Reviewed-by: Umesh Nerlige Ramappa &lt;umesh.nerlige.ramappa@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241218053122.2730195-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
(cherry picked from commit 4ca1fd418338d4d135428a0eb1e16e3b3ce17ee8)
Signed-off-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/xe: Stop accumulating LRC timestamp on job_free</title>
<updated>2024-11-05T23:40:13+00:00</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@intel.com</email>
</author>
<published>2024-11-04T14:38:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=514447a1219021298329ce586536598c3b4b2dc0'/>
<id>urn:sha1:514447a1219021298329ce586536598c3b4b2dc0</id>
<content type='text'>
The exec queue timestamp is only really useful when it's being queried
through the fdinfo. There's no need to update it so often, on every
job_free. Tracing a simple app like vkcube running shows an update
rate of ~ 120Hz. In case of discrete, the BO is on vram, creating a lot
of pcie transactions.

The update on job_free() is used to cover a gap: if exec
queue is created and destroyed rapidly, before a new query, the
timestamp still needs to be accumulated and accounted for in the xef.

Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to
accumulate exec queue runtime") couldn't do it on the exec_queue_fini
since the xef could be gone at that point. However since commit
ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is
refcounted and the exec queue always holds a reference, making this safe
now.

Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when
client stats are captured") by reducing the frequency in which the
update is needed.

Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured")
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Jonathan Cavitt &lt;jonathan.cavitt@intel.com&gt;
Reviewed-by: Umesh Nerlige Ramappa &lt;umesh.nerlige.ramappa@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
(cherry picked from commit 83db047d9425d9a649f01573797558eff0f632e1)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/queue: move xa_alloc to prevent UAF</title>
<updated>2024-10-03T06:22:50+00:00</updated>
<author>
<name>Matthew Auld</name>
<email>matthew.auld@intel.com</email>
</author>
<published>2024-09-25T07:14:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=67801fa67b94ebd0e4da7a77ac2d9f321b75fbe0'/>
<id>urn:sha1:67801fa67b94ebd0e4da7a77ac2d9f321b75fbe0</id>
<content type='text'>
Evil user can guess the next id of the queue before the ioctl completes
and then call queue destroy ioctl to trigger UAF since create ioctl is
still referencing the same queue. Move the xa_alloc all the way to the end
to prevent this.

v2:
 - Rebase

Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured")
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-4-matthew.auld@intel.com
(cherry picked from commit 16536582ddbebdbdf9e1d7af321bbba2bf955a87)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Clean up VM / exec queue file lock usage.</title>
<updated>2024-10-03T06:21:55+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2024-09-21T01:17:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e3c85ddea7a473ed57b6cdfef2dfd468356fc91'/>
<id>urn:sha1:9e3c85ddea7a473ed57b6cdfef2dfd468356fc91</id>
<content type='text'>
Both the VM / exec queue file lock protect the lookup and reference to
the object, nothing more. These locks are not intended anything else
underneath them. XA have their own locking too, so no need to take the
VM / exec queue file lock aside from when doing a lookup and reference
get.

Add some kernel doc to make this clear and cleanup a few typos too.

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240921011712.2681510-1-matthew.brost@intel.com
(cherry picked from commit fe4f5d4b661666a45b48fe7f95443f8fefc09c8c)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: fix missing 'xe_vm_put'</title>
<updated>2024-09-12T23:04:36+00:00</updated>
<author>
<name>Dafna Hirschfeld</name>
<email>dhirschfeld@habana.ai</email>
</author>
<published>2024-09-01T04:42:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2efba0c095419f93f8913f1cbae8bf3fb030db20'/>
<id>urn:sha1:2efba0c095419f93f8913f1cbae8bf3fb030db20</id>
<content type='text'>
Fix memleak caused by missing xe_vm_put

Fixes: 852856e3b6f6 ("drm/xe: Use reserved copy engine for user binds on faulting devices")
Signed-off-by: Dafna Hirschfeld &lt;dhirschfeld@habana.ai&gt;
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240901044227.1177211-1-dhirschfeld@habana.ai
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
(cherry picked from commit 249df8cbecf0ab4877eab66cae857748631831a9)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: replace #include &lt;drm/xe_drm.h&gt; with &lt;uapi/drm/xe_drm.h&gt;</title>
<updated>2024-08-28T19:17:54+00:00</updated>
<author>
<name>Jani Nikula</name>
<email>jani.nikula@intel.com</email>
</author>
<published>2024-08-27T09:15:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=87d8ecf015444c51ea9d9154f633f98b7748a724'/>
<id>urn:sha1:87d8ecf015444c51ea9d9154f633f98b7748a724</id>
<content type='text'>
include/drm/xe_drm.h does not exist. Prefer the explicit uapi include.

Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240827091539.4136838-1-jani.nikula@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: prevent UAF around preempt fence</title>
<updated>2024-08-19T11:38:09+00:00</updated>
<author>
<name>Matthew Auld</name>
<email>matthew.auld@intel.com</email>
</author>
<published>2024-08-14T11:01:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7116c35aacedc38be6d15bd21b2fc936eed0008b'/>
<id>urn:sha1:7116c35aacedc38be6d15bd21b2fc936eed0008b</id>
<content type='text'>
The fence lock is part of the queue, therefore in the current design
anything locking the fence should then also hold a ref to the queue to
prevent the queue from being freed.

However, currently it looks like we signal the fence and then drop the
queue ref, but if something is waiting on the fence, the waiter is
kicked to wake up at some later point, where upon waking up it first
grabs the lock before checking the fence state. But if we have already
dropped the queue ref, then the lock might already be freed as part of
the queue, leading to uaf.

To prevent this, move the fence lock into the fence itself so we don't
run into lifetime issues. Alternative might be to have device level
lock, or only release the queue in the fence release callback, however
that might require pushing to another worker to avoid locking issues.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2454
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2342
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2020
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v6.8+
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240814110129.825847-2-matthew.auld@intel.com
</content>
</entry>
<entry>
<title>drm/xe/exec_queue: Prepare last fence for hw engine group resume context</title>
<updated>2024-08-18T01:31:54+00:00</updated>
<author>
<name>Francois Dugast</name>
<email>francois.dugast@intel.com</email>
</author>
<published>2024-08-09T15:51:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0d92cd8935a3fffbbfee0fd59cdc89ac5167b14a'/>
<id>urn:sha1:0d92cd8935a3fffbbfee0fd59cdc89ac5167b14a</id>
<content type='text'>
Ensure we can safely take a ref of the exec queue's last fence from the
context of resuming jobs from the hw engine group. The locking requirements
differ from the general case, hence the introduction of this new function.

v2: Add kernel doc, rework the code to prevent code duplication

v3: Fix kernel doc, remove now unnecessary lockdep variants (Matt Brost)

v4: Remove new put function (Matt Brost)

Signed-off-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-7-francois.dugast@intel.com
</content>
</entry>
<entry>
<title>drm/xe/exec_queue: Remove duplicated code</title>
<updated>2024-08-18T01:31:53+00:00</updated>
<author>
<name>Francois Dugast</name>
<email>francois.dugast@intel.com</email>
</author>
<published>2024-08-09T15:51:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7f0d7bee2079fc899c8280e177b0c0feb8b9debe'/>
<id>urn:sha1:7f0d7bee2079fc899c8280e177b0c0feb8b9debe</id>
<content type='text'>
This code section is the same as the body of
xe_exec_queue_last_fence_put_unlocked() so call the function instead and
remove duplicated code to make maintenance easier.

Signed-off-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-6-francois.dugast@intel.com
</content>
</entry>
<entry>
<title>'drm/xe/hw_engine_group: Register hw engine group's exec queues</title>
<updated>2024-08-18T01:31:51+00:00</updated>
<author>
<name>Francois Dugast</name>
<email>francois.dugast@intel.com</email>
</author>
<published>2024-08-09T15:51:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7970cb36966c9b9183255dc097ae0446300eebcf'/>
<id>urn:sha1:7970cb36966c9b9183255dc097ae0446300eebcf</id>
<content type='text'>
Add helpers to safely add and delete the exec queues attached to a hw
engine group, and make use them at the time of creation and destruction of
the exec queues. Keeping track of them is required to control the
execution mode of the hw engine group.

v2: Improve error handling and robustness, suspend exec queues created in
    fault mode if group in dma-fence mode, init queue link (Matt Brost)

v3: Delete queue from hw engine group when it is destroyed by the user,
    also clean up at the time of closing the file in case the user did
    not destroy the queue

v4: Use correct list when checking if empty, do not add the queue if VM
    is in xe_vm_in_preempt_fence_mode (Matt Brost)

v5: Remove unrelated newline, add checks and asserts for group, unwind on
    suspend failure (Matt Brost)

Signed-off-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-4-francois.dugast@intel.com
</content>
</entry>
</feed>
