<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-17T22:35:32+00:00</updated>
<entry>
<title>drm/amdkfd: Use exclusive bounds for SVM split alignment checks</title>
<updated>2026-06-17T22:35:32+00:00</updated>
<author>
<name>Gerhard Schwanzer</name>
<email>geschw@pm.me</email>
</author>
<published>2026-06-16T10:56:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b89d58b6595d79dc3fe75e213e1f4c5efd0251d4'/>
<id>urn:sha1:b89d58b6595d79dc3fe75e213e1f4c5efd0251d4</id>
<content type='text'>
SVM ranges use inclusive page indices: prange-&gt;last is the last page in
the range. The split-remap logic introduced by commit 448ee45353ef
("drm/amdkfd: Use huge page size to check split svm range alignment")
uses ALIGN_DOWN(prange-&gt;last, 512) to determine whether the original
range can contain a 2MB huge-page mapping.

That aligns the last page itself down. Thus a range ending one page
before the next 2MB boundary is classified as if the final 2MB block did
not exist. When such a range is split inside that final block, the
split head or tail can be left off the remap list even though it was
derived from an original range that may have PMD mappings.

Use prange-&gt;last + 1 as the exclusive upper bound when computing the
original range's last 2MB-aligned boundary. Then use the actual split
boundary for the head and tail alignment checks: tail-&gt;start for a tail
split, and new_start for a head split. new_start is equivalent to
head-&gt;last + 1 and directly names the exclusive end of the split head.

Using head-&gt;last for the head-side check can both remap a head that ends
exactly one page before a 2MB boundary and miss a head whose split
boundary is one page after such a boundary. Philip Yang pointed out in
the review of the original change that this condition should use
head-&gt;last + 1 or new_start.

Xiaogang Chen identified the inclusive-last cause and posted the
candidate fix in the regression thread. With the culprit change active
and the local revert not applied, the unchanged C/HSA reproducer
completes 10/10 runs with this change on an RX 7600 XT.

Fixes: 448ee45353ef ("drm/amdkfd: Use huge page size to check split svm range alignment")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4914
Link: https://lore.kernel.org/stable/IA1PR12MB85172F7FE9157C092EDA46A0E3112@IA1PR12MB8517.namprd12.prod.outlook.com/
Link: https://lore.kernel.org/all/32ce2b72-aa16-4202-9f99-92e3cd4408bc@amd.com/
Suggested-by: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Gerhard Schwanzer &lt;geschw@pm.me&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a60ea15807126b148a328051636977a33ad0e9bb)
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: Use memdup_array_user to copy data from/to user space at kfd ioctls</title>
<updated>2026-06-17T22:24:53+00:00</updated>
<author>
<name>Xiaogang Chen</name>
<email>xiaogang.chen@amd.com</email>
</author>
<published>2026-06-16T18:25:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2321831d7e95d4e1abaff3ffd682be9dd45db62e'/>
<id>urn:sha1:2321831d7e95d4e1abaff3ffd682be9dd45db62e</id>
<content type='text'>
Several kfd ioctls need to transfer array data from/to user space. The kfd
driver allocates the kernel buffers with kmalloc_array() using a user-provided
size, so a hostile value can cause an oversized allocation or a 32-bit
wraparound. Replace it with memdup_array_user(), which checks the size
computation for overflow, allocates through dedicated slab caches and, like
kmalloc(), returns physically contiguous memory.
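The overflow check can be illustrated in user space. This is a sketch, not the kernel code. dup_array_checked() is a hypothetical analogue of memdup_array_user(). It uses the GCC/Clang __builtin_mul_overflow() builtin in place of the kernel's check_mul_overflow(), and malloc() in place of the slab allocation. It also returns NULL where the kernel helper returns ERR_PTR(-EOVERFLOW). unchecked_u32_bytes() is likewise hypothetical and shows the 32-bit wrap.

<![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue of memdup_array_user(). */
static void *dup_array_checked(const void *src, size_t n, size_t size)
{
	size_t bytes;
	void *dst;

	/* Reject n * size before allocating if the product overflows size_t. */
	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;	/* the kernel helper returns ERR_PTR(-EOVERFLOW) */

	dst = malloc(bytes ? bytes : 1);
	if (!dst)
		return NULL;
	memcpy(dst, src, bytes);
	return dst;
}

/* Unchecked size computation truncated to 32 bits: wraps modulo 2^32. */
static uint32_t unchecked_u32_bytes(uint32_t n, uint32_t size)
{
	return (uint32_t)((uint64_t)n * size);
}
```
]]>

With n = 0x40000001 and 4-byte elements, the unchecked 32-bit product is 4. A 4-byte buffer would then be sized for about a billion elements. The checked version refuses the request before allocating anything.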

Signed-off-by: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 4eca4742eb215951f9739ffe0122d179d545a7a4)
</content>
</entry>
<entry>
<title>drm/amdkfd: check find_first_zero_bit before __set_bit on kfd-&gt;doorbell_bitmap</title>
<updated>2026-06-17T22:24:47+00:00</updated>
<author>
<name>Xiaogang Chen</name>
<email>xiaogang.chen@amd.com</email>
</author>
<published>2026-06-16T17:54:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=516bf737a5602875f6c28d1028967837c8edf2c0'/>
<id>urn:sha1:516bf737a5602875f6c28d1028967837c8edf2c0</id>
<content type='text'>
If the index returned by find_first_zero_bit() is out of range, no free
doorbell bit was found, so do not call __set_bit() on kfd-&gt;doorbell_bitmap.
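The check can be modeled in user space. This is a simplified sketch, not the driver's code. first_zero_bit() stands in for find_first_zero_bit(), which returns the bitmap size when no bit is clear. alloc_index() is a hypothetical allocator, and -1 is an arbitrary "no free doorbell" value chosen for the sketch.

<![CDATA[
```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for find_first_zero_bit(): returns nbits if every bit is set. */
static size_t first_zero_bit(const unsigned long *map, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++)
		if (!(map[i / BITS_PER_UL] & (1UL << (i % BITS_PER_UL))))
			return i;
	return nbits;
}

/* Hypothetical allocator: set the bit only when the index is in range. */
static long alloc_index(unsigned long *map, size_t nbits)
{
	size_t idx = first_zero_bit(map, nbits);

	if (idx >= nbits)
		return -1;	/* bitmap full: setting bit idx would go out of range */
	map[idx / BITS_PER_UL] |= 1UL << (idx % BITS_PER_UL);
	return (long)idx;
}
```
]]>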

Signed-off-by: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 2664ce9143d174651a793d96a6a2326050c4f45a)
</content>
</entry>
<entry>
<title>drm/amdkfd: Let driver decide buffer size at AMDKFD_IOC_GET_DMABUF_INFO ioctl</title>
<updated>2026-06-17T22:24:38+00:00</updated>
<author>
<name>Xiaogang Chen</name>
<email>xiaogang.chen@amd.com</email>
</author>
<published>2026-05-27T03:50:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8242a8d9d7194d5ef8a8b38a5621ca0966a3ec15'/>
<id>urn:sha1:8242a8d9d7194d5ef8a8b38a5621ca0966a3ec15</id>
<content type='text'>
The amdkfd driver needs to allocate a buffer to return BO metadata to user
space. Currently the buffer size is controlled by the user. This is a
potential security issue: a hostile value (e.g. 2 GiB) lets any render-group
user trigger an order-MAX allocation or OOM in kernel context.

This patch first determines the BO metadata size. If that size is smaller than
the user-provided value, the driver can safely allocate a kernel buffer and
copy the metadata to the user-space buffer. If not, the driver informs the
user without allocating or copying, and the user retries with a new, large
enough user-space buffer.

This patch thus lets the driver decide the buffer allocation size, avoiding a
potentially hostile size from user space.
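The size negotiation can be modeled in user space. This is a hypothetical sketch: struct md_request, copy_metadata() and the bool return are illustrative only. The real ioctl's argument fields and error code are not reproduced here. Reporting the required size back to the caller is also an assumption.

<![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request: caller buffer and its capacity. */
struct md_request {
	void   *buf;       /* caller-provided buffer */
	size_t  buf_size;  /* in: capacity of buf; out: metadata size (assumption) */
};

/* Copy only when the driver-known size fits; the caller's value never sizes an allocation. */
static bool copy_metadata(const void *md, size_t md_size, struct md_request *req)
{
	bool fits = md_size <= req->buf_size;

	if (fits)
		memcpy(req->buf, md, md_size);	/* models kernel alloc + copy to user */
	req->buf_size = md_size;		/* tell the caller how much is needed */
	return fits;
}
```
]]>

A caller whose first attempt fails can read the reported size and retry with a buffer of that size. This mirrors the "let user know, then redo" flow.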

Signed-off-by: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit f54ce9e8cbd3abe0eda3a285f54dc4f572fe589a)
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix NULL deref during sysfs teardown</title>
<updated>2026-06-17T22:21:27+00:00</updated>
<author>
<name>Geoffrey McRae</name>
<email>geoffrey.mcrae@amd.com</email>
</author>
<published>2026-06-01T13:55:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d072a3f603c639ee12a05126aa0bab0ff1732323'/>
<id>urn:sha1:d072a3f603c639ee12a05126aa0bab0ff1732323</id>
<content type='text'>
Move kfd_process_remove_sysfs() earlier in kfd_process_wq_release() so
that all sysfs/procfs entries are removed before tearing down PDDs and
dropping lead_thread. The per-process sysfs attributes are backed by
struct kfd_process_device, and their show/store callbacks dereference
PDD fields. Since sysfs removal waits for active callbacks to complete,
removing these entries first closes a race where userspace reads sdma_*
and stats_* files after PDD teardown.

Previously this cleanup ran after kfd_process_destroy_pdds(), which
resets p-&gt;n_pdds to 0. This meant kfd_process_remove_sysfs() could no
longer walk the PDD array, so the per-PDD sysfs cleanup did not run as
intended.

This race caused NULL pointer dereferences observed in
kfd_sdma_activity_worker and kfd_procfs_stats_show.

Also harden kfd_process_remove_sysfs() against partially
initialized or already-freed objects:
- Check kobj_queues before removing PASID and deleting it
- Guard kobj_stats and kobj_counters before use

These checks prevent invalid dereferences during cleanup.

Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Geoffrey McRae &lt;geoffrey.mcrae@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 674c692702341fed321720b4b92036c5934fb485)
</content>
</entry>
<entry>
<title>drm/amdkfd: fix list_del corruption in kfd_criu_resume_svm</title>
<updated>2026-06-17T22:19:37+00:00</updated>
<author>
<name>Mario Limonciello</name>
<email>mario.limonciello@amd.com</email>
</author>
<published>2026-06-13T02:22:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8fa5655da368d0306c03e9dc9cda8ae2a7840926'/>
<id>urn:sha1:8fa5655da368d0306c03e9dc9cda8ae2a7840926</id>
<content type='text'>
The cleanup tail of kfd_criu_resume_svm() walks
svms-&gt;criu_svm_metadata_list and kfree()s each struct criu_svm_metadata
without removing it from the list. The list head is left pointing at
freed kmalloc-96 objects.

A second AMDKFD_IOC_CRIU_OP from the same process re-enters: list_empty()
reads the dangling -&gt;next (use-after-free), the loop walks freed entries,
and each is kfree()'d again (double-free). This is reachable by an
unprivileged render-group user via /dev/kfd with no capabilities required.

Add list_del() before the kfree() so the list is properly emptied. The
list_for_each_entry_safe() iterator already caches the next pointer, so
unlinking during the walk is safe.
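The unlink-before-free pattern with a cached next pointer is shown here on a minimal circular doubly linked list. This is a user-space sketch. The list helpers are simplified local stand-ins for those in include/linux/list.h: this list_del() sets the pointers to NULL, while the kernel version poisons them. The node is also the list entry itself rather than being embedded in a containing struct.

<![CDATA[
```c
#include <stdlib.h>

struct node {
	struct node *next, *prev;
	int val;
};

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Simplified list_del(): unlink n and clear its pointers. */
static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

static int list_empty(const struct node *head)
{
	return head->next == head;
}

/* Free every entry and leave the head empty, so a later walk sees nothing. */
static void free_all(struct node *head)
{
	struct node *pos = head->next;

	while (pos != head) {
		struct node *tmp = pos->next;	/* cached first, as list_for_each_entry_safe() does */

		list_del(pos);			/* unlink before freeing */
		free(pos);
		pos = tmp;
	}
}
```
]]>

Because every entry is unlinked, a second call to free_all() finds an empty list and does nothing. Without list_del(), the head would still point at freed memory.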

Fixes: 2a909ae71871 ("drm/amdkfd: CRIU resume shared virtual memory ranges")
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 6322d278a298e2c1430b9d2697743d3a04b788b1)
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix SMI event PID reporting for containers</title>
<updated>2026-06-17T22:12:00+00:00</updated>
<author>
<name>Andrew Martin</name>
<email>andrew.martin@amd.com</email>
</author>
<published>2026-05-28T14:32:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1142738572ef3fcf8b169f1c48d94b4a71cc2d97'/>
<id>urn:sha1:1142738572ef3fcf8b169f1c48d94b4a71cc2d97</id>
<content type='text'>
SMI events were reporting incorrect PIDs in containerized environments,
causing test failures where container processes expected to see their
namespace-local PIDs but instead received global host PIDs.

The issue had two root causes:

1. Event functions were called from kernel context (page fault handlers,
   migration workers) where 'current' refers to the kernel worker thread,
   not the userspace GPU process that triggered the event.

2. PID conversion used task_tgid_vnr() which returns the PID in the
   caller's namespace (init namespace for kernel threads), not the task's
   own namespace.

This patch updates the SMI event interface:

- Change 8 event function signatures to accept a task_struct pointer
  instead of a pid_t, allowing proper namespace-aware PID conversion

- Convert PIDs using task_tgid_nr_ns(task, task_active_pid_ns(task))
  which returns the PID as the process sees it via getpid()

- Update 10 call sites to pass p-&gt;lead_thread (the GPU process)
  instead of p-&gt;lead_thread-&gt;pid or current (kernel worker)

This ensures SMI events report container-local PIDs, which is critical
for containerized GPU workloads to correctly correlate events with their
processes.

Tested-by: Andrew Martin &lt;andmarti@amd.com&gt;
Assisted-by: Claude:Sonnet 4-5
Signed-off-by: Andrew Martin &lt;andrew.martin@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 60271ec06e04ba5d69d68714f3abdf637d86c257)
</content>
</entry>
<entry>
<title>drm/amdkfd: Properly acquire queue buffers in CRIU restore</title>
<updated>2026-06-17T22:08:14+00:00</updated>
<author>
<name>David Francis</name>
<email>David.Francis@amd.com</email>
</author>
<published>2026-06-04T19:04:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=20a5e7ffdfecddc34c60a6b4483f42acf3d8731d'/>
<id>urn:sha1:20a5e7ffdfecddc34c60a6b4483f42acf3d8731d</id>
<content type='text'>
When kfd_queue_acquire_buffers() was split off from
set_queue_properties_from_user(), set_queue_properties_from_criu()
was missed. As a result, set_queue_properties_from_criu() does not
fill in the buffer fields of queue_properties, which becomes a problem
when subsequent code expects them to be non-NULL.

Add the missing call to kfd_queue_acquire_buffers(), and also
use the correct cast types in set_queue_properties_from_criu(),
which were missed at the same time.

Signed-off-by: David Francis &lt;David.Francis@amd.com&gt;
Reviewed-by: Kent Russell &lt;kent.russell@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 88ed96abbbe27b70193544fbc1ee06448c274714)
</content>
</entry>
<entry>
<title>drm/amdkfd: always resume_all after suspend_all</title>
<updated>2026-06-04T19:38:08+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2026-05-06T20:50:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=56ae73c92e200e630c2bdf1e98c88b86c8483b37'/>
<id>urn:sha1:56ae73c92e200e630c2bdf1e98c88b86c8483b37</id>
<content type='text'>
Any good queues need to be restored even if suspend_all failed
for some of them. Always run remove_queue, as that will schedule
a GPU reset if removing the queue fails.

v2: move resume_all after remove

Fixes: eb067d65c33e ("drm/amdkfd: Update BadOpcode Interrupt handling with MES")
Reviewed-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix infinite loop parsing CRAT with zero subtype length</title>
<updated>2026-06-04T19:27:41+00:00</updated>
<author>
<name>Yongqiang Sun</name>
<email>Yongqiang.Sun@amd.com</email>
</author>
<published>2026-06-01T19:28:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=961323c26ad4c895e3b0ea1711fc41dfd6368c12'/>
<id>urn:sha1:961323c26ad4c895e3b0ea1711fc41dfd6368c12</id>
<content type='text'>
Malformed ACPI CRAT tables can advertise a zero or undersized subtype
length. The parser then fails to advance the cursor and loops forever
while the remaining image still looks large enough for a generic header.

Validate sub_type_hdr-&gt;length on each iteration before parsing or
advancing. Return -EINVAL and warn when length is zero or smaller than
the generic subtype header.
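The validation can be sketched over a hypothetical 2-byte generic header (type, then total subtype length in bytes) instead of the real CRAT layout. The second check, that the length does not exceed the remaining image, is an addition for this sketch and is not stated in the commit.

<![CDATA[
```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: byte 0 = type, byte 1 = length of the whole subtype. */
#define SUBTYPE_HDR_LEN 2u

/* Walk subtypes; return how many were visited, or -EINVAL on a bad length. */
static int walk_subtypes(const uint8_t *img, size_t img_len)
{
	size_t off = 0;
	int count = 0;

	while (img_len - off >= SUBTYPE_HDR_LEN) {
		uint8_t len = img[off + 1];

		/* Zero would never advance off; undersized would land inside this header. */
		if (len < SUBTYPE_HDR_LEN)
			return -EINVAL;
		/* Sketch-only bounds check against the remaining image. */
		if (len > img_len - off)
			return -EINVAL;

		count++;
		off += len;
	}
	return count;
}
```
]]>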

Signed-off-by: Yongqiang Sun &lt;Yongqiang.Sun@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
