<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v6.18.34</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.34</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.34'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-17T15:15:34+00:00</updated>
<entry>
<title>drm/amdkfd: Make all TLB-flushes heavy-weight</title>
<updated>2026-05-17T15:15:34+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>felix.kuehling@amd.com</email>
</author>
<published>2026-04-20T15:55:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6fbd52d210c15f3987e04d8fa2cf6ca3ace2d0d8'/>
<id>urn:sha1:6fbd52d210c15f3987e04d8fa2cf6ca3ace2d0d8</id>
<content type='text'>
commit 9b4e3495d1bd2469bf94b74930c153c2d534ddb7 upstream.

With only one sequence number we cannot track the need for legacy vs
heavy-weight flushes reliably. Always use heavy-weight.

Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Philip Yang &lt;philip.yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit c1a3ff1d327820cd9a52bc1056b98681fc088949)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: validate SVM ioctl nattr against buffer size</title>
<updated>2026-05-17T15:15:33+00:00</updated>
<author>
<name>Alysa Liu</name>
<email>Alysa.Liu@amd.com</email>
</author>
<published>2026-04-21T14:18:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db9530a9873a7c85d2266a922589ebcf427fa631'/>
<id>urn:sha1:db9530a9873a7c85d2266a922589ebcf427fa631</id>
<content type='text'>
commit 045e0ff208f0838a246c10204105126611b267a1 upstream.

Validate nattr field against the buffer size, preventing
out-of-bounds buffer access via user-controlled attribute count.

Reviewed-by: Amir Shetaia &lt;Amir.Shetaia@amd.com&gt;
Signed-off-by: Alysa Liu &lt;Alysa.Liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 5eca8bfdfa456c3304ca77523718fe24254c172f)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add upper bound check for num_of_nodes</title>
<updated>2026-05-17T15:15:32+00:00</updated>
<author>
<name>Alysa Liu</name>
<email>Alysa.Liu@amd.com</email>
</author>
<published>2026-03-30T14:50:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=44d5a450c04d3d734c13a03561c3131020d66edf'/>
<id>urn:sha1:44d5a450c04d3d734c13a03561c3131020d66edf</id>
<content type='text'>
commit 74b73fa56a395d46745e4f245225963e9f8be7f1 upstream.

drm/amdkfd: Add upper bound check for num_of_nodes
in kfd_ioctl_get_process_apertures_new.

Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alysa Liu &lt;Alysa.Liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 98ff46a5ea090c14d2cdb4f5b993b05d74f3949f)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size</title>
<updated>2026-04-22T11:22:14+00:00</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2026-03-23T04:28:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=647fb0dc3818733024fc96c1df1ec3af806b0256'/>
<id>urn:sha1:647fb0dc3818733024fc96c1df1ec3af806b0256</id>
<content type='text'>
[ Upstream commit 78746a474e92fc7aaed12219bec7c78ae1bd6156 ]

The control stack size is calculated based on the number of CUs and
waves, and is then aligned to PAGE_SIZE. When the resulting control
stack size is aligned to 64 KB, GPU hangs and queue preemption
failures are observed while running RCCL unit tests on systems with
more than two GPUs.

amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues

This issue is observed on both 4 KB and 64 KB system page-size
configurations.

This patch fixes the issue by aligning the control stack size to
AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size
will not be 64 KB on systems with a 64 KB page size and queue
preemption works correctly.

Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE,
which can waste memory if the system page size is large. In this patch,
wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated
from wg_data_size and the control stack size, is aligned to PAGE_SIZE.

Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a3e14436304392fbada359edd0f1d1659850c9b7)
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB</title>
<updated>2026-04-11T12:26:38+00:00</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2026-03-26T12:21:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=77c918eaa4c916751769242567407f61c6af142a'/>
<id>urn:sha1:77c918eaa4c916751769242567407f61c6af142a</id>
<content type='text'>
commit 4487571ef17a30d274600b3bd6965f497a881299 upstream.

Currently, AMDGPU_VA_RESERVED_TRAP_SIZE is hardcoded to 8KB, while
KFD_CWSR_TBA_TMA_SIZE is defined as 2 * PAGE_SIZE. On systems with
4K pages, both values match (8KB), so allocation and reserved space
are consistent.

However, on 64K page-size systems, KFD_CWSR_TBA_TMA_SIZE becomes 128KB,
while the reserved trap area remains 8KB. This mismatch causes the
kernel to crash when running rocminfo or rccl unit tests.

Kernel attempted to read user page (2) - exploit attempt? (uid: 1001)
BUG: Kernel NULL pointer dereference on read at 0x00000002
Faulting instruction address: 0xc0000000002c8a64
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
CPU: 34 UID: 1001 PID: 9379 Comm: rocminfo Tainted: G E
6.19.0-rc4-amdgpu-00320-gf23176405700 #56 VOLUNTARY
Tainted: [E]=UNSIGNED_MODULE
Hardware name: IBM,9105-42A POWER10 (architected) 0x800200 0xf000006
of:IBM,FW1060.30 (ML1060_896) hv:phyp pSeries
NIP:  c0000000002c8a64 LR: c00000000125dbc8 CTR: c00000000125e730
REGS: c0000001e0957580 TRAP: 0300 Tainted: G E
MSR:  8000000000009033 &lt;SF,EE,ME,IR,DR,RI,LE&gt; CR: 24008268
XER: 00000036
CFAR: c00000000125dbc4 DAR: 0000000000000002 DSISR: 40000000
IRQMASK: 1
GPR00: c00000000125d908 c0000001e0957820 c0000000016e8100
c00000013d814540
GPR04: 0000000000000002 c00000013d814550 0000000000000045
0000000000000000
GPR08: c00000013444d000 c00000013d814538 c00000013d814538
0000000084002268
GPR12: c00000000125e730 c000007e2ffd5f00 ffffffffffffffff
0000000000020000
GPR16: 0000000000000000 0000000000000002 c00000015f653000
0000000000000000
GPR20: c000000138662400 c00000013d814540 0000000000000000
c00000013d814500
GPR24: 0000000000000000 0000000000000002 c0000001e0957888
c0000001e0957878
GPR28: c00000013d814548 0000000000000000 c00000013d814540
c0000001e0957888
NIP [c0000000002c8a64] __mutex_add_waiter+0x24/0xc0
LR [c00000000125dbc8] __mutex_lock.constprop.0+0x318/0xd00
Call Trace:
0xc0000001e0957890 (unreliable)
__mutex_lock.constprop.0+0x58/0xd00
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x6fc/0xb60 [amdgpu]
kfd_process_alloc_gpuvm+0x54/0x1f0 [amdgpu]
kfd_process_device_init_cwsr_dgpu+0xa4/0x1a0 [amdgpu]
kfd_process_device_init_vm+0xd8/0x2e0 [amdgpu]
kfd_ioctl_acquire_vm+0xd0/0x130 [amdgpu]
kfd_ioctl+0x514/0x670 [amdgpu]
sys_ioctl+0x134/0x180
system_call_exception+0x114/0x300
system_call_vectored_common+0x15c/0x2ec

This patch changes AMDGPU_VA_RESERVED_TRAP_SIZE to 64 KB and
KFD_CWSR_TBA_TMA_SIZE to the AMD GPU page size. This means we reserve
64 KB for the trap in the address space, but only allocate 8 KB within
it. With this approach, the allocation size never exceeds the reserved
area.

Fixes: 34a1de0f7935 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole")
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 31b8de5e55666f26ea7ece5f412b83eab3f56dbb)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Unreserve bo if queue update failed</title>
<updated>2026-03-19T15:08:20+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2025-12-09T20:13:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=529c985da1b277b36dc99aad660f96dc70f3c467'/>
<id>urn:sha1:529c985da1b277b36dc99aad660f96dc70f3c467</id>
<content type='text'>
[ Upstream commit 2ce75a0b7e1bfddbcb9bc8aeb2e5e7fa99971acf ]

Error handling path should unreserve bo then return failed.

Fixes: 305cd109b761 ("drm/amdkfd: Validate user queue update")
Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Alex Sierra &lt;alex.sierra@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit c24afed7de9ecce341825d8ab55a43a254348b33)
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix out-of-bounds write in kfd_event_page_set()</title>
<updated>2026-03-04T12:21:24+00:00</updated>
<author>
<name>Sunday Clement</name>
<email>Sunday.Clement@amd.com</email>
</author>
<published>2026-02-02T17:41:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bfcd6b53e1f4feb182952f4ff9a137c36ceaf20b'/>
<id>urn:sha1:bfcd6b53e1f4feb182952f4ff9a137c36ceaf20b</id>
<content type='text'>
[ Upstream commit 8a70a26c9f34baea6c3199a9862ddaff4554a96d ]

The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8
bytes via memset without checking the buffer size parameter. This allows
unprivileged userspace to trigger an out-of bounds kernel memory write
by passing a small buffer, leading to  potential privilege
escalation.

Signed-off-by: Sunday Clement &lt;Sunday.Clement@amd.com&gt;
Reviewed-by: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()</title>
<updated>2026-03-04T12:19:59+00:00</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2026-01-12T14:06:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=662ec0a17b50462963a0e01d73991324e3a90152'/>
<id>urn:sha1:662ec0a17b50462963a0e01d73991324e3a90152</id>
<content type='text'>
[ Upstream commit 6c160001661b6c4e20f5c31909c722741e14c2d8 ]

In svm_migrate_gart_map(), while migrating GART mapping, the number of
bytes copied for the GART table only accounts for CPU pages. On non-4K
systems, each CPU page can contain multiple GPU pages, and the GART
requires one 8-byte PTE per GPU page. As a result, an incorrect size was
passed to the DMA, causing only a partial update of the GART table.

Fix this function to work correctly on non-4K page-size systems by
accounting for the number of GPU pages per CPU page when calculating the
number of bytes to be copied.

Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Relax size checking during queue buffer get</title>
<updated>2026-03-04T12:19:59+00:00</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2026-01-12T14:06:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=acfc84cfa70aca5b970faf152979bc97b9f8b0c0'/>
<id>urn:sha1:acfc84cfa70aca5b970faf152979bc97b9f8b0c0</id>
<content type='text'>
[ Upstream commit 42ea9cf2f16b7131cb7302acb3dac510968f8bdc ]

HW-supported EOP buffer sizes are 4K and 32K. On systems that do not
use 4K pages, the minimum buffer object (BO) allocation size is
PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver
currently checks the allocated BO size against the supported EOP buffer
size. Since the allocated BO is larger than the expected size, this check
fails, preventing queue creation.

Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used.
Only the required 4K region of the buffer will be used as the EOP buffer
and avoids queue creation failures on non-4K page systems.

Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Philip Yang &lt;yangp@amd.com&gt;
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Handle GPU reset and drain retry fault race</title>
<updated>2026-03-04T12:19:48+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2025-11-19T21:32:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b3d6ca4f27d49cb0e8c1c507c44d6ac0583213e2'/>
<id>urn:sha1:b3d6ca4f27d49cb0e8c1c507c44d6ac0583213e2</id>
<content type='text'>
[ Upstream commit 5b57c3c3f22336e8fd5edb7f0fef3c7823f8eac1 ]

Only check and drain IH1 ring if CAM is not enabled.

If GPU is under reset, don't access IH to drain retry fault.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
