<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v6.6.36</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.36</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.36'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-06-16T11:47:38+00:00</updated>
<entry>
<title>Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"</title>
<updated>2024-06-16T11:47:38+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2024-05-20T18:41:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bb430ea4ba31dcc439d26e02bcbc4473cad4c6d9'/>
<id>urn:sha1:bb430ea4ba31dcc439d26e02bcbc4473cad4c6d9</id>
<content type='text'>
commit dd2b75fd9a79bf418e088656822af06fc253dbe3 upstream.

This reverts commit 28ebbb4981cb1fad12e0b1227dbecc88810b1ee8.

Revert this commit as apparently the LLVM code to take advantage of
this never landed.

Reviewed-by: Feifei Xu &lt;Feifei.Xu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Feifei Xu &lt;feifei.xu@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Flush the process wq before creating a kfd_process</title>
<updated>2024-06-12T09:11:29+00:00</updated>
<author>
<name>Lancelot SIX</name>
<email>lancelot.six@amd.com</email>
</author>
<published>2024-04-10T13:14:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c49ba4025ee04913eedac508260aefcc2766d1f'/>
<id>urn:sha1:6c49ba4025ee04913eedac508260aefcc2766d1f</id>
<content type='text'>
[ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ]

There is a race condition when re-creating a kfd_process for a process.
This has been observed when a process under the debugger executes
exec(3).  In this scenario:
- The process executes exec.
 - This will eventually release the process's mm, which will cause the
   kfd_process object associated with the process to be freed
   (kfd_process_free_notifier decrements the reference count to the
   kfd_process to 0).  This causes kfd_process_ref_release to enqueue
   kfd_process_wq_release to the kfd_process_wq.
- The debugger receives the PTRACE_EVENT_EXEC notification, and tries to
  re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE).
 - When handling this request, KFD tries to re-create a kfd_process.
   This eventually calls kfd_create_process and kobject_init_and_add.

At this point the call to kobject_init_and_add can fail because the
old kfd_process.kobj has not been freed yet by kfd_process_wq_release.

This patch proposes to avoid this race by making sure to drain
kfd_process_wq before creating a new kfd_process object.  This way, we
know that any cleanup task is done executing when we reach
kobject_init_and_add.

Signed-off-by: Lancelot SIX &lt;lancelot.six@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add VRAM accounting for SVM migration</title>
<updated>2024-06-12T09:11:23+00:00</updated>
<author>
<name>Mukul Joshi</name>
<email>mukul.joshi@amd.com</email>
</author>
<published>2024-04-18T19:13:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bdbb7611dd2ebb6b6035c8c039ef931452af8f41'/>
<id>urn:sha1:bdbb7611dd2ebb6b6035c8c039ef931452af8f41</id>
<content type='text'>
[ Upstream commit 1e214f7faaf5d842754cd5cfcd76308bfedab3b5 ]

Do VRAM accounting when doing migrations to vram to make sure
there is enough available VRAM and migrating to VRAM doesn't evict
other possible non-unified memory BOs. If migrating to VRAM fails,
driver can fall back to using system memory seamlessly.

Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: don't allow mapping the MMIO HDP page with large pages</title>
<updated>2024-05-17T10:02:34+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2024-04-14T17:06:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4b4cff994a27ebf7bd3fb9a798a1cdfa8d01b724'/>
<id>urn:sha1:4b4cff994a27ebf7bd3fb9a798a1cdfa8d01b724</id>
<content type='text'>
commit be4a2a81b6b90d1a47eaeaace4cc8e2cb57b96c7 upstream.

We don't get the right offset in that case.  The GPU has
an unused 4K area of the register BAR space into which you can
remap registers.  We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM.  However, on systems with &gt;4K pages, we end up
exposing PAGE_SIZE of MMIO space.

Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>amd/amdkfd: sync all devices to wait all processes being evicted</title>
<updated>2024-05-17T10:02:16+00:00</updated>
<author>
<name>Zhigang Luo</name>
<email>Zhigang.Luo@amd.com</email>
</author>
<published>2024-03-18T18:13:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b6f6626528fe724b512c34f3fb5946c36a135f58'/>
<id>urn:sha1:b6f6626528fe724b512c34f3fb5946c36a135f58</id>
<content type='text'>
[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]

If there are more than one device doing reset in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, this call takes time to finish. other device will
start reset and recover without waiting. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault.

Signed-off-by: Zhigang Luo &lt;Zhigang.Luo@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: range check cp bad op exception interrupts</title>
<updated>2024-05-17T10:02:11+00:00</updated>
<author>
<name>Jonathan Kim</name>
<email>Jonathan.Kim@amd.com</email>
</author>
<published>2024-03-01T17:25:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=41dc6791596656dd41100b85647ed489e1d5c2f2'/>
<id>urn:sha1:41dc6791596656dd41100b85647ed489e1d5c2f2</id>
<content type='text'>
[ Upstream commit 0cac183b98d8a8c692c98e8dba37df15a9e9210d ]

Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Tested-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Check cgroup when returning DMABuf info</title>
<updated>2024-05-17T10:02:11+00:00</updated>
<author>
<name>Mukul Joshi</name>
<email>mukul.joshi@amd.com</email>
</author>
<published>2024-02-12T15:40:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d644962423704441713be6669b9544beed5d4176'/>
<id>urn:sha1:d644962423704441713be6669b9544beed5d4176</id>
<content type='text'>
[ Upstream commit 9d7993a7ab9651afd5fb295a4992e511b2b727aa ]

Check cgroup permissions when returning DMA-buf info and
based on cgroup info return the GPU id of the GPU that have
access to the BO.

Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix memory leak in create_process failure</title>
<updated>2024-04-27T15:11:42+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>felix.kuehling@amd.com</email>
</author>
<published>2024-04-10T19:52:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aa02d43367a9adf8c85fb382fea4171fb266c8d0'/>
<id>urn:sha1:aa02d43367a9adf8c85fb382fea4171fb266c8d0</id>
<content type='text'>
commit 18921b205012568b45760753ad3146ddb9e2d4e2 upstream.

Fix memory leak due to a leaked mmget reference on an error handling
code path that is triggered when attempting to create KFD processes
while a GPU reset is in progress.

Fixes: 0ab2d7532b05 ("drm/amdkfd: prepare per-process debug enable and disable")
CC: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Tested-by: Harish Kasiviswanthan &lt;Harish.Kasiviswanthan@amd.com&gt;
Reviewed-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Reset GPU on queue preemption failure</title>
<updated>2024-04-17T09:19:34+00:00</updated>
<author>
<name>Harish Kasiviswanathan</name>
<email>Harish.Kasiviswanathan@amd.com</email>
</author>
<published>2024-03-26T19:32:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d87f08eb75513334a85458306373d7560af1017'/>
<id>urn:sha1:4d87f08eb75513334a85458306373d7560af1017</id>
<content type='text'>
commit 8bdfb4ea95ca738d33ef71376c21eba20130f2eb upstream.

Currently, with F32 HWS GPU reset is only when unmap queue fails.

However, if compute queue doesn't repond to preemption request in time
unmap will return without any error. In this case, only preemption error
is logged and Reset is not triggered. Call GPU reset in this case also.

Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Reviewed-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>amdkfd: use calloc instead of kzalloc to avoid integer overflow</title>
<updated>2024-04-13T11:07:28+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2024-04-11T20:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=315eb3c2df7e4cb18e3eacfa18a53a46f2bf0ef7'/>
<id>urn:sha1:315eb3c2df7e4cb18e3eacfa18a53a46f2bf0ef7</id>
<content type='text'>
commit 3b0daecfeac0103aba8b293df07a0cbaf8b43f29 upstream.

This uses calloc instead of doing the multiplication which might
overflow.

Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
