<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdkfd/kfd_process.c, branch linux-6.9.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.9.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.9.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-05-01T01:59:16+00:00</updated>
<entry>
<title>drm/amdkfd: Flush the process wq before creating a kfd_process</title>
<updated>2024-05-01T01:59:16+00:00</updated>
<author>
<name>Lancelot SIX</name>
<email>lancelot.six@amd.com</email>
</author>
<published>2024-04-10T13:14:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5'/>
<id>urn:sha1:f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5</id>
<content type='text'>
There is a race condition when re-creating a kfd_process for a process.
This has been observed when a process under the debugger executes
exec(3).  In this scenario:
- The process executes exec.
 - This will eventually release the process's mm, which will cause the
   kfd_process object associated with the process to be freed
   (kfd_process_free_notifier decrements the reference count to the
   kfd_process to 0).  This causes kfd_process_ref_release to enqueue
   kfd_process_wq_release to the kfd_process_wq.
- The debugger receives the PTRACE_EVENT_EXEC notification, and tries to
  re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE).
 - When handling this request, KFD tries to re-create a kfd_process.
   This eventually calls kfd_create_process and kobject_init_and_add.

At this point the call to kobject_init_and_add can fail because the
old kfd_process.kobj has not been freed yet by kfd_process_wq_release.

This patch proposes to avoid this race by making sure to drain
kfd_process_wq before creating a new kfd_process object.  This way, we
know that any cleanup task is done executing when we reach
kobject_init_and_add.

Signed-off-by: Lancelot SIX &lt;lancelot.six@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix rescheduling of restore worker</title>
<updated>2024-04-24T03:23:28+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>felix.kuehling@amd.com</email>
</author>
<published>2024-04-19T17:25:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e26305f369ed0e087a043c2cdc76f3d9a6efb3bd'/>
<id>urn:sha1:e26305f369ed0e087a043c2cdc76f3d9a6efb3bd</id>
<content type='text'>
Handle the case that the restore worker was already scheduled by another
eviction while the restore was in progress.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Tested-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix eviction fence handling</title>
<updated>2024-04-24T03:17:30+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>felix.kuehling@amd.com</email>
</author>
<published>2024-04-18T01:13:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=37865e02e6ccecdda240f33b4332105a5c734984'/>
<id>urn:sha1:37865e02e6ccecdda240f33b4332105a5c734984</id>
<content type='text'>
Handle case that dma_fence_get_rcu_safe returns NULL.

If restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Tested-by: Gang BA &lt;Gang.Ba@amd.com&gt;
Reviewed-by: Gang BA &lt;Gang.Ba@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix memory leak in create_process failure</title>
<updated>2024-04-17T15:05:09+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>felix.kuehling@amd.com</email>
</author>
<published>2024-04-10T19:52:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=18921b205012568b45760753ad3146ddb9e2d4e2'/>
<id>urn:sha1:18921b205012568b45760753ad3146ddb9e2d4e2</id>
<content type='text'>
Fix memory leak due to a leaked mmget reference on an error handling
code path that is triggered when attempting to create KFD processes
while a GPU reset is in progress.

Fixes: 0ab2d7532b05 ("drm/amdkfd: prepare per-process debug enable and disable")
CC: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Tested-by: Harish Kasiviswanthan &lt;Harish.Kasiviswanthan@amd.com&gt;
Reviewed-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix sparse __rcu annotation warnings</title>
<updated>2024-01-09T20:44:13+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>felix.kuehling@amd.com</email>
</author>
<published>2024-01-03T23:13:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c147ddc68e741aed78bba796effe049344d87ab8'/>
<id>urn:sha1:c147ddc68e741aed78bba796effe049344d87ab8</id>
<content type='text'>
Properly mark kfd_process-&gt;ef as __rcu and consistently use the right
accessor functions.

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Run restore_workers on freezable WQs</title>
<updated>2023-11-29T21:49:23+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2023-10-27T22:21:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9a1c1339abf972477aeef4ea037e650f49c5892d'/>
<id>urn:sha1:9a1c1339abf972477aeef4ea037e650f49c5892d</id>
<content type='text'>
Make restore workers freezable so we don't have to explicitly flush them
in suspend and GPU reset code paths, and we don't accidentally try to
restore BOs while the GPU is suspended. Not having to flush restore_work
also helps avoid lock/fence dependencies in the GPU reset case where we're
not allowed to wait for fences.

A side effect of this is, that we can now have multiple concurrent threads
trying to signal the same eviction fence. Rework eviction fence signaling
and replacement to account for that.

The GPU reset path can no longer rely on restore_process_worker to resume
queues because evict/restore workers can run independently of it. Instead
call a new restore_process_helper directly.

This is an RFC and request for testing.

v2:
- Reworked eviction fence signaling
- Introduced restore_process_helper

v3:
- Handle unsignaled eviction fences in restore_process_bos

Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Tested-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Move TLB flushing logic into amdgpu</title>
<updated>2023-11-17T14:29:53+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2023-02-24T23:22:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94e2dae0a8bfd456abfd866f1eee8342f0858012'/>
<id>urn:sha1:94e2dae0a8bfd456abfd866f1eee8342f0858012</id>
<content type='text'>
This will make it possible for amdgpu GEM ioctls to flush TLBs on compute
VMs.

This removes VMID-based TLB flushing and always uses PASID-based
flushing. This still works because it scans the VMID-PASID mapping
registers to find the right VMID. It's only slightly less efficient. This
is not a production use case.

Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd: Disable XNACK on SRIOV environment</title>
<updated>2023-11-07T16:15:37+00:00</updated>
<author>
<name>Surbhi Kakarya</name>
<email>surbhi.kakarya@amd.com</email>
</author>
<published>2023-09-25T16:12:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9256e8d47a2fa0bcb5d32e7fee8c674c476a480f'/>
<id>urn:sha1:9256e8d47a2fa0bcb5d32e7fee8c674c476a480f</id>
<content type='text'>
The purpose of this patch is to disable XNACK or set XNACK OFF mode
on SRIOV platform which doesn't support it.

This will prevent user-space application to fail or result into
unexpected behaviour whenever the application need to run test-case
in XNACK ON mode.

Signed-off-by: Surbhi Kakarya &lt;surbhi.kakarya@amd.com&gt;
Reviewed-by: Shaoyun Liu &lt;shaoyun.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: drop IOMMUv2 support</title>
<updated>2023-08-11T18:47:25+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2023-07-28T16:20:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c99a2e7ae291e5b19b60443eb6397320ef9e8571'/>
<id>urn:sha1:c99a2e7ae291e5b19b60443eb6397320ef9e8571</id>
<content type='text'>
Now that we use the dGPU path for all APUs, drop the
IOMMUv2 support.

v2: drop the now unused queue manager functions for gfx7/8 APUs

Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Tested-by: Mike Lothian &lt;mike@fireburn.co.uk&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: use doorbell mgr for kfd process doorbells</title>
<updated>2023-08-07T21:14:07+00:00</updated>
<author>
<name>Shashank Sharma</name>
<email>shashank.sharma@amd.com</email>
</author>
<published>2023-07-14T14:13:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2105a15a20463cb6cee3061f309aeaabd2996d94'/>
<id>urn:sha1:2105a15a20463cb6cee3061f309aeaabd2996d94</id>
<content type='text'>
This patch:
- adds a doorbell object in kfd pdd structure.
- allocates doorbells for a process while creating its queue.
- frees the doorbells with pdd destroy.
- moves doorbell bitmap init function to kfd_doorbell.c

PS: This patch ensures that we don't break the existing KFD
    functionality, but now KFD userspace library should also
    create doorbell pages as AMDGPU GEM objects using libdrm
    functions in userspace. The reference code for the same
    is available with AMDGPU Usermode queue libdrm MR. Once
    this is done, we will not need to create process doorbells
    in kernel.

V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel
      instead (Alex).
    - Do not use custom doorbell structure, instead use separate
      variables for bo and doorbell_bitmap (Alex)
V3:
   - Do not allocate doorbell page with PDD, delay doorbell process
     page allocation until really needed (Felix)

Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felilx.Kuehling@amd.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
