<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v6.1.18</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.18</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.18'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-03-10T08:33:55+00:00</updated>
<entry>
<title>drm/amdkfd: Page aligned memory reserve size</title>
<updated>2023-03-10T08:33:55+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2023-01-09T23:08:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=82a6debd4e4b98dabd99ab26a23553276559d509'/>
<id>urn:sha1:82a6debd4e4b98dabd99ab26a23553276559d509</id>
<content type='text'>
[ Upstream commit 0c2dece8fb541ab07b68c3312a1065fa9c927a81 ]

Use page aligned size to reserve memory usage because page aligned TTM
BO size is used to unreserve memory usage, otherwise no page aligned
size causes memory usage accounting unbalanced.

Change vram_used definition type to int64_t to be able to trigger
WARN_ONCE(adev &amp;&amp; adev-&gt;kfd.vram_used &lt; 0, "..."), to help debug the
accounting issue with warning and backtrace.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU</title>
<updated>2023-02-01T07:34:32+00:00</updated>
<author>
<name>Eric Huang</name>
<email>jinhuieric.huang@amd.com</email>
</author>
<published>2023-01-05T19:01:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca6263dc22acafc04c0d164900e5662a0e920a8a'/>
<id>urn:sha1:ca6263dc22acafc04c0d164900e5662a0e920a8a</id>
<content type='text'>
[ Upstream commit a6941f89d7c6a6ba49316bbd7da2fb2f719119a7 ]

The point bo-&gt;kfd_bo is NULL for queue's write pointer BO
when creating queue on mGPU. To avoid using the pointer
fixes the error.

Signed-off-by: Eric Huang &lt;jinhuieric.huang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add sync after creating vram bo</title>
<updated>2023-02-01T07:34:32+00:00</updated>
<author>
<name>Eric Huang</name>
<email>jinhuieric.huang@amd.com</email>
</author>
<published>2023-01-09T19:16:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=92af2d3b57a1afdfdcafb1c6a07ffd89cf3e98fb'/>
<id>urn:sha1:92af2d3b57a1afdfdcafb1c6a07ffd89cf3e98fb</id>
<content type='text'>
[ Upstream commit ba029e9991d9be90a28b6a0ceb25e9a6fb348829 ]

There will be data corruption on vram allocated by svm
if the initialization is not complete and application is
writting on the memory. Adding sync to wait for the
initialization completion is to resolve this issue.

Signed-off-by: Eric Huang &lt;jinhuieric.huang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix kernel warning during topology setup</title>
<updated>2023-01-12T11:02:52+00:00</updated>
<author>
<name>Mukul Joshi</name>
<email>mukul.joshi@amd.com</email>
</author>
<published>2022-12-20T22:11:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=306888b1246bf44e703b6f1ccc746c2746c1a981'/>
<id>urn:sha1:306888b1246bf44e703b6f1ccc746c2746c1a981</id>
<content type='text'>
commit cf97eb7e47d4671084c7e114c5d88a3d0540ecbd upstream.

This patch fixes the following kernel warning seen during
driver load by correctly initializing the p2plink attr before
creating the sysfs file:

[  +0.002865] ------------[ cut here ]------------
[  +0.002327] kobject: '(null)' (0000000056260cfb): is not initialized, yet kobject_put() is being called.
[  +0.004780] WARNING: CPU: 32 PID: 1006 at lib/kobject.c:718 kobject_put+0xaa/0x1c0
[  +0.001361] Call Trace:
[  +0.001234]  &lt;TASK&gt;
[  +0.001067]  kfd_remove_sysfs_node_entry+0x24a/0x2d0 [amdgpu]
[  +0.003147]  kfd_topology_update_sysfs+0x3d/0x750 [amdgpu]
[  +0.002890]  kfd_topology_add_device+0xbd7/0xc70 [amdgpu]
[  +0.002844]  ? lock_release+0x13c/0x2e0
[  +0.001936]  ? smu_cmn_send_smc_msg_with_param+0x1e8/0x2d0 [amdgpu]
[  +0.003313]  ? amdgpu_dpm_get_mclk+0x54/0x60 [amdgpu]
[  +0.002703]  kgd2kfd_device_init.cold+0x39f/0x4ed [amdgpu]
[  +0.002930]  amdgpu_amdkfd_device_init+0x13d/0x1f0 [amdgpu]
[  +0.002944]  amdgpu_device_init.cold+0x1464/0x17b4 [amdgpu]
[  +0.002970]  ? pci_bus_read_config_word+0x43/0x80
[  +0.002380]  amdgpu_driver_load_kms+0x15/0x100 [amdgpu]
[  +0.002744]  amdgpu_pci_probe+0x147/0x370 [amdgpu]
[  +0.002522]  local_pci_probe+0x40/0x80
[  +0.001896]  work_for_cpu_fn+0x10/0x20
[  +0.001892]  process_one_work+0x26e/0x5a0
[  +0.002029]  worker_thread+0x1fd/0x3e0
[  +0.001890]  ? process_one_work+0x5a0/0x5a0
[  +0.002115]  kthread+0xea/0x110
[  +0.001618]  ? kthread_complete_and_exit+0x20/0x20
[  +0.002422]  ret_from_fork+0x1f/0x30
[  +0.001808]  &lt;/TASK&gt;
[  +0.001103] irq event stamp: 59837
[  +0.001718] hardirqs last  enabled at (59849): [&lt;ffffffffb30fab12&gt;] __up_console_sem+0x52/0x60
[  +0.004414] hardirqs last disabled at (59860): [&lt;ffffffffb30faaf7&gt;] __up_console_sem+0x37/0x60
[  +0.004414] softirqs last  enabled at (59654): [&lt;ffffffffb307d9c7&gt;] irq_exit_rcu+0xd7/0x130
[  +0.004205] softirqs last disabled at (59649): [&lt;ffffffffb307d9c7&gt;] irq_exit_rcu+0xd7/0x130
[  +0.004203] ---[ end trace 0000000000000000 ]---

Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix double release compute pasid</title>
<updated>2023-01-12T11:02:38+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2022-12-13T05:50:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a02c07b619899179384fde06f951530438a3512d'/>
<id>urn:sha1:a02c07b619899179384fde06f951530438a3512d</id>
<content type='text'>
[ Upstream commit 1a799c4c190ea9f0e81028e3eb3037ed0ab17ff5 ]

If kfd_process_device_init_vm returns failure after vm is converted to
compute vm and vm-&gt;pasid set to compute pasid, KFD will not take
pdd-&gt;drm_file reference. As a result, drm close file handler maybe
called to release the compute pasid before KFD process destroy worker to
release the same pasid and set vm-&gt;pasid to zero, this generates below
WARNING backtrace and NULL pointer access.

Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step
of kfd_process_device_init_vm, to ensure vm pasid is the original pasid
if acquiring vm failed or is the compute pasid with pdd-&gt;drm_file
reference taken to avoid double release same pasid.

 amdgpu: Failed to create process VM object
 ida_free called for id=32770 which is not allocated.
 WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
 RIP: 0010:ida_free+0x96/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 RIP: 0010:ida_free+0x76/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix kfd_process_device_init_vm error handling</title>
<updated>2023-01-12T11:02:37+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2022-12-14T15:15:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d74d1f52e16d8e07f7fbe52e96d6391418a2fe9'/>
<id>urn:sha1:9d74d1f52e16d8e07f7fbe52e96d6391418a2fe9</id>
<content type='text'>
[ Upstream commit 29d48b87db64b6697ddad007548e51d032081c59 ]

Should only destroy the ib_mem and let process cleanup worker to free
the outstanding BOs. Reset the pointer in pdd-&gt;qpd structure, to avoid
NULL pointer access in process destroy worker.

 BUG: kernel NULL pointer dereference, address: 0000000000000010
 Call Trace:
  amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel+0x46/0xb0 [amdgpu]
  kfd_process_device_destroy_cwsr_dgpu+0x40/0x70 [amdgpu]
  kfd_process_destroy_pdds+0x71/0x190 [amdgpu]
  kfd_process_wq_release+0x2a2/0x3b0 [amdgpu]
  process_one_work+0x2a1/0x600
  worker_thread+0x39/0x3d0

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix error handling in criu_checkpoint</title>
<updated>2022-11-09T22:57:48+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2022-11-01T19:02:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b91c23e099f0b65d62159da13458c5eefa76083f'/>
<id>urn:sha1:b91c23e099f0b65d62159da13458c5eefa76083f</id>
<content type='text'>
Checkpoint BOs last. That way we don't need to close dmabuf FDs if
something else fails later. This avoids problematic access to user mode
memory in the error handling code path.

criu_checkpoint_bos has its own error handling and cleanup that does not
depend on access to user memory.

In the private data, keep BOs before the remaining objects. This is
necessary to restore things in the correct order as restoring events
depends on the events-page BO being restored first.

Fixes: be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
Reported-by: Jann Horn &lt;jannh@google.com&gt;
CC: Rajneesh Bhardwaj &lt;Rajneesh.Bhardwaj@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-and-tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix error handling in kfd_criu_restore_events</title>
<updated>2022-11-09T22:57:16+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2022-11-03T21:01:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=66f7903779fbbc620bf1040017e4833ef6a0b541'/>
<id>urn:sha1:66f7903779fbbc620bf1040017e4833ef6a0b541</id>
<content type='text'>
mutex_unlock before the exit label because all the error code paths that
jump there didn't take that lock. This fixes unbalanced locking errors
in case of restore errors.

Fixes: 40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: update GFX11 CWSR trap handler</title>
<updated>2022-11-02T21:16:25+00:00</updated>
<author>
<name>Jay Cornwall</name>
<email>jay.cornwall@amd.com</email>
</author>
<published>2022-10-14T02:41:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6640f8e5adb69a0550fe1d224d3ac64c10f00eef'/>
<id>urn:sha1:6640f8e5adb69a0550fe1d224d3ac64c10f00eef</id>
<content type='text'>
With corresponding FW change fixes issue where triggering CWSR on a
workgroup with waves in s_barrier wouldn't lead to a back-off and
therefore cause a hang.

Signed-off-by: Jay Cornwall &lt;jay.cornwall@amd.com&gt;
Tested-by: Graham Sider &lt;Graham.Sider@amd.com&gt;
Acked-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Graham Sider &lt;Graham.Sider@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.0.x
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()</title>
<updated>2022-11-02T20:58:36+00:00</updated>
<author>
<name>Yang Li</name>
<email>yang.lee@linux.alibaba.com</email>
</author>
<published>2022-10-26T02:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5b994354af3cab770bf13386469c5725713679af'/>
<id>urn:sha1:5b994354af3cab770bf13386469c5725713679af</id>
<content type='text'>
./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:985:58-62: ERROR: p is NULL but dereferenced.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2549
Reported-by: Abaci Robot &lt;abaci@linux.alibaba.com&gt;
Signed-off-by: Yang Li &lt;yang.lee@linux.alibaba.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
</feed>
