<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v6.1.5</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.5</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.5'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-01-12T11:02:52+00:00</updated>
<entry>
<title>drm/amdkfd: Fix kernel warning during topology setup</title>
<updated>2023-01-12T11:02:52+00:00</updated>
<author>
<name>Mukul Joshi</name>
<email>mukul.joshi@amd.com</email>
</author>
<published>2022-12-20T22:11:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=306888b1246bf44e703b6f1ccc746c2746c1a981'/>
<id>urn:sha1:306888b1246bf44e703b6f1ccc746c2746c1a981</id>
<content type='text'>
commit cf97eb7e47d4671084c7e114c5d88a3d0540ecbd upstream.

This patch fixes the following kernel warning seen during
driver load by correctly initializing the p2plink attr before
creating the sysfs file:

[  +0.002865] ------------[ cut here ]------------
[  +0.002327] kobject: '(null)' (0000000056260cfb): is not initialized, yet kobject_put() is being called.
[  +0.004780] WARNING: CPU: 32 PID: 1006 at lib/kobject.c:718 kobject_put+0xaa/0x1c0
[  +0.001361] Call Trace:
[  +0.001234]  &lt;TASK&gt;
[  +0.001067]  kfd_remove_sysfs_node_entry+0x24a/0x2d0 [amdgpu]
[  +0.003147]  kfd_topology_update_sysfs+0x3d/0x750 [amdgpu]
[  +0.002890]  kfd_topology_add_device+0xbd7/0xc70 [amdgpu]
[  +0.002844]  ? lock_release+0x13c/0x2e0
[  +0.001936]  ? smu_cmn_send_smc_msg_with_param+0x1e8/0x2d0 [amdgpu]
[  +0.003313]  ? amdgpu_dpm_get_mclk+0x54/0x60 [amdgpu]
[  +0.002703]  kgd2kfd_device_init.cold+0x39f/0x4ed [amdgpu]
[  +0.002930]  amdgpu_amdkfd_device_init+0x13d/0x1f0 [amdgpu]
[  +0.002944]  amdgpu_device_init.cold+0x1464/0x17b4 [amdgpu]
[  +0.002970]  ? pci_bus_read_config_word+0x43/0x80
[  +0.002380]  amdgpu_driver_load_kms+0x15/0x100 [amdgpu]
[  +0.002744]  amdgpu_pci_probe+0x147/0x370 [amdgpu]
[  +0.002522]  local_pci_probe+0x40/0x80
[  +0.001896]  work_for_cpu_fn+0x10/0x20
[  +0.001892]  process_one_work+0x26e/0x5a0
[  +0.002029]  worker_thread+0x1fd/0x3e0
[  +0.001890]  ? process_one_work+0x5a0/0x5a0
[  +0.002115]  kthread+0xea/0x110
[  +0.001618]  ? kthread_complete_and_exit+0x20/0x20
[  +0.002422]  ret_from_fork+0x1f/0x30
[  +0.001808]  &lt;/TASK&gt;
[  +0.001103] irq event stamp: 59837
[  +0.001718] hardirqs last  enabled at (59849): [&lt;ffffffffb30fab12&gt;] __up_console_sem+0x52/0x60
[  +0.004414] hardirqs last disabled at (59860): [&lt;ffffffffb30faaf7&gt;] __up_console_sem+0x37/0x60
[  +0.004414] softirqs last  enabled at (59654): [&lt;ffffffffb307d9c7&gt;] irq_exit_rcu+0xd7/0x130
[  +0.004205] softirqs last disabled at (59649): [&lt;ffffffffb307d9c7&gt;] irq_exit_rcu+0xd7/0x130
[  +0.004203] ---[ end trace 0000000000000000 ]---

Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix double release compute pasid</title>
<updated>2023-01-12T11:02:38+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2022-12-13T05:50:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a02c07b619899179384fde06f951530438a3512d'/>
<id>urn:sha1:a02c07b619899179384fde06f951530438a3512d</id>
<content type='text'>
[ Upstream commit 1a799c4c190ea9f0e81028e3eb3037ed0ab17ff5 ]

If kfd_process_device_init_vm returns failure after vm is converted to
compute vm and vm-&gt;pasid set to compute pasid, KFD will not take
pdd-&gt;drm_file reference. As a result, drm close file handler maybe
called to release the compute pasid before KFD process destroy worker to
release the same pasid and set vm-&gt;pasid to zero, this generates below
WARNING backtrace and NULL pointer access.

Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step
of kfd_process_device_init_vm, to ensure vm pasid is the original pasid
if acquiring vm failed or is the compute pasid with pdd-&gt;drm_file
reference taken to avoid double release same pasid.

 amdgpu: Failed to create process VM object
 ida_free called for id=32770 which is not allocated.
 WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
 RIP: 0010:ida_free+0x96/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 RIP: 0010:ida_free+0x76/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix kfd_process_device_init_vm error handling</title>
<updated>2023-01-12T11:02:37+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2022-12-14T15:15:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d74d1f52e16d8e07f7fbe52e96d6391418a2fe9'/>
<id>urn:sha1:9d74d1f52e16d8e07f7fbe52e96d6391418a2fe9</id>
<content type='text'>
[ Upstream commit 29d48b87db64b6697ddad007548e51d032081c59 ]

Should only destroy the ib_mem and let process cleanup worker to free
the outstanding BOs. Reset the pointer in pdd-&gt;qpd structure, to avoid
NULL pointer access in process destroy worker.

 BUG: kernel NULL pointer dereference, address: 0000000000000010
 Call Trace:
  amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel+0x46/0xb0 [amdgpu]
  kfd_process_device_destroy_cwsr_dgpu+0x40/0x70 [amdgpu]
  kfd_process_destroy_pdds+0x71/0x190 [amdgpu]
  kfd_process_wq_release+0x2a2/0x3b0 [amdgpu]
  process_one_work+0x2a1/0x600
  worker_thread+0x39/0x3d0

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix error handling in criu_checkpoint</title>
<updated>2022-11-09T22:57:48+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2022-11-01T19:02:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b91c23e099f0b65d62159da13458c5eefa76083f'/>
<id>urn:sha1:b91c23e099f0b65d62159da13458c5eefa76083f</id>
<content type='text'>
Checkpoint BOs last. That way we don't need to close dmabuf FDs if
something else fails later. This avoids problematic access to user mode
memory in the error handling code path.

criu_checkpoint_bos has its own error handling and cleanup that does not
depend on access to user memory.

In the private data, keep BOs before the remaining objects. This is
necessary to restore things in the correct order as restoring events
depends on the events-page BO being restored first.

Fixes: be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
Reported-by: Jann Horn &lt;jannh@google.com&gt;
CC: Rajneesh Bhardwaj &lt;Rajneesh.Bhardwaj@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-and-tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix error handling in kfd_criu_restore_events</title>
<updated>2022-11-09T22:57:16+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2022-11-03T21:01:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=66f7903779fbbc620bf1040017e4833ef6a0b541'/>
<id>urn:sha1:66f7903779fbbc620bf1040017e4833ef6a0b541</id>
<content type='text'>
mutex_unlock before the exit label because all the error code paths that
jump there didn't take that lock. This fixes unbalanced locking errors
in case of restore errors.

Fixes: 40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: update GFX11 CWSR trap handler</title>
<updated>2022-11-02T21:16:25+00:00</updated>
<author>
<name>Jay Cornwall</name>
<email>jay.cornwall@amd.com</email>
</author>
<published>2022-10-14T02:41:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6640f8e5adb69a0550fe1d224d3ac64c10f00eef'/>
<id>urn:sha1:6640f8e5adb69a0550fe1d224d3ac64c10f00eef</id>
<content type='text'>
With corresponding FW change fixes issue where triggering CWSR on a
workgroup with waves in s_barrier wouldn't lead to a back-off and
therefore cause a hang.

Signed-off-by: Jay Cornwall &lt;jay.cornwall@amd.com&gt;
Tested-by: Graham Sider &lt;Graham.Sider@amd.com&gt;
Acked-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Graham Sider &lt;Graham.Sider@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.0.x
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()</title>
<updated>2022-11-02T20:58:36+00:00</updated>
<author>
<name>Yang Li</name>
<email>yang.lee@linux.alibaba.com</email>
</author>
<published>2022-10-26T02:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5b994354af3cab770bf13386469c5725713679af'/>
<id>urn:sha1:5b994354af3cab770bf13386469c5725713679af</id>
<content type='text'>
./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:985:58-62: ERROR: p is NULL but dereferenced.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2549
Reported-by: Abaci Robot &lt;abaci@linux.alibaba.com&gt;
Signed-off-by: Yang Li &lt;yang.lee@linux.alibaba.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: correct the cache info for gfx1036</title>
<updated>2022-10-24T18:44:03+00:00</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2022-10-11T05:23:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=969758bbf5e9360b63bbb2328ac3fda46bbbc9f5'/>
<id>urn:sha1:969758bbf5e9360b63bbb2328ac3fda46bbbc9f5</id>
<content type='text'>
correct the cache information for gfx1036

Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Yifan Zhang &lt;yifan1.zhang@amd.com&gt;
Signed-off-by: Yifan Zhang &lt;yifan1.zhang@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: update gfx1037 Lx cache setting</title>
<updated>2022-10-24T18:44:03+00:00</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2022-10-20T06:44:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9656db1b933caf6ffaaef10322093fe018359090'/>
<id>urn:sha1:9656db1b933caf6ffaaef10322093fe018359090</id>
<content type='text'>
Update the gfx1037 L1/L2 cache setting.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2022-10-14T19:28:43+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-14T19:28:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e714bf1713b4b096d20ec75c13880b7086964bd'/>
<id>urn:sha1:5e714bf1713b4b096d20ec75c13880b7086964bd</id>
<content type='text'>
Pull more MM updates from Andrew Morton:

 - fix a race which causes page refcounting errors in ZONE_DEVICE pages
   (Alistair Popple)

 - fix userfaultfd test harness instability (Peter Xu)

 - various other patches in MM, mainly fixes

* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
  highmem: fix kmap_to_page() for kmap_local_page() addresses
  mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
  mm/selftest: uffd: explain the write missing fault check
  mm/hugetlb: use hugetlb_pte_stable in migration race check
  mm/hugetlb: fix race condition of uffd missing/minor handling
  zram: always expose rw_page
  LoongArch: update local TLB if PTE entry exists
  mm: use update_mmu_tlb() on the second thread
  kasan: fix array-bounds warnings in tests
  hmm-tests: add test for migrate_device_range()
  nouveau/dmem: evict device private memory during release
  nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
  mm/migrate_device.c: add migrate_device_range()
  mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
  mm/memremap.c: take a pgmap reference on page allocation
  mm: free device private pages have zero refcount
  mm/memory.c: fix race when faulting a device private page
  mm/damon: use damon_sz_region() in appropriate place
  mm/damon: move sz_damon_region to damon_sz_region
  lib/test_meminit: add checks for the allocation functions
  ...
</content>
</entry>
</feed>
