<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v6.1.85</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.85</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.85'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-04-03T13:19:49+00:00</updated>
<entry>
<title>drm/amdkfd: fix TLB flush after unmap for GFX9.4.2</title>
<updated>2024-04-03T13:19:49+00:00</updated>
<author>
<name>Eric Huang</name>
<email>jinhuieric.huang@amd.com</email>
</author>
<published>2024-03-20T19:53:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b691954c94dbbe63c0aa9241cc516985aa59df05'/>
<id>urn:sha1:b691954c94dbbe63c0aa9241cc516985aa59df05</id>
<content type='text'>
commit 1210e2f1033dc56b666c9f6dfb761a2d3f9f5d6c upstream.

TLB flush after unmap accidentially was removed on
gfx9.4.2. It is to add it back.

Signed-off-by: Eric Huang &lt;jinhuieric.huang@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()'</title>
<updated>2024-02-05T20:13:00+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2024-01-09T11:27:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8eea7e1d69e255f98a94e17b07cfb9a6d68128af'/>
<id>urn:sha1:8eea7e1d69e255f98a94e17b07cfb9a6d68128af</id>
<content type='text'>
[ Upstream commit d7a254fad873775ce6c32b77796c81e81e6b7f2e ]

Range interval [start, last] is ordered by rb_tree, rb_prev, rb_next
return value still needs NULL check, thus modified from "node" to "rb_node".

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2691 svm_range_get_range_boundaries() warn: can 'node' even be NULL?

Suggested-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix lock dependency warning with srcu</title>
<updated>2024-02-05T20:12:59+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2023-12-29T20:19:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b602f098f716723fa5c6c96a486e0afba83b7b94'/>
<id>urn:sha1:b602f098f716723fa5c6c96a486e0afba83b7b94</id>
<content type='text'>
[ Upstream commit 2a9de42e8d3c82c6990d226198602be44f43f340 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-yangp #2289 Not tainted
------------------------------------------------------
kworker/0:2/996 is trying to acquire lock:
        (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0

but task is already holding lock:
        ((work_completion)(&amp;svms-&gt;deferred_list_work)){+.+.}-{0:0}, at:
	process_one_work+0x211/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #3 ((work_completion)(&amp;svms-&gt;deferred_list_work)){+.+.}-{0:0}:
        __flush_work+0x88/0x4f0
        svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
        svm_range_set_attr+0xd6/0x14c0 [amdgpu]
        kfd_ioctl+0x1d1/0x630 [amdgpu]
        __x64_sys_ioctl+0x88/0xc0

-&gt; #2 (&amp;info-&gt;lock#2){+.+.}-{3:3}:
        __mutex_lock+0x99/0xc70
        amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
        restore_process_helper+0x22/0x80 [amdgpu]
        restore_process_worker+0x2d/0xa0 [amdgpu]
        process_one_work+0x29b/0x560
        worker_thread+0x3d/0x3d0

-&gt; #1 ((work_completion)(&amp;(&amp;process-&gt;restore_work)-&gt;work)){+.+.}-{0:0}:
        __flush_work+0x88/0x4f0
        __cancel_work_timer+0x12c/0x1c0
        kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
        __mmu_notifier_release+0xad/0x240
        exit_mmap+0x6a/0x3a0
        mmput+0x6a/0x120
        do_exit+0x322/0xb90
        do_group_exit+0x37/0xa0
        __x64_sys_exit_group+0x18/0x20
        do_syscall_64+0x38/0x80

-&gt; #0 (srcu){.+.+}-{0:0}:
        __lock_acquire+0x1521/0x2510
        lock_sync+0x5f/0x90
        __synchronize_srcu+0x4f/0x1a0
        __mmu_notifier_release+0x128/0x240
        exit_mmap+0x6a/0x3a0
        mmput+0x6a/0x120
        svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
        process_one_work+0x29b/0x560
        worker_thread+0x3d/0x3d0

other info that might help us debug this:
Chain exists of:
  srcu --&gt; &amp;info-&gt;lock#2 --&gt; (work_completion)(&amp;svms-&gt;deferred_list_work)

Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
        lock((work_completion)(&amp;svms-&gt;deferred_list_work));
                        lock(&amp;info-&gt;lock#2);
			lock((work_completion)(&amp;svms-&gt;deferred_list_work));
        sync(srcu);

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix lock dependency warning</title>
<updated>2024-02-05T20:12:59+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>felix.kuehling@amd.com</email>
</author>
<published>2024-01-02T20:07:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8b25d397162b0316ceda40afaa63ee0c4a97d28b'/>
<id>urn:sha1:8b25d397162b0316ceda40afaa63ee0c4a97d28b</id>
<content type='text'>
[ Upstream commit 47bf0f83fc86df1bf42b385a91aadb910137c5c9 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&amp;svm_bo-&gt;eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
ffff9435cd8e1720 (&amp;svms-&gt;lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #2 (&amp;svms-&gt;lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-&gt; #1 (&amp;mm-&gt;mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-&gt; #0 ((work_completion)(&amp;svm_bo-&gt;eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&amp;svm_bo-&gt;eviction_work) --&gt; &amp;mm-&gt;mmap_lock --&gt; &amp;svms-&gt;lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&amp;svms-&gt;lock);
                               lock(&amp;mm-&gt;mmap_lock);
                               lock(&amp;svms-&gt;lock);
  lock((work_completion)(&amp;svm_bo-&gt;eviction_work));

I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.

To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.

v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.

Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Philip Yang &lt;philip.yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix iterator used outside loop in 'kfd_add_peer_prop()'</title>
<updated>2024-02-05T20:12:57+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2023-12-29T09:37:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=05a0900bd7a50036c4286ce42ba7f4105a1a428a'/>
<id>urn:sha1:05a0900bd7a50036c4286ce42ba7f4105a1a428a</id>
<content type='text'>
[ Upstream commit b1a428b45dc7e47c7acc2ad0d08d8a6dda910c4c ]

Fix the following about iterator use:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1456 kfd_add_peer_prop() warn: iterator used outside loop: 'iolink3'

Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: fixes for HMM mem allocation</title>
<updated>2024-01-25T23:27:50+00:00</updated>
<author>
<name>Dafna Hirschfeld</name>
<email>dhirschfeld@habana.ai</email>
</author>
<published>2024-01-07T13:07:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=359fadf5f770632f03d1948db94a0c413f2d635e'/>
<id>urn:sha1:359fadf5f770632f03d1948db94a0c413f2d635e</id>
<content type='text'>
[ Upstream commit 02eed83abc1395a1207591aafad9bcfc5cb1abcb ]

Fix err return value and reset pgmap-&gt;type after checking it.

Fixes: c83dee9b6394 ("drm/amdkfd: add SPM support for SVM")
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Dafna Hirschfeld &lt;dhirschfeld@habana.ai&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Use resource_size() helper function</title>
<updated>2024-01-25T23:27:50+00:00</updated>
<author>
<name>Deepak R Varma</name>
<email>drv@mailo.com</email>
</author>
<published>2022-12-22T21:15:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=81b86a10b8b69cddc7a2605274e9e75f15c9f080'/>
<id>urn:sha1:81b86a10b8b69cddc7a2605274e9e75f15c9f080</id>
<content type='text'>
[ Upstream commit 9d086e0ddaeb08876f4df3a1485166bfd7483252 ]

Use the resource_size() function instead of a open coded computation
resource size. It makes the code more readable.

Issue identified using resource_size.cocci coccinelle semantic patch.

Signed-off-by: Deepak R Varma &lt;drv@mailo.com&gt;
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Stable-dep-of: 02eed83abc13 ("drm/amdkfd: fixes for HMM mem allocation")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Confirm list is non-empty before utilizing list_first_entry in kfd_topology.c</title>
<updated>2024-01-25T23:27:38+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2023-12-21T01:46:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4525525cb7161d08f95d0e47025323dd10214313'/>
<id>urn:sha1:4525525cb7161d08f95d0e47025323dd10214313</id>
<content type='text'>
[ Upstream commit 499839eca34ad62d43025ec0b46b80e77065f6d8 ]

Before using list_first_entry, make sure to check that list is not
empty, if list is empty return -ENODATA.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1347 kfd_create_indirect_link_prop() warn: can 'gpu_link' even be NULL?
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1428 kfd_add_peer_prop() warn: can 'iolink1' even be NULL?
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1433 kfd_add_peer_prop() warn: can 'iolink2' even be NULL?

Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Suggested-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Suggested-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix shift out-of-bounds issue</title>
<updated>2023-11-28T17:06:58+00:00</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2023-10-20T01:43:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2806f880379232e789957c2078d612669eb7a69c'/>
<id>urn:sha1:2806f880379232e789957c2078d612669eb7a69c</id>
<content type='text'>
[ Upstream commit 282c1d793076c2edac6c3db51b7e8ed2b41d60a5 ]

[  567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int'
[  567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G           OE      6.2.0-34-generic #34~22.04.1-Ubuntu
[  567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023
[  567.614504] Workqueue: events send_exception_work_handler [amdgpu]
[  567.614748] Call Trace:
[  567.614750]  &lt;TASK&gt;
[  567.614753]  dump_stack_lvl+0x48/0x70
[  567.614761]  dump_stack+0x10/0x20
[  567.614763]  __ubsan_handle_shift_out_of_bounds+0x156/0x310
[  567.614769]  ? srso_alias_return_thunk+0x5/0x7f
[  567.614773]  ? update_sd_lb_stats.constprop.0+0xf2/0x3c0
[  567.614780]  svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu]
[  567.615047]  ? srso_alias_return_thunk+0x5/0x7f
[  567.615052]  svm_migrate_to_ram+0x185/0x4d0 [amdgpu]
[  567.615286]  do_swap_page+0x7b6/0xa30
[  567.615291]  ? srso_alias_return_thunk+0x5/0x7f
[  567.615294]  ? __free_pages+0x119/0x130
[  567.615299]  handle_pte_fault+0x227/0x280
[  567.615303]  __handle_mm_fault+0x3c0/0x720
[  567.615311]  handle_mm_fault+0x119/0x330
[  567.615314]  ? lock_mm_and_find_vma+0x44/0x250
[  567.615318]  do_user_addr_fault+0x1a9/0x640
[  567.615323]  exc_page_fault+0x81/0x1b0
[  567.615328]  asm_exc_page_fault+0x27/0x30
[  567.615332] RIP: 0010:__get_user_8+0x1c/0x30

Signed-off-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Suggested-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Yifan Zhang &lt;yifan1.zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix a race condition of vram buffer unref in svm code</title>
<updated>2023-11-28T17:06:57+00:00</updated>
<author>
<name>Xiaogang Chen</name>
<email>xiaogang.chen@amd.com</email>
</author>
<published>2023-09-27T16:20:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=50f35a907c4f9ed431fd3dbb8b871ef1cbb0718e'/>
<id>urn:sha1:50f35a907c4f9ed431fd3dbb8b871ef1cbb0718e</id>
<content type='text'>
[ Upstream commit 709c348261618da7ed89d6c303e2ceb9e453ba74 ]

prange-&gt;svm_bo unref can happen in both mmu callback and a callback after
migrate to system ram. Both are async call in different tasks. Sync svm_bo
unref operation to avoid random "use-after-free".

Signed-off-by: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Tested-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
