<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-04-20T08:15:23+00:00</updated>
<entry>
<title>drm/amdgpu: Unlocked unmap only clear page table leaves</title>
<updated>2025-04-20T08:15:23+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2025-01-14T14:53:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=357ba4ed6980afe236b961d5bcedc47c22e10f93'/>
<id>urn:sha1:357ba4ed6980afe236b961d5bcedc47c22e10f93</id>
<content type='text'>
[ Upstream commit 23b645231eeffdaf44021debac881d2f26824150 ]

SVM migration unmap pages from GPU and then update mapping to GPU to
recover page fault. Currently unmap clears the PDE entry for range
length &gt;= huge page and free PTB bo, update mapping to alloc new PT bo.
There is race bug that the freed entry bo maybe still on the pt_free
list, reused when updating mapping and then freed, leave invalid PDE
entry and cause GPU page fault.

By setting the update to clear only one PDE entry or clear PTB, to
avoid unmap to free PTE bo. This fixes the race bug and improve the
unmap and map to GPU performance. Update mapping to huge page will
still free the PTB bo.

With this change, the vm-&gt;pt_freed list and work is not needed. Add
WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the
PTB.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: nuke the VM PD/PT shadow handling</title>
<updated>2024-09-18T20:15:06+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2024-08-27T14:12:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7181faaa4703705939580abffaf9cb5d6b50dbb7'/>
<id>urn:sha1:7181faaa4703705939580abffaf9cb5d6b50dbb7</id>
<content type='text'>
This was only used as workaround for recovering the page tables after
VRAM was lost and is no longer necessary after the function
amdgpu_vm_bo_reset_state_machine() started to do the same.

Compute never used shadows either, so the only proplematic case left is
SVM and that is most likely not recoverable in any way when VRAM is
lost.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: re-work VM syncing</title>
<updated>2024-09-06T21:36:24+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2024-08-20T10:01:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4da5a95bf125fd682249f60e296455c6413b4e10'/>
<id>urn:sha1:4da5a95bf125fd682249f60e296455c6413b4e10</id>
<content type='text'>
Rework how VM operations synchronize to submissions. Provide an
amdgpu_sync container to the backends instead of an reservation
object and fill in the amdgpu_sync object in the higher layers
of the code.

No intended functional change, just prepares for upcomming changes.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Friedrich Vock &lt;friedrich.vock@gmx.de&gt;
Acked-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent</title>
<updated>2024-05-29T18:09:41+00:00</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2024-05-23T09:14:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=511a623fb46a6cf578c61d4f2755783c48807c77'/>
<id>urn:sha1:511a623fb46a6cf578c61d4f2755783c48807c77</id>
<content type='text'>
The pointer parent may be NULLed by the function amdgpu_vm_pt_parent.
To make the code more robust, check the pointer parent.

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: support gfx v12 specific pte/pde fields</title>
<updated>2024-04-30T14:01:01+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-08T14:56:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=980a0a9452e1a74cb1384378989d0c5237ad8cd2'/>
<id>urn:sha1:980a0a9452e1a74cb1384378989d0c5237ad8cd2</id>
<content type='text'>
Add gfx v12 pte/pde support to gmc common helper.

v2: squash in fixes (Alex)

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Likun Gao &lt;Likun.Gao@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add a NULL check for freeing root PT</title>
<updated>2024-03-22T19:51:55+00:00</updated>
<author>
<name>Shashank Sharma</name>
<email>shashank.sharma@amd.com</email>
</author>
<published>2024-03-21T14:21:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f88a7dd06ab435f8c07dc9b84a003f321c08cd72'/>
<id>urn:sha1:f88a7dd06ab435f8c07dc9b84a003f321c08cd72</id>
<content type='text'>
This patch adds a NULL check to fix this crash reported during the
freeing of root PT entry:

 BUG: unable to handle page fault for address: ffffc9002d637aa0
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 RIP: 0010:amdgpu_vm_pt_free+0x66/0xe0 [amdgpu]
 PKRU: 55555554
 Call Trace:
 &lt;TASK&gt;
  amdgpu_vm_pt_free_root+0x60/0xa0 [amdgpu]
  amdgpu_vm_fini+0x2cb/0x5d0 [amdgpu]
  ? amdgpu_ctx_mgr_entity_fini+0x53/0x1c0 [amdgpu]
  amdgpu_driver_postclose_kms+0x191/0x2d0 [amdgpu]
  drm_file_free.part.0+0x1e5/0x260 [drm]

Cc: Christian König &lt;Christian.Koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Cc: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Acked-by: Christian König &lt;Christian.Koenig@amd.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: sync page table freeing with tlb flush</title>
<updated>2024-03-22T19:47:17+00:00</updated>
<author>
<name>Shashank Sharma</name>
<email>shashank.sharma@amd.com</email>
</author>
<published>2024-03-18T10:54:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b6c4f90b3819148b066154ac7ae5388232aa1773'/>
<id>urn:sha1:b6c4f90b3819148b066154ac7ae5388232aa1773</id>
<content type='text'>
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep the
  objects that need to be freed after tlb_flush.
- Adds PT entries in this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes functionality of amdgpu_vm_pt_free_dfs from (df_search + free)
  to simply freeing of the BOs, also renames it to
  amdgpu_vm_pt_free_list to reflect this same.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.

V2: rebase
V4: Addressed review comments from Christian
    - add only locked PTEs entries in TLB flush waitlist.
    - do not create a separate function for list flush.
    - do not create a new lock for TLB flush.
    - there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
    - change the amdgpu_vm_pt_free_dfs's functionality to simple freeing
      of the objects and rename it.
    - add all the PTE objects in params-&gt;tlb_flush_waitlist
    - let amdgpu_vm_pt_free_root handle the freeing of BOs independently
    - call amdgpu_vm_pt_free_list directly

V6: Rebase
V7: Rebase
V8: Added a NULL check to fix this backtrace issue:
[  415.351447] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  415.359245] #PF: supervisor write access in kernel mode
[  415.365081] #PF: error_code(0x0002) - not-present page
[  415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0
[  415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G           OE     5.18.2-mi300-build-140423-ubuntu-22.04+ #24
[  415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 02/21/2024
[  415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu]
[  415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff &lt;48
&gt; 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63
[  415.430621] RSP: 0018:ffffc9000401f990 EFLAGS: 00010287
[  415.436456] RAX: ffff888147bb82f0 RBX: ffff888147bb82d8 RCX: 0000000000000000
[  415.444426] RDX: 0000000000000000 RSI: ffffc9000401fa30 RDI: ffff888161f80000
[  415.452397] RBP: ffffc9000401fa80 R08: 0000000000000000 R09: ffffc9000401fa00
[  415.460368] R10: 00000007f0cc0000 R11: 00000007f0c85000 R12: ffffc9000401fb20
[  415.468340] R13: 00000007f0d00000 R14: ffffc9000401faf0 R15: ffff888161f80000
[  415.476312] FS:  00007f132ff89840(0000) GS:ffff889f87c00000(0000) knlGS:0000000000000000
[  415.485350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  415.491767] CR2: 0000000000000008 CR3: 0000000161d46003 CR4: 0000000000770ef0
[  415.499738] PKRU: 55555554
[  415.502750] Call Trace:
[  415.505482]  &lt;TASK&gt;
[  415.507825]  amdgpu_vm_update_range+0x32a/0x880 [amdgpu]
[  415.513869]  amdgpu_vm_clear_freed+0x117/0x250 [amdgpu]
[  415.519814]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu]
[  415.527729]  kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu]
[  415.534551]  kfd_ioctl+0x3b6/0x510 [amdgpu]

V9: Addressed review comments from Christian
    - No NULL check reqd for root PT freeing
    - Free PT list regardless of needs_flush
    - Move adding BOs in list in a separate function

V10: Added Christian's RB
V11: squash in list fix

Cc: Christian König &lt;Christian.Koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Cc: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Acked-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Reviewed-by: Christian König &lt;Christian.Koenig@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: implement TLB flush fence</title>
<updated>2024-03-20T17:38:14+00:00</updated>
<author>
<name>Christian Koenig</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2024-03-18T10:43:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d8a3f0a0348d02adf975fb0be71938dfb1c2e273'/>
<id>urn:sha1:d8a3f0a0348d02adf975fb0be71938dfb1c2e273</id>
<content type='text'>
The problem is that when (for example) 4k pages are replaced
with a single 2M page we need to wait for change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
    - rebase
    - set dma_fence_error only in case of error
    - add tlb_flush fence only when PT/PD BO is locked (Felix)
    - use vm-&gt;pasid when f is NULL (Mukul)

V4: - add a wait for (f-&gt;dependency) in tlb_fence_work (Christian)
    - move the misplaced fence_create call to the end (Philip)

V5: - free the f-&gt;dependency properly

V6: (Shashank)
    - light code movement, moved all the clean-up in previous patch
    - introduce params.needs_flush and its usage in this patch
    - rebase without TLB HW sequence patch

V7:
   - Keep the vm-&gt;last_update_fence and tlb_cb code until
     we can fix the HW sequencing (Christian)
   - Move all the tlb_fence related code in a separate function so that
     its easier to read and review

V9: Addressed review comments from Christian
    - start PT update only when we have callback memory allocated

V10:
    - handle device unlock in OOM case (Christian, Mukul)
    - added Christian's R-B

Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Reviewed-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Reviewed-by: Christian Koenig &lt;christian.koenig@amd.com&gt;
Signed-off-by: Christian Koenig &lt;christian.koenig@amd.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: remove unused code</title>
<updated>2024-03-06T20:24:24+00:00</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2024-03-05T02:22:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bb8863cc9d067c44e751579881048dca0403133c'/>
<id>urn:sha1:bb8863cc9d067c44e751579881048dca0403133c</id>
<content type='text'>
Remove the unused function - amdgpu_vm_pt_is_root_clean
and remove the impossible condition

v1: entries == 0 is not possible any more,
    so this condition could probably be removed (Felix)

Signed-off-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Suggested-by：Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: change vm-&gt;task_info handling</title>
<updated>2024-03-04T20:59:08+00:00</updated>
<author>
<name>Shashank Sharma</name>
<email>shashank.sharma@amd.com</email>
</author>
<published>2024-01-18T19:15:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b8f67b9ddf4f8fe6dd536590712b5912ad78f99c'/>
<id>urn:sha1:b8f67b9ddf4f8fe6dd536590712b5912ad78f99c</id>
<content type='text'>
This patch changes the handling and lifecycle of vm-&gt;task_info object.
The major changes are:
- vm-&gt;task_info is a dynamically allocated ptr now, and its uasge is
  reference counted.
- introducing two new helper funcs for task_info lifecycle management
    - amdgpu_vm_get_task_info: reference counts up task_info before
      returning this info
    - amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm-&gt;task_info.

V2: Do not block all the prints when task_info not found (Felix)

V3: Fixed review comments from Felix
   - Fix wrong indentation
   - No debug message for -ENOMEM
   - Add NULL check for task_info
   - Do not duplicate the debug messages (ti vs no ti)
   - Get first reference of task_info in vm_init(), put last
     in vm_fini()

V4: Fixed review comments from Felix
   - fix double reference increment in create_task_info
   - change amdgpu_vm_get_task_info_pasid
   - additional changes in amdgpu_gem.c while porting

Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
