<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c, branch v6.11.8</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.11.8</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.11.8'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-10-04T14:37:49+00:00</updated>
<entry>
<title>drm/amdgpu: fix invalid fence handling in amdgpu_vm_tlb_flush</title>
<updated>2024-10-04T14:37:49+00:00</updated>
<author>
<name>Lang Yu</name>
<email>lang.yu@amd.com</email>
</author>
<published>2024-09-01T12:56:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=08d576b6b9af039d7ca52414b36288bc3872f9e8'/>
<id>urn:sha1:08d576b6b9af039d7ca52414b36288bc3872f9e8</id>
<content type='text'>
[ Upstream commit 4453808d9eab0461dea338e89372ffc4a3c50acc ]

A CPU-based update doesn't produce a fence; handle such cases properly.

Fixes: d8a3f0a0348d ("drm/amdgpu: implement TLB flush fence")
Signed-off-by: Lang Yu &lt;lang.yu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: reset vm state machine after gpu reset(vram lost)</title>
<updated>2024-07-24T21:30:49+00:00</updated>
<author>
<name>ZhenGuo Yin</name>
<email>zhenguo.yin@amd.com</email>
</author>
<published>2024-07-19T08:10:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5659b0c93a1ea02c662a030b322093203f299185'/>
<id>urn:sha1:5659b0c93a1ea02c662a030b322093203f299185</id>
<content type='text'>
[Why]
The page tables of a compute VM in VRAM are lost after a GPU reset.
The VRAM contents are not restored because compute VMs have no shadows.

[How]
Use the upper 32 bits of vm-&gt;generation to record the vram_lost_counter.
Reset the VM state machine when vm-&gt;generation is not equal to
the new generation token.
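
A minimal sketch of the [How] idea, assuming a plain 64-bit token whose
upper 32 bits hold the VRAM-lost counter and whose lower 32 bits hold a
per-VM generation (the meaning of the lower half is an assumption). The
helpers make_generation(), generation_vram_lost() and vm_needs_reset()
are hypothetical and are not the amdgpu functions.

```c
/*
 * Hypothetical: build a 64-bit generation token. Multiplying by 2^32
 * places the VRAM-lost counter in the upper 32 bits (same as a left
 * shift by 32); the per-VM generation occupies the lower 32 bits.
 */
unsigned long long make_generation(unsigned int vram_lost_counter,
				   unsigned int vm_generation)
{
	return (unsigned long long)vram_lost_counter * 0x100000000ULL +
	       vm_generation;
}

/* Hypothetical: recover the VRAM-lost counter from a token. */
unsigned int generation_vram_lost(unsigned long long gen)
{
	return (unsigned int)(gen >> 32);
}

/*
 * Hypothetical: nonzero when the token recorded in the VM differs from
 * the current one, i.e. the VM state machine must be reset.
 */
int vm_needs_reset(unsigned long long vm_gen, unsigned long long cur_gen)
{
	return vm_gen != cur_gen;
}
```

If a reset loses VRAM, the counter in the upper half changes, so a token
recorded before the reset no longer matches the current one.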

v2: Check vm-&gt;generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.

Signed-off-by: ZhenGuo Yin &lt;zhenguo.yin@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
(cherry picked from commit 47c0388b0589cb481c294dcb857d25a214c46eb3)
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix amdgpu_vm_is_bo_always_valid kerneldoc</title>
<updated>2024-05-20T20:20:26+00:00</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2024-05-20T08:18:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9c1a429217d294ade1a88eddde987a183d4d3ae4'/>
<id>urn:sha1:9c1a429217d294ade1a88eddde987a183d4d3ae4</id>
<content type='text'>
Align kerneldoc with the function argument name.

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Fixes: 26e20235ce00 ("drm/amdgpu: Add amdgpu_bo_is_vm_bo helper")
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add amdgpu_bo_is_vm_bo helper</title>
<updated>2024-05-17T21:40:38+00:00</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2024-05-06T16:59:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=26e20235ce00219a1ca2fb617d82fa24607190ae'/>
<id>urn:sha1:26e20235ce00219a1ca2fb617d82fa24607190ae</id>
<content type='text'>
Improve code readability by replacing repeated instances of:

bo-&gt;tbo.base.resv == vm-&gt;root.bo-&gt;tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.

v2:
 * Rename helper and move to amdgpu_vm. (Christian)

v3:
 * Use Christian's kerneldoc.

v4:
 * Fixed logic inversion in amdgpu_vm_bo_get_memory.

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: support gfx v12 specific pte/pde fields</title>
<updated>2024-04-30T14:01:01+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-08T14:56:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=980a0a9452e1a74cb1384378989d0c5237ad8cd2'/>
<id>urn:sha1:980a0a9452e1a74cb1384378989d0c5237ad8cd2</id>
<content type='text'>
Add gfx v12 pte/pde support to gmc common helper.

v2: squash in fixes (Alex)

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Likun Gao &lt;Likun.Gao@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: validate the parameters of bo mapping operations more clearly</title>
<updated>2024-04-17T02:39:15+00:00</updated>
<author>
<name>xinhui pan</name>
<email>xinhui.pan@amd.com</email>
</author>
<published>2024-04-11T03:11:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=98856136c485e586ab063f0b3780dfc0c78df780'/>
<id>urn:sha1:98856136c485e586ab063f0b3780dfc0c78df780</id>
<content type='text'>
Verify the parameters of
amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place.
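
A minimal sketch of what one shared check might look like, assuming the
rules are GPU-page alignment, a non-empty range, no 64-bit wrap-around,
the mapping fitting inside the BO, and an exclusive upper VA limit. The
function verify_mapping_params(), the 4 KiB page size and the -1 return
value are illustrative assumptions, not the actual amdgpu code.

```c
/* Assumed GPU page size for this sketch (4 KiB). */
#define SKETCH_GPU_PAGE_SIZE 4096ULL

/*
 * Hypothetical common check shared by the map, replace_map and
 * clearing_mappings paths. has_bo is 0 when no BO backs the range.
 * Returns 0 if the request is acceptable and -1 otherwise.
 */
int verify_mapping_params(unsigned long long saddr, unsigned long long offset,
			  unsigned long long size, int has_bo,
			  unsigned long long bo_size, unsigned long long va_limit)
{
	/* Start address, BO offset and size must all be page aligned. */
	if (saddr % SKETCH_GPU_PAGE_SIZE || offset % SKETCH_GPU_PAGE_SIZE ||
	    size % SKETCH_GPU_PAGE_SIZE)
		return -1;

	/* Reject empty ranges and ranges whose end wraps around 2^64. */
	if (size == 0 || saddr > saddr + size || offset > offset + size)
		return -1;

	/* The mapping must fit inside the BO at the given offset. */
	if (has_bo) {
		if (offset + size > bo_size)
			return -1;
	}

	/* The range must end at or below the (exclusive) VA limit. */
	if (saddr + size > va_limit)
		return -1;

	return 0;
}
```

Keeping every rule in one function means the three callers cannot drift
apart in what they accept.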

Fixes: dc54d3d1744d ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2")
Cc: stable@vger.kernel.org
Reported-by: Vlad Stolyarov &lt;hexed@google.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: xinhui pan &lt;xinhui.pan@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: sync page table freeing with tlb flush</title>
<updated>2024-03-22T19:47:17+00:00</updated>
<author>
<name>Shashank Sharma</name>
<email>shashank.sharma@amd.com</email>
</author>
<published>2024-03-18T10:54:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b6c4f90b3819148b066154ac7ae5388232aa1773'/>
<id>urn:sha1:b6c4f90b3819148b066154ac7ae5388232aa1773</id>
<content type='text'>
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep the
  objects that need to be freed after tlb_flush.
- Adds PT entries to this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes the functionality of amdgpu_vm_pt_free_dfs from (df_search + free)
  to simply freeing the BOs, and renames it to
  amdgpu_vm_pt_free_list to reflect this.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.
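
A minimal sketch of the deferred-free idea, assuming a fixed-size array
in place of the kernel's list and a freed flag in place of releasing a
real BO. struct pt_entry, struct pt_waitlist, pt_waitlist_add() and
pt_waitlist_free_all() are hypothetical names.

```c
#define SKETCH_MAX_PENDING 16

/* Hypothetical stand-in for a page-table BO. */
struct pt_entry {
	int id;
	int freed;
};

/* Hypothetical waitlist: entries queued during the PT update. */
struct pt_waitlist {
	struct pt_entry *entries[SKETCH_MAX_PENDING];
	unsigned int count;
};

/* Queue an entry instead of freeing it right away; -1 if full. */
int pt_waitlist_add(struct pt_waitlist *wl, struct pt_entry *pt)
{
	if (wl->count == SKETCH_MAX_PENDING)
		return -1;
	wl->entries[wl->count++] = pt;
	return 0;
}

/* Called once the TLB flush has completed: release everything queued. */
void pt_waitlist_free_all(struct pt_waitlist *wl)
{
	unsigned int i;

	for (i = 0; i != wl->count; i++)
		wl->entries[i]->freed = 1;
	wl->count = 0;
}
```

Freeing only after the flush means the GPU cannot walk a page table
whose memory has already been released.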

V2: rebase
V4: Addressed review comments from Christian
    - add only locked PTE entries to the TLB flush waitlist.
    - do not create a separate function for list flush.
    - do not create a new lock for TLB flush.
    - there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
    - change amdgpu_vm_pt_free_dfs's functionality to simply freeing
      the objects and rename it.
    - add all the PTE objects in params-&gt;tlb_flush_waitlist
    - let amdgpu_vm_pt_free_root handle the freeing of BOs independently
    - call amdgpu_vm_pt_free_list directly

V6: Rebase
V7: Rebase
V8: Added a NULL check to fix this backtrace issue:
[  415.351447] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  415.359245] #PF: supervisor write access in kernel mode
[  415.365081] #PF: error_code(0x0002) - not-present page
[  415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0
[  415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G           OE     5.18.2-mi300-build-140423-ubuntu-22.04+ #24
[  415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 02/21/2024
[  415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu]
[  415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff &lt;48&gt; 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63
[  415.430621] RSP: 0018:ffffc9000401f990 EFLAGS: 00010287
[  415.436456] RAX: ffff888147bb82f0 RBX: ffff888147bb82d8 RCX: 0000000000000000
[  415.444426] RDX: 0000000000000000 RSI: ffffc9000401fa30 RDI: ffff888161f80000
[  415.452397] RBP: ffffc9000401fa80 R08: 0000000000000000 R09: ffffc9000401fa00
[  415.460368] R10: 00000007f0cc0000 R11: 00000007f0c85000 R12: ffffc9000401fb20
[  415.468340] R13: 00000007f0d00000 R14: ffffc9000401faf0 R15: ffff888161f80000
[  415.476312] FS:  00007f132ff89840(0000) GS:ffff889f87c00000(0000) knlGS:0000000000000000
[  415.485350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  415.491767] CR2: 0000000000000008 CR3: 0000000161d46003 CR4: 0000000000770ef0
[  415.499738] PKRU: 55555554
[  415.502750] Call Trace:
[  415.505482]  &lt;TASK&gt;
[  415.507825]  amdgpu_vm_update_range+0x32a/0x880 [amdgpu]
[  415.513869]  amdgpu_vm_clear_freed+0x117/0x250 [amdgpu]
[  415.519814]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu]
[  415.527729]  kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu]
[  415.534551]  kfd_ioctl+0x3b6/0x510 [amdgpu]

V9: Addressed review comments from Christian
    - No NULL check required for root PT freeing
    - Free PT list regardless of needs_flush
    - Move adding BOs to the list into a separate function

V10: Added Christian's RB
V11: squash in list fix

Cc: Christian König &lt;Christian.Koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Cc: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Acked-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Reviewed-by: Christian König &lt;Christian.Koenig@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: implement TLB flush fence</title>
<updated>2024-03-20T17:38:14+00:00</updated>
<author>
<name>Christian Koenig</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2024-03-18T10:43:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d8a3f0a0348d02adf975fb0be71938dfb1c2e273'/>
<id>urn:sha1:d8a3f0a0348d02adf975fb0be71938dfb1c2e273</id>
<content type='text'>
The problem is that when (for example) 4k pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
    - rebase
    - set dma_fence_error only in case of error
    - add tlb_flush fence only when PT/PD BO is locked (Felix)
    - use vm-&gt;pasid when f is NULL (Mukul)

V4: - add a wait for (f-&gt;dependency) in tlb_fence_work (Christian)
    - move the misplaced fence_create call to the end (Philip)

V5: - free the f-&gt;dependency properly

V6: (Shashank)
    - minor code movement; moved all the cleanup into the previous patch
    - introduce params.needs_flush and its usage in this patch
    - rebase without TLB HW sequence patch

V7:
   - Keep the vm-&gt;last_update_fence and tlb_cb code until
     we can fix the HW sequencing (Christian)
   - Move all the tlb_fence-related code into a separate function so that
     it's easier to read and review

V9: Addressed review comments from Christian
    - start PT update only when we have callback memory allocated

V10:
    - handle device unlock in OOM case (Christian, Mukul)
    - added Christian's R-B

Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Reviewed-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Reviewed-by: Christian Koenig &lt;christian.koenig@amd.com&gt;
Signed-off-by: Christian Koenig &lt;christian.koenig@amd.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add recent pagefault info in vm_manager</title>
<updated>2024-03-20T17:37:36+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2024-03-07T16:01:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc406d92a097c669e6468ac4f694b4c927c47ab6'/>
<id>urn:sha1:dc406d92a097c669e6468ac4f694b4c927c47ab6</id>
<content type='text'>
Currently, page fault information is stored per
VM, and that VM could be freed or stale during a
reset. Add the page fault information to the
vm_manager, which is global across all VMs
and remains valid across resets.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: cleanup conditional execution</title>
<updated>2024-03-04T20:59:08+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2023-08-15T06:34:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c68cbbfd54c68485ac67b5a04c06feacdce15fec'/>
<id>urn:sha1:c68cbbfd54c68485ac67b5a04c06feacdce15fec</id>
<content type='text'>
First of all, calculating the number of dw to patch into a
conditional execution is not HW generation specific.
It is just a standard ring buffer calculation. While at it, also
reduce the BUG_ON() to a WARN_ON().

Then, instead of a random bit pattern, use 0 as the default value for
the number of dw skipped; this way it's no longer mandatory
to patch the conditional execution.

And last, make the address to check a parameter of the
conditional execution instead of getting it from the ring.
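
A minimal sketch of the ring buffer calculation, assuming the ring holds
ring_dw dwords, the write pointer wptr only ever increases, and the skip
count lives at dword index offset (with offset smaller than ring_dw).
The function cond_exec_skip_dw() is hypothetical; it follows the
description above rather than the actual amdgpu helper.

```c
/*
 * Hypothetical: number of dwords written after the conditional
 * execution placeholder at index offset, i.e. how many dwords the
 * packet skips when its check fails.
 */
unsigned int cond_exec_skip_dw(unsigned int wptr, unsigned int offset,
			       unsigned int ring_dw)
{
	/* Index of the last dword written, wrapped into the ring. */
	unsigned int cur = (wptr + ring_dw - 1) % ring_dw;

	/* The write pointer wrapped past the end after the placeholder. */
	if (offset > cur)
		cur += ring_dw;

	return cur - offset;
}
```

Nothing here depends on the HW generation, and with nothing written
after the placeholder the result is 0, matching the new default.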

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
