kernel/linux.git/drivers/gpu/drm/amd/amdkfd, branch v6.0.14

drm/amdkfd: update GFX11 CWSR trap handler

2022-12-02T16:43:11+00:00

[ Upstream commit 6640f8e5adb69a0550fe1d224d3ac64c10f00eef ]

With the corresponding FW change, this fixes an issue where triggering CWSR on a workgroup with waves in s_barrier would not lead to a back-off and would therefore cause a hang.

Signed-off-by: Jay Cornwall
Tested-by: Graham Sider
Acked-by: Harish Kasiviswanathan
Acked-by: Felix Kuehling
Reviewed-by: Graham Sider
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Sasha Levin

drm/amdgpu: Enable SA software trap.

2022-12-02T16:43:11+00:00

[ Upstream commit 585a82618bc422508c0c8ae0dfe2f76f22c28361 ]

Enables support for software trap for MES >= 4. Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.
v3: Remove debugger code changes.

Signed-off-by: Jay Cornwall
Signed-off-by: David Belanger
Reviewed-by: Felix Kuehling
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Stable-dep-of: 6640f8e5adb6 ("drm/amdkfd: update GFX11 CWSR trap handler")
Signed-off-by: Sasha Levin

drm/amdkfd: Migrate in CPU page fault use current mm

2022-11-16T09:04:14+00:00

commit 3a876060892ba52dd67d197c78b955e62657d906 upstream.

migrate_vma_setup shows the warning below because we don't hold the other process's mm mmap_lock. We should use the current vmf->vma->vm_mm instead; the caller already holds the current mmap lock inside the CPU page fault handler.

WARNING: CPU: 10 PID: 3054 at include/linux/mmap_lock.h:155 find_vma
Call Trace:
 walk_page_range+0x76/0x150
 migrate_vma_setup+0x18a/0x640
 svm_migrate_vram_to_ram+0x245/0xa10 [amdgpu]
 svm_migrate_to_ram+0x36f/0x470 [amdgpu]
 do_swap_page+0xcfe/0xec0
 __handle_mm_fault+0x96b/0x15e0
 handle_mm_fault+0x13f/0x3e0
 do_user_addr_fault+0x1e7/0x690

Fixes: e1f84eef313f ("drm/amdkfd: handle CPU fault on COW mapping")
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Fix error handling in kfd_criu_restore_events

2022-11-16T09:04:08+00:00

commit 66f7903779fbbc620bf1040017e4833ef6a0b541 upstream.

Call mutex_unlock before the exit label, because none of the error code paths that jump there take that lock. This fixes unbalanced locking errors in case of restore errors.

Fixes: 40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
Signed-off-by: Felix Kuehling
Reviewed-by: Rajneesh Bhardwaj
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
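The locking pattern this commit message describes can be illustrated with a small user-space sketch. It is not the kernel code: struct fake_mutex, fake_lock, fake_unlock and restore_events_sketch are hypothetical names, and the fake mutex only records whether it is held so that an unbalanced unlock can be observed. The point shown is that the unlock sits before the exit label, so early error paths that never took the lock do not release it.

```c
#include <errno.h>

/* Hypothetical stand-in for a kernel mutex; it only tracks state. */
struct fake_mutex {
	int held;        /* 1 while locked */
	int unbalanced;  /* counts unlocks of a mutex that was not held */
};

void fake_lock(struct fake_mutex *m)
{
	m->held = 1;
}

void fake_unlock(struct fake_mutex *m)
{
	if (!m->held)
		m->unbalanced++;  /* this is the bug class being avoided */
	m->held = 0;
}

/*
 * Hypothetical restore routine. bad_args simulates a failure before the
 * lock is taken; fail_locked simulates a failure while it is held.
 */
int restore_events_sketch(struct fake_mutex *m, int bad_args, int fail_locked)
{
	int ret = 0;

	if (bad_args) {
		ret = -EINVAL;
		goto exit;  /* lock not taken yet, so exit must not unlock */
	}

	fake_lock(m);
	if (fail_locked)
		ret = -ENOMEM;
	fake_unlock(m);  /* unlock before the label, on success and failure */

exit:
	return ret;
}
```

With the unlock placed after the exit label instead, the bad_args path would release a lock it never acquired.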

drm/amdkfd: Fix error handling in criu_checkpoint

2022-11-16T09:04:08+00:00

commit b91c23e099f0b65d62159da13458c5eefa76083f upstream.

Checkpoint BOs last. That way we don't need to close dmabuf FDs if something else fails later. This avoids problematic access to user mode memory in the error handling code path. criu_checkpoint_bos has its own error handling and cleanup that does not depend on access to user memory.

In the private data, keep BOs before the remaining objects. This is necessary to restore things in the correct order as restoring events depends on the events-page BO being restored first.

Fixes: be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
Reported-by: Jann Horn
CC: Rajneesh Bhardwaj
Signed-off-by: Felix Kuehling
Reviewed-and-tested-by: Rajneesh Bhardwaj
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()

2022-11-16T09:03:49+00:00

[ Upstream commit 5b994354af3cab770bf13386469c5725713679af ]

./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:985:58-62: ERROR: p is NULL but dereferenced.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2549
Reported-by: Abaci Robot
Signed-off-by: Yang Li
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

drm/amdkfd: handle CPU fault on COW mapping

2022-11-16T09:03:49+00:00

[ Upstream commit e1f84eef313f4820cca068a238c645d0a38c6a9b ]

If a CPU page fault occurs in a page whose zone_device_data svm_bo belongs to another process, that means it is a COW mapping in the child process and the range was migrated to VRAM by the parent process. Migrate the parent process range back to system memory to recover from the CPU page fault.

Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()")
Signed-off-by: Sasha Levin

drm/amdkfd: correct the cache info for gfx1036

2022-11-03T15:00:21+00:00

commit 969758bbf5e9360b63bbb2328ac3fda46bbbc9f5 upstream.

Correct the cache information for gfx1036.

Acked-by: Alex Deucher
Reviewed-by: Yifan Zhang
Signed-off-by: Yifan Zhang
Signed-off-by: Jesse Zhang
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: update gfx1037 Lx cache setting

2022-11-03T15:00:21+00:00

commit 9656db1b933caf6ffaaef10322093fe018359090 upstream.

Update the gfx1037 L1/L2 cache setting.

Signed-off-by: Prike Liang
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

2022-10-21T10:39:17+00:00

[ Upstream commit b292cafe2dd02d96a07147e4b160927e8399d5cc ]

This was fixed in initialize_cpsch before, but not in initialize_nocpsch. Factor sdma bitmap initialization into a helper function to apply the correct implementation in both cases without duplicating it.

v2: Added a range check

Reported-by: Ellis Michael
Signed-off-by: Felix Kuehling
Reviewed-by: Graham Sider
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
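The kind of shift-out-of-bounds problem this commit message refers to can be sketched in plain C. This is only an illustration under assumptions, not the helper the commit adds: low_bits_mask and BITMAP_BITS are hypothetical names, and the handling of a count of zero is a choice made for this sketch. In C, shifting a 64-bit value by 64 or more is undefined behaviour, so a bitmap of the low n bits needs a range check before the shift.

```c
#include <stdint.h>

#define BITMAP_BITS 64u  /* width of the hypothetical queue bitmap */

/*
 * Hypothetical helper: return a mask with the low n bits set.
 * The range check keeps the shift count strictly below the type width,
 * which is what an unchecked (1ULL << n) - 1 fails to guarantee.
 */
uint64_t low_bits_mask(unsigned int n)
{
	if (n == 0)
		return 0;                 /* sketch-specific choice for "none" */
	if (n >= BITMAP_BITS)
		return UINT64_MAX;        /* all bits set, no shift performed */
	return (UINT64_C(1) << n) - 1;    /* safe: 1 <= n <= 63 */
}
```

Keeping the computation in one helper means every caller gets the same checked behaviour, which matches the motivation given above for factoring the initialization out instead of duplicating it.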