summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2022-06-14drm/amdgpu/cs: make commands with 0 chunks illegal behaviour.Dave Airlie1-1/+1
commit 31ab27b14daaa75541a415c6794d6f3567fea44a upstream. Submitting a cs with 0 chunks, causes an oops later, found trying to execute the wrong userspace driver. MESA_LOADER_DRIVER_OVERRIDE=v3d glxinfo [172536.665184] BUG: kernel NULL pointer dereference, address: 00000000000001d8 [172536.665188] #PF: supervisor read access in kernel mode [172536.665189] #PF: error_code(0x0000) - not-present page [172536.665191] PGD 6712a0067 P4D 6712a0067 PUD 5af9ff067 PMD 0 [172536.665195] Oops: 0000 [#1] SMP NOPTI [172536.665197] CPU: 7 PID: 2769838 Comm: glxinfo Tainted: P O 5.10.81 #1-NixOS [172536.665199] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CROSSHAIR V FORMULA-Z, BIOS 2201 03/23/2015 [172536.665272] RIP: 0010:amdgpu_cs_ioctl+0x96/0x1ce0 [amdgpu] [172536.665274] Code: 75 18 00 00 4c 8b b2 88 00 00 00 8b 46 08 48 89 54 24 68 49 89 f7 4c 89 5c 24 60 31 d2 4c 89 74 24 30 85 c0 0f 85 c0 01 00 00 <48> 83 ba d8 01 00 00 00 48 8b b4 24 90 00 00 00 74 16 48 8b 46 10 [172536.665276] RSP: 0018:ffffb47c0e81bbe0 EFLAGS: 00010246 [172536.665277] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [172536.665278] RDX: 0000000000000000 RSI: ffffb47c0e81be28 RDI: ffffb47c0e81bd68 [172536.665279] RBP: ffff936524080010 R08: 0000000000000000 R09: ffffb47c0e81be38 [172536.665281] R10: ffff936524080010 R11: ffff936524080000 R12: ffffb47c0e81bc40 [172536.665282] R13: ffffb47c0e81be28 R14: ffff9367bc410000 R15: ffffb47c0e81be28 [172536.665283] FS: 00007fe35e05d740(0000) GS:ffff936c1edc0000(0000) knlGS:0000000000000000 [172536.665284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [172536.665286] CR2: 00000000000001d8 CR3: 0000000532e46000 CR4: 00000000000406e0 [172536.665287] Call Trace: [172536.665322] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu] [172536.665332] drm_ioctl_kernel+0xaa/0xf0 [drm] [172536.665338] drm_ioctl+0x201/0x3b0 [drm] [172536.665369] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu] [172536.665372] ? selinux_file_ioctl+0x135/0x230 [172536.665399] amdgpu_drm_ioctl+0x49/0x80 [amdgpu] [172536.665403] __x64_sys_ioctl+0x83/0xb0 [172536.665406] do_syscall_64+0x33/0x40 [172536.665409] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2018 Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14drm/amdgpu/ucode: Remove firmware load type check in amdgpu_ucode_free_boAlice Wong1-2/+1
[ Upstream commit ab0cd4a9ae5b4679b714d8dbfedc0901fecdce9f ] When psp_hw_init failed, it will set the load_type to AMDGPU_FW_LOAD_DIRECT. During amdgpu_device_ip_fini, amdgpu_ucode_free_bo checks that load_type is AMDGPU_FW_LOAD_DIRECT and skips deallocating fw_buf causing memory leak. Remove load_type check in amdgpu_ucode_free_bo. Signed-off-by: Alice Wong <shiwei.wong@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14drm/amd/pm: fix the compile warningEvan Quan1-13/+1
[ Upstream commit 555238d92ac32dbad2d77ad2bafc48d17391990c ] Fix the compile warning below: drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:1641 kv_get_acp_boot_level() warn: always true condition '(table->entries[i]->clk >= 0) => (0-u32max >= 0)' Reported-by: kernel test robot <lkp@intel.com> CC: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14drm/amd/pm: fix double free in si_parse_power_table()Keita Suzuki1-5/+3
[ Upstream commit f3fa2becf2fc25b6ac7cf8d8b1a2e4a86b3b72bd ] In function si_parse_power_table(), array adev->pm.dpm.ps and its member is allocated. If the allocation of each member fails, the array itself is freed and returned with an error code. However, the array is later freed again in si_dpm_fini() function which is called when the function returns an error. This leads to potential double free of the array adev->pm.dpm.ps, as well as leak of its array members, since the members are not freed in the allocation function and the array is not nulled when freed. In addition adev->pm.dpm.num_ps, which keeps track of the allocated array member, is not updated until the member allocation is successfully finished, this could also lead to either use after free, or uninitialized variable access in si_dpm_fini(). Fix this by postponing the free of the array until si_dpm_fini() and increment adev->pm.dpm.num_ps everytime the array member is allocated. Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-12drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpuFelix Kuehling1-3/+7
commit b40a6ab2cf9213923bf8e821ce7fa7f6a0a26990 upstream. amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap to access the BO through the corresponding file descriptor. The VM can also be extracted from drm_priv, so drm_priv can replace the vm parameter in the kfd2kgd interface. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> [This is a partial cherry-pick of the upstream commit.] Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20drm/amdkfd: Fix Incorrect VMIDs passed to HWSTushar Patel1-1/+1
[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ] Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix this by passing correct number of VMIDs to HWS v2: squash in warning fix (Alex) Signed-off-by: Tushar Patel <tushar.patel@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20drm/amd: Add USBC connector IDAurabindo Pillai1-0/+1
[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ] [Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions()Nathan Chancellor1-1/+1
This patch is for linux-5.4.y only, it has no equivalent change upstream. When building x86_64 allmodconfig with tip of tree clang, there is an instance of -Wstrict-prototypes: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c:168:59: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions() ^ void 1 error generated. amdgpu_amdkfd_gfx_10_0_get_functions() is prototyped properly in drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h but its definition in amdgpu_amdkfd_gfx_v10.c does not have the argument types specified, which causes the warning. GCC does not warn because it permits an old-style definition if the prototype has the argument types. This code was eliminated by commit e392c887df97 ("drm/amdkfd: Use array to probe kfd2kgd_calls"), which was a part of a larger series that does not look very suitable for stable. Just fix this one location, as it was the only instance of this new warning across a variety of builds. Fixes: 6bdadb207224 ("drm/amdgpu: Add navi10 kfd support for amdgpu (v3)") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()Dan Carpenter1-1/+1
[ Upstream commit 1647b54ed55d4d48c7199d439f8834626576cbe9 ] This post-op should be a pre-op so that we do not pass -1 as the bit number to test_bit(). The current code will loop downwards from 63 to -1. After changing to a pre-op, it loops from 63 to 0. Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15drm/amdgpu: Fix recursive locking warningRajneesh Bhardwaj1-1/+2
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ] Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks. [ +0.000003] WARNING: possible recursive locking detected [ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.000004] -------------------------------------------- [ +0.000002] python/4822 is trying to acquire lock: [ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000203] but task is already holding lock: [ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000017] other info that might help us debug this: [ +0.000002] Possible unsafe locking scenario: [ +0.000003] CPU0 [ +0.000002] ---- [ +0.000002] lock(reservation_ww_class_mutex); [ +0.000004] lock(reservation_ww_class_mutex); [ +0.000003] *** DEADLOCK *** [ +0.000002] May be due to missing lock nesting notation [ +0.000003] 7 locks held by python/4822: [ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu] [ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu] [ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu] [ +0.000236] #3: ffffb2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu] [ +0.000235] #4: ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm] [ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu] [ +0.000195] stack backtrace: [ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030 [ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ +0.000003] Call Trace: [ +0.000003] dump_stack+0x6d/0x89 [ +0.000010] __lock_acquire+0xb93/0x1a90 [ +0.000009] lock_acquire+0x25d/0x2d0 [ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060 [ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] ? lock_release+0x13f/0x270 [ +0.000005] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm] [ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu] [ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu] [ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu] [ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu] [ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu] [ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu] [ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu] [ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu] [ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu] [ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu] [ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu] [ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu] [ +0.000216] ? lock_release+0x13f/0x270 [ +0.000006] ? __fget_files+0x107/0x1e0 [ +0.000007] __x64_sys_ioctl+0x8b/0xd0 [ +0.000007] do_syscall_64+0x36/0x70 [ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000007] RIP: 0033:0x7fbff90a7317 [ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 [ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX: 00007fbff90a7317 [ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI: 0000000000000004 [ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09: 00007fbcc402d880 [ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12: 00000000c0184b18 [ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15: 00007fbcc402d820 Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence objXin Xiong1-0/+1
[ Upstream commit dfced44f122c500004a48ecc8db516bb6a295a1b ] This issue takes place in an error path in amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into default case, the function simply returns -EINVAL, forgetting to decrement the reference count of a dma_fence obj, which is bumped earlier by amdgpu_cs_get_fence(). This may result in reference count leaks. Fix it by decreasing the refcount of specific object before returning the error code. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02drm/amdgpu: disable MMHUB PG for PicassoEvan Quan1-1/+4
commit f626dd0ff05043e5a7154770cc7cda66acee33a3 upstream. MMHUB PG needs to be disabled for Picasso for stability reasons. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23drm/amdgpu: fix logic inversion in checkChristian König1-1/+1
[ Upstream commit e8ae38720e1a685fd98cfa5ae118c9d07b45ca79 ] We probably never trigger this, but the logic inside the check is inverted. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27drm/amdgpu: fixup bad vram size on gmc v8Zongmin Zhou1-3/+10
[ Upstream commit 11544d77e3974924c5a9c8a8320b996a3e9b2f8b ] Some boards(like RX550) seem to have garbage in the upper 16 bits of the vram size register. Check for this and clamp the size properly. Fixes boards reporting bogus amounts of vram. after add this patch,the maximum GPU VRAM size is 64GB, otherwise only 64GB vram size will be used. Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()Zhou Qingyang1-0/+6
[ Upstream commit b220110e4cd442156f36e1d9b4914bb9e87b0d00 ] In amdgpu_connector_lcd_native_mode(), the return value of drm_mode_duplicate() is assigned to mode, and there is a dereference of it in amdgpu_connector_lcd_native_mode(), which will lead to a NULL pointer dereference on failure of drm_mode_duplicate(). Fix this bug add a check of mode. This bug was found by a static analyzer. The analysis employs differential checking to identify inconsistent security operations (e.g., checks or kfrees) between two code paths and confirms that the inconsistent operations are not recovered in the current function or the callers, so they constitute bugs. Note that, as a bug found by static analysis, it can be a false positive or hard to trigger. Multiple researchers have cross-reviewed the bug. Builds with CONFIG_DRM_AMDGPU=m show no new warnings, and our static analyzer no longer warns about this code. Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Signed-off-by: Zhou Qingyang <zhou1615@umn.edu> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORELe Ma1-2/+2
commit f3a8076eb28cae1553958c629aecec479394bbe2 upstream. should count on GC IP base address Signed-off-by: Le Ma <le.ma@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-26drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga ↵hongao1-0/+1
and dvi connectors commit bf552083916a7f8800477b5986940d1c9a31b953 upstream. amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode which assign amdgpu_encoder->native_mode with *preferred_mode result in amdgpu_encoder->native_mode.clock always be 0. That will cause amdgpu_connector_set_property returned early on: if ((rmx_type != DRM_MODE_SCALE_NONE) && (amdgpu_encoder->native_mode.clock == 0)) when we try to set scaling mode Full/Full aspect/Center. Add the missing function to amdgpu_connector_vga_get_mode can fix this. It also works on dvi connectors because amdgpu_connector_dvi_helper_funcs.get_mode use the same method. Signed-off-by: hongao <hongao@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bitsAlex Deucher1-2/+2
[ Upstream commit 403475be6d8b122c3e6b8a47e075926d7299e5ef ] The DMA mask on SI parts is 40 bits not 44. Copy paste typo. Fixes: 244511f386ccb9 ("drm/amdgpu: simplify and cleanup setting the dma mask") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1762 Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17drm/amdgpu: fix warning for overflow checkArnd Bergmann2-2/+2
[ Upstream commit 335aea75b0d95518951cad7c4c676e6f1c02c150 ] The overflow check in amdgpu_bo_list_create() causes a warning with clang-14 on 64-bit architectures, since the limit can never be exceeded. drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c:74:18: error: result of comparison of constant 256204778801521549 with expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (num_entries > (SIZE_MAX - sizeof(struct amdgpu_bo_list)) ~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The check remains useful for 32-bit architectures, so just avoid the warning by using size_t as the type for the count. Fixes: 920990cb080a ("drm/amdgpu: allocate the bo_list array after the list") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17drm/amdgpu: fix gart.bo pin_count leakLeslie Shi2-2/+4
[ Upstream commit 66805763a97f8f7bdf742fc0851d85c02ed9411f ] gmc_v{9,10}_0_gart_disable() isn't called matched with correspoding gart_enbale function in SRIOV case. This will lead to gart.bo pin_count leak on driver unload. Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10Ernst Sjöstrand1-1/+1
commit 67a44e659888569a133a8f858c8230e9d7aad1d5 upstream. Seems like newer cards can have even more instances now. Found by UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29 index 8 is out of range for type 'uint32_t *[8]' Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697 Cc: stable@vger.kernel.org Signed-off-by: Ernst Sjöstrand <ernstp@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22drm/amdgpu: Fix BUG_ON assertAndrey Grodzovsky1-1/+1
commit ea7acd7c5967542353430947f3faf699e70602e5 upstream. With added CPU domain to placement you can have now 3 placemnts at once. CC: stable@kernel.org Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210622162339.761651-5-andrey.grodzovsky@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable ↵Tuo Li1-1/+1
access in amdgpu_i2c_router_select_ddc_port() [ Upstream commit a211260c34cfadc6068fece8c9e99e0fe1e2a2b6 ] The variable val is declared without initialization, and its address is passed to amdgpu_i2c_get_byte(). In this function, the value of val is accessed in: DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n", addr, *val); Also, when amdgpu_i2c_get_byte() returns, val may remain uninitialized, but it is accessed in: val &= ~amdgpu_connector->router.ddc_mux_control_pin; To fix this possible uninitialized-variable access, initialize val to 0 in amdgpu_i2c_router_select_ddc_port(). Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22drm/amdgpu: Fix amdgpu_ras_eeprom_init()Luben Tuikov1-1/+1
[ Upstream commit dce4400e6516d18313d23de45b5be8a18980b00e ] No need to account for the 2 bytes of EEPROM address--this is now well abstracted away by the fixes the the lower layers. Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15drm/amdgpu/acp: Make PM domain really workKai-Heng Feng1-28/+26
[ Upstream commit aff890288de2d818e4f83ec40c9315e2d735df07 ] Devices created by mfd_add_hotplug_devices() don't really increase the index of its name, so get_mfd_cell_dev() cannot find any device, hence a NULL dev is passed to pm_genpd_add_device(): [ 56.974926] (NULL device *): amdgpu: device acp_audio_dma.0.auto added to pm domain [ 56.974933] (NULL device *): amdgpu: Failed to add dev to genpd [ 56.974941] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <acp_ip> failed -22 [ 56.975810] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed [ 56.975839] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init [ 56.977136] ------------[ cut here ]------------ [ 56.977143] kernel BUG at mm/slub.c:4206! [ 56.977158] invalid opcode: 0000 [#1] SMP NOPTI [ 56.977167] CPU: 1 PID: 1648 Comm: modprobe Not tainted 5.12.0-051200rc8-generic #202104182230 [ 56.977175] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A68M-HD+, BIOS P5.20 02/13/2019 [ 56.977180] RIP: 0010:kfree+0x3bf/0x410 [ 56.977195] Code: 89 e7 48 d3 e2 f7 da e8 5f 0d 02 00 80 e7 02 75 3e 44 89 ee 4c 89 e7 e8 ef 5f fd ff e9 fa fe ff ff 49 8b 44 24 08 a8 01 75 b7 <0f> 0b 4c 8b 4d b0 48 8b 4d a8 48 89 da 4c 89 e6 41 b8 01 00 00 00 [ 56.977202] RSP: 0018:ffffa48640ff79f0 EFLAGS: 00010246 [ 56.977210] RAX: 0000000000000000 RBX: ffff9286127d5608 RCX: 0000000000000000 [ 56.977215] RDX: 0000000000000000 RSI: ffffffffc099d0fb RDI: ffff9286127d5608 [ 56.977220] RBP: ffffa48640ff7a48 R08: 0000000000000001 R09: 0000000000000001 [ 56.977224] R10: 0000000000000000 R11: ffff9286087d8458 R12: fffff3ae0449f540 [ 56.977229] R13: 0000000000000000 R14: dead000000000122 R15: dead000000000100 [ 56.977234] FS: 00007f9de5929540(0000) GS:ffff928612e80000(0000) knlGS:0000000000000000 [ 56.977240] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 56.977245] CR2: 00007f697dd97160 CR3: 00000001110f0000 CR4: 00000000001506e0 [ 56.977251] Call Trace: [ 56.977261] amdgpu_dm_encoder_destroy+0x1b/0x30 [amdgpu] [ 56.978056] drm_mode_config_cleanup+0x4f/0x2e0 [drm] [ 56.978147] ? kfree+0x3dd/0x410 [ 56.978157] ? drm_managed_release+0xc8/0x100 [drm] [ 56.978232] drm_mode_config_init_release+0xe/0x10 [drm] [ 56.978311] drm_managed_release+0x9d/0x100 [drm] [ 56.978388] devm_drm_dev_init_release+0x4d/0x70 [drm] [ 56.978450] devm_action_release+0x15/0x20 [ 56.978459] release_nodes+0x77/0xc0 [ 56.978469] devres_release_all+0x3f/0x50 [ 56.978477] really_probe+0x245/0x460 [ 56.978485] driver_probe_device+0xe9/0x160 [ 56.978492] device_driver_attach+0xab/0xb0 [ 56.978499] __driver_attach+0x8f/0x150 [ 56.978506] ? device_driver_attach+0xb0/0xb0 [ 56.978513] bus_for_each_dev+0x7e/0xc0 [ 56.978521] driver_attach+0x1e/0x20 [ 56.978528] bus_add_driver+0x135/0x1f0 [ 56.978534] driver_register+0x91/0xf0 [ 56.978540] __pci_register_driver+0x54/0x60 [ 56.978549] amdgpu_init+0x77/0x1000 [amdgpu] [ 56.979246] ? 0xffffffffc0dbc000 [ 56.979254] do_one_initcall+0x48/0x1d0 [ 56.979265] ? kmem_cache_alloc_trace+0x120/0x230 [ 56.979274] ? do_init_module+0x28/0x280 [ 56.979282] do_init_module+0x62/0x280 [ 56.979288] load_module+0x71c/0x7a0 [ 56.979296] __do_sys_finit_module+0xc2/0x120 [ 56.979305] __x64_sys_finit_module+0x1a/0x20 [ 56.979311] do_syscall_64+0x38/0x90 [ 56.979319] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 56.979328] RIP: 0033:0x7f9de54f989d [ 56.979335] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48 [ 56.979342] RSP: 002b:00007ffe3c395a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 56.979350] RAX: ffffffffffffffda RBX: 0000560df3ef4330 RCX: 00007f9de54f989d [ 56.979355] RDX: 0000000000000000 RSI: 0000560df3a07358 RDI: 000000000000000f [ 56.979360] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 [ 56.979365] R10: 000000000000000f R11: 0000000000000246 R12: 0000560df3a07358 [ 56.979369] R13: 0000000000000000 R14: 0000560df3ef4460 R15: 0000560df3ef4330 [ 56.979377] Modules linked in: amdgpu(+) iommu_v2 gpu_sched drm_ttm_helper ttm drm_kms_helper cec rc_core i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt nft_counter xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables libcrc32c nfnetlink ip6_tables iptable_filter bpfilter input_leds binfmt_misc edac_mce_amd kvm_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic crct10dif_pclmul snd_hda_codec_hdmi ledtrig_audio ghash_clmulni_intel aesni_intel snd_hda_intel snd_intel_dspcfg snd_seq_midi crypto_simd snd_intel_sdw_acpi cryptd snd_hda_codec snd_seq_midi_event snd_rawmidi snd_hda_core snd_hwdep snd_seq fam15h_power k10temp snd_pcm snd_seq_device snd_timer snd mac_hid soundcore sch_fq_codel nct6775 hwmon_vid drm ip_tables x_tables autofs4 dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage r8169 crc32_pclmul realtek ahci xhci_pci i2c_piix4 [ 56.979521] xhci_pci_renesas libahci video [ 56.979541] ---[ end trace cb8f6a346f18da7b ]--- Instead of finding MFD hotplugged device by its name, simply iterate over the child devices to avoid the issue. Squash in unused variable removal (Alex) BugLink: https://bugs.launchpad.net/bugs/1920674 Fixes: 25030321ba28 ("drm/amd: add pm domain for ACP IP sub blocks") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19drm/amdkfd: use allowed domain for vmbo validationNirmoy Das1-17/+4
[ Upstream commit bc05716d4fdd065013633602c5960a2bf1511b9c ] Fixes handling when page tables are in system memory. v3: remove struct amdgpu_vm_parser. v2: remove unwanted variable. change amdgpu_amdkfd_validate instead of amdgpu_amdkfd_bo_validate. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19drm/amd/amdgpu/sriov disable all ip hw status by defaultJack Zhang1-1/+1
[ Upstream commit 95ea3dbc4e9548d35ab6fbf67675cef8c293e2f5 ] Disable all ip's hw status to false before any hw_init. Only set it to true until its hw_init is executed. The old 5.9 branch has this change but somehow the 5.11 kernrel does not have this fix. Without this change, sriov tdr have gfx IB test fail. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Review-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full ↵Yifan Zhang1-5/+1
doorbell." commit baacf52a473b24e10322b67757ddb92ab8d86717 upstream. This reverts commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3. Reason for revert: Side effect of enlarging CP_MEC_DOORBELL_RANGE may cause some APUs fail to enter gfxoff in certain user cases. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue."Yifan Zhang1-5/+1
commit ee5468b9f1d3bf48082eed351dace14598e8ca39 upstream. This reverts commit 4cbbe34807938e6e494e535a68d5ff64edac3f20. Reason for revert: side effect of enlarging CP_MEC_DOORBELL_RANGE may cause some APUs fail to enter gfxoff in certain user cases. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue.Yifan Zhang1-1/+5
commit 4cbbe34807938e6e494e535a68d5ff64edac3f20 upstream. If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell.Yifan Zhang1-1/+5
commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3 upstream. If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10drm/amdgpu: make sure we unpin the UVD BONirmoy Das1-0/+1
commit 07438603a07e52f1c6aa731842bd298d2725b7be upstream. Releasing pinned BOs is illegal now. UVD 6 was missing from: commit 2f40801dc553 ("drm/amdgpu: make sure we unpin the UVD BO") Fixes: 2f40801dc553 ("drm/amdgpu: make sure we unpin the UVD BO") Cc: stable@vger.kernel.org Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10drm/amdgpu: Don't query CE and UE errorsLuben Tuikov1-16/+0
commit dce3d8e1d070900e0feeb06787a319ff9379212c upstream. On QUERY2 IOCTL don't query counts of correctable and uncorrectable errors, since when RAS is enabled and supported on Vega20 server boards, this takes insurmountably long time, in O(n^3), which slows the system down to the point of it being unusable when we have GUI up. Fixes: ae363a212b14 ("drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2") Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03drm/amd/amdgpu: fix a potential deadlock in gpu resetLang Yu1-1/+0
[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ] When amdgpu_ib_ring_tests failed, the reset logic called amdgpu_device_ip_suspend twice, then deadlock occurred. Deadlock log: [ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110). [ 806.290952] [drm] free PSP TMR buffer [ 806.319406] ============================================ [ 806.320315] WARNING: possible recursive locking detected [ 806.321225] 5.11.0-custom #1 Tainted: G W OEL [ 806.322135] -------------------------------------------- [ 806.323043] cat/2593 is trying to acquire lock: [ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.325668] but task is already holding lock: [ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.328430] other info that might help us debug this: [ 806.329539] Possible unsafe locking scenario: [ 806.330549] CPU0 [ 806.330983] ---- [ 806.331416] lock(&adev->dm.dc_lock); [ 806.332086] lock(&adev->dm.dc_lock); [ 806.332738] *** DEADLOCK *** [ 806.333747] May be due to missing lock nesting notation [ 806.334899] 3 locks held by cat/2593: [ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110 [ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu] [ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.340869] stack backtrace: [ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1 [ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020 [ 806.344413] Call Trace: [ 806.344849] dump_stack+0x93/0xbd [ 806.345435] __lock_acquire.cold+0x18a/0x2cf [ 806.346179] lock_acquire+0xca/0x390 [ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.347813] __mutex_lock+0x9b/0x930 [ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu] [ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50 [ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.354064] mutex_lock_nested+0x1b/0x20 [ 806.354747] ? mutex_lock_nested+0x1b/0x20 [ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu] [ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu] [ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu] [ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu] [ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu] Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Christian KÃnig <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03drm/amdgpu: Fix a use-after-freexinhui pan1-0/+1
[ Upstream commit 1e5c37385097c35911b0f8a0c67ffd10ee1af9a2 ] looks like we forget to set ttm->sg to NULL. Hit panic below [ 1235.844104] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b7b4b: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI [ 1235.989074] Call Trace: [ 1235.991751] sg_free_table+0x17/0x20 [ 1235.995667] amdgpu_ttm_backend_unbind.cold+0x4d/0xf7 [amdgpu] [ 1236.002288] amdgpu_ttm_backend_destroy+0x29/0x130 [amdgpu] [ 1236.008464] ttm_tt_destroy+0x1e/0x30 [ttm] [ 1236.013066] ttm_bo_cleanup_memtype_use+0x51/0xa0 [ttm] [ 1236.018783] ttm_bo_release+0x262/0xa50 [ttm] [ 1236.023547] ttm_bo_put+0x82/0xd0 [ttm] [ 1236.027766] amdgpu_bo_unref+0x26/0x50 [amdgpu] [ 1236.032809] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7aa/0xd90 [amdgpu] [ 1236.040400] kfd_ioctl_alloc_memory_of_gpu+0xe2/0x330 [amdgpu] [ 1236.046912] kfd_ioctl+0x463/0x690 [amdgpu] Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03drm/amd/amdgpu: fix refcount leakJingwen Chen1-0/+3
[ Upstream commit fa7e6abc75f3d491bc561734312d065dc9dc2a77 ] [Why] the gem object rfb->base.obj[0] is get according to num_planes in amdgpufb_create, but is not put according to num_planes [How] put rfb->base.obj[0] in amdgpu_fbdev_destroy according to num_planes Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gateJames Zhu1-0/+2
commit 2fb536ea42d557f39f70c755f68e1aa1ad466c55 upstream. Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gateJames Zhu1-0/+2
commit 0c6013377b4027e69d8f3e63b6bf556b6cb87802 upstream. Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gateJames Zhu1-1/+5
commit b95f045ea35673572ef46d6483ad8bd6d353d63c upstream. Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26drm/amdgpu: update sdma golden setting for Navi12Guchun Chen1-0/+4
commit 77194d8642dd4cb7ea8ced77bfaea55610574c38 upstream. Current golden setting is out of date. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26drm/amdgpu: update gc golden setting for Navi12Guchun Chen1-2/+4
commit 99c45ba5799d6b938bd9bd20edfeb6f3e3e039b9 upstream. Current golden setting is out of date. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hangChangfeng2-5/+7
commit dbd1003d1252db5973dddf20b24bb0106ac52aa2 upstream. There is problem with 3DCGCG firmware and it will cause compute test hang on picasso/raven1. It needs to disable 3DCGCG in driver to avoid compute hang. Signed-off-by: Changfeng <Changfeng.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11drm/amdgpu: fix NULL pointer dereferenceGuchun Chen1-1/+1
[ Upstream commit 3c3dc654333f6389803cdcaf03912e94173ae510 ] ttm->sg needs to be checked before accessing its child member. Call Trace: amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu] ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm] ttm_bo_release+0x17d/0x300 [ttm] amdgpu_bo_unref+0x1a/0x30 [amdgpu] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x78b/0x8b0 [amdgpu] kfd_ioctl_alloc_memory_of_gpu+0x118/0x220 [amdgpu] kfd_ioctl+0x222/0x400 [amdgpu] ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu] __x64_sys_ioctl+0x8e/0xd0 ? __context_tracking_exit+0x52/0x90 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f97f264d317 Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffdb402c338 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f97f3cc63a0 RCX: 00007f97f264d317 RDX: 00007ffdb402c380 RSI: 00000000c0284b16 RDI: 0000000000000003 RBP: 00007ffdb402c380 R08: 00007ffdb402c428 R09: 00000000c4000004 R10: 00000000c4000004 R11: 0000000000000246 R12: 00000000c0284b16 R13: 0000000000000003 R14: 00007f97f3cc63a0 R15: 00007f8836200000 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11amdgpu: avoid incorrect %hu format stringArnd Bergmann1-1/+1
[ Upstream commit 7d98d416c2cc1c1f7d9508e887de4630e521d797 ] clang points out that the %hu format string does not match the type of the variables here: drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] version_major, version_minor); ^~~~~~~~~~~~~ include/drm/drm_print.h:498:19: note: expanded from macro 'DRM_ERROR' __drm_err(fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ Change it to a regular %u, the same way a previous patch did for another instance of the same warning. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tom Rix <trix@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4fshaoyunl1-1/+1
[ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e33336e ] This recent change introduce SDMA interrupt info printing with irq->process function. These functions do not require a set function to enable/disable the irq Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11drm/amdgpu: mask the xgmi number of hops reported from psp to kfdJonathan Kim1-1/+8
[ Upstream commit 4ac5617c4b7d0f0a8f879997f8ceaa14636d7554 ] The psp supplies the link type in the upper 2 bits of the psp xgmi node information num_hops field. With a new link type, Aldebaran has these bits set to a non-zero value (1 = xGMI3) so the KFD topology will report the incorrect IO link weights without proper masking. The actual number of hops is located in the 3 least significant bits of this field so mask if off accordingly before passing it to the KFD. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Amber Lin <amber.lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07drm/amdgpu: check alignment on CPU page for bo mapXℹ Ruoyao1-4/+4
commit e3512fb67093fabdf27af303066627b921ee9bd8 upstream. The page table of AMDGPU requires an alignment to CPU page so we should check ioctl parameters for it. Return -EINVAL if some parameter is unaligned to CPU page, instead of corrupt the page table sliently. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()Nirmoy Das1-1/+1
commit 5e61b84f9d3ddfba73091f9fbc940caae1c9eb22 upstream. Offset calculation wasn't correct as start addresses are in pfn not in bytes. CC: stable@vger.kernel.org Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30drm/amdgpu: fb BO should be ttm_bo_type_deviceNirmoy Das1-1/+1
[ Upstream commit 521f04f9e3ffc73ef96c776035f8a0a31b4cdd81 ] FB BO should not be ttm_bo_type_kernel type and amdgpufb_create_pinned_object() pins the FB BO anyway. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-09drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcieKevin Wang1-2/+2
commit 1aa46901ee51c1c5779b3b239ea0374a50c6d9ff upstream. the register offset isn't needed division by 4 to pass RREG32_PCIE() Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>