Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 0be7ed8e7eb15282b5d0f6fdfea884db594ea9bf ]
Fix potential NULL dereference, in the case when "man", the resource manager
might be NULL, when/if we print debug information.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Fixes: 7554886daa31ea ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e1d900df63adcb748905131dd6258e570e11aed1 ]
Enable VCN Dynamic Power Gating control for GC IP v11.0.4.
Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2a0fe2ca6e9c9bf9c47a9f9f0d67c13281a13f8c ]
This enable VCN PG, CG and JPEG PG, CG
Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: e1d900df63ad ("drm/amdgpu: enable VCN DPG for GC IP v11.0.4")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 311d52367d0a7985ee1132662bad46f09169eed2 ]
Add common soc21 ip block support for GC 11.0.4.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: e1d900df63ad ("drm/amdgpu: enable VCN DPG for GC IP v11.0.4")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 60cfad329ab877cb62975ea78ed442c2496990ba ]
enable mode1 reset and prioritize debug port on smu_v13_0_10
as a more reliable message processing
v2 - move mode1 reset callback to smu_v13_0_0_ppt.c
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 1794f6a9535b ("drm/amd/pm: enable GPO dynamic control support for SMU13.0.0")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 99f1a36c90a7524972be5a028424c57fa17753ee upstream.
Fixed bug on error when unloading amdgpu.
The error message is as follows:
[ 377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
[ 377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G IOE 6.0.0-thomas #1
[ 377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[ 377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
[ 377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
[ 377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287
[ 377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000
[ 377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70
[ 377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001
[ 377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70
[ 377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70
[ 377.706325] FS: 00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000
[ 377.706334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0
[ 377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 377.706361] Call Trace:
[ 377.706365] <TASK>
[ 377.706369] drm_buddy_free_list+0x2a/0x60 [drm_buddy]
[ 377.706376] amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
[ 377.706572] amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu]
[ 377.706650] amdgpu_bo_fini+0x22/0x90 [amdgpu]
[ 377.706727] gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
[ 377.706821] amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu]
[ 377.706897] amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
[ 377.706975] drm_dev_release+0x20/0x40 [drm]
[ 377.707006] release_nodes+0x35/0xb0
[ 377.707014] devres_release_all+0x8b/0xc0
[ 377.707020] device_unbind_cleanup+0xe/0x70
[ 377.707027] device_release_driver_internal+0xee/0x160
[ 377.707033] driver_detach+0x44/0x90
[ 377.707039] bus_remove_driver+0x55/0xe0
[ 377.707045] pci_unregister_driver+0x3b/0x90
[ 377.707052] amdgpu_exit+0x11/0x6c [amdgpu]
[ 377.707194] __x64_sys_delete_module+0x142/0x2b0
[ 377.707201] ? fpregs_assert_state_consistent+0x22/0x50
[ 377.707208] ? exit_to_user_mode_prepare+0x3e/0x190
[ 377.707215] do_syscall_64+0x38/0x90
[ 377.707221] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1923bc5a56daeeabd7e9093bad2febcd6af2416a upstream.
Removing the firmware framebuffer from the driver means that even
if the driver doesn't support the IP blocks in a GPU it will no
longer be functional after the driver fails to initialize.
This change will ensure that unsupported IP blocks at least cause
the driver to work with the EFI framebuffer.
Cc: stable@vger.kernel.org
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6fe6ece398f7431784847e922a2c8c385dc58a35 upstream.
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad.
The bug referenced below was bisected to this commit. There has been no
activity toward fixing it in 3 months, so let's revert for now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1a799c4c190ea9f0e81028e3eb3037ed0ab17ff5 ]
If kfd_process_device_init_vm returns failure after vm is converted to
compute vm and vm->pasid set to compute pasid, KFD will not take
pdd->drm_file reference. As a result, drm close file handler maybe
called to release the compute pasid before KFD process destroy worker to
release the same pasid and set vm->pasid to zero, this generates below
WARNING backtrace and NULL pointer access.
Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step
of kfd_process_device_init_vm, to ensure vm pasid is the original pasid
if acquiring vm failed or is the compute pasid with pdd->drm_file
reference taken to avoid double release same pasid.
amdgpu: Failed to create process VM object
ida_free called for id=32770 which is not allocated.
WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
RIP: 0010:ida_free+0x96/0x140
Call Trace:
amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
drm_file_free.part.13+0x216/0x270 [drm]
drm_close_helper.isra.14+0x60/0x70 [drm]
drm_release+0x6e/0xf0 [drm]
__fput+0xcc/0x280
____fput+0xe/0x20
task_work_run+0x96/0xc0
do_exit+0x3d0/0xc10
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:ida_free+0x76/0x140
Call Trace:
amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
drm_file_free.part.13+0x216/0x270 [drm]
drm_close_helper.isra.14+0x60/0x70 [drm]
drm_release+0x6e/0xf0 [drm]
__fput+0xcc/0x280
____fput+0xe/0x20
task_work_run+0x96/0xc0
do_exit+0x3d0/0xc10
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7554886daa31eacc8e7fac9e15bbce67d10b8f1f ]
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
requested memory exists, else we get a kernel oops when dereferencing "man".
v2: Make the patch standalone, i.e. not dependent on local patches.
v3: Preserve old behaviour and just check that the manager pointer is not
NULL.
v4: Complain if GTT domain requested and it is uninitialized--most likely a
bug.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 81d0bcf9900932633d270d5bc4a54ff599c6ebdb upstream.
Only apply the static threshold for Stoney and Carrizo.
This hardware has certain requirements that don't allow
mixing of GTT and VRAM. Newer asics do not have these
requirements so we should be able to be more flexible
with where buffers end up.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2291
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2255
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1d4624cd72b912b2680c08d0be48338a1629a858 upstream.
Some special polaris 10 chips overlap with the polaris11
DID range. Handle this properly in the driver.
v2: use local flags for other function calls.
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 347fafe0eb46df941965c355c77ce480e4d49f1f upstream.
fix MMHUB register base coding error.
Fixes: ec6837591f992 ("drm/amdgpu/gmc10: program the smallK fragment size")
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8660495a9c5b9afeec4cc006b3b75178f0fb2f10 upstream.
MES is part of gfxoff and MES suspend and resume are skipped for S0i3.
But the mes_self_test call path is still in the amdgpu_device_ip_late_init.
it's should also be skipped for s0ix as no hardware re-initialization
happened.
Besides, mes_self_test will free the BO that triggers a lot of warning
messages while in the suspend state.
[ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[ 81.679435] Call Trace:
[ 81.679726] <TASK>
[ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu]
[ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[ 81.684818] pci_pm_resume+0x5c/0xa0
[ 81.685247] ? pci_pm_thaw+0x90/0x90
[ 81.685658] dpm_run_callback+0x4e/0x160
[ 81.686110] device_resume+0xad/0x210
[ 81.686529] async_resume+0x1e/0x40
[ 81.686931] async_run_entry_fn+0x33/0x120
[ 81.687405] process_one_work+0x21d/0x3f0
[ 81.687869] worker_thread+0x4a/0x3c0
[ 81.688293] ? process_one_work+0x3f0/0x3f0
[ 81.688777] kthread+0xff/0x130
[ 81.689157] ? kthread_complete_and_exit+0x20/0x20
[ 81.689707] ret_from_fork+0x22/0x30
[ 81.690118] </TASK>
[ 81.690380] ---[ end trace 0000000000000000 ]---
v2: make the comment clean and use adev->in_s0ix instead of
adev->suspend
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit afa6646b1c5d3affd541f76bd7476e4b835a9174 upstream.
It's also part of gfxoff.
Cc: stable@vger.kernel.org # 6.0, 6.1
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit dfd0287bd3920e132a8dae2a0ec3d92eaff5f2dd ]
In amdgpu_get_xgmi_hive(), we should not call kfree() after
kobject_put() as the PUT will call kfree().
In amdgpu_device_ip_init(), we need to check the returned *hive*
which can be NULL before we dereference it.
Signed-off-by: Liang He <windhl@126.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f0d0f1087333714ee683cc134a95afe331d7ddd9 ]
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:412:15: error: incompatible function pointer types initializing 'void (*)(struct amdgpu_device *, u32, u32, u32, u32)' (aka 'void (*)(struct amdgpu_device *, unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device *, enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device *, enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict]
.trans_msg = xgpu_ai_mailbox_trans_msg,
^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:435:15: error: incompatible function pointer types initializing 'void (*)(struct amdgpu_device *, u32, u32, u32, u32)' (aka 'void (*)(struct amdgpu_device *, unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device *, enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device *, enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict]
.trans_msg = xgpu_nv_mailbox_trans_msg,
^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
The type of the second parameter in the prototype should be 'enum
idh_request' instead of 'u32'. Update it to clear up the warnings.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 75818afff631e1ea785a82c3e8bb82eb0dee539c ]
This patch fixes potential memory leakage and seg fault
in _gpuvm_import_dmabuf() function
Fixes: d4ec4bdc0bd5 ("drm/amdkfd: Allow access for mmapping KFD BOs")
Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ca54639c7752edf1304d92ff4d0c049d4efc9ba0 ]
As comment of pci_get_class() says, it returns a pci_device with its
refcount increased and decreased the refcount for the input parameter
@from if it is not NULL.
If we break the loop in amdgpu_atrm_get_bios() with 'pdev' not NULL, we
need to call pci_dev_put() to decrease the refcount. Add the missing
pci_dev_put() to avoid refcount leak.
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 65009bf2b4d287ef7ad7e6eb082b7c3d35eb611f ]
Corrected the typo in the 4K resolution parameters.
Fixes: b3a24461f9fb15 ("amdgpu/nv.c - Added codec query for Beige Goby")
Fixes: 9075096b09e590 ("amdgpu/nv.c - Optimize code for video codec support structure")
Fixes: 9ac0edaa0f8323 ("drm/amdgpu: add vcn_4_0_0 video codec query")
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b85e285e3d6352b02947fc1b72303673dfacb0aa ]
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().
So before returning from amdgpu_device_resume|suspend_display_audio(),
pci_dev_put() is called to avoid refcount leak.
Fixes: 3f12acc8d6d4 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
In the SDMA s0ix save process requires to turn off SDMA ring buffer for
avoiding the SDMA in-flight request, otherwise will suffer from SDMA page
fault which causes by page request from in-flight SDMA ring accessing at
SDMA restore phase.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2248
Cc: stable@vger.kernel.org # 6.0,5.15+
Fixes: f8f4e2a51834 ("drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
So that uses PSP to initialize HW.
Fixes: 0c2c02b66c672e ("drm/amdgpu/vcn: add firmware support for dimgrey_cavefish")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-11-23:
amdgpu:
- DCN 3.1.4 fixes
- DP MST DSC deadlock fixes
- HMM userptr fixes
- Fix Aldebaran CU occupancy reporting
- GFX11 fixes
- PSP suspend/resume fix
- DCE12 KASAN fix
- DCN 3.2.x fixes
- Rotated cursor fix
- SMU 13.x fix
- DELL platform suspend/resume fixes
- VCN4 SR-IOV fix
- Display regression fix for polled connectors
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221123143453.8977-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc7:
- Another amdgpu gang submit fix.
- Use dma_fence_unwrap_for_each when importing sync files.
- Fix race in dma_heap_add().
- Fix use of uninitialized memory in logo.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a5721505-4823-98ef-7d6f-0ea478221391@linux.intel.com
|
|
root cause that S2A need to use deduct offset flag.
after setting this flag, vcn0 doorbell value works.
so return it as before
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
when the edid is read"
This partially reverts 20543be93ca45968f344261c1a997177e51bd7e1.
Calling drm_connector_update_edid_property() in
amdgpu_connector_free_edid() causes a noticeable pause in
the system every 10 seconds on polled outputs so revert this
part of the change.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2257
Cc: Claudio Suarez <cssk@net-c.es>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
[Why]
[ 754.862560] refcount_t: underflow; use-after-free.
[ 754.862898] Call Trace:
[ 754.862903] <TASK>
[ 754.862913] amdgpu_job_free_cb+0xc2/0xe1 [amdgpu]
[ 754.863543] drm_sched_main.cold+0x34/0x39 [amd_sched]
[How]
The fw_fence may be not init, check whether dma_fence_init
is performed before job free
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We can reuse the same buffers on resume.
v2: squash in S4 fix from Shikai
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
If mes enabled, reserve VM invalidation engine 5 for firmware.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
|
|
Allow user to know number of compute units (CU) that are in use at any
given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy
that computes CU occupancy.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
The basic problem here is that it's not allowed to page fault while
holding the reservation lock.
So it can happen that multiple processes try to validate an userptr
at the same time.
Work around that by putting the HMM range object into the mutex
protected bo list for now.
v2: make sure range is set to NULL in case of an error
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Since switching to HMM we always need that because we no longer grab
references to the pages.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Otherwise it can happen that not all gang members can get a VMID
assigned and we deadlock.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 68ce8b242242 ("drm/amdgpu: add gang submit backend v2")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221118153023.312582-1-christian.koenig@amd.com
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-11-16:
amdgpu:
- Fix a possible memory leak in ganng submit error path
- DP tunneling fixes
- DCN 3.1 page flip fix
- DCN 3.2.x fixes
- DCN 3.1.4 fixes
- Don't expose degamma on hardware that doesn't support it
- BACO fixes for SMU 11.x
- BACO fixes for SMU 13.x
- Virtual display fix for devices with no display hardware
amdkfd:
- Memory limit regression fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221117040416.6100-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc6:
- Fix error handling in vc4_atomic_commit_tail()
- Set bpc for logictechno panels.
- Fix potential memory leak in drm_dev_init()
- Fix potential null-ptr-deref in drm_vblank_destroy_worker()
- Set lima's clkname corrrectly when regulator is missing.
- Small amdgpu fix to gang submission.
- Revert hiding unregistered connectors from userspace, as it breaks on DP-MST.
- Add workaround for DP++ dual mode adaptors that don't support
i2c subaddressing.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c7d02936-c550-199b-6cb7-cbf6cf104e4a@linux.intel.com
|
|
If we enable virtual display functionality on parts with
no display hardware we can end up trying to check for and
reserve the vbios FB area on devices where it doesn't exist.
Check if display hardware is actually present on the hardware
before trying to reserve the memory.
v2: move the check into common code
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It is to resolve a regression, which fails to allocate
VRAM due to no free memory in application, the reason
is we add check of vram_pin_size for memory limit, and
application is pinning the memory for Peerdirect, KFD
should not count it in memory limit. So removing
vram_pin_size will resolve it.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
TA firmware loaded on psp v13_0_10, but it is missing in modinfo.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When p->gang_size equals 0, amdgpu_cs_pass1() will return directly
without freeing chunk_array, which will cause a memory leak issue,
this patch fixes it.
Fixes: 4624459c84d7 ("drm/amdgpu: add gang submit frontend v6")
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It turned out that not the last IB specified is the gang leader,
but instead the last job allocated.
This is a bit unfortunate and not very intuitive for the CS
interface, so try to fix this.
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221115094206.6181-1-christian.koenig@amd.com
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 4624459c84d7 ("drm/amdgpu: add gang submit frontend v6")
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc5:
- HDMI fixes to vc4.
- Make panfrost's uapi header compile with C++.
- Add rotation quirks for 2 panels.
- Fix s/r in amdgpu_vram_mgr_new
- Handle 1 gb boundary correctly in panfrost mmu code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e02de501-4b85-28a0-3f6e-751ca13f5f9d@linux.intel.com
|
|
Re-take the eviction lock immediately again after the allocation is
completed, to fix circular locking warning with drm_buddy allocator.
Move amdgpu_vm_eviction_lock/unlock/trylock to amdgpu_vm.h as they are
called from multiple files.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Get below kernel WARNING backtrace when pressing ctrl-C to kill kfdtest
application.
If amdgpu_cs_parser_bos returns error after taking bo_list_mutex, as
caller amdgpu_cs_ioctl will not unlock bo_list_mutex, this generates the
kernel WARNING.
Add unlock bo_list_mutex after amdgpu_cs_parser_bos error handling to
cleanup bo_list userptr bo.
WARNING: kfdtest/2930 still has locks held!
1 lock held by kfdtest/2930:
(&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0xce5/0x1f10 [amdgpu]
stack backtrace:
dump_stack_lvl+0x44/0x57
get_signal+0x79f/0xd00
arch_do_signal_or_restart+0x36/0x7b0
exit_to_user_mode_prepare+0xfd/0x1b0
syscall_exit_to_user_mode+0x19/0x40
do_syscall_64+0x40/0x80
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It can happen that we query the sequence value before the callback
had a chance to run.
Workaround that by grabbing the fence lock and releasing it again.
Should be replaced by hw handling soon.
Signed-off-by: Christian König <christian.koenig@amd.com>
CC: stable@vger.kernel.org # 5.19+
Fixes: 5255e146c99a6 ("drm/amdgpu: rework TLB flushing")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Stefan Springer <stefanspr94@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Because the value of man->size is changed during suspend/resume process,
use mgr->mm.size instead of man->size here for lpfn checking.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220914125331.2467162-1-Jun.Ma2@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
The recent change brought a bug on SRIOV envrionment. It caused
unloading amdgpu failed on Guest VM. The reason is that the VF
FLR was requested while unloading amdgpu driver, but the VF FLR
of SRIOV sequence is wrong while removing PCI device.
For SRIOV, the guest driver should not trigger the whole XGMI hive
to do the reset. Host driver control how the device been reset.
Fixes: f5c7e7797060 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Temporary workaround to fix issues observed in some compute applications
when GFXOFF is enabled on GFX11.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
|
|
If a system does not have swap and memory is under 100% usage,
amdgpu will fail to evict resources. Currently the suspend
carries on proceeding to reset the GPU:
```
[drm] evicting device resources failed
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12
[drm] free PSP TMR buffer
[TTM] Failed allocating page table
[drm] evicting device resources failed
amdgpu 0000:03:00.0: amdgpu: MODE1 reset
amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
```
At this point if the suspend actually succeeded I think that amdgpu
would have recovered because the GPU would have power cut off and
restored. However the kernel fails to continue the suspend from the
memory pressure and amdgpu fails to run the "resume" from the aborted
suspend.
```
ACPI: PM: Preparing to enter system sleep state S3
SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
node 0: slabs: 22, objs: 1122, free: 0
ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)
[drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
[drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
amdgpu 0000:03:00.0: PM: failed to resume async: error -62
```
To avoid this series of unfortunate events, fail amdgpu's suspend
when the memory eviction fails. This will let the system gracefully
recover and the user can try suspend again when the memory pressure
is relieved.
Reported-by: post@davidak.de
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use mes.sched_version, mes.kiq_version for debugfs as
mes.ucode_fw_version does not contain correct versioning information.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|