summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
10 daysdrm/amdgpu: Forward VMID reservation errorsNatalie Vock1-2/+4
[ Upstream commit 8defb4f081a5feccc3ea8372d0c7af3522124e1f ] Otherwise userspace may be fooled into believing it has a reserved VMID when in reality it doesn't, ultimately leading to GPU hangs when SPM is used. Fixes: 80e709ee6ecc ("drm/amdgpu: add option params to enforce process isolation between graphics and compute") Cc: stable@vger.kernel.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> [ adapted 3-argument amdgpu_vmid_alloc_reserved(adev, vm, vmhub) call to 2-argument version and added separate error check to preserve reserved_vmid tracking logic. ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysdrm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdmaPierre-Eric Pelloux-Prayer1-0/+2
commit 4fa944255be521b1bbd9780383f77206303a3a5c upstream. Users of ttm entities need to hold the gtt_window_lock before using them to guarantee proper ordering of jobs. Cc: stable@vger.kernel.org Fixes: cb5cc4f573e1 ("drm/amdgpu: improve debug VRAM access performance using sdma") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-12-07drm/amdgpu: fix cyan_skillfish2 gpu info fw handlingAlex Deucher1-0/+2
[ Upstream commit 7fa666ab07ba9e08f52f357cb8e1aad753e83ac6 ] If the board supports IP discovery, we don't need to parse the gpu info firmware. Backport to 6.18. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4721 Fixes: fa819e3a7c1e ("drm/amdgpu: add support for cyan skillfish gpu_info") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5427e32fa3a0ba9a016db83877851ed277b065fb) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-01drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabledYifan Zha1-2/+2
commit 80d8a9ad1587b64c545d515ab6cb7ecb9908e1b3 upstream. [Why] Accoreding to CP updated to RS64 on gfx11, WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW. That packet is used for MCBP on F32 based system. So it would lead to incorrect GRBM write and FW is not handling that extra case correctly. [How] With gfx11 rs64 enabled, skip emit de meta data. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8366cd442d226463e673bed5d199df916f4ecbcf) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-24drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devicesJesse.Zhang3-6/+7
[ Upstream commit 883f309add55060233bf11c1ea6947140372920f ] Previously, APU platforms (and other scenarios with uninitialized VRAM managers) triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL, but that `man->bdev` (the backing device pointer within the manager) remains uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to a kernel OOPS. 1. **amdgpu_cs.c**: Extend the existing bandwidth control check in `amdgpu_cs_get_threshold_for_moves()` to include a check for `ttm_resource_manager_used()`. If the manager is not used (uninitialized `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific logic that would trigger the NULL dereference. 2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info reporting to use a conditional: if the manager is used, return the real VRAM usage; otherwise, return 0. This avoids accessing `man->bdev` when it is NULL. 3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function) data write path. Use `ttm_resource_manager_used()` to check validity: if the manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set `fb_usage` to 0 (APUs have no discrete framebuffer to report). This approach is more robust than APU-specific checks because it: - Works for all scenarios where the VRAM manager is uninitialized (not just APUs), - Aligns with TTM's design by using its native helper function, - Preserves correct behavior for discrete GPUs (which have fully initialized `man->bdev` and pass the `ttm_resource_manager_used()` check). v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amd: Fix suspend failure with secure display TAMario Limonciello1-1/+4
[ Upstream commit b09cb2996cdf50cd1ab4020e002c95d742c81313 ] commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded successfully") attempted to fix extra messages, but failed to port the cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable secure display TA multiple times") to prevent multiple tries. Add that to the failure handling path even on a quick failure. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679 Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4104c0a454f6a4d1e0d14895d03c0e7bdd0c8240) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amdgpu: Fix function header names in amdgpu_connectors.cSrinivasan Shanmugam1-3/+12
commit 38ab33dbea594700c8d6cc81eec0a54e95d3eb2f upstream. Align the function headers for `amdgpu_max_hdmi_pixel_clock` and `amdgpu_connector_dvi_mode_valid` with the function implementations so they match the expected kdoc style. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1199: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Returns the maximum supported HDMI (TMDS) pixel clock in KHz. drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1212: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Validates the given display mode on DVI and HDMI connectors. Fixes: 585b2f685c56 ("drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-24drm/amdgpu: reject gang submissions under SRIOVChristian König1-1/+1
[ Upstream commit d7ddcf921e7d0d8ebe82e89635bc9dc26ba9540d ] Gang submission means that the kernel driver guarantees that multiple submissions are executed on the HW at the same time on different engines. Background is that those submissions then depend on each other and each can't finish stand alone. SRIOV now uses world switch to preempt submissions on the engines to allow sharing the HW resources between multiple VFs. The problem is now that the SRIOV world switch can't know about such inter dependencies and will cause a timeout if it waits for a partially running gang submission. To conclude SRIOV and gang submissions are fundamentally incompatible at the moment. For now just disable them. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amd: Avoid evicting resources at S5Mario Limonciello (AMD)1-0/+4
[ Upstream commit 531df041f2a5296174abd8292d298eb62fe1ea97 ] Normally resources are evicted on dGPUs at suspend or hibernate and on APUs at hibernate. These steps are unnecessary when using the S4 callbacks to put the system into S5. Cc: AceLan Kao <acelan.kao@canonical.com> Cc: Kai-Heng Feng <kaihengf@nvidia.com> Cc: Mark Pearson <mpearson-lenovo@squebb.ca> Cc: Denis Benato <benato.denis96@gmail.com> Cc: Merthan Karakaş <m3rthn.k@gmail.com> Tested-by: Eric Naim <dnaim@cachyos.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amdgpu: Use memdup_array_user in amdgpu_cs_wait_fences_ioctlTvrtko Ursulin1-14/+5
[ Upstream commit dea75df7afe14d6217576dbc28cc3ec1d1f712fb ] Replace kmalloc_array() + copy_from_user() with memdup_array_user(). This shrinks the source code and improves separation between the kernel and userspace slabs. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amdgpu: add support for cyan skillfish gpu_infoAlex Deucher1-0/+4
[ Upstream commit fa819e3a7c1ee994ce014cc5a991c7fd91bc00f1 ] Some SOCs which are part of the cyan skillfish family rely on an explicit firmware for IP discovery. Add support for the gpu_info firmware. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amdgpu: don't enable SMU on cyan skillfishAlex Deucher1-1/+4
[ Upstream commit 94bd7bf2c920998b4c756bc8a54fd3dbdf7e4360 ] Cyan skillfish uses different SMU firmware. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amd: add more cyan skillfish PCI idsAlex Deucher1-0/+5
[ Upstream commit 1e18746381793bef7c715fc5ec5611a422a75c4c ] Add additional PCI IDs to the cyan skillfish family. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)Timur Kristóf1-13/+44
[ Upstream commit 585b2f685c56c5095cc22c7202bf74d8e9a73cdd ] Update the legacy (non-DC) display code to respect the maximum pixel clock for HDMI and DVI-D. Reject modes that would require a higher pixel clock than can be supported. Also update the maximum supported HDMI clock value depending on the ASIC type. For reference, see the DC code: check max_hdmi_pixel_clock in dce*_resource.c v2: Fix maximum clocks for DVI-D and DVI/HDMI adapters. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-24drm/amdgpu/jpeg: Hold pg_lock before jpeg poweroffSathishkumar S1-2/+4
[ Upstream commit 0e7581eda8c76d1ca4cf519631a4d4eb9f82b94c ] Acquire jpeg_pg_lock before changes to jpeg power state and release it after power off from idle work handler. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23drm/amd: Check whether secure display TA loaded successfullyMario Limonciello1-1/+1
commit c760bcda83571e07b72c10d9da175db5051ed971 upstream. [Why] Not all renoir hardware supports secure display. If the TA is present but the feature isn't supported it will fail to load or send commands. This shows ERR messages to the user that make it seems like there is a problem. [How] Check the resp_status of the context to see if there was an error before trying to send any secure display commands. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1415 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Adrian Yip <adrian.ytw@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23drm/amdgpu: use atomic functions with memory barriers for vm fault infoGui-Dong Han3-11/+8
commit 6df8e84aa6b5b1812cc2cacd6b3f5ccbb18cda2b upstream. The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: b97dfa27ef3a ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15drm/amdgpu: Power up UVD 3 for FW validation (v2)Timur Kristóf1-4/+25
[ Upstream commit c661219cd7be75bb5599b525f16a455a058eb516 ] Unlike later versions, UVD 3 has firmware validation. For this to work, the UVD should be powered up correctly. When DPM is enabled and the display clock is off, the SMU may choose a power state which doesn't power the UVD, which can result in failure to initialize UVD. v2: Add code comments to explain about the UVD power state and how UVD clock is turned on/off. Fixes: b38f3e80ecec ("drm amdgpu: SI UVD v3_1 (v2)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-02minmax: make generic MIN() and MAX() macros available everywhereLinus Torvalds1-0/+2
[ Upstream commit 1a251f52cfdc417c84411a056bc142cbd77baef4 ] This just standardizes the use of MIN() and MAX() macros, with the very traditional semantics. The goal is to use these for C constant expressions and for top-level / static initializers, and so be able to simplify the min()/max() macros. These macro names were used by various kernel code - they are very traditional, after all - and all such users have been fixed up, with a few different approaches: - trivial duplicated macro definitions have been removed Note that 'trivial' here means that it's obviously kernel code that already included all the major kernel headers, and thus gets the new generic MIN/MAX macros automatically. - non-trivial duplicated macro definitions are guarded with #ifndef This is the "yes, they define their own versions, but no, the include situation is not entirely obvious, and maybe they don't get the generic version automatically" case. - strange use case #1 A couple of drivers decided that the way they want to describe their versioning is with #define MAJ 1 #define MIN 2 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) which adds zero value and I just did my Alexander the Great impersonation, and rewrote that pointless Gordian knot as #define DRV_VERSION "1.2" instead. - strange use case #2 A couple of drivers thought that it's a good idea to have a random 'MIN' or 'MAX' define for a value or index into a table, rather than the traditional macro that takes arguments. These values were re-written as C enum's instead. The new function-line macros only expand when followed by an open parenthesis, and thus don't clash with enum use. Happily, there weren't really all that many of these cases, and a lot of users already had the pattern of using '#ifndef' guarding (or in one case just using '#undef MIN') before defining their own private version that does the same thing. I left such cases alone. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19drm/amdgpu: fix a memory leak in fence cleanup when unloadingAlex Deucher1-3/+0
[ Upstream commit 7838fb5f119191403560eca2e23613380c0e425e ] Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called after that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op as the sched entities we never freed because the ring pointers were already set to NULL. Remove the NULL setting. Reported-by: Lin.Cao <lincao12@amd.com> Cc: Vitaly Prosyak <vitaly.prosyak@amd.com> Cc: Christian König <christian.koenig@amd.com> Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5) Cc: stable@vger.kernel.org [ Adapt to conditional check ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packagesDavid Rosca1-31/+21
commit 2b10cb58d7a3fd621ec9b2ba765a092e562ef998 upstream. There can be multiple engine info packages in one IB and the first one may be common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding first engine info. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any timeDavid Rosca2-8/+16
commit 3318f2d20ce48849855df5e190813826d0bc3653 upstream. There is no reason to require this to happen on first submitted IB only. We need to wait for the queue to be idle, but it can be done at any time (including when there are multiple video sessions active). Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-09drm/amd/amdgpu: Fix missing error return on kzalloc failureColin Ian King1-1/+1
[ Upstream commit 467e00b30dfe75c4cfc2197ceef1fddca06adc25 ] Currently the kzalloc failure check just sets reports the failure and sets the variable ret to -ENOMEM, which is not checked later for this specific error. Fix this by just returning -ENOMEM rather than setting ret. Fixes: 4fb930715468 ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1ee9d1a0962c13ba5ab7e47d33a80e3b8dc4b52e) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-09drm/amdgpu: Replace DRM_* with dev_* in amdgpu_psp.cHawking Zhang1-69/+75
[ Upstream commit ac3ff8a90637e813005404a0110802aa384af4aa ] So kernel message has the device pcie bdf information, which helps issue debugging especially in multiple GPU system. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 467e00b30dfe ("drm/amd/amdgpu: Fix missing error return on kzalloc failure") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-09Revert "drm/amdgpu: Avoid extra evict-restore process."Alex Deucher1-2/+4
This reverts commit a3201e3b7cf10bcd3d7eef4859d275eb6d98e12a which is commit 1f02f2044bda1db1fd995bc35961ab075fa7b5a2 upstream. This commit introduced a regression, however the fix for the regression: aa5fc4362fac ("drm/amdgpu: fix task hang from failed job submission during process kill") depends on things not yet present in 6.12.y and older kernels. Since this commit is more of an optimization, just revert it for 6.12.y and older stable kernels. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x - 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-09drm/amdgpu: drop hw access in non-DC audio finiAlex Deucher4-20/+0
commit 71403f58b4bb6c13b71c05505593a355f697fd94 upstream. We already disable the audio pins in hw_fini so there is no need to do it again in sw_fini. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4481 Cc: oushixiong <oushixiong1025@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5eeb16ca727f11278b2917fd4311a7d7efb0bbd6) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-04Revert "drm/amdgpu: fix incorrect vm flags to map bo"Alex Deucher1-2/+2
commit ac4ed2da4c1305a1a002415058aa7deaf49ffe3e upstream. This reverts commit b08425fa77ad2f305fe57a33dceb456be03b653f. Revert this to align with 6.17 because the fixes tag was wrong on this commit. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit be33e8a239aac204d7e9e673c4220ef244eb1ba3) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: update mmhub 3.0.1 client id mappingsAlex Deucher1-25/+32
commit 0bae62cc989fa99ac9cb564eb573aad916d1eb61 upstream. Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: Avoid extra evict-restore process.Gang Ba1-4/+2
commit 1f02f2044bda1db1fd995bc35961ab075fa7b5a2 upstream. If vm belongs to another process, this is fclose after fork, wait may enable signaling KFD eviction fence and cause parent process queue evicted. [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu] [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0 [677852.634820] dma_fence_wait_timeout+0x3a/0x140 [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl] [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu] [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu] [677852.635208] filp_flush+0x38/0x90 [677852.635213] filp_close+0x14/0x30 [677852.635216] do_close_on_exec+0xdd/0x130 [677852.635221] begin_new_exec+0x1da/0x490 [677852.635225] load_elf_binary+0x307/0xea0 [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5 [677852.635235] ? ima_bprm_check+0xa2/0xd0 [677852.635240] search_binary_handler+0xda/0x260 [677852.635245] exec_binprm+0x58/0x1a0 [677852.635249] bprm_execve.part.0+0x16f/0x210 [677852.635254] bprm_execve+0x45/0x80 [677852.635257] do_execveat_common.isra.0+0x190/0x200 Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Gang Ba <Gang.Ba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28drm/amdgpu: fix incorrect vm flags to map boJack Xiao1-2/+2
[ Upstream commit 040bc6d0e0e9c814c9c663f6f1544ebaff6824a8 ] It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen1-0/+1
commit 83261934015c434fabb980a3e613b01d9976e877 upstream. Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2becafc319db3d96205320f31cc0de4ee5a93747) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: switch job hw_fence to amdgpu_fenceAlex Deucher6-32/+32
commit ebe43542702c3d15d1a1d95e8e13b1b54076f05a upstream. Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bf1cd14f9e2e1fdf981eed273ddd595863f5288c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: Add kicker device detectionFrank Min2-0/+23
commit 0bbf5fd86c585d437b75003f11365b324360a5d6 upstream. 1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 09aa2b408f4ab689c3541d22b0968de0392ee406) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vramJohn Olender1-1/+1
commit 4d2f6b4e4c7ed32e7fa39fcea37344a9eab99094 upstream. The drm_mm allocator tolerated being passed end > mm->size, but the drm_buddy allocator does not. Restore the pre-buddy-allocator behavior of allowing such placements. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448 Signed-off-by: John Olender <john.olender@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04drm/amdgpu: enlarge the VBIOS binary size limitShiwu Zhang1-1/+1
[ Upstream commit 667b96134c9e206aebe40985650bf478935cbe04 ] Some chips have a larger VBIOS file so raise the size limit to support the flashing tool. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04drm/amdgpu: reset psp->cmd to NULL after releasing the bufferJiang Liu1-3/+2
[ Upstream commit e92f3f94cad24154fd3baae30c6dfb918492278d ] Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini(). Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04drm/amdgpu: Set snoop bit for SDMA for MI seriesHarish Kasiviswanathan3-0/+83
[ Upstream commit 3394b1f76d3f8adf695ceed350a5dae49003eb37 ] SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for this to happen. v2: Missed a few mmhub_v9_4. Added now. v3: Calculate hub offset once since it doesn't change inside the loop Modified function names based on review comments. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.cVictor Lu1-5/+5
[ Upstream commit 057fef20b8401110a7bc1c2fe9d804a8a0bf0d24 ] SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04drm/amdgpu: Update SRIOV video codec capsDavid Rosca2-16/+10
[ Upstream commit 19478f2011f8b53dee401c91423c4e0b73753e4f ] There have been multiple fixes to the video caps that are missing for SRIOV. Update the SRIOV caps with correct values. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04drm/amdgpu: Allow P2P access through XGMIFelix Kuehling1-1/+29
[ Upstream commit a92741e72f91b904c1d8c3d409ed8dbe9c1f2b26 ] If peer memory is accessible through XGMI, allow leaving it in VRAM rather than forcing its migration to GTT on DMABuf attachment. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 372c8d72c3680fdea3fbb2d6b089f76b4a6d596a) Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22drm/amdgpu: fix pm notifier handlingAlex Deucher2-22/+6
commit 4aaffc85751da5722e858e4333e8cf0aa4b6c78f upstream. Set the s3/s0ix and s4 flags in the pm notifier so that we can skip the resource evictions properly in pm prepare based on whether we are suspending or hibernating. Drop the eviction as processes are not frozen at this time, we we can end up getting stuck trying to evict VRAM while applications continue to submit work which causes the buffers to get pulled back into VRAM. v2: Move suspend flags out of pm notifier (Mario) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 06f2dcc241e7e5c681f81fbc46cacdf4bfd7d6d7) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22Revert "drm/amd: Stop evicting resources on APUs in suspend"Alex Deucher3-29/+2
[ Upstream commit d0ce1aaa8531a4a4707711cab5721374751c51b0 ] This reverts commit 3a9626c816db901def438dc2513622e281186d39. This breaks S4 because we end up setting the s3/s0ix flags even when we are entering s4 since prepare is used by both flows. The causes both the S3/s0ix and s4 flags to be set which breaks several checks in the driver which assume they are mutually exclusive. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ce8f7d95899c2869b47ea6ce0b3e5bf304b2fff4) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22drm/amd: Add Suspend/Hibernate notification callback supportMario Limonciello3-2/+46
[ Upstream commit 2965e6355dcdf157b5fafa25a2715f00064da8bf ] As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend. Another class of issues exist though where due to memory fragementation there isn't a large enough contiguous space and swap isn't accessible. Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary. Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22drm/amdgpu: trigger flr_work if reading pf2vf data failedZhigang Luo5-10/+41
[ Upstream commit ab66c832847fcdffc97d4591ba5547e3990d9d33 ] if reading pf2vf data failed 30 times continuously, it means something is wrong. Need to trigger flr_work to recover the issue. also use dev_err to print the error message to get which device has issue and add warning message if waiting IDH_FLR_NOTIFICATION_CMPL timeout. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22drm/amdgpu: Fix the runtime resume failure issueMa Jun1-0/+3
[ Upstream commit bbfaf2aea7164db59739728d62d9cc91d64ff856 ] Don't set power state flag when system enter runtime suspend, or it may cause runtime resume failure issue. Fixes: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22drm/amd: Stop evicting resources on APUs in suspendMario Limonciello3-2/+26
[ Upstream commit 226db36032c61d8717dfdd052adac351b22d3e83 ] commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") intentionally moved the eviction of resources to earlier in the suspend process, but this introduced a subtle change that it occurs before adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to evict resources at suspend time as well. Explicitly set s0ix or s3 in the prepare() stage, and unset them if the prepare() stage failed. v2: squash in warning fix from Stephen Rothwell Reported-by: Jürg Billeter <j@bitron.ch> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038 Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18drm/amdgpu/hdp6: use memcfg register to post the write for HDP flushAlex Deucher1-1/+6
commit ca28e80abe4219c8f1a2961ae05102d70af6dc87 upstream. Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: abe1cbaec6cf ("drm/amdgpu/hdp6.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 84141ff615951359c9a99696fd79a36c465ed847) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18drm/amdgpu/hdp5: use memcfg register to post the write for HDP flushAlex Deucher1-1/+6
commit 0e33e0f339b91eecd9558311449a3d1e728722d4 upstream. Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: cf424020e040 ("drm/amdgpu/hdp5.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a5cb344033c7598762e89255e8ff52827abb57a4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flushAlex Deucher1-1/+11
commit dbc988c689333faeeed44d5561f372ff20395304 upstream. Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: f756dbac1ce1 ("drm/amdgpu/hdp5.2: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4a89b7698e771914b4d5b571600c76e2fdcbe2a9) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18drm/amdgpu/hdp4: use memcfg register to post the write for HDP flushAlex Deucher1-1/+6
commit f690e3974755a650259a45d71456decc9c96a282 upstream. Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: c9b8dcabb52a ("drm/amdgpu/hdp4.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5c937b4a6050316af37ef214825b6340b5e9e391) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>