| Age | Commit message (Collapse) | Author | Files | Lines |
|
Timeline Management v7
[ Upstream commit efdc66fe12b07e7b7d28650bd8d4f7e3bb92c5d4 ]
When GPU memory mappings are updated, the driver returns a fence so
userspace knows when the update is finished.
The previous refactor could pick the wrong fence or rely on checks that
are not safe for GPU mappings that stay valid even when memory is
missing. In some cases this could return an invalid fence or cause fence
reference counting problems.
Fix this by (v5,v6, per Christian):
- Starting from the VM’s existing last update fence, so a valid and
meaningful fence is always returned even when no new work is required.
- Selecting the VM-level fence only for always-valid / PRT mappings using
the required combined bo_va + bo guard.
- Using the per-BO page table update fence for normal MAP and REPLACE
operations.
- For UNMAP and CLEAR, returning the fence provided by
amdgpu_vm_clear_freed(), which may remain unchanged when nothing needs
clearing.
- Keeping fence reference counting balanced.
v7: Drop the extra bo_va/bo NULL guard since
amdgpu_vm_is_bo_always_valid() handles NULL BOs correctly (including
PRT). (Christian)
This makes VM timeline fences correct and prevents crashes caused by
incorrect fence handling.
Fixes: bd8150a1b337 ("drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 096bb75e13cc508d3915b7604e356bcb12b17766 ]
On Intel MacBookPros with switchable graphics, when the iGPU
is enabled, the address of VRAM gets put at 0 in the dGPU's
virtual address space. This is non-standard and seems to cause
issues with the cursor if it ends up at 0. We have the framework
to reserve memory at 0 in the address space, so enable it here if
the vram start address is 0.
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4302
Cc: stable@vger.kernel.org
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b6a65009e7ce3f0cc72da18f186adb60717b51a0 ]
[Why]
Fix fastboot broken in driver.
This is caused by an open source backport change 7495962c.
from the comment, the intended check is to disable fastboot
for pre-DCN10. but the logic check is reversed, and causes
fastboot to be disabled on all DCN10 and after.
fastboot is for driver trying to pick up bios used hw setting
and bypass reprogramming the hw if dc_validate_boot_timing()
condition meets.
Fixes: 7495962cbceb ("drm/amd/display: Disable fastboot on DCE 6 too")
Cc: stable@vger.kernel.org
Reviewed-by: Mario Limonciello <Mario.Limonciello@amd.com>
Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b18fc0ab837381c1a6ef28386602cd888f2d9edf ]
Invalidating a dmabuf will impact other users of the shared BO.
In the scenario where process A moves the BO, it needs to inform
process B about the move and process B will need to update its
page table.
The commit fixes a synchronisation bug caused by the use of the
ticket: it made amdgpu_vm_handle_moved behave as if updating
the page table immediately was correct but in this case it's not.
An example is the following scenario, with 2 GPUs and glxgears
running on GPU0 and Xorg running on GPU1, on a system where P2P
PCI isn't supported:
glxgears:
export linear buffer from GPU0 and import using GPU1
submit frame rendering to GPU0
submit tiled->linear blit
Xorg:
copy of linear buffer
The sequence of jobs would be:
drm_sched_job_run # GPU0, frame rendering
drm_sched_job_queue # GPU0, blit
drm_sched_job_done # GPU0, frame rendering
drm_sched_job_run # GPU0, blit
move linear buffer for GPU1 access #
amdgpu_dma_buf_move_notify -> update pt # GPU0
It this point the blit job on GPU0 is still running and would
likely produce a page fault.
Cc: stable@vger.kernel.org
Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 318917e1d8ecc89f820f4fabf79935f4fed718cd ]
[Why & How]
On Framework laptops with DDR5 modules, underflow can be observed.
It's unclear why it only occurs on specific desktop contents. However,
increasing enter/exit latencies by 3us seems to resolve it.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4463
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3b948dd0366a0b64c02e4ed1aefdf7825942e803 ]
Regardless if CPU enable 5-level paging, GPU vm use 5-level paging if
gmc init with 57-bit address space support, because
ARM64 4-level paging support 48-bit VA, x86 and GPU 4-level paging
support 47-bit VA, require 5-level paging on GPU to support ARM64.
NPA address space 52-bit mapping on NPA GPU VM require 5-level paging.
Debugger trap get device snapshot expect LDS and Scratch base, limit
above 57-bit, which is set only for 5-level paging.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.19.x
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f6b1c1f5fd7237f77fc3880603ea54dcf0371a20 ]
If GPU supports 5-level page table, but CPU disable 5-level page table
by using boot option no5lvl or CPU feature not available, the virtual
address will be 48bit, not needed to enable 5-level page table on GPU
vm.
If adev->vm_manager.num_level, number of pde levels, set to 4, then
gfxhub and mmhub register VM_CONTEXTx_CNTL/PAGE_TABLE_DEPTH will set
to 4 to enable 5-level page table in page table walker.
Set vm_manager.root_level to AMDGPU_VM_PDE3, then update GPU mapping
will allocate and update PDE3/PDE2/PDE1/PDE0/PTB 5-level page tables.
If max_level is not 4, no change for the logic to support features
needed by old ASICs.
v2: squash in CONFIG fix
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 3b948dd0366a ("drm/amdgpu: Use 5-level paging if gmc support 57-bit VA")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 39fc2bc4da0082c226cbee331f0a5d44db3997da ]
Ungate GPU CG/PG in device_fini_hw and device_halt to protect GPU
register accesses, e.g. GC registers are accessed in amdgpu_irq_disable_all()
and amdgpu_fence_driver_hw_fini().
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8a70a26c9f34baea6c3199a9862ddaff4554a96d ]
The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8
bytes via memset without checking the buffer size parameter. This allows
unprivileged userspace to trigger an out-of bounds kernel memory write
by passing a small buffer, leading to potential privilege
escalation.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5cc7bbd9f1b74d9fe2f7ac08d6ba0477e8d2d65f ]
sdma ring reset is not supported in SRIOV. kfd driver does not check
reset mask, and could queue sdma ring reset during unmap_queues_cpsch.
Avoid the ring reset for sriov.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1b38a87b8f8020e8ef4563e7752a64182b5a39b9 ]
[Why]
Shaper programming has high chance to fail on first time after
power-on or reboot. This can be verified by running IGT's kms_colorop.
[How]
Always power on the shaper and 3DLUT before programming by
removing the debug flag of low power mode.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7d9ec9dc20ecdb1661f4538cd9112cd3d6a5f15a ]
[Why]
For RGB BT2020 full and limited color spaces, overlay adjustments were
applied twice (once by MM and once by DAL). This results in incorrect
colours and a noticeable difference between mpo and non-mpo cases.
[How]
Add RGB BT2020 full and limited color spaces to list that bypasses post
csc adjustment.
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 49fe2c57bdc0acff9d2551ae337270b6fd8119d9 ]
This patch limits the clock speeds of the AMD Radeon R5 M420 GPU from
850/1000MHz (core/memory) to 800/950 MHz, making it work stably. This
patch is for amdgpu.
Signed-off-by: decce6 <decce6@proton.me>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d0728aee5090853d0b9982757f5fb1b13e2e2b27 ]
[Why/How]
Align the default value of the flag with DCN35/351.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: YiLing Chen <yi-lchen@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3ee1c72606bd2842f0f377fd4b118362af0323ae ]
Tune the sleep interval in the PSP fence wait loop from 10-100us to
60-100us.This adjustment results in an overall wait window of 1.2s
(60us * 20000 iterations) to 2 seconds (100us * 20000 iterations),
which guarantees that we can retrieve the correct fence value
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 044f8d3b1fac6ac89c560f61415000e6bdab3a03 ]
end the function flow when ras table checksum is error
Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1a38ded4bc8ac09fd029ec656b1e2c98cc0d238c ]
[Why & How]
Although it's dummy updates of surface update for committing stream
updates, we should not have dummy_updates[j].surface all indicating
to the same surface under multiple surfaces case. Otherwise,
copy_surface_update_to_plane() in update_planes_and_stream_state()
will update to the same surface only.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 64c94cd9be2e188ed07efeafa6a109bce638c967 ]
[Why]
System will try to apply idle power optimizations setting during
system resume. But system power state is still in D3 state, and
it will cause the idle power optimizations command not actually
to be sent to DMUB and cause some platforms to go into IPS.
[How]
Set power state to D0 first before calling the
dc_dmub_srv_apply_idle_power_optimizations(dm->dc, false)
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8980be03b3f9a4b58197ef95d3b37efa41a25331 ]
VF doesn't enable VCN poison irq in VCNv2.5. Skip releasing it and avoid
call trace during deinitialization.
[ 71.913601] [drm] clean up the vf2pf work item
[ 71.915088] ------------[ cut here ]------------
[ 71.915092] WARNING: CPU: 3 PID: 1079 at /tmp/amd.aFkFvSQl/amd/amdgpu/amdgpu_irq.c:641 amdgpu_irq_put+0xc6/0xe0 [amdgpu]
[ 71.915355] Modules linked in: amdgpu(OE-) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) drm_suballoc_helper drm_display_helper cec rc_core i2c_algo_bit video wmi binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common input_leds joydev serio_raw mac_hid qemu_fw_cfg sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel usbhid 8139too sha256_ssse3 sha1_ssse3 hid psmouse bochs i2c_i801 ahci drm_vram_helper libahci i2c_smbus lpc_ich drm_ttm_helper 8139cp mii ttm aesni_intel crypto_simd cryptd
[ 71.915484] CPU: 3 PID: 1079 Comm: rmmod Tainted: G OE 6.8.0-87-generic #88~22.04.1-Ubuntu
[ 71.915489] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-2.el9_5.1 04/01/2014
[ 71.915492] RIP: 0010:amdgpu_irq_put+0xc6/0xe0 [amdgpu]
[ 71.915768] Code: 75 84 b8 ea ff ff ff eb d4 44 89 ea 48 89 de 4c 89 e7 e8 fd fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 55 30 3b c7 <0f> 0b eb d4 b8 fe ff ff ff eb a8 e9 b7 3b 8a 00 66 2e 0f 1f 84 00
[ 71.915771] RSP: 0018:ffffcf0800eafa30 EFLAGS: 00010246
[ 71.915775] RAX: 0000000000000000 RBX: ffff891bda4b0668 RCX: 0000000000000000
[ 71.915777] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 71.915779] RBP: ffffcf0800eafa50 R08: 0000000000000000 R09: 0000000000000000
[ 71.915781] R10: 0000000000000000 R11: 0000000000000000 R12: ffff891bda480000
[ 71.915782] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 71.915792] FS: 000070cff87c4c40(0000) GS:ffff893abfb80000(0000) knlGS:0000000000000000
[ 71.915795] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 71.915797] CR2: 00005fa13073e478 CR3: 000000010d634006 CR4: 0000000000770ef0
[ 71.915800] PKRU: 55555554
[ 71.915802] Call Trace:
[ 71.915805] <TASK>
[ 71.915809] vcn_v2_5_hw_fini+0x19e/0x1e0 [amdgpu]
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9ef84a307582a92ef055ef0bd3db10fd8ac75960 ]
[WHAT]
1. Set no scaling for writeback as they are hardcoded in DCN3.2+.
2. Set no fast plane update for writeback commits.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1197366cca89a4c44c541ddedb8ce8bf0757993d ]
If SMU is disabled, during RAS initialization,
there will be null pointer dereference issue here.
Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8079b87c02e531cc91601f72ea8336dd2262fdf1 ]
Add validation to ensure user queue sizes meet hardware requirements:
- Size must be a power of two for efficient ring buffer wrapping
- Size must be at least AMDGPU_GPU_PAGE_SIZE to prevent undersized allocations
This prevents invalid configurations that could lead to GPU faults or
unexpected behavior.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 15b1d7b77e9836ff4184093163174a1ef28bbdd7 ]
[Why]
When usb4 link training fails, the dpia sym clock will be disabled and SYMCLK
source should be changed back to phy clock. In enable_streams, it is
assumed that link training succeeded and will switch from refclk to
phy clock. But phy clk here might not be on. Dig reg access timeout
will occur.
[How]
When enable_stream is hit, check if link training failed for usb4.
If it did, fall back to the ref clock to avoid reg access timeout.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Zhongwei <Zhongwei.Zhang@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bdad08670278829771626ea7b57c4db531e2544f ]
Using >=, <= for checking the family is not always correct.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0028b86b52f7609e36af635ef6cb908925306233 ]
set retired_page of invalid ras records to U64_MAX, and skip
them when reading ras records
Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a625dc4989a2affb8f06e7b418bf30e1474b99c1 ]
[Why&How]
This reverts commit 14bb17cc37e0.
Due to the change, the display shows garbage on startup.
We have an alternative solution for the original issue:
d24203bb629f ("drm/amd/display: Re-check seamless boot can be enabled or not")
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Wang, Sung-huai <Danny.Wang@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bdc26342c49e1dc1afb48feeb20c9d74d15b784c ]
[Why&How]
This reverts commit f082daf08f2f.
Due to the change, the display shows garbage on startup.
We have an alternative solution for the original issue:
d24203bb629f ("drm/amd/display: Re-check seamless boot can be enabled or not")
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Wang, Sung-huai <Danny.Wang@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4589712e0111352973131bad975023b25569287c ]
[Why]
We're missing the code to actually disable the link output when we have
to leave the SYMCLK_ON but the TX remains OFF.
[How]
Port the code from DCN401 that detects SYMCLK_ON_TX_OFF and disable
the link output when the backend is reset.
Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8cee62904caf95e5698fa0f2d420f5f22b4dea15 ]
[why & how]
VBIOS DMCUB FW can enable FEC for capable eDPs, but S/W DC state is
only updated for link0 when transitioning into OS with driver loaded.
This causes issues when the eDP is immediately hidden and DIG0 is
assigned to another link that does not support FEC. Driver will
attempt to disable FEC but FEC enablement occurs based on the link
state, which does not have fec_state updated since it is a different
link. Thus, FEC disablement on DIG0 will get skipped and cause no
light up.
Reviewed-by: Karen Chen <karen.chen@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6c160001661b6c4e20f5c31909c722741e14c2d8 ]
In svm_migrate_gart_map(), while migrating GART mapping, the number of
bytes copied for the GART table only accounts for CPU pages. On non-4K
systems, each CPU page can contain multiple GPU pages, and the GART
requires one 8-byte PTE per GPU page. As a result, an incorrect size was
passed to the DMA, causing only a partial update of the GART table.
Fix this function to work correctly on non-4K page-size systems by
accounting for the number of GPU pages per CPU page when calculating the
number of bytes to be copied.
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 42ea9cf2f16b7131cb7302acb3dac510968f8bdc ]
HW-supported EOP buffer sizes are 4K and 32K. On systems that do not
use 4K pages, the minimum buffer object (BO) allocation size is
PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver
currently checks the allocated BO size against the supported EOP buffer
size. Since the allocated BO is larger than the expected size, this check
fails, preventing queue creation.
Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used.
Only the required 4K region of the buffer will be used as the EOP buffer
and avoids queue creation failures on non-4K page systems.
Acked-by: Christian König <christian.koenig@amd.com>
Suggested-by: Philip Yang <yangp@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 75372d75a4e23783583998ed99d5009d555850da ]
[Why]
A backport of the change made for DCN401 that addresses an issue where
we turn off the PHY PLL when disabling TMDS output, which causes the
OTG to remain stuck.
The OTG being stuck can lead to a hang in the DCHVM's ability to ACK
invalidations when it thinks the HUBP is still on but it's not receiving
global sync.
The transition to PLL_ON needs to be atomic as there's no guarantee
that the thread isn't pre-empted or is able to complete before the
IOMMU watchdog times out.
[How]
Backport the implementation from dcn401 back to dcn35.
There's a functional difference in when the eDP output is disabled in
dcn401 code so we don't want to utilize it directly.
Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0839d8d24e6f1fc2587c4a976f44da9fa69ae3d0 ]
This avoids any issues with dpia endpoints
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 39c21b81112321cbe1267b02c77ecd2161ce19aa ]
VFs use the PF SDMA ucode and are unable to load SDMA_RS64.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Signed-off-by: Victor Skvortsov <Victor.Skvortsov@amd.com>
Reviewed-by: Gavin Wan <gavin.wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Timeline Management v4
[ Upstream commit bd8150a1b3370a9f7761c5814202a3fe5a79f44f ]
This commit simplifies the amdgpu_gem_va_ioctl function, key updates
include:
- Moved the logic for managing the last update fence directly into
amdgpu_gem_va_update_vm.
- Introduced checks for the timeline point to enable conditional
replacement or addition of fences.
v2: Addressed review comments from Christian.
v3: Updated comments (Christian).
v4: The previous version selected the fence too early and did not manage its
reference correctly, which could lead to stale or freed fences being used.
This resulted in refcount underflows and could crash when updating GPU
timelines.
The fence is now chosen only after the VA mapping work is completed, and its
reference is taken safely. After exporting it to the VM timeline syncobj, the
driver always drops its local fence reference, ensuring balanced refcounting
and avoiding use-after-free on dma_fence.
Crash signature:
[ 205.828135] refcount_t: underflow; use-after-free.
[ 205.832963] WARNING: CPU: 30 PID: 7274 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
[ 206.074014] Call Trace:
[ 206.076488] <TASK>
[ 206.078608] amdgpu_gem_va_ioctl+0x6ea/0x740 [amdgpu]
[ 206.084040] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 206.089994] drm_ioctl_kernel+0x86/0xe0 [drm]
[ 206.094415] drm_ioctl+0x26e/0x520 [drm]
[ 206.098424] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[ 206.104402] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
[ 206.109387] __x64_sys_ioctl+0x96/0xe0
[ 206.113156] do_syscall_64+0x66/0x2d0
...
[ 206.553351] BUG: unable to handle page fault for address: ffffffffc0dfde90
...
[ 206.553378] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
...
[ 206.553405] Call Trace:
[ 206.553409] <IRQ>
[ 206.553415] ? __pfx_drm_sched_fence_free_rcu+0x10/0x10 [gpu_sched]
[ 206.553424] dma_fence_signal+0x30/0x60
[ 206.553427] drm_sched_job_done.isra.0+0x123/0x150 [gpu_sched]
[ 206.553434] dma_fence_signal_timestamp_locked+0x6e/0xe0
[ 206.553437] dma_fence_signal+0x30/0x60
[ 206.553441] amdgpu_fence_process+0xd8/0x150 [amdgpu]
[ 206.553854] sdma_v4_0_process_trap_irq+0x97/0xb0 [amdgpu]
[ 206.554353] edac_mce_amd(E) ee1004(E)
[ 206.554270] amdgpu_irq_dispatch+0x150/0x230 [amdgpu]
[ 206.554702] amdgpu_ih_process+0x6a/0x180 [amdgpu]
[ 206.555101] amdgpu_irq_handler+0x23/0x60 [amdgpu]
[ 206.555500] __handle_irq_event_percpu+0x4a/0x1c0
[ 206.555506] handle_irq_event+0x38/0x80
[ 206.555509] handle_edge_irq+0x92/0x1e0
[ 206.555513] __common_interrupt+0x3e/0xb0
[ 206.555519] common_interrupt+0x80/0xa0
[ 206.555525] </IRQ>
[ 206.555527] <TASK>
...
[ 206.555650] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
...
[ 206.555667] Kernel panic - not syncing: Fatal exception in interrupt
Link: https://patchwork.freedesktop.org/patch/654669/
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c979d8db7b0f293111f2e83795ea353c8ed75de9 ]
Trying to access link enc on a dpia link will cause a crash otherwise
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c8cf9ddc549fb93cb5a35f3fe23487b1e6707e74 ]
Only set an error on the fence if the fence is not
signalled. We can end up with a warning if the
per queue reset path signals the fence and sets an error
as part of the reset, but fails to recover.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 878a4b73c11111ff5f820730f59a7f8c6fd59374 ]
[why]
Need to add function hook check before use
Reviewed-by: Mohit Bawa <mohit.bawa@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit af3303970da5ce5bfe6dffdd07f38f42aad603e0 ]
[Why]
The evaluation for whether we need to use the DMUB HW lock isn't the
same as whether we need to unlock which results in a hang when the
fast path is used for ASIC without FAMS support.
[How]
Store a flag that indicates whether we should use the lock and use
that same flag to specify whether unlocking is needed.
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e2fd14f579b841f54a9b7162fef15234d8c0627a ]
This initializes HDP IP version 6.1.1.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9aeb31b2456452257ad1ff7ec566f21bab1f3e8a ]
[WHY]
User may connect mst dock with multi monitors and do quick unplug
and plug in one of the monitor. This operatioin may create CSN from
dock to display driver. Then display driver would disable and then enable
mst link and also disable/enable DPCD mst_en bit in dock RX. However,
when mst_en bit being disabled, if dock has another CSN message to
transmit then the message would be removed because of the disabling of
mst_en. In this case, the message is missing and it ends up no display in
the replugged monitor.
[HOW]
Don't disable mst_en bit when link still has sink connected.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cea573a8e1ed83840a2173d153dd68e172849d44 ]
[Why]
USB-C DisplayPort Alt Mode with concurrent USB data needs lane count
limitation to prevent incorrect 4-lane DP configuration when only 2 lanes
are available due to hardware lane sharing between DP and USB3.
[How]
Query DMUB for Alt Mode status (is_dp_alt_disable, is_usb, is_dp4) in
dcn32_link_encoder_get_max_link_cap() and cap DP to 2 lanes when USB is
active on USB-C port. Added inline documentation explaining the USB-C
lane sharing constraint.
Reviewed-by: PeiChen Huang <peichen.huang@amd.com>
Signed-off-by: LinCheng Ku <lincheng.ku@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bf5e396957acafd46003318965500914d5f4edfa ]
[why]
need to enable APG_CLOCK_ENABLE enable first
also need to wake up az from D3 before access az block
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5b57c3c3f22336e8fd5edb7f0fef3c7823f8eac1 ]
Only check and drain IH1 ring if CAM is not enabled.
If GPU is under reset, don't access IH to drain retry fault.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7dedb906cdfec100061daf41f8e54266e975987d ]
[WHY&HOW]
If DMCUB is not initialized or FAMS2 is not supported, the
interface should not be called.
Reviewed-by: Sridevi Arvindekar <sridevi.arvindekar@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 531fe6e0fee85a1bdb5b8223a706fff654ed0a61 ]
[WHY&HOW]
The condition is only perform toggle if FIXED_VS LTTPR reports
no IEEE OUI.
The literal "\x0,\x0,\x0" contains commas changes the
bytes being compared to {0x00,0x2C,0X00}.
The correct literal should be "\x00\x00\x00" without commas.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Jing Zhou <Jing.Zhou@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c02288724b98cbc018231200891d66578f83f848 ]
[Why]
The hubp401_cursor_set_position function programs a different value
than it stores for use with cursor offload.
This can cause a desync when switching between cursor programming paths.
[How]
We do the translation to destination space currently twice: once in the
HWSS layer, and then again in the HUBP layer since we never store the
translated result.
HUBP expects to program the pos->x and pos->y directly for other ASIC,
so follow that pattern here as well.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c7062be3380cb20c8b1c4a935a13f1848ead0719 ]
[WHY]
- After the addition of all OVT patches, DSC padding was being accounted
for multiple times, effectively doubling the padding
- This caused compliance failures or corruption
[HOW]
- Add padding to DSC pic width when required by HW, and do not re-add
when calculating reg values
- Do not add padding when computing PPS values, and instead track padding
separately to add when calculating slice width values
Reviewed-by: Chris Park <chris.park@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bd68a1404b6fa2e7e9957b38ba22616faba43e75 ]
In the rare event if eeprom has only invalid address entries,
allocation is skipped, this causes following NULL pointer issue
[ 547.103445] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 547.118897] #PF: supervisor read access in kernel mode
[ 547.130292] #PF: error_code(0x0000) - not-present page
[ 547.141689] PGD 124757067 P4D 0
[ 547.148842] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 547.158504] CPU: 49 PID: 8167 Comm: cat Tainted: G OE 6.8.0-38-generic #38-Ubuntu
[ 547.177998] Hardware name: Supermicro AS -8126GS-TNMR/H14DSG-OD, BIOS 1.7 09/12/2025
[ 547.195178] RIP: 0010:amdgpu_ras_sysfs_badpages_read+0x2f2/0x5d0 [amdgpu]
[ 547.210375] Code: e8 63 78 82 c0 45 31 d2 45 3b 75 08 48 8b 45 a0 73 44 44 89 f1 48 8b 7d 88 48 89 ca 48 c1 e2 05 48 29 ca 49 8b 4d 00 48 01 d1 <48> 83 79 10 00 74 17 49 63 f2 48 8b 49 08 41 83 c2 01 48 8d 34 76
[ 547.252045] RSP: 0018:ffa0000067287ac0 EFLAGS: 00010246
[ 547.263636] RAX: ff11000167c28130 RBX: ff11000127600000 RCX: 0000000000000000
[ 547.279467] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff11000125b1c800
[ 547.295298] RBP: ffa0000067287b50 R08: 0000000000000000 R09: 0000000000000000
[ 547.311129] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 547.326959] R13: ff11000217b1de00 R14: 0000000000000000 R15: 0000000000000092
[ 547.342790] FS: 0000746e59d14740(0000) GS:ff11017dfda80000(0000) knlGS:0000000000000000
[ 547.360744] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.373489] CR2: 0000000000000010 CR3: 000000019585e001 CR4: 0000000000f71ef0
[ 547.389321] PKRU: 55555554
[ 547.395316] Call Trace:
[ 547.400737] <TASK>
[ 547.405386] ? show_regs+0x6d/0x80
[ 547.412929] ? __die+0x24/0x80
[ 547.419697] ? page_fault_oops+0x99/0x1b0
[ 547.428588] ? do_user_addr_fault+0x2ee/0x6b0
[ 547.438249] ? exc_page_fault+0x83/0x1b0
[ 547.446949] ? asm_exc_page_fault+0x27/0x30
[ 547.456225] ? amdgpu_ras_sysfs_badpages_read+0x2f2/0x5d0 [amdgpu]
[ 547.470040] ? mas_wr_modify+0xcd/0x140
[ 547.478548] sysfs_kf_bin_read+0x63/0xb0
[ 547.487248] kernfs_file_read_iter+0xa1/0x190
[ 547.496909] kernfs_fop_read_iter+0x25/0x40
[ 547.506182] vfs_read+0x255/0x390
This also result in space left assigned to negative values.
Moving data alloc call before bad page check resolves both the issue.
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f752e79d38857011f1293fcb6c810409c3b669ee ]
__amdgpu_ras_restore_bad_pages is responsible for the maintenance of bad
page number, drop the unnecessary bad page number update in the error
handling path of add_bad_pages.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|