Age | Commit message (Collapse) | Author | Files | Lines |
|
In DCN401 pre-blending degamma LUT isn't affecting cursor as in previous
DCN version. As this is not the behavior close to what is expected for
CRTC degamma LUT, disable CRTC degamma LUT property in this HW.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4176
---
When enabling HDR on KDE, it takes the first CRTC 1D LUT available and
apply a color transformation (Gamma 2.2 -> PQ). AMD driver usually
advertises a CRTC degamma LUT as the first CRTC 1D LUT, but it's
actually applied pre-blending. In previous HW version, it seems to work
fine because the 1D LUT was applied to cursor too, but DCN401 presents a
different behavior and the 1D LUT isn't affecting the hardware cursor.
To address the wrong gamma on cursor with HDR (see the link), I came up
with this patch that disables CRTC degamma LUT in this hw, since it
presents a different behavior than others. With this KDE sees CRTC
regamma LUT as the first post-blending 1D LUT available. This is
actually more consistent with AMD color pipeline. It was tested by the
reporter, since I don't have the HW available for local testing and
debugging.
Melissa
---
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts commit 99e25e4683d7cfdf79dcc328e11bb6c924c77566.
[Why & How]
This commit caused a blank screen on internal display when projecting to
an external display on DCN314.
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SmartMux is a mechanism to switch the GPU being used for scanout in a
hybrid configuration. This is used for devices with an eDP and two GPUs.
This is only valid when the system has a physical switch (Multiplexer)
in the board to switch between the two GPUs.
When a graphically intensive workload like a game is being run, the
system can be switch the active display to the dGPU, so that we can
avoid copying the buffer from dGPU to APU for scanout. This helps with
latency and FPS. When power consumption is preferred, the system can be
switched to the APU.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
Free memory to avoid memory leak
Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Clayton King <clayton.king@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why & how]
UHBR link rate capable eDPs will use HPO for encoding. Need to pass
HPO stream and link encoder instances to DMCUB for Replay FSM to
know which instances to use.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why & how]
DP1 eDP is still considered a single-eDP case and should support Panel Replay.
Modify secondary eDP policy to reflect this and update Replay state accordingly.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
For SQ128 pattern some vendor-specific overrides are required.
Previously a hardcoded clock gen source value was incorrectly programmed,
causing our override to retimer's clock source override to be ignored.
Due to some PHY issues on certain APU programs, we see failures on retimer
bypass ports extend to electrical testing downstream of PHY due to some host
clock jitter which the retimer follows.
[HOW]
Fix typo to use correct clock gen source override of 0xC4 rather than 0x4C.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why & How]
To facilitate debugging, the following behaviors are defined for existing
debug option disable_ips_in_vpb
0 - Enable IPS in LVP - let driver decide (legacy)
1 - Disable IPS in LVP
2 - Enable IPS1 and RCG in LVP
3 - Enable IPS1 Z8, IPS1 and RCG in LVP
Reviewed-by: Duncan Ma <duncan.ma@amd.com>
Signed-off-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why & How]
Add static pg implementations and debug flags for future use.
Reviewed-by: Duncan Ma <duncan.ma@amd.com>
Signed-off-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why & How]
DMUB shall be notified on driver hardware
release. Implement notification.
Reviewed-by: Duncan Ma <duncan.ma@amd.com>
Signed-off-by: Duncan Ma <Duncan.Ma@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why & How]
Aligned IPS FW state with DMCUB IPS FW state
Added debug option disable_ips_rcg to modify RCG behaviour in IPS modes.
Updated existing debug option disable_ips to align with new changes introduced by IPSv2.0
Reviewed-by: Duncan Ma <duncan.ma@amd.com>
Signed-off-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why & How]
Display idle notification shall
be sent by driver on D3 entry. Implement
notification to DMUB and PMFW.
Reviewed-by: Duncan Ma <duncan.ma@amd.com>
Signed-off-by: Duncan Ma <Duncan.Ma@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why]
dc has some code out of sync:
dc_commit_updates_for_stream handles v1/v2/v3,
but dc_update_planes_and_stream makes v1 asic to use v2.
as a reression fix: limit clear_update_flags to dcn32 or newer asic.
need to follow up that v1 asic using v2 issue.
Reviewed-by: Syed Hassan <syed.hassan@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why & How]
Some monitor have audio output but SAB data is zero. Skip
check this in this case.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Fudongwang <Fudong.Wang@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why/How]
Add the timing source needed to support DID Type5.
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
DSC block level should only be responsible for reporting single DSC
instance capabilities. Factoring in ODM combine requirements should be
handled in dc_dsc.c. Both components should acquire clocks from clk_mgr
to determine throughput capabilities instead of relying on hard coded
values as these can differ by SoC and SKU.
[HOW]
1) Add dsc_get_single_enc_caps to acquire single DSC instance
capabilities (replacing dsc_get_enc_caps), factoring in DSCCLK
2) add build_dsc_enc_caps to combine single DSC instance capabilities
3) account for max pixel rate per pipe (DISPCLK) when calculating
minimum slice count
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
compilation units
[Why & How]
Expose dcn401_initialize_min_clocks() for future use and add additional
check for IP register.
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com>
Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY & HOW]
Adding basic logic to allocate unused RMCM block and TMZ support.
Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
When booting without an HDMI display connected, the I2C registers
are not initialized correctly, leading to DC_I2C_ARBITRATION register
getting stuck with DC_I2C_REG_RW_CNTL_STATUS == USED_BY_SW.
[How]
* Correct TOCTOU race condition in engine acquire logic which did not
check against DMUB trying to acquire it at the same time.
* Deassert SOFT_RESET before acquire, as it can block access to other
I2C registers.
* Add a workaround in release, checking that after triggerring
DC_I2C_SW_DONE_USING_I2C_REG, DC_I2C_REG_RW_CNTL_STATUS != USED_BY_SW.
If necessary, trigger DC_I2C_SW_DONE_USING_I2C_REG again.
* Remove unnecessary clear of DC_I2C_SW_USE_I2C_REG_REQ, which engine
ignores according to specification.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When the driver is unloaded, the interrupt source of
the rma device is not released, resulting in the failure
of hw_init when loading again using bad_page_threshold.
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
KFD has been confirmed that can run on LoongArch systems.
It's necessary to support CONFIG_HSA_AMD on LoongArch.
Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Start and stop can fail, so add checks.
Fixes: b54695dae995 ("drm/amd: Add per-ring reset for vcn v5.0.0 use")
Reviewed-by: Mario Limonciello <mari.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
|
|
Start and stop can fail, so add checks.
Fixes: d1a46cdd0053 ("drm/amd: Add per-ring reset for vcn v4.0.5 use")
Reviewed-by: Mario Limonciello <mari.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
|
|
Start and stop can fail, so add checks.
Fixes: b8b6e6f1654d ("drm/amd: Add per-ring reset for vcn v4.0.0 use")
Reviewed-by: Mario Limonciello <mari.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
|
|
The ring test needs to be inside the lock.
Fixes: 097af47d3cfb ("drm/amdgpu/gfx10: wait for reset done before remap")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
|
|
The ring test needs to be inside the lock.
Fixes: 4c953e53cc34 ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
|
|
The ring test needs to be inside the lock.
Fixes: fdbd69486b46 ("drm/amdgpu/gfx9: wait for reset done before remap")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
|
|
For current partition mode queries, return the mode cached in partition
manager whenever it's valid.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Among the scheduler's statuses, the only one that indicates an error is
DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV
signifies that the operation succeeded and the GPU is in a nominal state.
However, to provide more information about the GPU's status, it is needed
to convey more information than just "OK".
Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this
status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has
hung, but it has been successfully reset and is now in a nominal state
again.
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.17-2025-07-11:
amdgpu:
- Clean up function signatures
- GC 10 KGQ reset fix
- SDMA reset cleanups
- Misc fixes
- LVDS fixes
- UserQ fix
amdkfd:
- Reset fix
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250711205548.21052-1-alexander.deucher@amd.com
|
|
When a ring reset happens, amdgpu calls drm_dev_wedged_event() using
struct amdgpu_task_info *ti as one of the arguments. After using *ti, a
call to amdgpu_vm_put_task_info(ti) is required to correctly track its
lifetime.
However, it's called from a place that the ring reset path never reaches
due to a goto after drm_dev_wedged_event() is called. Move
amdgpu_vm_put_task_info() bellow the exit label to make sure that it's
called regardless of the code path.
amdgpu_vm_put_task_info() can only accept a valid address or NULL as
argument, so initialise *ti to make sure we can call this function if
*ti isn't used.
Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info")
Reported-by: Dave Airlie <airlied@gmail.com>
Closes: https://lore.kernel.org/dri-devel/CAPM=9tz0rQP8VZWKWyuF8kUMqRScxqoa6aVdwWw9=5yYxyYQ2Q@mail.gmail.com/
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250704030629.1064397-1-andrealmeid@igalia.com
Signed-off-by: André Almeida <andrealmeid@igalia.com>
|
|
For normal hibernation, GPU do not need to be resumed in thaw since it is
not involved in writing the hibernation image. Skip resume in this case
can reduce the hibernation time.
On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
this can save 50 minutes.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Link: https://lore.kernel.org/r/20250710062313.3226149-6-guoqing.zhang@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
|
|
When hibernate with data center dGPUs, huge number of VRAM BOs evicted
to GTT and takes too much system memory. This will cause hibernation
fail due to insufficient memory for creating the hibernation image.
Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel
hibernation code to make room for hibernation image.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250710062313.3226149-3-guoqing.zhang@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
|
|
Fix the smatch warning,
smatch warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:2146 amdgpu_pt_info_read()
error: we previously assumed 'fpriv' could be null (see line 2146)
"if (!fpriv && !fpriv->vm.root.bo)", It has to be an OR condition
rather than an AND which makes an NULL dereference in case fpriv is NULL.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507090525.9rDWGhz3-lkp@intel.com/
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Link: https://lore.kernel.org/r/20250709071618.591866-1-sunil.khatri@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Fix undefined reference to amdgpu_mqd_info_fops during
debugfs_create_file if DEBUG_FS=n
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Link: https://lore.kernel.org/r/20250708101551.68033-1-sunil.khatri@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Pull in drm-intel-next for the updates to drm panic handling.
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
[ +0.000020] BUG: KASAN: slab-use-after-free in amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000817] Read of size 8 at addr ffff88812eec8c58 by task amd_pci_unplug/1733
[ +0.000027] CPU: 10 UID: 0 PID: 1733 Comm: amd_pci_unplug Tainted: G W 6.14.0+ #2
[ +0.000009] Tainted: [W]=WARN
[ +0.000003] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000003] dump_stack_lvl+0x76/0xa0
[ +0.000011] print_report+0xce/0x600
[ +0.000009] ? srso_return_thunk+0x5/0x5f
[ +0.000006] ? kasan_complete_mode_report_info+0x76/0x200
[ +0.000007] ? kasan_addr_to_slab+0xd/0xb0
[ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000707] kasan_report+0xbe/0x110
[ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000541] __asan_report_load8_noabort+0x14/0x30
[ +0.000005] amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000535] ? stop_cpsch+0x396/0x600 [amdgpu]
[ +0.000556] ? stop_cpsch+0x429/0x600 [amdgpu]
[ +0.000536] ? __pfx_amdgpu_userq_suspend+0x10/0x10 [amdgpu]
[ +0.000536] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? kgd2kfd_suspend+0x132/0x1d0 [amdgpu]
[ +0.000542] amdgpu_device_fini_hw+0x581/0xe90 [amdgpu]
[ +0.000485] ? down_write+0xbb/0x140
[ +0.000007] ? __mutex_unlock_slowpath.constprop.0+0x317/0x360
[ +0.000005] ? __pfx_amdgpu_device_fini_hw+0x10/0x10 [amdgpu]
[ +0.000482] ? __kasan_check_write+0x14/0x30
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? up_write+0x55/0xb0
[ +0.000007] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? blocking_notifier_chain_unregister+0x6c/0xc0
[ +0.000008] amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
[ +0.000484] amdgpu_pci_remove+0x93/0x130 [amdgpu]
[ +0.000482] pci_device_remove+0xae/0x1e0
[ +0.000008] device_remove+0xc7/0x180
[ +0.000008] device_release_driver_internal+0x3d4/0x5a0
[ +0.000007] device_release_driver+0x12/0x20
[ +0.000004] pci_stop_bus_device+0x104/0x150
[ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40
[ +0.000005] remove_store+0xd7/0xf0
[ +0.000005] ? __pfx_remove_store+0x10/0x10
[ +0.000006] ? __pfx__copy_from_iter+0x10/0x10
[ +0.000006] ? __pfx_dev_attr_store+0x10/0x10
[ +0.000006] dev_attr_store+0x3f/0x80
[ +0.000006] sysfs_kf_write+0x125/0x1d0
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? __kasan_check_write+0x14/0x30
[ +0.000005] kernfs_fop_write_iter+0x2ea/0x490
[ +0.000005] ? rw_verify_area+0x70/0x420
[ +0.000005] ? __pfx_kernfs_fop_write_iter+0x10/0x10
[ +0.000006] vfs_write+0x90d/0xe70
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? __pfx_vfs_write+0x10/0x10
[ +0.000004] ? local_clock+0x15/0x30
[ +0.000008] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __kasan_slab_free+0x5f/0x80
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __kasan_check_read+0x11/0x20
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? fdget_pos+0x1d3/0x500
[ +0.000007] ksys_write+0x119/0x220
[ +0.000005] ? putname+0x1c/0x30
[ +0.000006] ? __pfx_ksys_write+0x10/0x10
[ +0.000007] __x64_sys_write+0x72/0xc0
[ +0.000006] x64_sys_call+0x18ab/0x26f0
[ +0.000006] do_syscall_64+0x7c/0x170
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __pfx___x64_sys_openat+0x10/0x10
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __kasan_check_read+0x11/0x20
[ +0.000003] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? do_syscall_64+0x88/0x170
[ +0.000003] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? irqentry_exit+0x43/0x50
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? exc_page_fault+0x7c/0x110
[ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000006] RIP: 0033:0x7480c0b14887
[ +0.000005] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ +0.000005] RSP: 002b:00007fff142b0058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007480c0b14887
[ +0.000003] RDX: 0000000000000001 RSI: 00007480c0e7365a RDI: 0000000000000004
[ +0.000003] RBP: 00007fff142b0080 R08: 0000563b2e73c170 R09: 0000000000000000
[ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff142b02f8
[ +0.000003] R13: 0000563b159a72a9 R14: 0000563b159a9d48 R15: 00007480c0f19040
[ +0.000008] </TASK>
[ +0.000445] Allocated by task 427 on cpu 5 at 29.342331s:
[ +0.000011] kasan_save_stack+0x28/0x60
[ +0.000006] kasan_save_track+0x18/0x70
[ +0.000006] kasan_save_alloc_info+0x38/0x60
[ +0.000005] __kasan_kmalloc+0xc1/0xd0
[ +0.000006] __kmalloc_cache_noprof+0x1bd/0x430
[ +0.000007] amdgpu_driver_open_kms+0x172/0x760 [amdgpu]
[ +0.000493] drm_file_alloc+0x569/0x9a0
[ +0.000007] drm_client_init+0x1b7/0x410
[ +0.000007] drm_fbdev_client_setup+0x174/0x470
[ +0.000006] drm_client_setup+0x8a/0xf0
[ +0.000006] amdgpu_pci_probe+0x510/0x10c0 [amdgpu]
[ +0.000483] local_pci_probe+0xe7/0x1b0
[ +0.000006] pci_device_probe+0x5bf/0x890
[ +0.000006] really_probe+0x1fd/0x950
[ +0.000005] __driver_probe_device+0x307/0x410
[ +0.000006] driver_probe_device+0x4e/0x150
[ +0.000005] __driver_attach+0x223/0x510
[ +0.000006] bus_for_each_dev+0x102/0x1a0
[ +0.000005] driver_attach+0x3d/0x60
[ +0.000006] bus_add_driver+0x309/0x650
[ +0.000005] driver_register+0x13d/0x490
[ +0.000006] __pci_register_driver+0x1ee/0x2b0
[ +0.000006] rfcomm_dlc_clear_state+0x69/0x220 [rfcomm]
[ +0.000011] do_one_initcall+0x9c/0x3e0
[ +0.000007] do_init_module+0x29e/0x7f0
[ +0.000006] load_module+0x5c75/0x7c80
[ +0.000006] init_module_from_file+0x106/0x180
[ +0.000006] idempotent_init_module+0x377/0x740
[ +0.000006] __x64_sys_finit_module+0xd7/0x180
[ +0.000006] x64_sys_call+0x1f0b/0x26f0
[ +0.000006] do_syscall_64+0x7c/0x170
[ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000013] Freed by task 1733 on cpu 5 at 59.907086s:
[ +0.000011] kasan_save_stack+0x28/0x60
[ +0.000006] kasan_save_track+0x18/0x70
[ +0.000005] kasan_save_free_info+0x3b/0x60
[ +0.000005] __kasan_slab_free+0x54/0x80
[ +0.000006] kfree+0x127/0x470
[ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu]
[ +0.000493] drm_file_free.part.0+0x5b1/0xba0
[ +0.000006] drm_file_free+0x13/0x30
[ +0.000006] drm_client_release+0x1c4/0x2b0
[ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper]
[ +0.000007] put_fb_info+0x97/0xe0
[ +0.000007] unregister_framebuffer+0x197/0x380
[ +0.000005] drm_fb_helper_unregister_info+0x94/0x100
[ +0.000005] drm_fbdev_client_unregister+0x3c/0x80
[ +0.000007] drm_client_dev_unregister+0x144/0x330
[ +0.000006] drm_dev_unregister+0x49/0x1b0
[ +0.000006] drm_dev_unplug+0x4c/0xd0
[ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu]
[ +0.000484] pci_device_remove+0xae/0x1e0
[ +0.000008] device_remove+0xc7/0x180
[ +0.000007] device_release_driver_internal+0x3d4/0x5a0
[ +0.000006] device_release_driver+0x12/0x20
[ +0.000007] pci_stop_bus_device+0x104/0x150
[ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40
[ +0.000006] remove_store+0xd7/0xf0
[ +0.000006] dev_attr_store+0x3f/0x80
[ +0.000005] sysfs_kf_write+0x125/0x1d0
[ +0.000006] kernfs_fop_write_iter+0x2ea/0x490
[ +0.000006] vfs_write+0x90d/0xe70
[ +0.000006] ksys_write+0x119/0x220
[ +0.000006] __x64_sys_write+0x72/0xc0
[ +0.000006] x64_sys_call+0x18ab/0x26f0
[ +0.000005] do_syscall_64+0x7c/0x170
[ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000012] The buggy address belongs to the object at ffff88812eec8000
which belongs to the cache kmalloc-rnd-07-4k of size 4096
[ +0.000016] The buggy address is located 3160 bytes inside of
freed 4096-byte region [ffff88812eec8000, ffff88812eec9000)
[ +0.000023] The buggy address belongs to the physical page:
[ +0.000009] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12eec8
[ +0.000007] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ +0.000005] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ +0.000007] page_type: f5(slab)
[ +0.000008] raw: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000
[ +0.000005] raw: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000
[ +0.000006] head: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000
[ +0.000005] head: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000
[ +0.000006] head: 0017ffffc0000003 ffffea0004bbb201 ffffffffffffffff 0000000000000000
[ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
[ +0.000005] page dumped because: kasan: bad access detected
[ +0.000010] Memory state around the buggy address:
[ +0.000009] ffff88812eec8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] ffff88812eec8b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] >ffff88812eec8c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ^
[ +0.000010] ffff88812eec8c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ffff88812eec8d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ==================================================================
The use-after-free occurs because a delayed work item (`suspend_work`) may still
be pending or running when resources it accesses are freed during device removal
or file close. The previous code used `flush_work(&fpriv->evf_mgr.suspend_work.work)`,
which does not wait for delayed work that has not yet started. As a result, the
delayed work could run after its memory was freed, causing a use-after-free.
By switching to `flush_delayed_work(&fpriv->evf_mgr.suspend_work)`, we ensure that
the kernel waits for both queued and delayed work to finish before
freeing memory, closing this race.
Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error")
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b.
The original patch moved `amdgpu_userq_mgr_fini()` to the driver's
`postclose` callback, which is called after `drm_gem_release()` in
the DRM file cleanup sequence.If a user application crashes or aborts
without cleaning up its user queues, 'drm_gem_release()` may free
GEM objects that are still referenced by active user queues, leading
to use-after-free. By reverting, we ensure that user queues are
disabled and cleaned up before any GEM objects are released,
preventing this class of bug. However, this reintroduces a race
during PCI hot-unplug, where device removal can race with per-file
cleanup, leading to use-after-free in suspend/unplug paths.
This will be fixed in the next patch.
Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c")
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Common resolutions are added to supported modes to enable compatibility
scenarios that compositors may use to do things like clone displays. There
is no guarantee however that the panel will natively support these modes.
[How]
If the compositor hasn't enabled scaling but a non-native resolution has
been picked for an LVDS panel turn the scaler on anyway. This will ensure
compatibility.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Common modes are added to LVDS for compatibility in clone mode, but
not all panels support them. Non-native modes were disabled in the past
but this caused problems because compositors didn't use scaling for non
native modes. Now non-native modes on LVDS will enable the scaler by
default.
[How]
Check the connector type. If the connector is LVDS avoid adding common
modes.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a parameter to amdgpu_sdma_reset_engine() to let the
caller handle the kernel rings. This allows the kernel
rings to back up their unprocessed state if the reset comes in
via the drm scheduler rather than KFD.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move the force completion handling into the common
engine reset function. No need to duplicate it for
every IP version.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If ring reset is disabled, skip resetting queues. Instead, fall back to
device based reset.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For extended wait with retries on a PSP register value, add a noverbose
flag to avoid excessive error messages on each timeout.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There is a small typo in phm_wait_on_indirect_register().
Swap mask and value arguments provided to phm_wait_on_register() so that
they satisfy the function signature and actual usage scheme.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
In practice this doesn't fix any issues because the only place this
function is used uses the same value for the value and mask.
Fixes: 3bace3591493 ("drm/amd/powerplay: add hardware manager sub-component")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Need to reinit the ring before remapping it and all of
the KIQ handling needs to be within the kiq lock.
Fixes: 1741281a157f ("drm/amdgpu/gfx10: add ring reset callbacks")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pass amdgpu device context instead of drm device context to some
amdgpu_device_* functions. DRM device context is not required in those
functions. No functional change.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add debugfs support for mqd for each queue of the client.
The address exposed to debugfs could be used to dump
the mqd.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250704075548.1549849-5-sunil.khatri@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Add a debugfs file under the client directory which shares
the root page table base address of the VM.
This address could be used to dump the pagetable for debug
memory issues.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250704075548.1549849-4-sunil.khatri@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.17-2025-07-01:
amdgpu:
- FAMS2 fixes
- OLED fixes
- Misc cleanups
- AUX fixes
- DMCUB updates
- SR-IOV hibernation support
- RAS updates
- DP tunneling fixes
- DML2 fixes
- Backlight improvements
- Suspend improvements
- Use scaling for non-native modes on eDP
- SDMA 4.4.x fixes
- PCIe DPM fixes
- SDMA 5.x fixes
- Cleaner shader updates for GC 9.x
- Remove fence slab
- ISP genpd support
- Parition handling rework
- SDMA FW checks for userq support
- Add missing firmware declaration
- Fix leak in amdgpu_ctx_mgr_entity_fini()
- Freesync fix
- Ring reset refactoring
- Legacy dpm verbosity changes
amdkfd:
- GWS fix
- mtype fix for ext coherent system memory
- MMU notifier fix
- gfx7/8 fix
radeon:
- CS validation support for additional GL extensions
- Bump driver version for new CS validation checks
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250701194707.32905-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|