Age | Commit message (Collapse) | Author | Files | Lines |
|
The RB bitmap should be global active RB bitmap &
active RB bitmap based on active SA.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
[Why]
set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE
packet to init shadow memory will be skiped, hence there has a page fault.
[How]
VM flush is needed after GPU reset, clear set_q_mode_offs when
emitting VM flush.
Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In
certain configs, VCN instance can be exclusively allocated to a
partition even under CPX mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
fix the high voltage issue after unload on smu 13.0.10
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA_CNTL is not set in some cases, driver configures it by itself.
v2: simplify code
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch to add smu 14.0.1 support
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
From MES version 0x54, the log entry increased and require the log buffer
size to be increased. The 16k is maximum size agreed
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The MES log might slow down the performance for extra step of log the data,
disable it by default and introduce a parameter can enable it when necessary
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For SOC21 ASICs, there is an issue in re-enabling PM features if a
suspend got aborted. In such cases, reset the device during resume
phase. This is a workaround till a proper solution is finalized.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Otherwise the old one will be used during GPU reset.
That's not expected.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Downgrade to debug information when IBs are skipped. Also, use dev_* to
identify the device.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There are cases where soft reset seems to succeed, but
does not, so always use mode1/2 for now.
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
This patch to differentiate external rev id for gfx 11.5.0.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The RB bitmap should be global active RB bitmap &
active RB bitmap based on active SA.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5.
2. set fatel error detected flag.
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In case of fatal errors, set FED status when interrupt is received. Set
the flag on other devices in the hive before RAS recovery work.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE
packet to init shadow memory will be skiped, hence there has a page fault.
[How]
VM flush is needed after GPU reset, clear set_q_mode_offs when
emitting VM flush.
Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
RAS TA will handle it, the function is useless.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
support MES command SET_HW_RESOURCE1 in sriov
Signed-off-by: chongli2 <chongli2@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Check more possible ext error codes.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In
certain configs, VCN instance can be exclusively allocated to a
partition even under CPX mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Optimize the code to add support for BAMACO mode checking
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
ASD is not needed by headless GPU.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support for MACO flag checking.
MACO mode only works if BACO is supported.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
fix the high voltage issue after unload on smu 13.0.10
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Apply this rule to all newer asics in sriov case.
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.
Moved the check to amdgpu_device_init() to ensure it is done after
amdgpu_device_ip_early_init() where the IP versions are discovered.
Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable pipe1 support starting from SIENNA CICHLID asic
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use correct ref/mask for differnent gfx ring pipe.
This should fix the gfx hang issue after enabling gfx pipe1.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA_CNTL is not set in some cases, driver configures it by itself.
v2: simplify code
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch to add smu 14.0.1 support
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
From MES version 0x54, the log entry increased and require the log buffer
size to be increased. The 16k is maximum size agreed
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The MES log might slow down the performance for extra step of log the data,
disable it by default and introduce a parameter can enable it when necessary
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
refine function signature of amdgpu_aca_get_error_data();
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For SOC21 ASICs, there is an issue in re-enabling PM features if a
suspend got aborted. In such cases, reset the device during resume
phase. This is a workaround till a proper solution is finalized.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Add FW information of all the IP's in the devcoredump.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Otherwise the old one will be used during GPU reset.
That's not expected.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Downgrade to debug information when IBs are skipped. Also, use dev_* to
identify the device.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Two misc-next in one.
drm-misc-next for v6.10-rc1:
The deal of a lifetime! You get ALL of the previous
drm-misc-next-2024-03-21-1 tag!!
But WAIT, there's MORE!
Cross-subsystem Changes:
- Assorted DT binding updates.
Core Changes:
- Clarify how optional wait_hpd_asserted is.
- Shuffle Kconfig names around.
Driver Changes:
- Assorted build fixes for panthor, imagination,
- Add AUO B120XAN01.0 panels.
- Assorted small fixes to panthor, panfrost.
drm-misc-next for v6.10:
UAPI Changes:
- Move some nouveau magic constants to uapi.
Cross-subsystem Changes:
- Move drm-misc to gitlab and freedesktop hosting.
- Add entries for panfrost.
Core Changes:
- Improve placement for TTM bo's in idle/busy handling.
- Improve drm/bridge init ordering.
- Add CONFIG_DRM_WERROR, and use W=1 for drm.
- Assorted documentation updates.
- Make more (drm and driver) headers self-contained and add header
guards.
- Grab reservation lock in pin/unpin callbacks.
- Fix reservation lock handling for vmap.
- Add edp and edid panel matching, use it to fix a nearly identical
panel.
Driver Changes:
- Add drm/panthor driver and assorted fixes.
- Assorted small fixes to xlnx, panel-edp, tidss, ci, nouveau,
panel and bridge drivers.
- Add Samsung s6e3fa7, BOE NT116WHM-N44, CMN N116BCA-EA1,
CrystalClear CMT430B19N00, Startek KD050HDFIA020-C020A,
powertip PH128800T006-ZHC01 panels.
- Fix console for omapdrm.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/bea310a6-6ff6-477e-9363-f9f053cfd12a@linux.intel.com
|
|
Backmerging to get v6.9-rc2 changes into drm-misc-next.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HDMI_HELPER to depend on it.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-12-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HDCP_HELPER to depend on it.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-11-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_DP_HELPER to depend on it.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-10-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.
Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HELPER to depend on it.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-8-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine
also hard-resets immediately after those lines are printed (although I
wasn't able to reproduce that part when reading by hand):
[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082] do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082] kernel_init+0x15/0x1a0
[ 1318.027242][ T1082] ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082] do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082] kernel_init+0x15/0x1a0
[ 1318.031168][ T1082] ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082] __might_fault+0x58/0x80
[ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082] full_proxy_read+0x55/0x80
[ 1318.034487][ T1082] vfs_read+0xa7/0x360
[ 1318.034788][ T1082] ksys_read+0x70/0xf0
[ 1318.035085][ T1082] do_syscall_64+0x94/0x180
[ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082] Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082] CPU0 CPU1
[ 1318.038101][ T1082] ---- ----
[ 1318.038350][ T1082] lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082] lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082] lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082] rlock(&mm->mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082] *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082] <TASK>
[ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082] check_noncircular+0x145/0x160
[ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082] ? __might_fault+0x40/0x80
[ 1318.043580][ T1082] ? __might_fault+0x40/0x80
[ 1318.043856][ T1082] __might_fault+0x58/0x80
[ 1318.044131][ T1082] ? __might_fault+0x40/0x80
[ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082] full_proxy_read+0x55/0x80
[ 1318.045042][ T1082] vfs_read+0xa7/0x360
[ 1318.045333][ T1082] ksys_read+0x70/0xf0
[ 1318.045623][ T1082] do_syscall_64+0x94/0x180
[ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100
[ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
[ 1318.050638][ T1082] </TASK>
amdgpu_debugfs_mqd_read() holds a reservation when it calls
put_user(), which may fault and acquire the mmap_sem. This violates
the established locking order.
Bounce the mqd data through a kernel buffer to get put_user() out of
the illegal section.
Fixes: 445d85e3c1df ("drm/amdgpu: add debugfs interface for reading MQDs")
Cc: stable@vger.kernel.org # v6.5+
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Share same codes with 4.0.5 and enable collaborate mode for VPE.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Align with FW changes.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If the system hasn't entered GFXOFF when suspend starts it can cause
hangs accessing GC and RLC during the suspend stage.
Cc: <stable@vger.kernel.org> # 6.1.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc: <stable@vger.kernel.org> # 6.1.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc: <stable@vger.kernel.org> # 6.1.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc: <stable@vger.kernel.org> # 6.1.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend")
Cc: <stable@vger.kernel.org> # 6.6.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc: <stable@vger.kernel.org> # 6.6.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc: <stable@vger.kernel.org> # 6.6.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc: <stable@vger.kernel.org> # 6.6.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend")
Cc: <stable@vger.kernel.org> # 6.1+
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132
Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To fix mode2 reset failure.
Should power on VPE when hw_init.
Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There are cases where soft reset seems to succeed, but
does not, so always use mode1/2 for now.
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine
also hard-resets immediately after those lines are printed (although I
wasn't able to reproduce that part when reading by hand):
[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082] do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082] kernel_init+0x15/0x1a0
[ 1318.027242][ T1082] ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082] do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082] kernel_init+0x15/0x1a0
[ 1318.031168][ T1082] ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082] __might_fault+0x58/0x80
[ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082] full_proxy_read+0x55/0x80
[ 1318.034487][ T1082] vfs_read+0xa7/0x360
[ 1318.034788][ T1082] ksys_read+0x70/0xf0
[ 1318.035085][ T1082] do_syscall_64+0x94/0x180
[ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082] Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082] CPU0 CPU1
[ 1318.038101][ T1082] ---- ----
[ 1318.038350][ T1082] lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082] lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082] lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082] rlock(&mm->mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082] *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082] <TASK>
[ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082] check_noncircular+0x145/0x160
[ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082] ? __might_fault+0x40/0x80
[ 1318.043580][ T1082] ? __might_fault+0x40/0x80
[ 1318.043856][ T1082] __might_fault+0x58/0x80
[ 1318.044131][ T1082] ? __might_fault+0x40/0x80
[ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082] full_proxy_read+0x55/0x80
[ 1318.045042][ T1082] vfs_read+0xa7/0x360
[ 1318.045333][ T1082] ksys_read+0x70/0xf0
[ 1318.045623][ T1082] do_syscall_64+0x94/0x180
[ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100
[ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
[ 1318.050638][ T1082] </TASK>
amdgpu_debugfs_mqd_read() holds a reservation when it calls
put_user(), which may fault and acquire the mmap_sem. This violates
the established locking order.
Bounce the mqd data through a kernel buffer to get put_user() out of
the illegal section.
Fixes: 445d85e3c1df ("drm/amdgpu: add debugfs interface for reading MQDs")
Cc: stable@vger.kernel.org # v6.5+
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|